PyMML

PyMML is a Python package for statistical analysis and automatic classification of data. It implements a selection of the Minimum Message Length (MML) estimators described by Professor Chris Wallace in his book "Statistical and Inductive Inference by Minimum Message Length", including estimators for hidden factor analysis and mixture modelling.

PyMML is open-source software released under the GPL.

Example:

>>> from numpy import array
>>> import mml
>>> data = array([[9.42,9.20], [1.47,1.15], [5.45,5.85], [7.09,7.04], [9.15,9.39],
...               [30.26,0.53], [30.84,0.82], [30.21,0.32], [30.53,0.52], [30.78,0.45]])
>>> mml.Mixture_estimate(data, ( mml.Hidden_factor_estimate, 
...   array([[-100.0,100.0],[-100.0,100.0]]), array([0.01,0.01]), array([100.0,100.0]) ))
Mixture_estimate: 151.95 nits
 50% Hidden_factor_estimate: 64.15 nits
                mean=[ 30.52   0.53]
               sigma=[ 0.29  0.18]
          has_factor=False
        factor_loads=[ 0.  0.]
 50% Hidden_factor_estimate: 79.76 nits
                mean=[ 6.52  6.53]
               sigma=[ 0.28  0.29]
          has_factor=True
        factor_loads=[ 3.24  3.34]
       factor_scores=[ 0.84 -1.57 -0.26  0.16  0.83  2.75  2.88  2.71  2.79  2.82]
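The printed estimates can be sanity-checked against the data by hand. The sketch below uses only NumPy, not the `mml` module, and rests on our own reading of the printout rather than documented PyMML behaviour: that `sigma` for the factor-free component is the sample standard deviation with an N-1 divisor, that the factor component follows x ≈ mean + score * factor_loads + noise, and that the first five `factor_scores` belong to the first five data items. It also converts the total message length from nits (natural-log units) to bits.

```python
import math

import numpy as np

# Data and rounded estimates copied from the example printout.
data = np.array([[9.42, 9.20], [1.47, 1.15], [5.45, 5.85], [7.09, 7.04], [9.15, 9.39],
                 [30.26, 0.53], [30.84, 0.82], [30.21, 0.32], [30.53, 0.52], [30.78, 0.45]])
diag_cluster = data[:5]   # items 1-5, spread along the diagonal
right_cluster = data[5:]  # items 6-10, near x = 30

# Factor-free component: compare with the sample mean and the N-1 sample
# standard deviation (assumption: this is what the printed "sigma" means).
mean_right = right_cluster.mean(axis=0)
sigma_right = right_cluster.std(axis=0, ddof=1)
print("mean", mean_right.round(2), "sigma", sigma_right.round(2))

# Factor component: assume x ~ mean + score * factor_loads + noise.
mean = np.array([6.52, 6.53])
loads = np.array([3.24, 3.34])
sigma = np.array([0.28, 0.29])
scores = np.array([0.84, -1.57, -0.26, 0.16, 0.83])  # first five printed scores
predicted = mean + scores[:, None] * loads
residual_in_sigmas = (diag_cluster - predicted) / sigma
print(residual_in_sigmas.round(2))

# Message lengths are printed in nits; one nit is 1 / ln 2 bits.
total_bits = 151.95 / math.log(2)
print(f"{total_bits:.1f} bits")
```

If these assumptions hold, the first check should reproduce the printed `mean` and `sigma` of the 30.5 cluster to two decimals, and every residual in the second check should be well under one `sigma`.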

Documentation:

Current documentation

Download:

PyMML-0.6.tar.gz

PyMML requires the NumPy and SciPy packages.

If you prefer numarray, use PyMML-0.5.tar.gz instead. Thanks to Loki Davison for porting PyMML to NumPy.

A variety of MML software written in C and FORTRAN is also available for download.

The book "Statistical and Inductive Inference by Minimum Message Length" describes the theory behind MML in detail.

PyMML was written on behalf of the School of Computer Science and Software Engineering at Monash University, under the supervision of Dr. David Albrecht.