Entropy-based distance metric applied to genomic sequence analysis ~ Calvin Ahlbrandt, William Casey ~
Main Code/download Examples Contribute Contact
Downloads.
Download code as a tgz file.  To extract the files on a Unix system: "> gunzip -c EF.tgz | tar xvf -"
Download code as a zip file.  Zip is supported on multiple systems, including DOS and Windows.
 
e02daug.m computes the entropy-minimizing integral (the entropy distance metric) for rich probability vectors over unconstrained paths.
  
e02daug.m is an approximate numerical solution to 
the problem of finding the minimum value of equation (2) as presented in the paper.

The code is written in the Matlab scripting language.
Below is the call chart for the set of Matlab files.

Files ~*.m ~ extension
e02HN.m see description below ..
e02Hofh.m see description below ..
e02Jacsq.m see description below ..
e02daug.m see description below ..
e02faug.m see description below ..
e02fdyn.m see description below ..
e02h.m see description below ..
e02init.m see description below ..
e02phi.m see description below ..
e02regr.m see description below ..
 
e02dlinear.m computes the entropy minimizing integral on a linearly constrained path. 
This code may be run on any probability vector rich or non-rich.

The relationship to e02daug.m is discussed in Section 6 of the paper.

e02dlinear.m is an approximate numerical solution to 
the problem of finding the minimum value of equation (2) when the path is constrained to be linear.

The code is written in the Matlab scripting language.
Below is the call chart for the set of Matlab files.
Files ~*.m ~ extension
e02Hofh.m see description below ..
e02dlinear.m see description below ..
e02h.m see description below ..
e02quad.m see description below ..
e02regr.m see description below ..
December 2002.
 
Software Description and Links 
The link http://www.math.missouri.edu/~calvin/entropyfiles/index.html
contains Matlab program files for three methods of estimating the minimal 
entropy distance d(a,b).

The minimal entropy distance d(a,b), assuming the minimizing arc is a 
rich probability path, is computed by the Matlab program e02daug.m via 
e02daug(a,b), which calls the functions e02faug, e02fdyn, e02h, e02HN, 
e02Hofh, e02init, e02Jacsq, e02phi and e02regr.

The function e02init initializes the variables "hatsig" and "hatz" for use 
in the differential system determined by e02fdyn, which implements 
Proposition 12 (Explicit f) for f.

The functions e02Hofh and e02HN use e02h (which is the code given in 
Proposition 12 (Explicit f)) to evaluate the entropy function of a vector.
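
As a point of reference, evaluating the entropy of a probability vector can be sketched as follows. This is Python rather than the package's Matlab, and it assumes the Shannon form H(p) = -sum p_i ln p_i with the natural logarithm; the name shannon_entropy is hypothetical and this is not the code in e02h.m, e02Hofh.m or e02HN.m.

```python
import math


def shannon_entropy(p, tol=1e-12):
    """Shannon entropy -sum(p_i * ln p_i) of a probability vector.

    Assumption: the Shannon form with natural logarithm; the paper's
    entropy function may be defined or scaled differently.
    """
    if any(x < 0 for x in p):
        raise ValueError("components must be nonnegative")
    if abs(sum(p) - 1.0) > tol:
        raise ValueError("components must sum to 1")
    # Zero components contribute nothing, by the convention 0 * ln 0 = 0.
    return -sum(x * math.log(x) for x in p if x > 0)
```

For the uniform vector of length 4 this returns ln 4, the largest value possible for that length, and a vector with a single nonzero component gives 0.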

The program e02phi gives the result of following the solution of the initial
value problem across the s interval. Newton's method uses e02Jacsq, the
numerical Jacobian matrix. Linear regression to improve the arc length 
estimate is carried out by e02regr. Finally, the differential system is
augmented by e02faug, which adds step 13 of Section (deflated) so 
that d can be computed by solving the augmented system, without 
quadrature methods.
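
The idea of augmenting a differential system so that an integral is obtained from the ODE solve itself can be illustrated on a toy problem. The sketch below is Python, not the package's Matlab. The fixed-step RK4 integrator, the toy dynamics y' = -y and the integrand g(y) = y^2 are all assumptions chosen for illustration and are unrelated to e02fdyn or e02faug. An extra state q with q' = g(y) and q(0) = 0 is appended, so that q(1) equals the integral of g(y(s)) over [0, 1].

```python
import math


def rk4(f, z0, s0, s1, steps):
    """Classical fixed-step RK4 for z' = f(s, z), with z a list of floats."""
    h = (s1 - s0) / steps
    s, z = s0, list(z0)
    for _ in range(steps):
        k1 = f(s, z)
        k2 = f(s + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = f(s + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = f(s + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        s += h
    return z


def dynamics(s, y):
    # Toy system (assumption): y' = -y, so y(s) = exp(-s) when y(0) = 1.
    return [-y[0]]


def integrand(y):
    # Toy integrand (assumption): g(y) = y^2.
    return y[0] ** 2


def augmented(s, z):
    # Split off the accumulator; its derivative is the integrand,
    # which depends only on the original state y.
    *y, _q = z
    return dynamics(s, y) + [integrand(y)]


# Integrate the augmented system from s = 0 to s = 1; q starts at 0.
y_end, q_end = rk4(augmented, [1.0, 0.0], 0.0, 1.0, 200)
```

Here q_end approximates (1 - exp(-2)) / 2, the exact value of the integral of exp(-2s) over [0, 1], and no separate quadrature pass over stored solution values is needed.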

The program e02dlinear takes an input argument N, an integer of at least 
100, and estimates the value of the entropy integral over the linear path 
using N equally spaced points on the closed line segment joining 
a to b. It calls e02Hofh and e02h to generate the function values, 
then e02quad, which calls e02regr to carry out a composite 
Simpson's rule. Note that N can be any integer, in contrast to the 
usual Simpson's rule, which requires an even number of subintervals 
and hence an odd N.
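
To make the remark about N concrete, here is a minimal Python sketch (not the package's Matlab) of sampling a linear path and applying a Simpson-type rule to any number of samples. It uses a common textbook workaround, Simpson's 3/8 rule on the last three subintervals when the subinterval count is odd; this is a swapped-in technique and not necessarily what e02quad does, since the description above says e02quad works through e02regr. The names linear_path and simpson_any are hypothetical, and the integrand of equation (2) is not reproduced here: a cubic in the path parameter t stands in for it.

```python
def linear_path(a, b, n):
    """Return n equally spaced points on the segment from a to b, endpoints included."""
    if n < 2 or len(a) != len(b):
        raise ValueError("need n >= 2 and vectors of equal length")
    return [[(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
            for t in (k / (n - 1) for k in range(n))]


def simpson_any(values, h):
    """Simpson-type estimate from equally spaced samples with spacing h.

    Works for even or odd subinterval counts (at least 2 subintervals).
    """
    m = len(values) - 1  # number of subintervals
    if m < 2:
        raise ValueError("need at least 3 samples")
    total = 0.0
    end = m
    if m % 2 == 1:
        # Odd count: cover the last three subintervals with the 3/8 rule.
        y0, y1, y2, y3 = values[m - 3:]
        total += 3 * h / 8 * (y0 + 3 * y1 + 3 * y2 + y3)
        end = m - 3
    # Standard composite 1/3 rule on the remaining even number of subintervals.
    for i in range(0, end, 2):
        total += h / 3 * (values[i] + 4 * values[i + 1] + values[i + 2])
    return total


# Stand-in integrand t^3 on [0, 1], sampled at N = 100 points (99 subintervals).
N = 100
samples = [(k / (N - 1)) ** 3 for k in range(N)]
estimate = simpson_any(samples, 1 / (N - 1))

# Five points on the segment between two probability vectors.
points = linear_path([0.5, 0.5], [0.1, 0.9], 5)
```

Both the 1/3 and 3/8 rules are exact for cubics, so estimate agrees with 1/4 up to rounding. Every point returned by linear_path is a convex combination of a and b, so it is again a probability vector.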

The elementary metric d_G(a,b) is computed by e02d_G(a,b) which 
calls the function e02G.

Programs e02dlinear and e02d_G will run on any pair of probability 
vectors a and b of the same length N.

We used a package called Weighbor to create the tree structures found in the examples section. 

In addition, C programs that generate the dendronic tree as 
Encapsulated PostScript were written; see the link Will's code.  

web page:
 http://www.math.missouri.edu/~calvin/entropyfiles/index.html.