F Nov. 19

Shu-Chuan (Grace) Chen
Department of Mathematics and Statistics, ASU

Title:
Mixture Trees with Application to Genome Sequence Analysis.

Abstract:
Clustering methods have been widely investigated over the last decade.
With the rapid progress of human genome sequencing, more efficient
clustering methods are in high demand.
In this talk, I will first show how an ancestral mixture model can be used
to build a hierarchical tree from binary sequence data, using genetic
single nucleotide polymorphism (SNP) data as an example.  Some
properties of the ancestral mixture model, such as its nested structure
and its relationship to the coalescent process of population genetics,
will be presented.  I will then propose a model selection method based on
an easy-to-calculate quadratic distance.  This distance is obtained by
first applying kernel smoothing to both the data and the fitted model to
get densities e* and a* on the sequence space, and then computing the L2
distance between these two densities to assess how well the model fits
the data.  An analysis of SNP data will demonstrate how the method works.
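To make the quadratic-distance idea concrete, here is a minimal Python sketch for short binary sequences, written under stated assumptions rather than as the speaker's implementation. It assumes a product-Bernoulli smoothing kernel K_h(x, y) = h^d (1 - h)^(L - d), where d is the Hamming distance between x and y and h is a hypothetical bandwidth; the abstract does not say which kernel is used. The "fitted model" is a stand-in independent-sites model, not the ancestral mixture model. The sketch smooths the empirical distribution into e*, smooths the model into a*, and sums the squared differences over all 2^L sequences.

```python
from collections import Counter
from itertools import product


def hamming(x, y):
    """Number of sites at which two binary sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def kernel(x, y, h):
    """Assumed product-Bernoulli kernel: each site of x flips with prob. h.

    For fixed x this is a probability mass function over y.
    """
    d = hamming(x, y)
    return h ** d * (1 - h) ** (len(x) - d)


def smooth(pmf, space, h):
    """Push a probability mass function through the kernel onto `space`."""
    return {y: sum(p * kernel(x, y, h) for x, p in pmf.items()) for y in space}


def empirical(data):
    """Empirical distribution of the observed sequences."""
    counts = Counter(data)
    n = len(data)
    return {x: c / n for x, c in counts.items()}


def independent_sites_model(data):
    """Stand-in fitted model (an assumption for illustration only):
    sites are independent, each with its sample frequency of 1s."""
    n, length = len(data), len(data[0])
    freqs = [sum(x[j] for x in data) / n for j in range(length)]
    model = {}
    for x in product((0, 1), repeat=length):
        p = 1.0
        for j, bit in enumerate(x):
            p *= freqs[j] if bit else 1 - freqs[j]
        model[x] = p
    return model


def quadratic_distance(data, model, h):
    """Squared L2 distance between the smoothed data (e*) and model (a*).

    Enumerates all 2**L sequences, so it is only feasible for short L.
    """
    space = list(product((0, 1), repeat=len(data[0])))
    e_star = smooth(empirical(data), space, h)
    a_star = smooth(model, space, h)
    return sum((e_star[y] - a_star[y]) ** 2 for y in space)


# Toy data with two clusters of sequences (hypothetical, not SNP data).
data = [(0, 0, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 0)]
model = independent_sites_model(data)
print(quadratic_distance(data, model, h=0.1))
```

A model that reproduces the empirical distribution exactly has distance 0, and at h = 0.5 the kernel becomes uniform, so every model scores (numerically) 0. The bandwidth h therefore controls how much fine structure in the data the distance is sensitive to.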