Friday, April 10, 2009

Building Trees for Large Data Sets

Hierarchical agglomerative trees are used extensively in life sciences
because they provide an intuitive way to organize and visualize the clustering
results. However, there are two limitations with such trees. First, hierarchical
agglomerative clustering may not be the optimal way to cluster data in which
there is no biological reason to suggest that the objects are related to each other
in a tree fashion. Second, hierarchical agglomerative clustering algorithms
have high computational and memory requirements, making them impractical
for data sets with more than a few thousand objects.
To address these problems, CLUTO provides the -fulltree option, which can be used
to produce a complete tree using a hybrid of partitional and agglomerative
approaches. In particular, when -fulltree is specified, CLUTO builds a complete
hierarchical tree that preserves the clustering solution that was computed. In
this hierarchical clustering solution, the objects of each cluster form a subtree,
and the different subtrees are merged to get an all-inclusive cluster at the end.
Furthermore, the individual trees are combined in a meaningful way, so that
the similarities within each tree are accurately represented.
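
To make the hybrid idea concrete, here is a minimal Python sketch. It is not CLUTO's implementation: the function names (`agglomerate`, `full_tree`), the centroid-distance merge rule, and the toy data are all assumptions chosen for illustration. It takes a partition that has already been computed by some partitional method, builds an agglomerative subtree inside each cluster, and then merges those subtrees into one tree, so the original clusters survive as subtrees.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leaves(tree):
    """Object ids under a node; a node is an int leaf or a (left, right) pair."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def agglomerate(nodes, data):
    """Merge the two nodes with the closest centroids until one node remains.

    Assumed merge rule for illustration only; CLUTO offers its own criteria.
    """
    nodes = list(nodes)
    while len(nodes) > 1:
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda p: sq_dist(
                centroid([data[k] for k in leaves(nodes[p[0]])]),
                centroid([data[k] for k in leaves(nodes[p[1]])]),
            ),
        )
        merged = (nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def full_tree(data, labels):
    """Build one tree that keeps every cluster of `labels` as a subtree."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    # Step 1: agglomerative tree inside each precomputed cluster.
    subtrees = [agglomerate(members, data) for members in clusters.values()]
    # Step 2: agglomerative merge of the cluster subtrees themselves.
    return agglomerate(subtrees, data), subtrees


# Toy data: two well-separated groups, with labels from a hypothetical
# partitional clustering step.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (0.5, 0.5)]
labels = [0, 0, 1, 1, 0]
root, subtrees = full_tree(data, labels)
```

Because the expensive pairwise merging only happens inside each cluster and then across the handful of cluster subtrees, the work stays far below that of running agglomerative clustering on all objects at once, which is the motivation the passage above describes.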
