Sawtooth Software: The Survey Software of Choice

Breathing New Life into Sawtooth Software’s Cluster Analysis

Very soon, we’ll be releasing a new cluster analysis package that includes Cluster Ensemble Analysis, a relatively new methodology for improving the quality of cluster solutions.

One of the most enigmatic of Sawtooth Software’s products is the CCA system for convergent cluster analysis (k-means). First released in 1988, it has one of our most loyal and enthusiastic groups of users. But compared to our more popular systems, it hasn’t been widely adopted. Perhaps that is because other well-known statistical packages already offer cluster routines. Perhaps it is because many analysts don’t realize the pitfalls of using traditional cluster routines. Or maybe it is because of CCA’s antiquated, clunky software interface.

CCA established a devoted following because of the way it repeats the k-means procedure from many intelligently drawn starting points and then selects the single run that is most reproducible among the replicates. This shields the analyst from accepting a poor solution due to an unlucky choice of starting points. Importantly, the reproducibility of a cluster solution reflects how naturally the data can be segmented into the number of groups the analyst requested. Reproducibility is therefore a useful diagnostic for determining an appropriate number of clusters.
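To make the idea concrete, here is a minimal Python sketch of picking the most reproducible of several k-means replicates. It is an illustration only, not CCA's actual algorithm: the starting points here are drawn at random rather than "intelligently," and reproducibility is measured with the Rand index (the share of respondent pairs that two solutions treat the same way), which is our assumption for this example. The function names are hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, max_iter=100):
    """Plain k-means from one random start; returns one label per point."""
    centers = rng.sample(points, k)  # assumption: random starts, not CCA's scheme
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Share of point pairs on which two labelings agree (same vs. different group)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def most_reproducible_run(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times; return the run that agrees best with the others."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_runs)]
    scores = [
        sum(rand_index(run, other) for j, other in enumerate(runs) if j != i)
        / (n_runs - 1)
        for i, run in enumerate(runs)
    ]
    best = max(range(n_runs), key=scores.__getitem__)
    return runs[best], scores[best]


# Two well-separated groups of three respondents each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, reproducibility = most_reproducible_run(points, k=2)
print(labels, reproducibility)
```

Comparing the reproducibility score across different values of k gives the kind of diagnostic described above: a value of k at which the replicates agree closely suggests the data segment naturally into that many groups.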

We are enthusiastic about a relatively new development called cluster ensemble analysis that provides even better solutions than the previous CCA approach. We are grateful to Joe Retzer and Ming Shan of Maritz Research for calling our attention to this development at the 2007 conference.

Cluster ensemble analysis originated in the machine learning and data mining fields, and is commonly attributed to Strehl and Ghosh (2002). Instead of picking the one best cluster solution from a set of available solutions, it develops a consensus solution based on all the information in the ensemble of cluster solutions. The consensus solution usually differs from every one of the available cluster runs, and it is usually reported to be a superior cluster solution. We have tried it, and in our tests that has indeed been the case.
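The consensus idea can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the specific algorithms of Strehl and Ghosh and not the method implemented in our software: it counts how often each pair of respondents lands in the same cluster across the ensemble (a co-association matrix) and then merges groups by average linkage until k groups remain. The function names are hypothetical.

```python
from itertools import combinations


def co_association(solutions):
    """Fraction of solutions in which each pair of points shares a cluster."""
    n = len(solutions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in solutions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(solutions) for c in row] for row in counts]


def consensus(solutions, k):
    """Average-linkage merging on the co-association matrix down to k groups."""
    sim = co_association(solutions)
    clusters = [[i] for i in range(len(sim))]  # start with every point alone

    def avg_sim(pair):
        a, b = pair
        total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # Merge the two groups whose members co-cluster most often on average.
        a, b = max(combinations(range(len(clusters)), 2), key=avg_sim)
        clusters[a] += clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b

    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Four imperfect solutions for six respondents; each misplaces a different one.
ensemble = [
    [1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
print(consensus(ensemble, k=2))
```

In this toy ensemble no single solution separates the first three respondents from the last three, yet the consensus does, because each individual error is outvoted by the other three solutions.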

We are releasing a new version of our cluster analysis package that of course brings it up to Windows usability standards. This new package incorporates our own flavor of cluster ensemble analysis (as well as the earlier method supported by CCA). The current plan is to name the new software Convergent Cluster & Ensemble Analysis (CCEA). We expect these changes to breathe new life into the package, and we hope that our users come to recognize the benefit of using CCEA instead of older approaches.

The Convergent Cluster & Ensemble Analysis package is currently in beta testing and should be available for purchase soon. We will present the results of our tests of ensemble analysis versus the older CCA methodology at the joint SKIM-Sawtooth Software Conference and Training in Barcelona, May 26-28.