How is the consensus in CCEA derived?

Dear Keith,

First of all, thank you so much for sharing your knowledge of segmentation with me and for the very helpful hints on my problems.

A colleague of mine asked me how the consensus solution in your CCEA is obtained. In the paper by Orme/Johnson, 2008, I found two possible explanations:

1.) Running a k-means with different starting points on the indicator matrix and considering the most reproducible solution as the final solution.

2.) Running a k-means with different starting points on the indicator matrix and repeating the process (CCC....C) until no reclassification occurs. The previous candidate solution is then taken as the final solution.

Now, in my CCEA Output I see both:

"Reproducibility in first clustering step is xx.x%"


"Clustering on Clusters converged in 2 steps."

Which solution is now taken as the Consensus? The one that has the reproducibility of xx.x% or the one that has been obtained by CCC in 2 steps? Or am I on the wrong path and both solutions are the same?

Thanks again for your help

asked Jun 9, 2014 by Arnold

2 Answers

0 votes
CCEA builds the initial ensemble using the settings specified in the project. K-means is then run, and the resulting solution's group membership is appended to the ensemble. This process of running k-means and appending the solution continues until the solution stabilizes and no respondent switches groups.
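To make the run-and-append loop concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not Sawtooth Software's implementation: the function names (`indicator_rows`, `kmeans`, `same_partition`, `consensus`) are hypothetical, the k-means is a plain Lloyd's algorithm with random starts, and picking the replicate with the lowest within-cluster sum of squares is an assumption (this thread does not say how CCEA chooses among replicates).

```python
import random

def indicator_rows(ensemble):
    """Dummy-code every solution in the ensemble; one row per respondent."""
    rows = [[] for _ in ensemble[0]]
    for labels in ensemble:
        groups = sorted(set(labels))
        for i, lab in enumerate(labels):
            rows[i].extend(1.0 if lab == g else 0.0 for g in groups)
    return rows

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, rng, max_iter=100):
    """Plain Lloyd's k-means from one random start.
    Returns (labels, within-cluster sum of squares)."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: sqdist(r, centers[c])) for r in rows]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sqdist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss

def same_partition(a, b):
    """True if two labelings group respondents identically (label names may differ)."""
    pairs = set(zip(a, b))
    return len(pairs) == len(set(a)) == len(set(b))

def consensus(ensemble, k, replicates=30, seed=0, max_steps=20):
    """Cluster on the indicator matrix, append the result, repeat until stable."""
    rng = random.Random(seed)
    ensemble = [list(s) for s in ensemble]
    previous = None
    for step in range(1, max_steps + 1):
        rows = indicator_rows(ensemble)
        # Assumption: keep the replicate with the lowest within-cluster SS.
        labels, _ = min((kmeans(rows, k, rng) for _ in range(replicates)),
                        key=lambda result: result[1])
        if previous is not None and same_partition(labels, previous):
            return labels, step  # nobody switched groups
        ensemble.append(labels)
        previous = labels
    return previous, max_steps

# Toy ensemble: two 2-group solutions and one 3-group solution for 6 respondents.
ensemble = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 2, 2, 2]]
labels, steps = consensus(ensemble, k=2)
print(labels, steps)
```

Comparing partitions rather than raw labels matters here, because k-means may number the same groups differently from one run to the next.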
answered Jun 9, 2014 by Walter Williams Gold Sawtooth Software, Inc. (15,275 points)
0 votes
Very interesting question.

We found that the reproducibility of the first step (the cluster analysis on cluster membership indicators across the ensemble of segmentation solutions) was the better measure for examining reproducibility, that is, how well the number of groups (dimensions) the researcher requested seems to reflect structure found in the data. Please note that this is the initial step, which takes the ensemble (usually containing all sorts of different partitions of the data: some 2-group, some 3-group, some 12-group, etc.) and tries, across 30 different replicates using k-means clustering, to produce a single segmentation solution of the size the researcher requests.

But the final solution that is reported as the consensus solution is based on the subsequent repetitions of clustering on cluster membership, where we are now just boiling the dimensionality down to that requested by the researcher (in the indicator coding matrix). Once the clustering on cluster membership repetitions fail to reclassify people, we have converged. But (and this may now be obvious to you) if you were to take reproducibility from that very last iteration of clustering on cluster membership (the step just before breaking out with convergence, where we again tried 30 different replicates), the reproducibility would be near 100% every time, because the lack of people moving between groups was our signal to break out with convergence!
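To illustrate why reproducibility is more informative at the first step than at the last, here is a small Python sketch of one way to score agreement across replicates. The exact formula CCEA uses is not given in this thread, so the measure below is an assumption: the average share of respondents who land in the same group as in a reference solution, after matching group labels as favorably as possible. The names `agreement` and `reproducibility` are hypothetical.

```python
from itertools import permutations

def agreement(a, b):
    """Share of respondents grouped alike in a and b, under the best relabeling of b.
    Brute force over label permutations, so only suitable for a small number of groups."""
    labels = sorted(set(a) | set(b))
    best = 0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        hits = sum(1 for x, y in zip(a, b) if x == mapping[y])
        best = max(best, hits)
    return best / len(a)

def reproducibility(reference, replicates):
    """Assumed measure: mean agreement of each replicate with the reference, in percent."""
    return 100.0 * sum(agreement(reference, r) for r in replicates) / len(replicates)

# One replicate matches the reference up to label names; the other moves one respondent.
reference = [0, 0, 1, 1]
replicates = [[1, 1, 0, 0], [0, 1, 1, 1]]
print(reproducibility(reference, replicates))  # 87.5
```

Under a measure like this, replicates from the converged last step would score close to 100% almost by construction, while replicates from the first step can disagree and so reveal how clear the requested structure is.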
answered Jun 9, 2014 by Bryan Orme Platinum Sawtooth Software, Inc. (138,915 points)