CCEA with weighted data

I would like to run survey respondent clustering in CCEA (both Convergent Cluster Analysis and Cluster Ensemble Analysis) with data containing respondent weights (for example, data where each respondent/row contains a weight value such that weighted data will match key population targets on variables like gender, income, etc.).

To run the clustering in CCEA using this weighted data, should I:
(1) For each respondent, multiply each cluster input variable value by the respondent's weight value in order to obtain a new set of "weighted" cluster input variables.
(2) Use this new set of "weighted" cluster input variables as cluster inputs in CCEA.

Is this the correct procedure for weighted analysis in CCEA, or are there other considerations or steps I should take?

Thank you.
asked May 23, 2014 by anonymous

1 Answer

If you weight as you describe, the most likely result is a solution in which the clusters are strongly influenced by the respondent weights, but not in the way you want: you will end up with some segments made up of respondents with large weights and other segments made up of respondents with small weights.
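To see why, here is a small illustration in Python (not CCEA itself, and the numbers are made up). It assumes a plain Euclidean distance between respondents, which is only an assumption for the sake of the example. Two respondents who gave identical answers but carry different weights end up much farther apart than two respondents who actually answered differently.

```python
import math

def scale_by_weight(values, weight):
    """Multiply every cluster input by the respondent's weight (the approach asked about)."""
    return [v * weight for v in values]

def euclidean(a, b):
    """Plain Euclidean distance between two respondents' input vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical respondents: two with identical answers but different weights,
# and one with different answers but the same weight as the first.
same_answers_low = scale_by_weight([3, 4], 1.0)
same_answers_high = scale_by_weight([3, 4], 5.0)
different_answers = scale_by_weight([4, 3], 1.0)

dist_same_answers = euclidean(same_answers_low, same_answers_high)
dist_different_answers = euclidean(same_answers_low, different_answers)

print(dist_same_answers)       # distance driven purely by the weights
print(dist_different_answers)  # distance driven by the actual answers
```

Any distance-based clustering fed these scaled inputs would tend to separate the two identical respondents before it separates the two who genuinely differ.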

I have always weighted my data AFTER running CCEA.  
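One way to read "weighting after" is to cluster on the unweighted inputs and then apply the weights when you profile or size the segments. A minimal sketch in Python, assuming you have exported one cluster assignment and one weight per respondent (the function name and the data below are hypothetical):

```python
from collections import defaultdict

def weighted_segment_shares(assignments, weights):
    """Return each cluster's share of the total respondent weight."""
    totals = defaultdict(float)
    for cluster, weight in zip(assignments, weights):
        totals[cluster] += weight
    grand_total = sum(totals.values())
    return {cluster: total / grand_total for cluster, total in totals.items()}

# Hypothetical cluster assignments (from an unweighted run) and respondent weights.
assignments = [1, 1, 2, 2, 2]
weights = [2.0, 2.0, 1.0, 1.0, 2.0]

shares = weighted_segment_shares(assignments, weights)
print(shares)
```

In this made-up example the unweighted split is 40% / 60%, while the weighted split comes out even, because the two respondents in cluster 1 carry more weight.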

If you really want to generate clusters that reflect respondent-type frequencies different from those that occurred in your data file, you could do this:

1. Divide each respondent's weight by the smallest weight of any respondent in your data file. Call this the normalized weight. So the very smallest normalized weight will be 1.0 (by definition) and the others will be larger (but hopefully not larger than 10, for reasons other than cluster analysis).
2. Round each respondent's normalized weight to the nearest integer.
3. Make as many copies of each respondent's row of data as her normalized integer weight would suggest. So if respondents Alyosha, Ivan and Dmitri have weights of 1, 3 and 6, respectively, your data file will have 1 row for Alyosha, 3 rows for Ivan (the original and 2 copies) and 6 for Dmitri (the original and 5 copies).
4. Submit this expanded data file to CCEA to produce clusters influenced by respondent weights.
5. Of course, use this expanded file only to generate cluster assignments, and delete all the replicated rows of data before continuing with your analyses.
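The data preparation in steps 1 to 3 and the cleanup in step 5 can be sketched in Python as follows. This is only an illustration: the function names, the `is_copy` flag and the row layout are hypothetical, ties at .5 are rounded up (the steps above only say "nearest integer"), and the clustering itself (step 4) still happens in CCEA.

```python
def expand_by_weight(rows, weights):
    """Replicate each respondent row according to its normalized integer weight."""
    smallest = min(weights)
    if smallest <= 0:
        raise ValueError("weights must be positive")
    expanded = []
    for row, weight in zip(rows, weights):
        normalized = weight / smallest   # step 1: smallest weight becomes 1.0
        copies = int(normalized + 0.5)   # step 2: nearest integer (ties round up, an assumption)
        for copy_index in range(copies):  # step 3: the original plus (copies - 1) duplicates
            expanded.append({**row, "is_copy": copy_index > 0})
    return expanded

def drop_copies(expanded_rows):
    """Step 5: keep only the original rows and remove the helper flag."""
    return [
        {key: value for key, value in row.items() if key != "is_copy"}
        for row in expanded_rows
        if not row["is_copy"]
    ]

# Hypothetical raw weights that normalize to 1, 3 and 6.
rows = [
    {"id": "Alyosha", "inputs": [3, 4]},
    {"id": "Ivan", "inputs": [5, 1]},
    {"id": "Dmitri", "inputs": [2, 2]},
]
weights = [0.5, 1.5, 3.0]

expanded = expand_by_weight(rows, weights)
counts = {row["id"]: sum(1 for r in expanded if r["id"] == row["id"]) for row in rows}
originals = drop_copies(expanded)
print(counts)
```

Flagging the duplicates as they are created makes step 5 straightforward: after attaching the cluster assignments to the expanded file, keep only the rows that are not flagged as copies.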
answered May 24, 2014 by Keith Chrzan Platinum Sawtooth Software, Inc. (53,875 points)
Thank you for the very helpful response.