Clustering respondents on point estimates of beta from HB Regression?

I would like to cluster/segment respondents on the point estimates of beta obtained from Sawtooth's HB Regression software (using Sawtooth's CCA software for clustering). Is it acceptable practice to cluster on the point estimates produced from the HB Regression software in the format contained in the "studyname.csv" file OR does a transformation, rescaling, normalization, etc. need to be done to the respondent-level betas prior to clustering?
asked Jun 21, 2012 by anonymous

1 Answer

0 votes
I assume you are indeed using data that are appropriate for HB-Reg, meaning that for each respondent there are multiple observations (cases).

HB certainly involves some Bayesian shrinkage (smoothing) across respondents: each person's estimates are pulled to some degree toward the population means.  That process reduces the differentiation across people, lowers the variance of the betas, and tends to smooth out the troughs in multimodal distributions that represent different segments.  But if you have more observations than parameters to estimate for each individual, the smoothing is probably fairly minimal, and segments should still emerge quite decently from clustering.
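
To make the point about observations versus smoothing concrete, here is the textbook result for a normal hierarchical linear model. This is an illustrative assumption, not necessarily the exact specification HB-Reg uses. Conditional on a population mean \mu, population covariance \Sigma, and error variance \sigma^2 (all symbols introduced here for illustration), the posterior mean of respondent i's betas given that respondent's design matrix X_i and responses y_i is:

```latex
\bar{\beta}_i
  = \left( \Sigma^{-1} + \sigma^{-2} X_i^{\top} X_i \right)^{-1}
    \left( \Sigma^{-1} \mu + \sigma^{-2} X_i^{\top} y_i \right)
```

This is a precision-weighted compromise between the population mean and the respondent's own data. As the number of rows in X_i grows, the term X_i^{\top} X_i grows with it and dominates \Sigma^{-1}, so the estimate moves toward the respondent's individual least-squares solution and away from \mu.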

For logit-based models, discrete choices, and CBC/HB, there is a big worry about differences in "scale factor" of the betas between people, making it difficult to cluster well on the raw betas.

But with HB-Reg and a continuous dependent variable, I don't believe you suffer so much from these "scale factor" issues across respondents (assuming the scale of the dependent variable is the same across respondents).  There is a long history of using ratings-based conjoint results (from OLS at the individual level) in cluster analysis, so decades of practice seem to support the approach.

As long as your designs aren't especially sparse at the individual level (that is, the number of observations relative to the number of parameters to estimate is reasonable), the Bayesian smoothing will be less influential and the cluster results should be robust.  That's my opinion.
answered Jun 22, 2012 by Bryan Orme Platinum Sawtooth Software, Inc. (132,290 points)
Thank you for the response - very helpful.

One additional question - is it advisable to Z-score the predictors prior to running HB regression in order to have all predictors on a comparable scale (mean=0, standard deviation=1)? If this is done, are the resulting respondent-level coefficients similar to the standardized regression coefficients obtained in ordinary least squares regression - meaning they can be directly compared for predictors on different scales (e.g., income and number of dependents) in order to assess relative variable importance in predicting the regression outcome variable? Is it acceptable to then cluster respondents (using Sawtooth's CCA) on these respondent-level standardized regression coefficients?

Also - my data has 5 rows/records of data per respondent. They are brand ratings for 5 different brands on brand attributes as well as overall brand satisfaction (each row is for a different brand). Should the Z-scoring be done on the "stacked" HB regression data (where each respondent has 5 rows of data) OR should it be done on the "unstacked" data (where each respondent has only one row of data storing their responses for the 5 different brands)?
If the brand ratings are all on the same scale, then it wouldn't seem necessary to worry about rescaling them.  However, if you want to get more technical, issues of scale-use bias might come into play.  It might be useful to zero-center the scores within each respondent, to remove some of each respondent's scale-use bias.  But that won't solve the issue of respondents tending to use bigger or smaller variance in their individual ratings.  Academics have built complex models with HB to try to model scale-use bias.  I don't have enough experience to be able to prescribe much here, and it gets very deep very fast.  Still, zero-centering within each respondent might be helpful as a simple band-aid.
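
For anyone wanting to try the zero-centering on stacked data, here is a minimal Python sketch. It assumes one reading of "zero-center within each respondent": each column is centered on that respondent's own mean across their rows. The function name, the tuple layout, and the sample numbers are all hypothetical, and this preprocessing happens outside of HB-Reg.

```python
from collections import defaultdict

def center_within_respondent(rows):
    """Center each column on the respondent's own mean.

    rows: list of (respondent_id, [v1, v2, ...]) tuples in stacked format,
    one tuple per respondent-by-brand record (hypothetical layout).
    """
    # Collect every row of values belonging to each respondent.
    groups = defaultdict(list)
    for resp_id, values in rows:
        groups[resp_id].append(values)

    # Per-respondent mean of each column.
    means = {
        resp_id: [sum(col) / len(col) for col in zip(*vals)]
        for resp_id, vals in groups.items()
    }

    # Subtract the respondent's column means from each of their rows.
    return [
        (resp_id, [v - m for v, m in zip(values, means[resp_id])])
        for resp_id, values in rows
    ]

# Made-up example: (respondent, [attribute rating, overall satisfaction])
stacked = [
    (1, [7.0, 8.0]),
    (1, [5.0, 6.0]),
    (1, [3.0, 4.0]),
    (2, [2.0, 1.0]),
    (2, [4.0, 3.0]),
]
centered = center_within_respondent(stacked)
```

After this step every column sums to zero within each respondent. As noted above, this removes differences in where a respondent sits on the scale but not differences in how much of the scale they use.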
Thank you very much again for the information.
Don't forget that HB-Reg doesn't automatically compute an intercept for you, so you need to add a column of 1s in the design matrix if you want to have an intercept computed.  If zero-centering the data, then this wouldn't be necessary.
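
As a rough illustration of the column of 1s, here is a small Python sketch that prepends a constant to each row of predictors before the data are written out. The function name is hypothetical, and placing the constant first is only an assumption; use whatever column position your own data file layout calls for.

```python
def add_intercept_column(design_rows):
    """Return a copy of the design matrix with a leading column of 1s.

    design_rows: list of rows, each a list of predictor values.
    The original rows are left unmodified.
    """
    return [[1.0] + list(row) for row in design_rows]

# Made-up predictors for two records.
design = [[2.0, 3.0], [4.0, 5.0]]
with_intercept = add_intercept_column(design)
```

The coefficient estimated for that constant column then plays the role of the respondent-level intercept.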