I assume you are indeed using data that are appropriate for HB-Reg, meaning that for each respondent there are multiple observations (cases).

HB certainly involves some Bayesian shrinkage (smoothing) across respondents, meaning that each respondent's estimates are pulled to some degree toward the population means. That process reduces the differentiation across people, reduces the variance, and tends to smooth out the troughs in multimodal distributions that represent different segments. But if you have more observations than parameters to estimate for each individual, that smoothing is probably fairly minimal, and segments will still emerge quite decently from clustering.
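To make the shrinkage idea concrete, here is a minimal sketch using a one-parameter normal-normal model. It is only an illustration of the general principle, not what HB-Reg does internally, and the function names and variance values are assumptions chosen for the example. The weight on a respondent's own estimate rises as the number of observations per respondent grows.

```python
def shrinkage_weight(n_obs, noise_var, pop_var):
    """Weight placed on the respondent's own estimate (0 to 1).

    n_obs:     observations for this respondent
    noise_var: within-respondent error variance (assumed known here)
    pop_var:   variance of the parameter across respondents (assumed known)
    """
    data_precision = n_obs / noise_var
    prior_precision = 1.0 / pop_var
    return data_precision / (data_precision + prior_precision)


def shrunk_estimate(own_mean, pop_mean, n_obs, noise_var, pop_var):
    """Precision-weighted blend of the respondent's mean and the population mean."""
    w = shrinkage_weight(n_obs, noise_var, pop_var)
    return w * own_mean + (1.0 - w) * pop_mean


if __name__ == "__main__":
    # Hypothetical values: equal noise and population variance.
    for n in (1, 5, 20):
        w = shrinkage_weight(n, noise_var=1.0, pop_var=1.0)
        print(f"n_obs={n:2d}  weight on own data={w:.3f}")
```

With few observations the respondent is pulled strongly toward the population mean; with many, the respondent's own data dominate, which is why less sparse designs preserve more of the differences that clustering relies on.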

For logit-based models, discrete choices, and CBC/HB, there is a big worry about differences in "scale factor" of the betas between people, making it difficult to cluster well on the raw betas.
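A small sketch of why scale differences hurt clustering on raw betas: the three beta vectors below are made-up values, where respondent B has the same preference pattern as A but three times the scale, and C has a different pattern. Euclidean distance (what k-means-style clustering typically uses) treats A as closer to C than to B, while cosine similarity shows A and B point in the same direction.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two beta vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine(u, v):
    """Cosine similarity: 1.0 means identical direction regardless of scale."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical raw betas for three respondents
a = [1.0, -0.5, 0.2]
b = [3.0 * x for x in a]   # same preferences as A, larger scale factor
c = [-0.5, 1.0, 0.2]       # different preferences, similar scale to A

if __name__ == "__main__":
    print("dist(A, B) =", round(euclidean(a, b), 3))
    print("dist(A, C) =", round(euclidean(a, c), 3))
    print("cos(A, B)  =", round(cosine(a, b), 3))
    print("cos(A, C)  =", round(cosine(a, c), 3))
```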

But with HB-Reg and continuous dependent variables, I don't believe you suffer so much from these "scale factor" issues across respondents (assuming the scale of the dependent variable is the same across respondents). There is a long history of using ratings-based conjoint results (from OLS at the individual level) in cluster analysis, so decades of practice seem to support the approach.

As long as your designs aren't especially sparse at the individual level (that is, the number of observations relative to the number of parameters to estimate is reasonable), the Bayesian smoothing will be less influential, and the cluster results should be robust. That's my opinion.

...

One additional question: is it advisable to Z-score the predictors prior to running HB regression in order to have all predictors on a comparable scale (mean=0, standard deviation=1)? If this is done, are the resulting respondent-level coefficients similar to the standardized regression coefficients obtained in ordinary least squares regression, meaning they can be directly compared for predictors on different scales (e.g., income and number of dependents) in order to assess relative variable importance in predicting the regression outcome variable? Is it acceptable to then cluster respondents (using Sawtooth's CCA) on these respondent-level standardized regression coefficients?
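For reference, the OLS relationship the question refers to can be sketched with a single predictor. The data are invented, and this shows plain OLS only; it does not claim anything about how HB-Reg treats standardized inputs. Z-scoring the predictor multiplies the raw slope by the predictor's standard deviation, and z-scoring the outcome as well gives the classic standardized coefficient.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscore(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: income in thousands, satisfaction on a rating scale
income = [30, 45, 60, 80, 120]
satisfaction = [4, 5, 5, 7, 9]

raw_slope = ols_slope(income, satisfaction)
z_slope = ols_slope(zscore(income), satisfaction)           # predictor z-scored only
std_beta = ols_slope(zscore(income), zscore(satisfaction))  # both z-scored

if __name__ == "__main__":
    print("raw slope:", raw_slope)
    print("slope with z-scored predictor:", z_slope)
    print("raw slope * sd(income):", raw_slope * pstdev(income))
    print("fully standardized beta:", std_beta)
```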

Also, my data have 5 rows/records per respondent. They are brand ratings for 5 different brands on brand attributes as well as overall brand satisfaction (each row is for a different brand). Should the Z-scoring be done on the "stacked" HB regression data (where each respondent has 5 rows of data) OR on the "unstacked" data (where each respondent has only one row of data storing their responses for the 5 different brands)?
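To show what the two options actually compute, here is a small sketch with made-up ratings for one attribute (three respondents and three brands, to keep it short). Stacked z-scoring pools every respondent-by-brand row into one column; unstacked z-scoring standardizes each brand's column separately, which also removes mean differences between brands. The data layout and names are assumptions for illustration.

```python
from statistics import mean, pstdev

# Hypothetical ratings on one attribute: ratings[respondent] = [brand0, brand1, brand2]
ratings = {
    "r1": [5, 3, 4],
    "r2": [4, 2, 5],
    "r3": [3, 1, 3],
}
n_brands = 3


def zscore(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Stacked: one long column, one row per respondent-brand pair
stacked = [v for row in ratings.values() for v in row]
z_stacked = zscore(stacked)

# Unstacked: one column per brand, standardized across respondents
z_unstacked_cols = [
    zscore([row[b] for row in ratings.values()]) for b in range(n_brands)
]

if __name__ == "__main__":
    # Same raw value (r1's rating of brand0) under each scheme
    print("stacked z:  ", z_stacked[0])
    print("unstacked z:", z_unstacked_cols[0][0])
```

The same raw rating gets a different standardized value under each scheme, so the choice affects what the resulting coefficients mean.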