draw vs aggregate part-worth of HB


I have a fundamental question about HB.

Why do we need to run so many further iterations after burn-in?
As I understand it, we can be confident of convergence once the burn-in period is complete. If so, can we just use a single draw instead of the average part-worths?

However, I observed that the average part-worths and the draws returned very different share-of-preference values in market simulation: the shares differed by more than 10%.
Why do the aggregate part-worths and the draws give different share-of-preference predictions?
asked Apr 28, 2015 by JK
retagged Sep 1, 2016 by Walter Williams

1 Answer

+1 vote
Each iteration of HB produces a candidate "draw" for each respondent: a vector of utilities that represents one realization of the possible preferences for that individual. The next iteration may produce a "draw" for that respondent that is somewhat different from the previous one. If you were to plot a histogram of a single part-worth utility (one level of one attribute) across hundreds or thousands of draws, you would see a roughly normal distribution. The variance of this distribution represents the degree of uncertainty we have about this respondent's preference, and its mean represents our best guess of this respondent's preference score.

If we were to use just one draw per respondent after convergence was obtained (after the burn-in period), we would get a pretty noisy view of our data, since the draws have a substantial amount of variance around their true means. Only after hundreds, or preferably thousands, of draws have accumulated per respondent (after convergence is assumed) do we obtain a stable view of the mean and of the distribution of uncertainty around each respondent's preferences.
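To make the point about noise concrete, here is a minimal sketch of how the estimate of one part-worth settles down as draws accumulate. The mean, the spread, and the assumption of independent normal draws are all hypothetical and chosen only for illustration; real HB draws are autocorrelated, so they stabilize more slowly than this suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers for illustration; not taken from a real HB run.
true_mean = 1.2   # mean of one respondent's draws for one part-worth
draw_sd = 0.8     # spread of the post-burn-in draws around that mean

# Assumption: draws are independent and normal. Real HB draws are
# autocorrelated, so the effective number of draws is smaller.
draws = rng.normal(true_mean, draw_sd, size=10_000)

for n in (1, 100, 1_000, 10_000):
    estimate = draws[:n].mean()           # mean of the first n draws
    expected_se = draw_sd / np.sqrt(n)    # standard error under independence
    print(f"{n:>6} draws: estimate={estimate:.3f}, expected std. error={expected_se:.3f}")
```

Under these assumptions a single draw is off by about 0.8 on average, while the mean of 1,000 draws has a standard error of about 0.025.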

Out of convenience, practitioners have typically just taken the collapsed mean for each respondent (the mean of the draws, also called the "point estimates") across the thousands of "used" draws after the burn-in period. Historically this has been easier to deal with in terms of memory and data processing time. However, Bayesians would argue that the more correct way to use the data is to simulate preference shares based on the thousands of draws after convergence.
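As a rough sketch of what "collapsing" means in practice, the snippet below discards the burn-in iterations and averages the remaining draws for each respondent. The array sizes, the burn-in length, and the random stand-in data are hypothetical; this is not how any particular HB package stores its output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes, chosen only for illustration.
n_respondents, n_iterations, n_params = 200, 1_500, 10
burn_in = 500  # iterations discarded before convergence is assumed

# Stand-in for saved HB output: one utility vector per respondent per iteration.
all_draws = rng.normal(size=(n_respondents, n_iterations, n_params))

used_draws = all_draws[:, burn_in:, :]      # keep only post-burn-in draws
point_estimates = used_draws.mean(axis=1)   # one mean vector per respondent

print(used_draws.shape, point_estimates.shape)
print("storage ratio:", used_draws.nbytes // point_estimates.nbytes)
```

The storage ratio equals the number of used draws (1,000 here), which is the memory and processing cost that point estimates avoid.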

Randomized First Choice is a middle position: it keeps the convenience and ease of data storage that come with point estimates, but simulates draws of uncertainty for each respondent around those means. Randomized First Choice also imposes an extra degree of correlation among "similar" product concepts, leading to a greater penalty for "me-too" imitation products.
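The "me-too" penalty can be illustrated with a toy example. This is only a sketch of the general idea (random error added to the part-worths, so products sharing levels receive correlated error, followed by a first-choice rule). It is not Sawtooth Software's actual RFC implementation; the normal error distributions, their sizes, and the two-level design are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: rows are products, columns are attribute levels.
# Product A2 is a "me-too" copy of A; B is different.
design = np.array([[1.0, 0.0],    # A
                   [0.0, 1.0],    # B
                   [1.0, 0.0]])   # A2 (same levels as A)
point_estimate = np.array([0.0, 0.0])  # one respondent, equal part-worths

# Logit rule on the point estimate: every product gets the same share.
utilities = design @ point_estimate
logit_shares = np.exp(utilities) / np.exp(utilities).sum()

# Toy randomized first choice: perturb the part-worths (error shared by
# similar products), add a small product-specific error, pick the best.
n_iter = 20_000
attribute_error = rng.normal(0.0, 1.0, size=(n_iter, 2))   # assumed normal
product_error = rng.normal(0.0, 0.01, size=(n_iter, 3))    # assumed normal
perturbed = (point_estimate + attribute_error) @ design.T + product_error
winners = perturbed.argmax(axis=1)
rfc_like_shares = np.bincount(winners, minlength=3) / n_iter

print("logit:   ", logit_shares.round(3))
print("RFC-like:", rfc_like_shares.round(3))
```

Under the logit rule the copy takes share equally from A and B, so B drops to one third. In the toy randomized rule, A and A2 mostly split A's share between themselves and B should keep roughly half.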
answered Apr 28, 2015 by Bryan Orme Platinum Sawtooth Software, Inc. (131,390 points)
It might help to clarify that HB behaves differently from other algorithms such as aggregate MNL (logit), where convergence means the utilities snap into almost precisely the same answer across subsequent iterations. HB continues to iterate indefinitely with quite a lot of variability across subsequent draws, because it is tracing and exploring the uncertainty around the estimates.
Thank you for your answer.
I conducted two market simulations using the point estimates and 1000 draws, respectively.
They returned very different preference share values.
Point estimate: 30%
1000 draws: 21%
If I use 10,000 draws, will I get close to 30%?
Or do you recommend RFC instead of draws?
The differences you are seeing are likely due mostly to "scale factor", meaning the sharpness or flatness of the shares. Using billions more draws per respondent wouldn't change that.

Simulating on the draws via the first choice rule should produce estimates with a "scale factor" closer to what you get from simulating on the point estimates using vote-splitting share of preference (logit rule) or RFC. But if you simulate on the individual draws using share of preference (logit rule) or RFC, you will typically get accumulated shares of preference that are flatter (showing smaller differences between best and worst products) than those you obtain from simulating on point estimates.
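A small simulation can show the scale-factor effect described here. Everything in it is hypothetical: utilities are generated directly at the product level, respondents and draws are normal, and the sizes are arbitrary. It is a sketch of the comparison, not a reproduction of any simulator's output.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(u):
    """Logit (share of preference) rule along the last axis."""
    e = np.exp(u - u.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical data: 300 respondents, 200 used draws, 3 products.
n_resp, n_draws, n_products = 300, 200, 3
resp_means = rng.normal([1.0, 0.0, -1.0], 1.0, size=(n_resp, n_products))
noise = rng.normal(0.0, 1.5, size=(n_resp, n_draws, n_products))
draws = resp_means[:, None, :] + noise       # draws scatter around each mean
point_estimates = draws.mean(axis=1)         # collapsed mean per respondent

# Three ways of simulating shares from the same HB-style output.
logit_on_points = softmax(point_estimates).mean(axis=0)
logit_on_draws = softmax(draws).mean(axis=(0, 1))
first_choice_on_draws = np.bincount(
    draws.argmax(axis=-1).ravel(), minlength=n_products
) / (n_resp * n_draws)

for name, shares in [("logit on point estimates", logit_on_points),
                     ("logit on draws", logit_on_draws),
                     ("first choice on draws", first_choice_on_draws)]:
    print(f"{name:<26}", shares.round(3), "spread:", round(shares.max() - shares.min(), 3))
```

In this setup the logit rule applied to the draws gives a smaller spread between best and worst products than the logit rule applied to the point estimates, while the rank order of the products stays the same. Adding more draws per respondent does not close that gap, because the extra flatness comes from the draw-level variance itself.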
Thank you so much.
Your comment is very helpful in doing my research.
I hope my comments help. These issues are very interesting, and different methods of simulating shares of preference for products in market scenarios will produce different results. Many times the main differences are due to scale factor: one set of product shares is steeper than the other (greater differences in shares), though the rank order of preference for the products may be essentially the same.