Statistical Testing Strangeness

There are multiple ways to statistically test whether the utility of one level differs from another's (within the same attribute). Very helpfully, Bryan Orme gave a presentation on this at the Amsterdam conference a few months ago (loved it!), and explanations are also on this forum. However, I have noticed that in practice the statistical tests can give very different answers, which I find very curious.


Specifically, a paired-samples t-test makes the most sense from a frequentist perspective (we have observations from the same person for both combinations). From a Bayesian perspective, there is the simple "count the draws" technique, which makes quite a bit of sense.
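
A minimal sketch of the two tests in Python, using only the standard library. It assumes you have already exported, per respondent, the HB point-estimate utilities of the two levels (for the paired t-test) and, per saved posterior draw, the population-mean (alpha) utilities of the same two levels (for counting draws). The function names and input layout are hypothetical. The p-value uses a normal approximation rather than the exact t distribution, which is only reasonable with many respondents.

```python
import math
import statistics
from statistics import NormalDist


def paired_t_test(util_a, util_b):
    """Paired-samples t-test on two levels' utilities (one pair per respondent).

    Returns (t, p), where p is a two-sided p-value from a normal
    approximation to the t distribution (assumption: large n; for small
    samples use the t distribution with n - 1 degrees of freedom).
    """
    if len(util_a) != len(util_b) or len(util_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 respondents")
    diffs = [a - b for a, b in zip(util_a, util_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.fmean(diffs) / (sd / math.sqrt(len(diffs)))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def count_the_draws(alpha_a, alpha_b):
    """Proportion of posterior draws in which level A's population-mean
    utility exceeds level B's (e.g. 0.99 means A was higher in 99% of draws).
    """
    if len(alpha_a) != len(alpha_b) or not alpha_a:
        raise ValueError("need two equal-length, non-empty lists of draws")
    wins = sum(1 for a, b in zip(alpha_a, alpha_b) if a > b)
    return wins / len(alpha_a)


# Toy inputs, made up purely to show the calls.
t, p = paired_t_test([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.5, 3.0])
prop_a_higher = count_the_draws([0.2, 0.1, 0.3, -0.1], [0.0, 0.0, 0.0, 0.0])
print(f"t = {t:.3f}, p = {p:.2g}, P(A > B) = {prop_a_higher:.2f}")
```

The two numbers answer different questions: the t-test looks at the spread of respondent-level differences, while counting draws looks at the posterior uncertainty of the population mean.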

My question is whether HB is perhaps too sensitive, given its wobbly nature. Both the paired-samples t-test and the Bayesian method are based on utilities estimated by the same HB algorithm, so if HB is indeed too sensitive, then we're overstating the statistical power. I've found that a 1% difference in estimated shares is significant at the 99% confidence level.
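
One way to see how a 1% difference can clear the 99% bar: under a paired test with a normal approximation, the smallest significant mean difference is about z(1 - alpha/2) times the standard deviation of the respondent-level differences, divided by the square root of n. The sketch below computes that threshold; the standard deviation and sample size in the usage line are hypothetical values, not results from any study. Anything that narrows the spread of the respondent-level differences, or increases n, lowers the threshold.

```python
import math
from statistics import NormalDist


def min_significant_difference(sd_of_diffs, n, confidence=0.99):
    """Smallest mean paired difference that is significant (two-sided) at
    the given confidence level, using a normal approximation:
        d_min = z_(1 - alpha/2) * sd_of_diffs / sqrt(n)
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if n < 2 or sd_of_diffs <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd_of_diffs / math.sqrt(n)


# Hypothetical: respondent-level share differences with an SD of
# 5 share points (0.05) across 600 respondents.
threshold = min_significant_difference(0.05, 600)
print(f"Differences above {threshold:.2%} are significant at 99%")
```

With these made-up inputs the threshold is about half a share point, so a 1-point difference would be flagged as significant.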
asked Mar 3, 2015 by Joel Anderson

1 Answer

Since you're referring to a "1% difference in shares estimated" and to an HB test, I assume you have somehow used the draws of the population mean (the alpha vector) to estimate shares of preference across successive posterior draws (after assuming convergence)? Then you compared the shares of preference for one product vs. another using those posterior draws?
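
A minimal sketch of that procedure, assuming each saved posterior draw of the alpha vector is available as a dictionary of level utilities, that a product's total utility is the sum of its levels' part-worths, and that shares come from the logit (share of preference) rule in a market containing only the two products being compared. The data layout, level names, utility values, and function names are all hypothetical.

```python
import math


def logit_shares(product_utilities):
    """Logit share rule: exp(U_j) / sum_k exp(U_k)."""
    m = max(product_utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in product_utilities]
    total = sum(exps)
    return [e / total for e in exps]


def compare_shares_across_draws(draws, product_a, product_b):
    """For each alpha draw, compute both products' shares and their difference.

    Returns (mean share difference A - B, proportion of draws where A > B).
    """
    if not draws:
        raise ValueError("need at least one posterior draw")
    diffs = []
    for alpha in draws:
        u_a = sum(alpha[level] for level in product_a)
        u_b = sum(alpha[level] for level in product_b)
        share_a, share_b = logit_shares([u_a, u_b])
        diffs.append(share_a - share_b)
    mean_diff = sum(diffs) / len(diffs)
    prop_a_wins = sum(1 for d in diffs if d > 0) / len(diffs)
    return mean_diff, prop_a_wins


# Made-up alpha draws for illustration only.
draws = [
    {"brand_x": 0.40, "brand_y": 0.35, "price_low": 0.50, "price_high": -0.50},
    {"brand_x": 0.38, "brand_y": 0.41, "price_low": 0.47, "price_high": -0.47},
]
mean_diff, prop_a = compare_shares_across_draws(
    draws, ["brand_x", "price_high"], ["brand_y", "price_high"]
)
print(f"mean share difference = {mean_diff:.3f}, P(A > B) = {prop_a:.2f}")
```

In a real simulation there would usually be more competitors and thousands of saved draws; the comparison logic stays the same.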

Sorry if I'm misunderstanding the share of preference test you've done.  

In any case, one issue to grapple with is whether a statistically significant difference is managerially significant. For example, with millions of respondents, you might see a 1% difference between two groups in the likelihood of voting for some political candidate. Although that difference may be hugely statistically significant (given the massive sample sizes of the groups you are comparing), its practical significance is nil.
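
To make the sample-size point concrete, here is a standard two-proportion z-test (normal approximation, pooled standard error) applied to a hypothetical 1-point gap (51% vs. 50%) at two sample sizes. The proportions and group sizes are illustrative assumptions. The same 1-point gap gives a z of roughly 14 with a million respondents per group but only about 0.45 with a thousand per group: the effect size is identical, only the precision changes.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(p1, p2, n1, n2):
    """Two-sided z-test for the difference between two independent
    proportions, using the pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical 1-point gap in the likelihood of voting for a candidate.
for n in (1_000, 1_000_000):
    z, p_value = two_proportion_z_test(0.51, 0.50, n, n)
    print(f"n per group = {n:>9,}: z = {z:5.2f}, p = {p_value:.3g}")
```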
answered Mar 5, 2015 by Bryan Orme, Sawtooth Software, Inc.