
Pooled vs. split sample vs. covariates


I would like to explore differences across two groups of approximately 400 respondents each (i.e., respondents who have already bought a product and respondents who have not).
I first estimated my model in CBC/HB (with uninformative hyperpriors) and computed the Mean Absolute Error (MAE) using my two holdouts and the simulation results from the SMRT package. I obtained an MAE of 0.6375.
I then estimated the same model using "purchase" as a covariate and obtained an MAE of 0.64.
Finally, I estimated the same model on a split sample, merged the .hbu files, imported the merged .hbu file into the SMRT package, and computed the MAE again, which this time was 0.63.
At this point:
1) Are these differences in MAE relevant for model comparison? If not, which model is the best and why?
2) If the pooled model performs slightly better in predicting the holdouts than the model with mixture distributions, how can the model computed on a split sample perform the best? Is there anything I am overlooking?

Thanks for any hint!
related to an answer for: Comparison between groups using MANOVA
asked May 15, 2015 by Veronica

1 Answer

One holdout unfortunately is not enough information to judge whether one model is better than another.  Typically 5 or more holdouts are needed to obtain enough precision to detect modest to strong differences between models.  Even more holdouts are needed to detect small differences in predictive ability between models.

Multiple papers given at our conference with ample holdouts show that generic HB, HB with covariates, or HB run on subsegments of the sample and then combined produce very little difference in the predictive validity of holdouts, so your results (even with just one holdout task) are not surprising at all to me.
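
For readers who want to reproduce the MAE comparison outside SMRT, here is a minimal sketch in Python. It assumes MAE is taken as the average absolute difference between observed holdout shares and simulated shares, pooled over every concept in every holdout task. The function name `holdout_mae` and all share values are hypothetical and chosen only for illustration; they are not the data from this thread.

```python
def holdout_mae(observed_tasks, predicted_tasks):
    """Mean absolute error pooled over all concepts in all holdout tasks.

    Each argument is a list of tasks; each task is a list of shares
    (in percent), one per concept. Hypothetical helper, not part of SMRT.
    """
    if len(observed_tasks) != len(predicted_tasks):
        raise ValueError("observed and predicted must have the same number of tasks")

    abs_errors = []
    for obs, pred in zip(observed_tasks, predicted_tasks):
        if len(obs) != len(pred):
            raise ValueError("each task must have the same number of concepts")
        abs_errors.extend(abs(o - p) for o, p in zip(obs, pred))

    return sum(abs_errors) / len(abs_errors)


# Hypothetical example: two holdout tasks with three concepts each.
observed = [[40.0, 35.0, 25.0], [50.0, 30.0, 20.0]]   # actual choice shares
predicted = [[41.0, 34.5, 24.5], [49.0, 30.5, 20.5]]  # simulated shares

mae = holdout_mae(observed, predicted)
n_points = sum(len(task) for task in observed)
print(f"MAE = {mae:.4f} over {n_points} share predictions")
```

With only two holdout tasks, the MAE in this sketch rests on six share predictions, so a shift of a few hundredths between models corresponds to very small changes in a handful of numbers.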
answered May 16, 2015 by Bryan Orme Platinum Sawtooth Software, Inc. (132,290 points)