# Evaluate Holdout (fixed) tasks with SMRT

I conducted a CBC with 10 tasks, of which 8 were random tasks and 2 were fixed holdout tasks, with 3 concepts per task (C1-C3). Now I want to evaluate how well the model (i.e., the estimated utilities) reflects reality. For that purpose I used SMRT to simulate both holdout tasks. Using SPSS, I also calculated the observed shares (i.e., how respondents actually chose in each holdout task, H1 and H2). This gave me the following table:

|    | H1 Simulated | H1 Observed | H2 Simulated | H2 Observed |
|----|--------------|-------------|--------------|-------------|
| C1 | 45.92%       | 40%         | 28.57%       | 34%         |
| C2 | 7.84%        | 12%         | 37.5%        | 30.7%       |
| C3 | 46.24%       | 48%         | 33.93%       | 35.3%       |

How can I interpret this? I thought about calculating the difference between simulated and observed shares, but how do I interpret those differences? Is there a threshold below which the model can be assumed to fit well? Or do you have any other ideas on how to test the validity of the model?

Thanks in advance

EDIT: Checking the literature, I found that it would be most useful to check hit rates for the holdout tasks. How do I do that?
asked Nov 27, 2013
edited Nov 27, 2013

## 1 Answer

+3 votes

Best answer
Within SMRT, I'm assuming you are using the default method to estimate the shares of preference (Randomized First Choice), right?

The easiest way to compute how well your model is fitting the two holdout tasks is the Mean Absolute Error (MAE) test. You simply figure out how big your absolute misses are for each of the concepts, sum them up across concepts and across holdout tasks, and then take the average of those misses.

For example, for task #1, concept #1, your absolute miss is 5.92%.  Across all six concepts, your average miss (MAE) is 4.24.  In my experience, for holdouts that involve 3 concepts, this is a good result.  But, so much depends on sample size.
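The arithmetic above can be sketched in a few lines; this just plugs in the simulated and observed shares from the table in the question (in percentage points):

```python
# Simulated vs. observed shares of preference for the two holdout tasks,
# listed as H1 C1-C3 followed by H2 C1-C3 (values from the table above).
simulated = [45.92, 7.84, 46.24, 28.57, 37.5, 33.93]
observed = [40.0, 12.0, 48.0, 34.0, 30.7, 35.3]

# Mean Absolute Error: average of the absolute misses across all six cells.
mae = sum(abs(s - o) for s, o in zip(simulated, observed)) / len(simulated)
print(round(mae, 2))  # → 4.24
```

The single miss for task #1, concept #1 is |45.92 - 40| = 5.92 percentage points, matching the example above.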

Hit rates are an individual measure of predictive validity.  You look at your simulation results for each respondent and see if the predicted highest probability concept was also the one that the respondent chose.  If it is, count a hit.  If not, count a miss.  The hit rate is the percentage of hits.
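As a minimal sketch of that counting logic, assuming you have already extracted, per respondent, the concept with the highest predicted probability and the concept actually chosen (the lists below are hypothetical example data, not the real study):

```python
# Hypothetical per-respondent data for one holdout task:
# predicted = concept with the highest simulated probability per respondent,
# chosen    = concept the respondent actually picked in the holdout task.
predicted = ["C1", "C3", "C2", "C3", "C1"]
chosen = ["C1", "C3", "C1", "C3", "C2"]

# A hit is a respondent whose predicted concept matches the chosen one.
hits = sum(p == c for p, c in zip(predicted, chosen))
hit_rate = 100 * hits / len(chosen)
print(f"{hit_rate:.1f}%")  # → 60.0%
```

With more than one holdout task, you would typically pool the hits and misses across tasks (or report the hit rate per task, as in the comment below).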
answered Nov 27, 2013 by Platinum (159,810 points)
edited Nov 27, 2013
Thank you Bryan for your fast answer. Sample size was 150, and yes, I used Randomized First Choice to estimate the shares of preference. If I understand you correctly, though, there is no cutoff value such that MAE values below it indicate a good model and values above it a poor one? Or do you have some literature on interpreting MAE? I ask because I have two more models to evaluate (first: MAE = 3.26, N = 64; second: MAE = 5.74, N = 71).

For hit rates I used the First Choice simulation method. What are good hit rates? I got 60% for both holdout tasks (for the other two models, hit rates are 54.7% and 66.2%, respectively).

Thanks again