Nice work obtaining a latent class solution and finding that BIC and CAIC (lower is better for both) are minimized at the 5-group solution.

Latent class (especially a low-dimensionality solution such as a 5-group solution) is not a very good way to obtain good individual-level utilities. In other words, computing hit rates for held-out tasks at the individual level using low-dimensionality latent class utilities is inferior to using HB utilities for the same purpose.

Our software creates pseudo individual-level utilities for each respondent from latent class runs by taking the weighted average of the group utilities, where the weights are each respondent's probability of belonging to each group. Indeed, you could export those pseudo individual-level utilities from our software to a .CSV file to separately compute hit rates for held-out tasks, but the results usually will be inferior to HB.
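The weighted-average step described above can be sketched in a few lines. This is a minimal illustration only, not our software's actual code; the group utilities and membership probabilities below are made up for the example:

```python
import numpy as np

# Hypothetical example: 3 latent class groups, 4 part-worth utilities each.
# group_utils[g] holds the estimated utilities for group g.
group_utils = np.array([
    [ 1.2, -0.4,  0.8, -1.6],   # group 1
    [ 0.3,  0.9, -0.5, -0.7],   # group 2
    [-0.8,  0.2,  1.1, -0.5],   # group 3
])

# One respondent's posterior probabilities of group membership (sum to 1).
membership_probs = np.array([0.70, 0.20, 0.10])

# Pseudo individual-level utilities: the probability-weighted average
# of the group utilities.
pseudo_utils = membership_probs @ group_utils
print(pseudo_utils)  # [ 0.82 -0.08  0.57 -1.31]
```

A respondent who belongs almost certainly to one group ends up with utilities close to that group's vector; respondents with diffuse membership probabilities get a blend.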

Three holdout sets provide some data for holdout validation, true. But research by Keith Chrzan here at Sawtooth Software shows that only 3 holdouts aren't enough to reliably compare different model specifications and identify the winning one. Still, three holdouts are certainly enough to obtain some face-validity confidence that the utilities are predicting held-out tasks reasonably well.

Other than computing hit rates at the individual level, there is the approach of using the utilities to predict summary shares of preference for the respondents and comparing those to the summary counts of choices of the holdout alternatives across the holdout sets (fixed, meaning the same task was asked of all respondents). Researchers often compute MAE (Mean Absolute Error) or MSE (Mean Squared Error) to quantify how well the predictions fit the holdout tasks.

Latent class isn't great for individual-level hit rates, but it is very appropriate to use market simulators built on the backbone of latent class utilities to predict summary shares of choice for (fixed) held-out tasks and to compute MAE as the loss function. You don't need to build predictions for each separate latent class segment. Our market simulator just applies the (say) 5-group solution by creating pseudo individual-level utilities as I described earlier, and summarizes the results for the sample across the individual-level share predictions. Such latent class simulators often match or occasionally exceed the predictive quality of simulators built using HB utilities.

The size of MAE you'll obtain depends not only on the quality of your model and the conjoint data, but also on the sample size and, critically, the number of alternatives per set (#products in the market simulator). With large sample sizes (n=600 or larger) and few concepts per task (4 or fewer), I like to see MAE around 3 absolute percentage points or less.

Is this the right way to compute the MAE? I get the following actual and predicted preference shares (3 concepts plus a None option) for one fixed holdout set and the 5-group latent class solution:

Actual Preferences    Predicted Preferences
20.63%                20.68%
19.73%                19.97%
39.91%                43.73%
19.73%                15.62%

Then I calculate the absolute differences between the percentages (e.g. |20.68% - 20.63%| for the first row), sum all four differences, and divide by 4 (three alternatives plus the None option). In this case I get 0.02, or about 2 percentage points, as the MAE. Is this way correct?
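The calculation described above, using the four shares from the table, can be checked with a short snippet (a sketch of the standard MAE formula, not any particular software's implementation):

```python
# Actual holdout choice shares vs. simulator-predicted shares,
# in percentage points (from the table above).
actual    = [20.63, 19.73, 39.91, 19.73]
predicted = [20.68, 19.97, 43.73, 15.62]

# MAE = mean of the ABSOLUTE differences. The absolute value matters;
# signed differences would largely cancel out here.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)  # roughly 2.06 percentage points, i.e. about 0.02
```

Note that the individual absolute errors are 0.05, 0.24, 3.82, and 4.11 points, so the average of about 2 points is driven mostly by the last two alternatives.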

For this simulation I used the Randomized First Choice method. If I use the First Choice method I get a high MAE and huge differences between the percentages. So is it better to use Randomized First Choice to test the fit?

Can I compare the MAE against a random-choice model to show the good fit of the model?

I would calculate the absolute difference between 25% (the preference share under a random model: 1/4) and each actual preference percentage, sum these up, and divide by 4. Then I can compare the two MAEs. In this case I get 0.07. Thus the 5-group latent class model seems to fit well.

Does this consideration make sense?

Thanks for your support

Regards, Sophia