Output of Latent Class Analysis

I have a question about interpreting the Latent Class output and, consequently, choosing the number of groups.
I ran the LC analysis with seed = 0 and number of replications = 5 (the default).
The output includes a summary of best replications and a summary of all replications. To identify the best group solution (the one with the lowest BIC and CAIC values), should I compare the values in the summary of best replications or in the summary of all replications?
Furthermore, there is the advice to run several LC analyses with different starting points to avoid a solution that looks good only by chance. Does this mean I should run a second analysis (again with seed = 0), or should I look at the summary of all replications and check whether every replication shows the same differences in BIC/CAIC values?
Thank you for your help.
Regards, Sophia
asked Mar 29, 2017 by Sophia

1 Answer

Different random seeds can produce different results in latent class, because the algorithm does not guarantee a globally optimal solution each time.  Therefore, it is a good idea to run it multiple times (multiple replications) and select the one replication (for each group size) that obtains the best fit.  Although you specify a starting seed for the first replication, the software chooses a different seed for each replication (otherwise, you'd just get the same result as the previous replication).  
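As a rough illustration of why several replications help, here is a minimal Python sketch. It is not the latent class algorithm and not how the software derives its seeds: `objective`, `local_search`, and `run_replications` are hypothetical names, the objective is a toy function with several local minima, and using "first seed plus replication number" as each replication's seed is purely an assumption made for the example. The point is only that a local search started from different random points can settle at different fits, so keeping the best replication is a sensible safeguard.

```python
import math
import random

def objective(x):
    # Toy "misfit" function with several local minima (lower is better).
    # It stands in for a fit statistic; it is NOT a latent class likelihood.
    return 0.1 * x * x + math.sin(3.0 * x)

def local_search(start, step=0.01, max_iter=10_000):
    # Greedy descent: move to a neighbouring point only while that improves the fit.
    x = start
    for _ in range(max_iter):
        best = min((x, x - step, x + step), key=objective)
        if best == x:
            break  # neither neighbour is better: a local minimum on this grid
        x = best
    return x, objective(x)

def run_replications(first_seed, n_reps):
    # Assumption for illustration: each replication gets its own seed,
    # derived from the first one, so the starting points differ.
    results = []
    for rep in range(n_reps):
        rng = random.Random(first_seed + rep)
        start = rng.uniform(-5.0, 5.0)
        results.append(local_search(start))
    return results

reps = run_replications(first_seed=1, n_reps=10)
best_x, best_fit = min(reps, key=lambda r: r[1])

for x, fit in reps:
    print(f"x = {x:7.3f}  fit = {fit:8.4f}")
print(f"best replication: x = {best_x:.3f}, fit = {best_fit:.4f}")
```

Printing every replication next to the best one mirrors the two summaries in the output: the full list shows how much the fit varies by starting point, and the minimum is the value you would carry forward.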

Researchers usually pay attention only to the best replication reported (the summary of best replications) per group size.  And, if you are really concerned about finding a near-optimal solution for each number of groups, I'd recommend you increase the number of replications from 5 to at least 10 for each solution.  5 is just a default in the software to help things run faster while you are doing preliminary investigation with a data set.  Once you get serious about securing the best possible answer, you should increase the number of replications.  If you can afford the time, 30 replications wouldn't hurt.

A reason to pay attention to how much the fit statistics vary across all replications would be to see how stable the solutions are (how often the same fit will occur from different random starting points).  If very different fits often happen from different starting points, that can be some indication that the data do not naturally break out well over that number of dimensions (groups) in your latent class solution.
answered Mar 29, 2017 by Bryan Orme Platinum Sawtooth Software, Inc. (134,015 points)
Thanks for your answer.
So this means that I only have to run the analysis once, with e.g. 10 replications?
In my latent class output, there are many replications in which the 5-group solution does not have the lowest CAIC (most of the replications) or the lowest BIC (a few). But the summary of best replications looks like this:

Groups          CAIC
2        5749,83718
3        5589,00730
4        5587,43794
5        5577,84376
6        5594,54510
7        5629,06527

So is it reasonable to choose the 5-group solution, although there are other replications with higher CAIC values (like 5595, 5627, 5630)? The 3-group CAIC is always at 5589 and is thus more stable.

Thanks for your help.
Yes, one run with many replications.

And, just because statistically speaking the 5-groups solution seems best (per CAIC), it doesn't mean that's the best solution for you to use to solve your business problem!  Perhaps the 4-group solution is more interpretable and more palatable for management minds?  Or, if you are looking for predictive accuracy of holdouts, perhaps an 11-group solution provides better predictive accuracy.  There is both art and science to latent class analysis, and the decision of which solution to use can take you one path or another.
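To make the CAIC comparison in this thread concrete, here is a small Python sketch that ranks the best-replication values posted above and reports each solution's gap to the lowest CAIC. The numbers are copied from the table in the follow-up (decimal commas converted to points). The three extra values are the rounded figures quoted in the same post, read here as non-best 5-group replications, which is an interpretation rather than something the post states outright. `rank_by_caic` is a hypothetical helper name.

```python
# Best-replication CAIC per number of groups, as posted in the thread.
best_caic = {
    2: 5749.83718,
    3: 5589.00730,
    4: 5587.43794,
    5: 5577.84376,
    6: 5594.54510,
    7: 5629.06527,
}

# Other replication values quoted in the post (rounded as given there);
# treating them as 5-group replications is an assumption.
other_5_group_caic = [5595.0, 5627.0, 5630.0]

def rank_by_caic(table):
    # Return (groups, caic, gap_to_best) tuples sorted from lowest to highest CAIC.
    lowest = min(table.values())
    ordered = sorted(table.items(), key=lambda kv: kv[1])
    return [(groups, caic, caic - lowest) for groups, caic in ordered]

ranking = rank_by_caic(best_caic)
for groups, caic, gap in ranking:
    print(f"{groups} groups: CAIC = {caic:.5f} (gap to best = {gap:.5f})")

# Count the quoted replications that are worse than the 3-group best value.
worse_than_3 = [c for c in other_5_group_caic if c > best_caic[3]]
print(f"{len(worse_than_3)} of {len(other_5_group_caic)} quoted replications "
      f"exceed the 3-group CAIC of {best_caic[3]:.5f}")
```

Seen this way, the 3-, 4-, and 5-group solutions sit within roughly 10 to 11 CAIC points of each other, and the quoted non-best replications all fall above the 3-group value. That is one reason to weigh stability and interpretability alongside the single lowest CAIC.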