Need help finding the best LC class solution

Hey guys,

This is my LC output for solutions with two to five classes (summary of best replications).
I have looked up in the LC manual how to find the best class solution, and in the manual it seems obvious.
But from my output below, I can't say what the optimal solution is.

Could you please help me and also explain why you think a particular solution is best?

Thank you!

Groups  Replication  Log-likelihood  Pct Cert         AIC        CAIC         BIC        ABIC  Chi-Square  Relative Chi-Square
     2            5      -526.22676  39.67471  1110.45352  1264.01623  1235.01623  1142.95948   692.17718             23.86818
     3            1      -457.37340  47.56788  1002.74679  1235.73850  1191.73850  1052.06619   829.88390             18.86100
     4            3      -437.47262  49.84926   992.94524  1305.36593  1246.36593  1059.07807   869.68546             14.74043
     5            2      -423.81523  51.41491   995.63045  1387.48014  1313.48014  1078.57671   897.00025             12.12162
asked Nov 4, 2013 by anonymous

Even when tested on synthetic data with known segments (plus noise), the various statistics offered in Latent Class don't always point to the "correct" solution. So you cannot always rely on the fit statistics in Latent Class to point to the truth.

But the fact of the matter is that "truth" is not known when conducting latent class analysis for real respondent data. Different latent class results (e.g. different numbers of groups: 3-group solution, 4-group solution, etc.) could produce very similar BIC statistics.

You've really got to think about the reason you are doing segmentation.  If the main reason is to develop and guide managerial insight and strategy, then the best solution is the one that is most meaningful and has the best face validity with regard to communicating differences between key segments to management.  You have to consider how complex a segmentation solution management is able to navigate and how well those segments also break out on other key metrics such as usage, purchases, reachable (targetable) demographics, etc.

But, if the main goal for using latent class is to improve predictability of holdout scenarios (or new scenarios such as actual purchases in the real world), then the appropriate test is to hold out some of the tasks for validation and then check how well different latent class solutions can lead to models that predict these holdouts well.
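As a rough illustration of that holdout check, here is a minimal Python sketch. It assumes a multinomial logit model within each class and predicts each respondent's holdout choice by weighting the class-level choice probabilities with that respondent's posterior class-membership probabilities. The function names, utilities, memberships and choices are all hypothetical; this is not Latent Class's own export format or validation procedure.

```python
import math


def logit_probs(utilities):
    """Multinomial logit choice probabilities for one task's alternatives."""
    top = max(utilities)  # subtract the max utility for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def predict_task(class_utils, membership):
    """Blend class-level logit probabilities using posterior class membership.

    class_utils: one list per class of total utilities for each alternative
    membership:  the respondent's posterior probability for each class
    """
    probs = [0.0] * len(class_utils[0])
    for utils, weight in zip(class_utils, membership):
        for j, p in enumerate(logit_probs(utils)):
            probs[j] += weight * p
    return probs


def hit_rate(predictions, actual_choices):
    """Share of holdout tasks where the most probable alternative was the one chosen."""
    hits = sum(
        1
        for probs, chosen in zip(predictions, actual_choices)
        if max(range(len(probs)), key=probs.__getitem__) == chosen
    )
    return hits / len(actual_choices)


# Hypothetical holdout task with 3 alternatives, scored under a 2-class solution.
class_utils = [[1.0, 0.2, -0.5], [-0.3, 0.8, 0.1]]
memberships = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical posterior memberships
choices = [0, 1]                        # alternative each respondent actually picked
preds = [predict_task(class_utils, m) for m in memberships]
print(hit_rate(preds, choices))  # 1.0
```

You would repeat the same calculation for each candidate solution (2, 3, 4 and 5 groups) on the same held-out tasks and compare the resulting hit rates.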

It is quite possible that the best solution leading to managerial insights has fewer groups than the best solution leading to predictive accuracy.
answered Nov 8, 2013 by Bryan Orme, Sawtooth Software, Inc.

First of all, thanks for your comments, and sorry that the rows and columns of the table got messed up. They weren't like that when I entered them.

The thing is that I need the segmentation for my academic work (that's why it is very important to explain my choice of the appropriate number of classes properly), but the main purpose is of course to derive managerial implications based on consumers' choices and to see whether there are different consumer types that can be addressed differently via marketing, etc.

I thought that, unlike Pct Cert and Chi-Square (where larger values are better), smaller values of CAIC, AIC, BIC and ABIC are preferred. But from the above output, I can't identify the "best" LC class solution. Or is my data not appropriate for a segmentation? Or are there simply no meaningful groups to be identified?
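Applied mechanically to the table above, that lower-is-better rule gives one answer per criterion. A minimal Python sketch, with the values typed in by hand from the output; the dictionary layout and the `best_by_criterion` helper are illustrative and not part of Latent Class.

```python
# Information criteria copied by hand from the "Summary of best replications" output.
results = {
    2: {"AIC": 1110.45352, "CAIC": 1264.01623, "BIC": 1235.01623, "ABIC": 1142.95948},
    3: {"AIC": 1002.74679, "CAIC": 1235.73850, "BIC": 1191.73850, "ABIC": 1052.06619},
    4: {"AIC": 992.94524, "CAIC": 1305.36593, "BIC": 1246.36593, "ABIC": 1059.07807},
    5: {"AIC": 995.63045, "CAIC": 1387.48014, "BIC": 1313.48014, "ABIC": 1078.57671},
}


def best_by_criterion(results):
    """For each criterion, return the number of groups with the lowest (best) value."""
    criteria = next(iter(results.values())).keys()
    return {c: min(results, key=lambda g: results[g][c]) for c in criteria}


print(best_by_criterion(results))  # {'AIC': 4, 'CAIC': 3, 'BIC': 3, 'ABIC': 3}
```

Pct Cert and Chi-Square are left out on purpose: in this table both rise with every added group, so on their own they never indicate where to stop.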

Can you help me with that?

Thanks in advance.

Bryan's focus on managerial relevance is, of course, an important consideration. Statistically, you are correct that your goal is to minimize "penalized" information criteria such as BIC, ABIC and CAIC. As in your data set, however, these criteria typically point to different latent class solutions. In your data set, for example, AIC suggests a 4-segment solution, whereas BIC, ABIC and CAIC suggest a 3-segment solution.

Studies examining the performance of these criteria suggest that AIC tends to overfit, that is, to generate more segments than are known to exist. Although some studies suggest that BIC provides a better estimate, it may be prone to underfitting. For example, see Nylund, Asparouhov & Muthén's article "Deciding on the number of classes in latent class analysis and growth mixture modeling: A Monte Carlo simulation study", which appeared in the journal Structural Equation Modeling (2007).

Consistent with Bryan's recommendation, others suggest that the choice of information criterion needs to reflect the goals of the study (favoring either specificity or sensitivity). If the goal is a simple, parsimonious model, BIC would be most appropriate. If a rich exploration of population heterogeneity is called for, AIC might be preferred. Dziak, Coffman, Lanza and Li, from the Methodology Center at Penn State, provide a very thoughtful discussion of this issue entitled "Sensitivity and Specificity of Information Criteria" (available online). These studies may provide the academic rationale you require.
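Why AIC lands on more groups than BIC, CAIC and ABIC becomes visible once each criterion is written as -2LL plus a penalty on the number of parameters p. A minimal Python sketch, assuming the standard textbook formulas and two inputs the thread does not report: the parameter counts (29, 44, 59, 74) and the sample size n = 542. Both were back-calculated from the reported AIC and BIC values, and what n actually counts in Latent Class is not stated here, so treat them as assumptions.

```python
import math

# Log-likelihoods come from the output above. The parameter counts and the
# sample size are ASSUMPTIONS: they are not printed in the output but were
# back-calculated from the reported AIC and BIC values.
log_lik = {2: -526.22676, 3: -457.37340, 4: -437.47262, 5: -423.81523}
n_params = {2: 29, 3: 44, 4: 59, 5: 74}
n = 542


def criteria(ll, p, n):
    """Penalized fit criteria: -2*LL plus a penalty that grows with p."""
    deviance = -2 * ll
    return {
        "AIC": deviance + 2 * p,                        # 2 per parameter
        "BIC": deviance + p * math.log(n),              # ln(n), about 6.30 here
        "CAIC": deviance + p * (math.log(n) + 1),       # ln(n) + 1 per parameter
        "ABIC": deviance + p * math.log((n + 2) / 24),  # about 3.12 here
    }


table = {g: criteria(log_lik[g], n_params[g], n) for g in log_lik}
for name in ("AIC", "BIC", "CAIC", "ABIC"):
    best = min(table, key=lambda g: table[g][name])
    print(f"{name}: lowest at {best} groups")
```

With these assumed counts the sketch reproduces the reported values to within rounding, and each extra group adds 15 parameters. Going from 3 to 4 groups lowers -2LL by about 39.8, which beats AIC's extra penalty of 15 × 2 = 30 but not BIC's 15 × ln(542) ≈ 94.4; going from 4 to 5 groups lowers it by only about 27.3, so even AIC stops at 4. That small per-parameter penalty is why AIC tends to favor more segments.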