Measuring holdout validity (cross validity) by using latent class data

I performed a logit and latent class analysis in Lighthouse. Based on the CAIC and BIC figures, I should use 5 classes to describe the data. However, I want to check cross-validity to see whether this model is overfitting. Should I use the two fixed tasks as the validation sample and the random tasks as the calibration sample, or should I use 200 respondents as the calibration sample and 50 as the validation sample? Which method is preferred?

Ideally I would have an inverted-U-shaped figure showing which number of latent classes gives me the highest % of correct predictions. How should I perform such an analysis?
asked Mar 2, 2017 by anonymous

1 Answer

0 votes
I think your latter suggestion, "use 200 persons as calibration sample and 50 as validation sample," is the kind of out-of-sample validation that will be much more valuable than predicting two in-sample holdout questions.

I do not understand your last question, about a U-shaped figure, unless you're talking about the goodness-of-fit measures like BIC and CAIC.
answered Mar 2, 2017 by Keith Chrzan Platinum Sawtooth Software, Inc. (52,450 points)
Thank you for your answer. How can I calculate the out-of-sample validation then?

By the U-shaped figure I meant a table or figure showing how much your % of correct predictions increases as you add more classes.
If you have 250 respondents, create a random integer variable that ranges from 1 to 5.  Now run the analysis 5 times: first with the respondents who have a value of 1-4 on that variable, using them to predict the choices of the 50 respondents with a value of 5 on the grouping variable, then repeating with each of the other 4 groups left out in turn.  In Lighthouse you can add this variable by creating it in Excel and then pasting it into a column when you go to "view data."  Then you can choose which groups to include in your analysis by creating a respondent filter and selecting it in the settings section of your LC run.
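If you'd rather not build the grouping variable in Excel, here is a minimal sketch in plain Python of the random 1-5 fold assignment described above; the seed, column name, and file name are illustrative choices, not anything Lighthouse requires:

```python
import random

# Assign each of 250 respondents to one of 5 hold-out groups for
# out-of-sample validation. Each group label appears exactly 50 times,
# so every fold holds out the same number of respondents.
random.seed(42)  # fixed seed so the grouping is reproducible

n_respondents = 250
n_folds = 5

folds = [f for f in range(1, n_folds + 1)
         for _ in range(n_respondents // n_folds)]
random.shuffle(folds)

# Write a one-column CSV whose values you can paste into a column
# in Lighthouse's "view data" screen.
with open("fold_assignment.csv", "w") as fh:
    fh.write("fold\n")
    for f in folds:
        fh.write(f"{f}\n")
```

A balanced shuffle like this (rather than drawing an independent random integer per respondent) guarantees each validation group has exactly 50 respondents.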

I am not used to seeing such a U-shaped chart for predictive accuracy (though you will have the data to make it, whether it turns out U-shaped or not); I usually see that done for goodness-of-fit statistics like BIC and CAIC.
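Once each LC run has predicted the left-out group's choices, the number going into that table is just the hold-out hit rate. A minimal sketch of that computation, assuming you have the predicted and observed choices as lists; the toy choice codes at the bottom are illustrative, not real data:

```python
def hit_rate(predicted, actual):
    """Share of hold-out choice tasks where the model's predicted
    concept matches the respondent's observed choice."""
    assert len(predicted) == len(actual) and actual
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def mean_hit_rate(fold_rates):
    """Average the hit rate across the hold-out folds for one
    number-of-classes solution, giving one row of the table."""
    return sum(fold_rates) / len(fold_rates)

# Illustrative toy example: 6 hold-out tasks, concepts coded 1-4.
predicted = [1, 3, 2, 4, 1, 2]
actual    = [1, 3, 4, 4, 2, 2]
print(hit_rate(predicted, actual))  # 4 of 6 correct
```

Computing `mean_hit_rate` for each candidate number of classes (2, 3, 4, 5, ...) gives exactly the accuracy-versus-classes table the questioner describes.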