# Testing attribute significance in Latent Class

I have run a latent class analysis and my conclusion is that I should work with 2 segments. Next, I would like to test whether my 5 attributes are all significant in my 2 groups/segments.
I've read that it is possible to have insignificant attributes in latent class that were significant in the logit analysis. Is this correct? In my logit analysis all my attributes were significant, but I am not sure whether this is still the case within the segments.

Do you have an example of a calculation or can you explain how to calculate whether the attributes are significant in the segments or not?

A simple way to do this is to compute a t-ratio for each of the levels of an attribute within each of the class solutions.  Divide the effect (utility) by the standard error of the effect (our software does this automatically for you and reports the t-ratios for each level within each class).

If none of the levels for a given attribute have a t-ratio larger than 1.96 within a class, then for that class this attribute does not cross the threshold of significance at the 95% confidence level.
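The calculation above can be sketched in a few lines. The utilities and standard errors below are purely illustrative numbers, not output from any real run:

```python
# Hypothetical part-worth utilities and standard errors for one
# 3-level attribute within a single latent class (illustrative values).
utilities = [0.85, 0.10, -0.95]
std_errors = [0.21, 0.19, 0.24]

# t-ratio for each level = utility / standard error.
t_ratios = [u / se for u, se in zip(utilities, std_errors)]
print(t_ratios)  # ≈ [4.05, 0.53, -3.96]

# The attribute fails the 95% threshold in this class only if
# EVERY level has |t| < 1.96; one significant level is enough.
attribute_significant = any(abs(t) >= 1.96 for t in t_ratios)
print(attribute_significant)
```

Here two of the three levels clear |t| ≥ 1.96, so the attribute counts as significant within this class.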
answered Feb 16, 2017 by Platinum (170,115 points)

Do I understand you correctly that an attribute is only insignificant when ALL of its levels have t-ratios smaller than 1.96? So if only 1 level out of 3 is smaller than 1.96, the attribute itself is still significant within the model?
And should I also test whether, for example, a 2-group model has a better fit than a 3-group model based on the significance of the attributes and levels?

I hope to hear from you. Thanks a lot!
If ALL levels are insignificant for an attribute and a group, then that attribute is not significant for that group.

Whether to choose a 2-group solution or a 3-group solution is a HUGE subject.  So much depends on why you are running Latent Class and what your goals are.  Regarding fit statistics, researchers usually look at things like BIC and CAIC to see whether a 2-group solution provides significantly (meaningfully) better fit than a 3-group solution (after accounting for the fact that a 3-group solution fits more parameters).
Oh, and one looks for LOWER BIC and CAIC.  Lower means better.
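As a sketch of how that comparison works, here are the standard BIC and CAIC formulas applied to two hypothetical latent class runs (the log-likelihoods and parameter counts below are invented for illustration):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    # BIC penalizes each extra parameter by ln(n); lower is better.
    return -2 * log_likelihood + n_params * math.log(n_obs)

def caic(log_likelihood, n_params, n_obs):
    # CAIC uses a slightly stronger penalty of (ln(n) + 1) per parameter.
    return -2 * log_likelihood + n_params * (math.log(n_obs) + 1)

# Hypothetical fits: the 3-group solution raises LL but adds parameters.
n_obs = 300               # sample size (check your software's convention)
ll2, k2 = -2450.0, 25     # 2-group solution
ll3, k3 = -2430.0, 38     # 3-group solution

print(bic(ll2, k2, n_obs), bic(ll3, k3, n_obs))
print(caic(ll2, k2, n_obs), caic(ll3, k3, n_obs))
```

With these made-up numbers the extra fit of the 3-group solution does not pay for its extra parameters, so the 2-group solution has the lower (better) BIC and CAIC.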
Thanks! I figured out that I have to look at BIC and CAIC before making a decision on the number of segments.
When an attribute is significant for one of the groups, but it is not for the other, what is the conclusion I should make?
So, when doing a latent class analysis (4 segments is optimal in my study), you do not have to calculate significance levels anymore? And how can I calculate significance levels for every attribute level within each class? I do not see any standard errors in the latent class report.
Lastly, does a t-ratio below 1.96 mean that it is not significant?
By default, our latent class software doesn't show standard errors within classes.  But, if you go to the Advanced tab and click "Report Standard Errors" then additional tables are shown in the report for the standard errors.  Try that.

A t-ratio with absolute magnitude lower than 1.96 is not significant at the 95% confidence level.  However, it might still be significant at the 90% or 80% confidence level, of course.
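If you want the exact p-value rather than just the 1.96 cutoff, a normal approximation (reasonable with the large samples typical of latent class runs) can be computed from the t-ratio like this:

```python
import math

def two_sided_p(t):
    # Two-sided p-value under a standard normal approximation:
    # p = 2 * (1 - Phi(|t|)), with Phi built from math.erf.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

print(round(two_sided_p(1.96), 3))  # ≈ 0.05: right at the 95% threshold
print(round(two_sided_p(1.50), 3))  # ≈ 0.134: not significant at 95%,
                                    # but significant at 80% (p < 0.20)
```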
Maybe a somewhat strange question, but does this hold for the significance of the attributes in the logit test as well? Or how should I test the significance of the attributes (and their levels) with a logit test?

But how does that exactly work? I have conducted a latent class analysis and my optimal number of segments is 5 (after looking at CAIC and BIC). When I look at the t-ratios for all the levels of the attributes AND for each segment separately, an attribute is significant in segment X but not in segment Y. Does that mean that I have to remove that attribute only in the particular segment where it's not significant (an attribute is considered not significant if NONE of the t-ratios is above 1.96)? If so, how do I do that? I do not see an option to remove an attribute for a particular segment.

Lastly, what does 'percent certainty' exactly mean? I read in the guide that it indicates how much better the solution is than the null solution when compared to an "ideal" solution. Is this the same as the 'hit rate'? My percent certainty is 34% at 5 segments; I think that's quite low, isn't it?

It's not necessary to remove an attribute from one of the segment's part-worth utility vectors when it is not significant.  It already will have very little impact on predictions since the attribute has such low signal.  But, if you believe ahead of time that all groups should think that this attribute has a logical direction of preference (such as low prices preferred to high prices), then you can constrain the utilities for all groups to follow this assumed direction.

Percent certainty means what percent of the way the log-likelihood fit lies between the naive solution and a perfect solution.  Where...the naive solution is what we'd expect if the choice predictions were equal across concepts within each task: the probability for each of k items in a set is equal to 1/k.  And, the perfect solution is what we see if we can predict with 100% likelihood each concept that each respondent actually picked for all tasks.

Percent Certainty is similar, but not identical, to hit rate.  With the naive predictions (coin flip among alternatives), you will get a hit rate equal to 1/k.  But, Percent Certainty would be 0%.
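Putting the definition above into numbers: Percent Certainty = (LL_model − LL_null) / (LL_perfect − LL_null), where LL_perfect is 0. The sample sizes and fitted log-likelihood below are hypothetical, chosen only to illustrate how a figure in the mid-30s can arise:

```python
import math

# Hypothetical study dimensions (not from any real dataset).
n_respondents = 300
n_tasks = 10
k = 4  # concepts shown per choice task

# Naive (chance) solution: every concept predicted with probability 1/k.
ll_null = n_respondents * n_tasks * math.log(1.0 / k)
ll_perfect = 0.0          # predicting every observed choice with certainty
ll_model = -2740.0        # hypothetical fitted log-likelihood

# Fraction of the distance from chance to perfection the model has covered.
pct_certainty = (ll_model - ll_null) / (ll_perfect - ll_null)
print(round(100 * pct_certainty, 1))  # ≈ 34.1
```

Note that the naive solution here would still score a hit rate of 1/k = 25% by lucky guessing, while its Percent Certainty is 0% by construction, which is why the two statistics are not interchangeable.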

When you use aggregate utility estimation methods, you get much worse fit to the data than if you use individual-level utility methods (such as HB)...because people often have quite different preferences.

A one-group solution (the aggregate logit solution) might have Percent Certainty of 15 or 20%.  As you conduct latent class analysis to find and model separate segments, the fit improves.  But, even a 5-group solution still is lumping a lot of respondents into probably an unrealistically low number of dimensions of preference.  So, you still should not expect the same degree of fit as with an HB solution, or a 15-group latent class solution.

In the first model, where I included all the attributes, only two attributes were significant in each of two segments and the other 4 were not. In these two segments, price had by far the highest t-value (8.90412 and 6.38196, respectively). So I could say that in these two segments, price is by far the most important attribute when considering buying this product. If I remove this attribute, it might bias the result, since I would be removing the most important attribute.

In my case price has a very logical (and also strong) prediction: the lower the price, the higher the utility. And that applies to all 5 segments (and they are all significant). I constrained the attribute price like you said, but one problem is that the percent certainty dropped from 34% to 25%. On the other hand, in the new model without price, the number of insignificant attributes decreased from 10 to 3 (for all 6 attributes across 5 segments).
So by removing price as an attribute I can predict which other attributes are important (particularly in these two segments), but the percent certainty of the overall model dropped by 9 percentage points. So, isn't it much better to keep this attribute in play and conclude that price has the highest impact on these two segments?
I recommend keeping all attributes in the Latent Class model.  If some of the segments have reversals on the utilities for attributes with known order, and you are certain that all respondents should agree with that known order of preference, then you should constrain the utilities for that problematic attribute to have the utility order that you rationally expect.
When I look at the t-ratios in my latent class analysis within the different segments, not all levels of an attribute are > 1.96.  What does it mean when, for example, an attribute has 3 levels and, looking at segment one for that specific attribute, only one level is > 1.96?
Does this mean that the other levels are useless? What should I do in that case?
Note that we use effects-coding in our X matrix (our design matrix) so that the levels within an attribute are zero-centered.  So, for an important attribute with 3 or more levels, at least some of the levels would be expected to fall around zero just due to our zero-centering (the middle levels of preference will likely fall near zero).  But, this does not mean this is an overall useless attribute!
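To make the zero-centering concrete, here is a minimal sketch of effects coding for a single 3-level attribute (a simplified illustration, not the software's actual design-matrix code):

```python
# Effects coding for one categorical attribute: each of the first
# n_levels - 1 levels gets its own column; the last level is coded -1
# on every column, which forces level utilities to sum to zero.
def effects_code(level, n_levels):
    row = [0] * (n_levels - 1)
    if level < n_levels - 1:
        row[level] = 1
    else:
        row = [-1] * (n_levels - 1)
    return row

for lvl in range(3):
    print(lvl, effects_code(lvl, 3))
# 0 [1, 0]
# 1 [0, 1]
# 2 [-1, -1]
```

Because the three level utilities must sum to zero, a middle level of preference sits near zero almost by construction, and its small t-ratio says nothing about whether the attribute as a whole matters.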

Now, if you ran a 4-group latent class solution and you see for a given attribute that all of its levels have |t| < 1.96 for all four segments, then you would question whether this attribute should be in the model or not.  But, if for even one group this attribute had even one level with a significant t, then you would think it probable that this attribute adds fit to the model.

The more appropriate test for figuring out if an attribute adds significant fit to the model is by comparing the overall LL fit of the model with and without that attribute added.  Two times the difference in LL between those two latent class runs is distributed as Chi square, with degrees of freedom equal to the number of additional parameters in the model for that attribute included vs. not included.
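That likelihood-ratio test can be sketched as follows; the log-likelihoods and degrees of freedom are hypothetical numbers invented for illustration:

```python
# Likelihood-ratio test for whether an attribute adds significant fit:
# fit the latent class model with and without the attribute, then
# compute 2 * (LL_with - LL_without), which is chi-square distributed
# with df = number of extra parameters the attribute contributes.
def lr_test_stat(ll_full, ll_reduced):
    return 2 * (ll_full - ll_reduced)

# Hypothetical: a 3-level attribute adds 2 effects-coded parameters per
# group, so in a 4-group solution it contributes 8 degrees of freedom.
ll_with, ll_without = -2430.0, -2445.0
stat = lr_test_stat(ll_with, ll_without)   # 30.0
df = 8

# The chi-square critical value at alpha = 0.05 with 8 df is about
# 15.51, so stat = 30.0 comfortably exceeds it: with these made-up
# numbers, the attribute adds significant fit.
print(stat, df)
```

In practice you would read the two log-likelihoods straight from the two latent class reports and look the statistic up in a chi-square table (or use `scipy.stats.chi2.sf` if you have SciPy available).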