Scale Constraints


Note: Scale is a complex issue within MNL, and our treatment here assumes that the standard deviation of the utility vector may be used as an approximate proxy for scale.  We offer this approach as an option (but not the default procedure) within our latent class software.


Latent Class for Multinomial Logit (MNL) is a popular procedure for finding segments of respondents with different preferences from choice data such as CBC and MaxDiff.  However, one concerning aspect of standard Latent Class analysis is that it can sometimes form segments that differ mainly in terms of scale (response error) but differ little in terms of real preference patterns (Magidson & Vermunt 2007).  With MNL, the larger the response error, the smaller the magnitude of the part-worth utilities; the smaller the response error, the larger the magnitude of the part-worth utilities.
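The relationship between response error and utility magnitude is easy to demonstrate with the logit rule: shrinking the same utility vector toward zero flattens the predicted choice probabilities without changing the preference ordering.  A minimal sketch, using hypothetical utility values:

```python
import numpy as np

def logit_shares(utilities):
    """Choice probabilities for one task under the logit rule (softmax)."""
    e = np.exp(utilities)
    return e / e.sum()

# Hypothetical part-worths for three alternatives in one choice task.
u = np.array([1.0, 0.0, -1.0])

low_error  = logit_shares(2.0 * u)   # low response error: larger magnitudes
high_error = logit_shares(0.5 * u)   # high response error: smaller magnitudes
# Same preference ordering in both cases, but the high-error shares sit
# much closer to uniform (1/3 each) than the low-error shares.
```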

Researchers can use Latent Class to 1) find segments of respondents with different preferences for strategic segmentation and targeting purposes, or 2) estimate part-worth utilities for the different segments that fit the data better than aggregate logit.  Scale Constrained Latent Class is for researchers who are focused mainly on the first purpose: strategic market segmentation.

Latent Class analysts are already familiar with the notion of employing utility constraints (to avoid finding nonsensical segments with utility reversals).  Scale constraints are for avoiding segmentation solutions where the segments differ mostly in terms of scale (response error) rather than preference.  Of course, both utility constraints and scale constraints can be applied within the same latent class run.

Scale Constraint Procedure

The standard Latent Class procedure consists of the following steps:

1. Set the number of clusters (classes), and choose a random initial part-worth solution for each cluster with values in the interval -0.1 to +0.1.

2.  Use each class's logit coefficients to fit each respondent's data and (per the logit rule) estimate the likelihood of each respondent belonging to each class.

3. Estimate a weighted logit solution for each class.  Each solution uses data for all respondents, with each respondent weighted by his or her estimated probability of belonging to that class.

4. Compute the likelihood of the data, given the current estimates of the class coefficients and class sizes.  (Underlying this method is a model that expresses the likelihood of the data in terms of those estimates.)

5.  Repeat steps 2 through 4 until the improvement in likelihood is sufficiently small.
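The five steps above can be sketched in code.  The sketch below is ours, not the software's actual implementation: it generates a small synthetic CBC-style data set with two opposite-preference segments, and a few gradient-ascent passes stand in for the full weighted maximum-likelihood logit fit of step 3.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical synthetic data: R respondents, T choice tasks, A alternatives,
# K already-coded attribute columns.  Two true segments with opposite signs.
R, T, A, K = 60, 8, 3, 4
X = rng.normal(size=(R, T, A, K))
true_beta = np.array([[1.0, -1.0, 0.5, -0.5],
                      [-1.0, 1.0, -0.5, 0.5]])
segment = rng.integers(0, 2, size=R)
u_true = np.einsum('rtak,rk->rta', X, true_beta[segment])
p_true = np.exp(u_true) / np.exp(u_true).sum(axis=2, keepdims=True)
choice = np.array([[rng.choice(A, p=p_true[r, t]) for t in range(T)]
                   for r in range(R)])
r_idx, t_idx = np.arange(R)[:, None], np.arange(T)[None, :]

def loglik_per_respondent(beta):
    """Log-likelihood of each respondent's choices for one class's betas."""
    u = np.einsum('rtak,k->rta', X, beta)
    logp = u - np.log(np.exp(u).sum(axis=2, keepdims=True))
    return logp[r_idx, t_idx, choice].sum(axis=1)          # shape (R,)

n_classes = 2
betas = rng.uniform(-0.1, 0.1, size=(n_classes, K))        # step 1
sizes = np.full(n_classes, 1.0 / n_classes)
prev_ll = -np.inf

for _ in range(100):
    # Step 2: posterior probability that each respondent belongs to each class.
    ll = np.stack([loglik_per_respondent(b) for b in betas])   # (C, R)
    post = sizes[:, None] * np.exp(ll - ll.max(axis=0))
    post /= post.sum(axis=0)
    sizes = post.mean(axis=1)
    # Step 3: weighted logit for each class (gradient ascent stands in for
    # the usual Newton-type weighted maximum-likelihood fit).
    for c in range(n_classes):
        for _ in range(25):
            u = np.einsum('rtak,k->rta', X, betas[c])
            pr = np.exp(u) / np.exp(u).sum(axis=2, keepdims=True)
            chosen = np.zeros_like(pr)
            chosen[r_idx, t_idx, choice] = 1.0
            grad = np.einsum('rta,rtak->rk', chosen - pr, X)    # (R, K)
            betas[c] += 0.05 * (post[c][:, None] * grad).sum(axis=0) / R
    # Step 4: likelihood of the data, given class coefficients and sizes.
    total_ll = np.log((sizes[:, None] * np.exp(ll)).sum(axis=0)).sum()
    # Step 5: stop when the improvement is sufficiently small.
    if total_ll - prev_ll < 1e-4:
        break
    prev_ll = total_ll
```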

For Scale Constrained Latent Class analysis, we insert one small step between steps 3 and 4 above.  We rescale the utilities for each class so that all classes have the same magnitude (standard deviation), equal to the average standard deviation across classes observed in that iteration.  That procedure is described below.

Let’s imagine that after a given iteration of the latent class procedure, the part-worth utilities for three classes are as follows:


                        Class 1     Class 2     Class 3

Attribute 1, Level 1     +1.05                   -0.83
Attribute 1, Level 2     +0.33       -0.67
Attribute 1, Level 3     -1.38
Attribute 2, Level 1     +0.35       +1.06       +0.79
Attribute 2, Level 2     -0.35       -1.06       -0.79

A proxy for scale factor (response error) is the dispersion of the part-worth utilities.  We take the simple approach of using the standard deviation of each utility vector as a proxy for scale.  The standard deviation for Class 1 in the table above is the standard deviation across the five values (+1.05, +0.33, -1.38, +0.35, -0.35) or 0.820.  The standard deviations across the five part-worths for each of the three classes are:

 Class 1:  0.820

 Class 2:  0.768

 Class 3:  0.679


 Average:  0.756


Within each iteration of the Latent Class procedure, we constrain the standard deviation for all three classes to the average standard deviation of the three classes, by multiplying the vectors of part-worths by the following magnitude adjustments:


 Class 1:  0.756/0.820 = 0.922

 Class 2:  0.756/0.768 = 0.984

 Class 3:  0.756/0.679 = 1.113
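Those adjustment factors follow mechanically from the three standard deviations.  A short check (Class 1's part-worths come from the table above; Classes 2 and 3 are represented only by their reported standard deviations):

```python
import numpy as np

# Class 1's part-worth utilities from the worked example.
class_1 = np.array([1.05, 0.33, -1.38, 0.35, -0.35])

# Population standard deviations (dividing by n, not n-1) for the three
# classes; np.std(class_1) reproduces the reported 0.820.
sds = np.array([np.std(class_1), 0.768, 0.679])

avg = sds.mean()                     # about 0.756
factors = avg / sds
print(np.round(factors, 3))          # [0.922 0.984 1.113]
```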


After multiplying the part-worth utilities for each class by its magnitude adjustment, the new scale constrained part-worth utilities are:



                        Class 1     Class 2     Class 3

Attribute 1, Level 1     +0.97                   -0.92
Attribute 1, Level 2     +0.30       -0.66
Attribute 1, Level 3     -1.27
Attribute 2, Level 1     +0.32       +1.04       +0.88
Attribute 2, Level 2     -0.32       -1.04       -0.88

Standard Deviation:      0.756       0.756       0.756

After the adjustment, the standard deviation of the part-worth utility values for each class is 0.756.  The likelihood of the solution is evaluated using the constrained utility vectors.  In each subsequent iteration, the part-worths for the classes are again scale constrained, though the magnitude of the parameters (as expressed by the standard deviation) tends to increase from one iteration to the next as the updated utilities provide a better fit to the data.
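Pulling the pieces together, the inserted constraint step is simply a rescaling of each class's utility vector.  In the sketch below, Class 1's vector is taken from the worked example; the Class 2 and Class 3 vectors are hypothetical stand-ins constructed only to match the standard deviations reported above (0.768 and 0.679):

```python
import numpy as np

def scale_constrain(class_utils):
    """Rescale each class's part-worth vector so that every class ends up
    with the same standard deviation: the average SD across classes."""
    sds = np.array([np.std(u) for u in class_utils])
    avg = sds.mean()
    return [u * (avg / sd) for u, sd in zip(class_utils, sds)]

# Class 1 from the worked example; Classes 2 and 3 are hypothetical vectors
# chosen to reproduce the reported standard deviations (0.768, 0.679).
utils = [np.array([1.05, 0.33, -1.38, 0.35, -0.35]),
         np.array([0.45, -0.67, 0.22, 1.06, -1.06]),
         np.array([-0.83, 0.52, 0.31, 0.79, -0.79])]

constrained = scale_constrain(utils)
# After the adjustment, all three vectors share one standard deviation:
# the average of the originals (roughly 0.76).
```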

Note: if interaction terms or None parameters are involved, these are also included when computing the standard deviation of the vector of part-worth utilities for each class.

Model Fit

Just as utility constraints lead to slightly decreased fit relative to unconstrained solutions, Scale Constrained Latent Class always leads to slightly decreased fit to the data.  Typically, the fit for a scale constrained latent class run is about 98% or better relative to an unconstrained solution (unless the CBC data set involves a None parameter).

We should note that CBC data sets that involve None concepts converge more slowly and less consistently when scale constraints are applied.  The None parameter can become a relatively large and decisive parameter for defining groups, so constraining the solution such that the standard deviation across all the parameters (including the None) is the same across groups leads to greater losses in model fit compared to unconstrained solutions.  This is to be expected.
