Aggregate Score Estimation via Logit Analysis


Note: Aggregate logit has been used for more than three decades in the analysis of choice data.  It is useful as a top-line diagnostic tool (both to assess the quality of the experimental design and to estimate the average preferences for the sample).  Logit can be quite useful for studies that include a very large number of items, where respondents cannot see each item enough times to support individual-level analysis of scores via HB.  You can compute aggregate logit scores for the entire sample or for different respondent groups (e.g., large companies vs. small companies).


When you run Logit (under Analysis | Analysis Manager, select Logit as the Analysis Type and click Run), the results are displayed in the report window and utility scores are saved into a subfolder within your project directory.  You can weight the data, or select subsets of respondents or tasks to process.  Logit estimates a single set of utility scores (representing an average across the included respondents) that provides a maximum likelihood fit to the respondents' choices.  Aggregate logit pools all respondents within the analysis and treats them as if a single respondent had completed a very long interview.


The utility score results are reported using three different rescaling transformations:


Zero-Centered Interval Scale (where the scores have a range of 100 and a mean of zero)

Rescaled Scores (0 to 100 scaling), also known as Probability Scale

Raw Utilities


The choice of rescaling method for reporting depends on the kinds of comparisons you intend to make (e.g. among items vs. between respondent groups) and the sophistication of the audience.


What Are Raw Logit Utility Scores?


A utility score is a measure of relative desirability or worth.  When computing utility scores using logit, latent class, or HB, every item in the MaxDiff project receives a utility (preference score).  The higher the utility, the more likely the item is to be chosen as best (and the less likely it is to be chosen as worst).  The score for the last item in your study is always fixed at 0.  The other items' scores are estimated relative to this final item's fixed zero score.  Because only the relative utility values have meaning, it doesn't matter which item is chosen as the fixed 0 point.


The utility scores coming directly out of the logit, latent class, or HB algorithm are called raw scores.  These scores have logit scaling properties, usually involve both positive and negative values, and are on an interval scale.  An item with a score of 2.0 is higher (more important or more preferred) than an item with a score of 1.0.  But, when interpreting raw logit utility scores, we cannot say that the item with a score of 2.0 is twice as preferred as an item with a score of 1.0.  To do that, we must transform the utility scores to a positive probability scale that supports ratio operations.


Rescaled Scores (0 to 100 scaling)


It's easier for most people to interpret the logit scores if we transform them to be all positive values on a ratio scale, where the scores sum to 100.  The Rescaled Scores (0 to 100 scaling) transformation does that.  Because the result is on a ratio scale, an item with a score of 4 is twice as preferred as an item with a score of 2.  Click here for more details regarding this rescaling procedure.



Zero-Centered Interval Scale


The more consistent respondents are in their answers (and the more they agree with one another), the larger the magnitude of the Raw Logit Scores from aggregate analysis.  The magnitude of the scaling can also affect the sensitivity of the Rescaled Scores (0 to 100 scaling).  Researchers sometimes want to zero-center and normalize the raw utility scores to have a constant range (such as 100), which makes it easier to compare results across different groups of respondents (who might have different degrees of consistency and who may have quite different preferences for the final reference item).  The Zero-Centered Interval Scaling does this.  The scores are centered around zero (given a mean of zero) and the range is set to 100 points.


As with Raw Logit scores, an item with a score of 2.0 is higher (more important or more preferred) than an item with a score of 1.0.  But, we cannot say that the item with a score of 2.0 is twice as preferred as an item with a score of 1.0.
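
The zero-centering and range normalization described above can be sketched in a few lines of Python.  This is only an illustration under the stated definition (mean of zero, range of 100); the function name and the input values are hypothetical, and the software's own implementation may differ in detail.

```python
def zero_centered_interval(raw_utilities):
    """Shift raw utilities to a mean of zero, then stretch them to a range of 100."""
    mean = sum(raw_utilities) / len(raw_utilities)
    centered = [u - mean for u in raw_utilities]
    spread = max(centered) - min(centered)
    if spread == 0:
        # All items tied: every rescaled score is zero.
        return [0.0 for _ in centered]
    return [c * 100.0 / spread for c in centered]


# Hypothetical raw logit utilities (last item fixed at 0).
raw = [-0.58, 0.29, 0.43, 0.00]
scaled = zero_centered_interval(raw)
print([round(s, 2) for s in scaled])
```

Because the transformation is linear, the rescaled scores keep the same rank order and the same relative spacing as the raw utilities; only the origin and the unit change.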


More Information about Running Logit Analysis


Logit analysis is an iterative procedure to find the maximum likelihood solution for fitting a multinomial logit model to the data.  A raw utility score is produced for each item (the last item is fixed at a utility of zero to identify the model), which can be interpreted as an average utility value for the respondents analyzed.  After logit converges on a solution, the output is displayed in the results window.


The computation starts with estimates of zero for all items' scores (utilities), and determines a gradient vector indicating how those estimates should be modified for greatest improvement.  A step is made in the indicated direction, with a step size of 1.0.  The user can modify the step size; a smaller step size will probably produce a slower computation, but perhaps more precise estimates.  Further steps are taken until the solution stops improving.


For each iteration the Chi-Square (directly related to the log-likelihood) is reported, together with a value of "RLH."  RLH is short for "root likelihood" and is an intuitive measure of how well the solution fits the data.  The best possible value is 1.0, and the worst possible is the reciprocal of the number of choices available in the average task.  For a study in which four items were shown in each MaxDiff set, the minimum possible value of RLH is 0.25.
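
To make RLH concrete, the sketch below assumes the usual definition: the n-th root of the likelihood, which is the geometric mean of the probabilities the model assigns to the choices respondents actually made.  The function name is hypothetical, and the probabilities are illustrative rather than taken from a real study.

```python
import math


def root_likelihood(chosen_probabilities):
    """Geometric mean of the predicted probabilities of the observed choices."""
    log_likelihood = sum(math.log(p) for p in chosen_probabilities)
    return math.exp(log_likelihood / len(chosen_probabilities))


# With four items per set and no information, every choice has probability 1/4.
chance_rlh = root_likelihood([0.25] * 12)
# A model that predicts every observed choice with certainty fits perfectly.
perfect_rlh = root_likelihood([1.0] * 12)
print(chance_rlh, perfect_rlh)
```

The two cases reproduce the bounds given above for a four-item design: about 0.25 for chance-level predictions and 1.0 for a perfect fit.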


Iterations continue until the maximum number of iterations is reached (default 20), or the log-likelihood increases by too little (less than 1 in the fifth decimal place), or the gradient is too small (every element less than 1 in the fifth decimal place).
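
The iteration and stopping rules described above can be sketched as follows.  The exact update rule used by the software is not spelled out here, so this sketch assumes a Newton-Raphson step (the gradient scaled by the information matrix), which is one common approach for which a step size of 1.0 is natural.  It also assumes that each MaxDiff set contributes a "best" choice and a "worst" choice, with the worst choice modeled on the negated utilities.  The function names and the tiny data set are hypothetical.

```python
import numpy as np


def evaluate(beta, tasks):
    """Return the log-likelihood, gradient, and information matrix at beta."""
    n_params = len(beta)
    ll = 0.0
    grad = np.zeros(n_params)
    info = np.zeros((n_params, n_params))
    for shown, best, worst in tasks:
        # Each set yields two choices: best uses +U, worst uses -U.
        for sign, chosen in ((1.0, best), (-1.0, worst)):
            X = np.zeros((len(shown), n_params))
            for row, item in enumerate(shown):
                if item < n_params:      # the last item stays fixed at zero
                    X[row, item] = sign
            v = X @ beta
            p = np.exp(v - v.max())      # subtract the max for numerical stability
            p /= p.sum()
            y = np.array([1.0 if item == chosen else 0.0 for item in shown])
            ll += np.log(p[list(shown).index(chosen)])
            grad += X.T @ (y - p)
            info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return ll, grad, info


def estimate_aggregate_logit(tasks, n_items, step_size=1.0, max_iter=20, tol=1e-5):
    """Start at zero and take Newton steps until a stopping rule is met."""
    beta = np.zeros(n_items - 1)
    ll, grad, info = evaluate(beta, tasks)
    for _ in range(max_iter):
        if np.all(np.abs(grad) < tol):   # gradient is too small
            break
        beta = beta + step_size * np.linalg.solve(info, grad)
        new_ll, grad, info = evaluate(beta, tasks)
        gain, ll = new_ll - ll, new_ll
        if gain < tol:                   # log-likelihood barely improved
            break
    return np.append(beta, 0.0), ll


# Hypothetical pooled data: (items shown, item picked best, item picked worst).
full_set = (0, 1, 2)
tasks = ([(full_set, 0, 2)] * 3 + [(full_set, 1, 2)] * 2 +
         [(full_set, 0, 1), (full_set, 2, 0), (full_set, 1, 0)])
utilities, ll = estimate_aggregate_logit(tasks, n_items=3)
print(np.round(utilities, 3), round(ll, 3))
```

Note that the sketch needs every item to appear in the data and needs respondents to be less than perfectly consistent; otherwise the information matrix cannot be inverted or the estimates grow without bound.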


The standard errors of the estimates (the square roots of the variances) are always displayed.  The variances and covariances themselves are not displayed by default, but the user has the option of saving them by requesting a Full report.
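
The relationship between the variances and the standard errors can be illustrated with a short sketch.  The covariance values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical variance-covariance matrix for two estimated item utilities.
covariance = [
    [0.04, 0.01],
    [0.01, 0.09],
]

# Standard errors are the square roots of the diagonal (variance) entries.
standard_errors = [math.sqrt(covariance[i][i]) for i in range(len(covariance))]
print(standard_errors)
```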


The Logit Rule and MaxDiff


The raw logit utility scores are estimated such that their antilog is proportional to choice likelihood.  We'll demonstrate this property below.  


Consider a MaxDiff task in which 4 items were shown to a sample of respondents.  Under aggregate logit, a single vector of utility scores is estimated that best fits the respondents' answers (across all respondents and all tasks).  Let's imagine that items D, A, G, and J are shown within a MaxDiff set.  The utility scores for those items are:


   Item      Utility
   Item D     -0.58
   Item A      0.29
   Item G      0.43
   Item J      0.00


Given these utility scores, we can use the logit rule to predict the likelihood that the sample would pick each item as "Best" within this set of four items.  To do this, we take the antilog of each of the item's scores (we "exponentiate" the scores) and then normalize the results to sum to 1.0.


         Likelihood of Being Selected "Best"

   Item      Utility   Exp(Utility)   Likelihood (Best)
   Item D     -0.58        0.56       0.13 (0.56/4.43)
   Item A      0.29        1.34       0.30 (1.34/4.43)
   Item G      0.43        1.54       0.35 (1.54/4.43)
   Item J      0.00        1.00       0.23 (1.00/4.43)
                          ------      ------
                    Sum:   4.43       1.00


Item G is most likely to be chosen best among these four items, with a 35% chance of being selected.  Item D is 13% likely to be selected best.
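
The "Best" calculation above can be reproduced with a short Python sketch.  The function name is hypothetical; the utilities are the four values from the example.

```python
import math


def best_likelihoods(utilities):
    """Apply the logit rule: exponentiate each utility, then normalize to sum to 1."""
    exponentiated = {item: math.exp(u) for item, u in utilities.items()}
    total = sum(exponentiated.values())
    return {item: e / total for item, e in exponentiated.items()}


utilities = {"D": -0.58, "A": 0.29, "G": 0.43, "J": 0.00}
best = best_likelihoods(utilities)
for item, likelihood in best.items():
    print(f"Item {item}: {likelihood:.2f}")
# Item D: 0.13
# Item A: 0.30
# Item G: 0.35
# Item J: 0.23
```

The ratio of any two likelihoods equals the ratio of the corresponding exponentiated utilities, which is what it means for the antilog of the raw scores to be proportional to choice likelihood.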


Given these utility scores, we can also predict the likelihood that the sample would pick each item as "Worst" within this set of four items.  To do this, we take the antilog of the negative of each of the item's scores (we "exponentiate" the negative of the scores) and then normalize the results to sum to 1.0.


         Likelihood of Being Selected "Worst"

   Item      Utility   -Utility   Exp(-Utility)   Likelihood (Worst)
   Item D     -0.58      0.58         1.79        0.43 (1.79/4.18)
   Item A      0.29     -0.29         0.75        0.18 (0.75/4.18)
   Item G      0.43     -0.43         0.65        0.16 (0.65/4.18)
   Item J      0.00      0.00         1.00        0.24 (1.00/4.18)
                                     ------       ------
                               Sum:   4.18        1.00


Item D is most likely to be chosen worst among these four items, with a 43% chance of being selected.  Item G is 16% likely to be selected worst.
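
The "Worst" calculation differs from the "Best" calculation only in the sign applied before exponentiating, as this short sketch shows.  The function name is hypothetical; the utilities are the four values from the example.

```python
import math


def worst_likelihoods(utilities):
    """Exponentiate the negative of each utility, then normalize to sum to 1."""
    exponentiated = {item: math.exp(-u) for item, u in utilities.items()}
    total = sum(exponentiated.values())
    return {item: e / total for item, e in exponentiated.items()}


utilities = {"D": -0.58, "A": 0.29, "G": 0.43, "J": 0.00}
worst = worst_likelihoods(utilities)
for item, likelihood in worst.items():
    print(f"Item {item}: {likelihood:.2f}")
# Item D: 0.43
# Item A: 0.18
# Item G: 0.16
# Item J: 0.24
```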
