Monotone Regression (Pure Individual Analysis)




Monotone Regression can estimate partworth utilities uniquely for each respondent, without the benefit of borrowing information across a sample of respondents.  With extremely tiny samples of just a few respondents (such as fewer than 5), monotone regression may be preferred to HB.  In our experience, however, HB may be preferred even with as few as 10 or 20 respondents.  With hundreds of respondents, we've seen consistent evidence that HB works better than monotone regression.  


If using monotone regression, we recommend that your ACBC questionnaires be of substantial length, such that all non-BYO levels occur in each respondent's design at least 4 times.  Because monotone regression is typically used with tiny sample sizes, we also encourage you to impose utility constraints on all attributes.  You can specify global constraints for attributes with known logical order, especially price, and idiosyncratic constraints for attributes like brand, style, and color.


As a word of caution, partworth utilities from monotone regression have an arbitrary scale, unlike HB, where the scaling of the partworths maps directly to probabilities of choice via the logit rule.  Thus, you should adjust the Exponent (Scale Factor) appropriately in market simulators that use partworths developed with monotone regression.  Otherwise, the overall sensitivity of the model may be significantly lower or higher than appropriate for predicting buyer choices.
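To see why the Exponent matters, here is a minimal Python sketch of a logit share-of-preference rule with a scale factor. The function name `logit_shares` and the utility values are hypothetical, not taken from any Sawtooth simulator. The point is only that multiplying all utilities by a larger exponent makes shares more extreme, while a smaller exponent flattens them toward equal shares; since monotone-regression partworths are normalized rather than calibrated to choice probabilities, the exponent has to be tuned separately.

```python
import math


def logit_shares(utilities, exponent=1.0):
    """Logit share of preference for each product.

    `exponent` mimics a simulator's Exponent (Scale Factor): every total
    utility is multiplied by it before the logit rule is applied.
    """
    scaled = [exponent * u for u in utilities]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical total utilities for three products, for one respondent.
utils = [0.30, 0.10, -0.40]
for exponent in (1.0, 5.0, 20.0):
    print(exponent, [round(s, 3) for s in logit_shares(utils, exponent)])
```

In practice the exponent would be chosen so that simulated shares match some external benchmark, such as holdout choices; the sketch does not attempt that calibration.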




Monotone regression uses a method similar to that described by Richard M. Johnson in "A Simple Method of Pairwise Monotone Regression", Psychometrika, 1975, pp. 163-168.  Monotone regression may be appropriate for estimation problems where the model is intended to fit the data only in a rank order sense.  An example of such a situation would be full profile conjoint analysis, with data consisting only of rank orders of preference for a collection of product concepts.  More recently, monotone regression has been adapted to choice modeling, where the data consist of only ones and zeros, with ones indicating chosen concepts and zeros indicating those not chosen.  In this context, monotone regression seeks a set of partworths which imply higher utility for each chosen concept than for non-chosen concepts in the same choice set.


The method is iterative, starting with random values and finding successive approximations which fit the data increasingly well.  Two measures of goodness of fit are reported: Theta and Tau.




Suppose a choice task presented four concepts, P, Q, R, and S, and the respondent chose concept P.  Also, suppose that at some intermediate stage in the computation, utilities for these concepts are estimated as follows:


Concept   Utility   Chosen
P            4.5       1
Q            5.6       0
R            1.2       0
S           -2.3       0


We want to measure "how close" the utilities are to the rank orders of preference.  One way we could measure would be to consider all of the possible pairs of concepts, and to ask for each pair whether the member with the more favorable rank also has the higher utility.   But since our observed data consist only of a one and three zeros, we can focus on the differences in utility between the chosen concept and the three non-chosen concepts.  


In this example the preferred concept (P) has a higher utility than either R or S, but it does not have a higher utility than Q.  


Kendall's Tau is a way of expressing the amount of agreement between the preferences and the estimated utilities. It is obtained by subtracting the number of "wrong" pairs (1) from the number of "right" pairs (2), and then dividing this difference by the total number of pairs. In this case,


Tau = (2 - 1) /3 = .333


A Tau value of 1.000 would indicate perfect agreement in a rank order sense. A Tau of 0 would indicate complete lack of correspondence, and a Tau of -1.000 would indicate a perfect reverse relationship.  (Although this example involves only one choice set, in a real application the number of correct and incorrect pairs would be summed over all choice sets in the questionnaire.)
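The Tau calculation for choice data can be sketched in Python as follows. This is an illustration, not the software's code: each task is assumed to be a list of concept utilities plus the index of the chosen concept, and a tied pair is assumed to count in the denominator but as neither right nor wrong (the text above does not address ties).

```python
def tau_for_choices(tasks):
    """Kendall's Tau for choice data.

    `tasks` is a list of (utilities, chosen) pairs: the estimated utility of
    each concept in a choice set and the index of the concept chosen. Only
    chosen-versus-non-chosen pairs are compared, summed over all tasks.
    """
    right = wrong = pairs = 0
    for utilities, chosen in tasks:
        for j, u in enumerate(utilities):
            if j == chosen:
                continue
            pairs += 1
            if utilities[chosen] > u:
                right += 1
            elif utilities[chosen] < u:
                wrong += 1
            # Assumption: a tie counts in the denominator but as neither right nor wrong.
    return (right - wrong) / pairs


# The example above: concepts P, Q, R, S with P (index 0) chosen.
print(tau_for_choices([([4.5, 5.6, 1.2, -2.3], 0)]))  # 0.333...
```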


Tau is a convenient way to express the amount of agreement between the rank orders of two sets of numbers, such as choices and utilities for concepts. However, it is not useful as a measure on which to base an optimization algorithm, because of its discontinuous nature.  As a solution is modified to fit increasingly well, the Tau value will remain constant and then suddenly jump to a higher value. Some other measure is required which, while similar to Tau, is a continuous function of the utility values.




For this purpose Johnson described a statistic "Theta."  In his original article, Johnson measured differences in utility by squaring each difference.  For this example we have:


Preference   Utility Difference   Squared Difference
P minus Q          -1.1                  1.21
P minus R           3.3                 10.89
P minus S           6.8                 46.24
Total                                   58.34


Theta is obtained from the squared utility differences in the last column of the table above. We sum the squares of those utility differences that are in the "wrong order," divide by the total of all the squared utility differences, and then take the square root of the quotient. Since there is only one difference in the wrong direction,


Theta = square root(1.21/ 58.34) = .144


Theta can be regarded as the percentage of information in the utility differences that is incorrect, given the data. The best possible value of Theta is zero, and the worst possible value is 1.000.
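Theta in its original quadratic form can be sketched in Python using the same hypothetical task layout as for Tau (a list of concept utilities plus the index of the chosen concept). Only differences where a non-chosen concept outranks the chosen one contribute to the numerator; this is an illustration of the definition, not the software's own code.

```python
import math


def theta_quadratic(tasks):
    """Johnson's quadratic Theta for choice data.

    `tasks` is a list of (utilities, chosen) pairs. Each chosen-minus-non-chosen
    utility difference is squared; squares of negative ("wrong order")
    differences form the numerator, all squares form the denominator.
    """
    wrong_sq = total_sq = 0.0
    for utilities, chosen in tasks:
        for j, u in enumerate(utilities):
            if j == chosen:
                continue
            diff = utilities[chosen] - u
            total_sq += diff * diff
            if diff < 0:
                wrong_sq += diff * diff
    return math.sqrt(wrong_sq / total_sq)


# The example above: P minus Q is the only wrong-order difference.
print(round(theta_quadratic([([4.5, 5.6, 1.2, -2.3], 0)]), 3))  # 0.144
```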

Now that we have defined Theta, we can describe the nature of the computation.


The process is iterative. It starts with random values as estimates of the partworths. In each iteration a direction of change (a gradient vector) is found, for which a small change in partworths will produce the greatest improvement in Theta.  Small changes are made in that direction, which continue as long as Theta improves. Each iteration has these steps:


1. Obtain the value of Theta for the current estimates of partworths and a direction (gradient) in which the solution should be modified to decrease Theta most rapidly.

2. Try a small change of the partworths in the indicated direction, which is done by subtracting the gradient vector from the partworth vector and renormalizing the partworth estimates so as to have a sum of zero within each attribute and a total sum of squares equal to unity.  Each successive estimate of partworths is constrained as indicated by the a priori settings or additional utility constraints.

3. Re-evaluate Theta. If Theta is smaller than before, the step was successful, so we accept the improved estimates and try to obtain further improvement using the same procedure again, by returning to step (2). If Theta is larger than before, we have gone too far, so we revert to the previous estimate of partworths and begin a new iteration by returning to step (1).


If any iteration fails to improve Theta from the previous iteration, or if Theta becomes as small as 1e-10, the algorithm terminates.  In theory, the iterations could continue almost indefinitely with a long series of very small improvements in Theta. For this reason it is useful to place a limit on the number of iterations.  A maximum of 50 iterations is permitted, and within any iteration a maximum of 50 attempts at improvement are permitted.
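The iteration steps and stopping rules above can be put together in a compact Python sketch. It is a simplified illustration under several assumptions, not Sawtooth's implementation: concepts are coded as tuples of level indices (one per attribute), the descent direction is the analytic gradient of Theta squared (which points the same way as the gradient of Theta), each trial step has a fixed length `step` along the unit-length gradient, and utility constraints are not enforced. The names `fit_monotone`, `theta_and_gradient`, `normalize`, and `utility` are hypothetical.

```python
import math
import random


def utility(pw, concept):
    """Total utility of a concept coded as one level index per attribute."""
    return sum(pw[a][level] for a, level in enumerate(concept))


def normalize(pw):
    """Zero-center within each attribute, then scale to unit total sum of squares."""
    centered = [[x - sum(att) / len(att) for x in att] for att in pw]
    norm = math.sqrt(sum(x * x for att in centered for x in att))
    return [[x / norm for x in att] for att in centered]


def theta_and_gradient(pw, tasks):
    """Quadratic Theta plus the gradient of Theta**2 = wrong / total.

    `tasks` is a list of (concepts, chosen) pairs.
    """
    wrong = total = 0.0
    g_wrong = [[0.0] * len(att) for att in pw]
    g_total = [[0.0] * len(att) for att in pw]
    for concepts, chosen in tasks:
        best = concepts[chosen]
        for j, other in enumerate(concepts):
            if j == chosen:
                continue
            d = utility(pw, best) - utility(pw, other)
            total += d * d
            if d < 0:
                wrong += d * d
            # d rises with the chosen concept's levels and falls with the other's.
            for a, (lb, lo) in enumerate(zip(best, other)):
                g_total[a][lb] += 2 * d
                g_total[a][lo] -= 2 * d
                if d < 0:
                    g_wrong[a][lb] += 2 * d
                    g_wrong[a][lo] -= 2 * d
    grad = [[(gw * total - wrong * gt) / total ** 2 for gw, gt in zip(aw, at)]
            for aw, at in zip(g_wrong, g_total)]
    return math.sqrt(wrong / total), grad


def fit_monotone(tasks, n_levels, seed=0, max_iter=50, max_attempts=50,
                 step=0.05, tol=1e-10):
    rng = random.Random(seed)
    pw = normalize([[rng.gauss(0.0, 1.0) for _ in range(k)] for k in n_levels])
    theta, grad = theta_and_gradient(pw, tasks)  # step 1
    for _ in range(max_iter):
        gnorm = math.sqrt(sum(g * g for att in grad for g in att))
        if theta <= tol or gnorm == 0.0:
            break
        next_grad = None
        # Steps 2-3: keep stepping along this iteration's direction while Theta improves.
        for _ in range(max_attempts):
            trial = normalize([[p - step * g / gnorm for p, g in zip(ap, ag)]
                               for ap, ag in zip(pw, grad)])
            trial_theta, trial_grad = theta_and_gradient(trial, tasks)
            if trial_theta >= theta:
                break  # went too far: keep the previous estimate
            pw, theta, next_grad = trial, trial_theta, trial_grad
        if next_grad is None:
            break  # this iteration failed to improve Theta
        grad = next_grad  # step 1 of the next iteration
    return pw, theta


# Hypothetical data: attribute 1 has 3 levels, attribute 2 has 2 levels.
tasks = [
    ([(0, 1), (1, 0), (2, 0)], 0),
    ([(1, 1), (2, 1), (1, 0)], 0),
    ([(2, 1), (2, 0), (1, 0)], 2),
    ([(0, 0), (1, 1), (2, 1)], 0),
    ([(1, 0), (0, 0), (2, 1)], 1),
]
pw, theta = fit_monotone(tasks, n_levels=[3, 2], seed=1)
print(round(theta, 4), [[round(x, 3) for x in att] for att in pw])
```

Zero-centering within an attribute does not change any utility difference, and rescaling does not change the ratio that defines Theta, so the renormalization in step (2) keeps the estimates on a fixed scale without altering the fit.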


To avoid the possibility of stumbling into a bad solution due to a poor starting point, the process is repeated 5 separate times from different starting points. For each respondent, the weighted average of the five resulting vectors of partworths is computed (weighted by Tau, where any negative Tau is set to an arbitrarily small positive number). A weighted Tau is also reported with this final estimate of partworths.  
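The Tau-weighted combination of the restarts might look like the following sketch. The floor value `1e-6` and the function name are hypothetical, and treating a Tau of exactly zero like a negative one is an assumption made here; the text above only says negative Taus become an arbitrarily small positive weight. It is also an assumption that the reported weighted Tau is the Tau-weighted average of the runs' Tau values.

```python
def combine_restarts(runs, floor=1e-6):
    """Tau-weighted average of partworth vectors from independent starts.

    `runs` is a list of (partworths, tau) pairs, with partworths given as flat
    lists in the same order for every run. Taus below `floor` (including
    negative ones) are replaced by `floor` so every weight is positive.
    """
    weights = [max(tau, floor) for _, tau in runs]
    total = sum(weights)
    n = len(runs[0][0])
    average = [sum(w * pw[i] for w, (pw, _) in zip(weights, runs)) / total
               for i in range(n)]
    # Assumption: "weighted Tau" = Tau-weighted mean of the runs' Tau values.
    weighted_tau = sum(w * tau for w, (_, tau) in zip(weights, runs)) / total
    return average, weighted_tau


# Three hypothetical restarts; the third found a reversed solution (negative Tau).
runs = [([0.5, -0.5], 0.8), ([0.3, -0.3], 0.4), ([-0.5, 0.5], -0.2)]
print(combine_restarts(runs))
```

The near-zero weight means a reversed solution barely influences the average, while still avoiding a division by zero if every run happened to have a non-positive Tau.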


Although Theta was originally defined as a ratio of sums of squared differences, more recently we have experimented with a modified measure which uses absolute values of differences.  The current software permits the user to choose which of the two measures to use.  


We would really like to find a way to optimize Tau, which is a "pure" measure of fit to observations which consist solely of zeros and ones.  However, because Tau does not vary continuously with small changes in partworths, it was necessary to find some other statistic which was "like" Tau but which varied continuously with small changes in the partworths, and that is the role played by Theta.


The rationale for the quadratic definition of Theta is one of historical precedent.  The principle of least squares has been used widely and for a long time in statistics.  However, when using squared differences, the value of Theta is affected much more by larger differences than by smaller ones, so the algorithm may tolerate many small errors to reduce a few large errors.  


The rationale for the linear definition of Theta is that when based on absolute values of errors, Theta is more similar to Tau, in the sense that the algorithm will tolerate a few large errors in order to avoid many smaller errors.
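The practical difference between the two definitions shows up when comparing a solution with one large error against one with several small errors. In this sketch the linear Theta is assumed to be the sum of absolute wrong-direction differences divided by the sum of all absolute differences, with no square root; the text above does not give its exact formula. The function name `theta_measure` and the data are hypothetical, and each task has two concepts, so each contributes a single utility difference.

```python
import math


def theta_measure(tasks, kind="quadratic"):
    """Quadratic or linear Theta for (utilities, chosen) choice tasks.

    Assumption: the linear form is sum(|wrong differences|) / sum(|all differences|)
    with no square root; the quadratic form follows Johnson's definition.
    """
    wrong = total = 0.0
    for utilities, chosen in tasks:
        for j, u in enumerate(utilities):
            if j == chosen:
                continue
            diff = utilities[chosen] - u
            size = diff * diff if kind == "quadratic" else abs(diff)
            total += size
            if diff < 0:
                wrong += size
    ratio = wrong / total
    return math.sqrt(ratio) if kind == "quadratic" else ratio


# Hypothetical two-concept tasks; the chosen concept is index 0, so each task
# contributes one difference equal to utilities[0].
one_large_error = [([-3.0, 0.0], 0)] + [([1.0, 0.0], 0)] * 5
few_small_errors = [([-1.0, 0.0], 0)] * 3 + [([1.0, 0.0], 0)] * 3
for kind in ("quadratic", "linear"):
    print(kind, round(theta_measure(one_large_error, kind), 3),
          round(theta_measure(few_small_errors, kind), 3))
# quadratic 0.802 0.707
# linear 0.375 0.5
```

Here the quadratic Theta prefers the solution with three small errors, while the linear Theta prefers the one with a single large error. Tau (0.667 versus 0) agrees with the linear version.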


Our experience with the linear definition of Theta has been limited to a few recent data sets.  However, the linear version seems to result not only in better values of Tau (as one would predict) but also in slightly better hit rates for holdout choice sets, and better (more consistently negative) estimates of partworth coefficients for price.  We leave it to our users to experiment with the two versions of Theta, and we will be interested to learn whether the linear version continues to outperform the quadratic version.




No interaction effects:  With purely individual-level analysis, there generally isn't enough information to estimate interaction effects.


No piecewise regression:  This restriction avoids the common occurrence of having little (or no) information to estimate betas for regions of the price demand curve that are less relevant to the respondent.  For example, a respondent who chooses a BYO product in the very expensive range may see very few or no products in the very inexpensive region for which we are attempting to estimate a beta coefficient.


Note: The restriction of using only linear/log-linear price functions might seem limiting.  We can imagine advanced researchers building their own market simulators that include individual-level threshold price cutoffs.  For example, one could use the linear price coefficient estimated via monotone regression, but provide a strong utility penalty should the price exceed the respondent's expressed "unacceptable" price threshold (this information can be exported per respondent using the Counts function).
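A minimal sketch of that idea, with hypothetical names and values: the respondent's summed non-price utility, linear price coefficient, and exported "unacceptable" price threshold are passed in, and a large fixed penalty (1000 utility units here, an arbitrary choice) is subtracted whenever the price exceeds the threshold. The logit share function uses an exponent of 1 only for simplicity.

```python
import math


def total_utility(base_utility, price, price_coef, unacceptable_price, penalty=1000.0):
    """Utility of one product for one respondent with a price-threshold penalty.

    base_utility: sum of the non-price partworths (hypothetical input).
    price_coef: the respondent's linear price coefficient (normally negative).
    unacceptable_price: the respondent's exported threshold, or None if unknown.
    penalty: an arbitrary large utility penalty applied above the threshold.
    """
    u = base_utility + price_coef * price
    if unacceptable_price is not None and price > unacceptable_price:
        u -= penalty
    return u


def shares(utilities):
    """Logit shares of preference (exponent fixed at 1 for simplicity)."""
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    return [w / sum(weights) for w in weights]


# Two hypothetical products; the respondent said anything above $250 is unacceptable.
u_cheap = total_utility(1.0, 200, -0.01, unacceptable_price=250)
u_pricey = total_utility(2.0, 300, -0.01, unacceptable_price=250)
print(shares([u_cheap, u_pricey]))
```

Without the penalty both products would have the same total utility here; with it, the product priced above the respondent's threshold receives essentially no share.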
