How can part worth utilities be interpreted as interval data?

Hi,

Regarding the measurability of main effects/part-worth utilities, I read in the "interpreting the results" notes that they are interval data. I understand what interval data are, but my question is: how can utilities be interpreted as interval data, such that the difference between utilities has meaning? Normally, utilities are ordinal, not cardinal. A utility of 3 is bigger than a utility of 2, but the difference between a utility of 2 and a utility of 3 is meaningless.

In choice experiments, respondents have to choose their preferred alternative among several alternatives. In the data collected, one can then observe which alternative was chosen and infer that it was preferred over the other available alternatives. However, one cannot say by how much the chosen alternative was preferred over the others, and one cannot observe by how much one attribute level is preferred over another attribute level. Thus, I don't understand how part-worth utilities can be interpreted as interval data.

I would appreciate any clarification.

Thank you!
asked Dec 30, 2015 by anonymous

3 Answers

+1 vote
Thanks for this question, as it raises a critical point that is key to the foundation of why conjoint analysis works.  Even though you phrase the question with respect to discrete choice experiments (CBC), I'll give an illustration of how this occurs for purely rank-order card-sort conjoint.  The same logic (for how we obtain interval-scale utilities, not just rank-order utilities) carries over to the later CBC approach.

Conjoint analysis as published by Paul Green in the early 1970s, with its rank-ordering of conjoint cards (profiles), yields part-worths with metric, interval data quality, even though the data collection is purely rank-order.

Conjoint measurement (across multiple dimensions) is different from measurement on a single dimension.  If the measurement task involved just rank-ordering, say, 10 levels along a single dimension (factor), without any repeated measures or response error for a given respondent, then this would be ordinal data and there would be no way to know how much more preferred the best level was vs. the 2nd best level vs. the 3rd best level, etc.  But conjoint analysis involves combining more than one factor (attribute) into conjoined profiles and asking respondents to rank-order, rate, or choose among those conjoined elements.  This yields much more than just rank-order preference information within each factor, facilitated by (a) orthogonal or near-orthogonal experimental designs across the product profiles and, critically, (b) the compensatory decision-rule assumption we make: one very preferred brand with a very high true utility score can counterbalance (compensate for) multiple negative attribute levels from multiple (less important) factors, as the small illustration below shows.
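To make the compensatory idea concrete, here is a tiny Python sketch with made-up part-worths (purely my own illustration, separate from the worked example that follows):

# Hypothetical part-worths: a strongly preferred brand can outweigh
# several weaker levels on other (less important) attributes.
brand = {"A": 4.0, "B": 1.0}
speed = {"fast": 2.0, "slow": 1.0}
price = {"low": 2.0, "high": 1.0}

# Additive (compensatory) rule: total utility is the sum of part-worths.
profile_1 = brand["A"] + speed["slow"] + price["high"]  # 4 + 1 + 1 = 6
profile_2 = brand["B"] + speed["fast"] + price["low"]   # 1 + 2 + 2 = 5

print(profile_1 > profile_2)  # True: brand A compensates for two weak levels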

I just ran an empirical demonstration of this using Paul Green's card-sort conjoint method (facilitated by our CVA software).  Imagine a conjoint design with 4 attributes, each with 3 levels.  There are 3x3x3x3 = 81 possible profiles in the full factorial, but we can pick an orthogonal, level-balanced design of 27 cards (product profiles).

Further assume that the true utilities for the 4 attributes are as follows:

Attribute 1:
   Level 1: 1 utile
   Level 2: 2 utiles
   Level 3: 3 utiles

Attribute 2:
   Level 1: 1 utile
   Level 2: 2 utiles
   Level 3: 3 utiles

Attribute 3:
   Level 1: 1 utile
   Level 2: 2 utiles
   Level 3: 4 utiles

Attribute 4:
   Level 1: 1 utile
   Level 2: 2 utiles
   Level 3: 4 utiles

You can see how I've set this up: if conjoint could only recover ordinal-scale utilities, it would not be able to figure out that level 3 (at 4 utiles) for attributes 3 and 4 is twice as preferred as level 2 (at 2 utiles), and that level 2 is twice as preferred as level 1 (at 1 utile).

Now, let's assume that the respondent rank-orders the 27 cards with perfect consistency according to the true total utilities (randomly breaking ties if two cards have exactly the same total utility).  Because of the particular way I set up the "true" utilities (always exactly 1, 2, 3, or 4 within each factor), there are quite a few ties in total utility across the 27 cards in the design.  So I generated 5 respondents who answered according to the true utilities above but randomly broke any ties in their rank ordering of the profiles (a sketch of this simulation is shown below).
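Here is a minimal Python sketch of that simulation, under the assumptions just described (my own illustration; the 27-card orthogonal fraction shown is one standard construction and not necessarily the design CVA would generate):

from itertools import product

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" part-worths from the example above:
# attributes 1-2 are 1/2/3 utiles, attributes 3-4 are 1/2/4 utiles.
TRUE = [np.array([1.0, 2.0, 3.0]),
        np.array([1.0, 2.0, 3.0]),
        np.array([1.0, 2.0, 4.0]),
        np.array([1.0, 2.0, 4.0])]

# One standard orthogonal, level-balanced 27-card fraction of the 3^4
# factorial: let the 4th attribute be the mod-3 sum of the first three.
base = np.array(list(product(range(3), repeat=3)))
profiles = np.column_stack([base, base.sum(axis=1) % 3])

true_total = np.array([sum(TRUE[a][card[a]] for a in range(4))
                       for card in profiles])

def simulate_ranking(rng):
    """Rank the 27 cards by true total utility, breaking ties at random
    (rank 0 = most preferred card)."""
    order = np.lexsort((rng.random(len(profiles)), -true_total))
    ranks = np.empty(len(profiles), dtype=int)
    ranks[order] = np.arange(len(profiles))
    return ranks

rankings = [simulate_ranking(rng) for _ in range(5)]  # 5 simulated respondents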

Using a purely non-parametric regression technique (monotone regression), I estimated utilities for these 5 respondents.  By convention, we scale the utilities to be zero-centered within each attribute.  (I also rescaled the utilities with a global scaling multiplier to put them on the same magnitude scale as the true utilities.)  The estimated conjoint utilities (averaged across the 5 respondents) are given below:

Attribute 1:
   Level 1: -1.01
   Level 2: 0.00
   Level 3: 1.01

Attribute 2:
   Level 1: -1.02
   Level 2: 0.00
   Level 3: 1.02

Attribute 3:
   Level 1: -1.35
   Level 2: -0.34
   Level 3: 1.69

Attribute 4:
   Level 1: -1.34
   Level 2: -0.34
   Level 3: 1.68

You can see that for attributes 1 and 2, the utility distance between adjacent levels is almost exactly 1 utile (as expected, given the true utilities).  For attributes 3 and 4, the utility distance between levels 1 and 2 is almost exactly 1 utile and the distance between levels 2 and 3 is almost exactly 2 utiles (again, as expected).

Thus, you can see that even though respondents sorted the cards on a rank-order scale, the part-worth utilities that are recovered have interval quality within each attribute.  Interval quality utilities support the ability to add the part-worths across attributes to obtain a predicted total utility for any given product concept that may be constructed using the original factors.  This of course gives rise to the market simulations that are subsequently performed on conjoint analysis data.
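For anyone who wants to reproduce the estimation step, here is a rough MONANOVA-style sketch in the spirit of Kruskal's monotone regression, not the exact algorithm CVA uses.  It continues the simulation sketch above (reusing profiles and rankings) and alternates an OLS fit with an isotonic transform of the rank scores; the recovered utilities come out on an arbitrary scale, so only the relative spacing within each attribute matters:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def dummy_code(profiles):
    """Intercept plus drop-first dummy coding for each 3-level attribute."""
    cols = [np.ones(len(profiles))]
    for a in range(profiles.shape[1]):
        for lvl in (1, 2):
            cols.append((profiles[:, a] == lvl).astype(float))
    return np.column_stack(cols)

def estimate_partworths(ranks, n_iter=100):
    X = dummy_code(profiles)
    score = -ranks.astype(float)              # higher score = more preferred
    y = (score - score.mean()) / score.std()
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Monotone (increasing) transform of the rank scores that best
        # matches the additive predictions -- Kruskal's key step.
        y = IsotonicRegression().fit_transform(score, X @ beta)
        y = (y - y.mean()) / y.std()          # re-normalize to avoid collapse
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Convert dummy coefficients to zero-centered part-worths per attribute.
    partworths = []
    for a in range(4):
        pw = np.array([0.0, beta[1 + 2 * a], beta[2 + 2 * a]])
        partworths.append(pw - pw.mean())
    return partworths

per_resp = [estimate_partworths(r) for r in rankings]
for a in range(4):
    avg = np.mean([u[a] for u in per_resp], axis=0)
    print(f"Attribute {a + 1}:", np.round(avg, 2))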

Even though I’ve shown this example using the 1970s-era card-sort conjoint, one can show the same results for multi-attribute discrete choice experiments (CBC) with utilities estimated under MNL (logit).
answered Dec 30, 2015 by Bryan Orme Platinum Sawtooth Software, Inc. (131,990 points)
Thank you very much to all of you!


However, this measurement issue still isn't totally clear to me. In the example provided by Bryan, I don't understand where the true utilities for the 4 attributes came from. What are these true utilities? The respondent had to rank-order the 27 cards described by the different levels of the 4 attributes, so one only has this ordinal-scale data as input.

Would it be possible to provide an example for CBC?

Otherwise, could you please give me some references where I could read about this measurement issue (preferably a good textbook where these things are explained in detail)?


Thank you!
I made up these hypothetical "true" utilities for this respondent as an example to demonstrate to you that purely ordinal respondent input (ranking of cards) can still reveal metric (interval) scale utilities.

I could do the same thing for CBC, but the results would turn out exactly the same (I've done it before to prove this to myself).  To do this for CBC, however, we'd need to make the common assumption from logit theory that respondents answer CBC tasks according to true utilities perturbed by response error distributed as Gumbel.  When respondents answer CBC tasks with Gumbel-distributed response error (real respondents always answer with some degree of error), the utilities you estimate via logit from the observed choices match the "true" utilities the respondents were following as they answered the questionnaire.  A sketch of such a simulation is below.
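Here is a minimal Python sketch of that CBC thought experiment, under the stated assumptions (my own illustration: a single pooled logit, standard Gumbel error, and randomly constructed 3-alternative tasks, estimated with plain gradient ascent rather than any Sawtooth routine):

from itertools import product

import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical "true" part-worths as in the card-sort example.
TRUE = [np.array([1.0, 2.0, 3.0]),
        np.array([1.0, 2.0, 3.0]),
        np.array([1.0, 2.0, 4.0]),
        np.array([1.0, 2.0, 4.0])]
profiles = np.array(list(product(range(3), repeat=4)))   # full 3^4 factorial
true_total = np.array([sum(TRUE[a][card[a]] for a in range(4))
                       for card in profiles])

# Drop-first dummy coding; no intercept (it cancels within a choice task).
X_all = np.column_stack([(profiles[:, a] == lvl).astype(float)
                         for a in range(4) for lvl in (1, 2)])

# Simulate 2,000 tasks of 3 random alternatives; the "respondent" picks the
# alternative whose true utility plus standard Gumbel error is highest.
n_tasks, n_alts = 2000, 3
tasks = rng.integers(0, len(profiles), size=(n_tasks, n_alts))
choices = np.argmax(true_total[tasks] + rng.gumbel(size=(n_tasks, n_alts)),
                    axis=1)

# Fit conditional (multinomial) logit by gradient ascent on the log-likelihood.
beta = np.zeros(X_all.shape[1])
Xt = X_all[tasks]                              # (tasks, alts, params)
for _ in range(2000):
    v = Xt @ beta
    p = np.exp(v - v.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (Xt[np.arange(n_tasks), choices]
            - (p[..., None] * Xt).sum(axis=1)).mean(axis=0)
    beta += 0.1 * grad

# With standard Gumbel error, beta recovers the true utility differences
# from each attribute's first level: about (1, 2) for attributes 1-2 and
# about (1, 3) for attributes 3-4.
print(np.round(beta, 2))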

I am no academic scholar regarding this...I'm a practitioner.  An academic could better point you to the proper readings supporting this.  Regarding references for why ordinal inputs for traditional card-sort conjoint can still recover metric interval preference utilities, I'd start with the seminal article:

Luce, R. D.; Tukey, J. W. (January 1964). "Simultaneous conjoint measurement: a new type of fundamental measurement". Journal of Mathematical Psychology 1 (1): 1-27.

Regarding the key work behind the multinomial logit (conditional logit) model used with CBC, I'd go to the seminal article by Daniel McFadden, "Conditional Logit Analysis of Qualitative Choice Behavior", in P. Zarembka (ed.), Frontiers in Econometrics, 105-142, Academic Press: New York, 1974:

http://eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
Thank you very much Bryan!
0 votes
Because the utilities are estimated from reported choices via a statistical model called multinomial logit (MNL), an attribute level with a utility of 2.5 adds twice as much utility to a product as a level with a utility of 1.25 would.  It's the creation of the utilities via statistical prediction of reported choices that allows this and that makes the utilities an interval measurement system.

Now, there's a non-linear relationship between utility and choice probability (via the MNL choice rule), so having a utility of 2.5 does not double the effect on choice probability relative to having a utility of 1.25 (see the quick check below).
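A quick numeric check of the MNL choice rule, p_i = exp(u_i) / sum_j exp(u_j), makes the non-linearity visible (hypothetical two-alternative shares, Python):

import numpy as np

def mnl_shares(utilities):
    """MNL choice rule: p_i = exp(u_i) / sum_j exp(u_j)."""
    e = np.exp(np.array(utilities, dtype=float))
    return e / e.sum()

# Each alternative competes against a reference option with utility 0.
print(np.round(mnl_shares([1.25, 0.0]), 3))   # [0.777 0.223]
print(np.round(mnl_shares([2.50, 0.0]), 3))   # [0.924 0.076]
# Doubling the utility moved the share from ~0.78 to ~0.92;
# the share did not double.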
answered Dec 30, 2015 by Keith Chrzan Platinum Sawtooth Software, Inc. (50,675 points)
0 votes
As an addendum to Keith's comment: while you're right that a single choice does not tell you how much someone prefers an item, it would be very rare for us to build a model from a single choice task.  If your design is nicely balanced and orthogonal, then the respondent should see, for example, Brands A/B/C at all the different price points.  If someone chooses Brand A every time it comes up in a task, regardless of price, we do have information to help us estimate the strength of preference.
answered Dec 30, 2015 by Brian McEwan Gold Sawtooth Software, Inc. (37,410 points)
As a further addendum, the questioner's concern about cardinality makes me think of the axiomatic choice theorists and their concerns about the meaning of the numbers we use as utilities.  I think you could trace the intervalness of the utilities choice modelers use back to Thurstone, who published his Law of Comparative Judgment in 1927.  This, among other things, shows how to get from choice data to interval measures.