# How can part worth utilities be interpreted as interval data?

Hi,

regarding the measurability of main effects/part-worth utilities, I read in the "interpreting the results" notes that they are interval data. I understand what interval data are; my question is how utilities can be interpreted as interval data, whereby the difference between utilities has a meaning. Normally, utilities are ordinal, not cardinal: a utility of 3 is bigger than a utility of 2, but the difference between a utility of 2 and a utility of 3 is meaningless.

In choice experiments, respondents have to choose their preferred alternative among several alternatives. In the data collected, one can then observe which alternative was chosen and infer that it was preferred over the other available alternatives. However, one cannot say by how much the chosen alternative was preferred over the others, nor by how much one attribute level is preferred over another. So I don't understand how part-worth utilities can be interpreted as interval data.

I would appreciate any clarification.

Thank you!

Thanks for this question; it raises a critical point at the foundation of why conjoint analysis works.  Even though you phrase the question with respect to discrete choice experiments (CBC), I’ll give an illustration of how this occurs for purely rank-order card-sort conjoint.  The same logic (for how we obtain interval-scale utilities, not just rank-order utilities) carries over to CBC.

Conjoint analysis as published by Paul Green in the early 1970s, with its rank-ordering of conjoint cards (profiles), leads to metric, interval-quality partworths, even though the data collection is purely rank-order.

Conjoint measurement (multiple dimensions) is different from measurement on a single dimension.  If the measurement task involved just rank-ordering, say, 10 levels along a single dimension (factor), without any repeated measures or response error for a given respondent, then this would be ordinal data and there would be no way to know how much more preferred the best level was vs. the 2nd-best level vs. the 3rd-best level, etc.  But conjoint analysis involves combining more than one factor (attribute) into conjoined profiles and asking respondents to rank-order, rate, or choose among those conjoined elements.  This leads to much more than just rank-order preference information within each factor, facilitated by a) orthogonal or near-orthogonal experimental designs across the product profiles and, critically, b) the compensatory decision rule we assume: one very preferred brand with a very high true utility score can counterbalance (compensate for) multiple negative attribute levels from multiple (less important) factors.
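To make the compensatory (additive) rule concrete, here is a minimal sketch with made-up part-worths (the attribute names and utility numbers are purely hypothetical, not from any real study):

```python
# Hypothetical part-worths (in utiles) -- names and numbers made up for illustration.
partworths = {
    "brand": {"BrandA": 5.0, "BrandB": 1.0},
    "price": {"low": 2.0, "high": 0.5},
    "speed": {"fast": 2.0, "slow": 0.5},
}

def total_utility(profile):
    # The compensatory (additive) rule: total utility is the sum of part-worths.
    return sum(partworths[attr][level] for attr, level in profile.items())

# A very preferred brand paired with the worst price and speed levels...
strong_brand = {"brand": "BrandA", "price": "high", "speed": "slow"}
# ...still beats the weak brand paired with the best price and speed levels.
weak_brand = {"brand": "BrandB", "price": "low", "speed": "fast"}

print(total_utility(strong_brand))  # 5.0 + 0.5 + 0.5 = 6.0
print(total_utility(weak_brand))    # 1.0 + 2.0 + 2.0 = 5.0
```

The strong brand's 5 utiles compensate for the two unattractive levels elsewhere, which is exactly the trade-off behavior the estimation exploits.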

I just did an empirical proof of this using Paul Green’s card-sort conjoint method (facilitated by our CVA software).  Imagine a conjoint design with 4 attributes, each with 3 levels.  There are 3x3x3x3 = 81 possible profiles in the full factorial, but we can pick an orthogonal and level-balanced design of 27 cards (product profiles).
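For readers who want to see what such a design looks like, here is one standard construction (a sketch, not the CVA design algorithm): take the full factorial on three of the attributes and derive the fourth as the mod-3 sum, which yields a level-balanced, pairwise-orthogonal 27-card design:

```python
from itertools import product

# One orthogonal, level-balanced design for four 3-level attributes in 27 runs:
# full factorial on the first three attributes, with the fourth derived as
# (a + b + c) mod 3.  (A sketch -- not CVA's design algorithm.)
cards = [(a, b, c, (a + b + c) % 3) for a, b, c in product(range(3), repeat=3)]
assert len(cards) == 27

# Level balance: every level of every attribute appears 9 times.
for attr in range(4):
    for lvl in range(3):
        assert sum(1 for card in cards if card[attr] == lvl) == 9

# Pairwise orthogonality: every pair of levels of every pair of attributes
# appears together exactly 3 times.
for i in range(4):
    for j in range(i + 1, 4):
        for li in range(3):
            for lj in range(3):
                n = sum(1 for card in cards if card[i] == li and card[j] == lj)
                assert n == 3
```

The assertions verify the balance and orthogonality properties that make the partworth estimates statistically independent across attributes.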

Further assume that the true utilities for the 4 attributes are as follows:

Attribute 1:
Level 1: 1 utile
Level 2: 2 utiles
Level 3: 3 utiles

Attribute 2:
Level 1: 1 utile
Level 2: 2 utiles
Level 3: 3 utiles

Attribute 3:
Level 1: 1 utile
Level 2: 2 utiles
Level 3: 4 utiles

Attribute 4:
Level 1: 1 utile
Level 2: 2 utiles
Level 3: 4 utiles

You can see from how I’ve set this up that if conjoint could only recover ordinal-scale utilities, it would not be able to figure out that level 3 (at 4 utiles) for attributes 3 and 4 is twice as preferred as level 2 (at 2 utiles), and level 2 is twice as preferred as level 1 (at 1 utile).

Now, let’s assume that the respondent rank-orders the 27 cards with perfect consistency according to the true total utilities (randomly breaking ties if two cards have exactly the same utility).  Because of the particular way I set up the “true” utilities to always be exactly 1, 2, 3, or 4 within each factor, there are quite a few ties in total utility across the 27 cards in the design.  So, I generated 5 respondents who answered according to the true utilities above but randomly broke any ties in their rank ordering of the profiles.

Using a purely non-parametric regression technique (monotone regression), I estimated utilities for these 5 respondents.  By convention, we scale the utilities to be zero-centered within each attribute.  (I also rescaled the utilities with a global scaling multiplier to be on the same magnitude scale as the true utilities.)  The estimated conjoint utilities (averaged across the 5 respondents) are given below:

Attribute 1:
-1.01
0.00
1.01

Attribute 2:
-1.02
0.00
1.02

Attribute 3:
-1.35
-0.34
1.69

Attribute 4:
-1.34
-0.34
1.68

You can see that for attributes 1 and 2, the metric distance in utility between adjacent levels is almost exactly 1 utile (as expected, given the true utilities).  For attributes 3 and 4, the utility distance between levels 1 and 2 is almost exactly 1 utile and the distance between levels 2 and 3 is almost exactly 2 utiles (again, as expected).
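For the curious, the whole exercise can be reproduced in miniature. Below is a rough, self-contained sketch (not the CVA algorithm): it simulates the 5 tie-breaking respondents and fits part-worths with a simple Kruskal-style alternation between OLS and isotonic (monotone) regression. The design construction, iteration count, and random seed are my own assumptions:

```python
import random
from itertools import product

# The hypothetical "true" part-worths from the example (levels indexed 0..2).
TRUE = [[1, 2, 3], [1, 2, 3], [1, 2, 4], [1, 2, 4]]

# An orthogonal 27-card design: full factorial on attributes 1-3,
# with attribute 4 derived as (a + b + c) mod 3.
CARDS = [(a, b, c, (a + b + c) % 3) for a, b, c in product(range(3), repeat=3)]

def true_total(card):
    return sum(TRUE[attr][lvl] for attr, lvl in enumerate(card))

def simulate_ranking(rng):
    """Card indices from least to most preferred, ties broken at random."""
    return sorted(range(len(CARDS)),
                  key=lambda i: (true_total(CARDS[i]), rng.random()))

def pava(y):
    """Pool adjacent violators: least-squares non-decreasing fit to y."""
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out += [s / c] * c
    return out

def fit_partworths(disp):
    """OLS on an orthogonal, balanced design = level means minus the grand mean."""
    grand = sum(disp) / len(disp)
    pw = [[sum(disp[i] for i, card in enumerate(CARDS) if card[a] == l) / 9 - grand
           for l in range(3)] for a in range(4)]
    return pw, grand

def monotone_fit(order, iters=100):
    """Kruskal-style alternating fit: OLS part-worths <-> isotonic disparities."""
    disp = [0.0] * len(CARDS)
    for rank, i in enumerate(order):
        disp[i] = float(rank)                   # start from the ranks themselves
    for _ in range(iters):
        pw, grand = fit_partworths(disp)
        fitted = [grand + sum(pw[a][CARDS[i][a]] for a in range(4))
                  for i in range(len(CARDS))]
        iso = pava([fitted[i] for i in order])  # force consistency with the ranking
        for rank, i in enumerate(order):
            disp[i] = iso[rank]
    return pw

rng = random.Random(7)
est = [[0.0] * 3 for _ in range(4)]
for _ in range(5):                              # 5 simulated respondents
    pw = monotone_fit(simulate_ranking(rng))
    for a in range(4):
        for l in range(3):
            est[a][l] += pw[a][l] / 5
```

If the logic above holds, the averaged estimates should show a wider gap between levels 2 and 3 of attributes 3 and 4 than between levels 1 and 2, mirroring the result above; exact values depend on the tie-breaking draws.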

Thus, you can see that even though respondents sorted the cards on a rank-order scale, the part-worth utilities that are recovered have interval quality within each attribute.  Interval-quality utilities support adding the part-worths across attributes to obtain a predicted total utility for any product concept that may be constructed from the original factors.  This, of course, gives rise to the market simulations that are subsequently performed on conjoint analysis data.
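As a sketch of that last step, here is a toy first-choice market simulation built on hypothetical part-worths (attribute names, levels, and numbers are all made up):

```python
# Hypothetical zero-centered part-worths -- attributes, levels, numbers made up.
partworths = {
    "screen": {"5in": -1.0, "6in": 0.2, "7in": 0.8},
    "price": {"$199": 1.2, "$299": 0.1, "$399": -1.3},
}

def total_utility(concept):
    # Interval-quality part-worths can be summed across attributes.
    return sum(partworths[attr][lvl] for attr, lvl in concept.items())

concepts = {
    "A": {"screen": "7in", "price": "$399"},  # 0.8 - 1.3 = -0.5
    "B": {"screen": "6in", "price": "$199"},  # 0.2 + 1.2 =  1.4
    "C": {"screen": "5in", "price": "$299"},  # -1.0 + 0.1 = -0.9
}

# First-choice rule: the respondent "buys" the concept with the highest total utility.
first_choice = max(concepts, key=lambda name: total_utility(concepts[name]))
print(first_choice)  # B
```

Repeating this across a sample of respondents, each with their own part-worths, and tallying the first choices is the basic form of a conjoint market simulation.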

Even though I’ve shown this example using the 1970s-era card-sort conjoint, one can show the same results for multi-attribute discrete choice experiments (CBC) with utilities estimated under MNL (logit).
answered Dec 30, 2015 by Platinum (152,955 points)
Thank you very much to all of you!

However, I still haven't fully understood this measurement issue. In the example provided by Bryan, I don't understand where the true utilities for the 4 attributes came from. What are these true utilities? The respondent had to order the 27 cards described by the different levels of the 4 attributes, so one only has this ordinal-scale data as input.

Would it be possible to provide an example for CBC?

Otherwise, could you please give me some references where I could read about this measurement issue (preferably a good textbook where these things are explained in detail)?

Thank you!
I made up these hypothetical "true" utilities for this respondent as an example to demonstrate to you that purely ordinal respondent input (ranking of cards) can still reveal metric (interval) scale utilities.

I could do the same thing for CBC, but the results would turn out exactly the same (I've done it before to prove this to myself).  To do this for CBC, however, we'd need to make the common assumption behind the logit rule: respondents answer CBC tasks according to true utilities perturbed by response error distributed as Gumbel.  When respondents answer CBC tasks with Gumbel-distributed response error (real respondents always answer with some degree of error), the utilities you estimate via logit from the observed choices recover the "true" utilities the respondents were following as they answered the questionnaire (up to sampling error).
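A minimal sketch of that claim for the simplest possible CBC task (two alternatives, one utility difference): simulate choices with Gumbel errors and note that the log-odds of the observed choice share recovers the true utility gap. This is the binary special case of logit estimation; the 1.0-utile gap and the number of tasks are assumptions for illustration:

```python
import math
import random

rng = random.Random(42)

def gumbel(rng):
    # Standard Gumbel draw via the inverse CDF.
    return -math.log(-math.log(rng.random()))

# True utilities for two alternatives; the 1.0-utile gap is an assumption.
u_a, u_b = 1.0, 0.0

# Each simulated task: choose the alternative with the higher utility-plus-error.
n_tasks = 100_000
chose_a = sum(1 for _ in range(n_tasks)
              if u_a + gumbel(rng) > u_b + gumbel(rng))
share_a = chose_a / n_tasks

# Binary logit: the log-odds of the observed share estimates the utility gap.
est_gap = math.log(share_a / (1.0 - share_a))
print(round(share_a, 3), round(est_gap, 2))  # share near 0.731, gap near 1.0
```

With more alternatives and attributes, the same recovery happens through maximum-likelihood MNL estimation rather than this closed-form shortcut.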

I am no academic scholar regarding this...I'm a practitioner.  An academic could better point you to the proper readings supporting this.  Regarding references for why ordinal inputs for traditional card-sort conjoint can still recover metric interval preference utilities, I'd start with the seminal article:

Luce, R.D.; Tukey, J.W. (January 1964). "Simultaneous conjoint measurement: a new scale type of fundamental measurement". Journal of Mathematical Psychology 1 (1): 1–27.

Regarding the key work behind multinomial logit (conditional logit) that is used with CBC, I'd go to the seminal article: McFadden, D. (1974). "Conditional Logit Analysis of Qualitative Choice Behavior". In Frontiers in Econometrics, 105–142. New York: Academic Press.

Thank you very much Bryan!
Because the utilities are estimated from reported choices via a statistical model called multinomial logit (MNL), an attribute level with a utility of 2.5 adds twice as much utility to a product as would a level with a utility of 1.25.  It's the creation of the utilities via statistical prediction of reported choices that allows this and that makes the utilities an interval measurement system.

Now, there's a non-linear relationship between utility and choice probability (via the MNL choice rule), so having a utility of 2.5 does not double the effect on choice probability relative to having a utility of 1.25.
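A quick sketch of that non-linearity under the MNL choice rule, using the 2.5 vs. 1.25 example in a hypothetical two-alternative choice:

```python
import math

def mnl_shares(utilities):
    # MNL choice rule: share_i = exp(u_i) / sum_j exp(u_j)
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Two alternatives whose total utilities are 2.5 and 1.25.
p_high, p_low = mnl_shares([2.5, 1.25])
print(round(p_high, 3))          # 0.777 -- far from double p_low's 0.223
print(round(p_high / p_low, 2))  # 3.49, not 2.0
```

Doubling the utility roughly multiplies the odds (here by exp(1.25)), but the probability itself moves non-linearly because it is bounded by 1.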
answered Dec 30, 2015 by Platinum (65,925 points)
Hello Keith,

If you say "an attribute level with a utility of 2.5 adds twice as much utility to a product as would a level with a utility of 1.25", doesn't that already mean interpreting part-worths as ratio data? I'm having difficulty understanding that and interpreting part-worth utilities.

Could you please tell me, whether the following interpretation would then be correct?

Example: A product is described by 3 attributes, the respondent could choose between 2 products, and the estimated part-worth utilities for attribute 1 are as follows:

Attribute 1       beta (part-worth utilities)
Level 1:             10
Level 2:             20
Level 3:             40

Can I now say that a product described with Level 2 (of attribute 1) adds twice as much utility for the average respondent as would a product described with Level 1 (of attribute 1), all else being equal?

Do I understand correctly: A consumer's utility from a product is the sum of part-worth utilities (utilities the consumer derives from the characteristics of the product)?

I suppose it would be more accurate to say that with this particular scaling, level 2's utility (20) is twice level 1's (10).  But as interval measures, utilities are subject to (and valid under) any positive linear transformation, so they could be represented as 5, 15 and 35, respectively (subtracting 5 from each).  Now the utility for level 2 isn't twice that for level 1 but 3 times.  As you can see, you have to be very careful how you talk about utilities.
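A small sketch of why only differences survive the rescaling (the 10/20/40 numbers are from the example above; the shift by 5 is arbitrary). It also shows that MNL choice shares depend only on utility differences, so such shifts are harmless in simulations:

```python
import math

# Part-worths for one attribute under two equally valid interval scalings.
original = [10, 20, 40]
shifted = [u - 5 for u in original]   # [5, 15, 35]

# Differences (the interval information) survive the shift...
diffs_original = [original[i + 1] - original[i] for i in range(2)]
diffs_shifted = [shifted[i + 1] - shifted[i] for i in range(2)]
print(diffs_original, diffs_shifted)  # [10, 20] [10, 20]

# ...but ratios do not, which is why "twice as much" statements are unsafe.
print(original[1] / original[0])  # 2.0
print(shifted[1] / shifted[0])    # 3.0

def mnl_shares(utilities):
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# MNL choice shares depend only on utility differences, so adding the same
# constant to every alternative's total utility leaves the shares unchanged.
a = mnl_shares([1.0, 2.0])
b = mnl_shares([6.0, 7.0])  # same utilities shifted by +5
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```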
Thanks a lot, Keith. I know that I have to be very careful with interval data, and that's why I was confused by your comment above: "an attribute level with a utility of 2.5 adds twice as much utility to a product as would a level with a utility of 1.25".

In conclusion, I can't really interpret the results much except to say that one level is more preferred than another. Or what am I missing? How would you interpret the three part-worth utilities in the previous example? Can I say: Level 1 is 10 points less preferred than Level 2? Or: Level 3 adds 20 points more utility than Level 2 and 30 points more utility than Level 1?

Thank you!
That is correct: the only place you can really start making stronger interpretations is when you're running simulations.
So I cannot interpret part-worth utilities as mentioned in the last post? Because the difference between them always stays the same whatever number I add or subtract...
thanks
You can say less and more, but the "how much" part really needs to wait until you run simulations.