Thanks for this question; it raises a critical point at the foundation of why conjoint analysis works. Even though you phrased the question in terms of discrete choice experiments (CBC), I'll illustrate how this occurs for purely rank-order, card-sort conjoint. The same logic (for how we obtain interval-scaled utilities, not just rank-order utilities) carries over to the later CBC.

Conjoint analysis as published by Paul Green in the early 1970s, with its rank-ordering of conjoint cards (profiles), yields partworths of metric, interval quality, even though the data collection is purely rank-order.

Conjoint measurement (multiple dimensions) is different from measurement on a single dimension. If the measurement task involved just rank-ordering, say, 10 levels along a single dimension (factor), without any repeated measures or response error for a given respondent, then the data would be ordinal and there would be no way to know how much more preferred the best level was vs. the 2nd-best level vs. the 3rd-best level, etc. But conjoint analysis combines more than one factor (attribute) into conjoined profiles and asks respondents to rank-order, rate, or choose among those profiles. This yields much more than rank-order preference information within each factor, facilitated by a) orthogonal or near-orthogonal experimental designs across the product profiles, and critically b) the compensatory decision-rule assumption we make: one very preferred brand with a very high true utility score can counterbalance (compensate for) multiple negative attribute levels from multiple (less important) factors.
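
As a toy numeric illustration of that compensatory (additive) rule, with invented attribute names and utility numbers: a single strongly preferred level can outweigh several weaker levels elsewhere.

```python
# Toy illustration of the compensatory (additive) rule.
# The attribute names and utility numbers are invented for illustration.
strong_brand_profile = {"brand": 5.0, "price": 0.5, "warranty": 0.5}
weak_brand_profile = {"brand": 1.0, "price": 2.0, "warranty": 2.0}

# Under an additive model, total utility is just the sum of part-worths,
# so the strong brand compensates for the weak price and warranty levels.
print(sum(strong_brand_profile.values()))  # 6.0
print(sum(weak_brand_profile.values()))    # 5.0
```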

I just did an empirical demonstration of this using Paul Green's card-sort conjoint method (facilitated by our CVA software). Imagine a conjoint design with 4 attributes, each with 3 levels. There are 3x3x3x3 = 81 possible profiles in the full factorial, but we can pick an orthogonal, level-balanced design of 27 cards (product profiles).
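
For readers who want to replicate this, here is one way to construct such a 27-card fraction in Python. This uses a standard orthogonal-array construction (the 4th attribute's level is the mod-3 sum of the first three); CVA's actual design algorithm may pick a different fraction, but any orthogonal, level-balanced one behaves the same for this illustration.

```python
# Build a 27-card orthogonal, level-balanced fraction of the 3^4 full
# factorial (levels coded 0, 1, 2). With this mod-3 construction, every
# pair of attributes shows all 9 level combinations exactly 3 times.
from itertools import product

full_factorial = list(product(range(3), repeat=4))   # all 81 profiles
design = [p for p in full_factorial
          if (p[0] + p[1] + p[2]) % 3 == p[3]]
print(len(full_factorial), len(design))              # 81 27
```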

Further assume that the true utilities for the 4 attributes are as follows:

Attribute 1:

Level 1: 1 utile

Level 2: 2 utiles

Level 3: 3 utiles

Attribute 2:

Level 1: 1 utile

Level 2: 2 utiles

Level 3: 3 utiles

Attribute 3:

Level 1: 1 utile

Level 2: 2 utiles

Level 3: 4 utiles

Attribute 4:

Level 1: 1 utile

Level 2: 2 utiles

Level 3: 4 utiles

You can see how I've set this up: if conjoint could only recover ordinal-scale utilities, it would not be able to detect that level 3 (at 4 utiles) of attributes 3 and 4 is twice as preferred as level 2 (at 2 utiles), and that level 2 is twice as preferred as level 1 (at 1 utile).

Now, let's assume that the respondent rank-orders the 27 cards with perfect consistency according to the true total utilities (randomly breaking ties if two cards have exactly the same total utility). Because I set up the "true" utilities to always be exactly 1, 2, 3, or 4 within each factor, there are quite a few ties in total utility across the 27 cards in the design. So, I generated 5 respondents who answered according to the true utilities above but randomly broke any ties in their rank ordering of the profiles.
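
That simulation step can be sketched as follows (helper names are my own; the 27-card fraction is one standard orthogonal construction, not necessarily CVA's):

```python
# Simulate 5 respondents who rank all 27 cards by true total utility,
# breaking exact ties at random.
import random
from itertools import product

TRUE_UTILS = [
    [1, 2, 3],  # attribute 1
    [1, 2, 3],  # attribute 2
    [1, 2, 4],  # attribute 3
    [1, 2, 4],  # attribute 4
]

# Orthogonal, level-balanced 27-card fraction of the 3^4 full factorial
# (levels coded 0, 1, 2; 4th level is the mod-3 sum of the first three).
design = [p for p in product(range(3), repeat=4)
          if (p[0] + p[1] + p[2]) % 3 == p[3]]

def total_utility(profile):
    return sum(TRUE_UTILS[a][lvl] for a, lvl in enumerate(profile))

def simulate_ranking(rng):
    # Sort best-to-worst on true utility; the random key breaks exact ties.
    return sorted(design, key=lambda p: (-total_utility(p), rng.random()))

rng = random.Random(0)
respondents = [simulate_ranking(rng) for _ in range(5)]
```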

Using a purely nonparametric regression technique (monotone regression), I estimated utilities for these 5 respondents. By convention, we scale the utilities to be zero-centered within each attribute. (I also rescaled the utilities with a global scaling multiplier to put them on the same magnitude scale as the true utilities.) The estimated conjoint utilities (averaged across the 5 respondents) are given below:

Attribute 1:

Level 1: -1.01

Level 2: 0.00

Level 3: 1.01

Attribute 2:

Level 1: -1.02

Level 2: 0.00

Level 3: 1.02

Attribute 3:

Level 1: -1.35

Level 2: -0.34

Level 3: 1.69

Attribute 4:

Level 1: -1.34

Level 2: -0.34

Level 3: 1.68

You can see that for attributes 1 and 2, the metric distance in utility between adjacent levels is nearly exactly 1 utile (as expected, given the true utilities). For attributes 3 and 4, the utility distance between levels 1 and 2 is almost exactly 1 utile and the distance between levels 2 and 3 is almost exactly 2 utiles (again, as expected).
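
For the curious, here is a rough, self-contained sketch of the whole pipeline in Python: simulate the 5 tie-breaking respondents, then estimate part-worths with a MONANOVA-style alternating least squares that uses monotone (isotonic) regression. This is emphatically not CVA's implementation; the design construction, iteration count, and scaling choices are my own, but the recovered gap structure should echo the results above.

```python
# MONANOVA-style sketch: alternate a least-squares additive fit with
# monotone (isotonic) regression until the rank data are reproduced.
import random
from itertools import product

import numpy as np

TRUE_UTILS = [[1, 2, 3], [1, 2, 3], [1, 2, 4], [1, 2, 4]]

# 27-card orthogonal fraction of the 3^4 full factorial (levels 0, 1, 2).
design = [p for p in product(range(3), repeat=4)
          if (p[0] + p[1] + p[2]) % 3 == p[3]]

def total_utility(p):
    return sum(TRUE_UTILS[a][l] for a, l in enumerate(p))

def pava(y):
    # Pool-adjacent-violators: closest non-decreasing sequence to y.
    vals, counts = [], []
    for v in y:
        vals.append(float(v)); counts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, c2 = vals.pop(), counts.pop()
            v1, c1 = vals.pop(), counts.pop()
            vals.append((v1 * c1 + v2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.array([v for v, c in zip(vals, counts) for _ in range(c)])

# Dummy-code the design: intercept plus one column per (attribute, level).
X = np.zeros((27, 13)); X[:, 0] = 1.0
for i, p in enumerate(design):
    for a, l in enumerate(p):
        X[i, 1 + 3 * a + l] = 1.0

def estimate(ranking):
    # Alternating least squares on one respondent's ranking.
    order = ranking[::-1]                       # worst-to-best
    Xo = X[[design.index(p) for p in order]]
    y = np.arange(27, dtype=float)              # start from the raw ranks
    for _ in range(200):
        y = (y - y.mean()) / y.std()            # fix scale (avoid collapse)
        beta = np.linalg.lstsq(Xo, y, rcond=None)[0]
        y = pava(Xo @ beta)                     # monotone transform of fit
    parts = beta[1:].reshape(4, 3)
    return parts - parts.mean(axis=1, keepdims=True)   # zero-center

rng = random.Random(0)
estimates = []
for _ in range(5):                              # 5 tie-breaking respondents
    ranking = sorted(design, key=lambda p: (-total_utility(p), rng.random()))
    estimates.append(estimate(ranking))
avg = np.mean(estimates, axis=0)
print(np.round(avg, 2))
```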

Thus, you can see that even though respondents sorted the cards on a rank-order scale, the part-worth utilities that are recovered have interval quality within each attribute. Interval quality utilities support the ability to add the part-worths across attributes to obtain a predicted total utility for any given product concept that may be constructed using the original factors. This of course gives rise to the market simulations that are subsequently performed on conjoint analysis data.
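
As a small illustration of that additivity, using the averaged estimates reported above (the two product concepts are hypothetical):

```python
# Predicting total utility for any concept is just addition of part-worths.
# The part-worth numbers are the averaged estimates reported above.
partworths = [
    [-1.01, 0.00, 1.01],   # attribute 1
    [-1.02, 0.00, 1.02],   # attribute 2
    [-1.35, -0.34, 1.69],  # attribute 3
    [-1.34, -0.34, 1.68],  # attribute 4
]

def concept_utility(levels):
    # levels: one 0-based level index per attribute
    return sum(partworths[a][l] for a, l in enumerate(levels))

concept_a = (2, 0, 1, 2)   # hypothetical concept: level 3, 1, 2, 3
concept_b = (0, 2, 2, 0)   # hypothetical concept: level 1, 3, 3, 1
print(round(concept_utility(concept_a), 2))   # 1.33
print(round(concept_utility(concept_b), 2))   # 0.36
```

A market simulator does exactly this for each concept in a competitive scenario and then applies a choice rule (e.g., first choice) respondent by respondent.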

Even though I’ve shown this example using the 1970s-era card-sort conjoint, one can show the same results for multi-attribute discrete choice experiments (CBC) with utilities estimated under MNL (logit).
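
A rough sketch of that CBC/MNL version, under assumed settings of my own (task count, random triples of profiles, and plain gradient ascent rather than any commercial estimator): choices are simulated with the logit rule from the same true utilities, and aggregate MNL recovers the 1:2 gap structure for attributes 3 and 4.

```python
# Simulate CBC choice tasks from the true utilities and recover them with
# aggregate MNL (logit) via gradient ascent on the log-likelihood.
import numpy as np

rng = np.random.default_rng(0)
TRUE = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 4], [1, 2, 4]], dtype=float)

def encode(profile):
    # Dummy-code one profile; level 0 of each attribute is the reference.
    x = np.zeros(8)
    for a, l in enumerate(profile):
        if l > 0:
            x[2 * a + l - 1] = 1.0
    return x

# Simulate 3000 choice tasks of 3 randomly drawn profiles each; choices
# follow the logit rule applied to the true total utilities.
X_tasks, choices = [], []
for _ in range(3000):
    profiles = rng.integers(0, 3, size=(3, 4))
    u = TRUE[np.arange(4), profiles].sum(axis=1)
    p = np.exp(u - u.max())
    p /= p.sum()
    choices.append(rng.choice(3, p=p))
    X_tasks.append([encode(pr) for pr in profiles])
X_all = np.array(X_tasks)                    # (tasks, 3, 8)
choices = np.array(choices)
chosen = X_all[np.arange(len(choices)), choices]

# Gradient ascent on the (concave) MNL log-likelihood.
beta = np.zeros(8)
for _ in range(2000):
    v = X_all @ beta
    v -= v.max(axis=1, keepdims=True)
    P = np.exp(v)
    P /= P.sum(axis=1, keepdims=True)
    expected = (P[:, :, None] * X_all).sum(axis=1)
    beta += 0.2 * (chosen - expected).mean(axis=0)
print(np.round(beta, 2))
```

Relative to level 1, the recovered betas should come out near (1, 2) for attributes 1 and 2 and near (1, 3) for attributes 3 and 4, i.e., the same interval spacing as the true utilities.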

However, this measurement issue is still not totally clear to me. In the example provided by Bryan, I don't understand where the true utilities for the 4 attributes came from. What are these true utilities? The respondent had to rank the 27 cards described by the different levels of the 4 attributes, so one only has this ordinal-scale data as input.

Would it be possible to provide an example for CBC?

Otherwise, could you please give me some references where I could read about this measurement issue (preferably a good textbook where these things are explained in detail)?

Thank you!