
Adaptive CBC: Summary of Three Experiments

This article is excerpted from a paper by Rich Johnson, Bryan Orme, Joel Huber, and Jon Pinnell entitled “Testing Adaptive Choice-Based Conjoint Designs,” presented at the 2005 Design and Innovations Conference held in Berlin. The full text may be downloaded from our Technical Papers library at www.sawtoothsoftware.com.

Choice-Based Conjoint analysis (CBC) has achieved a dramatic increase in use during the past decade. One important reason for the growth of CBC is that choices are more like actual marketplace behavior than are the rankings or ratings used by other conjoint methods. Another reason is that hierarchical Bayes (HB) methods now permit estimation of partworths for individuals, where previously it had seldom been possible to collect enough choices from each individual to support individual-level analysis. However, despite the important contribution of HB, there remains considerable incentive to make choice designs more efficient.

Huber & Zwerina (1996) showed that efficient choice designs have four characteristics: orthogonality, level balance, minimal overlap, and utility balance. The first three of these are present in designs provided by Sawtooth Software’s CBC System, but utility balance is not.

Various combinations of the authors have been involved in three recent Sawtooth Software experiments (2003, 2004, and 2005) to test an algorithm for adaptive CBC (ACBC) that accounts for individual-level utilities when designing choice tasks.

An Algorithm for Adaptive CBC:

In a good design the estimation error for the parameters is as small as possible. The adaptive algorithm creates a unique design for each respondent, where the questions are chosen to maximize D-efficiency. Because partworths affect D-efficiency (utility balance yields greater statistical efficiency), preliminary estimates of partworths are needed for each respondent.

ACBC begins with an estimate of the respondent’s partworths, obtained differently in each of the three experiments. The first choice task is always random, subject only to a requirement of minimal overlap among the attribute levels represented. After the first choice task has been completed, the information matrix for the completed choice tasks is calculated. Alternatives for the next choice set are then constructed to maximize D-efficiency, based on the preliminary partworth estimates. The three experiments differed in how the initial estimate of partworths was obtained, in whether estimated partworths were updated during the questionnaire, and in how the next task was chosen to improve design efficiency. (Those details are contained in the full written paper, but not in this summary.)
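To make the mechanics concrete, the sketch below shows one simple way such a greedy, utility-aware design loop can be implemented for a single respondent under a multinomial logit model. It is only an illustration: the ±1 coding, the number of concepts per task, and the random candidate-generation step are assumptions made for the example, not the actual procedure used in the experiments (those details are in the full paper).

# Illustrative sketch (not Sawtooth's production code): greedy selection of the
# next choice task to maximize D-efficiency under a multinomial logit model,
# given preliminary partworth estimates for one respondent.
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 6        # length of the coded partworth vector (assumed for the example)
N_CONCEPTS = 3      # alternatives per choice task (assumed)
N_CANDIDATES = 500  # random candidate tasks evaluated per question (assumed)

def mnl_information(tasks, beta):
    """Fisher information matrix of the logit model for a list of tasks.
    Each task is an (N_CONCEPTS x N_PARAMS) design matrix of coded levels."""
    info = np.zeros((len(beta), len(beta)))
    for X in tasks:
        p = np.exp(X @ beta)
        p /= p.sum()
        # Logit information contribution: X' (diag(p) - p p') X
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return info

def d_efficiency(tasks, beta):
    """D-efficiency: determinant of the information matrix, scaled per parameter."""
    sign, logdet = np.linalg.slogdet(mnl_information(tasks, beta)
                                     + 1e-9 * np.eye(len(beta)))
    return np.exp(logdet / len(beta)) if sign > 0 else 0.0

def next_task(completed_tasks, beta):
    """Pick, from random candidates, the task that most improves D-efficiency."""
    best_task, best_eff = None, -np.inf
    for _ in range(N_CANDIDATES):
        candidate = rng.choice([-1.0, 1.0], size=(N_CONCEPTS, N_PARAMS))
        eff = d_efficiency(completed_tasks + [candidate], beta)
        if eff > best_eff:
            best_task, best_eff = candidate, eff
    return best_task

# Example: preliminary partworths (e.g., from self-explicated questions), a
# random first task, then three adaptively chosen follow-up tasks.
beta_prelim = rng.normal(size=N_PARAMS)
tasks = [rng.choice([-1.0, 1.0], size=(N_CONCEPTS, N_PARAMS))]
for _ in range(3):
    tasks.append(next_task(tasks, beta_prelim))
print("D-efficiency after 4 tasks:", round(d_efficiency(tasks, beta_prelim), 3))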

The First Experiment:

This study was done with approximately 1000 allergy sufferers who were members of Knowledge Networks’ web-based panel. The product category was antihistamines, described by 9 attributes having two or three levels, for a total of 23 levels.

The respondents were randomly allocated to receive either a standard CBC or an Adaptive CBC questionnaire. Self-explicated questions similar to ACA’s “prior” section were used to estimate each respondent’s partworth utilities for use in the adaptive CBC design algorithm.

Success was measured by hit rates (the accuracy of predicting individual choices in holdout choice tasks) and by MAEs (mean absolute errors) of share predictions for the holdout tasks. Hit rates for ACBC and CBC were nearly identical, but share predictions showed a directional improvement for ACBC. However, the experiment did not permit a test of whether the difference in share prediction accuracy was statistically significant.
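For readers unfamiliar with the two measures, the small sketch below illustrates how they are typically computed; the data shown are hypothetical.

# Illustrative computation of the two accuracy measures (hypothetical data):
# hit rate = share of holdout tasks in which the predicted choice matches the
# observed choice; MAE = mean absolute error between predicted and observed shares.
import numpy as np

def hit_rate(predicted_choices, observed_choices):
    """Fraction of holdout tasks in which the predicted alternative was chosen."""
    predicted = np.asarray(predicted_choices)
    observed = np.asarray(observed_choices)
    return np.mean(predicted == observed)

def share_mae(predicted_shares, observed_shares):
    """Mean absolute error between predicted and observed choice shares."""
    return np.mean(np.abs(np.asarray(predicted_shares) - np.asarray(observed_shares)))

# Hypothetical example: 5 holdout tasks, 3 alternatives each.
print(hit_rate([0, 2, 1, 1, 0], [0, 2, 2, 1, 0]))          # 0.8
print(share_mae([0.42, 0.33, 0.25], [0.40, 0.35, 0.25]))   # ~0.013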

The Second Experiment:

The second study dealt with laptop computers, and was conducted with approximately 1000 members of AOL’s Opinion Place panel. The data were contributed by SPSS. Respondents were selected using a “river” methodology, recruited from a variety of popular Web portals.

There were two differences from the first experiment that may have affected the results.

  • In this study the self-explicated questions included within-attribute desirability ratings but not attribute importance ratings. This decision was made to save interview time since the between-attribute constraints had not been useful in the first study.
  • We made a small adjustment to the design algorithm for ACBC.

As with the first experiment, hit rates showed a virtual tie between ACBC and CBC. But in contrast to the first experiment, share prediction accuracy was directionally worse for ACBC, relative to CBC.

This inconsistency led the authors to search for possible causes. They calculated D-Efficiencies based on preliminary partworths, and found the ACBC designs to have average efficiencies nearly twice those of the CBC group. Thus it appeared that the adaptive algorithm behaved as expected. However, when computed using final partworth estimates, the efficiencies for the ACBC group were much smaller. For the Full Profile treatment the ACBC designs were only 9% better than the CBC designs.

For Full Profile respondents, the average correlation between self-explicated estimates and final partworths was only 0.445. The authors concluded that the adaptive algorithm itself appeared to have worked as expected, but that the preliminary partworths available to it had not been accurate enough to guide it effectively. They suggested further research in which other means might be used to estimate preliminary partworths.

The Third Experiment:

The third experiment was done with approximately 450 respondents who were members of MarketVision Research’s Web panel. The product category was hotels, described by 9 attributes.

An important difference between this experiment and the previous ones was the way preliminary partworths were estimated. Self-explicated questions about the desirability of levels within attributes were still asked, so that we could investigate the effect of constraining partworths to follow the stated rank orders, but those answers were not used to construct preliminary partworths. Instead, a hierarchical Bayes procedure was used to estimate partworths and to update them “on-the-fly” after each question.
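The sketch below is a simplified stand-in for that idea, not the actual hierarchical Bayes procedure used in the study: after each completed task it re-estimates the respondent’s partworths with a penalized logit update that shrinks toward an assumed population mean, which captures the basic flavor of updating estimates on the fly.

# Simplified stand-in for "on-the-fly" updating (not the actual HB procedure):
# after each choice, take a few penalized-likelihood gradient steps that pull the
# respondent's partworths toward their own choices while shrinking toward a
# population mean.
import numpy as np

def update_partworths(beta, tasks, choices, pop_mean, shrinkage=1.0,
                      lr=0.1, steps=25):
    """One 'on-the-fly' update from all choices completed so far.
    tasks[i] is a (concepts x params) design matrix; choices[i] is the chosen row."""
    beta = beta.copy()
    for _ in range(steps):
        grad = shrinkage * (pop_mean - beta)          # shrink toward population mean
        for X, choice in zip(tasks, choices):
            p = np.exp(X @ beta)
            p /= p.sum()
            grad += X[choice] - p @ X                 # logit log-likelihood gradient
        beta += lr * grad
    return beta

# Hypothetical usage: re-estimate after each of four completed tasks (dummy data).
rng = np.random.default_rng(1)
pop_mean = np.zeros(6)
beta = pop_mean.copy()
tasks, choices = [], []
for _ in range(4):
    tasks.append(rng.choice([-1.0, 1.0], size=(3, 6)))
    choices.append(int(rng.integers(3)))
    beta = update_partworths(beta, tasks, choices, pop_mean)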

Although in this third study it appeared that hit rates for ACBC had a slight edge over CBC, that improvement turned out to be illusory. A covariance analysis was done to remove any spurious difference in hit rates due to the difference in test-retest reliabilities, and when the groups were equated on reliability the difference in hit rates disappeared. We are left with results for hit rates very similar to those for the first experiment: no significant difference between treatments.
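The following sketch shows the kind of covariance analysis described, assuming a respondent-level data set with hypothetical column names (group, hit_rate, reliability); it illustrates the approach, not the authors’ actual analysis code.

# Sketch of the covariance analysis described above: compare hit rates across
# treatment groups while adjusting for each respondent's test-retest reliability.
# Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def adjusted_group_effect(df: pd.DataFrame):
    """ANCOVA: hit rate regressed on treatment group with reliability as a covariate."""
    return smf.ols("hit_rate ~ C(group) + reliability", data=df).fit()

# Hypothetical usage with a respondent-level data frame:
# df = pd.read_csv("holdout_results.csv")    # columns: group, hit_rate, reliability
# print(adjusted_group_effect(df).summary())  # per the study, the group effect
#                                             # vanished once reliability was controlled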

Results for share predictions showed slightly lower predictive accuracy for ACBC relative to CBC. ACBC again failed to demonstrate superiority over regular CBC. We believe that the algorithm used in this third study was superior to previous versions, and we are disappointed that the results were not better. In the final section we consider possible reasons for its failure.

Discussion:

The ACBC algorithm was tested extensively with simulated respondents before the first study with human respondents. It had worked well in that artificial context. But simulated respondents provide fewer challenges. Accurate partworths for our simulated respondents were available at the beginning of the interview, simulated respondents did not change their preferences during the interview, and they made choices faithfully according to the logit model. We can think of several possible reasons for ACBC’s failure to perform as well with human respondents, some of which are listed below.
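For contrast with human respondents, the short sketch below shows roughly how such a simulated respondent can be generated: fixed, known partworths and choices drawn from the logit probabilities. The coding and dimensions are assumptions made for illustration.

# Illustration of a simulated respondent that chooses "faithfully according to
# the logit model": known partworths, stable throughout the interview, and
# choices drawn from the logit probabilities (assumed coding and sizes).
import numpy as np

rng = np.random.default_rng(2)

def simulate_choice(task, beta):
    """Draw one choice from the multinomial logit probabilities for a task."""
    p = np.exp(task @ beta)
    p /= p.sum()
    return rng.choice(len(p), p=p)

beta_true = rng.normal(size=6)                  # known, fixed partworths
task = rng.choice([-1.0, 1.0], size=(3, 6))     # one 3-concept choice task
print("simulated choice:", simulate_choice(task, beta_true))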

Initial estimates of partworths not good enough. The adaptive algorithm requires preliminary estimates of partworths. The first study, where ACBC seemed to work well, collected self-explicated judgments of attribute importances and desirabilities of attribute levels in a way similar to that of ACA. Perhaps this provided better preliminary partworth estimates than either of the later two studies. The second study did not ask for attribute importances, and its preliminary partworths did not correlate well with final partworths. In the third study the “on-the-fly” partworth estimates became quite good for later tasks, but may not have been good enough during the early tasks.

Respondents change their values during the interview. We know from other research (Johnson and Orme, 1996) that brand tends to become less important and price more so as interviews progress. Perhaps there are other changes as well, and partworths obtained from information available early in the interview may not be capable of leading to a design that is efficient for estimating partworths at the end of the interview.

Respondents don't use logit models to make choices. We know respondents don’t make choices by summing partworths. There is ample evidence that they use various schemes to simplify the job of answering choice tasks in market research questionnaires. Despite this, the multinomial logit model has been generally successful in predicting respondent choices. For example, in the three studies reported here, hit rates were usually nearly as large as test-retest reliability percentages, indicating that the hit rates were nearly as good as possible.

D-Efficiency may not be a good criterion to maximize. Although the logit model has many desirable properties and provides a useful approximation to respondent choice behavior, high D-Efficiencies may not translate into favorable hit rates and MAEs. Utility balance is desirable under the logit model, but for respondents who use response strategies different from the logit model, the other three design characteristics may be more critical. Although D-Efficient designs produce good estimation of logit parameters, those parameters may predict choice behavior less well than parameters developed from designs that maximize orthogonality, level balance, and minimal overlap. Designs produced by Sawtooth Software’s CBC System, to which we have been comparing ACBC designs, do precisely that. Perhaps designs which do not take account of partworths, such as those produced by regular CBC, have an advantage for predicting hit rates and choice shares in holdout concepts.

In the meantime, we are impressed that standard CBC designs appear to be surprisingly robust, and regular CBC appears to be hard to beat.

References

Huber, Joel and Klaus Zwerina (1996), “The Importance of Utility Balance in Efficient Choice Designs,” Journal of Marketing Research, 33 (August) 307-317.

Johnson, Richard M. and Bryan Orme (1996), “How Many Questions Should You Ask in Choice-Based Conjoint Studies?”, Available at http://sawtoothsoftware.com/technicaldownloads.shtml#howmany

Johnson, Richard M., Joel Huber, and Lynd Bacon (2003), “Adaptive Choice-Based Conjoint,” Sawtooth Software Conference Proceedings.

Johnson, Richard M., Joel Huber, and Bryan Orme (2004), “A Second Test of Adaptive Choice-Based Conjoint Analysis (The Surprising Robustness of Standard CBC Designs),” Sawtooth Software Conference Proceedings.