To this point, we have conducted three fairly ambitious methodological studies comparing ACBC to standard CBC (and other researchers outside of Sawtooth Software have conducted other comparisons). Because we have documented the results of those experiments in two white papers published on our website, we refer the reader to those articles and just cover the highlights here.
Chris Chapman of Microsoft conducted a series of comparisons of CBC and ACBC as it relates to real product selection, and reported the results at the 2009 Sawtooth Software Conference (Chapman et al. 2009).
The SKIM group also conducted a recent study to compare ACBC and CBC (Fotenos et al. 2013).
To read more about the details of these studies, we recommend you download and read the following white papers, available from our Technical Papers Library at www.sawtoothsoftware.com:
• "A New Approach to Adaptive CBC"
• "Testing Adaptive CBC: Shorter Questionnaires and BYO vs. 'Most Likelies'"
• "CBC vs. ACBC: Comparing Results with Real Product Selection"
• "ACBC Revisited" (Fotenos et al., Sawtooth Software 2013 Proceedings)
ACBC passed through an eight-month beta test program, during which more than fifty beta testers conducted more than forty studies. We received very positive feedback from our beta testers; some of their experiences are described in the ACBC area of our website.
The critical design issue when comparing one conjoint method to another is the criterion for success. Ideally, the researcher would have access to sales data (or respondents' subsequent actual choices) and could compare predicted choices with actual choices.
In the absence of data on real-world choices, many researchers turn to holdout choice tasks included within stated preference surveys. Traditionally, these have looked exactly like standard CBC questions, typically with 3 to 5 product concepts. One of the key things we and other researchers have learned about standard CBC tasks is that respondents often key on just a few levels to make decisions. They often establish a few cutoff rules for excluding or including concepts within their consideration sets. And if one uses standard CBC tasks that employ "minimal overlap" (where each level is typically available no more than once per task), often only one product concept can satisfy respondents. Choices in such tasks often reflect simplified (superficial) behavior, so it is not surprising that other choice tasks designed in the same way are quite successful in predicting answers to those holdouts. We have found that ACBC does about as well as CBC (sometimes better, sometimes worse) in predicting these kinds of holdout CBC tasks. That parity doesn't concern us, as we'll further explain.
We have wondered whether traditional holdout tasks really do a good job of mimicking actual purchase behavior, or whether they reflect simplified processing heuristics that are a byproduct of respondents (especially internet samples) completing marketing research surveys with less motivation than they would apply to real-world purchases. When people make real decisions, they often narrow down the choices to an acceptable consideration set (with respect to must-have and must-avoid features) and then make a final choice within that set. To better mimic this process in the holdouts, our first three methodological studies used a customized type of holdout CBC task that involved comparing winning concepts chosen from previous CBC tasks. For example, the respondent might be shown four standard (fixed) CBC holdout tasks, with a fifth holdout composed of the four winning concepts from the first four tasks. Such customized tasks can probe beyond just the first few critical levels of inclusion and reflect more challenging tradeoffs. We have found that ACBC consistently predicts these choices more accurately than traditional CBC.
Some researchers (such as Elrod 2001) have stressed that if we validate to holdout tasks within survey questionnaires, not only should the holdout tasks be excluded from the calibration tasks used for estimating the utilities, but a sample of holdout respondents should be held out as well. For example, we should be using calibration tasks from respondent group A to predict holdout tasks from respondent group B (and the actual tasks used in group B should not have been used in the experimental design for group A). When this is done, one does not compute individual-level hit rates (since it is no longer a within-respondent analysis), but one uses market simulations and measures share prediction accuracy. Simulations of market choices for group A are used to predict choices to fixed holdouts for group B.
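The validation procedure Elrod recommends can be sketched in a few lines. The sketch below is purely illustrative: the part-worths and holdout shares are simulated, and the logit share-of-preference rule is one common simulation choice, not necessarily the one used in the studies described here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical individual-level part-worths for calibration group A,
# as might come from HB estimation (n_respondents x n_parameters).
n_resp, n_params = 600, 8
betas_A = rng.normal(0.0, 1.0, size=(n_resp, n_params))

# One fixed holdout task shown to holdout group B: four concepts,
# coded on the same parameters (design matrix, 0/1 coding here).
X = rng.integers(0, 2, size=(4, n_params)).astype(float)

# Observed choice shares for that task from group B (hypothetical).
observed_shares = np.array([0.40, 0.25, 0.20, 0.15])

def simulated_shares(betas, X):
    """Aggregate logit (share-of-preference) shares for one task."""
    utils = betas @ X.T                                # (n_resp, n_concepts)
    p = np.exp(utils - utils.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                  # per-respondent shares
    return p.mean(axis=0)                              # aggregate shares

pred = simulated_shares(betas_A, X)
mae = np.abs(pred - observed_shares).mean() * 100      # in share points
print(f"predicted shares: {np.round(pred, 3)}, MAE = {mae:.2f} points")
```

Because group B's tasks were never part of group A's experimental design, this is a genuine out-of-sample test: accuracy is judged by share prediction error rather than individual-level hit rates.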
We were fortunate to have a large enough sample in our first ACBC methodological study to follow Elrod's recommendations. In addition to the 600 respondents in the sample used to estimate part-worth utilities, we collected a separate sample of 900 respondents who all received a single (fixed) version of 12 CBC tasks. We found that ACBC predicted shares of choice for these scenarios just as well as CBC. But when we split the 900 respondents into three equal-sized groups based on interview time, we found that ACBC performed worse than CBC in predicting the choices of the quickest respondents, but better than CBC for respondents who took the most time (and presumably were providing more in-depth and careful answers).
We mention these conclusions and findings to provide direction to researchers who may want to test the merits of ACBC relative to more traditional methods like CBC. If standard holdout CBC tasks are used as the criterion for success, there is a methods bias that strongly favors CBC in predicting those tasks. If respondents use the same (probably unrealistic) simplification heuristics to answer calibration CBC tasks, they will employ the same heuristics in answering holdouts. One needs to consider carefully the appropriate criterion for success. You may find that ACBC does not consistently beat CBC in predicting static minimal-overlap holdout choice tasks, for the reasons outlined above. It occurs to us, however, that if static choice tasks include a greater number of concepts than is traditionally used, more overlap will be introduced, increasing the likelihood that respondents will need to weigh multiple acceptable concepts per task before making a decision. In those cases (which are probably more representative of real choice situations), ACBC may very well consistently beat CBC.
We suspect that ACBC will be more predictive of real-world buying behavior than CBC, but we need practitioners and firms with access to the appropriate sales data to verify this. This is an exciting new area of research, and we are optimistic that ACBC will indeed prove better than traditional CBC for complex conjoint problems involving about five or more attributes.
Previous Results and Conclusions
To date, we have completed three substantial methodological studies comparing ACBC to CBC. The study names, sample sizes, and numbers of attributes are as follows:
• Laptop PC Study (n~600 calibration respondents; n~900 holdout respondents; 10 attributes)
• Recreational Equipment Study (n~900; 8 attributes)
• Home Purchases Study (n~1,200; 10 attributes)
The experiments all used web-based interviewing. For the Laptop and Home Purchases studies, we used a Western Wats panel; for the Recreational Equipment study, the customer sample was provided by the client. For the Laptop and Home Purchases studies, we implemented a carefully controlled random split-sample design, in which respondents were randomly assigned to receive ACBC or CBC questionnaires, or assigned to be holdout respondents. With the Recreational Equipment study, because of constraints related to working within the client's timeline, we weren't able to conduct a perfectly controlled random assignment of respondents to conditions, and the sample pull receiving the ACBC survey had potentially different characteristics from the one receiving CBC. This weakens our ability to draw firm conclusions from that study.
The ACBC questionnaires took longer than the CBC questionnaires to complete (50% longer in the Housing survey, 100% longer in the Laptop study, and 200% longer in the Recreational Equipment study). We suspect that the Recreational Equipment study was relatively longer because it interviewed respondents from a customer list, who may have been even more enthusiastic and engaged in the process. The Housing survey was relatively shorter, because we investigated whether we could shorten the survey length without sacrificing much in terms of predictive validity (see results below, which show that even the shortened ACBC surveys were successful).
A key finding is that even though ACBC respondents were asked to do a longer task, they rated the survey as more interesting and engaging than respondents taking CBC surveys rated the CBC surveys. Our thought is that when collecting conjoint data, speed alone shouldn't be the goal, as we cannot ignore quality. And, if we can improve the survey experience for the respondent even though we are asking them to spend more time, then the additional investment to acquire better data is well worth the effort.
Interest/Satisfaction with the Survey
For the Laptop and Home Purchases studies (which provide the strongest results due to the controlled experiments), the ACBC respondents reported higher satisfaction and interest in the survey than CBC respondents. They also reported that the survey was more realistic and less monotonous/boring, and that it made them more likely to want to slow down and make careful choices. All these differences were significant at p<0.05.
In the two studies involving random assignment to either ACBC or CBC cells, the ACBC respondents had higher hit rates (for the customized CBC holdout task) than CBC respondents, with differences significant at p<0.05:
Hit Rates, Custom Holdout Task
For the Recreational Equipment study (which lacked a completely random split-sample control), ACBC had a 2-point edge in hit rate prediction over CBC, but the result was not statistically significant. All these figures reflect the generic HB estimation method.
Some researchers have wondered whether the edge in predictive ability demonstrated by ACBC over CBC might be accounted for by ACBC's longer questionnaire time. Perhaps ACBC's success is due to more precise estimates, because we've asked more questions of each respondent. While this seems plausible, we doubt it can entirely explain the results, because it has been demonstrated that hit rates for CBC respondents gain very little beyond about 10 to 15 choice tasks (and the CBC surveys for the Laptop and Houses studies both included 18 tasks). We base this conclusion on a meta-study of 35 commercial data sets by Hoogerbrugge and van der Wagt of SKIM Analytical, presented at the 2006 Sawtooth Software Conference. They randomly discarded one task at a time and estimated the loss in hit rate precision as a function of questionnaire length. In the range of 10 to 15 choice tasks, the gains in hit rate rapidly diminished, and in some cases leveled off entirely through 19 tasks. The net pattern and absolute magnitude of the gains observed across those 35 data sets suggest that extending the questionnaire could not make up the gaps of 11 and 7 absolute hit rate points in favor of ACBC seen in our two experiments. In the future, it would be good to see research that time-equalized the CBC and ACBC questionnaires (though respondents receiving the CBC questionnaire will face a monotonous challenge!).
In the Laptop study, we were able to estimate shares for holdout CBC tasks completed by a new sample of 900 respondents, each of whom completed 12 fixed CBC tasks. The part-worths generated from ACBC performed just as well as those from CBC in predicting these holdout shares when including all respondents. But when we split the 900 respondents into three equal samples based on time taken in the interview, we found that CBC predicted the fastest respondents better than ACBC, while ACBC predicted the middle and slower respondent groups better than CBC (with the best prediction occurring for the holdouts completed by the slowest respondents). In each case, we tuned the simulations for scale, so the differences cannot be explained by the random error components (which were significantly different for the three segments of respondents). A Swait-Louviere test (which also controls for scale) likewise found differences in the parameters across the groups of holdout respondents. This finding suggests that ACBC does a better job of predicting choices made by respondents who take the most time and presumably are more careful and deliberate in their choices.
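Tuning simulations for scale amounts to multiplying the part-worths by a scale factor chosen to best fit the observed holdout shares, so that comparisons across respondent segments are not confounded by differences in response error. A minimal sketch of the idea follows; the data are simulated, and the simple grid search stands in for whatever fitting procedure a given study might use.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical part-worths and one holdout task design (0/1 coded).
betas = rng.normal(0.0, 1.0, size=(300, 6))
X = rng.integers(0, 2, size=(4, 6)).astype(float)
target = np.array([0.35, 0.30, 0.20, 0.15])   # observed holdout shares

def shares(betas, X, lam):
    """Logit shares with utilities rescaled by scale factor lam."""
    u = lam * (betas @ X.T)
    p = np.exp(u - u.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)

# Grid-search the scale factor that minimizes mean absolute error
# between simulated and observed shares.
grid = np.linspace(0.1, 3.0, 30)
best = min(grid, key=lambda lam: np.abs(shares(betas, X, lam) - target).mean())
print(f"best scale factor: {best:.2f}")
```

A larger scale factor sharpens the simulated shares (less response error); a smaller one flattens them toward equal shares. Fitting it separately per segment removes scale as an explanation for remaining differences in predictive accuracy.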
For the Houses study, we did not collect holdout respondents. However, we were able to use the part-worths from the ACBC and CBC respondents to predict choices made by the two groups of respondents combined (a total of 1,200 respondents in the validation sample). Thus, ACBC part-worths were used to predict holdout responses made by both ACBC and CBC respondents, and likewise for the CBC part-worths. Based on predictions of four fixed holdout tasks (each with four product alternatives), the ACBC predictions were better than those from CBC: mean absolute errors of 3.00 vs. 5.47 share points, respectively.
In the three methodological studies we conducted, the price sensitivity derived from ACBC estimates was quite similar to that from CBC. Our beta testers have reported some instances where ACBC has produced significantly greater price sensitivity than CBC. In those instances, the ACBC result was viewed as potentially more realistic. We hope to see further research conducted in this area.