Sawtooth Software: The Survey Software of Choice

Conference 2000: Summary of Findings

Nearly two dozen presentations were given at our most recent Sawtooth Software conference in Hilton Head. We've summarized some of the high points below. Since we cannot possibly convey the full worth of the papers in a few paragraphs, the authors have submitted complete written papers (not the presentation slides) for the 2000 Sawtooth Software Conference Proceedings. If you haven't yet ordered your copy, please consider adding this valuable reference to your shelf. Call us at 360/681-2300 to order.

Moving Studies to the Web: A Case Study (Karlan Witt): Karlan described the process her company has gone through in moving a tracking study (media readership for publications) from paper-and-pencil data collection to the Web. Her firm designed a study to compare the paper-and-pencil approach to three different administration formats over the Web. Respondents were recruited and pre-screened in the same way, and randomly assigned to one of the four questionnaire version cells. The Web data collection cells had lower response rates than the paper-based technique. The demographics of the respondents did not differ between cells by gender, age, education or computer expertise, but there were some significant differences: individuals with faster microprocessors, as well as those who owned other high-tech products, were slightly more likely to complete the Web-based survey. Karlan also reported some significant differences in the readership estimates for certain publications, depending on the data collection modality.

She concluded that generally similar results can be obtained over the Web as with paper. The look and feel of the graphical user interface of a Web survey, she maintained, can strongly affect both the response rate and the final sample composition. She suggested that the survey layout over the Web can affect the results as much as the choice of survey collection modality. Rather than try to replicate the look and functionality of past non-Web questionnaires over the Web, Karlan suggested designing the study to take advantage of what the Web can offer.

Trouble with Conjoint Methodology in International Industrial Markets (Stefan Binner): Stefan reported on a small survey his firm conducted among users of conjoint analysis (research professionals) in industrial or business-to-business markets. They achieved 37 responses. Stefan reported general satisfaction among these researchers with conjoint methods applied within b-to-b markets. The study also revealed some problems and challenges for conjoint researchers. The biggest hurdles in selling conjoint analysis projects are clients' limited understanding of the technique and the cost of conjoint studies. Other concerns voiced regarding conjoint analysis are the limited number of attributes that can be studied, the perception by some that the conjoint questions are not realistic, and the length of conjoint questionnaires. Despite the challenges, 70% of respondents reported being either satisfied or very satisfied with the use of conjoint analysis for industrial b-to-b markets. Furthermore, 63% of the respondents planned to increase the number of conjoint projects in the future.

Validity and Reliability of Online Conjoint Analysis (Torsten Melles, Ralf Laumann and Heinz Holling): Torsten presented evidence that useful conjoint analysis data can be collected over the Internet, although its reliability may be lower than for other data collection modalities. He cautioned that the suitability of conjoint analysis over the Internet depends on several factors: the number of attributes in the design, the characteristics of the respondent, and the researcher's ability to identify unreliable respondents. Torsten suggested using multiple criteria based on individual-level parameters for determining the reliability of respondents. He also cautioned that IP addresses and personal data should be carefully compared to guard against double-counting respondents.

Brand/Price Trade-Off Via CBC and Ci3 (Karen Buros): Brand-Price Trade-Off (BPTO) is a technique that has been in use since the 1970s. Karen reviewed the complaints against BPTO (too transparent, encourages patterned behavior). She also proposed a hybrid technique that combines aspects of BPTO with discrete choice analysis. Her approach used an interactive computer-administered survey (programmed using Ci3) that starts with all brands at their mid-prices. Prior self-explicated rankings for brands are collected, and that information is used in determining which non-selected brands should receive lower prices in future tasks. She argued that her procedure results in an interview that is not as transparent as the traditional BPTO exercise. Furthermore, the data from the experiment can be used within traditional logit estimation of utilities.

Choice-Adapted Preference Modeling (Roger Brice & Stephen Kay): The high cost of interviewing physicians makes it especially desirable to maximize the amount of information obtained from each interview. Although choice-based conjoint studies have many desirable properties, traditional conjoint procedures involving ranking or rating of concepts can provide more information per unit of interview time. One of the features of choice-based conjoint is the availability of the "None" option. The authors used an extension of traditional conjoint which asked respondents not only to rank concepts, but also to indicate which of them fell below their threshold of willingness to accept. They reported that this additional question allowed them to carry the None option into the simulation module on an individual-level basis and then to more realistically simulate the ability of new products to capture share from their current competition.

Cutoff-Constrained Discrete Choice Models (Michael Patterson & Curtis Frazier): With traditional discrete choice data, analysis is performed at the aggregate level by pooling data across respondents. This assumes respondent homogeneity and ignores differences between individuals or segments. The authors tested two recent techniques for incorporating heterogeneity in discrete choice models: Hierarchical Bayes (HB) and "soft penalty" cutoff models.

The "soft penalty" models involve asking respondents to state what levels of attributes they would never consider purchasing. "Soft" penalties are estimated through logit analysis, by creating a variable for each attribute level that reflects whether (for categorical variables) or by how much (for quantitative variables) a choice alternative violates a respondent's cutoff.

A research study with 450 respondents was used to test the predictive validity of HB estimation versus the "soft penalty" approach. HB performed better, both in terms of hit rates and share predictions for holdout choice tasks. The authors concluded that penalty models do not always produce better predictive validity, often have odd coefficients, and can make the model explanation more complex, compared with HB estimation.

Calibrating Price in ACA: The ACA Price Effect and How to Manage It (Peter Williams & Denis Kilroy): Even though ACA is one of the most popular conjoint analysis techniques, it has been shown often to understate the importance of price. Peter summarized hypotheses about why the effect happens, and what to do about it. He suggested that the ACA price effect is due to a) inadequate framing during importance ratings, b) lack of attribute independence, c) equal focus on all attributes, and d) restrictions on unacceptable levels.

To overcome the effect, Peter explained, the first step is to quantify it. Other researchers have proposed a "dual conjoint" approach: employing either full-profile traditional conjoint or CBC along with ACA in the same study. Peter proposed a potentially simpler technique: using choice-based holdout tasks (partial profile). The holdout tasks typically include a stripped-down product at a low price, a mid-range product at a medium price, and a feature-rich product at a premium price. He suggested counting the number of times respondents choose higher-priced alternatives (this requires multiple choice tasks), and segmenting respondents based on that scored variable. Peter then developed a separate weighting factor for the ACA price utilities for each segment. He demonstrated the technique with an ACA study involving 969 respondents. He found that no adjustment was necessary for the group that preferred high-priced, high-quality offers; a scaling factor of 2 for price was required for the mid-price segment; and a scaling factor of slightly more than 4 was required for the price-sensitive segment.
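
Mechanically, the adjustment step is simple. The sketch below (Python) shows one way it might look; the utilities, segment assignments and column indices are made-up, and the scale factors merely echo the magnitudes Peter reported:

    import numpy as np

    # Hypothetical ACA part worths: one row per respondent; the price part
    # worths are stored in the columns listed in PRICE_COLS.
    utilities = np.array([
        [0.8, 0.3,  0.6,  0.1, -0.7],
        [0.2, 0.9,  0.4,  0.0, -0.4],
        [0.5, 0.5,  1.0, -0.1, -0.9],
    ])
    PRICE_COLS = [2, 3, 4]

    # Segment of each respondent, scored from the choice-based holdout tasks
    # (how often they chose the higher-priced alternatives).
    segment = ["premium", "mid_price", "price_sensitive"]

    # Weighting factors in the spirit of those reported: no adjustment for the
    # premium segment, 2 for the mid-price segment, slightly more than 4 for
    # the price-sensitive segment.
    scale = {"premium": 1.0, "mid_price": 2.0, "price_sensitive": 4.2}

    adjusted = utilities.copy()
    for i, seg in enumerate(segment):
        adjusted[i, PRICE_COLS] *= scale[seg]   # rescale only the price columns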

Using Evoked Set Conjoint Designs (Sue York): Some conjoint designs require very large numbers of brands or other attribute levels. One way around this problem is to ask each respondent about a customized "evoked set" of brands. Sue tested this approach using a study with slightly more than a dozen brands and a dozen package types, along with price, which needed to be modeled using CBC. Respondents were either shown a traditional CBC interview including all attribute levels, or were given a customized CBC interview (programmed using Ci3), based on an evoked set of brands and package sizes. For the customized design cell, respondents indicated which brands and package types they would be most likely to buy, and which they would not buy under any conditions ("unacceptables"). Holdout tasks were used to test the predictive validity of the two design approaches, and to compare the performance of aggregate logit versus individual-level HB modeling.

HB estimation generally was superior to a main-effects specified logit model. The customized "Evoked Set" design resulted in higher hit rates, but lower share prediction accuracy relative to the standard CBC approach. Sue reported that the "unacceptables" judgements had low reliability. Nine percent of respondents chose a brand in the holdout choice tasks that they previously declared unacceptable, and a full 29% chose a package size previously marked unacceptable. This inconsistency made it difficult to use that information to improve the predictability of the model. She concluded that customized "evoked set" CBC designs can be useful, especially if many levels of an attribute or attributes need to be studied and predicting respondents' first choices is the main goal.

Practical Issues Concerning the NOL Effect (Marco Hoogerbrugge): Marco reviewed the "Number of Levels Effect" (NOL) and examined its magnitude in commercial data sets. The NOL effect is a widely-experienced tendency for attributes to be given more importance when they have larger numbers of intermediate levels, even though the range of values expressed by the extreme levels remains constant. Marco concluded, "the NOL effect clearly exists and is large when comparing 2-level attributes with more-level attributes, but the effect is questionable when you only have attributes with at least 3 levels."

An Examination of the Components of the NOL Effect in Full-Profile Conjoint Models (Dick McCullough): There has been much discussion about whether the "Number of Levels Effect" (NOL) is "algorithmic," i.e., due to mathematical or statistical artifacts, or "psychological," i.e., due to respondents' behavioral reactions to perceiving that some attributes have more levels than others. Dick McCullough carried out an ingenious and elaborate experiment designed to measure the extent to which the NOL might be caused by each factor. Data were collected on the Web, and the variable whose number of levels was manipulated was categorical rather than continuous. Preliminary self-explicated questions were used to determine each respondent's best- and least-liked categories, and then two conjoint studies were carried out for each individual, one with just those extreme levels and the other with all levels. There were four cells in the design, in which those two treatments were administered in different orders, with different intervening activities between them. Dick found some evidence for a positive algorithmic effect and a negative psychological effect, but the results may have been compromised by difficulties with respondent reliability. This is a promising method that we hope will be tried again, manipulating the number of levels of a continuous rather than a categorical variable.

Creating Test Data to Objectively Assess Conjoint and Choice Algorithms (Ray Poynter): Ray characterized the success of conjoint techniques in terms of two factors: the ability of the technique to process responses (efficiency), and the likelihood that the technique elicits valid responses. In his research, he has sometimes observed respondents who don't seem to respond in the expected way in conjoint studies. He suggested that respondents may not be able to understand the questions or visualize the attributes, and that their preferences may not be stable. He even suggested that respondents may not know the answer, or that asking the questions may change the responses given.

Ray described a procedure for using computer-generated data sets based on the part worths contained in real data sets to test different conjoint designs and different respondent response patterns. He found no evidence of a number-of-levels effect from simulated data. He suggested that analysts can make better decisions regarding the type of conjoint design using simulated data.
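
As an illustration of the general idea (not Ray's own code), the following Python sketch generates synthetic respondents from known part worths and simulates their choices for a coded CBC design; competing designs can then be compared on how well they recover the known utilities:

    import numpy as np

    rng = np.random.default_rng(7)

    def simulate_choices(part_worths, design, noise=1.0):
        """Simulate CBC answers for synthetic respondents.

        part_worths : (n_resp, n_params) 'true' utilities, e.g. resampled from
                      the part worths of a real study
        design      : (n_tasks, n_alts, n_params) coded choice design
        noise       : scale of the Gumbel error added to each alternative
        """
        n_resp = part_worths.shape[0]
        n_tasks, n_alts, _ = design.shape
        choices = np.empty((n_resp, n_tasks), dtype=int)
        for r in range(n_resp):
            v = design @ part_worths[r]                    # deterministic utilities
            u = v + rng.gumbel(scale=noise, size=v.shape)  # logit-style error
            choices[r] = u.argmax(axis=1)                  # chosen alternative
        return choices

    # Tiny illustration: 5 synthetic respondents, 8 tasks of 3 alternatives,
    # 4 coded parameters
    true_pw = rng.normal(size=(5, 4))
    coded_design = rng.integers(0, 2, size=(8, 3, 4)).astype(float)
    print(simulate_choices(true_pw, coded_design))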

A Bayesian Approach to an Old Problem: Rethinking Product Attribute Segmentation (Stuart Drucker): Stuart demonstrated an interesting combination of the method of paired comparisons with hierarchical Bayes analysis. The paired comparison (or "round robin") method has a long history in marketing research, where it has been used to measure differences in preferences and perceptions of product characteristics. However, analysis of paired comparisons data has almost always been done in the aggregate, since reasonable numbers of questions yielded too little information for individual estimation. Stuart analyzed paired comparison data with hierarchical Bayes, which strengthens individual estimation by making use of pooled data from other respondents. He showed that paired comparisons with HB can produce reasonable individual importances, even though each individual has answered relatively few questions.

Modeling Constant Sum Allocation in Conjoint Studies (Jim Gallagher & Douglas Willson): Respondents in conjoint studies are sometimes asked to allocate a fixed number of points across a set of product profiles. Ideally, models for these data should reflect their constant-sum nature and the fact that allocations are either zero or positive. Yet, although such answers are easy to understand, commonly available methods for analyzing them present many problems. Converting the data to logits involves arbitrary assumptions about what to do with answers of zero. The authors proposed a Tobit model, drawn from the field of economics, which explicitly recognizes the constant-sum nature of the allocations while dealing appropriately with answers of zero.
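
For readers unfamiliar with the Tobit model, the sketch below (Python) gives the log-likelihood of a basic Tobit censored at zero. It is a generic textbook version, not the authors' exact constant-sum specification, and the variable names are ours:

    import numpy as np
    from scipy.stats import norm

    def tobit_loglik(params, X, y):
        """Log-likelihood of a Tobit model censored at zero.

        Latent model: y* = X @ beta + e, with e ~ N(0, sigma^2) and y = max(0, y*).
        params packs [beta..., log(sigma)] so sigma stays positive while optimizing.
        """
        beta, sigma = params[:-1], np.exp(params[-1])
        xb = X @ beta
        zero = y <= 0
        # Zero allocations contribute P(y* <= 0); positive ones the normal density.
        ll_zero = norm.logcdf(-xb[zero] / sigma).sum()
        ll_pos = (norm.logpdf((y[~zero] - xb[~zero]) / sigma) - np.log(sigma)).sum()
        return ll_zero + ll_pos

    # The model can be fit with a generic optimizer, for example:
    # from scipy.optimize import minimize
    # fit = minimize(lambda p: -tobit_loglik(p, X, y), np.zeros(X.shape[1] + 1))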

Classifying Elements with All Available Information (Luiz Sa Lucas): Luiz described a neural network procedure that combines different classification solutions obtained using different methods. He showed how results from different approaches such as Discriminant Analysis and Automatic Interaction Detection can be combined to produce final classifications better than those of any single method. This can have practical value classifying new individuals into pre-existing groups, such as segments expected to respond differently to various marketing approaches.

Perceptual Mapping and Politics 2000 (John Fiedler & Robert Maxwell): John reviewed Rich Johnson's 1968 political segmentation paper from JMR, noting that only one politician on that map is still alive (Reagan). He then presented general survey results and a perceptual space for the 2000 presidential campaign in the US. The respondents to the survey were most familiar with Gore, but held Powell and Bush in higher regard. He presented a perceptual map using discriminant analysis, and compared that to a perceptual map using CPM's composite ideal point method. The two maps were quite similar.

John also presented a cluster segmentation (12 segments) for the US public, based on the federally-funded General Social Survey (GSS). He showed how these segments would be positioned on the perceptual map, based on average ideal point location. He finished his presentation with some reasons why either Al Gore or George Bush could win the presidency. "If we had to guess today," John concluded, "we'd have to say that Al Gore would win."

* An Overview and Comparison of Design Strategies for Choice-Based Conjoint Analysis (Keith Chrzan & Bryan Orme): Keith reviewed the basics of designing conjoint experiments, including the use of design catalogs for traditional card-sort designs. He described the additional complexities of designing Choice-Based Conjoint experiments, along with the potential benefits that accompany them (i.e., the ability to model cross-effects and alternative-specific effects, and the inclusion of "None"). Keith compared four methods of generating CBC tasks: catalog-based approaches, recipe-based designs for partial-profile experiments, computer-optimized designs (SAS OPTEX) and random designs using Sawtooth Software's CBC program. Different model specifications were explored: main effects, interaction effects, and cross- and alternative-specific effects.

Using computer-generated data sets, Keith compared the relative design efficiency of the different approaches to CBC designs. The main conclusions were as follows: minimal level overlap within choice tasks is optimal for main-effects estimation, but some level overlap is desirable for estimating interactions. SAS OPTEX software can create optimal or near-optimal designs for almost every design strategy and model specification. Sawtooth Software's CBC system can also create optimal or near-optimal designs in almost every case, with the exception of a design in which many more first-order interaction terms are to be estimated than main effects (in that case, its best design was 12% less efficient than SAS OPTEX's). Keith concluded that no one design strategy is best for every situation. Keith suggested that analysts create synthetic data sets to test the efficiency of alternative design generation methods for their particular studies.
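
Efficiency comparisons like these typically rest on a measure such as D-error, computed from the information matrix of the multinomial logit model under assumed part worths. Below is a minimal Python sketch of that calculation (our own simplification, with illustrative names, not Keith's code):

    import numpy as np

    def d_error(design, beta):
        """D-error of a choice design under a multinomial logit model.

        design : (n_tasks, n_alts, n_params) effects- or dummy-coded design
        beta   : (n_params,) assumed part worths (often zeros for a first pass)
        Lower D-error means a more efficient design.
        """
        n_tasks, n_alts, n_params = design.shape
        info = np.zeros((n_params, n_params))
        for t in range(n_tasks):
            X = design[t]                       # alternatives in this task
            p = np.exp(X @ beta)
            p /= p.sum()                        # logit choice probabilities
            Xc = X - p @ X                      # center on the prob-weighted mean
            info += Xc.T @ (Xc * p[:, None])    # information from this task
        return np.linalg.det(np.linalg.inv(info)) ** (1.0 / n_params)

    # Relative efficiency of design_a versus design_b (same parameterization):
    # d_error(design_b, beta) / d_error(design_a, beta)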

(* Best Presentation Award, based on attendee ballots.)

Customized Choice Designs: Incorporating Prior Knowledge and Utility Balance in Choice Experiments (Jon Pinnell): Jon investigated whether customizing the degree of utility balance within CBC designs can offer significant improvements in predictive validity. He reviewed past findings that have pointed toward increased design efficiency if the alternatives in each choice set have relatively balanced choice probabilities. He reported results for two experimental and three commercial CBC studies. Jon found that those choice tasks with a higher degree of utility balance resulted in slightly more accurate predictive models than tasks with a lower degree of utility balance. He also observed that HB estimation can significantly improve hit rates at the individual-level relative to the simple aggregate logit model.
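
One simple way to operationalize utility balance when assembling a task is sketched below in Python: score candidate tasks by the entropy of their logit choice probabilities under current utility estimates, and keep the most balanced one. This is only an illustration of the principle, not Jon's algorithm:

    import numpy as np

    def balance_score(task, beta):
        """Entropy of logit choice probabilities for one candidate task.

        task : (n_alts, n_params) coded alternatives
        beta : current part-worth estimates (individual or aggregate)
        Higher entropy means the alternatives are closer in utility.
        """
        v = task @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        return -(p * np.log(p)).sum()

    def most_balanced(candidate_tasks, beta):
        """Return the candidate task with the most balanced choice probabilities."""
        scores = [balance_score(t, beta) for t in candidate_tasks]
        return candidate_tasks[int(np.argmax(scores))]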

Jon reported results from a computer-administered CBC survey that dynamically customized the choice design for each respondent using the criterion of utility balance. Again, he demonstrated small positive gains for utility balance. One interesting finding was that HB estimation applied to a partial-profile CBC design resulted in inferior predictions (hit rates) relative to aggregate logit for one data set. Jon was puzzled by this finding, and used this experience to suggest that we not blindly apply HB without some cross-checks. Overall, he concluded that the gains from HB generally outweighed the gains from utility balance for the data sets he analyzed.

Understanding HB: An Intuitive Approach (Rich Johnson): Many speakers at this Sawtooth Software conference spoke about Hierarchical Bayes estimation or presented HB results. HB can be difficult to understand, and Rich's presentation helped unravel the mystery using an intuitive example. In Rich's example, an airplane pilot named Jones often had rough landings. With classical statistics, Rich stated, we generally think of quantifying the probability of Jones having a rough landing. With Bayesian reasoning, we turn the thinking around: given that the landing was rough, we estimate the probability that the pilot was Jones. Rich presented Bayes' rule within that context, and introduced the concepts of Priors, Likelihoods and Posterior Probabilities.
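
The arithmetic behind the example is just Bayes' rule. With some purely hypothetical numbers (not figures from Rich's talk), the calculation looks like this:

    # Hypothetical priors and likelihoods for three pilots
    pilots = ["Jones", "Smith", "Lee"]
    prior = {"Jones": 1/3, "Smith": 1/3, "Lee": 1/3}        # P(pilot flew the plane)
    p_rough = {"Jones": 0.30, "Smith": 0.10, "Lee": 0.05}   # P(rough landing | pilot)

    # Bayes' rule: P(Jones | rough) = P(rough | Jones) P(Jones) / sum over pilots
    evidence = sum(p_rough[k] * prior[k] for k in pilots)
    posterior = {k: p_rough[k] * prior[k] / evidence for k in pilots}
    print(posterior["Jones"])   # about 0.67 -- the rough landing was probably Jones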

Rich outlined some of the benefits of HB analysis. He argued that it can produce better estimates from shorter conjoint questionnaires. Analysts can get individual estimates from CBC, customer satisfaction models or scanner data instead of just aggregate results. Rich showed results from real data sets where HB improved the performance of CBC, ACA and traditional ratings-based conjoint data. He also provided some run time estimates, ranging between 1 and 14 hours for 300 respondents and models featuring between 10 and 80 parameters. He stated the opinion that analysts should use HB whenever the project scheduling permits it.

* HB Plugging and Chugging: How Much Is Enough? (Keith Sentis & Lihua Li): Hierarchical Bayes (HB) analysis is gaining momentum in the marketing research industry. It can estimate individual-level models for problems too difficult for previous algorithms. But the added power and accuracy of the models come at the cost of run times often measured in multiple hours or even days. HB is an iterative process, and it has been argued that tens of thousands of iterations may be necessary before the algorithm converges on relatively stable estimates.

Keith and Lihua investigated whether HB run times could be shortened without sacrificing predictive accuracy (as measured by hit rates for holdout tasks). Based on 22 real choice-based conjoint data sets, thousands of separate HB runs and tens of thousands of computing hours, Keith and Lihua concluded that 1,000 initial iterations and 1,000 saved draws per respondent are enough to maximize predictive accuracy of hit rates. The authors presented evidence that their findings with respect to choice-based conjoint also generalize to regression-based HB and HB for Adaptive Conjoint Analysis.
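
To make the "initial iterations" versus "saved draws" distinction concrete, here is a generic and deliberately simplified Metropolis sampler in Python. It is not Sawtooth Software's HB code; it only shows how the first 1,000 iterations are discarded as burn-in and the next 1,000 draws are averaged into point estimates:

    import numpy as np

    rng = np.random.default_rng(0)

    def metropolis(log_posterior, start, n_burn=1000, n_save=1000, step=0.5):
        """Random-walk Metropolis illustrating burn-in versus saved draws."""
        x = np.asarray(start, dtype=float)
        lp = log_posterior(x)
        saved = []
        for it in range(n_burn + n_save):
            proposal = x + rng.normal(scale=step, size=x.shape)
            lp_new = log_posterior(proposal)
            if np.log(rng.uniform()) < lp_new - lp:   # accept or reject the move
                x, lp = proposal, lp_new
            if it >= n_burn:                          # keep only post-burn-in draws
                saved.append(x.copy())
        saved = np.array(saved)
        return saved.mean(axis=0), saved              # point estimate, all draws

    # Toy target: a standard normal "posterior" in three dimensions
    point_estimate, draws = metropolis(lambda b: -0.5 * (b ** 2).sum(), np.zeros(3))
    print(point_estimate, draws.shape)                # near zero, (1000, 3)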

(* Honorable Mention, based on attendee ballots.)

Predictive Validation of Conjoint Analysis (Dick Wittink): Dick enumerated the conditions under which conjoint analysis predictions can be accurate reflections of actual market choices: respondents are a probability sample of the target market; respondents are decision makers; respondents are motivated to process conjoint tasks as they would in the marketplace; all relevant attributes are included; respondents understand the product category and attributes; respondents can be weighted by purchase volume; individual-level brand awareness and availability data are available.

Dick stressed that if the conditions are right, conjoint analysis can provide accurate forecasts of market behavior. However, most conjoint analysis projects do not have the benefit of actual purchase data, collected some time after the conjoint study. Rather, holdout tasks are added to the study. Dick argued that holdout tasks are easy to collect, but they are in many respects artificial and reflective of the same thought processes used in the conjoint judgements. Holdout task predictability, he maintained, is therefore biased upward relative to how well the conjoint model should predict actual behavior. Nonetheless, holdout tasks can be useful for comparing the suitability of different methods, modes and models. Dick advised that holdout tasks should be carefully designed to resemble marketplace alternatives, and so that no one option dominates.

Dick also shared some guiding principles gleaned from years of experience: if using aggregate measures of performance, such as the error in predicting shares, complex models (e.g. flexible functional forms, interaction effects between attributes) generally outperform simple models; on individual measures of performance, such as the percent of holdout choices predicted, simple model specifications often outperform complex models; constraints on part worths, as currently practiced, generally improve individual-level hit rates but usually damage aggregate share prediction accuracy; motivating respondents improves forecast accuracy.

Projecting Market Behavior for Complex Choice Decisions (Joel Huber): Joel presented the results of what he characterized as a very difficult CBC study: 21 attributes with 76 total levels, administered on paper to 1400 respondents. To cope with the difficulty of the design, Joel used partial-profile CBC tasks, in which only a subset of the 21 attributes were shown in each choice task. The subsets of attributes were not varied randomly, but were grouped into logical "clusters," with each group of attributes pertaining to similar aspects of the product. A fact sheet followed by a ranking task introduced each cluster. The "attribute clusters" were held constant across four choice tasks before shifting the focus to a new cluster of attributes. Joel designed the choice tasks to have approximate utility balance (using aggregate utilities from a pre-test). He used HB to estimate individual-level utilities.

Joel found that educating the respondents about each attribute cluster and holding those clusters constant for four tasks may have helped respondents become more familiar and comfortable with the tasks, but probably also created some biases. To avoid these problems, Joel suggested that researchers who plan to group attributes into clusters in partial-profile choice studies also include some choice sets containing attributes that cut across the attribute clusters. Joel concluded that HB can work well even for very large CBC problems. The results can be good when considering aggregate share predictions, even though the data may contain a significant amount of noise at the individual level.

Estimating the Part-Worths of Individual Customers: A Flexible New Approach (Kenneth Train): Kenneth described a method (MLCOIT, "Maximum Likelihood Conditioning On Individual Tastes") of estimating mixed logit models, which shares many characteristics and capabilities with the HB methods becoming familiar to marketing researchers. His approach uses maximum likelihood (rather than HB) to estimate the means and covariances of the population, and then samples from that distribution to estimate individual values. Given similar assumptions about the population distribution, MLCOIT produces results equivalent to HB. However, MLCOIT is very flexible and can deal easily with distributions other than the normal, such as triangular and uniform distributions.
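
The conditioning step can be simulated quite simply. The Python sketch below assumes a normal mixing distribution for brevity, even though a key advantage of Kenneth's approach is that other distributions work as well; the function and variable names are ours:

    import numpy as np

    rng = np.random.default_rng(1)

    def conditioned_tastes(pop_mean, pop_cov, tasks, choices, n_draws=2000):
        """Condition population estimates on one respondent's observed choices.

        pop_mean, pop_cov : population taste distribution, e.g. estimated by
                            maximum likelihood for a mixed logit model
        tasks   : (n_tasks, n_alts, n_params) coded tasks shown to the respondent
        choices : (n_tasks,) index of the alternative chosen in each task
        Returns a simulated posterior mean of this respondent's part worths:
        population draws weighted by how likely they make the observed choices.
        """
        draws = rng.multivariate_normal(pop_mean, pop_cov, size=n_draws)
        weights = np.ones(n_draws)
        for t, chosen in enumerate(choices):
            v = draws @ tasks[t].T                        # (n_draws, n_alts)
            p = np.exp(v - v.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)             # logit probabilities
            weights *= p[:, chosen]
        weights /= weights.sum()
        return weights @ draws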

Comparing Hierarchical Bayes Draws and Randomized First Choice for Conjoint Simulations (Bryan Orme & Gary Baker): A number of recent advances can improve part worth estimation and the accuracy of market simulators. The authors stated their opinion that HB estimation is the most valuable recent addition to the conjoint analyst's toolbox. Another advancement has been the introduction of Randomized First Choice (RFC) as a market simulation method.

Conjoint simulators have traditionally used part worths as point estimates of preference. HB results in multiple estimates of each respondent's preferences, called "draws," which might also be used in simulations. Both HB and Randomized First Choice reflect uncertainty (error distributions) about the part worths. RFC makes simplifying assumptions. The authors showed that HB draws, though theoretically more complete, have some unexpected properties, and the draws files can become enormous (on the order of 100 MB or more).

Bryan and Gary provided theoretical and practical justifications for RFC. They demonstrated that RFC using point estimates of preference performed slightly better than first choice or share of preference (logit) simulations on either point estimates or the draws files for one particular data set. Their findings suggest that draws files are not needed for market simulations, which is good news for in-the-trenches researchers. Using RFC with the much smaller and more manageable file of point estimates seems to work just as well or even better.
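
To illustrate the flavor of RFC using only point estimates, here is a stripped-down Python sketch. The error scales and distributions are arbitrary illustration values (in practice they are tuned, for example against holdout shares), and this is not the exact algorithm implemented in Sawtooth Software's simulators:

    import numpy as np

    rng = np.random.default_rng(2)

    def rfc_shares(part_worths, products, n_sims=500, attr_scale=0.3, prod_scale=0.1):
        """Randomized First Choice simulation from point-estimate part worths.

        part_worths : (n_resp, n_params) point estimates, one row per respondent
        products    : (n_products, n_params) coded product profiles
        Each iteration perturbs the part worths (attribute-level error) and the
        product utilities (product-level error), then tallies first choices.
        """
        n_resp, n_params = part_worths.shape
        n_prod = products.shape[0]
        counts = np.zeros(n_prod)
        for _ in range(n_sims):
            pw = part_worths + rng.gumbel(scale=attr_scale, size=(n_resp, n_params))
            u = pw @ products.T + rng.gumbel(scale=prod_scale, size=(n_resp, n_prod))
            counts += np.bincount(u.argmax(axis=1), minlength=n_prod)
        return counts / counts.sum()

    # Tiny synthetic example: 100 respondents, 3 products, 4 parameters
    shares = rfc_shares(rng.normal(size=(100, 4)),
                        rng.integers(0, 2, size=(3, 4)).astype(float))
    print(shares)   # simulated shares of preference, summing to 1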