Sawtooth Software: The Survey Software of Choice

Conference 1999: Summary of Findings

Nearly two dozen presentations were given at our most recent Sawtooth Software conference in San Diego. We've summarized some of the high points below. Since we cannot possibly convey the full worth of the papers in a few paragraphs, we are making the complete written papers (not the presentation slides) available in the 1999 Sawtooth Software Conference Proceedings. If you haven't yet ordered your copy, please consider adding this valuable reference to your shelf.

Evaluating the Representativeness of Online Convenience Samples (Karlan Witt): "The widespread adoption of the internet has provided a low-cost, rapid-turnaround alternative for conducting market research," Karlan explained. She warned against the notion that large sample sizes (in the thousands and tens of thousands) can alleviate problems of representativeness, citing the well-known erroneous prediction of 1936 presidential election results by Literary Digest due to a biased convenience sample.

Karlan compared the results of online convenience samples to those recruited through RDD (Random Digit Dialing). She found that the convenience sample differed significantly from RDD in terms of respondent demographics, online usage behavior, and brand share. She stressed that weighting the convenience sample to match known targets was not the solution. "In many cases it did make the data look more like the RDD data, but did not fully address the inherent non-response bias present in the online sample."
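For readers less familiar with weighting, the mechanics are simple, which is part of Karlan's point. Here is a minimal post-stratification sketch in Python (the age cells and targets are our own hypothetical illustration, not from her study): the weights can force the sample to match known demographic targets, but they say nothing about the people who never responded.

```python
# Minimal post-stratification weighting sketch (hypothetical cells and targets).
# Weighting can make a convenience sample *look* like the population on known
# demographics, but it cannot remove non-response bias on unmeasured traits.
from collections import Counter

respondents = ["18-34", "18-34", "18-34", "35-54", "55+"]          # observed sample cells
population_targets = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}   # known targets

n = len(respondents)
sample_share = {cell: count / n for cell, count in Counter(respondents).items()}

# Weight = target share / sample share, applied to every respondent in that cell.
weights = {cell: population_targets[cell] / share for cell, share in sample_share.items()}

for cell in population_targets:
    print(cell, round(weights.get(cell, 0.0), 2))
```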

Conjoint on the Web: Lessons Learned (Michael Foytik): Mike shared insights based on experience fielding both traditional full-profile (pairwise comparisons) and choice-based conjoint surveys over the Internet. Response rates to Internet surveys have exceeded response rates for mail surveys by 25% to 50%, though the novelty of taking web surveys will certainly wear off in the future and response rates will decline. Mike reported that the internal consistency and predictability of holdouts for Internet conjoint data have exceeded those of mail in split-sample tests.

For successful Internet conjoint surveys, Mike suggested making the survey adaptive in tailoring the consideration set or further customizing the conjoint tasks, assigning unique passwords, using clickable links to the survey in E-mail messages, providing a contact name and number on each Internet page, permitting restarts, and programming to the lowest common denominator among browsers.

But Why? Putting the Understanding into Conjoint (Ray Poynter): Ray pointed out that conjoint analysis has been a very useful technique for quantifying preferences for product features. Despite its popularity, Ray argued that this is often not enough.

Ray described a simple BASIC program his firm developed for "replaying" a completed ACA interview. The computer displays the tradeoff questions again on the screen together with the respondent's answer. A qualitative assessment is made together with the respondent regarding why he/she responded in a particular way. Ray explained how the replay feature can be used in designing the questionnaire to ensure that respondents understand the instrument and that the information gained will benefit the client.

Convention Interviewing: Convenience or Reality? (Don Marshall): Don reviewed his experiences with conducting pharmaceutical research using conjoint analysis. Doctors, he claimed, are usually good conjoint subjects due to their scientific background and familiarity with making tradeoffs between such issues as safety and efficacy. However, they are difficult to recruit and expensive to interview. "These medical conventions clearly present a wonderful opportunity to complete a large number of physician interviews in a relatively short time frame," Don explained, but ". . . they also have been criticized as providing a non-representative sample." Central facility interviews have been proposed as a more representative technique.

Using a split-sample study, Don found no significant difference between the part worths resulting from convention versus central site recruitment. "Overall, these results indicate that convention interviewing is an efficient and economic alternative to central facility interviewing when interviewing doctors. . ." he determined.

Disk-Based Mail Surveys: A Longitudinal Study of Practices and Results (Arthur Saltzman and William H. MacElroy): The authors conveyed results from two surveys (1994, 1998) among market research professionals. The move toward "personalization" of DBM surveys was one of the most noticeable trends, they noted. More researchers are pre-notifying and qualifying respondents and sending personal outbound envelopes and cover letters. In spite of the increase in personalization and targeting, response rates to DBM surveys are falling. The reported average response rate to DBM surveys was 52% in 1994 and 33% in 1998.

Researchers participating in the 1998 research said that they are becoming more interested in on-line (Internet) interviewing rather than DBM, citing "ease of implementation, quicker response time, lower cost." Other researchers noted the limitations and "newness" of the Web as obstacles and plan to continue using DBM in the future.

What Will Work Over There? Computer-Based Questionnaires in Foreign Languages (Brent Soo Hoo and Lori Heckmann): The authors provided an overview of the different character representation and computing protocols in different countries. Most Anglo/European based alphabets are based on the ASCII set of codes. Standard English-based software systems (including systems from Sawtooth Software) can manage these languages. However, many countries use "multi-stroke" double-byte characters, such as in Japan. Custom programming is required when fielding studies with double-byte characters. Foreign language projects also involve a host of other complicating factors. The authors provided an extensive list of supporting articles, Website references, and books.
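To make the single-byte versus double-byte distinction concrete, here is a small illustration of our own (not from the paper) showing how the same short text occupies different numbers of bytes under different encodings; software written with the one-byte-per-character assumption of ASCII will miscount, truncate, or corrupt such text.

```python
# Byte lengths of the same text under different encodings (illustrative only).
ascii_text = "Survey"
japanese_text = "アンケート"  # "survey/questionnaire" in Japanese katakana

print(len(ascii_text.encode("ascii")))         # 6 bytes: one byte per character
print(len(japanese_text.encode("shift_jis")))  # 10 bytes: two bytes per character
print(len(japanese_text.encode("utf-8")))      # 15 bytes: three bytes per character in UTF-8

# Code that assumes "1 character == 1 byte" breaks on double-byte text,
# which is why custom programming is required for such studies.
```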

Two Ordinal Regression Models (Tony Babinec): Tony explained that ordinal variables abound in marketing research. Purchase likelihood, satisfaction scores, and income groups are common examples. "It is probably a safe guess," noted Tony, "that many if not most researchers scale these variables using sequential integer scores . . . and then analyze them using conventional linear regression. Doing so involves the implicit assumption that the intervals between adjacent categories are equal."

Tony argued that there can be "serious statistical problems" if one uses OLS regression with ordinal data. Two methods proposed to deal with ordinal responses are the "cumulative logit model" and the "adjacent category logit model." Using a sample data set, Tony showed that "the adjacent-categories logit model can work well in situations where the cumulative logit model does not fit the data." "It is hoped," Tony concluded, "that this paper will help spur the wider use of this new modeling approach."
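For readers unfamiliar with the cumulative logit model, the sketch below (ours, with hypothetical thresholds and coefficient) shows how category probabilities arise as differences of cumulative probabilities, with the estimated thresholds absorbing unequal spacing between adjacent categories.

```python
# Conceptual sketch of the cumulative logit (proportional odds) model,
# not code from Tony's paper. Category probabilities are differences of
# cumulative probabilities; the thresholds absorb unequal category spacing.
import numpy as np

def cumulative_logit_probs(x, thresholds, beta):
    """P(Y = j | x) for an ordinal outcome with len(thresholds) + 1 categories."""
    # P(Y <= j | x) for each threshold alpha_j
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds) - beta * x)))
    cum = np.concatenate(([0.0], cum, [1.0]))   # anchor at 0 and 1
    return np.diff(cum)                         # category probabilities

# Hypothetical 4-point satisfaction scale with unevenly spaced thresholds
thresholds = [-2.0, -0.5, 1.8]
beta = 0.9
for x in (0.0, 1.0, 2.0):
    print(x, np.round(cumulative_logit_probs(x, thresholds, beta), 3))
```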

The Tandem Approach to K-Means Cluster Analysis and Some Alternatives: A Simulation Study (Andrew Elder): Deciding how to pre-treat the input variables (standardizing, centering, weighting, or factor analyzing) is a critical first step in any cluster analysis procedure. Andy reviewed some of the arguments made over the last few decades regarding whether to pre-process input variables through Principal Components analysis (the Tandem approach), or to use the variables in their raw form.

Andy used a synthetic data set to test alternative pre-processing approaches. He found that the Tandem approach excelled when there was a great degree of imbalance in the loading of input variables on independent factors; otherwise, when the variables provided relatively balanced representation across factors, raw scores performed better. Andy concluded that the decision to use the Tandem approach versus raw variables should be data driven and admonished researchers to investigate the structure and independence of the input variables through factor analysis before making that decision for a particular data set.
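The two routes Andy compared can be summarized in a few lines of code. The sketch below uses synthetic data of our own, not his simulation design, and the scikit-learn settings are illustrative assumptions.

```python
# Sketch of the two pre-processing routes: k-means on raw (standardized)
# ratings versus the "Tandem" approach of clustering principal-component scores.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))          # 300 respondents, 12 rating variables

X_std = StandardScaler().fit_transform(X)

# Route 1: cluster the raw (standardized) variables
raw_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_std)

# Route 2 (Tandem): reduce to principal components first, then cluster the scores
scores = PCA(n_components=4).fit_transform(X_std)
tandem_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)

# Inspecting the factor structure before choosing a route is the kind of
# diagnostic Andy recommended.
print(PCA(n_components=4).fit(X_std).explained_variance_ratio_)
```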

The Number of Levels Effect: A Proposed Solution (Dick McCullough): Dick reviewed past evidence on the Number of Levels (NOL) Effect: "The effect occurs when one attribute has more or fewer levels than other attributes. For example, if price were included in a study and defined to have five levels, price would appear more important than if price were defined to have two levels (holding the range of variation constant)."

Dick reported results from two studies. The first featured rank-order data for traditional conjoint, and confirmed the existence of the NOL effect. In the second study, Dick introduced a clever solution to the NOL problem involving a two-stage design. In the first half of the design, only the most and least desirable levels were included (customized per respondent, based on stated preferences). The second half used all levels in the full study design. Respondents first completed the equal-levels exercise, followed by the complete design. As Dick explained: "The full-level utility estimates can then be linearly scaled into the two-level estimates. The resulting utilities will exhibit the correct attribute relative importance and also maintain the relative positions of levels within each attribute."
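One way to read "linearly scaled into the two-level estimates" is sketched below with hypothetical numbers of our own: each attribute's full-design part-worths are stretched or shrunk so their best-to-worst spread matches the spread estimated in the equal-levels stage, preserving the ordering of levels within the attribute.

```python
# Illustrative reading of Dick's two-stage rescaling (hypothetical numbers):
# rescale each attribute's full-design part-worths so its best-to-worst range
# matches the range estimated in the two-level (equal-levels) stage.
import numpy as np

def rescale_attribute(full_levels, two_level_range):
    u = np.asarray(full_levels, dtype=float)
    u = u - u.mean()                              # zero-center the attribute
    current_range = u.max() - u.min()
    return u * (two_level_range / current_range)  # match the two-level spread

# Hypothetical five-level price attribute from the full design
price_full = [0.8, 0.5, 0.1, -0.4, -1.0]
# Best-minus-worst range for price from the two-level stage
price_two_level_range = 1.2

print(np.round(rescale_attribute(price_full, price_two_level_range), 3))
```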

Predicting Product Registration Card Response Rates with Conjoint Analysis (Paul Wollerman): Paul used conjoint analysis to predict response rates to the "warranty cards" that manufacturers place in their products. In the past, "live tests" had been conducted to measure the actual return rate for alternative versions of warranty cards. "Implementing these tests," Paul explained, "has always proved to be logistically difficult; plus, it takes a long time to obtain usable results."

A conjoint survey that included "mocked-up" product registration cards and pictures of the products in which those cards were supposedly packed was mailed to respondents. Respondents indicated how likely they were to return each card, had they purchased the displayed appliance or product.

The conjoint predictions provided, at face value, a reasonable relative fit with actual measured response rates for the same profiles from live tests. More importantly, Paul suggested that "the rank order and magnitude of the attributes' importance and the utility scores of the individual attributes are a good fit with our experience." Paul also noted that respondents grossly over-estimated their likelihood of returning cards relative to actual return rates. He concluded that "conjoint analysis can be used to measure consumers' preferences for product registration card design features, and predict how a given card will perform relative to another."

Conjoint Analysis on the Internet (Jill Johnson and John Fiedler): The authors provided a case study involving conjoint analysis over the Internet. The client was a major telecommunications company that wanted to measure preferences for a high-speed Internet access product. Johnson and Fiedler used Sawtooth Software's CVA Internet Module.

Respondents were recruited by telephone, and provided email addresses during the phone interview. The authors stressed how important it was to train telephone interviewers to record the email addresses exactly. Passwords were assigned for each respondent, and an email package called MailKing was used to send personalized email messages to each respondent.

The phone-recruit phase of the internet study was a key to the success of the project. According to the authors: "some at the company were skeptical of online methodology (due to concerns about representativeness). The initial telephone recruit combined with a 48% response rate served to allay any concerns."

Matching Candidates with Job Openings (Man Jit Singh and Sam Kingsley): The authors demonstrated how ACA is being used on the Web in matching potential job candidates with job openings. Singh and Kingsley are the first to develop a Web-enabled version of ACA based on Sawtooth Software's underlying code. To date, hundreds of thousands of applicants have completed their ACA survey on job preferences on the Web.

According to the authors, "the job search and recruiting business has traditionally relied on the 'telephone tree,'" followed by a detailed review by senior associates, follow-up interviews, offers and counter-offers. By collecting much information up-front in a standard survey, followed by a more detailed assessment of the tradeoffs (ACA) each applicant makes regarding compensation and other job-related elements, a large number of applicants can be sorted through efficiently.

Bargain Hunting or Star Gazing? How Consumers Choose Mutual Funds (Ronald T. Wilcox): Ron conveyed how choice-based conjoint analysis (and individual-level analysis under ICE) can be used to learn how consumers evaluate key attributes of a mutual fund. The weight consumers give to fees a fund charges can be used by fund managers to design the fee structures that will maximize utility for both the consumer and the fund manager.

Ron's research suggests new arenas in which conjoint analysis can be successfully employed. The findings can be useful for both government regulators of securities and mutual fund managers. Ron concluded, "We know so little, yet there could be truly substantial business practice and public policy implications resulting from an increased knowledge of consumers' decision processes in this marketplace."

Forecasting Scanner Data by Choice-Based Conjoint Models (Markus Feurstein and Martin Natter): The authors presented solid evidence that correctly tuned CBC models can accurately predict actual sales (as reported by scanner data). They also found that ignoring the existence of heterogeneity in choice-based conjoint modeling "leads to biased forecasts."

Alternative methods for capturing heterogeneity were tested: a priori segments (regional segmentation and usage frequency), K-means segments based on demographics, latent class, and individual-level estimation. The Latent Class model performed the best in predicting actual sales, followed by individual estimation. The other methods of recognizing heterogeneity only marginally improved predictions relative to the aggregate model, which performed the worst.
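A toy calculation (our own hypothetical utilities, not the authors' data) shows why ignoring heterogeneity can bias forecasts: logit shares computed within segments and then averaged generally differ from shares computed on averaged, aggregate-style utilities.

```python
# Toy illustration of heterogeneity bias (hypothetical utilities):
# segment-level logit shares averaged across segments versus logit shares
# computed on the pooled, "aggregate" utilities.
import numpy as np

def logit_shares(utilities):
    e = np.exp(utilities - utilities.max())
    return e / e.sum()

# Two equal-sized segments with opposing brand preferences for products A, B, C
seg1 = np.array([2.0, 0.0, -1.0])
seg2 = np.array([-1.5, 0.5, 2.0])

per_segment = 0.5 * logit_shares(seg1) + 0.5 * logit_shares(seg2)
aggregate   = logit_shares(0.5 * (seg1 + seg2))

print("segment-level:", np.round(per_segment, 3))
print("aggregate:    ", np.round(aggregate, 3))
```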

Predicting Actual Sales with CBC: How Capturing Heterogeneity Improves Results (Bryan Orme and Michael Heft): The authors presented evidence that, under proper conditions, conjoint analysis can accurately predict what buyers do in the real world. Their results were based on CBC interviews conducted in grocery stores, where the CBC results were used to predict actual sales for three product categories of packaged goods from those same stores with good success.

Additionally, Orme and Heft showed that capturing heterogeneity (reflecting differences in preference between groups or individuals) with Latent Class or ICE can improve predictions. Many complex effects (substitution, cross-effects and interactions) can be accounted for nearly "automatically" with disaggregate Main Effect models. They noted that complex terms can be built into large aggregate logit models, but that such models risk overfitting. Moreover, that approach places a great deal of responsibility on the analyst to choose the right combination of terms.

Using Scanner Panel Data to Validate Choice Model Estimates (Jay L. Weiner): The main purpose of Jay's paper was to "compare three forms of multinomial logit model estimation . . . main-effects only, main-effects with interactions and brand-specific parameter estimation." The key advantage to the latter approach would be to "allow the model to deal with the Independence of Irrelevant Alternatives (IIA) problem." Additionally, Jay addressed whether using a constant sum allocation (next 10 purchases) versus a First Choice approach significantly affected results in his discrete choice study.

Having IRI scanner data as a criterion for predictive validity was a strong component of Jay's paper. Jay found that "The main-effects only model seems to be quite robust and efficient for fast-moving consumer package goods." The choice results did a good job in predicting actual market shares, with a Mean Absolute Error of between 2% and 4%. "It is clear, however, that there are different price curves for each brand," he cautioned. "To gain key insight into these effects, estimating interactions works well." Jay also reported success with the constant-sum allocation model, noting that "the additional degrees of freedom gained from using 'next 10' purchases enhance the predictability of the model." Modeling brand-specific effects also slightly improved the predictive validity of the models, relative to main-effects and main-effects plus interactions models.

Should Choice Researchers Always Use "Pick One" Respondent Tasks? (Jon Pinnell): Jon presented results from a commercial study in which different respondents received four alternative types of choice tasks. He tested two non-metric approaches: First Choice and Full Rank Order; and two metric approaches: Constant Sum Allocation and "Scaling".

Jon reported that the First Choice task took the least time for respondents to complete, followed by Rank Order, Allocation and Scaling. Respondents could answer ten first choice tasks in the time it took to complete just short of six Allocation exercises. All methods resulted in similar utilities and inferred importances, but Jon found that the non-metric methods were more reliable than the metric approaches. He concluded: "The metric methods (especially Scaling) appear very expensive relative to first choice. For the time required, they offer apparently little information above the first choice utilities."

Assessing the Relative Efficiency of Fixed and Randomized Experimental Designs (Michael G. Mulhern): Much research has been presented regarding design efficiency in choice analysis, particularly for fixed designs. Mike presented results comparing fixed and randomized designs. The main question he focused on was the relative gains in efficiency as more versions of fixed designs were added to the pool. Mike commented that "There is some evidence that the total number of choice sets required may differ for symmetric and asymmetric designs. A symmetric design is an experimental design where each attribute contains the same number of levels. An asymmetric design contains attributes with varying numbers of levels."

Using computer-generated data, Mike assessed the relative efficiency of different designs. He found, "For the large symmetrical choice experiment, the randomized design is 95% as efficient as the optimal fixed design with approximately 190 choice sets. For the large asymmetrical choice experiment investigated in this study, it was found that the randomized design is approximately 14% more efficient with 980 choice sets than the fixed asymmetric design with 49 choice tasks." The conclusion that randomized designs can actually be more efficient than fixed orthogonal designs for asymmetric designs is an important finding that lends even greater credibility to the use of the easy-to-implement randomized designs in CBC.
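For readers curious how such comparisons are made, the rough sketch below (not Mike's procedure) computes the D-error of a choice design under the common simplifying assumption of null utilities; the relative D-efficiency of two designs is then the ratio of their D-errors. The design matrices here are hypothetical.

```python
# Rough sketch of D-error for a choice design under null utilities:
# with all betas = 0, each alternative in a J-alternative task has
# probability 1/J, and the information a task contributes is X'(diag(p) - pp')X.
import numpy as np

def d_error(choice_sets):
    """choice_sets: list of (J x K) coded design matrices, one per task."""
    K = choice_sets[0].shape[1]
    info = np.zeros((K, K))
    for X in choice_sets:
        J = X.shape[0]
        p = np.full(J, 1.0 / J)
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return np.linalg.det(info) ** (-1.0 / K)

rng = np.random.default_rng(1)
# Hypothetical randomized design: 20 tasks, 3 alternatives, 4 effects-coded columns
randomized = [rng.choice([-1.0, 0.0, 1.0], size=(3, 4)) for _ in range(20)]

print("D-error:", round(d_error(randomized), 4))
# Relative D-efficiency of design A versus design B = d_error(B) / d_error(A).
```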

Full Versus Partial Profile Choice Experiments: Aggregate and Disaggregate Comparisons (Keith Chrzan): In contrast to full-profile (FP) experiments in which concepts contain every attribute, partial-profile experiments (PP) show only a subset of the attributes (typically five) at a time. A key benefit, Keith argued, of PP choice designs is the ability to measure many more attributes than is traditionally thought prudent with FP choice designs.

In contrast to previous studies which have reported very little difference in the logit part-worths, he found some statistically significant differences between full- and partial-profile choice effects. Despite the differences, both full- and partial-profile simulations provided roughly equally accurate predictions of holdout choice shares, no matter the method of analysis.

Keith analyzed his data using three alternative methods: aggregate logit, ICE and Hierarchical Bayes. He found that utility estimates from aggregate logit and individual-level HB predicted aggregate holdout choice shares with reasonable accuracy, though neither method prevailed. ICE did not perform as well, leading Keith to speculate that "the relative sparseness of the partial-profile choice experiment data set could have been expected to hamper the operation of ICE." Keith reported that ICE predictions for the full-profile cell also were poor. ICE's poor showing, he explained, " . . . probably reflects ICE's instability when too few observations are collected from each individual respondent." An exciting implication of his study was that useful individual-level utilities may be possible from partial profile choice experiments.

Dealing with Product Similarity in Conjoint Simulations (Joel Huber, Bryan Orme and Richard Miller): The authors tackled a long-standing problem by offering a new simulation approach for dealing with product similarity. They explained that traditional conjoint simulators based on the BTL or logit model have suffered from IIA problems. Similar or identical products placed in IIA simulators tend to result in "share inflation." The first choice model, while not susceptible to the IIA difficulty of unrealistic share inflation for similar offerings, typically has produced shares of preference that are too extreme relative to real world behavior. Also, first choice models are inappropriate for use with logit or latent class models. In the family of Sawtooth Software products, a Model 3 "Correction for Product Similarity" has been offered to deal with problems stemming from product similarity. However, this model is often too simplistic to accurately reflect real world behavior.

The authors proposed a new method called "Randomized First Choice (RFC)" for tuning market simulators to real world behavior. RFC adds random variation to both attribute part-worths and to the product utility, and simulates respondent choices under the first choice rule. They demonstrated how RFC can be tuned to reflect any similar product substitution behavior between the extreme first choice rule and the IIA-grounded logit rule. They also showed that RFC improved predictions of holdout choice tasks (reflecting severe differences in product similarity) for logit, latent class, ICE and hierarchical Bayes. The greatest gains were for the aggregate methods. The disaggregate methods, while less in need of corrections for product similarity, still benefitted from RFC. This paper was voted "Most Valuable Presentation" by attendees at the conference.
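The general flavor of RFC, as described above, can be sketched in a few lines. The error distributions, magnitudes, and tuning below are our own illustrative assumptions, not the authors' implementation, which is detailed in the paper.

```python
# Sketch of the general Randomized First Choice idea: add random variation to
# the attribute part-worths and to total product utility, apply the first
# choice rule on each draw, and average the choices into shares. Gumbel errors
# and scales here are illustrative; in practice they are tuned to holdout data.
import numpy as np

def rfc_shares(partworths, products, attr_scale=0.5, prod_scale=0.5, draws=500, seed=0):
    """partworths: (respondents x parameters); products: (products x parameters) design codes."""
    rng = np.random.default_rng(seed)
    n_resp, n_par = partworths.shape
    n_prod = products.shape[0]
    tallies = np.zeros(n_prod)
    for _ in range(draws):
        # attribute-level variation, shared by all products within a draw
        betas = partworths + attr_scale * rng.gumbel(size=(n_resp, n_par))
        utils = betas @ products.T
        # product-level variation, independent across products
        utils += prod_scale * rng.gumbel(size=(n_resp, n_prod))
        tallies += np.bincount(utils.argmax(axis=1), minlength=n_prod)
    return tallies / tallies.sum()

# Hypothetical example: 100 respondents, 3 parameters, 3 products (two identical)
rng = np.random.default_rng(2)
pw = rng.normal(size=(100, 3))
prods = np.array([[1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0],   # duplicate of product 1
                  [0.0, 1.0, 0.0]])
print(np.round(rfc_shares(pw, prods), 3))  # the two identical products largely cannibalize each other
```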

A Comparison of Alternative Solutions to the Number-of-Levels Effect (Dick R. Wittink and P.B. Seetharaman): The authors restated the danger of the Number of Levels (NOL) effect. They reported on a ratings-based full profile study which showed that the insertion of two interior levels (holding the range constant) produced a greater effect on derived attribute importances than an expansion of the range to three times the original range (holding the number of levels constant). This demonstrated that the NOL effect is potentially huge, especially in full-profile data (whereas in ACA 3.0 it seems to be only half the magnitude).

They also reported on a methodological study involving both ACA (version 3) and CBC. The NOL effect was found to be more prevalent in CBC. A customized ACA approach was proposed in which the number of levels describing an attribute was related to a respondent's self-explicated importance score. They demonstrated that the customized version reduced the NOL effect and improved ACA version 3 results, though the authors noted that the NOL effect "in ACA 4.0 is much smaller than it is in ACA 3.0 which we used here."

Using LISREL and PLS to Measure Customer Satisfaction (Lynd D. Bacon): CSM (covariance structure models; most commonly modeled with software called LISREL) and PLS (Partial Least Squares) are two SEM methods that can be used to understand the drivers of customer satisfaction and to quantify their importance.

Lynd articulated the pros and cons of each technique, and concluded that CSM and PLS can provide distinct advantages as methods for analyzing satisfaction data. They both provide a means of estimating measurement error and reducing its biasing effects on the estimation of other quantities. Each method provides a means of explicitly modeling multicollinearity so that its deleterious effects on estimation can be reduced. They also differ from each other in important ways. The CSM approach typically requires continuous data and assumptions about their distribution. When uncertain about which method should be used, Lynd suggested applying both procedures to study their discrepancies, and then deciding which is more useful for your application.

Product Mapping with Perceptions and Preferences (Rich Johnson): Rich reviewed the history of perceptual mapping as it relates to marketing research. He noted that approaches have been developed to map products or objects based on preferences and on perceptions, but seldom have both elements been combined in product mapping. "Maps based on perceptions are easy to interpret and good at conveying insights, but they are often less good at predicting individual preferences," Rich explained. "Maps based on preferences are better at accounting for preferences, but their dimensions are sometimes hard to interpret."

Rich presented a new method that he termed "Composite Product Mapping" that combines both perceptions of brands on attributes and preferences among brands. The perceptual information results from attribute ratings for brands, and the preference information can come from a variety of sources, including pairwise judgments or conjoint part worths.

He demonstrated that the composite methods often result in maps that closely resemble discriminant-based perceptual maps, but that the attribute vectors and product positions are better linked to preferences. Rich concluded, "All in all, there seems to be no downside to using composite mapping methods, and the benefit of possibly improved interpretation and prediction can be great."