
We are pleased to announce the release of the 2015 Sawtooth Software Conference Proceedings. Our eighteenth conference was held in Orlando, Florida, March 25–27, 2015. Click here to download the full conference proceedings.

The summaries below capture some of the main points of the presentations and provide a quick overview of the articles available within the proceedings.

Mobile Choice Modeling: A Paradigm Switch

(Dirk Huisman and Jeroen Hardon, SKIM Group): Jeroen and Dirk reviewed the history of market research data collection, which has evolved from paper to CATI, then to web-based, and most recently to mobile interviewing. Each shift in data collection methodology was met with resistance, but each change has largely been validated. The most recent challenge is that the move to mobile interviewing means dealing with smaller screen sizes and shorter respondent attention spans. Yet a great benefit of device-based interviewing is that it allows researchers to test many elements of the modern web-based marketplace. Website commerce can be near-perfectly imitated within market research surveys, allowing researchers to test website modifications that can have an immediate positive impact on sales. The modifications can go beyond simple A vs. B comparisons: conjoint experimental designs can simultaneously test a variety of aspects of an offering to determine the best combinations of changes for improving conversion rates.

MaxDiff on Mobile

(Jing Yeh and Louise Hanlon, Millward Brown): At the same time that the use of MaxDiff for measuring the importance or preference of items is increasing, so is the prevalence of respondents taking surveys on mobile devices. Jing and Louise studied the impact of asking MaxDiff questions on mobile devices and examined ways to make MaxDiff surveys work well irrespective of the device used to display the survey. The authors compared MaxDiff scores between interviews taken on PCs, tablets, and smartphones. After pulling demographically matched samples, the MaxDiff scores were nearly identical irrespective of the device on which the survey was completed. They did find that different interviewing devices tended to be used by different types of people, and that MaxDiff surveys on smartphones took longer to complete than on PCs and had higher dropout rates. They concluded that the devices themselves did not affect substantive results, but that the nuances of the demographic groups represented by each device should not be overlooked.

A Forecaster's Guide to the Future: How to Make Better Predictions

(David Bakken, Foreseeable Futures Group): In his presentation, David described why he felt predictions and forecasts often fail. Among the reasons, he argued, are methods that rely entirely on historical associations, too many assumptions, the lack of a causal model to explain how a particular future comes about, and models that are either too simple or too complex. David described how agent-based models may be used to predict emergent behavior that depends on interactions among agents (such as consumers and sellers) as well as between agents and their environment. For example, the Bass diffusion model can be realized as an agent-based simulation that explicitly models word-of-mouth networks. The best models, said Bakken, involve disaggregate (typically individual-level) approaches that use the simplest model able to capture the important behavior of the system of interest.
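
To make the agent-based idea concrete, here is a minimal Python sketch of a Bass-style diffusion in which word of mouth spreads over a random contact network; the network, the external-influence rate p, and the word-of-mouth rate q are illustrative assumptions, not David's actual model.

```python
import random

# Minimal agent-based sketch of Bass-style diffusion (illustrative only).
# p: probability of adopting from external influence (e.g., advertising)
# q: probability of adopting per adopting contact (word of mouth)
random.seed(42)
N, p, q, periods = 1000, 0.03, 0.02, 25
contacts = [random.sample(range(N), 10) for _ in range(N)]  # simple random network
adopted = [False] * N
history = []

for t in range(periods):
    new_adopters = []
    for i in range(N):
        if adopted[i]:
            continue
        adopting_neighbors = sum(adopted[j] for j in contacts[i])
        prob = p + q * adopting_neighbors  # external + word-of-mouth pressure
        if random.random() < min(prob, 1.0):
            new_adopters.append(i)
    for i in new_adopters:
        adopted[i] = True
    history.append(sum(adopted))

print(history)  # cumulative adopters by period (an S-shaped curve)
```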

Wallet Economics?: Credit Card Choice Based Conjoint—Beyond Preference and Application

(Dimitry Estrin, Michelle Walkey, Vision Critical; Vidya Subramani, Client Bank; Carla Wilson, VISA; Jang Tang and Rosanna Mau, Vision Critical): The authors described a conjoint analysis approach to create portfolios of credit card offerings that not only appeal to customers but are also profitable to the firm. Credit card issuers make money mostly via transaction fees, annual fees, and interest charges; their costs include acquiring customers, paying out rewards, and covering redemption costs. Although traditional CBC research can identify the proportion of respondents likely to adopt a credit card, CBC alone does not indicate how customers will use the cards, which directly impacts profitability. So the authors modified CBC surveys to include additional questions. Respondents were asked how much they would spend per month on the credit card shown on the screen. This information was bridged with information respondents provided regarding the types of expenditures they would make on a new credit card, depending on the reward level for different merchant categories. The authors built an integrated simulation that predicted the likelihood of adopting the cards, expenditures, the rewards each respondent would receive, probable attrition, and rewards not redeemed. The model validated well against actual known figures for monthly profit per card as well as other metrics. In sum, they were pleased to have built a model that balanced the often-conflicting goals of appealing to consumers and maintaining profit margins.
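
As a rough illustration of how adoption likelihood, spending, rewards, and attrition might be folded into a single profit figure, here is a hypothetical per-respondent sketch; every field, rate, and cost below is an assumption for illustration, not the authors' actual simulator.

```python
# Hypothetical sketch of an integrated card-profitability calculation per respondent.
respondents = [
    {"adopt_prob": 0.42, "monthly_spend": 850.0, "reward_rate": 0.015,
     "redemption_rate": 0.80, "annual_fee": 95.0, "attrition_prob": 0.10},
    {"adopt_prob": 0.18, "monthly_spend": 300.0, "reward_rate": 0.010,
     "redemption_rate": 0.60, "annual_fee": 0.0, "attrition_prob": 0.25},
]
INTERCHANGE = 0.02       # issuer revenue per dollar of spend (assumed)
ACQUISITION_COST = 120   # one-time cost per acquired customer (assumed)

def expected_annual_profit(r):
    revenue = 12 * r["monthly_spend"] * INTERCHANGE + r["annual_fee"]
    rewards_cost = 12 * r["monthly_spend"] * r["reward_rate"] * r["redemption_rate"]
    retained = 1.0 - r["attrition_prob"]
    return r["adopt_prob"] * (retained * (revenue - rewards_cost) - ACQUISITION_COST)

portfolio_profit = sum(expected_annual_profit(r) for r in respondents)
print(round(portfolio_profit, 2))
```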

Conjoint for Financial Products: The Example of Annuities

(Suzanne B. Shu, Robert Zeithammer, UCLA and John Payne, Duke University): For many people in the US, annuities seem to offer an opportunity to reduce risk and insure against outliving their savings. However, few people actually purchase annuities, and when they do, they often make poor decisions among the different annuity offerings. The authors employed conjoint analysis to study and make recommendations regarding how insurance companies should market annuities to consumers and how regulators can help consumers make better decisions. They found that if insurance companies were required to give concrete information ("do the math") regarding the expected payoff value of different annuity packages, consumers would be able to make better decisions that benefited them. At the same time, insurance companies that voluntarily "do the math" for consumers and show the payout rates of different annuities could gain an edge by increasing the likelihood that consumers purchase them.
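
A simple "do the math" illustration: the expected discounted payout of a life annuity can be computed from an assumed survival curve and discount rate. The payment, survival probability, and discount rate below are purely illustrative, not figures from the paper.

```python
# "Do the math" illustration: expected discounted payout of a simple life annuity.
def expected_payout(annual_payment, start_age, max_age=100,
                    annual_survival=0.96, discount_rate=0.03):
    total, alive_prob = 0.0, 1.0
    for year, age in enumerate(range(start_age, max_age + 1)):
        total += alive_prob * annual_payment / (1 + discount_rate) ** year
        alive_prob *= annual_survival  # chance of surviving to the next payment
    return total

premium = 100_000
payout = expected_payout(annual_payment=6_500, start_age=65)
print(f"Expected discounted payout: ${payout:,.0f} on a ${premium:,} premium")
```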

Comparing Message Bundle Optimization Methods: Should Interactions Be Addressed Directly?

(Dimitri Liakhovitski, GfK; Faina Shmulyian, MetrixLab and Tatiana Koudinova, GfK): Dimitri and his co-authors examined multiple methods for finding near-optimal bundles of messages for promoting a product or service. The approaches involved MaxDiff, traditional rating scales, and two variations of choice-based conjoint (CBC). They analyzed the MaxDiff results two ways: by simply summing the MaxDiff preference scores and by defining reach and applying TURF. They analyzed the rating scale results using TURF, and they analyzed the CBCs with and without interaction terms. They found that the TURF-based procedures performed the worst of the approaches investigated, most likely because TURF does not directly address semantic synergies among messages. Simply summing the MaxDiff preference scores worked better than the TURF-based approaches. The best method among those they tested was CBC with interaction terms. This does not mean that TURF is inappropriate for other optimization problems (e.g., line optimization), the authors concluded; it is just that TURF is not best suited for message bundling applications.
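
The two MaxDiff-based scoring ideas can be contrasted with a small sketch, using made-up individual-level scores and an assumed reach threshold; it only shows the difference between summing preference scores and computing TURF-style reach, and does not reproduce the authors' analysis.

```python
from itertools import combinations

# Contrast of two bundle-scoring ideas (illustrative scores, not study data):
# (a) sum individual-level MaxDiff preference scores across messages in a bundle,
# (b) TURF-style reach: share of respondents "reached" by at least one message
#     whose score clears an assumed threshold.
scores = {  # respondent -> MaxDiff scores for four messages
    "r1": [0.9, 0.1, 0.5, 0.2],
    "r2": [0.2, 0.8, 0.4, 0.7],
    "r3": [0.3, 0.2, 0.9, 0.6],
}
THRESHOLD = 0.7  # assumed definition of "reached"

def summed_score(bundle):
    return sum(sum(s[m] for m in bundle) for s in scores.values())

def reach(bundle):
    hit = [any(s[m] >= THRESHOLD for m in bundle) for s in scores.values()]
    return sum(hit) / len(scores)

for bundle in combinations(range(4), 2):
    print(bundle, round(summed_score(bundle), 2), round(reach(bundle), 2))
```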

Using TURF Analysis to Optimize Reward Portfolios

(Paul Johnson and Kyle Griffin, Survey Sampling International): Paul and Kyle described how Survey Sampling International (SSI), like other panel providers, faces the challenge of keeping its panelists happy and involved in the panel. The rewards SSI offers its panelists are a key lever for improving retention and activity rates. Rewards are most often given in the form of gift certificates that may be redeemed at a variety of retail partners. However, managing such a program is costly for SSI: costs include attracting new panelists, purchasing gift certificates, and managing gift certificate inventory (some certificates may expire if not used in time), offset in part by volume discounts some retailers provide to SSI. The authors used questionnaires (employing both MaxDiff+TURF and a self-explicated TURF approach) to ask respondents their preferences for gift certificates from different retailers. Their analysis pointed to streamlined portfolios of retailers that could satisfy panelists while reducing SSI's cost of managing the rewards program. They were able to compare the survey results (stated preferences for retailers) with the actual choice behavior (gift cards picked) of these same respondents and found excellent validation for the survey results. The findings will allow SSI to provide better rewards for its panelists while potentially lowering the costs of managing the panel.

Bandit Adaptive MaxDiff Designs for Huge Number of Items

(Kenneth Fairchild, Bryan Orme, Sawtooth Software, Inc. and Eric Schwartz, University of Michigan): Sometimes researchers use MaxDiff to find the best few items among a large list of items. In such situations, traditional level-balanced MaxDiff designs are inefficient, spending a lot of respondent effort evaluating the least desirable items. A statistical approach called Thompson Sampling has been used to solve "bandit" problems (so called because of the textbook example of maximizing the payout when playing multiple slot machines, also known as "one-armed bandits"). It turns out that the same theory may be applied to MaxDiff problems involving huge numbers of items. After a few respondents have been interviewed using MaxDiff, aggregate logit may be used to estimate the means and standard errors for the items in the list. Thompson Sampling based on the aggregate logit parameters is then used to oversample the most-preferred items for subsequent respondents. The logit results are updated continuously, in real time, as new respondents are interviewed. Using robotic respondents answering with realistic preference functions and error, the authors demonstrated that the bandit MaxDiff approach can be as much as 4x more efficient at identifying the top few items than the traditional level-balanced approach.
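
A minimal sketch of the Thompson Sampling step, assuming aggregate logit means and standard errors are already in hand (the item utilities below are made up): each item's utility is drawn from a normal distribution centered on its estimate, and the items with the highest draws are shown next, so better items get oversampled over time.

```python
import random

# Sketch of Thompson Sampling for bandit MaxDiff item selection.
random.seed(7)
items = {            # item -> (aggregate logit utility, standard error), illustrative
    "A": (1.2, 0.15), "B": (0.9, 0.20), "C": (0.1, 0.25),
    "D": (-0.4, 0.30), "E": (-1.1, 0.35),
}

def thompson_pick(n_show):
    """Draw one sample per item from Normal(mean, se); show the top draws."""
    draws = {k: random.gauss(mu, se) for k, (mu, se) in items.items()}
    return sorted(draws, key=draws.get, reverse=True)[:n_show]

# Items with higher (and more uncertain) utilities get shown more often.
counts = {k: 0 for k in items}
for _ in range(1000):
    for k in thompson_pick(n_show=3):
        counts[k] += 1
print(counts)
```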

What Is the Right Size for My MaxDiff Study?

(Stan Lipovetsky, Dimitri Liakhovitski and Mike Conklin, GfK North America): MaxDiff studies are commonly employed nowadays, and the authors developed a theory for sample size planning. They conducted various simulations to validate their proposed formula for estimating the needed sample size. The results give practitioners a tool to use when planning sample sizes for MaxDiff studies.

"Performance, Motivation and Ability"—Testing a Pay-for-Performance Incentive Mechanism for Conjoint Analysis

(Philip Sipos and Markus Voeth, University of Hohenheim): For a number of years now, researchers have proposed ways to motivate respondents to provide more truthful, higher-quality conjoint analysis data. These efforts are grouped under the term incentive alignment. The central idea is to give respondents rewards or other motivations so that they realize there is a consequence for their choices in a conjoint questionnaire and act in their self-interest, which in turn provides better data to the researcher. Previous researchers have rewarded respondents with products that either exactly matched profiles they picked in conjoint questionnaires or were close fits to the choices they made. But Philip and Markus pointed out that such rewards are not always feasible in market research, particularly when the cost of the product or service involved is prohibitive. They proposed giving respondents a higher payout (incentive) based on their performance in the conjoint interview, where performance can be measured in terms of internal consistency or hit rate. College students served as respondents to a conjoint analysis survey in which some received additional payment based on performance. The authors found a statistically significant improvement in holdout predictive validity among respondents who were given the additional performance-based incentive. Moreover, Philip and Markus demonstrated that, though motivation is an important factor for enhancing performance, high performance also depends on respondents' ability to make the cognitive effort required during a conjoint task.
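
For readers unfamiliar with the hit-rate measure, here is a minimal sketch of scoring holdout predictions from part-worth utilities; the utilities and holdout tasks are hypothetical.

```python
# Sketch of a holdout hit-rate calculation (one common "performance" measure).
def predict_choice(utilities, concepts):
    """Pick the concept with the highest summed part-worth utility."""
    totals = [sum(utilities[level] for level in concept) for concept in concepts]
    return totals.index(max(totals))

utilities = {"brandA": 0.8, "brandB": 0.2, "low_price": 0.9, "high_price": -0.9}
holdouts = [
    {"concepts": [["brandA", "high_price"], ["brandB", "low_price"]], "chosen": 1},
    {"concepts": [["brandA", "low_price"], ["brandB", "high_price"]], "chosen": 0},
]
hits = sum(predict_choice(utilities, h["concepts"]) == h["chosen"] for h in holdouts)
print(f"Hit rate: {hits / len(holdouts):.0%}")
```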

Perceptual Choice Experiments: Enhancing CBC to Get from Which to Why

(Bryan Orme, Sawtooth Software, Inc.): Traditional CBC simulators tell us which products are preferred, but provide no insights into why they are preferred. Bryan introduced perceptual choice experiments as a way to enhance traditional CBC simulators to give greater insight into respondents' perceptions, motivations, and attitudes toward the product concepts defined in market simulation scenarios. The approach involves adding perceptual pick-any agreement questions beneath the standard CBC questions: for each product concept, respondents click whether they agree that it is associated with given perceptual items. Bryan used aggregate logit to build models that predict the likelihood that respondents would agree that any product concept (defined using the attributes and levels of the CBC experiment) is associated with each of many perceptual items such as fun, creates memories, educates, etc. The agreement scores may be shown as interactive heat maps within Excel-based market simulators. The main drawbacks are that it takes roughly double the respondent effort to complete CBC surveys that include the perceptual agreement questions, and that the sample sizes needed to stabilize the perceptual models can add to data collection costs. Some good news may counteract the bad, however: Bryan's empirical test suggested that respondents may give better CBC data when they are additionally asked the perceptual choice agreement questions.
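
A sketch of how such a perceptual model might be scored inside a simulator, assuming a simple aggregate binary logit with illustrative coefficients (not Bryan's estimates): the predicted probability of agreeing that a concept is "fun" is the inverse logit of the summed level coefficients.

```python
import math

# Scoring one perceptual item ("fun") for any simulated concept (illustrative).
coef_fun = {  # hypothetical logit coefficients for agreement with "fun"
    "intercept": -1.0,
    "brandA": 0.6, "brandB": -0.2,
    "price_low": 0.3, "price_high": -0.4,
}

def p_agree(concept_levels, coefs):
    z = coefs["intercept"] + sum(coefs[l] for l in concept_levels)
    return 1.0 / (1.0 + math.exp(-z))  # inverse logit

print(round(p_agree(["brandA", "price_low"], coef_fun), 3))   # one heat-map cell
print(round(p_agree(["brandB", "price_high"], coef_fun), 3))
```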

Profile CBC: Using Conjoint Analysis for Consumer Profiles

(Chris Chapman, Kate Krontiris and John S. Webb, Google): Product design teams in technology often use qualitative research to develop consumer descriptions (often known as "personas") for design inspiration and targeting. For example, a persona may read as, "Kathleen is 33 years old and is a stay-at-home mom with two children . . ." While a persona may be a good way for managers to attach a memorable description to a market segment made up of many people, few consumers will fit the exact description for every attribute. This makes it difficult to size the market for a target persona. Google Social Impact was interested in quantifying what weighted percent of respondents at least approximately fit into different personas they had already developed in qualitative field research regarding engagement with elections and civic life. The authors developed a conjoint analysis study with attributes derived from the qualitative personas, such as: I'm not working or in school right now; I spend as much time with my family as I can; I try to do as much civic engagement as I can, etc. Respondents saw partial-profile CBC tasks and picked within each task the concept that best represented them. The authors used latent class analysis to identify six key civic profiles, which gave composite class descriptions and market sizing. They concluded by discussing CBC design principles they suggest for such research, including response format, number of levels shown, and number of concepts.

RUM and RRM—Improving the Predictive Validity of Conjoint Results?

(Jeroen Hardon and Kees van der Wagt, SKIM Group): Kees and Jeroen provided a useful overview of the differences between Random Utility Modeling (RUM, the additive compensatory model) and Random Regret Modeling (RRM, a relatively new method of modeling CBC experiments that is not bound by IIA and may be estimated with most any commercial logit-based utility estimation routine, including CBC/HB). RRM posits that respondents pick the concept within a task that minimizes their regret for not getting aspects that were better in the competing concepts. Kees and Jeroen compared the predictive validity of the different models across a few CBC studies, finding mixed results. They also investigated a hybrid model that incorporated both RUM and RRM characteristics. The hybrid model also produced mixed results, with some multicollinearity challenges to overcome (which the authors handled via additional utility constraints in CBC/HB). Although RRM seems promising for certain kinds of product categories and applications, it doesn't always lead to better models than the standard RUM specification. The hybrid approach would seem to offer some of the benefits of RRM, but could be a safer approach because it leverages the robust RUM model. One challenge for RRM modeling is that only ordered attributes (like speed and price) may be RRM-coded.
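
For reference, here is a sketch of the classical random-regret choice rule (in the spirit of the standard RRM specification), with illustrative attribute levels and coefficients: an alternative accumulates regret whenever a competitor beats it on an attribute, and choice probabilities come from a logit on the negative regrets.

```python
import math

# Sketch of the classical random-regret (RRM) choice rule, illustrative values only.
# Only ordered attributes are coded (here: speed, and price entered as -price).
def regret(i, alternatives, betas):
    r = 0.0
    for j, alt in enumerate(alternatives):
        if j == i:
            continue
        for m, beta in enumerate(betas):
            # regret grows when alternative j beats alternative i on attribute m
            r += math.log(1.0 + math.exp(beta * (alt[m] - alternatives[i][m])))
    return r

def rrm_choice_probs(alternatives, betas):
    neg_r = [-regret(i, alternatives, betas) for i in range(len(alternatives))]
    denom = sum(math.exp(v) for v in neg_r)
    return [math.exp(v) / denom for v in neg_r]

# columns: speed (higher is better), negative price (higher is better)
alts = [(3.0, -2.0), (2.0, -1.0), (1.0, -0.5)]
print([round(p, 3) for p in rrm_choice_probs(alts, betas=(0.8, 0.6))])
```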

Capturing Individual Level Behavior in DCM

(Peter Kurz, TNS Infratest and Stefan Binner, bms market research + strategy): Peter and Stefan illustrated how, in larger DCM study designs, respondents can sometimes be observed following certain decision rules in their choice questionnaires even though the part-worth utilities estimated by HB suggest otherwise. The authors pointed out that sparse designs (many attribute levels to estimate relative to few choices made per respondent) result in quite a bit of Bayesian smoothing of respondents toward the population (or covariate) means. They showed simulations that varied the number of tasks per respondent: with fewer tasks, respondents' utilities are shrunk more toward the population means. Peter and Stefan pointed out that when the number of tasks is small and a respondent's personal preferences differ from those of the vast majority of respondents, the HB utilities for that respondent may appear to conflict with that individual's preferences, reverting instead toward the population preferences. They concluded by recommending that researchers be on the lookout for issues due to Bayesian smoothing and recognize that DCM models have great predictive accuracy for the total market but can have problems for small segments or niches. They suggested that if you expect sparse data for specific and important sub-segments, you should apply covariates and also oversample those sub-segments.
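
The shrinkage intuition can be shown with a simplified normal-normal analogy (not the authors' simulations): the individual-level estimate is a precision-weighted blend of the respondent's own data and the population mean, so fewer tasks means more pull toward the population.

```python
# Simplified normal-normal analogy to HB shrinkage; numbers are illustrative.
def shrunken_estimate(individual_mean, n_tasks, pop_mean,
                      task_variance=4.0, pop_variance=1.0):
    ind_precision = n_tasks / task_variance   # more tasks -> more individual precision
    pop_precision = 1.0 / pop_variance
    w = ind_precision / (ind_precision + pop_precision)
    return w * individual_mean + (1.0 - w) * pop_mean

for tasks in (2, 6, 12, 24):
    est = shrunken_estimate(individual_mean=2.0, n_tasks=tasks, pop_mean=-0.5)
    print(tasks, round(est, 2))  # estimate moves toward the individual's 2.0 as tasks increase
```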

Occasion Based Conjoint—Augmenting CBC Data to Improve Model Quality

(Björn Höfer and Susanne Müller, IPSOS): Björn and Susanne described how to enhance CBC questionnaires with additional questions about product-use occasions to deliver better insights. In addition to the standard CBC questions, respondents indicate which SKUs they use on different occasions (pick-any data) and how relevant each occasion is. Since the CBC data collection does not differ from standard CBC, the integration of occasions is shifted to the utility estimation and/or the preference share calculation. In their methodological comparison, they found that Occasion-Based Conjoint (OBC), although it improves the estimation of substitution effects (face validity), does not perform better than a standard volumetric CBC model in terms of internal and external validity criteria. Nevertheless, integrating occasions can be recommended for its more realistic estimates of new product sales potential and substitution effects, as well as for the additional insights into the motivations behind product choice that can support marketing strategy decisions.
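
One hypothetical way occasion relevance and pick-any usage data could enter a share calculation is sketched below; the shares, occasions, and weighting scheme are illustrative assumptions, not IPSOS's actual estimation approach.

```python
# Hypothetical sketch: average SKU preference shares over occasions, weighted by
# occasion relevance and restricted to SKUs used on that occasion (pick-any).
skus = ["sku1", "sku2", "sku3"]
pref_share = {"sku1": 0.5, "sku2": 0.3, "sku3": 0.2}   # from a CBC model (assumed)
occasions = [
    {"relevance": 0.7, "used": {"sku1", "sku2"}},       # e.g., everyday use
    {"relevance": 0.3, "used": {"sku3"}},               # e.g., special occasion
]

weighted = {s: 0.0 for s in skus}
for occ in occasions:
    considered = {s: pref_share[s] for s in occ["used"]}
    total = sum(considered.values())
    for s, p in considered.items():
        weighted[s] += occ["relevance"] * p / total     # renormalize within occasion

print({s: round(v, 3) for s, v in weighted.items()})
```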

Precise FMCG Market Modeling Using Advanced CBC

(Dmitry Belyakov, Synovate Comcon): CBC studies for Fast Moving Consumer Goods (FMCG) categories can become quite complex. Often there are a dozen or more offerings (SKUs) that differ in terms of brands, package sizes, product forms, and prices. Dmitry described different strategies for designing CBC questionnaires for complex FMCG studies and for modeling the data. Dmitry reported results for a simulation study involving consideration sets. With consideration-set designs, respondents see within the CBC tasks only the SKUs that they screen in (would consider). The coding method that compares both accepted and dropped SKUs (in a series of binary paired comparisons) to a threshold SKU parameter performed best among those he tested. The second challenge Dmitry described involved modeling price sensitivity for dozens of SKUs. Ideally, given sufficient data, the researcher would estimate a separate price slope for each SKU, but the data are typically too sparse to do a good job with this approach. Dmitry suggested grouping SKUs into just a few segments based on the slope of their aggregate logit alternative-specific price coefficients. The SKUs within the same segment can then be coded with a shared price slope, economizing on the total number of parameters for HB estimation while still capturing good SKU-based price information.
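
Here is a sketch of the grouping idea as described, with made-up price slopes and a simple binning rule standing in for whatever grouping method was actually used: SKUs with similar aggregate-logit price coefficients share one slope.

```python
# Sketch: group SKUs by their aggregate-logit price slopes, then share one
# price coefficient per group in HB estimation. Slopes are illustrative.
sku_price_slopes = {          # alternative-specific price coefficients (assumed)
    "skuA": -2.1, "skuB": -1.9, "skuC": -1.0,
    "skuD": -0.9, "skuE": -0.3, "skuF": -0.2,
}
N_GROUPS = 3

ordered = sorted(sku_price_slopes, key=sku_price_slopes.get)
group_size = -(-len(ordered) // N_GROUPS)   # ceiling division
groups = [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

for g, members in enumerate(groups, start=1):
    shared = sum(sku_price_slopes[s] for s in members) / len(members)
    print(f"group {g}: {members} -> shared price slope ~ {shared:.2f}")
```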

Defining the Employee Value Proposition

(Tim Glowa, Garry Spinks, and Allyson Kuper, Bug Insights): Tim and his co-authors described how conjoint analysis may be used to help retain employees by improving rewards packages. Designing rewards packages involves not only measuring what employees value but also balancing those desires against the costs of different programs. Tim suggested using best-worst conjoint, a conjoint-style variation on the traditional MaxDiff survey. With best-worst conjoint, respondents are shown an employment package (just like a conjoint profile, composed using one level from each of many attributes) and indicate which one level of that profile has the most positive impact on them and which has the least positive impact. Using logit-based analysis (e.g., logit, latent class MNL, HB), scores are estimated for each attribute level, similar to conjoint analysis. Some of the challenges of doing best-worst conjoint among employees are: 1) studies often are global, spanning multiple countries and languages; 2) sample sizes are large, sometimes 20,000+; 3) the subject matter is emotionally sensitive for respondents; and 4) employee anxiety and expectations must be managed.

Combining Latent-Class Choice, CART, and CBC/HB to Identify Significant Covariates in Model Estimation

(George Boomer, StatWizards LLC and Kiley Austin-Young, Comcast Corp.): George and his co-author Kiley explained that covariates are often important, for example, gender in the handbag market, income in the exotic car market, age in the market for geriatric medicine. They proposed an approach for identifying key covariates and incorporating them into a CBC simulation within a time frame that comports with practitioners' schedules.

Their approach makes use of three techniques applied to a common data set. First, CBC/HB is employed to produce a set of individual-level utilities. Second, a latent-class choice (LGC) estimation identifies groups of respondents who share a common set of utilities. Third, CART is used to improve upon LGC's covariate classification. Finally, the latent classes and significant covariates from modern data mining techniques are brought together in a common market simulator. The authors used both a simulated data set and a disguised, real-world example from the telecommunications industry to illustrate this approach.
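
The CART step might look roughly like the following sketch, which assumes latent-class assignments are already available and uses synthetic covariates; it simply fits a classification tree to surface the covariates that best predict class membership.

```python
# Sketch of the CART refinement step only: predict latent-class membership from
# covariates so the important covariates surface for the simulator. Synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
covariates = np.column_stack([
    rng.integers(18, 80, n),        # age
    rng.integers(0, 2, n),          # gender (0/1)
    rng.normal(60, 20, n),          # income (in $000s)
])
latent_class = (covariates[:, 0] > 45).astype(int) + rng.integers(0, 2, n)  # toy classes

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(covariates, latent_class)
print(dict(zip(["age", "gender", "income"], tree.feature_importances_.round(2))))
```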

Uncovering Customer Segments Based on What Matters Most to Each

(Ewa Nowakowska, GfK Custom Research North America and Joseph Retzer, Market Probe): Ewa and Joseph discussed an approach to clustering data called co-clustering. Co-clustering is an emerging method that simultaneously clusters two data entities, e.g., the rows and columns of a data matrix. Typically, factor analysis is used to find groupings of variables and cluster analysis to find groupings of cases; co-clustering finds groupings of variables and cases at the same time by taking into account the pairwise (dyadic) relationship between the two. Among other aspects of co-clustering, the authors demonstrated how the same respondent can simultaneously belong to multiple co-clusters and how a particular variable may be used to define more than one co-cluster. An illustration employing airline traveler data was reviewed; the variables included attitudinal and behavioral information about the respondent as well as customer satisfaction data regarding specific airlines. Co-clustering may be done within R using the "blockcluster" package.

Climbing the Content Ladder: How Product Platforms and Commonality Metrics Lead to Intuitive Product Strategies

(Scott Ferguson, North Carolina State University): One of the challenges of using conjoint analysis for product line optimization is that many of the solutions may not make sense from a business standpoint. Scott began by reviewing his previous work presented at the Sawtooth Software Conference on multi-objective search, which finds solutions that concurrently satisfy multiple goals such as profit and market share. Beyond satisfying multiple objectives, Scott's new work dealt with the problem of creating product portfolios that make sense in terms of their structure. It is less expensive to provide multiple products that share many characteristics, so a product portfolio with a lot of commonality that still reaches a variety of people is desirable. Even though imposing increased commonality on the solution space usually comes at the expense of other goals such as market share or profit, Scott reported that portfolios emphasizing commonality also tend to avoid extreme products.

A Machine-Learning Approach to Conjoint Analysis: Boosting and Blending Ensembles

(Kevin Lattery, SKIM Group): Kevin's presentation explored what might happen if machine learning enthusiasts analyzed conjoint analysis results. First, he pointed out the recent successes machine learning has had on prediction problems, notably the $1 million Netflix Prize. The grand prize winner, as well as all of the leading entries, used ensemble approaches. Ensemble analyses blend different, diverse predictive models to improve overall predictions; critical to their success is having a large number of quality, yet different, solutions to a prediction problem. Kevin demonstrated how ensembles of latent class solutions can improve prediction for conjoint analysis problems, even surpassing the predictive performance of HB (in terms of RLH across 3 holdout tasks in two different studies). He first generated diverse models using different random seeds, then pruned the models whose predictions correlated most highly with the others. Kevin then attempted to improve upon the randomly generated ensembles by using a boosting approach, trying several modifications of AdaBoost. His best boosted approach used a Q-function based on standardizing the likelihood across specific tasks. However, even this method did not improve over the ensembles generated from different random seeds.
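
Below is a sketch of the two ensemble mechanics described, equal-weight blending and correlation-based pruning, using synthetic predicted choice probabilities rather than actual latent class solutions.

```python
import numpy as np

# Sketch: prune ensemble members whose predictions correlate too highly with an
# already-kept member, then blend the rest. Predictions are synthetic stand-ins.
rng = np.random.default_rng(1)
n_models, n_obs = 8, 200
preds = rng.uniform(0.05, 0.95, size=(n_models, n_obs))  # stand-ins for P(choice)

def prune(preds, max_corr=0.95):
    kept = [0]
    for m in range(1, len(preds)):
        corrs = [abs(np.corrcoef(preds[m], preds[k])[0, 1]) for k in kept]
        if max(corrs) < max_corr:
            kept.append(m)
    return kept

kept = prune(preds)
blended = preds[kept].mean(axis=0)   # simple equal-weight blend of kept models
print(len(kept), blended[:5].round(3))
```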

The Unreliability of Stated Preferences When Needs and Wants Don't Match

(Marc R. Dotson and Greg M. Allenby, Fisher College of Business, The Ohio State University): Inviting respondents who actually have need states that lead to higher engagement in the interview yields significantly more reliable conjoint data. That's the conclusion Marc and Greg drew after applying a statistical model that explored the mechanism through which relevance (i.e., when needs and wants match) impacts consumer choice. They reported results for an empirical study with 567 respondents and concluded that failing to correct for unreliable respondents has the potential to introduce parameter bias. Screening out respondents who didn't have any of the needs met by the product category improved out-of-sample hit probability. The authors suggested that practitioners use stricter screening criteria to ensure that choice surveys are relevant to respondents, i.e., that respondents really have need states that put them in the market to consider and purchase the products in question.