Sawtooth Software: The Survey Software of Choice

Conference 2003: Summary of Findings

Nearly two-dozen presentations were delivered at the tenth Sawtooth Software Conference in San Antonio, TX. We've summarized some of the high points below. Since we cannot possibly convey the full worth of the papers in a few paragraphs, the authors have submitted complete written papers for the 2003 Sawtooth Software Conference Proceedings.

The Internet: Where Are We? And Where Do We Go from Here? (Donna J. Wydra, TNS Intersearch): The internet is increasingly becoming a key tool for market researchers in data collection and is enabling them to present more interesting and realistic stimuli to respondents. Internet-based research accounted for 20% of market research spending in 2002, and some estimates project that share to reach 40% by 2004. Although the base of US individuals with internet access is still skewed toward higher-income, employed, and younger groups, the online population is increasingly representative of the general population. Worldwide, internet usage by adults is highest in Denmark (63%), USA (62%), The Netherlands (61%), Canada (60%), Finland (59%) and Norway (58%).

Best practices for internet research include keeping the survey to 10 minutes or less and making it simple, fun, and interesting. Online open-ends are usually more complete (longer and more honest) than those collected by phone or on paper. Donna emphasized that researchers must respect respondents, who are our treasured resource. Researchers must ensure privacy, provide appropriate incentives, and say "Thank You." She encouraged the audience to pay attention to privacy laws, particularly when interviewing children. She predicted that as broadband access spreads, more research will be able to include video, 360-degree views of product concepts, and virtual shopping simulations. Cell phones have been touted as a promising new vehicle for survey research, especially as their connectivity and functionality with respect to the internet increase. However, due to small displays, per-minute charges for phone usage, and consumers' state of mind when using phones (short attention spans), this medium, she argued, is less promising than many have projected.

Sampling and the Internet (Expert Panel Discussion): This session featured representatives from three companies heavily involved in sampling over the internet: J. Michael Dennis (Knowledge Networks), Andrea Durning (SPSS MR, in alliance with AOL's Opinion Place), and Susan Hart (Synovate, formerly Market Facts). Two main concepts were discussed for reaching respondents online: River Sampling and Panels. River Sampling (e.g. Opinion Place) continuously invites respondents through banner ads, giving access to potentially millions of individuals. The idea is that a large river of potential respondents continually flows past the researcher, who dips a bucket into the water to sample a new set (in theory) of respondents each time. The benefits of River Sampling include its broad reach and its ability to contact difficult-to-find populations.

In contrast to River Sampling, panels may be thought of as dipping the researcher's bucket into a pool. Respondents belong to the pool of panelists and generally take multiple surveys per month. The Market Facts ePanel was developed in 1999, based on its extensive mail panel. Very detailed information is already known about the panelists, so much profiling information is available without incurring the cost of asking respondents each time. Approximately 25% of the panel is replaced annually. Challenges include shortages among minority households and lower-income groups, though data can be made more projectable by weighting. Knowledge Networks offers a different approach to the internet panel: panelists are recruited by more traditional means and then given Web TVs to access surveys. Benefits include better representation of all segments of the US (including low-income and low-education households). Among the three sources discussed, research costs for a typical study were most expensive for Knowledge Networks and least expensive for Opinion Place.

Online Qualitative Research from the Participants' Viewpoint (Theo Downes-Le Guin, Doxus LLC): Theo spoke of a relatively new approach for qualitative interviewing over the internet called "threaded discussions." The technique is based on bulletin board web technology. Respondents are recruited, typically by phone, to participate in an on-line discussion over a few days. The discussion is moderated by one or more moderators and can include up to 30 participants.

Some of the method's advantages are inherent in the technology: for example, unlike one-session internet focus groups, there is less bias toward people who type quickly and are spontaneously articulate. Participants also indicate that the method is convenient because they can come and go as their schedules dictate. Since the discussion happens over many days, respondents can consider issues more deeply and type information at their leisure. Respondents can see the comments from moderators and other participants, and respond directly to those previous messages. Theo explained that threaded discussion groups produce much more material that can be less influenced by dominant discussion members than traditional focus groups. However, like all internet methods, it is probably not appropriate for low-involvement topics. The challenge, as always, is to scan such large amounts of text and interpret the results.

Scaling Multiple Items: Monadic Ratings vs. Paired Comparisons (Bryan Orme, Sawtooth Software): Researchers are commonly asked to measure multiple items, such as the relative desirability of multiple brands or the importance of product features. The most commonly used method for measuring items is the monadic rating scale (e.g. rate "x" on a 1 to 10 scale). Bryan described the common problems with these simple rating scales: respondents tend to use only a few of the scale points, and respondents exhibit different scale use biases, such as the tendency to use either the upper part of the scale ("yea-sayers") or the lower end of the scale ("nay-sayers"). Lack of discrimination is often a problem with monadic ratings, and variance is a necessary element to permit comparisons among items or across segments on the items.

Bryan reviewed an old technique called paired comparisons that has been used by market researchers, but not nearly as frequently as the ubiquitous monadic rating. The method involves asking a series of questions such as "Do you prefer IBM or Dell?" or "Which is more important to you, clean floors or good tasting food?" The different items are systematically compared to one another in a balanced experimental plan. Bryan suggested that, using a cyclical plan, asking 1.5x as many paired comparison questions as there are items is sufficient to obtain reasonably stable estimates at the individual level (if using HB estimation). He reported evidence from two split-sample studies that demonstrated that paired comparisons work better than monadic ratings, resulting in greater between-item and between-respondent discrimination. The paired comparison data also had higher hit rates when predicting holdout observations.
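
As a rough illustration of how such a plan might be assembled, here is a minimal Python sketch (not Bryan's actual design algorithm) that gives each item three appearances, yielding 1.5x as many pairs as items; the brand names in the example are placeholders.

    import random

    def paired_comparison_plan(items, appearances=3, seed=0):
        # Each round shuffles the items and pairs them off, so every item appears
        # once per round; three rounds gives each item three appearances, i.e.
        # 1.5x as many pairs as items (an even number of items is assumed).
        rng = random.Random(seed)
        pairs = []
        for _ in range(appearances):
            order = list(items)
            rng.shuffle(order)
            pairs.extend((order[i], order[i + 1]) for i in range(0, len(order) - 1, 2))
        return pairs

    # Four hypothetical brands -> 6 pairs, each brand compared three times.
    print(paired_comparison_plan(["IBM", "Dell", "HP", "Gateway"]))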

* Maximum Difference Scaling: Improved Measures of Importance and Preference for Segmentation (Steve Cohen, Consultant): Steve's presentation picked up where Bryan Orme's presentation left off, extending the argument against monadic ratings for measuring preferences for objects or importances for attributes, but focusing on a newer and more sophisticated method called Maximum Difference (Best/Worst) Scaling. MaxDiff was first proposed by Jordan Louviere in the early 90s, as a new form of conjoint analysis. Steve focused on its use for measuring the preference among an array of multiple items (such as brands, or attribute features) rather than in a conjoint context, where items composing a whole product are viewed conjointly.

With a MaxDiff exercise, respondents are shown, for example, four items and asked which is the most and which is the least important/preferable. This task is repeated over a number of sets, with different items considered in each. Steve demonstrated that if four items (A, B, C, D) are presented, and the respondent indicates that A is best and D is worst, we learn five of the six possible paired comparisons from this task (A>B, A>C, A>D, B>D, C>D); only the B vs. C comparison remains unknown. Steve showed that MaxDiff can lead to even greater between-item discrimination and better predictive performance on holdout tasks than monadic ratings or even paired comparisons. Between-group discrimination was better for MaxDiff than for monadic ratings, but about on par with paired comparisons. Finally, Steve showed how using this more powerful tool for measuring the importance of items can lead to better segmentation studies, where the MaxDiff tasks are analyzed using latent class analysis.
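
The implied-pairs logic is easy to verify with a small sketch; the function below is purely illustrative and assumes items are identified by simple labels.

    from itertools import combinations

    def implied_pairs(shown, best, worst):
        # (winner, loser) relations implied by one best/worst task: the best item
        # beats every other item shown, and every item shown beats the worst.
        pairs = {(best, x) for x in shown if x != best}
        pairs |= {(x, worst) for x in shown if x not in (best, worst)}
        return pairs

    shown = ["A", "B", "C", "D"]
    print(sorted(implied_pairs(shown, best="A", worst="D")))  # the 5 implied pairs
    print(len(list(combinations(shown, 2))))                  # 6 possible pairs; B vs. C stays unknown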

(* Most Valuable Presentation award, based on attendee ballots.)

The Predictive Validity of Kruskal's Relative Importance Algorithm (Keith Chrzan & Joe Retzer, Maritz Research, and Jon Busbice, IMS America): The authors reviewed the problem of multicollinearity when estimating derived importance measures (drivers) for product/brand characteristics from multiple regression, where the items are used as independent variables and some measure of overall performance, preference, or loyalty is the dependent variable. Multicollinearity often leads to unstable estimates of betas, some of which can even take a negative sign (a negative impact on preference, loyalty, etc.) when the researcher hypothesizes that all attributes should have a positive impact.

Kruskal's algorithm involves investigating all possible orderings of the independent variables and averaging the betas across each condition of entry. For example, with three independent variables A, B, and C, there are six possible orderings for entry into the regression model: ABC, ACB, BAC, BCA, CAB, and CBA. The coefficient for variable A is therefore the average of the partial coefficients for A estimated within separate regression models containing the following independent variables: (A alone, which occurs twice), (B, A), (B, C, A), (C, A), and (C, B, A). The authors showed greater stability for coefficients measured in this manner, and also demonstrated greater predictive validity, in terms of hit rates for holdout respondents, for Kruskal's importance measure than for that from standard regression analysis.
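
A minimal sketch of the averaging-over-orderings bookkeeping described above (published treatments of Kruskal's measure often average squared partial correlations rather than raw betas, but the mechanics are the same); the function name is hypothetical.

    import numpy as np
    from itertools import permutations

    def average_betas_over_orderings(X, y, names):
        # For every ordering of the predictors, enter them one at a time and record
        # each variable's coefficient in the model containing it plus the variables
        # entered before it; then average those coefficients per variable.
        n_vars = X.shape[1]
        orderings = list(permutations(range(n_vars)))
        sums = dict.fromkeys(names, 0.0)
        for order in orderings:
            for pos in range(n_vars):
                cols = list(order[:pos + 1])                        # predictors entered so far
                design = np.column_stack([np.ones(len(y)), X[:, cols]])
                betas, *_ = np.linalg.lstsq(design, y, rcond=None)
                sums[names[order[pos]]] += betas[1 + pos]           # coefficient of the newest entrant
        return {name: total / len(orderings) for name, total in sums.items()}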

New Developments in Latent Class Choice Models (Jay Magidson, Statistical Innovations, Inc., Jeroen K. Vermunt, Tilburg University, and Thomas C. Eagle, Eagle Analytics, Inc.): Latent class analysis has emerged as an important and valuable way to model respondent preferences in ratings-based conjoint and CBC. Latent class is also valuable in more general contexts, where a dependent variable (whether discrete or continuous) is a function of one or more independent variables. Latent class simultaneously finds segments representing concentrations of individuals with identical beta weights (part worth utilities) and reports the beta weights by segment. Latent class assumes a discrete distribution of heterogeneity, as opposed to the continuous distribution assumed by HB.

Using output from a commercially available latent class tool called Latent GOLD Choice, Jay demonstrated the different options and ways of interpreting/reporting results. Some recent advances incorporated into this software include: the ability to deal with partial or full rankings within choice sets; monotonicity constraints for part worths; bootstrap p-values (to help determine the appropriate number of segments); inclusion of segment-based covariates; rescaled parameters and graphical displays; faster and better algorithms that switch to Newton-Raphson when close to convergence; and availability of individual coefficients (by weighting the group vectors by each respondent's probability of membership). The authors reported results for a real data set in which latent class and HB had very similar performance in predicting shares of choice for holdout tasks (among holdout respondents). But latent class is much faster than HB, and it directly provides insights regarding segments.
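
The parenthetical about individual coefficients amounts to a probability-weighted average of segment part worths, as in this small illustrative sketch (array names are assumptions, not Latent GOLD output conventions).

    import numpy as np

    def individual_part_worths(segment_betas, membership_probs):
        # segment_betas:    (n_segments, n_parameters) part worths by latent class
        # membership_probs: (n_respondents, n_segments) posterior membership probabilities
        # Each respondent's coefficients are the probability-weighted mix of segment vectors.
        return membership_probs @ segment_betas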

Archetypal Analysis: An Alternative Approach to Finding and Defining Segments (Jon Pinnell & David Cui, MarketVision Research, and Andy Elder, Momentum Research Group): The authors presented a method for segmentation called Archetypal Analysis. It is not a new technique, but it has yet to gain much traction in the market research community. Typical segmentation analysis often involves K-means clustering, whose goal is to group cases into clusters that are maximally similar within groups and maximally different between groups (in terms of Euclidean distance). The groups are formulated and almost always characterized in terms of their within-group means. In contrast, Archetypal Analysis seeks groups that are not defined principally by a concentration of similar cases, but that are closely related to particular extreme cases dispersed toward the furthest corners of the space defined by the input variables. These extreme cases are the archetypes. Archetypes are found by minimizing an objective function (the residual sum of squares) using an iterative least squares solution.
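
For readers who want a concrete picture, below is a minimal alternating-least-squares sketch of archetypal analysis (one common formulation; not necessarily the implementation the authors used), with hypothetical function names.

    import numpy as np
    from scipy.optimize import nnls

    def simplex_ls(F, y, penalty=200.0):
        # Solve min ||F w - y||^2 subject to w >= 0 and sum(w) = 1, by appending a
        # heavily weighted row that (approximately) enforces the sum-to-one constraint.
        F_aug = np.vstack([F, penalty * np.ones((1, F.shape[1]))])
        y_aug = np.append(y, penalty)
        w, _ = nnls(F_aug, y_aug)
        return w

    def archetypal_analysis(X, k, n_iter=50, seed=0):
        # X: (n_cases, n_vars) data matrix; k archetypes.
        # Alternating least squares: (1) express each case as a convex combination of
        # the current archetypes, (2) re-express each archetype as a convex combination
        # of the cases, minimizing the residual sum of squares ||X - A Z||^2.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Z = X[rng.choice(n, size=k, replace=False)]               # start from random cases
        for _ in range(n_iter):
            A = np.array([simplex_ls(Z.T, x) for x in X])         # (n, k) case weights
            targets = np.linalg.pinv(A) @ X                       # unconstrained archetype update
            B = np.array([simplex_ls(X.T, t) for t in targets])   # (k, n) archetype weights
            Z = B @ X                                             # archetypes stay in the data's convex hull
        rss = float(np.sum((X - A @ Z) ** 2))
        return Z, A, rss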

The strengths of the method are that it focuses on identifying more "pure" types, and those pure types reflect "aspirational" rather than average individuals. Segment means from archetypal analysis can show more discrimination on the input variables than traditional cluster segmentation. However, like K-means cluster routines, archetypal analysis is subject to local minima. It does not work as well in high-dimensional spaces, and it is particularly sensitive to outliers.

Trade-Off vs. Self-Explication in Choice Modeling: The Current Controversy (Lawrence D. Gibson, Eric Marder Associates, Inc.): Choice models and choice experiments are vital tools for marketing researchers and marketers, Larry argued. These methods yield the unambiguous, quantitative predictions needed to improve marketing decisions and avoid marketing disasters like those seen recently. Larry described how Eric Marder Associates has been using a controlled choice experiment called STEP for many years. STEP involves a sticker allocation among competing alternatives. Respondents are randomly divided into groups in a classic experimental design, with different price, package, or positioning statements used in the different test cells. Larry noted that this single-criterion-question, pure experiment avoids revealing the subject of the study, as opposed to conjoint analysis, which asks many criterion questions of each respondent.

Larry described a self-explicated choice model, called SUMM, which incorporates a complete "map" of attributes and levels as well as each respondent's subjective perceptions of the alternatives on the various attributes. Rather than traditional rating scales, Eric Marder Associates has developed an "unbounded" rating scale, in which respondents indicate liking by writing (or typing) "L's," or disliking by typing "D's" (as many "L" and "D" letters as desired). Each "L" adds +1 to "utility" and each "D" subtracts 1. Preferences are then combined with the respondents' idiosyncratic perceptions of the alternatives on the various features to produce an integrated choice simulator. Larry also shared a variety of evidence showing the validity of SUMM.

Larry argued that conjoint analysis lacks the interview capacity to realistically model the decision process: collecting each respondent's subjective perceptions of the brands and using a complete "map" of attributes and levels usually exceeds the limits of conjoint analysis. If simpler self-explication approaches such as SUMM can produce valid predictions, then why bother with trade-off data, Larry challenged the audience. He further questioned why conjoint analysis continues to attract overwhelming academic support while self-explication is ignored. Finally, Larry invited the audience to participate in a validation study to compare conjoint methods with SUMM.

Perspectives Based on 10 Years of HB in Marketing Research (Greg M. Allenby, Ohio State University, and Peter Rossi, University of Chicago): Greg began by introducing Bayes theorem, a method for accounting for uncertainty first put forward in 1764. Even though statisticians found it to be a useful concept, it was impractical to apply Bayes theorem to market research problems because of the inability to integrate over so many variables. But after influential papers in the 1980s and 1990s highlighted innovations in Markov Chain Monte Carlo (MCMC) algorithms, made possible by the availability of faster computers, the Bayes revolution was off and running. Even using the fastest computers available to academics in the early 1990s, mid-sized market research problems sometimes took days or weeks to solve. Initially, reactions were mixed within the market research community. A reviewer for a leading journal called HB "smoke and mirrors." Sawtooth Software's own Rich Johnson was skeptical regarding Greg's results for estimating conjoint part worths using MCMC.

By the late 1990s, hardware technology had advanced such that most market research problems could be done in reasonable time. Forums such as the AMA's ART, Ohio State's BAMMCONF, and the Sawtooth Software Conference further spread the HB gospel. Software programs, both commercial and freely distributed by academics, made HB more accessible to leading researchers and academics. Greg predicted that over the next 10 years, HB will enable researchers to develop richer models of consumer behavior. We will extend the standard preference models to incorporate more complex behavioral components, including screening rules in conjoint analysis (conjunctive, disjunctive, compensatory), satiation, scale usage, and inter-dependent preferences among consumers. New models will approach preference from the multitude of basic concerns and interests that give rise to needs. Common to all these problems is a dramatic increase in the number of explanatory variables. HB's ability to estimate truly large models at the disaggregate level, while simultaneously ensuring relatively stable parameters, is key to making all this happen over the next decade.

Partial Profile Discrete Choice: What's the Optimal Number of Attributes? (Michael Patterson, Probit Research, Inc. and Keith Chrzan, Maritz Research): Partial profile choice is a relatively new design approach for CBC that is becoming more widely used in the industry. In partial profile choice questions, respondents evaluate product alternatives on just a subset of the total attributes in the study. Since the attributes are systematically rotated into the questions, each respondent sees all attributes and attribute levels when all tasks in the questionnaire are considered. Partial profile choice, it is argued, permits researchers to study many more attributes than would be feasible using the full-profile approach (due to a reduction in respondent fatigue/confusion).
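
A simple way to picture the rotation is the sketch below, which assigns the least-shown attributes to each new task; it is only an illustrative randomized rotation, not the authors' design procedure.

    import random

    def partial_profile_tasks(n_attributes, attrs_per_task, n_tasks, seed=0):
        # Rotate attributes into tasks so that, across the questionnaire, every
        # attribute is shown roughly the same number of times: each task takes the
        # least-shown attributes, breaking ties at random.
        rng = random.Random(seed)
        counts = [0] * n_attributes
        tasks = []
        for _ in range(n_tasks):
            order = sorted(range(n_attributes), key=lambda a: (counts[a], rng.random()))
            shown = sorted(order[:attrs_per_task])
            for a in shown:
                counts[a] += 1
            tasks.append(shown)
        return tasks

    # e.g. 15 total attributes, 5 shown per task, 15 tasks -> each attribute shown 5 times
    print(partial_profile_tasks(15, 5, 15))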

Proponents of partial profile choice have generally suggested using about 5 attributes per choice question. This paper formally tested that guideline by varying the number of attributes shown per task in a 5-cell split-sample experiment. Respondents received either 3, 5, 7, 9 or 15 attributes per task, out of 15 total attributes studied. A None alternative was included in all cases. The findings indicate the highest overall efficiency (statistical efficiency plus respondent efficiency) and accuracy (holdout predictions) with 3 and 5 attributes. All performance measures, including completion rates, generally declined with larger numbers of attributes shown in each profile. The None parameter differed significantly depending on the number of attributes shown per task. The authors suggested that including a None in partial profile tasks is problematic, and probably ill advised. After accounting for the difference in the None parameter, there were only a few statistically significant differences across the design cells for the parameters.

Discrete Choice Experiments with an Online Consumer Panel (Chris Goglia, Critical Mix): Panels of respondents are often a rich source for testing specific hypotheses through methodological studies. Chris was able to tap into an online consumer panel to test some specific psychological, experimental, and usability issues for CBC. For the psychological aspect, Chris tested whether there might be differences if respondents saw brand attributes represented as text or as graphical logos. As for the experimental aspects, Chris tested whether "corner prohibitions" lead to more efficient designs and accurate results. Finally, Chris asked respondents to evaluate their experience with the different versions, to see if these manipulations altered the usability of the survey. The subject matter of the survey was choices among personal computers for home use.

Chris found no differences in the part worths or internal reliability whether brands were described by text or pictures. "Corner prohibitions" involve prohibiting combinations of the best and worst levels for two a priori ordered attributes, such as RAM and Processor Speed. For example, 128MB RAM is prohibited with 1 GHz speed, and 512MB RAM is prohibited with 2.2 GHz speed. Corner prohibitions reduce orthogonality, but increase utility balance within choice tasks. Chris found no differences in the part worths or internal reliability with corner prohibitions. Another interesting finding was that self-explicated importance questions using a 100-point allocation produced a substantially different estimate of the importance of brand (relative to the other attributes) than the importance of brand derived from the CBC experiment (relative to those same attributes). However, self-explicated ratings of the various brands produced results very similar to the relative part worths for those same brands derived from the CBC experiment. These results echo earlier cautions by many researchers regarding the value of asking a blanket "how important is..." question.
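
Reading the example above as prohibiting the two same-extreme corners, a small illustrative check might look like this (the function and level coding are assumptions, not Chris's implementation):

    def violates_corner_prohibition(level_a, level_b, n_levels_a, n_levels_b):
        # Levels of two a priori ordered attributes are coded 0 = worst .. n-1 = best.
        # Consistent with the example above, the prohibited "corners" are concepts where
        # both attributes sit at the same extreme (least RAM with the slowest processor,
        # most RAM with the fastest), which trims the most clearly dominated and
        # dominating combinations and so improves utility balance within tasks.
        both_worst = level_a == 0 and level_b == 0
        both_best = level_a == n_levels_a - 1 and level_b == n_levels_b - 1
        return both_worst or both_best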

How Few Is Too Few?: Sample Size in Discrete Choice Analysis (Robert A. Hart, Jr., The Gelb Consulting Group, Inc., and Michael Patterson, Probit Research, Inc.): Researchers have argued that Choice-Based Conjoint (CBC) requires larger sample sizes than ratings-based conjoint methods to stabilize the parameters. Given the many benefits of CBC, the sensitivity of its use to sample size seems an important issue. Mike reviewed previous work by Johnson and Orme suggesting that, assuming aggregate analysis, doubling the number of tasks each respondent completes is roughly equal in value to doubling the number of respondents. However, this conclusion did not consider heterogeneity or more recent estimation methods such as Latent Class and HB.

Mike presented results for both synthetic (computer generated) and real data. He and his co-author systematically varied the number of respondents and tasks per respondent, and compared the stability of the parameters across multiple random "draws" of the data. They found that the Johnson/Orme conclusion essentially held for aggregate logit conditions. They concluded that researchers could obtain relatively stable results in even small (n=50) samples, given that respondents complete a large enough number of choice tasks. They suggested that further research should be done to investigate the effects of heterogeneity, and the effects of partial profile CBC tasks on parameter stability.

Validation and Calibration of CBC for Pricing Research (Greg Rogers, Procter & Gamble, and Tim Renken, Coulter/Renken): The authors presented results from a series of CBC studies that had been compared to actual market shares and also to econometric models (marketing mix modeling) of demand for packaged goods at P&G. The marketing mix models used multiple regression, modeling weekly volume as a function of SKU price, merchandising variables, advertising and other marketing activities. The models controlled for cross-store variation, seasonality and trend. The authors presented share predictions for CBC (after adjusting for distributional differences) versus actual market shares for washing powder, salted snacks, facial tissue, and potato crisps. In some cases, the results were extremely similar; in other cases the results demonstrated relatively large differences.

The authors next compared the price sensitivity predicted by CBC to that from the marketing mix models. After adjusting the scale factor (exponent), they found that CBC was oversensitive to price decreases (but not to price increases). Greg and Tim also calculated a scalar adjustment factor for CBC as a function of marketing mix variables (a regression analysis in which the dependent variable was the difference between predicted and actual sales). While this technique didn't improve the overall fit of the CBC relative to an aggregate scalar, it shed some light on which conditions may cause CBC predictions to deviate from actual market shares. Based on the regression parameters, they concluded that CBC understates the price sensitivity of big-share items, overstates the price sensitivity of items that sell a lot on deal, and overstates price sensitivity in experiments with few items on the shelf. Despite the differences between CBC and actual market shares, the Mean Absolute Error (MAE) for CBC predictions versus actual market shares was 4.5. This indicates that CBC's predictions were on average 4.5 share points from actual market shares, which, in the opinion of some members of the audience who chimed in with their assessments, reflects commendable performance for a survey-based technique.

Determinants of External Validity in CBC (Bjorn Arenoe, SKIM Analytical/Erasmus University Rotterdam): Bjorn pointed out that most validation research for conjoint analysis has used internal measures of validity, such as predictions of holdout choice tasks. Only a few presentations at previous Sawtooth Software Conferences have dealt with actual market share data. Using ten data sets covering shampoo, surface cleaner, dishwashing detergent, laundry detergent and feminine care, Bjorn systematically studied which models and techniques had the greatest systematic benefit in predicting actual sales. He covered different utility estimation methods (logit, HB, and ICE), different simulation models (first choice, logit, RFC) and correctional measures (weighting by purchase frequency, and external effects to account for unequal distribution).

Bjorn found that the greatest impact on fit to market shares was realized for properly accounting for differences in distributional effects using external effects, followed by tuning the model for scale factor (exponent). There was just weak evidence that RFC with its attribute-error correction for similarity outperformed the logit simulation model. There was also only weak evidence for methods that account for heterogeneity (HB, ICE) over aggregate logit. There was no evidence that HB offered improvement over ICE, and no evidence that including weights for respondents based on stated purchase volumes increased predictive accuracy.

Life-Style Metrics: Time, Money, and Choice (Thomas W. Miller, University of Wisconsin-Madison): The vast majority of product features research focuses on the physical attributes of products, prices, and the perceived features of brands, but according to Tom, we as researchers hardly ever study how time factors into the decision process. Tom reviewed the economic literature, specifically the labor-leisure model, which explains each individual's use of a 24-hour day as a trade-off between leisure and work time.

In some recent conjoint analysis studies, Tom has had the opportunity to include time variables. For example, a recent study regarding operating systems included an attribute reflecting how long it took to become proficient with the software. Another study of the attributes students trade off when considering an academic program included variations in time spent in the classroom, time required for outside study, and time spent in a part-time job. Tom proposed that time is a component of choice that is often neglected, but should be included in many research studies.

Modeling Patient-Centered Health Services Using Discrete Choice Conjoint and Hierarchical Bayes Analyses (Charles E. Cunningham, Don Buchanan, & Ken Deal, McMaster University): CBC is most commonly associated with consumer goods research. However, Charles showed a compelling example of how CBC can be used effectively and profitably in the design of children's mental health services. Current mental health service programs face a number of problems, including low utilization of treatments, low adherence to treatment, and high drop-out rates. Designing new programs to address these issues requires a substantial investment of limited funds. Often, research is done through expensive split-sample tests in which individuals are assigned to either a control group or an experimental group receiving a new health services program, even though very little primary quantitative research has gone into designing that new program.

Charles presented actual data for a children's health care program that was improved by first using CBC analysis to design a better treatment. By applying latent class to the CBC data, the authors identified two strategically important (and significantly different) segments with different needs. Advantaged families wanted a program offering a "quick skill tune up," whereas high-risk families desired more "intensive problem-focused" programs with highly experienced moderators. The advantaged families preferred meeting on evenings and Saturdays, whereas unemployed high-risk families were less sensitive to workshop times. Other divergent needs between these groups surfaced as well. The predictions of the CBC and segmentation analysis were validated using clinic field trials and the results of previously conducted studies in which families were randomly assigned to either the existing program or programs consistent with parental preferences. As predicted, high-risk families were more likely to enroll in programs consistent with their preferences. In addition, participants attended more sessions, completed more homework, and reported greater reductions in child behavior problems at a significantly reduced cost relative to the standard program ($19K versus $120K).

Complementary Capabilities for Web-Based Conjoint, Choice, and Perceptual Mapping Data Collection (Joseph Curry, Sawtooth Technologies, Inc.): Joe described how advancing technology in computer interviewing over the last few decades has enabled researchers to do what previously could not be done. While many of the sophisticated research techniques and extensions that researchers would like to do have been ported for use on the Web, other capabilities are not yet widely supported. Off-the-shelf web interviewing software has limitations, so researchers must choose to avoid more complicated techniques, wait for new releases, or customize their own solutions.

Joe showed three examples involving projects that required customized designs exceeding the capabilities of most off-the-shelf software. The examples involved conditional pricing for CBC (in which a complicated tier structure of price variations was prescribed, depending on the attributes present in a product alternative), visualization of choice tasks (in which graphics were arranged to create a "store shelf" look), and randomized comparison scales (in which respondents rated relevant brands on relevant attributes) for adaptive perceptual mapping studies. In each case, the Sensus software product (produced by Joe's company, Sawtooth Technologies) was able to provide the flexibility needed to accommodate the more sophisticated design. Joe hypothesized that these more flexible design approaches may lead to more accurate predictions of real world behavior, more efficient use of respondents' time, higher completion rates, and happier clients.

Brand Positioning Conjoint (Marco Vriens & Curtis Frazier, IntelliQuest): Most conjoint analysis projects focus on concrete attribute features and many include brand. The brand part worths include information about preference, but not why respondents have these preferences. Separate studies are often conducted to determine whether soft attributes (of the sort more often associated with perceptual/imagery studies) drive brand preference. Marco and Curtis demonstrated a technique that bridges both kinds of information within a single choice simulator. The concrete features are measured through conjoint analysis, and the brand part worths from the conjoint model become dependent variables in a separate regression step that finds weights for the soft brand features (and an intercept, reflecting the unexplained component) that drive brand preference. Finally, the weights from the brand drivers are included as additional variables within the choice simulator.
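
A bare-bones sketch of that second, regression step and of how its weights could feed a simulator is shown below; it is an aggregate-level illustration with hypothetical variable names, not the authors' code.

    import numpy as np

    def brand_driver_weights(brand_partworths, soft_ratings):
        # brand_partworths: (n_brands,) brand part worths from the conjoint model
        # soft_ratings:     (n_brands, n_soft) mean ratings of each brand on the soft
        #                   image attributes
        # Regress the brand part worths on the soft attributes; the intercept captures
        # the portion of brand preference the image attributes do not explain.
        design = np.column_stack([np.ones(len(brand_partworths)), soft_ratings])
        weights, *_ = np.linalg.lstsq(design, brand_partworths, rcond=None)
        return weights[0], weights[1:]                     # intercept, driver weights

    def simulated_brand_utility(intercept, driver_weights, soft_profile):
        # In the simulator, a brand's utility is rebuilt from its image profile, so
        # "what if this brand improved its image on attribute j?" scenarios can be run.
        return intercept + soft_profile @ driver_weights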

The benefits of this approach, the authors explained, are that it includes less tangible brand positioning information, providing a more complete understanding of how consumers make decisions. The drawbacks of the approach, as presented, were that the preferences for concrete attributes were estimated at the individual level, but the brand drivers were estimated as aggregate parameters. Discussion following the paper centered on how the brand drivers might be estimated at the individual level using HB, and how the concrete conjoint attributes and the weights for the soft imagery attributes might be estimated simultaneously, rather than in two separate steps.

Combining Self-Explicated and Experimental Choice Data (Amanda Kraus & Diana Lien, Center for Naval Analyses, and Bryan Orme, Sawtooth Software): The authors described a research project to study reenlistment decisions for Navy personnel. The sponsors were interested in what kinds of non-pay related factors might increase sailors' likelihood of reenlisting. The sponsors felt that choice-based conjoint was the proper technique, but wanted to study 13 attributes, each on 4 levels. Furthermore, obtaining stable individual-level estimates was key, as the sponsors required that the choice simulator provide confidence interval estimates in addition to the aggregate likelihood shares. To deal with these complexities, the authors used a three-part hybrid CBC study.

In the first section, respondents completed a self-explicated preference section identical to that employed in the first stage of ACA (respondents rate levels within attributes, and the importance of each attribute). In the second stage, respondents were given 15 partial-profile choice questions, each described using 4 of the attributes studied (without a "would not reenlist" option). In the final section, nine near-full-profile CBC questions were shown (11 of the 13 attributes were displayed, due to screen real estate constraints), with a "would not reenlist" option. The authors tried various methods of estimation (logit, latent class, and HB), and various ways of combining the self-explicated, partial-profile CBC, and near-full-profile CBC questions. Performance of each of the models was gauged using holdout respondents and tasks. The best model was one in which the partial-profile and near-full-profile tasks were combined within the same data set and individual-level estimates were computed using HB, without any use of the self-explicated data. No attempt to use the self-explicated information improved prediction of the near-full-profile holdout CBC tasks. Combining partial-profile and full-profile CBC tasks is a novel idea that leverages the relative strengths of the two techniques: partial-profile tasks permit respondents to deal with such a large number of attributes, while the full-profile tasks are needed to properly calibrate the None parameter.

Creating a Dynamic Market Simulator: Bridging Conjoint Analysis across Respondents (Jon Pinnell & Lisa Fridley, MarketVision Research): The issue of missing data is common to many market research problems, though not usually present with conjoint analysis. Jon described a project in which after the conjoint study was done, the client wanted to add a few more attributes to the analysis. The options were to redo the study with the full set of attributes, or collect some more data on a smaller scale with some of the original attributes plus the new attributes, and bridge (fuse) the new data with the old.

Typical conjoint bridging is conducted among the same respondents, or relies on aggregate level estimation. However, Jon's method used individual-level models and data fusion/imputation. Imputation of missing data is often done through mean substitution, hot deck, or model-based procedures (missing value is a function of other variables in the data, such as in regression). To evaluate the performance of various methods, Jon examined four conjoint data sets with no missing information, and randomly deleted some of the part worth data in each. He found that the hot-deck method worked consistently well for imputing values nearest to the original data, resulting in market simulations approximating those of the original data. The "nearest neighbor" hot-deck method involves scanning the data set to find the respondent or respondents that on common attributes most closely match the current respondent (with the missing data), and using the mean value from that nearest neighbor(s). Jon tried imputing the mean of the nearest neighbor, two nearest neighbors, etc. He found consistently better results when imputing the mean value from the four nearest neighbors.
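
A minimal sketch of the "nearest neighbor" hot-deck step, assuming part worths are stored in a respondents-by-items matrix with NaNs marking the attributes missing for the original sample (function and argument names are hypothetical):

    import numpy as np

    def nearest_neighbor_impute(partworths, common_cols, missing_cols, k=4):
        # partworths: (n_respondents, n_items) matrix with np.nan where a respondent
        # has no estimate (e.g., the new attributes not included in the original study).
        # For each such respondent, find the k donors closest on the attributes the two
        # studies share (Euclidean distance) and impute the mean of their values on the
        # missing attributes.
        has_all = ~np.isnan(partworths[:, missing_cols]).any(axis=1)
        donors = partworths[has_all]
        filled = partworths.copy()
        for i in np.where(~has_all)[0]:
            dists = np.linalg.norm(donors[:, common_cols] - partworths[i, common_cols], axis=1)
            nearest = donors[np.argsort(dists)[:k]]
            filled[i, missing_cols] = nearest[:, missing_cols].mean(axis=0)
        return filled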

Using Genetic Algorithms in Marketing Research (David G. Bakken, Harris Interactive): There are many kinds of problems facing market researchers that require searching for optimal combinations of variables in a large and complex search space, David explained. Common problems include conjoint-based combinatorial/optimization problems (finding the best product(s), relative to given competition), TURF and TURF-like combinatorial problems (e.g. find the most efficient set of six ice cream flavors such that all respondents find at least one flavor appealing), non-linear ROI problems (such as in satisfaction/loyalty research), target marketing applications, adaptive questionnaire design, and simulations of market evolution.

Genetic Algorithms (GA) involve ideas from evolutionary biology. In conjoint analysis problems, the product alternatives are the "chromosomes," the attributes are the "genes," and the levels the attributes can assume are the "alleles." A random population of chromosomes is generated and evaluated in terms of fitness (share, etc.). The most fit members "mate" (share genetic information through random crossover and mutation) and produce new "offspring." The least fit are discarded, and the sequence repeats for a number of generations. David also mentioned simpler, more direct routines such as hill-climbing, which are much quicker but more subject to local optima. David suggested that GAs may be particularly useful for larger problems: when the search space is "lumpy" or not well understood, when the fitness function is "noisy," and when a "good enough" solution is acceptable (in lieu of a global optimum).
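
These mechanics map naturally onto a short sketch like the one below, a generic GA for searching product configurations against a user-supplied fitness function (illustrative only; parameter choices and names are assumptions):

    import random

    def genetic_search(levels_per_attr, fitness, pop_size=50, generations=100,
                       mutation_rate=0.05, seed=0):
        # A "chromosome" is one candidate product: a list with one level index per
        # attribute. fitness() might return its simulated share of preference
        # against a fixed competitive set.
        rng = random.Random(seed)
        def random_product():
            return [rng.randrange(n) for n in levels_per_attr]
        population = [random_product() for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(population, key=fitness, reverse=True)
            parents = ranked[:pop_size // 2]                       # the fittest survive
            children = []
            while len(children) < pop_size - len(parents):
                mom, dad = rng.sample(parents, 2)
                cut = rng.randrange(1, len(levels_per_attr))       # single-point crossover
                child = mom[:cut] + dad[cut:]
                for a, n_levels in enumerate(levels_per_attr):     # occasional random mutation
                    if rng.random() < mutation_rate:
                        child[a] = rng.randrange(n_levels)
                children.append(child)
            population = parents + children
        return max(population, key=fitness)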

Adaptive Choice-Based Conjoint (Rich Johnson, Sawtooth Software, Joel Huber, Duke University, and Lynd Bacon, NFO WorldGroup): There have been a number of papers in the literature on how to design CBC tasks to increase the accuracy of the estimated parameters. Four main criteria for efficient choice designs are: level balance, orthogonality, minimal overlap, and utility balance. These cannot all be satisfied simultaneously, but a measure called D-efficiency appropriately trades off these opposing aims. D-efficiency is proportional to the determinant of the information matrix for the design.

The authors described a new design approach (ACBC) that uses prior utility information about the attribute levels to design new, statistically informative questions. The general idea is that the determinant of the information matrix can be expressed as the product of the characteristic roots of the matrix, and the biggest improvement comes from increasing the smallest roots. Thus, new choice tasks whose design vectors mirror the characteristic vectors corresponding to the smallest roots are quite efficient at increasing precision. In addition to choosing new tasks in this way, additional utility balance can be introduced across the alternatives within a task by swapping levels. The authors conducted a split-sample study (n=1099, using a web-based sample from the Knowledge Networks panel) in which respondents received either traditional CBC designs, the new adaptive CBC method just described, or that new method plus 1 or 2 swaps to improve utility balance. The authors found that ACBC (with no additional swaps for utility balance) led to improvements in share-prediction accuracy for holdout choice tasks (among holdout respondents). There was little or no difference among the treatments in terms of hit rates. The authors emphasized that when priors are used to choose the design, and particularly when they are used for utility balancing, the information in the data is reduced, but it can be re-introduced with monotonicity constraints during part worth estimation. To do so, they used HB estimation subject to customized (within-respondent) monotonicity constraints.
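
The D-efficiency machinery can be sketched as follows: compute the logit information matrix under prior part worths and greedily add the candidate task that raises the determinant the most, which tends to shore up the weakest (smallest-root) directions of the design. This is an illustration of the same objective, not the authors' characteristic-roots procedure; all names are hypothetical.

    import numpy as np

    def mnl_information(tasks, beta):
        # Fisher information for the multinomial logit model, summed over choice tasks:
        # sum_t X_t' (diag(p_t) - p_t p_t') X_t, where p_t are the choice probabilities
        # implied by the prior part worths beta and X_t is the task's design matrix.
        info = np.zeros((len(beta), len(beta)))
        for X in tasks:
            u = X @ beta
            p = np.exp(u - u.max())
            p /= p.sum()
            info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
        return info

    def d_efficiency(info):
        # D-efficiency is proportional to det(information)^(1/k), k = number of parameters.
        sign, logdet = np.linalg.slogdet(info)
        return np.exp(logdet / info.shape[0]) if sign > 0 else 0.0

    def pick_next_task(current_tasks, candidate_tasks, beta):
        # Greedy step: add whichever candidate task most increases D-efficiency.
        base = mnl_information(current_tasks, beta)
        scores = [d_efficiency(base + mnl_information([X], beta)) for X in candidate_tasks]
        best = int(np.argmax(scores))
        return candidate_tasks[best], scores[best]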