SS Winter 2008
Update on Adaptive CBC
At the 2007 Sawtooth Software Conference, we presented our latest research into Adaptive CBC. Our current stream of research is a significant departure from earlier Adaptive CBC approaches we have tried and described at earlier conferences. And, we’re happy to say, this approach seems to work better than the traditional CBC for complex studies involving about five or more attributes. Importantly, respondents find the interview more engaging, realistic, and focused on levels more relevant to their choices.
Our approach involves first asking respondents to indicate which product they’d most likely purchase. We’ve formatted that phase as a BYO (configurator) task, but it probably could also be done as an indication of “most likely” or “preferred” levels for attributes.
In the second phase of the adaptive interview, we ask respondents to screen product concepts that resemble their “most likely” product. Respondents indicate whether each concept is a possibility or not. After a few choices, if the respondent seems to be using non-compensatory rules (i.e. “must have” or “unacceptable” levels), we identify the possible rule and ask the respondent to confirm or skip the rule. This process of observing respondent choices and following up with the opportunity to define decision rules is repeated. And, of course, if the respondent indicates that a particular level is either required or unacceptable, only products meeting the criteria will be shown throughout the remainder of the interview.
In the final phase, we ask the respondent to compare screened-in products using standard CBC choice tasks. This is a round-robin tournament that identifies an overall winning concept.
To complete an online Adaptive CBC survey yourself, visit: www.sawtoothsoftware.com/test/byo/byologn.htm
To this point, we have completed two studies.
We have published the findings in the Technical Papers library on our website, in an article entitled: “A New Approach to Adaptive CBC.”
What are we doing now?
We are currently wrapping up a third study, and will be presenting the results at the 2008 A/R/T Forum Conference in June. In this latest research, we are testing whether the Adaptive interviews can be made significantly shorter without losing much in terms of predictive accuracy. We’ve implemented an improvement to the experimental designs that may allow us to get away with shorter interviews, so this latest work involves more than just shortening the interview.
We are also in development of a beta version for Adaptive CBC. Because this is such a new approach, it is critical that we obtain more data points and greater experience prior to launching a commercial v1 product. We’ll announce the beta software when it is available, and will be enlisting your help to further test this promising methodology.
X64 HB: Making Fast Even Faster
(By Walt Williams, Sawtooth Software Engineer)
At the 2007 Sawtooth Software Conference, Well Howell (Harris Interactive) presented a paper detailing a comparison of software to perform HB estimation, including our own CBC/HB v4. While we were quite pleased that CBC/HB had the fastest runtime (Well, your check is in the mail), we are always looking to further improve performance.
In the last two years the computing industry has seen the introduction of 64-bit (x64) desktop computing. AMD began releasing processors supporting x64 in 2003, with Intel following suit in 2004. Over the last few years more chip lines have begun to support x64, and currently all new processors support x64. While these new processors run 32-bit Windows as well, Microsoft released an x64 version of Windows XP back in April 2005. Consumers could order a new PC with it if they requested it at purchase time, but it wasn’t available in retail outlets and thus the x64 market didn’t grow very fast.
In January 2007, Microsoft released Windows Vista, which gave consumers two discs, one for 32-bit and one for x64, allowing users to choose which to install. Since that time, the number of users running under x64 has steadily increased.
We’ve often been asked if an x64 version of HB is available. Currently CBC/HB v4 is compiled as a 32-bit application. It will run just fine under x64 Windows. The question becomes whether an x64 version of HB would perform better than the 32-bit version. This applies only to x64 Windows since x64 HB will not run on 32-bit Windows.
For the true geeks at heart, let me describe the hardware used for our x64 experiment. We used a Dell Precision 490 with an Intel Xeon Quad-Core processor running at 1.86GHz, two 80 GB 10K RPM drives using RAID 0, running Windows Vista Business x64. While the processor has 4 cores, CBC/HB is a single-thread application (we are also researching multi-threading). We performed utility estimation initially with 2GB of RAM, which would be fairly typical. However, x64 Windows can use more than 3GB, so we upgraded the system to 6GB and reran the computations to see what could be gained by having more memory.
Now I’ll describe the experiment. We selected a dataset with 1019 respondents who answered 24 choice tasks each regarding 5 attributes involving 3-15 levels each. Under main-effects (base case), CBC/HB would estimate 34 parameters for each respondent. We would consider this a moderately sized dataset and fairly common for CBC/HB. Our performance measure is the iteration rate, defined as seconds per iteration. We compared the 32-bit iteration rate to our x64 test engine, which I will describe next.
The CBC/HB v4 engine is a 32-bit component written in the programming language C++. To create the x64 version, we made no changes to the existing code and simply compiled it as x64.
While not part of the experiment, we tested the engine to make sure that conversion to x64 did not lead to different utilities from the 32-bit version. In a side-by-side test given the same random seed, both produced identical results. Because we did not need to change the code, we are confident that the results reflect only the difference between 32-bit and x64. Using CBC/HB v4, we created the base case along with five variations shown in the table below. To run the estimation, we created a special application (using C#, a .NET language) that would run both the 32-bit and x64 engines, to further ensure there was no bias towards one or the other. We then performed each utility estimation run in 32-bit and x64. The results are listed below:
In each case, we noted how many iterations we ran. Although the utilities had not likely converged in most cases, the iteration rate was stable enough to estimate performance, which was the target of the experiment. We also note the build file size, which relates to whether the data files would reside in memory or were too big and were used off the disk (which is also noted). In the first three utility runs, we used the original 1019 respondents and increased the number of parameters by adding interaction terms. In the next three cases, we repeated the first three runs, but we replicated each respondent 10 times, resulting in 10190 respondents. In all cases, we note the percent less time that the x64 version required to perform iterations. The first thing to note is that the x64 version was faster than the 32-bit version in every case. The gains in the 2GB block appear to be more pronounced (with one exception) than with 6GB, but looking carefully we see that the 32-bit rates increase significantly more between 2GB and 6GB than the x64 rates do. This is likely related to the way that Windows manages its own resources. Under 6GB it appears to be able to manage 32-bit applications much more efficiently than it did with 2GB. The exception occurred with the last case under 2GB. This is an extreme case where the build file size actually exceeded the RAM by over 1.5x. In this case, the machine spent most of its time in resource management (called thrashing), and so the iteration rate was very slow in both 32-bit and x64. In both the 2GB and 6GB blocks, the last two cases ran the build file from disk instead of memory, although it would appear that (except the previously mentioned case) they should run in memory. While there might have been enough memory to do so, Windows rejected the request to run them in memory. 
Unfortunately, that can’t be changed, but the operating system was able to cache parts of the files in memory, and under 6GB it was able to do enough of that to make the iteration rates reasonable.
Probably the most encouraging aspect of this experiment is that you can expect a significant benefit from running x64 HB on an x64 machine with generous amounts of RAM. We expect an x64 version of CBC/HB sometime in 2008, and we hope to increase performance in other ways as well.
Breathing New Life into Sawtooth Software’s Cluster Analysis
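| Attribute | Level | Price Premium |
| Brand | Dell | |
| Brand | HP | |
| Brand | Toshiba | |
| RAM | 1 GB RAM | +$0 |
| RAM | 2 GB RAM | +$100 |
| RAM | 4 GB RAM | +$200 |
| Hard Drive | 80 MB Hard Drive | +$0 |
| Hard Drive | 120 MB Hard Drive | +$100 |
| Hard Drive | 160 MB Hard Drive | +$200 |
| Processor | 2.0 GHz Processor | +$0 |
| Processor | 2.5 GHz Processor | +$200 |
| Processor | 3.0 GHz Processor | +$400 |
| Price | Low Price (-30%) | |
| Price | Medium Price (Average Price) | |
| Price | High Price (+30%) | |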
| RAM | Hard Drive | Processor | Price | Text to Display |
| 1 | 1 | 1 | 1 | $525 |
| 1 | 1 | 1 | 2 | $750 |
| 1 | 1 | 1 | 3 | $975 |
| 1 | 1 | 2 | 1 | $665 |
| 1 | 1 | 2 | 2 | $950 |
| ... | ... | ... | ... | ... |
The benefit of conditional pricing is that more reasonable prices are shown to respondents and in the simplest case price may be estimated using main effects for (in this example) the three levels of price in the design. And, critically, the design is still orthogonal and unencumbered by prohibitions.
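The lookup table above can be generated mechanically. The following is a minimal sketch, assuming that the displayed price is the $750 base price plus the level premiums from the attribute list, multiplied by 0.70, 1.00, or 1.30 for the low, medium, and high price levels. The function names `displayed_price` and `lookup_table` are illustrative only and are not part of any Sawtooth Software product.

```python
from itertools import product

# Price premiums per level index, taken from the attribute list above.
RAM_PREMIUM = {1: 0, 2: 100, 3: 200}
DRIVE_PREMIUM = {1: 0, 2: 100, 3: 200}
CPU_PREMIUM = {1: 0, 2: 200, 3: 400}

# Assumed multipliers for the Low (-30%), Medium, and High (+30%) price levels.
PRICE_MULTIPLIER = {1: 0.70, 2: 1.00, 3: 1.30}

BASE_PRICE = 750  # base price of the example product


def displayed_price(ram, drive, cpu, price_level):
    """Return the price text shown for one combination of level indices."""
    summed = BASE_PRICE + RAM_PREMIUM[ram] + DRIVE_PREMIUM[drive] + CPU_PREMIUM[cpu]
    return f"${round(summed * PRICE_MULTIPLIER[price_level]):,}"


def lookup_table():
    """Build all 81 rows in the same order as the table (price varies fastest)."""
    levels = (1, 2, 3)
    return [
        (ram, drive, cpu, price, displayed_price(ram, drive, cpu, price))
        for ram, drive, cpu, price in product(levels, repeat=4)
    ]


if __name__ == "__main__":
    for row in lookup_table()[:5]:
        print(row)
```

The first five rows reproduce the prices listed in the table ($525, $750, $975, $665, $950), which supports the assumed multipliers for this example.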
There are a few challenges when working with conditional pricing:
Another approach not currently offered in Sawtooth Software products is continuous price. (Even though it’s not a supported feature of the software, a power user can still do it, though it requires being able to reformat the text-only studyname.CHO file.) Continuous price differs from conditional price in two ways. First, it generalizes the idea of conditional pricing (beyond the software limitations of just three attributes). Second, it estimates the effect of overall price as a linear coefficient, rather than as a part-worth utility function. As with conditional pricing, we approach the problem by considering a base price for the product as well as fixed price premiums for levels of non-price attributes (plus or minus some overall independent price variation). If we consider the example from the previous section, the base price is $750 and the most expensive product option would be $1,550 (prior to varying price by some independent amount).
As with the first two pricing approaches, we also only show a single overall price within the product concept, rather than showing prices attached to each attribute level. The only difference between conditional price and continuous price is in the coding of the design matrix, where price is coded in a single column as a continuous variable. Typically, a single price coefficient is estimated based on linear price (or the natural log of price). More complex curve fitting might be considered, as well as piecewise coding. These approaches may provide better fit to the data, but risk overfitting and also introduce some correlation in the independent variable matrix.
Because values in the price column of the design matrix are determined from information in other columns of the design matrix, that column would be linearly dependent on other columns if we didn’t do something to break up that dependence. We do this by adding random variation to the prices.
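One way to write this down, using notation that is ours rather than the article's: let $b$ be the base price, $\delta_{a,\ell_a}$ the price premium for the level $\ell_a$ of attribute $a$ shown in the concept, and $\varepsilon$ the independent price variation, assumed here to be applied as a percentage of the summed price within a range of $\pm v$.

```latex
p = \Bigl(b + \sum_{a} \delta_{a,\ell_a}\Bigr)\,(1 + \varepsilon),
\qquad \varepsilon \in [-v,\, +v]
```

If $\varepsilon = 0$ for every concept, $p$ is an exact linear combination of the attribute-level indicators, which is the dependence described above. A nonzero $\varepsilon$ that is independent of the attribute levels is what breaks it.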
The benefits of continuous (summed) price relative to conditional pricing include:
But, these benefits come with a serious potential drawback: the price attribute is positively correlated with any attributes that involve incremental prices in the study, leading to less precise estimates of all effects, but most especially the price coefficient. The amount of correlation among attributes depends on the magnitude of the random variation in overall price as well as the size of the base fixed component of price relative to the incremental prices associated with each feature level. In the worst case, with no random variation, continuous price is simply the sum of the prices associated with the attribute levels. In that case, price would be perfectly predicted by a linear combination of the attributes and the design would be deficient. But, if we additionally vary the overall price by a large enough random amount (see guidelines further below), we can obtain sufficient precision of the estimates for overall price sensitivity as well as the other features in the study.
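To make the correlation argument concrete, here is a small sketch that computes the correlation between overall price and the sum of feature premiums. It assumes the premiums from the laptop example, a multiplicative price disturbance, and a balanced grid of disturbance values in place of random draws; `price_feature_correlation` is a hypothetical helper written for this illustration.

```python
from itertools import product
from math import sqrt

# Premiums for RAM, hard drive, and processor from the laptop example.
PREMIUMS = [(0, 100, 200), (0, 100, 200), (0, 200, 400)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def price_feature_correlation(base_price, max_variation, steps=5):
    """Correlation between overall price and the summed feature premiums.

    Every feature combination is crossed with `steps` evenly spaced
    disturbances between -max_variation and +max_variation (an assumption;
    the article describes random variation).
    """
    if max_variation == 0:
        disturbances = [0.0]
    else:
        disturbances = [
            -max_variation + 2 * max_variation * i / (steps - 1)
            for i in range(steps)
        ]
    feature_sums, prices = [], []
    for premiums in product(*PREMIUMS):
        summed = sum(premiums)
        for d in disturbances:
            feature_sums.append(summed)
            prices.append((base_price + summed) * (1 + d))
    return pearson(feature_sums, prices)


if __name__ == "__main__":
    for base, variation in [(750, 0.0), (750, 0.1), (750, 0.3), (1500, 0.1)]:
        r = price_feature_correlation(base, variation)
        print(f"base={base}, variation=+/-{variation:.0%}: r={r:.3f}")
```

Under these assumptions the correlation is 1 with no variation, and it falls as either the variation or the base price grows, matching the two factors named above.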
Continuous price is not an option in the current implementations of Sawtooth Software’s CBC or CVA products. However, a power user could implement it for either type of study and estimate the results properly using Latent Class, CBC/HB or, in the case of a CVA study, HB-Reg.
The final section of this article includes a simulation study to investigate what variation should be specified in the overall price attribute to lead to reasonable estimates with continuous price.
Simulation Study
As mentioned earlier, the amount of random variation given to continuous price has a direct impact on the efficiency of the estimates. To provide guidelines regarding how much random variation in price we should include in continuous price designs, we conducted a synthetic study with 300 simulated respondents. Each (computer generated) respondent received 10 choice tasks that were answered randomly. There were five attributes, each with three levels, along with overall price. The overall price was found using the base price plus incremental prices associated with each of the other five attributes. Then, if we were varying the overall price as much as +/-10%, we used five distinct price disturbances within that range of -10%, -5%, +0%, +5%, and +10%. (One could choose any number of discrete price variations within the +/-10% range.)
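The data-generation step described above might be sketched as follows. Several details are assumptions because the article does not state them: the number of concepts per task, the base price, the incremental prices, and the use of simple random level selection in place of a formal experimental design. All names are illustrative.

```python
import random

N_RESPONDENTS = 300
N_TASKS = 10
N_ATTRIBUTES = 5
N_LEVELS = 3
DISTURBANCES = (-0.10, -0.05, 0.0, 0.05, 0.10)  # the +/-10% condition

# Assumptions not given in the article.
CONCEPTS_PER_TASK = 3
BASE_PRICE = 300            # with the premiums below, about 3/4 of average price
LEVEL_PREMIUMS = (0, 20, 40)


def make_concept(rng):
    """Draw one concept: random levels, summed price, then a price disturbance."""
    levels = [rng.randrange(N_LEVELS) for _ in range(N_ATTRIBUTES)]
    summed = BASE_PRICE + sum(LEVEL_PREMIUMS[level] for level in levels)
    disturbance = rng.choice(DISTURBANCES)
    return {
        "levels": levels,
        "summed_price": summed,
        "price": summed * (1 + disturbance),
    }


def simulate(seed=0):
    """Return one record per respondent and task, with a random choice."""
    rng = random.Random(seed)
    records = []
    for respondent in range(N_RESPONDENTS):
        for task in range(N_TASKS):
            concepts = [make_concept(rng) for _ in range(CONCEPTS_PER_TASK)]
            records.append({
                "respondent": respondent,
                "task": task,
                "concepts": concepts,
                "choice": rng.randrange(CONCEPTS_PER_TASK),  # random answer
            })
    return records
```

Because the answers are random, data like these say nothing about preferences; they serve only to compare the standard errors that different price designs would produce.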
Continuous price usually consists of a fixed base price plus additional upgraded feature costs. If the base price is relatively large compared to the incremental prices attached to upgraded features, then (after disturbing overall price by the price variation) the resulting price attribute will be relatively uncorrelated with the linear combination of the other attributes. However, if the base price is relatively small compared to the incremental prices for the other levels in the study, then the resulting overall price will be more strongly correlated with a linear combination of the other attributes. Therefore, we needed to consider how different relative sizes of the fixed base price relative to incremental prices would affect the results.
Study Procedure and Results
We simulated different amounts of independent price variation on continuous price, from as low as +/-5% to as much as +/-40%. We estimated price as a single coefficient, to be applied to the natural log of total price. Prior to estimating (with aggregate logit) for each price condition, we normalized the log price variable to have a variance of unity, so that the standard errors would be comparable across the different simulation runs that featured substantial differences in the amount of absolute price variation. For each study condition, we recorded the standard error for log price. The precision of the estimated price coefficients relative to a standard three-level attribute in a parallel study without continuous price is shown in the following chart.

Recommendations
According to our simulations, the precision of the utility estimate for log price depends strongly on both the amount of added random price variation and the relative size of the constant base price compared to the feature-based prices. When the fixed base price of the product is 3/4 the total average price, as little as +/-10% price variation on continuous price will achieve precision of estimates that are nearly 50% as efficient as a standard 3-level attribute coded as a part-worth function. In terms of absolute magnitude, the standard error for price was 0.038 compared to 0.026 for levels from the standard three-level attribute. Based on additional simulations, we found that a standard five-level attribute (if placed within the same study instead of continuous price) would also achieve standard errors of estimates for its levels of about 0.038. Generally, in practice, we’d be comfortable with such precision.
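If relative efficiency is taken to be the ratio of squared standard errors (a common convention, and an assumption here, since the article does not state its formula), the two standard errors reported above reproduce the "nearly 50%" figure:

```latex
\text{relative efficiency}
  = \left(\frac{SE_{\text{3-level}}}{SE_{\text{price}}}\right)^{2}
  = \left(\frac{0.026}{0.038}\right)^{2}
  \approx 0.47
```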
However, if the base price only accounts for 1/3 of the total average price of the product (most of the price is explained by the incremental feature prices), then we’d need to vary continuous price +/-30% to achieve similar precision. Based on this simulation study, we can make general recommendations for continuous price:
If base price is 3/4 of total average price: +/-10%
If base price is 1/2 of total average price: +/-20%
If base price is 1/3 of total average price: +/-30%
Of course, choosing the price variation also depends on the client’s needs and the market simulations to be run. You should avoid extrapolating beyond the total range of price included in the questionnaire. Increasing the random price variation will improve your ability to simulate extreme priced products, at the risk of making the questionnaire present products that seem to have unreasonable prices, given their features.
The Weakest Link: A Cognitive Approach to Improving Survey Data Quality (David G. Bakken, Harris Interactive): David reminded us that our inferences and theories of consumer behavior are only as good as the data on which they are based. As researchers, we often apply conventional wisdom, “judgment” and some empirical evidence in designing questionnaires. But, often in our haste to take studies to field, we fail to pretest and refine our instruments. David reviewed previous work by psychologists regarding how humans interact with surveys. The four step model of survey response involves comprehension, retrieval, judgment, and response. He advocated the use of “Think Aloud Pre-Testing” in which respondents (10-20 per wave) verbalize their thoughts while answering survey questions. These tests should be conducted over multiple days to allow survey changes to be implemented and re-tested. Based on many such tests, David offered some observations regarding how respondents interact with web-based surveys and how they can be improved. Current problem areas include: grid questions, survey navigation, error messages, multi-lingual surveys, and CBC questionnaires.
Evaluating Financial Deals Using a Holistic Decision Modeling Approach (Paul Venditti, Don Peterson, and Matthew Siegel, General Electric): Paul described a very interesting approach that he and his co-authors are implementing within GE to evaluate complex financial deals. In the past, analysts have spent many hours evaluating financial deals and presenting the details of those deals to a committee of three individuals. Paul described how the characteristics of those deals could be defined using about 20 “conjoint” attributes. A modified ACA survey was developed to study three key individuals at GE who approve deals. The standard stated importance question in ACA was substituted with a constant-sum question implemented via an Excel worksheet. The final part-worth utilities were further modified by implementing a few non-compensatory rules (red flags). A market simulator based on the three respondents was found to be highly predictive of whether deals were approved or rejected in the months following the surveys (accuracy of about 80%). Paul’s work demonstrated that effective conjoint models (to profile tiny populations) can be built using tiny sample sizes. Conjoint analysis can provide good data for implementing sophisticated decision support tools in non-traditional contexts.
Issues and Cases in User Research for Technology Firms (Edwin Love, University of Washington School of Business, and Christopher N. Chapman, Microsoft Corporation): Edwin and Christopher described how conducting market research for technology products presents unique challenges. For example, innovative features are often not well-understood by respondents, and different user groups will have different levels of understanding. Also, features might not actually yet exist while the research is being conducted. The presenters commented that vague descriptions of attributes such as “easy setup” can skew user responses (toward expressing strong preference for nondescript features), and the results create the illusion of specific value where none may exist. They further recommended segmenting respondents based on product experience: owners vs. intenders. Edwin and Christopher illustrated the challenges of conducting market research for technology products via three case studies: a digital pen project, a webcam, and a digital camera.
Minimizing Promises and Fears: Defining the Decision Space for Conjoint Research for Employees versus Customers (L. Allen Slade, Covenant College): Conjoint analysis can be a valuable tool in both consumer and employee research. However, the researcher must recognize the key differences in how the firm interacts with the respondents. Allen affirmed that customers are less interdependent with the firm than are employees. And, different employees (depending on role and experience/training) are more highly interdependent with the firm than others. With employee research, the worry is of creating false promises of rewards or unwarranted fears of takeaways. Allen suggested that researchers ask themselves three key questions prior to including something in a conjoint survey for employees: 1) Would we be willing to actually do this?, 2) How does this intervention compare to the others we are considering?, and 3) How would an employee or customer react to taking this survey? Using an actual case study at Microsoft (total rewards optimization), Allen illustrated how applying these three questions led to effective research without undue promises or fears.
A Cart-Before-the-Horse Approach to Conjoint Analysis* (Ely Dahan, UCLA Anderson School): With traditional conjoint studies, respondents are often asked to complete long surveys, they are required to rate products they don’t like, and the resulting part-worth utilities often contain reversals. Ely described a novel, computer-administered and adaptive method of employing a traditional full-profile conjoint design. Rather than estimate part-worth utilities after respondents take the surveys, CARDS (conjoint adaptive ranking database system) begins with a researcher-constructed database of typically thousands of potential sets of consistent part-worth utilities. Respondents are shown a set of product concepts and asked to choose which products they prefer. After the respondent provides a few answers, the database of utilities is queried to determine if certain product concepts that haven’t yet been evaluated are clearly inferior (and should not be chosen next in order). Those products are deleted from the screen, allowing respondents to focus on those product concepts that are relevant to identifying which set of utilities best fits them, while forcing respondents to maintain consistent ordering. The benefit is much shorter questionnaires. The downsides are that early answers matter a lot, and there is no real error theory. Plus, the quality of the results depends on how well researchers can develop the database of potential sets of utilities.
(*Winner of Best Presentation award, based on attendee ballots.)
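The database-query idea in the CARDS summary can be illustrated with a toy sketch. This is our own simplified reading of the description, not Ely Dahan's implementation: a candidate utility set survives if it agrees with every choice made so far, and a concept stays on screen only if it would be the next pick under at least one surviving set. All function names and the data layout are hypothetical.

```python
def utility(part_worths, concept):
    """Total utility of a concept: the sum of the part-worths of its levels."""
    return sum(part_worths[attr][level] for attr, level in concept.items())


def consistent_sets(database, ranked, remaining):
    """Keep the utility sets that agree with the ranking given so far.

    `ranked` lists concepts in the order the respondent chose them. Each
    chosen concept must score at least as high as every concept that was
    still available when it was picked.
    """
    kept = []
    for part_worths in database:
        agrees = True
        for i, chosen in enumerate(ranked):
            available_then = ranked[i + 1:] + remaining
            if any(utility(part_worths, other) > utility(part_worths, chosen)
                   for other in available_then):
                agrees = False
                break
        if agrees:
            kept.append(part_worths)
    return kept


def still_selectable(database, ranked, remaining):
    """Concepts that are the best remaining option under some surviving set."""
    survivors = consistent_sets(database, ranked, remaining)
    selectable = []
    for concept in remaining:
        others = [c for c in remaining if c is not concept]
        if any(all(utility(pw, concept) >= utility(pw, o) for o in others)
               for pw in survivors):
            selectable.append(concept)
    return selectable


# Toy database with two candidate utility sets (hypothetical values).
price_sensitive = {"brand": {"A": 1, "B": 0}, "price": {"low": 2, "high": 0}}
brand_sensitive = {"brand": {"A": 2, "B": 0}, "price": {"low": 1, "high": 0}}
database = [price_sensitive, brand_sensitive]

c1 = {"brand": "A", "price": "low"}
c2 = {"brand": "A", "price": "high"}
c3 = {"brand": "B", "price": "low"}
c4 = {"brand": "B", "price": "high"}
```

In this toy case, once the respondent picks `c1`, the concept `c4` cannot be the next choice under either utility set, so it would be removed from the screen. If the respondent then picks `c2`, only `brand_sensitive` remains consistent.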
Two-Stage Models: Identifying Non-Compensatory Heuristics for the Consideration Set then Adaptive Polyhedral Methods within the Consideration Set (Steven Gaskin, AMS, Theodoros Evgeniou, INSEAD, Daniel Bailiff, AMS, and John Hauser, MIT): Steven reviewed the scientific evidence that suggests that people buy products by first forming a consideration set and then choosing a product from within the consideration set. This two-stage approach helps people deal with a large number of alternatives in the choices they face. By reflecting this process in our choice models, Steven argued that we can more accurately model choices, create more realistic and enjoyable surveys, and handle more features than conventional CBC. He presented a survey design in which respondents may use non-compensatory (cut-off) rules to form consideration sets. Respondents are then asked to trade off considered products within a more standard-looking CBC task. He and his co-authors employed FastPace CBC to estimate the utilities for the n most important compensatory features for each respondent. Steven reported results showing that respondents preferred the adaptive survey over standard CBC.
A New Approach to Adaptive CBC (Rich Johnson and Bryan Orme, Sawtooth Software): Existing CBC questionnaires have weaknesses: they are viewed as tedious and not very focused on the particular needs of each respondent. The experimental plans have assumed compensatory behavior, and previous research has shown that many respondents apply non-compensatory heuristics to answer conjoint questionnaires. Rich and Bryan presented a new technique for adaptive CBC that helps overcome these issues. Their approach mimics the purchase process of formulating a consideration set using non-compensatory heuristics (such as “must have” or “must avoid” features), followed by a more careful tradeoff of alternatives within the consideration set using compensatory rules. This new approach involves three core stages: 1) Build-Your-Own (BYO) Stage, 2) Screening Stage, and 3) Choice Tasks Stage. They conducted a split-sample experiment comparing the new approach to traditional CBC. They found that respondents liked the adaptive survey more and felt it was more realistic, even though it took about twice as long as traditional CBC. Furthermore, part-worths developed from ACBC were more predictive of holdout tasks than those from traditional CBC, despite the methods bias in favor of CBC for predicting the CBC-looking holdouts.
HB-Analysis for Multi-Format Adaptive CBC (Thomas Otter, Goethe University): The three-stage interview proposed by Johnson and Orme is innovative, but the formulation of a model extracting the common preference information is a challenge. Thomas first showed that such a model is required, as simply discarding any of the data collected before the CBC part results in inconsistent inferences in an HB setting. Thomas then investigated different models: a multinomial likelihood for all parts of the interview allowing for task-specific scale factors, task-specific “wiggles” in the preference vector using the same likelihood, a binary logit likelihood for the screener part and a multichoice likelihood for this same part. Thomas found that the scale factor did vary considerably between the sections. However, accounting for task specific scales had only a small effect on the predictive ability of the models. Moreover, his results suggest that a binary logit or a multichoice likelihood for the screener part of the interview are preferable to the explosion into multinomial choices both in terms of the implied story about how the data are generated and the empirical fits.
EM CBC: A New Framework for Deriving Individual Conjoint Utilities by Estimating Responses to Unobserved Tasks via Expectation-Maximization (Kevin Lattery, Maritz Research): Kevin demonstrated how EM algorithms can be used to estimate individual-level utilities from CBC data. EM is often applied in missing values analysis. In the context of CBC, each respondent could be viewed as having been shown all the tasks in a very large design plan, but having completed only a subset of them. The missing answers are imputed via EM. Once missing answers have been imputed, there is enough information available to estimate part worths for each individual. Utility constraints may be implemented as well. Kevin faced a few challenges in implementing EM for CBC. He found that if he allowed EM to iterate fully to convergence, overfitting would occur. Therefore, he relaxed the convergence criterion. Kevin also found that the estimated probabilities for the tasks respondents did versus those that were missing varied in their means and standard deviations. So he adjusted the results from each task so that means and variances of the missing data were comparable to the observed data. He then repeated the EM process until the missing data converged. Kevin compared utilities estimated under EM to those estimated via HB, and found that the EM utilities performed as well as or better than HB utilities for three data sets.
Removing the Scale Factor Confound in Multinomial Logit Choice Models to Obtain Better Estimates of Preference (Jay Magidson, Statistical Innovations, and Jeroen K. Vermunt, Tilburg University): Jay reintroduced the audience to the issue of scale factor. The size of the parameters in MNL estimation is inversely related to the amount of uncertainty (error) in the respondents’ choices. Because different groups of respondents may have different scale factors, it is not theoretically appropriate to directly compare the raw MNL estimates between groups. Jay showed how such comparisons can lead to incorrect conclusions. He then turned attention toward an extended Latent Class choice model to isolate the scale parameter. Using that model, he showed how latent class segmentations can differ for real data sets as compared to the generic latent class model that doesn’t separately model scale. In one particular comparison, Jay found that the amount of time respondents spent answering a CBC questionnaire was directly related to segment membership from standard latent class estimation (without estimating the scale factor). Jay also demonstrated how scale estimation can be incorporated into DFactor Latent Class models. Jay concluded that removing the scale confound in latent class modeling will result in improved estimates of part-worths and improved targeting to relevant segments based on an improved understanding of segment preferences and levels of uncertainty.
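The scale confound can be seen in a few lines of code. This is a generic multinomial logit illustration with made-up utilities, not the extended Latent Class model from the presentation; `choice_probabilities` is a hypothetical helper.

```python
from math import exp


def choice_probabilities(utilities, scale=1.0):
    """Multinomial logit choice probabilities for one task.

    `scale` multiplies every utility; a larger scale means less response
    error and therefore more extreme probabilities.
    """
    weights = [exp(scale * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Two groups with identical preferences but different response error.
preferences = [1.0, 0.0, -1.0]
consistent_group = choice_probabilities(preferences, scale=2.0)
noisy_group = choice_probabilities(preferences, scale=0.5)

# The same probabilities arise from doubled utilities at scale 1, so the
# choice data alone cannot separate "stronger preferences" from "less error".
doubled_preferences = choice_probabilities([2.0, 0.0, -2.0], scale=1.0)
```

Because only the product of scale and utility enters the probabilities, raw estimates from the two groups would differ by a constant multiple even though the underlying preferences are the same.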
An Empirical Test of Alternative Brand Measurement Systems (Keith Chrzan and Doug Malcom, Maritz Research): Keith and Doug presented results from three commercial studies that compared different ways of collecting brand image data. Those methods included: Likert ratings, comparative ratings, MaxDiff, pick any, semantic differential, and yes/no scaling. They argued that the brand image measurement system should produce 1) credible brand positions (face validity), 2) strong differences among brands (discriminant validity), and 3) powerful predictions of brand choice (predictive validity). The first two research studies they reported on demonstrated that Likert ratings and pick any data were generally inferior to the other methods. The third study they reported compared semantic differential, comparative ratings, yes/no, and pick any data. They concluded that, of those four methods, comparative ratings had the most discriminating power, followed by semantic differential. Pick any data measured little beyond the halo effect (a complicating issue wherein brands/objects liked overall tend to get higher ratings across the board on the attributes). To help control for the Halo Effect, the authors double-centered the scores prior to making comparisons.
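Double-centering is simple to state in code. The sketch below assumes a brands-by-attributes table of mean ratings (the numbers are invented) and removes both the brand means and the attribute means, adding back the grand mean; the summary does not give the authors' exact procedure, so treat this as the textbook version.

```python
def double_center(matrix):
    """Subtract row and column means from each cell, then add the grand mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[r][c] for r in range(n_rows)) / n_rows
        for c in range(n_cols)
    ]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[r][c] - row_means[r] - col_means[c] + grand_mean
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical mean ratings: rows are brands, columns are image attributes.
ratings = [
    [5.0, 4.0, 5.0],  # well-liked brand, rated high on everything (halo)
    [3.0, 2.0, 4.0],
    [2.0, 1.0, 2.0],
]
centered = double_center(ratings)
```

After centering, every row and every column averages zero, so what remains is each brand's relative strength on each attribute rather than its overall popularity.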
Alternative Approaches to MaxDiff with Large Sets of Disparate Items–Augmented and Tailored MaxDiff (Phil Hendrix, immr and Stuart Drucker, Drucker Analytics): Phil and Stuart investigated some enhancements to standard MaxDiff questionnaires to help deal with large numbers of items while still achieving strong individual-level scores. The authors argued that with more than about 40 items, MaxDiff becomes very tedious for respondents if individual-level estimates are required. To deal with this issue, the authors proposed that respondents first perform a Q-Sort task, wherein they drag-and-drop items into one of K buckets (they used 4 buckets in their research). The information from the Q-Sort task can be added to the MaxDiff information to improve the estimates. The Q-Sort task can also be used to create customized MaxDiff questions that principally draw on items of greatest preference/importance. Phil and Stuart conducted a split-sample study comparing standard and two forms of augmented MaxDiff exercises. They found that overall the aggregate parameters were very similar across the methods. But, both forms of augmented MaxDiff exercises outperformed ordinary MaxDiff in terms of holdout predictions. They also found that respondents found the Q-Sort + MaxDiff methodology more enjoyable than standard MaxDiff alone.
Product Optimization as a Basis for Segmentation (Chris Diener, Lieberman Research Worldwide): Chris motivated his presentation by reviewing the strategic goals and outcomes of traditional segmentation approaches. With attitudinal segmentations, one finds strong segments in terms of attitudinal differences, but those differences often do not translate into segments that differ strongly in terms of product preferences. With segmentation based on product features, the hope is that the segments have targetable differences and that the preferences translate to profitable product line decisions. If product optimization is used as the focus, then there is a stronger linkage with profitable product line decisions. Of all the methods of optimization, Chris stated that he prefers Genetic Algorithms. But, Chris pointed out that segmentation based on product optimization provides no guarantee that the segments will demonstrate targetable differences in terms of attitudes, media usage, or demographics. To improve the odds that the segments are useful, Chris advocated data fusion processes which combine information from attitude segmentation and product optimization segmentation, especially when the strategic priority is on product development and you are confident in being able to find an attitudinal story.
Joint Segmenting Consumers Using both Behavioral and Attitudinal Data (Luiz Sa Lucas, IDS Market Analysis): Luiz discussed segmentation methods that incorporate both behavioral and attitudinal data. Behavior data alone are often not satisfactory to use in segmentation schemes, because the segments do not necessarily map to anything useful in terms of descriptive demographics or attitudinal data. By the same token, attitudinal data alone are not sufficient because attitudes don’t necessarily correlate strongly with behaviors. Luiz reviewed multiple procedures for incorporating both behavior and attitudinal data in segmentation, including Reverse Segmentation, Weighted Distance Matrices, Concomitant Variables Mixture Models, Joint Segmentation, and LTA models. Luiz finished by discussing different fit metrics for determining the appropriate number of clusters.
Defining the Linkages between Cultural Icons (Patrick Moriarty, OTX and Scott Porter, 12 Americans): Patrick and Scott described a mapping methodology in which cultural icons (celebrities, brands, politicians) are placed within a perceptual map. The data are in part driven by a MaxDiff questionnaire. The goal is to provide a unique understanding of the strength of linkage between brands, personalities, and media properties based on consumer attraction. Their research identified that religion and marital status are the two social identities that on average most define individuals. But, identity may also be measured by the degree to which people express connection with cultural icons. The authors explained that cultural icons can also be measured and characterized, in terms of four key components: Recognition, Attraction, Presence, and Polarization. As an example of how their mapping methods can drive strategy, they showed relationships between either Hillary Clinton or Rudy Giuliani, segments of the population, and popular consumer brands.
Cluster Ensemble Analysis and Graphical Depiction of Cluster Partitions (Joseph Retzer and Ming Shan, Maritz Research): Joe described a relatively new technique in unsupervised learning analysis called Cluster Ensemble Analysis that has been suggested as a generic approach for improving the accuracy and stability of cluster algorithm results. Cluster ensembles begin by generating multiple cluster solutions using a “base learner” algorithm, such as K-means. Multiple solutions may be generated in a variety of ways. The basic idea is to combine the results of a variety of cluster solutions to find a consensus solution that is representative of the different solutions. Joe further demonstrated how the quality of cluster solutions can be graphically depicted in terms of Silhouette plots. The silhouette shows which objects lie well within the cluster and which are somewhere in between clusters. He finished by showing how cluster ensemble analysis can improve cluster results for a particularly difficult sample data set that has non-spherical clusters.
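One common way to combine several cluster solutions into a consensus is a co-association matrix, which records how often each pair of objects lands in the same cluster. The sketch below assumes that approach and a simple majority threshold; it is not necessarily the method Joe presented, and the partitions and function names are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

def consensus(partitions, threshold=0.5):
    """Link objects that co-occur in more than `threshold` of the partitions,
    then label the connected groups."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical base-learner output: three cluster solutions for six objects.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # arbitrary label names; object 2 is placed differently
    [0, 0, 0, 1, 1, 1],
]
consensus_labels = consensus(partitions)
```

Because the matrix depends only on whether two objects share a cluster, the arbitrary label names used by each base solution do not matter.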
Modeling Health Service Preferences Using Discrete Choice Conjoint Experiments: The Influence of Informant Mood (Charles Cunningham, Heather Rimas, and Ken Deal, McMaster University): Chuck presented the results of a research study that investigated how depression influences performance on discrete choice experiments designed to understand patient preferences. Previous evidence in the literature suggests that people with depressive disorders can have impaired information processing and a related host of decision-making deficits. Because Chuck and his co-authors often use discrete choice experiments in health care planning issues, and because the incidence of depression is relatively high within populations they often survey, these issues were of interest to them. They found that although depression did not increase inconsistent responding to identical holdout tasks (test-retest reliability), it did influence health service preferences and segment membership. Chuck also reviewed basic principles for designing and analyzing holdout questions.
Determining Product Line Pricing by Combining Choice Based Conjoint and Automated Optimization Algorithms: A Case Example (Michael Mulhern, Mulhern Consulting): Mike presented the results of a recent study where the purpose was to develop an optimal pricing strategy for a product line decision. Six price levels were included in the study, and based on the plot of average utilities, there appeared to be two “elbows” in the price function. The elbows seemed to represent optimal pricing points for a mid-price and a higher-price product. Mike used the Advanced Simulation Module to conduct optimization searches to maximize revenue. He found that the optimization routines also identified those same two price points as optimal positions. The different optimization algorithms (exhaustive, grid, gradient, stochastic, and genetic) produced identical results irrespective of the starting points (with the exception of the gradient search method, which had some inconsistencies). Mike’s client also asked whether the optimal price points would change depending on different assumptions for the base case. Altering the base case and re-running the optimizations revealed similar recommendations in most cases. Mike was able to report what the client eventually did and how actual sales volume compared to the simulation’s predictions. The client followed some of the recommendations, but ignored others. The sales results suggest that ignoring the recommendations provided by the optimization simulations was costly. A poorly positioned mid-price product foundered, as would have been predicted by the model.
Using Constant Sum Questions to Forecast Sales of New Frequently Purchased Products (Greg Rogers, Procter & Gamble): Greg compared two relatively common methods for measuring buyer intent for an FMCG category: CBC and constant sum allocations (both computer-administered). Not surprisingly, the constant sum allocation (out of 10 purchases) data were more “spiked” on the 0%, 10%, 50%, and 100% allocation probabilities relative to the probabilities projected from the pick-one CBC data. Greg expanded the analysis to include a Dirichlet model (to estimate base trial for a new item) that incorporated the issues of trial and frequency. Greg concluded that analyzing the brand choices from simple constant sum scales using a Dirichlet model results in comparable base trial estimates to those derived by CBC. This finding has implications for researchers that cannot use other methods like purchase intent (requires database to interpret) or CBC (can be relatively complex and costly) to estimate trial for new products.
Replacement Modeling: A Simple Solution to the Challenge of Measuring Adding and Switching in a Polytherapy Choice Allocation Model (Larry Goldberger, Adelphi Research by Design): In pharmaceutical research, doctors sometimes prescribe multiple drugs to treat a particular condition. When this occurs, the assumption of standard allocation models that each patient is assigned a single drug therapy is violated: the allocations may sum to more than 100%, so the allocation total is no longer fixed. Larry demonstrated a Polytherapy Allocation Model that does not assume that the total sum allocated per task is 100%. The proposed solution models the likelihood that a new product will substitute for an existing product, and does not constrain the sum to 100%. Larry also reviewed other common approaches to the problem, and discussed their limitations. He discussed the common binary logit approach to the problem, and how the cross-effects can often lead to reversals.
Data Fusion to Support Product Development in the Subscriber Service Business (Frank Berkers, Gerard Loosschilder, SKIM Group, and Mary Anne Cronk, Philips Lifeline Systems): Data fusion can involve combining different datasets to learn more than the original datasets had to offer individually. The authors explained how they used data fusion to help develop new strategies with respect to a subscriber service for Lifeline monitoring (the leader in North America for Personal Emergency Response Systems). Specifically, the authors were able to develop a plan of action to approach customers with increased communication regarding specific offers depending on the pattern of signals received from the subscriber. This provided an “early warning system” that would flag subscribers as in danger of deactivating their service. By implementing this system, subscriptions could be prolonged, resulting in greater profitability to the firm. The combination of behavioral patterns and background characteristics gave a better and clearer warning of imminent deactivation, and the type of deactivation, than the separate data sources could provide. Furthermore, the combined information provided greater clarity in deciding what services to offer, and when to offer them to subscribers.
Multiple Imputation as a Benchmark for Comparison within Models of Customer Satisfaction (Jorge Alejandro and Kurt Pflughoeft, Market Probe): Kurt emphasized that many studies must deal with missing data, and the degree of missingness can be significant. Different missing value routines will lead to different degrees of bias and imprecision for statistical estimates. The authors examined a variety of techniques to deal with or impute missing data: Casewise and Pairwise deletion, the Missing Indicator Method, Mean Substitution, Regression-based Imputation, Expectation Maximization (EM), and Multiple Imputation. They used a real dataset involving customer satisfaction for a bank, and deleted values to induce missingness. They then estimated regression models and compared the results to the same models estimated on the complete data. They determined that Multiple Imputation appeared to be the best performer in terms of reducing bias and generally was more realistic in terms of standard errors. The Missing Indicator Method and Overall Mean substitution were generally biased, as the authors expected. Point estimates of EM worked well with regression; however, SPSS’s imputed dataset was biased. Pairwise deletion performed well in this experiment in estimating stable beta coefficients.
Making MaxDiff More Informative: Statistical Data Fusion by way of Latent Variable Modeling (Lynd Bacon, YouGov/Polimetrix, Inc., Peter Lenk, University of Michigan, Katya Seryakova and Ellen Veccia, Knowledge Networks): Lynd demonstrated three different ways to think about coding and estimating MaxDiff data: differences coding, coding as two separate choice tasks, and rank imputed exploding logit. All three methods produced very similar results. The authors then turned their attention to a weakness in MaxDiff experiments: the scores are scaled with respect to an arbitrary intercept (rather than a common origin) for each respondent. This makes it hard to compare a single score from one respondent to a single score from another. They applied a different model (cutpoint model for ratings) which allows them to estimate the scores for items on a common scale with a common origin. They demonstrated how using the new model can improve the ability of researchers to identify respondents to target according to overall preference for a feature. Another point they emphasized is that the lack of scale origin issue also extends to attributes within standard discrete choice methods. The new model can be applied in those situations as well.
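As an illustration of the second coding approach (two separate choice tasks), the sketch below turns one MaxDiff question into a "best" task with +1 indicator coding and a "worst" task with -1 coding. The function and field names are hypothetical, and in practice one item is usually dropped as a reference level so the model is identified.

```python
def code_as_two_tasks(shown, best, worst, n_items):
    """Code one MaxDiff question as two choice tasks over the shown items."""
    def design(sign):
        rows = []
        for item in shown:
            row = [0] * n_items
            row[item] = sign  # +1 in the best task, -1 in the worst task
            rows.append(row)
        return rows

    best_task = {"design": design(+1), "chosen": shown.index(best)}
    worst_task = {"design": design(-1), "chosen": shown.index(worst)}
    return best_task, worst_task

# Hypothetical question: items 2, 0, 5 and 3 shown out of 6 items in total;
# the respondent picks item 5 as best and item 0 as worst.
best_task, worst_task = code_as_two_tasks([2, 0, 5, 3], best=5, worst=0, n_items=6)
```

Both tasks can then be stacked and estimated with an ordinary multinomial logit, since choosing an item as worst is modeled as choosing the alternative with the most negative utility.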
Endogeneity Bias—Fact or Fiction? (Qing Liu, University of Wisconsin, Thomas Otter, Goethe University, and Greg Allenby, Ohio State University): In theoretically proper applications of regression modeling, the independent variables are truly independent. However, in some market research applications, the independent variables are not truly independent. Examples include sequential analysis, time series models with lagged dependent variables, and Adaptive Conjoint Analysis (ACA). Greg suggested that endogeneity bias will matter whenever an adaptive procedure is used to learn about respondents (so that informative questions can be determined) and these data are excluded from analysis. However, with ACA, all of the information from each respondent is included in the estimation. Endogeneity bias only depends on whether you rely on the likelihood principle, and therefore, explained Greg, “being Bayes” or not matters. The presence of endogenously determined designs in ACA doesn’t affect the likelihood of the data. Although a small degree of bias is introduced in ACA due to endogeneity, the bias is typically quite small and ignorable.
CBC/HB, Bayesm and other Alternatives for Bayesian Analysis of Trade-off Data (Well Howell, Harris Interactive): HB has become a mainstream tool for analyzing results of DCM and related techniques (such as MaxDiff). There are a number of tools available for HB estimation, including Sawtooth Software’s CBC/HB product, bayesm (R package), WinBUGS, and Harris Interactive’s Hlhbmkl model. Well used three data sets to compare the different tools in terms of in-sample and out-of-sample fit. The speed of the different systems varied quite a bit, with CBC/HB being significantly faster than the other methods. Both the in-sample and out-of-sample fit were strongly affected by the tuning of the priors (the amount of shrinkage permitted). Tools other than CBC/HB offer some more advanced diagnostics and model specifications, including Gelman diagnostics for convergence, and respondent covariates in the upper level model.
Respondent Weighting in HB (John Howell, Sawtooth Software): When samples include subgroups that have been oversampled, it has been reported that this can pose some problems for proper HB estimation within CBC/HB software (which assumes a single, normally-distributed population). John investigated the degree to which this is a problem, and potential solutions. Using simulated data, John demonstrated that when subsamples are dramatically oversampled, the means of smaller groups shrink disproportionately toward the larger groups. This biases the sample means for the under-represented groups, and harms the accuracy of market simulations. John found that much of the problem is due to diverging scale factors between smaller and larger subgroups. The scale for the oversampled groups is expanded, leading to stronger pull on the overall sample mean. John found that normalizing the scale post hoc can largely control this issue. He also found that implementing a simple weighting algorithm within HB (computing a weighted alpha vector) can potentially improve matters further when there are extreme differences in sample sizes between subgroups. John suggested that other methods he didn’t investigate may improve estimation when some groups are oversampled, including developing models that estimate individual-level scale factors, models that involve less shrinkage (Student’s t prior), or models that utilize multiple upper-level models. He concluded that regardless of how the shrinkage problem is solved, models should be tuned for scale at either the individual or group level.
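The idea of a weighted alpha vector can be sketched as a weighted average of individual-level part-worths, where the weights undo the oversampling. This is only an assumption about the general form of such a weighting step; the data, weights, and the name `weighted_alpha` are hypothetical and do not reflect the CBC/HB implementation.

```python
def weighted_alpha(betas, weights):
    """Weighted average of respondents' part-worth vectors."""
    total = sum(weights)
    n_params = len(betas[0])
    return [
        sum(w * beta[k] for beta, w in zip(betas, weights)) / total
        for k in range(n_params)
    ]

# Hypothetical part-worths: three respondents from an oversampled group,
# one respondent from an under-represented group.
betas = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0], [-2.0, 2.0]]

# Hypothetical weights that give the two groups equal total weight.
weights = [1 / 3, 1 / 3, 1 / 3, 1.0]

unweighted = weighted_alpha(betas, [1.0] * len(betas))
weighted = weighted_alpha(betas, weights)
```

In this toy example the unweighted mean of the first parameter is 0.25, dominated by the oversampled group, while the weighted mean is about -0.5, halfway between the two groups.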
© 2008 Sawtooth Software, Inc. All rights reserved.