
SS Winter 2008


Update on Adaptive CBC

At the 2007 Sawtooth Software Conference, we presented our latest research into Adaptive CBC. Our current stream of research is a significant departure from the Adaptive CBC approaches we have tried and described at earlier conferences. And, we’re happy to say, this approach seems to work better than traditional CBC for complex studies involving about five or more attributes. Importantly, respondents find the interview more engaging, more realistic, and more focused on the levels most relevant to their choices.

Our approach first asks respondents to indicate which product they’d most likely purchase. We’ve formatted that phase as a BYO (configurator) task, but it could probably also be done by asking for the “most likely” or “preferred” level of each attribute. In the second phase of the adaptive interview, we ask respondents to screen product concepts that resemble their “most likely” product, indicating whether each concept is a possibility or not. After a few screening answers, if the respondent seems to be using non-compensatory rules (i.e., “must have” or “unacceptable” levels), we identify the possible rule and ask the respondent to confirm or skip it. This cycle of observing the respondent’s answers and offering the opportunity to confirm decision rules is repeated. And, of course, if the respondent indicates that a particular level is either required or unacceptable, only products meeting that criterion are shown for the remainder of the interview. In the final phase, we ask the respondent to compare the screened-in products using standard CBC choice tasks. This is a round-robin tournament that identifies an overall winning concept.
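To make the rule-identification step more concrete, here is a minimal Python sketch under stated assumptions. The heuristic it uses (flag a level as a candidate “unacceptable” if it was shown at least `min_shown` times but never appeared in a concept marked a possibility, and as a candidate “must have” if every concept marked a possibility contained it) and the names `candidate_rules` and `min_shown` are hypothetical illustrations, not Sawtooth Software’s actual algorithm.

```python
from collections import defaultdict


def candidate_rules(concepts, responses, min_shown=2):
    """Hypothetical heuristic: suggest non-compensatory rules from screening answers.

    concepts:  list of dicts mapping attribute -> level shown
    responses: list of bools, True if the respondent marked the concept "a possibility"
    Returns (unacceptable, must_have) as sorted lists of (attribute, level) pairs.
    """
    shown = defaultdict(int)     # times each (attribute, level) was shown
    accepted = defaultdict(int)  # times it appeared in a concept marked "a possibility"
    for concept, is_possibility in zip(concepts, responses):
        for attribute, level in concept.items():
            shown[(attribute, level)] += 1
            if is_possibility:
                accepted[(attribute, level)] += 1
    n_accepted = sum(responses)

    # Shown often enough, but never part of an acceptable concept.
    unacceptable = sorted(
        key for key, count in shown.items()
        if count >= min_shown and accepted.get(key, 0) == 0
    )
    # Present in every acceptable concept, yet not in every concept shown
    # (a level that was always on screen tells us nothing).
    must_have = sorted(
        key for key, count in accepted.items()
        if n_accepted > 0 and count == n_accepted and shown[key] < len(concepts)
    )
    return unacceptable, must_have


concepts = [
    {"brand": "Dell", "ram": "2 GB"},
    {"brand": "HP", "ram": "2 GB"},
    {"brand": "Toshiba", "ram": "1 GB"},
    {"brand": "Toshiba", "ram": "2 GB"},
    {"brand": "Dell", "ram": "1 GB"},
]
responses = [True, True, False, False, False]
unacceptable, must_have = candidate_rules(concepts, responses)
print(unacceptable)  # [('brand', 'Toshiba'), ('ram', '1 GB')]
print(must_have)     # [('ram', '2 GB')]
```

After only five answers, both “Toshiba” and “1 GB” surface as possible unacceptables, which is why each suggested rule is put to the respondent to confirm or skip rather than applied automatically.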

To complete an online Adaptive CBC survey yourself, visit: www.sawtoothsoftware.com/test/byo/byologn.htm

To this point, we have completed two studies. We have published the findings in the Technical Papers library on our website, in an article entitled: “A New Approach to Adaptive CBC.”

What are we doing now? We are currently wrapping up a third study, and will be presenting the results at the 2008 A/R/T Forum Conference in June. In this latest research, we are testing whether the Adaptive interviews can be made significantly shorter without losing much in terms of predictive accuracy. We’ve implemented an improvement to the experimental designs that may allow us to get away with shorter interviews, so this latest work involves more than just shortening the interview.

We are also developing a beta version of the Adaptive CBC software. Because this is such a new approach, it is critical that we obtain more data points and greater experience before launching a commercial v1 product. We’ll announce the beta software when it is available, and will be enlisting your help to further test this promising methodology.



X64 HB: Making Fast Even Faster

(By Walt Williams, Sawtooth Software Engineer)

At the 2007 Sawtooth Software Conference, Well Howell (Harris Interactive) presented a paper detailing a comparison of software to perform HB estimation, including our own CBC/HB v4. While we were quite pleased that CBC/HB had the fastest runtime (Well, your check is in the mail), we are always looking to further improve performance.

Over the last several years, the computing industry has seen the introduction of 64-bit (x64) desktop computing. AMD began releasing processors supporting x64 in 2003, with Intel following suit in 2004. Since then, more chip lines have added x64 support, and currently all new processors support x64.

These new processors also run 32-bit Windows, and Microsoft released an x64 version of Windows XP back in April 2005. Consumers could order a new PC with it if they requested it at purchase time, but it wasn’t available in retail outlets, so the x64 market didn’t grow very fast. In January 2007, Microsoft released Windows Vista, which gave consumers two discs, one for 32-bit and one for x64, allowing users to choose which to install. Since that time, the number of users running x64 Windows has steadily increased.

We’ve often been asked if an x64 version of HB is available. Currently CBC/HB v4 is compiled as a 32-bit application. It will run just fine under x64 Windows. The question becomes whether an x64 version of HB would perform better than the 32-bit version. This applies only to x64 Windows since x64 HB will not run on 32-bit Windows.

For the true geeks at heart, let me describe the hardware used for our x64 experiment. We used a Dell Precision 490 with an Intel Xeon Quad-Core processor running at 1.86GHz and two 80 GB 10K RPM drives in a RAID 0 array, running Windows Vista Business x64. While the processor has 4 cores, CBC/HB is a single-threaded application (we are also researching multi-threading). We initially performed utility estimation with 2GB of RAM, which would be fairly typical. However, x64 Windows can use more than 3GB of RAM, so we upgraded the system to 6GB and reran the computations to see what could be gained by having more memory.

Now I’ll describe the experiment. We selected a dataset with 1019 respondents who each answered 24 choice tasks involving 5 attributes with 3 to 15 levels each. Under main effects (the base case), CBC/HB estimates 34 parameters for each respondent. We would consider this a moderately sized dataset, and fairly common for CBC/HB. Our performance measure is the iteration rate, defined as seconds per iteration (so lower is faster). We compared the 32-bit iteration rate to that of our x64 test engine, which I will describe next.
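The parameter counts in this experiment follow from simple arithmetic. Assuming effects coding (dummy coding gives the same count), an attribute with L levels contributes L - 1 main-effect parameters, and a two-way interaction between attributes i and j adds (L_i - 1)(L_j - 1). The level counts below are hypothetical, chosen only to fall in the stated 3-15 range and to total 34 parameters; the actual dataset’s counts are not given.

```python
def main_effects_params(levels_per_attribute):
    """Effects (or dummy) coding: each attribute contributes (levels - 1) parameters."""
    return sum(n - 1 for n in levels_per_attribute)


def interaction_params(levels_per_attribute, pairs):
    """Each two-way interaction between attributes i and j adds (L_i - 1) * (L_j - 1)."""
    return sum(
        (levels_per_attribute[i] - 1) * (levels_per_attribute[j] - 1) for i, j in pairs
    )


# Hypothetical level counts within the 3-15 range (not the real dataset's counts).
levels = [3, 4, 5, 12, 15]
base_case = main_effects_params(levels)
with_one_interaction = base_case + interaction_params(levels, [(0, 1)])
print(base_case)             # 34
print(with_one_interaction)  # 40
```

Interactions between attributes with many levels grow the parameter count quickly, which is why adding interaction terms is an easy way to stress the estimation engine.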

The CBC/HB v4 engine is a 32-bit component written in C++. To create the x64 version, we made no changes to the existing code and simply compiled it as x64. Although it was not part of the experiment, we tested the engine to make sure that the conversion to x64 did not produce different utilities from the 32-bit version: in a side-by-side test with the same random seed, both produced identical results. Because we did not need to change the code, we are confident that the results reflect only the difference between 32-bit and x64.

Using CBC/HB v4, we created the base case along with five variations shown in the table below. To run the estimation, we created a special application (using C#, a .NET language) that would run both the 32-bit and x64 engines, to further ensure there was no bias towards one or the other. We then performed each utility estimation run in 32-bit and x64. The results are listed below:

In each case, we noted how many iterations we ran. Although the utilities had likely not converged in most cases, the iteration rate was stable enough to estimate performance, which was the target of the experiment. We also note the build file size, which determines whether the data files could reside in memory or were too large and had to be read from disk (which is also noted).

In the first three utility runs, we used the original 1019 respondents and increased the number of parameters by adding interaction terms. In the next three runs, we repeated the first three, but replicated each respondent 10 times, resulting in 10190 respondents. In all cases, we note the percentage by which the x64 version reduced the time per iteration.
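The iteration rate and the percent-less-time comparison can be computed as in this minimal Python sketch. Here `step` is a hypothetical placeholder for one HB iteration, and the rates in the usage line are invented for illustration, not figures from the experiment.

```python
import time


def seconds_per_iteration(step, n_iterations):
    """Average wall-clock seconds per call of `step` (the iteration rate)."""
    start = time.perf_counter()
    for _ in range(n_iterations):
        step()
    return (time.perf_counter() - start) / n_iterations


def percent_less_time(rate_32bit, rate_x64):
    """Percent reduction in seconds per iteration for x64 relative to 32-bit."""
    return 100.0 * (rate_32bit - rate_x64) / rate_32bit


# Hypothetical rates, not results from the experiment.
print(round(percent_less_time(0.050, 0.040), 1))  # 20.0
```

Because the measure is seconds per iteration, a positive percentage means the x64 engine finished each iteration sooner.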

The first thing to note is that the x64 version was faster than the 32-bit version in every case. The gains in the 2GB block appear to be more pronounced (with one exception) than in the 6GB block, but looking carefully we see that the 32-bit rates improve significantly more between 2GB and 6GB than the x64 rates do. This is likely related to the way that Windows manages its own resources: with 6GB, it appears to manage 32-bit applications much more efficiently than it did with 2GB.

The exception occurred with the last case under 2GB. This is an extreme case where the build file size actually exceeded the RAM by over 1.5x. Here the machine spent most of its time swapping data between memory and disk (a condition called thrashing), so the iteration rate was very slow in both 32-bit and x64.

In both the 2GB and 6GB blocks, the last two cases ran the build file from disk instead of memory, even though (apart from the case just mentioned) it appears they should have fit in memory. While there might have been enough memory to do so, Windows rejected the request to load them into memory. Unfortunately, that behavior can’t be changed, but the operating system was able to cache parts of the files in memory, and with 6GB it was able to cache enough to make the iteration rates reasonable.

Probably the most encouraging aspect of this experiment is that you can expect a significant benefit from running an x64 version of HB on an x64 machine with generous amounts of RAM. We expect an x64 version of CBC/HB sometime in 2008, and we hope to increase performance in other ways as well.



Breathing New Life into Sawtooth Software’s Cluster Analysis

Very soon, we’ll be releasing a new cluster analysis package that includes Cluster Ensemble Analysis, a relatively new methodology for improving the quality of cluster solutions.

One of the most enigmatic of Sawtooth Software’s products is the CCA system for convergent cluster analysis (k-means). First released in 1988, it has one of the most loyal and enthusiastic groups of users. But compared to our more popular systems, it hasn’t been widely adopted. Perhaps that is because other well-known statistical packages already offer cluster routines, or because many analysts don’t realize the pitfalls of traditional cluster routines. Or maybe it is because of CCA’s antiquated, clunky software interface.

CCA established a devoted following due to the way it repeats the k-means solution from many intelligently drawn starting points and then selects the single best run: the one that is most reproducible among the replicates. This shields the analyst from accepting a poor solution due to an unlucky choice of starting points. Importantly, how reproducible a cluster solution is reflects how naturally the data can be segmented into the number of groups the analyst requested. Thus, reproducibility is also a useful diagnostic for determining an appropriate number of clusters.
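The replicate-and-select idea can be sketched in Python with NumPy. This is a simplified illustration, not CCA’s actual procedure: k-means++ seeding stands in for CCA’s own “intelligently drawn” starting points, and the Rand index (the share of object pairs on which two runs agree) stands in for CCA’s reproducibility measure. Function names and the synthetic data are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread-out starting points (a stand-in for CCA's own scheme)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, rng, max_iter=100):
    """Lloyd's k-means from one starting configuration; returns cluster labels."""
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(max_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def rand_agreement(a, b):
    """Share of object pairs on which two partitions agree (Rand index)."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


def most_reproducible(X, k, n_starts=20, seed=0):
    """Run k-means from many starts; keep the run that agrees best with the others."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_starts)]
    agreement = np.array([[rand_agreement(r, s) for s in runs] for r in runs])
    scores = (agreement.sum(axis=1) - 1.0) / (n_starts - 1)  # drop self-agreement
    best = int(scores.argmax())
    return runs[best], float(scores[best])


# Synthetic data: three well-separated groups of 30 points each.
data_rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.3 * data_rng.standard_normal((30, 2)) for c in true_centers])
truth = np.repeat([0, 1, 2], 30)
labels, reproducibility = most_reproducible(X, k=3)
print(round(reproducibility, 3), rand_agreement(labels, truth))
```

Running `most_reproducible` for several values of `k` and comparing the reproducibility scores is one way to use the same statistic as a diagnostic for the number of clusters.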

We are enthusiastic about a relatively new development called cluster ensemble analysis that provides even better solutions than the previous CCA approach. We are grateful to Joe Retzer and Ming Shan of Maritz Research for calling our attention to this development at the 2007 conference.

Cluster ensemble analysis originated in the machine learning and data mining fields, and is commonly attributed to Strehl and Ghosh (2002). Instead of picking the one best cluster solution from a set of available solutions, it develops a consensus solution based on all the information in the ensemble of cluster solutions. The consensus solution is usually different from all of the available cluster runs, and it usually represents a superior cluster solution. We have tried it, and indeed it does.
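One simple way to build a consensus from an ensemble is sketched below in Python with NumPy: count how often each pair of objects is clustered together (a co-association matrix), then link pairs that are together in a majority of runs. This is an illustrative, evidence-accumulation-style consensus, not Strehl and Ghosh’s graph-partitioning algorithms and not Sawtooth Software’s own flavor; the function names, the toy ensemble, and the 0.5 threshold are assumptions. In practice the ensemble would come from many k-means runs with different starting points.

```python
import numpy as np


def co_association(labelings):
    """Fraction of ensemble members that put each pair of objects in the same cluster."""
    labelings = np.asarray(labelings)
    together = labelings[:, :, None] == labelings[:, None, :]  # shape (runs, n, n)
    return together.mean(axis=0)


def consensus_labels(coassoc, threshold=0.5):
    """Connected components of the graph linking pairs grouped together in more
    than `threshold` of the runs (a simple majority-vote consensus)."""
    n = len(coassoc)
    linked = coassoc > threshold
    labels = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Toy ensemble: four runs on six objects, each misplacing one object.
ensemble = [
    [0, 0, 1, 1, 1, 1],  # object 2 misplaced
    [0, 0, 0, 0, 1, 1],  # object 3 misplaced
    [1, 0, 0, 1, 1, 1],  # object 0 misplaced
    [0, 0, 0, 1, 1, 0],  # object 5 misplaced
]
coassoc = co_association(ensemble)
consensus = consensus_labels(coassoc)
print(consensus)  # [0 0 0 1 1 1]
```

In this toy case the consensus matches none of the individual runs, echoing the point above that the consensus solution usually differs from every available cluster run.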

We are releasing a new version of our cluster analysis package that of course brings it up to Windows usability standards. This new package incorporates our own flavor of cluster ensemble analysis (as well as the earlier method supported by CCA). The current plan is to name the new software Convergent Cluster & Ensemble Analysis (CCEA). We expect these changes to breathe new life into the package, and we hope that our users come to recognize the benefit of using CCEA instead of older approaches.

The Convergent Cluster & Ensemble Analysis package is currently in a beta test phase, and should be available soon for purchase. We will be presenting results of our tests of ensemble analysis versus the older CCA methodology at the joint SKIM-Sawtooth Software Conference and Training, in Barcelona, May 26-28.



Three Ways to Treat Overall Price in Conjoint Analysis

(By Bryan Orme, President, Sawtooth Software)

This article discusses three ways to treat overall price in traditional ratings-based conjoint analysis or discrete choice (CBC) studies:

  • Traditional Approach
  • Conditional Price
  • Continuous Price

The traditional approach is the easiest to manage, but the other two techniques offer benefits for more advanced applications in specialized situations. Because our recent work with Adaptive CBC uses the Continuous Price approach (“A New Approach to Adaptive CBC,” Johnson and Orme 2007), it was important that we do some investigation into the stability of price estimates under Continuous Price. Those simulation results are reported at the end of this article.

Traditional Approach, with Price as Separate Attribute

In conjoint analysis, the typical approach to price is to include it as a separate attribute in the study design. For example, if we were studying laptop computers, we might include the following attributes:

  Dell
  HP
  Toshiba

  1 GB RAM
  2 GB RAM
  4 GB RAM

  80 GB Hard Drive
  120 GB Hard Drive
  160 GB Hard Drive

  2.0 GHz Processor
  2.5 GHz Processor
  3.0 GHz Processor

  $700
  $1,000
  $1,500

With this traditional approach, we vary each attribute independently of the others (an orthogonal design). Level balance is achieved if each level within each attribute appears an equal number of times. Designs that are both level balanced and orthogonal are optimally efficient for estimating the part-worth utilities with precision (assuming respondents answer using a simple additive model). In the example above, we could estimate part-worth utility values corresponding to each of the three levels of price (a part-worth utility function), or we could estimate a single linear term to reflect the slope of price (a vector utility function). Most researchers choose the part-worth utility function, because it is more flexible and can account for non-linearities in the price function. However, it comes at the cost of increased parameters to estimate.
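The difference between the two price specifications shows up in how a concept’s price level is coded in the design matrix. This Python sketch assumes effects coding for the part-worth version (dummy coding would work similarly) and mean-centers and rescales price for the vector version; both choices, and the names used, are illustrative.

```python
PRICES = [700, 1000, 1500]  # price levels from the laptop example

# Effects coding (assumed here): a 3-level attribute gets 2 columns;
# the last level is coded -1 in every column.
EFFECTS = [[1, 0], [0, 1], [-1, -1]]


def part_worth_columns(level):
    """Part-worth function: (number of levels - 1) columns, one parameter each."""
    return EFFECTS[level]


def vector_column(level):
    """Vector (linear) function: one column holding the mean-centered price in $1,000s."""
    mean_price = sum(PRICES) / len(PRICES)
    return [(PRICES[level] - mean_price) / 1000]


for level, price in enumerate(PRICES):
    print(price, part_worth_columns(level), vector_column(level))
```

The part-worth coding spends two parameters on price and can bend to fit non-linearities; the vector coding spends one and forces a straight line.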

Despite the robust statistical qualities of orthogonal designs, some researchers and respondents have been bothered that product concepts with the best features sometimes are shown at the lowest prices (and products with the worst features are sometimes shown at the highest prices). These combinations seem illogical and often lead to obvious (dominated) choices in the questionnaire. Such questions are less informative and lead to a less realistic experience for the respondent.

Conditional Pricing

Conditional pricing is one approach for increasing the realism of the concepts shown to the respondent. With conditional pricing, incremental amounts are added for premium brands or premium features, so enhanced products are generally shown at higher prices. We still treat price as a separate attribute with just a few levels (such as three to five). But, those levels of price are described with different absolute dollar amounts, depending on product characteristics. Probably the most common use among Sawtooth Software users is to associate different brands or different brand/package size combinations with different price ranges. For example, the premium brand might be shown at $10, $15, or $20 whereas the discount brand is shown at half those prices: $5, $7.50, or $10. In the design matrix, we still treat price as a single attribute with three levels, even though a larger number of actual prices are displayed.

With conditional pricing, we use a price lookup table to determine actual prices to show in the questionnaire, based on the characteristics of each product. To create the lookup table, we first decide how many attributes (including the price attribute) will participate in the conditional pricing relationship. Our current CBC software permits price to be conditional on up to three other attributes.

Let’s assume with our previous laptop PC example that we wanted to make price conditional on RAM, Hard Drive, and Processor. We start by choosing price premiums associated with those attribute levels. These premiums are not shown to respondents next to each attribute level; they are used only to determine the overall average price. Only a single total price is shown within the product concept.

Example 2: Conditional Price

Dell  
HP  
Toshiba  
  
1 GB RAM +$0
2 GB RAM +$100
4 GB RAM +$200
  
80 GB Hard Drive +$0
120 GB Hard Drive +$100
160 GB Hard Drive +$200
  
2.0 GHz Processor +$0
2.5 GHz Processor +$200
3.0 GHz Processor +$400
  
Low Price (-30%)
Medium Price (Average Price)
High Price (+30%)

Let’s assume that the base price for the laptop is $750. We construct a look-up table to determine the prices that we should show on the screen for each possible product combination at each of the three prices. This table would have a total of 3 x 3 x 3 x 3 = 81 rows. The first five rows of the price lookup table look like:

RAM   Hard Drive   Processor   Price Level   Text to Display
1     1            1           1             $525
1     1            1           2             $750
1     1            1           3             $975
1     1            2           1             $665
1     1            2           2             $950
...   ...          ...         ...           ...

For example, row four of the table specifies the price to display when a product with 1 GB RAM, an 80 GB Hard Drive, and a 2.5 GHz Processor appears at the low price level. The price to show is $665: the base price ($750) plus the increments for the three conditional attributes ($0 + $0 + $200), reduced by 30%.
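The 81-row lookup table can be generated rather than typed by hand. This minimal Python sketch applies the rule described above (base price plus premiums, then -30%, 0%, or +30%); the variable names are illustrative, and rounding to presentable prices is deliberately left out.

```python
from itertools import product

BASE_PRICE = 750
# Premiums by level (index 0 = level 1) for RAM, hard drive, and processor.
PREMIUMS = {
    "ram": [0, 100, 200],
    "hard_drive": [0, 100, 200],
    "processor": [0, 200, 400],
}
PRICE_ADJUSTMENT_PCT = [-30, 0, 30]  # low, medium, high price levels


def build_lookup_table():
    """Rows of (ram, hard_drive, processor, price_level, displayed_price); levels are 1-based."""
    rows = []
    for ram, hdd, proc, price_level in product(range(3), repeat=4):
        total = (BASE_PRICE + PREMIUMS["ram"][ram]
                 + PREMIUMS["hard_drive"][hdd] + PREMIUMS["processor"][proc])
        # Integer arithmetic; the division is exact here because every total is a multiple of $50.
        displayed = total * (100 + PRICE_ADJUSTMENT_PCT[price_level]) // 100
        rows.append((ram + 1, hdd + 1, proc + 1, price_level + 1, displayed))
    return rows


table = build_lookup_table()
print(len(table))  # 81
for row in table[:5]:
    print(row)
```

The first five generated rows match the excerpt of the lookup table shown above.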

The benefit of conditional pricing is that more reasonable prices are shown to respondents, and in the simplest case price may be estimated using main effects for (in this example) the three levels of price in the design. And, critically, the design is still orthogonal and unencumbered by prohibitions.

There are a few challenges when working with conditional pricing:

  • We can no longer interpret the main effect utilities for attributes involved in the conditional relationship independently of price. For example, we cannot interpret the levels of processor speed as the preference for each level holding everything else constant. The utility of each level of processor speed is confounded with the incremental price attached to that level. The levels must therefore be interpreted as the preference for each processor speed given the average prices shown for it. So it’s quite possible to obtain a higher average utility for 2.0 GHz Processor @ +$0 than for 2.5 GHz Processor @ +$200, if respondents on average did not feel the faster processor was worth the extra $200.

  • The estimation of part-worth utilities works well when the prices shown to respondents are based on a fixed percentage increase or decrease from the average price. However, the resulting prices often need to be rounded to the nearest $100 (or made to end in “9” for consumer packaged goods). Small relative changes made to round a price to a more presentable number don’t pose much of a problem, but significant price changes due to rounding introduce error into the utility estimates for the price attribute.

  • If the conditional pricing table is not built in the consistent, proportional manner specified here (or if rounding results in significant deviations from the original formula-based values), it may become impossible to model the data correctly with the conditional price approach unless interaction effects are added, and interaction effects may lead to overfitting.

  • If respondents oversimplify by paying attention only to prices, the preference for lower levels of performance attributes will be biased upward.

  • The current software allows no more than three attributes (in addition to price) to be included in the conditional relationship.
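One way to judge whether rounding is “significant” is to measure how far it moves each displayed price. This Python sketch rounds to the nearest $100 (halves rounding up) and flags changes larger than a tolerance; the 5% tolerance and the function names are arbitrary illustrations, not a threshold from our research.

```python
import math


def round_half_up(price, step=100):
    """Round to the nearest `step` dollars, with halves rounding up."""
    return step * math.floor(price / step + 0.5)


def flag_rounding_errors(prices, step=100, max_relative_change=0.05):
    """Return (price, rounded, relative change) for prices moved by more than the tolerance."""
    flagged = []
    for price in prices:
        rounded = round_half_up(price, step)
        change = (rounded - price) / price
        if abs(change) > max_relative_change:
            flagged.append((price, rounded, round(change, 4)))
    return flagged


# Displayed prices from the first five rows of the lookup table above.
flagged = flag_rounding_errors([525, 750, 975, 665, 950])
for item in flagged:
    print(item)
```

At these price points, rounding to the nearest $100 moves three of the five prices by more than 5%, which would argue for a finer rounding step in this design.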

Continuous Price

Another approach, not currently offered in Sawtooth Software products, is continuous price. (Even though it’s not a supported feature of the software, a power user can still implement it, though it requires reformatting the text-only studyname.CHO file.) Continuous price differs from conditional price in two ways. First, it generalizes the idea of conditional pricing beyond the software limit of three attributes. Second, it estimates the effect of overall price as a linear coefficient rather than as a part-worth utility function. As with conditional pricing, we approach the problem by considering a base price for the product as well as fixed price premiums for levels of non-price attributes (plus or minus some overall independent price variation). In the example from the previous section, the base price is $750 and the most expensive product option would be $1,550 ($750 + $200 + $200 + $400, prior to varying price by some independent amount).

As with the first two pricing approaches, we show only a single overall price within the product concept, rather than showing prices attached to each attribute level. In the questionnaire, conditional price and continuous price therefore look alike; the difference lies in the coding of the design matrix, where continuous price is coded in a single column as a continuous variable. Typically, a single price coefficient is estimated based on linear price (or the natural log of price). More complex curve fitting might be considered, as well as piecewise coding. These approaches may provide a better fit to the data, but they risk overfitting and also introduce some correlation in the independent variable matrix.

Because values in the price column of the design matrix are determined from information in other columns of the design matrix, that column would be linearly dependent on other columns if we didn’t do something to break up that dependence. We do this by adding random variation to the prices.
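The dependence, and how random variation breaks it, can be checked numerically. This Python/NumPy sketch uses the laptop premiums, dummy (one-hot) level columns, and a hypothetical +/-10% disturbance; it compares the matrix rank with and without the price column and reports the R^2 of price regressed on the attribute columns. The function names and the number of simulated concepts are assumptions.

```python
import numpy as np

# Laptop example premiums (levels 1, 2, 3); brand carries no premium.
PREMIUMS = np.array([
    [0, 0, 0],      # brand
    [0, 100, 200],  # RAM
    [0, 100, 200],  # hard drive
    [0, 200, 400],  # processor
])
BASE_PRICE = 750


def simulate(n_concepts, disturbances, seed=0):
    """Random concepts: one-hot level columns plus summed price (in $1,000s),
    with and without a random percentage disturbance drawn from `disturbances`."""
    rng = np.random.default_rng(seed)
    levels = rng.integers(0, 3, size=(n_concepts, len(PREMIUMS)))
    X = np.hstack([np.eye(3)[levels[:, a]] for a in range(len(PREMIUMS))])
    summed = (BASE_PRICE + PREMIUMS[np.arange(len(PREMIUMS)), levels].sum(axis=1)) / 1000
    disturbed = summed * (1 + rng.choice(disturbances, size=n_concepts))
    return X, summed, disturbed


def r_squared(X, y):
    """Share of the price column's variance explained by the attribute columns."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - np.var(y - X @ coef) / np.var(y)


X, summed, disturbed = simulate(500, [-0.10, -0.05, 0.0, 0.05, 0.10])
for name, price in [("summed only", summed), ("summed +/-10%", disturbed)]:
    rank = np.linalg.matrix_rank(np.column_stack([X, price]))
    print(name, "rank", rank, "R^2", round(r_squared(X, price), 3))
```

Without the disturbance, adding the price column does not raise the rank (price is perfectly predicted); with it, the rank rises by one, although price remains strongly correlated with the attribute columns.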

The benefits of continuous (summed) price relative to conditional pricing include:

  • In contrast to conditional pricing, the utility of each feature level is estimated independently of any price premium associated with the level. Thus, we would expect the utilities for levels of processor speed to look just like they would when using the standard conjoint approach with no conditional pricing.

  • Since we are estimating price as a continuous function, there is no worry that rounding prices to end in “9” or to the nearest $100 will lead to errors in fitting the data.

But these benefits come with a serious potential drawback: the price attribute is positively correlated with any attributes that involve incremental prices, leading to less precise estimates of all effects, but most especially the price coefficient. The amount of correlation depends on the magnitude of the random variation in overall price, as well as on the size of the fixed base price relative to the incremental prices associated with each feature level. In the worst case, with no random variation, continuous price is simply the base price plus the sum of the prices associated with the attribute levels. In that case, price would be perfectly predicted by a linear combination of the attributes, and the design would be deficient: the price effect could not be separated from the attribute effects. But if we additionally vary the overall price by a large enough random amount (see the guidelines further below), we can obtain sufficient precision for the estimates of overall price sensitivity as well as for the other features in the study.

Continuous price is not an option in the current implementations of Sawtooth Software’s CBC or CVA products. However, a power user could implement it for either type of study and estimate the results properly using Latent Class, CBC/HB or, in the case of a CVA study, HB-Reg.

The final section of this article includes a simulation study to investigate what variation should be specified in the overall price attribute to lead to reasonable estimates with continuous price.

Simulation Study

As mentioned earlier, the amount of random variation given to continuous price has a direct impact on the efficiency of the estimates. To provide guidelines on how much random variation in price to include in continuous price designs, we conducted a synthetic study with 300 simulated respondents. Each (computer-generated) respondent received 10 choice tasks that were answered randomly. There were five attributes, each with three levels, along with overall price. The overall price was found by adding the incremental prices associated with each of the other five attributes to the base price. Then, for example, when varying the overall price by as much as +/-10%, we used five distinct price disturbances within that range: -10%, -5%, +0%, +5%, and +10%. (One could choose any number of discrete price variations within the +/-10% range.)

Continuous price usually consists of a fixed base price plus additional upgraded feature costs. If the base price is relatively large compared to the incremental prices attached to upgraded features, then (after disturbing overall price by the price variation) the resulting price attribute will be relatively uncorrelated with the linear combination of the other attributes. However, if the base price is relatively small compared to the incremental prices for the other levels in the study, then the resulting overall price will be more strongly correlated with a linear combination of the other attributes. Therefore, we needed to consider how different sizes of the fixed base price, relative to the incremental prices, would affect the results.

Study Procedure and Results

We simulated different amounts of independent price variation on continuous price, from as low as +/-5% to as much as +/-40%. We estimated price as a single coefficient, to be applied to the natural log of total price. Prior to estimating (with aggregate logit) for each price condition, we normalized the log price variable to have a variance of unity, so that the standard errors would be comparable across the different simulation runs that featured substantial differences in the amount of absolute price variation. For each study condition, we recorded the standard error for log price. The precision of the estimated price coefficients relative to a standard three-level attribute in a parallel study without continuous price is shown in the following chart.
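The simulation procedure can be sketched in Python with NumPy under stated assumptions. The number of concepts per task (3), the premiums (0, 1, or 2 units per attribute level), and the function names are hypothetical; the sketch follows the steps described above (random answers, five discrete disturbances, log price normalized to unit variance, aggregate logit, and the standard error of the log-price coefficient). It is meant to show the mechanics, not to reproduce the exact figures reported here.

```python
import numpy as np

N_RESPONDENTS, N_TASKS = 300, 10
N_CONCEPTS = 3      # concepts per task: an assumption, not stated in the study
N_ATTRIBUTES = 5    # five 3-level attributes
STEPS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # five disturbances, scaled by the max variation
EFFECTS = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def build_design(variation, base_share, rng):
    """Design array (tasks, concepts, parameters): effects-coded attributes + normalized log price."""
    n = N_RESPONDENTS * N_TASKS * N_CONCEPTS
    levels = rng.integers(0, 3, size=(n, N_ATTRIBUTES))
    increments = levels.sum(axis=1).astype(float)  # hypothetical premiums of 0, 1, 2 units
    mean_increment = N_ATTRIBUTES * 1.0             # average premium per attribute is 1 unit
    base = base_share / (1 - base_share) * mean_increment  # so base / average price = base_share
    price = (base + increments) * (1 + variation * rng.choice(STEPS, size=n))
    log_price = np.log(price)
    log_price = (log_price - log_price.mean()) / log_price.std()  # unit variance
    columns = [EFFECTS[levels[:, a]] for a in range(N_ATTRIBUTES)] + [log_price[:, None]]
    return np.hstack(columns).reshape(N_RESPONDENTS * N_TASKS, N_CONCEPTS, -1)


def fit_logit(X, choices, max_iter=50):
    """Aggregate multinomial logit by Newton-Raphson; returns (coefficients, standard errors)."""
    n_tasks, _, k = X.shape
    beta = np.zeros(k)
    for _ in range(max_iter):
        util = X @ beta
        util -= util.max(axis=1, keepdims=True)
        prob = np.exp(util)
        prob /= prob.sum(axis=1, keepdims=True)
        mean_x = np.einsum("tj,tjk->tk", prob, X)
        grad = (X[np.arange(n_tasks), choices] - mean_x).sum(axis=0)
        centered = X - mean_x[:, None, :]
        hessian = -np.einsum("tj,tjk,tjl->kl", prob, centered, centered)
        step = np.linalg.solve(hessian, grad)
        beta -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, np.sqrt(np.diag(np.linalg.inv(-hessian)))


def log_price_se(variation, base_share, seed=0):
    """Standard error of the log-price coefficient when respondents answer at random."""
    rng = np.random.default_rng(seed)
    X = build_design(variation, base_share, rng)
    choices = rng.integers(0, N_CONCEPTS, size=X.shape[0])
    _, se = fit_logit(X, choices)
    return se[-1]


if __name__ == "__main__":
    for share in (3 / 4, 1 / 2, 1 / 3):
        for variation in (0.05, 0.10, 0.20, 0.30, 0.40):
            print(f"base share {share:.2f}, +/-{variation:.0%}: "
                  f"SE(log price) = {log_price_se(variation, share):.4f}")
```

Rerunning with different seeds gives a feel for simulation noise; the standard error for log price should shrink as the random variation grows or as the base share rises.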

Recommendations

According to our simulations, the precision of the utility estimate for log price depends strongly on both the amount of added random price variation and the size of the constant base price relative to the feature-based prices. When the fixed base price of the product is 3/4 of the total average price, as little as +/-10% price variation on continuous price achieves estimates nearly 50% as efficient as those for a standard 3-level attribute coded as a part-worth function. In terms of absolute magnitude, the standard error for price was 0.038, compared to 0.026 for the levels of the standard three-level attribute. Based on additional simulations, we found that a standard five-level attribute (if placed within the same study instead of continuous price) would also achieve standard errors of about 0.038 for its levels. Generally, in practice, we’d be comfortable with such precision.
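The “nearly 50% as efficient” figure follows from the reported standard errors if efficiency is measured as the ratio of sampling variances (the squared ratio of standard errors), which is the convention assumed in this small Python check.

```python
def relative_efficiency(se_reference, se):
    """Efficiency relative to a reference estimate: ratio of sampling variances."""
    return (se_reference / se) ** 2


# Standard errors reported above: 0.026 for a standard 3-level attribute,
# 0.038 for log price with +/-10% variation and a base price at 3/4 of the average.
print(round(relative_efficiency(0.026, 0.038), 2))  # 0.47
```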

However, if the base price only accounts for 1/3 of the total average price of the product (most of the price is explained by the incremental feature prices), then we’d need to vary continuous price +/-30% to achieve similar precision. Based on this simulation study, we can make general recommendations for continuous price:

Recommended Minimum Independent Price Variation for Continuous Price

If base price is 3/4 of total average price: +/-10%
If base price is 1/2 of total average price: +/-20%
If base price is 1/3 of total average price: +/-30%

Let’s apply the recommendations above to the laptop computer example we introduced earlier. The base price was $750, and the most expensive product (prior to introducing any independent random price variation) was $1,550. The average price falls halfway through that interval, at $1,150. Therefore, the fixed price component is 65% (750/1150) of the total average price. Conservatively, we would recommend varying continuous price by at least +/-20% in this situation, though one could interpolate between the functions in the relative precision chart above to justify a minimum variation of +/-15%.
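The arithmetic above, plus one possible way to interpolate the recommendation table, is sketched in Python. Linear interpolation between the table’s rows gives roughly 14%, and rounding up to the next 5% step lands on the same +/-15% floor; both the linear interpolation and the rounding rule are illustrative assumptions, not part of our recommendations.

```python
import math

# Recommended minimum +/- variation (%) by base share of average price, from the table above.
RECOMMENDATIONS = [(1 / 3, 30.0), (1 / 2, 20.0), (3 / 4, 10.0)]


def average_price(base, premiums):
    """Average summed price when every level appears equally often."""
    return base + sum(sum(levels) / len(levels) for levels in premiums)


def interpolated_variation(base_share):
    """Linear interpolation within the table; outside 1/3..3/4 the simulations give no guidance."""
    for (s0, v0), (s1, v1) in zip(RECOMMENDATIONS, RECOMMENDATIONS[1:]):
        if s0 <= base_share <= s1:
            return v0 + (base_share - s0) / (s1 - s0) * (v1 - v0)
    raise ValueError("base share outside the simulated range")


premiums = [[0, 100, 200], [0, 100, 200], [0, 200, 400]]  # RAM, hard drive, processor
avg = average_price(750, premiums)   # 1150.0
share = 750 / avg                    # about 0.65
raw = interpolated_variation(share)  # about 13.9
minimum = 5 * math.ceil(raw / 5)     # rounded up to a 5% step: 15
print(avg, round(share, 3), round(raw, 1), minimum)
```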

Of course, choosing the price variation also depends on the client’s needs and the market simulations to be run. You should avoid extrapolating beyond the total range of prices included in the questionnaire. Increasing the random price variation will improve your ability to simulate extremely priced products, at the risk of presenting products in the questionnaire whose prices seem unreasonable given their features.



Summary of Findings from the 2007 Sawtooth Software Conference

The thirteenth Sawtooth Software Conference was held in Santa Rosa, California, October 17-19, 2007. The summaries below capture some of the main points of the presentations. We hope that these introductions will help you get the most out of the 2007 Sawtooth Software Conference Proceedings.

The Weakest Link: A Cognitive Approach to Improving Survey Data Quality (David G. Bakken, Harris Interactive): David reminded us that our inferences and theories of consumer behavior are only as good as the data on which they are based. As researchers, we often apply conventional wisdom, “judgment,” and some empirical evidence in designing questionnaires. But, often in our haste to take studies to field, we fail to pretest and refine our instruments. David reviewed previous work by psychologists on how humans interact with surveys. The four-step model of survey response involves comprehension, retrieval, judgment, and response. He advocated the use of “Think Aloud Pre-Testing,” in which respondents (10-20 per wave) verbalize their thoughts while answering survey questions. These tests should be conducted over multiple days to allow survey changes to be implemented and re-tested. Based on many such tests, David offered some observations about how respondents interact with web-based surveys and how those surveys can be improved. Current problem areas include: grid questions, survey navigation, error messages, multi-lingual surveys, and CBC questionnaires.

Evaluating Financial Deals Using a Holistic Decision Modeling Approach (Paul Venditti, Don Peterson, and Matthew Siegel, General Electric): Paul described a very interesting approach that he and his co-authors are implementing within GE to evaluate complex financial deals. In the past, analysts have spent many hours evaluating financial deals and presenting the details of those deals to a committee of three individuals. Paul described how the characteristics of those deals could be defined using about 20 “conjoint” attributes. A modified ACA survey was developed to study the three key individuals at GE who approve deals. The standard stated importance question in ACA was replaced with a constant-sum question implemented via an Excel worksheet. The final part-worth utilities were further modified by implementing a few non-compensatory rules (red flags). A market simulator based on the three respondents was found to be highly predictive of whether deals were approved or rejected in the months following the surveys (accuracy of about 80%). Paul’s work demonstrated that effective conjoint models (to profile tiny populations) can be built using tiny sample sizes. Conjoint analysis can provide good data for implementing sophisticated decision support tools in non-traditional contexts.

Issues and Cases in User Research for Technology Firms (Edwin Love, University of Washington School of Business, and Christopher N. Chapman, Microsoft Corporation): Edwin and Christopher described how conducting market research for technology products presents unique challenges. For example, innovative features are often not well-understood by respondents, and different user groups will have different levels of understanding. Also, features might not actually yet exist while the research is being conducted. The presenters commented that vague descriptions of attributes such as “easy setup” can skew user responses (toward expressing strong preference for nondescript features), and the results create the illusion of specific value where none may exist. They further recommended segmenting respondents based on product experience: owners vs. intenders. Edwin and Christopher illustrated the challenges of conducting market research for technology products via three case studies: a digital pen project, a webcam, and a digital camera.

Minimizing Promises and Fears: Defining the Decision Space for Conjoint Research for Employees versus Customers (L. Allen Slade, Covenant College): Conjoint analysis can be a valuable tool in both consumer and employee research. However, the researcher must recognize the key differences in how the firm interacts with each group of respondents. Allen affirmed that customers are less interdependent with the firm than employees are, and that some employees (depending on role and experience/training) are more interdependent with the firm than others. With employee research, the risk is creating false promises of rewards or unwarranted fears of takeaways. Allen suggested that researchers ask themselves three key questions before including something in a conjoint survey for employees: 1) Would we actually be willing to do this? 2) How does this intervention compare to the others we are considering? and 3) How would an employee or customer react to taking this survey? Using an actual case study at Microsoft (total rewards optimization), Allen illustrated how applying these three questions led to effective research without undue promises or fears.

A Cart-Before-the-Horse Approach to Conjoint Analysis* (Ely Dahan, UCLA Anderson School): With traditional conjoint studies, respondents are often asked to complete long surveys and are required to rate products they don’t like, and the resulting part-worth utilities often contain reversals. Ely described a novel, computer-administered, adaptive method of employing a traditional full-profile conjoint design. Rather than estimating part-worth utilities after respondents take the survey, CARDS (conjoint adaptive ranking database system) begins with a researcher-constructed database of typically thousands of potential sets of consistent part-worth utilities. Respondents are shown a set of product concepts and asked to choose the products they prefer. After the respondent provides a few answers, the database of utilities is queried to determine whether certain product concepts that haven’t yet been evaluated are clearly inferior (and so cannot be the next choice in the ranking). Those products are deleted from the screen, allowing respondents to focus on the product concepts that are relevant to identifying which set of utilities best fits them, while forcing respondents to maintain a consistent ordering. The benefit is much shorter questionnaires. The downsides are that early answers matter a lot and there is no real error theory. In addition, the quality of the results depends on how well researchers can develop the database of potential sets of utilities.
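The database query can be illustrated with a toy sketch. It assumes, hypothetically, that concepts are coded as level indices, that each database entry is a set of additive part-worths, and that respondents pick concepts one at a time in rank order; the function names and data are invented for illustration, and this is not Dahan’s implementation. A part-worth set survives only if every pick so far was its top remaining concept, and a concept stays on screen only if at least one surviving set could make it the next pick.

```python
from typing import List, Sequence, Tuple

Concept = Tuple[int, ...]        # level index for each attribute
PartWorths = List[List[float]]   # part_worths[attribute][level]


def utility(pw: PartWorths, concept: Concept) -> float:
    """Additive utility of one concept under one candidate part-worth set."""
    return sum(pw[attr][level] for attr, level in enumerate(concept))


def consistent_sets(database: Sequence[PartWorths], concepts: Sequence[Concept],
                    picks: Sequence[int]) -> List[PartWorths]:
    """Keep the part-worth sets under which every pick was the best concept still on screen."""
    keep = []
    for pw in database:
        on_screen = set(range(len(concepts)))
        ok = True
        for p in picks:
            u_pick = utility(pw, concepts[p])
            if any(utility(pw, concepts[j]) > u_pick for j in on_screen if j != p):
                ok = False
                break
            on_screen.discard(p)
        if ok:
            keep.append(pw)
    return keep


def viable_next(database: Sequence[PartWorths], concepts: Sequence[Concept],
                picks: Sequence[int]) -> List[int]:
    """Concepts that could be chosen next under at least one consistent part-worth set.

    Any concept not returned is 'clearly inferior' and can be removed from the screen.
    """
    remaining = [j for j in range(len(concepts)) if j not in picks]
    if not remaining:
        return []
    viable = set()
    for pw in consistent_sets(database, concepts, picks):
        best = max(utility(pw, concepts[j]) for j in remaining)
        viable.update(j for j in remaining if utility(pw, concepts[j]) == best)
    return sorted(viable)


# Hypothetical example: two attributes with two levels each, three database entries
concepts = [(0, 0), (0, 1), (1, 0), (1, 1)]
database = [
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.0, 2.0], [0.0, 1.0]],
    [[0.0, -1.0], [0.0, 1.0]],
]
print(viable_next(database, concepts, picks=[3]))  # prints [1, 2]
```

After the respondent picks concept 3 first, the third database entry is ruled out, and concept 0 can be dropped because no surviving entry would rank it next.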

(*Winner of Best Presentation award, based on attendee ballots.)

Two-Stage Models: Identifying Non-Compensatory Heuristics for the Consideration Set then Adaptive Polyhedral Methods within the Consideration Set (Steven Gaskin, AMS, Theodoros Evgeniou, INSEAD, Daniel Bailiff, AMS, and John Hauser, MIT): Steven reviewed the scientific evidence suggesting that people buy products by first forming a consideration set and then choosing a product from within that set. This two-stage approach helps people deal with the large number of alternatives they face. By reflecting this process in our choice models, Steven argued, we can model choices more accurately, create more realistic and enjoyable surveys, and handle more features than conventional CBC. He presented a survey design in which respondents may use non-compensatory cut-off rules to form consideration sets. Respondents are then asked to trade off the considered products within a more standard-looking CBC task. He and his co-authors employed FastPace CBC to estimate the utilities for the n most important compensatory features for each respondent. Steven reported results showing that respondents preferred the adaptive survey over standard CBC.
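The two-stage choice rule can be sketched as a conjunctive screen followed by logit shares within the consideration set. This is a minimal illustration with assumed inputs (hypothetical part-worths and a hypothetical map of unacceptable levels); it shows only how choices are modeled, not the FastPace estimation the authors used.

```python
import math
from typing import Dict, List, Set, Tuple

Concept = Tuple[int, ...]  # level index for each attribute


def consideration_set(concepts: List[Concept],
                      unacceptable: Dict[int, Set[int]]) -> List[Concept]:
    """Stage 1 (non-compensatory): drop any concept that has an unacceptable level."""
    return [c for c in concepts
            if all(level not in unacceptable.get(attr, set())
                   for attr, level in enumerate(c))]


def choice_shares(concepts: List[Concept], part_worths: List[List[float]],
                  unacceptable: Dict[int, Set[int]]) -> Dict[Concept, float]:
    """Stage 2 (compensatory): logit shares computed only among considered concepts."""
    considered = consideration_set(concepts, unacceptable)
    if not considered:
        return {}
    utils = [sum(part_worths[attr][level] for attr, level in enumerate(c))
             for c in considered]
    top = max(utils)                      # subtract the max for numerical stability
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return {c: e / total for c, e in zip(considered, expu)}


concepts = [(0, 0), (0, 1), (1, 0)]
part_worths = [[0.0, 0.5], [0.0, 1.0]]   # hypothetical part-worths
unacceptable = {0: {1}}                  # level 1 of attribute 0 is a deal-breaker
shares = choice_shares(concepts, part_worths, unacceptable)
print(shares)
```

Concept (1, 0) would receive a share under a purely compensatory model, even though this respondent would never consider it.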

A New Approach to Adaptive CBC (Rich Johnson and Bryan Orme, Sawtooth Software): Existing CBC questionnaires have weaknesses: they are viewed as tedious and not very focused on the particular needs of each respondent. Their experimental plans assume compensatory behavior, yet previous research has shown that many respondents apply non-compensatory heuristics when answering conjoint questionnaires. Rich and Bryan presented a new technique for adaptive CBC that helps overcome these issues. Their approach mimics the purchase process of forming a consideration set using non-compensatory heuristics (such as “must have” or “must avoid” features), followed by a more careful tradeoff of alternatives within the consideration set using compensatory rules. The new approach involves three core stages: 1) Build-Your-Own (BYO) Stage, 2) Screening Stage, and 3) Choice Tasks Stage. They conducted a split-sample experiment comparing the new approach to traditional CBC. They found that respondents liked the adaptive survey more and felt it was more realistic, even though it took about twice as long to complete as traditional CBC. Furthermore, part-worths developed from ACBC were more predictive of holdout tasks than those from traditional CBC, despite a methods bias favoring CBC (the holdout tasks were in CBC format).

HB-Analysis for Multi-Format Adaptive CBC (Thomas Otter, Goethe University): The three-stage interview proposed by Johnson and Orme is innovative, but formulating a model that extracts the common preference information is a challenge. Thomas first showed that such a model is required, as simply discarding any of the data collected before the CBC part results in inconsistent inferences in an HB setting. Thomas then investigated several models: a multinomial likelihood for all parts of the interview allowing for task-specific scale factors; task-specific “wiggles” in the preference vector using the same likelihood; a binary logit likelihood for the screener part; and a multichoice likelihood for that same part. Thomas found that the scale factor did vary considerably between the sections. However, accounting for task-specific scales had only a small effect on the predictive ability of the models. Moreover, his results suggest that a binary logit or a multichoice likelihood for the screener part of the interview is preferable to exploding the screener responses into multinomial choices, both in terms of the implied story about how the data are generated and in terms of empirical fit.

EM CBC: A New Framework for Deriving Individual Conjoint Utilities by Estimating Responses to Unobserved Tasks via Expectation-Maximization (Kevin Lattery, Maritz Research): Kevin demonstrated how EM algorithms can be used to estimate individual-level utilities from CBC data. EM is often applied in missing values analysis. In the context of CBC, each respondent can be viewed as having been shown all the tasks in a very large design plan but having completed only a subset of them. The missing answers are imputed via EM. Once the missing answers have been imputed, there is enough information to estimate part-worths for each individual. Utility constraints may be implemented as well. Kevin faced a few challenges in implementing EM for CBC. He found that if he allowed EM to iterate fully to convergence, overfitting would occur, so he relaxed the convergence criterion. Kevin also found that the estimated probabilities for the tasks respondents completed differed in mean and standard deviation from those for the missing tasks. So he adjusted the results from each task so that the means and variances of the missing data were comparable to those of the observed data. He then repeated the EM process until the missing data converged. Kevin compared utilities estimated under EM to those estimated via HB, and found that the EM utilities performed as well as or better than HB utilities for three data sets.
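The mean-and-variance adjustment can be sketched as a linear rescaling of each task’s imputed values. The linear form is an assumption (the summary does not specify how Kevin made the adjustment), and the numbers are hypothetical. A linear rescaling can push probabilities outside [0, 1], so a real implementation would need to clip them or work on a logit scale.

```python
import statistics
from typing import List


def match_moments(imputed: List[float], observed: List[float]) -> List[float]:
    """Linearly rescale imputed values so their mean and SD match the observed values."""
    mean_i, sd_i = statistics.fmean(imputed), statistics.pstdev(imputed)
    mean_o, sd_o = statistics.fmean(observed), statistics.pstdev(observed)
    if sd_i == 0:
        # No spread to stretch: fall back to the observed mean
        return [mean_o] * len(imputed)
    return [mean_o + (x - mean_i) * sd_o / sd_i for x in imputed]


observed = [0.1, 0.5, 0.9]   # hypothetical probabilities on completed tasks
imputed = [0.4, 0.5, 0.6]    # hypothetical EM imputations, too compressed
adjusted = match_moments(imputed, observed)
print([round(x, 3) for x in adjusted])
```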

Removing the Scale Factor Confound in Multinomial Logit Choice Models to Obtain Better Estimates of Preference (Jay Magidson, Statistical Innovations, and Jeroen K. Vermunt, Tilburg University): Jay reintroduced the audience to the issue of the scale factor. The size of the parameters in MNL estimation is inversely related to the amount of uncertainty (error) in the respondents’ choices: the more consistent the choices, the larger the estimates. Because different groups of respondents may have different scale factors, it is not theoretically appropriate to compare raw MNL estimates directly between groups. Jay showed how such comparisons can lead to incorrect conclusions. He then turned his attention to an extended Latent Class choice model that isolates the scale parameter. Using real data sets, he showed how segmentations from that model can differ from those of a standard latent class model that does not model scale separately. In one particular comparison, Jay found that the amount of time respondents spent answering a CBC questionnaire was directly related to segment membership from standard latent class estimation (without estimating the scale factor). Jay also demonstrated how scale estimation can be incorporated into DFactor Latent Class models. Jay concluded that removing the scale confound in latent class modeling will result in improved estimates of part-worths and improved targeting of relevant segments, based on a better understanding of segment preferences and levels of uncertainty.
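A small worked example shows the confound. In the logit model the choice probability is P(i) = exp(λV_i) / Σ_j exp(λV_j), so every estimated parameter is the product of the scale λ and the underlying utility. The sketch below uses hypothetical preferences shared by two groups that differ only in scale: the parameters recovered from their exact choice shares differ fourfold in size, while their ratios are identical.

```python
import math
from typing import List


def mnl_shares(utilities: List[float], scale: float) -> List[float]:
    """Logit choice probabilities; the scale factor multiplies every utility."""
    z = [scale * u for u in utilities]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def implied_parameters(shares: List[float]) -> List[float]:
    """Log-odds parameters an MNL model would recover from these exact shares
    (alternative 0 serves as the reference)."""
    return [math.log(s / shares[0]) for s in shares]


beta = [0.0, 0.5, 1.0]   # hypothetical true preferences, identical for both groups
confident = implied_parameters(mnl_shares(beta, scale=2.0))   # low-noise choices
noisy = implied_parameters(mnl_shares(beta, scale=0.5))       # high-noise choices
print(confident)   # about [0.0, 1.0, 2.0]
print(noisy)       # about [0.0, 0.25, 0.5]
```

Comparing the raw values would suggest the two groups care very differently about alternative 2, when in fact only their choice consistency differs.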

An Empirical Test of Alternative Brand Measurement Systems (Keith Chrzan and Doug Malcom, Maritz Research): Keith and Doug presented results from three commercial studies that compared different ways of collecting brand image data. Those methods included Likert ratings, comparative ratings, MaxDiff, pick any, semantic differential, and yes/no scaling. They argued that a brand image measurement system should produce 1) credible brand positions (face validity), 2) strong differences among brands (discriminant validity), and 3) powerful predictions of brand choice (predictive validity). The first two studies showed that Likert ratings and pick any data were generally inferior to the other methods. The third study compared semantic differential, comparative ratings, yes/no, and pick any data. They concluded that, of those four methods, comparative ratings had the most discriminating power, followed by semantic differential. Pick any data measured little beyond the halo effect (a complicating issue wherein brands/objects liked overall tend to get higher ratings across the board on the attributes). To help control for the halo effect, the authors double-centered the scores prior to making comparisons.
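Double-centering a brand-by-attribute table subtracts each brand’s mean (its overall halo) and each attribute’s mean, then adds back the grand mean, so every row and column sums to zero. A minimal sketch with hypothetical ratings; the summary does not say exactly how the authors implemented the step.

```python
from typing import List


def double_center(matrix: List[List[float]]) -> List[List[float]]:
    """Remove row (brand) and column (attribute) means, adding back the grand mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [[matrix[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n_cols)] for i in range(n_rows)]


ratings = [
    [5.0, 4.0, 5.0],   # hypothetical brand A: liked overall, high on everything
    [2.0, 1.0, 3.0],   # hypothetical brand B: lower across the board
]
centered = double_center(ratings)
print([[round(x, 3) for x in row] for row in centered])
```

Brand B’s lower ratings across the board disappear, and what remains shows that the third attribute is B’s relative strength.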

Alternative Approaches to MaxDiff with Large Sets of Disparate Items–Augmented and Tailored MaxDiff (Phil Hendrix, immr and Stuart Drucker, Drucker Analytics): Phil and Stuart investigated some enhancements to standard MaxDiff questionnaires that help deal with large numbers of items while still achieving strong individual-level scores. The authors argued that with more than about 40 items, MaxDiff becomes very tedious for respondents if individual-level estimates are required. To deal with this issue, they proposed that respondents first perform a Q-Sort task, in which they drag and drop items into one of K buckets (they used four buckets in their research). The information from the Q-Sort task can be added to the MaxDiff information to improve the estimates. The Q-Sort task can also be used to create customized MaxDiff questions that draw principally on the items of greatest preference/importance. Phil and Stuart conducted a split-sample study comparing standard MaxDiff with two forms of augmented MaxDiff. They found that the aggregate parameters were very similar across the methods, but both forms of augmented MaxDiff outperformed ordinary MaxDiff in terms of holdout predictions. They also reported that respondents found the Q-Sort + MaxDiff methodology more enjoyable than standard MaxDiff alone.
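One way a Q-Sort can feed into MaxDiff estimation is by treating each pair of items placed in different buckets as an implied paired comparison, and by tailoring later MaxDiff sets to the top buckets. This is a hypothetical sketch (the function names, the tie-breaking rule, and the data are assumptions); the summary does not describe how Phil and Stuart combined the two sources.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def implied_pairs(buckets: Dict[str, int]) -> List[Tuple[str, str]]:
    """(winner, loser) pairs implied by bucket placement; higher bucket = more preferred.
    Items in the same bucket imply no preference."""
    pairs = []
    for a, b in combinations(sorted(buckets), 2):
        if buckets[a] > buckets[b]:
            pairs.append((a, b))
        elif buckets[b] > buckets[a]:
            pairs.append((b, a))
    return pairs


def tailored_items(buckets: Dict[str, int], n: int) -> List[str]:
    """The n items from the highest buckets (ties broken alphabetically) for tailored MaxDiff sets."""
    return sorted(buckets, key=lambda item: (-buckets[item], item))[:n]


q_sort = {"A": 4, "B": 1, "C": 4, "D": 2}   # hypothetical placements into four buckets
print(implied_pairs(q_sort))
print(tailored_items(q_sort, 3))             # prints ['A', 'C', 'D']
```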

Product Optimization as a Basis for Segmentation (Chris Diener, Lieberman Research Worldwide): Chris opened his presentation by reviewing the strategic goals and outcomes of traditional segmentation approaches. Attitudinal segmentations yield segments with strong attitudinal differences, but those differences often do not translate into segments that differ strongly in product preferences. With segmentation based on product features, the hope is that the segments have targetable differences and that the preferences translate into profitable product line decisions. If product optimization is used as the focus, the linkage with profitable product line decisions is stronger. Among optimization methods, Chris stated that he prefers genetic algorithms. But Chris pointed out that segmentation based on product optimization provides no guarantee that the segments will show targetable differences in attitudes, media usage, or demographics. To improve the odds that the segments are useful, Chris advocated data fusion processes that combine information from attitude segmentation and product optimization segmentation, especially when the strategic priority is product development and you are confident of being able to find an attitudinal story.

Joint Segmenting Consumers Using both Behavioral and Attitudinal Data (Luiz Sa Lucas, IDS Market Analysis): Luiz discussed segmentation methods that incorporate both behavioral and attitudinal data. Behavioral data alone are often unsatisfactory for segmentation, because the resulting segments do not necessarily map to anything useful in terms of descriptive demographics or attitudes. By the same token, attitudinal data alone are not sufficient, because attitudes don’t necessarily correlate strongly with behaviors. Luiz reviewed multiple procedures for incorporating both behavioral and attitudinal data in segmentation, including Reverse Segmentation, Weighted Distance Matrices, Concomitant Variables Mixture Models, Joint Segmentation, and LTA models. Luiz finished by discussing different fit metrics for determining the appropriate number of clusters.

Defining the Linkages between Cultural Icons (Patrick Moriarty, OTX and Scott Porter, 12 Americans): Patrick and Scott described a mapping methodology in which cultural icons (celebrities, brands, politicians) are placed within a perceptual map. The data are driven in part by a MaxDiff questionnaire. The goal is to provide a unique understanding of the strength of linkage between brands, personalities, and media properties based on consumer attraction. Their research found that religion and marital status are, on average, the two social identities that most define individuals. But identity may also be measured by the degree to which people express connection with cultural icons. The authors explained that cultural icons themselves can be measured and characterized in terms of four key components: Recognition, Attraction, Presence, and Polarization. As an example of how their mapping methods can drive strategy, they showed relationships among Hillary Clinton or Rudy Giuliani, segments of the population, and popular consumer brands.

Cluster Ensemble Analysis and Graphical Depiction of Cluster Partitions (Joseph Retzer and Ming Shan, Maritz Research): Joe described a relatively new unsupervised learning technique called Cluster Ensemble Analysis, which has been suggested as a generic approach for improving the accuracy and stability of clustering results. Cluster ensembles begin by generating multiple cluster solutions using a “base learner” algorithm, such as K-means. Multiple solutions may be generated in a variety of ways. The basic idea is to combine a variety of cluster solutions into a consensus solution that is representative of them. Joe further demonstrated how the quality of cluster solutions can be depicted graphically using silhouette plots. The silhouette shows which objects lie well within their cluster and which lie somewhere between clusters. He finished by showing how cluster ensemble analysis can improve cluster results for a particularly difficult sample data set with non-spherical clusters.
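A cluster ensemble can be sketched with a co-association matrix: for each pair of objects, the share of base solutions that put them in the same cluster. The consensus step below links pairs co-clustered in at least half of the solutions and takes connected components, which is only one of several consensus functions and not necessarily the one Joe and Ming used. The base solutions are hard-coded stand-ins for K-means runs, and 1 minus co-association serves as the distance for the silhouette calculation.

```python
from typing import List


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """Share of base solutions in which each pair of objects falls in the same cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus(coassoc: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Consensus labels: connected components of pairs co-clustered in >= threshold of runs."""
    n = len(coassoc)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:                      # depth-first walk over strongly linked objects
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def silhouettes(dist: List[List[float]], labels: List[int]) -> List[float]:
    """s(i) = (b - a) / max(a, b): near 1 means well inside its cluster, near 0 means in between."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        others = set(labels) - {labels[i]}
        if not own or not others:
            scores.append(0.0)            # common convention for singletons or a single cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Three hypothetical base solutions for five objects
runs = [[0, 0, 0, 1, 1], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
co = co_association(runs)
labels = consensus(co)                    # [0, 0, 0, 1, 1]
dist = [[1.0 - x for x in row] for row in co]
sil = silhouettes(dist, labels)
print(labels, [round(s, 3) for s in sil])
```

Because co-association only asks whether two objects share a cluster, it does not matter that the second run numbers its clusters differently. Object 2, which switched clusters in that run, gets a lower silhouette (0.5) than the objects that stayed together.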

Modeling Health Service Preferences Using Discrete Choice Conjoint Experiments: The Influence of Informant Mood (Charles Cunningham, Heather Rimas, and Ken Deal, McMaster University): Chuck presented the results of a research study that investigated how depression influences performance on discrete choice experiments designed to understand patient preferences. Previous evidence in the literature suggests that people with depressive disorders can have impaired information processing and a related host of decision-making deficits. Because Chuck and his co-authors often use discrete choice experiments in health care planning, and because the incidence of depression is relatively high within the populations they often survey, these issues were of interest to them. They found that although depression did not increase inconsistent responding to identical holdout tasks (test-retest reliability), it did influence health service preferences and segment membership. Chuck also reviewed basic principles for designing and analyzing holdout questions.

Determining Product Line Pricing by Combining Choice Based Conjoint and Automated Optimization Algorithms: A Case Example (Michael Mulhern, Mulhern Consulting): Mike presented the results of a recent study whose purpose was to develop an optimal pricing strategy for a product line decision. Six price levels were included in the study, and the plot of average utilities appeared to show two “elbows” in the price function. The elbows seemed to represent optimal price points for a mid-price and a higher-price product. Mike used the Advanced Simulation Module to conduct optimization searches to maximize revenue. He found that the optimization routines identified those same two price points as optimal. The different optimization algorithms (exhaustive, grid, gradient, stochastic, and genetic) produced identical results irrespective of the starting points, with the exception of the gradient search method, which had some inconsistencies. Mike’s client also asked whether the optimal price points would change under different assumptions for the base case. Altering the base case and re-running the optimizations produced similar recommendations in most cases. Mike was able to report what the client eventually did and how actual sales volume compared to the simulation’s predictions. The client followed some of the recommendations but ignored others. The sales results suggest that ignoring the recommendations provided by the optimization simulations was costly: a poorly positioned mid-price product foundered, as the model would have predicted.
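An exhaustive search of the kind Mike ran can be sketched with a logit share-of-preference simulator. Everything here is hypothetical: the price part-worths, the non-price utilities, the competitor, the “none” alternative, and the constraint that the mid-price product must be priced below the higher-price one; it is not the Advanced Simulation Module. With six levels and that constraint there are only 15 price pairs to evaluate, which is why exhaustive search is practical here; grid, gradient, stochastic, and genetic searches matter when the space is too large to enumerate.

```python
import math
from itertools import product
from typing import Dict, Tuple

PRICES = [10, 12, 14, 16, 18, 20]                # hypothetical six price levels
PRICE_UTIL = [1.2, 1.0, 0.9, 0.2, 0.1, -0.8]     # hypothetical average price part-worths
BASE_UTIL = {"mid": 0.5, "high": 1.0, "competitor": 0.8}  # hypothetical non-price utilities
COMPETITOR_PRICE = 2                              # competitor fixed at level index 2
NONE_UTIL = 0.0                                   # hypothetical "none" alternative


def shares(mid: int, high: int) -> Dict[str, float]:
    """Logit share of preference for each alternative, given price-level indices."""
    utils = {
        "mid": BASE_UTIL["mid"] + PRICE_UTIL[mid],
        "high": BASE_UTIL["high"] + PRICE_UTIL[high],
        "competitor": BASE_UTIL["competitor"] + PRICE_UTIL[COMPETITOR_PRICE],
        "none": NONE_UTIL,
    }
    denom = sum(math.exp(u) for u in utils.values())
    return {name: math.exp(u) / denom for name, u in utils.items()}


def revenue(mid: int, high: int) -> float:
    """Expected revenue per respondent from the client's two products."""
    s = shares(mid, high)
    return PRICES[mid] * s["mid"] + PRICES[high] * s["high"]


def exhaustive_search() -> Tuple[int, int]:
    """Evaluate every price pair with the mid product priced below the high product."""
    candidates = [(m, h) for m, h in product(range(len(PRICES)), repeat=2) if m < h]
    return max(candidates, key=lambda pair: revenue(*pair))


best_mid, best_high = exhaustive_search()
print(PRICES[best_mid], PRICES[best_high], round(revenue(best_mid, best_high), 3))
```

Re-running the search after changing the competitor’s price or the “none” utility is the sketch’s analogue of the base-case sensitivity checks Mike’s client requested.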

Using Constant Sum Questions to Forecast Sales of New Frequently Purchased Products (Greg Rogers, Procter & Gamble): Greg compared two relatively common methods for measuring buyer intent in an FMCG category: CBC and constant sum allocations (both computer-administered). Not surprisingly, the constant sum allocation data (out of 10 purchases) were more “spiked” at the 0%, 10%, 50%, and 100% allocation levels than the probabilities projected from the pick-one CBC data. Greg expanded the analysis to include a Dirichlet model (to estimate base trial for a new item) that incorporated trial and frequency. Greg concluded that analyzing brand choices from simple constant sum scales using a Dirichlet model yields base trial estimates comparable to those derived from CBC. This finding has implications for researchers who cannot use other methods, such as purchase intent (which requires a database to interpret) or CBC (which can be relatively complex and costly), to estimate trial for new products.

Replacement Modeling: A Simple Solution to the Challenge of Measuring Adding and Switching in a Polytherapy Choice Allocation Model (Larry Goldberger, Adelphi Research by Design): In pharmaceutical research, doctors sometimes prescribe multiple drugs to treat a particular condition. When this occurs, the assumption of standard allocation models, that each patient is assigned a single drug therapy, is violated: the allocations may sum to more than 100%, so the allocation total is no longer fixed. Larry demonstrated a Polytherapy Allocation Model that does not assume that the total allocated per task is 100%. The proposed solution models the likelihood that a new product will substitute for an existing product, and does not constrain the sum to 100%. Larry also reviewed other common approaches to the problem and discussed their limitations. In particular, he discussed the common binary logit approach and how its cross-effects can often lead to reversals.

Data Fusion to Support Product Development in the Subscriber Service Business (Frank Berkers, Gerard Loosschilder, SKIM Group, and Mary Anne Cronk, Philips Lifeline Systems): Data fusion can involve combining different datasets to learn more than the original datasets offer individually. The authors explained how they used data fusion to help develop new strategies for a subscriber service, Lifeline monitoring (the North American leader in Personal Emergency Response Systems). Specifically, the authors developed a plan of action for approaching customers with increased communication about specific offers, depending on the pattern of signals received from the subscriber. This provided an “early warning system” that flags subscribers at risk of deactivating their service. By implementing this system, subscriptions could be prolonged, resulting in greater profitability for the firm. The combination of behavioral patterns and background characteristics gave a clearer warning of imminent deactivation, and of the type of deactivation, than either data source could provide separately. Furthermore, the combined information provided greater clarity in deciding which services to offer subscribers, and when.

Multiple Imputation as a Benchmark for Comparison within Models of Customer Satisfaction (Jorge Alejandro and Kurt Pflughoeft, Market Probe): Kurt emphasized that many studies must deal with missing data, and that the degree of missingness can be significant. Different missing value routines lead to different degrees of bias and imprecision in statistical estimates. The authors examined a variety of techniques for dealing with or imputing missing data: Casewise and Pairwise deletion, the Missing Indicator Method, Mean Substitution, Regression-based Imputation, Expectation Maximization (EM), and Multiple Imputation. They used a real dataset involving customer satisfaction with a bank and induced missingness by deleting values. They then estimated regression models and compared the results to the same models estimated before the values were deleted. They determined that Multiple Imputation appeared to be the best performer in terms of reducing bias and generally produced more realistic standard errors. The Missing Indicator Method and overall Mean Substitution were generally biased, as the authors expected. EM point estimates worked well in regression; however, the imputed dataset produced by SPSS was biased. Pairwise deletion performed well in this experiment in estimating stable beta coefficients.
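Multiple Imputation’s more realistic standard errors come from pooling across the m imputed datasets with Rubin’s rules: the pooled estimate is the mean of the m estimates, and the total variance adds the between-imputation variance B, inflated by (1 + 1/m), to the average within-imputation variance W. A minimal sketch with hypothetical estimates for one regression coefficient:

```python
import math
import statistics
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Combine m imputed-data estimates; returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(variances)         # average within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (n - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b


# Hypothetical coefficient estimates and squared standard errors from m = 3 imputations
est, total_var = pool_rubin([0.50, 0.54, 0.46], [0.010, 0.012, 0.011])
print(round(est, 3), round(math.sqrt(total_var), 3))
```

Here the pooled standard error is about 0.115, compared with about 0.105 if only the within-imputation variance were used; the difference is the extra uncertainty that single imputation ignores.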

Making MaxDiff More Informative: Statistical Data Fusion by way of Latent Variable Modeling (Lynd Bacon, YouGov/Polimetrix, Inc., Peter Lenk, University of Michigan, Katya Seryakova and Ellen Veccia, Knowledge Networks): Lynd demonstrated three different ways to code and estimate MaxDiff data: differences coding, coding as two separate choice tasks, and rank-imputed exploding logit. All three methods produced very similar results. The authors then turned their attention to a weakness in MaxDiff experiments: each respondent’s scores are scaled relative to an arbitrary intercept rather than a common origin. This makes it hard to compare a single score from one respondent with a single score from another. They applied a different model (a cutpoint model for ratings) that allows them to estimate item scores on a common scale with a common origin. They demonstrated how the new model can improve researchers’ ability to identify which respondents to target according to overall preference for a feature. They also emphasized that the lack of a common origin affects attributes in standard discrete choice methods as well, and that the new model can be applied in those situations too.
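The “two separate choice tasks” coding, and the arbitrary-origin problem, can both be seen in a short sketch. It assumes a sequential variant in which the worst item is chosen from the items left after removing the best, with utilities sign-flipped (some implementations keep all shown items in the worst task); the item scores and tasks are hypothetical. Adding the same constant to every item’s score leaves the likelihood unchanged, which is why MaxDiff scores carry no common origin across respondents.

```python
import math
from typing import Dict, List, Sequence, Tuple

Task = Tuple[Sequence[str], str, str]   # (items shown, best pick, worst pick)


def logit_prob(chosen: str, utilities: Dict[str, float]) -> float:
    """Probability that `chosen` is picked from the alternatives under a logit rule."""
    top = max(utilities.values())
    denom = sum(math.exp(u - top) for u in utilities.values())
    return math.exp(utilities[chosen] - top) / denom


def maxdiff_loglik(tasks: List[Task], scores: Dict[str, float]) -> float:
    """Best coded as a choice among all shown items; worst coded as a choice among
    the remaining items with sign-flipped scores."""
    ll = 0.0
    for shown, best, worst in tasks:
        ll += math.log(logit_prob(best, {i: scores[i] for i in shown}))
        ll += math.log(logit_prob(worst, {i: -scores[i] for i in shown if i != best}))
    return ll


scores = {"A": 1.0, "B": 0.0, "C": -0.5, "D": 0.3}        # hypothetical item scores
tasks = [(("A", "B", "C"), "A", "C"), (("B", "C", "D"), "D", "C")]
shifted = {item: s + 5.0 for item, s in scores.items()}    # same scores, different origin
print(maxdiff_loglik(tasks, scores), maxdiff_loglik(tasks, shifted))
```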

Endogeneity Bias—Fact or Fiction? (Qing Liu, University of Wisconsin, Thomas Otter, Goethe University, and Greg Allenby, Ohio State University): In textbook applications of regression modeling, the independent variables are exogenous (uncorrelated with the model’s error term). In some market research applications, however, they are not. Examples include sequential analysis, time series models with lagged dependent variables, and Adaptive Conjoint Analysis (ACA), where earlier answers determine which questions are asked later. Greg suggested that endogeneity bias will matter whenever an adaptive procedure is used to learn about respondents (so that informative questions can be determined) and these data are excluded from analysis. With ACA, however, all of the information from each respondent is included in the estimation. Whether endogeneity causes bias depends on whether you rely on the likelihood principle, and therefore, explained Greg, “being Bayes” or not matters. The presence of endogenously determined designs in ACA doesn’t affect the likelihood of the data. Although endogeneity can introduce some bias into ACA estimates, that bias is typically small enough to ignore.

CBC/HB, Bayesm and other Alternatives for Bayesian Analysis of Trade-off Data (Well Howell, Harris Interactive): HB has become a mainstream tool for analyzing the results of DCM and related techniques (such as MaxDiff). A number of tools are available for HB estimation, including Sawtooth Software’s CBC/HB product, bayesm (an R package), WinBUGS, and Harris Interactive’s Hlhbmkl model. Well used three data sets to compare the tools in terms of in-sample and out-of-sample fit. The speed of the systems varied quite a bit, with CBC/HB significantly faster than the others. Both in-sample and out-of-sample fit were strongly affected by the tuning of the priors (the amount of shrinkage permitted). Tools other than CBC/HB offer some more advanced diagnostics and model specifications, including Gelman diagnostics for convergence and respondent covariates in the upper-level model.
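As an example of the convergence diagnostics mentioned, the classic Gelman-Rubin statistic (R-hat) compares between-chain and within-chain variance for several chains run from different starting points; values near 1 suggest the chains have mixed. A minimal sketch of the original formula for a single parameter (current tools often use split-chain or rank-normalized variants), with tiny hypothetical chains:

```python
import math
import statistics
from typing import List


def gelman_rubin(chains: List[List[float]]) -> float:
    """Classic potential scale reduction factor (R-hat) for one parameter.
    Assumes m >= 2 chains of equal length n >= 2."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-chain
    w = statistics.fmean([statistics.variance(c) for c in chains])        # within-chain
    var_hat = (n - 1) / n * w + b / n      # pooled estimate of the posterior variance
    return math.sqrt(var_hat / w)


mixed = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]       # chains agree
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]   # chains in different regions
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

With chains this short the well-mixed example even falls slightly below 1; in practice R-hat is computed on thousands of draws.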

Respondent Weighting in HB (John Howell, Sawtooth Software): It has been reported that samples in which some subgroups have been oversampled can pose problems for proper HB estimation within CBC/HB software (which assumes a single, normally distributed population). John investigated the degree to which this is a problem, and potential solutions. Using simulated data, John demonstrated that when subsamples are dramatically oversampled, the means of smaller groups shrink disproportionately toward the larger groups. This biases the sample means for the under-represented groups and harms the accuracy of market simulations. John found that much of the problem is due to diverging scale factors between smaller and larger subgroups. The scale for the oversampled groups is expanded, leading to a stronger pull on the overall sample mean. John found that normalizing the scale post hoc can largely control this issue. He also found that implementing a simple weighting algorithm within HB (computing a weighted alpha vector) can potentially improve matters further when there are extreme differences in sample sizes between subgroups. John suggested that other methods he didn’t investigate may improve estimation when some groups are oversampled, including models that estimate individual-level scale factors, models that involve less shrinkage (a Student’s t prior), and models that use multiple upper-level models. He concluded that regardless of how the shrinkage problem is solved, models should be tuned for scale at either the individual or group level.
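The weighted alpha idea can be sketched as a weighted mean of respondent-level part-worth vectors, with design weights equal to population share divided by sample share. In an HB sampler the upper-level mean is normally drawn around the unweighted mean of the current individual-level estimates; substituting a weighted mean is the assumption here, since the summary does not give John’s exact formula. The groups, shares, and part-worths are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def design_weights(groups: List[str], population_share: Dict[str, float]) -> List[float]:
    """Weight = population share / sample share, so oversampled groups count less."""
    counts = Counter(groups)
    n = len(groups)
    return [population_share[g] / (counts[g] / n) for g in groups]


def weighted_alpha(betas: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted mean of respondent-level part-worth vectors (a candidate upper-level mean)."""
    total = sum(weights)
    k = len(betas[0])
    return [sum(w * b[j] for w, b in zip(weights, betas)) / total for j in range(k)]


groups = ["A", "A", "A", "B"]                          # group A is oversampled 3:1
betas = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = design_weights(groups, {"A": 0.5, "B": 0.5})
alpha_w = weighted_alpha(betas, weights)               # approximately [0.5, 0.5]
alpha_u = weighted_alpha(betas, [1.0] * len(betas))    # [0.75, 0.25]
print(alpha_w, alpha_u)
```

The unweighted mean is pulled toward the oversampled group, which is the mechanism that drags smaller groups’ estimates toward it during shrinkage.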



© 2008 Sawtooth Software, Inc. All rights reserved.