SS Fall 1999


Speakers Lined up for Conference 2000

We hope you've heard by now about the next Sawtooth Software Conference in Hilton Head, SC on March 22-24, 2000. In contrast to other market research conferences, Sawtooth Software Conferences tend to be more practical and down-to-earth. Each one is a research conference rather than a forum for us to promote our software.

By attending, you'll stay abreast of new developments, make important industry contacts, and learn practical ways to improve your quantitative market research efforts. We treat our attendees well, with delicious food and an open bar during the evening get-togethers. Past conferences have had up to 200 participants, and we expect that many at Hilton Head.

The conference program is beginning to take shape. We thank all of you who took the time to submit abstracts for us to consider. While we do not yet have all the details (much research and work is still in progress!), here are some preliminary topics and speakers:

Web Interviewing: Karlan Witt, IntelliQuest Inc., Torsten Melles, Westfaelische Wilhelms-Universitaet

Improving ACA Results: Peter Williams, KBA Consulting Group

Segmentation/Classification Methods: Luiz Sa Lucas, IDS-Interactive Data Systems, Stuart Drucker,

Choice-Based Conjoint: Keith Chrzan, ZS Associates, Mike Patterson, IntelliQuest, Inc., Karen Buros, The Analytical Helpline

The Number of Attribute Levels Effect in Conjoint Analysis: Marco Hoogerbrugge, SKIM Analytical, Dick McCullough, Macro, Inc.

Predictive Validity: Dick Wittink, Yale University, Joel Huber, Duke University

New Part Worth Estimation Techniques: Kenneth Train, University of California at Berkeley, Rich Johnson, Sawtooth Software

In addition to these presentations, we plan to offer various workshops and tutorials. Hope to see you in Hilton Head!



CBC Workshop Offered in Dallas

On December 6-7 we will be holding a Choice-Based Conjoint (CBC) workshop at the Sheraton Grand Hotel at the Dallas/Ft. Worth airport. This 1 1/2 day program will be taught by Tom Pilon, Ph.D. It will be a hands-on workshop for a maximum of 15 participants. Attendees must bring their own laptop PCs, as some of the session will be dedicated to using the CBC software and in-class practice.

The workshop is designed to help individuals relatively new to choice-based conjoint analysis become comfortable with this powerful research technique. Though much of the lecture will be practical and introductory, Tom will introduce some advanced principles including the use of Latent Class and techniques for individual-level estimation such as Hierarchical Bayes.

The price for the workshop is $700. Room rates at the Sheraton are $105 if you call the hotel prior to November 22 and identify yourself as part of our group. To learn more about the workshop or to register, please see the CBC Workshop section of our home page (www.sawtoothsoftware.com).



Hierarchical Bayes: Why All the Attention?

If you've been to a technical market research conference lately, you've likely heard presentations advocating a technique called Hierarchical Bayes estimation (HB). The possible applications for HB are far-reaching. If there is heterogeneity among individuals, HB can significantly improve upon traditional aggregate models such as OLS regression or logit in conjoint/choice analysis, customer satisfaction studies, brand image studies, or any other situation in which respondents provide multiple observations.

Until recently, the individuals advocating HB were academics and a few practitioners expert in statistics. HB is demanding both in terms of computational time and complexity. For realistic market research data sets, the run times were counted in days rather than minutes or hours. Given that no easy-to-use HB software existed and computers were not fast enough to deal with real world problems in a reasonable time frame, it is not surprising that some practitioners were skeptical of HB and the hype surrounding it.

Until recently, we too doubted that HB would soon achieve widespread use in the marketing research community. But recent advances in the processing speed of PCs have exceeded our expectations, and knowledgeable academics such as Greg Allenby of Ohio State have taught tutorials, published algorithms for HB estimation, and supported our efforts to develop easy-to-use software.

In short, why all of the attention for HB?

  • In application after application where respondents provide multiple-observation data, HB estimation seems at least to match and usually to beat traditional models. Of particular interest to Sawtooth Software users, conjoint analysis is a prime example of an application that benefits from HB estimation.
  • HB estimation is robust. The HB models supported by Sawtooth Software are not subject to local minima.
  • HB permits estimation of individual-level models, which lets marketers more accurately target/model individuals. More specifically, HB permits estimation of models too demanding for traditional methods: even when estimating more beta coefficients per individual than there are individual observations.
  • Aggregate estimation models confound heterogeneity and noise. By modeling individuals rather than the "average," HB can separate signal (heterogeneity) from noise. This leads to more stable, accurate models whether viewed in terms of individual- or aggregate-level performance.
  • The "draws" (replicates) for each respondent provide a rich source of information for more accurately conducting statistical tests and, for example, estimating nonlinear functions of parameters such as shares of preference.

Two other articles in this newsletter illustrate the benefits of HB estimation with respect to ACA and regression-based models. We do not suggest that HB is a panacea. However, we have been impressed by the way HB handles numerous real-world and synthetic data sets that we have tested. It generally beats other analytical techniques with which we are familiar. We expect HB soon to become a mainstream analytical technique for market research.



Improving the Accuracy and Stability of Models with Hierarchical Bayes Regression

Regression analysis is widely used in marketing research for quantifying the relationship between predictor variables and an outcome. The predictor variables are termed independent variables and the outcome the dependent variable. As an example, in customer satisfaction modeling, the independent variables can be respondents' evaluations of brands on different aspects such as quality, performance, and service after the sale. The dependent variable is usually a measure of overall satisfaction with the brand or likelihood of purchasing that brand again.

Multiple regression models take the general form:


        Y = b0 + b1X1 + b2X2 + . . . + bnXn

        where

        Y         = dependent variable
        b0        = constant
        b1...bn   = regression weights (betas)
        X1...Xn   = independent variables

The goal of the model is to minimize the difference between the predicted and actual values of the dependent variable. The degree of fit is termed R². An R² of zero implies that the predictor variables provide no information to predict the dependent variable, and a value of 1.0 implies perfect fit.

Often in marketing research we tend to apply regression analysis to a group of observations that individually have different relationships between the independent and dependent variables. Consider people's opinions toward anchovies on pizza. Anchovies are generally either liked or despised. It is rare to find an individual who is lukewarm about anchovies. The distribution of individuals with respect to anchovy preference is not a normal bell-shaped curve, but perhaps has two "humps," reflecting the mixture of two very different populations.

Consider a hypothetical satisfaction study for pizza in which respondents tasted four different pizzas (some with anchovies and some without) and then rated each pizza on an overall desirability scale. To analyze the data, we apply a regression model to predict respondents' satisfaction for pizza based on whether it had anchovies or not. Let's assume that the independent variable (X1) indicating whether a pizza had anchovies or not was dummy coded (0=no anchovies; 1=has anchovies). Further assume that half of the population loves anchovies and their true beta weight b1 (the increase in satisfaction due to the presence of anchovies) is +10 (plus or minus some error). The other half of the population despises anchovies, b1 = -10, again plus or minus some error.

When we pool the data and estimate b1, we discover that b1 is close to but not significantly different from 0, and the R² is also near zero. (Both values would be exactly zero if respondents answered without noise and all used the rating scale in the same way.) Without any additional information, we'd be tempted to report that anchovies do not affect people's satisfaction with pizza whatsoever. And we would be dead wrong. The aggregate regression model has ignored heterogeneity and simply tried to describe the "average" respondent. Moreover, because aggregate regression cannot distinguish between (confounds) heterogeneity and noise, the estimate of b1 is not as precise as it could be.
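The pooling problem is easy to reproduce with a small simulation. The sketch below is only an illustration under assumed values, not the analysis behind this article: the 200 respondents, the baseline rating of 50, and the noise standard deviation of 2 are all invented, and ordinary least squares fitted one respondent at a time is used as a simple stand-in for individual-level estimation (it is not HB).

```python
import numpy as np

rng = np.random.default_rng(0)
n_resp = 200  # hypothetical sample size

# Assumed truth: half the sample has b1 = +10, the other half b1 = -10.
true_b1 = np.where(np.arange(n_resp) < n_resp // 2, 10.0, -10.0)

# Each respondent rates four pizzas: two without anchovies (0), two with (1).
x = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (n_resp, 1))
y = 50.0 + true_b1[:, None] * x + rng.normal(0.0, 2.0, size=x.shape)


def ols_slope_r2(xv, yv):
    """Fit y = b0 + b1*x by least squares; return (b1, R-squared)."""
    X = np.column_stack([np.ones_like(xv), xv])
    coef, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ coef
    ss_tot = np.sum((yv - yv.mean()) ** 2)
    return coef[1], 1.0 - (resid @ resid) / ss_tot


# Aggregate model: every rating thrown into one regression.
pooled_b1, pooled_r2 = ols_slope_r2(x.ravel(), y.ravel())

# Individual-level models: one small regression per respondent.
indiv = np.array([ols_slope_r2(x[i], y[i]) for i in range(n_resp)])
indiv_b1 = indiv[:, 0]
mean_indiv_r2 = indiv[:, 1].mean()

print("pooled b1:", round(pooled_b1, 2), "pooled R2:", round(pooled_r2, 3))
print("mean b1, anchovy lovers:", round(indiv_b1[: n_resp // 2].mean(), 2))
print("mean b1, anchovy haters:", round(indiv_b1[n_resp // 2 :].mean(), 2))
print("average individual R2:", round(mean_indiv_r2, 3))
```

In this setup the pooled slope and R² should land near zero, while the per-respondent slopes cluster around the two assumed values and the average individual-level R² is high.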

Hierarchical Bayes (HB) can deal much better with this situation. HB "borrows" information from other respondents to compute relatively stable individual-level results when respondents provide multiple observations (in our example, respondents evaluated multiple pizzas). One can even estimate useful individual-level models for more independent variables than a respondent has given observations, an impossible feat for traditional regression.

By estimating betas separately for each individual rather than just for the average of all people, HB separates heterogeneity (signal) from noise. The use of HB for this problem would reveal that anchovies significantly affect people's satisfaction with pizza. For HB, the average R² (the result of R² measured at the individual level and then averaged across respondents) is significantly greater than 0. If we submitted the individual-level betas to a cluster analysis, we'd learn that there were two distinct types of people with opposite opinions. We'd note that the mean value for b1 was near zero. But, because HB has been able to separate the heterogeneity from the noise, the average estimate of b1 is more precise, and closer to zero, than with aggregate regression.

Those readers attuned to the assumptions of HB will point out that this hypothetical situation is at odds with HB's assumption that respondent betas conform to a normal distribution. The beta weights are indeed tempered by this assumption, but the observations provided by each individual still strongly influence the individual-level betas. We've analyzed a synthetic data set conforming to the pizza example with HB and observed that it deals well with this problem. Respondents are separated into their respective populations, the individual estimates of beta closely fit the true betas, and estimates of aggregate betas are measured more precisely than under aggregate regression.
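For readers who want to see the mechanics, here is a minimal Gibbs-sampling sketch of a normal hierarchical model applied to a pizza-style synthetic data set. It is a deliberately simplified illustration, not the algorithm in our software: it assumes one effect per respondent (the difference between that respondent's mean ratings with and without anchovies), treats the rating noise as known, and uses a weak inverse-gamma(1, 1) prior on the population variance. All numeric settings are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n_resp = 200
noise_sd = 2.0  # assumed known rating noise

# Assumed truth: half the sample at +10, half at -10 (a two-humped population).
true_b = np.where(np.arange(n_resp) < n_resp // 2, 10.0, -10.0)

# Two ratings without anchovies and two with, per respondent.
without = rng.normal(50.0, noise_sd, size=(n_resp, 2))
with_anch = 50.0 + true_b[:, None] + rng.normal(0.0, noise_sd, size=(n_resp, 2))
d = with_anch.mean(axis=1) - without.mean(axis=1)  # raw individual effect
s2 = noise_sd**2 * (0.5 + 0.5)  # sampling variance of each d[i]

n_iter, burn_in = 2000, 500
mu, tau2 = 0.0, 1.0  # starting values for population mean and variance
b_draws = np.empty((n_iter, n_resp))
mu_draws = np.empty(n_iter)

for t in range(n_iter):
    # 1. Individual betas: precision-weighted blend of each respondent's
    #    own data and the population distribution ("borrowing").
    post_var = 1.0 / (1.0 / s2 + 1.0 / tau2)
    post_mean = post_var * (d / s2 + mu / tau2)
    b = rng.normal(post_mean, np.sqrt(post_var))
    # 2. Population mean given the current betas (flat prior).
    mu = rng.normal(b.mean(), np.sqrt(tau2 / n_resp))
    # 3. Population variance given betas and mean (inverse-gamma update).
    shape = 1.0 + n_resp / 2.0
    rate = 1.0 + 0.5 * np.sum((b - mu) ** 2)
    tau2 = 1.0 / rng.gamma(shape, 1.0 / rate)
    b_draws[t] = b
    mu_draws[t] = mu

# Point estimates: average the retained draws.
b_hat = b_draws[burn_in:].mean(axis=0)
mu_hat = mu_draws[burn_in:].mean()

print("estimated population mean:", round(mu_hat, 2))
print("mean beta, first half:", round(b_hat[: n_resp // 2].mean(), 2))
print("mean beta, second half:", round(b_hat[n_resp // 2 :].mean(), 2))
```

Even though the model assumes a single normal population, each respondent's own ratings dominate the individual estimate in this setup, so the two groups remain clearly separated while the estimated population mean sits near zero. The retained rows of `b_draws` play the role of the per-respondent "draws" discussed earlier.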

It is worth noting that Latent Class methods are also useful in dealing with heterogeneous populations. For this simple pizza example, a two-group Latent Class solution would be entirely appropriate. However, Latent Class solutions are subject to local minima, they typically do not achieve proper individual-level estimates and, like cluster analysis, they require the analyst to decide how many groups (classes) are appropriate.

While the pizza example above is a very simple illustration, the principles are important and relevant to more complicated regression problems in marketing research. The major take-aways are as follows:

  • If respondents provide multiple observations, HB can be used to estimate individual-level betas.
  • HB can distinguish between the heterogeneity and noise that aggregate regression modeling confounds. This results in more precise estimates of average betas than under aggregate regression.
  • The individual-level beta weights can be used to segment respondents, using methods such as cluster analysis, neural networks, CHAID or AID, or banner points (filters) such as in cross-tabs.

Another problematic issue that often derails multiple regression models is lack of independence (collinearity) among the independent variables. Consider a customer satisfaction study in which respondents evaluate multiple brands on various product-related features (independent variables) and then provide an overall evaluation of the brand (dependent variable). The goal of such a study might be to derive the weight (importance) each feature has in driving overall satisfaction. If some of the attributes overlap in meaning for many of the respondents, such as "reliability" and "durability," regression modeling will have a difficult time distinguishing the relative weight of these two related items. As a result, collinearity leads to unstable estimates of beta weights. HB significantly reduces this problem by distinguishing heterogeneity from noise and by leveraging information from respondents whose ratings reflect better discrimination between "reliability" and "durability" to improve the measurement for respondents who did not make such distinctions. The result is more precise estimates of both individual and aggregate beta weights.
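The instability caused by overlapping attributes can be demonstrated with a short simulation. The sketch below shows only the problem in ordinary aggregate regression, not the HB remedy, and every number in it is an assumption made for illustration: "durability" ratings are generated as "reliability" plus a little noise, both true weights are set to 1, and the study is repeated 200 times to see how much the estimates move.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_reps = 100, 200  # hypothetical study size and number of repeats
b1_hat = np.empty(n_reps)
sum_hat = np.empty(n_reps)

for r in range(n_reps):
    reliability = rng.normal(size=n_obs)
    # Nearly redundant attribute: almost the same ratings as reliability.
    durability = reliability + 0.05 * rng.normal(size=n_obs)
    overall = 1.0 * reliability + 1.0 * durability + rng.normal(size=n_obs)

    X = np.column_stack([np.ones(n_obs), reliability, durability])
    coef, *_ = np.linalg.lstsq(X, overall, rcond=None)
    b1_hat[r] = coef[1]            # weight on reliability alone
    sum_hat[r] = coef[1] + coef[2]  # combined weight of the two attributes

sd_b1 = b1_hat.std()
sd_sum = sum_hat.std()
print("spread of reliability weight across repeats:", round(sd_b1, 2))
print("spread of combined weight across repeats:", round(sd_sum, 2))
```

The combined weight of the two attributes is estimated tightly, but the split between them swings widely from one repeat to the next, which is the instability described above.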

Marketers should be more concerned with profiling and targeting individuals and segments than with the market average. HB methods support this strategy and are very valuable for problems that have traditionally been analyzed using aggregate regression. Whether the researcher's interest lies in achieving aggregate- or individual-level estimates of beta, for studies in which respondents provide multiple observations, HB usually beats aggregate regression.



HB Improves ACA Part Worth Estimation

According to a 1997 industry survey conducted by Wittink, Vriens and Huber, ACA is the most widely used methodology in the world for conjoint analysis. Given its popularity, it is not surprising that ACA has been widely scrutinized and been the subject of a great deal of debate. Most notably, in a 1991 Journal of Marketing Research article by Green, Krieger and Agarwal, ACA was criticized because of potential scale incompatibilities between the self-explicated priors and conjoint trade-off sections of the interview.

ACA version 4 was released shortly after the 1991 JMR article. It used a slightly different technique from earlier ACA versions for combining self-explicated priors and conjoint pairs information. Although Version 4 improved the way ACA combined the two sections, it was still necessary to assume that the self-explicated priors were properly scaled.

In 1995, Allenby, Arora and Ginter published an article, also in the Journal of Marketing Research, reporting improvements for ACA through Hierarchical Bayes estimation. The collective work of Allenby and a number of co-authors on HB methods has been groundbreaking and important. Now, four years later, we have developed the ACA Hierarchical Bayes Estimation module and have documented its benefits.

ACA/HB is a post-data collection module that reads data from studyname.ACD files and computes utilities, saving them in a new ASCII file that has the same format as the familiar .UTL files that you use with the ACA simulator. The process is so automatic that even a data processor with no analytic experience can adequately run ACA/HB by just using the defaults and pressing a few keystrokes.

ACA/HB provides two major benefits:

  1. The ACA/HB module improves the quality of each individual's utility estimates by "borrowing" information from other individuals. This translates to more accurate predictions of both individual choices and share estimations. We have tested the results on dozens of real and synthetic data sets. HB at least matches and usually beats traditional ACA utility estimation.
  2. ACA/HB provides a more theoretically sound way of combining data from the self-explicated and paired comparison sections of the interview. Because the priors information can be applied in a purely ordinal way as constraints, it entirely avoids the issue of combining two separate sets of metric dependent variables with potentially different variances.

Not only is the technique more defensible, but the results are generally better. Notably:

  • ACA/HB utilities are less biased toward equally spaced utility increments between levels than those from ACA v4.
  • ACA/HB importances reflect slightly more discrimination than under ACA v4.
  • ACA/HB does a better job of estimating utilities for the levels not taken forward into pairs when using "Most Likelies" and "Unacceptables."

In addition to those benefits, ACA surveys can now be shorter. ACA/HB does not require the calibration concept data (unless you want to calibrate the data for purchase likelihood simulations). Therefore, you can cut this sometimes confusing section from your ACA surveys. Rather than reducing the length of the interview, you might decide instead to add a few more pairs questions to further stabilize utilities.

We think that the ACA/HB module will prove a valuable tool for ACA users. For a typical data set, with a few hours of extra computational time one can significantly improve the quality of the ACA data set. This new development brings ACA up-to-date with the most cutting edge estimation techniques being applied today. It also provides a way of combining ACA's self-explicated data with answers from paired comparison questions without having to make any assumptions about the scale of the self-explicated data.

For more details on ACA/HB, please download the "ACA/HB Technical Paper" from the Technical Papers library at www.sawtoothsoftware.com.



Ci3 Tech

A Simpler Way to Conduct Multi-Lingual Studies

Most Ci3 users use the ALTERNATE command to branch to different versions (i.e. languages) of the same questionnaire. The typical process is for respondents to specify what language to use in an opening question. The ALTERNATE command then loads different versions of the questionnaire (each from separate .QST files). The data are saved within the same .DAT file initially, but are split into different data files after the data have been accumulated and consolidated. Different data sets often lead to different conversion layouts (even if all of the questions were identical) depending on a number of issues that we won't go into here. The bottom line is that if you want the data from different versions of the questionnaire to end up in a single data set with a common conversion layout, there is an easier option.

Rather than branching to separate questionnaires using ALTERNATE, you can create a single interview that uses conditional logic and GET statements to display one set of interview text or another. Consider an interview with ten questions, Q1 to Q10, that we want to administer in both English and Spanish. We could format the questionnaire with 20 questions in this order:

  • Q1 to Q10 (questions with the default English text)
  • Q1S to Q10S (corresponding dummy questions with Spanish text and PAUSE 0 instructions)

Suppose for each respondent we initially had a variable called VERSION where 1 = English and 2 = Spanish. As the first instruction for Q1, we'd specify:

	IF (VERSION = 2) GET Q1S

This tells Ci3 to display the text from Q1S (the Spanish version of Q1) if the respondent needed to see Spanish text. For Q2, we'd substitute Q2S for Spanish, etc. As the last instruction in Q10, we'd place an ENDQUEST so that the dummy questions at the bottom of the questionnaire (which were only used to supply alternate text) were never seen or processed.

This solution seems quite simple, but we haven't yet dealt with constructed lists. Most developed interviews require the use of constructed lists and list operations. We'd suggest placing the English list items in separate predefined lists (between LIST-ENDLIST instructions) from the Spanish items. Then, you can create the customized (either English or Spanish) constructed list on the fly using the APPEND command. During data processing, each question that used constructed lists will require some recoding to collapse the separate English and Spanish items into common codes.

Notes on the Use of Fonts in Ci3

We often receive questions regarding the use of fonts in Ci3. There are indeed many detailed issues. Many of you will recall that Ci3 originally was a DOS-based application supporting a 25x80 character display. In the DOS days, painting text on the screen was a much simpler matter. Twenty-five lines and 80 columns were available for text, where each typed character corresponded to a column. Text, colors or boxes could be placed on the screen by defining the line and column coordinates.

With the release of Ci3 for Windows, the assumptions of the 25x80 character display are maintained for backwards compatibility with DOS questionnaires. Text, graphics and lines are still placed on the screen using a 25x80 grid. But Ci3 for Windows lets users specify different point sizes and supports proportional fonts such as Arial, where the widths of the characters can vary. While using different fonts and point sizes dresses up the look of a questionnaire, it makes the characters no longer fit the 25x80 layout.

Fonts and point sizes can be controlled by the following commands in Ci3:

  • DEFFONT (sets default font used throughout the interview)
  • T: (lets you set the font and point size for the current question's text. It overrides any DEFFONT statement.)
  • SHOWFONT (sets the font used for the current question for text controlled by SHOW, SHOWLIST and SELECT. It overrides any DEFFONT statement.)
The default font and point size in Ci3 Windows interviews maps to the 25x80 character display. In other words, the width of a character is 1/80th the width of the screen. If you use the default font and point size, you should have no problem highlighting words within sentences (using the COLOR command) or in placing words on the screen at an exact place (using the SHOW statement). Regardless of the screen resolution (e.g. 640x480, 800x600), the text will always be located in the same relative position on the screen.
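The arithmetic behind that statement can be sketched in a few lines. This is only an illustration of the 25x80 grid idea, not how Ci3 itself draws the screen: the function name is hypothetical, and numbering lines and columns from 1 is an assumption made for the example.

```python
GRID_LINES, GRID_COLS = 25, 80  # the character grid described above


def grid_to_pixels(line, col, screen_w, screen_h):
    """Top-left pixel of a (line, col) grid cell, both counted from 1 (assumed)."""
    x = (col - 1) * screen_w / GRID_COLS
    y = (line - 1) * screen_h / GRID_LINES
    return x, y


# The same grid cell at two common screen resolutions.
x640, y640 = grid_to_pixels(13, 41, 640, 480)
x800, y800 = grid_to_pixels(13, 41, 800, 600)

# Pixel positions differ, but the relative position on the screen does not.
print(x640 / 640, x800 / 800)
print(y640 / 480, y800 / 600)
```

Because each cell is a fixed fraction of the screen, the pixel coordinates change with resolution while the position relative to the screen stays the same.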

If you change the point size or use a proportional font such as Arial or Times, matters become more complicated. For example, column number 20 no longer necessarily corresponds with the 20th character (letter) on that line. Furthermore, you'll find that when the screen resolution changes, the text can shift around on the screen. Thus, if you don't have control over the PCs running the interview, you will want to view your questionnaires under various resolutions to make sure everything is displayed acceptably. Alternatively, you can set the preferred resolution of the interview by specifying it during the field disk creation process.

It is possible to change the font in the middle of an interview in which the default font was something different. This is useful if you need to have the characters map exactly to the 80-column layout for a particular section of your survey. You can do this on a question-by-question basis by specifying either "Ci3 Default" (ANSI) or "Ci3v20 Default" (ASCII) (these font names are case sensitive) in a T: or SHOWFONT statement.

The Ci3 Windows default font and questionnaire editor present characters in ANSI mode. If you are specifying Spanish characters such as the "ñ" symbol, you can access that character by typing "Alt-0241" (hold down the Alt key while typing 0241 using the number pad). An ANSI character map is available in Windows programs. Under Windows 98 (standard installation), you can access the ANSI character map by clicking Start | Programs | Accessories | System Tools | Character Map and then by choosing "System" as the font.

Sometimes you may receive a translation (say, in Spanish) from a translation agency in which the text was provided in an ASCII font. The special extended character set for ASCII is different from ANSI (choose the "Terminal" font in the Windows Character Map to see the ASCII character map). When you paste ASCII text into Ci3 (default ANSI mode) and compile the questionnaire, the characters will be mapped to their ANSI counterparts and will not look correct. For example, the character code that displays as "ñ" in ANSI displays as "±" in ASCII. One solution is to interview in DOS mode (using DOSQUE.EXE), because the DOS questionnaire program interprets the characters in ASCII mode. If you need to interview in Windows mode (WINQUE.EXE) and use the ASCII character map, you can specify the "Ci3v20 Default" font (this font name is case sensitive). Another solution is to convert the ASCII text to ANSI text by opening the file with your word processing package and then re-saving it as a text (or ANSI text) file. Your word processor will convert the ASCII codes to their appropriate ANSI counterparts. In Microsoft Word, make sure the "Confirm Conversion at Open" option is checked, under View | Options | General.
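The mismatch between the two character maps can be seen directly in a short script. The sketch below assumes that "ASCII" here means the DOS code page 437 and that "ANSI" means Windows code page 1252; those two codec names are our assumption for illustration, and the sample word is invented.

```python
# Alt-0241 enters character code 241 in the ANSI map.
raw = bytes([241])
as_ansi = raw.decode("cp1252")  # how Windows (ANSI) shows code 241
as_dos = raw.decode("cp437")    # how DOS (ASCII) shows the same code
print(as_ansi, as_dos)

# A Spanish word as it would be stored in a DOS (ASCII) text file.
dos_bytes = "año".encode("cp437")

# Read with the wrong map, the accented letter comes out as another symbol.
wrong = dos_bytes.decode("cp1252")
print(wrong)

# Conversion: decode with the map the text was written in,
# then re-encode with the map the target program expects.
repaired = dos_bytes.decode("cp437").encode("cp1252")
print(repaired.decode("cp1252"))
```

This is the same conversion a word processor performs when it opens the ASCII file and re-saves it as ANSI text.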



Internet Software in Development

Many of you have been asking us about software for Internet interviewing and conjoint analysis. Since its inception over 15 years ago, Sawtooth Software has leveraged emerging technology to provide sophisticated analytical tools that deliver strategic value to businesses. The Web is the current frontier, and we recognize how imperative it is to embrace this important medium. Our goal is to provide cost-effective software that delivers everything you need for self-hosting your own Internet surveys on either your ISP's or your own Internet server.

Over a year ago, we released our first software for Internet data collection, called the CVA Internet Module (traditional full-profile conjoint analysis). This was an important first step in developing our future line of software products for the Web. We are currently on track to release the ACA Internet Module during the first quarter of next year. It will be accompanied by a general survey writing package for the Web called CiW (Computer Interviewing for Web). The name is a play on our popular Ci3 software, and it will offer many (but not all) of its features. CiW will support standard survey questions such as radio button, check box, combo box, open-end, and numerics. Skip patterns and randomizations will also be included. Of course, graphics and sounds will be supported.

Our Internet software uses "least common denominator" technology (PERL and HTML) so that your surveys will work under nearly any server and browser configuration. We want to ensure that you can reach the widest audience possible, without you or your users having to invest in new hardware or upgrade the operating system/browser just to accommodate our software.

To see a short on-line example survey composed using the ACA Internet Module and CiW, please visit www.sawtoothsoftware.com/acanet/login.htm.

© 2012 Sawtooth Software, Inc. All rights reserved.