Merging Segmentation Data

Introduction

SMRT includes a Merge Wizard to lead you through the process of merging additional descriptive or segmentation variables into your data file. These variables might be demographics, such as gender, household size, or age, or information about a firm, such as company size, volume purchased, or industry classification. You can also merge continuous variables to be used as respondent weights. The file a typical user merges will probably be in text-only (DOS Text) format, as created by SSI Web. The data file to be merged must include a respondent number (Case ID). All merged data must be numeric.

After you have merged additional variables into your data set, these are automatically available to you as "stub variables" for use in the Tables program or as respondent weights in either the Tables or the Market Simulator. If you undertake the additional step of defining Custom Segmentation Variables based on these merged variables, you can also use the merged information as filters or "banner points" during simulations if using individual-level utilities.

The next few sections include basic information about merging segmentation information and defining Custom Segmentation Variables. Additional help and details are available in the on-line help within SMRT, accessed by pressing F1 at any time during the merge process.
 


Merging Segmentation Data from Text-Only Files
 
Users often merge segmentation information (demographics, usage data, firmographics, etc.) into their market simulators for use as segmentation, filter, or weighting variables. These data can come from any source (such as SSI Web), as long as they have been formatted as a text-only file (either fixed-column or delimited). The respondent numbers in the file(s) to be merged must be numeric and must match the respondent numbers in the conjoint data file. The cases do not need to be in the same sort order.

The Merge program in the Market Simulator has a "Merge Wizard" that leads you through the entire process.

For the Merge program to work properly, you must have unique respondent numbers that match in both the source and destination data files (numeric data only). You should not have duplicate respondent numbers in either file. Any respondent in the file of conjoint part-worths not found in the file to be merged will receive a "missing" value for the selected merge variables.
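The matching rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not SMRT's actual implementation: the function name `merge_by_respondent`, the use of `None` as the "missing" value, and the choice to raise an error on duplicate respondent numbers are all assumptions made for the example.

```python
def merge_by_respondent(conjoint_ids, merge_rows, n_vars, missing=None):
    """Attach merge variables to each conjoint respondent by respondent number.

    conjoint_ids: respondent numbers in the conjoint (destination) file.
    merge_rows:   (respondent number, list of values) pairs from the source file.
    Hypothetical helper; duplicate handling here is an assumption.
    """
    if len(set(conjoint_ids)) != len(conjoint_ids):
        raise ValueError("duplicate respondent number in the conjoint file")

    lookup = {}
    for resp_id, values in merge_rows:
        if resp_id in lookup:
            raise ValueError(f"duplicate respondent number in merge file: {resp_id}")
        lookup[resp_id] = values

    # Sort order of the merge file does not matter: we look up by number.
    # Respondents absent from the merge file get "missing" for every variable.
    return {rid: lookup.get(rid, [missing] * n_vars) for rid in conjoint_ids}


conjoint_ids = [1001, 1002, 1003, 1004]
merge_rows = [(1003, [1, 3, 3, 7]), (1001, [3, 2, 5, 7]), (1002, [2, 3, 1, 5])]
merged = merge_by_respondent(conjoint_ids, merge_rows, n_vars=4)
```

Here respondent 1004 appears only in the conjoint file, so it receives four missing values, while the other three respondents are matched regardless of the order in which they appear in the merge file.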

Delimited Files

When you merge information from a delimited text-only file into your data file (*.dat), you need to instruct the software on how to interpret the file being merged. You do so by specifying the delimiter (the character that separates data fields, such as a space, tab, or comma) and the value (if any) used to designate missing values. This value may be any string of alphanumeric characters or symbols.

The Merge Wizard asks you to specify whether each respondent's data are on one line (example below, with three respondents numbered 1001 to 1003) or span more than one line.

1001  3  2  5  7  
1002  2  3  1  5  
1003  1  3  3  7  

In the example above, each respondent record occupies a single line, and the data are separated by blank spaces. This layout is an example of "one line per respondent record."

Below is an example in which each respondent's data span more than one line:

1001  
3  2  5  7  
1002  
2  3  1  5  
1003  
1  3  3  7  

In the example above, each respondent record is spread across two lines. In order to understand how to read this layout, the Merge Wizard asks you how many fields per respondent record there are. In this case, there are five (note that this number includes the respondent number field).

The Merge Wizard asks you to indicate in which field the respondent number is located. Note that this is a field number, counting the fields separated by the delimiter, and not a "column number" in which each column is a single character. In the examples above, the respondent number is located in field #1.
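To see why the number of fields per record and the respondent number field are enough to read either layout, consider the following Python sketch. It is an assumption-laden illustration, not the Merge Wizard's own logic: the function `read_records` and its parameters are hypothetical names, and the default of splitting on any run of blank space is chosen to suit the examples above.

```python
def read_records(text, fields_per_record, id_field=1, delimiter=None,
                 missing_token=None):
    """Read delimited respondent records that may span several lines.

    delimiter=None splits on any run of blank space (an assumption that
    fits the examples above); pass "," or "\t" for other delimiters.
    """
    # Collect every field in file order, ignoring where the line breaks fall.
    tokens = []
    for line in text.splitlines():
        if line.strip():
            tokens.extend(tok.strip() for tok in line.split(delimiter))

    if len(tokens) % fields_per_record != 0:
        raise ValueError("field count is not a multiple of fields_per_record")

    records = {}
    for start in range(0, len(tokens), fields_per_record):
        fields = tokens[start:start + fields_per_record]
        # id_field is 1-based, counted in fields rather than characters.
        resp_id = int(fields.pop(id_field - 1))
        records[resp_id] = [None if f == missing_token else float(f)
                            for f in fields]
    return records


one_line = "1001  3  2  5  7\n1002  2  3  1  5\n1003  1  3  3  7\n"
two_lines = "1001\n3  2  5  7\n1002\n2  3  1  5\n1003\n1  3  3  7\n"

same_result = read_records(one_line, 5) == read_records(two_lines, 5)
```

Because the fields are counted rather than the lines, both example layouts yield the same three records once you state that each record has five fields and that the respondent number is in field 1.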

To avoid misinterpretation of text strings (which may contain delimiters), text in the data file may be enclosed in quotation marks. Single (') or double (") quotation marks are supported.

The following is an example of how Merge reads data depending on the user-specified parameters for handling string variables:
 
Delimiter   Text in Quotes?   Data               Read as    String read as
Comma       Yes               5,6,"B K O",32     4 fields   B K O (1 string)
Comma       No                5,6,"B K O",32     4 fields   "B K O" (1 string)
Space       Yes               5 6 "B K O" 32     4 fields   B K O (1 string)
Space       No                5 6 "B K O" 32     6 fields   "B K O" (3 strings)
 
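The four cases above can be reproduced with Python's standard `csv` module, which offers a comparable choice between honoring and ignoring quotation marks. This is offered only as a way to experiment with the concept; it is not the parser that Merge uses, and the helper name `split_fields` is hypothetical.

```python
import csv


def split_fields(line, delimiter, text_in_quotes):
    """Split one line into fields, optionally treating "..." as one string."""
    if text_in_quotes:
        # Quoted text is read as a single field and the quotes are removed.
        reader = csv.reader([line], delimiter=delimiter, quotechar='"')
    else:
        # Quotation marks are ordinary characters; only the delimiter splits.
        reader = csv.reader([line], delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return next(reader)


comma_yes = split_fields('5,6,"B K O",32', ",", True)
comma_no = split_fields('5,6,"B K O",32', ",", False)
space_yes = split_fields('5 6 "B K O" 32', " ", True)
space_no = split_fields('5 6 "B K O" 32', " ", False)
```

Only the last case changes the field count: with a space delimiter and no quote handling, the spaces inside the quoted text are treated as delimiters, so the line is read as six fields. To experiment with single quotation marks, pass `quotechar="'"` instead.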

Fixed Text-Only Files

When you are merging data from a fixed column text-only file, the Merge Wizard asks you to specify in which columns the data are located and the length of each field. A column is the width of a single character. Many text editors (such as Microsoft's DOS editor, which you can access from the DOS prompt by typing EDIT and pressing the ENTER key) provide a column counter at the top or bottom of the edit window.

With fixed-column ASCII files, the data are aligned in columns when viewed with a non-proportional font (such as Courier), as in the following example. The first two lines are a reference for counting columns; they would not appear in the data file:
 
1234567890123456789012345678901234567890  
----------------------------------------  
  6997         3           25    2  
 25283        23                 1  
   234         9            6    4  
 
Assume that the first variable is the respondent number (though it is not necessary that it be the first variable in the file). It is right-justified within an area that begins in column 1 and has a length of 6. The next variable starts in column 15 and has a length of 2. The third variable starts in column 28 and has a length of 2 (note that the second case has a missing value). The final variable starts in column 34 and has a length of 1.
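The start column and length of each variable translate directly into character positions. The Python sketch below reads the three example cases using the layout just described; the variable names (`var2`, `var3`, `var4`), the `LAYOUT` table, and the use of `None` for a blank field are assumptions made for illustration, not names or conventions used by SMRT.

```python
# (start column, length) for each variable; columns are 1-based,
# as in the description above. Names are hypothetical.
LAYOUT = {
    "respondent": (1, 6),
    "var2": (15, 2),
    "var3": (28, 2),
    "var4": (34, 1),
}

#   1234567890123456789012345678901234567890   (column reference only)
SAMPLE_LINES = [
    "  6997         3           25    2",
    " 25283        23                 1",
    "   234         9            6    4",
]


def parse_fixed(line, layout=LAYOUT):
    """Extract each variable from its column range; blank means missing."""
    record = {}
    for name, (start, length) in layout.items():
        # Convert the 1-based start column to a 0-based slice.
        raw = line[start - 1:start - 1 + length].strip()
        record[name] = int(raw) if raw else None
    return record


parsed = [parse_fixed(line) for line in SAMPLE_LINES]
```

The second case has nothing but blanks in columns 28 and 29, so its third variable is read as missing, while the right-justified values in the other fields are read correctly regardless of how many digits they contain.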