Data Generator - Generating Test Data



The Data Generator creates synthetic data to test a survey prior to fielding.  This can help with testing skip patterns and quotas, for example.


The Data Generator automatically answers the questionnaire using random answers (unless you override this behavior and specify that certain questions should be answered a specific way).  Lighthouse Studio can generate data locally or on a remote study at a designated URL.  All skips, randomization, quotas, terminate links, passwords, pass-in fields, and merged fields function exactly as they do with real respondents.  This means the Data Generator can be used to create test data quickly, to test study logic, or to serve as a study tester.


If you want to generate test data remotely for a survey already posted to a remote server, click Field | Generate Data.... The Generate Data dialog is shown:





If you are testing a study posted to a remote server, Study URL should direct the Data Generator to the remote study URL's login.html page.  Study bypass URLs are also acceptable (see help).  


If Test Locally | Generate Data... is clicked, Study URL is not visible. This lets you generate data locally on your machine.


Start at Question sets the first question Lighthouse Studio will use to begin generating data.  


Starting Respondent Number makes any generated respondent data begin at this number.  If respondents already exist with those numbers, they are overwritten.  


Remove All Skip Logic ignores all skips for this session.


Remove Randomization ignores page randomization, randomized blocks, and randomized questions that exist in the survey for this session.  


Use Browser shows how Lighthouse Studio is generating data instead of simple progress messages.  A web browser is opened and the user can watch questions being answered automatically.  Without Use Browser checked, the Data Generator runs as many simultaneous respondents as it can, but it does not execute any custom JavaScript used for verification or for updating hidden values.  With Use Browser checked, custom JavaScript is executed.  If Use Browser is checked, Show Question Names and Show Variable Names become enabled.  These options display additional help text in the questionnaire while Lighthouse Studio is generating data.


Clicking Generate causes Lighthouse Studio to verify the study, make the local test folder, and generate data on the machine.  After the study is verified and the local test folder is made, the Generating Data dialog is shown:




 When data generation is finished, the Generate Data dialog is shown:





Clicking View Data in Admin Module opens a browser and logs the user into the Admin Module.  Clicking Get Data downloads the generated data into Lighthouse Studio.  If any problems occurred during data generation (like bad list building logic, bad skip logic, no available passwords), then the button View Errors is available.





This can help to identify where and why errors occurred.



Advanced Settings:


From the Generate Data dialog, when the Advanced button is clicked, the Generate Data Advanced Settings dialog is displayed:





Answer rate for questions that are not required:  This tells Lighthouse Studio how often to ignore questions that are not Required (such as a numeric or select question where respondents are allowed to proceed without providing a response). Required questions (including Conjoint and MaxDiff) are always answered.


Completion Rate: This indicates the percent of generated respondents that will be expected to complete the entire survey.  A proportion of the generated respondents will drop out at randomly selected questions when set to a value less than 100.


Simultaneous Respondents: If Use Browser is unchecked, this setting controls how many respondent records are generated at the same time.  The maximum value is equal to the number of available cores on your machine.  Lighthouse Studio ignores this value when using the browser.  (This setting is automatically adjusted to the number of cores if you try to set it to a value higher than the number of cores on your machine.)


Browser page load wait: This setting helps the Data Generator work properly on your computer when Use Browser is checked.  Faster machines allow this setting to be Shorter.  Slower machines may require this setting to be Longer. If the Data Generator browser encounters page load errors when Use Browser is checked, adjust the setting to be Longer.  
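
The interaction of the first three settings can be pictured with a small Python sketch. This is an illustration only, not Lighthouse Studio's implementation: the function names are hypothetical, both rates are assumed to be percentages from 0 to 100, and the answer rate is assumed to be the share of optional questions that get answered.

```python
import os
import random

def effective_simultaneous(requested, cores=None):
    """Cap the requested respondent count at the number of cores (assumed rule)."""
    cores = cores or os.cpu_count() or 1
    return max(1, min(requested, cores))

def simulate_respondent(questions, answer_rate, completion_rate, rng):
    """Return (name, answered) pairs for one synthetic respondent.

    questions: list of (name, required) tuples.
    answer_rate, completion_rate: percentages from 0 to 100.
    """
    # Decide up front whether this respondent completes the survey.
    completes = rng.random() * 100 < completion_rate
    # A respondent who drops out stops before a randomly selected question.
    stop_at = len(questions) if completes else rng.randrange(len(questions))
    record = []
    for name, required in questions[:stop_at]:
        # Required questions are always answered; optional ones only sometimes.
        answered = required or rng.random() * 100 < answer_rate
        record.append((name, answered))
    return record

if __name__ == "__main__":
    rng = random.Random(42)
    survey = [("S1", True), ("Q1", False), ("Q2", False), ("MaxDiff1", True)]
    print(effective_simultaneous(64, cores=8))
    print(simulate_respondent(survey, answer_rate=50, completion_rate=80, rng=rng))
```

With a completion rate of 100 every record reaches the last question, and with an answer rate of 0 only the required questions are answered.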



Defined Values:


The Defined Values tab provides a way for users to supply specific answers instead of allowing the program to answer randomly.  Clicking Defined Values shows the tab:





All questions in the study are listed here except for page breaks, start question, terminate, quotas, and text.  Questions in loops are organized according to the loop suffix.  Text at the bottom of the dialog box provides help on acceptable input for a given question.  For example, in the screen above, S1 is a Select Radio button question with four acceptable values and is a required question.  In the field column, you may specify any value that will result in an answer of 1, 2, 3 or 4.  Defined values may be used to force data generation to take specific paths.  If a study contains restrictive screener questions, it may be virtually impossible for randomly generated answers to get through without being screened out.  In that case, defined values help the Data Generator make it through the screener section.  


Defined values may contain any of the following for producing acceptable answers:

A fixed value – an answer that will be the same for each generated respondent record.  In the example above, this could be a 3.

Numeric range – a random answer that is constrained between two numbers.  Numeric ranges are indicated by two numbers separated by a dash, like 1-3.  Decimal numbers are allowed.  The number of decimal places will be determined by the largest number of trailing digits after a period.  (1.24-3.1415 will use 4 decimal places of precision.)

Scripting functions – Integer, Decimal, Mask, Regex, Stop, Ignore, Value.  The syntax for these functions is as follows:

- Integer(num1, num2) – Create a random whole number between num1 and num2, inclusive. Num1 may be bigger or smaller than num2.

- Decimal(num1, num2, decimals) – Create a random fractional number between num1 and num2, inclusive, with the specified number of decimal places.  “decimals” must be a number from 0 to 16.

- Mask(pattern) – Create a random input based on a specified mask.  The pattern specifies which characters are valid at each position in the result.  Certain characters have special significance; all other characters are treated as literal text.  Special characters are available for each of the following:

Any number (the # character, as in the example below)
Escape character, used to escape any of the special formatting characters
Any upper case letter
Any lower case letter
Any letter
Any letter or number

Example: Mask(#####) will generate a 5 digit number like a zip code.  Mask(###-##-####) will generate a 9 digit number in a Social Security number format.

- Regex(pattern) – Create a random input that would match the specified regular expression.  Consult a regular expression reference for details on the syntax.  Commonly used elements:

.        Any character
\d       Any digit 0-9
\s       White space (spaces, tabs, etc.)
\w       Alphanumeric and _
*        Repeat preceding character 10 times
{n,m}    Repeat preceding character minimum n times, maximum m times


o Regex("\d*") - Generates a 10 digit number.

o Regex(".{5,10}") - Generates 5 to 10 random characters.

o Regex("\w*@\w*\.com") - Generates random email addresses.

o Regex("\d\d\d-\d\d-\d\d\d\d") - Generates random numbers that are formatted like Social Security numbers.



- Stop() – Data Generator will halt when this question is reached.  When Use Browser is checked, data generation will pause and the process can be resumed by clicking the resume button.  If Use Browser is not checked, the generated respondent will drop out of the study and a new respondent will be started.

- Ignore() - Data Generator will not answer this field unless it is required.  If the question is required, a random answer is generated.

- Value(variablename) – Copy the data generated answer from another field.  For example, if a study has email address and confirm email address fields or password and confirm password fields, Value can duplicate the random value given for password in the confirm password field.
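
To make the Mask and Regex descriptions concrete, here is a rough Python sketch of how such generators could behave. It is not the Data Generator's code. The helper names mask and regex_sample are hypothetical, the mask handles only the # character, the regex subset is limited to ., \d, \s, \w, * and {n,m}, and * is assumed to mean "repeat 10 times" as described above (unlike standard regular expressions, where it means zero or more).

```python
import random
import re
import string

def mask(pattern, rng):
    """Replace each # with a random digit and copy every other character."""
    return "".join(rng.choice(string.digits) if ch == "#" else ch for ch in pattern)

# Character pools for the backslash classes covered by this sketch.
CLASSES = {
    "d": string.digits,
    "w": string.ascii_letters + string.digits + "_",
    "s": " \t",
}
# Pool used for "." (restricted to letters and digits to keep output readable).
ANY = string.ascii_letters + string.digits

def regex_sample(pattern, rng):
    """Generate one string for a small subset of regular expression syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1]
            pool = CLASSES.get(nxt, nxt)  # unknown escapes become literals
            i += 2
        elif ch == ".":
            pool = ANY
            i += 1
        else:
            pool = ch  # ordinary literal character
            i += 1
        # Check for a repeat suffix on the token just read.
        count = 1
        if i < len(pattern) and pattern[i] == "*":
            count = 10  # assumed from the description above
            i += 1
        else:
            m = re.match(r"\{(\d+),(\d+)\}", pattern[i:])
            if m:
                count = rng.randint(int(m.group(1)), int(m.group(2)))
                i += m.end()
        out.append("".join(rng.choice(pool) for _ in range(count)))
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(7)
    print(mask("###-##-####", rng))
    print(regex_sample(r"\w*@\w*\.com", rng))
```

Checking each generated string against the pattern it came from (for example with re.fullmatch in Python) is a quick way to confirm that a pattern produces the intended format before it is used as a defined value.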


Unverified Perl – the scripting functions used by the Data Generator have been implemented using the Perl programming language.  The Perl code is evaluated locally in Lighthouse Studio, not on the server, and the result is returned to the Data Generator.  See the Unverified Perl documentation. The same scripting functions described earlier can be called, but their names must be written in all capitals, and all string arguments must be surrounded by single or double quotes.  Sawtooth Script functions used in authoring surveys will not work in this area.  In Unverified Perl it is sometimes useful to have information about the current question, such as the currently displayed design for a MaxDiff question or the attribute and level information for an ACBC BYO question. To retrieve this information, the Perl variable $DB_QUESTION_INFO is available and is accessed just like a Perl hash reference.


Defined values can also contain multiple, different answer methods separated by commas.  For example, three different fixed answers and a numeric range can be supplied for a given question, and one of them will be randomly selected for each generated respondent.
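
The following Python sketch shows one way the rules above could fit together: a comma-separated entry is split into methods, one method is picked at random, and it is resolved as a fixed value, a numeric range, or an Integer() call. It is only an illustration under assumptions. The function names are hypothetical, only these three methods are handled, and the actual parsing rules in Lighthouse Studio may differ.

```python
import random
import re

def split_methods(spec):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def decimals_in(text):
    """Count the digits after the period, so '3.1415' gives 4."""
    return len(text.split(".")[1]) if "." in text else 0

def resolve(method, rng):
    """Produce one answer (as text) from a single method."""
    m = re.fullmatch(r"Integer\((-?\d+)\s*,\s*(-?\d+)\)", method)
    if m:
        a, b = int(m.group(1)), int(m.group(2))
        return str(rng.randint(min(a, b), max(a, b)))  # either order is accepted
    m = re.fullmatch(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)", method)
    if m:
        low, high = m.group(1), m.group(2)
        # Precision follows the bound with the most trailing digits.
        places = max(decimals_in(low), decimals_in(high))
        if places == 0:
            a, b = int(low), int(high)
            return str(rng.randint(min(a, b), max(a, b)))
        return f"{rng.uniform(float(low), float(high)):.{places}f}"
    return method  # anything else is treated as a fixed value

def generate(spec, rng):
    """Pick one method at random from the entry and resolve it."""
    return resolve(rng.choice(split_methods(spec)), rng)

if __name__ == "__main__":
    rng = random.Random(3)
    print([generate("1, 2, 4, 7-9", rng) for _ in range(5)])
    print(resolve("1.24-3.1415", rng))
```

In this sketch each method has the same chance of being chosen, so a range counts as one option among the fixed values rather than being weighted by its width.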



Use Browser:


If Use Browser is checked when Generate is clicked, the Data Generator Browser dialog is shown:





The speed at which questions are answered can be changed by moving the slider handle of the left track bar from Slow to Fast.  The Data Generator can be paused by clicking the pause button and resumed by clicking the resume button.  Two step buttons are enabled while data generation is paused.  The first answers one item on the current page (like a checkbox, radio button, combo box, or text input); if the page has been entirely answered, it submits the page.  The second answers all questions and submits the current page, then pauses on the next page.  Users can pause the process, answer some questions manually, and click the resume button to continue. The Data Generator will begin answering the current page.
