# Materials and methods section for CBC when using HB instead of logit (plus: p-value / percent certainty)

As part of my master's thesis, I ran a choice-based conjoint (CBC) analysis with Sawtooth.
I had to document and interpret the part-worth utilities, the attribute importances and the latent class analysis.
For the interpretation of the part-worth utilities, I was advised to work with the HB values.

In the method section of my master's thesis, I have to explain the following seven steps of my choice-based conjoint analysis:
(1) design of the stimuli
(2) design of the selection situation
(3) specification of a utility model
(4) specification of a selection model
(5) estimation of the utilities
(6) interpretation and implementation
(7) disaggregation of the utilities

Steps (1) to (3) are clear, but for steps (4) through (7) I'm unsure what changes when using HB instead of logit.

I could only find studies based on logit estimates that explained these seven steps.
What they wrote:
(4) using multidimensional logit choice model
(5) done by maximizing a log-likelihood function using a quasi-Newton method
(6) part-worth utilities were calculated using Cox regression; they reported the p-value of the model and the p-value of every part-worth utility.
(7) none

I also found a description of these seven steps in the literature, which described the HB approach as part of step (7).

I am completely confused about what I have to report in steps (4) to (7), given that I was advised to report part-worth utilities from the HB approach instead of the logit approach.
Maybe you could give me some keywords on what to fill in, to get me back on track?

Does it make sense to report the logit information (steps 4 to 6) even if step (7) then says that HB is used and that the following analysis is based on the HB approach?

I also have not found out yet:
- how to estimate the p-value of the model and the p-value of every part-worth utility.
- how to interpret the percent certainty, as I read that it ranges from 0 to 1, and my result is 5.23621.

Below are two screenshots of the Logit and HB outputs

The 5.23621 is actually multiplied by 100, so your Percent Certainty is actually 0.0523621 for this aggregate model.  Aggregate models often have low fit, because different respondents have different preferences.  Using an average of them all to predict each individual can sometimes lead to poor fit.
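As a rough sketch of what sits behind that number: percent certainty places the model's log-likelihood on a scale from the chance (null) model at 0 to a perfect fit at 1. The task and alternative counts below are made up for illustration and are not taken from the output in this thread; the sketch also assumes every task shows the same number of alternatives.

```python
import math

def chance_log_likelihood(n_tasks, n_alternatives):
    """Log-likelihood of a model that picks each alternative with equal probability."""
    return n_tasks * math.log(1.0 / n_alternatives)

def percent_certainty(ll_model, ll_chance):
    """0.0 = no better than chance, 1.0 = perfect fit (log-likelihood of zero)."""
    return (ll_model - ll_chance) / (0.0 - ll_chance)

# Hypothetical study size, for illustration only.
ll_chance = chance_log_likelihood(n_tasks=1000, n_alternatives=4)

at_chance = percent_certainty(ll_chance, ll_chance)  # model no better than chance
perfect = percent_certainty(0.0, ll_chance)          # model predicts every choice

# The report shows the value multiplied by 100.
reported = 5.23621
as_proportion = reported / 100.0
print(at_chance, perfect, as_proportion)
```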

HB is really a different animal than aggregate pooled logit.  To obtain p-values for each attribute level (the test of whether each attribute level's utility differs from zero), you look at (typically) the last 10,000 draws of alpha in the alpha.csv file.  The alpha is a vector of utilities, representing the estimate of the population preference, in the upper-level model in each iteration of the MCMC algorithm.  To obtain the p-value, you count for what % of the draws (after convergence is assumed, typically the last 10,000 draws) the utility is either above or below zero.  For example, if 99% of the draws of alpha after convergence show attribute1_level1 higher than zero, then you are 99% confident that the population thinks this level has a utility higher than zero.
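The counting test described above can be sketched in a few lines. The exact column layout of alpha.csv is not shown in this thread, so the example uses simulated draws in place of the real file; the array shape (one row per iteration, one column per attribute level) and the function name are assumptions for illustration.

```python
import numpy as np

def counting_test(alpha_draws, n_kept=10_000):
    """Share of retained draws above zero, per attribute level.

    alpha_draws: array of shape (iterations, levels); assumed layout.
    Only the last n_kept rows are used (assumed to be after convergence).
    """
    kept = alpha_draws[-n_kept:]
    share_positive = (kept > 0).mean(axis=0)
    # Confidence that the utility lies on the side of zero where most draws fall.
    confidence = np.maximum(share_positive, 1.0 - share_positive)
    return share_positive, confidence

# Simulated stand-in for the alpha draws: three levels with
# clearly positive, near-zero, and clearly negative population means.
rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.5, 0.0, -0.5], scale=0.2, size=(20_000, 3))

share_positive, confidence = counting_test(draws)
print(share_positive)
print(confidence)
```

With real data, the draws would be read from the alpha file instead of simulated, after dropping any non-utility columns such as an iteration counter.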

Which way you report the data for academic purposes (logit or HB) really depends on your expected audience.  If your audience is mostly comfortable with aggregate logit tests of significance, then it makes sense to stick with their world and get them what they're used to.

But, for making predictions of individuals' choices, HB typically works much better than aggregate logit.
answered Mar 15, 2018 by Platinum (163,415 points)
Hello,

There is one more question.

I obtained the p-values for the HB as you described above, with the following result:
All part-worth utilities (that is, all levels) for one of my eight attributes are not significant (at the p<0.05 level).
How should I deal with this in further interpretation?
I also calculated the attribute importances for the HB. The importance of the attribute with the non-significant part-worth utilities is 8.6.
The latent class output shows an attribute importance > 10 for this attribute in two classes.
So how should I deal with six levels that are not significant (which would let me say they have no effect on the decision-making process) when the attribute importance tells me that they do have an effect on the decision-making process?

A few things to remember:

From completely random respondent data, when you compute attribute importances (by percentaging the ranges across attributes), you will find across a sample of respondents that each attribute has importance very close to 1/t, where t is the number of attributes in your study.
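A small simulation can illustrate this 1/t baseline. It is only a sketch: the utilities are pure noise, every attribute is assumed to have the same number of levels (attributes with more levels would tend to get larger ranges), and the sample size and level count are arbitrary.

```python
import numpy as np

def attribute_importances(utils):
    """Per-respondent importances in percent.

    utils: array of shape (respondents, attributes, levels).
    Importance = range of an attribute's utilities / sum of ranges over attributes.
    """
    ranges = utils.max(axis=2) - utils.min(axis=2)
    return 100.0 * ranges / ranges.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n_respondents, t, n_levels = 2000, 8, 6   # arbitrary illustration values

# "Respondents" whose utilities are pure noise.
random_utils = rng.normal(size=(n_respondents, t, n_levels))

importances = attribute_importances(random_utils)
mean_importance = importances.mean(axis=0)
print(mean_importance)   # compare each entry with 100 / t
print(100.0 / t)
```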

Due to our way of effects-coding for the X matrix, the attribute level utilities are centered around zero (their sum is zero within each attribute).  So, it is quite common for one or more levels of an attribute to fail to be different from zero per the HB counting test, even though they indeed may have an influence on respondents' choices.  But, due to zero-centering, some of the level utilities fall around the central zero point.

If all levels for one of your eight attributes fail the HB counting test on alpha (not different from zero), then the next question is if it is an ordered attribute (such as price, speed, weight) where everybody would be expected to have the same preference direction, or whether it is an unordered attribute like brand, color, or style, where people's differences in preference can cancel out the aggregate signal and make it look on the aggregate (the upper-level alpha) as if the population does not care one way or the other about all the levels of that attribute.
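The cancelling-out effect for an unordered attribute can be sketched with two simulated segments that hold opposite preferences. All numbers here are invented for illustration; nothing is taken from the study in this thread.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_segment = 500

# Zero-centered utilities for a three-level unordered attribute.
segment_a = np.array([1.0, 0.0, -1.0])   # prefers level 1
segment_b = -segment_a                   # prefers level 3

utils = np.vstack([
    segment_a + rng.normal(scale=0.1, size=(n_per_segment, 3)),
    segment_b + rng.normal(scale=0.1, size=(n_per_segment, 3)),
])
# Zero-center within each respondent, as effects coding implies.
utils -= utils.mean(axis=1, keepdims=True)

# Pooled view: the opposite preferences cancel, so every level looks like zero.
pooled_mean = utils.mean(axis=0)

# Individual view: each respondent still has a clear utility range.
mean_individual_range = (utils.max(axis=1) - utils.min(axis=1)).mean()

print(pooled_mean)
print(mean_individual_range)
```

The pooled means sit near zero for every level, while the average individual-level range stays large, which is the pattern an individual-level importance calculation would pick up.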

Latent class can find groups of respondents who tend to have similar patterns of preferences, including for unordered attributes.  But, again, averaging across people for unordered attributes can damp the importance calculation.  HB estimation at the individual level can do a better job reflecting true attribute importance.

You should probably clean respondents who answer too quickly, have low internal HB fit (RLH) and who fail other consistency checks in your survey.  Otherwise, random responders can make it look like attributes tend to have the same attribute importance at 1/t (where t is the number of attributes in your study).
Thank you very much for your detailed information.
Sorry for the amount of questions, but there are still a few points which are not clear to me.

I still do not really know how to deal with the problem that all levels of one of my attributes are not significant: whether I have to eliminate the attribute, or how I could report it and handle it in my further interpretation.

The attribute where all levels fail the HB counting test on alpha (and have really low t Ratios: 0.81813; -0.31968; -0.46756; -0.32763; -0.87953; 1.21112) is the attribute “alcoholic content”.
My conjoint analysis is about alcoholic beverages. In my opinion, “alcoholic content” is not an ordered attribute, as I do not expect everybody to have the same preference direction when choosing between a beverage with a high or a low alcoholic content.

I also wanted to look at the t-ratios for the levels of this attribute in every segment. But when I check the box “reporting standard errors”, I only get the lower/upper 95% CI for the whole 4-segment solution, and no t-ratios for each individual segment. Is there any way to get this information?

As you suggested, I tried to clean respondents. Is there any option to exclude respondents who answer too quickly and have low RLH within the estimation process, so that I can do an HB counting test on alpha with a cleaned alpha file?  I think I have to use “sys_elapsedtime” in the respondent filter, but I cannot find any information on how to do this. Whenever I use operators like “<” or “>”, an error is shown.
Well, if all levels of an un-ordered attribute (like alcoholic content, where different people can have different preferences) fail an aggregate test for significance, then it could be that there are segments of respondents who think differently and are causing the scores to cancel out at the pooled, aggregate level.  It seems you realize this, so the latent class approach, where you obtain t-tests for different segments, could show whether there are segments who tend to agree with one another and for which the attribute is significant.

Which software are you using where you cannot see the t-ratios per group of respondents for the attribute levels?  I am using Lighthouse Studio v9.5 and it is showing them for my latent class utility estimation runs.

To clean respondents, I like to add a new column (a new variable) to my data table within Lighthouse Studio's Data Management, View/Edit Data tab area.  I use my new variable (e.g. where the categories are keep=1, throw away=2) as a logical flag (a filter) during HB utility estimation such that my results are only using the respondents I want to keep.
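The values for such a keep/throw-away flag could be worked out outside the software and then entered into the new column. The sketch below is not a Lighthouse Studio API; the field names, the sample records and both thresholds are hypothetical and would need to be chosen for the actual study.

```python
# Hypothetical respondent records: total interview time and HB fit (RLH).
respondents = [
    {"id": 1, "elapsed_seconds": 610, "rlh": 0.62},
    {"id": 2, "elapsed_seconds": 95,  "rlh": 0.58},   # too fast
    {"id": 3, "elapsed_seconds": 540, "rlh": 0.27},   # poor fit
    {"id": 4, "elapsed_seconds": 480, "rlh": 0.71},
]

MIN_SECONDS = 180   # hypothetical speeding cutoff
MIN_RLH = 0.30      # hypothetical fit cutoff

def keep_flag(record):
    """Return 1 (keep) or 2 (throw away), matching the coding described above."""
    fast_enough = record["elapsed_seconds"] >= MIN_SECONDS
    fits_well = record["rlh"] >= MIN_RLH
    return 1 if fast_enough and fits_well else 2

flags = {r["id"]: keep_flag(r) for r in respondents}
print(flags)
```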
I was using version 9.5.2, which did not show t-ratios per group. Now I have updated to 9.5.3 and it works. Cleaning the respondents also worked after your advice about adding a new column.

The results of the different outputs for the critical attribute are as follows:
Counts:
-Within Att. Chi-Square: 3.977
-Significance: not sig.

Logit:
t-ratios: 0.6460; -0.8027; -0.7758; -0.2355; -0.8241; 2.0567  only one out of six significant (p<0.05)

HB:
Average Importances: Alcohol content 7.39
(calculating the attribute importance manually from the individual ZC utilities gives 7.99%)

LC:
Four-group solution (segment sizes 27.6%, 16.4%, 26.3%, 29.7%):
-t-ratios:
Segment1 (-1.44069; -0.03818; -1.95584; 2.74507; 1.01962; 0.08145) --> 2 significant levels
Segment2 (1.33090; 0.87687; -0.76028; -2.60860; 0.22535; 1.35247) --> 1 significant level
Segment3 (-0.26382; 0.09928; 0.17665; -0.53619; -0.96881; 1.61642) --> 0 significant levels
Segment4 (1.44575; -0.94500; 1.29460; -0.94929; -1.94222; 1.30765) --> 1 significant level

-Attribute Importance:
Segment1 7.6
Segment2 7.3
Segment3 5.3
Segment4 7.4

These results still leave me wondering how to deal with this attribute. The attribute importance in the segments is between 5.3 and 7.6%, but in my opinion, even within the four segments, there is no segment whose members tend to agree with one another.
After calculating all these results, do you have any advice on how to deal with this attribute? Does it make sense to delete it? Or how should I report it and argue for not deleting it, which would mean that it has influence (as the attribute importance suggests)?
Hmmm, I had forgotten that we added the T-ratios by segment in a later version of the software.  Glad you got that working.

But, you are right, at least out to a 4-class solution, there doesn't seem to be much signal being captured by that attribute (alcohol content).  This suggests that respondents are ignoring it (or that you have a data processing error somewhere in setting up your models, which would be extremely unlikely if you were using our software to manage the process from start to finish).

If this issue (alcohol content) is an important part of your research, it might be a stunning finding that respondents are ignoring it.  You should review how the attribute wording was presented to respondents.  To make sure of this, you probably should follow up with respondents and ask them if they really are tending to ignore this attribute (relative to the other attributes) and WHY.