Great question.

Let's imagine you have 30 items in your MaxDiff study, and respondents saw 4 items at a time in the MaxDiff questionnaire. HB-logit scales the raw utilities so that the choice probabilities respondents expressed (and therefore the amount of response error) reflect the context of the questionnaire: 4 items shown at a time. The "Probability of Choice" transformation we use in the software is an exponential transformation that takes into account the original context seen by the respondents: items shown four at a time.
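For readers who want to see the arithmetic, here is a minimal Python sketch of a Probability of Choice style transformation. It assumes the commonly documented form p = e^U / (e^U + a - 1), where U is an item's zero-centered raw HB utility and a is the number of items shown per set (4 here). The a - 1 term treats the other items in the set as having average utility (e^0 = 1 each). The function name and the utilities are hypothetical; this is not the software's actual code.

```python
import math

def probability_of_choice(raw_utility: float, items_per_set: int = 4) -> float:
    """Likelihood the item is picked as best from a set of `items_per_set`
    items, assuming the other items_per_set - 1 items in the set have
    average (zero) utility. Assumes raw_utility is a zero-centered
    HB-logit utility; this is an illustrative form, not the software's source."""
    e = math.exp(raw_utility)
    return e / (e + items_per_set - 1)

# Hypothetical zero-centered raw utilities for three of the 30 items.
utilities = {"item_1": 1.8, "item_2": 0.0, "item_3": -2.1}
for name, u in utilities.items():
    print(name, round(probability_of_choice(u, items_per_set=4), 3))
```

An item with a raw utility of exactly zero gets 1/a, which is 25% for quads; that is how the transformation ties the scores back to the four-at-a-time context.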

However, suppose you apply the Share of Preference (logit rule) to all 30 items at once, as if the respondent were considering all 30 items simultaneously within one competitive set. The response error estimated from the questionnaire (choices out of quads) won't fit that 30-item scenario well. As a result, the relative probabilities (the ratios between items among the 30) come out too extreme compared with the probabilities actually expressed in the raw data (the responses to items out of quads). And there isn't a nice, easy formula for figuring out how the assumed response error should change as the number of items in the set changes. That's an unknown! (Unless you fielded a questionnaire that varied the number of items per MaxDiff task, so you could measure this empirically.)
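The Python sketch below (made-up utilities, hypothetical share_of_preference helper) shows why nothing in the logit rule itself corrects for set size: the ratio between two items' shares depends only on their utility difference and the scale, so it is identical whether 4 or 30 items compete. The scale of 0.5 is purely illustrative, not an estimate of the response error in a real 30-item choice.

```python
import math

def share_of_preference(utilities, scale=1.0):
    """Logit (Share of Preference) rule: each share is proportional to
    exp(scale * utility) across every item in the simulated set."""
    weights = [math.exp(scale * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# 30 hypothetical raw utilities, evenly spaced from -3 (worst) to +3 (best).
utils = [-3 + 6 * i / 29 for i in range(30)]

full = share_of_preference(utils)       # all 30 items competing at once
quad = share_of_preference(utils[-4:])  # only the top 4 items competing

# The ratio between the two best items is the same in both sets: it depends
# only on their utility gap and the scale, not on set size (IIA property).
print(full[-1] / full[-2], quad[-1] / quad[-2])

# Illustrative (NOT estimated) smaller scale, standing in for the larger
# response error a respondent might have when facing 30 items at once.
flatter = share_of_preference(utils, scale=0.5)
print(round(full[-1], 3), round(flatter[-1], 3))
```

With scale 1.0 the best-to-worst ratio is e^6 (about 403); halving the scale cuts it to e^3 (about 20). Which scale is "right" for 30 items is exactly the unknown described above.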

Yet our users like the flexibility to run simulations in which the number of items in the simulation scenario differs from the number of items shown at a time in the questionnaire. Users should just realize that the relative scale (assumed response error) will gradually shift on them as they simulate shares of preference for sets whose size departs further and further from the number of items actually shown at a time in the questionnaire.

This same issue also exists for CBC data! However, most Sawtooth Software users either are not aware of it or ignore it.

Changes in the scale factor mostly just affect the relative steepness or flatness of the probabilities across items. Within any one respondent, the rank order of the items' preferences cannot change from one exponential transformation to another. Across respondents, however, you may see changes (usually minor) in rank-order position when summarizing the mean probabilities across people.
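To illustrate, here is a small Python sketch with three hypothetical respondents and two items, using a plain logit share with an explicit scale factor (the helper name and all numbers are assumptions for illustration). Each respondent's own ordering is preserved at any scale, but the ordering of the mean shares can flip.

```python
import math

def logit_shares(utilities, scale):
    """Logit shares proportional to exp(scale * utility)."""
    weights = [math.exp(scale * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical raw utilities for items A and B, one row per respondent.
respondents = [[6.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

for scale in (1.0, 3.0):
    per_resp = [logit_shares(u, scale) for u in respondents]
    # Within each respondent, the rank order never changes with the scale.
    for u, p in zip(respondents, per_resp):
        assert (u[0] > u[1]) == (p[0] > p[1])
    mean_a = sum(p[0] for p in per_resp) / len(per_resp)
    mean_b = sum(p[1] for p in per_resp) / len(per_resp)
    print(scale, round(mean_a, 3), round(mean_b, 3))
```

With these made-up numbers, item A has the higher mean share at a scale of 1.0 (about 0.512 vs 0.488), while item B overtakes it at 3.0 (about 0.635 vs 0.365). Respondent 1's strong preference for A saturates near 1.0 at the steeper scale, so it can no longer offset the other two respondents.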

Also note that more extreme exponential transformations (such as the Share of Preference instead of Probability of Choice) tend to squish the scores for the worst items toward zero while stretching the scores for the best items at the high end. This can matter if you want to focus your analysis on "best item discrimination" rather than keeping some focus on "worst item discrimination." For example, it can matter if you are conducting cluster analysis on the results.
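A hedged Python sketch of that squishing and stretching: it compares a Probability of Choice style score (assuming the e^U / (e^U + a - 1) form with a = 4) against a Share of Preference computed across all 8 hypothetical items. Both are rescaled to sum to 100 only so they can be compared side by side; the utilities and helper names are made up.

```python
import math

def probability_of_choice(u, items_per_set=4):
    # Assumed form: other items in the set treated as average (e^0 = 1 each).
    e = math.exp(u)
    return e / (e + items_per_set - 1)

def share_of_preference(utilities):
    # Logit rule across every item in the list at once.
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def rescale_to_100(scores):
    # Rescale so the scores sum to 100 (for side-by-side comparison only).
    total = sum(scores)
    return [100 * s / total for s in scores]

# Hypothetical zero-centered raw utilities for 8 items, worst to best.
utils = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]

poc = rescale_to_100([probability_of_choice(u) for u in utils])
sop = rescale_to_100(share_of_preference(utils))
for u, a, b in zip(utils, poc, sop):
    print(f"{u:+.1f}  PoC {a:6.2f}  SoP {b:6.2f}")
```

Here the worst item drops from about 0.59 to about 0.15 points under Share of Preference, while the best item rises from about 32 to about 61 points, so differences among the weakest items become much harder to see.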