Your Version 1 uses specific patients and relies on physician recall of those patients' details. Variants include using the last X patients, using X specific patients whose records have been selected by the physician's staff, and so on. This version and all of its variants suffer from non-random sampling of the patient population. All sorts of memory biases affect a doctor's recall and prevent the sample of patients he comes up with from being representative. And unless his records staff are also sampling statisticians, any sample they draw will likely not be representative either.
When I have been able to compare the patient populations that physicians recall to the actual percentage of patients of each type, the results were not close. That may not matter much, however, if you weight the results to the true market proportions of the patient segments.
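To make that weighting concrete, here is a minimal post-stratification sketch. The segment names, shares, and preference figures are all hypothetical, not from any actual study; the point is only that the market mix, not the recalled-sample mix, should drive the aggregate estimate.

```python
# Post-stratification sketch: reweight segment-level survey results to
# known market proportions. All numbers below are hypothetical.

sample_share = {"mild": 0.50, "moderate": 0.30, "severe": 0.20}  # mix of recalled patients
market_share = {"mild": 0.60, "moderate": 0.30, "severe": 0.10}  # true market mix

# Hypothetical per-segment preference share for the new therapy
pref = {"mild": 0.40, "moderate": 0.55, "severe": 0.70}

# Unweighted estimate inherits the biased recall-based sample mix
unweighted = sum(sample_share[s] * pref[s] for s in pref)

# Weighted estimate substitutes the true market mix
weighted = sum(market_share[s] * pref[s] for s in pref)
```

Because the recalled sample over-represents severe patients here, the unweighted estimate overstates preference relative to the market-weighted one.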
Another possible drawback of Version 1 is that patients may not fit neatly into the categories you want to distribute them into (this happened to me a few times with this kind of study: patient heterogeneity was richer than the segmentation we had in place). To the extent that extraneous factors affect therapy decisions for specific flesh-and-blood patients in a way that is not representative, this, too, can throw off your forecasts.
The problem with Version 2 is that the physician's memory will likely tend toward typical patients or toward recent exceptional ones. The sample the physician constructs from memory could bias the model quite a bit, and what we know about memory should not give us a lot of confidence in this approach. It isn't like asking a consumer to predict their next 10 breakfast cereal purchases. In fact, when I've been in a position to compare physicians' "next 10" predictions (or their "last 10," or percentage reports of recent prescribing behavior) against behavioral data on prescriptions actually written, which we had at the individual physician level, the predictions matched up very poorly.
I am unaware of much academic research comparing Versions 1 and 2. One informal study I conducted found the same answer both ways, after taking into account that Version 2, as one might expect, has a logit scale factor that makes for smaller utilities.
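The scale-factor point can be illustrated with a toy multinomial logit. Noisier responses (such as Version 2's memory-based ones) show up as a smaller scale multiplying the utilities, which flattens choice shares without changing their rank order; the utilities and scale values below are invented for illustration.

```python
import math

def logit_shares(utils, scale=1.0):
    """Multinomial logit choice shares; `scale` is the logit scale factor."""
    exps = [math.exp(scale * u) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities for three therapy profiles
v = [1.0, 0.2, -0.5]

sharp = logit_shares(v, scale=1.0)  # lower-noise task
noisy = logit_shares(v, scale=0.5)  # noisier task: same preferences, smaller scale

# Rank order is preserved, but shares are flatter under the smaller scale
assert sharp[0] > sharp[1] > sharp[2]
assert noisy[0] > noisy[1] > noisy[2]
assert sharp[0] > noisy[0]
```

This is why the two versions can yield "the same answer" once the scale difference is accounted for: the relative utilities agree even though the Version 2 estimates are shrunk.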
A third option resembles your Version 1, except that instead of relying on physicians' recall of specific patients, we supply archetypal patients in the survey. For example, if there are 4 distinct patient types (whose proportions in the market we know and want to represent), we might ask each CBC question but include 4 response rows below the question, one for each of the 4 patient segments. Now we don't have the extraneous patient-heterogeneity issue of Version 1, and we don't have the recall biases of Version 2.
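A sketch of what one Version 3 response record might look like, and how segment answers roll up to a market-level share. The segment labels, weights, and data layout are illustrative assumptions, not a prescribed format:

```python
# One CBC task answered once per archetypal patient segment.
# Segment names and weights are hypothetical.

task = {
    "task_id": 7,
    "concepts_shown": ["A", "B", "C", "None"],
    "choices": {                      # one response row per archetype
        "newly diagnosed":   "A",
        "stable on therapy": "None",
        "relapsed":          "B",
        "refractory":        "A",
    },
}

# Known market proportions of the four segments (hypothetical)
weights = {"newly diagnosed": 0.35, "stable on therapy": 0.40,
           "relapsed": 0.15, "refractory": 0.10}

# Market-level share for concept "A" on this task: sum the weights of
# the segments whose row chose "A"
share_A = sum(w for seg, w in weights.items() if task["choices"][seg] == "A")
```

Since the segment weights come from known market data rather than from whatever mix of patients the physician happens to remember, the aggregation step is where this design pays off.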
I have never compared this Version 3 to either Version 1 or Version 2 but I’ve used it quite a bit.