Indeed, the use of conditional pricing raises the odds of finding significant interaction effects between price and the attribute(s) on which price is conditioned in the look-up table. This is the case even when prices are built proportionally relative to the other attribute(s)--e.g. prices are always -25%, 0%, and +25% variations from a "base" price for the different brands.
In theory, examining prediction to holdout tasks would be a fine way to assess whether one model (main effects) does better than another (main effects + interaction effects). However (and my colleague Keith Chrzan has recently completed simulation work on this), you usually need more than just a few holdout tasks to provide enough information to validate one model vs. another. Although the "default" in our CBC software is to ask for two holdout tasks, in practice this is usually far too little information to detect whether one model specification is better than another. Two holdout tasks are enough to give the model some face validity and to confirm that no big mistake has been made. Keith's research suggests that one might need many more holdout tasks, perhaps ten or more, to make fine judgments about the suitability of one model specification vs. another.
Most researchers don't have access to so many holdout tasks (unless they a) have the luxury of having separate cells of respondents who are only used for answering fixed holdout questions, or b) are able to block their holdouts across respondents so that different groups of respondents receive different fixed holdout tasks), so other approaches to decide whether to use just main effects or interaction effects are needed.
I like to plot the Counts data (the brand x price counts) to see the shape of the pseudo-demand curves. These will be a bit noisy due to the nature of counts data, but they are a good and faithful guide to what the raw data are telling us about the response to price by different brands.
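As a rough illustration of what tabulating those counts involves, here is a minimal sketch assuming long-format choice data where each record is one concept shown in a task: (brand, price_level, chosen). The record layout and names are hypothetical, not the export format of any particular software.

```python
# Hypothetical sketch: brand x price choice counts from long-format
# CBC data. Each record is (brand, price_level, chosen), where chosen
# is 1 if the respondent picked that concept. Names are illustrative.
from collections import defaultdict

def brand_price_counts(records):
    """Share of times each (brand, price_level) cell was chosen when shown."""
    shown = defaultdict(int)
    picked = defaultdict(int)
    for brand, price_level, chosen in records:
        key = (brand, price_level)
        shown[key] += 1
        picked[key] += chosen
    return {key: picked[key] / shown[key] for key in shown}

# Tiny synthetic example: two brands, three price levels, two exposures each.
records = [
    ("A", 1, 1), ("A", 2, 0), ("A", 3, 0),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 0),
    ("A", 1, 1), ("A", 2, 1), ("A", 3, 0),
    ("B", 1, 0), ("B", 2, 0), ("B", 3, 1),
]
shares = brand_price_counts(records)
# Plotting each brand's shares against price level traces out the
# noisy pseudo-demand curves described above.
```

Plotting the resulting shares per brand against price level gives the pseudo-demand curves; the noise comes from the limited number of exposures per cell.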
One could use aggregate logit and the 2-LL test (also examining the T-values for the interaction effects) to assess whether interaction effects add significant fit to the model--but aggregate logit isn't typically the model we end up delivering to the client. That is usually HB, which is known to lessen (but not always eliminate) the need to include interaction effects.
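The 2-LL test mentioned above is the standard likelihood-ratio test for nested models: twice the difference in log-likelihoods is compared against a chi-square critical value with degrees of freedom equal to the number of added interaction parameters. A minimal sketch, with hypothetical log-likelihood values:

```python
# Hedged sketch of the 2-log-likelihood (likelihood-ratio) test for
# nested logit models. The LL values below are hypothetical examples,
# not results from any real study.

def two_ll_test(ll_main, ll_interaction, critical_value):
    """Return (statistic, significant) for the likelihood-ratio test."""
    stat = 2.0 * (ll_interaction - ll_main)
    return stat, stat > critical_value

# Example: main-effects LL = -4210.5; adding 8 brand x price interaction
# parameters improves LL to -4188.2. The chi-square critical value for
# 8 df at alpha = 0.05 is about 15.51.
stat, significant = two_ll_test(-4210.5, -4188.2, 15.51)
# stat is about 44.6, which exceeds 15.51, so in this made-up example
# the interaction terms add significant fit to the aggregate model.
```

In practice one would also inspect the t-values of the individual interaction terms, as noted above, rather than rely on the omnibus test alone.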
So, I typically like to run main effects HB and develop demand curves via the simulator using "sensitivity analysis". I look at the simulated demand curves and compare them to the counts demand curves.
Next, I'll run HB with the brand x price interaction specified and repeat the process in the simulator to generate the sensitivity-analysis demand curves, again comparing them to the counts-generated curves.
The simulated demand curves should be smoother than the counts curves, especially if you constrain price to be monotonically decreasing in preference as prices increase.
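To make the sensitivity-analysis step concrete, here is a toy share-of-preference (logit) simulator that sweeps one brand's price level while competitors stay at their base prices. The utilities are hypothetical main-effects part-worths invented for illustration, not output from any actual HB run.

```python
# Illustrative sensitivity analysis in a logit share simulator.
# brand_utils and price_utils are made-up main-effects part-worths.
import math

brand_utils = {"A": 0.6, "B": 0.1, "C": -0.2}
price_utils = [0.5, 0.0, -0.5]  # part-worths for low / base / high price

def shares(scenario):
    """Share of preference for each brand given {brand: price_level_index}."""
    exp_u = {b: math.exp(brand_utils[b] + price_utils[p])
             for b, p in scenario.items()}
    total = sum(exp_u.values())
    return {b: v / total for b, v in exp_u.items()}

# Demand curve for brand A: vary A's price level while B and C stay at base.
curve = [shares({"A": p, "B": 1, "C": 1})["A"] for p in range(3)]
# curve falls monotonically as A's price rises, tracing A's simulated
# demand curve; repeating per brand gives the full set of curves.
```

Repeating this sweep for each brand, with the other brands held at their base prices, produces the smooth simulated demand curves to lay alongside the noisier counts curves.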
Sometimes I find that interaction effects are not really needed in my HB models to adequately represent the differential slopes of the brands' counts curves; other times the additional interaction effects do seem to be needed.