It seems strange to me that your MAE is a negative number. MAE (Mean Absolute Error) values can never be negative, because every term being averaged is an absolute value. For example, let's imagine that the actual shares of choice (from counts) for three product concepts in a holdout task are A=30%, B=40%, C=30%. And let's say your predicted choice shares (using RFC) are A=40%, B=30%, C=30%. Your MAE would be: (|0.40-0.30| + |0.30-0.40| + |0.30-0.30|)/3, or (0.10 + 0.10 + 0.00)/3, or 0.20/3, or 0.0667.
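If it helps to check the arithmetic, here is a minimal Python sketch of the same single-task calculation. The function name mean_absolute_error is just an illustrative helper, not part of any Sawtooth Software tool. Because each term goes through abs(), the result cannot come out negative.

```python
def mean_absolute_error(actual, predicted):
    """MAE between actual and predicted choice shares for one holdout task."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same number of concepts")
    # Each difference is made non-negative by abs(), so the mean is >= 0
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual_shares = [0.30, 0.40, 0.30]     # A, B, C from counts
predicted_shares = [0.40, 0.30, 0.30]  # A, B, C from the simulator (RFC)

mae = mean_absolute_error(actual_shares, predicted_shares)
print(f"MAE = {mae:.4f}")  # MAE = 0.0667
```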
For MAE calculations to be reasonably stable, you should have about four or five holdout choice tasks, or more. So, although the illustration above involved just one choice task, the calculation should be made across all four or five holdout tasks, with the absolute prediction errors averaged across those tasks.
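A minimal sketch of averaging across several holdout tasks, assuming you already have actual and predicted shares for each task. The four tasks below are made-up numbers for illustration only (the first one reuses the example above), and the helper names task_mae and holdout_mae are hypothetical.

```python
def task_mae(actual, predicted):
    # Mean absolute error across the concepts in one holdout task
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def holdout_mae(tasks):
    # Average the per-task MAEs so that each holdout task counts equally
    return sum(task_mae(a, p) for a, p in tasks) / len(tasks)


# Hypothetical holdout data: (actual shares, predicted shares) per task
holdout_tasks = [
    ([0.30, 0.40, 0.30], [0.40, 0.30, 0.30]),
    ([0.50, 0.25, 0.25], [0.45, 0.30, 0.25]),
    ([0.20, 0.50, 0.30], [0.20, 0.50, 0.30]),
    ([0.60, 0.20, 0.20], [0.50, 0.25, 0.25]),
]

overall = holdout_mae(holdout_tasks)
print(f"MAE across {len(holdout_tasks)} holdout tasks = {overall:.4f}")  # 0.0417
```

When every task shows the same number of concepts, averaging the per-task MAEs gives the same answer as averaging all concept-level absolute errors together; if the number of concepts varies by task, the two approaches can differ slightly.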
If you have only three or fewer holdout choice tasks to compute your MAE, there is probably not enough data to tell whether one model predicts the holdouts better than another.
Next, it is strange to think about dropping attributes from a conjoint model. Generally, we keep all attributes in the model, because each attribute was included in the experiment based on a prior hypothesis that it is meaningful, at least to some groups of respondents. And even if a 1-group solution, or even a 4-group solution, makes it look like an attribute didn't matter to anybody, that may just be due to an aggregation fallacy (see the example directly below).
Aggregate logit can sometimes make an attribute look insignificant when different respondents, or groups of respondents, disagree about the order of preference for its levels (as with brand or color, for example). Let's say a color attribute has two levels: red and green. Half of the respondents prefer red and the other half prefer green. In aggregate (the pooled logit solution), the two utilities may be very close to zero and not significantly different from zero (their t-ratios do not exceed the critical value of 1.96 in absolute value).
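To make the red/green example concrete, here is a small Python sketch under simplifying assumptions of my own: every choice is just red versus green with nothing else varying, utilities are effects-coded (green = -red), and the two segments have made-up utilities of +1 and -1 for red. With a single parameter, the aggregate logit estimate just reproduces the pooled share, so it can be recovered by inverting the logistic function instead of running a full estimation routine.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def red_share(u_red):
    # Effects coding: utility(green) = -u_red, so the utility difference is 2 * u_red
    return logistic(2.0 * u_red)


def fit_u_red(share_red):
    # Single-parameter binary logit: the estimate reproduces the observed share,
    # so we invert the logistic function and divide by 2 (effects coding)
    return math.log(share_red / (1.0 - share_red)) / 2.0


segment_u = [1.0, -1.0]  # hypothetical: half prefer red, half prefer green
shares = [red_share(u) for u in segment_u]
pooled_share = sum(shares) / len(shares)
pooled_u = fit_u_red(pooled_share)

print("Segment shares of red:", [round(s, 3) for s in shares])  # [0.881, 0.119]
print(f"Pooled utility for red: {pooled_u:.3f}")
```

Each segment has a strong preference (utility of +1 or -1), yet the pooled estimate is essentially zero. That is exactly the pattern that can make an attribute look unimportant in an aggregate solution.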
Next, most Sawtooth Software users employ HB estimation of utilities for their final model. It has generally proven more accurate than aggregate logit for use in the market simulator. Latent Class analysis can often perform nearly as well as HB at predicting aggregate shares of preference, but HB still tends to provide more accuracy and flexibility during analysis, because utilities are estimated at the individual level.