When I look at average utilities, Level X has a higher utility than Level Y, but in simulations more respondents choose Level Y.
The most common reason for an outcome like this (and for the related case where utilities look flat, with levels that appear undifferentiated) is that there is a lot of disagreement within your respondent sample.
When you have categorical attributes, such as Brand or Color, the levels have no inherent preference order, and respondents can hold vastly different opinions about how good each level is. To use an extreme example, consider a beverage study where you happened to survey a group of respondents in which 50% prefer Pepsi, 50% prefer Coke, and everyone is merely OK with Sprite. You might end up with utilities that look like this for a "Prefer Coke" and a "Prefer Pepsi" respondent:
| Level  | Prefer Coke | Prefer Pepsi |
|--------|-------------|--------------|
| Coke   | 90          | -100         |
| Pepsi  | -100        | 90           |
| Sprite | 10          | 10           |

Let's say you had 50 of each type of respondent. If you calculate an average across all 100 respondents, you'll come up with:

| Level  | Average Utility |
|--------|-----------------|
| Coke   | -5              |
| Pepsi  | -5              |
| Sprite | 10              |
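To make this concrete, here is a short sketch in Python/NumPy using hypothetical per-respondent utilities (90, -100, 10 for a "Prefer Coke" respondent and the mirror image for a "Prefer Pepsi" respondent; the specific numbers are illustrative, chosen to match the averages and standard deviations discussed here):

```python
import numpy as np

# Hypothetical per-respondent utilities, columns = [Coke, Pepsi, Sprite].
# 50 "Prefer Coke" respondents and 50 "Prefer Pepsi" respondents.
prefer_coke = np.tile([90.0, -100.0, 10.0], (50, 1))
prefer_pepsi = np.tile([-100.0, 90.0, 10.0], (50, 1))
utilities = np.vstack([prefer_coke, prefer_pepsi])

# Averages hide the disagreement: Coke and Pepsi tie, Sprite "wins".
print(utilities.mean(axis=0))         # -5, -5, 10

# Standard deviations reveal it: huge spread on Coke/Pepsi, none on Sprite.
print(utilities.std(axis=0, ddof=1))  # about 95.5, 95.5, 0
```

The averages make Sprite look best, while the standard deviations show the sample is split into two strongly opposed camps on Coke and Pepsi.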
On average, there is no difference between Coke and Pepsi, and Sprite is preferred. This turns out to be completely the opposite of what your data really tells you.
If you look at the standard deviations, though, Coke and Pepsi will each be about 95.5, while Sprite's is 0. And if you simulate a market containing Coke, Pepsi, and Sprite, no respondent would choose Sprite (assuming a first-choice rule). Additionally, if the study had other attributes, Brand would show up as very important, despite its flat-looking average utilities. When you see results like these, it's a good indicator that there is a lot of disagreement among your respondents, and that averages alone are probably going to be misleading.