I don't think you want to use the importance scores for this analysis. They're appropriate for answering some questions, but probably not this one.
When comparing utilities in this way, make sure to use the zero-centered diffs (ZCD) transformed utilities and not the raw HB utilities. The ZCD utilities include our best effort to make utilities comparable across respondents, who have different levels of response error.
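In case it helps to see what the ZCD rescaling does, here is a rough sketch for a single respondent. It assumes the usual definition: center each attribute's utilities at zero, then scale so that the best-minus-worst ranges, summed across attributes, equal 100 times the number of attributes. The function name, the input layout, and the numbers are made up for illustration; your software's ZCD export is the thing to actually use.

```python
def zero_centered_diffs(utilities):
    """Rescale one respondent's raw utilities to zero-centered diffs.

    `utilities` maps attribute name -> list of raw level utilities
    (a hypothetical layout chosen for this sketch).
    """
    # Step 1: center each attribute's levels at a mean of zero.
    centered = {}
    for attribute, levels in utilities.items():
        mean = sum(levels) / len(levels)
        centered[attribute] = [u - mean for u in levels]

    # Step 2: sum the best-minus-worst range over all attributes.
    total_range = sum(max(v) - min(v) for v in centered.values())
    if total_range == 0:
        raise ValueError("all utilities are flat; cannot rescale")

    # Step 3: scale so the ranges sum to 100 * number of attributes.
    scale = 100 * len(centered) / total_range
    return {a: [u * scale for u in v] for a, v in centered.items()}


# Made-up raw utilities for one respondent.
raw = {"Brand": [1.0, 0.0, -1.0], "Price": [3.0, -3.0]}
zcd = zero_centered_diffs(raw)
```

Because every respondent ends up with the same total range, a respondent with large raw utilities (low response error) no longer dominates group means.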
In academic papers you'll see authors run models with and without a given demographic variable (or with and without the interactions of design variables with the demographic variable) to see if including the demographic improves model fit. That's a lot of complicated analysis, so folks outside of academia rarely do it. Much more common is the approach you're considering, which is to do ANOVAs, one per attribute level, which, you're right, doesn't exactly answer the question of whether the entire attribute differs significantly.
What I do in this case is run all the ANOVAs of potential interest, then apply an appropriate correction for multiple tests (I like the Benjamini-Hochberg procedure, which controls the false discovery rate). If any level of an attribute is still significant after the correction, we can say that the attribute differs significantly across groups.
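A minimal sketch of that last step, assuming you already have one ANOVA p-value per attribute level from whatever stats package you use. The p-values, attribute names, and function names below are invented for illustration, and the 0.05 false discovery rate is just a common default.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return True/False per test: rejected at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every test ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


def significant_attributes(level_pvalues, q=0.05):
    """Flag an attribute if any of its levels survives the correction.

    `level_pvalues` maps (attribute, level) -> ANOVA p-value.
    """
    labels = list(level_pvalues)
    flags = benjamini_hochberg([level_pvalues[k] for k in labels], q)
    result = {}
    for (attribute, _level), flag in zip(labels, flags):
        result[attribute] = result.get(attribute, False) or flag
    return result


# Hypothetical p-values, one ANOVA per attribute level.
pvals = {
    ("Brand", "A"): 0.001,
    ("Brand", "B"): 0.20,
    ("Brand", "C"): 0.04,
    ("Price", "$10"): 0.03,
    ("Price", "$20"): 0.60,
    ("Color", "Red"): 0.012,
    ("Color", "Blue"): 0.45,
}
by_attribute = significant_attributes(pvals)
```

With these made-up numbers, the Price level at p = 0.03 would pass an uncorrected 0.05 cutoff but does not survive the correction, so Price is not flagged. If you have statsmodels installed, `multipletests(..., method="fdr_bh")` does the same correction and also returns adjusted p-values.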