I have run several HB models on my MaxDiff data, but now I am not sure how to report the results, or how to decide which model is the best fit.

I followed the recommendations in Chapter 12 of your advanced CBC book and generated the 95% CI and the % confidence for each level I am valuing. Still, the articles I have read report standard errors instead; how are these two related?
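To make the relationship concrete, here is a small sketch with simulated posterior draws (made-up numbers, not from any real study). For a roughly normal posterior, the standard deviation of the post-convergence draws plays the role of the standard error reported in papers, and mean ± 1.96 × SD approximately reproduces the 2.5th/97.5th percentile interval:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical post-convergence draws of one population-level coefficient.
draws = rng.normal(loc=1.2, scale=0.3, size=5000)

posterior_mean = draws.mean()
posterior_sd = draws.std(ddof=1)          # what papers typically report as "SE"
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])

# For an approximately normal posterior, the normal approximation
# mean +/- 1.96 * SD closely matches the percentile interval.
approx_low = posterior_mean - 1.96 * posterior_sd
approx_high = posterior_mean + 1.96 * posterior_sd
```

So reporting the posterior SD as a "standard error" and reporting a percentile-based 95% CI are two views of the same posterior spread; they diverge only when the posterior is clearly skewed.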

Furthermore, how can I examine the heterogeneity between respondents for each level/item, and how do I check whether it is significant?

Finally, since I have only one hold-out task, how can I check the hit rate of my model on that task, or otherwise test the model's predictive accuracy?
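A hit rate on a single hold-out is usually computed respondent by respondent: predict each respondent's best (and worst) among the items shown in that task from their individual utilities, and count how often the prediction matches the observed answer. A sketch with simulated data (here the "actual" choices are drawn from a logit rule; with real data they would be the recorded hold-out answers):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical setup: 200 respondents, 31 items, hold-out task shows 6 items.
n_resp, n_items = 200, 31
utilities = rng.normal(size=(n_resp, n_items))     # individual HB utilities
shown = rng.choice(n_items, size=6, replace=False)  # items in the hold-out task

# Simulate the observed "best" picks with a logit rule over the shown items.
exp_u = np.exp(utilities[:, shown])
probs = exp_u / exp_u.sum(axis=1, keepdims=True)
actual_best = np.array([rng.choice(shown, p=p) for p in probs])

# Prediction: each respondent's highest-utility item among the 6 shown.
pred_best = shown[np.argmax(utilities[:, shown], axis=1)]

hit_rate = (pred_best == actual_best).mean()
chance_rate = 1 / len(shown)   # 1-in-6 benchmark for a single best choice
```

With only one hold-out the hit rate is a noisy estimate, so it is worth comparing it against the chance benchmark (1/6 here for the best choice) rather than reading the percentage on its own.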

One last point: how do I calculate the degrees of freedom for the logit model in MaxDiff Case 2 if I have 31 total levels, present 6 of them in each task, and each respondent evaluated 10 different scenarios?

Thank you very much

Regards

1- I am using the standalone program to generate the raw scores for HB, and I am asking about reporting the HB raw-score results. I took the geometric mean of the RLH and the arithmetic mean of each respondent's scores from the _utilities.csv file, and those are my principal coefficients. Then, from the _alpha.csv file (or _meanbeta.csv when there are added coefficients or constraints), I used the iterations after convergence to build the 95% CI by taking the 97.5th and 2.5th percentiles of each coefficient. I also calculated the % confidence as the percentage of draws in which one coefficient is larger than the next coefficient in the same dimension, since the coefficients within each dimension are ordered from best to worst (Case 2 MaxDiff). I have not had the chance to check this against similar studies reporting HB for MaxDiff, so I checked HB for a DCE instead. One of those studies reported coefficients and SEs, and variances and SEs. [The Patient - Patient-Centered Outcomes Research, https://doi.org/10.1007/s40271-019-00402-w]
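The two summaries described above can be sketched in a few lines of NumPy (simulated draws here, standing in for the post-convergence rows of an alpha file; the ordered means are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical post-convergence draws: 2000 iterations x 4 coefficients
# belonging to one dimension, ordered from best to worst.
draws = rng.normal(loc=[2.0, 1.2, 0.5, -0.4], scale=0.3, size=(2000, 4))

# 95% credible interval per coefficient from the 2.5th/97.5th percentiles.
ci = np.percentile(draws, [2.5, 97.5], axis=0)   # shape (2, 4)

# "% confidence": share of draws in which each coefficient exceeds the
# next coefficient in the same dimension.
pct_confidence = (draws[:, :-1] > draws[:, 1:]).mean(axis=0) * 100
```

Computing the exceedance on the joint draws (same iteration for both coefficients), as here, is the usual approach, since it preserves any posterior correlation between adjacent coefficients.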