Dear Bryan,

Sorry, I am afraid I did not understand you correctly.
What I am trying to do is compare two models using a likelihood ratio test.  For this I need to come up with a total LL across respondents for each of the two models.
Imagine I have 3 respondents, each with 4 tasks, and their RLH values:
Resp 1-> RLH= 0.635
Resp 2-> RLH= 0.589
Resp 3-> RLH= 0.308
The individual LLs become:
Resp 1-> LL1=LN(0.635)*4=-1.82
Resp 2-> LL2=LN(0.589)*4=-2.12
Resp 3-> LL3=LN(0.308)*4=-4.71
Total LL across respondents= LL1+LL2+LL3=-8.64
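In code, the computation above is just a few lines (this assumes, as in the arithmetic above, that RLH is the geometric mean of the task-level choice probabilities, so each respondent's LL is ln(RLH) times the number of tasks):

```python
import math

# RLH per respondent and the number of tasks each completed
rlh = [0.635, 0.589, 0.308]
n_tasks = 4

# LL for one respondent: ln(RLH) * number of tasks,
# since RLH is the geometric mean of the task-level choice probabilities
individual_ll = [math.log(r) * n_tasks for r in rlh]
total_ll = sum(individual_ll)

print([round(ll, 2) for ll in individual_ll])  # [-1.82, -2.12, -4.71]
print(round(total_ll, 2))                      # -8.64
```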
Is this correct?
Moreover, does it make sense to compare two models using the likelihood ratio test? Doesn't it focus too much on the lower-level model and not enough on the upper-level model? This is the reason why I was asking for a measure of the RLH at the population level (in the _alpha.csv file).

My apologies for the many questions...I would be really thankful if you could answer me one more time!
related to an answer for: On the computation and meaning of LL

As you can see, the total LL (being the sum of the LL across tasks and respondents) is only comparable between two models if the number of tasks and respondents is identical between the models (holding the data constant).
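For reference, once the two total LLs are in hand, the likelihood ratio statistic itself is a one-line computation. The LL values below are purely hypothetical illustration numbers (not from the thread), and the critical value 3.84 is the chi-square 5% cutoff for one degree of freedom, i.e. when the fuller model has one extra parameter:

```python
# Likelihood ratio test between a restricted model and a fuller model.
# Both total LLs below are hypothetical illustration values.
ll_restricted = -8.64   # total LL of the simpler model
ll_full = -6.90         # total LL of the model with extra parameters

# LR statistic: 2 * (LL_full - LL_restricted), asymptotically
# chi-square distributed with df = difference in parameter counts
lr_stat = 2 * (ll_full - ll_restricted)

# Chi-square critical value for df = 1 at the 5% level
critical_value = 3.84

print(round(lr_stat, 2))         # 3.48
print(lr_stat > critical_value)  # False -> fail to reject the simpler model
```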

And yes, LL focuses on the ability of the model to fit individuals' choices.  The interesting thing is that HB actually doesn't try to maximize the LL at the individual level!  It purposefully sacrifices some degree of individual-level fit for the benefit of Bayesian smoothing to the upper-level model.  That degree of smoothing can be controlled by the prior variance assumption you use in the HB model.

The take-away is as you say: it is difficult to ascertain whether one model would work better for out-of-sample predictions than another using just within-sample LL fit of the lower-level models!

This has been a common challenge for researchers and a common issue in much published research in journals.  Just because we can obtain better within-sample fit (measured via LL, for example) does not necessarily mean a better model in terms of ability to predict out-of-sample data.  So, the most robust research results that deal with predictive validity should use out-of-sample fit.
answered Dec 11, 2014 by Platinum (162,615 points)
Thanks Bryan!