For background, RLH is the "Root Likelihood": the predicted probability (the "share of preference" under the logit rule) for the product concept the respondent actually chose in each choice task. Then, across a respondent's choice tasks (under HB estimation), we take the geometric mean of those likelihoods. So, in many ways we can think of RLH as a pseudo R-squared from a logistic regression.
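To make the definition concrete, here is a minimal sketch (not Sawtooth's code; the encoding of concepts as lists of attribute levels is my own toy convention):

```python
import math

def rlh(utilities, tasks, choices):
    """Geometric mean, across tasks, of the logit probability of the
    concept the respondent actually chose.

    utilities: dict mapping attribute level -> part-worth utility
    tasks: list of tasks; each task is a list of concepts, and each
           concept is a list of attribute levels
    choices: for each task, the index of the chosen concept
    """
    total_log_prob = 0.0
    for task, chosen in zip(tasks, choices):
        # Total utility of each concept = sum of its level part-worths
        v = [sum(utilities[level] for level in concept) for concept in task]
        exp_v = [math.exp(x) for x in v]
        # Logit "share of preference" for the chosen concept
        total_log_prob += math.log(exp_v[chosen] / sum(exp_v))
    return math.exp(total_log_prob / len(tasks))  # geometric mean
```

With all-zero utilities the model predicts chance probability for every concept, so RLH equals 1/(number of concepts per task); a perfect fit drives it toward 1.0.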
Now, let me address the question. HB estimates part-worth utilities that have a high RLH fit to the actual choices respondents made. But note that HB doesn't simply try to maximize the RLH for each person (as a purely individual-level logit would do). Rather, HB purposefully gives up some of the best possible fit to each individual's data by also considering how likely that respondent's set of utilities would be to have been drawn from the population (under the assumption that the population is distributed multivariate normal). In other words, HB tries to maximize a joint likelihood: the part-worth utilities fitting each respondent's choices AND those part-worths having a high likelihood of coming from the population distribution. Essentially, HB is saying that there isn't enough data at the individual level to fully trust that data alone, so a compromise is struck between fitting the individual and honoring the assumption that a respondent should look similar to the other respondents in the same utility run.
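That compromise can be sketched as the (log of the) quantity HB effectively keeps high for each respondent. The function and variable names below are illustrative, not Sawtooth's internals, and this ignores the MCMC machinery HB actually uses to sample from the posterior:

```python
import numpy as np

def individual_log_posterior(beta, task_designs, choices, mu, sigma_inv):
    """Up to a constant: log-likelihood of this respondent's choices
    under multinomial logit, PLUS the log-density of beta under the
    population multivariate normal N(mu, Sigma).  Maximizing fit alone
    would drop the second term; HB keeps both."""
    log_lik = 0.0
    for X, chosen in zip(task_designs, choices):
        # X: one row of part-worth coding per concept in the task
        v = X @ beta
        log_lik += v[chosen] - np.log(np.sum(np.exp(v)))
    dev = beta - mu
    log_prior = -0.5 * dev @ sigma_inv @ dev  # pulls beta toward mu
    return log_lik + log_prior
```

Note how an extreme beta that fits one respondent's choices perfectly can still score worse than a moderate beta, because the prior term penalizes implausible distance from the population mean.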
So, I wonder why you want to improve the RLH values after the utilities are computed. That would mean somehow changing the utilities post hoc for each respondent so they provide a better fit to that respondent's choices, essentially undoing the job HB did in striking an effective compromise between maximum individual-level fit and the population distribution information.
Then, you mention wanting to clean the data somehow to get rid of respondents with bad fit.
First, please note that some respondents who have bad fits are really just sloppy respondents who are answering randomly. There are ways to detect those respondents, such as looking for a long series of repeated choices (assuming random concept order in your CBC tasks, there should be no pattern to the choices of concepts across tasks); looking at extremely fast questionnaire times; and looking for straightlining behavior in the non-CBC questions in your survey.
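Two of those checks can be sketched in a few lines; the thresholds here are arbitrary placeholders you would tune for your own study, not recommended values:

```python
def longest_run(choices):
    """Length of the longest streak of picking the same concept position."""
    best = current = 1
    for prev, cur in zip(choices, choices[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def flag_suspects(respondents, max_run=6, min_seconds=180):
    """respondents: list of dicts with 'id', 'choices' (concept position
    chosen in each CBC task), and 'seconds' (total questionnaire time).
    Returns (id, reasons) pairs for respondents worth a closer look."""
    flagged = []
    for r in respondents:
        reasons = []
        if longest_run(r["choices"]) >= max_run:
            reasons.append("long run of the same concept position")
        if r["seconds"] < min_seconds:
            reasons.append("suspiciously fast completion")
        if reasons:
            flagged.append((r["id"], reasons))
    return flagged
```

These flags are evidence, not verdicts: they identify respondents to review alongside the other quality measures, such as straightlining in the non-CBC questions.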
However, please note that CBC questionnaires are challenging. A respondent who carefully trades off many aspects (attributes) at the same time when making choices will actually have a lower RLH fit than a respondent who simply adopts a quick shortcut to get through the survey (such as always picking the favorite brand or always picking the lowest-priced product). So, RLH alone doesn't indicate whether a respondent has been a good respondent or not.
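To illustrate, here is a toy simulation (my own construction, not real data) in which an "always pick the cheapest concept" respondent is fit almost perfectly by a single large negative price utility:

```python
import math
from itertools import permutations

# Toy CBC: each task shows three concepts that differ only on price
tasks = [list(p) for p in permutations([1.0, 2.0, 3.0])]
choices = [prices.index(min(prices)) for prices in tasks]  # cheapest every time

def rlh_price_only(beta_price):
    """RLH for this respondent under a model with a single price utility."""
    total_log_prob = 0.0
    for prices, chosen in zip(tasks, choices):
        v = [beta_price * p for p in prices]
        exp_v = [math.exp(x) for x in v]
        total_log_prob += math.log(exp_v[chosen] / sum(exp_v))
    return math.exp(total_log_prob / len(tasks))
```

A steep price utility like -5.0 pushes this shortcut respondent's RLH above 0.99, while a thoughtful respondent genuinely trading off several attributes will rarely approach that.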
So, please use multiple measures of goodness before cleaning the data and discarding respondents. RLH often doesn't provide enough evidence alone in the case of CBC to identify bad respondents.
But, with MaxDiff, RLH alone is more robust for identifying bad respondents. You simply cannot come up with a quick-and-dirty way to answer MaxDiff tasks and still yield a high RLH (unless you do something perverse, such as always picking the item with the most letters as best and the fewest letters as worst, or picking "best" items by the alphabetical priority of their first letter!).