Would you please explain the difference between the application of k-means and agglomerative hierarchical clustering? In which study contexts is each of them more useful or efficient?

(Assume we use CBC/HB.)

Thanks


Robin,

K-means and agglomerative hierarchical clustering are just two algorithms for developing clusters of records with "similar" profiles, at least when you have no dependent variable to use to define similarity (i.e., when you're doing "unsupervised partitioning"). Divisive hierarchical clustering, k-medians, and cluster ensembles are examples of other ways of tackling the same problem. I typically use a variety of these methods when I'm creating segments because none of them consistently outperforms the others (though I end up using the k-means, k-medians, or cluster ensemble solutions most often).

With large sample sizes k-means will run faster (so in that sense it's more efficient) but I really can't tell you that one of these is reliably more useful than all the others.
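To make the comparison concrete, here is a minimal sketch of running both algorithms on a matrix of respondent-level part-worth utilities (like those exported from a CBC/HB run) and comparing the resulting segmentations. The data are synthetic stand-ins, and scikit-learn is an assumed dependency; this is not a Sawtooth workflow, just an illustration.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for HB utilities: 300 "respondents" x 8 "part-worths",
# drawn from 3 latent segments so both methods have structure to find.
centers = rng.normal(scale=2.0, size=(3, 8))
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 8)) for c in centers])

# k-means: iterative centroid updates; fast on large samples.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Agglomerative (Ward linkage): bottom-up merging of the closest clusters;
# O(n^2) memory for the distance structure, so it scales worse.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)

# Silhouette scores give a rough, method-agnostic quality comparison.
print("k-means silhouette:       %.3f" % silhouette_score(X, km.labels_))
print("agglomerative silhouette: %.3f" % silhouette_score(X, agg.labels_))
```

In practice you would replace `X` with your exported utilities (typically zero-centered or normalized per respondent first) and inspect several values of `n_clusters` rather than fixing it at 3.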

...

Have you ever looked at the differences between the results of "Mixed Logit (HB) + clustering (e.g., k-means)" and "Latent Class Mixed Logit (HB)"? The latter is not implemented in Lighthouse Studio, but I would be interested in the differences.

Best wishes

Nico