We've been doing k-means wrong for more than half a century



Updated 2021-06-04: The k-means++ implementation I was using previously appears to have been flawed. I've updated the results using a better implementation.

The above report focusses on R. @ctwardy has replicated the basic result here and done some further exploration in Python.

Updated 2021-06-19: Added Appendix 2, sketching an argument that the asymptotic density of k-means++ is optimal.