A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.
"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images?
The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.
But there's a fundamental limit to this process, pointed out in "Understanding Deep Learning Requires Rethinking Generalization."
If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."
https://www.oreilly.com/ideas/the-ma...wsltr_20170607
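The memorization trap the article describes is easy to demonstrate. The sketch below (my own toy illustration, not from the article) builds a "model" that is literally a lookup table of its training examples: because the labels contain some random noise, perfect training accuracy just means the noise was memorized, and accuracy on fresh data drops well below it. The data generator, the 20% noise rate, and the nearest-neighbor fallback are all assumptions chosen for illustration.

```python
import random

random.seed(0)

def make_data(n):
    # Each example: a feature x in [0, 1); the true rule is "label 1 if x >= 0.5",
    # but 20% of labels are randomly flipped, so NO rule can be 100% right on new data.
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x >= 0.5 else 0
        if random.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

train = make_data(200)
test = make_data(200)

# "Memorizing" model: an exact lookup table of the training examples.
memory = {x: y for x, y in train}

def predict(x):
    # Seen inputs come straight from memory; unseen inputs fall back to the
    # label of the nearest memorized point (a 1-nearest-neighbor rule, which
    # memorizes the noise right along with the signal).
    if x in memory:
        return memory[x]
    nearest = min(memory, key=lambda m: abs(m - x))
    return memory[nearest]

def accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

print(f"train accuracy: {accuracy(train):.2f}")  # 1.00 -- pure memorization
print(f"test accuracy:  {accuracy(test):.2f}")   # noticeably lower
```

The parallel to handicapping: a system tuned until it "explains" every race in a past-performance sample is doing exactly what this lookup table does, and its edge evaporates on tomorrow's card.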