Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy dose of skepticism, especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.
It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). You do not need a jillion races. Even if you are building models from a few hundred races, it can be much to your advantage to split them into a training set and a control set: a pattern that shows a profit only on the races it was found in is probably not a real pattern.
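To make the idea concrete, here is a minimal Python sketch of one way to do the split. Nothing here comes from any particular handicapping package: the function names `split_races` and `roi`, the 30% control fraction, and the `(stake, payout)` bet format are all assumptions made up for illustration. The one point that matters is that the split is done by race, so that every horse in a given race lands on the same side.

```python
import random


def split_races(race_ids, control_fraction=0.3, seed=42):
    """Randomly split race IDs into (training, control) sets.

    Hypothetical helper: control_fraction and seed are illustrative
    defaults, not recommendations. Race IDs must be sortable
    (all ints or all strings).
    """
    unique_ids = sorted(set(race_ids))  # sort first so the shuffle is repeatable
    rng = random.Random(seed)           # private generator, fixed seed
    rng.shuffle(unique_ids)
    n_control = round(len(unique_ids) * control_fraction)
    control = set(unique_ids[:n_control])
    training = set(unique_ids[n_control:])
    return training, control


def roi(bets):
    """Return ROI for a list of (stake, payout) pairs.

    ROI = (total returned - total staked) / total staked.
    Returns 0.0 when nothing was staked.
    """
    staked = sum(stake for stake, _ in bets)
    returned = sum(payout for _, payout in bets)
    if staked == 0:
        return 0.0
    return (returned - staked) / staked


if __name__ == "__main__":
    # 300 made-up race IDs standing in for a small database.
    training, control = split_races(range(300))
    print(len(training), "training races,", len(control), "control races")
```

You would hunt for patterns using only the races in `training`, then bet those same patterns on paper against the races in `control` and compare the two ROI figures. If the training ROI is dazzling and the control ROI is not, the skepticism was warranted.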