Random examples of what I mean can be seen here:
https://www.kaggle.com/hrosebaby/fea...se-racing/code
https://www.kaggle.com/lukebyrne/hor...r-courses/home
https://www.google.com/url?sa=t&rct=...uHxxFr3YEs1H74
https://pdfs.semanticscholar.org/pre...8691093844.pdf
These "kernels", similarly to any other ML solution I have seen published(including Benter's paper about multinomial logit) cannot provide a good solution to the problem. At best they might result to a slightly better outcome to morning line or a good aggregate metric like bris prime power.
The main problem with these approaches lies in the weakness of the provided features: a model improves only as much as the predictive value of the metrics it is fed, and improving those metrics requires a large universe of data, some basic handicapping ability and a lot of manual work, since the training process relies mostly on trial and error rather than on a fixed set of steps.
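To make that trial-and-error loop concrete, here is a minimal sketch of how one might check whether a candidate feature actually adds predictive value, by comparing held-out log loss with and without it. The file and column names (past_performances.csv, speed_fig, class_rating, new_feature, won) are hypothetical placeholders, not anything taken from the kernels above.

```python
# Sketch only: test whether a candidate handicapping feature adds predictive value.
# All file and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

df = pd.read_csv("past_performances.csv")     # hypothetical collected data
base_cols = ["speed_fig", "class_rating"]     # existing features
candidate = "new_feature"                     # feature under evaluation

train, test = train_test_split(df, test_size=0.3, random_state=42)

def holdout_logloss(cols):
    """Fit on the training split, score log loss on the held-out split."""
    model = LogisticRegression(max_iter=1000).fit(train[cols], train["won"])
    return log_loss(test["won"], model.predict_proba(test[cols])[:, 1])

baseline = holdout_logloss(base_cols)
with_new = holdout_logloss(base_cols + [candidate])
print(f"log loss without feature: {baseline:.4f}, with feature: {with_new:.4f}")
```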
An open source tool like the one we are discussing in this thread has the potential to improve the data collection process and the other layers that clean and normalize the data and split it into train, test and validation sets (a rough sketch of those layers follows below). This pipeline can grow dramatically as more parameters are added to the model, and having multiple developers will certainly make the task more feasible. Having such a tool silently running as a backend service would allow a large community of data miners to constantly run their models, interact with each other and create the ultimate handicapping platform.
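For illustration only, here is a minimal sketch of those pipeline layers, assuming the collected data lands in a flat CSV and scikit-learn is used for splitting and scaling; the file name and columns are made up for the example, not part of any existing tool.

```python
# Rough sketch of the pipeline layers described above: load collected raw data,
# clean it, split it into train / validation / test sets, then normalize using
# statistics learned from the training split only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

raw = pd.read_csv("raw_race_results.csv")     # hypothetical collected data

# Cleaning layer: drop rows with missing values and duplicate records.
clean = raw.dropna().drop_duplicates()

# Split layer: 70% train, 15% validation, 15% test.
train, rest = train_test_split(clean, test_size=0.30, random_state=42)
valid, test = train_test_split(rest, test_size=0.50, random_state=42)

# Normalization layer: scale numeric features, fitting on the training set only
# so no information from the validation or test sets leaks into the scaler.
feature_cols = train.select_dtypes("number").columns
scaler = StandardScaler().fit(train[feature_cols])
train, valid, test = (df.copy() for df in (train, valid, test))
for df in (train, valid, test):
    df[feature_cols] = scaler.transform(df[feature_cols])

print(len(train), len(valid), len(test))
```

Each of these layers is exactly the kind of thing that grows as more parameters are added, which is why sharing the plumbing across multiple developers makes the whole effort more feasible.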