Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board - View Single Post

DeltaLover · 08-22-2017, 10:58 AM

Quote:

Originally Posted by classhandicapper

I tried something approximating difference from the top and got pretty good results, but I think I see a flaw in the approach.

The regression is looking at multiple field values for each horse as input and then the finish position of the horses. So it is more or less trying to maximize the ability to rank all the horses in a race correctly. I'm really only interested in the ability to pick winners. It's not doing that nearly as well as I can do it with weights for each factor that I came up with via trial and error.

So I need to somehow stress that the winners are the key.

Perhaps instead of ranking all the horses in each race I could look at just the top 3 finishers????

Logistic regression can be seen as a special case of a neural network (a single sigmoid unit with no hidden layer). Many problems cannot be solved with the former, since it can only draw a linear decision boundary, and require the latter. The prediction of finish positions that you are describing here is one such problem.
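
To make the "special case" point concrete, here is a minimal sketch in plain Python. The function names and the example weights are made up for illustration; they do not come from any particular library or from anyone's actual model. It only shows that a network with no hidden layer and one sigmoid output computes exactly what logistic regression computes.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, w, b):
    """Classic logistic regression: sigmoid of a weighted sum plus a bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def dense_sigmoid_layer(x, weights, biases):
    """One fully connected layer of sigmoid units.

    weights is a list of weight vectors, one per unit (hypothetical layout).
    """
    return [
        sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in zip(weights, biases)
    ]

# Illustrative inputs only (e.g. a speed figure and a pace figure, rescaled).
x = [0.4, -1.2]
w = [0.8, 0.3]
b = -0.1

lr_output = logistic_regression(x, w, b)
nn_output = dense_sigmoid_layer(x, [w], [b])[0]  # a "network" with a single unit
```

Stacking more than one such layer is what lets a network represent interactions between factors that a single weighted sum cannot.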

What I see as the major challenge, though, is not the algorithm to be used but the way the data is presented to it. Some of the data preprocessing questions that need to be addressed are the following:

What metrics to use? (ex: speed or pace figures, closing figures etc)
How to generate the necessary metrics? (ex: track variant estimation, cross distance - track adjustment etc)
How many past performances to use? (ex: Do we need individual models based on the number of available past performances? How to handle shippers? etc)
Should metrics be normalized using a per race window or passed in as absolute values?
How to pass race level data? (ex: wire to wire winning stats, average speed figures for all starters etc)
What kind of primitive handicapping factors (predicates) to use and how to pass them? (ex: layoffs, dirt to turf, first lasix etc)
How to handle connections? (ex: jockey / trainer changes etc)
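
Two of the questions above (per race normalization and passing predicates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys, the 45 day layoff cutoff and the choice of a z-score are all hypothetical, not a recommendation.

```python
from statistics import mean, pstdev

def normalize_within_race(values):
    """Z-score a metric across the starters of one race.

    Each horse is expressed relative to the field it actually faces,
    instead of as an absolute figure.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every starter has the same figure, so nobody has an edge.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

LAYOFF_DAYS = 45  # assumed cutoff, chosen only for this example

def encode_predicates(horse):
    """Turn yes/no handicapping factors into 0/1 inputs.

    The keys used here are hypothetical field names.
    """
    return [
        1.0 if horse["days_since_last_race"] > LAYOFF_DAYS else 0.0,
        1.0 if (horse["last_surface"], horse["today_surface"]) == ("dirt", "turf") else 0.0,
        1.0 if horse["first_lasix"] else 0.0,
    ]

# Made-up speed figures for a three horse field.
speed_figures = [80, 90, 100]
relative_figures = normalize_within_race(speed_figures)

horse = {
    "days_since_last_race": 120,
    "last_surface": "dirt",
    "today_surface": "turf",
    "first_lasix": False,
}
flags = encode_predicates(horse)
```

Whether the relative or the absolute version of a metric works better is exactly the kind of question that has to be settled by testing.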

Even after answering all these questions, we still need to decide how to formulate the “target” of the model. By this I mean that simply targeting the raw finish order might very well not be useful, since at best it will match the crowd’s ranking.
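
As one possible alternative to raw finish order (an assumption on my side, not a tested method), the target can be limited to the winner: convert the model's scores for the starters of a race into win probabilities with a softmax and penalize only the probability given to the horse that actually won. A minimal sketch, with hypothetical function names and made-up scores:

```python
import math

def win_probabilities(scores):
    """Softmax over the scores of all starters in a single race."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def winner_log_loss(scores, winner_index):
    """Negative log of the probability assigned to the actual winner.

    Only the winner matters here; the order of the other
    finishers does not affect the loss.
    """
    return -math.log(win_probabilities(scores)[winner_index])

# Made-up model scores for a three horse race.
scores = [2.0, 1.0, 0.0]
probs = win_probabilities(scores)

loss_if_top_pick_wins = winner_log_loss(scores, 0)
loss_if_longshot_wins = winner_log_loss(scores, 2)
```

Summing this loss over many races and minimizing it pushes the weights toward picking winners rather than toward ordering the whole field.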