Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

JerryBoyle · 01-16-2019, 09:35 PM

Quote:

Originally Posted by PIC6SIX

I am asking only as a point of interest, since I am 73 years old and a dyed-in-the-wool pen-and-pencil capper. I would like to know from where, what and how you computer guys gather and manipulate your info. Do you buy downloads from DRF and then sort the data according to your program (your own hcp parameters)? Maybe someone would like to go through their step-by-step handicapping process and say how long it takes to hcp an 8-race card at one track. No handicapping secrets asked.

I've purchased my data from DRF; they have machine-readable charts that go back to sometime in the '90s. You can buy by day, month, or year, but you get all races at all North American tracks, so if you buy far enough back, you can effectively build a database.

Not sure if you've read Bill Benter's paper about computer wagering, but it lays out a blueprint that I suspect any model-based wagerer (myself included) uses. Whether someone can start from scratch and become successful with this approach is one question, but I'm confident the successful teams are still using SOME variation of it. Broadly speaking, it has the following parts, which generally happen in order and assume your historical database is already up and running:

1. Feature (or factor) development: these are the inputs to any given model, such as last speed figure, number of days off, the trainer's record with this horse, etc. They can get arbitrarily complex. IMO, these are what differentiate the models.

2. Model development: feed the features from step 1 into a model, generally some form of regression, to determine how "important" each feature is. Think of it as fitting a line of best fit through a set of (x, y) points. There are obviously many x's (features), and it's complicated by the fact that there are many runners in each race, but at the end of the day you're weighting the factors so that they best explain the race outcomes, and the model's output is a probability for each runner.

3. Model backtesting: take the inputs from step 1 and the model from step 2 and make predictions on historical races. This should be done on races that weren't used in step 2, since you want to see how the model would have done on "unseen" races. For each race, your model will output a probability for each runner. Comparing those predictions with the tote odds in your historical data tells you whether each runner is an overlay or an underlay. As part of the backtest, you'll also need to determine which betting strategy best suits your goals (e.g. maximizing profit, minimizing risk). Some examples are flat betting, (fractional) Kelly betting, etc.

4. Source the data you will use on the day of the race. This is a big one. I have yet to find a historical data provider that will also give me the same data before the race goes off. Anything that goes into your features, including the data used to make bets, needs to be available before the race runs: things as simple as the names of the horses, the distance of the race, the trainer, the current odds, pool sizes, etc. THIS IS NOT TRIVIAL. A further complication is that it has to match what's in your historical database. Your db has William Macy as the trainer, but your only race-day source lists him as Bill Macy? You need to correct that, and you need to do it quickly, since we want to bet on many races throughout the day.

5. Once you have the data for the day of the race, it's time to send it through your model and get your predictions for each runner. Take those predictions and calculate how much you'd like to bet given your bankroll.

6. Actually place the bets. We want this to be an automated system (and for the more exotic combos it isn't feasible to key out every possible combination by hand), so we need a program that automatically takes the output from the previous step and places the bets through your ADW (advance deposit wagering account). Another wrinkle is that not all ADWs offer all tracks, so there has to be some logic that splits your bets up by track.
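To make step 1 a bit more concrete, here's a minimal Python sketch of computing a few features for one runner. The `PastRace` record and the feature names are hypothetical, not DRF's or anyone else's format. The point it illustrates is that a feature may only use races run before the race being handicapped; otherwise the backtest looks better than live betting ever can.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PastRace:
    # Hypothetical past-performance record; real chart data has many more fields.
    race_date: date
    speed_figure: float
    trainer: str
    won: bool


def runner_features(history: List[PastRace], race_date: date,
                    trainer: str) -> Dict[str, Optional[float]]:
    """Features for one runner, using only races run strictly before race_date."""
    prior = sorted((r for r in history if r.race_date < race_date),
                   key=lambda r: r.race_date)
    with_trainer = [r for r in prior if r.trainer == trainer]
    if not prior:
        # First-time starter: no speed figure or layoff to report.
        return {"last_speed": None, "days_off": None,
                "starts_with_trainer": 0, "wins_with_trainer": 0}
    last = prior[-1]
    return {
        "last_speed": last.speed_figure,
        "days_off": (race_date - last.race_date).days,
        "starts_with_trainer": len(with_trainer),
        "wins_with_trainer": sum(r.won for r in with_trainer),
    }
```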
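For step 2, one common way to deal with "many runners in one race" is a conditional (multinomial) logit: each runner gets a linear score from its features, and the scores are turned into probabilities with a softmax across the runners in that race, so each race's probabilities sum to 1. The sketch below fits the weights by plain gradient ascent on the log-likelihood of the actual winners. It's a toy, assuming numeric features with no missing values, and it has no regularization; it's an illustration of the idea, not anyone's production model.

```python
import math
from typing import List, Sequence

Race = List[Sequence[float]]  # one feature vector per runner


def race_probabilities(weights: Sequence[float], race: Race) -> List[float]:
    """Conditional logit: softmax of each runner's linear score within one race."""
    scores = [sum(w * x for w, x in zip(weights, runner)) for runner in race]
    top = max(scores)  # subtract the max score to avoid overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit(races: List[Race], winners: List[int], n_features: int,
        lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Gradient ascent on the log-likelihood of the observed winners.

    winners[i] is the index of the winning runner in races[i].
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for race, winner in zip(races, winners):
            probs = race_probabilities(weights, race)
            for j in range(n_features):
                # d(log p_winner)/dw_j = winner's feature minus the probability-weighted mean
                expected = sum(p * runner[j] for p, runner in zip(probs, race))
                grad[j] += race[winner][j] - expected
        weights = [w + lr * g / len(races) for w, g in zip(weights, grad)]
    return weights
```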
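Step 3 in sketch form: split the history by date so the test races come strictly after the races the model was fit on, then flat-bet $1 to win on every overlay in the test period. `Runner` is a hypothetical record; `model_prob` is assumed to come from a model fit only on the training period, and `odds_to_one` is the tote price at whatever MTP you plan to bet at. Payouts are approximated from the odds, ignoring breakage.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Runner:
    race_date: date
    model_prob: float   # from a model fit only on the training period (assumed)
    odds_to_one: float  # tote odds at the MTP you plan to bet at
    won: bool


def split_by_date(runners: List[Runner], cutoff: date) -> Tuple[List[Runner], List[Runner]]:
    """Fit on races before the cutoff; evaluate only on races on or after it."""
    train = [r for r in runners if r.race_date < cutoff]
    test = [r for r in runners if r.race_date >= cutoff]
    return train, test


def flat_bet_overlays(runners: List[Runner], min_edge: float = 0.0) -> Tuple[int, float]:
    """Bet $1 to win on each overlay; return (number of bets, net profit)."""
    bets, profit = 0, 0.0
    for r in runners:
        edge = r.model_prob * (r.odds_to_one + 1) - 1  # expected profit per $1
        if edge > min_edge:
            bets += 1
            # A $1 win bet at b-to-1 nets +b if it wins, -1 if it loses (breakage ignored).
            profit += r.odds_to_one if r.won else -1.0
    return bets, profit
```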
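For the William Macy / Bill Macy problem in step 4, one minimal approach is to normalize names the same way on both sides, map common nicknames through an alias table, and fall back to a strict fuzzy match. The alias table and the 0.9 cutoff are assumptions you'd have to build up and tune; `difflib.get_close_matches` is from Python's standard library. Anything that still doesn't match should be flagged rather than guessed at.

```python
import difflib
import re
import unicodedata
from typing import List, Optional

# Hypothetical alias table: nickname -> first name as stored in the historical db.
FIRST_NAME_ALIASES = {"bill": "william", "billy": "william", "bob": "robert", "mike": "michael"}


def normalize_name(name: str) -> str:
    """Strip accents, lowercase, drop punctuation, collapse spaces, expand nicknames."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z ]+", " ", ascii_only.lower())
    parts = cleaned.split()
    if parts:
        parts[0] = FIRST_NAME_ALIASES.get(parts[0], parts[0])
    return " ".join(parts)


def match_trainer(raw_name: str, known_trainers: List[str],
                  cutoff: float = 0.9) -> Optional[str]:
    """Map a race-day trainer name to the db spelling, or None if unsure."""
    index = {normalize_name(t): t for t in known_trainers}
    key = normalize_name(raw_name)
    if key in index:
        return index[key]
    # Fallback for typos: accept only a very close string match.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None
```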
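The track-splitting logic at the end of step 6 can start as simply as a coverage table. The ADW names, the coverage sets, and the priority order below are made up for illustration; real coverage would come from each ADW, and actually submitting the bets is left out because every ADW's interface is different.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Made-up coverage table: which tracks each (hypothetical) ADW offers.
ADW_TRACKS: Dict[str, Set[str]] = {
    "adw_a": {"AQU", "GP", "SA"},
    "adw_b": {"GP", "WO", "TAM"},
}
# Preferred ADW first when more than one carries a track (assumption).
ADW_PRIORITY = ["adw_a", "adw_b"]

Bet = Dict[str, object]


def route_bets(bets: List[Bet]) -> Tuple[Dict[str, List[Bet]], List[Bet]]:
    """Group bets by the first ADW in priority order that carries the bet's track.

    Bets on tracks no ADW carries are returned separately so they can be logged
    instead of silently dropped.
    """
    routed: Dict[str, List[Bet]] = defaultdict(list)
    unroutable: List[Bet] = []
    for bet in bets:
        adw = next((a for a in ADW_PRIORITY if bet["track"] in ADW_TRACKS[a]), None)
        if adw is None:
            unroutable.append(bet)
        else:
            routed[adw].append(bet)
    return dict(routed), unroutable
```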

Steps 1-3 are done in the days/months/years before a race. Steps 4-6 happen on the day of the race, but the software needed to perform them is written over months. I'd imagine some people automate more of 4-6 than others. On race day, the question becomes how much human intervention is required. I can only say that I've designed mine to be entirely automated. So, to answer your question, I do no work on the day of the race. I have a program running that:

1. Monitors various ADWs for all available races and checks the MTP (minutes to post). I determine ahead of time at what MTP I want my software to begin its predictions and betting. This is tricky: you want to give it enough time to calculate all the features, make the predictions, size the bets, and place them, but you also want to wait as long as possible because the odds get sharper. You also want it to run at roughly the same MTP every time, because your backtested results depend on the price you assumed you'd get. That is, if I tested making my wagers on the odds available at 1 MTP, then I need to make sure my software runs once the clock hits 1 MTP and uses the 1 MTP price. Otherwise, how can I be sure that what I tested is what will actually happen?

2. When a race's MTP reaches what my software has set as the "GO" time, the software sources all of the racing data it needs from various ADWs: the horse names, trainer, distance, surface (dirt/turf), any current scratches, etc.

3. The data sourced in 2 is sent to the system that generates all of the features. These features are then sent through the model to generate a probability for each runner.

4. Given the probabilities from 3 and the odds pulled in 2, the system calculates how much to bet based on my current available bankroll and my betting parameters (currently half Kelly, i.e. 0.5 Kelly).

5. The bets from 4 are placed with the ADW, which sends back a confirmation.
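The "GO" trigger in item 1 of the list above boils down to handing each race off exactly once, when its MTP first reaches the threshold you backtested at. A minimal sketch, where `GO_MTP = 1` mirrors the 1 MTP example and the race ids are hypothetical; fetching MTPs from the ADWs is left out because that differs per ADW.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

GO_MTP = 1  # hypothetical: the MTP the backtest assumed bets were placed at


@dataclass
class RaceMonitor:
    go_mtp: int = GO_MTP
    fired: Set[str] = field(default_factory=set)  # races already handed off

    def races_to_run(self, mtp_by_race: Dict[str, int]) -> List[str]:
        """Given the latest MTP for each race, return races that just reached GO.

        Each race is returned at most once, so a clock stuck at "0 MTP"
        doesn't trigger a second round of bets on the same race.
        """
        ready = [race_id for race_id, mtp in mtp_by_race.items()
                 if mtp <= self.go_mtp and race_id not in self.fired]
        self.fired.update(ready)
        return ready
```

In a live loop you'd poll each ADW every few seconds, pass the latest MTPs to `races_to_run`, and send each returned race through the feature, model, and betting steps.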
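For straight win bets, the sizing in item 4 above can be sketched with the textbook Kelly fraction for a bet paying b-to-1 with win probability p, f* = (p(b + 1) - 1) / b, scaled by 0.5 for half Kelly. This sizes each horse independently and ignores that your own bet moves the pari-mutuel price, so treat it as a starting point; the $2 minimum is an assumption.

```python
from typing import Dict


def kelly_fraction(p: float, odds_to_one: float) -> float:
    """Full-Kelly fraction of bankroll for a win bet paying odds_to_one-to-1."""
    if odds_to_one <= 0:
        return 0.0
    edge = p * (odds_to_one + 1) - 1  # expected profit per $1 wagered
    return max(edge, 0.0) / odds_to_one  # underlay (no edge) -> no bet


def size_win_bets(probs: Dict[str, float], odds: Dict[str, float], bankroll: float,
                  kelly_multiplier: float = 0.5, min_bet: float = 2.0) -> Dict[str, float]:
    """Stake per horse at fractional Kelly; stakes below min_bet are skipped."""
    stakes = {}
    for horse, p in probs.items():
        stake = bankroll * kelly_multiplier * kelly_fraction(p, odds[horse])
        if stake >= min_bet:
            stakes[horse] = round(stake, 2)
    return stakes
```

For example, a horse the model makes 25% at 4-1 has a full-Kelly fraction of 0.0625, so half Kelly on a $1,000 bankroll is a $31.25 win bet.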

As I said, some people might do some of this manually, but I prefer to have it automated end to end (and I think that's more robust). There are some gotchas. The two biggest: first, as we all know, the MTP clocks are not accurate. Often my program will bet at "0 MTP", but the race won't go off for another 5 minutes. Second, late scratches can and do happen after my program wagers. In that case the ADW cancels the wagers on the scratched horses, but my other bets stand. I haven't interfaced with any of the teams, so I can't say for sure that this is how they do it.