Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 01-16-2019, 08:11 PM   #1
PIC6SIX
Registered User
 
Join Date: Jun 2005
Posts: 218
A/I & Computer Handicapping

I am asking only as a point of interest, since I am 73 years old and a dyed-in-the-wool pen and pencil capper. I would like to know from where, what, and how you computer guys gather and manipulate your info. Do you buy downloads from DRF and then sort that data according to your program (your own handicapping parameters)? Maybe someone would like to go through their step-by-step handicapping process and how long it takes to handicap an 8 race card at one track. No handicapping secrets asked.
Old 01-16-2019, 09:35 PM   #2
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote:
Originally Posted by PIC6SIX View Post
I am asking only as a point of interest, since I am 73 years old and a dyed-in-the-wool pen and pencil capper. ...
I've purchased my data from DRF - they have machine-readable charts that go back to some time in the 90s. You can purchase by day/month/year, and the files cover all races for all North American tracks, so if you purchase back far enough, you can effectively build a database.

Not sure if you've read Bill Benter's paper about computer wagering, but it effectively lays out a blueprint that I suspect any model-based wagerer (myself included) uses. Whether someone can start from scratch and become successful using this approach is one question, but I'm confident that the successful teams are still using SOME variation of it. Broadly speaking, it encompasses these parts, which generally follow sequentially and assume that your database of historical data is up and running:

1. Feature (or factor) development: these are the inputs to any given model. Such things as last speed, number of days off, trainer's record with this horse, etc, etc. They can get arbitrarily complex. These are what differentiate the models imo.

2. Model development: taking the features from step 1 and using them as the inputs to a model, which is generally some form of regression, to determine how "important" each feature is. Think of it as fitting a line of best fit through some x, y points. There are obviously many x's (features), and it's complicated by the fact that there are many runners in one race, but at the end of the day you're weighting the factors so that they best explain the output variable, which in this case is a probability for each runner. (A rough sketch follows this list.)

3. Model backtesting: take the inputs from 1 and the model from 2 and actually make predictions on historical races. This should be done on data that wasn't used in step 2, as you want to see how the model would have done on "unseen" races. For each race, your model will output a probability for each runner. With your predictions and the tote odds from your historical data, you can determine whether you have an overlay/underlay. As part of the backtest you'll need to determine which betting strategy works best for your intended goals (e.g. maximize profit, minimize risk). Some examples are flat betting, (fractional) Kelly betting, etc.

4. Source the data that you will use for the day of the race**. This is a big one. I have yet to find a historical data provider that will also provide me the same data prior to the start of the race. Anything that goes into your features, including data used to make bets, needs to be available prior to the race running. Things as simple as the names of the horses, the distance of the race, the trainer, the current available odds, pool sizes, etc. THIS IS NOT TRIVIAL. A further complication is that it needs to match the data in your historical data. Your db has William Macy as the trainer, but your only source on the day of the race has the trainer listed as Bill Macy? You need to correct that, and you need to be able to do it in a quick way - we want to bet on many races throughout the day.

5. Once you have the data for the day of the race, it's time to send it through your model and get your predictions for each runner. Take those predictions and calculate how much you'd like to bet given your bankroll.

6. Actually place the bets. We want this to be an automated system (and for more exotic combos it's not feasible to key out all possible combinations), so we need to write a program that can automatically take the output from the previous step and place the bets through your ADW. Another wrinkle here is that not all ADWs provide all tracks. So there has to be some logic that splits your bets up depending on track.
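
As a rough illustration of step 2 above - a sketch only, not anyone's actual model - here's the score-then-softmax shape of a Benter-style conditional logit in Python. The feature names and weights are invented:
Code:
import numpy as np

# Toy conditional logit: win probabilities for ONE race.
# Features and fitted weights are invented for illustration.
def win_probabilities(features, weights):
    """features: (n_runners, n_features) array for a single race."""
    scores = features @ weights           # linear score per runner
    exp = np.exp(scores - scores.max())   # subtract max for stability
    return exp / exp.sum()                # race probabilities sum to 1

# 3 runners, 2 features (say, last speed fig and days since last race)
race = np.array([[95.0, 30.0],
                 [88.0, 14.0],
                 [91.0, 60.0]])
weights = np.array([0.10, -0.01])         # would come from the fit in step 2
print(win_probabilities(race, weights))   # -> approx [0.48 0.28 0.24]
In practice the weights come out of fitting on thousands of historical races rather than being hand-picked, but the scoring shape is the same.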

Steps 1-3 are done in the days/months/years prior to a race. 4-6 happen on the day of the race, but the software needed to perform them is written over months. I'd imagine that some people automate more than others for 4-6. On the day of the race, the question becomes how much human intervention is required? I can only say that I've designed mine to be entirely automated. So, to answer your question, I do no work on the day of the race. I have a program running that:

1. Monitors various ADWs for all available races and checks the MTP. I determine ahead of time at what MTP I want my software to begin its predictions and betting. This is tricky because you want to give it enough time to calculate all the features, make the predictions, size the bets, and place the bets, but you also want to wait as long as possible because the odds get sharper. You also want it to be generally around the same time, because your backtested results are dependent on the price you assumed you'd get. That is, if I tested making my wagers based on the odds that were available at 1 MTP, then I need to make sure my software runs after 1 MTP and uses the 1 MTP price. Otherwise, how can I truly be sure what I tested is what might actually happen?

2. When the MTP time for a race strikes what my software determines to be the "GO" time, the software sources all of the racing data it needs (from various ADWs). It pulls in the horse names, the trainer, distance, whether it's dirt/turf, any current scratches, etc.

3. The data sourced in 2 is sent to the system which generates all of the features. These features are then sent through the model to generate a probability for each runner.

4. Given the probabilities from 3 and the odds data pulled in 2, the system calculates how much to bet given my current available bankroll and my betting parameters (currently 0.5 Kelly; a sizing sketch follows this list).

5. The bets from 4 are placed with the ADW which sends back a confirmation.
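
A minimal sketch of the overlay check and 0.5 Kelly sizing from step 4 above - all numbers invented:
Code:
# Overlay check plus fractional Kelly sizing. All numbers invented.
def kelly_fraction(p, decimal_odds, fraction=0.5):
    """Fraction of bankroll to bet; 0 if the horse is not an overlay."""
    b = decimal_odds - 1.0            # net odds per $1 staked
    edge = p * b - (1.0 - p)          # expected profit per $1 bet
    if edge <= 0:
        return 0.0                    # underlay or fair price: no bet
    return fraction * (edge / b)      # 0.5 Kelly when fraction=0.5

bankroll = 10_000.0
p_model = 0.25                        # model's win probability
odds = 5.2                            # current tote price, decimal
stake = bankroll * kelly_fraction(p_model, odds)
print(f"bet ${stake:.2f}")            # -> bet $357.14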

As I said, some might do some of this manually, but I prefer (and think it's more robust) to have it be automated from end to end. There are some gotchas here. The biggest is that, as we all know, the MTP clocks are not accurate. Oftentimes my program will bet at "0 MTP", but the race won't go off for another 5 minutes. The second issue is that late scratches can and do happen after my program wagers. In this case, the wagers for late scratches are cancelled by the ADW, but my other bets stand. I've not interfaced with any teams, so I can't say for sure that this is how they do it.

Old 01-16-2019, 10:43 PM   #3
Dave Schwartz
 
Dave Schwartz's Avatar
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,873
Jerry,

That was just an awesome post.

One thing...
Quote:
Originally Posted by JerryBoyle View Post
4. Source the data that you will use for the day of the race**. This is a big one. ...
I know you know this, but this step has to be a function of the software itself instead of the data from the source.

The issue with trainer and jockey names is really a big deal. This is especially true for cross-track jockeys. That is, a jockey may have one name at (say) LAD and another at AP or OP.

EqB actually stores it as a unique number. However, because of the different-names-at-different-tracks issue, those numbers can actually change over time!

Thus, when EqB sends data to a vendor (such as HDW), the data may say that a horse is being ridden by jockey# 8,376.

Sometime in the future, it may be discovered (at EqB) that this jockey was riding under two separate names. In their database, the data will be corrected. However, it will not be changed in YOUR DATABASE. Hence, the issue.

Even the names of horses can change, although it is very rare these days. Think back to the days of seeing a horse in the print Form with the notation, "Formerly raced as [old name]" underneath it.

Managing a long-term database is not for the faint of heart, that's for sure.

BTW, when discussing data vendors, they are not all created equal. I've been with HDW since May of 2001 and have come to respect the work of Ron Tiller. The lengths he goes to in order to ensure data integrity are amazing.
Old 01-16-2019, 10:44 PM   #4
PIC6SIX
Registered User
 
Join Date: Jun 2005
Posts: 218
Jerry, thanks SO MUCH for that informative and comprehensive answer to my question/thread. My hat is off to you for your expertise and diligence in developing your program. I am at the point in my life where I cannot even handle a problem with my cable provider. With that said, I am glad they still make pencils so I can handicap the races the old-fashioned way. Sincerely, I admire your expertise. Wishing you a profitable 2019.
Old 01-16-2019, 10:44 PM   #5
Jeff P
Registered User
 
Jeff P's Avatar
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,258
That's a pretty good post, Jerry.

One area that has historically been problematic for me is late changes: scratches, distance changes, surface changes, and rider changes.

I ended up building out a fairly robust system to handle those.

When a late scratch or other late change occurs I can import the change - or alternately, key it manually - and within a few seconds recalculate the individual race.


-jp

Old 01-17-2019, 09:27 PM   #6
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote:
Originally Posted by PIC6SIX View Post
Jerry, thanks SO MUCH for that informative and comprehensive answer to my question/thread. ...
No problem, love talking about this stuff. I'm fortunate in that my career has been as a software developer, so much of that came easy. Also can't say enough about how well Benter's paper summarizes what is needed.

Dave & Jeff, you are absolutely right. I've probably spent more time on that kind of code - stitching everything together, handling edge cases, etc - than I have actually writing and testing factors. The sole purpose of one of my programs that runs every morning is just to match trainer and jockey names between the ADW I use on the day of and the names in my historical db. It's ridiculous.
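
For a rough picture of what that morning matching job can look like, here's a minimal sketch - the alias table and names are invented, and anything below the fuzzy cutoff would get flagged for human review:
Code:
import difflib

ALIASES = {"Bill Macy": "William Macy"}   # hand-curated corrections

def match_trainer(adw_name, db_names, cutoff=0.85):
    if adw_name in ALIASES:               # exact alias hit first
        return ALIASES[adw_name]
    if adw_name in db_names:              # already matches the db
        return adw_name
    close = difflib.get_close_matches(adw_name, db_names, n=1, cutoff=cutoff)
    return close[0] if close else None    # None -> flag for manual review

db = ["William Macy", "Jane Smith"]
print(match_trainer("Bill Macy", db))     # -> William Macy (alias hit)
print(match_trainer("Wiliam Macy", db))   # -> William Macy (fuzzy match)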

One of my goals is to get where you're at, Jeff - pick up scratches and adjust my bets accordingly.
Old 01-17-2019, 11:00 PM   #7
pandy
Registered User
 
pandy's Avatar
 
Join Date: Aug 2001
Location: Lehigh Valley, PA.
Posts: 7,464
Quote:
Originally Posted by PIC6SIX View Post
I am asking only as a point of interest, since I am 73 years old and a dyed-in-the-wool pen and pencil capper. ...

I have several computer handicapping programs I've developed: one uses TrackMaster exe files or Brisnet multi files, and two others use Bris single data files. With the latest one I can handicap a card or find spot plays in less than ten minutes.
Old 01-17-2019, 11:21 PM   #8
Dave Schwartz
 
Dave Schwartz's Avatar
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,873
Quote:
Originally Posted by JerryBoyle View Post
The sole purpose of one of my programs that runs every morning is just to match trainer and jockey names between the ADW I use on the day of and the names in my historical db. It's ridiculous.
Jerry,

In the late 90s, we were vending data files. Part of the workload was reconciling the very thing you are talking about. I built alias files for trainers and jockeys. In the first two years, doing 8 tracks at a time, we had over 50,000 aliases!
Old 01-17-2019, 11:54 PM   #9
Suff
Beat up 💪
 
Suff's Avatar
 
Join Date: Jun 2002
Location: Beach life in Fort Lauderdale
Posts: 11,938
Quote:
Originally Posted by JerryBoyle View Post
No problem, love talking about this stuff. ...
I enjoyed reading that, thank you Jerry.
Old 01-18-2019, 11:31 AM   #10
Jeff P
Registered User
 
Jeff P's Avatar
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,258
Quote:
Originally Posted by JerryBoyle View Post
I've probably spent more time on that kind of code - stitching everything together, handling edge cases, etc - than I have actually writing and testing factors. ...
Jerry, similar to what you describe, it seems I spend a lot more time building out systems that make model development possible than I do on model development itself.

In the other thread Dave wrote about the big teams sticking with global models.

Over the past five years I've probably invested 11,000-plus man-hours building out a platform that, in addition to handling global models, handles distinct models for pretty much any significant situation that can be identified based on the features in my data.

Like everybody else, I can have a global model based on all races all tracks everywhere.

But I'm rapidly approaching the point where I can have as many distinct models as I want.

An example might be a distinct model designed for 8.5F races on the Tapeta on cold Wednesday nights in the fall at WOX.

Any time the parameters of a distinct model are found to be present, based on the features in my past performance and/or supplementary data, the distinct model, if persisted as Active, overrides the global. (A toy sketch follows below.)

The idea being that I'll eventually have the ability to scale to as many track-surface-distance-like situations (I call them SubGroups) as I see fit.

That is, provided I live long enough.
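
For illustration, a toy sketch of that override rule - the keys and model records are invented, not JCapper's internals:
Code:
# Toy version of "distinct model overrides the global": look up the
# SubGroup key for today's race; fall back to the global model.
def pick_model(models, key):
    sub = models.get(key)
    if sub is not None and sub["active"]:  # must be persisted as Active
        return sub
    return models["global"]

models = {
    ("WOX", "tapeta", 1870): {"active": True, "name": "WOX-Tapeta-8.5F"},
    "global": {"active": True, "name": "all races, all tracks"},
}
print(pick_model(models, ("WOX", "tapeta", 1870))["name"])  # SubGroup wins
print(pick_model(models, ("AQU", "dirt", 1320))["name"])    # falls back to global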


-jp

Old 01-18-2019, 03:26 PM   #11
Dave Schwartz
 
Dave Schwartz's Avatar
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,873
Jeff,

I love the idea of local models. However, there are some real issues I've not been able to get past.

1. There are simply too many models to develop.
It is one thing to say, "Here's an AQU model." But, logically, the slicing and dicing continues after that.

Obviously, we're not going to handicap a 4f dash for 2yr olds the same way we'd handicap a Graded Stakes on the turf at 9f.

This slicing/dicing is what really expands the system/model list.


2. There are too many models to support.
We had a user a few years ago who was managing 1,300 "systems." Each one was built for a specific track-surface-distance.

The HSH software automatically handles which system fires in a given race. That part was easy.

The problem comes in when you have to decide when a model needs to be rebuilt. Is it because of a random down-turn that is just a normal aberration or is the model itself fundamentally flawed?

When you start tossing around system counts like 1,300, it becomes a full-time job just to manage those models; just to decide when something is really wrong.


What most of us have done...
Most of our HSH users have gone to a completely dynamic, race-by-race modelling system. That is, when you open a race, HSH queries the database for races "like this one" (based upon whatever filtering parameters you've created).

Then the software builds a system from that data, based upon the factors you've selected. (There can be static factors involved as well.)

In this way, we really only have a single system to maintain!
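
A toy sketch of that race-by-race idea - the record layout and filters here are invented, not HSH's actual internals:
Code:
# "Races like this one": filter history on today's parameters, then fit
# a model on just that slice -- one modelling *procedure* to maintain.
def races_like(history, today, dist_tolerance=110):
    return [r for r in history
            if r["surface"] == today["surface"]
            and abs(r["dist"] - today["dist"]) <= dist_tolerance]

history = [
    {"surface": "dirt", "dist": 1320},
    {"surface": "turf", "dist": 1870},
    {"surface": "dirt", "dist": 1430},
]
today = {"surface": "dirt", "dist": 1320}
print(len(races_like(history, today)))   # -> 2 similar races to fit on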

Of course, we can still slice and dice the results to determine where our strengths and weaknesses are, and then address them across the entire approach.


Dave
Old 01-18-2019, 07:51 PM   #12
Jeff P
Registered User
 
Jeff P's Avatar
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,258
Quote:
Originally Posted by Dave Schwartz View Post
I love the idea of local models. However, there are some real issues I've not been able to get past. ...

Dave,

In JCapper there's something called a Prob Expression, which is basically a user-defined SQL expression that gets executed during number crunching.

The query results from the executed SQL expression are scored based on a set of rules called a Behavior, which the user defines for the individual Prob Expression.

The scored query results from the Prob Expression are transformed by the Interface into a factor numeric value, gap, and rank.

Just like any other factor such as an HDW final time speed fig, early pace fig, late pace fig, or distance pedigree rating, etc., the results from Prob Expressions can be used as inputs for models.

I mostly use Prob Expressions for scoring rider, trainer, post position, or how early or late pace have been performing recently at today's track-surface-dist, etc.
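
To make value/gap/rank concrete, a small sketch with invented scores; "gap" is assumed here to mean the margin over the next-ranked horse:
Code:
# Turn per-horse factor scores into value, gap, and rank for one race.
# "Gap" is assumed to be the margin over the next-ranked horse.
def value_gap_rank(scores):
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    out = {}
    for rank, (horse, value) in enumerate(ordered, start=1):
        nxt = ordered[rank][1] if rank < len(ordered) else None
        gap = value - nxt if nxt is not None else 0.0
        out[horse] = {"value": value, "gap": gap, "rank": rank}
    return out

print(value_gap_rank({"A": 18.0, "B": 12.5, "C": 12.0}))
# A: rank 1, gap 5.5 / B: rank 2, gap 0.5 / C: rank 3, gap 0.0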

You mentioned there are a LOT of models.

Of course I agree with that.

When the idea of using Prob Expressions first came to me, the objective was to create an Interface that could, on its own, learn from the data -- and make decisions from there. Very much like AI.

Early on, my Prob Expressions looked something like this:
Code:
SELECT TOP 150 * FROM STARTERHISTORY 
       WHERE TRACK = 'AQU'
       AND DIST = 1320
       AND INTSURFACE = 1
       AND RANKF20 = 1
       AND [DATE] < #01-18-2019#
       ORDER BY [DATE] DESC
Basically, the above SQL expression tells the Interface to go back in time (prior to today's date), pull up the most recent 150 starters matching the defined parameters (AQU 6F Dirt and CPace Rank=1), and score the query results.

Of course, structuring Prob Expressions like this means you need a distinct stored sql expression for every permutation of track-surface-dist and factor you want your model to handle.

That's way too much overhead.

I eventually learned to structure them like this:
Code:
SELECT TOP 150 * FROM STARTERHISTORY
       WHERE TRACK = {TRACK} 
       AND DIST = {DIST} 
       AND INTSURFACE = {INTSURFACE} 
       AND RANKF20 = {RANKCPACE} 
       AND [DATE] < {TODAY} 
       ORDER BY [DATE] DESC

Along the way I spent lots of programming hours writing code to make the Interface recognize and respond to the characters inside the curly brackets --

For example, if the current race is at AQU, the Interface recognizes the curly brackets and replaces {TRACK} with 'AQU'...

Likewise, the Interface recognizes {RANKCPACE} and replaces that with the actual CPace rank of the current horse.
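
A minimal sketch of that placeholder substitution - the mechanics here are an assumption for illustration, not JCapper's actual code:
Code:
# Fill the curly-bracket placeholders with the current race's values.
TEMPLATE = """SELECT TOP 150 * FROM STARTERHISTORY
       WHERE TRACK = {TRACK}
       AND DIST = {DIST}
       AND INTSURFACE = {INTSURFACE}
       AND RANKF20 = {RANKCPACE}
       AND [DATE] < {TODAY}
       ORDER BY [DATE] DESC"""

def fill(template, values):
    sql = template
    for key, val in values.items():
        sql = sql.replace("{" + key + "}", str(val))
    return sql

race = {"TRACK": "'AQU'", "DIST": 1320, "INTSURFACE": 1,
        "RANKCPACE": 1, "TODAY": "#01-18-2019#"}
print(fill(TEMPLATE, race))   # reproduces the concrete AQU query above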

It took me a while, but in the end, I accomplished the objective.

The Interface now has the ability to learn from the data -- and make decisions from there.

It's similar to the concept of a generative query in AI (introduced by Geoffrey Hinton).

Imo, it's done in a way that doesn't require the user to do too much in the way of overhead.

For example, a track-dist-surface-cpace specific Prob Expression structured like the one with the curly brackets above works no matter what the track-dist-surface-cpace of the current horse in the current race happens to be.

Of course, the next step is to get the Interface to make meaningful follow-up queries based on observations gleaned from these initial generative queries.

I'll stop here and post more as free time allows.



-jp

Old 01-18-2019, 10:03 PM   #13
Dave Schwartz
 
Dave Schwartz's Avatar
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,873
I followed all that.

We do something similar, just template-based instead of SQL.



The ultimate issue is that the query creates a system, but how do you know that the system is any good? First, you test backwards, but how do you track them ALL going forward?

Personally, I am not a big believer in angle handicapping simply because it is always based upon backfit.

Of course, you can always leave (say) a year of data going forward to test, but with racing changing faster and faster, I really question whether races from (say) 5 years ago have much relevance this year.

So, instead, we test our approaches going forward, lumping them all together into a single system: Did it work? After a few hundred races you will have a good clue.

Even after a couple of hundred races we can diagnose WHY the system isn't working. For example, perhaps your contenders are not significantly outperforming your non-contenders.

One of the big targets for us is a series of questions that begin with:

1. Do your non-contenders that ultimately go off below 7/2 lose big money? What we want here is at least a 35% loss.

2. Do your contenders that ultimately go off below 7/2 lose small money? We'd like to see something around -12% or better.

There are more questions, but even after question 1, if you can get the low-odds non-contenders down to around a $1.10 $net (per $2), you can almost throw darts at the rest of the horses and be even.
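
As a worked example of that arithmetic (results invented), $net per $2 is just total $2 payoffs divided by the number of bets:
Code:
# $net per $2 for a group of bets: total $2 payoffs / number of bets.
def net_per_2(payoffs):
    """payoffs: $2 win payoff per bet (0.0 when the horse loses)."""
    return sum(payoffs) / len(payoffs)

# 100 low-odds non-contenders: 20 win and pay $5.50, 80 lose.
non_contenders = [5.50] * 20 + [0.0] * 80
print(net_per_2(non_contenders))   # -> 1.10, i.e. a 45% loss per $2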
Old 01-19-2019, 09:28 AM   #14
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
I use one model for all dirt (and synthetic) racing. There are different track-specific parameters around things like post position and prevailing track bias.

It seems to me that there should be underlying logic to how it all works that is grounded in the nature of the thoroughbred animals running on a dirt surface.

Using all of the history gives plenty of data to fit the model properly (although I did this 20 years ago when I had far less data than I have today).

I think when you start slicing the data into small (track-specific) pieces, it elevates the noise above the basic truth behind the model.
Old 01-19-2019, 09:42 AM   #15
Jeff P
Registered User
 
Jeff P's Avatar
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,258
I tend to agree with that, Steve.


-jp
