
View Full Version : Correlation between running lines and how the race develops


DeltaLover
08-29-2013, 01:09 PM
After a very long period of developing handicapping software, I have settled on a hybrid solution that uses simple metrics, pattern recognition engines and personal judgement to derive betting decisions. I have been following this approach for about a year with relatively good results.

I am reluctant to make quick (or any) changes to my process, since doing so has proven a disaster at least a couple of times in the recent past. Still, since I will have some free time in the weeks to come, I am thinking of doing some research that might reveal something interesting...

Although I do not intend to completely remove personal judgement, I still try to minimize it where possible. To that end I have kept detailed notes on my thought process for approximately 80 races I have bet recently, trying to understand more deeply what I am doing and to find out whether there is any systematic process that I can express in code.

Going through the sample races I confirmed that (as expected) the most important aspect of my decision making process is the meta-handicapping procedure of detecting systematic errors of the crowd.

Of all the meta-handicapping factors I was able to monitor, the most influential has to do with pace handicapping. For example, I can see that some of my best bets were on sprinters who not only seemed outclassed but were also running against rivals who seemed to possess more early speed. The crowd usually underestimates this type of runner, creating the right conditions for an upset.

I found out that in all of these cases I followed the same detection pattern which, although simple to apply using personal judgement, is still complicated to express in code. Deciding, for example, which running line to use seems to depend on many factors (a few of them are post position changes, class drops, distance, surface, track condition, weight, jockey, field size).

More than this, I have to admit that in my handicapping I follow the common principle used by pace handicappers, which assumes that having many early runners will result in a fast pace, which will favor the closers, and vice versa. Exactly this is the topic of this thread!

How can you quantify the projected pace of a race, given the past performances?

How valid is the common belief that the more early runners the more probable is a fast pace?

How can you prove that there is a significant correlation between the past performances and how the race evolves?

The objective is to create metrics for pace and final times for each start that exists in the past performances, derive a rating for each starter, and compare the tuple of these ratings to the actual race, verifying or rejecting the hypothesis of a correlation...
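
One way to make that comparison concrete is a rank correlation between the projected ratings and what actually happened at the first call. The following is only a minimal sketch: the ratings, the positions and the function names are made up for illustration, and Spearman's rho is one possible choice of statistic, not the only one.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical field: projected early-speed rating (higher = faster) and
# actual running position at the first call (1 = on the lead).
projected_rating = [92, 88, 85, 80, 77, 70]
first_call_position = [2, 1, 3, 5, 4, 6]

# Negate the positions so that "better" points the same way in both lists.
rho = spearman(projected_rating, [-p for p in first_call_position])
```

A rho near +1 across many races would support the hypothesis; a rho near 0 would reject it. A single race proves nothing either way, so the statistic would have to be collected over a large sample.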

What are your thoughts?

CincyHorseplayer
08-29-2013, 02:38 PM
I'm not completely sure what you are asking? There are superficial ways to look at probable pace, i.e. using running styles/Quirin numbers. From that I can say with multiple straight E horses the pace will be quick 90% of the time. If you have a high number, say 35 combined points on E/P types, it could be fast, but if there is a single E horse it still gives that horse the lead. If you have the same number but they are all E/P7's and P6's they tend to clump up together in absence of an E horse because they can't help it. A 25 total points race including an E8, an E/P7, and 2 P5's will usually result in a strung out field that can strand closers even with an above average pace. But none of it matters if it's not verified by actual pace figures. Look at last year's BC Sprint. All the horses had around a 100 speed figure but Trinniberg had a pace figure over 115. Nobody else had near a 110. Lot of speed in the race but it was over.

DeltaLover
08-29-2013, 03:01 PM
From that I can say with multiple straight E horses the pace will be quick 90% of the time.


Part of what I am asking here is how one can validate this exact statement. What methodology should you use to automate the validation process?
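
As a rough illustration of what such a validation could look like: count the races with at least two straight E horses, count how many of those had a fast pace, and put a confidence interval around the proportion. The record layout (`styles`, `fast_pace`) and the sample races are assumptions for the sketch, and how a pace gets labeled "fast" is left open.

```python
from math import sqrt

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials == 0:
        raise ValueError("need at least one qualifying race")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

def fast_pace_rate(races, min_e_horses=2):
    """Return (fast races, qualifying races) for fields with enough E horses.

    Each race is a dict with hypothetical keys:
      'styles'    - running-style labels of the starters, e.g. ['E', 'E/P', 'S']
      'fast_pace' - True if the actual pace was classified as fast
    """
    qualifying = [r for r in races if r['styles'].count('E') >= min_e_horses]
    fast = sum(1 for r in qualifying if r['fast_pace'])
    return fast, len(qualifying)

# Made-up sample, far too small to conclude anything.
sample_races = [
    {'styles': ['E', 'E', 'P', 'S'], 'fast_pace': True},
    {'styles': ['E', 'E/P', 'P'], 'fast_pace': False},
    {'styles': ['E', 'E', 'E', 'S'], 'fast_pace': True},
    {'styles': ['E', 'E', 'S', 'S'], 'fast_pace': False},
]
fast, n = fast_pace_rate(sample_races)
low, high = wilson_interval(fast, n)
```

The "90% of the time" claim would be supported if 0.90 falls inside the interval on a large sample. With only a handful of races the interval is too wide to say anything.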

Going further, I like to use more primitive data rather than derivatives like Quirin points or any other metrics. What complicates the task is the fact that other parameters should be considered in the model, for example moving from an outer to an inner post position or stretching out from 5f to a mile.

To tackle this problem we need the following components:

- Measurement for the intermediate - final fractions

- Measurement for the closing fraction

- An algorithm receiving as an input all the past performances of a horse and returning a rating describing his early speed

- An algorithm to take the speed ratings of all the horses of the race returning an opinion about how fast the race will be

- A methodology to measure to what extent the pace projection is already reflected in the betting pools

CincyHorseplayer
08-29-2013, 03:06 PM
To the top point I would say that the % of races in the PP's where a horse was on the lead at the 1st call would be a valid starting point. Anyway, gotta run, and talk to you later DL!
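
A minimal sketch of that starting point, reading it as "the share of past performances in which the horse was on the lead at the first call". The dictionary keys and the optional lengths tolerance are assumptions, not fields from any particular data provider.

```python
def first_call_lead_pct(past_performances, max_lengths_behind=0.0):
    """Share of past races in which the horse was on (or near) the lead
    at the first call.

    Each past performance is a dict with hypothetical keys:
      'first_call_pos'     - running position at the first call (1 = leader)
      'first_call_lengths' - lengths behind the leader at the first call
    """
    if not past_performances:
        return None  # first-time starter: nothing to rate
    near_lead = sum(
        1 for pp in past_performances
        if pp['first_call_pos'] == 1
        or pp['first_call_lengths'] <= max_lengths_behind
    )
    return near_lead / len(past_performances)

# Made-up running lines for one horse.
pps = [
    {'first_call_pos': 1, 'first_call_lengths': 0.0},
    {'first_call_pos': 2, 'first_call_lengths': 1.5},
    {'first_call_pos': 1, 'first_call_lengths': 0.0},
    {'first_call_pos': 5, 'first_call_lengths': 6.0},
]
strict = first_call_lead_pct(pps)       # leader only
loose = first_call_lead_pct(pps, 2.0)   # within 2 lengths also counts
```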






traynor
08-29-2013, 04:56 PM

That assumes that all races are of equivalent value in developing a model. That is rarely the case.

DeltaLover
08-29-2013, 05:36 PM
That assumes that all races are of equivalent value in developing a model. That is rarely the case.

Surely all races are not equivalent. Ideally all of their attributes should be part of the input tuple. This is not as easy as it sounds, as it not only assumes a complete database but also requires more processing power. I think I will start with a very simple model, adding data to it in small increments until I get significant results.

TrifectaMike
08-29-2013, 07:32 PM

- An algorithm to take the speed ratings of all the horses of the race returning an opinion about how fast the race will be

If you are using Bris type speed ratings or similar, you can use a Hierarchical Bayes Model for Normal-Normal (with unknown mean and unknown variance). You can use Normal-Normal because the speed ratings are normally distributed and so are the observations.

Run a quick test: take the last 6 speed ratings for each horse and combine them into one vector. Sort the vector (high to low) and drop the last 2 ratings in the vector. Compute the mean and median. You'll find that the mean and median are very close. So, for practical purposes they can be modeled as Normal.
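
The quick test can be sketched in a few lines. The ratings below are invented, and the function assumes each horse's list is ordered most recent first. Note that a mean close to the median only shows the trimmed vector is roughly symmetric, which is consistent with a Normal model rather than proof of one.

```python
from statistics import mean, median

def pooled_rating_summary(ratings_by_horse, last_n=6, drop_lowest=2):
    """Pool the last `last_n` ratings of every horse, sort high to low,
    drop the `drop_lowest` smallest values, and return (mean, median)."""
    pooled = []
    for ratings in ratings_by_horse.values():
        pooled.extend(ratings[:last_n])  # assumes most recent rating first
    pooled.sort(reverse=True)
    trimmed = pooled[:-drop_lowest] if drop_lowest else pooled
    return mean(trimmed), median(trimmed)

# Hypothetical three-horse field.
field = {
    'A': [88, 85, 90, 82, 79, 86],
    'B': [75, 80, 78, 81, 77, 60],
    'C': [92, 89, 70, 91, 87, 55],
}
avg, mid = pooled_rating_summary(field)
```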


Mike

TrifectaMike
08-29-2013, 07:42 PM

- A methodology to measure to what extent the pace projection is already reflected in the betting pools.

Use my probability generating function based on tote-odds rank and field size and generate the probabilities.

Rank your pace projection, then feed that rank and the field size to the same probability generator to generate a second probability vector. Then compare the two vectors using either distance or entropy to determine differences.
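
The comparison step could look like the sketch below. The probability generator itself is not described in this thread, so the two vectors are simply taken as given, and the numbers are hypothetical. Euclidean distance and relative entropy (KL divergence) are used here as examples of "distance or entropy".

```python
from math import log, sqrt

def euclidean_distance(p, q):
    """Straight-line distance between two probability vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats.
    Requires q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical win-probability vectors, same horse order in both.
tote_probs = [0.40, 0.30, 0.20, 0.10]   # from tote-odds rank and field size
pace_probs = [0.25, 0.35, 0.15, 0.25]   # from pace-projection rank and field size

distance = euclidean_distance(pace_probs, tote_probs)
divergence = kl_divergence(pace_probs, tote_probs)
```

A divergence of zero means the two vectors are identical, so the pace projection adds nothing the pools do not already reflect. Larger values flag races where the two views disagree.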

Mike

Maximillion
08-29-2013, 07:43 PM


In an earlier post, didn't you say you pay zero attention to speed ratings?

traynor
08-29-2013, 07:55 PM
Surely all races are not equivalent. Ideally all of their attributes should be part of the input tuple. This is not as easy as it sounds, as it not only assumes a complete database but also requires more processing power. I think I will start with a very simple model, adding data to it in small increments until I get significant results.

Understood. The difficult part is determining (and coding) a selection process to isolate the significant races from the cooler races, the no-go races, the workout races, the fishing expeditions at a different distance races, and so on. It is an interesting process, and one that has been the subject of a great deal of study.

DeltaLover
08-30-2013, 10:22 AM
In an earlier post, didn't you say you pay zero attention to speed ratings?

This is close to true. This does not mean that some ratings are not valid for handicapping purposes; a good example is Bris Prime Power, which is great as a predictor of the final outcome. The reason I do not use this type of rating is that it is highly correlated with the crowd's opinion. My approach is to create different ratings that, although not as good as a general predictor, can be combined with a specific decision tree model, providing a betting signal with low frequency but higher expected value.

DeltaLover
08-30-2013, 10:32 AM

Doc, that's food for thought for the upcoming post SPA break!

As far as :

- An algorithm to take the speed ratings of all the horses of the race returning an opinion about how fast the race will be

in this thread I am not examining the final time of the race but its opening call so I am expecting pace figures to show more value than the speed ratings.


If you are using Bris type speed ratings or similar, you can use a Hierarchical Bayes Model for Normal-Normal (with unknown mean and unknown variance). You can use Normal-Normal because the speed ratings are normally distributed and so are the observations.


Does this mean that mean and sigma are becoming the unknowns?

Are you referring to each individual starter or to a projection of today's race?

DeltaLover
08-30-2013, 11:02 AM
If you are using Bris type speed ratings or similar, you can use a Hierarchical Bayes Model for Normal-Normal (with unknown mean and unknown variance). You can use Normal-Normal because the speed ratings are normally distributed and so are the observations.


This is what I am thinking of doing. Let me give a simplified example to see if we are in agreement:

I measure pace using a scale with the following values:

Very slow
slow
average
fast
very fast

How these ratings are calculated is not important for the example.

Suppose that there are only two events causing a race to have a fast pace:

- The field has at least one very fast and at least two fast horses

- One of the very fast or fast horses is stretching out today for the first time

In this case we have a simple network that can be modeled as the following:

http://www.codingismycraft.com/wp-content/uploads/2013/08/bayesian.jpg
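
For readers who cannot see the image, a two-parent network of this kind can be sketched by direct enumeration. Every probability below is a hypothetical placeholder, and the two parent events are assumed independent, which may not hold in practice.

```python
from itertools import product

# Hypothetical priors for the two parent events.
P_A = 0.30   # field has at least one very fast and at least two fast horses
P_B = 0.15   # a fast or very fast horse stretches out for the first time

# Hypothetical conditional probability table: P(fast pace | A, B).
P_FAST_GIVEN = {
    (True, True): 0.90,
    (True, False): 0.70,
    (False, True): 0.50,
    (False, False): 0.10,
}

def prior(a, b):
    """Joint prior of the two parent events, assumed independent."""
    return (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)

def p_fast_pace():
    """Marginal probability of a fast pace, summing over both parents."""
    return sum(prior(a, b) * P_FAST_GIVEN[(a, b)]
               for a, b in product([True, False], repeat=2))

def p_a_given_fast():
    """Posterior P(A | fast pace), i.e. reasoning from effect back to cause."""
    joint = sum(prior(True, b) * P_FAST_GIVEN[(True, b)] for b in (True, False))
    return joint / p_fast_pace()
```

In a real model the table entries would be estimated from race results rather than typed in by hand.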

TrifectaMike
08-30-2013, 11:40 AM

Does this mean that mean and sigma are becoming the unknowns?

Are you referring to each individual starter or to a projection of today's race?

You will infer each individual horse's mean and sigma (both unknown) and the group mean and sigma (both unknown).

The group mean and sigma are interpreted as the race projection.

In practical terms all horses share information about each other when entered in a race.
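
The "sharing information" idea can be illustrated with a much simpler stand-in for the full hierarchical model: an empirical-Bayes shrinkage in which the variances are estimated once by the method of moments and then plugged in, instead of being inferred as unknowns. This is a simplification chosen for brevity, not the model described above, and the ratings are invented.

```python
from statistics import mean, variance

def partial_pooling(ratings_by_horse):
    """Shrink each horse's mean rating toward the group mean.

    Needs at least two horses with at least two ratings each.
    Returns (group_mean, {horse: shrunk_mean}).
    """
    horses = {h: r for h, r in ratings_by_horse.items() if len(r) >= 2}
    if len(horses) < 2:
        raise ValueError("need at least two horses with two or more ratings")
    horse_means = {h: mean(r) for h, r in horses.items()}
    group_mean = mean(horse_means.values())
    # Average within-horse variance: the noise in a single rating.
    sigma2 = mean(variance(r) for r in horses.values())
    # Between-horse variance by method of moments, floored at zero.
    avg_n = mean(len(r) for r in horses.values())
    tau2 = max(variance(horse_means.values()) - sigma2 / avg_n, 0.0)
    shrunk = {}
    for h, r in horses.items():
        noise = sigma2 / len(r)
        weight = tau2 / (tau2 + noise) if tau2 + noise > 0 else 0.0
        shrunk[h] = weight * horse_means[h] + (1 - weight) * group_mean
    return group_mean, shrunk

# Hypothetical pace ratings for a three-horse field.
field = {
    'A': [88, 85, 90, 82],
    'B': [75, 80, 78, 81],
    'C': [92, 89, 70, 91],
}
group_mean, shrunk_means = partial_pooling(field)
```

Each shrunk mean lands between the horse's own average and the field average. The noisier a horse's record, the harder it is pulled toward the group.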

in this thread I am not examining the final time of the race but its opening call so I am expecting pace figures to show more value than the speed ratings.

You mentioned speed ratings. My comment still applies to pace ratings. The fact that these ratings are normally distributed within a race is very important to understand. Speed ratings are NOTHING more than a transformation of a skewed distribution (time domain) to a normal distribution (speed rating domain) and NOTHING MORE!!!!!!!!!!!!!!!!!!

Mike



TrifectaMike
08-30-2013, 11:41 AM

Two words: Bayesian Networks

TrifectaMike
08-31-2013, 11:38 AM
DL,

I don't believe it has to be as complicated as made out to be (does not imply it is easy).

My suggestion would be to do the following:

Probability of an entry winning the race.

Probability of an entry making the lead.

Probability of an entry winning the race given it makes the lead.

You have sufficient information to do reverse probabilities.
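
Reading "reverse probabilities" as Bayes' rule, the three quantities above are enough to get the probability that an entry made the lead given that it won, and the probability of winning without the lead. A small sketch with hypothetical numbers for one entry:

```python
def p_lead_given_win(p_win, p_lead, p_win_given_lead):
    """Bayes' rule: P(lead | win) = P(win | lead) * P(lead) / P(win)."""
    if not 0 < p_win <= 1:
        raise ValueError("p_win must be in (0, 1]")
    return p_win_given_lead * p_lead / p_win

def p_win_given_no_lead(p_win, p_lead, p_win_given_lead):
    """Total probability: P(win) = P(win|lead)P(lead) + P(win|no lead)P(no lead)."""
    if not 0 <= p_lead < 1:
        raise ValueError("p_lead must be in [0, 1)")
    return (p_win - p_win_given_lead * p_lead) / (1 - p_lead)

# Hypothetical inputs for one entry.
p_win = 0.20
p_lead = 0.35
p_win_given_lead = 0.40

posterior = p_lead_given_win(p_win, p_lead, p_win_given_lead)
off_the_pace = p_win_given_no_lead(p_win, p_lead, p_win_given_lead)
```

The inputs have to be mutually consistent: P(win | lead) * P(lead) cannot exceed P(win), otherwise the second function returns a negative number.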

Mike

thaskalos
08-31-2013, 02:04 PM
How valid is the common belief that the more early runners the more probable is a fast pace?



This belief is valid...but I doubt the truthfulness of the corresponding belief -- that a fast early pace "favors the closers".

Contrary to what most handicappers believe...many front-runners are more "maneuverable" than they are thought to be. Given a competent ride...a confirmed front-runner is easily capable of staying a few lengths off the hot pace and rallying to win late.

I mean...if $2,500 claimers at Portland Meadows can do it...how hard can it be?

TrifectaMike
09-02-2013, 08:55 AM
This belief is valid...but I doubt the truthfulness of the corresponding belief -- that a fast early pace "favors the closers".

Contrary to what most handicappers believe...many front-runners are more "maneuverable" than they are thought to be. Given a competent ride...a confirmed front-runner is easily capable of staying a few lengths off the hot pace and rallying to win late.

I mean...if $2,500 claimers at Portland Meadows can do it...how hard can it be?

Why rely on anecdotal evidence when empirical data exists? Gather the data and test the hypothesis.

Mike

DeltaLover
09-02-2013, 09:47 AM
Why rely on anecdotal evidence when empirical data exists? Gather the data and test the hypothesis.

Mike

Exactly.

The essence of successful horse betting lies in this hypothesis test.

Any handicapping factor can be tested for the following two:

(1) Absolute significance

(2) Reflection of this significance to the betting pools

What interests us as bettors is (2).

Horse bettors and public handicappers are good at detecting angles that affect the outcome of the race, but in many cases they overbet them, creating overlays elsewhere.
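
The two tests can be put side by side with a small sketch. Impact value (share of winners with the factor divided by share of starters with the factor) stands in for (1), and a flat-bet ROI stands in for (2). ROI is taken here as money returned per unit wagered, so 1.0 is break-even. The record layout and the sample are invented.

```python
def factor_report(starters, stake=2.0):
    """Return (impact_value, roi) for a yes/no handicapping factor.

    Each starter is a dict with hypothetical keys:
      'has_factor' - True if the horse shows the factor
      'won'        - True if the horse won
      'win_payoff' - payoff for a `stake` win bet (0.0 for losers)
    """
    with_factor = [s for s in starters if s['has_factor']]
    winners = [s for s in starters if s['won']]
    if not with_factor or not winners:
        raise ValueError("need at least one factor horse and one winner")
    winners_with = [s for s in with_factor if s['won']]
    pct_of_starters = len(with_factor) / len(starters)
    pct_of_winners = len(winners_with) / len(winners)
    impact_value = pct_of_winners / pct_of_starters
    roi = sum(s['win_payoff'] for s in winners_with) / (stake * len(with_factor))
    return impact_value, roi

# Made-up sample: 10 starters over 2 races, 4 of them with the factor.
sample = (
    [{'has_factor': True, 'won': True, 'win_payoff': 6.40}]
    + [{'has_factor': True, 'won': False, 'win_payoff': 0.0}] * 3
    + [{'has_factor': False, 'won': True, 'win_payoff': 9.80}]
    + [{'has_factor': False, 'won': False, 'win_payoff': 0.0}] * 5
)
impact_value, roi = factor_report(sample)
```

In this toy sample the factor wins more than its share (impact value above 1) yet loses money on a flat bet (ROI below 1), which is the overbet pattern described above.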

sjk
09-02-2013, 10:47 AM
Making a quantitative projection of the expected relative speed of the horses in a race is a key ingredient to my program. There are many choices that you make along the path to computing such data so some methods probably lead to better results than others.

I think lots of sophisticated players are making and acting on similar analyses; I think it would be worth your while to pursue it.

TrifectaMike
09-02-2013, 10:53 AM
Exactly.

The essence of successful horse betting lies in this hypothesis test.

Any handicapping factor can be tested for the following two:

(1) Absolute significance

(2) Reflection of this significance to the betting pools

What interests us as bettors is (2).

Horse bettors and public handicappers are good at detecting angles that affect the outcome of the race, but in many cases they overbet them, creating overlays elsewhere.

DL,

You are absolutely correct: (2) is critical. We take it a step further. We create oddslines for various predictors after considering (1) and (2) to determine what portion of the tote-odds reflects that particular predictor.

Later today I'll post an oddsline showing what the odds would be if the public bet a particular predictor. This oddsline is created with a universal factor, then compared on a track-to-track basis. This allows us to determine what is being bet at each track (and where the errors lie).

We do this for each track meet (data), then use posterior distributions to predict whether the "new" race data (this is done daily) is consistent with those posterior distributions.

Mike

TrifectaMike
09-02-2013, 11:00 AM
Making a quantitative projection of the expected relative speed of the horses in a race is a key ingredient to my program. There are many choices that you make along the path to computing such data so some methods probably lead to better results than others.

I think lots of sophisticated players are making and acting on similar analyses; I think it would be worth your while to pursue it.

Very true. I've advocated a Hierarchical Bayes for this many times. If done correctly, a good projection can get you a 0.92 ROI.

Mike

TrifectaMike
09-02-2013, 11:52 AM
Here is an oddsline, if the crowd were collectively betting speed


SUF 2013 0902 1 1210 D C 5
PGN  Horse                 Odds
7    HEZA FOX              0.91
1    BROADWAY HAT          3.38
6    DADDY'S MAN           5.70
5    ANOTHER PEPPI        13.77
2    B J'S GIBSON        296.42


SUF 2013 0902 3 1980 D C 6
PGN  Horse                 Odds
4    SUN DANCE MOON        1.67
2    BO BADGER             2.66
1    MR. ROESSINK          5.69
5    SYMPHONIC HERO        8.72
3    RIO BONITA           12.31
6    DEVIL APPROVED IT    29.09


SUF 2013 0902 4 1320 D C 7
PGN  Horse                 Odds
1    OXFORD LASS           1.47
5    MARVELOUS MARGARET    2.13
1A   PRINCESS APPEAL       9.95
4    PICK THE DOUBLE      11.04
2    STORMIN MARGARET     13.59
3    MIMI'S SUGAR         16.42
6    ARBELLE              18.18


SUF 2013 0902 6 1210 D C 6
PGN  Horse                 Odds
5    LOCKED OUT            1.83
7    ASHQUAR               5.41
4    SYSTEM RESTORE        6.25
2    SAME DAY PLEASURE     7.55
3    TOCCET MAN            8.24
1    AMAICING GERRY       16.83


SUF 2013 0902 8 1760 T C 9
PGN  Horse                 Odds
7    FORTHELOVEOF ANNA     1.25
1A   GRAND BAHAMA          2.31
5    CINDYRELLA            2.51
3    TEN PIN TIDE          4.30
6    ZAI JIAN              5.09
9    AMBUSCADE             6.92
1    PRIZED DREAM         10.37
8    POBRECITA            14.98
10   GRAND MADAM          65.28


SUF 2013 0902 9 1760 D C 7
PGN  Horse                 Odds
7    MISS SPEED DATING     0.54
3    SHOPPER WIFE          1.56
2    LITTLE MISS HUGHES    5.88
4    BACKSIDE DIVA         7.15
1    DOUBLE SHADOW        17.48
5    COP COOKIE           22.91
6    J. W.'S CRYSTAL      68.72
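
To compare an oddsline like this with the tote board, the odds first have to be turned into probabilities. The sketch below assumes the figures are quoted as odds-to-1, which the post does not state, and uses the first race above as input. The raw implied probabilities need not sum to exactly 1, so they are normalized.

```python
def implied_probabilities(odds_to_one):
    """Convert odds-to-1 into win probabilities.

    Returns (normalized probabilities, raw sum before normalizing)."""
    raw = [1.0 / (o + 1.0) for o in odds_to_one]
    total = sum(raw)
    return [r / total for r in raw], total

# Race 1 of the oddsline above, in the order listed.
race1_odds = [0.91, 3.38, 5.70, 13.77, 296.42]
race1_probs, race1_raw_sum = implied_probabilities(race1_odds)
```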



Mike

Elliott Sidewater
09-03-2013, 07:25 PM
Sometime between 15 and 20 years ago I developed a chart that separated races into normal, fast pace, and slow pace based on the total number of Quirin speed points for the whole field. The number of starters after scratches was required, and the charts for turf sprints, turf routes, dirt sprints, and dirt routes were all different. Also, maiden races were separate from races for winners (less total speed points, generally). The threshold values were based on information about speed points from a big statistical book written by Michael Nunamaker. If I find it I'll post the information here, as a payback for all of the hours of entertainment I've derived from reading the posts on PA.
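
A chart of that kind is easy to express as a lookup table. The sketch below only shows the shape of it: the cut-off numbers are placeholders invented for illustration, not the values from the original chart or from Nunamaker's book.

```python
# Hypothetical cut-offs, NOT the values from the original chart.
# Key:   (surface, distance_type, maiden, field_size)
# Value: (slow_at_or_below, fast_at_or_above) total Quirin speed points.
PACE_THRESHOLDS = {
    ('dirt', 'sprint', False, 8): (14, 26),
    ('dirt', 'route', False, 8): (12, 24),
    ('turf', 'route', False, 8): (10, 22),
}

def classify_pace(speed_points, surface, distance_type, maiden=False,
                  thresholds=PACE_THRESHOLDS):
    """Label a race 'slow', 'normal' or 'fast' from the field's speed points.

    speed_points holds one Quirin value (0-8) per starter after scratches.
    Returns None when the chart has no row for this kind of race.
    """
    key = (surface, distance_type, maiden, len(speed_points))
    if key not in thresholds:
        return None
    slow_max, fast_min = thresholds[key]
    total = sum(speed_points)
    if total <= slow_max:
        return 'slow'
    if total >= fast_min:
        return 'fast'
    return 'normal'

# Example: an eight-horse dirt sprint for winners, loaded with speed.
label = classify_pace([8, 7, 6, 5, 2, 1, 0, 0], 'dirt', 'sprint')
```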

It worked pretty well as I recall, despite the fact that I used my background in statistics to come up with the "too fast" and "too slow" speed point totals. Over a long period of time, I think the races with too much speed provided the best betting value, and to this day I love races like that when I find them.

Elliott Sidewater
09-03-2013, 07:41 PM
Thask:

I don't disagree with your statement about (not necessarily) favoring the closers. However, I'm sure you recognize the difference between a pathological need-the-lead speed horse and one who may run well without the lead. It is the races that are loaded with pathological speed that can at times yield tremendous betting value.

It's ironic that the best thing that can happen to a speed horse early in its career is to miss the break or get left at the post. I've seen horses improve dramatically in subsequent starts after running a decent or near winning race after that happens. It's a solid indicator of improving form in a lot of cases. This is a variation on Andy Beyer's "change of pace", which is still a productive handicapping factor more than 30 years after it first appeared in print.

Elliott