Predicting Speed Rating from Past Ratings - Horse Racing Forum - PaceAdvantage.Com

TrifectaMike · 11-04-2014, 10:38 AM

Let me ask a question, which appears to have an obvious answer.

I am fairly new to horse racing and have heard about the importance of speed ratings to predicting winners.

So, I think I've come up with a good idea.

"I'll take a horse's last four races and assign each speed rating a weight"

Last speed rating weighted by w1
Second back speed rating by w2
Third back speed rating by w3
Fourth back speed rating by w4
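A minimal sketch of the weighted rating described above, assuming the figures are listed most recent first and the weights are given in the same order. The function name is hypothetical. It divides by the weight total so the result stays on the speed-figure scale even if the weights do not sum to 1.

```python
def weighted_rating(figures, weights):
    """Combine a horse's recent speed figures into a single number.

    figures: most recent first, e.g. [last, 2nd back, 3rd back, 4th back]
    weights: w1..w4 in the same order
    """
    if len(figures) != len(weights):
        raise ValueError("need exactly one weight per figure")
    total_weight = sum(weights)
    # Normalizing by the weight total keeps the result on the figure scale.
    return sum(f * w for f, w in zip(figures, weights)) / total_weight


# Example using the weights Mike floats later in the thread (.6, .2, .1, .1)
print(weighted_rating([85, 80, 78, 90], [0.6, 0.2, 0.1, 0.1]))
```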

The question: how to assign values to the weights?

Some here have probably done something similar. Anything goes.

Mike

DeltaLover · 11-04-2014, 10:48 AM

I think this problem is addressed very well by Bris Prime Power, which seems to have very high predictive value. One obvious way to try to reverse-engineer the algorithm behind BPP is to treat it as an optimization problem, which can be solved either by multiple regression or by a genetic algorithm.

Still, I would question the betting value of this kind of rating, since it is reflected to a good degree in the way the crowd bets. Although such an approach has high predictive power, it does not add much when it comes to making a profit.

Clocker · 11-04-2014, 10:51 AM

Quote:

Originally Posted by TrifectaMike

the importance of speed ratings to predicting winners.

I suspect that there might be some friendly disagreement about this assumption.

And just for the sake of argument, are your contemplated weights constants, or are they values that would have to be computed for each horse in each race?

TrifectaMike · 11-04-2014, 10:54 AM

Quote:

Originally Posted by DeltaLover

I think this problem is addressed very well by Bris Prime Power, which seems to have very high predictive value. One obvious way to try to reverse-engineer the algorithm behind BPP is to treat it as an optimization problem, which can be solved either by multiple regression or by a genetic algorithm.

Still, I would question the betting value of this kind of rating, since it is reflected to a good degree in the way the crowd bets. Although such an approach has high predictive power, it does not add much when it comes to making a profit.

DL, I appreciate your response, but my question is very simple; it is really not about betting value or predictive value.

I'm looking for responses like "w1 is greater than w2, etc." (why?)
I would choose w1 = .6, w2 = .2, w3 = .1, w4 = .1 (why?)

Mike

TrifectaMike · 11-04-2014, 10:58 AM

Quote:

Originally Posted by Clocker

I suspect that there might be some friendly disagreement about this assumption.

And just for the sake of argument, are your contemplated weights constants, or are they values that would have to be computed for each horse in each race?

Constant weights for a population of horses.

Mike

DeltaLover · 11-04-2014, 11:11 AM

I would start with what we expect from such a rating.

Obviously we expect it to behave similarly to BPP or to the odds-derived ranking (meaning the top-ranked horse should win more often than the second-ranked, the second more often than the third, etc.).

So, one way to describe the problem is the following:

SF11 * W1 + SF12 * W2 + SF13 * W3 + SF14 * W4 + C = R1

SF21 * W1 + SF22 * W2 + SF23 * W3 + SF24 * W4 + C = R2

SF31 * W1 + SF32 * W2 + SF33 * W3 + SF34 * W4 + C = R3

….

SFn1 * W1 + SFn2 * W2 + SFn3 * W3 + SFn4 * W4 + C = Rn

With the following condition holding true:

R1 < R2 < R3 < … < Rn

Where:

SF11 = The first SF for the horse who came first

SF12 = The second SF for the horse who came first

….

C: a constant value

n: the number of starters in the race
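The system above can be checked for a single race in a few lines. This is a sketch under stated assumptions: each row holds one starter's figures (most recent first), rows are listed in finishing order, and because the post does not say whether a higher R means a better finish, the direction is a parameter (`higher_is_better=False` reproduces R1 < R2 < … < Rn exactly as written). The function names and the sample race are hypothetical. `pairwise_agreement` is an added, softer target, since a strict ordering of a whole field is hard to satisfy.

```python
def composite_scores(race, weights, c=0.0):
    """R = SF1*W1 + SF2*W2 + SF3*W3 + SF4*W4 + C for each starter.

    race: one list of speed figures per starter (most recent first),
    with starters listed in finishing order (winner first).
    """
    return [sum(sf * w for sf, w in zip(figs, weights)) + c for figs in race]


def ordering_holds(race, weights, c=0.0, higher_is_better=True):
    """Check the strict ordering condition for one race.

    higher_is_better=True requires R1 > R2 > ... > Rn (winner scores highest);
    higher_is_better=False requires R1 < R2 < ... < Rn, as written in the post.
    """
    r = composite_scores(race, weights, c)
    pairs = list(zip(r, r[1:]))
    if higher_is_better:
        return all(a > b for a, b in pairs)
    return all(a < b for a, b in pairs)


def pairwise_agreement(race, weights):
    """Share of starter pairs whose scores agree with the finish (higher = better)."""
    r = composite_scores(race, weights)
    n = len(r)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(r[i] > r[j] for i, j in pairs) / len(pairs)


race = [[90, 88, 85, 80],   # winner
        [85, 84, 86, 82],   # second
        [70, 75, 72, 74]]   # third
print(ordering_holds(race, [0.4, 0.3, 0.2, 0.1]), pairwise_agreement(race, [0.4, 0.3, 0.2, 0.1]))
# prints: True 1.0
```

Note that C shifts every starter's score by the same amount, so it can never change whether the ordering holds within a race.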

We will probably need some kind of a normalization for all the available speed figures in the specific race.
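The post does not say which normalization to use. One simple choice, offered here only as an assumption, is to convert each figure to a z-score against the other starters' figures from the same race-back slot, so every horse is measured relative to today's field rather than on the raw scale. The function name is hypothetical.

```python
from statistics import mean, pstdev


def normalize_field(race):
    """Z-score each column (last race, 2nd back, ...) across today's field.

    race: list of figure lists, one per starter, most recent figure first.
    Returns a new list of the same shape.
    """
    columns = list(zip(*race))  # one tuple per race-back slot
    stats = [(mean(col), pstdev(col)) for col in columns]
    normalized = []
    for figs in race:
        row = []
        for f, (mu, sd) in zip(figs, stats):
            # If every starter has the same figure, the spread is 0; score it as 0.
            row.append((f - mu) / sd if sd > 0 else 0.0)
        normalized.append(row)
    return normalized


print(normalize_field([[90, 80], [80, 80], [70, 80]]))
```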

The Wi can be either plain numbers or some kind of exponential (for example, we might try exp(x) or something similar).

TrifectaMike · 11-04-2014, 11:15 AM

Quote:

Originally Posted by DeltaLover

I would start with what we expect from such a rating.

Obviously we expect it to behave similarly to BPP or to the odds-derived ranking (meaning the top-ranked horse should win more often than the second-ranked, the second more often than the third, etc.).

So, one way to describe the problem is the following:

SF11 * W1 + SF12 * W2 + SF13 * W3 + SF14 * W4 + C = R1

SF21 * W1 + SF22 * W2 + SF23 * W3 + SF24 * W4 + C = R2

SF31 * W1 + SF32 * W2 + SF33 * W3 + SF34 * W4 + C = R3

….

SFn1 * W1 + SFn2 * W2 + SFn3 * W3 + SFn4 * W4 + C = Rn

With the following condition holding true:

R1 < R2 < R3 < … < Rn

Where:

SF11 = The first SF for the horse who came first

SF12 = The second SF for the horse who came first

….

C: a constant value

n: the number of starters in the race

We will probably need some kind of a normalization for all the available speed figures in the specific race.

The Wi can be either plain numbers or some kind of exponential (for example, we might try exp(x) or something similar).

A decay function, so that w1 is greater than w2, w2 is greater than w3, and w3 is greater than w4. Sounds logical.
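A decay rule like this can be parameterized with a single number. This sketch assumes a geometric (exponential) decay, with w_k proportional to exp(-lam * (k - 1)) and the weights normalized to sum to 1; `lam` is a hypothetical tuning parameter that would itself have to be fitted to data.

```python
import math


def decay_weights(n_races=4, lam=0.7):
    """Exponentially decaying weights w1 > w2 > ... > wn that sum to 1.

    lam = 0 gives equal weights; a larger lam puts more weight on the last race.
    """
    raw = [math.exp(-lam * k) for k in range(n_races)]
    total = sum(raw)
    return [r / total for r in raw]


print([round(w, 3) for w in decay_weights()])
```

A one-parameter family like this is easier to fit and compare than four free weights, at the cost of forcing a particular shape on the decay.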

Mike

DeltaLover · 11-04-2014, 11:21 AM

Aside from using a GA, what is the best way to solve this problem analytically? I think it involves linear regression, but at the moment I cannot really grasp the full extent of the problem... Is there any other statistical method applicable to it?

Jeff P · 11-04-2014, 11:24 AM

If you look at some data it becomes apparent that more recent running lines should have (slightly) more weight than older running lines.

But beyond that, you might ask "What has changed?" between the past running lines and today's race, which has not yet been run, and go from there.

-jp


classhandicapper · 11-04-2014, 11:35 AM

I once created an extract file of older experienced horses that had run 4 consecutive races at the same track, at the same distance, on the same surface, on only fast tracks, with no trainer changes, no layoffs, no more than a 1 level class move, and no major trouble lines or extreme paces in those races.

So basically I tried to control for everything that could cause significant figure variations unrelated to ability.
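The extract described above amounts to a filter over sequences of past-performance lines. Here is a minimal sketch, assuming a hypothetical `RunningLine` record; real past-performance data would need its own field mapping, the 60-day "no layoff" cutoff is an invented threshold, and the "older experienced horses" condition is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RunningLine:
    # Hypothetical fields, for illustration only.
    track: str
    distance_f: float      # distance in furlongs
    surface: str           # e.g. "dirt", "turf"
    condition: str         # e.g. "fast", "sloppy"
    trainer: str
    days_since_prev: int
    class_level: int       # ordinal class scale, higher = better
    troubled: bool         # major trouble or extreme pace flagged
    speed_figure: int


MAX_LAYOFF_DAYS = 60  # assumption: what counts as "no layoff"


def is_controlled_sequence(lines):
    """True if four consecutive lines (oldest first) pass every filter."""
    if len(lines) != 4:
        return False
    first = lines[0]
    for line in lines:
        # Same track, distance and surface, fast track, no trouble, same trainer.
        if (line.track, line.distance_f, line.surface) != (first.track, first.distance_f, first.surface):
            return False
        if line.condition != "fast" or line.troubled or line.trainer != first.trainer:
            return False
    for prev, cur in zip(lines, lines[1:]):
        # No layoff between races and no more than a 1-level class move.
        if cur.days_since_prev > MAX_LAYOFF_DAYS:
            return False
        if abs(cur.class_level - prev.class_level) > 1:
            return False
    return True
```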

I sent the data to an advanced stats expert (who I believe now works for an NBA team) to do a regression analysis to determine how to weight the previous 3 races in order to maximize my chances of predicting, or getting close to, the 4th race.

He gave me the weights and a few insights, and determined that I would gain very little by using a horse's last 3 races instead of just his last 2.
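One standard way to check a claim like "the third race back adds very little" is to compare held-out prediction error with and without it. The sketch below uses numpy and *simulated* figures in which the third race back was deliberately given almost no weight, so it only illustrates the method and says nothing about real horses. The function name and all simulation parameters are assumptions.

```python
import numpy as np


def holdout_rmse(X, y, train_frac=0.7):
    """Fit OLS (with intercept) on the first part of the data; return RMSE on the rest."""
    n_train = int(len(y) * train_frac)
    X1 = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X1[:n_train], y[:n_train], rcond=None)
    err = y[n_train:] - X1[n_train:] @ coefs
    return float(np.sqrt(np.mean(err ** 2)))


# Simulated data: columns are last race, 2nd back, 3rd back.
rng = np.random.default_rng(2)
n = 3000
ability = rng.normal(80, 8, n)
past = ability[:, None] + rng.normal(0, 5, (n, 3))
today = 7 + 0.6 * past[:, 0] + 0.3 * past[:, 1] + 0.02 * past[:, 2] + rng.normal(0, 4, n)

rmse_2 = holdout_rmse(past[:, :2], today)
rmse_3 = holdout_rmse(past, today)
print(f"last 2 races: {rmse_2:.3f}   last 3 races: {rmse_3:.3f}")
```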

It was very interesting, except that most horses have very complex PPs that include all those things that cause wider variations and sometimes force you to go back further. So I'm not so sure the weights he gave me (or any other fixed formula) will accomplish the basic goal as well as a subjective analysis of the PPs combined with the insight that, all else being equal, the more recent races should carry more weight.

TrifectaMike · 11-04-2014, 11:37 AM

Quote:

Originally Posted by DeltaLover

Aside from using a GA, what is the best way to solve this problem analytically? I think it involves linear regression, but at the moment I cannot really grasp the full extent of the problem... Is there any other statistical method applicable to it?

Ok. I (the new guy in post 1) take your advice (linear regression), make a visit to a local university, and ask for help. A grad student, for pay, is willing to help. He directs me to get a year's data for all tracks and performs a linear regression.

A short time later he delivers his results. The grad student knows nothing about horse racing. His results show that one weight is insignificant, and it is not w4, which seems odd to me, since I believe more recent performances are more important. But he can't help any further.
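For readers wondering what "insignificant" means in regression output, here is a sketch of ordinary least squares with coefficient standard errors and t-statistics, the usual basis for that judgment. It uses numpy and *simulated* figures with assumed true weights (chosen only so the example has a known answer); it is not Mike's data or the grad student's analysis.

```python
import numpy as np


def ols_with_t(X, y):
    """OLS fit of y on X plus an intercept; returns (coefs, std_errors, t_stats).

    Index 0 of each result is the intercept C; indices 1.. are W1, W2, ...
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coefs
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof                # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)     # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return coefs, se, coefs / se


# Simulated data: four past figures per horse, and today's figure built
# from assumed true weights (0.5, 0.25, 0.15, 0.0) plus noise.
rng = np.random.default_rng(0)
n = 5000
ability = rng.normal(80, 8, n)
X = ability[:, None] + rng.normal(0, 5, (n, 4))
true_w = np.array([0.5, 0.25, 0.15, 0.0])
y = 5 + X @ true_w + rng.normal(0, 4, n)

coefs, se, t = ols_with_t(X, y)
for name, c, s, tv in zip(["C", "W1", "W2", "W3", "W4"], coefs, se, t):
    print(f"{name}: {c:6.3f}  se={s:.3f}  t={tv:6.2f}")
```

Past figures for the same horse are strongly correlated with each other, and correlated predictors inflate standard errors, which is one way a weight can look insignificant for reasons that have little to do with racing.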

Mike

Robert Goren · 11-04-2014, 12:04 PM

If you run a multiple regression with the SRs from the last 4 races as the Xs and the SR in today's race as the Y (as I am sure you have done), you find that the last race is the only one that matters much. I suppose you could look at the cases where the last race's SR was way off as a predictor and see whether the second race back was a better predictor in those cases. Then you get into the business of predicting "badly predicting last races," a whole new can of worms. What I found a number of years ago, in a very limited study, was that the last race's SR was the best predictor in almost all cases.

I also found that certain things affected the standard error when comparing SRs to SRs. For instance, being off 4 weeks had an SE of, say, 4 points, but being off 20 weeks had an SE of 8 points. Those are made-up numbers, because the study was lost when an old computer died.

There was also a problem I ran across that I was not expecting: the regression of each horse's SR today toward its base SR. As I see it, the problem was that the regression toward "base SR" varied with the class of the horse. Grade I stakes horses have a different "base SR" than 10k claimers. Each horse has a base SR whether or not we can figure out what it is. A 10k claimer who ran a 98 would be expected to have its next SR drop, while a G1 horse would be expected to have its SR rise. I know this sounds like gibberish to those who haven't played with trying to predict SRs, but it is a real problem that has to be dealt with.
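The "base SR" effect described above is a form of regression toward the mean within a class level. A sketch of one way to express it, assuming each class level has a known average figure (its base) and a single shrinkage factor `k` between 0 and 1; the class bases and `k` below are hypothetical and would have to be estimated from data.

```python
def expected_next_sr(last_sr, class_base, k=0.6):
    """Shrink the last figure toward the class base: base + k * (last - base).

    k = 1 carries the last figure forward unchanged; k = 0 uses only the base.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k should be between 0 and 1")
    return class_base + k * (last_sr - class_base)


# Hypothetical class bases, purely illustrative.
CLASS_BASE = {"10k claimer": 75, "Grade I": 105}

for cls, base in CLASS_BASE.items():
    print(cls, expected_next_sr(98, base))
```

With these made-up values, the same 98 is expected to fall for the claimer and rise for the Grade I horse, which is the pattern described in the post.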

Dave Schwartz · 11-04-2014, 12:13 PM

Mike,

Not sure whether you wish to address this, but there needs to be a "fitness function" for similarity to today's race included in the weight.

That is, a turf race is far less predictive on dirt, a sprint less predictive in routes, etc.
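One way to fold such a fitness function into the weights is to multiply each recency weight by a similarity factor for that past race and then renormalize. The penalty values (0.5 for a surface switch, 0.7 for a sprint/route switch) and the 8-furlong sprint/route cutoff are invented for illustration only, as are the names `PastRace` and `fitness_weighted_rating`.

```python
from dataclasses import dataclass


@dataclass
class PastRace:
    figure: float
    surface: str        # "dirt" or "turf"
    distance_f: float   # furlongs


SURFACE_PENALTY = 0.5   # hypothetical
DISTANCE_PENALTY = 0.7  # hypothetical
ROUTE_MIN_F = 8.0       # assumption: 8 furlongs or more counts as a route


def similarity(past, today_surface, today_distance_f):
    """1.0 for a matching race, reduced for surface or sprint/route changes."""
    factor = 1.0
    if past.surface != today_surface:
        factor *= SURFACE_PENALTY
    if (past.distance_f >= ROUTE_MIN_F) != (today_distance_f >= ROUTE_MIN_F):
        factor *= DISTANCE_PENALTY
    return factor


def fitness_weighted_rating(past_races, recency_weights, today_surface, today_distance_f):
    """Weighted figure where each weight = recency weight * similarity factor."""
    weights = [w * similarity(p, today_surface, today_distance_f)
               for p, w in zip(past_races, recency_weights)]
    return sum(p.figure * w for p, w in zip(past_races, weights)) / sum(weights)


races = [PastRace(70, "turf", 8.5), PastRace(90, "dirt", 6.0)]
print(fitness_weighted_rating(races, [0.6, 0.4], "dirt", 6.0))
```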

Greyfox · 11-04-2014, 12:14 PM

Quote:

Originally Posted by TrifectaMike

Let me ask a question, which appears to have an obvious answer.

I am fairly new to horse racing and have heard about the importance of speed ratings to predicting winners.

So, I think I've come up with a good idea.

"I'll take a horse's last four races and assign each speed rating a weight"

Last speed rating weighted by w1
Second back speed rating by w2
Third back speed rating by w3
Fourth back speed rating by w4

The question: how to assign values to the weights?

Some here have probably done something similar. Anything goes.

Mike

The idea assumes that the horse tried his best in each of the last four races, running at the same class level, at the same distance, on a similar surface, with a clean trip, and from a similar post each time.
It does not account for the "shape of the race dynamics," which can radically alter a runner's final speed figure.
I don't think your idea will fly with all of those assumptions, but good luck building a regression formula that might improve your handicapping.

TrifectaMike · 11-04-2014, 12:34 PM

Quote:

Originally Posted by Greyfox

The idea assumes that the horse tried his best in each of the last four races, running at the same class level, at the same distance, on a similar surface, with a clean trip, and from a similar post each time.
It does not account for the "shape of the race dynamics," which can radically alter a runner's final speed figure.
I don't think your idea will fly with all of those assumptions, but good luck building a regression formula that might improve your handicapping.

Building a model is not the point here.

Let's move on.

After speaking with some experienced horse players and showing them the results I was given, they all agreed something was wrong.

Another trip to the university. Directed to a more advanced stat guy. He agrees that something doesn't make sense.

He tells me, "I'll take your data and stratify it by track. And since I have the results of the races, I'll use a binary (logistic) regression." All Greek to me. Okay by me.
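A sketch of what "binary (logistic) regression stratified by track" can mean in practice: within each track, model the probability that a starter wins as a function of its (standardized) composite figure. It fits a one-variable logistic model by plain gradient ascent in numpy on *simulated* data; the data layout, learning rate, step count and track names are assumptions. A real analysis would more likely use a statistics package and account for exactly one winner per race (for example, with a conditional logit model); this sketch treats starters independently.

```python
import numpy as np


def fit_logistic(x, won, lr=0.1, steps=5000):
    """Fit P(win) = 1 / (1 + exp(-(a + b * x))) by gradient ascent on the log-likelihood.

    x: standardized composite figure per starter; won: 1.0 if the starter won, else 0.0.
    """
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        # Gradient of the average log-likelihood with respect to a and b.
        a += lr * np.mean(won - p)
        b += lr * np.mean((won - p) * x)
    return a, b


def fit_by_track(tracks, x, won):
    """Fit a separate (a, b) for each track: the 'stratify by track' step."""
    return {t: fit_logistic(x[tracks == t], won[tracks == t]) for t in np.unique(tracks)}


# Simulated starters at two made-up tracks, with true a = -2.0 and b = 1.2.
rng = np.random.default_rng(1)
n = 4000
tracks = rng.choice(np.array(["TRACK_A", "TRACK_B"]), size=n)
x = rng.normal(0, 1, n)
p_true = 1 / (1 + np.exp(-(-2.0 + 1.2 * x)))
won = (rng.random(n) < p_true).astype(float)

for track, (a, b) in fit_by_track(tracks, x, won).items():
    print(track, round(a, 2), round(b, 2))
```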

A week later he informs me that he sees a pattern similar to the one observed in the linear regression. But not to worry, because he'll redo the analysis controlling for class, distance, etc.

Okay. Whatever you say. Call me when you're done.

Mike