What could you do with a model that accurately predicts final race time? - Horse Racing Forum - PaceAdvantage.Com

JerryBoyle · 04-05-2019, 07:04 AM

Really two questions:

1. What could you do with this model?

2. What average error would you find good/ok/bad? That is, if the model, on average, is off by 1 second, how unacceptable would you consider this? Perhaps another way to think of it is: when looking at a race, how accurately could you guess the final race time?

I've toyed around with building one a few times, and they're reasonably accurate, but not great. The average error is about 1s across all races/distances. Obviously, 1s on a 6f race is much worse than 1s on a 1 1/4 mile race. Originally, I thought I might use it as a way to determine if a track was running slower/faster on a given day by comparing the difference in estimate vs actual for all races.
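One rough way to see why the same 1s error matters more in a sprint is to express it per furlong. This is only a sketch; `error_per_furlong` is a hypothetical helper, not something from the poster's model.

```python
# Hypothetical helper: express an absolute timing error per furlong
# so that errors at different distances can be compared.
def error_per_furlong(error_seconds: float, distance_furlongs: float) -> float:
    return error_seconds / distance_furlongs

sprint = error_per_furlong(1.0, 6.0)   # 6f sprint
route = error_per_furlong(1.0, 10.0)   # 1 1/4 miles = 10 furlongs

print(f"6f: {sprint:.3f} s/f, 10f: {route:.3f} s/f")
```

Reporting error on a per-furlong (or percentage) basis would make the "good/ok/bad" question easier to answer across distances.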

FakeNameChanged · 04-05-2019, 07:32 AM

If your 1 second accuracy model eliminates half or maybe 3/4's of the field 85% of the time, I think that's useful. Maybe your decision tree then moves to a different model for the remaining horses.

i.e. if the remaining 3 or 4 horses are all early speed, or a mix of early, pressers or closers; how has the track been running? Is early speed holding up; or are closers dominating? If it's early, does the model for 1st or 2nd fractions say which is the best number, but not too fast? I've played around modeling the last fraction also. I liked the idea where you adjust the last fractions for early pace, so a horse running :49 gets its last fraction adjusted to compare to a horse running a :45 or :46 . Of course a lot of people do that, but in combo with what you're modeling, it would be interesting. Carry on.
(edit) On your second question, how predictive is your (+/-) .5 second accuracy for finding winners? Routes/Sprints? Dave Schwartz's pars might be helpful. My databases for the tracks I follow have such a wide range for final times depending on time of the year. Maybe a model of track variant along with your final time is needed. I've sat at many races and watched them disc the track and the times become 3-5 seconds slower, but not always. I don't know how anyone deals with these variations.

Gakiss2 · 04-05-2019, 07:35 AM

My thought is that it would help set that line between contender and non contender. If you seriously got to within +/- 1 second 99.5% of the time then you could safely eliminate horses which couldn't make your average time + 1 second. If you found a front runner that you felt will likely run under 1.2 seconds of your average then that would be a bet. And I think you already mentioned that it would help you decide if it was going to be a 'fast' or 'slow' race and that might point you toward a Front Runner vs. a Closer.
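The elimination idea could be sketched roughly like this. The `Runner` class, the `best_time` field, and the sample field are all made up for illustration, and the 1-second tolerance is simply the figure discussed in the thread.

```python
from dataclasses import dataclass

@dataclass
class Runner:
    name: str
    best_time: float  # hypothetical: best recent final time at this distance, in seconds

def contenders(runners, predicted_time, tolerance=1.0):
    """Keep runners whose best time is no slower than predicted_time + tolerance."""
    cutoff = predicted_time + tolerance
    return [r for r in runners if r.best_time <= cutoff]

# Illustrative field only; not real data.
field = [Runner("A", 71.8), Runner("B", 72.9), Runner("C", 73.6), Runner("D", 74.4)]
kept = contenders(field, predicted_time=72.2)
print([r.name for r in kept])
```

How many horses survive the cut depends entirely on the tolerance, so the tolerance would need to reflect how often the model actually lands within it.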

It sounds similar to Par times that some of the PP publishers put out there.

Are you basically basing it on average times from the entrants' PPs? What happens to your model if there is a scratch? …. late scratch?

Not trying to be negative, just wanting to explore all potential scenarios.

JerryBoyle · 04-05-2019, 07:50 AM

Still processing the responses, but one thing I want to clear up is that the models I've played with/tested take the entire race as an input (all runners and some info about each) and output 1 number, which is the expected time for the race. So I unfortunately don't have a time for each runner. Just the expected time for the race. Obviously, this is the model's expected time for the winner, but there's nothing connecting that time to which runner the model thinks will run that time. So operationally that restricts what can be done with this info.

barn32 · 04-05-2019, 08:06 AM

Quote:

Originally Posted by JerryBoyle

Still processing the responses, but one thing I want to clear up is that the models I've played with/tested take the entire race as an input (all runners and some info about each) and output 1 number, which is the expected time for the race. So I unfortunately don't have a time for each runner. Just the expected time for the race. Obviously, this is the model's expected time for the winner, but there's nothing connecting that time to which runner the model thinks will run that time. So operationally that restricts what can be done with this info.

Why can't you compare (back test) the expected time with horses who have actually run that time (or within 1 tick, 2 ticks, etc.) and analyze the results?

098poi · 04-05-2019, 08:53 AM

Every race based on track, conditions and distance already has a par time associated with it. So a 6F race at track X for older claimers at $20,000 has a Beyer (or speed figure of your choice) number that would be associated with that race as the par. If you think your final time projection will be slightly more accurate then convert your final time to the appropriate Beyer. (Don't ask me how to do that but I think it is doable. I think each speed figure corresponds to a specific final time) Then look at your contenders and see if any have run at, near or above the speed figure. If you find one that has not raced at that distance or track but has corresponding figures you may have found yourself a good bet. This would only be of value if your projection is more accurate than what is already known as the par time or speed figure for that race. You may end up trying to do something that is already done. Good luck.

Gakiss2 · 04-05-2019, 10:53 AM

Quote:

Originally Posted by Whosonfirst

If your 1 second accuracy model eliminates half or maybe 3/4's of the field 85% of the time, I think that's useful. Maybe your decision tree then moves to a different model for the remaining horses.

i.e. if the remaining 3 or 4 horses are all early speed, or a mix of early, pressers or closers; how has the track been running? Is early speed holding up; or are closers dominating? If it's early, does the model for 1st or 2nd fractions say which is the best number, but not too fast? I've played around modeling the last fraction also. I liked the idea where you adjust the last fractions for early pace, so a horse running :49 gets its last fraction adjusted to compare to a horse running a :45 or :46 . Of course a lot of people do that, but in combo with what you're modeling, it would be interesting. Carry on.
(edit) On your second question, how predictive is your (+/-) .5 second accuracy for finding winners? Routes/Sprints? Dave Schwartz's pars might be helpful. My databases for the tracks I follow have such a wide range for final times depending on time of the year. Maybe a model of track variant along with your final time is needed. I've sat at many races and watched them disc the track and the times become 3-5 seconds slower, but not always. I don't know how anyone deals with these variations.

I absolutely get getting down to 3 or 4 contenders but I do wonder if only those should be considered for the pace scenario or do you have to account for (regarding pace set up) horses that you are pretty sure are going to fade before the stretch but are right up there in the faces of the horses that will eventually be battling in the stretch.

I raise the concern because I have absolutely handicapped on the basis of how I thought my 3 or 4 contenders would interact with each other, only to have a horse with little to no chance mess up the pace scenario I had in mind. (I just hate it when horses don't behave as I think they should)

2c..

Gakiss2 · 04-05-2019, 10:56 AM

I had some similar thoughts when I first looked at this thread, but then thought that getting back to the basics of figure making and going through the exercise might well still be valuable, since a new approach may be found and, who knows, it might be better than our current approach.

traveler · 04-05-2019, 11:38 AM

Your projections need to out-perform the public's prediction, which is indicated by odds/probabilities.

If your top projected-time horse wins often enough when, say, the public has him as the 3rd choice, you might have something.

storyline · 04-05-2019, 01:09 PM

If you're able to accurately project final figures for each horse this would be your starting point...work back from that.

Everything else is noise

TheOracle · 04-05-2019, 01:38 PM

Quote:

Originally Posted by JerryBoyle

Really two questions:

1. What could you do with this model?

2. What average error would you find good/ok/bad? That is, if the model, on average, is off by 1 second, how unacceptable would you consider this? Perhaps another way to think of it is: when looking at a race, how accurately could you guess the final race time?

I've toyed around with building one a few times, and they're reasonably accurate, but not great. The average error is about 1s across all races/distances. Obviously, 1s on a 6f race is much worse than 1s on a 1 1/4 mile race. Originally, I thought I might use it as a way to determine if a track was running slower/faster on a given day by comparing the difference in estimate vs actual for all races.

Hey Jerry

If you’ve predicted the race time of a race say 6 Furlongs at 1:12.20 and that equates to some speed figure say 85

Would you then look at all the horses in the race to see if they come close to that number or final time most consistently?

I’m just curious as to how you would use the data for a particular race

Jeff P · 04-05-2019, 03:32 PM

Jerry, sounds like you've created a new race level feature (as opposed to a horse level feature.)

You might try using it to segment your development samples.

For example, create one development sample where, based on the feature, race time is projected to be between W and X.

Create another development sample where, based on the feature, race time is projected to be between Y and Z.

Run each of the samples (separately) through a third party stat tool (like the mlogit package in R.) Have a look at the suggested coefficients for the already existing horse level features in your model.

How much do the suggested coefficients for the already existing horse level features in your model differ from each other in each of the samples?

It's a bit more work, but sometimes it's possible to increase the accuracy of an overall model by using different coefficients for each set of circumstances (without changing the features.)

Hope I managed to type most of that out in a way that makes sense,

-jp

.
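A minimal sketch of the segmentation step described above, assuming all races are at the same distance (projected time varies mostly with distance otherwise). `segment_races`, the cut points, and the race ids are hypothetical; the per-segment fit itself would still be done in a separate stat tool.

```python
import bisect
from collections import defaultdict

def segment_races(projected_times, edges):
    """Group race ids into development samples by projected race time.

    projected_times: dict of race_id -> projected time in seconds
    edges: sorted cut points; segment 0 is below edges[0], segment i
           holds times t with edges[i-1] <= t < edges[i], and the last
           segment holds everything at or above edges[-1].
    """
    segments = defaultdict(list)
    for race_id, t in projected_times.items():
        segments[bisect.bisect_right(edges, t)].append(race_id)
    return dict(segments)

# Illustrative values only; not real data.
projected = {"r1": 70.5, "r2": 72.2, "r3": 74.1, "r4": 71.0}
samples = segment_races(projected, edges=[71.0, 73.0])
print(samples)
```

Each resulting list of race ids would then be fit separately, and the coefficients compared across segments.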

JerryBoyle · 04-05-2019, 06:31 PM

Quote:

Originally Posted by TheOracle

Hey Jerry

If you’ve predicted the race time of a race say 6 Furlongs at 1:12.20 and that equates to some speed figure say 85

Would you then look at all the horses in the race to see if they come close to that number or final time most consistently?

I’m just curious as to how you would use the data for a particular race

That's part of the reason for this post, I'm not exactly sure what all I could use it for. It sort of feels like it SHOULD be useful, but I'm not sure how.

Originally, I modeled race time with the intent of taking the residuals to estimate how fast or slow a race ran that day. Meaning, if the model predicts a time of 72.2s, but the actual time was 73.5s, then the race ran 1.3s slower than expected. You could then look at this across an entire card to try and gauge how fast/slow the track was. None of the inputs to the model consider track surface condition or weather, meaning the difference between predicted and actual should at least be PARTIALLY due to conditions of the track. Obviously, there are many other things that will contribute to the difference, but the idea is that over a large sample of days, tracks where actual times consistently beat the predicted times will have run faster than expected.

When I did this though, I was sort of surprised that it got within +/- 1s of the final time on average, so I started to wonder what else it might be useful for. However, it's possible that +/- 1s of difference to actual time isn't even good, hence question 2.
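The residual idea can be sketched as a per-card average. This is an assumption-laden illustration: `daily_variant` is a hypothetical helper, the track code and all rows after the first are invented, and the residual is defined here as actual minus predicted, so a positive value means the card ran slower than the model expected.

```python
from collections import defaultdict
from statistics import mean

def daily_variant(races):
    """races: iterable of (track, date, predicted_seconds, actual_seconds).

    Returns {(track, date): mean(actual - predicted)}.
    Positive means slower than expected, negative means faster.
    """
    residuals = defaultdict(list)
    for track, date, predicted, actual in races:
        residuals[(track, date)].append(actual - predicted)
    return {key: mean(vals) for key, vals in residuals.items()}

# Illustrative card; the first row uses the 72.2s / 73.5s example from the post.
card = [
    ("XYZ", "2019-04-05", 72.2, 73.5),
    ("XYZ", "2019-04-05", 97.0, 98.1),
    ("XYZ", "2019-04-05", 65.0, 66.0),
]
variant = daily_variant(card)
print(variant)
```

A plain mean treats a 1s miss the same at every distance, so a per-furlong or percentage residual might be a better choice for mixed cards.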

JerryBoyle · 04-05-2019, 06:33 PM

Quote:

Originally Posted by Jeff P

Jerry, sounds like you've created a new race level feature (as opposed to a horse level feature.)

You might try using it to segment your development samples.

For example, create one development sample where, based on the feature, race time is projected to be between W and X.

Create another development sample where, based on the feature, race time is projected to be between Y and Z.

Run each of the samples (separately) through a third party stat tool (like the mlogit package in R.) Have a look at the suggested coefficients for the already existing horse level features in your model.

How much do the suggested coefficients for the already existing horse level features in your model differ from each other in each of the samples?

It's a bit more work, but sometimes it's possible to increase the accuracy of an overall model by using different coefficients for each set of circumstances (without changing the features.)

Hope I managed to type most of that out in a way that makes sense,

-jp

.

Definitely a possibility and totally makes sense, Jeff. I've looked at stratifying races before, but it always became very cumbersome and manual. I'll definitely give this a think though. Especially because it may lend itself better to a clustering algorithm which could stratify automagically, rather than me having to manually test thresholds.

storyline · 04-05-2019, 07:58 PM

Quote:

Originally Posted by JerryBoyle

I'm not exactly sure what all I could use it for. It sort of feels like it SHOULD be useful, but I'm not sure how.

It's kinda the central question to handicapping

Will be interesting to watch this thread die