|
|
04-12-2019, 08:09 AM
|
#31
|
Veteran
Join Date: Feb 2018
Posts: 845
|
Quote:
Originally Posted by ultracapper
accurately predicting splits is exponentially more helpful than final time. An acceptable +- for final time should be pretty tight to be of value IMO. Maybe 2/5 give or take at 6 furlongs. Predicting 1:12.2 and accepting anything between 1:11.2 and 1:13.2 is much too liberal IMHO.
|
Value in what sense? My goal was to predict the estimated "fundamental" finishing time. By this I mean the finishing time considering nothing about the "condition" of today's surface. The hope being that the difference between predicted and actual would then be at least in part because of the condition of today's surface. Are you saying that variations in conditions only contribute up to +/- 2/5s at a 6f race? In general I'd love to hear from you/others on how much you might expect conditions to contribute to differences in time over days.
As an interesting thought experiment, if I could build a model that accurately predicts finishing time given nothing but all the runners' past information, as well as info about today's race that does not include condition of the surface, would that mean that condition of the surface isn't actually as big of a difference maker as people assume?
|
|
|
04-20-2019, 02:28 AM
|
#32
|
Registered User
Join Date: Nov 2015
Location: LNN
Posts: 524
|
so it predicts final times on given days ahead of time? i would think it'd be useful to predict track variants ahead of time and possibly pace scenarios which would lead you to being able to upgrade certain running styles and downgrade non-optimal running styles.
Long story short, just find the chaos races and box the 6 worst odds so you can hit one of those $4,000 10 centers.
__________________
They didn't take your money...You paid for lessons
|
|
|
04-20-2019, 12:09 PM
|
#33
|
Veteran
Join Date: Feb 2018
Posts: 845
|
Quote:
Originally Posted by deelo
so it predicts final times on given days ahead of time? i would think it'd be useful to predict track variants ahead of time and possibly pace scenarios which would lead you to being able to upgrade certain running styles and downgrade non-optimal running styles.
Long story short, just find the chaos races and box the 6 worst odds so you can hit one of those $4,000 10 centers.
|
Yep, that's exactly what it currently does. I like the idea of creating pace scenarios. It's trivial to adjust it to predict any fraction, not just final. So what I will do is generate a model for each point of call.
|
|
|
04-20-2019, 01:39 PM
|
#34
|
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,877
|
Quote:
Originally Posted by JerryBoyle
Really two questions:
1. What could you do with this model?
2. What average error would you find good/ok/bad? That is, if the model, on average, is off by 1 second, how unacceptable would you consider this? Perhaps another way to think of it is, when looking at a race, how accurately could you guess the final race time?
I've toyed around with building one a few times, and they're reasonably accurate, but not great. The average error is about 1s across all races/distances. Obviously, 1s on a 6f race is much worse than 1s on 1 1/4 mile race. Originally, I thought I might use it as a way to determine if a track was running slower/faster on a given day by comparing the difference in estimate vs actual for all races.
|
Respectfully, 1 second (plus or minus) is 10 lengths, and that is just not meaningful. It would encapsulate about 94% of the winners.
I did this many years ago and using a simple curvo-linear regression on each horse's pacelines got it down to +/- 3 lengths. It was still pretty worthless.
BTW, for those who want to try this approach, what you do is run the regression and remove the paceline with the largest error. Then you continue until you get down to 3 races.
(In about 90% of the horses you should be able to draw a curve between 3 pacelines.)
It pointed to the obvious horses and the price horses became the outliers.
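For anyone who wants to try the pruning loop described above, here is a minimal sketch. The post does not say what the regression variables are, so this assumes x is race distance in furlongs and y is final time in seconds, with a plain quadratic standing in for the "curvo-linear" fit. The function names and the sample pacelines are made up for illustration, and the horse needs at least three distinct distances or the fit cannot be solved.

```python
# Made-up pacelines for one horse: (distance in furlongs, final time in seconds).
# Five lie exactly on y = 2 + 11.5x + 0.1x^2; the 6.5f line is 5 seconds slow.
PACELINES = [(5, 62.0), (6, 74.6), (6.5, 85.975), (7, 87.4), (8, 100.4), (9, 113.6)]


def fit_quadratic(points):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    # Build the 3x3 normal-equation system, augmented with the right-hand side.
    rows = []
    for i in range(3):
        row = [sum(x ** (i + j) for x, _ in points) for j in range(3)]
        row.append(sum(y * x ** i for x, y in points))
        rows.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("system looks singular; need three distinct distances")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def predict(coefs, x):
    a, b, c = coefs
    return a + b * x + c * x * x


def prune_pacelines(points, keep=3):
    """Refit, drop the paceline with the largest absolute error, repeat."""
    points = list(points)
    while len(points) > keep:
        coefs = fit_quadratic(points)
        worst = max(points, key=lambda p: abs(p[1] - predict(coefs, p[0])))
        points.remove(worst)
    return points, fit_quadratic(points)


kept, coefs = prune_pacelines(PACELINES)
```

With exactly three pacelines left, a quadratic passes through all of them, which matches the remark about being able to draw a curve between 3 pacelines.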
|
|
|
04-21-2019, 09:15 AM
|
#35
|
Veteran
Join Date: Feb 2018
Posts: 845
|
Quote:
Originally Posted by Dave Schwartz
Respectfully, 1 second (plus or minus) is 10 lengths, and that is just not meaningful. It would encapsulate about 94% of the winners.
I did this many years ago and using a simple curvo-linear regression on each horse's pacelines got it down to +/- 3 lengths. It was still pretty worthless.
BTW, for those who want to try this approach, what you do is run the regression and remove the paceline with the largest error. Then you continue until you get down to 3 races.
(In about 90% of the horses you should be able to draw a curve between 3 pacelines.)
It pointed to the obvious horses and the price horses became the outliers.
|
Thanks, Dave, this is exactly some of the feedback I wanted. Looking at how many runners are on average included in the window between final time and final estimated time is an interesting way to do it. E.g. if the difference between expected and final is on avg 3s and if a 3s diff includes all runners, then it's totally useless.
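A rough sketch of that check, under my own assumptions: the window is measured from the winner's time, its width is the absolute gap between the predicted and actual winning time, and every runner's finishing time is available in seconds. The function name and the sample race are hypothetical.

```python
def share_of_field_in_window(finish_times, predicted_time):
    """Fraction of runners finishing within the model's miss of the winner.

    finish_times: finishing time in seconds for every runner in one race.
    predicted_time: the model's estimated final (winning) time in seconds.
    """
    winner = min(finish_times)
    window = abs(predicted_time - winner)  # size of the model's miss
    inside = sum(1 for t in finish_times if t - winner <= window)
    return inside / len(finish_times)


# Made-up four-horse race.
FIELD = [71.0, 71.2, 71.9, 73.5]
tight = share_of_field_in_window(FIELD, predicted_time=71.6)
loose = share_of_field_in_window(FIELD, predicted_time=74.0)
```

Averaging this share over many races gives one number for how much of a typical field the error window swallows. A value near 1.0 would mean the estimate cannot separate anyone.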
I'd still be interested to hear from anyone how much track variants affect the final times of races day to day. I'm sure this is common knowledge for more experienced handicappers. Meaning, can a slow track slow a race down by more than a second on a given day? Or is the change usually smaller or larger?
Since coming back to this over the last 2 weeks, I've tweaked the model inputs a good deal and have gotten the average difference between final time and estimated time to .628 seconds. This covered ~67k races from 20160101-20190330. I've converted those differences to relative differences and the average relative difference to final race time is about .9%.
These differences have become my "variants", as well as some derivatives of them, like avg difference only on the specific surface. To test their "usefulness", I've included them in a fundamental model which does predict probability estimates of coming in first for each runner, similar to a conditional logit model. Holding all else equal, including specific races used, other metrics, etc, the variants make a significant impact on the final estimates vs not using them. However, that only answers one question, which is "is it useful to include these variants?". It doesn't tell me how good these variants are relative to variants created in a different manner, which I'd love to find out.
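To make the bookkeeping concrete, here is a small sketch of turning prediction misses into relative differences and then into a per-track, per-day average. The row layout, the names, and the sign convention (positive means the race ran slower than predicted) are assumptions for illustration, not the actual pipeline.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (track, date, predicted_seconds, actual_seconds).
RACES = [
    ("AAA", "2019-03-01", 99.0, 100.0),
    ("AAA", "2019-03-01", 72.0, 72.0),
    ("BBB", "2019-03-01", 101.0, 100.0),
]


def relative_diff(predicted, actual):
    """Miss as a fraction of the actual final time; positive = slower than predicted."""
    return (actual - predicted) / actual


def mean_abs_error(rows):
    """Average absolute miss in seconds across all races."""
    return mean(abs(actual - predicted) for _, _, predicted, actual in rows)


def track_day_variants(rows):
    """Average relative difference for each (track, date) pair."""
    grouped = defaultdict(list)
    for track, day, predicted, actual in rows:
        grouped[(track, day)].append(relative_diff(predicted, actual))
    return {key: mean(values) for key, values in grouped.items()}


variants = track_day_variants(RACES)
mae = mean_abs_error(RACES)
```

Surface-specific versions would only need the surface added to the grouping key.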
|
|
|
04-21-2019, 11:09 AM
|
#36
|
Registered User
Join Date: Feb 2003
Posts: 2,105
|
The average absolute value of my track variants is about 8.5 points (Beyer scale), so the track variant often affects final time by more than a second. Several seconds would not be uncommon.
|
|
|
04-21-2019, 12:29 PM
|
#37
|
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,877
|
Quote:
Originally Posted by JerryBoyle
Thanks, Dave, this is exactly some of the feedback I wanted. Looking at how many runners are on average included in the window between final time and final estimated time is an interesting way to do it. E.g. if the difference between expected and final is on avg 3s and if a 3s diff includes all runners, then it's totally useless.
|
Remember that I was doing a regression on how they run when they run. Thus, there was a bias towards "How good they are" as opposed to "How good are they usually."
There is a big difference.
Quote:
I'd still be interested to hear from anyone how much track variants affect the final times of races day to day. I'm sure this is common knowledge for more experienced handicappers. Meaning, can a slow track slow a race down by more than a second on a given day? Or is the change usually smaller or larger?
|
Again, I have experience. Making good, projection-based track variants is not an easy task. Frankly, it is a lifestyle commitment. That is, you have to give up your current lifestyle to support it.
However, my belief would be that it could produce wonderful numbers.
Two caveats:
1. You must do ALL the races and not just a handful of tracks.
2. Average Daily Variants are pretty close to worthless.
Quote:
Since coming back to this over the last 2 weeks, I've tweaked the model inputs a good deal and have gotten the average difference between final time and estimated time to .628 seconds. This covered ~67k races from 20160101-20190330. I've converted those differences to relative differences and the average relative difference to final race time is about .9%.
|
That's pretty much what I found.
|
|
|
04-21-2019, 09:46 PM
|
#38
|
Veteran
Join Date: Feb 2018
Posts: 845
|
Quote:
Originally Posted by Dave Schwartz
2. Average Daily Variants are pretty close to worthless.
|
Hey Dave, I'm curious what you mean by this statement. Are you saying that variants change too much from day to day to be of any use? That is, they must be averaged over many days/weeks/seasons?
This got me thinking about how I might test whether the variants I've created actually capture the information I believe they should capture. Given that they significantly increase the predictive power of a model which includes them, it seems they are capturing something relevant. However, I'd like to determine if they're capturing what I think they should be - whether a track is slower/faster than "expected".
One way I thought to do this is to measure a given day's average variant against the prior day's average. Presumably, a variant is caused by many factors, but the most important strike me as things that persist day-to-day such as track maintenance preference, weather, wear-and-tear, etc. This is to say, I'd expect a given day's surface conditions to correlate with a prior day's. Obviously, this will not always be the case, but on average, I wouldn't expect a surface to oscillate day-over-day from slow - fast - slow, etc. If this is true, then I'd expect there to be a strong correlation between prior day's variant and current day's. Here were the results using the relative difference between final race time and predicted race time (covering 2016-01-01 to 2019-03-30):
Code:
Correlation between prior day variant and current day variant: .521014
Linear regression using prior day variant as predictor and current day variant as target (x1 is prior day variant):
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.271
Model:                            OLS   Adj. R-squared:                  0.271
Method:                 Least Squares   F-statistic:                     6213.
Date:                Mon, 22 Apr 2019   Prob (F-statistic):               0.00
Time:                        01:23:21   Log-Likelihood:                 36938.
No. Observations:               16676   AIC:                        -7.387e+04
Df Residuals:                   16674   BIC:                        -7.386e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0061      0.000     27.610      0.000       0.006       0.007
x1             0.5230      0.007     78.821      0.000       0.510       0.536
==============================================================================
Omnibus:                     4268.570   Durbin-Watson:                   2.306
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           168508.100
Skew:                           0.503   Prob(JB):                         0.00
Kurtosis:                      18.540   Cond. No.                         32.4
==============================================================================
*One note about this analysis: it compares a given track-day's variant to the prior track-day. However, I didn't exclude gaps between track-days, so this will contain gaps as large as 1 year. Removing those data points will likely only increase the correlation, though.
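For reference, a stripped-down sketch of the lag comparison, including the gap filter mentioned in the note. It uses only the standard library rather than the regression package that produced the table, and the data layout, function names, and the 7-day cutoff are assumptions for illustration.

```python
from datetime import date
from statistics import mean

# Made-up variants keyed by (track, race date).
VARIANTS = {
    ("AAA", date(2019, 1, 1)): 0.01,
    ("AAA", date(2019, 1, 2)): 0.02,
    ("AAA", date(2019, 1, 3)): 0.03,
    ("AAA", date(2020, 1, 3)): -0.05,  # first card after a one-year gap
}


def lagged_pairs(variants, max_gap_days=None):
    """(prior track-day variant, current variant) pairs, built per track.

    If max_gap_days is set, pairs separated by a longer gap are dropped.
    """
    by_track = {}
    for (track, day), value in variants.items():
        by_track.setdefault(track, []).append((day, value))
    pairs = []
    for days in by_track.values():
        days.sort()
        for (d0, v0), (d1, v1) in zip(days, days[1:]):
            if max_gap_days is None or (d1 - d0).days <= max_gap_days:
                pairs.append((v0, v1))
    return pairs


def simple_ols(pairs):
    """Intercept, slope, and Pearson r for y regressed on a single x."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / (sxx * syy) ** 0.5


all_pairs = lagged_pairs(VARIANTS)
close_pairs = lagged_pairs(VARIANTS, max_gap_days=7)
intercept, slope, r = simple_ols(close_pairs)
```

With a single predictor, R-squared is just the squared correlation, which is why the 0.521 correlation and the 0.271 R-squared above agree.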
I've attached a scatter plot with the fitted line. This doesn't tell me anything about the "quality" of this particular variant. It's entirely possible and likely that other variants are better, but I find this analysis interesting nonetheless (and I don't have access to any others).
As a follow up study, I think it'd be interesting to see if/when the correlation changes. That is, how long does it take a track to switch from fast - slow or vice versa.
Last edited by JerryBoyle; 04-21-2019 at 09:52 PM.
|
|
|
04-21-2019, 10:14 PM
|
#39
|
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,877
|
Quote:
Hey Dave, I'm curious what you mean by this statement. Are you saying that variants change too much from day to day to be of any use? That is, they must be averaged over many days/weeks/seasons?
|
What I am saying is that every study I have ever done has proven that using an ADV approach (i.e. SR + TV) is far inferior to using no variant at all.
Surprisingly, it is just as true in the winter as in the summer and probably more so.
I know it sounds crazy but the ADV approach really does not work.
|
|
|
04-21-2019, 10:18 PM
|
#40
|
Veteran
Join Date: Feb 2018
Posts: 845
|
Quote:
Originally Posted by Dave Schwartz
What I am saying is that every study I have ever done has proven that using an ADV approach (i.e. SR + TV) is far inferior to using no variant at all.
Surprisingly, it is just as true in the winter as in the summer and probably more so.
I know it sounds crazy but the ADV approach really does not work.
|
Ahhh, got it, got it.
|
|
|
04-21-2019, 10:59 PM
|
#41
|
Registered User
Join Date: Nov 2015
Location: LNN
Posts: 524
|
I have a question. I'm not much of a programmer and only been playing horses for a few years, but I like to look at statistics guys' stuff here and there to learn a little at a time, maybe someday have the time to get into it more.
Anyways, I am curious as to if there's any logic in this and if not, I'd like to understand why.
So, the speed rating and track variant are both based on the final time of the entire race, correct?
Theory is that the most unstable part of the race is the beginning. You have different run-ups messing with the 2f call, you have extreme pace breaks, bad breaks, bumping, etc. The further you go, the more the race should "normalize", I would think. Breakouts settle a little, bad breaks recover a little, the field basically settles into their roles a little more. So in theory, as you progress through the race incrementals 2f to 4f, 4f to 6f, these segments should be more "stable" perhaps.
What if you throw out the first 2f. Consider the time 2f through Final the actual time of the race. Based off that time, create speed ratings and track variants the same way you normally would. Would this change anything? Would this be more reliable?
Is this a worthless thought or something worth messing with? Thanks in advance.
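In case it helps to see the idea spelled out, here is a tiny sketch of the "throw out the first 2f" time. It assumes you have the 2f call and the final time in seconds for each race. The names and the numbers are made up, and comparing the spread of the two segments is just one possible way to check the stability hunch.

```python
from statistics import pstdev

# Made-up races at one distance: (2f call in seconds, final time in seconds).
RACES = [(22.0, 70.4), (23.0, 71.2), (22.4, 70.6)]


def time_after_first_call(first_call_seconds, final_seconds):
    """Race time with the opening 2f removed."""
    return final_seconds - first_call_seconds


opening = [first for first, _ in RACES]
late = [time_after_first_call(first, final) for first, final in RACES]

# Spread of each segment across races; a smaller number means a more stable segment.
opening_spread = pstdev(opening)
late_spread = pstdev(late)
```

The `late` values would then go into whatever speed rating and variant method you already use in place of the full final time.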
__________________
They didn't take your money...You paid for lessons
|
|
|
04-22-2019, 05:22 PM
|
#42
|
The Voice of Reason!
Join Date: Mar 2001
Location: Canandaigua, New York
Posts: 112,470
|
Quote:
What could you do with a model that accurately predicts final race time?
|
Sell it.
(Might be better than what some tracks are using to time races now!)
__________________
Who does the Racing Form Detective like in this one?
|
|
|
04-22-2019, 10:12 PM
|
#43
|
Registered User
Join Date: Jan 2006
Posts: 28,390
|
Quote:
Originally Posted by Tom
Sell it.
(Might be better than what some tracks are using to time races now!)
|
True enough.
__________________
Live to play another day.
|
|
|
04-22-2019, 10:25 PM
|
#44
|
Vancouver Island
Join Date: Dec 2010
Posts: 1,747
|
Quote:
Originally Posted by Tom
Sell it.
(Might be better than what some tracks are using to time races now!)
|
Read statement in one of the racing books way back when.
Time is only for those behind bars
Last edited by bob60566; 04-22-2019 at 10:27 PM.
|
|
|
04-22-2019, 11:23 PM
|
#45
|
Registered User
Join Date: Jan 2006
Posts: 28,390
|
Quote:
Originally Posted by bob60566
Read statement in one of the racing books way back when.
Time is only for those behind bars
|
The funny thing is...the guy who wrote that actually did 3 months in the can for passing bad checks.
__________________
Live to play another day.
|
|
|