Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 05-01-2015, 04:15 AM   #1
Helles
Registered User
 
Join Date: Nov 2010
Location: Denver, CO.
Posts: 217
Backfitting Deluxe for the Kentucky Derby

I decided to look at the Derby races since 2005 (the extent of my database archive) and see if I could find any interesting factors that were significant in pointing to winners. Interesting as in "arcane".

I started with 900 factors and ran them against races from 2005 to 2014. As I progressed, I threw out factors that had little predictive effect and started over running the remaining factors again over the same 10 races. My program also plugs in different weights for the individual factors and will also run multiple factors together as an analyst. As a factor or analyst goes broke because the factor/analyst is not effective, it is replaced by a new factor/analyst with a new bankroll and a new set of factors and weights.
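
For anyone who wants to picture the loop described above, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual program: the names `new_analyst`, `top_choice` and `run_study`, the race data layout, the starting bankroll and the random weighting scheme are all hypothetical.

```python
import random

def new_analyst(factor_names, rng, max_factors=3, bankroll=20.0):
    """Hypothetical analyst: a random subset of factors with random weights."""
    k = rng.randint(1, min(max_factors, len(factor_names)))
    chosen = rng.sample(factor_names, k)
    return {"weights": {f: rng.uniform(-1.0, 1.0) for f in chosen},
            "bankroll": bankroll}

def top_choice(analyst, horses):
    """Index of the horse with the highest weighted factor score."""
    def score(horse):
        return sum(w * horse[f] for f, w in analyst["weights"].items())
    return max(range(len(horses)), key=lambda i: score(horses[i]))

def run_study(races, factor_names, n_analysts=10, passes=5, stake=2.0, seed=0):
    """One bet per race per analyst; a broke analyst is replaced by a new one."""
    rng = random.Random(seed)
    analysts = [new_analyst(factor_names, rng) for _ in range(n_analysts)]
    for _ in range(passes):            # rerun over the same races
        for race in races:
            for i, analyst in enumerate(analysts):
                analyst["bankroll"] -= stake
                if top_choice(analyst, race["horses"]) == race["winner"]:
                    analyst["bankroll"] += race["payoff"]
                if analyst["bankroll"] < stake:
                    # Cannot cover the next bet: start over with a fresh analyst.
                    analysts[i] = new_analyst(factor_names, rng)
    return sorted(analysts, key=lambda a: a["bankroll"], reverse=True)
```

Each race here is assumed to be a dict with `horses` (a list of factor-value dicts), `winner` (an index) and `payoff` (the $2 win payoff). The survivors at the end are simply the analysts that happened to fit these particular races best.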

Eventually I whittled the 900 factors down to 48 that seemed the most effective. I really didn't know what to expect when I then checked my 48-factor weighted analyst against each individual Derby over the sample.

I thought perhaps it would hit the lower-priced winners as the analyst's top choice as we saw last year and the year before, but miss some higher-priced winners completely.

This is where the surprise came in. I unexpectedly found that the analyst had every winner save one in the top two choices. This included $102.60 winner Giacomo in 2005. The only winner that was not in the top two choices was actually the third choice: Mine That Bird, who paid $103.20 in 2009.

Here are the results of the analyst against each individual race:
2014 1st choice wins paying 7.00
2013 1st choice wins paying 12.80
2012 2nd choice wins paying 32.60
2011 2nd choice wins paying 43.80
2010 1st choice wins paying 18.00
2009 3rd choice wins paying 103.20
2008 1st choice wins paying 6.80
2007 1st choice wins paying 11.80
2006 1st choice wins paying 14.20
2005 2nd choice wins paying 102.60
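
As a rough check on what the list above adds up to, here is a small Python tally of flat win bets on the analyst's top one, two or three choices. It assumes the listed payoffs are standard $2 win payoffs and that one flat $2 bet is placed on each choice; the function name `flat_bet_summary` is made up for this sketch. These are backfit, in-sample figures and say nothing about future returns.

```python
# (year, winner's rank in the analyst's ordering, assumed $2 win payoff)
results = [
    (2014, 1, 7.00), (2013, 1, 12.80), (2012, 2, 32.60), (2011, 2, 43.80),
    (2010, 1, 18.00), (2009, 3, 103.20), (2008, 1, 6.80), (2007, 1, 11.80),
    (2006, 1, 14.20), (2005, 2, 102.60),
]

def flat_bet_summary(results, depth, stake=2.0):
    """Bet `stake` to win on each of the top `depth` choices in every race."""
    cost = stake * depth * len(results)
    # Payoffs are quoted per $2, so scale them to the stake actually bet.
    returned = sum(pay * stake / 2.0 for _, rank, pay in results if rank <= depth)
    return cost, returned

for depth in (1, 2, 3):
    cost, returned = flat_bet_summary(results, depth)
    print(f"top {depth}: bet ${cost:.2f}, returned ${returned:.2f}")
```

On these ten races that works out to $70.60 back on $20 bet for the top choice alone, $249.60 on $40 for the top two, and $352.80 on $60 for the top three.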

Perhaps somebody can explain or theorize how this analyst, using the same factors, weighted the same way, can do so well picking horses with such disparate final odds. I fully understand I am backfitting and backfitting to a VERY small sample. But one would not expect to see what works for a $7.00 horse also working for a $102.60 horse.

The idea of this exercise was to post some of the factors that seemed to point to Derby contenders. As I ran my study, I forced my program to make only one bet per race per factor/analyst. Running the study this way actually didn't do a very good job of picking contenders. (Perhaps I will rerun the study and force three or four bets per race.) If one had boxed the top three choices in an exacta, for instance, only one exacta would have been hit. However, wheeling the top three choices against the top 12 choices would have hit every exacta in the sample.
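
To put the two exacta structures side by side, here is a short Python count of the combinations each one covers. It assumes the wheel means the top three choices in first place over the rest of the top 12 in second place; that reading, and the helper names `exacta_box` and `exacta_part_wheel`, are assumptions for this sketch.

```python
from itertools import permutations

def exacta_box(horses):
    """Every ordered pair (first, second) among the boxed horses."""
    return list(permutations(horses, 2))

def exacta_part_wheel(top, under):
    """Each `top` horse in first place over every other `under` horse in second."""
    return [(a, b) for a in top for b in under if a != b]

ranks = list(range(1, 13))          # analyst's choices 1 through 12
box = exacta_box(ranks[:3])
wheel = exacta_part_wheel(ranks[:3], ranks)
print(len(box), len(wheel))         # 6 33
```

So the box covers 6 combinations and the wheel 33, which goes some way toward explaining why the wheel caught so many more exactas. If the wheel is instead played in both directions (top three in first or second), the count rises to 60.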

The next surprise came when I decided to look at this year's field. No big surprises until I glanced down the list and saw Stanford. I realized I had to scratch him and put in Frammento. I re-analyzed the race and this is how the analyst's top five shook out:

American Pharoah
Firing Line
Frammento
Dortmund
Ocho Ocho Ocho
(Carpe Diem, Materiality, War Story, International Star and Frosted round out the top 10)

It will be interesting, to me at least, to see if the winner is in the top three this year. It would be no surprise if AP or FL won, but Frammento certainly would surprise.

Good luck to all.

Doug
Old 05-01-2015, 07:37 AM   #2
kevb
Registered User
 
Join Date: Sep 2007
Posts: 506
Thanks for sharing this.
Old 05-01-2015, 08:24 AM   #3
castaway01
Registered User
 
Join Date: Jul 2009
Location: NJ
Posts: 3,822
I like Firing Line to win it, so you're probably safe taking him out of your wagers.
Old 05-01-2015, 08:35 AM   #4
mikesal57
Veteran
 
Join Date: Sep 2003
Location: NEW YORK CITY
Posts: 3,670
if you use this going forward and get winners in the top 2-3 ..you'll be changing the title of this post
and going to the bank!!
mike
Old 05-01-2015, 09:10 AM   #5
DeltaLover
Registered user
 
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
As you say yourself, what you describe here is nothing other than a huge back-fit that simply memorizes the results and behaves accordingly; simply a throwaway model.
__________________
whereof one cannot speak thereof one must be silent
Ludwig Wittgenstein
Old 05-01-2015, 09:13 AM   #6
acorn54
Registered User
 
Join Date: Dec 2003
Location: new york
Posts: 1,631
the derby will always remain a mystery imo for a variety of significant reasons.
1-distance untested by ALL the contestants
2- with 20 horses, the element of racing luck much greater than normal, too much can go wrong during the 2 minutes of the race.
3-confidence level of model not high, because of uniqueness of race, unable to get a large enough sample for testing and validation.
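
Point 3 can be made concrete with a quick interval estimate. The Python sketch below computes a Wilson score interval for a hit rate, using the 6 top-choice winners in 10 Derbies reported in post #1 as the example. The function name `wilson_interval` is just a label for this sketch, and because those 10 races were also used to build the model, even this wide interval is optimistic.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = hits / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(6, 10)       # 6 top-choice winners in 10 Derbies
print(f"10 races: {lo:.2f} to {hi:.2f}")

lo50, hi50 = wilson_interval(30, 50)  # same 60% rate over a hypothetical 50 races
print(f"50 races: {lo50:.2f} to {hi50:.2f}")
```

With only 10 races the interval runs from roughly 0.31 to 0.83, so a 60% hit rate is consistent with almost anything. The 50-race line is purely hypothetical and is there only to show how the interval tightens with more data.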
Old 05-01-2015, 09:16 AM   #7
Magister Ludi
Registered User
 
Join Date: Oct 2012
Posts: 441
Monkeys and Typewriters

Quote:
Originally Posted by Helles
Perhaps somebody can explain or theorize how this analyst, using the same factors, weighted the same way, can do so well picking horses with such disparate final odds.
If you can afford the Purina Monkey Chow, you may eventually develop a model where you can predict the winner of every race and not just the Kentucky Derby.
Old 05-01-2015, 10:33 AM   #8
Helles
Registered User
 
Join Date: Nov 2010
Location: Denver, CO.
Posts: 217
It sounds like DeltaLover and Magister Ludi are saying that this is the perfect combination of factors, 48 of them, weighted just right, to be able to get 9 out of 10 winners in the top 2. I had not considered that I would be able to do that using so many factors in my final model. But that actually makes sense. Using fewer factors, say 20, would probably not backfit so well and fewer winners would have been hit.
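
A toy experiment makes the same point. The Python sketch below builds 10 fake races of 20 horses with 48 factors of pure random noise, searches random weightings for the one that puts the most "winners" in the top two, and then scores that same weighting on 10 fresh fake races. Everything here is an assumption for illustration: the data is synthetic, the crude random search stands in for whatever the real program does, and the names `top_n_hits` and `random_search` are made up.

```python
import random

def top_n_hits(weights, races, n=2):
    """Count races whose winner ranks in the top n by weighted score."""
    hits = 0
    for horses, winner in races:
        scores = [sum(w * x for w, x in zip(weights, h)) for h in horses]
        ranked = sorted(range(len(horses)), key=lambda i: scores[i], reverse=True)
        if winner in ranked[:n]:
            hits += 1
    return hits

def random_search(races, n_factors, tries, rng):
    """Keep the random weighting that backfits the given races best."""
    best_hits, best_w = -1, None
    for _ in range(tries):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_factors)]
        hits = top_n_hits(w, races)
        if hits > best_hits:
            best_hits, best_w = hits, w
    return best_hits, best_w

def fake_races(rng, n_races=10, field=20, n_factors=48):
    """Noise-only races: factor values and winners are both random."""
    return [([[rng.gauss(0.0, 1.0) for _ in range(n_factors)]
              for _ in range(field)], rng.randrange(field))
            for _ in range(n_races)]

rng = random.Random(2015)
races = fake_races(rng)
fit_hits, w = random_search(races, 48, tries=300, rng=rng)
fresh = fake_races(rng)
print("backfit hits:", fit_hits, "fresh-race hits:", top_n_hits(w, fresh))
```

A random ranking should put the winner in the top two of a 20-horse field about once in 10 races. Any gap between the backfit count and the fresh-race count comes entirely from the search, since there is no real signal in the data to find.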

Thank you for the input.
Old 05-01-2015, 10:48 AM   #9
Helles
Registered User
 
Join Date: Nov 2010
Location: Denver, CO.
Posts: 217
Quote:
Originally Posted by acorn54
the derby will always remain a mystery imo for a variety of significant reasons.
1-distance untested by ALL the contestants
2- with 20 horses, the element of racing luck much greater than normal, too much can go wrong during the 2 minutes of the race.
3-confidence level of model not high, because of uniqueness of race, unable to get a large enough sample for testing and validation.
And this is probably exactly why the same model that backfit so well to winners was not useful in identifying contenders.
Old 05-01-2015, 03:39 PM   #10
whodoyoulike
Veteran
 
Join Date: Aug 2005
Posts: 3,428
I don't know the reason for your results, but I hope you provide an update after the Derby is run (referencing post #1). Let us know whether you made a wager and the type of wager(s) using your contenders.

I'm just curious, would it be an easy task to show the fractional call times for each race and the fractional positions of each of the winning horses in the prior Derby races?

I'm a believer in understanding a race's pace scenario.

I'd copyright the word "PaceAdvantage", but there might be a problem trying this.
Old 05-01-2015, 03:48 PM   #11
Tom
The Voice of Reason!
 
Join Date: Mar 2001
Location: Canandaigua, New York
Posts: 112,887
http://www.anddownthestretchtheycome...pace-1980-2011

Splits are here. For horse positions, you can get the charts free at EB, BRIS.
__________________
Who does the Racing Form Detective like in this one?
Old 05-01-2015, 04:29 PM   #12
whodoyoulike
Veteran
 
Join Date: Aug 2005
Posts: 3,428
Thanks Tom,

Appears the early and 6f fractions were very fast for 2005 - 2011. Now, where were the winners at each fractional call?
Old 05-01-2015, 06:25 PM   #13
Pensacola Pete
Veteran
 
Join Date: Jan 2010
Posts: 729
They did the same thing with the Dosage Index, backfitting the Derby results by adjusting the indices. They finally gave up when too many odd results made it impossible.
Old 05-02-2015, 10:56 AM   #14
Dexter C. Hinton
dch
 
Join Date: Apr 2005
Location: New York City
Posts: 8
It is interesting how many individuals can only respond with:

a) Back Fitting (bah):

b) No statistical substance: (not enough data points, etc.)

For (a), the methodology you utilized of course was to determine, to your best ability, the correlated factors resulting from your analysis. It is easy to be critical and scream 'back fit'. I would rather give the benefit of the doubt, and say 'utilization of predictable elements'.

For (b), there will always be individuals who also scream 'No statistical substance' because of limited sample size, etc. From a pure statistician's view that will always be true, because we live in a world of limited data points (i.e., perhaps if you had detailed data going back a hundred or so years for this particular race, they might give you a break). I doubt it, however; they would most likely tell you that 100+ data points are not enough for statistical relevance.

The interesting point of your post is that, studying a specific track at a specific distance with a consistent level of participants (i.e., three (3) year olds, etc.), under ALL types of weather conditions, with what can be said to be too many horses, and utilizing a type of 'simple AI', you were able to legitimately test your algorithm against back races.

For the record, I do not put a lot of faith in 'black box handicapping'. However, as old as I am, one thing I have learned is that the game has changed. 'Dumb money' has become pretty much non-existent, and wagerers have to utilize all the resources in their toolbox. What you have articulated is one of those 'tools'.

This should not be criticized for what was done, but only questioned for its applicability to be useful in the future (in conjunction with the individual handicapping resources each one has).

As I am sure you are aware, it is much easier to criticize while offering no legitimate ideas or alternatives.

Even if your findings do apply to this year, maybe next year also, and beyond, you can rest assured that one year it will fail, and the response you will get will be "What do you expect of back fitting? It had no statistical significance to begin with".

dch
05/02
10:37.31..31
Old 05-02-2015, 11:07 AM   #15
traynor
Registered User
 
Join Date: Jan 2005
Posts: 6,626
I disagree. The point is not criticism of the OP's intent. It is criticism of a process used by many to create "betting models" that don't work in the real world. Especially if one bets on them.

All times are GMT -4. The time now is 06:42 AM.
Copyright ©2000 - 2024, vBulletin Solutions, Inc.
Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved