DAYS SINCE LAST START - Page 2 - Horse Racing Forum - PaceAdvantage.Com

AITrader · 11-19-2012, 04:33 AM

Quick 'n dirty formula to optimize days-last-race is:

( days-last-race_horse - days-last-race-avg-for-this-race ) / days-last-race-std-dev-for-this-race

This assumes that trainers/owners will, on average, optimize layoffs for this particular class, gender, type, etc of horse.

Horses with unknown or no history should be set to the average value.

I also recommend scaling values from 0.5 to -0.5, or whatever normalization is appropriate for your system.
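A rough sketch of the formula above in Python, for anyone who wants to try it. The function names, the choice to compute the average and standard deviation from the horses with known history only, and the min-max rescaling to [-0.5, 0.5] are assumptions made for illustration, not something the post specifies.

```python
from statistics import mean, pstdev

def days_off_zscores(days_by_horse):
    """Per-horse z-scores of days since last race within one race.

    days_by_horse maps horse name -> days since last race, or None when
    the horse has no known history (e.g. a first-time starter).
    """
    known = [d for d in days_by_horse.values() if d is not None]
    if not known:
        return {h: 0.0 for h in days_by_horse}
    avg = mean(known)
    sd = pstdev(known)  # assumption: population std dev over known horses
    scores = {}
    for horse, days in days_by_horse.items():
        if days is None or sd == 0:
            scores[horse] = 0.0  # unknown history -> race average -> z = 0
        else:
            scores[horse] = (days - avg) / sd
    return scores

def rescale(scores, lo=-0.5, hi=0.5):
    """Min-max rescale the z-scores of one race into [lo, hi]."""
    smin, smax = min(scores.values()), max(scores.values())
    if smax == smin:
        return {h: (lo + hi) / 2 for h in scores}
    return {h: lo + (s - smin) * (hi - lo) / (smax - smin)
            for h, s in scores.items()}

race = {"A": 10, "B": 20, "C": 30, "D": None}  # made-up example race
z = days_off_zscores(race)
scaled = rescale(z)
```

Here horse D (no history) lands on the race average, so it gets the same score as horse B.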

raybo · 11-19-2012, 08:00 AM

Quote:

Originally Posted by traynor

The same thing happens with ROI figures with a few aberrant mutuels tossed into the mix.

Agreed. That's why one must scan the individual numbers and/or look at the median as well.

Overlay · 11-19-2012, 08:50 AM

Quote:

Originally Posted by traynor

The same thing happens with ROI figures with a few aberrant mutuels tossed into the mix.

That's why I'm surprised with the number of posts that I see on the board that emphasize the ROI of various methods (especially when comparing one method with another), since that statistic is subject to outlier bias (as you note), and also to shrinkage as more people begin to play the method (assuming that everyone who plays it ends up betting the same selections).

DeltaLover · 11-19-2012, 09:33 AM

Quote:

Originally Posted by Overlay

That's why I'm surprised with the number of posts that I see on the board that emphasize the ROI of various methods (especially when comparing one method with another), since that statistic is subject to outlier bias (as you note), and also to shrinkage as more people begin to play the method (assuming that everyone who plays it ends up betting the same selections).

I could not agree more...

The following is the outcome of a genetic algorithm that is optimizing for bankroll growth. The fitness is the PNL of a hypothetical 10,000 starting bankroll.

Starters with less than 2-1 odds are not considered, while winners at more than 10-1 are truncated to 10-1.

The training universe consists of 5,000 randomly selected races:

(I've just realized that what appears as win% is actually the win probability and has to be multiplied by 100 to become a percentage.)

Code:

generation#:  149

All chromosomes  for generation 

BF: 0.40 total bets:   2008 win: 0.21%  ROI: 1.22 mean odds: 5.42 max odds: 10.00 min odds: 2.05 fitness =  43625.0
BF: 0.40 total bets:   2008 win: 0.21%  ROI: 1.22 mean odds: 5.42 max odds: 10.00 min odds: 2.05 fitness =  43625.0
BF: 0.40 total bets:   2008 win: 0.21%  ROI: 1.22 mean odds: 5.42 max odds: 10.00 min odds: 2.05 fitness =  43625.0
BF: 0.67 total bets:   3358 win: 0.16%  ROI: 0.88 mean odds: 6.14 max odds: 10.00 min odds: 2.05 fitness =  -40695.0
BF: 0.06 total bets:    286 win: 0.25%  ROI: 1.10 mean odds: 3.98 max odds: 10.00 min odds: 2.05 fitness =  2995.0
BF: 0.62 total bets:   3112 win: 0.09%  ROI: 0.68 mean odds: 8.22 max odds: 10.00 min odds: 2.05 fitness =  -100370.0
BF: 0.64 total bets:   3199 win: 0.10%  ROI: 0.68 mean odds: 8.17 max odds: 10.00 min odds: 2.05 fitness =  -101875.0
BF: 0.01 total bets:     52 win: 0.37%  ROI: 1.32 mean odds: 3.04 max odds: 8.60 min odds: 2.05 fitness =  1640.0
BF: 0.64 total bets:   3186 win: 0.13%  ROI: 0.81 mean odds: 6.46 max odds: 10.00 min odds: 2.05 fitness =  -59215.0
BF: 0.10 total bets:    516 win: 0.23%  ROI: 1.11 mean odds: 4.43 max odds: 10.00 min odds: 2.05 fitness =  5755.0
BF: 0.55 total bets:   2739 win: 0.14%  ROI: 0.85 mean odds: 6.47 max odds: 10.00 min odds: 2.05 fitness =  -40770.0
BF: 0.02 total bets:     86 win: 0.24%  ROI: 1.29 mean odds: 4.40 max odds: 10.00 min odds: 2.05 fitness =  2480.0
BF: 0.67 total bets:   3325 win: 0.16%  ROI: 0.96 mean odds: 6.24 max odds: 10.00 min odds: 2.05 fitness =  -14535.0
BF: 0.72 total bets:   3579 win: 0.11%  ROI: 0.76 mean odds: 7.62 max odds: 10.00 min odds: 2.05 fitness =  -85750.0
BF: 0.00 total bets:     10 win: 0.40%  ROI: 2.23 mean odds: 6.31 max odds: 10.00 min odds: 2.10 fitness =  1225.0
BF: 0.00 total bets:     18 win: 0.39%  ROI: 1.56 mean odds: 3.84 max odds: 10.00 min odds: 2.10 fitness =  1015.0
BF: 0.54 total bets:   2684 win: 0.17%  ROI: 1.06 mean odds: 6.42 max odds: 10.00 min odds: 2.05 fitness =  15470.0
BF: 0.64 total bets:   3222 win: 0.13%  ROI: 0.81 mean odds: 7.61 max odds: 10.00 min odds: 2.05 fitness =  -61590.0
BF: 0.04 total bets:    197 win: 0.23%  ROI: 0.91 mean odds: 3.57 max odds: 10.00 min odds: 2.05 fitness =  -1785.0
BF: 0.58 total bets:   2898 win: 0.14%  ROI: 0.91 mean odds: 6.87 max odds: 10.00 min odds: 2.05 fitness =  -27395.0
BF: 0.22 total bets:   1081 win: 0.09%  ROI: 0.64 mean odds: 8.26 max odds: 10.00 min odds: 2.05 fitness =  -39220.0
BF: 0.12 total bets:    625 win: 0.06%  ROI: 0.51 mean odds: 9.20 max odds: 10.00 min odds: 2.15 fitness =  -30795.0
BF: 0.15 total bets:    747 win: 0.08%  ROI: 0.50 mean odds: 8.25 max odds: 10.00 min odds: 2.10 fitness =  -37655.0
BF: 0.50 total bets:   2520 win: 0.16%  ROI: 0.80 mean odds: 5.42 max odds: 10.00 min odds: 2.05 fitness =  -50910.0
BF: 0.17 total bets:    858 win: 0.21%  ROI: 1.19 mean odds: 5.43 max odds: 10.00 min odds: 2.05 fitness =  16340.0
BF: 0.51 total bets:   2549 win: 0.05%  ROI: 0.40 mean odds: 9.41 max odds: 10.00 min odds: 2.15 fitness =  -154170.0
BF: 0.24 total bets:   1222 win: 0.07%  ROI: 0.46 mean odds: 8.11 max odds: 10.00 min odds: 2.05 fitness =  -66505.0
BF: 0.25 total bets:   1254 win: 0.10%  ROI: 0.66 mean odds: 7.94 max odds: 10.00 min odds: 2.05 fitness =  -42165.0
BF: 0.15 total bets:    770 win: 0.23%  ROI: 1.17 mean odds: 4.53 max odds: 10.00 min odds: 2.05 fitness =  12895.0
BF: 0.19 total bets:    939 win: 0.17%  ROI: 0.88 mean odds: 5.22 max odds: 10.00 min odds: 2.05 fitness =  -10865.0
BF: 0.41 total bets:   2048 win: 0.05%  ROI: 0.44 mean odds: 9.16 max odds: 10.00 min odds: 2.10 fitness =  -114350.0
BF: 0.67 total bets:   3368 win: 0.14%  ROI: 0.87 mean odds: 6.69 max odds: 10.00 min odds: 2.05 fitness =  -45255.0
BF: 0.80 total bets:   3983 win: 0.07%  ROI: 0.47 mean odds: 8.75 max odds: 10.00 min odds: 2.05 fitness =  -209940.0
BF: 0.20 total bets:   1015 win: 0.18%  ROI: 0.98 mean odds: 5.23 max odds: 10.00 min odds: 2.05 fitness =  -2520.0
BF: 0.63 total bets:   3147 win: 0.08%  ROI: 0.47 mean odds: 7.73 max odds: 10.00 min odds: 2.05 fitness =  -165895.0
BF: 0.71 total bets:   3542 win: 0.15%  ROI: 0.84 mean odds: 5.81 max odds: 10.00 min odds: 2.05 fitness =  -55760.0
BF: 0.59 total bets:   2940 win: 0.10%  ROI: 0.75 mean odds: 8.36 max odds: 10.00 min odds: 2.05 fitness =  -72890.0
BF: 0.15 total bets:    738 win: 0.20%  ROI: 1.12 mean odds: 5.25 max odds: 10.00 min odds: 2.05 fitness =  8935.0
BF: 0.71 total bets:   3553 win: 0.11%  ROI: 0.61 mean odds: 6.75 max odds: 10.00 min odds: 2.05 fitness =  -138195.0
BF: 0.00 total bets:     15 win: 0.33%  ROI: 1.76 mean odds: 3.25 max odds: 8.00 min odds: 2.05 fitness =  1135.0

number of chromosomes:  40 
Elitism Factor: 0.05 
Mutation Rate: 0.18

 winner chromosome: 
 0.287 -0.145 -0.100 0.354 -0.055 -0.243 0.428 0.152 0.202 0.464 0.335 -0.020 Fitness: 43625.0

****************************************************************************************************

The winner behavior is the following:

BF: 0.40 total bets: 2008 win: 0.21% ROI: 1.22 mean odds: 5.42 max odds: 10.00 min odds: 2.05 fitness = 43625.0
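For readers unfamiliar with this kind of run, here is a minimal sketch of a genetic algorithm loop using the population size (40), elitism factor (0.05), mutation rate (0.18) and chromosome length (12) shown in the output. Everything else is an assumption for illustration: the truncation selection, one-point crossover, Gaussian mutation noise and the toy fitness function are not taken from the post, where the fitness is the PNL of simulated betting.

```python
import random

def evolve(fitness, n_genes=12, pop_size=40, elitism=0.05,
           mutation_rate=0.18, generations=150, seed=0):
    """Maximize fitness(chromosome) with a very plain genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-0.5, 0.5) for _ in range(n_genes)]
           for _ in range(pop_size)]
    n_elite = max(1, round(elitism * pop_size))
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        next_pop = [c[:] for c in ranked[:n_elite]]  # elites survive unchanged
        parents = ranked[:pop_size // 2]             # assumed: keep the top half
        while len(next_pop) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # one-point crossover
            child = mom[:cut] + dad[cut:]
            # Each gene mutates independently with probability mutation_rate.
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    winner = max(pop, key=fitness)
    return winner, best_per_generation

def toy_fitness(chrom):
    """Stand-in fitness: best when every weight equals 0.25."""
    return -sum((g - 0.25) ** 2 for g in chrom)

winner, history = evolve(toy_fitness, generations=30)
```

Because the elites are copied forward unchanged, the best fitness in `history` never drops from one generation to the next.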

Now back-testing this chromosome on another randomly selected universe of 6,100 races, none of them included in the original,
we have the following behavior:

Code:

>ga_example.py 
final pnl:  8775.0
all races: 6100 
all best: 2447 
all winners: 495 
win: 20.23%
roi:  1.03586023702

Again, I skip starters at less than 2-1, and if a winner is more than 10-1 I reset it to 10-1.

This is the same run without odds restrictions:

Code:

>ga_example.py 
final pnl:  23150.0
all races: 6100 
all best: 2447 
all winners: 495 
win: 20.23%
roi:  1.09460563956

I think this is a good demonstration of how misleading outliers can be in the calculation of ROI.
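The effect is easy to reproduce on made-up data. The sketch below is not ga_example.py; the function name, the flat 100-unit stake and the sample bets are assumptions for illustration. (The posted figures are at least consistent with a flat stake of 100: 0.0359 x 2447 x 100 is about 8775.)

```python
def flat_bet_results(bets, stake=100.0, min_odds=None, odds_cap=None):
    """Summarize flat win bets.

    bets: iterable of (odds_to_one, won) pairs, e.g. (5.4, True).
    min_odds: skip starters whose odds are below this value.
    odds_cap: winners at higher odds are paid as if they went off at the cap.
    """
    n_bets = 0
    returned = 0.0
    for odds, won in bets:
        if min_odds is not None and odds < min_odds:
            continue
        n_bets += 1
        if won:
            paid_odds = odds if odds_cap is None else min(odds, odds_cap)
            returned += stake * (paid_odds + 1.0)  # winnings plus stake back
    wagered = stake * n_bets
    roi = returned / wagered if wagered else 0.0
    return {"bets": n_bets, "pnl": returned - wagered, "roi": roi}

# Hypothetical sample: two winners (one at 45-1) and 18 losers.
bets = [(3.0, True), (45.0, True)] + [(5.0, False)] * 18

raw = flat_bet_results(bets)
capped = flat_bet_results(bets, min_odds=2.0, odds_cap=10.0)
print(raw)
print(capped)
```

On this sample the single 45-1 winner is the whole story: uncapped, the ROI is 2.50; with that winner paid at 10-1, the same 20 bets show an ROI of 0.75.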

traynor · 11-19-2012, 09:35 AM

Quote:

Originally Posted by Overlay

That's why I'm surprised with the number of posts that I see on the board that emphasize the ROI of various methods (especially when comparing one method with another), since that statistic is subject to outlier bias (as you note), and also to shrinkage as more people begin to play the method (assuming that everyone who plays it ends up betting the same selections).

It is even more surprising to discover how adamantly opposed otherwise rational people become when it is suggested their ROI figures may be skewed by a few unusual (and unlikely to repeat) mutuel payoffs. That data is skewed by outliers is a given, and most researchers routinely correct for such. Accepting an unusually fast or unusually slow time on a given day and throwing it into an average figure for a DTV, for example, is an error that only a rank amateur would make. The same caution should be exercised in other areas of research as well--including calculations of average days off.

raybo · 11-19-2012, 09:44 AM

Quote:

Originally Posted by Overlay

That's why I'm surprised with the number of posts that I see on the board that emphasize the ROI of various methods (especially when comparing one method with another), since that statistic is subject to outlier bias (as you note), and also to shrinkage as more people begin to play the method (assuming that everyone who plays it ends up betting the same selections).

Agree. The method developer must include user options/customizations that allow emphasis to be placed on goals other than positive ROI, regarding degree, like number of plays, hit rate %, profit, etc.

traynor · 11-19-2012, 09:46 AM

Quote:

Originally Posted by DeltaLover

I could not agree more...

...

Again I skip less than 2-1 and if a winner is more than 10-1 I reset it to 10-1

I think this is a good demonstration of how misleading outliers can be in the calculation of ROI.

Of all things related to horse race analysis, I think the one factor that will contribute more to becoming a consistent winner than any other is correcting for mutuel outliers. There are few things more frustrating than chasing rainbows believing that profit exists where it does not. And there are few things more beneficial than realizing that the apparent profit does not really exist, and that continued study and research is necessary to find it.

The upside is that when models are controlled for outliers, when and if those outliers crop up in future results, the actual ROI is always better than anticipated.

traynor · 11-19-2012, 09:55 AM

Quote:

Originally Posted by raybo

Agree. The method developer must include user options/customizations that allow emphasis to be placed on goals other than positive ROI, regarding degree, like number of plays, hit rate %, profit, etc..

Most useful is the ability to search results with toggles for odds ranges. When a model is developed that shows a positive ROI, rather than betting on it with both hands, the researcher is well advised to re-run the same data with the odds range set to a more reasonable range to see if the positive ROI still exists. That is, with filters set in a given pattern, an ROI that shows up when "any odds" are searched may change substantially when "only odds <= 10-1" are considered. Especially with small samples.

I understand that the topic of this thread is days off. My comments are not meant as a digression, but rather to point out the necessity of controlling for outliers in ANY type of serious research.
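A small sketch of the odds-range toggle described above, assuming flat 1-unit win bets stored as (odds, won) pairs. The function name, the range labels and the sample data are hypothetical.

```python
def roi_by_odds_range(bets, ranges):
    """Return {label: (n_bets, roi)} for flat 1-unit win bets per odds range.

    bets: iterable of (odds_to_one, won) pairs.
    ranges: {label: (low, high)}, inclusive bounds; None means open-ended.
    """
    bets = list(bets)
    out = {}
    for label, (low, high) in ranges.items():
        picked = [(o, w) for o, w in bets
                  if (low is None or o >= low) and (high is None or o <= high)]
        returned = sum(o + 1.0 for o, w in picked if w)
        roi = returned / len(picked) if picked else None
        out[label] = (len(picked), roi)
    return out

# Hypothetical small sample: three winners (one at 60-1) and 27 losers.
sample = [(2.5, True), (4.0, True), (60.0, True)] + [(6.0, False)] * 27

report = roi_by_odds_range(sample, {
    "any odds": (None, None),
    "odds <= 10-1": (None, 10.0),
})
for label, (n, roi) in report.items():
    print(label, n, roi)
```

With this sample the "any odds" search looks strongly profitable, while the "odds <= 10-1" search over 29 of the same 30 bets is a heavy loser.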

Dave Schwartz · 11-19-2012, 10:18 AM

Quote:

Most useful is the ability to search results with toggles for odds ranges. When a model is developed that shows a positive ROI, rather than betting on it with both hands, the researcher is well advised to re-run the same data with the odds range set to a more reasonable range to see if the positive ROI still exists. That is, with filters set in a given pattern, an ROI that shows up when "any odds" are searched may change substantially when "only odds <= 10-1" are considered. Especially with small samples.

Traynor,

Another technique that I like is adding some randomness. If a horse won by a nostril hair and pays $68, that should have a different value than if the horse won by 12 lengths.

And sometimes thread drift is a good thing.

Dave

mountainman · 11-19-2012, 11:28 AM

Quote:

Originally Posted by Jeff P

For the all starters coming back to race within 45 days sample above:

Sum of days last start, all starters: 5,822,538
Number of starters in the sample: 273,000
Avg number of days since last race: 21.33

For the MNR only sample above:

Sum of days last start, all starters: 226,089
Number of starters in the sample: 11,406
Avg number of days since last race: 19.82

-jp

More than encompassing, perfectly distilled, and supportive of my argument. Many thanks, sir. As always, your thoroughness astounds me.

DeltaLover · 11-19-2012, 11:30 AM

Quote:

Originally Posted by traynor

Most useful is the ability to search results with toggles for odds ranges. When a model is developed that shows a positive ROI, rather than betting on it with both hands, the researcher is well advised to re-run the same data with the odds range set to a more reasonable range to see if the positive ROI still exists. That is, with filters set in a given pattern, an ROI that shows up when "any odds" are searched may change substantially when "only odds <= 10-1" are considered. Especially with small samples.

I understand that the topic of this thread is days off. My comments are not meant as a digression, but rather to point out the necessity of controlling for outliers in ANY type of serious research.

I agree.

Instead of a generic model, using several more specialized ones based on odds ranking or odds makes the whole process more reliable.

For example a complete strategy might consist of the following models:

Favorite
Less than 5-1
Less than 10-1
More than 10-1

We can specialize each model to have more than one opinion, meaning that the more-than-10-1 model might give us none, one, two or more candidates, or we can create negative models focusing on very low returns.

Having more than one final selection can be a sign to pass, at first glance at least, although things get more complicated if we consider the possibilities of dutching or exotics, and most likely we need another model to make this decision for us.

Although I am reluctant to bet long exotic propositions, the low-takeout pick 5 running in CA looks interesting, and multi-selection models might be the way to go. I still find it hard, though, to bet more than one starter for the first spot, albeit such an approach seems like a necessity to attack this pool.... I would certainly have liked it better if the minimum were not set at fifty cents but higher, preferably two dollars...

DeltaLover · 11-19-2012, 11:32 AM

Quote:

Originally Posted by Dave Schwartz

Traynor,
If a horse won by a nostril hair and pays $68 that should have different value than if the horse won by 12 lengths.
Dave

Why so?

I really cannot understand it. This is a binary event after all.

Dave Schwartz · 11-19-2012, 11:45 AM

It is a "binary event" if you choose for it to be.

In other words, if a horse wins by a nose and the race were run 100 times, that horse would likely not win 100 times. Perhaps it would be 50-50 with the 2nd horse. Or 52-48... you choose. But not likely 100-0.

On the other hand, if the horse won by 12 lengths then that is pretty much 100-0.
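One hypothetical way to put numbers on this idea is to credit each winner with its expected profit over many imagined reruns instead of its full payoff. This is only a sketch: the exponential mapping from winning margin to repeat-win probability, the 2-length scale and the function names are assumptions, and it uses a fixed expectation where the post speaks of adding randomness. A $68 payoff is read here as 33-1 on a $2 base bet.

```python
import math

def repeat_win_probability(margin_lengths, scale=2.0):
    """Assumed chance the winner would win a rerun.

    Starts at 0.5 for a zero margin and approaches 1.0 for wide margins.
    """
    return 1.0 - 0.5 * math.exp(-margin_lengths / scale)

def credited_profit(odds_to_one, margin_lengths, stake=1.0):
    """Expected profit on the winner if the race were rerun many times."""
    p = repeat_win_probability(margin_lengths)
    return stake * (p * odds_to_one - (1.0 - p))

photo_finish = credited_profit(33.0, 0.0)   # won by a nose
runaway = credited_profit(33.0, 12.0)       # won by 12 lengths
print(photo_finish, runaway)
```

Under these assumptions the photo-finish winner is credited with roughly half of its 33-unit profit, while the 12-length winner keeps almost all of it.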

DeltaLover · 11-19-2012, 12:04 PM

Quote:

Originally Posted by Dave Schwartz

It is a "binary event" if you choose for it to be.

I have to disagree.

It is not what you 'choose', it is what happens in the real world.

It is a binary event because this is how the game operates.

If there were a spread, as in football for example, then it would be a different case.

Horse racing does not operate like this. A nose has the same value as Secretariat's Belmont.

mountainman · 11-19-2012, 12:50 PM

Quote:

Originally Posted by traynor

Most useful is the ability to search results with toggles for odds ranges. When a model is developed that shows a positive ROI, rather than betting on it with both hands, the researcher is well advised to re-run the same data with the odds range set to a more reasonable range to see if the positive ROI still exists. That is, with filters set in a given pattern, an ROI that shows up when "any odds" are searched may change substantially when "only odds <= 10-1" are considered. Especially with small samples.

I understand that the topic of this thread is days off. My comments are not meant as a digression, but rather to point out the necessity of controlling for outliers in ANY type of serious research.

Sharp post.