Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 03-05-2010, 09:53 PM   #1
2low
Registered User
 
Join Date: Dec 2007
Posts: 310
Black Box sample size?

So I'm in the early stages of adding tracks and testing my black box handicapping software. What's conventional wisdom for how many races are needed to gauge the box's ability? I'm placing win bets only at this point.
Old 03-05-2010, 11:14 PM   #2
Handiman
BarelyWinning
 
Join Date: Oct 2005
Location: Santa Rosa, California
Posts: 2,828
Not sure what others might say, but I believe you need to have at least 200 races. That should give you a reality check at least.

Best to even break down distances, surfaces and other things such as tracks, class levels if possible.

Handi
Old 03-05-2010, 11:52 PM   #3
2low
Registered User
 
Join Date: Dec 2007
Posts: 310
Quote:
Originally Posted by Handiman
Not sure what others might say, but I believe you need to have at least 200 races. That should give you a reality check at least.

Best to even break down distances, surfaces and other things such as tracks, class levels if possible.

Handi
Thanks - yes, I forgot to mention I have my software broken down by track, then sprint/route. Dirt and rubber balls only at this point.
Old 03-06-2010, 08:27 AM   #4
goforgin
goforgin
 
Join Date: Mar 2008
Location: Carol Stream, IL
Posts: 220
I agree, it depends on how many factors there are in your black box (e.g. speed, pace, trainer/jockey %, earnings per start). Then break it down further by class, distance, surface, odds, etc. 200 to 1,000 races should give you an idea of where you're at. In my opinion, tracking the class, distance, surface and odds is as important as what's in your black box. Then you may be able to determine what works for 4-1 to 9-1 contenders. Otherwise you may end up on a lot of 3-5 to 2-1 shots. Also, unless you have a fast, programmed tracking system, I recommend keeping the black box factors simple to start, then enhancing or adding on down the road once you get comfortable with your process. Try not to "boil the ocean," as they say, in the first few weeks.
__________________
"I want to be Bob Dylan, Mr. Jones wishes he was someone just a little more funky"
Old 03-06-2010, 08:54 AM   #5
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
There may be practical reasons for starting with a sample this small but I would not place much stock in any tests unless far larger samples had been run (thousands or if possible tens of thousands of races).

If you test 200 races, a single $20 horse in a head bob can make a significant difference in your apparent return.

Going against the grain, I would say that when you are dealing with samples this small, subdividing them into even smaller groups is counterproductive.
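To put rough numbers on the head-bob point, here is a minimal sketch. It assumes flat $2 win bets; the function name `roi_swing` and the sample sizes are illustrative, not anything from the thread.

```python
def roi_swing(payoff, n_bets, bet_size=2.0):
    """Change in flat-bet ROI (as a fraction) caused by winning or losing one payoff."""
    total_wagered = bet_size * n_bets
    return payoff / total_wagered

# One $20 horse decided by a photo, measured against different sample sizes.
for n in (200, 1000, 3000, 10000):
    print(f"{n:6d} bets: one $20 payoff moves ROI by {roi_swing(20.0, n):.2%}")
```

With 200 bets, $400 has been wagered, so a single $20 payoff is worth five percentage points of ROI. The same photo finish matters far less as the number of bets grows.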
Old 03-06-2010, 09:35 AM   #6
Dave Schwartz
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,918
I have seen some wonderful tests win at 800 bets and wind up losing at 2,000. I believe if you are still winning at 3,000 wagers you've locked in.

200 races would be just absurdly small, but median odds makes a huge difference. If you are playing horses that average (say) $8.00, then 1,000 races should be plenty. If you have the occasional $35 horse, then 3,000 is probably a good number.
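One rough way to see why payoff size drives the sample you need is to estimate the standard error of flat-bet ROI. This is only a sketch under strong assumptions: every winner pays the same mutuel, bets are independent, and the true long-run result is break-even. The function name and parameters are hypothetical.

```python
import math

def roi_standard_error(mutuel, n_bets, bet_size=2.0, roi=0.0):
    """Approximate standard error of flat-bet ROI over n_bets wagers.

    Assumes every winning bet pays the same `mutuel` (a simplification)
    and that the true long-run ROI is `roi` (0.0 means break-even).
    """
    payoff_per_dollar = mutuel / bet_size        # $8.00 on a $2 bet -> 4.0
    hit_rate = (1.0 + roi) / payoff_per_dollar   # win rate implied by that ROI
    per_bet_sd = payoff_per_dollar * math.sqrt(hit_rate * (1.0 - hit_rate))
    return per_bet_sd / math.sqrt(n_bets)

for mutuel in (8.0, 35.0):
    for n in (200, 1000, 3000):
        se = roi_standard_error(mutuel, n)
        print(f"${mutuel:5.2f} winners, {n:5d} bets: ROI +/- {se:.1%} (one standard error)")
```

Under these assumptions the uncertainty band shrinks with the square root of the number of bets and widens as the average payoff goes up, so longer-priced plays need more races for the same confidence.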

Last edited by Dave Schwartz; 03-06-2010 at 09:36 AM.
Old 03-06-2010, 11:32 AM   #7
2low
Registered User
 
Join Date: Dec 2007
Posts: 310
Thanks - this all helps give me an idea. I was expecting people to come in with 10,000 races

My box doesn't have a ton of factors. I've kept it pretty simple for now. I'm not a programmer, but I am a programmer wannabe, so I'm in Excel right now with a lot of the work in VBA. My goal is to get it to around break-even and then learn Access and migrate my operation to a database for further refinement.

I'm purposely trying to toss out high priced horses, so a single monster fluke shouldn't be a problem. I'll be living in the 3/1 - 7/1 range.

I'm playing only my top pick with conditional odds, so I'm getting bets down on about 40% of the races I handicap so far, but I'm only about 60ish races into my testing overall. That said, I've set it up so I can see what would have happened with my 2nd - 4th picks as well. I'm also tracking what-if exacta and trifecta bets.

This is fun, win or lose. At the very least my loss per hour will skyrocket
Old 03-06-2010, 04:03 PM   #8
gm10
Registered User
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by 2low
So I'm in the early stages of adding tracks and testing my black box handicapping software. What's conventional wisdom for how many races are needed to gauge the box's ability? I'm placing win bets only at this point.
It depends on your model. The more complex you make it, the more data you need.
Bill Benter wrote that he needed 5 years' worth of data to get a stable model (that was in Hong Kong, where there are two tracks that run for about 2/3 of the year, I suppose).
Old 03-06-2010, 05:21 PM   #9
Dave Schwartz
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,918
Hong Kong: 35 weeks of two racing cards per week.

700 races a year.
Old 03-06-2010, 06:55 PM   #10
2low
Registered User
 
Join Date: Dec 2007
Posts: 310
I might just stick with Tampa and forget the rest
Old 03-06-2010, 07:27 PM   #11
CBedo
AllAboutTheROE
 
Join Date: Aug 2006
Location: Denver
Posts: 2,411
It depends on your hit rate and average payout as well. If you're trying to find statistical significance, there is a quick and dirty formula/explanation in Mitchell's Commonsense Betting. It provides some good rules of thumb and should give you a basis of things to think about.
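I don't have Mitchell's formula in front of me, so the following is not it. It is a textbook normal-approximation sketch of how hit rate and average payout translate into a required number of bets, assuming every winner pays the same average mutuel and bets are independent. The function name and the example inputs are made up for illustration.

```python
import math

def bets_needed(hit_rate, avg_mutuel, bet_size=2.0, z=2.0):
    """Bets needed before the expected edge sits `z` standard errors above break-even.

    Textbook normal approximation; assumes every winner pays `avg_mutuel`.
    Returns None when the inputs imply no positive edge.
    """
    payoff_per_dollar = avg_mutuel / bet_size
    edge = hit_rate * payoff_per_dollar - 1.0    # expected ROI per $1 wagered
    if edge <= 0:
        return None
    per_bet_sd = payoff_per_dollar * math.sqrt(hit_rate * (1.0 - hit_rate))
    return math.ceil((z * per_bet_sd / edge) ** 2)

# Hypothetical inputs: hit rate and average $2 win mutuel.
for hit_rate, mutuel in ((0.30, 8.0), (0.27, 8.0), (0.12, 20.0)):
    print(f"hit rate {hit_rate:.0%}, average ${mutuel:.2f}: {bets_needed(hit_rate, mutuel)} bets")
```

The thinner the edge, the faster the required sample grows, since the count scales with the inverse square of the edge.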
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
Old 03-06-2010, 07:56 PM   #12
Handiman
BarelyWinning
 
Join Date: Oct 2005
Location: Santa Rosa, California
Posts: 2,828
Sample size is directly related to time period. You could have every race for 10 years from 1923-1933... but what was winning then might not be winning now.

So size of sample is just part of the picture. When I mentioned 200 races, I was talking about 200 of whatever you were looking at, so if it's sprints then 200 of those. If it's 4 year olds then 200 of them and so forth.

Hope that clears that up a bit.

Handi
Old 03-06-2010, 09:18 PM   #13
Jeff P
Registered User
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,291
A few random thoughts/general guidelines worth considering...

First, sample size alone might not be enough. It almost goes without saying that you need to be measuring or modeling something that's causal in nature.

For example, I've seen some surprisingly large samples where if you flat bet $2.00 to win on every horse whose name happens to begin with the letter "H" you'd have shown a profit. Is that relevant? Unless someone fairly knowledgeable about breeding is able to do a follow-up study to show that the offspring of certain overlooked sires tend to end up with names beginning with the letter "H", I'd tend to think something like that is just "noise" in the data set.
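A small simulation can illustrate how that kind of noise shows up. This is a sketch on made-up data, not a real database: every starter gets a random first letter, wins 10% of the time, and pays $16.00, so the true flat-bet ROI for every letter is -20%. All names and numbers here are assumptions.

```python
import random
import string

def simulate_letter_rois(n_starters=5000, hit_rate=0.10, mutuel=16.0, seed=1):
    """Flat $2 win-bet ROI by first letter when the letter carries no information."""
    rng = random.Random(seed)
    letters = string.ascii_uppercase
    wagered = dict.fromkeys(letters, 0.0)
    returned = dict.fromkeys(letters, 0.0)
    for _ in range(n_starters):
        letter = rng.choice(letters)       # first letter assigned purely at random
        wagered[letter] += 2.0
        if rng.random() < hit_rate:        # win chance is identical for every letter
            returned[letter] += mutuel
    return {k: returned[k] / wagered[k] - 1.0 for k in letters if wagered[k] > 0}

rois = simulate_letter_rois()
profitable = sorted(k for k, v in rois.items() if v > 0)
print(f"True ROI for every letter: {0.10 * 16.0 / 2.0 - 1.0:+.0%}")
print(f"Letters showing a flat-bet profit by luck alone: {profitable}")
```

When you slice one data set 26 ways, it would not be surprising for a few slices to look profitable purely by chance, which is why a causal explanation matters more than the raw result.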

Second, if you happen to be looking at samples where you are measuring the right causal factors (and I think most good players have something in mind that fits their own definition of the right causal factors), most samples, even 200-race samples, which are admittedly small, will tend to show some form of solid promise. So depending on the factor mix of whatever it is you are measuring or modeling, you might not need anything larger than a 200-race sample.

So how do you know when it's worth taking a shot at the windows relying on results from a smallish sample vs. a much larger one?

In practice, I like to validate my models.

By validate, I mean confront whatever it is you are measuring or modeling against a fresh set of races not used or seen in the development sample... with the development sample being defined as the data set you used to create the model in the first place.

There are a number of ways to do this. In my own JCapper samples I assign a randomly generated number to every starter in the database and use sql expressions to commit a certain percentage of starters to the development sample and the rest to the validation sample. Another way might be to simply have data from two different (recent) time periods sitting on different folders and use folder A for development and folder B for validation.
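For anyone working outside JCapper, the random-number approach could be sketched roughly like this. It is a Python stand-in, not JCapper's actual SQL; the record layout (a dict with `id` and `payoff` keys), the function names, and the 60/40 split are all assumptions for illustration.

```python
import random

def split_starters(starters, dev_fraction=0.5, seed=42):
    """Assign each starter a random number and route it to development or validation."""
    rng = random.Random(seed)
    development, validation = [], []
    for starter in starters:
        if rng.random() < dev_fraction:
            development.append(starter)
        else:
            validation.append(starter)
    return development, validation

def flat_bet_roi(starters, bet_size=2.0):
    """Flat-bet ROI as a fraction; `payoff` is the win mutuel, or 0.0 for a loser."""
    if not starters:
        raise ValueError("no starters in sample")
    wagered = bet_size * len(starters)
    returned = sum(s["payoff"] for s in starters)
    return returned / wagered - 1.0

# Toy data: 2,000 hypothetical plays, 25% winners paying $8.00.
data_rng = random.Random(0)
starters = [{"id": i, "payoff": 8.0 if data_rng.random() < 0.25 else 0.0}
            for i in range(2000)]

dev, val = split_starters(starters, dev_fraction=0.6)
print(f"development: {len(dev)} plays, ROI {flat_bet_roi(dev):+.1%}")
print(f"validation:  {len(val)} plays, ROI {flat_bet_roi(val):+.1%}")
```

The model is built only on the development list, then the same rules are run untouched against the validation list and the two ROI figures are compared.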

If your validation sample produces results that are similar to what was observed in the development sample then I'd tend to think you are onto something. One further point in this area is that the game is constantly (slowly) evolving. Therefore all models have shelf lives. Most of the time when you are working with past data you will be observing things that others will have had a chance to notice too. Much of the time what you are modeling will still have some shelf life left in it, but your validation results will be just a skosh below your development results performance-wise. Models that fit this description can be thought of as being on the downward side of the shelf life cycle... with the shelf life having an unknown/yet to be determined duration.

Every once in a while you'll discover something towards the beginning of the shelf life cycle. One indication of that is that each successive validation sample outshines the previous one. (Believe it or not, it does happen.)

When results produced by the validation sample leave a lot to be desired... as is frequently the case... experience has taught me that sitting on the sidelines and not putting the model into live play can be a wise thing to do.


-jp

.
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com

Last edited by Jeff P; 03-06-2010 at 09:21 PM.
Old 03-06-2010, 09:56 PM   #14
CBedo
AllAboutTheROE
 
Join Date: Aug 2006
Location: Denver
Posts: 2,411
To summarize, I think what most are saying is that sample size doesn't matter if the sample is bad!
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
Old 03-07-2010, 12:53 AM   #15
Handiman
BarelyWinning
 
Join Date: Oct 2005
Location: Santa Rosa, California
Posts: 2,828
Try and find a woman who says sample size doesn't matter....


Handi