
Black Box sample size?


2low
03-05-2010, 09:53 PM
So I'm in the early stages of adding tracks and testing my black box handicapping software. What's conventional wisdom for how many races are needed to gauge the box's ability? I'm placing win bets only at this point.

Handiman
03-05-2010, 11:14 PM
Not sure what others might say, but I believe you need to have at least 200 races. That should give you a reality check at least.

Best to even break down distances, surfaces and other things such as tracks, class levels if possible.

Handi:)

2low
03-05-2010, 11:52 PM
Not sure what others might say, but I believe you need to have at least 200 races. That should give you a reality check at least.

Best to even break down distances, surfaces and other things such as tracks, class levels if possible.

Handi:)

Thanks - yes, I forgot to mention I have my software broken down by track, then sprint/route. Dirt and rubber balls only at this point.

goforgin
03-06-2010, 08:27 AM
I agree, it depends on how many factors there are in your black box (e.g. speed, pace, trainer/jockey %, earnings per start). Then further break down by class, distance, surface, odds, etc. 200 to 1,000 races should give you an idea of where you're at. In my opinion, tracking the class, distance, surface and odds is as important as what's in your black box. Then you may be able to determine what works for 4-1 to 9-1 contenders. Otherwise you may end up on a lot of 3-5 to 2-1 shots. Also, unless you have a fast and programmed tracking system, I recommend keeping the black box factors simple to start and then enhancing or adding on down the road once you get comfortable with your process. Try not to "boil the ocean," as they say, in the first few weeks.

sjk
03-06-2010, 08:54 AM
There may be practical reasons for starting with a sample this small but I would not place much stock in any tests unless far larger samples had been run (thousands or if possible tens of thousands of races).

If you test 200 races, a single $20 horse in a head bob can make a significant difference in your apparent return.
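
To put rough numbers on it (these figures are made up, just to show the size of the swing):

# How much one $20 winner in a photo finish moves apparent ROI
# across 200 flat $2 win bets. All numbers are hypothetical.
bets = 200
wagered = bets * 2.0                        # $400 total outlay

collected_without = 360.0                   # say your other winners returned $360
collected_with = collected_without + 20.0   # same sample plus one $20 head bob

print(collected_without / wagered)          # 0.90 ROI -- looks like a loser
print(collected_with / wagered)             # 0.95 ROI -- one photo moved it 5 points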

Going against the grain, I would say that when you are dealing with such small samples, subdividing them into smaller groups is counterproductive.

Dave Schwartz
03-06-2010, 09:35 AM
I have seen some wonderful tests win at 800 bets and wind up losing at 2,000. I believe if you are still winning at 3,000 wagers you've locked in.

200 races would be just absurdly small, but median odds make a huge difference. If you are playing horses that average (say) $8.00, then 1,000 races should be plenty. If you have the occasional $35 horse then 3,000 is probably a good number.
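
A rough simulation sketch of the effect (the hit rates and payoffs here are assumptions chosen so the true long-run ROI is exactly 1.00, not real data):

import random

def roi_spread(n_races, mutuel, trials=1000):
    """Flat $2 win bets; the hit rate is set so the true long-run ROI is 1.00."""
    hit_rate = 2.0 / mutuel
    rois = []
    for _ in range(trials):
        hits = sum(1 for _ in range(n_races) if random.random() < hit_rate)
        rois.append(hits * mutuel / (2.0 * n_races))
    rois.sort()
    # rough 5th and 95th percentiles of the observed ROI
    return round(rois[trials // 20], 2), round(rois[-(trials // 20)], 2)

for n in (200, 1000, 3000):
    print(n, "races | $8 horses:", roi_spread(n, 8.0),
          "| $35 horses:", roi_spread(n, 35.0))

The spread of observed ROI shrinks as the sample grows, and it is much wider at every sample size when the typical mutuel is $35 instead of $8.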

2low
03-06-2010, 11:32 AM
Thanks - this all helps give me an idea. I was expecting people to come in with 10,000 races:eek:

My box doesn't have a ton of factors. I've kept it pretty simple for now. I'm not a programmer, but I am a programmer wannabe, so I'm in Excel right now with a lot of the work in VBA. My goal is to get it to around break-even and then learn Access and migrate my operation to a database for further refinement.

I'm purposely trying to toss out high priced horses, so a single monster fluke shouldn't be a problem. I'll be living in the 3/1 - 7/1 range.

I'm playing only my top pick with conditional odds, so I'm getting bets down on about 40% of the races I handicap so far, but I'm only about 60ish races into my testing overall. That said, I've set it up so I can see what would have happened with my 2nd - 4th picks as well. I'm also tracking what-if exacta and trifecta bets.

This is fun, win or lose. At the very least my loss per hour will skyrocket:cool:

gm10
03-06-2010, 04:03 PM
So I'm in the early stages of adding tracks and testing my black box handicapping software. What's conventional wisdom for how many races are needed to gauge the box's ability? I'm placing win bets only at this point.

It depends on your model. The more complex you make it, the more data you need.
Bill Benter wrote that he needed 5 years' worth of data to get a stable model (that was in Hong Kong, where there are two tracks that run for about 2/3 of the year, I suppose).

Dave Schwartz
03-06-2010, 05:21 PM
Hong Kong: 35 weeks of two racing cards per week.

700 races a year.

2low
03-06-2010, 06:55 PM
I might just stick with Tampa and forget the rest:jump:

CBedo
03-06-2010, 07:27 PM
It depends on your hit rate and avg payout as well. If you're trying to find statistical significance, there is a quick and dirty formula/explanation in Mitchell's Commonsense Betting. It offers some good rules of thumb and should give you a basis of things to think about.
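
I don't have Mitchell's exact formula handy, but a generic back-of-the-envelope version of the same idea is a z-score on the mean return per bet (all numbers below are hypothetical):

import math

def roi_zscore(returns, stake=2.0):
    """returns: gross $2 payback on each bet (0.0 for losers)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    se = math.sqrt(var / n)
    return (mean - stake) / se          # standard errors above break-even

# Hypothetical sample: 200 bets, 40 winners averaging an $11 mutuel.
sample = [11.0] * 40 + [0.0] * 160
print(round(roi_zscore(sample), 2))     # about 0.6 -- a 10% apparent edge, still nowhere near significant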

Handiman
03-06-2010, 07:56 PM
The thing about sample size is that it's directly related to the time period. You could have every race for 10 years from 1923-1933... but what was winning then might not be winning now.

So size of sample is just part of the picture. When I mentioned 200 races, I was talking about 200 of whatever you were looking at, so if it's sprints then 200 of those. If it's 4 year olds then 200 of them and so forth.

Hope that clears that up a bit.

Handi:)

Jeff P
03-06-2010, 09:18 PM
A few random thoughts/general guidelines worth considering...

First, sample size alone might not be enough. It almost goes without saying that you need to be measuring or modeling something that's causal in nature.

For example, I've seen some surprisingly large samples where if you flat bet $2.00 to win on every horse whose name happens to begin with the letter "H" you'd have shown a profit. Is that relevant? Unless someone fairly knowledgeable about breeding is able to do a follow-up study showing that the offspring of certain overlooked sires tend to end up with names beginning with the letter "H", I'd tend to think something like that is just "noise" in the data set.

Second, if you happen to be looking at samples where you are measuring the right causal factors... and I think most good players have something in mind that fits their own definition of the right causal factors... most samples - even 200-race samples, which are admittedly small - will tend to show some form of solid promise. So depending on the factor mix of whatever it is you are measuring or modeling, you might not need anything larger than a 200 race sample.

So how do you know when it's worth taking a shot at the windows relying on results from a smallish sample vs. a much larger one?

In practice, I like to validate my models.

By validate, I mean testing whatever it is you are measuring or modeling against a fresh set of races not used or seen in the development sample... with the development sample being defined as the data set you used to create the model in the first place.

There are a number of ways to do this. In my own JCapper samples I assign a randomly generated number to every starter in the database and use sql expressions to commit a certain percentage of starters to the development sample and the rest to the validation sample. Another way might be to simply have data from two different (recent) time periods sitting in different folders and use folder A for development and folder B for validation.
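
Outside of SQL, the same random split can be sketched in a few lines (the file name and the 70/30 split below are just placeholders, not anything from JCapper):

import csv, random

random.seed(42)                          # fixed seed so the assignment is repeatable

with open("starters.csv") as f:          # hypothetical export of past starters
    starters = list(csv.DictReader(f))

for s in starters:
    s["sample"] = "dev" if random.random() < 0.70 else "validation"

dev = [s for s in starters if s["sample"] == "dev"]
val = [s for s in starters if s["sample"] == "validation"]
print(len(dev), "starters for development,", len(val), "held out for validation")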

If your validation sample produces results that are similar to what was observed in the development sample, then I'd tend to think you are onto something. One further point in this area is that the game is constantly (slowly) evolving. Therefore all models have shelf lives. Most of the time when you are working with past data you will be observing things that others will have had a chance to notice too. Much of the time what it is you are modeling will still have some shelf life left to it, but your validation results will be just a skosh below your development results performance-wise. Models that fit this description can be thought of as being on the downward side of the shelf life cycle... with the shelf life having an unknown/yet-to-be-determined duration.

Every once in a while you'll discover something towards the beginning of the shelf life cycle. The indication of that is that successive validation samples outshine the previous ones. (Believe it or not, it does happen.)

When results produced by the validation sample leave a lot to be desired... as is frequently the case... experience has taught me that sitting on the sidelines and not putting the model into live play can be a wise thing to do.


-jp


CBedo
03-06-2010, 09:56 PM
To summarize, I think what most are saying is that sample size doesn't matter if the sample is bad! ;)

Handiman
03-07-2010, 12:53 AM
Try and find a woman who says sample size doesn't matter....:lol:


Handi:)

GameTheory
03-07-2010, 01:07 AM
If you come up with a new untested model based on ideas out of the blue, and you test it and it works great on 200 races, you've probably got something. But normally you tweak this and that, try again, improve your model (so you think), etc etc. You end up comparing competing models looking for a good one. There is a concept called degrees of freedom, and you only get so many of 'em. One of the principles involved that is hard for many to accept is that if you come up with something that works great (seemingly), it matters how many "tries" it took you to find that something great. A good-looking model found after many attempts is much, much less likely to do well into the future than a mediocre model found after only an attempt or two. This doesn't make sense to most people, since it is what it is and what does it matter how I got here?

Another thing to realize when comparing models is that the best one got to be best because of luck. Unless you've made a quantum leap in improvement from one model to the next version, when comparing models with only a few percentage points between them, the best one is rarely the best going forward and only looks the best because it "got lucky" on your test sample. There is always a measure of noise/randomness/luck to any test and whatever model got luckiest for your test will look the best when usually some other model is actually a better generalizer going forward.
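
A toy illustration of the "how many tries" point (all numbers are invented; every "model" here is pure noise with a true 0.90 ROI):

import random

random.seed(1)

def backtest_roi(n_races=200, mutuel=10.0, hit_rate=0.18):
    """True ROI is 0.18 * 10 / 2 = 0.90 for every model -- a guaranteed loser."""
    wins = sum(1 for _ in range(n_races) if random.random() < hit_rate)
    return wins * mutuel / (2.0 * n_races)

results = [backtest_roi() for _ in range(50)]      # fifty tweaks of a worthless model
print(round(max(results), 2))                      # the "winner" -- often 1.10 or better
print(round(sum(results) / len(results), 2))       # ~0.90 -- what they all really are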

Tricky business.

2low
03-07-2010, 07:05 PM
If you come up with a new untested model based on ideas out of the blue, and you test it and it works great on 200 races, you've probably got something. But normally you tweak this and that, try again, improve your model (so you think), etc etc. You end up comparing competing models looking for a good one. There is a concept called degrees of freedom, and you only get so many of 'em. One of the principles involved that is hard for many to accept is that if you come up with something that works great (seemingly), it matters how many "tries" it took you to find that something great. A good-looking model found after many attempts is much, much less likely to do well into the future than a mediocre model found after only an attempt or two. This doesn't make sense to most people, since it is what it is and what does it matter how I got here?

Another thing to realize when comparing models is that the best one got to be best because of luck. Unless you've made a quantum leap in improvement from one model to the next version, when comparing models with only a few percentage points between them, the best one is rarely the best going forward and only looks the best because it "got lucky" on your test sample. There is always a measure of noise/randomness/luck to any test and whatever model got luckiest for your test will look the best when usually some other model is actually a better generalizer going forward.

Tricky business.

Interesting stuff. Lucky for me, my Excel black box would be such a pain in the rear end to tinker with that I'm pretty much stuck with it as-is. Rerunning race cards would involve loading each one up again, one at a time. Not something I'm planning to do:eek:

I am curious about the part I bolded. I'm assuming a "try" is a tweak to an existing system, correct? I think I understand the theory. Basically, if you come up with a mediocre model early, you have a good base, but if you come up with a good model after many little tweaks, the base may not be so good, and it may fall apart as the tweaks become a more significant part of the overall data. Am I on the right track?

formula_2002
03-07-2010, 08:26 PM
I have spent a considerable amount of time, about 2 years, working with JCapper (Jeff Platt’s work of art). Through those two years I’ve gathered a database of about 300,000 to 400,000 (a bit small in my estimation).
Beyond that, I’ve been testing models for more than 10-15 years.
It has gotten to the point where I’ve taken a break from the testing (stopped about 10-20-2009) and just play the model.
I’m stunned at how well it is working.
To me the key is lots and lots of data, good racing factors, and analysis techniques that are somewhat unique. I don’t want to mention what they are... I want to keep them unique. :)
But they are very “computer program” intensive.

RonTiller
03-08-2010, 12:36 PM
Jeff's post is spot on in my opinion.

I recommend inserting a factor for each horse that is a random number scaled 1 to 100, then ranked. Also assign a random factor to each race - in other words, randomly assign each race the letter A, B, C, D, E...

Then, with whatever sample you have, see if you can find profitable plays from various subsets. #1 ranked Random Factor may show a good win % at CRC and a lousy win % but positive ROI at MNR. Even worse, when you drill down, you may find that at SA, #1 ranked Random Factor does really well in B races but is 0 for 30 in A races (remember, which races are labeled A and B is entirely random).

You'll also find track biases in the random data. I recall when I did this experiment years ago that AQU had a period of 4 days where #1 ranked Random Factor was winning 4 to 5 races a day, many at big to huge prices. Then there was a 2 week period where it didn't produce even a place horse. Imagine you start testing at the beginning of that 2 week period - you'll swear you have an automatic throw-out.
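
For anyone who wants to see this for themselves, here is a bare-bones version of the experiment with invented data (8-horse fields, a winner drawn at random, a crude mutuel spread):

import random

random.seed(7)
races = []
for _ in range(2000):
    ranks = list(range(1, 9))                  # random-factor rank 1..8 for an 8-horse field
    random.shuffle(ranks)
    winner = random.randrange(8)               # the winner is drawn at random
    races.append({"track": random.choice("ABCDE"),
                  "rank1_won": ranks[winner] == 1,
                  "mutuel": random.choice([4, 5, 6, 8, 10, 14, 22, 40])})

for track in "ABCDE":
    sub = [r for r in races if r["track"] == track]
    collected = sum(r["mutuel"] for r in sub if r["rank1_won"])
    print(track, len(sub), "races, ROI", round(collected / (2.0 * len(sub)), 2))
# Some "tracks" will show a flat-bet profit on a factor that is pure noise.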

Once I had 200 races in a development database, I used it to validate the programming, not to do any serious research yet. I calculated a factor and was flabbergasted that I hit over 40% winners with a nice ROI and a nice spread of low, medium and higher odds horses. Even throwing out the highest paying 3 horses showed a big profit. I was excited and immediately ran 1500 races as a follow-up test. Result: 13% winners, 78 cent ROI. WTF? I double-checked my work (which was the point of the test database to begin with) and the fabulous number I created was meaningless because I had made a stupid error and generated an essentially random number. But it looked good after 200 races, even though in the long term it performed as it should have: 13% winners, 78 cent ROI. That was discouraging. So maybe 200 is not enough but 1500 may be. Or not. Sigh...

There is nothing like seeing how random data can clump and LOOK meaningful and how random data can produce profitable random subsets - it changed my perspective in a way that reading a book, article or a post here cannot. It is sobering but also empowering, as you can start to see with real data the sample levels at which these random factors and random subsets really start looking like something random instead of like the keys to the mint.

Ron Tiller
HDW

Light
03-08-2010, 02:15 PM
In my experience, it's not that crucial to have an x number of theoretical plays, because there seems to be a qualitative difference between theory and reality. Of course we all get the theoretical protocol first. Then, when applying the reality factor, we find we are usually fighting an uphill financial battle.

As soon as you make your first bet for real, it is only bet #1 under that condition. I've had 80%-90% hit rates with no dry spells for betting criteria in theory that start out at 10% in reality. It's the reality bets that count, and if you want to know how many, it's going to have to be under real conditions. We usually don't last long enough to find out. So we go back to the drawing board over and over again.

If you've got something that seems like a good thing, you don't need thousands of races to check it out. Give it a reality check and save yourself the time and headache of theoretical analysis, because in the end, the theories and stats will crumble under reality if they are not comprehensive enough.

togatrigger
03-08-2010, 02:40 PM
It's not that the sample is bad, it's that you need to compare the results to a null hypothesis.
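
One way to frame that (my framing, with invented numbers): test the observed hit rate against the rate your horses' odds imply.

import math

def hit_rate_z(wins, plays, implied_rate):
    """Normal approximation to a one-sided binomial test against the null rate."""
    observed = wins / plays
    se = math.sqrt(implied_rate * (1.0 - implied_rate) / plays)
    return (observed - implied_rate) / se

# Hypothetical: 60 winners from 200 plays whose average odds imply a 25% win rate.
print(round(hit_rate_z(60, 200, 0.25), 2))    # ~1.63 -- suggestive, not conclusive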

formula_2002
03-08-2010, 05:09 PM
If you've got something that seems like a good thing, you don't need thousands of races to check it out. Give it a reality check and save yourself the time and headache of theoretical analysis, because in the end, the theories and stats will crumble under reality if they are not comprehensive enough.

what is reality? Is it testing the model on never before seen data, or posting the model's daily picks on PA?

So long as the model has never seen the data, and the results repeatedly fall within "good" statistical boundaries of the theoretical analysis,
then the only thing standing in the way of LONG TERM CAPITAL GAINS is what happened to LONG TERM CAPITAL MANAGEMENT :)

Light
03-08-2010, 06:59 PM
what is reality?

Nobody really knows. If we knew (regarding horse racing), the variables would be known and could be programmed. But the variables are infinite. That's why most systems or methods will fail when put to the reality test.

When you deal in theory, you're only dealing with one slice of the pie. When you play in reality, your little slice has to deal with the rest of the pie.

Light
03-08-2010, 07:17 PM
So I'm not saying it's hopeless. What I'm saying is you do the best you can, but just know what you are dealing with. What I am also saying is the idea that you need x number of races to qualify a system is rubbish. Since the variables are infinite, you can never have enough data.

formula_2002
03-08-2010, 07:35 PM
So I'm not saying it's hopeless. What I'm saying is you do the best you can, but just know what you are dealing with. What I am also saying is the idea that you need x number of races to qualify a system is rubbish. Since the variables are infinite, you can never have enough data.

:) Someone almost agrees with me? My position has always been that you can never get enough data to prove a long term profit.

2low
03-08-2010, 08:02 PM
I am wagering real life cash money for testing. $2 a pop:cool:. So I won't have any theory winnings that evaporate when put to the real test.

84 races and counting.

CBedo
03-08-2010, 09:44 PM
My position has always been that you can never get enough data to prove a long term profit.

Although it would be nice, personally, I could care less about "proving" a long term profit, but I rather enjoy cashing the checks from the "short term" profitable models month after month....;)

formula_2002
03-09-2010, 06:22 AM
Although it would be nice, personally, I could care less about "proving" a long term profit, but I rather enjoy cashing the checks from the "short term" profitable models month after month....;)

I guess what it really comes down to is the journey, not so much the final destination.
The final destination is beyond our stars, never to be reached in a life time.
You can only hope that a little luck and good sense makes for a fulfilling trip before it ends.

Vinnie
03-09-2010, 08:18 AM
I guess what it really comes down to is the journey, not so much the final destination.
The final destination is beyond our stars, never to be reached in a life time.
You can only hope that a little luck and good sense makes for a fulfilling trip before it ends.

Formula2002:

I hope that you are doing well. Very nicely stated, eloquent post.

Have a super day today. :)

formula_2002
03-09-2010, 09:38 AM
Hi Vinnie, sounded a bit ominous right?
All is well, thanks for asking.
Perhaps it's just that having had my 73rd, outliving the batteries on two successive pacemakers, and going in for my 3rd at the end of the month gives one pause.
:) :)

CBedo
03-09-2010, 12:40 PM
Hi Vinnie, sounded a bit ominous right?
All is well, thanks for asking.
Perhaps it's just that having had my 73rd, outliving the batteries on two successive pacemakers, and going in for my 3rd at the end of the month gives one pause.
:) :)

Wow! Happy belated birthday and keep on ticking. I'm just hoping to live long enough to make it to my first pacemaker. ;)

Vinnie
03-09-2010, 02:08 PM
Hi Vinnie, sounded a bit ominous right?
All is well, thanks for asking.
Perhaps it's just that having had my 73rd, outliving the batteries on two successive pacemakers, and going in for my 3rd at the end of the month gives one pause.
:) :)

Happy Belated Birthday to you, Formula2002, and may you have many many more to enjoy. As they say in Portuguese, "Pedabengh" (PED-A-BINGH). I know that isn't the proper spelling, but it means "Congratulations" in the Portuguese language. :)

All the BEST to you Always.

formula_2002
03-09-2010, 09:06 PM
thanks for the well wishes gentlemen...

and good health to you and yours..

InControlX
04-08-2010, 02:36 PM
Another analysis technique to try when evaluating database spot plays is to see how they perform (winning % and ROI) in quarterly increments over a few years. Generally, the lower the variation the better, with the notable exception that some seasonal change may also be detected, i.e., this is a Spring Play or a Winter Play only. Also, I like to "flatten" big payoff winners in ROI calculations by limiting them to 10:1 even if they pay more. This prevents one or two huge longshot wins from distorting the group average and results in a more likely future return.
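
The flattening step might look something like this (the payoff figures are invented; 10:1 on a $2 bet caps the counted return at $22):

def flattened_roi(returns, stake=2.0, cap_odds=10.0):
    """returns: gross $2 paybacks per bet, 0.0 for losers; cap winners at 10:1."""
    cap = stake * (cap_odds + 1.0)             # $22 maximum counted return
    return sum(min(r, cap) for r in returns) / (stake * len(returns))

# Hypothetical 100-bet group with one $80 bomb in it.
group = [0.0] * 85 + [9.0] * 14 + [80.0]
print(round(sum(group) / 200.0, 2))            # 1.03 raw ROI -- carried by the bomb
print(round(flattened_roi(group), 2))          # 0.74 flattened ROI -- the likelier future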

The type of play I have the most trouble analyzing is the rare one that turns up about once a month. Just because it's rare doesn't mean it's not good, but I'll never be able to test a "significant sample". I'd like to hear some ideas on how to judge these before hitting the window.

ICX

Track Collector
04-08-2010, 04:58 PM
I am curious about the part I bolded. I'm assuming a "try" is a tweak to an existing system, correct? I think I understand the theory. Basically, if you come up with a mediocre model early, you have a good base, but if you come up with a good model after many little tweaks, the base may not be so good, and it may fall apart as the tweaks become a more significant part of the overall data. Am I on the right track?

I have experienced the same thing that GameTheory spoke about, so let me try to put it another way.

When you "refine" an angle that has merit, the ROI can be made to go even higher as you add more and more restrictions. Unfortunately, experience has taught us that the resulting identified plays used to calculate this newer, higher ROI includes out of the ordinary (high payoff) results which usually occur by chance rather than some superior form of handicapping. Thus, they are not duplicated over a longer run.

Another way to think about it would be to use an example. Suppose your study involves win bets, and in looking at the super-refined selection of plays you notice that there are 100 plays, and that among all the winners, 3 of them won at odds of 50-1 or greater. Huge payoffs like this have a tremendous impact on ROI. Can you be assured that the next cycle of 100 plays will include at least 3 winners at 50-1 or more (or that future cycles average at least 3 winners at 50-1 or more per 100 plays)? Maybe it will happen, but experience will almost always show otherwise. In addition, what happens to the ROI if 2 of those big winners are missing, and does the ROI turn negative if even just 1 of the 3 does not come in?
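
Putting made-up numbers on that example:

# 100 win bets at $2; 17 ordinary winners averaging a $7 mutuel plus 3 bombs at 50-1 ($102 mutuels).
ordinary = 17 * 7.0
cost = 100 * 2.0
for bombs in (3, 2, 1, 0):
    roi = (ordinary + bombs * 102.0) / cost
    print(bombs, "longshot winners -> ROI", round(roi, 2))
# Roughly 2.1 with all three, 1.6 with two, 1.1 with one, 0.6 with none --
# the whole "edge" was riding on three chance results.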

OK then, should one tweak their selection criteria? Absolutely! So how does one know what is not far enough and what is too far? It is really a balancing act based on experience. Take your sample size (i.e. the number of total plays after refinements/restrictions) and consider it along with your winning percentage. If your winning percentage is 50%, I am guessing that the sample size required to have high confidence is probably in the 100 to 200 range. For a winning percentage of 25%, perhaps 1000 to 3000 is required to instill high confidence. (The trained statisticians among us can tell us what theory says regarding sample sizes and confidence levels. :) )

I hope this helps!