Developing New System - How Many Races Should I Test?


dougrobinson2024
04-24-2006, 02:14 AM
Hi all -

I'm developing a new handicapping system in Excel and so far (after testing it on 2,330 races) it's working out great. But I don't know how many races I should test before I can have confidence in it. I was thinking maybe 10,000 races, but that could be way high or way low.

Anyone have an idea? How do you figure this out?

Thanks for any advice,
Doug

PaceAdvantage
04-24-2006, 03:00 AM
dougrobinson2024 wrote: <Anyone have an idea? How do you figure this out?>

Get a $400 bankroll and start making $2 wagers on your system's selections. You'll know soon enough if it is any good....

RXB
04-24-2006, 04:27 AM
What's your win % and ROI during these 2330 races? The higher the win %, especially, the fewer races you need to feel confident that the sample is accurate.

Also, when you say that you've tested it over 2330 races, does that mean 2330 paper wagers? Or 2330 races with a lot of those races being "passes?"

hurrikane
04-24-2006, 06:54 AM
Are you doing this live? Meaning, watching the odds and the horses, dealing with scratches, etc.? Or playing blind, putting in your bets in the morning and coming back at night to see how you did?

Are the 2330 races of the same type? Older, male, claiming, etc.? Or an all burger look at everything?

2330 really isn't that many races but it might get you started at 2 bucks.

ryesteve
04-24-2006, 08:48 AM
Depends on what exactly you mean by "testing". If you mean you've got a bunch of historical races and you're fitting a system around them, the answer is very different than if you've got a methodology already in place and you're testing it on races going forward.

shanta
04-24-2006, 09:00 AM
PaceAdvantage wrote: <Get a $400 bankroll and start making $2 wagers on your system's selections. You'll know soon enough if it is any good....>

I agree with this way of checking it.

btw welcome to the board!

Richie

Tom
04-24-2006, 10:17 AM
Nothing counts if you don't have your money on it.
Start betting.

xfile
04-24-2006, 10:57 AM
dougrobinson2024 wrote: <But I don't know how many races I should test before I can have confidence in it.>

In any study it is more useful to split your data into sub-groups than to run one long study. Start a new study and compare it with your current one, OR split your current study into 2 or 3 sub-groups and then compare them.
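
A minimal Python sketch of the split-and-compare idea, for anyone working outside Excel. It assumes each bet is recorded as a (stake, payoff) pair in chronological order; the function name `split_summary` and the record layout are illustrative and not something from the thread.

```python
def split_summary(bets, groups=3):
    """Summarize consecutive sub-groups of a betting record.

    bets: list of (stake, payoff) tuples in chronological order, where
    payoff is the total amount returned on that bet (0 for a loser).
    Returns one (n_bets, win_pct, roi) tuple per sub-group.
    """
    if groups < 1 or len(bets) < groups:
        raise ValueError("need at least one bet per sub-group")
    size = len(bets) // groups
    summary = []
    for g in range(groups):
        start = g * size
        # The last group absorbs any leftover bets.
        end = (g + 1) * size if g < groups - 1 else len(bets)
        chunk = bets[start:end]
        staked = sum(stake for stake, _ in chunk)
        returned = sum(payoff for _, payoff in chunk)
        wins = sum(1 for _, payoff in chunk if payoff > 0)
        summary.append((len(chunk), wins / len(chunk), (returned - staked) / staked))
    return summary


if __name__ == "__main__":
    # Made-up record of six $2 bets, purely for illustration.
    record = [(2.0, 0.0), (2.0, 6.0), (2.0, 0.0), (2.0, 0.0), (2.0, 8.0), (2.0, 0.0)]
    for n, win_pct, roi in split_summary(record, groups=3):
        print(f"{n} bets  win% {win_pct:.0%}  ROI {roi:+.0%}")
```

If the sub-groups show wildly different win % or ROI, that is a hint the overall figure is being carried by one stretch of races.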

traynor
04-24-2006, 02:10 PM
dougrobinson2024 wrote: <I'm developing a new handicapping system in Excel>

Of all the considerations in developing a new system, the most important may be to avoid being led astray by anomalies. Sort your mutuels, divide into quarters, average the two middle quarters, and set 1.5 times that amount as the high end mutuel for modeling purposes. In one whack that will eliminate the overwhelming majority of misleading data in your sample, whether it is 2000 or 10,000 races.

That is not my opinion; that is standard statistical process that is mandatory for every first semester stat class, business class, or research class. It is called "controlling for outliers" and will save you endless hours of frustration and confusion. Additionally, if you are betting on your system, it will either save you a lot of money (by avoiding misleading models) or make a lot more money for you (because your "actual" return will be greater than your "projected" return).
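
A short Python sketch of the capping rule described above. It reads "average the two middle quarters" as the mean of the middle half of the sorted mutuels, which is an interpretation rather than something the post spells out; the function names `mutuel_cap` and `cap_mutuels` are made up for illustration.

```python
def mutuel_cap(mutuels, multiplier=1.5):
    """Return the high-end cap: multiplier times the mean of the middle half.

    Assumption: the 'two middle quarters' are the values left after
    dropping the lowest and highest quarter of the sorted list.
    """
    if not mutuels:
        raise ValueError("need at least one mutuel")
    ordered = sorted(mutuels)
    n = len(ordered)
    quarter = n // 4
    middle = ordered[quarter:n - quarter]
    return multiplier * sum(middle) / len(middle)


def cap_mutuels(mutuels, multiplier=1.5):
    """Replace any mutuel above the cap with the cap itself."""
    cap = mutuel_cap(mutuels, multiplier)
    return [min(m, cap) for m in mutuels]


if __name__ == "__main__":
    # Illustrative $2 win mutuels with one longshot payoff.
    payoffs = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 80.0]
    print(mutuel_cap(payoffs))
    print(cap_mutuels(payoffs))
```

Capping (rather than dropping) the big mutuels keeps the win in the win % while stopping a single longshot from inflating the modeled ROI.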

Every other piece of advice on this thread--especially PA's suggestion to start with a small bankroll and actually bet on it--should be taken to heart and utilized. There is a big difference between "regression" models of something that has already happened, and "working" models that attempt to predict the future.

In direct response to your question, there is no magic number for a sample size. The key is that the sample is large enough to be representative of the general population (of whatever tracks, distances, surfaces, and classes you are modeling). It sounds like your sample is large enough to get started.

Whether the 2000+ are "playable races" extracted from a larger sample, or are a population from which playable races are extracted, makes a BIG difference. If the former, start a new sample and see if the results are similar to your base sample. That is, do not simply add more races to the old sample; create a new sample for comparison.

If you are familiar with bootstrapping, it is a great modeling tool. If not, search for the bootstrap add-in for Excel and learn to use it. It will tell you how similar your results are in randomly extracted samples of your dataset. For example, if your win% and ROI vary wildly, you have an outlier problem; that takes only a few minutes to show with a bootstrap. If random samplings show relatively consistent win% and ROI, you can be a lot more certain in your betting than you could be otherwise.
Good Luck

kenwoodallpromos
04-24-2006, 04:33 PM
IMO it depends on the type of system. Hurrikane asked a good question because it may depend on the type of race. If it handicaps colts vs. geldings or uses other criteria where you only look at a small number of horses per race, you may need to look at more races. If it is a high % return (pinpointing mostly low-odds horses or favorites), maybe fewer races will be required. The more horses per race you can evaluate, the fewer races you will need.
If you are going to use the test results to promote the marketing of your system, you may need a higher number than for your own use.

kitts
04-24-2006, 05:21 PM
Pace Advantage says it best. I find 50-race samples are enough to start betting. Then I start betting modestly in 20-race groups, requiring consecutive profitable 20-race groups. Sometimes I lose...

dougrobinson2024
05-03-2006, 07:43 PM
Wow! Thanks to everyone for responding! I didn't know I would get that many people to help answer my question. It's good to find such an active forum.

OK, I know I could "test" the system I'm building with real money, but I've developed too many systems in the past that worked great--up to the point that I put money on them! So I think testing with real race data is the way to go, and getting as many races as possible to test would be a good thing.

Right now, I've developed an Excel spreadsheet that scrapes the data I need off of one of the betting exchanges. It works, but I only get about 130 to maybe 250 races a day. So, it will take a while to get the thousands of races I need to properly test it.

I need as much race data as I can get. All I really need are the closing odds for each horse and the results of the race. Does anyone know where I can get this? Some kind of archive somewhere? Or does anybody have this data just lying around, that you're not using? :)

By the way, so far I'm up to 25 days of testing this and every day has been profitable. My race data is up somewhere in the 3000 range. About a 39% ROI with about 40% of the races playable. So I need more data..!

Thanks for any help you can give!
Doug

kenwoodallpromos
05-03-2006, 09:07 PM
What a coincidence! There have been several "newbies" (LOL!) lately begging for some kind of data! As a numbers person you should appreciate the irony of that!
Equibase gives you, daily, all the results with odds you will ever need.

ryesteve
05-03-2006, 09:34 PM
dougrobinson2024 wrote: <I need as much race data as I can get. All I really need are the closing odds for each horse and the results of the race.>
Your system is based solely on closing odds? Other than the fact I'm skeptical such a thing would ever work, with so much money getting dumped into the pools at or after the bell, I'm not sure such a system would ever be workable.

Anyway, rather than ask someone to feed you data, if you were to post the criteria, there's a bunch of database people here who would run it through for you.

dougrobinson2024
05-03-2006, 10:13 PM
I tried Equibase, but they don't have what I need. Either they don't have it in HTML format or it's in PDF format and I can't easily get that back into HTML. Maybe I should say if they have it, I sure couldn't find it, as I explored everything they had under "Results".

This is a paste from the spreadsheet I've developed, which scrapes from a betting site. This shows what I need:

Date 05/02/06
Tr Turf Paradise
Ra 9
Len 6.0 Furlongs
PP1 Odds 2.0
PP2 Odds 9.0
PP3 Odds 2.5
PP4 Odds -
PP5 Odds 9.0
PP6 Odds 6.0
PP7 Odds 35.0
PP8 Odds 60.0
PP9 Odds 2.5
PP10 Odds
PP11 Odds
PP12 Odds
Results1 1st 9-Yodeltilyourblue
Results2 2nd 6-Kolinor
Results3 3rd 3-Browns Derby Cat
W1 $7.20
W2 $4.00
W3 $2.60
P1 -
P2 $5.40
P3 $4.00
S1 -
S2 -
S3 $2.60
H1 1 Knight In Silver
H2 2 Whirling Colors
H3 3 Browns Derby Cat
H4 4 Ron E.
H5 5 Armed 'N Crafty
H6 6 Kolinor
H7 7 Notate
H8 8 Sheersox
H9 9 Yodeltilyourblue
H10
H11
H12
H13
H14
H15

Lou G
05-04-2006, 07:24 AM
ryesteve wrote: <Your system is based solely on closing odds? Other than the fact I'm skeptical such a thing would ever work, with so much money getting dumped into the pools at or after the bell, I'm not sure such a system would ever be workable.

Anyway, rather than ask someone to feed you data, if you were to post the criteria, there's a bunch of database people here who would run it through for you.>

It's not uncommon to see horses rise/fall 2 or 3 (or more) odds levels after post time, especially at lesser tracks. I'm an active investor with Supertote running for 2 to 5 tracks every day and I see this happen often. Ergo, since you cannot know closing odds before betting I can't see how you can test a "system" based on closing odds.

dancingbrave
05-04-2006, 01:04 PM
ANOTHER PIMP FOR BETFAIR?

dougrobinson2024
05-04-2006, 03:05 PM
No, I'm not "another pimp for Betfair". I'm a US citizen and, to my knowledge, couldn't use Betfair even if I wanted to.

And for those who don't believe a system using the closing odds could work, well OK, you've registered your opinion. So work on your own system, using your own methods. I'm happy to let you do that. But I know that for the last 28 or so days, what I'm working on works.

I'm not here to get into a flame war, and I'm not going to get baited into defending what I'm trying to do on my own leisure time. I simply put up a post to see if anyone had any idea where one could obtain the kind of data I need.

Jeff P
05-04-2006, 04:23 PM
posted by dougrobinson2024 - I simply put up a post to see if anyone had any idea where one could obtain the kind of data I need.

At the risk of stating the obvious - If you're serious about doing a research project and are willing to buy your data...

Bris (at http://www.brisnet.com/ ) sells comma delimited XRD Results files for 25 cents each. These files contain the info you posted that you need: odds, finish position, and payoffs (if any) for each horse. The format is one file per calendar day per track. Bris lists the data structure of these files on the Library section of their site. They have past files going back several years on their Archive Server. Downloading from their Archive Server can be time consuming but if you pick up the phone and place a call to their tech support someone there will usually offer to burn a CD of the files you want and mail it to you. Once you have a collection of files it's a pretty straightforward matter of importing them into a spreadsheet for analysis.

-jp


dougrobinson2024
05-04-2006, 05:59 PM
I'll look into that. Thanks!

dastar
05-04-2006, 06:30 PM
Hi Doug,

In agreement with most of the PA members' responses.

Are you only testing for Win, or are exotics part of your testing?

Keep going, and let us know of progress.

dastar

ryesteve
05-04-2006, 07:28 PM
dougrobinson2024 wrote: <I'm not here to get into a flame war>
Perhaps you should try to separate the flames from the constructive advice. Otherwise, you're liable to end up spending hundreds of dollars on data and hundreds of hours on research, and then discover you have a system you can't implement because the odds at the time you need to place a bet are different enough from the actual closing odds to render your system unworkable.

Dick Schmidt
05-04-2006, 07:55 PM
Doug,

Many years ago, I listened to Bill Ziemba and Dick Mitchell discuss this very problem. Both were serious math guys, especially Dr. Z, who consulted with the Canadian government on setting up their lottery, and this was back in the day when all such work had to be done by hand. First off, there is no such thing as 100% confidence in a system. If you test 100,000 races, it could still turn around. They decided that 95% confidence was sufficient to start betting, and that you needed to test about 850 races to reach 95%. Since there wasn't much difference between live races and old races back then, assuming you didn't cheat, how the races were done didn't matter much. Today, I would test against live races because of the final odds problems mentioned. With exotics, you rarely know the payoff in advance anyway, so it doesn't matter so much.

That said, there is a well known statistical "shortcut" that says that any meaningful subset of a sample should resemble the entire sample in statistical distribution, though with a lower degree of confidence. Not always, but most of the time. When I was testing, I insisted on 100 races showing a profit, and then started with small bets. I found that few systems turned around after a decent 100 race run. Profits might decline, but if I insisted on 20% ROI in the test, the system stayed profitable. As you can guess, I didn't find many.

My best advice is to do as PA says and get to the window. There are too many guys out there spending their lives testing, looking for some mythical perfection while the world passes them by. You will never get enough tests to satisfy some people, or reach 100% confidence. That's why they call it gambling.

Dick

I wondered why the Frisbee was getting bigger, and then it hit me.

prank
05-04-2006, 08:38 PM
The advice given here is good for testing a statistical model. Traynor spoke about improving the robustness of your model by trimming outliers. That's wise, as is the use of the bootstrap or other methods in order to avoid overfitting.

I think the other guys have it right, though, regarding the final odds: unless you can squeak in at the very end, this can be very hard to beat. You might try seeing what level of noise or fluctuation in the pool your model can handle. For instance, suppose that the amount bet on each horse can vary as much as x% or $y (you need to identify these thresholds iteratively), at what level of x or y does your system break? Then, you can get an idea for how much movement of dollars you can tolerate - i.e. how robust it is to the final distribution of dollars being significantly different from the posted dollar amounts. If your system is sensitive to even 1% fluctuations (other folks here can tell you how much the pool can change in the final 60 seconds, I have no idea), then you may have a serious problem.
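
One way to sketch this stress test in Python: jitter the dollars bet on each horse by up to x%, recompute the odds, and count how often the system's selection stays the same. Everything here is an assumption for illustration: the 17% takeout, the absence of breakage, the function names, and especially `pick`, a stand-in for whatever selection rule the real system uses.

```python
import random


def odds_from_pool(pool, takeout=0.17):
    """Approximate win odds-to-1 from dollars bet on each horse.

    Assumptions: a flat takeout and no breakage.
    """
    net = sum(pool) * (1 - takeout)
    return [net / amount - 1 for amount in pool]


def pick_stability(pool, pick, x_pct, trials=1000, seed=0, takeout=0.17):
    """Share of trials in which the selection survives +/- x_pct pool noise.

    pick: hypothetical function mapping a list of odds to the chosen
    horse's index; replace it with the real system's rule.
    """
    rng = random.Random(seed)
    base = pick(odds_from_pool(pool, takeout))
    same = 0
    for _ in range(trials):
        noisy = [a * (1 + rng.uniform(-x_pct, x_pct) / 100) for a in pool]
        if pick(odds_from_pool(noisy, takeout)) == base:
            same += 1
    return same / trials


if __name__ == "__main__":
    # Toy rule: always take the favorite (lowest odds).
    def favorite(odds):
        return min(range(len(odds)), key=lambda i: odds[i])

    pool = [5000.0, 4800.0, 2000.0, 900.0]
    for x in (1, 5, 10, 20):
        print(x, pick_stability(pool, favorite, x_pct=x))
```

Raising x until the stability figure drops off gives a rough idea of how much late money the rule can tolerate.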

It's certainly an interesting problem.

As for how many races to test, you can use the bootstrap to estimate various confidence intervals on your ROI. I'd recommend subsample bootstraps, such as random samples of 100 races. Try a few thousand of these subsamples, sort them, and you can get an estimate of the median ROI (or mean, as you wish), and the x% (e.g. 95%) confidence interval around it. Consider what quantile means "breaking even", and then decide if you're comfortable with that. You might want to get more races in order to narrow your CIs.
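
A rough Python version of that subsample bootstrap, for those not using an Excel add-in. It assumes each bet is stored as a (stake, payoff) pair and that subsamples are drawn with replacement; the function names and the percentile-style interval are choices made for this sketch.

```python
import random


def subsample_rois(bets, subsample_size=100, n_resamples=2000, seed=0):
    """Return the sorted ROI of each random subsample of the betting record.

    bets: list of (stake, payoff) tuples; payoff is 0 for a losing bet.
    Subsamples are drawn with replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    rois = []
    for _ in range(n_resamples):
        sample = rng.choices(bets, k=subsample_size)
        staked = sum(stake for stake, _ in sample)
        returned = sum(payoff for _, payoff in sample)
        rois.append((returned - staked) / staked)
    rois.sort()
    return rois


def summarize(sorted_rois, level=0.95):
    """Median, percentile interval, and share of subsamples that lost money."""
    n = len(sorted_rois)
    lo_idx = int((1 - level) / 2 * n)
    hi_idx = min(n - 1, int((1 + level) / 2 * n))
    return {
        "median": sorted_rois[n // 2],
        "low": sorted_rois[lo_idx],
        "high": sorted_rois[hi_idx],
        "share_below_break_even": sum(1 for r in sorted_rois if r < 0) / n,
    }


if __name__ == "__main__":
    # Made-up record: 40% winners paying $7.00 on a $2 bet.
    record = [(2.0, 0.0)] * 60 + [(2.0, 7.0)] * 40
    print(summarize(subsample_rois(record)))
```

The `share_below_break_even` figure is the "what quantile means breaking even" question: it estimates how often a 100-race stretch would have lost money.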

Best wishes,

prank

spilparc
05-04-2006, 10:51 PM
Why not test it using morning line odds? If it works doing that, I'd say you really have something.

Lou G
05-05-2006, 12:26 PM
"other folks here can tell you how much the pool can change in the final 60 seconds, I have no idea"...

I'm not trying to flame or discourage you, Doug - just sharing what I observe every day. I've seen many horses go from 7/2 at a minute before post to 6/5 well after post. If fluctuations like this don't screw up your approach, great - go for it!

AwolAtPA
05-05-2006, 01:49 PM
Fri 5 May 6

hi Doug,

first in reply to your request for data, check my post #12 to pRank's request for data

http://www.paceadvantage.com/forum/showthread.php?t=27516

second, I think your research sounds neat. However, I once worked on an odds only system which partitioned the data by type of favorite. The types were like Heavy, OneClear, TwoClear, etc. Well, yes, SOMETIMES it worked but I did not find a good bet. Maybe you are following a similar path and MAYBE your '..bet rules..' will work!! Whichever, good luck with your search.

third, I agree with others about '...can not know the final odds..'. HOWEVER, if your bet rules require a simple range, then an odds based system would be workable at most tracks for most races. By '..simple range..', I mean a. heavy fav near even money, b. contender under 3:1, c. longshot over 10:1, etc. Also, I have found that the Exacta will-pay odds (available from Supertote or BrisBet or WinTicket or ??) are an excellent predictor of which bet numbers will get the late money. Yes, including MOST simulcast dumps that send a four to one down to two to one.

awol

P. D. Mahalik
05-05-2006, 06:22 PM
Doug,

When Sartin developed his methodology he had over 38,000 races, but of course he had a lot of help from his students, including Brohamer.

Turntime
05-05-2006, 07:21 PM
I agree totally with what everyone is saying - start making bets. After 100 races, betting $2 per race, even if you lose 30% you'll only be stuck $60. If you can't afford that, you're wasting your time. Jump in, the water's fine.

traynor
05-06-2006, 12:13 AM
prank wrote: <As for how many races to test, you can use the bootstrap to estimate various confidence intervals on your ROI. I'd recommend subsample bootstraps, such as random samples of 100 races. Try a few thousand of these subsamples, sort them, and you can get an estimate of the median ROI (or mean, as you wish), and the x% (e.g. 95%) confidence interval around it. >

You can learn a lot from the bootstrapping. Watch for spikes in the ROI model, indicating anomalies (outliers, unusually high mutuels). If your sample is representative of the normal distribution, the ROI will tend to stay relatively symmetrical in the subsets extracted from the sample.
Good Luck

tahoesid
05-07-2006, 12:47 AM
Why not download the free charts from BRIS and extract the odds from there?

oddswizard
05-07-2006, 01:37 PM
I used 10,000 races in setting up my formula. However, a math expert friend told me that 1,000 races would have been enough. The results of 1,000 races vs. 10,000 races differ by less than 1%. Sounds like it is time to recap your existing results. Good Luck.
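
As a rough illustration of why 10,000 races adds less than one might expect over 1,000: the standard error of an observed win percentage shrinks with the square root of the sample size. The sketch below assumes independent bets and an illustrative 30% win rate, neither of which comes from the post.

```python
from math import sqrt


def win_pct_standard_error(p, n):
    """Standard error of an observed win rate p over n independent bets."""
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (1000, 10000):
        se = win_pct_standard_error(0.30, n)
        print(f"{n} bets: +/- {se * 100:.2f} percentage points")
```

With these assumed numbers the standard error is about 1.4 percentage points at 1,000 bets and about 0.5 at 10,000. ROI is typically noisier than win % because payoffs vary, so this only covers the win % side.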

formula_2002
05-07-2006, 08:04 PM
dougrobinson2024 wrote: <But I don't know how many races I should test before I can have confidence in it.>

This is how I do it for win pool betting.
Start off with about 100,000 horses.
Keep track of your # plays and number of wins in small odds increments, such as;
1-1
6/5
8/5
9/5
2-1
5/2
3-1
7/2
9/2
5-1
5.50-1
6-1
6.5-1
Beyond this point, maintain about a 1% spread between successive values of 1/(odds+1).
You should also normalize the public’s win % by dividing each horse’s 1/(odds+1) by the sum of 1/(odds+1) for its race.
Then compare your win % to the public’s win % over the exact same races.
You can then see if your system is statistically significant at the different confidence levels by doing the following:

Sqrt(pubwin% x (1-pubwin%) x number of plays).
Multiply that number by 2 for a 5% level of confidence, or multiply that number by 2.5 for a 1% level of confidence.

If your number of wins equals or exceeds that number, then that’s a good thing.
Do it for each of the incremental odds.
When you can get about 12 consecutive incremental odds ranges to equal or beat that number, start betting.
Until then, take that $400 bankroll someone suggested and buy data.
How much data do you need?
The above results will tell you.

Don’t bet a cent of serious money until you can get this far.

Actual betting is an entire discipline unto itself.

By the way, you can be significantly better than the public and still not have a winning system because…you must also beat the take-out.
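
The arithmetic in the steps above can be sketched in Python roughly as follows. Two assumptions are worth flagging: the win target is read as the public's expected wins plus 2 (or 2.5) standard deviations, which the post does not spell out, and the function names are invented for this example. The multipliers 2 and 2.5 are the post's rounded figures.

```python
from math import sqrt


def public_win_probs(odds_to_one):
    """Normalize 1/(odds+1) across one race so the probabilities sum to 1."""
    raw = [1 / (odds + 1) for odds in odds_to_one]
    total = sum(raw)
    return [r / total for r in raw]


def win_threshold(pub_win_pct, plays, k=2.0):
    """Wins needed to beat the public by k standard deviations.

    Assumption: threshold = expected public wins + k * binomial std dev,
    where the std dev is Sqrt(pubwin% x (1-pubwin%) x number of plays).
    """
    expected = pub_win_pct * plays
    std_dev = sqrt(pub_win_pct * (1 - pub_win_pct) * plays)
    return expected + k * std_dev


if __name__ == "__main__":
    # Illustrative three-horse race at 1-1, 1-1 and 2-1.
    print(public_win_probs([1.0, 1.0, 2.0]))
    # Illustrative odds bucket: public wins 20% of the time over 400 plays.
    print(win_threshold(0.20, 400, k=2.0))
    print(win_threshold(0.20, 400, k=2.5))
```

Running the check separately for each odds bucket, as the post suggests, keeps a system that only works at one odds range from passing on its overall numbers.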