View Full Version : sample size


formula_2002
09-23-2003, 11:23 AM
Intrigued as I am by how to determine sample size, I came across this little gem.

http://www.ubmail.ubalt.edu/~harsham/Business-stat/otherapplets/SampleSize.htm

Using the 4th applet down from the top and entering the data on page 34 of Dick Mitchell's "Commonsense", the applet calculates the sample size, confirming Mitchell's results.
I don't know if it’s a silver bullet... As far as I'm concerned, only testing “may” tell.


Also, on a search engine, type in "sample size"
A lot of boys and girls have put in some heavy time to figure these things out.

One word of caution: Mitchell uses the average mutuel to calculate sample size. I'm reasonably certain that small incremental odds ranges should be used for testing.

Joe M

formula_2002
09-23-2003, 04:37 PM
Using the applet mentioned in the previous note, I got the following results, based on 95% confidence, a 5% system profit in each odds range, and a 2 1/2% error, for all horses with odds >=.5 to <20-1.

Based on 2 plays per race card, 8 horses in each race, 9 races per card, I would have to handicap 506,124 horses.

For a little relief, the number of required horses to handicap goes down as the % profit goes up.

Can you imagine handicapping that many horses and never changing any of the inputs (speed, class, etc.)?

Joe M

P.S. I have but 150,000 horses in my database!!!

formula_2002
09-23-2003, 05:13 PM
Actually, the figure is closer to 437,472.
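The two totals above follow from the stated card layout: 8 horses x 9 races is 72 horses per card, and 2 plays per card means 36 horses handicapped for every bet. A minimal sketch of that conversion is below. The function name is hypothetical, and the play counts (14,059 and 12,152) are back-calculated from the posted totals rather than stated anywhere in the thread.

```python
def horses_to_handicap(required_plays, plays_per_card=2,
                       horses_per_race=8, races_per_card=9):
    """Horses that must be handicapped to generate `required_plays` bets."""
    horses_per_card = horses_per_race * races_per_card  # 72 horses on a card
    horses_per_play = horses_per_card / plays_per_card  # 36 horses per bet
    return round(required_plays * horses_per_play)

# Play counts inferred from the posted totals (an assumption, not quoted).
print(horses_to_handicap(14_059))  # 506124
print(horses_to_handicap(12_152))  # 437472
```

Read the other way, the posted totals imply somewhere between 12,000 and 14,000 actual bets.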

sjk
09-23-2003, 05:47 PM
If you use a computer to handicap, it is not at all unreasonable to handicap that many starters to test your method.

formula_2002
09-23-2003, 06:15 PM
sjk

You are right about that. Once downloaded from Bris, I can handicap 10 race cards in less than 5 minutes.
The down side of that is, it costs $7.00 for each All-Ways data file.

But how did those guys do it when there was only the print DRF?

Wow...

sjk
09-23-2003, 06:23 PM
formula,

I remember reading books 20 or 30 years ago where studies of 1200 races were presented as proof of a method.

Even 10 years ago when I started downloading charts and entries to handicap with, I had a choice between a 1200 baud connection and a 2400 baud connection. It would have been totally impractical to follow all tracks.

My PC is now 30x faster than the one I was using then. The time it took to do one circuit back then is about what it takes to do them all today.

I wonder what changes are coming in the next few years.

formula_2002
09-24-2003, 09:38 AM
WOW. According to the way Mitchell would test the data, my "Basic System Plus" should be a winner. My results, based on no back-fitted data, are as follows:
157 plays
20% win rate
25% profit.

At the 1% significance level and 10% error, all I need to prove significance is 107 races...

Hold open those bank doors....

Joe M
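A rough cross-check on the 157-play result, offered as a sketch rather than a verdict: put a normal-approximation confidence interval around the 20% win rate and compare it with the break-even win rate implied by the posted figures. It assumes every winner returns the same average amount, which is a simplification (payoffs vary from race to race), and the 95% level and function names are my own choices.

```python
from math import sqrt
from statistics import NormalDist

def win_rate_interval(win_rate, plays, confidence=0.95):
    """Normal-approximation confidence interval for a win rate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(win_rate * (1 - win_rate) / plays)
    return win_rate - half_width, win_rate + half_width

def break_even_win_rate(win_rate, roi):
    """Win rate needed to break even if every winner pays the average."""
    avg_return_on_winner = (1 + roi) / win_rate  # units returned per unit bet
    return 1 / avg_return_on_winner

low, high = win_rate_interval(0.20, 157)
needed = break_even_win_rate(0.20, 0.25)
print(f"95% interval for the win rate: {low:.3f} to {high:.3f}")
print(f"break-even win rate: {needed:.3f}")
```

If the break-even rate falls inside the interval, the sample by itself does not rule out a system that merely breaks even.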

turfspec
09-25-2003, 04:30 AM
formula,

Checked out the site and the search you suggested, and it looks interesting, thanks. But I admit to being a math moron and could use a little help, if you would. I'm lost as to how to make the entries and could use an example. I was attempting to find the number of races needed for a 90% confidence level with a 28.2% win rate, an avg. mutuel of $7.75 and an ROI of 9.6%. Thanks again.

Rob

formula_2002
09-25-2003, 07:07 AM
Rob,
Using the 4th applet down from the top:

Pilot Sample Size... disregard this. It has no apparent effect.

Current Estimate (p)... your win rate, .282

Acceptable Significant level... .10 (that's 1-.90). I understand that most research is done using .05.

Acceptable error (I use 1/2 of my expected profit)... in your case it would be .0453

The resulting sample size will be 267.

One word of caution: Mitchell uses the average mutuel to calculate sample size. I'm reasonably certain that small incremental odds ranges should be used for testing.

I test each odds range:

>=.01 to <=.04
>=.05 to <=1.4
>=1.5 to <=2.4
and so on, up to odds of 20-1
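The applet entries listed for Rob can be checked against the textbook sample-size formula for a proportion, n = z^2 * p * (1 - p) / E^2, where z is the two-sided normal critical value for the chosen significance level. Whether the applet uses exactly this formula is an assumption on my part, but the sketch below reproduces the posted figure. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, significance, error):
    """Plays needed to estimate a win rate p to within +/- error."""
    z = NormalDist().inv_cdf(1 - significance / 2)  # two-sided critical value
    return ceil(z * z * p * (1 - p) / (error * error))

# Rob's inputs: 28.2% win rate, .10 significance, .0453 acceptable error.
print(sample_size_for_proportion(0.282, 0.10, 0.0453))  # 267
```

The same function with a 20% win rate, .01 significance and .10 error returns 107, the figure quoted earlier in the thread for the 157-play sample. Note that the required size grows with the inverse square of the error, so halving the acceptable error quadruples the sample.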

turfspec
09-25-2003, 02:44 PM
formula,

Thanks for taking the time. I'm beginning to see the light.

Rob

formula_2002
01-25-2004, 08:23 AM
Originally posted by formula_2002
Intrigued as I am by how to determine sample size, I came across this little gem.

http://www.ubmail.ubalt.edu/~harsham/Business-stat/otherapplets/SampleSize.htm

Using the 4th applet down from the top and entering the data on page 34 of Dick Mitchell's "Commonsense", the applet calculates the sample size, confirming Mitchell's results.
I don't know if it’s a silver bullet... As far as I'm concerned, only testing “may” tell.


Also, on a search engine, type in "sample size"
A lot of boys and girls have put in some heavy time to figure these things out.

One word of caution: Mitchell uses the average mutuel to calculate sample size. I'm reasonably certain that small incremental odds ranges should be used for testing.

Joe M

Due to the recent emphasis on sample size in recent notes, I thought it would be worthwhile to send this post out again.

In addition to sample size, you really have to bring incremental odds ranges into the study.

shanta
01-25-2004, 08:53 AM
I wish I had had you as one of my instructors back in school! I am being serious. You probably could have helped me make sense out of all that math that to me seemed like a bunch of "mumble jumble".
Richie:D

formula_2002
01-25-2004, 09:27 AM
Very kind of you, Rich...

Thanks
Joe M

Chico
01-29-2004, 03:03 PM
Originally posted by formula_2002
Due to the recent emphasis on sample size in recent notes, I thought it would be worthwhile to send this post out again.

In addition to sample size, you really have to bring incremental odds ranges into the study.

It seems to me there is far too much emphasis being placed on sample size. I have been a "Nielsen TV Rater" on two separate occasions (odds on that are over 9 million to one), and Nielsen uses fewer than 1800 homes to rate over 100 million TV sets. That is used as the basis for charging advertisers hundreds of millions of dollars in advertising fees. I seem to remember from my old school statistics courses that a few hundred samples within a single class yield a very high confidence number (usually within + or - 2%).
Regards,
Chico
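The survey comparison can be put in numbers with the usual margin-of-error formula for a proportion, z * sqrt(p * (1 - p) / n), which depends on the sample size and not on the size of the population being rated. The sketch below assumes a worst-case p of 0.5 and 95% confidence. Both are my assumptions, not figures from the post, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion from n samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

for n in (300, 1800):
    print(f"n = {n}: +/- {margin_of_error(n):.3f}")
```

Under these assumptions 1800 homes give a margin a little over 2 percentage points. Whether that is tight enough for a betting system depends on how small an edge has to be detected.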

formula_2002
01-29-2004, 04:06 PM
Chico, have they done much work based on horse racing odds?
Sure would like to see their paper on that subject.

I think I'll send them a note on that. I'll let you know what they have to say. Good point!

Joe M

formula_2002
01-29-2004, 04:36 PM
Statistical significance as it applies to horse racing.


Sent this off to Nielsen:

To whom it may concern:

Although some work has been written on the subject (much of it too heady for the average horse racing bettor), this may be a good time for some entity outside of the field of pari-mutuel thoroughbred horse racing to perform a definitive work establishing the minimum number of out-of-sample races that must be reviewed in order to test the statistical significance of a proposed betting system. Any interest in this?

Chico
01-30-2004, 09:30 AM
Originally posted by formula_2002
Statistical significance as it applies to horse racing.


Sent this off to Nielsen:

To whom it may concern:

Although some work has been written on the subject (much of it too heady for the average horse racing bettor), this may be a good time for some entity outside of the field of pari-mutuel thoroughbred horse racing to perform a definitive work establishing the minimum number of out-of-sample races that must be reviewed in order to test the statistical significance of a proposed betting system. Any interest in this?

Great idea to send that to Nielsen. I hope it gets a positive response. I always knew that serious horseplayers were a resourceful bunch - You have proved the point! <g>
Regards,
Chico

GameTheory
01-30-2004, 09:50 AM
http://www.0dd5.com/matharchie.shtml

formula_2002
01-30-2004, 01:26 PM
Originally posted by GameTheory
http://www.0dd5.com/matharchie.shtml

If you use the above site to analyze your work, it will be very clear why your analysis must be done on incremental odds ranges and not on the overall average odds.

Using the table on my web page, "comparison of sample data to out of sample data by average odds range",
I get an "Archie" score of over 10 using overall average odds and nothing that close on an incremental odds range.

Try it yourself.
I use columns n, t and u with rows 31 to 34 for the analysis.

GameTheory
01-30-2004, 02:19 PM
That's because you are cutting down your sample sizes by breaking up the sample. It is impossible to get as high a score as the whole sample when you break it into parts...
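The effect described here can be illustrated with a generic one-proportion z statistic. This is not the Archie formula from the linked page, only a stand-in test of the same kind, and the winner counts are hypothetical. With the same observed edge, the statistic scales with the square root of the sample size.

```python
from math import sqrt

def z_score(wins, expected_wins, runners):
    """One-proportion z statistic: observed winners against expected winners."""
    p0 = expected_wins / runners
    return (wins - expected_wins) / sqrt(runners * p0 * (1 - p0))

# Hypothetical sample: 240 winners where 200 were expected from 1000 runners.
whole = z_score(wins=240, expected_wins=200, runners=1000)
# One quarter of that sample, showing exactly the same edge.
part = z_score(wins=60, expected_wins=50, runners=250)
print(f"whole sample: {whole:.2f}, one quarter: {part:.2f}")
```

Splitting the sample into four equal parts with identical win rates halves the statistic for each part, even though nothing about the underlying edge has changed.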

formula_2002
01-30-2004, 03:09 PM
Originally posted by GameTheory
That's because you are cutting down your sample sizes by breaking up the sample. It is impossible to get as high a score as the whole sample when you break it into parts...

And that is one of the reasons why, as I have said so often in the past, a very large sample is required. So large, in fact, that it may be impractical, if not impossible (as it applies to horse racing).

GameTheory
01-30-2004, 03:41 PM
I'm not going to get into this again with you, as you have shown yourself repeatedly to be impervious to good sense. You point to one number being lower than another like it proves your point when it is mathematically impossible for it to be anything but lower. The test already accounts for the odds and the sample size. That is the whole point of the test -- to see if your results were due to luck or not. You claim statistics is the answer, but this standard statistical test is no good for you. You are one contradiction after another...

formula_2002
01-30-2004, 03:45 PM
Originally posted by GameTheory
I'm not going to get into this again with you, ...

Forever Thankfully...Joe M

Rick
01-31-2004, 11:39 AM
Calculating the variance of win % and using the average payoff will give you the WRONG answer. You also have to consider the variance of the payoffs. If you calculate the standard error of the mean payoff, you'll get a more accurate (and probably more discouraging) estimate of the required sample size.

formula_2002
01-31-2004, 11:52 AM
Originally posted by Rick
If you calculate the standard error of the mean payoff, you'll get a more accurate (and probably more discouraging) estimate of the required sample size.

Add to that "the mean payoff based on small incremental odds ranges", and I think I agree with you.

With the exception of the incremental odds analysis, Mitchell works along those lines in his book "Commonsense Betting".

Rick
01-31-2004, 12:47 PM
Mitchell doesn't do the calculation correctly. He uses the variance of win % and assumes a constant, average payoff. So does almost everyone else, but it's WRONG. The only way it would be correct is if you had exactly the same payoff every time. Using a small odds range reduces, but doesn't eliminate, the problem.

Any statistics book will tell you how to calculate the standard error of a mean based on sample size. If you calculate the variance of your payoffs (including the zero returns), you can plug in the numbers and find out how much variance in average payoff you can expect for various future sample sizes. I recommend using at least 100 winners in calculating the variance of the payoffs.

This brings up a good point. Just because something is published in a book doesn't mean that it's correct. Be careful about assuming that the author knows the correct solution, no matter how many degrees he may have.
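The calculation described above can be sketched as follows: treat every bet's return (zero for losers) as one observation, take the standard deviation of those returns, and divide by the square root of the number of bets. The payoffs are hypothetical and the sample is far smaller than the 100 winners recommended, so it only shows the mechanics. The "flat" list mimics the constant-average-payoff assumption for comparison, and the 95% level and function names are my own choices.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev

def standard_error(returns):
    """Standard error of the mean return per bet (losers count as 0.00)."""
    return stdev(returns) / sqrt(len(returns))

def plays_needed(returns, error, confidence=0.95):
    """Bets needed so the mean return is estimated to within +/- error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * stdev(returns) / error) ** 2)

# Hypothetical returns on 20 $2 bets: 16 losers and 4 winners.
actual = [0.0] * 16 + [7.40, 9.80, 12.60, 20.20]
# Same win rate and same average return, but every winner pays the average.
flat = [0.0] * 16 + [12.50] * 4

print(f"mean return per $2 bet: {mean(actual):.2f} vs {mean(flat):.2f}")
print(f"standard error: {standard_error(actual):.2f} vs {standard_error(flat):.2f}")
print(f"plays for +/- $0.25: {plays_needed(actual, 0.25)} vs {plays_needed(flat, 0.25)}")
```

Both lists have the same mean, but the list with varying payoffs has the larger spread, so it calls for more plays to reach the same precision.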

formula_2002
01-31-2004, 01:41 PM
Again, I agree. That's why I said:

With the exception of the incremental odds analysis, Mitchell works along those lines in his book "Commonsense Betting".

formula_2002
02-01-2004, 05:37 AM
Originally posted by Rick

Any statistics book will tell you how to calculate the standard error of a mean...... .
This brings up a good point. Just because something is published in a book doesn't mean that it's correct. Be careful about assuming that the author knows the correct solution, no matter how many degrees he may have.

THINK ABOUT THAT ONE!!

Rick
02-05-2004, 04:18 PM
Another thought on the mathematical abilities of most horse racing authors: Most of them should be sued for mathematical malpractice.

Rick
02-06-2004, 03:32 AM
Another way to say that is they're always adding apples and oranges together and coming up with stinky twinkies.

OTM Al
02-06-2004, 01:11 PM
Rick-

Couldn't agree more there. I learned a lot from the Brohamer pace book, but there were so many errors in the text that it really bothered me to read it. Numbers from just basic calculations were way off in some instances. The ideas there were good, I thought, but it took me some time to give credibility back to him because of the poor editing.

Rick
02-06-2004, 06:28 PM
OTM,

Yeah, speed ratings don't measure speed. Velocity ratings measure speed but they're called pace ratings. Track variant based on a couple of races. Lengths behind based on someone's eyeball guess. Variable runup distances before the timer starts. Not very scientific. Money management advice, probability of ruin formulas, and confidence levels for statistical testing are all based on false assumptions. Yikes.

sjk
02-06-2004, 06:37 PM
It's a game of odds. I don't mind losing. What kills me is when I don't bet and miss a nice winner.