Small Sample Size? - Horse Racing Forum - PaceAdvantage.Com

coljesep · 07-06-2014, 12:53 PM

I am wondering as you tried to develop your own "system" so to speak, or things you look for inside each race... what do you consider a good enough sample size of races? 100? More?

Hoofless_Wonder · 07-06-2014, 02:58 PM

More is better, but sometimes the numbers aren't there and a small sample is still worth looking at, especially with trainer angles.

There are plenty of posts on this here; search on "sample".

Rather than worry about sample size, perhaps a look at "expected results" is in order:

http://www.hoof.demon.co.uk/archie.html

Actor · 07-06-2014, 04:10 PM

Quote:

Originally Posted by coljesep

I am wondering as you tried to develop your own "system" so to speak, or things you look for inside each race... what do you consider a good enough sample size of races? 100? More?

My personal opinion is 1000 races or more. When I took statistics in college I was taught that the magic number was 20, but I would not bet money on a sample that small. William L. Scott used a sample of 500 to 600 races in developing his system described in How Will Your Horse Run Today?

whodoyoulike · 07-06-2014, 04:59 PM

Quote:

Originally Posted by Actor

My personal opinion is 1000 races or more. When I took statistics in college I was taught that the magic number was 20, but I would not bet money on a sample that small. William L. Scott used a sample of 500 to 600 races in developing his system described in How Will Your Horse Run Today?

Interesting, I thought it was supposed to be at least 30, e.g., Dow Jones Industrials. Doesn't the statistical T-test also suggest at least 30?

I think any sample only needs to be representative of the population. Be careful of your sample not being representative.

Tom · 07-06-2014, 05:25 PM

I started on a 3 race sample at Los Al T Breds.
I'm 5 for 9 overall.
If I waited for a 1,000 sample, it would be next year.
When this stops, I will find another one to play.
Short term is where you make money.

therussmeister · 07-06-2014, 06:16 PM

Quote:

Originally Posted by Tom

I started on a 3 race sample at Los Al T Breds.
I'm 5 for 9 overall.
If I waited for a 1,000 sample, it would be next year.
When this stops, I will find another one to play.
Short term is where you make money.

Not so much short term, but being one of the first ones to use a methodology/angle. If I use a 1,000-race sample to verify profitability before betting, that makes me 1,000 races too late.

thaskalos · 07-06-2014, 10:09 PM

Quote:

Originally Posted by Actor

My personal opinion is 1000 races or more. When I took statistics in college I was taught that the magic number was 20, but I would not bet money on a sample that small. William L. Scott used a sample of 500 to 600 races in developing his system described in How Will Your Horse Run Today?

And it still backfired.

hcap · 07-07-2014, 08:24 AM

Quote:

Originally Posted by thaskalos

And it still backfired.

Years ago, using a computer running the CP/M operating system and a primitive version of Lotus 1-2-3, I set up a program using Scott's first book, How Will Your Horse Run Today? Every Saturday I brought my printouts (dot matrix) with me to OTB. I was soon quite annoyed at losing every Saturday after all that work, but worse, I came to know two OTB regulars: a pair of wonderful elderly ladies who would cash tickets quite often using NYC Daily News public handicapper Russ Harris (chalk heavy).

I would not trust Scott, and I have since got to the point of being able to test systems using automatic modeling techniques, including length of time periods and sample size.

Depending on what factors I or the program choose to model, and what track I was playing and what time of year, the ideal sample size or time period of a model was all over the place.

But I have come to the conclusion that very old data often gets stale, and that very short models (a few days) are too small.

raybo · 07-07-2014, 09:10 AM

I database by individual track, and keep the most recent 24-30 cards in the database, ideally 240-260 races. I use the database for eliminations only. In shorter meets I will go back to the previous year using cards from the same time of meet and time of year, looking for similar environmental conditions.

Tom · 07-07-2014, 09:27 AM

At my age, 9 races IS the long run!

DeltaLover · 07-07-2014, 09:54 AM

Quote:

Originally Posted by hcap

Years ago, using a computer running the CP/M operating system and a primitive version of Lotus 1-2-3, I set up a program using Scott's first book,

CP/M rocked

Way better than MS-DOS, which eventually became the market standard

hcap · 07-07-2014, 12:22 PM

Quote:

Originally Posted by DeltaLover

CP/M rocked

Way better than MS-DOS, which eventually became the market standard

My introduction to computers.

The only thing I remember, other than using William Scott, is that sometimes when I shifted the paper in my trusty dot matrix printer, the image on my computer screen shifted too. Spooky.

JohnGalt1 · 07-07-2014, 03:00 PM

This isn't a test of a method, but Ed Bain would play trainers if they had a 30% win with at least 4 wins for the categories he found important.

If a trainer was 4 for 7 with first after claim it would qualify as a bet even though there were only 7 instances, because in his experience this was enough of a history for a positive expectation.
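A minimal sketch of that qualifying rule, for illustration only. The function name `qualifies` and the reading of "30% win" as "at least 30%" are assumptions, not Bain's own definitions.

```python
def qualifies(wins: int, starts: int,
              min_wins: int = 4, min_win_rate: float = 0.30) -> bool:
    """Return True when a trainer record meets both thresholds.

    Assumption: "a 30% win with at least 4 wins" means
    wins / starts >= 0.30 AND wins >= 4.
    """
    if starts <= 0:
        return False
    return wins >= min_wins and wins / starts >= min_win_rate


print(qualifies(4, 7))   # 4 for 7: about 57% with 4 wins
print(qualifies(3, 5))   # 60%, but only 3 wins
print(qualifies(4, 20))  # 4 wins, but only 20%
```

Note that the rule sets a floor on wins, not on starts, which is why a 7-start record can qualify.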

So as not to hijack this important thread, I will start a new thread on my 2013 data on Bain-like trainer and jockey results.

Thanks coljesep for a reason for me to stop procrastinating.

**************************

When Mark Cramer would test an angle, he would eliminate the top payoff so as not to skew the ROI with one extremely huge win.
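A rough sketch of that trimming idea. It assumes flat $2 win bets, a list holding the amount returned on each bet (0 for a loser), and that the top-paying bet is dropped from the sample entirely instead of being counted as a loss. The payoff figures are made up for illustration.

```python
def roi(payoffs, stake=2.0):
    """Flat-bet ROI: net profit divided by total amount wagered."""
    if not payoffs:
        raise ValueError("need at least one bet")
    wagered = stake * len(payoffs)
    return (sum(payoffs) - wagered) / wagered


def roi_without_top(payoffs, stake=2.0):
    """ROI after removing the single largest payoff from the sample."""
    if len(payoffs) < 2:
        raise ValueError("need at least two bets")
    trimmed = sorted(payoffs)[:-1]  # drop the biggest return
    return roi(trimmed, stake)


# Hypothetical 10-bet sample: two modest winners and one bomb.
payoffs = [0, 0, 6.4, 0, 0, 0, 8.2, 0, 0, 85.0]
print(f"ROI with top payoff:    {roi(payoffs):+.2f}")
print(f"ROI without top payoff: {roi_without_top(payoffs):+.2f}")
```

In this made-up sample the angle looks hugely profitable with the one big winner and shows a loss without it, which is the distortion the trimming is meant to expose.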

Dave Schwartz · 07-07-2014, 06:16 PM

Quote:

Interesting, I thought it was supposed to be at least 30, e.g., Dow Jones Industrials. Doesn't the statistical T-test also suggest at least 30?

At least 30 winners in each category.

Thus, if you were looking at odds, and had broken the horses into (say) 5 classes and the upper class was 30/1 and above, you would want 30 winners in that group, at 30/1 or higher.
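One way to check that rule against a database, sketched under assumptions: each record is an `(odds, won)` pair, and the class boundaries below are hypothetical except for the top class at 30/1 and above, which comes from the post.

```python
from bisect import bisect_right

# Lower bounds (odds-to-1) for 5 classes. Only the 30.0 floor is from the
# post; the other cut points are placeholders.
CLASS_FLOORS = [0.0, 2.0, 5.0, 10.0, 30.0]


def odds_class(odds):
    """Index of the odds class a horse falls into (0 = shortest prices)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return bisect_right(CLASS_FLOORS, odds) - 1


def winners_per_class(results):
    """Count winners in each odds class from (odds, won) records."""
    counts = {i: 0 for i in range(len(CLASS_FLOORS))}
    for odds, won in results:
        if won:
            counts[odds_class(odds)] += 1
    return counts


def thin_classes(results, min_winners=30):
    """Classes that have fewer than min_winners winners."""
    counts = winners_per_class(results)
    return [i for i, n in counts.items() if n < min_winners]
```

The count is of winners, not starters, so the longshot class needs far more races behind it than the favorite class before it reaches 30.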

whodoyoulike · 07-07-2014, 06:45 PM

I viewed the question a little differently. I thought he was looking for an adequate sample size. If you had a database population of say 10k, you could take a random sample of around 30 records and use the T-test formula (there is an F-test, z-test and probably others) to determine its confidence level and then compare the results to another similar random sample of approx. the same size. If the confidence level was similar to the previous, you would have a confidence level that your sample(s) were representative of the population as a whole.

Btw, I recall that there are formulas to calculate what an appropriate sample size should be for a given population size. But, my statistics knowledge is limited.
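One such formula, as a sketch: Cochran's sample size for estimating a proportion (for example a win rate), with the finite population correction. This may not be the formula the poster had in mind, and the defaults (5% margin of error, 95% confidence, p = 0.5 as the worst case) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Records needed to estimate a proportion within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2          (infinite population)
    n  = n0 / (1 + (n0 - 1) / population)      (finite population correction)
    """
    if population <= 0:
        raise ValueError("population must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    n0 = z * z * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(10_000))               # the 10k database from the post
print(sample_size(10_000, margin=0.02))  # a tighter margin needs more records
```

This only sizes a sample for pinning down a rate such as win percentage. It says nothing about ROI, which depends on payoff sizes as well as how often the angle wins.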