View Full Version : Statistical significants


jasperson
05-29-2011, 10:33 AM
For the statisticians a question. How any races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?

jasperson
05-29-2011, 10:34 AM
For the statisticians a question. How many races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?
Sorry I meant how many races

xfile
05-29-2011, 10:41 AM
Thousands of races split into different studies. Something could work for 500 races then take all your money back and then some in the next 500. Some of my favorites I have been following over a decade. Takes time.

SchagFactorToWin
05-29-2011, 04:44 PM
sample size calculator:
http://www.surveysystem.com/sscalc.htm

jasperson
05-29-2011, 05:53 PM
sample size calculator:
http://www.surveysystem.com/sscalc.htm
Thanks, that is what I was looking for.
Jack

Tom
05-29-2011, 06:27 PM
What will you use as population?

raybo
05-30-2011, 08:56 AM
For the statisticians a question. How any races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?

IMO, no. You can find "trends" with a statistical study; the larger the study, the more trends you will find. In horse racing, the length of those trends can be seen but not predicted, except possibly within the same, slightly extended, time period.

So, if the trend happens in a certain time period, you may be able to be successful with that trend, for a short time, in that slightly extended time period. But, after the trend is over, it's over. Back to the drawing board.

Capper Al
05-30-2011, 09:04 AM
For the statisticians a question. How any races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?

Jack,

You're still at it? The little guy without massive resources has to zero in on his research. A sample of 50 races specified by distance, surface, age, and race classification could point you in the right direction. The next 100 or 200 should confirm your findings. Good luck.

raybo
05-30-2011, 09:18 AM
Jack,

You're still at it? The little guy without massive resources has to zero in on his research. A sample of 50 races specified by distance, surface, age, and race classification could point you in the right direction. The next 100 or 200 should confirm your findings. Good luck.

Agreed.

The constant updating of the sample is extremely important, particularly in horse racing. We're not talking about things set in stone here. The landscape is constantly changing and the sample must reflect those changes, in "near" real time.

Don't expect your factors/methods to remain static, and profitable. They will have to change along with the landscape. Generally, the more specific the landscape (race type, track, distance, etc., etc), the more predictable the study.

jasperson
05-30-2011, 10:09 AM
What will you use as population?

I tried 200 races and the percentage was 12%. Then I tried 1000 races and it went up to 13%, so population doesn't seem to have much effect. To get the percentage down I need to increase my sample size.
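The small change between 12% and 13% is what a finite population correction would produce. A rough sketch, assuming the calculator uses the standard formula with p = 0.5, a 95% confidence level, and a sample of 50 races (none of these settings are stated in the post; the function name is made up for illustration):

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    If a population size is given, the finite population
    correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Assumed sample of 50 races drawn from populations of 200, 1000, or unlimited size
for pop in (200, 1000, None):
    print(pop, round(100 * margin_of_error(50, pop), 1))
```

Under these assumptions the margin comes out near 12% for a population of 200 and near 13.5% for 1000, which is why the population barely matters once it is much larger than the sample. Only the sample size n moves the margin substantially.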

Harvhorse
05-31-2011, 10:35 AM
Twenty-five races should be statistically valid.

Robert Goren
05-31-2011, 10:50 AM
For the statisticians a question. How any races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?

The win % is pretty easy to determine the statistical significance of. The ROI is a bit more tricky and would certainly require a larger sample and a little bit more fancy footwork. That is because its distribution is not normal when looked at on a race-by-race basis.

TrifectaMike
05-31-2011, 11:57 AM
Take the easy approach, break the problem into two parts.

Part 1 - The percentage
Part 2 - The ROI

Part 1

n = (Z^2 * p * q)/e^2

Z = z-score ( I suggest 1.96, which corresponds to 95% confidence level)
p = percentage expressed in decimal (50% = .5)
q = 1-p
e = desired level of precision (ex: .05)

Part 2

n = (Z^2 * sigma^2)/e^2

sigma = standard deviation

Z, e same definition as in Part 1
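
A minimal Python sketch of the two formulas above, for anyone who wants to plug in numbers. The function names are made up for illustration, and rounding up to the next whole race is an assumption rather than part of the formulas.

```python
import math

def n_for_proportion(p, e, z=1.96):
    """Part 1: races needed to estimate a win percentage p to within +/- e."""
    q = 1 - p
    return math.ceil(z**2 * p * q / e**2)

def n_for_mean(sigma, e, z=1.96):
    """Part 2: races needed to estimate a mean (such as ROI) to within +/- e,
    given the standard deviation sigma of the per-race values."""
    return math.ceil(z**2 * sigma**2 / e**2)

print(n_for_proportion(0.50, 0.05))  # 385
print(n_for_proportion(0.55, 0.05))  # 381
```

So a 55% win rate pinned down to within 5 percentage points at 95% confidence needs roughly 381 races under this formula.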


Mike (Dr Beav)

uzeb
05-31-2011, 12:19 PM
First post....yea...

One thing to take into consideration when determining sample size is what you're tracking. If all your horses in the sample are 2-1 or 3-1, you'd probably be able to get a pretty good reading with 100 or so races. But if you're a longshot bettor, like me, you'll need a larger sample, simply because there will be far fewer winners when tracking longshots.

As mentioned before by other posters, and I agree totally, it's something you have to continue. Unfortunately with horse racing, what worked yesterday may not work tomorrow.

/gordo

Robert Goren
05-31-2011, 12:27 PM
Take the easy approach, break the problem into two parts.

Part 1 - The percentage
Part 2 - The ROI

Part 1

n = (Z^2 * p * q)/e^2

Z = z-score ( I suggest 1.96, which corresponds to 95% confidence level)
p = percentage expressed in decimal (50% = .5)
q = 1-p
e = desired level of precision (ex: .05)

Part 2

n = (Z^2 * sigma^2)/e^2

sigma = standard deviation

Z, e same definition as in Part 1


Mike (Dr Beav)

Since you have a badly skewed distribution with regard to ROI, I would question a result when the event is one race. 45% of the distribution has the same value.

TrifectaMike
05-31-2011, 12:34 PM
Since you have a badly skewed distribution with regard to ROI, I would question a result when the event is one race. 45% of the distribution has the same value.

Can you explain more clearly what it is you are saying?

Specifically, "I would question a result when the event is one race. 45% of the distribution has the same value."

Mike (Dr Beav)

davew
05-31-2011, 01:32 PM
not enough data



what trend are you asking about - the % win or the ROI

what are you comparing to? the favorites?
33% win -0.20 ROI

a random horse entered in a race?
13% win -0.20 ROI


how did you get these results?

Robert Goren
05-31-2011, 01:35 PM
With ROI, you would have a rate on each individual race of either -1.00 for a losing bet (45%) or whatever the odds were on a winning bet (55%). The distribution is not anything close to a normal curve, so the least-squares standard deviation is meaningless, and therefore any test using a standard deviation would be meaningless. I am willing to accept a small amount of skewness, but I can't believe that any statistician would allow for that much.

If it was me, I would divide the sample into equally sized groups, take the ROI of each group, and then calculate an SD of the ROI of those groups. Then you could compare it to a bunch of groups taken from randomly selected races. While there are still some problems with this method, I would feel a lot more comfortable with the results. The main problem would be selecting the size of the groups. There may be a formula for selecting the size, but I don't know it.

I took my last stats class 40 years ago, so I have forgotten some of the stuff, but I remember one of my profs emphasizing that using a test on data it wasn't meant for was a big no-no.
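
The grouping idea can be sketched roughly as follows. The per-bet returns here are simulated, hypothetical data (55% winners with made-up payoffs), the group size of 25 is an arbitrary choice, and the function name is invented for illustration. The comparison against groups of randomly selected races is left out because it needs real race data.

```python
import math
import random
import statistics

def group_rois(returns, group_size):
    """Split per-bet returns into consecutive equal-sized groups and
    return the ROI (mean return per bet) of each group.
    Leftover bets that do not fill a whole group are dropped."""
    n_groups = len(returns) // group_size
    return [
        statistics.fmean(returns[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]

# Hypothetical data: -1.00 on a loss, a made-up payoff on a win
rng = random.Random(0)
returns = [
    rng.choice([0.4, 0.6, 1.0, 2.0, 3.5]) if rng.random() < 0.55 else -1.0
    for _ in range(500)
]

rois = group_rois(returns, 25)
mean_roi = statistics.fmean(rois)
sd_of_groups = statistics.stdev(rois)
std_error = sd_of_groups / math.sqrt(len(rois))
print(len(rois), round(mean_roi, 3), round(sd_of_groups, 3), round(std_error, 3))
```

The SD here is taken over group ROIs rather than over single bets, which is the quantity the post suggests comparing against groups drawn from random races.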

Robert Goren
05-31-2011, 01:39 PM
I know that is about as clear as mud, but it is the best I can do. Perhaps someone else who had problems with it not being a normal distribution could explain it better.

Bill Cullen
05-31-2011, 02:27 PM
The level of statistical significance is chosen before a test or experiment is performed. The higher the level of statistical significance one seeks, the bigger your sample size needs to be.

That's the general statistical rule whether you're testing a horse racing angle, a new medical treatment or conducting an experiment in quantum physics.

Best,

Bill C

Capper Al
05-31-2011, 04:07 PM
not enough data



what trend are you asking about - the % win or the ROI

what are you comparing to? the favorites?
33% win -0.20 ROI

a random horse entered in a race?
13% win -0.20 ROI


how did you get these results?

This brings up a good idea for a practical test if you so desire. Take 3 days randomly and see if you can confirm that favorites win around 33% with about 30 races.
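
Before running that test, it may help to see how much a 30-race sample can wander. A small sketch, assuming each race is an independent trial with a 33% chance that the favorite wins (a simplification) and using a hypothetical "close to 33%" window of 8 to 12 winners:

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability of k wins in n races."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, 0.33

# Normal-approximation 95% margin for the observed win percentage
margin = 1.96 * math.sqrt(p * (1 - p) / n)

# Chance the sample lands on 8 to 12 winners (about 27% to 40%)
prob_within = sum(binom_pmf(k, n, p) for k in range(8, 13))

print(round(margin, 3), round(prob_within, 3))
```

With only 30 races the margin is about 17 percentage points either way, so a sample showing anything from the high teens to around 50% would still be consistent with a true 33%.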

windoor
05-31-2011, 04:42 PM
For the statisticians a question. How any races do I need to have a statistical significant sample and how do I arrive at that. Say I have 50 races with a winning percent of 55% and an roi of +$.50. Is that enough data to predict that this trend will continue for the next 50 races?


As a win only player:

I usually use 100 consecutive playable races (at the same track) as a starting point before I take interest in whatever I am researching.

If I have a decent win percent and average odds, I will pull random months (at the same track) to see if it is still valid. The lower the win percent, the longer the losing streaks can be. I don't like playing any under 25%, but I do have one at 20% with such high average odds and ROI that I am willing to suffer the losing streaks.

If that is a go, then I will throw a $200 bank at it to support a $2 win wager and let it fly. If it succeeds, double the wager at 125% (pull 25% for yourself) and continue to do this until you get to the level of play you desire.

If at anytime it loses 50 betting units from a bank high, I will investigate and make changes as needed.
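
The 50-unit rule is easy to check automatically from betting records. A minimal sketch, assuming results are recorded as net profit per bet in units (-1.0 for a loss); the function name and defaults are made up for illustration and only cover the drawdown check, not the wager-doubling step:

```python
def first_drawdown_breach(results, unit=2.0, start_bank=200.0, limit_units=50):
    """Return the 0-based index of the bet at which the bank first sits
    limit_units or more below its running high, or None if it never does.

    results: net profit per bet in betting units (-1.0 for a losing bet).
    """
    bank = peak = start_bank
    for i, net_units in enumerate(results):
        bank += net_units * unit
        peak = max(peak, bank)
        if peak - bank >= limit_units * unit:
            return i
    return None

# One winner paying 3 units net, then a long losing run
print(first_drawdown_breach([3.0] + [-1.0] * 60))  # 50
```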

If it fails the first time out, you only lost $200. Start over.

A wise man once told me, if you can't earn a profit with a small wager, you certainly won't with a large one.

I asked a similar question when using a database software program I am working with. It all comes down to sample size for research and sample size for validation. It is still under discussion.

Regards,

Windoor.

TrifectaMike
05-31-2011, 05:01 PM
With ROI, you would have a rate on each individual race of either -1.00 for a losing bet (45%) or whatever the odds were on a winning bet (55%). The distribution is not anything close to a normal curve, so the least-squares standard deviation is meaningless, and therefore any test using a standard deviation would be meaningless. I am willing to accept a small amount of skewness, but I can't believe that any statistician would allow for that much.

If it was me, I would divide the sample into equally sized groups, take the ROI of each group, and then calculate an SD of the ROI of those groups. Then you could compare it to a bunch of groups taken from randomly selected races. While there are still some problems with this method, I would feel a lot more comfortable with the results. The main problem would be selecting the size of the groups. There may be a formula for selecting the size, but I don't know it.

I took my last stats class 40 years ago, so I have forgotten some of the stuff, but I remember one of my profs emphasizing that using a test on data it wasn't meant for was a big no-no.

ROI, as a measurement parameter, is a random variable that has a normal distribution with an associated mean and standard deviation. I am still confused by your explanation.

Mike (Dr Beav)

AlanBaze
05-31-2011, 06:37 PM
Hi Windoor

Your reply looks logical. My questions are how long have you used it? How well has it worked for you? Is it the only approach you use? If not can you elaborate on some others.

Alan

Robert Goren
05-31-2011, 07:19 PM
ROI, as a measurement parameter, is a random variable that has a normal distribution with an associated mean and standard deviation. I am still confused by your explanation.

Mike (Dr Beav)

The ROI value on the individual bets is not a normal distribution. In this case, 45% of the bets have a value of -1.00. If you were to graph it, you would not see the symmetrical curve of a normal distribution. If it was normal, there would be approximately the same number of bets at equal distance on either side of the mean. In this case, with a mean of 0.50, there would have to be 45% of the bets returning -1.00 and another 45% returning very close to +2.00. Although I don't have access to his data, I cannot think of a scenario where that is true. If you pulled a sample of 9 bets, you would see something like this: -1.00 (4 times), +0.40 (1 time), +0.60 (1 time), +1.00 (2 times), +2.00 (1 time) and +3.5 (1 time). No way that looks anything like a normal curve. If you were to pull a bunch of samples of 9 bets each, the ROI of those samples of 9 bets each would resemble a normal curve. I know this sounds picky, but it really does make quite a difference in the accuracy of the conclusions.
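
A quick simulation can illustrate the difference between single bets and samples of bets. This is only a sketch: it treats the returns listed above as a hypothetical pool to draw from, picks 5000 samples arbitrarily, and uses skewness (0 for a symmetric distribution) as a rough stand-in for "looks like a normal curve".

```python
import random
import statistics

def skewness(xs):
    """Population skewness: mean cubed z-score. Zero for a symmetric distribution."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / sd) ** 3 for x in xs])

# Per-bet returns taken from the example in the post (hypothetical pool)
bets = [-1.0] * 4 + [0.4, 0.6, 1.0, 1.0, 2.0, 3.5]

# ROI of many samples of 9 bets each, drawn with replacement
rng = random.Random(1)
sample_rois = [statistics.fmean(rng.choices(bets, k=9)) for _ in range(5000)]

skew_bets = skewness(bets)
skew_samples = skewness(sample_rois)
print(round(skew_bets, 2), round(skew_samples, 2))
```

The skewness of the 9-bet ROIs should come out well below that of the individual bets, which is the point being made: averages of groups behave much more like a normal curve than single-bet returns do.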

Robert Goren
05-31-2011, 07:36 PM
As a win only player:

I usually use 100 consecutive playable races (at the same track) as a starting point before I take interest in whatever I am researching.

If I have a decent win percent and average odds, I will pull random months (at the same track) to see if it is still valid. The lower the win percent, the longer the losing streaks can be. I don't like playing any under 25%, but I do have one at 20% with such high average odds and ROI that I am willing to suffer the losing streaks.

If that is a go, then I will throw a $200 bank at it to support a $2 win wager and let it fly. If it succeeds, double the wager at 125% (pull 25% for yourself) and continue to do this until you get to the level of play you desire.

If at anytime it loses 50 betting units from a bank high, I will investigate and make changes as needed.

If it fails the first time out, you only lost $200. Start over.

A wise man once told me, if you can't earn a profit with a small wager, you certainly won't with a large one.

I asked a similar question when using a database software program I am working with. It all comes down to sample size for research and sample size for validation. It is still under discussion.

Regards,

Windoor.

From a practical point of view, there is not too much wrong with this approach as long as you don't go through your $200 too often. It seems reasonable if you are willing to gamble a little in hopes of finding something profitable and don't have access to a huge database. A lot of things look good on paper but turn sour when you start plunking down the cash. This way you find out pretty quickly if it has a chance of being workable. Things like late odds changes can screw with things that look to work on paper.

windoor
05-31-2011, 11:00 PM
Hi Windoor

Your reply looks logical. My questions are how long have you used it? How well has it worked for you? Is it the only approach you use? If not can you elaborate on some others.

Alan

Hello Alan,

I am only now getting into database handicapping and research. Seems everybody knows more about it than I do. I have much to learn. The same principles will apply here, I would think. It should just make it much easier to find profitable angles to try.

What used to take me months, to see if what I was looking at had some value, now only takes a few minutes. Finding a software program that will allow me to test my “unorthodox” methods is proving quite difficult, but with some expert help, it is beginning to look like I (we) can get it done. Time will tell.

Starting with a small bank to test an idea is something I have been doing for the last 3 or 4 years. I had the advice much earlier than that, but I can be pretty dense at times.

It is the only way I will test a new play, regardless of what the research says. If it truly does have value, it will grow into a nice bank that you can feed on :)

Greed Kills: Keep to the game plan in good times and bad. Survive the losing streak. I always see a nice increase in odds after a substandard month or extended losing streak. This tells me others are probably looking at the same things I see, but lose heart or bankroll when things turn bad. I’m ok with that :)

You have heard me say it before. Patience, discipline, and record keeping are the key qualities you need to beat this game. There may be some gifted handicappers that can do well without them, but I am not among the elite. I had to “work” very hard to get where I am today. I only hope it continues.

Regards,

Windoor.

On Spec
06-01-2011, 02:07 AM
I don't mean to be snarky (well, OK, just a little bit), but:

Why would you want to prove that a handicapping approach is statistically significant?

Go ahead, take as big a database as you want to use, dig something out that looks promising and figure out how you might implement it. But unless you want a career in academia (or handicapping education or tip sheet sales), what will a statistical test tell you that a series of $2 bets won't say better?

I'm genuinely puzzled by the interest in this here. If you want to know if your approach will work when you start putting cash money on it, why not give it a $100 (or a $50) bankroll and see what happens?

Elliott Sidewater
06-01-2011, 03:14 AM
Good discussion. Just a couple points from experience -

1. the sample size required to validate win probability with a high level of confidence is depressingly large, you just can't get away from that.

2. several years ago I was doing paid research for a betting syndicate and although I know that the correct formula appears in this thread to determine the mean and standard deviation for ROI, as a practical matter the ROI started to stabilize down to a reasonably small variation after about 160 winners. So if the win percentage was 20 percent, it would take about 800 races to "know" the ROI.

I hope that helps.

Elliott

Ray2000
06-01-2011, 06:27 AM
Take the easy approach, break the problem into two parts.

Part 1 - The percentage
Part 2 - The ROI

Part 1

n = (Z^2 * p * q)/e^2

Z = z-score ( I suggest 1.96, which corresponds to 95% confidence level)
p = percentage expressed in decimal (50% = .5)
q = 1-p
e = desired level of precision (ex: .05)

Part 2

n = (Z^2 * sigma^2)/e^2

sigma = standard deviation

Z, e same definition as in Part 1


Mike (Dr Beav)


TrifectaMike

I used those formulas on a test run I've been posting in the Harness Forum for chalky Win picks. I'm doing a thousand race test but maybe I'm finished.:D

In 4 months I'll be celebrating my 50th anniversary of dropping that college stats course I took in '61 :), so that tells you where I'm coming from. I just want to be sure I'm doing the ROI Standard deviation right.

Test Population as of today is 830 bets, 39.8% Wins, -0.2% ROI (-0.002)

Part 1 for Strike Rate 39.8% ± 3%, (e=.03) would be

n = (1.96^2 * .398 * .602)/.03^2
n= 1023

This reads..
"the sample size needed to be 95% sure that strike rate is within .398 ± .03 is 1023 races."
Seems reasonable to me.


Part 2 for ROI
I took the $2 return on each wager and did the ROI for each bet,
ROI= -1 for a loss,...and ROI= (payoff-2)/2 for the hits.

The Standard deviation of all 830 ROIs is 1.7849

So by the Formula (again with e=.03)

n = (1.96^2 * 1.7849^2)/.03^2
n=13,599

This reads..
the sample size needed to be 95% sure that ROI is -0.002 ±.03 is 13,599 races.

The ROI being so close to zero in this run makes this hard to see. If the payoffs were 10% larger, the numbers would be StdDev 1.963 and n = 16,454 races needed for ROI 10% ± 3%.
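
The arithmetic above can be reproduced, and turned around to show the precision that the current 830 bets already give. A short sketch using only the figures in this post; the function names are made up for illustration, and rounding up to a whole race is an assumption:

```python
import math

Z = 1.96  # 95% confidence

def required_n(sd, e, z=Z):
    """Sample size needed for a margin of +/- e, given per-bet standard deviation sd."""
    return math.ceil((z * sd / e) ** 2)

def margin(sd, n, z=Z):
    """Margin of error achieved with n bets (the same formula solved for e)."""
    return z * sd / math.sqrt(n)

p = 0.398
sd_win = math.sqrt(p * (1 - p))  # SD of a win/lose outcome
sd_roi = 1.7849                  # SD of per-bet ROI, as reported above

print(required_n(sd_win, 0.03))  # 1023
print(required_n(sd_roi, 0.03))  # 13599
print(round(margin(sd_win, 830), 3), round(margin(sd_roi, 830), 3))
```

Solved the other way, the 830 bets so far pin the strike rate to roughly ±3.3 percentage points and the ROI to roughly ±0.12, under the same normal-approximation assumptions as the formulas.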

I'm still a bit fuzzy :confused: on what population to use for the StdDev ?

Thanks for the formulas,

Interesting thread, the value of doing the math is to alert you when something has changed and your system is falling apart. If you just continue betting...scratching your head and asking "Is that just a normal losing streak or should I be worried?" ...then you're doomed.


