Question for You Educated Statisticians


Track Collector
07-23-2013, 12:38 PM
The following question is directed at those of you who have training in statistics. I hope I can phrase this question so that it makes sense for those trying to offer some advice.

I have a group of handicapping factors for a short race meet that yield the following:

Year 2011 --> 37 plays, Win rate = 59%
Year 2012 --> 33 plays, Win rate = 61%

I recognize that these are tiny samples, but I also understand that with higher occurrence rates one can work with smaller sample sizes than what might otherwise be needed.

If I evaluate this strategy after a number of plays (like 10, 15, etc.), I want to get a feel for how well the strategy is holding up vs. the statistics of the previous 2 years. A 0-for-15 start leads one to a very obvious conclusion, perhaps even a 0-for-10. But what about a 2-for-10 start, or a 4-for-15 start, etc.? Are these results significant enough to suggest that the ending results (for this year's play) are going to be nowhere close to the 60%, or is there still a statistical basis to believe the ending results might be somewhere "close" to the 60% win rate?

I suspect some of you can run simulations with a 60% win rate and determine what the "normal" variation is for different size groups of plays.

With my luck, chances approach 100% that the results for this year will come in at a win rate of 30% :bang:

Thanks in advance for any help!


Chris
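
For a rough sense of the "normal" variation asked about above, here is a minimal sketch. It assumes every play is an independent trial with a fixed 60% win probability, which is an idealization and not something the two prior years establish. The function names are made up for illustration. It computes the exact chance of a start at least as bad as 2 for 10 or 4 for 15, and cross-checks that with a small simulation.

```python
import math
import random

def binom_cdf(k, n, p):
    """Exact P(X <= k) when X is the number of wins in n independent plays."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def simulate_cdf(k, n, p, trials=100_000, seed=1):
    """Monte Carlo estimate of P(X <= k): replay `trials` stretches of n plays."""
    rng = random.Random(seed)
    bad_starts = 0
    for _ in range(trials):
        wins = sum(rng.random() < p for _ in range(n))
        if wins <= k:
            bad_starts += 1
    return bad_starts / trials

if __name__ == "__main__":
    # Assumed true win rate of 60%, taken from the two prior years.
    for wins, plays in [(2, 10), (4, 15)]:
        exact = binom_cdf(wins, plays, 0.6)
        approx = simulate_cdf(wins, plays, 0.6)
        print(f"{wins} for {plays} or worse: exact {exact:.4f}, simulated {approx:.4f}")
```

Under that assumption both starts come out at roughly a 1% chance, so either one would be an unusual result for a genuine 60% method, though far from impossible.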

Ocala Mike
07-23-2013, 01:03 PM
I recognize that these are tiny samples, but I also understand that with higher occurrence rates one can work with smaller sample sizes than what might otherwise be needed.

I'm NOT a statistician, but I believe intuitively that this is flawed logic. In short, your sample size of 70 with a strike rate of 60% is too small to insure against what is called "regression to the mean."

DeltaLover
07-23-2013, 04:13 PM
Your sample is very small, which suggests that it suffers from backfitting.

You should relax your criteria so that you get larger samples that can be easily backtested.

Picking 40 races out of a year's worth of data pretty much allows you to come up with any conclusion you can think of.

More than this, win percentage alone is useless for betting decisions. What counts is the extent to which a specific angle is underestimated by the crowd; winning frequency is irrelevant for betting purposes.

Robert Goren
07-23-2013, 04:45 PM
It's tough to know when to abandon something even if you had a large sample. Conditions do change, and the failure to recognize that change is always a tricky subject. For your case, I would use this formula: if (0.6 * N) - (1.25 * square root of N) is greater than the number of winners this year, where N is the number of races played this year, then you have a big problem.
I will say this about small samples. A lot of things are decided on a lot smaller samples than you have. Ideas are often abandoned in the business world after 1 or 2 failures at the start because of the cost of continuing. 1 success at the start can lead to a lot of money being lost also. Even in horse racing you have to make decisions about short-term track biases on the basis of one or two races, or the bias will disappear before you can take advantage of it.
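
The rule of thumb above is easy to tabulate. The sketch below applies it exactly as written; the function names are made up for illustration. For reference, with a 60% win rate the standard deviation of the number of winners is sqrt(N * 0.6 * 0.4), about 0.49 * sqrt(N), so the 1.25 * sqrt(N) margin works out to roughly 2.5 standard deviations below expectation if plays are independent.

```python
import math

def goren_threshold(n, win_rate=0.6, margin=1.25):
    """Winner count below which the rule above says there is a big problem."""
    return win_rate * n - margin * math.sqrt(n)

def has_big_problem(winners, n):
    """True when the threshold exceeds the winners so far."""
    return goren_threshold(n) > winners

if __name__ == "__main__":
    for winners, n in [(2, 10), (3, 10), (4, 15), (5, 15)]:
        print(f"{winners} for {n}: threshold {goren_threshold(n):.2f}, "
              f"problem = {has_big_problem(winners, n)}")
```

By this rule both of the starts asked about in the first post (2 for 10 and 4 for 15) fall just under the threshold, while 3 for 10 and 5 for 15 do not.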

Robert Goren
07-23-2013, 05:08 PM
One more thing: the chances of finding something in horse racing with a large sample that is any more than marginally profitable are next to none. People with large samples are overjoyed if they get a 2% ROI. It is the rare event that repeats itself with the same result that produces real profits. If it happens often enough to get a large sample, other people are already all over it.

TrifectaMike
07-23-2013, 06:17 PM
Chris,

You have sufficient data. However, you might end up with very large credible intervals.

Google Hierarchical Bayes Estimating a Proportion (Beta-Binomial Model). You are in luck: the Beta-Binomial Model has conjugacy, which allows a closed-form solution.

Mike
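
A minimal sketch of the conjugate update mentioned above. It is the simple pooled version, not a full hierarchical model, and it rests on assumptions that are not in the thread: a uniform Beta(1, 1) prior and both years lumped together as 42 wins in 70 plays. With a Beta prior and binomial data the posterior is again a Beta, and the chance of any future start has a closed form (the beta-binomial distribution). Function names are made up for illustration.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(k wins in n future plays) when the win rate is Beta(a, b) distributed."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def predictive_cdf(k, n, a, b):
    """P(at most k wins in n future plays) under the same Beta(a, b)."""
    return sum(beta_binomial_pmf(i, n, a, b) for i in range(k + 1))

# Assumed uniform prior; any other Beta prior plugs in the same way.
prior_a, prior_b = 1.0, 1.0
wins, losses = 42, 28  # 2011 and 2012 pooled

# Conjugacy: the posterior is Beta(prior_a + wins, prior_b + losses).
post_a, post_b = prior_a + wins, prior_b + losses
posterior_mean = post_a / (post_a + post_b)

if __name__ == "__main__":
    print(f"posterior mean win rate: {posterior_mean:.3f}")
    for k, n in [(2, 10), (4, 15)]:
        print(f"P({k} for {n} or worse): {predictive_cdf(k, n, post_a, post_b):.4f}")
```

Because the posterior carries uncertainty about the true win rate, these predictive probabilities come out somewhat larger than the ones from a fixed 60% rate.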

098poi
07-23-2013, 06:21 PM
Chris,

You have sufficient data. However, you might end up with very large credible intervals.

Google Hierarchical Bayes Estimating a Proportion (Beta-Binomial Model). You are in luck: the Beta-Binomial Model has conjugacy, which allows a closed-form solution.

Mike

That's what I was going to say.

Ocala Mike
07-23-2013, 07:06 PM
That's what I was going to say.

Yeah, it's way over my head. One thing I would like to know from Track Collector on that sample of 70 producing a 60% strike rate would be the range of payoffs on the 42 hits (average, median, mode, etc.). In other words (and again intuitively), if we're talking $3.00 prices, it's probably reliable and reproducible. If we're talking $6.00 prices, I'd be skeptical.

Track Collector
07-23-2013, 07:14 PM
Your sample is very small, which suggests that it suffers from backfitting.

You should relax your criteria so that you get larger samples that can be easily backtested.

Picking 40 races out of a year's worth of data pretty much allows you to come up with any conclusion you can think of.

More than this, win percentage alone is useless for betting decisions. What counts is the extent to which a specific angle is underestimated by the crowd; winning frequency is irrelevant for betting purposes.

Unfortunately, the total number of races at the meet was as follows:

2011 --> 58 (sample was 37 out of 58, or from 64% of the total available races)
2012 --> 51 (sample was 33 out of 51, or from 65% of the total available races)

It is what it is, so the very small sample size is going to add a huge risk to the validity of the 60% win rate.

On the plus side, the $1 Win ROIs were 1.22 and 1.24 respectively, and they do not appear to be significantly impacted by one or two high-paying winners (outliers).

When one finds something they like to win, it is not unusual to find even more profitability (with, of course, additional risk) by using that selection in a higher-level exotic wager. With this potential increase in ROI, I might still be able to live with a win rate like 50% or even 40%.


...

Track Collector
07-23-2013, 07:56 PM
Yeah, it's way over my head. One thing I would like to know from Track Collector on that sample of 70 producing a 60% strike rate would be the range of payoffs on the 42 hits (average, median, mode, etc.). In other words (and again intuitively), if we're talking $3.00 prices, it's probably reliable and reproducible. If we're talking $6.00 prices, I'd be skeptical.

Chalk city as expected. Ave Win = $4.10

Tom
07-23-2013, 10:22 PM
I'd be betting with both hands until I saw reason to back off.
I'd have been all over year 2 after seeing year 1.
You make your money in the short run.

I'm already betting smaller samples than that at Del Mar.
As Tom Ainslie once wrote, "Now is how."
By the time you get a significant number of races at this rate, you will be too damn old to enjoy the winnings.

highnote
07-24-2013, 01:51 AM
I agree with Tom.

30 samples is the minimum you need to start making any sort of reliable statistical inferences.

One thing you can do is assume your stats will hold up, but make smaller bets until you can determine how valid your selections are. You will limit the amount you win, but you will also limit the amount you lose.

In order to know how much to optimally bet you need to factor in the odds you're getting on your bet.

For example, you have found a system bet that on average has a 50% hit rate. Then you go to the races and find two bets that meet your criteria -- one bet is at even money and one bet is at 3-1 odds. I'd bet more on the 3-1 horse than on the even money horse. However, I'd probably assume that my true chances of winning are less on the 3-1 than on the even money horse. It's easy to overestimate your chances of winning.

formula_2002
07-24-2013, 04:02 AM
The following question is directed at those of you who have training in statistics. I hope I can phrase this question so that it makes sense for those trying to offer some advise.

I have a group of handicapping factors for a short race meet that yield the following:

Year 2011 --> 37 plays, Win rate = 59%
Year 2012 --> 33 plays, Win rate = 61%

I recognize that these are tiny samples, but I also understand that with higher occurrence rates one can work with smaller sample sizes than what might otherwise be needed.

If I evaluate this strategy after a number of plays (like 10, 15, etc.), I want to get a feel for how well the strategy is holding up vs. the statistics of the previous 2 years. An 0 for 15 start leads one to a very obvious conclusion, perhaps even an 0 for 10. But what about a 2 for 10 start, or a 4 for 15 start, etc.? Are these results significant enough to suggest that the ending result (for this year's play) are going to be no where close to the 60%, or is there still statistical basis to believe the ending results still might be somewhere "close" to the 60% win rate?


With my luck, chances approach 100% that the results for this year will come in at a win rate of 30% :bang:
Chris
If that is where your luck takes you, bet it that way. You have already answered your own question about sample size :)

dkithore
07-24-2013, 04:45 AM
One more thing, the chances of find something in horse racing with a large sample that anymore than marginally profitable is next to none. People with large samples are overjoyed if they get a 2% ROI. It is the rare event that repeats itself with the same result that produce real profits. If it happens often enough to get a large sample, other people are already all over it.

Robert,

You are ONE (horse) worldly wise man. I appreciate your insights and broad perspective with which you observe this game.

DK

p.s. I will never forget your take on bridge jumper thread. That was fun.

Capper Al
07-24-2013, 10:41 AM
That's what I was going to say.

Funny, I was going to say that too.

Ocala Mike
07-24-2013, 03:43 PM
A 60% strike rate at an average price of $4.10 would lead me to believe that your sample size may, in fact, be adequate. I mostly agree now with Tom and others that you should "bet with your head, not over it" unless and until the results turn around. Let us know how year 3 works out.

flatstats
07-27-2013, 08:36 PM
Year 2011 --> 37 plays, Win rate = 59%
Year 2012 --> 33 plays, Win rate = 61%

There is no value element in that statement.

No one has suggested the A/E. This is the key tool.

The A/E will let you know if the sample size is good, if the method is worth following, and whether the OP has backfitted or not.

Work out the A/E. It's easy.

Sum the win probabilities implied by the odds to get the expected number of winners (E), and divide that into the actual number of winners (A). Compensate for the track take.

Set a threshold for the E before you start following the method. Then see if this is just a fluke and if you should put your life savings into this method in the future.

Strike rates and ROI are meaningless on their own. A/E is the key stat. Using it will improve your handicapping.
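
A minimal sketch of the A/E calculation, under assumptions that are not spelled out in the post: odds are decimal payouts per $1 including the stake, and "compensate for the track take" is read as scaling each implied probability by (1 - take) so that the crowd's probabilities for a race sum to about 1. Breakage is ignored. The plays listed and the 17% take are made-up illustration values, not data from this thread.

```python
def implied_probability(decimal_odds, take=0.0):
    """Crowd's win probability implied by tote odds, after removing the take."""
    return (1.0 - take) / decimal_odds

def actual_over_expected(plays, take=0.0):
    """Return (A/E, actual winners, expected winners) for (decimal_odds, won) pairs."""
    actual = sum(1 for _, won in plays if won)
    expected = sum(implied_probability(odds, take) for odds, _ in plays)
    return actual / expected, actual, expected

if __name__ == "__main__":
    # Hypothetical plays: (decimal odds, did the horse win?)
    plays = [(2.0, True), (2.5, False), (1.8, True), (3.0, True), (2.2, False)]
    ratio, actual, expected = actual_over_expected(plays, take=0.17)
    print(f"A = {actual}, E = {expected:.2f}, A/E = {ratio:.2f}")
```

An A/E above 1.0 means the selections won more often than the odds implied; how far above 1.0 it must be before it means anything is the sample-size question this thread is about.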

flatstats
07-28-2013, 07:56 AM
The maths does not have to be complicated. If you can work out the odds then you can run tests such as ARCHIE.

ARCHIE: a method of evaluating systems (http://www.hoof.demon.co.uk/archie.html)

green80
07-28-2013, 01:43 PM
just put together some new software using info in the bris files to make my selections. started this week and handicapped 65 races so far and got 21 wins. (32%) for a 1.24 roi. How many races do I need to handicap (showing a positive roi) before betting real money on this?

Hoofless_Wonder
07-28-2013, 02:22 PM
just put together some new software using info in the bris files to make my selections. started this week and handicapped 65 races so far and got 21 wins. (32%) for a 1.24 roi. How many races do I need to handicap (showing a positive roi) before betting real money on this?

Load up now. No sense letting winners get away. As long as you continue to keep tabs on the code and the results, you can always worry about whether or not it's "statistically" good. Your code is based on your handicapping methods, so why not?

I would expect your ROI to drop over time, or otherwise you have early retirement and a couple of NTRA championships in your future (knocking on wood to avoid jinx...)

Hoofless_Wonder
07-28-2013, 02:46 PM
The maths does not have to be complicated. If you can work out the odds then you can run tests such as ARCHIE.

ARCHIE: a method of evaluating systems (http://www.hoof.demon.co.uk/archie.html)

Hey Flatstats, that's a pretty slick and practical tool to help evaluate small sample sizes. It's been 30 years since I took a graduate level class in statistics, but I seem to remember most analyses required larger sample sizes to become "valid", which of course implies you're taking more risk with smaller samples.

I took TrifectaMike's suggestion and looked up the Hierarchical Bayes Estimating a Proportion (Beta-Binomial Model). The old saying of "lies, damn lies and statistics" quickly came to mind. It's probably the most scientific approach to address the issue, if you've got some significant spare time on your hands.

At the end of the day, I'd personally worry less about the statistical validity of the method, and concentrate more on accurate tracking of profits to determine if the method is valid. This allows pressing the advantage when the method is hot, scaling back when it turns cold, and of course adhering to the requirement of being flexible and dynamic as a player.....

Track Collector
07-29-2013, 11:05 AM
The maths does not have to be complicated. If you can work out the odds then you can run tests such as ARCHIE.

ARCHIE: a method of evaluating systems (http://www.hoof.demon.co.uk/archie.html)

Hello flatstats,

This is EXACTLY the type of statistical tool I was hoping to learn about when I started the thread in the first place.

There are several gold "nuggets" of information in the article you provided. Perhaps the most important is:

"It is important to note that the profit after tax on a level stake is a poor measure of how the system was performing."

I will be re-reading the article a number of times just to make sure I understand all the logic.

If I did the math correctly, the "Archie" score for my system of plays is 3.52, which I interpret to mean that there is only a 6% chance that the indicated play metrics are as good as they look due to chance circumstances.

I hope to report back in a few weeks how the system "performs" this year. I will also have fun using this statistical tool to look at the other plays in my handicapping methodology.

THANKS AGAIN FOR YOUR GREAT INPUT! :ThmbUp::ThmbUp::ThmbUp:


Chris
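
A minimal sketch of the Archie calculation, assuming the formula is the one-degree-of-freedom chi-square statistic as I understand the linked article to define it: runners * (winners - expected)^2 / (expected * (runners - expected)). Check the article before relying on it. The expected-winners figure of 34.15 is a made-up value, chosen only because it gives a score near the 3.52 reported above; the thread does not state the real figure.

```python
import math

def archie(runners, winners, expected):
    """Chi-square style score comparing actual winners with expected winners."""
    return runners * (winners - expected) ** 2 / (expected * (runners - expected))

def chance_probability(score):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(score / 2.0))

if __name__ == "__main__":
    # 70 plays and 42 winners are from the thread; 34.15 expected is hypothetical.
    score = archie(70, 42, 34.15)
    print(f"Archie score: {score:.2f}")
    print(f"Probability of a score this large by chance: {chance_probability(score):.3f}")
```

A score of 3.52 converts to a probability of about 0.06, which agrees with the 6% figure quoted above.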

DeltaLover
07-29-2013, 11:22 AM
You can read here as well:

http://www.paceadvantage.com/forum/showthread.php?t=78576

davew
07-29-2013, 08:51 PM
Here is my take - you are concerned about win%, average pay-off, and ROI

you can get 95% confidence ranges that the 'true' number falls within

win % - can you make these selections before the race/day starts? you have mostly post time favorites, but that can not be part of the selection process as a certain percent of races (10%?) have a change in favoritism from the last you can place a wager and by the time the race becomes official - at many places over half the money shows up on the board after the horses have left the gate. Knowing the true win % may not matter unless you are going to single in multi-race bets and top of exactas.

average pay-off - I am guessing the numbers you have from previous 2 years does not include any of your bets. Depending on pool size, your $10, $100, $500 bet may drop pay-off substantially

ROI is dependent on the above 2 and is probably what you care most about.


For your question on how many races of running bad before reconsidering your method? If the true win% were 50%, you will lose 1 half the time, 2 in a row 25%, 3 in a row 12.5%, 4 in a row 6.25%, 5 in a row 3.125% -> every 32 races you will be starting a 5 race losing streak.


Because your past 2 years' hit rate is so high, I would start a separate 'bank' for 20-50 races, so your bet would be 5% - 2% of total bank. After the number of races in your bank, you would start a new bank with the returns from the previous bank. Once comfortable with results, you could switch to Kelly betting. Giving up after a string of only 10-15 bad races would be too early.

Is your meet a fair track that only runs a couple weeks every year?
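
For the Kelly betting mentioned above, a minimal sketch using figures from this thread: a 60% win rate and a $4.10 average payout on a $2 bet. Treating every winner as if it paid the average is a simplification, and the function names are made up for illustration. The standard Kelly fraction for a single bet is f = (b*p - q) / b, where b is the net odds, p the win probability and q = 1 - p.

```python
def roi_per_dollar(win_rate, avg_payout, stake=2.0):
    """Average return per $1 bet, e.g. 1.23 means a 23% profit."""
    return win_rate * avg_payout / stake

def kelly_fraction(win_rate, net_odds):
    """Fraction of the bank to bet; zero or negative means no edge."""
    return (net_odds * win_rate - (1.0 - win_rate)) / net_odds

if __name__ == "__main__":
    avg_payout, stake = 4.10, 2.0
    net_odds = (avg_payout - stake) / stake  # 1.05 to 1
    print(f"ROI at 60%: {roi_per_dollar(0.60, avg_payout):.2f}")
    for p in (0.60, 0.55, 0.50):
        print(f"win rate {p:.0%}: Kelly fraction {kelly_fraction(p, net_odds):.3f}")
```

The fraction is very sensitive to the win rate: about 22% of the bank at a true 60%, but only about 2% at a true 50%, which is one reason to bet well below full Kelly while the win rate is still uncertain.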

TrifectaMike
07-29-2013, 09:18 PM
Here is my take - you are concerned about win%, average pay-off, and ROI

you can get 95% confident ranges that the 'true' number falls within


That is not correct.

Mike

Track Collector
07-30-2013, 01:11 AM
Here is my take - you are concerned about win%, average pay-off, and ROI

you can get 95% confident ranges that the 'true' number falls within

win % - can you make these selections before the race/day starts? [Chris: No, it requires monitoring of the toteboard.] you have mostly post time favorites, but that can not be part of the selection process as a certain percent of races (10%?) have a change in favoritism from the last you can place a wager and by the time the race becomes official - at many places over half the money shows up on the board after the horses have left the gate. [Chris: Being the Post Time Favorite is not an absolute requirement.] Knowing the true win % may not matter unless you are going to single in multi-race bets and top of exactas.

average pay-off - I am guessing the numbers you have from previous 2 years does not include any of your bets. [Chris: Correct.] Depending on pool size, your $10, $100, $500 bet may drop pay-off substantially [Chris: My expected wager size (to win) will reduce the average payout by 1.5% to 2.0%.]

ROI is dependent on the above 2 and is probably what you care most about.


For your question on how many races of running bad before reconsidering your method? If the true win% were 50%, you will lose 1 half the time, 2 in a row 25%, 3 in a row 12.5%, 4 in a row 6.25%, 5 in a row 3.125% -> every 32 races you will be starting a 5 race losing streak. [Chris: If the true win% is 60 (which means the true lose% is 40), then 2 losses in a row is 16%, 3 in a row is 6.4%, 4 in a row is 2.6%, and 5 in a row is 1.0%. The longest "observed" losing streak over the 2-year period is 3, which happened twice. The longest observed win streak was 6.]

because your past 2 years hit rate is so high, I would start a separate 'bank' for 20-50 races so your bet would be 5% - 2% of total bank. After the number of races in your bank, you would start a new bank with the returns from the previous bank. Once comfortable with results, could switch to Kelly betting. Giving up after a string of only 10-15 bad races would be too early.

Is your meet a fair track that only runs a couple weeks every year? [Chris: Yes.]

Hi davew,

See my responses marked [Chris: ...] above.


Chris
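
The streak figures in this exchange can be reproduced with a short sketch, assuming independent plays with a fixed win rate (an assumption, as elsewhere in the thread). The first function gives the chance that the next k plays all lose. The second, whose name is made up for illustration, gives the chance of seeing at least one losing streak of a given length somewhere in a run of plays, which is the more useful number over a whole meet.

```python
def prob_losses_in_a_row(win_rate, k):
    """Chance that the next k plays are all losers."""
    return (1.0 - win_rate) ** k

def prob_streak_within(n_plays, streak, win_rate):
    """Chance of at least one run of `streak` straight losses in n_plays."""
    lose = 1.0 - win_rate
    # state[j] = probability that no such streak has happened yet and the
    # current run of consecutive losses has length j (0 <= j < streak).
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(n_plays):
        new_state = [0.0] * streak
        new_state[0] = sum(state) * win_rate  # a win resets the run
        for j in range(streak - 1):
            new_state[j + 1] = state[j] * lose  # a loss extends the run
        state = new_state  # runs reaching `streak` drop out of the state
    return 1.0 - sum(state)

if __name__ == "__main__":
    for k in range(2, 6):
        print(f"{k} losses in a row at 60%: {prob_losses_in_a_row(0.6, k):.3f}")
    print(f"5-loss streak somewhere in 35 plays: {prob_streak_within(35, 5, 0.6):.3f}")
```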

davew
07-30-2013, 10:21 AM
a link to a Binomial Confidence Interval Calculator

http://www.biyee.net/data-solution/resources/binomial-confidence-interval-calculator.aspx

plugging in your given N=70 with X=42, show different confidence intervals for what range the 'true' win percentage would be.


Your bet size affecting pay-outs ->My expected wager size (to win) will reduce the average payout by 1.5% to 2.0%.
this is interesting, is this your average increase to the amount bet on the winning horse? Breakage can be such a percentage killer with lower priced winners - a few bucks could turn a $4 into a $3.80 ($2 profit to $1.80)

With more details given, I would not hesitate to up my bets/bank to 10% per race and only stop to reevaluate if bank dropped below 50% of start.
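
For anyone who would rather compute the interval than use the calculator, here is a minimal sketch of one common method, the Wilson score interval, applied to N=70 and X=42. Which method the linked calculator uses is not stated here, so its numbers may differ slightly. The z value of 1.96 corresponds to a 95% level.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

if __name__ == "__main__":
    low, high = wilson_interval(42, 70)
    print(f"95% Wilson interval for 42 of 70: {low:.3f} to {high:.3f}")
```

The interval runs from roughly 48% to 71%, which shows how wide the range still is with 70 plays.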

TrifectaMike
07-30-2013, 11:01 AM
a link to a Binomial Confidence Interval Calculator

http://www.biyee.net/data-solution/resources/binomial-confidence-interval-calculator.aspx

plugging in your given N=70 with X=42, show different confidence intervals for what range the 'true' win percentage would be.


Your bet size affecting pay-outs ->My expected wager size (to win) will reduce the average payout by 1.5% to 2.0%.
this is interesting, is this your average increase to the amount bet on the winning horse? Breakage can be such a percentage killer with lower priced winners - a few bucks could turn a $4 into a $3.80 ($2 profit to $1.80)

With more details given, I would not hesitate to up my bets/bank to 10% per race and only stop to reevaluate if bank dropped below 50% of start.


The bolded statement is untrue. In the context of confidence intervals, the statement is about the data and NOT the parameter (the "true" win percentage). In that context the parameter is either inside the interval (with probability 1) or outside it (with probability 0). Confidence intervals are an ad hoc solution to the wrong question.

Mike

Track Collector
07-30-2013, 01:26 PM
Your bet size affecting pay-outs ->My expected wager size (to win) will reduce the average payout by 1.5% to 2.0%.
this is interesting, is this your average increase to the amount bet on the winning horse? Breakage can be such a percentage killer with lower priced winners - a few bucks could turn a $4 into a $3.80 ($2 profit to $1.80)

The $4.10 average $2 payout figure already takes into consideration takeout and breakage.

The relationship between pool dilution and average increase to the amount bet on the winning horse may be linear, but pool dilution should be the focus point. Assuming similar results this year, I can expect, as the result of my participation and expected success level, some negative impact. Specifically, I estimate a 1.5% to 2.0% reduction to the historical $2 payout value of $4.10, so that when the 2013 final average $2 payout for this type of play is determined, it will come in somewhere between $4.04 and $4.06.

Anyone wanting help on pool dilution calculations should feel free to send me a private message. :)


Chris
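
The payout arithmetic above can be checked with a short sketch. The post does not say whether the 1.5% to 2.0% reduction applies to the whole $4.10 payout or only to the $2.10 profit portion, so both readings are computed here; that choice, and the function name, are assumptions made for illustration.

```python
def diluted_payout(payout, reduction, stake=2.0, profit_only=False):
    """Payout on a $2 bet after a fractional reduction from pool dilution."""
    if profit_only:
        # Reduce only the profit above the returned stake.
        return stake + (payout - stake) * (1.0 - reduction)
    # Reduce the whole payout, stake included.
    return payout * (1.0 - reduction)

if __name__ == "__main__":
    for reduction in (0.015, 0.020):
        whole = diluted_payout(4.10, reduction)
        profit = diluted_payout(4.10, reduction, profit_only=True)
        print(f"{reduction:.1%} reduction: whole payout ${whole:.3f}, "
              f"profit only ${profit:.3f}")
```

Applied to the whole payout the result is about $4.02 to $4.04; applied to the profit only it is about $4.06 to $4.07. The $4.04 to $4.06 range quoted above sits between the two readings.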

classhandicapper
08-01-2013, 10:50 AM
Chris,

You have sufficient data. However, you might end up with very large credible intervals.

Google Hierarchical Bayes Estimating a Proportion (Beta-Binomial Model). You are in luck: the Beta-Binomial Model has conjugacy, which allows a closed-form solution.

Mike

Mike,

That may have been one of the funniest things I've read here. I had this picture in my head of the look on my father's face the day I tried to explain track variants to him. I said to myself, "That's how I must look right now".

:lol:

I need a dictionary or something.