PDA

View Full Version : Mathematical Probability Formula?


InFront
09-29-2008, 09:31 PM
Need some advice and help. Is there a mathematical formula that tells you what number of races in the sample, number of horses in the sample, number of total plays found, win%, and ROI% you need in order to figure out the probability that a spot play angle will hold going forward in a new sample of races?

For example, say you had a one-year database containing 50,000 races with 400,000 horses, and you tested a spot play angle that produced 1,000 plays for that year with a 15% win rate and a 1.10 ROI. What is the chance that this same angle will hold similar stats over the next year of 50,000 new races?

In other words, is there some mathematical formula that shows you the MINIMUM stats you need (# of races, # of horses, # of plays, win%, and ROI%) to give an 80% confidence level that this same angle will hold going forward, or a 90% confidence level, 95% level, etc.?

As we all know, discovering and backfitting stuff in PAST races to show PAST profits can be easily done, but most of those angles fail miserably once tested on new database samples of races. Maybe because the sample size was too small, or the number of plays, win%, etc. But I thought I read long ago that there was an actual proven mathematical formula to help us with this dilemma; I just don't remember how it was calculated. Thanks to anyone who replies.

ryesteve
09-29-2008, 10:26 PM
Is there a mathematical formula that tells you what number of races in sample, number of horses in sample, number of total plays found, win% and roi% you need to figure out the probability that such a spot play angle will hold going forward in a new sample of races?

In a word, no. You can't build confidence intervals around these performance metrics, because you're not simply taking objective measurements of performance in your data; you're fitting specific criteria around the data, so by definition it's biased.

If you've got 50,000 races, what you want to do is look for spot plays in 25,000 of them, and then see how it performs on the other 25,000. This test on independent data is what will provide an unbiased performance estimate that you can't get from just looking at the data from which you derived the spot play.
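The split ryesteve describes can be sketched in a few lines of Python (a minimal sketch; the race list, seed, and 50/50 fraction are just illustrative assumptions):

```python
import random

def holdout_split(races, train_frac=0.5, seed=42):
    """Shuffle races and split them into a derivation half (for finding
    spot plays) and a validation half (for an unbiased performance test)."""
    races = list(races)
    random.Random(seed).shuffle(races)
    cut = int(len(races) * train_frac)
    return races[:cut], races[cut:]

# 50,000 race IDs split into two independent halves of 25,000 each
derive, validate = holdout_split(range(50000))
```

Any angle discovered only in `derive` can then be scored on `validate`; the validation numbers are the ones to trust.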

njcurveball
09-29-2008, 11:13 PM
For example say you had a year database that contained 50,000 races with 400,000 horses. And you tested a spot play angle that produced 1,000 plays for that year with a 15% wins and a 1.10 roi. What is the chance that this same angle will hold similar stats in the next year of 50,000 new races?



This seems to be the most intellectual question these days. When I talked to Ron Tiller at a seminar he mentioned there were guys STILL downloading data after a year or two who have yet to make a bet.

If your spot play is based on a sound fundamental, that is much more important than the size of the sample.

For example, I can take 100 races with the top Beyer and win about 25% of them. THAT statistic will carry forward very well.

Now if I take 100 races where the horse had the letters CH in the name, the ROI could be $10 or $1. Even with 200, 300, 500 races the ROI could vary widely from the norm (about 82 cents on the dollar). Strength of numbers is not proof horses with CH in their name are a better bet.

Build first using race variables (distance, surface, class, track, etc.). After splitting your samples along these lines, use sound fundamentals. Smaller samples based on this type of organization will carry forward much better over time.

Then when you have something worthwhile, break it down by circuit and track.


Good Luck,
Jim
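The bucketing Jim recommends could be sketched like this in Python (the dictionary field names are assumptions, purely for illustration; real data would use whatever fields your database exports):

```python
from collections import defaultdict

def split_by_race_variables(plays):
    """Group plays by (distance, surface, class) so each subset can be
    evaluated on its own before drilling down to circuit and track."""
    buckets = defaultdict(list)
    for play in plays:
        key = (play["distance"], play["surface"], play["class"])
        buckets[key].append(play)
    return buckets

sample = [
    {"distance": 6.0, "surface": "dirt", "class": "claiming", "won": True},
    {"distance": 6.0, "surface": "dirt", "class": "claiming", "won": False},
    {"distance": 8.5, "surface": "turf", "class": "allowance", "won": False},
]
groups = split_by_race_variables(sample)
```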

InFront
09-30-2008, 12:13 AM
For example, I can take 100 races with the top Beyer and win about 25% of them. THAT statistic will carry forward very well.

Now if I take 100 races where the horse had the letters CH in the name, the ROI could be $10 or $1. Even with 200, 300, 500 races the ROI could vary widely from the norm (about 82 cents on the dollar). Strength of numbers is not proof horses with CH in their name are a better bet.


Yes, this is similar to what I found and something I was going to post about. Whenever you take a base factor, something as simple as the best last-race speed horse, and test it over a year of 50,000 races, whatever win% and ROI% come back will repeat almost EXACTLY over the next 50,000 races. And it could be ANY FACTOR that produces a play in every race, or even a decent portion of the total races in the sample. Yes, the final results may lose money, but they are at least consistent in the win and ROI departments.

But when you put a set of factors together to make some kind of strict spot play angle, and it produces a small profit over a huge sample of races, then unless that same angle also produces a very large number of plays in each sample, there is a 99% chance it will fail miserably over the next huge sample of races, simply because the number of plays was too low.

This leads me to believe that while the total number of races is important, and win% and ROI% are also important, the MAIN STAT is the actual number of plays produced WITHIN that sample. When testing ANY METHOD all FOUR stats are ALWAYS needed, but that last stat could be the main culprit causing good past results to flop going forward.

I currently have about 3 years of data covering almost every track, about 50,000 races per year, or roughly 150,000 races total. That is easily over 1,000,000 horses in my database, which I think is enough to prove or disprove any theory. But as mentioned, even though my total samples are huge, if a PROFITABLE spot play produces a meaningless number of plays per year, that entire huge database is pretty much useless.

I am also wary of breaking anything down into sub-categories such as track/distance/class/surface, simply for the above reason. We would be cutting not only our sample size into small groups but our total number of plays into low amounts, which could give spot plays even more of a chance to backfire and flop going forward. What do you think?

robert99
09-30-2008, 06:09 AM
The mathematical formula we use in the UK is based on the Chi-squared statistic and is referred to as Archie. It tells you how likely your results are to be based on chance. If you are OK with a 90% chance that they are based on skill/knowledge, you need fewer results/data than if you want 95% assurance. You can never get to 100%, however much data you have, and by the time you had that data it would be long out of date and erroneous.

The Archie test is done as follows:

***SCROLL DOWN TO POST #15 TO SEE A CORRECTED VERSION***

ryesteve
09-30-2008, 09:39 AM
But now when you put a set of factors together to make some kind of strict spot play angle... there is a 99% chance it will fail miserably over the next huge sample of races simply cause the number of plays may have been too low

Your conclusion is correct (although "99%" may be a bit pessimistic), but your rationale is not. It will likely fail because you are fitting these factors to the observed data. If you define a spot play BEFORE looking at the data, because it seems logical, and THEN test it, then you CAN trust the results you see in your 50,000 races. But if you look at your 50,000 races and notice combinations of parameters that look successful, you cannot reliably say anything about how well that spot play will perform going forward.

Overlay
09-30-2008, 11:16 AM
If the horse started at 4/1 the chance is 1/(4+1) = 0.20 (20% chance) and so on. That horse would be 0.20 of an expected winner. You add up all those expectations for each selection to give a total expected number of winners.

Can you just take the odds at face value in calculating expected winners? Don't you have to factor take and breakage into the calculations? For example, with a 17% take and dime breakage, wouldn't the expected percentage of winners for horses with odds of 4-1 be (1 - .17) / (4.00 + 1.05), or .164 (16.4%), rather than 20%?
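Overlay's arithmetic can be sketched as a small helper (a sketch, not a definitive pool model: the 0.05 breakage add-on simply mirrors his example, and exact breakage treatment varies by jurisdiction):

```python
def expected_winner_share(odds_to_one, takeout=0.0, breakage_add=0.0):
    """Expected-winner contribution of one horse at the given odds-to-1.
    With no adjustments this is the face-value 1/(odds + 1); with takeout
    and breakage it reproduces the (1 - take) / (odds + 1 + breakage)
    calculation in the post above."""
    return (1 - takeout) / (odds_to_one + 1 + breakage_add)
```

For a 4/1 horse this gives 0.20 at face value, and about 0.164 with a 17% take and the 0.05 breakage add-on, matching the figures in the post.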

Bill Cullen
09-30-2008, 11:57 AM
At the end of the day, the "statistical confidence" level you want is primarily a function of sample size.

How much confidence you can have in your "statistical confidence" level (90%, 95%, etc.) is a judgment call on how rigorous the sampling process was and on the empirical methodology and experimental-design filters used to put the data into different buckets for cross-comparisons.

Bill C

GeniusIQ179
09-30-2008, 12:00 PM
NO:bang:

GeniusIQ179
09-30-2008, 12:05 PM
Was that Archie Bunker?

GeniusIQ179
09-30-2008, 12:09 PM
Best Answer:)

ryesteve
09-30-2008, 12:18 PM
How much confidence you can have in your "statistical confidence" level (90%, 95%, etc.) is a judgment call on how rigorous the sampling process was

He's not "sampling"... which is why any discussion of confidence intervals isn't applicable.

Bill Cullen
09-30-2008, 12:39 PM
Your conclusion is correct (although "99%" may be a bit pessimistic) but your rationale is not. It will likely fail because you are fitting these factors to the observed data. If you define a spot play BEFORE looking at the data, because it seems logical, and THEN test it, then you CAN trust the results you see in your 50,000 races. But if you look at your 50,000 races, notice combinations of parameters that look successful, you can not reliably say anything about how well this spot play will perform going forward.

You said it well.

Controlling for experimental bias is critical here.

Experimental or research designs for handicapping could be statistically tested by making them run a gauntlet of inquisitors, just like the folks in academia have to do. You've got to get the methodology approved before you can even begin data collection, much less manipulating the independent variable(s). Usually you have to have done a review of the literature as well.

I'm sure we could find enough inquisitors on this forum to make a quorum for an inquisition!

Best,

Bill C

rrbauer
09-30-2008, 01:57 PM
At the end of the day, the "statistical confidence" level you want is primarily a function of sample size.


Bill C

AND variance......

robert99
09-30-2008, 02:35 PM
Can you just take the odds at face value in calculating expected winners? Don't you have to factor take and breakage into the calculations? For example, with a 17% take and dime breakage, wouldn't the expected percentage of winners for horses with odds of 4-1 be (1 - .17) / (4.00 + 1.05), or .164 (16.4%), rather than 20%?


In the UK, you can bet at near-zero takeout. Anywhere else, of course, you may need a shrinkage adjustment. If you don't want to make the adjustment, you will at least be on the safe side for any conclusions.

My post today got a bit messed up. Here is a corrected version:

"The mathematical formula we use in the UK is based on the Chi-squared statistic and is referred to as Archie. It tells you how likely your results are to be based on chance. If you are OK with a 90% chance that they are based on skill/knowledge, you need fewer results/data than if you want 95% assurance. You can never get to 100%, however much data you have, and by the time you had that data it would be long out of date and erroneous.

The Archie test is done as follows:

Archie = [selections × (winners − expected_winners)²] / [expected_winners × (selections − expected_winners)]

Expected winners - you need to know the final odds of every selected horse
to work out its best estimated chance. Add up all of the odds as follows:

If the horse started at 4/1 the chance is 1/(4+1) = 0.20 (20% chance) and so on. That horse would be 0.20 of an expected winner. You add up all those
expectations for each selection to give a total expected number of winners.

If for a test example, selections = 1400 and actual winners = 281
and your summed expectation was 249 winners, your Archie equation becomes:

Archie = [1400 × (281 − 249)²] / [249 × (1400 − 249)] = 1,433,600 / 286,599 = 5.002

which gives an Archie Score of approximately 5

Check your Archie score against the Archie table :

Archie Score   Probability
0.3 0.5839
0.5 0.4795
1 0.3173
1.5 0.2207
2 0.1573
2.5 0.1138
3 0.0833
3.5 0.0614
4 0.0455
4.5 0.0339
5 0.0253 (example)
5.5 0.019
6 0.0143
6.5 0.0108
7 0.0082
7.5 0.0062
8 0.0047
8.5 0.0036
9 0.0027
9.5 0.0021
10 0.0016
10.5 0.0012
11 0.0009
11.5 0.0007
12 0.0005

So we look up Archie score = 5, which represents a probability of roughly
2.53% that your method of selecting those winners was just down to chance or fluke. So it is 97.47% probable that the method's success was not just due to chance.

If you want more reassurance, you need a higher Archie score and more
samples."
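The worked example above can be checked with a short Python sketch. The probability column in the table matches the upper tail of a chi-squared distribution with one degree of freedom, which reduces to a complementary error function (that reading of the table is my inference, not stated in the post):

```python
import math

def archie_score(selections, winners, expected_winners):
    """Archie = n * (w - e)^2 / (e * (n - e)), as defined above."""
    n, w, e = selections, winners, expected_winners
    return n * (w - e) ** 2 / (e * (n - e))

def archie_probability(score):
    """P(chi-squared with 1 df > score): the chance the results are fluke."""
    return math.erfc(math.sqrt(score / 2))

score = archie_score(1400, 281, 249)   # about 5.002, as in the example
p = archie_probability(score)          # about 0.025, matching the table
```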

Bill Cullen
09-30-2008, 02:35 PM
My real point is that statistical discussions on this board should be contextually tempered by the nature of the angle(s) or approach(es) under review. Statistics is the handmaiden of empirical research, not the end of it.

Occasionally, and usually serendipitously, statistics will passively reveal
a counter-intuitive insight. Beyond that, Einstein's dictum holds equally true for handicapping as in all areas where empirical methods apply:

"Imagination is more important than knowledge."

Best,

Bill C

Bill Cullen
09-30-2008, 03:14 PM
Reading over my last post I would add to what I said (in the spirit of "Physician, heal thyself.")

There's a fine line between hypothesis and verification:

the finish line.

(a gloss on some line I vaguely recall from the movie "Let It Ride").

Bill C

InFront
10-01-2008, 09:41 PM
Your conclusion is correct (although "99%" may be a bit pessimistic) but your rationale is not. It will likely fail because you are fitting these factors to the observed data. If you define a spot play BEFORE looking at the data, because it seems logical, and THEN test it, then you CAN trust the results you see in your 50,000 races. But if you look at your 50,000 races, notice combinations of parameters that look successful, you can not reliably say anything about how well this spot play will perform going forward.

I agree, but do you know the chance that throwing together a bunch of factors, even if they make logical sense together, and testing them against a large database will come even close to breakeven, let alone profitable? This is why, as horseplayers, we tweak and backfit factor after factor, trying to discover what may be profitable through our databases. But as you may agree, by the time it is finalized it is so biased toward that sample of races that it becomes almost useless when tested forward on new races.

CincyHorseplayer
10-01-2008, 10:00 PM
"Imagination is more important than knowledge."

Best,

Bill C


This is so true. As someone who, while never published, has always been refining some form of literary expression or other, there is a sense of purpose when you are inspired and creativity flows out of you from some immortal well. You sense it, know it, aspire to it, and all your energy and focus is geared to absorbing it. When I sit down with a racing form I approach it as if I were creating a work of art. And often handicapping and betting is just that, and those days are immortal!! In that vein and spirit come all the best handicapping ideas IMO. Good quote.

InFront
10-03-2008, 12:39 PM
It seems that no matter what spot plays we come up with that work through past races, we can never be absolutely sure or predict how well they will hold up going forward through new samples of races. I guess that is why they call it gambling. Thanks to all who responded.

CincyHorseplayer
10-03-2008, 03:21 PM
It seems that no matter what spot plays we come up with that work through past races, we can never be absolutely sure or predict how well they will hold up going forward through new samples of races. I guess that is why they call it gambling. Thanks to all who responded.


InFront, I think once you have developed solid criteria for picking spot plays it comes with ease. It becomes a habit. However, putting things through 10 or 20 years of samples and subsets does nothing to improve your game. It may make you conscious of the potential pitfalls, but projection methods, even good ones, don't address what separates winning and losing. No mathematical equation can give concrete answers about the ability to make fine-line decisions, split hairs on contenders, pick key horses, designate money on combinations, or read a changing track bias, especially at 5 minutes to post. To be able to think fluidly, creatively, aggressively, and make more right decisions than wrong ones is the greatest skill of any betting endeavor. That is easier said than done, and it can only be solved in the heat of the action. You get locked in.

I have always had a solid cash rate and can play around .500 with my bank, give or take 30% (short term), even when struggling, and that survival skill keeps me alive for when I get locked in and go on a run. I am not a professional, though I have played full time for 1-2 months at a stretch in the past. But I have had some tremendous runs for being a small bettor. In fall 2004, betting around $90.00 a day on average, I was down $210.00 through the first 8 days. Over the next 10 playing days, in the course of a month, I ran the balance up to $3,711.90. I'm still refining my approach to keep myself thinking more and going through the motions less. I need help with the handicapping workload and am hopeless in the software department, but I know that once I crack that nut it'll be the difference between me being a part-time winner and a full-time player.

InFront
10-03-2008, 06:24 PM
It may make you conscious of the potential pitfalls, but projection methods, even good ones, don't address what separates winning and losing. No mathematical equation can give concrete answers about the ability to make fine-line decisions, split hairs on contenders, pick key horses, designate money on combinations, or read a changing track bias, especially at 5 minutes to post.

It seems you are more of a non-mechanical handicapper; that is, you use methods of play that require making "judgment calls". Things like this are very difficult, actually impossible, to backtest through any database, even if some of your handicapping is based on mechanical rules. I, on the other hand, am looking for a mechanical approach: once I have created a winning and profitable formula that is 100% mechanical, simply programming the computer to scan all tracks and races for the day and spit out plays is the easy part.

But 99%+ of such mechanical plays come back negative when tested through several large databases, and even the few that do come back profitable still carry no guarantee of future success, which is what this thread is all about. It seems that with such mechanical algorithms there are so many variables in horse racing constantly working against the formula that it is very tough for it to hold going forward, even when backtested through huge databases.

Cratos
10-05-2008, 09:36 AM
My real point is that statistical discussions on this board should be contextually tempered by the nature of the angle(s) or approach(es) under review. Statistics is the handmaiden of empirical research, not the end of it.

Occasionally, and usually serendipitously, statistics will passively reveal
a counter-intuitive insight. Beyond that, Einstein's dictum holds equally true for handicapping as in all areas where empirical methods apply:

"Imagination is more important than knowledge."

Best,

Bill C

A great reply, go to the head of the class