sample size question


Parson
03-13-2017, 04:52 PM
For a number of years, I really did not know if I won or lost money thru the windows. This fall I made myself a promise that I would track all of my wagers to determine when and how to play. I tested for almost 2 months to get an idea of how I would be able to profit. My question is: how big of a sample do I need before I go double fisted? My testing and live play are very close. The testing sample was 100 races. I went live betting flat to win on 2 horses, and I am hitting @ 63.22%. ROI is 1.13 thru Saturday. My exacta bets are 2 horses keyed over 3 others. That has a hit rate of 33.33% and an ROI of 1.35. All of this live play is @ AQU from 2/10 thru 3/11, 81 races played.

I was not impressed with the ROI, though the hit rate was great. I went back and read the charts, and if I limit my win bets to 3/1 or greater, the number of races I play drops some, and I may or may not have 2 horses to bet. The win rate is still 25% with an ROI of 1.42, but the race sample is down to 71 races. Is this sample size of real money bets enough? Thoughts from anyone would be appreciated.

Elliott Sidewater
03-13-2017, 05:54 PM
I know the mathematicians here will probably disagree, but knowing your ROI within a range requires a larger sample than verifying your hit percentage on the 2 horse win bets. So, my advice, having worked this through experience, is that the ROI will start to stabilize at around 160 winners. Using your numbers, that is around 253 races (you may hit the 160 winners a little sooner or later). So you need around 153 more, and then look to see what your ROI is for the entire sample. From there you could decide to bet Kelly or fractional Kelly, or perhaps just flat bets of comfortable size. Do you bet the same amount of money on your 2 horses to win?
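The arithmetic behind the 253-race figure can be sketched like this. It is only a rough sketch: it assumes the 63.22% hit rate holds going forward, it takes the 160-winner mark as a rule of thumb, and the function name is made up for illustration.

```python
def races_for_winners(target_winners: int, hit_rate: float) -> int:
    """Expected number of races needed to collect target_winners at a given hit rate."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return round(target_winners / hit_rate)


total_races = races_for_winners(160, 0.6322)  # hit rate taken from the original post
remaining = total_races - 100                 # assumes 100 races are already in the books
print(total_races, remaining)
```

If the hit rate drifts, the race count moves with it, so the figure is a moving target rather than a fixed finish line.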

Parson
03-13-2017, 06:27 PM
I usually bet the same on both my picks, but it has to be a potential return larger than the bet. I have tried to dutch in the past with limited success. It has been my experience to flat bet and raise my bet with the bankroll.

Cratos
03-13-2017, 10:25 PM
The question you asked, “How big of a sample do I need before I go double fisted?”, is not the right question to ask.

The question should be: “How well does my sample statistic estimate my underlying population value?” That question is answered by a confidence interval, which describes the uncertainty associated with a sampling method.
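One way to put a number on that uncertainty for a hit rate is a Wilson score interval. This is a minimal sketch, not anyone's actual method in this thread. The count of 51 winners in 81 races is an assumption chosen to sit close to the reported 63.22%, and the 1.96 multiplier corresponds to roughly 95% confidence.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win proportion (z = 1.96 is about 95% confidence)."""
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("need n > 0 and 0 <= wins <= n")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 51 winners in 81 races, close to the hit rate quoted above.
low, high = wilson_interval(51, 81)
print(f"hit rate plausibly between {low:.1%} and {high:.1%}")
```

With these assumed counts the interval runs from roughly 52% to 73%, which shows how loosely 81 races pin down the true hit rate.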

Parson
03-14-2017, 11:14 AM
Confidence is not too bad at the moment, but I would like to have a few more races to be sure. The only problem with waiting until I have enough is that by the time that happens, the meet will be over and I will have to start at another track.

Frankie D
03-14-2017, 04:17 PM
Sorry, I am new around here.

When you say you are getting ROI of 1.42 does that mean for every dollar you bet you are receiving in return 42% or 142% (which includes the original wager I suppose)?

Inner Dirt
03-14-2017, 04:41 PM
If a track raced an average of 5 days a week and 9 races a day, your sample would be slightly over 2 weeks. Coming from experience gambling daily, I have had 4 week periods where I was on fire, and then the following month, betting the same way, I could not gamble my way out of a wet paper bag. Back in the days when multiple race bets were few and far between, I had a 1 for 20 stretch on cashing win bets after hitting over 40% the previous month, same track, same meet.

Honestly not applying a bunch of Einstein level formulas I would say somewhere around 1,000 races would be what it takes to get an accurate gauge.

Parson
03-14-2017, 05:45 PM
Welcome to the Board, Frankie. Yes, it means I am making 42% on my wagers, or for every 2.00 bet a return of 2.84.

Parson
03-14-2017, 05:51 PM
Innerdirt,

I am not betting every race. On occasion during my sample, I did not see any race to bet on that card that day, and on a couple I only played 2 or 3. This past Saturday, I played 7. That is the most of any day, I believe. Like I said, if I need a 1000 race base, then based on how I chose to play, the meet is over. What about a boutique meet like Keeneland? How long would it take to get to 1000 races in my Keeneland base? There are about 45-50 races a week and a three week meet. Assuming I only wager 30 races a week, that means 90 races a meet, and that would take over 10 years to get to 1000 races. Hell, I won't live that long. Any thoughts would be appreciated.
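For anyone who wants to rerun that estimate with different numbers, here is a small sketch. The function name and parameters are made up for illustration. It only counts meets; how many years that takes depends on how many such meets run each year, which is left out here.

```python
import math


def meets_needed(target_races: int, races_per_week: int, weeks_per_meet: int) -> int:
    """Number of whole meets required to reach target_races wagered races."""
    per_meet = races_per_week * weeks_per_meet
    if per_meet <= 0:
        raise ValueError("races_per_week and weeks_per_meet must be positive")
    return math.ceil(target_races / per_meet)


# Figures from the post: 30 wagered races a week over a three week meet.
print(meets_needed(1000, 30, 3))
```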

traynor
03-14-2017, 06:50 PM
Large samples are good, but not necessarily useful. "Confidence levels" are similarly good, but not necessarily useful. The key element is distribution. In plain English, are your results "normally distributed" (meaning they can be applied to a larger sample with identical or similar results), or are they an anomaly that only exists in the specific (small) set of races that you calculate? I dunno. There are ways to use a smaller sample and "test" it in various ways to determine if the distribution is "normal" or not.

Most common (and simplest) way to start is to split your sample in half randomly, and see if the results in one half resemble (or repeat) the results in the other half. Thirds are better, quarters better still.
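The random split described above could look something like this. It is a rough sketch only: the per-bet returns are invented for illustration, the function name is hypothetical, and ROI here means the average return per $1 wagered.

```python
import random
from statistics import mean


def split_roi(returns, parts=2, seed=0):
    """Shuffle per-bet returns, deal them into `parts` groups, and report each group's ROI."""
    if parts < 1 or len(returns) < parts:
        raise ValueError("need at least one bet per part")
    shuffled = list(returns)
    random.Random(seed).shuffle(shuffled)              # fixed seed so the split is repeatable
    groups = [shuffled[i::parts] for i in range(parts)]
    return [mean(group) for group in groups]


# Hypothetical flat-bet results: return per $1 wagered (0.0 means the bet lost).
sample = [0.0, 3.2, 0.0, 0.0, 4.5, 0.0, 2.8, 0.0, 0.0, 5.1, 0.0, 0.0]
print(split_roi(sample, parts=2))
print(split_roi(sample, parts=3))
```

If the groups show wildly different ROI figures, that is a hint the overall number leans on a few big payoffs. Trying several seeds gives a feel for how much the split itself matters.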

It is a problem. The app I use generates dozens (sometimes hundreds) of "positive ROI models" every day of the week. Most are worthless because they are descriptive (of what happened in a relatively small subset of races) rather than predictive (of what the results are likely to be when applied to a different set of races). However, some are like the keys to the candy store. I like those.

Dave Schwartz
03-14-2017, 07:10 PM
I have, first hand, seen systems work for 1,500 races and be losers at 3,000 races.

The higher the median mutuel, the more races you need.

As someone said long ago, (maybe Alan Wilson of Casino Gambler's Guide, circa 1959 or so) double your bankroll three times so that you have 8 times what you started with and you've probably got a system that works.

Cratos
03-14-2017, 07:33 PM
I don’t want to take this into a discussion of mathematical statistics, but being informed correctly is necessary in this instance.

Therefore, the textbook explanation of the “interval estimate” is as follows:

“Confidence limits for the mean are an interval estimate for the mean. Interval estimates are often desirable because the estimate of the mean varies from sample to sample. Instead of a single estimate for the mean, a confidence interval generates a lower and upper limit for the mean. The interval estimate gives an indication of how much uncertainty there is in our estimate of the true mean. The narrower the interval, the more precise is our estimate.”
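As a rough illustration of that textbook definition, the sketch below computes confidence limits for the mean return per $1 bet. Several things here are assumptions: the returns are invented, the 1.96 multiplier is a normal approximation for roughly 95% confidence (a t multiplier would be slightly wider for small samples), and the interval takes for granted that the sample is representative.

```python
import math
from statistics import mean, stdev


def mean_confidence_interval(values, z=1.96):
    """Normal-approximation confidence limits for the mean of `values`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(values)
    half_width = z * stdev(values) / math.sqrt(n)   # standard error times the multiplier
    return centre - half_width, centre + half_width


# Hypothetical per-bet returns per $1 wagered (0.0 means the bet lost).
returns = [0.0, 3.2, 0.0, 0.0, 4.5, 0.0, 2.8, 0.0, 0.0, 5.1, 0.0, 0.0]
low, high = mean_confidence_interval(returns)
print(f"mean return {mean(returns):.2f}, limits {low:.2f} to {high:.2f}")
```

The narrower the limits, the more precise the estimate. With a dozen bets the limits are very wide, and they tighten only slowly as more bets are added.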

LottaKash
03-14-2017, 09:05 PM
traynor wrote: "However, some are like the keys to the candy store. I like those."

Subsets?...

How big would a subset have to be for you to like those?....

With confidence and kash $$...

traynor
03-14-2017, 10:03 PM
It depends. I use a lot of models that extract 50-100 hits from a couple of hundred matches. That is not backfitting that number from a small sample. That is a thoroughly tested application of a previously developed model trained and applied to new data.

traynor
03-14-2017, 10:07 PM
Re: the textbook explanation of the “interval estimate” that Cratos posted above: not really. The assumption that the data is "representative" may be incorrect. It may be seriously skewed.

It may be fine for textbooks intended for the consumption of dewy-eyed innocents who uncritically accept such. Not so good for betting purposes.

traynor
03-14-2017, 10:11 PM
For the OP:

"For certain predictive models to be able to learn and generalize, it takes thousands and thousands of records. In line with our example above, a hundred or so records containing data for customers that churned in the past may not be enough. If not enough data is used for training, a model may not be able learn or worse, it may over fit. That means that it learns everything about the given data during training, but it is incapable of generalizing that knowledge when presented with new data. It is simply unable to predict."

Quote from "ba-predictive-analytics1-pdf" from IBM DeveloperWorks. (Paste it into Google--only 7-8 pages, minimal BS--intended for business readers.)

traynor
03-14-2017, 10:16 PM
Re: Dave Schwartz's post above: I don't know about the median mutuel part (because I clean all that and a lot more before I even look at it), but I very much agree with the rest of your comment.

Cratos
03-14-2017, 11:49 PM
I don’t want to take this thread into an academic argument about the banal concepts of applied statistics, but I will say horseracing is not much different than many other risky endeavors.

However, part of what is wrong with horserace gambling is an unrealistic belief in prediction/winning that comes from a weak understanding of statistical applications, and skewness is an inherent part of statistical analysis.

cj
03-15-2017, 12:34 AM
:lol: :lol: :lol:

formula_2002
03-15-2017, 08:43 AM
Dave, did you do an incremental odds analysis on the 1,500 races?