
Confidence intervals


Foolish Pleasure
06-11-2003, 03:52 PM
I have seen numerous posts citing various confidence numbers for certain statistics.

Can anyone please elaborate on the methodology being used?

I believe most random distributions, and consequently the statistics derived from them, were designed for the physical sciences. I would be curious what type of coefficients are being used to normalize these calculations for the endeavor at hand.

Thanks

diablogger
06-13-2003, 05:00 PM
Historically they were designed for agriculture. And beer. (Guinness played a significant role in the early advancement of statistics.)

Foolish Pleasure
06-18-2003, 02:41 PM
To my knowledge,

statistics evolved with physics,

but since no one answered, I'll assume you are all using some random probability distribution for something it wasn't designed for. Likely your confidence intervals are nonsensical and will rarely perform as you cite, and I strongly urge you to figure out what's going on and make the necessary adjustments.

As an example, if you're using a normal distribution, I would consider at least doubling the Z scores to reflect the application.
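
To put a number on that suggestion, here is a minimal sketch of what doubling a Z score does to the nominal coverage of a two-sided interval. It assumes the data really are normal, which is exactly the assumption in question, so it only shows how much wider the claim becomes, not whether the wider interval is right for racing data. The function name `two_sided_coverage` is made up for illustration.

```python
from statistics import NormalDist


def two_sided_coverage(z: float) -> float:
    """Nominal probability that a standard normal value lies within +/- z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Z score for a conventional two-sided 95% interval (about 1.96).
z_95 = NormalDist().inv_cdf(0.975)

for label, z in (("usual", z_95), ("doubled", 2 * z_95)):
    print(f"{label}: z = {z:.2f}, nominal coverage = {two_sided_coverage(z):.5f}")
```

Under the normal assumption, the doubled Z score gives an interval twice as wide, with nominal coverage well above 99.9%.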

Skanoochies
06-18-2003, 07:46 PM
WOW!!! What did he say???:D

Lefty
06-18-2003, 10:14 PM
He said dble the Z scores. ZZZZZZZZZZ

Rick
06-19-2003, 04:58 AM
FP,

Most people refer to confidence intervals for the normal distribution, but that's just one possibility, and most racing data doesn't fit that distribution very well. There are some ways to transform the data to make it fit better. It's even worse than that, though. Let's say you find a way to transform the data so that it fits a normal distribution almost perfectly. Well, you still have the problem that the mean and variance might be different at some future time. The statistical term for that would be that it's a "non-stationary process".

The best example is when you try to determine what your future ROI might be. If you knew it would be constant, you could just take a sufficiently large sample and calculate the mean and standard deviation. From that you can calculate confidence intervals for your ROI in future samples. Since the sample mean is approximately normally distributed in sufficiently large samples (n >= 30 is the usual rule of thumb, though heavily skewed payoffs can need more), you should have a reasonably accurate estimate. BUT, there's no guarantee that the mean and standard deviation will be the same in future samples, because the prices are determined by the level of knowledge of your opponents, the size of the fields in the races, and a myriad of other factors that may change.
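
The calculation described above can be sketched roughly as follows. This is only an illustration under the stated assumptions: the ROI is constant over time, the sample is large enough for the normal approximation to the sample mean to hold, and returns are recorded as net profit per unit staked. The function name `roi_confidence_interval` and the sample figures are made up for the example and are not real results.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def roi_confidence_interval(returns, confidence=0.95):
    """Normal-approximation confidence interval for the mean return per bet.

    `returns` holds the net profit per unit staked on each bet,
    e.g. -1.0 for a losing bet and 2.5 for a winner paying 5-2.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two bets to estimate the spread")
    sample_mean = mean(returns)
    standard_error = stdev(returns) / sqrt(n)  # sample std dev over sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical sample: 24 losing bets and 6 winners (illustrative only).
sample = [-1.0] * 24 + [2.5, 3.0, 4.0, 1.8, 6.0, 2.2]
low, high = roi_confidence_interval(sample)
print(f"mean ROI = {mean(sample):.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

The interval only describes the process that generated the sample. If the mean or spread shifts later, as the paragraph above warns, the interval says nothing about the new regime.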

So, many times it's not the guy who analyzes 100,000 races that is the big winner. It's more likely the guy who develops a technique that is relatively immune to the inevitable changes that occur. A statistician would say that it's more "robust".

Foolish Pleasure
06-19-2003, 12:16 PM
Thanks Rick,

I have never attempted to apply random probability theory to racing; however, some distributions fit quite nicely in sports betting, and I made the mistake of not taking the nature of the beast into account before utilizing them and raising certain parameters.

Your post is very interesting. Personally, I never used any of the applications for racing because they don't fit so nicely. Consequently, I read quite a few posts here citing them and was very curious how these problems were overcome.

I'm not sure that racing is so dynamic that the problems you mention are that big a factor. I suppose it depends entirely on what populations are used, etc.

Rick
06-19-2003, 04:12 PM
FP,

The significance of most factors is relatively stable over time except where radical changes occur such as in the effect of drugs on the value of recency. However, there is a slow change in the public odds as they shift their emphasis from one area to another.

There's a saying in business forecasting that goes, "with more than four variables you can fit an elephant", implying that you're likely to get garbage if you use too many variables in real-world forecasting. The same thing applies to horse racing, and most successful models that I know of use only a few variables. The model used by the Hong Kong Betting Syndicate would be the exception here, and I think they have the advantage of having all of the races run at one track by a limited number of horses. This "closed universe" would tend to make their predictions more accurate than would be possible at US tracks.