View Full Version : What are Confidence Intervals
TrifectaMike
08-02-2013, 09:54 AM
Recently, in another thread the following was posted:
a link to a Binomial Confidence Interval Calculator:
http://www.biyee.net/data-solution/resources/binomial-confidence-interval-calculator.aspx
plugging in your given N=70 with X=42, show different confidence intervals for what range the 'true' win percentage would be.
When I see this sort of misinformation, I normally respond by saying, "That is not true," with a brief explanation. I realize a brief explanation is insufficient.
My aim is not to pick on any individual poster(s). This misunderstanding occurs at all levels, from the novice to trained statisticians.
"plugging in your given N=70 with X=42, show different confidence intervals for what range the 'true' win percentage would be."
The above statement is simply wrong. A confidence interval (CI) will not give you the range in which the true parameter of interest lies. The CI tells you nothing about the parameter of interest.
I know you see CIs everywhere, so how can this be an incorrect interpretation?
Well, I don't enjoy calling it a lie, but it is a lie... a falsehood, and quite often a costly one.
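To make the distinction concrete, here is a minimal simulation sketch in Python (standard library only; the "true" win rate p_true = 0.60 is a made-up value for illustration, not an estimate from anyone's data). Run many fictitious repetitions of the 70-race experiment: roughly 95% of the computed intervals bracket p_true, but any single interval, like the one from that calculator, either brackets it or it doesn't.

import math
import random

def wald_interval(x, n, z=1.96):
    # Normal-approximation 95% CI for a binomial proportion
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

p_true, n, trials = 0.60, 70, 50_000  # p_true is assumed, not estimated
covered = 0
for _ in range(trials):
    x = sum(random.random() < p_true for _ in range(n))  # one fictitious 70-race sample
    lo, hi = wald_interval(x, n)
    covered += (lo <= p_true <= hi)

print(covered / trials)  # roughly 0.95: a statement about the procedure,
                         # not about any single computed interval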
Mike
davew
08-02-2013, 12:25 PM
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
statistics are a funny thing, everyone can find a better test...
http://www.stat.yale.edu/Courses/1997-98/101/confint.htm
What is the parameter of interest (that you are referring to)?
TrifectaMike
08-02-2013, 01:13 PM
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
statistics are a funny thing, everyone can find a better test...
http://www.stat.yale.edu/Courses/1997-98/101/confint.htm
What is the parameter of interest (that you are referring to)?
The parameter of interest is usually referred to as theta. Theta can represent a mean, mode, sigma, a proportion, etc.
I don't have the time now, so I won't go into a detailed discussion.
So, I'll ask this question.
Assume your unknown is the mean of a population. You sample some data and determine the mean with a confidence interval. What is the probability that the mean lies within your interval?
Mike
formula_2002
08-02-2013, 05:13 PM
confidence intervals!
Is a moment in time, when the sublime attempts to define the ridiculous :cool:
TrifectaMike
08-03-2013, 10:16 AM
The parameter of interest is usually referred to as theta. Theta can represent a mean, mode, sigma, a proportion, etc.
I don't have the time now, so I won't go into a detailed discussion.
So, I'll ask this question.
Assume your unknown is the mean of a population. You sample some data and determine the mean with a confidence interval. What is the probability that the mean lies within your interval?
Mike
Assume your unknown is the mean of a population. You sample some data and determine the mean with a confidence interval. What is the probability that the mean lies within your interval?
Here's the simple explanation.
If the mean is constant, you can ONLY say that the mean either lies in the interval or it does not: it is in the interval with probability equal to 1, or outside the interval with probability equal to 0.
If the mean is random, CIs have no meaning in that context.
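A minimal sketch of that point (the population values below are made up): draw one sample, compute the interval, and all you can report about the mean and the interval is a plain true/false fact, never "95%".

import math
import random

mu, sigma, n = 80.0, 5.0, 10  # assumed "true" population mean and sigma
sample = [random.gauss(mu, sigma) for _ in range(n)]
xbar = sum(sample) / n
half = 1.96 * sigma / math.sqrt(n)  # known-sigma 95% interval
lo, hi = xbar - half, xbar + half

print((round(lo, 2), round(hi, 2)), lo <= mu <= hi)  # True or False, nothing in between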
Mike
facorsig
08-04-2013, 08:03 AM
Confidence Intervals are statistically derived KPIs. See ANSI Z1.4 and ANSI Z1.9 for a more detailed treatment.
Fred
TrifectaMike
08-04-2013, 09:20 AM
Let's take it a step further.
A confidence interval (CI) is the answer to the following question:
Give me an interval that will bracket the true value of a parameter in a stated percentage of instances of an experiment that is repeated (fictitiously) a large number of times.
So, a CI is a statistic about the data (fictitious data) and not about the parameter.
Unfortunately, for many, if not most, this becomes the answer to the following question:
Give me an interval that brackets the true value of a parameter with a stated probability, given the particular sample I've actually observed.
More later...
Mike
TrifectaMike
08-04-2013, 10:12 AM
Let's set some context.
If told that (75, 85) is a 95% confidence interval for a horse's speed rating, the common interpretation is that there is a 95% probability the speed rating lies between 75 and 85. This is a false interpretation.
The confidence level actually says that if a confidence interval were computed from each data set in an infinite sequence of data sets, then 95% of those intervals would contain the true value of the speed rating.
It says nothing about whether the speed rating lies in the interval for the actual data observed. If we condition on the single set of data observed, there is no randomness in the speed rating, and so no probability can be stated.
Mike
OTM Al
08-04-2013, 10:17 AM
The parameter of interest is usually referred to as theta. Theta can represent a mean, mode, sigma, a proportion, etc.
I don't have the time now, so I won't go into a detailed discussion.
So, I'll ask this question.
Assume your unknown is the mean of a population. You sample some data and determine the mean with a confidence interval. What is the probability that the mean lies within your interval?
Mike
I'm not really following your argument here, but the confidence level is generally set by the tester and thus determines the bounds of the interval. Conversely, you can ask "with what level of confidence does my estimate lie between A and B?" and you could determine the probability.
TrifectaMike
08-04-2013, 05:06 PM
I'm not really following your argument here, but the confidence level is generally set by the tester and thus determines the bounds of the interval. Conversely, you can ask "with what level of confidence does my estimate lie between A and B?" and you could determine the probability.
Al,
How is that done (the last part, about determining the probability)?
Mike
OTM Al
08-04-2013, 06:29 PM
Al,
How is that done (the last part, about determining the probability)?
Mike
You would have to have an idea about the distribution of the statistic you were sampling for. If it is a simple mean and you know the true value of that mean, then you could use a standardized normal distribution and integrate across it from point A to point B (both standardized in the same way as the mean), and the mass between those two points would give you the probability that, if you sampled the population, the result you would get would lie between those two points. Not a normal thing to do, as you are usually trying to find that mean, but the math works all the same.
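A sketch of that calculation (all the numbers below are made up): with the true mean and sigma assumed known, standardize the two endpoints and take the normal mass between them.

import math

def phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 30.0, 12.0, 50  # assumed known population values
A, B = 25.0, 35.0              # the two points of interest
se = sigma / math.sqrt(n)      # standard error of the sample mean
print(phi((B - mu) / se) - phi((A - mu) / se))  # P(A < sample mean < B)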
TrifectaMike
08-04-2013, 07:49 PM
You would have to have an idea about the distribution of the statistic you were sampling for. If it is a simple mean and you know the true value of that mean, then you could use a standardized normal distribution and integrate across it from point A to point B (both standardized in the same way as the mean), and the mass between those two points would give you the probability that, if you sampled the population, the result you would get would lie between those two points. Not a normal thing to do, as you are usually trying to find that mean, but the math works all the same.
Al,
I believe you are confused.
Firstly, the mean is the parameter that the confidence interval brackets. If we knew the "true" mean, we wouldn't be computing intervals, since the mean would be a constant (no randomness).
Secondly, confidence intervals imply that the data is random, but the parameter is fixed.
So, if the parameter is fixed and constant, how can we assign a probability to that parameter?
Mike
OTM Al
08-04-2013, 08:22 PM
Al,
I believe you are confused.
Firstly, the mean is the parameter that the confidence interval brackets. If we knew the "true" mean, we wouldn't be computing intervals, since the mean would be a constant (no randomness).
Secondly, confidence intervals imply that the data is random, but the parameter is fixed.
So, if the parameter is fixed and constant, how can we assign a probability to that parameter?
Mike
Not at all. I was talking about the taking of a sample mean. Say you knew the mean age of 10000 people. The example I gave would answer the question, what is the probability that if you sampled 50 of them, the mean of that sample would lie between say 25 and 35. It's backward of what normally would be done, but works just the same.
TrifectaMike
08-04-2013, 08:40 PM
Not at all. I was talking about the taking of a sample mean. Say you knew the mean age of 10000 people. The example I gave would answer the question, what is the probability that if you sampled 50 of them, the mean of that sample would lie between say 25 and 35. It's backward of what normally would be done, but works just the same.
Al, that is a confidence interval. It is a statement about the data, not the probability that the mean lies between 25 and 35.
Mike
OTM Al
08-04-2013, 10:02 PM
Al, that is a confidence interval. It is a statement about the data, not the probability that the mean lies between 25 and 35.
Mike
If 72% of the mass of the distribution of sample means lies in that interval then there is a 72% chance when taking a sample mean that it will fall in that interval.
TrifectaMike
08-04-2013, 11:01 PM
If 72% of the mass of the distribution of sample means lies in that interval then there is a 72% chance when taking a sample mean that it will fall in that interval.
Al,
Yes, that can be true, if you take an infinite sequence of data sets. And even then it is NOT a probability statement about the parameter.
If, for example, I were conducting experiments to measure the speed of a specific horse, the unknown parameter would be the speed of that specific horse. I cannot treat the specific horse as a randomly chosen horse; it is the specific horse I want to know about. Conventional statistical inferences such as confidence intervals never make probability statements about parameters... in this case, the speed of a specific horse.
Although there is a small random element, the probability statement is made because of lack of knowledge of the speed. If I condition on the single set of data in front of me, there is NO randomness left in the problem, so no frequentist probabilities can be stated (as you described in your post).
Mike
OTM Al
08-05-2013, 06:23 PM
Maybe you should state the problem as I really don't understand what you are trying to do.
I would say that statistics could be used to estimate the speed of a specific horse. You can get an estimate of his speed from what he did in the past. You could also use data on horses with the same sire or dam. You could include information about the track conditions, field size, post and whatever else you felt relevant. All that could give you an estimate of what he will do today. And as an estimate, there will be error. Hand in hand with that error comes a confidence interval for the range of speeds he could run around that mean estimate.
If you are confining yourself to a single data set, you are only looking at what has happened, so of course all statistics are known. If you are considering that set the whole world, then nothing is in doubt, but if that set is only a subset of the whole world, then you are still dealing with estimates.
Also, do not think of probabilities as being derived from a frequentist approach. The frequentist approach only gives you an estimate of those probabilities. Probabilities are better described through what is called a fair bet.
TrifectaMike
08-06-2013, 06:29 PM
Al,
Let's try an example:
I look at a horse's pp's. I have 10 speed ratings. They average out to 80 with one sigma = 5. So, I compute a 95% confidence interval for the mean, which is (76.42, 83.58).
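(For concreteness, here is how that interval can be reproduced; a sketch assuming SciPy is available, using the t distribution on n - 1 = 9 degrees of freedom.)

import math
from scipy import stats

n, xbar, s = 10, 80.0, 5.0
half = stats.t.ppf(0.975, df=n - 1) * s / math.sqrt(n)  # 95% two-sided
print(round(xbar - half, 2), round(xbar + half, 2))     # 76.42 83.58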
How do you interpret (76.42, 83.58)?
Mike
OTM Al
08-07-2013, 11:14 AM
Al,
Let's try an example:
I look at a horse's pp's. I have 10 speed ratings. They average out to 80 with one sigma = 5. So, I compute a 95% confidence interval for the mean, which is (76.42, 83.58).
How do you interpret (76.42, 83.58)?
Mike
There is a 95% probability that the true mean (i.e., the horse's true average ability) is in that range.
TrifectaMike
08-07-2013, 11:37 AM
There is a 95% probability that the true mean (i.e., the horse's true average ability) is in that range.
Thanks Al.
So, you agree with the statement:
Confidence Intervals:
Give me an interval that brackets the true value of a parameter with a stated probability, given the particular sample I've actually observed.
Does anyone disagree?
Mike
OTM Al
08-07-2013, 11:54 AM
Thanks Al.
So, you agree with the statement:
Confidence Intervals:
Give me an interval that brackets the true value of a parameter with a stated probability, given the particular sample I've actually observed.
Does anyone disagree?
Mike
Actually my statement was incorrect. Here is a better definition:
Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter. However, in infrequent cases, none of these values may cover the value of the parameter. The level of confidence of the confidence interval would indicate the probability that the confidence range captures this true population parameter given a distribution of samples. It does not describe any single sample. This value is represented by a percentage, so when we say, "we are 99% confident that the true value of the parameter is in our confidence interval", we express that 99% of the observed confidence intervals will hold the true value of the parameter.
DeltaLover
08-07-2013, 12:08 PM
(What I will say is not exactly on topic)
I am reluctant to use previous speed figures to construct an average speed.
Speed figures are not distributed normally and what concerns us as bettors is not the average speed but the probability of a spike.
Although it is reasonable to assume that past speed figures affect the future, this does not add a lot of betting value, since it is a concept well known and very well incorporated into the odds.
A profitable model should be able to have an opinion about performance spikes, one that contradicts the behavior of the horse as displayed in the past performances.
One of my early approaches (which I later watched Benter present during a lecture to an Asian math community) was the following:
For each horse:
Input:
- Past performance data
- A closed universe of handicapping factors
Output:
- Average speed figure
- SF sigma
The input fed a genetic program consisting of a LISP-based DSL. This DSL tried to optimize the output, using as a fitness function a Monte Carlo race simulator that took the mean and sigma and tried to maximize R2.
I have concluded that such an approach results in probabilities highly correlated with the crowd's odds.
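For anyone curious, here is a toy sketch of the Monte Carlo simulator idea (the horse names, means and sigmas are invented for illustration, and this is nothing like the production code): draw a figure for each horse, the highest draw wins, and win frequencies become the model's probabilities.

import random

horses = {"A": (82, 4), "B": (80, 6), "C": (78, 8)}  # name: (mean SF, sigma)

def simulate(horses, runs=100_000):
    wins = {name: 0 for name in horses}
    for _ in range(runs):
        draws = {name: random.gauss(m, s) for name, (m, s) in horses.items()}
        wins[max(draws, key=draws.get)] += 1  # highest drawn figure wins
    return {name: w / runs for name, w in wins.items()}

print(simulate(horses))  # estimated win probability per horse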
The problem with this approach has to do with the fact that it treats all races as self-weighted events.
A better approach might have been to start with a binary filter, marking some races as no-bets and others as bettable. By doing so we relieve our model of a lot of close calls that would eventually shift it toward the crowd's opinions. In reality we do not care about becoming better than the crowd in a global fashion.
We should be concerned only with races where our opinion is greatly different from the crowd's and ignore every other close call. Our systems should be designed in such a way as to either give no signal at all or provide a very extreme one.
Small estimated overlays should be avoided, and only large ones should be selected for betting purposes. I have no interest in betting a horse that I think should have been 7-5 while offered at 2-1. Sure, I know that in theory this horse has the potential to be an overlay, but in reality it can very well be an error of my system due to some missing information. My model should silently ignore this starter.
What I am really looking for is the 9-1 shot that I estimate as having a 40% winning chance. My model should be optimized to favor this type of selection, tolerating a lot of error on the close calls while simultaneously magnifying the extreme ones.
I am saying all this to make the case that it is wrong to judge a model by its R2. A model that has the ability to make correct decisions about when to bet can afford a lower R2 while achieving a much higher ROI.
Probably what I said here is not directly related to the topic of the thread, but it might have some relevance to betting....
TrifectaMike
08-07-2013, 12:46 PM
Actually my statement was incorrect. Here is a better definition:
Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter. However, in infrequent cases, none of these values may cover the value of the parameter. The level of confidence of the confidence interval would indicate the probability that the confidence range captures this true population parameter given a distribution of samples. It does not describe any single sample. This value is represented by a percentage, so when we say, "we are 99% confident that the true value of the parameter is in our confidence interval", we express that 99% of the observed confidence intervals will hold the true value of the parameter.
Al,
Now we're cooking. So, confidence intervals are intervals on the data and not the parameter. In the example I gave, there is one sample (10 speed ratings), so I would believe that the CI says nada about the parameter.
Mike
TrifectaMike
08-07-2013, 01:16 PM
Delta,
Some good things. Some I agree with, some I don't.
I am reluctant to use previous speed figures to construct an average speed. Speed figures are not distributed normally and what concerns us as bettors is not the average speed but the probability of a spike.
Actually, speed ratings like Bris, Beyer, etc. are normally distributed (if we exclude a zero rating... stopped, didn't finish the race, etc.).
Speed ratings are a transformation of race times, which have a skewed distribution, into a normal distribution. The problem with speed ratings is that they are discounted for betting purposes. And determining a more accurate rating is futile UNLESS it can change the rank associated with the rating in any particular race. A 95 rating vs. a 93 rating is NOT going to change the odds distribution, as long as the 93 and the 95 have the same rank in the field.
Speed ratings do have some utility as a predictor when viewed on a hierarchical basis within a particular field. For example, a 95 rating can shrink depending on the speed ratings given to the other entries in the field.
One of my early approaches (which I later watched Benter present during a lecture to an Asian math community) was the following:
For each horse:
Input:
- Past performance data
- A closed universe of handicapping factors
Output:
- Average speed figure
- SF sigma
The input fed a genetic program consisting of a LISP-based DSL. This DSL tried to optimize the output, using as a fitness function a Monte Carlo race simulator that took the mean and sigma and tried to maximize R2.
I have concluded that such an approach results in probabilities highly correlated with the crowd's odds.
As I said on several previous occasions, an unbiased estimator is not desirable.
However, if you can accurately estimate the "actual" time of the race and the average speed rating of each horse, the difference is a useful predictor.
We should be concerned only with races where our opinion is greatly different from the crowd's and ignore every other close call. Our systems should be designed in such a way as to either give no signal at all or provide a very extreme one.
I agree, and have written about this many times.
What I am really looking for is the 9-1 shot that I estimate as having a 40% winning chance. My model should be optimized to favor this type of selection, tolerating a lot of error on the close calls while simultaneously magnifying the extreme ones.
I am saying all this to make the case that it is wrong to judge a model by its R2. A model that has the ability to make correct decisions about when to bet can afford a lower R2 while achieving a much higher ROI.
I agree (see above).
Good stuff, Delta
Mike
DeltaLover
08-07-2013, 05:00 PM
Here you can see a more detailed description:
http://www.codingismycraft.com/wp-content/uploads/2013/08/W1.pdf
pondman
08-07-2013, 06:21 PM
What I am really looking for is the 9-1 shot that I estimate as having a 40% winning chance. My model should be optimized to favor this type of selection, tolerating a lot of error on the close calls while simultaneously magnifying the extreme ones....
The crowd responds to the speed #s and isn't just going to give this to you, unless something unusual, or something that can't be quantified by the speed crowd, is part of the race.
And then the method becomes part of something else, such as counting how many horses repeat their performance when moving from dirt to grass. And at that point, you might as well not try competing against the crowd with a speed method, but become a grass expert. Or become an expert at leaps in class.
TrifectaMike
08-07-2013, 06:23 PM
Here you can see a more detailed description:
http://www.codingismycraft.com/wp-content/uploads/2013/08/W1.pdf
Delta,
I see you've been around the block a few times. I never did like NNs for the horse racing problem, especially since the data domains are assumed to be modeled by a Gaussian distribution.
Yes, the result is high correlation with the crowd (after all, you are seeking an unbiased estimator).
Just curious: why LISP to create your DSL and not Python (was it a runtime problem)?
Mike
DeltaLover
08-07-2013, 10:23 PM
Delta,
I never did like NNs for the horse racing problem, especially since the data domains are assumed to be modeled by a Gaussian distribution.
I completely agree. NNs are not suitable for our domain.
I no longer use them.
What I use are genetic programs consisting of scripts created programmatically.
Just curious: why LISP to create your DSL and not Python (was it a runtime problem)?
I favor LISP over Python for DSL purposes, because it makes the creation of decision trees very easy.
My latest implementation uses Python to maintain and dynamically alter the decision trees, which are automatically converted to LISP expressions that are evaluated at run time.
I serialize the script to LISP code, which can either be called directly (for real betting purposes) or used for further learning. So my choice of LISP is mainly for convenience rather than anything else.
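A toy sketch of that round trip (the node names like speed-rank and overlay are illustrative only, not from my actual codebase): a decision tree held as nested Python tuples serializes naturally to a LISP s-expression.

def to_sexpr(node):
    # nested tuples -> LISP s-expression string
    if isinstance(node, tuple):
        return "(" + " ".join(to_sexpr(child) for child in node) + ")"
    return str(node)

tree = ("if", (">", "speed-rank", 1),
        "no-bet",
        ("if", (">", "overlay", 3.0), "bet", "no-bet"))
print(to_sexpr(tree))  # (if (> speed-rank 1) no-bet (if (> overlay 3.0) bet no-bet))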
Some interesting links about LISP and DSL:
http://patricklogan.blogspot.com/2005/04/new-lisp.html
http://programmers.stackexchange.com/questions/81202/in-what-area-is-lisps-macro-better-than-rubys-ability-to-create-dsl
http://programmers.stackexchange.com/questions/60028/is-there-any-evidence-that-lisp-actually-is-better-than-other-languages-at-artif?rq=1
In the past when my code base was still .NET based, I used to do the same thing, calling Scheme (which is a dialect of LISP) from C# using the following:
http://ironscheme.codeplex.com/
I like the topic and I might write a post for my blog, since I maintain some strong opinions about it....
DeltaLover
08-07-2013, 10:26 PM
The crowd responds to the speed #s and isn't just going to give this to you, unless something unusual, or something that can't be quantified by the speed crowd, is part of the race.
Probably you are right for my specific example, which is a bit extreme.
What I was trying to underline was the fact that I am looking for horses presenting a large enough spread between what I anticipate as fair odds and what is offered.
TrifectaMike
08-07-2013, 11:01 PM
Delta,
The difficulty with the majority of performance factors is that once you condition on a rank, the gaps (numerical differences) provide very minor or no additional value for betting purposes... as your data on the Bris Power Rating showed. The rank is dominant, not the gaps.
Negative gaps, on the other hand, can lead to some interesting results.
Mike
TexasDolly
08-08-2013, 09:04 AM
Delta,
The difficulty with the majority of performance factors is that once you condition on a rank, the gaps (numerical differences) provide very minor or no additional value for betting purposes... as your data on the Bris Power Rating showed. The rank is dominant, not the gaps.
Negative gaps, on the other hand, can lead to some interesting results.
Mike
Mike, I didn't grasp what negative gaps are when you mentioned it before, and I still don't. Would you take a minute and explain it somewhat?
Thank you,
TD
HUSKER55
08-08-2013, 09:44 AM
Is that like rank 1 = 95 and rank 2 = 50?
TrifectaMike
08-08-2013, 11:07 AM
Mike, I didn't grasp what negative gaps are when you mentioned it before, and I still don't. Would you take a minute and explain it somewhat?
Thank you,
TD
A negative gap is simply Rank 2 - Rank 1 (as HUSKER55 answered).
Or, for n: Rank n - Rank 1, n = 2,...,N.
So, if we are studying some data and we condition on Rank = 1 of that variable, a positive gap is simply Rank 1 - Rank n, n = 2,...,N.
I have learned from experience that when you condition on Rank = 1 and the gaps produce a fairly constant result, let's say ROI, then it is a good idea to look at what is happening with the negative gaps for Rank n, n = 2,...,N.
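A small sketch with made-up ratings, just to pin down the arithmetic:

ratings = sorted([95, 93, 88, 84, 80], reverse=True)  # Rank 1 first

rank1 = ratings[0]
positive_gaps = [rank1 - r for r in ratings[1:]]  # Rank 1 - Rank n
negative_gaps = [r - rank1 for r in ratings[1:]]  # Rank n - Rank 1

print(positive_gaps)  # [2, 7, 11, 15]
print(negative_gaps)  # [-2, -7, -11, -15]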
Mike
TexasDolly
08-08-2013, 11:12 AM
A negative gap is simply Rank 2 - Rank 1 (as HUSKER55 answered).
Or, for n: Rank n - Rank 1, n = 2,...,N.
So, if we are studying some data and we condition on Rank = 1 of that variable, a positive gap is simply Rank 1 - Rank n, n = 2,...,N.
I have learned from experience that when you condition on Rank = 1 and the gaps produce a fairly constant result, let's say ROI, then it is a good idea to look at what is happening with the negative gaps for Rank n, n = 2,...,N.
Mike
Thanks Mike, I think the last paragraph was the help I was seeking.
TD