|
|
12-31-2010, 11:04 AM
|
#91
|
Librocubicularist
Join Date: Jun 2010
Location: Ohio
Posts: 10,466
|
Quote:
Originally Posted by TrifectaMike
Where do you guys come up with this stuff? Quirin's standard normal?
Mike
|
Winning at the Races: Computer Discoveries in Thoroughbred Handicapping By William L. Quirin, Ph.D. 1979, page 297.
__________________
Sapere aude
|
|
|
12-31-2010, 11:41 AM
|
#92
|
Registered User
Join Date: Feb 2008
Posts: 1,591
|
I'm not continuing with this thread. If I've insulted anyone, I apologize.
I tend to say too much.
Happy New Year to all.
Mike
|
|
|
12-31-2010, 11:54 AM
|
#93
|
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,912
|
Personally, I am getting a lot out of this thread. I'd hate to see it discontinued.
May I make a suggestion?
Mike is a guy that offered to lead a class. Please, let him do just that. If you wish to take issue with his approach, please do it AFTER he has finished.
At that point you can tell him why YOUR way is better.
Happy New Year to all.
Dave Schwartz
|
|
|
12-31-2010, 12:19 PM
|
#94
|
PA Steward
Join Date: Mar 2001
Location: Del Boca Vista
Posts: 88,632
|
Quote:
Originally Posted by TrifectaMike
I'm not continuing with this thread. If I've insulted anyone, I apologize.
I tend to say too much.
Happy New Year to all.
Mike
|
What did I miss? I've been watching this thread like a hawk making sure it doesn't get sidetracked....tell me how to fix it so that it may continue.
|
|
|
12-31-2010, 12:21 PM
|
#95
|
AllAboutTheROE
Join Date: Aug 2006
Location: Denver
Posts: 2,411
|
Quote:
Originally Posted by Dave Schwartz
Personally, I am getting a lot out of this thread. I'd hate to see it discontinued.
May I make a suggestion?
Mike is a guy that offered to lead a class. Please, let him do just that. If you wish to take issue with his approach, please do it AFTER he has finished.
At that point you can tell him why YOUR way is better.
Happy New Year to all.
Dave Schwartz
|
I agree with Dave. I hope Mike will continue. Why can't everyone set the factors aside (for now) and focus on the development of the methodology?
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
|
|
|
12-31-2010, 12:30 PM
|
#96
|
EXCEL with SUPERFECTAS
Join Date: Mar 2004
Posts: 10,206
|
I, too, would like to see Mike continue with his teachings. Although I'm not a "statistics" guy, I'm sure there will be things to learn here.
Maybe if we just went along with Mike's theories, withholding personal beliefs for the time being, there will come a point when the value/viability of the Chi Square statistic will become visible.
|
|
|
12-31-2010, 12:32 PM
|
#97
|
AllAboutTheROE
Join Date: Aug 2006
Location: Denver
Posts: 2,411
|
Quote:
Originally Posted by Actor
I've been going over this with my college statistics textbook by my side and, as best as I can tell, you do not seem to be calculating the Chi-Square Value correctly.
|
I think you're misreading your stats book (or the book is bad, lol). If you don't think Mike is calculating it correctly, try putting the data into a statistical package, or even Excel, and doing the calculation.
From R:
Code:
> recency
       Wins  Losses
1-14    110     790
15-30    90     890
31+      65     570
> chisq.test(recency)
Pearson's Chi-squared test
data: recency
X-squared = 4.6765, df = 2, p-value = 0.0965
Looks dead on in his analysis to me (numbers slightly different due to rounding probably).
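For anyone without R handy, the same check can be sketched in plain Python. This is only an illustration: the function name pearson_chi_square is made up here, and the exp(-x/2) shortcut for the p-value is valid only because this table has 2 degrees of freedom.

```python
import math

def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if wins were independent of the recency bucket.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Wins and losses for the 1-14, 15-30 and 31+ day buckets, copied from the R session.
recency = [[110, 790], [90, 890], [65, 570]]
statistic, df = pearson_chi_square(recency)
# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"X-squared = {statistic:.4f}, df = {df}, p-value = {p_value:.4f}")
```

The result should agree with the R figures to about three decimal places.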
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
|
|
|
12-31-2010, 12:36 PM
|
#98
|
Registered User
Join Date: Feb 2008
Posts: 1,591
|
I made a bold statement and I should not have tried to impose on
Arkansasman. So, I've run a Logistic regression using the last four
Speed Ratings.
Here's an explanation of the independent variables (regressors)
Variable 1 Last Speed Rating
Variable 2 Second Speed Rating Back
Variable 3 Third Speed Rating Back
Variable 4 Fourth Speed Rating Back
How I set up the data:
For each variable (last speed rating, etc.), I determine the median rating
for each race. Then I determine how each horse's rating differs from the
median. This measures each horse's strength within the race and
also allows the data to be used across all races.
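The median-centering step can be sketched as follows. This is a minimal illustration, not the home-grown program: the function name, the (race_id, horse, rating) tuple layout and the sample ratings are all made up.

```python
from collections import defaultdict
from statistics import median

def center_on_race_median(entries):
    """entries: list of (race_id, horse, rating); returns rating minus the race median."""
    by_race = defaultdict(list)
    for race_id, _horse, rating in entries:
        by_race[race_id].append(rating)
    medians = {race_id: median(ratings) for race_id, ratings in by_race.items()}
    return [(race_id, horse, rating - medians[race_id])
            for race_id, horse, rating in entries]

# Made-up example: two races run at different overall class levels.
entries = [
    ("R1", "A", 80), ("R1", "B", 86), ("R1", "C", 90),
    ("R2", "D", 60), ("R2", "E", 70), ("R2", "F", 64), ("R2", "G", 66),
]
centered = center_on_race_median(entries)
for row in centered:
    print(row)
```

After centering, a horse 5 points above its field looks the same whether the race median was 65 or 86, which is what lets the rows from different races be pooled.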
Running iteration number 1
Running iteration number 2
Running iteration number 3
Running iteration number 4
Running iteration number 5
Running iteration number 6
Running iteration number 7
The process converged after 7 iterations
The software I use is home grown.
Descriptives.......
154 cases have Y = 1; 1236 cases have Y = 0
-2 log likelihood = 967.9034 (Null Model)
-2 log likelihood = 887.8558 (Full Model)
Overall Model Fit...
Chi Square = 80.0476 df = 4 p = 0.0000
R Square = 0.0827
Akaike's Information Criterion = 897.8558
Bayesian Information Criterion = 895.9030
Coefficients and Standard Errors...
Variable    Coefficient   Standard Error   prob
1           0.0788        0.0142           0.0000
2           0.0050        0.0102           0.6216
3           0.0428        0.0123           0.0005
4           0.0243        0.0112           0.0298
Intercept   -2.1564
Odds Ratios and 95% Confidence Intervals...
Variable    Odds Ratio    Low       High
1           1.0819        1.0522    1.1125
2           1.0050        0.9852    1.0253
3           1.0438        1.0189    1.0692
4           1.0246        1.0024    1.0474
input data record?
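The overall fit figures can be recomputed from the two -2 log likelihood values. A minimal sketch, assuming McFadden's pseudo R-square and the textbook AIC and BIC definitions; the post does not say which definitions the home-grown program uses, so the variable names and formulas here are assumptions.

```python
import math

n_cases = 154 + 1236      # winners plus non-winners
n_params = 4 + 1          # four speed ratings plus the intercept
neg2ll_null = 967.9034    # -2 log likelihood, intercept-only model
neg2ll_full = 887.8558    # -2 log likelihood, four-variable model

# Likelihood-ratio chi-square: the improvement over the null model.
lr_chi_square = neg2ll_null - neg2ll_full
# McFadden's pseudo R-square (assumed definition).
mcfadden_r2 = 1 - neg2ll_full / neg2ll_null
# Textbook information criteria (assumed definitions).
aic = neg2ll_full + 2 * n_params
bic_textbook = neg2ll_full + n_params * math.log(n_cases)

print(f"Chi Square = {lr_chi_square:.4f}, df = {n_params - 1}")
print(f"R Square   = {mcfadden_r2:.4f}")
print(f"AIC        = {aic:.4f}")
print(f"BIC        = {bic_textbook:.4f}")
```

Under these assumptions the chi-square, R-square and AIC line up with the printout. The textbook BIC comes out near 924 instead of 895.9030, so the program presumably applies a different penalty term.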
Let me show you the important numbers.
Variable 2 (Second race back rating)
prob = .6216
That is not good!!!!!
Coefficient 0.0050
That is not good!!!
Odds Ratio 1.0050
That is not good!!!!
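Those three numbers all follow from the coefficient and its standard error. A hedged sketch, assuming the "prob" column is a two-sided Wald test and the interval uses 1.96 standard errors; the helper name wald_summary is invented for illustration.

```python
import math

def wald_summary(coefficient, std_error, z_crit=1.96):
    """Return (z, two-sided p, odds ratio, CI low, CI high) for one coefficient."""
    z = coefficient / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    odds_ratio = math.exp(coefficient)
    low = math.exp(coefficient - z_crit * std_error)
    high = math.exp(coefficient + z_crit * std_error)
    return z, p_value, odds_ratio, low, high

# Coefficients and standard errors as printed in the output.
reported = {1: (0.0788, 0.0142), 2: (0.0050, 0.0102),
            3: (0.0428, 0.0123), 4: (0.0243, 0.0112)}
summaries = {var: wald_summary(b, se) for var, (b, se) in reported.items()}
for var, (z, p, odds, low, high) in summaries.items():
    print(f"{var}: z={z:.2f} p={p:.4f} OR={odds:.4f} [{low:.4f}, {high:.4f}]")
```

Because the inputs are rounded to four decimals, the recomputed values can differ from the printout in the last digit or two. For variable 2 the interval straddles 1.0, which is the same message as the large prob value.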
Mike
Last edited by TrifectaMike; 12-31-2010 at 12:43 PM.
|
|
|
12-31-2010, 01:49 PM
|
#99
|
Registered User
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
|
I don't want to sound negative or discouraging and I hope this is helpful.
You can't run such highly correlated variables as a multiple regression model. Check out a correlation matrix for the variables--if you have values higher than .7 or so you run into multicollinearity, which leads to an unstable model. Instability is tied to the correlation between error terms (regression assumes a normal distribution for error terms--which the correlation violates). Also, if you are running a stepwise regression, the first variable "sucks up" all the variance for explanation and doesn't leave enough for the following variables to be associated with. (Not a very technical explanation.)
You can run them individually as bivariate regressions (one regressor and the logit probability density function--just an S-shaped curve--as the dependent variable) and compare each of those models. So you can run a model for the speed figure one race back, then one for two races back, etc., and compare them.
See which one gives you the best fit. That's assuming that is what you want to do. I think what you are trying to do is use the probability of winning as the dependent variable and speed figure as the independent variable.
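The correlation-matrix check can be sketched like this. It is only an illustration: the column names and ratings are made up, the function names are hypothetical, and the .7 cutoff is the rule of thumb from the post, not a hard limit.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def flag_correlated(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up median-centered ratings for six horses; not real data.
columns = {
    "last": [4, -2, 6, -5, 1, -3],
    "second_back": [3, -1, 5, -6, 2, -2],
    "third_back": [-1, 2, 0, 1, -3, 2],
}
flagged = flag_correlated(columns)
print(flagged)
```

Any pair that shows up in the list is a candidate for the instability described above and is worth testing with one of the two variables dropped.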
|
|
|
12-31-2010, 02:14 PM
|
#100
|
Registered User
Join Date: Feb 2008
Posts: 1,591
|
Quote:
Originally Posted by garyoz
I don't want to sound negative or discouraging and I hope this is helpful.
You can't run such highly correlated variables as a multiple regression model. Check out a correlation matrix for the variables--if you have values higher than .7 or so you run into multicollinearity, which leads to an unstable model. Instability is tied to the correlation between error terms (regression assumes a normal distribution for error terms--which the correlation violates). Also, if you are running a stepwise regression, the first variable "sucks up" all the variance for explanation and doesn't leave enough for the following variables to be associated with. (Not a very technical explanation.)
You can run them individually as bivariate regressions (one regressor and the logit probability density function--just an S-shaped curve--as the dependent variable) and compare each of those models. So you can run a model for the speed figure one race back, then one for two races back, etc., and compare them.
See which one gives you the best fit. That's assuming that is what you want to do. I think what you are trying to do is use the probability of winning as the dependent variable and speed figure as the independent variable.
|
You're not telling me anything I don't already know. Without getting too technical, it DIDN'T reduce the significance of the 3rd and 4th variables. I stand by these results.
Mike
|
|
|
12-31-2010, 04:24 PM
|
#101
|
Registered User
Join Date: Dec 2005
Location: MI
Posts: 6,330
|
Go for it TrifectaMike. I'm interested in your take on handicapping using the stats. Let's see how your system works. Thanks in advance for doing this.
__________________
"The Law, in its majestic equality, forbids the rich, as well as the poor, to sleep under bridges, to beg in the streets, and to steal bread."
Anatole France
|
|
|
12-31-2010, 08:20 PM
|
#102
|
Registered User
Join Date: Jun 2010
Location: El Paso
Posts: 466
|
Quote:
Originally Posted by TrifectaMike
It's my understanding that estimation is based on Maximum likelihood...finding those coefficients that have the greatest likelihood of producing the observed data. In practice, I would assume that means maximizing the log likelihood function (the objective function).
Hey, that is just my understanding. I could be wrong.
But I ask once again,
"Aside from how one would determine the independent variables, I have another question. How does one test for goodness of fit of a logistic regression?"
Mike
|
That is what I posted.
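The estimation idea in the quote, choosing the coefficients that maximize the log likelihood, can be sketched for a single regressor with Newton-Raphson. This is a toy under stated assumptions, not anyone's production code: the data are made up, the function names are hypothetical, and the likelihood-ratio chi-square at the end is just one common overall fit check among several.

```python
import math

def fit_logistic(xs, ys, max_iter=25, tol=1e-8):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for iteration in range(1, max_iter + 1):
        ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Gradient of the log likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), 2 x 2.
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1, iteration

def neg2_log_likelihood(xs, ys, b0, b1):
    """-2 times the log likelihood of the data at the given coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1 - p)
    return -2 * total

# Made-up data: x is a median-centered rating, y is 1 for a win.
xs = [-2, -1, -1, 0, 0, 1, 1, 2, 2, 3]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
b0, b1, iterations = fit_logistic(xs, ys)

win_rate = sum(ys) / len(ys)
null_b0 = math.log(win_rate / (1 - win_rate))     # intercept-only model
neg2ll_null = neg2_log_likelihood(xs, ys, null_b0, 0.0)
neg2ll_full = neg2_log_likelihood(xs, ys, b0, b1)
lr_chi_square = neg2ll_null - neg2ll_full          # df = 1 (one regressor)
p_value = math.erfc(math.sqrt(lr_chi_square / 2))  # chi-square tail for df = 1
print(iterations, round(b0, 4), round(b1, 4), round(lr_chi_square, 4), round(p_value, 4))
```

The drop in -2 log likelihood from the null model to the fitted model is the same quantity reported as "Chi Square" under "Overall Model Fit" in the output posted earlier in the thread.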
|
|
|
12-31-2010, 09:42 PM
|
#103
|
Registered User
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
|
Quote:
Originally Posted by TrifectaMike
Let me give you a specific example (arkansasman, try this out with your model). When looking at a horse's last four Beyers or Bris numbers, etc., you would think that the last is more significant than the previous one, and so on. The data does not support the premise.
I have found, using numerous tests, that the second-race-back Beyer (or any other similar rating) is insignificant and adds no usable information. The third is significant, as is the fourth, but NOT the second race back. Interesting, I would say.
Mike
|
How did you test that? Y ~ f(X)
What was Y, and what was the shape of f()? Linear, for example?
Multi-collinearity was my first thought as well - if you were using linear regression, it sounds likely that this would be a problem. Have you tested the significance without including the most recent Beyer in your X vector?
Last edited by gm10; 12-31-2010 at 09:47 PM.
|
|
|
12-31-2010, 10:06 PM
|
#104
|
Registered User
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
|
Quote:
Originally Posted by TrifectaMike
I made a bold statement and I should not have tried to impose on
Arkansasman. So, I've run a Logistic regression using the last four
Speed Ratings.
Here's an explanation of the independent variables (regressors)
Variable 1 Last Speed Rating
Variable 2 Second Speed Rating Back
Variable 3 Third Speed Rating Back
Variable 4 Fourth Speed Rating Back
How I set up the data:
For each variable (last speed rating, etc.), I determine the median rating
for each race. Then I determine how each horse's rating differs from the
median. This measures each horse's strength within the race and
also allows the data to be used across all races.
Running iteration number 1
Running iteration number 2
Running iteration number 3
Running iteration number 4
Running iteration number 5
Running iteration number 6
Running iteration number 7
The process converged after 7 iterations
The software I use is home grown.
Descriptives.......
154 cases have Y = 1; 1236 cases have Y = 0
-2 log likelihood = 967.9034 (Null Model)
-2 log likelihood = 887.8558 (Full Model)
Overall Model Fit...
Chi Square = 80.0476 df = 4 p = 0.0000
R Square = 0.0827
Akaike's Information Criterion = 897.8558
Bayesian Information Criterion = 895.9030
Coefficients and Standard Errors...
Variable    Coefficient   Standard Error   prob
1           0.0788        0.0142           0.0000
2           0.0050        0.0102           0.6216
3           0.0428        0.0123           0.0005
4           0.0243        0.0112           0.0298
Intercept   -2.1564
Odds Ratios and 95% Confidence Intervals...
Variable    Odds Ratio    Low       High
1           1.0819        1.0522    1.1125
2           1.0050        0.9852    1.0253
3           1.0438        1.0189    1.0692
4           1.0246        1.0024    1.0474
input data record?
Let me show you the important numbers.
Variable 2 (Second race back rating)
prob = .6216
That is not good!!!!!
Coefficient 0.0050
That is not good!!!
Odds Ratio 1.0050
That is not good!!!!
Mike
|
Interesting ... can you test without X1 (most recent)?
|
|
|
12-31-2010, 11:51 PM
|
#105
|
Registered User
Join Date: Feb 2008
Posts: 1,591
|
Quote:
Originally Posted by gm10
Interesting ... can you test without X1 (most recent)?
|
Code:
Descriptives.......
154 cases have Y = 1; 1236 cases have Y = 0
-2 log likelihood = 967.9034 (Null Model)
-2 log likelihood = 926.0421 (Full Model)
Overall Model Fit...
Chi Square = 41.8613 df = 3 p = 0.0000
R Square = 0.0432
Akaike's Information Criterion = 934.0421
Bayesian Information Criterion = 931.5873
Coefficients and Standard Errors...
Variable    Coefficient   Standard Error   prob
1           0.0181        0.0104           0.0813
2           0.0493        0.0122           0.0001
3           0.0282        0.0110           0.0103
Intercept   -2.0949
Odds Ratios and 95% Confidence Intervals...
Variable    Odds Ratio    Low       High
1           1.0183        0.9977    1.0393
2           1.0506        1.0257    1.0760
3           1.0286        1.0067    1.0510
input data record?
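Since both runs report the same null -2 log likelihood and the same case counts, the two models look nested on the same data, and they can be compared with a likelihood-ratio test. A minimal sketch under that assumption; the variable names are made up.

```python
import math

neg2ll_with_last = 887.8558      # four-variable model from the earlier post
neg2ll_without_last = 926.0421   # three-variable model, most recent rating dropped

# Dropping one regressor costs one degree of freedom.
lr_statistic = neg2ll_without_last - neg2ll_with_last
df = 1
# For df = 1 the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_statistic / 2))
print(f"LR chi-square = {lr_statistic:.4f}, df = {df}, p = {p_value:.2e}")
```

A statistic this large on one degree of freedom says the most recent rating carries information the other three do not replace. The second-back rating (variable 1 in this run) still has a confidence interval that includes 1.0 with the most recent rating removed.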
|
|
|
|
|
|
|
|
|
|