Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board



Old 12-31-2010, 11:04 AM   #91
Actor
Librocubicularist
 
Join Date: Jun 2010
Location: Ohio
Posts: 10,466
Quote:
Originally Posted by TrifectaMike
Where do you guys come up with this stuff? Quirin's standard normal?

Mike
Winning at the Races: Computer Discoveries in Thoroughbred Handicapping By William L. Quirin, Ph.D. 1979, page 297.
__________________
Sapere aude
Old 12-31-2010, 11:41 AM   #92
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
I'm not continuing with this thread. If I've insulted anyone, I apologize.

I tend to say too much.

Happy New Year to all.

Mike
Old 12-31-2010, 11:54 AM   #93
Dave Schwartz
 
Dave Schwartz's Avatar
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,912
Personally, I am getting a lot out of this thread. I'd hate to see it discontinued.

May I make a suggestion?

Mike is a guy who offered to lead a class. Please, let him do just that. If you wish to take issue with his approach, please do it AFTER he has finished.

At that point you can tell him why YOUR way is better.


Happy New Year to all.

Dave Schwartz
Old 12-31-2010, 12:19 PM   #94
PaceAdvantage
PA Steward
 
PaceAdvantage's Avatar
 
Join Date: Mar 2001
Location: Del Boca Vista
Posts: 88,632
Quote:
Originally Posted by TrifectaMike
I'm not continuing with this thread. If I've insulted anyone, I apologize.

I tend to say too much.

Happy New Year to all.

Mike
What did I miss? I've been watching this thread like a hawk, making sure it doesn't get sidetracked... tell me how to fix it so that it may continue.
Old 12-31-2010, 12:21 PM   #95
CBedo
AllAboutTheROE
 
Join Date: Aug 2006
Location: Denver
Posts: 2,411
Quote:
Originally Posted by Dave Schwartz
Personally, I am getting a lot out of this thread. I'd hate to see it discontinued.

[...]
I agree with Dave. I hope Mike will continue. Can't everyone set aside the factors (for now) and focus on the development of the methodology?
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
Old 12-31-2010, 12:30 PM   #96
raybo
EXCEL with SUPERFECTAS
 
raybo's Avatar
 
Join Date: Mar 2004
Posts: 10,206
I, too, would like to see Mike continue with his teachings. Although I'm not a "statistics" guy, I'm sure there will be things to learn here.

Maybe if we just go along with Mike's theories, withholding personal beliefs for the time being, there will come a point when the value and viability of the Chi Square statistic become visible.
__________________
Ray
Horseracing's like the stock market except you don't have to wait as long to go broke.

Excel Spreadsheet Handicapping Forum

Charter Member: Horseplayers Association of North America
Old 12-31-2010, 12:32 PM   #97
CBedo
AllAboutTheROE
 
Join Date: Aug 2006
Location: Denver
Posts: 2,411
Quote:
Originally Posted by Actor
I've been going over this with my college statistics textbook by my side and, as best as I can tell, you do not seem to be calculating the Chi-Square Value correctly.
I think you're misreading your stats book (or the book is bad, lol). If you don't think Mike is calculating it correctly, try putting the data into a statistical package or even excel and doing the calculation.

From R:

Code:
> recency
      Wins Losses
1-14   110    790
15-30   90    890
31+     65    570
> chisq.test(recency)

	Pearson's Chi-squared test

data:  recency 
X-squared = 4.6765, df = 2, p-value = 0.0965
His analysis looks dead-on to me (the numbers are slightly different, probably due to rounding).
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
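CBedo's R result is easy to double-check by hand. Below is a from-scratch sketch of the Pearson chi-square statistic on the same 3x2 recency table (statistic only, no p-value); the counts are the ones posted above.

```python
# Pearson chi-square statistic for the 3x2 recency table posted above.
# Expected count for each cell = row total * column total / grand total.
table = {
    "1-14":  (110, 790),
    "15-30": (90, 890),
    "31+":   (65, 570),
}

rows = list(table.values())
col_totals = [sum(col) for col in zip(*rows)]  # total wins, total losses
grand = sum(col_totals)

stat = 0.0
for wins, losses in rows:
    row_total = wins + losses
    for observed, col_total in zip((wins, losses), col_totals):
        expected = row_total * col_total / grand
        stat += (observed - expected) ** 2 / expected

print(round(stat, 4))  # matches R's X-squared = 4.6765 (df = 2)
```

With 2 degrees of freedom this lands at the same p-value near 0.0965 that `chisq.test` reported.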
Old 12-31-2010, 12:36 PM   #98
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
I made a bold statement and I should not have tried to impose on
Arkansasman. So, I've run a Logistic regression using the last four
Speed Ratings.

Here's an explanation of the independent variables (regressors)

Variable 1 Last Speed Rating
Variable 2 Second Speed Rating Back
Variable 3 Third Speed Rating Back
Variable 4 Fourth Speed Rating Back

How I set up the data:
In each case (last speed rating, etc.) I determine the median rating
for each race. Then I determine how each horse's rating differs from
that median. This measures each horse's strength within its race and
also lets the data be pooled across all races.
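The median-centering step Mike describes can be sketched in a few lines of Python; the ratings below are invented for illustration.

```python
from statistics import median

# hypothetical last-race speed ratings for one race's five-horse field
ratings = [88, 92, 79, 85, 90]

race_median = median(ratings)                    # 88 for this field
deltas = [r - race_median for r in ratings]      # rating relative to race median
print(deltas)
```

Each horse's delta is comparable across races, which is what lets the regression pool all races into one dataset.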

Running iteration number 1
Running iteration number 2
Running iteration number 3
Running iteration number 4
Running iteration number 5
Running iteration number 6
Running iteration number 7

The process converged after 7 iterations

The software I use is home grown.


Descriptives.......
154 cases have Y = 1 1236 cases have Y = 0

-2 log likelihood = 967.9034 (Null Model)
-2 log likelihood = 887.8558 (Full Model)

Overall Model Fit...
Chi Square = 80.0476 df = 4 p = 0.0000
R Square = 0.0827

Akaike's Information Criterion = 897.8558
Bayesian Information Criterion = 895.9030

Coefficients and Standard Errors...
Variable Coefficient Standard Error prob
1 0.0788 0.0142 0.0000
2 0.0050 0.0102 0.6216
3 0.0428 0.0123 0.0005
4 0.0243 0.0112 0.0298
Intercept -2.1564

Odds Ratios and 95% Confidence Intervals...
Variable Odds Ratio Low High
1 1.0819 1.0522 1.1125
2 1.0050 0.9852 1.0253
3 1.0438 1.0189 1.0692
4 1.0246 1.0024 1.0474
input data record?

Let me show you the important numbers.

Variable 2 (Second race back rating)
prob = .6216
That is not good!!!!!

Coefficient 0.0050
That is not good!!!

Odds Ratio 1.0050
That is not good!!!!

Mike

Last edited by TrifectaMike; 12-31-2010 at 12:43 PM.
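To see what the fitted model implies, one can plug a horse into the logistic formula p = 1/(1 + e^-z) using the intercept and coefficients from Mike's posted output. The four delta values below (points above the race median in each of the last four races) are invented for illustration.

```python
import math

# intercept and coefficients taken verbatim from the posted output
intercept = -2.1564
coefs = [0.0788, 0.0050, 0.0428, 0.0243]  # last, 2nd, 3rd, 4th rating back

# hypothetical horse: points above its race median in each of its last 4 races
deltas = [5, 2, 3, 1]

z = intercept + sum(b * x for b, x in zip(coefs, deltas))
win_prob = 1 / (1 + math.exp(-z))
print(round(win_prob, 3))  # about 0.168 for this horse
```

Note how little variable 2 contributes to z: its coefficient of 0.0050 is an order of magnitude smaller than the others, which is Mike's point.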
Old 12-31-2010, 01:49 PM   #99
garyoz
Registered User
 
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
I don't want to sound negative or discouraging and I hope this is helpful.

You can't run such highly correlated variables in a multiple regression model. Check a correlation matrix for the variables--if you have values higher than .7 or so you run into multicollinearity, which leads to an unstable model. Instability is tied to correlation between the error terms (regression assumes normally distributed error terms, which the correlation violates). Also, if you are running a stepwise regression, the first variable "sucks up" all the variance for explanation and doesn't leave enough for the following variables to be associated with. (Not a very technical explanation.)

You can run them individually as bivariate logistic regressions (one regressor at a time, with the S-shaped logistic curve mapping it to the win probability) and compare each of those models. So you can run a model for the speed figure one race back, then one for two races back, etc., and compare them.

See which one gives you the best fit. That's assuming that is what you want to do. I think what you are trying to do is use the probability of winning as the dependent variable and speed figure as the independent variable.
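garyoz's correlation-matrix check is simple to run by hand. Here is a pure-Python Pearson correlation on two made-up columns of speed ratings (the numbers are invented to show what a correlation above the ~.7 warning level looks like):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return num / den

# made-up last-race vs. second-race-back ratings for five horses
last_race = [80, 85, 90, 95, 100]
second_back = [78, 84, 91, 94, 99]

r = pearson(last_race, second_back)
print(round(r, 3))  # well above the ~0.7 multicollinearity warning level
```

Real consecutive speed figures for the same horse are usually correlated to some degree, which is why the warning applies here.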
Old 12-31-2010, 02:14 PM   #100
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by garyoz
I don't want to sound negative or discouraging and I hope this is helpful.

[...]
You're not telling me anything I don't already know. Without getting too technical, it DIDN'T reduce the significance of the 3rd and 4th variables. I stand by these results.

Mike
Old 12-31-2010, 04:24 PM   #101
Capper Al
Registered User
 
Capper Al's Avatar
 
Join Date: Dec 2005
Location: MI
Posts: 6,330
Go for it TrifectaMike. I'm interested in your take on handicapping using the stats. Let's see how your system works. Thanks in advance for doing this.
__________________


"The Law, in its majestic equality, forbids the rich, as well as the poor, to sleep under bridges, to beg in the streets, and to steal bread."

Anatole France


Old 12-31-2010, 08:20 PM   #102
Native Texan III
Registered User
 
Native Texan III's Avatar
 
Join Date: Jun 2010
Location: El Paso
Posts: 466
Quote:
Originally Posted by TrifectaMike
It's my understanding that estimation is based on Maximum likelihood...finding those coefficients that have the greatest likelihood of producing the observed data. In practice, I would assume that means maximizing the log likelihood function (the objective function).

Hey, that is just my understanding. I could be wrong.

But I ask once again,

"Aside from how one would determine the independent variables, I have another question. How does one test for goodness of fit of a logistic regression?"

Mike
That is what I posted.
Old 12-31-2010, 09:42 PM   #103
gm10
Registered User
 
gm10's Avatar
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by TrifectaMike

Let me give you a specific example (arkansasman, try this out with your model). When looking at a horse's last four Beyers or Bris numbers, etc., you would think that the last is more significant than the previous one, and so on. The data does not support the premise.

I have found using numerous tests that the second race beyer (or any other similar rating) is insignificant and adds no usable information. The third is significant as well as the fourth, but NOT the second race back. Interesting I would say.

Mike
How did you test that? Y ~ f(X): what was Y, and what shape was f() (linear, for example)?

Multicollinearity was my first thought as well -- if you were using linear regression, it sounds likely that this would be a problem. Have you tested the significance without including the most recent Beyer in your X vector?

Last edited by gm10; 12-31-2010 at 09:47 PM.
Old 12-31-2010, 10:06 PM   #104
gm10
Registered User
 
gm10's Avatar
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by TrifectaMike
I made a bold statement and I should not have tried to impose on
Arkansasman. So, I've run a Logistic regression using the last four
Speed Ratings.

[...]
Interesting ... can you test without X1 (most recent)?
Old 12-31-2010, 11:51 PM   #105
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by gm10
Interesting ... can you test without X1 (most recent)?
Code:
Descriptives.......
154 cases have Y = 1  1236 cases have Y = 0

-2 log likelihood = 967.9034  (Null Model)
-2 log likelihood = 926.0421  (Full Model)

Overall Model Fit...
Chi Square = 41.8613  df = 3  p = 0.0000
R Square = 0.0432

Akaike's Information Criterion = 934.0421
Bayesian Information Criterion = 931.5873

Coefficients and Standard Errors...
Variable   Coefficient   Standard Error     prob
    1         0.0181         0.0104       0.0813
    2         0.0493         0.0122       0.0001
    3         0.0282         0.0110       0.0103
Intercept    -2.0949

Odds Ratios and 95% Confidence Intervals...
Variable   Odds Ratio      Low       High
    1         1.0183      0.9977    1.0393
    2         1.0506      1.0257    1.0760
    3         1.0286      1.0067    1.0510
input data record?
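The two fits Mike posted can be compared with a likelihood-ratio test: the difference of the two -2 log likelihood values is itself approximately chi-square distributed, with degrees of freedom equal to the number of dropped variables (here 1). A sketch using the numbers from the two posted runs:

```python
# -2 log likelihood values taken from Mike's two posted runs
neg2ll_with_x1 = 887.8558     # full 4-variable model
neg2ll_without_x1 = 926.0421  # model with the most recent rating dropped

lr_stat = neg2ll_without_x1 - neg2ll_with_x1  # ~ chi-square, df = 1
chi2_crit_05_df1 = 3.841  # standard 5% critical value for 1 df

print(round(lr_stat, 4), lr_stat > chi2_crit_05_df1)
```

The statistic of roughly 38 dwarfs the 5% critical value, so dropping the most recent rating worsens the fit far beyond chance -- consistent with the last rating carrying most of the signal while the second race back carries almost none.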