Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 12-31-2010, 12:17 AM   #76
CBedo
AllAboutTheROE
 
Join Date: Aug 2006
Location: Denver
Posts: 2,411
Quote:
Originally Posted by TrifectaMike
Z-score = (p^ - p) / sqrt(p*(1 - p) / n)

p^ = Observed frequency
p = Expected frequency
n = Number of cases in the sample

Z-score for - Days since last race 1 to 14

= (.1220 - .1054)/sqrt(.1054*(1-.1054)/900)
= 1.622

Therefore any horse that raced within 14 days would receive a weight of 1.622.

You can compute the Z-score for the remaining two categories.

Mike
Since (p^ - p) can be negative (and hopefully the square root can't, lol), the Z-scores can be negative as well, which means we could have negative weights? I think both the second and third categories would be negative (-1.38 and -0.25). I'll wait patiently to see how this all comes together, but this leads to the question of data preprocessing and "how to bin." In this instance, a horse that came back in 14 days gets a positive weight, but a horse that comes back one day later receives an equally negative factor weighting?
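CBedo's figures for the other two categories can be checked with a short script. This is a minimal sketch using the win/starter counts from the fictitious recency table (quoted in full in post #80); the function name `z_score` is illustrative. It uses the unrounded observed frequency 110/900 = .1222, so the first category comes out near 1.65 rather than the 1.622 obtained with .1220.

```python
import math

# (wins, starters) per category from the fictitious days-since-last-race table.
categories = {
    "1 to 14 days": (110, 900),
    "15 to 30 days": (90, 980),
    "over 30 days": (65, 635),
}

total_wins = sum(w for w, _ in categories.values())
total_starters = sum(n for _, n in categories.values())
p = total_wins / total_starters  # expected (overall) win frequency, about .1054


def z_score(wins: int, starters: int, p_expected: float) -> float:
    """One-sample proportion Z-score: (p_hat - p) / sqrt(p * (1 - p) / n)."""
    p_hat = wins / starters
    return (p_hat - p_expected) / math.sqrt(p_expected * (1 - p_expected) / starters)


for name, (wins, starters) in categories.items():
    print(f"{name:>14}: z = {z_score(wins, starters, p):+.2f}")
```

This prints roughly +1.65, -1.38 and -0.25, matching the signs CBedo describes.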
__________________
"No problem can withstand the assault of sustained thinking" -- Voltaire
Old 12-31-2010, 05:30 AM   #77
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by CBedo
Since (p^ - p) can be negative (and hopefully the square root can't, lol), the Z-scores can be negative as well, which means we could have negative weights? I think both the second and third categories would be negative (-1.38 and -0.25). I'll wait patiently to see how this all comes together, but this leads to the question of data preprocessing and "how to bin." In this instance, a horse that came back in 14 days gets a positive weight, but a horse that comes back one day later receives an equally negative factor weighting?
You are correct. Data definitions are important. I personally like to take a "shotgun" approach: shoot as many pellets as possible and observe the patterns. Then consider refinements, and shoot again... and again if necessary... always letting the data speak for itself, even when it appears illogical.

I attempt to construct metrics using medians. This allows one to measure strength within a race and to process data across races much more easily.
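One way to read "metrics using medians" is to express each horse's rating relative to the median of its own field, so that numbers from fields of different class can be pooled. That reading is an assumption; the sketch below is minimal, and the function name and figures are hypothetical.

```python
from statistics import median


def median_relative(ratings: dict[str, float]) -> dict[str, float]:
    """Express each horse's rating as its distance from the race median.

    Positive values mean the horse rates above the middle of this field,
    which puts horses from different races on a common, field-relative scale.
    """
    mid = median(ratings.values())
    return {horse: r - mid for horse, r in ratings.items()}


# Hypothetical five-horse field with made-up speed figures.
race = {"A": 88, "B": 84, "C": 80, "D": 79, "E": 72}
print(median_relative(race))
```

Here the field median is 80, so the output is `{'A': 8, 'B': 4, 'C': 0, 'D': -1, 'E': -8}`. A median is less affected than a mean by one badly outclassed horse in the field.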

For example, we all, to some degree or another, scan a horse's speed ratings (Beyers, etc.) and naturally assume the last rating is more significant than the previous one. In fact, it is standard practice for logistic regression people to use time-related weights on their factors. It seems reasonable, but is it always correct?

Let me give you a specific example (Arkansasman, try this out with your model). When looking at a horse's last four Beyers or Bris numbers, etc., you would think that the last is more significant than the previous, and so on. The data does not support the premise.

I have found, using numerous tests, that the second race back Beyer (or any other similar rating) is insignificant and adds no usable information. The third is significant, as is the fourth, but NOT the second race back. Interesting, I would say.

Mike
Old 12-31-2010, 05:45 AM   #78
garyoz
Registered User
 
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
Quote:
Originally Posted by TrifectaMike
How does one test for goodness of fit of a logistic regression?

Mike
R-Squared statistic.
Old 12-31-2010, 07:11 AM   #79
Capper Al
Registered User
 
 
Join Date: Dec 2005
Location: MI
Posts: 6,330
Quote:
Originally Posted by TrifectaMike
I have found, using numerous tests, that the second race back Beyer (or any other similar rating) is insignificant and adds no usable information. The third is significant, as is the fourth, but NOT the second race back. Interesting, I would say.

Mike
I'm not following this, because if the second speed figure back isn't significant in today's race, it will become the third speed figure back in the next race. Then it has significance? It's the same figure. How can this be?
__________________


"The Law, in its majestic equality, forbids the rich, as well as the poor, to sleep under bridges, to beg in the streets, and to steal bread."

Anatole France


Old 12-31-2010, 07:50 AM   #80
Actor
Librocubicularist
 
Join Date: Jun 2010
Location: Ohio
Posts: 10,466
Quote:
Originally Posted by TrifectaMike
Example of Chi-Square Statistic and Z-Score

NOTE: This data is factious and used for demonstration purposes.

Let us test a recency factor: days since the horse's last race.

Observed
Code:
					Win	Lose	Total	Win%
   Days since last race 1 to 14 		110	790	900	12.2
   Days since last race 15 to 30		 90	890	980	 9.2
   Days since last race over 30 		 65	570	635	10.2
   				Total	265
The sample size is 2,515 horses.
The Win% of sample is 265/2515 = 10.54%


Expected
Code:
					Win	Lose	Total	Win%
   Days since last race 1 to 14 		95	805	900	10.54
   Days since last race 15 to 30		103	877	980	10.54
   Days since last race over 30 		67	568	635	10.54
Chi-Square Value
Code:
 
         	= (110-95)**2/95 + (790-805)**2/805 + (90-103)**2/103 + 
         	  (890-877)**2/877 + (65-67)**2/67 + (570-568)**2/568
          
         	= 2.367 + .280 + 1.64 + .193 + .060 + .007
Chi-Square Value = 4.547
I've been going over this with my college statistics textbook by my side and, as best as I can tell, you do not seem to be calculating the Chi-Square Value correctly. From the text:
Code:
THEOREM:  If n1, n2, ..., nk and e1, e2, ..., ek are the observed and
expected frequencies, respectively, for the k possible outcomes of an
experiment that is performed n times, then as n becomes infinite the
distribution of the quantity

 k
Σ     (ni - ei)²/ei
 i=1

will approach that of a Chi-Squared variable with k-1 degrees of freedom.
I.e., the results are to be summed over all possible outcomes. There are three possible outcomes here: a horse from one of the three categories wins. There is no other possibility. There should be three terms in the summation, viz., the first, third, and fifth terms. The second, fourth, and sixth terms are double counts. The correct value is 4.18, which is less than the critical value of 4.605.

These double counts are not present in your die casting example.

You should look up the definition of "factious." I think you mean "fictitious" but I could be wrong.
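The two ways of summing can be compared directly. This is a minimal, standard-library sketch on the fictitious counts from the quoted tables. It computes the usual Pearson statistic for an r×c contingency table, which sums over all r·c cells and has (r-1)(c-1) = 2 degrees of freedom (4.605 is the 10% critical value for 2 df), and, for comparison, the three win-column terms alone. Expected counts are left unrounded, so the six-cell total comes out near 4.68 rather than the 4.547 obtained from the rounded expected table; the win-column sum reproduces the 4.18 figure.

```python
# Observed win/lose counts from the fictitious recency table.
observed = [
    [110, 790],  # 1 to 14 days
    [90, 890],   # 15 to 30 days
    [65, 570],   # over 30 days
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence: row total * column total / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson statistic for an r x c contingency table: sum over every cell.
chi_sq_all_cells = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2

# For comparison: only the win-column terms (the three-term reading).
chi_sq_wins_only = sum(
    (obs_row[0] - exp_row[0]) ** 2 / exp_row[0]
    for obs_row, exp_row in zip(observed, expected)
)

print(f"all six cells:  {chi_sq_all_cells:.3f} with {dof} df")
print(f"win cells only: {chi_sq_wins_only:.3f}")
```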
__________________
Sapere aude

Last edited by Actor; 12-31-2010 at 07:52 AM.
Old 12-31-2010, 08:09 AM   #81
Actor
Librocubicularist
 
Join Date: Jun 2010
Location: Ohio
Posts: 10,466
Quote:
Originally Posted by TrifectaMike
Z-score = (p^ - p) / sqrt(p*(1 - p) / n)
This is William L. Quirin's "standard normal" test.

For the first category I compute the standard normal value to be 1.59. Quirin says that the value needs to lie outside -2.5 to +2.5, indicating that this category (which has the biggest impact value) is not significant.
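For reference, impact value is usually computed as a category's share of all winners divided by its share of all starters. That is algebraically the same as the category's win rate divided by the overall win rate. Here is a minimal sketch on the fictitious recency counts; the function name is illustrative.

```python
# (wins, starters) per category from the fictitious recency table.
categories = {
    "1 to 14 days": (110, 900),
    "15 to 30 days": (90, 980),
    "over 30 days": (65, 635),
}

total_wins = sum(w for w, _ in categories.values())
total_starters = sum(n for _, n in categories.values())


def impact_value(wins: int, starters: int) -> float:
    """Share of all winners from the category / share of all starters in it."""
    return (wins / total_wins) / (starters / total_starters)


for name, (wins, starters) in categories.items():
    print(f"{name:>14}: IV = {impact_value(wins, starters):.2f}")
```

This gives roughly 1.16, 0.87 and 0.97, so the 1-to-14-day group does have the largest impact value. Impact value measures effect size only. Whether that effect is distinguishable from chance is a separate question, which the Z-score addresses.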
__________________
Sapere aude
Old 12-31-2010, 08:26 AM   #82
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Quote:
Originally Posted by TrifectaMike

Let me give you a specific example (Arkansasman, try this out with your model). When looking at a horse's last four Beyers or Bris numbers, etc., you would think that the last is more significant than the previous, and so on. The data does not support the premise.

I have found, using numerous tests, that the second race back Beyer (or any other similar rating) is insignificant and adds no usable information. The third is significant, as is the fourth, but NOT the second race back. Interesting, I would say.

Mike
My reading of the data disagrees with your assertion here. In my experience, the most recent races should be weighted most heavily, with the second-to-last race having something like a 25% weighting.
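A recency-weighted figure of the kind described here can be sketched as follows. Only the 25% weight on the second race back comes from the post. The other weights and the four figures are hypothetical placeholders, and the function name is illustrative.

```python
def weighted_figure(figures: list[float], weights: list[float]) -> float:
    """Weighted average of recent speed figures, most recent first.

    Weights are normalized by their sum, so they need not add up to 1.
    """
    if len(figures) != len(weights):
        raise ValueError("need exactly one weight per figure")
    total_weight = sum(weights)
    return sum(f * w for f, w in zip(figures, weights)) / total_weight


# Hypothetical weights; only the 0.25 on the second race back is from the post.
WEIGHTS = [0.50, 0.25, 0.15, 0.10]
last_four = [85, 78, 90, 82]  # hypothetical figures, most recent first
print(round(weighted_figure(last_four, WEIGHTS), 2))
```

With these placeholder numbers, the weighted figure is 83.7. Fitting the weights from data, for example via the regression proposed later in the thread, is the alternative to fixing them by hand.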
Old 12-31-2010, 10:24 AM   #83
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by Actor
I've been going over this with my college statistics textbook by my side and, as best as I can tell, you do not seem to be calculating the Chi-Square Value correctly. From the text:
Code:
 THEOREM:  If n1, n2, ..., nk and e1, e2, ..., ek are the observed and
 expected frequencies, respectively, for the k possible outcomes of an
 experiment that is performed n times, then as n becomes infinite the
 distribution of the quantity
 
  k
 Σ     (ni - ei)²/ei
  i=1
 
 will approach that of a Chi-Squared variable with k-1 degrees of freedom.
I.e., the results are to be summed over all possible outcomes. There are three possible outcomes here: a horse from one of the three categories wins. There is no other possibility. There should be three terms in the summation, viz., the first, third, and fifth terms. The second, fourth, and sixth terms are double counts. The correct value is 4.18, which is less than the critical value of 4.605.

These double counts are not present in your die casting example.

You should look up the definition of "factious." I think you mean "fictitious" but I could be wrong.
Read another book or read that one more carefully.

Mike
Old 12-31-2010, 10:26 AM   #84
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by Actor
This is William L. Quirin's "standard normal" test.

For the first category I compute the standard normal value to be 1.59. Quirin says that the value needs to lie outside -2.5 to +2.5, indicating that this category (which has the biggest impact value) is not significant.
Where do you guys come up with this stuff? Quirin's standard normal?

Mike
Old 12-31-2010, 10:30 AM   #85
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by sjk
My reading of the data disagrees with your assertion here. In my experience, the most recent races should be weighted most heavily, with the second-to-last race having something like a 25% weighting.
Sjk, I know this is off topic, but let's have someone, possibly Arkansasman, run a test.

Run a logistic regression using the last four Beyers as regressors... and share the results.

Mike
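Here is a minimal sketch of the proposed test. It assumes an ordinary binary logistic regression in which each starter is one row, the four regressors are its last four figures (most recent first), and the outcome is win/lose. It uses NumPy and Newton-Raphson (IRLS). The function name is illustrative, and the demo data are simulated, not real Beyers. A per-race (conditional) logit, which treats each race as a choice set, is a different model and is not shown.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10):
    """Binary logistic regression fitted by Newton-Raphson (IRLS).

    X: (n_starters, n_regressors), e.g. the last four figures, most recent first.
    y: 1.0 if the starter won, else 0.0.
    Returns (coefficients, standard errors, Wald z), intercept first.
    """
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))        # fitted win probabilities
        grad = Xd.T @ (y - p)                          # score vector
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
    info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, beta / se


# Simulated, centered "figures" for illustration only; not real race data.
rng = np.random.default_rng(42)
n = 2000
X_sim = rng.normal(0.0, 8.0, size=(n, 4))
true_beta = np.array([-2.2, 0.06, 0.04, 0.03, 0.02])
win_prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X_sim @ true_beta[1:])))
y_sim = (rng.random(n) < win_prob).astype(float)

beta, se, z = fit_logistic(X_sim, y_sim)
labels = ["intercept", "last", "2nd back", "3rd back", "4th back"]
for name, b, s, zz in zip(labels, beta, se, z):
    print(f"{name:>9}: coef {b:+.3f}  se {s:.3f}  z {zz:+.2f}")
```

Each Wald z measures a figure's contribution given the other three figures already in the model. Strongly correlated figures can therefore trade significance with one another.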
Old 12-31-2010, 10:34 AM   #86
Capper Al
Registered User
 
 
Join Date: Dec 2005
Location: MI
Posts: 6,330
I'm going to be forced to re-read The Mathematics of Horse Racing by David B. Fogel; not a bad idea, I just need the time. Fogel covers the 'How to' topic in six pages, with references to charts in the appendices. Then he walks one through examples in twenty-one pages. His examples are for beaten lengths, days between races, drop in class, shippers, and change in distance. The rest of the book covers making your own system and the possibility of making a living at the track. In other words, Fogel gives one the 'How to' with many examples in 27 pages. Let's move on.
__________________


"The Law, in its majestic equality, forbids the rich, as well as the poor, to sleep under bridges, to beg in the streets, and to steal bread."

Anatole France


Old 12-31-2010, 10:38 AM   #87
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Quote:
Originally Posted by TrifectaMike
Sjk, I know this is off topic, but let's have someone, possibly Arkansasman, run a test.

Run a logistic regression using the last four Beyers as regressors... and share the results.

Mike

I would be interested in the findings and how they were arrived at.

SK
Old 12-31-2010, 10:44 AM   #88
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by garyoz
R-Squared statistic.
And which R-Squared statistic would that be? Mine, yours, Dave's, GT's, Benter's, Loie's, Peter's or Saint Paul's? You tell me, because if you compute one, I can compute another.

Mike
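The point here is that logistic regression has no single R-squared. Several pseudo-R² measures are in use, and they give different numbers for the same fit. This is a minimal sketch of three common ones (McFadden, Cox-Snell, Nagelkerke). Each compares the model's log-likelihood with that of an intercept-only model. The outcomes and probabilities are hypothetical, and the function names are illustrative.

```python
import math


def log_likelihood(y: list[int], p: list[float]) -> float:
    """Bernoulli log-likelihood of outcomes y (1 = won) under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def pseudo_r2(y: list[int], p_model: list[float]) -> dict[str, float]:
    """Three common pseudo R-squared measures for a fitted logistic model."""
    n = len(y)
    base_rate = sum(y) / n
    ll_model = log_likelihood(y, p_model)
    ll_null = log_likelihood(y, [base_rate] * n)  # intercept-only model
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return {
        "McFadden": 1.0 - ll_model / ll_null,
        "Cox-Snell": cox_snell,
        # Nagelkerke rescales Cox-Snell so that its maximum possible value is 1.
        "Nagelkerke": cox_snell / (1.0 - math.exp(2.0 * ll_null / n)),
    }


# Hypothetical outcomes and model win probabilities for ten starters.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
probs = [0.60, 0.20, 0.10, 0.10, 0.40, 0.10, 0.20, 0.10, 0.05, 0.15]
for name, value in pseudo_r2(outcomes, probs).items():
    print(f"{name:>10}: {value:.3f}")
```

On these ten hypothetical starters, the three measures come out at roughly 0.50, 0.39 and 0.62. The predictions are identical in every case; only the definition changes.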
Old 12-31-2010, 11:01 AM   #89
teddy
Registered User
 
 
Join Date: Jan 2009
Posts: 1,516
In the end, are we trying to find out what factors grouped together will produce a profitable return? The statistics are interesting, but what are we going to be left with? Impact values are everywhere...
Old 12-31-2010, 11:04 AM   #90
Greyfox
Registered User
 
Join Date: Jan 2007
Posts: 18,962
Quote:
Originally Posted by TrifectaMike
For example, we all, to some degree or another, scan a horse's speed ratings (Beyers, etc.) and naturally assume the last rating is more significant than the previous one. In fact, it is standard practice for logistic regression people to use time-related weights on their factors. It seems reasonable, but is it always correct?

Mike

I like the path that you are taking, TrifectaMike. The concept behind the approach is more important at this point than whether or not your particular figures are being done accurately. (Obviously, down the road, figure accuracy would be important.)

I also like the questions that you are asking.
For instance, "time-related weights" do not necessarily work that well in turf races. One has to look at the total picture on grass.
All times are GMT -4.
Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved