Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 01-05-2011, 06:39 AM   #151
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by gm10
Yes I have but I did not keep track of all the (parametric and nonparametric) simulation results. I compared them with my main multinomial logit model at the time in terms of predicting winners, and ROI, and was disappointed. My logit model is better now than it was then, hence my doubts.



Probit is more advanced and computationally much more complex than previous models. Benter uses it. I read somewhere that he improved his ROI by 5 to 10% after replacing his multinomial logit model with a probit model. There is a presentation about him mentioning the probit model here

http://wmedia.hkedcity.net/archive/0...ICCMpt04Ex.wmv

(very interesting).

I have not looked into the mathematics of the probit model for about 6 years so won't comment on them. In general however, how you use the information you get is just as important as the information you have. As far as I know, probit is the most advanced of the well-researched models.

Anyway - enough about probit. I use multinomial logit and, as I said earlier, my simulation results on the basis of speed figures underperformed. That is why I made the earlier post.

Having said all that, I've always had vague plans to build a grand simulation model that merges pace analysis with speed, form and class analysis. The logit model is not ideal for that.

Thanks for the info. I haven't had the chance to listen to Benter's presentation, but I will soon.

I REALLY like the idea of a simulation model.

Mike
Old 01-05-2011, 06:54 AM   #152
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Let me list the tools we will be using:

1. Chi-Square statistic for significance testing of factors.

2. Z-score for weighting the significant factors.

However, before we run a single test, we want to do the following:

Segment our selection of races for testing by odds.

For instance, one data sample may include races where the winner was between 2-1 and 4-1; another sample may include races where the winner was between 5-1 and 10-1.

This will help us find significant factors that the public underbets or overbets.
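A rough sketch of the segmentation step in Python. The record layout (winner_odds, field_size, factor_starters, winner_has_factor) and the sample races are made up for illustration. Comparing the observed number of factor winners with their "fair share" of wins is only one way to set up the Chi-Square statistic and the Z-score, and is not necessarily the exact procedure used in this thread.

```python
from math import sqrt

# Hypothetical race records; odds are quoted "to 1".
races = [
    {"winner_odds": 3.0, "field_size": 8, "factor_starters": 2, "winner_has_factor": True},
    {"winner_odds": 2.5, "field_size": 10, "factor_starters": 2, "winner_has_factor": True},
    {"winner_odds": 6.0, "field_size": 8, "factor_starters": 4, "winner_has_factor": False},
    {"winner_odds": 4.0, "field_size": 8, "factor_starters": 2, "winner_has_factor": False},
]


def segment_races(all_races, low, high):
    """Keep races whose winner went off at odds between low and high (inclusive)."""
    return [r for r in all_races if low <= r["winner_odds"] <= high]


def factor_stats(sample):
    """Return (observed, expected, chi_square, z) for factor horses winning.

    Expected wins assume the factor is irrelevant, so each race contributes
    factor_starters / field_size. The sample must contain at least one factor
    starter and one non-factor starter, otherwise a division by zero occurs.
    """
    n = len(sample)
    observed = sum(1 for r in sample if r["winner_has_factor"])
    shares = [r["factor_starters"] / r["field_size"] for r in sample]
    expected = sum(shares)
    variance = sum(p * (1 - p) for p in shares)
    # Two cells: races won by a factor horse, races won by a non-factor horse.
    chi_square = ((observed - expected) ** 2 / expected
                  + ((n - observed) - (n - expected)) ** 2 / (n - expected))
    z = (observed - expected) / sqrt(variance)
    return observed, expected, chi_square, z


low_odds_sample = segment_races(races, 2.0, 4.0)
observed, expected, chi_square, z = factor_stats(low_odds_sample)
print(len(low_odds_sample), observed, round(expected, 2), round(chi_square, 2), round(z, 2))
```

With a real database each odds segment would hold hundreds or thousands of races; four toy races are far too few for either statistic to mean anything.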

Mike
Old 01-05-2011, 08:05 AM   #153
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
And some advice for anyone looking into modeling using regression techniques.

Before throwing yourself into the complex realm of logistic or probit regression, first learn ordinary linear regression. And learn it in a manner that advances the cause.

Using ordinary regression one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

And then use this information in your logistic model (a very, very good start).

Here I will get you started with a list of factors to use (from actual experience):

Expected Horse Time Parameters
1. Number of Previous Races
2. Number of Previous Races Squared
3. Days Since Last Race
4. Days Since Last Race Squared
5. Age
6. Age Squared
7. Number of Top Three Finishes
8. Number of Top Three Finishes Squared
9. Distance
10. Distance Squared
11. Earnings per Start
12. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

Expected Race Time Parameters
1. Distance
2. Distance Squared
3. Purse Size
4. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

The squared terms are an absolute must.

They allow the effect of a factor to diminish beyond some level.
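For anyone who wants to see what one row of the regression input might look like, here is a minimal Python sketch for the horse-time list above. The function name, the column names, the units, and the choice of Maiden as the baseline class are my own assumptions; the fitting step itself is left to whatever regression package you use.

```python
RACE_CLASSES = ["Maiden", "Allowance", "Claiming", "Stake", "Handicap"]


def horse_time_features(prev_races, days_since_last, age, top_three,
                        distance, earnings_per_start, race_class):
    """Build one regression row: raw terms, squared terms, and class dummies."""
    if race_class not in RACE_CLASSES:
        raise ValueError("unknown race class: " + race_class)
    row = {}
    # Every factor that gets a squared term in the list above.
    for name, value in [("prev_races", prev_races),
                        ("days_since_last", days_since_last),
                        ("age", age),
                        ("top_three", top_three),
                        ("distance", distance)]:
        row[name] = value
        row[name + "_sq"] = value ** 2
    row["earnings_per_start"] = earnings_per_start
    # Dummy-code the categorical variable; the first class (Maiden) is the
    # baseline and gets no column, which avoids perfect collinearity with
    # the intercept.
    for cls in RACE_CLASSES[1:]:
        row["class_" + cls] = 1 if race_class == cls else 0
    return row


# Hypothetical horse: 10 starts, 21 days off, age 4, 5 top-three finishes,
# 6.0 furlongs, 12000 earned per start, running in a claiming race.
example_row = horse_time_features(10, 21, 4, 5, 6.0, 12000.0, "Claiming")
print(example_row)
```

The race-time model would get the same treatment with Distance, Distance Squared, Purse Size, and the class dummies.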

Gotta stop this now...I don't want to piss too many people off.

Mike
Old 01-05-2011, 08:33 AM   #154
m001001
Registered User
 
Join Date: Mar 2007
Posts: 28
logit vs probit

A logit model predicts only win probabilities. You have to use conditional probability (e.g. the Harville formula) or other methods to find the probabilities for 2nd, 3rd, etc.

A probit model predicts a probability for each and every permutation of the finishing order, and therefore gives more accurate probabilities for exotics. But a probit model is far, far more difficult to build and run.
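To make the conditional-probability step concrete, here is a small Python sketch of the Harville formula: the chance of a given finishing order is the product of each horse's win probability renormalized over the horses not yet placed. The function name and the three win probabilities are invented for the example.

```python
from itertools import permutations


def harville_order_prob(win_probs, order):
    """Probability that the horses in `order` finish 1st, 2nd, ... in that sequence.

    win_probs maps horse name -> win probability (summing to 1 over the field).
    """
    prob = 1.0
    remaining = 1.0  # total win probability of the horses not yet placed
    for horse in order:
        prob *= win_probs[horse] / remaining
        remaining -= win_probs[horse]
    return prob


# Hypothetical win probabilities, e.g. from a logit model.
win_probs = {"A": 0.5, "B": 0.3, "C": 0.2}

exacta_ab = harville_order_prob(win_probs, ("A", "B"))
all_exactas = {pair: harville_order_prob(win_probs, pair)
               for pair in permutations(win_probs, 2)}
print(exacta_ab, sum(all_exactas.values()))
```

Summing the exacta probabilities over every ordered pair returns 1 (up to rounding), which is a quick sanity check on the implementation.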
Old 01-05-2011, 01:21 PM   #155
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by m001001
A logit model predicts only win probabilities. You have to use conditional probability (e.g. the Harville formula) or other methods to find the probabilities for 2nd, 3rd, etc.

A probit model predicts a probability for each and every permutation of the finishing order, and therefore gives more accurate probabilities for exotics. But a probit model is far, far more difficult to build and run.
Wow, and I always thought they were similar....I guess viewing them from a purely mathematical sense is insufficient. Thanks, I'll explore this further.

Mike
Old 01-05-2011, 01:51 PM   #156
SchagFactorToWin
Registered User
 
Join Date: Jul 2009
Location: WNY
Posts: 444
Quote:
Originally Posted by TrifectaMike
Using ordinary regression one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

Here I will get you started with a list of factors to use (from actual experience):
Maybe I'm misunderstanding you. Are you saying you achieved a .95 by using those (or similar, or those plus more) parameters?
Old 01-05-2011, 02:07 PM   #157
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by SchagFactorToWin
Maybe I'm misunderstanding you. Are you saying you achieved a .95 by using those (or similar, or those plus more) parameters?
Don't fall off your chair just yet, because the expected performance depends on horse k's expected time in race j as well as the expected winning time in race j.

Mike

Last edited by TrifectaMike; 01-05-2011 at 02:09 PM.
Old 01-05-2011, 02:21 PM   #158
SchagFactorToWin
Registered User
 
Join Date: Jul 2009
Location: WNY
Posts: 444
Quote:
Originally Posted by TrifectaMike
Don't fall off your chair just yet, because the expected performance depends on horse k's expected time in race j as well as the expected winning time in race j.

Mike
You didn't answer the question.
Old 01-05-2011, 02:30 PM   #159
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by SchagFactorToWin
You didn't answer the question.
Short answer, yes.
Mike
Old 01-05-2011, 09:04 PM   #160
gm10
Registered User
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by TrifectaMike
And some advice for anyone looking into modeling using regression techniques.

Before throwing yourself into the complex realm of logistic or probit regression, first learn ordinary linear regression. And learn it in a manner that advances the cause.

Using ordinary regression one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

And then use this information in your logistic model (a very, very good start).

Here I will get you started with a list of factors to use (from actual experience):

Expected Horse Time Parameters
1. Number of Previous Races
2. Number of Previous Races Squared
3. Days Since Last Race
4. Days Since Last Race Squared
5. Age
6. Age Squared
7. Number of Top Three Finishes
8. Number of Top Three Finishes Squared
9. Distance
10. Distance Squared
11. Earnings per Start
12. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

Expected Race Time Parameters
1. Distance
2. Distance Squared
3. Purse Size
4. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

The squared terms are an absolute must.

They allow the effect of a factor to diminish beyond some level.

Gotta stop this now...I don't want to piss too many people off.

Mike
Very interesting comment on the squared terms.
Old 01-05-2011, 11:43 PM   #161
Cratos
Registered User
 
Join Date: Jan 2004
Location: The Big Apple
Posts: 4,252
Quote:
Originally Posted by TrifectaMike
If there is sufficient interest, I, along with any PA member(s), will go through the process of generating a handicapping method based on the Chi Square statistic.

I (We) will do it in such a manner that it will be non-technical, easy to understand, using only basic arithmetic, and will allow for anyone to participate.

The end result I believe will be a profitable system.

We will need someone with a large database to provide data for the factors that we will test, and include in our model.

If there is interest I will proceed.

Let me know.

Mike
Mike,

I have read the posts on this thread from its inception and although I have found your thread premise to be enjoyable, I don’t regard it as “Using Chi Square Statistic to Produce a Handicapping Method.”

I see it as “Using Chi Square Statistic to Produce a Handicapping Analysis.” You might say that I am splitting hairs, but I am not. A handicapping method, in my opinion, should be a predictive method based on the inputs that determine whether a horse wins.

As you very well know, the Chi Square statistic is about the “goodness of fit,” comparing observed data with data we would expect to obtain according to a specific hypothesis.

Therefore if we develop a predictive method (and I have) we can use the Chi Square statistic to test how “good” a predictor that method is.
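As a small illustration of that kind of check, the Python sketch below groups a method's predicted win probabilities into bins and compares the wins observed in each bin with the wins the predictions imply. The bin counts are invented, and the critical value assumes three bins with matching totals (2 degrees of freedom) at the 5% level.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical results for horses grouped by predicted win probability
# (low, medium, high). Expected wins are the summed predictions in each bin.
observed_wins = [12, 30, 58]
expected_wins = [10.0, 33.0, 57.0]

statistic = chi_square_gof(observed_wins, expected_wins)
CRITICAL_5PCT_DF2 = 5.991  # chi-square critical value, 2 degrees of freedom

if statistic < CRITICAL_5PCT_DF2:
    print("No significant departure from the predictions:", round(statistic, 3))
else:
    print("Predictions do not fit the observed wins:", round(statistic, 3))
```

A small statistic only says the predictions are not contradicted by the results; it does not by itself show the method is profitable.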

In my experiment to develop a predictive method I started with the question: “What makes a horse win?” From that question I isolated two distinct variable groups that influence a horse winning. They are factor-variables and angle-variables. Factor-variables typically influence the winning or losing of the race regardless. Angle-variables might or might not influence the winning or losing of the race and might only occur once in a horse’s racing career.

Additionally, my effort led me to develop a logarithmic predictive curve, for which I wrote an equation that lets me integrate the parametric factor-variables and use the angle-variables, when applicable, as additives.

For example, pace is a factor-variable. To win a race, all else being equal, a horse must be able to negotiate the pace. On the other hand, an equipment change can be an angle-variable; a trainer might add blinkers to a horse and, all else being equal, it runs off and leaves the field in its wake.
The list of factor-variables is very short compared to the list of angle-variables.

I hope you don’t take this as a criticism, but just as another point of view; I am anxiously awaiting the conclusion to your method.
__________________
Independent thinking, emotional stability, and a keen understanding of both human and institutional behavior are vital to long-term investment success – My hero, Warren Edward Buffett

"Science is correct; even if you don't believe it" - Neil deGrasse Tyson

Last edited by Cratos; 01-05-2011 at 11:45 PM.
Old 01-06-2011, 10:23 AM   #162
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by gm10
Very interesting comment on the squared terms.
Interesting how? Is it because it seems on the surface to be a contradiction?

Mike
Old 01-06-2011, 10:56 AM   #163
gm10
Registered User
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by TrifectaMike
Interesting how? Is it because it seems on the surface to be a contradiction?

Mike
I meant more your comment that they are a must, and that they diminish the effect of the factor 'at some level'. How should I interpret this?

To your list, I would also add the position of the temporary rail in turf races.
Old 01-06-2011, 12:26 PM   #164
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by gm10
I meant more your comment that they are a must, and that they diminish the effect of the factor 'at some level'. How should I interpret this?

To your list, I would also add the position of the temporary rail in turf races.
I'll explain with an example.

Let's take one independent variable, age. If the regression is based only on age, we know horses get faster as they get older. But when does this age factor become less and less important? Does a horse continue to get faster indefinitely? Adding a quadratic term attenuates this factor.

How?

The linear term coefficient would have a negative value and the quadratic term would have a positive value. Now depending on the magnitude of the respective coefficients, there is an age where the effect on the dependent variable (expected time) no longer decreases with age, but instead increases with age.
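In numbers: if the age part of the fit is b1*age + b2*age^2 with b1 negative and b2 positive, the contribution to expected time bottoms out at age = -b1 / (2*b2). A quick Python sketch, with coefficients made up purely to show the arithmetic (they are not fitted values):

```python
def age_effect(age, b_linear, b_quadratic):
    """Contribution of age to expected time: b_linear*age + b_quadratic*age^2."""
    return b_linear * age + b_quadratic * age ** 2


def turning_point(b_linear, b_quadratic):
    """Age at which the effect stops decreasing and starts increasing."""
    if b_quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b_linear / (2 * b_quadratic)


# Hypothetical coefficients: negative linear term, positive quadratic term.
B_LINEAR = -1.0
B_QUADRATIC = 0.125

best_age = turning_point(B_LINEAR, B_QUADRATIC)
for age in (2, 3, 4, 5, 6):
    print(age, age_effect(age, B_LINEAR, B_QUADRATIC))
print("effect on expected time is lowest at age", best_age)
```

With these made-up values the effect on expected time falls until age 4 and rises after it, which is exactly the attenuation the squared term is there to capture.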

Mike

Last edited by TrifectaMike; 01-06-2011 at 12:30 PM.
Old 01-10-2011, 08:19 PM   #165
gm10
Registered User
 
Join Date: Sep 2005
Location: Ringkoebing
Posts: 4,342
Quote:
Originally Posted by TrifectaMike
I'll explain with an example.

Let's take one independent variable, age. If the regression is based only on age, we know horses get faster as they get older. But when does this age factor become less and less important? Does a horse continue to get faster indefinitely? Adding a quadratic term attenuates this factor.

How?

The linear term coefficient would have a negative value and the quadratic term would have a positive value. Now depending on the magnitude of the respective coefficients, there is an age where the effect on the dependent variable (expected time) no longer decreases with age, but instead increases with age.

Mike

Would you do the same with 'total number of races so far' (as mentioned a few pages ago)?

