Using Chi Square Statistic To Produce A Handicapping Method - Page 11 - Horse Racing Forum - PaceAdvantage.Com

TrifectaMike · 01-05-2011, 06:39 AM

Quote:

Originally Posted by gm10

Yes I have, but I did not keep track of all the (parametric and nonparametric) simulation results. I compared them with my main multinomial logit model at the time in terms of predicting winners, and ROI, and was disappointed. My logit model is better now than it was then, hence my doubts.

Probit is more advanced and computationally much more complex than previous models. Benter uses it. I read somewhere that he improved his ROI by 5 to 10% after replacing his multinomial logit model with a probit model. There is a presentation about him mentioning the probit model here

http://wmedia.hkedcity.net/archive/0...ICCMpt04Ex.wmv

(very interesting).

I have not looked into the mathematics of the probit model for about 6 years so won't comment on them. In general however, how you use the information you get is just as important as the information you have. As far as I know, probit is the most advanced of the well-researched models.

Anyway - enough about probit. I use multinomial logit and, as I said earlier, my simulation results based on speed figures underperformed. That is why I made the earlier post.

Having said all that, I've always had vague plans to build a grand simulation model that merges pace analysis with speed, form and class analysis. The logit model is not ideal for that.

Thanks for the info. I haven't had the chance to listen to Benter's presentation, but I will soon.

I REALLY like the idea of a simulation model.

Mike

TrifectaMike · 01-05-2011, 06:54 AM

Let me list the tools we will be using:

1. Chi-Square statistic for significance testing of factors.

2. Z-score for weighting the significant factors.
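
As a rough illustration of the two tools listed above, here is a minimal Python sketch. It assumes each factor is reduced to a 2x2 table (factor present or absent versus won or lost) and that the Z-score compares the factor's win count with a baseline win rate. The function names, the table layout and the choice of baseline are assumptions made for illustration, not something specified in this thread.

```python
import math


def chi_square_2x2(wins_with, starts_with, wins_without, starts_without):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for factor present/absent versus won/lost.
    Returns (statistic, p_value)."""
    observed = [
        [wins_with, starts_with - wins_with],
        [wins_without, starts_without - wins_without],
    ]
    total = starts_with + starts_without
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][c] + observed[1][c] for c in range(2)]
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total
            stat += (observed[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def win_rate_z_score(wins_with, starts_with, baseline_rate):
    """Z-score of the factor's win count against a baseline win rate
    (binomial mean and standard deviation). Using this as a weight is an assumption."""
    expected = starts_with * baseline_rate
    std = math.sqrt(starts_with * baseline_rate * (1.0 - baseline_rate))
    return (wins_with - expected) / std
```

A call such as `chi_square_2x2(30, 100, 100, 900)` tests whether horses showing the factor win at a different rate from horses without it; the counts here are invented purely to show the call.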

However, before we run a single test, we want to do the following:

Segment our selection of races for testing by odds.

For instance, one data sample may include races where the winner was between 2-1 and 4-1; another sample may include races where the winner was between 5-1 and 10-1.

This will help us find significant factors that the public underbets and overbets.
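
The segmentation step could be sketched like this in Python. The record layout (a dict with a `winner_odds` field holding odds-to-1) and the band labels are hypothetical; only the two odds ranges come from the example above.

```python
from collections import defaultdict

# Odds bands (odds-to-1 of the winner) taken from the two examples in the post.
ODDS_BANDS = [("2-1 to 4-1", 2.0, 4.0), ("5-1 to 10-1", 5.0, 10.0)]


def segment_races_by_winner_odds(races, bands=ODDS_BANDS):
    """Group races into test samples by the winner's odds.
    Races whose winner falls outside every band are left out."""
    samples = defaultdict(list)
    for race in races:
        odds = race["winner_odds"]  # hypothetical field name
        for label, low, high in bands:
            if low <= odds <= high:
                samples[label].append(race)
                break
    return dict(samples)
```

Each resulting sample would then be tested on its own. Note that with these two bands a winner at 9-2 lands in neither sample, so the bands would need adjusting if full coverage is wanted.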

Mike

TrifectaMike · 01-05-2011, 08:05 AM

And some advice for anyone looking into modeling using regression techniques.

Before one throws himself into the complex realm of logistic or probit regression, he should first learn ordinary linear regression. And learn it in a manner that advances the cause.

Using ordinary regression, one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

And then use this information in your logistic model (a very, very good start).

Here I will get you started with a list of factors to use (from actual experience):

Expected Horse Time Parameters
1. Number of Previous Races
2. Number of Previous Races Squared
3. Days Since Last Race
4. Days Since Last Race Squared
5. Age
6. Age Squared
7. Number of Top Three Finishes
8. Number of Top Three Finishes Squared
9. Distance
10. Distance Squared
11. Earnings per Start
12. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

Expected Race Time Parameters
1. Distance
2. Distance Squared
3. Purse Size
4. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

The squared terms are an absolute must.

They allow the effect of the factor to be diminished at some level.
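
To make the factor list concrete, here is a small Python sketch of how one row of regression inputs for the expected horse time model might be assembled, with each squared term placed next to its linear term. The function name, argument names, the intercept column and the choice of Maiden as the baseline class are all assumptions; the post only names the factors.

```python
# Race classes from the categorical variable in the list above.
RACE_CLASSES = ["Maiden", "Allowance", "Claiming", "Stake", "Handicap"]


def horse_time_features(prev_races, days_since_last, age, top3_finishes,
                        distance, earnings_per_start, race_class):
    """Build one design-matrix row: intercept, linear and squared terms,
    earnings per start, and indicator columns for the race class."""
    if race_class not in RACE_CLASSES:
        raise ValueError(f"unknown race class: {race_class}")
    row = [1.0]  # intercept
    for value in (prev_races, days_since_last, age, top3_finishes, distance):
        row.extend([float(value), float(value) ** 2])
    row.append(float(earnings_per_start))
    # The first class (Maiden) is the baseline, so it gets no indicator column.
    row.extend(1.0 if race_class == c else 0.0 for c in RACE_CLASSES[1:])
    return row
```

Dropping one class as the baseline avoids perfect collinearity with the intercept, which is the usual way to code a categorical variable in ordinary regression.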

Gotta stop this now...I don't want to piss too many people off.

Mike

m001001 · 01-05-2011, 08:33 AM

Logit model predicts only winning probabilities. You have to use conditional probability (e.g. Harville formula) or other methods to find prob for 2nd, 3rd, etc...

Probit model predicts probability for each and every permutation of finishing order. Therefore more accurate probabilities for exotics. But probit model is far far far more difficult to build and run.
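
For anyone unfamiliar with the Harville formula mentioned above, a minimal Python sketch follows. Under Harville, once a horse has finished, the remaining horses' win probabilities are rescaled to sum to 1, so P(i first, j second) = p_i * p_j / (1 - p_i). The function names and the dict-of-probabilities input are illustrative assumptions.

```python
from itertools import permutations


def harville_order_probability(win_probs, order):
    """Probability that the horses in `order` fill the first len(order)
    places, in that order, under the Harville assumption."""
    prob = 1.0
    remaining = 1.0  # total win probability of horses not yet placed
    for horse in order:
        prob *= win_probs[horse] / remaining
        remaining -= win_probs[horse]
    return prob


def exacta_table(win_probs):
    """Harville probability for every ordered (first, second) pair."""
    return {
        pair: harville_order_probability(win_probs, pair)
        for pair in permutations(win_probs, 2)
    }
```

The win probabilities would come from the logit model (or any other source) and should sum to 1 across the field.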

TrifectaMike · 01-05-2011, 01:21 PM

Quote:

Originally Posted by m001001

Logit model predicts only winning probabilities. You have to use conditional probability (e.g. Harville formula) or other methods to find prob for 2nd, 3rd, etc...

Probit model predicts probability for each and every permutation of finishing order. Therefore more accurate probabilities for exotics. But probit model is far far far more difficult to build and run.

Wow, and I always thought they were similar....I guess viewing them from a purely mathematical sense is insufficient. Thanks, I'll explore this further.

Mike

SchagFactorToWin · 01-05-2011, 01:51 PM

Quote:

Originally Posted by TrifectaMike

Using ordinary regression, one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

Here I will get you started with a list of factors to use (from actual experience):

Maybe I'm misunderstanding you. Are you saying you achieved a .95 by using those (or similar, or those plus more) parameters?

TrifectaMike · 01-05-2011, 02:07 PM

Quote:

Originally Posted by SchagFactorToWin

Maybe I'm misunderstanding you. Are you saying you achieved a .95 by using those (or similar, or those plus more) parameters?

Don't fall off your chair just yet, because the expected performance depends on horse k 's expected time in race j as well as the expected winning time in race j.

Mike

SchagFactorToWin · 01-05-2011, 02:21 PM

Quote:

Originally Posted by TrifectaMike

Don't fall off your chair just yet, because the expected performance depends on horse k 's expected time in race j as well as the expected winning time in race j.

Mike

You didn't answer the question.

TrifectaMike · 01-05-2011, 02:30 PM

Quote:

Originally Posted by SchagFactorToWin

You didn't answer the question.

Short answer, yes.
Mike

gm10 · 01-05-2011, 09:04 PM

Quote:

Originally Posted by TrifectaMike

And some advice for anyone looking into modeling using regression techniques.

Before one throws himself into the complex realm of logistic or probit regression, he should first learn ordinary linear regression. And learn it in a manner that advances the cause.

Using ordinary regression, one can determine expected horse times and expected race winning times with an adjusted R² of .95, which explains nearly all the variation in horse times.

And then use this information in your logistic model (a very, very good start).

Here I will get you started with a list of factors to use (from actual experience):

Expected Horse Time Parameters
1. Number of Previous Races
2. Number of Previous Races Squared
3. Days Since Last Race
4. Days Since Last Race Squared
5. Age
6. Age Squared
7. Number of Top Three Finishes
8. Number of Top Three Finishes Squared
9. Distance
10. Distance Squared
11. Earnings per Start
12. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

Expected Race Time Parameters
1. Distance
2. Distance Squared
3. Purse Size
4. Categorical Variable (Maiden, Allowance, Claiming, Stake, Handicap)

The squared terms are an absolute must.

They allow the effect of the factor to be diminished at some level.

Gotta stop this now...I don't want to piss too many people off.

Mike

Very interesting comment on the squared terms.

Cratos · 01-05-2011, 11:43 PM

Quote:

Originally Posted by TrifectaMike

If there is sufficient interest, I will, along with any PA member(s), go through the process of generating a handicapping method based on the Chi Square statistic.

I (We) will do it in a manner that is non-technical, easy to understand, uses only basic arithmetic, and allows anyone to participate.

The end result I believe will be a profitable system.

We will need someone with a large database to provide data for the factors that we will test, and include in our model.

If there is interest I will proceed.

Let me know.

Mike

Mike,

I have read the posts on this thread from its inception and although I have found your thread premise to be enjoyable, I don’t regard it as “Using Chi Square Statistic to Produce a Handicapping Method.”

I see it as “Using Chi Square Statistic to Produce a Handicapping Analysis.” You might say that I am splitting hairs, but I am not. A handicapping method in my opinion should be a predictive method based on the inputs which determines the horse winning.

As you very well know, the Chi Square statistic is about the “goodness of fit,” comparing observed data with data we would expect to obtain according to a specific hypothesis.

Therefore if we develop a predictive method (and I have) we can use the Chi Square statistic to test how “good” a predictor that method is.
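
One possible way to run such a goodness-of-fit check is sketched below in Python: group the method's predicted win probabilities into bands, then compare observed winners and losers in each band with the numbers the predictions imply. The input layout, the band edges and the decision to include a losers cell in each band are assumptions made for illustration, not a description of any method in this thread.

```python
def goodness_of_fit_chi_square(predictions, bin_edges):
    """predictions: iterable of (predicted_win_prob, won) pairs.
    bin_edges: ascending band edges, e.g. [0.0, 0.1, 0.2, 1.0].
    Returns the chi-square statistic summed over winner and loser cells."""
    n_bins = len(bin_edges) - 1
    starts = [0] * n_bins
    observed_wins = [0] * n_bins
    expected_wins = [0.0] * n_bins
    for prob, won in predictions:
        for b in range(n_bins):
            # The last band is closed on the right so that prob == 1.0 is counted.
            below_upper = prob < bin_edges[b + 1] or b == n_bins - 1
            if bin_edges[b] <= prob and below_upper:
                starts[b] += 1
                observed_wins[b] += 1 if won else 0
                expected_wins[b] += prob
                break
    stat = 0.0
    for b in range(n_bins):
        expected_losses = starts[b] - expected_wins[b]
        if expected_wins[b] > 0 and expected_losses > 0:
            observed_losses = starts[b] - observed_wins[b]
            stat += (observed_wins[b] - expected_wins[b]) ** 2 / expected_wins[b]
            stat += (observed_losses - expected_losses) ** 2 / expected_losses
    return stat
```

A statistic near zero means the observed win counts track the predicted probabilities closely; larger values indicate a poorer fit. The appropriate degrees of freedom depend on how the bands and the model were chosen, so no p-value is computed here.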

In my experiment to develop a predictive method I started with the question: “What makes a horse win?” From that question I isolated two distinct variable groups that influence a horse winning. They are factor-variables and angle-variables. Factor-variables typically influence the winning or losing of the race regardless. Angle-variables might or might not influence the winning or losing of the race and might only occur once in a horse’s racing career.

Additionally, my effort led me to develop a logarithmic predictive curve, for which I wrote an equation that lets me integrate the parametric factor-variables and use the angle-variables, when applicable, as additives.

For example, pace is a factor-variable. To win a race, all else being equal, a horse must be able to negotiate the pace. On the other hand, an equipment change can be an angle-variable; a trainer might add blinkers to a horse and it runs off and leaves the field in its wake, all else being equal.
The list of factor-variables is very short when compared to the list of angle-variables.

I hope you don’t take this as a criticism, but just another point of view and I am anxiously awaiting the conclusion to your method.

TrifectaMike · 01-06-2011, 10:23 AM

Quote:

Originally Posted by gm10

Very interesting comment on the squared terms.

Interesting how? Is it because it seems on the surface to be a contradiction?

Mike

gm10 · 01-06-2011, 10:56 AM

Quote:

Originally Posted by TrifectaMike

Interesting how? Is it because it seems on the surface to be a contradiction?

Mike

I meant more your comment that they are a must, and that they diminish the effect of the factor 'at some level'. How should I interpret this?

To your list, I would also add the position of the temporary rail in turf races.

TrifectaMike · 01-06-2011, 12:26 PM

Quote:

Originally Posted by gm10

I meant more your comment that they are a must, and that they diminish the effect of the factor 'at some level'. How should I interpret this?

To your list, I would also add the position of the temporary rail in turf races.

I'll explain with an example.

Let's take one independent variable, age. If the regression is based only on age, we know horses get faster as they get older. But when does this age factor become less and less important? Does a horse continue to get faster indefinitely? Adding a quadratic term attenuates this factor.

How?

The linear term coefficient would have a negative value and the quadratic term would have a positive value. Now depending on the magnitude of the respective coefficients, there is an age where the effect on the dependent variable (expected time) no longer decreases with age, but instead increases with age.
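
The age at which the effect reverses follows directly from the two coefficients: setting the derivative of b1*age + b2*age^2 to zero gives age = -b1 / (2*b2). A short Python sketch, with function names chosen here for illustration:

```python
def quadratic_turning_point(linear_coef, quadratic_coef):
    """Value of the factor at which b1*x + b2*x**2 stops decreasing
    and starts increasing (for b1 < 0 and b2 > 0)."""
    if quadratic_coef == 0:
        raise ValueError("no turning point without a quadratic term")
    return -linear_coef / (2.0 * quadratic_coef)


def factor_effect(x, linear_coef, quadratic_coef):
    """Contribution of the factor to expected time at value x."""
    return linear_coef * x + quadratic_coef * x ** 2
```

With made-up coefficients of -1.0 on age and 0.1 on age squared (not estimates from any data), the contribution to expected time bottoms out where `quadratic_turning_point(-1.0, 0.1)` says and rises on either side of it.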

Mike

gm10 · 01-10-2011, 08:19 PM

Quote:

Originally Posted by TrifectaMike

I'll explain with an example.

Let's take one independent variable, age. If the regression is based only on age, we know horses get faster as they get older. But when does this age factor become less and less important? Does a horse continue to get faster indefinitely? Adding a quadratic term attenuates this factor.

How?

The linear term coefficient would have a negative value and the quadratic term would have a positive value. Now depending on the magnitude of the respective coefficients, there is an age where the effect on the dependent variable (expected time) no longer decreases with age, but instead increases with age.

Mike

Would you do the same with 'total number of races so far' (as mentioned a few pages ago)?