Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board (http://www.paceadvantage.com/forum/index.php)
-   Handicapping Software (http://www.paceadvantage.com/forum/forumdisplay.php?f=3)
-   -   peter wagner, benter and multinomial logit (http://www.paceadvantage.com/forum/showthread.php?t=10365)

arkansasman 02-21-2004 02:18 PM

peter wagner, benter and multinomial logit
 
Does anyone know anything about the North Dakota high roller - Peter Wagner? Do Bill Benter and Peter Wagner both use Multinomial Logit to predict horse race probabilities? If someone is well schooled in Multinomial Logit, please comment. I have a model that has 47 factors, but I am stuck getting the probabilities.
I am very interested in hearing how some of you have arrived at probabilities for a model.

GameTheory 02-21-2004 02:24 PM

Logit works okay, as do a number of other methods to make probabilities. The effectiveness of your factors themselves, and how you relate them to the public odds/probabilities are far more important questions.

What are you doing now, and in what way are you stuck?

arkansasman 02-21-2004 02:29 PM

Well, I have the coefficients via maximum likelihood. Is my next step to take the exponential of the sum of all the factors times their coefficients, and then divide each horse's exponential by the sum of all the horses' exponentials?

GameTheory 02-21-2004 03:12 PM

Maximum Likelihood is a general term. Logistic regression is one form of maximum likelihood estimation that uses a dichotomous dependent variable (which means either 1 or 0 -- usually won or didn't win in horseracing). Just look up "logistic regression" in Google and you'll find plenty to read. Of course you'll need some software to do the calculations...

arkansasman 02-22-2004 06:45 AM

Game, thanks for the reply.

I have been wrong many times, but if everything I have read about multinomial logit is right, the dependent variable should not be binary. The reason that Chapman, Benter (if he uses multinomial logit), and others use multinomial logit, from what I have read, is that it accounts for the in-race competition: how did horse A compete against all the others in the race?

I think that once you have the coefficients for each factor, you then have a linear sum of a horse's attributes which is the vector.
If I am right (which I might not be), you then arrive at the probabilities by taking the exponential of the horse's vector and dividing it by the sum of the exponentials of all horses' vectors in the race.

Thanks for your help, Game.
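
For reference, the calculation described in the post above can be sketched as follows. This is a minimal illustration and not Benter's or Chapman's actual code; the function name `win_probabilities`, the three-horse race, and the two coefficients are all made up for the example. Note that the linear sum works out to a single number per horse, and the probabilities are computed within one race at a time.

```python
import math

def win_probabilities(factors_by_horse, coefficients):
    """Multinomial (conditional) logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    # V_i is the linear score for horse i: the sum of factor * coefficient.
    scores = [sum(x * b for x, b in zip(factors, coefficients))
              for factors in factors_by_horse]
    # Subtract the largest score before exponentiating to avoid overflow;
    # this does not change the resulting probabilities.
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical three-horse race with two factors per horse.
race = [[1.0, 0.5], [0.2, 0.1], [0.0, 0.0]]
coefs = [0.8, 0.4]  # hypothetical coefficients, not fitted values
probs = win_probabilities(race, coefs)
```

The probabilities in `probs` sum to 1 across the field, so adding or removing a horse changes every other horse's probability, which is the in-race competition effect mentioned above.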

garyoz 02-22-2004 11:31 AM

Logistic regression or probit analysis are the proper ways to express the dependent variable (an S-shaped probability curve). However, you still have the problem of multicollinearity in most horse racing variables. Predictor variables are highly correlated (e.g., back class and back speed, or speed and class). Multicollinearity violates the assumption of randomness in the error terms that is required by regression analysis. Thus, I am skeptical that regression analysis could work well using standard canned regression programs (e.g., SPSS). I do not know the multinomial logit methodology, but it likely solves this problem. The multicollinearity problem (plus other statistical issues) is why I am skeptical about a program such as Allways claiming that it does multiple regression analysis.
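
One rough way to check how correlated two factors are is sketched below. It computes the Pearson correlation and, for the two-predictor case only, the variance inflation factor 1 / (1 - r^2), which indicates how much the correlation inflates the variance of the coefficient estimates. The `speed` and `klass` figures are invented for illustration, and the function names are assumptions, not part of any package mentioned in this thread.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def vif_two_predictors(xs, ys):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical speed and class ratings for six horses.
speed = [82, 90, 75, 88, 95, 70]
klass = [80, 91, 78, 85, 97, 72]
r = pearson(speed, klass)
vif = vif_two_predictors(speed, klass)
```

With more than two predictors, the usual approach is to regress each factor on all the others and use that regression's R-squared in place of r squared.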

Derek2U 02-22-2004 04:51 PM

that pesky multi-coll****
 
yesterday i typed that this problem, which is VERY real, can be
corrected for. I said "IVs are very susceptible 2 it, but if you
have the raw data, you can estimate this error VERY well." --
of course, i was being attacked then so ONCE AGAIN, no comment.
But there are OTHER worse probs w/ regression than MC.
in fact, MC is the lesser prob by far. Now, what do YOU THINK
is the REAL issue w/ all regression formulas (in HorseRacing)????

Jeff P 02-22-2004 06:19 PM

My opinion is that horse racing data doesn't conform very well to use with regression formulas for two reasons:

Complexity
Horse racing data is by nature very complex data. Many single factors are often intricately correlated with what, on the surface, appear to be other single factors. For example, try running some regression tests on speed figures. Now do the same thing with class. Higher class horses tend to have higher speed ratings. Before evaluating the true effects of class, wouldn't you first have to estimate and remove the effect that speed figures have on class? That's of course assuming you can find an effective way to measure its performance in the first place.

Posted by Rick yesterday in another thread:

Quote:

Well, here's the thing. To a large extent, speed=class, class=pace, and pace=speed, so when people say the most important factors are A, B, and C it really doesn't mean all that much. What really matters is finding relatively independent measures of performance.

I tend to think Rick's above statement makes a lot of sense.


Noise
Secondly, horse racing data tends to be noisy. What seems to work well during one time period often falls flat on its face when tested during a different time period.

For example, I have recently been trying to develop a play type that finds lots of plays with a very high win percentage and essentially a breakeven ROI. I'm trying to do this to be in a better position to take full advantage of rebates being offered. Okay, back to my point. I did some testing using the horse with the top BRIS Prime Power rating with a myriad of unique single factors. Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime Power horse, when drawn on the rail, wins better than 40% of its races and shows a positive ROI. But when I tested this same idea against a sample taken from last year's races the results were horrible: 27 percent winners and a minus 20 percent ROI.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?
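
A rough way to ask whether a gap like 40% versus 27% could be simple noise is a two-proportion z-test, sketched below. The post gives win rates but no sample sizes, so the counts used here are purely hypothetical, and the function name is an assumption. A z value beyond roughly 2 in either direction is the conventional threshold for calling the difference unlikely to be chance alone.

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z statistic for the difference between two win rates."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    # Pooled win rate under the assumption that both samples share one rate.
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts only: 40% of 50 plays vs 27% of 100 plays.
z_small = two_proportion_z(20, 50, 27, 100)
# The same win rates with ten times as many plays in each sample.
z_large = two_proportion_z(200, 500, 270, 1000)
```

With the small hypothetical samples the gap is within what chance could plausibly produce; with the larger ones it is not. The test cannot say whether a real difference comes from a rail bias or from something else.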

garyoz 02-22-2004 06:37 PM

Good points, Jeff P. I agree with you in terms of the difficulty in modeling. I don't think there is a very clean way around the highly correlated variable (or collinearity) problem except to try to combine them in some type of indices. But I think that would probably blunt their interpretability and precision. I never liked the concept of power figures. In terms of isolating the effects of single variables (or control variables) and then determining the "main effect" of subsequent variables, theoretically this can be accomplished through stepwise regression (at least as I remember it--I could be wrong). But if you have highly correlated variables in a stepwise regression, the first variable would be associated with most of the variance and that wouldn't leave much to associate with subsequent variables (once again, as I remember the statistics). This would probably be problematic.

I have pretty much given up on trying to use statistical models, but rather use programs to measure, display and organize handicapping variables. Then I grind out plays using pen and pencil. Not very efficient or elegant and doesn't always work.

Rick 02-22-2004 08:19 PM

Wow! See what I mean about there being some really smart guys here. I'm a little bit lost, but they're bringing up some really good points. Pay attention you mathematical geniuses. Not you Derek.

GameTheory 02-22-2004 08:55 PM

Horse racing data is also full of contradictions, which regression models don't handle well. For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...

Rick 02-23-2004 04:10 AM

GT,

It's possible to capture relationships like that, for example with an A x B variable, but you have to first guess that they exist and add them to the model. There are just too many nonlinear relationships that are possible.
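
The A x B idea can be sketched as below: the product of the two factors is added as a third column, and a negative coefficient on it lets the model score "both high" lower than either factor alone. The coefficients are invented to show the effect and are not fitted to any data; the function names are also assumptions.

```python
def with_interaction(a, b):
    """Return the feature row [A, B, A*B] for one horse."""
    return [a, b, a * b]

def score(features, coefficients):
    """Linear score: the sum of feature * coefficient."""
    return sum(x * c for x, c in zip(features, coefficients))

# Hypothetical coefficients: A and B each help alone, but the negative
# weight on A*B penalizes horses that are high on both.
coefs = [1.0, 1.0, -2.5]

high_a_only = score(with_interaction(1.0, 0.0), coefs)
high_b_only = score(with_interaction(0.0, 1.0), coefs)
high_both = score(with_interaction(1.0, 1.0), coefs)
```

Without the third column, no choice of the first two coefficients could rank `high_both` below both of the single-factor horses.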

Rick 02-23-2004 04:39 AM

The thing that confused me about Benter's reference to using a multinomial logit model was that, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable but not with more values. And I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression, or is there some difference that I'm missing?
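
The trick of treating 2nd and 3rd place finishers as "winners" of the remaining field can be sketched as building extra choice sets from one finish order. This is only an illustration of the bookkeeping, with a made-up function name and a made-up five-horse result; it is not taken from Benter's paper. In this framing the outcome of each choice set is which horse was chosen out of the horses in it, which is where the "more than two values" would come from.

```python
def exploded_choice_sets(finish_order, depth=2):
    """Build (chosen, alternatives) pairs from one race's finish order.

    finish_order lists horse ids from first to last. Each extra level
    treats the next finisher as the "winner" among the horses behind it.
    """
    sets = []
    # A choice set needs at least two horses, so stop one short of the end.
    levels = min(depth, len(finish_order) - 1)
    for k in range(levels):
        sets.append((finish_order[k], finish_order[k:]))
    return sets

# Hypothetical five-horse finish order, first to last.
order = ["C", "A", "E", "B", "D"]
sets = exploded_choice_sets(order, depth=3)
```

Here `sets` holds three choice sets: C chosen from all five, A chosen from the four behind C, and E chosen from the three behind A. As for the last question, "logit regression" and "logistic regression" are generally used as two names for the same thing.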

Red Knave 02-23-2004 08:21 AM

I wish I understood more of this stuff
 
Quote:

Originally posted by GameTheory
... But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...

Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?

Jeff P 02-23-2004 12:57 PM

I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that, isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?


All times are GMT -4. The time now is 04:21 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2024, vBulletin Solutions, Inc.
Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved
