|
|
02-21-2004, 02:18 PM
|
#1
|
Registered User
Join Date: Feb 2004
Location: Paragould, Arkansas
Posts: 198
|
peter wagner, benter and multinomial logit
Does anyone know anything about the North Dakota high roller, Peter Wagner? Do Bill Benter and Peter Wagner both use multinomial logit to predict horse race probabilities? If someone is well schooled in multinomial logit, please comment. I have a model that has 47 factors, but I am stuck getting the probabilities.
I am very interested in hearing how some of you have arrived at probabilities for a model.
|
|
|
02-21-2004, 02:24 PM
|
#2
|
Registered User
Join Date: Dec 2001
Posts: 6,128
|
Logit works okay, as do a number of other methods to make probabilities. The effectiveness of your factors themselves, and how you relate them to the public odds/probabilities are far more important questions.
What are you doing now, and in what way are you stuck?
|
|
|
02-21-2004, 02:29 PM
|
#3
|
Registered User
Join Date: Feb 2004
Location: Paragould, Arkansas
Posts: 198
|
Well, I have the coefficients via maximum likelihood. Is my next step to take the exponential of the sum of all the factors times their coefficients for each horse, and then divide each horse's exponential by the sum of all the horses' exponentials?
|
|
|
02-21-2004, 03:12 PM
|
#4
|
Registered User
Join Date: Dec 2001
Posts: 6,128
|
Maximum Likelihood is a general term. Logistic regression is one form of maximum likelihood estimation that uses a dichotomous dependent variable (which means either 1 or 0 -- usually won or didn't win in horseracing). Just look up "logistic regression" in Google and you'll find plenty to read. Of course you'll need some software to do the calculations...
|
|
|
02-22-2004, 06:45 AM
|
#5
|
Registered User
Join Date: Feb 2004
Location: Paragould, Arkansas
Posts: 198
|
Game, thanks for the reply.
I have been wrong many times, but from everything I have read about multinomial logit, the dependent variable should not be binary. From what I have read, the reason Chapman, Benter (if he uses multinomial logit), and others use it is that it accounts for in-race competition: how horse A competed against all the others in the race.
I think that once you have the coefficients for each factor, you then have a linear sum of a horse's attributes which is the vector.
If I am right (which I might not be), you then arrive at the probabilities by taking the exponential of the horse's vector divided by the sum of the exponentials of all horses' vectors in the race.
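A minimal sketch of that calculation, assuming one list of factor values per horse and one shared list of fitted coefficients. The function name, the two-factor race, and the coefficient values are made up for illustration and are not from anyone's actual model.

```python
import math

def race_probabilities(factors_by_horse, coefficients):
    """Turn each horse's factor values into a win probability for one race."""
    # Linear score per horse: sum of (coefficient * factor value).
    scores = [sum(b * x for b, x in zip(coefficients, factors))
              for factors in factors_by_horse]
    # Subtracting the largest score avoids overflow; the ratios are unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Each horse's share of the total is its estimated win probability.
    return [e / total for e in exps]

# Hypothetical three-horse race with two factors per horse.
coefficients = [0.8, -0.3]
race = [[1.2, 0.5], [0.4, 1.0], [0.9, 0.2]]
probs = race_probabilities(race, coefficients)
print(probs)
```

The probabilities always sum to 1 within a race, which is how this form builds in the competition between horses: a horse's probability falls when a stronger rival is added to the field.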
Thanks for your help, Game.
|
|
|
02-22-2004, 11:31 AM
|
#6
|
Registered User
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
|
Logistic regression or probit analysis are the proper ways to express the dependent variable (an S-shaped probability density function). However, you still have the problem of multicollinearity in most horse racing variables. Predictor variables are highly correlated (e.g., back class and back speed, or speed and class). Multicollinearity violates the assumption of randomness in the error terms that is required by regression analysis. Thus, I am skeptical that regression analysis could work well using standard canned regression programs (e.g., SPSS). I do not know the multinomial logit methodology, but it likely solves this problem. The multicollinearity problem (plus other statistical issues) is why I am skeptical about a program such as Allways and its claim that it does multiple regression analysis.
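One rough way to see how bad the overlap between two predictors is: compute their correlation and the variance inflation factor (VIF), which for a model with exactly two predictors is 1 / (1 - r^2). Whatever its formal status among the regression assumptions, the practical symptom usually cited for multicollinearity is unstable coefficient estimates, and a large VIF is the standard warning sign. The sketch below assumes made-up speed and class ratings; the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """VIF when the model has exactly these two predictors: 1 / (1 - r^2).

    Undefined (division by zero) if the two predictors are perfectly correlated.
    """
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Made-up ratings for six horses, for illustration only.
speed = [82, 75, 90, 68, 88, 79]
klass = [80, 70, 92, 65, 85, 81]

r = pearson(speed, klass)
vif = vif_two_predictors(speed, klass)
print(round(r, 3), round(vif, 1))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that the two variables are carrying mostly the same information.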
|
|
|
02-22-2004, 04:51 PM
|
#7
|
Veteran
Join Date: Mar 2002
Location: new york city
Posts: 1,424
|
that pesky multi-coll****
yesterday i typed that this problem, which is VERY real, can be
corrected for. I said "IVs are very susceptible 2 it, but if you
have the raw data, you can estimate this error VERY well." --
of course, i was being attacked then so ONCE AGAIN, no comment.
But there are OTHER worse probs w/ regression than MC.
in fact, MC is the lesser prob by far. Now, what do YOU THINK
is the REAL issue w/ all regression formulas (in HorseRacing)????
|
|
|
02-22-2004, 06:19 PM
|
#8
|
Registered User
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,290
|
My opinion is that horse racing data doesn't conform very well to use with regression formulas for two reasons:
Complexity
Horse racing data is by nature very complex data. Many single factors are often intricately co-related with what, on the surface, appear to be other single factors. For example- Try running some regression tests on speed figures. Now do the same thing with class. Higher class horses tend to have higher speed ratings. Before evaluating the true effects of class, wouldn't you first have to estimate and remove the effect that speed figures have on class? That's of course assuming you can find an effective way to measure its performance in the first place.
Posted by Rick yesterday in another thread:
Quote:
Well, here's the thing. To a large extent, speed=class, class=pace, and pace=speed, so when people say the most important factors are A, B, and C it really doesn't mean all that much. What really matters is finding relatively independent measures of performance.
|
I tend to think Rick's above statement makes a lot of sense.
Noise
Secondly, horse racing data tends to be noisy. What seems to work well during one time period often falls flat on its face when tested during a different time period.
For example, I have recently been trying to develop a play type that finds lots of plays with a very high win percentage and essentially a breakeven ROI. I'm trying to do this to be in a better position to take full advantage of rebates being offered. Okay- back to my point. I did some testing using the horse with the top Bris prime power rating with a myriad of unique single factors. Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime power horse, when drawn on the rail, wins better than 40% of its races and shows a positive roi. But when I tested this same idea against a sample taken from last year's races the results were horrible: 27 percent winners and a minus 20 percent roi.
Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?
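For anyone who wants to run the same kind of check on two separate samples, here is a minimal sketch of the win percentage and flat-bet ROI arithmetic. It assumes a flat $2 win bet on every play and a list of payoffs (0 for a loser); the function name and the five payoffs are invented for illustration and are not the results described above.

```python
def win_rate_and_roi(payoffs, stake=2.0):
    """Return (win rate, ROI) for flat win bets.

    payoffs: amount returned by each bet, 0 for a losing bet.
    ROI is (total returned - total staked) / total staked.
    """
    bets = len(payoffs)
    wins = sum(1 for p in payoffs if p > 0)
    staked = stake * bets
    returned = sum(payoffs)
    return wins / bets, (returned - staked) / staked

# Hypothetical sample: five $2 bets, two winners.
sample = [5.2, 0.0, 0.0, 4.8, 0.0]
win_rate, roi = win_rate_and_roi(sample)
print(win_rate, roi)
```

Running the same function on this year's plays and on last year's plays separately is the simplest way to see whether an angle holds up outside the sample it was found in.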
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com
|
|
|
02-22-2004, 06:37 PM
|
#9
|
Registered User
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
|
Good points, Jeff P. I agree with you in terms of the difficulty in modeling. I don't think there is a very clean way around the highly correlated variable (or collinearity) problem except to try to combine them in some type of indices. But I think that would probably blunt their interpretability and precision. I never liked the concept of power figures. In terms of isolating the effects of single variables (or control variables) and then determining the "main effect" of subsequent variables, theoretically this can be accomplished through stepwise regression (at least as I remember it; I could be wrong). But if you have highly correlated variables in a stepwise regression, the first variable would be associated with most of the variance, and that wouldn't leave much to associate with subsequent variables (once again, as I remember the statistics). This would probably be problematic.
I have pretty much given up on trying to use statistical models, and instead use programs to measure, display and organize handicapping variables. Then I grind out plays using pen and pencil. Not very efficient or elegant, and it doesn't always work.
|
|
|
02-22-2004, 08:19 PM
|
#10
|
Registered User
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
|
Wow! See what I mean about there being some really smart guys here. I'm a little bit lost, but they're bringing up some really good points. Pay attention you mathematical geniuses. Not you Derek.
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
|
|
|
02-22-2004, 08:55 PM
|
#11
|
Registered User
Join Date: Dec 2001
Posts: 6,128
|
Horse racing data is also full of contradictions, which regression models don't handle well. For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...
|
|
|
02-23-2004, 04:10 AM
|
#12
|
Registered User
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
|
GT,
It's possible to capture relationships like that, for example with an A x B variable, but you have to first guess that they exist and add them to the model. There are just too many nonlinear relationships that are possible.
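A small sketch of what an A x B variable does, assuming A and B are coded 0 (low) or 1 (high). The coefficients are invented so that each factor helps on its own while the product term pulls the score down when both are high; nothing here is fitted to real data.

```python
def with_interaction(a, b):
    """Feature row with the two original factors plus their product."""
    return [a, b, a * b]

# Hypothetical coefficients: A and B each help, the A x B term hurts.
coefs = [1.0, 1.0, -2.5]

def score(a, b):
    """Linear score using the two factors and the interaction term."""
    return sum(c * x for c, x in zip(coefs, with_interaction(a, b)))

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, score(a, b))
```

Without the third column, no choice of coefficients on A and B alone can score (1, 0) and (0, 1) above both (0, 0) and (1, 1), which is the pattern GT describes.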
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
|
|
|
02-23-2004, 04:39 AM
|
#13
|
Registered User
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
|
The thing that confused me about Benter's reference to using a multinomial logit model was that, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable but not with more values. And I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression, or is there some difference that I'm missing?
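One way to read the "multinomial" part: in the conditional (multinomial) logit setup the dependent variable is which horse in the field won, so each race is a single observation with as many possible outcomes as there are starters, rather than a separate 0/1 value per horse. Under that reading, the data-expansion trick turns one finishing order into several such observations. The sketch below is only an illustration of that idea, not Benter's code; the function name and the depth parameter are hypothetical.

```python
def explode_race(finish_order, depth=2):
    """Expand one finishing order into (winner, field) observations.

    finish_order: horses listed from first place to last.
    depth: how many finishing positions to treat as "winners".
    Each observation pairs the best remaining horse with the horses
    it beat plus itself; a field of one horse is never emitted.
    """
    observations = []
    remaining = list(finish_order)
    for _ in range(min(depth, len(remaining) - 1)):
        observations.append((remaining[0], list(remaining)))
        remaining = remaining[1:]  # drop the "winner" and repeat on the rest
    return observations

# Hypothetical four-horse race, A finished first and D last.
for winner, field in explode_race(["A", "B", "C", "D"], depth=2):
    print(winner, field)
```

With depth=2 the race above yields two observations: A chosen from A, B, C, D, and B chosen from B, C, D. The number of possible outcomes per observation shrinks by one each time, while the count of observations doubles.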
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
|
|
|
02-23-2004, 08:21 AM
|
#14
|
dGnr8
Join Date: Aug 2003
Location: Niagara, Ontario
Posts: 3,023
|
I wish I understood more of this stuff
Quote:
Originally posted by GameTheory
... But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...
|
Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?
__________________
.
The great menace to progress is not ignorance but the illusion of knowledge - Daniel J. Boorstin
The takers get the honey, the givers sing the blues - Robin Trower, Too Rolling Stoned - 1974
|
|
|
02-23-2004, 12:57 PM
|
#15
|
Registered User
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,290
|
I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.
The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.
Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that, isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.
How valid can information obtained in this manner actually be?
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com
|
|
|