Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 02-23-2004, 03:27 PM   #16
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Re: I wish I understood more of this stuff

Quote:
Originally posted by Red Knave
Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?
Well, yes, although I haven't used neural networks much. I do prefer "non-parametric" methods (like NNs) that let the data "speak for itself" without imposing assumptions on it the way most statistical techniques do, and that are able to discover more subtle relationships in the data. However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems...
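
As a rough illustration of the parametric vs. non-parametric distinction GameTheory is drawing, here is a minimal sketch; the data, the "factor", and the model choices are invented for the example and are not anything from this thread.

Code:
# Illustrative only: a parametric model imposes a fixed (here linear) form,
# while a non-parametric model such as k-nearest neighbors lets the data
# determine the shape of the fit. The "factor" below is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(500, 1))                 # hypothetical handicapping factor
y = np.sin(6 * x[:, 0]) + rng.normal(0, 0.2, 500)    # non-linear signal plus noise

linear = LinearRegression().fit(x, y)                # parametric: assumes a straight line
knn = KNeighborsRegressor(n_neighbors=25).fit(x, y)  # non-parametric: local averaging

grid = np.linspace(0, 1, 5).reshape(-1, 1)
print("linear fit:", linear.predict(grid).round(2))
print("k-NN fit:  ", knn.predict(grid).round(2))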
Old 02-23-2004, 03:33 PM   #17
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally posted by Jeff P
I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that, isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?
I more or less agree, but I think some real tests need to be done.

The technique might have some usefulness. For instance, if you have only a very small sample of races (say from a particular track) it might improve your predictions.

Also, it depends on the factors you are extracting from the data. General factors like speed will hold up, but certain pace factors will fall apart as you described. Or will they? Throwing the winner out may just bring an almost equally likely but different pace scenario to the fore.

I've learned that most ideas in horseracing can't be defeated with logic, only with experimentation. Most of my best discoveries came from taking something I thought would work well that wasn't working well and doing the exact opposite. (I love it when I describe to somebody something I'm doing that works great, and they explain to me why it can't work.)
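
To make the replication idea being debated above concrete, here is a small sketch of how the expansion Jeff P describes might look; the column names and rows are hypothetical, not anyone's actual database.

Code:
# Illustrative sketch of the replication trick discussed above: each race is
# duplicated with the winner removed and the runner-up relabeled as the "winner".
# The layout (race_id, horse, finish) is hypothetical.
import pandas as pd

results = pd.DataFrame({
    "race_id": [1, 1, 1, 1, 2, 2, 2],
    "horse":   ["A", "B", "C", "D", "E", "F", "G"],
    "finish":  [1, 2, 3, 4, 1, 2, 3],
})

expanded = []
for depth in (1, 2):   # depth 1 = actual winner, depth 2 = runner-up treated as "winner"
    sub = results[results["finish"] >= depth].copy()
    sub["win"] = (sub["finish"] == depth).astype(int)
    sub["pseudo_race"] = sub["race_id"].astype(str) + "_" + str(depth)
    expanded.append(sub)

expanded = pd.concat(expanded, ignore_index=True)
print(expanded)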
Old 02-23-2004, 03:34 PM   #18
Rick
Registered User
 
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
Jeff,

Well, I agree with you, but Benter says it works in Hong Kong racing. Of course there are some huge differences between there and here. I've only tried it using just the winner, and about all I can say is that it works better than linear regression. But there's something that prevents any of these techniques from working really well. I guess the term would be lack of "robustness". Outliers affect all of these techniques a lot, and there seem to be a lot of outliers in horse racing data.
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
Old 02-23-2004, 04:07 PM   #19
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally posted by Rick
The thing that confused me about Benter's reference to using a multinomial logit model was that, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable but not with more values. And, I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishes as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression, or is there some difference that I'm missing?
I don't remember Benter talking about multinomial, but Bolton and Chapman's original paper had "multinomial" in the title -- maybe they were predicting finish position instead of just won/lost? I don't feel like digging out the paper. The data replication trick was also Bolton & Chapman's.

Logit / Logistic are the same thing.
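
For readers who have not seen the Bolton & Chapman / Benter setup, a minimal sketch of a conditional (multinomial) logit for a single race follows; the factor values and coefficients are invented purely for illustration.

Code:
# Conditional-logit sketch for one race: each horse gets a linear score from its
# factors, and win probabilities are the softmax of those scores across the field.
# Factor values and coefficients are made up.
import numpy as np

factors = np.array([      # rows = horses, columns = hypothetical factors
    [1.10, 0.3],
    [1.05, 0.5],
    [0.90, 0.8],
])
beta = np.array([2.0, -0.5])   # hypothetical fitted coefficients

score = factors @ beta
p_win = np.exp(score) / np.exp(score).sum()   # probabilities sum to 1 across the field
print(p_win.round(3))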
Old 02-23-2004, 04:18 PM   #20
Rick
Registered User
 
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
Old 02-23-2004, 04:21 PM   #21
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally posted by Rick
After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.
Really? If you have any links to stuff that explains the difference I'd be interested. I've always seen them used interchangeably. (e.g. a "logit model" is something derived via "logistic regression") You're not thinking of logit & probit, are you?
Old 02-23-2004, 07:06 PM   #22
MichaelNunamaker
Registered User
 
Join Date: Feb 2004
Location: Windermere, FL
Posts: 151
Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?

You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?
Old 02-23-2004, 07:12 PM   #23
Rick
Registered User
 
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
GT,

Here's a link to something that seemed to me to be saying that:

http://www2.chass.ncsu.edu/garson/pa765/logistic.htm

See what you think. I've also seen people use the terms interchangeably, so I'm kind of confused about the whole thing. I understand the difference between probit and logit though.

On the other hand, here's something that implies that they're the same:

http://support.sas.com/faq/014/FAQ01494.html
__________________
"I might not give the answer that you want me to" - Fleetwood Mac

Last edited by Rick; 02-23-2004 at 07:19 PM.
Old 02-23-2004, 08:02 PM   #24
garyoz
Registered User
 
Join Date: Nov 2003
Location: Ohio
Posts: 1,307
I don't think that there is a major issue in the difference between probit and logit regression, at least the way I was taught it many years ago at the Univ. of Wisconsin-Madison, by Draper. He was one of the heavy hitters in Applied Regression Analysis. The issue is that you can't regress directly against a dichotomous dependent variable (0,1 or win, lose). It violates error term distribution assumptions. So, you use an S-shaped distribution curve, such as the cumulative normal or the logistic, to map the model onto a probability. Thus, you can predict a probability of the occurrence with the regression model. Actually, this is a simple and intuitive approach. My Ph.D. dissertation many years ago used logistic regression and I had to write a chapter on methods, which is why I remember this.

My take on this thread is that we have been discussing a new approach which simply models the dependent variable through a logistic (or it could be probit) link, some type of S-shaped curve. I think most of the questions and uncertainty are on the independent variable side of the equation. With the one exception of the use of two dependent variables, which I'm absolutely clueless about.

If any of the above is incorrect please correct me. This is the best of my rusty and outdated knowledge.
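
As a concrete footnote to garyoz's description, the sketch below compares the two S-shaped curves in question; the scores are arbitrary and the 1.702 scaling is the usual logit/probit approximation constant.

Code:
# The logit and probit models differ only in which S-shaped curve (CDF) maps a
# linear score to a probability: logistic CDF vs. cumulative normal.
import numpy as np
from scipy.stats import logistic, norm

scores = np.linspace(-3, 3, 7)
p_logit = logistic.cdf(scores)          # logistic CDF (logit link)
p_probit = norm.cdf(scores / 1.702)     # normal CDF, rescaled to line up with the logistic
print(np.round(p_logit, 3))
print(np.round(p_probit, 3))            # nearly identical values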
Old 02-23-2004, 08:22 PM   #25
Rick
Registered User
 
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
garyoz,

Yeah, I think you should come up with essentially the same results using either. Draper? Didn't he write a famous statistical book a loooong time ago? I see I'm not the only one here who's been around the block a few times.
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
Old 02-24-2004, 01:21 AM   #26
ranchwest
Registered User
 
Join Date: Oct 2001
Location: near Lone Star Park
Posts: 5,147
Quote:
Originally posted by Jeff P
[snip]
Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime Power horse, when drawn on the rail, wins better than 40% of its races and shows a positive ROI. But when I tested this same idea against a sample taken from last year's races, the results were horrible: 27 percent winners and a minus 20 percent ROI.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?
Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?
__________________
Ranch West
Equine Performance Analyst, Quick Grid Software
Old 02-24-2004, 06:09 AM   #27
arkansasman
Registered User
 
Join Date: Feb 2004
Location: Paragould, Arkansas
Posts: 198
multinomial logit

Thanks for all the posts. This is very interesting.

Here is what I know regarding Multinomial Logit.

All the big computer betting syndicates use Multinomial Logit or something similar. Bill Benter uses Multinomial Probit, as he says it gives him a 5 to 10 percent increase in profits. He used Multinomial Logit before he went to Multinomial Probit. Two factors that Benter uses are Average Normalized Finishing Position and Recency Weighted Finishing Position. In Hong Kong, he also uses ranking factors. One could not do this in the United States because of the number of horses running.

If anyone wants these formulas, email me and I will give them to you. Benter says that if you want to start a computer model for horse racing, these are two factors you would start with.

I have to go to work, I will post some more later.
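
The actual formulas are only offered by email above, so the sketch below is a purely hypothetical reading of the two factor names, not arkansasman's or Benter's formulas: finish position scaled by field size, and a decay-weighted average of those values.

Code:
# Hypothetical illustration only -- NOT the formulas offered above by email.
# One plausible reading of the factor names: normalize finish position by field
# size, then average, optionally weighting recent races more heavily.
import numpy as np

finishes   = np.array([3, 1, 5, 2])    # made-up past line, most recent race last
field_size = np.array([8, 10, 12, 9])

normalized = finishes / field_size                  # small = near the front of the field
avg_norm_fp = normalized.mean()

weights = 0.8 ** np.arange(len(finishes))[::-1]     # hypothetical decay: recent races count more
recency_weighted_fp = np.average(normalized, weights=weights)

print(round(avg_norm_fp, 3), round(recency_weighted_fp, 3))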
Old 02-24-2004, 03:44 PM   #28
Jeff P
Registered User
 
Jeff P's Avatar
 
Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,257
Originally posted by Ranchwest:
Quote:
Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?
Ranchwest,

The answer to all of your questions is: No.

Unless somebody on this board is privy to the Bris algorithm for Prime Power and wants to share, I'm in the dark as to how it is calculated. I've contacted Bris a handful of times inquiring about how they calculate it. Their answer has always been "It's proprietary. See our webpage for further explanation." Very helpful, no?

Rail position, although it seems to be a simple, isolated factor, might not be, as you suggest. Before using it as such, further testing appears to be in order.
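
One way to approach the "was it simple noise?" question from the quoted post is a two-proportion significance check; the sample sizes below are invented placeholders, since the actual counts aren't given in the thread.

Code:
# Rough noise check for a win-rate difference (40% this year vs. 27% last year).
# The sample sizes are hypothetical -- plug in the real counts from the database.
import numpy as np
from scipy.stats import norm

wins = np.array([40, 54])      # hypothetical: 40/100 this year, 54/200 last year
plays = np.array([100, 200])

p1, p2 = wins / plays
p_pool = wins.sum() / plays.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / plays[0] + 1 / plays[1]))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value
print(round(z, 2), round(p_value, 3))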
__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com
Old 02-24-2004, 04:16 PM   #29
Rick
Registered User
 
Join Date: Feb 2002
Location: Fallon, NV
Posts: 1,571
Jeff,

Several years ago I did a study of the Prime Power rating. I used Race Rating, Class Rating, Speed Rating, and Odds (Log(Odds+1)) from the last two races to predict Prime Power and got a very good fit (very high r-squared). In fact, I think using just those things for the last race did pretty well. I'm sure there are a lot more factors that they use, but it seems to be heavily weighted toward recent performances.
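
A sketch of the kind of fit Rick describes follows; the column names, values, and the tiny sample are all made up, and this is not the Bris formula.

Code:
# Sketch only: regress Prime Power on a few recent-race ratings. The data frame
# here is fabricated for illustration; a real study would use thousands of rows.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "race_rating_last":  [108, 102, 115, 99, 110, 105],
    "class_rating_last": [112, 104, 118, 101, 109, 107],
    "speed_rating_last": [95, 88, 101, 84, 97, 92],
    "log_odds_last":     np.log(np.array([3.5, 8.0, 2.1, 12.0, 4.5, 6.0]) + 1),
    "prime_power":       [122.4, 114.8, 129.9, 110.2, 123.7, 118.5],
})

X = df.drop(columns="prime_power")
y = df["prime_power"]
fit = LinearRegression().fit(X, y)
print("R^2 on this toy sample:", round(fit.score(X, y), 3))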
__________________
"I might not give the answer that you want me to" - Fleetwood Mac
Old 02-24-2004, 06:13 PM   #30
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally posted by MichaelNunamaker
Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?
Not really. I haven't made enough money yet to give away my hard-earned secrets.

But what are the problems? A couple of very broad categories:

1) Methodology

This applies to any automatic modelling technique, parametric (any kind of regression) or non-parametric (neural networks, genetic algorithms, etc.). In either case you've got some historical data that you want to use to create a model that will predict future races. But just exactly how are you going to represent your data (code it numerically) to be presented to your modelling algorithm? And just what exactly are you going to model? (i.e. What are you going to predict?) Will it be probabilities, or finish position, or projected times, or beaten lengths? Or maybe you're going to do some sort of simulation? Each algorithm has its own strengths, weaknesses, and biases. You have to know what is appropriate to attempt to model, and you have to know how the data must be coded for that algorithm for it to work well. This is not a trivial question.

Many, many people conclude that such-and-such technique doesn't work for horse racing simply because they haven't prepared the data properly, or because they are trying to get it to answer the wrong question. There isn't much literature on the subject either, at least not much that is very helpful. Most published research papers on regression/prediction/classification methods have a common structure: first they introduce some new technique or a new variant on an old one; then they demonstrate how great it is with some empirical data. The problem is that most of their methodology is seriously flawed. For one thing, the internet has got all the researchers sharing the same known datasets. Which in turn encourages them to make a point of coming up with results that are better than previously published on some known dataset. (If someone publishes a paper demonstrating a technique that shows they can get the lowest known error on the "Boston housing dataset", then it makes them look good in the academic community.) The problem is they are all "tuning" their techniques to perform well on these known datasets, and a mass group overfitting is occurring. The result is that a huge amount of the published research is total hogwash and won't hold up when you try it with new data. And horse racing data is invariably a much tougher nut to crack than anything they're doing anyway. (One thing that attracted me to the challenge of horse racing to begin with is that you can't cheat or fool yourself, because we keep score with real money -- you're either making it or you're losing it.) The real progress with these technologies is happening in the financial world, which also keeps score with real money. But they tend to keep their discoveries to themselves, just as I am.

Bottom line: you're on your own to figure out some very thorny and often non-intuitive problems just to nail down what your exact methodology is going to be: data representation / modelling technique / what is being modelled. Each technique has its own set of problems, and it is probably up to you to discover those problems as well as to solve them.
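
As one small illustration of the "what exactly are you going to model?" question, the sketch below codes three different targets from the same hypothetical race data; none of this is GameTheory's actual setup.

Code:
# The same race data can be coded toward very different prediction targets.
# Columns (race_id, finish, lengths_behind) are hypothetical.
import pandas as pd

races = pd.DataFrame({
    "race_id":        [1, 1, 1, 2, 2, 2],
    "finish":         [1, 2, 3, 1, 2, 3],
    "lengths_behind": [0.0, 1.5, 6.0, 0.0, 0.5, 4.0],
})

races["target_win"]    = (races["finish"] == 1).astype(int)                 # classification: did it win?
races["target_rank"]   = races.groupby("race_id")["finish"].rank(pct=True)  # relative finish within the race
races["target_margin"] = -races["lengths_behind"]                           # regression: beaten lengths
print(races)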


2) Overfitting.

The major stumbling block with non-parametric techniques is the same thing that makes them attractive in the first place -- they make no assumptions about the data (they don't force it to fit a particular distribution). But since they don't, the model is free to fit very tightly to the data, in effect memorizing the training samples. And then we have overfitting, and the model will not perform well on new data. Overfitting has to be addressed one way or another with any non-parametric technique (also with parametric, but much less so). Again, each technique has different appropriate solutions to the problem, some better than others.
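
A small sketch of the overfitting point, using synthetic data and an off-the-shelf non-parametric learner (not anything from this thread): the unconstrained model fits the training data almost perfectly and does worse on a held-out set.

Code:
# An unconstrained non-parametric model can memorize its training data and then
# fall apart on a held-out set; constraining it trades fit for generalization.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))                # hypothetical handicapping factors
y = 0.5 * X[:, 0] + rng.normal(0, 1.0, 2000)  # weak signal, lots of noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in (None, 3):                       # unconstrained vs. restricted tree
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print("max_depth =", depth,
          " train R^2 =", round(tree.score(X_tr, y_tr), 2),
          " test R^2 =", round(tree.score(X_te, y_te), 2))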



Quote:


You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?
Well, it depends on what you're looking at, I guess. But a simple example is the "too good to be true" scenario, like a standout in speed, class, & form in a lower or mid-level claiming race. What's he doing in this race? Losing, usually. (Whereas a speed standout that has been racing at a lower level might be a decent bet.) A graded stakes winner in an allowance race? No thanks.
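
One way to look for the "A high and B high is bad" pattern is to fit an explicit interaction term; the sketch below uses synthetic data generated to contain such a pattern, purely for illustration.

Code:
# Logistic regression with an A*B interaction term. The data is synthetic and
# built to have positive main effects but a negative interaction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
A = rng.normal(size=n)
B = rng.normal(size=n)
true_logit = 0.8 * A + 0.8 * B - 1.2 * A * B - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(np.column_stack([A, B, A * B]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params.round(2))   # the A*B coefficient comes back clearly negative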

Which relates to another interesting thing I've noticed. I've made hundreds of numerical rating systems over the years, and one almost universal property that I've NEVER seen mentioned by anyone else is this: a smaller margin of advantage usually wins more often than a large one. In other words, let's say we're giving each horse a rating based on who-knows-what:


Scenario #1
-------------
Horse A: 100
Horse B: 98
Horse C: 85


Scenario #2
-------------
Horse A: 100
Horse B: 90
Horse C: 85


Does Horse A have a better chance of winning in Scenario #1 or #2? In most of my tests it would be Scenario #1, and that seems to be almost independent of what the rating is based on. Scenario #2 is usually too good to be true. That may be a function of the races I typically look at, which are usually claimers and regular non-winners allowance races. But usually there is a threshold somewhere where the positive advantage becomes a negative indicator.
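
The margin-of-advantage observation lends itself to a direct test; here is a sketch of how one might bin races by the gap between the top two ratings, with a hypothetical data layout and a tiny made-up demo matching the two scenarios above.

Code:
# Bin races by the rating gap between the top two horses and tabulate how often
# the top-rated horse wins in each bin. Layout (race_id, rating, won) is hypothetical.
import pandas as pd

def win_rate_by_margin(df, bins=(0, 3, 6, 100)):
    top2 = (df.sort_values("rating", ascending=False)
              .groupby("race_id").head(2)
              .groupby("race_id")
              .agg(margin=("rating", lambda r: r.iloc[0] - r.iloc[1]),
                   top_won=("won", "first")))
    top2["gap_bin"] = pd.cut(top2["margin"], bins)
    return top2.groupby("gap_bin", observed=True)["top_won"].agg(["mean", "count"])

demo = pd.DataFrame({           # two made-up races with made-up outcomes
    "race_id": [1, 1, 1, 2, 2, 2],
    "rating":  [100, 98, 85, 100, 90, 85],
    "won":     [1, 0, 0, 0, 1, 0],
})
print(win_rate_by_margin(demo))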