View Full Version : peter wagner, benter and multinomial logit


arkansasman
02-21-2004, 03:18 PM
Does anyone know anything about the North Dakota high roller - Peter Wagner? Do Bill Benter and Peter Wagner both use Multinomial Logit to predict horse race probabilities? If someone is well schooled in Multinomial Logit, please comment. I have a model that has 47 factors, but I am stuck on getting the probabilities.
I am very interested in hearing how some of you have arrived at probabilities for a model.

GameTheory
02-21-2004, 03:24 PM
Logit works okay, as do a number of other methods to make probabilities. The effectiveness of your factors themselves, and how you relate them to the public odds/probabilities are far more important questions.

What are you doing now, and in what way are you stuck?

arkansasman
02-21-2004, 03:29 PM
Well, I have the coefficients via maximum likelihood. Is my next step to take the exponential of the sum of all the factors times their coefficients, and then divide each horse's exponential by the sum of all the horses' exponentials?

GameTheory
02-21-2004, 04:12 PM
Maximum Likelihood is a general term. Logistic regression is one form of maximum likelihood estimation that uses a dichotomous dependent variable (which means either 1 or 0 -- usually won or didn't win in horseracing). Just look up "logistic regression" in Google and you'll find plenty to read. Of course you'll need some software to do the calculations...

arkansasman
02-22-2004, 07:45 AM
Game, thanks for the reply.

I have been wrong many times, but if everything I have read about multinomial logit is right, the dependent variable should not be binary. From what I have read, the reason that Chapman, Benter (if he uses multinomial logit), and others use multinomial logit is that it accounts for the in-race competition: how did horse A compete against all the others in the race?

I think that once you have the coefficients for each factor, you then have a linear sum of a horse's attributes, which is the vector.
If I am right (which I might not be), you then arrive at the probabilities by dividing the exponential of a horse's vector by the sum of the exponentials of all the horses' vectors in the race.

Thanks for your help Game.
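
A minimal sketch of the calculation described above, in Python. The coefficients and factor values are made-up numbers for illustration, and the function name is hypothetical; it only shows the "exponential of each horse's score divided by the sum of the exponentials in the race" step, not how the coefficients are estimated.

```python
import math

def race_probabilities(factors_by_horse, coefficients):
    """Win probabilities for one race under a conditional (multinomial) logit."""
    # Linear score for each horse: sum of factor value times its coefficient.
    scores = [sum(c * x for c, x in zip(coefficients, factors))
              for factors in factors_by_horse]
    # Subtract the largest score before exponentiating for numerical
    # stability; this does not change the resulting probabilities.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-horse race with 2 factors per horse (made-up numbers).
coefficients = [0.8, -0.3]
race = [[1.2, 0.5], [0.7, 0.1], [0.2, 0.9]]
probs = race_probabilities(race, coefficients)
```

Because the division is done within each race, the probabilities always sum to 1 for that field, which is the "in-race competition" property mentioned above.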

garyoz
02-22-2004, 12:31 PM
Logistic regression or probit analysis are the proper ways to express the dependent variable (an S-shaped probability density function). However, you still have the problem of multicollinearity in most horse racing variables. Predictor variables are highly correlated (e.g., back class and back speed, or speed and class). Multicollinearity violates the assumption of randomness in the error terms that is required by regression analysis. Thus, I am skeptical that regression analysis could work well using standard canned regression programs (e.g., SPSS). I do not know the multinomial logit methodology, but it likely solves this problem. The multicollinearity problem (plus other statistical issues) is why I am skeptical about the claim of a program such as Allways that it does multiple regression analysis.
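
One rough way to see how strongly two predictors overlap is to compute their correlation and the corresponding variance inflation factor. The sketch below assumes only two predictors, in which case the VIF reduces to 1 / (1 - r^2); the speed and class numbers are invented for illustration and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)

# Made-up speed and class ratings for five horses.
speed = [80, 85, 90, 95, 100]
klass = [70, 78, 83, 92, 96]
r = pearson(speed, klass)
vif = vif_two_predictors(speed, klass)
```

A VIF far above 1 means the coefficient on either variable is poorly pinned down, because the data cannot separate the two effects.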

Derek2U
02-22-2004, 05:51 PM
yesterday i typed that this problem, which is VERY real, can be
corrected for. I said "IVs are very susceptible 2 it, but if you
have the raw data, you can estimate this error VERY well." --
of course, i was being attacked then so ONCE AGAIN, no comment.
But there are OTHER worse probs w/ regression than MC.
in fact, MC is the lesser prob by far. Now, what do YOU THINK
is the REAL issue w/ all regression formulas (in HorseRacing)????

Jeff P
02-22-2004, 07:19 PM
My opinion is that horse racing data doesn't conform very well to use with regression formulas for two reasons:

Complexity
Horse racing data is by nature very complex data. Many single factors are often intricately co-related with what, on the surface, appear to be other single factors. For example- Try running some regression tests on speed figures. Now do the same thing with class. Higher class horses tend to have higher speed ratings. Before evaluating the true effects of class, wouldn't you first have to estimate and remove the effect that speed figures have on class? That's of course assuming you can find an effective way to measure its performance in the first place.

Posted by Rick yesterday in another thread:

Well, here's the thing. To a large extent, speed=class, class=pace, and pace=speed, so when people say the most important factors are A, B, and C it really doesn't mean all that much. What really matters is finding relatively independent measures of performance.
I tend to think Rick's above statement makes a lot of sense.


Noise
Secondly, horse racing data tends to be noisy. What seems to work well during one time period often falls flat on its face when tested during a different time period.

For example, I have recently been trying to develop a play type that finds lots of plays with a very high win percentage and essentially a breakeven ROI. I'm trying to do this to be in a better position to take full advantage of rebates being offered. Okay- back to my point. I did some testing using the horse with the top Bris prime power rating with a myriad of unique single factors. Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime power horse, when drawn on the rail, wins better than 40% of its races and shows a positive roi. But when I tested this same idea against a sample taken from last year's races the results were horrible: 27 percent winners and a minus 20 percent roi.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?
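
Whether a gap like 40% versus 27% could be plain noise depends heavily on how many races are in each sample, which the post does not give. The sketch below uses a standard two-proportion z statistic with hypothetical sample sizes (100 and 400 races per year are assumptions, not figures from the post).

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z statistic for the difference between two win rates."""
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    # Pooled win rate under the assumption that both samples share one rate.
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical sample sizes: the same 40% vs 27% split at two scales.
z_small = two_proportion_z(40, 100, 27, 100)
z_large = two_proportion_z(160, 400, 108, 400)
```

With 100 races per sample the statistic is about 1.95, right at the usual borderline; with 400 per sample it is about 3.9, which would be hard to blame on chance alone. A test like this says nothing about why the rates differ (a rail bias, for instance), only whether the gap is larger than sampling noise would normally produce.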

garyoz
02-22-2004, 07:37 PM
Good points Jeff P. I agree with you in terms of the difficulty in modeling. I don't think there is a very clean way around the highly correlated (or collinearity) variable problem except to try to combine them in some type of indices. But I think that would probably blunt their interpretability and precision. I never liked the concept of power figures. In terms of isolating the effects of single variables (or control variables) and then determining the "main effect" of subsequent variables, theoretically this can be accomplished through stepwise regression (at least as I remember it--I could be wrong). But if you have highly correlated variables in a stepwise, the first variable would be associated with most of the variance and that wouldn't leave much to associate with subsequent variables (once again, as I remember the statistics). This would probably be problematic.

I have pretty much given up on trying to use statistical models, but rather use programs to measure, display and organize handicapping variables. Then I grind out plays using pen and pencil. Not very efficient or elegant and doesn't always work.

Rick
02-22-2004, 09:19 PM
Wow! See what I mean about there being some really smart guys here. I'm a little bit lost, but they're bringing up some really good points. Pay attention you mathematical geniuses. Not you Derek.

GameTheory
02-22-2004, 09:55 PM
Horse racing data is also full of contradictions, which regression models don't handle well. For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...

Rick
02-23-2004, 05:10 AM
GT,

It's possible to capture relationships like that, for example with an A x B variable, but you have to first guess that they exist and add them to the model. There are just too many nonlinear relationships that are possible.
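
A toy sketch of the A x B idea in Python. The weights are invented purely to show that a product term lets a linear score rise with A alone and with B alone, yet fall when both are high; nothing here is fitted to real data.

```python
def score(a, b, w_a, w_b, w_ab):
    """Linear score with an explicit A x B interaction term."""
    return w_a * a + w_b * b + w_ab * a * b

# Made-up weights: each factor helps on its own, the product term hurts.
w_a, w_b, w_ab = 1.0, 1.0, -2.5

neither = score(0, 0, w_a, w_b, w_ab)
only_a = score(1, 0, w_a, w_b, w_ab)
only_b = score(0, 1, w_a, w_b, w_ab)
both = score(1, 1, w_a, w_b, w_ab)
```

Without the `w_ab` term, no choice of `w_a` and `w_b` can make `both` lower than `only_a` and `only_b` while keeping each of those above `neither`, which is why the interaction has to be added by hand.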

Rick
02-23-2004, 05:39 AM
The thing that confused me about Benter's reference to using a multinomial logit model was, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable but not with more values. And, I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishes as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression or is there some difference that I'm missing?
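
The data-expansion trick can be sketched as turning one finishing order into several smaller "races". This is only an illustration of the bookkeeping as described in this post; the function name and the `depth` parameter are hypothetical.

```python
def explode_finish_order(finish_order, depth=3):
    """Turn one finishing order into up to `depth` (winner, field) pairs.

    Each later pair treats the next finisher as the "winner" of a race
    run among only the horses that finished at or below that position.
    """
    choice_sets = []
    remaining = list(finish_order)
    # Stop while at least two horses remain, so every "race" is a contest.
    for _ in range(min(depth, len(remaining) - 1)):
        choice_sets.append((remaining[0], list(remaining)))
        remaining = remaining[1:]
    return choice_sets

# Hypothetical five-horse race, listed in order of finish.
sets = explode_finish_order(["A", "B", "C", "D", "E"], depth=3)
```

Each pair still has a single winner within its field, so the dependent variable does not gain extra values; the data set simply gains extra rows.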

Red Knave
02-23-2004, 09:21 AM
Originally posted by GameTheory
... But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships variables have to each other...

Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?

Jeff P
02-23-2004, 01:57 PM
I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that- isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?

GameTheory
02-23-2004, 04:27 PM
Originally posted by Red Knave
Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?

Well, yes, although I haven't used neural networks much. I do prefer "non-parametric" methods (like NNs) that let the "data speak for itself" without imposing assumptions on it like most statistical techniques, and are able to discover more subtle relationships in the data. However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems...

GameTheory
02-23-2004, 04:33 PM
Originally posted by Jeff P
I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that- isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?

I more or less agree, but I think some real tests need to be done.

The technique might have some usefulness. For instance, if you have only a very small sample of races (say from a particular track) it might improve your predictions.

Also, it depends on the factors you are extracting from the data. General factors like speed will hold up, but certain pace factors will fall apart as you described. Or will they? Throwing the winner out may just bring an almost equally likely but different pace scenario to the fore.

I've learned that most ideas in horseracing can't be defeated with logic, only with experimentation. Most of my best discoveries came from taking something I thought would work well that wasn't working well and doing the exact opposite. (I love it when I describe to somebody something I'm doing that works great, and they explain to me why it can't work.)

Rick
02-23-2004, 04:34 PM
Jeff,

Well, I agree with you but Benter says it works in Hong Kong racing. Of course there are some huge differences between there and here. I've only tried it using just the winner and about all I can say is that it works better than linear regression. But, there's something that prevents any of these techniques from working really well. I guess the term would be lack of "robustness". Outliers affect all of these techniques a lot, and there seem to be a lot of outliers in horse racing data.

GameTheory
02-23-2004, 05:07 PM
Originally posted by Rick
The thing that confused me about Benter's reference to using a multinomial logit model was, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable but not with more values. And, I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishes as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression or is there some difference that I'm missing?

I don't remember Benter talking about multinomial, but Bolton and Chapman's original paper had "multinomial" in the title -- maybe they were predicting finish position instead of just won/lost? I don't feel like digging out the paper. The data replication trick was also Bolton & Chapman's.

Logit / Logistic are the same thing.

Rick
02-23-2004, 05:18 PM
After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.

GameTheory
02-23-2004, 05:21 PM
Originally posted by Rick
After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.

Really? If you have any links to stuff that explains the difference I'd be interested. I've always seen them used interchangeably. (e.g. a "logit model" is something derived via "logistic regression") You're not thinking of logit & probit, are you?

MichaelNunamaker
02-23-2004, 08:06 PM
Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?

You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?

Rick
02-23-2004, 08:12 PM
GT,

Here's a link to something that seemed to me to be saying that:

http://www2.chass.ncsu.edu/garson/pa765/logistic.htm

See what you think. I've also seen people use the terms interchangeably so I'm kind of confused about the whole thing. I understand the difference between probit and logit though.

On the other hand, here's something that implies that they're the same:

http://support.sas.com/faq/014/FAQ01494.html

garyoz
02-23-2004, 09:02 PM
I don't think that there is a major issue in the difference between probit or logit regression, at least the way I was taught it many years ago at Univ. of Wisconsin-Madison, by Draper. He was one of the heavy hitters in Applied Regression Analysis. The issue is that you can't regress against a dichotomous dependent variable (0,1 or win, lose). It violates error term distribution assumptions. So, you use a distribution curve (normally S-shaped such as a normal distribution) referred to as a probability density function. Thus, you can predict a probability of the occurrence with the regression model. Actually this is a simple but intuitive approach. My Ph.D. dissertation many years ago used logistic regression and I had to write a chapter on methods, which is why I remember this.

My take on this thread is that we have been discussing a new approach which simply uses a logistic (or it could be probit) dependent variable in the form of some type of S-shaped curve. I think most of the questions and uncertainty is on the independent variable side of the equation. With the one exception of the use of two dependent variables, which I'm absolutely clueless on.

If any of the above is incorrect please correct me. This is the best of my rusty and outdated knowledge.
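
For reference, a small Python sketch of the S-shaped curve being discussed. Strictly speaking the S shape belongs to the cumulative distribution function (the density itself is bell-shaped), and "logit" is the name of the inverse of the logistic function, which is one reason the two terms get used for the same model.

```python
import math

def logistic(z):
    """S-shaped curve mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the logistic function: the log-odds of probability p."""
    return math.log(p / (1.0 - p))

p_mid = logistic(0.0)              # a score of zero maps to 0.5
round_trip = logit(logistic(1.3))  # recovers the original score
```

A probit model swaps in the normal cumulative curve for `logistic`; the two shapes are close enough that the fitted probabilities usually differ little.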

Rick
02-23-2004, 09:22 PM
garyoz,

Yeah, I think you should come up with essentially the same results using either. Draper? Didn't he write a famous statistical book a loooong time ago? I see I'm not the only one here who's been around the block a few times.

ranchwest
02-24-2004, 02:21 AM
Originally posted by Jeff P
[snip]
Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime power horse, when drawn on the rail, wins better than 40% of its races and shows a positive roi. But when I tested this same idea against a sample taken from last year's races the results were horrible: 27 percent winners and a minus 20 percent roi.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?

Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?

arkansasman
02-24-2004, 07:09 AM
Thanks for all the posts. This is very interesting.

Here is what I know to do regarding Multinomial Logit.

All the big computer betting syndicates use Multinomial Logit or something similar. Bill Benter uses Multinomial Probit as he says it gives him a 5 to 10 percent increase in profits. He used Multinomial Logit before he went to Multinomial Probit. Two factors that Benter uses are Average Normalized Finishing Position and Recency Weighted Finishing Position. In Hong Kong, he uses ranking factors also. One could not do this in the United States, because of the number of horses running.

If anyone wants these formulas, email me and I will give those to you. Benter says if you are wanting to start a computer model, these are 2 factors you would start a model with for horse racing.

I have to go to work, I will post some more later.

Jeff P
02-24-2004, 04:44 PM
Originally posted by Ranchwest:
Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?


Ranchwest,

The answer to all of your questions is: No.

Unless somebody on this board is privy to the Bris algorithm for prime power and wants to share, I'm in the dark as to how it is calculated. I've contacted Bris a handful of times inquiring how they calculate it. Their answer has always been "It's proprietary. See our webpage for further explanation." Very helpful- no?

Rail position, although it seems to be a simple isolated factor, as you suggest, might not be. Before using it as such, further testing appears to be in order.

Rick
02-24-2004, 05:16 PM
Jeff,

Several years ago I did a study of Prime Power Rating. I used Race Rating, Class Rating, Speed Rating, and Odds (Log(Odds+1)) from the last two races to predict Prime Power and got a very good fit (very high r squared). In fact, I think using just those things for the last race did pretty well. I'm sure there's a lot more factors that they use, but it seems to be heavily weighted toward recent performances.

GameTheory
02-24-2004, 07:13 PM
Originally posted by MichaelNunamaker
Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?

Not really. I haven't made enough money yet to give away my hard-earned secrets.

But what are the problems? A couple of very broad categories:

1) Methodology

This applies to any automatic modelling technique, parametric (any kind of regression) or non-parametric (neural network, genetic algorithms, etc). In either case you've got some historical data that you want to use to create a model that will predict future races. But just exactly how are you going to represent your data (code it numerically) to be presented to your modelling algorithm? And just what exactly are you going to model? (i.e. What are you going to predict?) Will it be probabilities, or finish position, or projected times, or beaten lengths? Or maybe you're going to do some sort of simulation? Each algorithm has its own strengths, weaknesses, and biases. You have to know what is appropriate to attempt to model, and you have to know how the data must be coded for that algorithm for it to work well. This is not a trivial question.

Many many people conclude that such-and-such technique doesn't work for horse racing simply because they haven't prepared the data properly, or because they are trying to get it to answer the wrong question. There isn't much literature on the subject either, at least that is very helpful. Most published research papers on regression/prediction/classification methods have a common structure: first they introduce some new technique or a new variant on an old one; then they demonstrate how great it is with some empirical data. Problem is most of their methodology is seriously flawed. For one thing, the internet has got all the researchers sharing the same known datasets. Which in turn encourages them to make a point of coming up with results that are better than previously published on some known dataset. (If someone publishes a paper demonstrating a technique that shows they can get the lowest known error on the "Boston housing dataset", then it makes them look good in the academic community.) The problem is they are all "tuning" their techniques to perform well on these known datasets and a mass group overfitting is occurring. The result is that a huge amount of the published research is total hogwash and won't hold up when you try it with new data. And horse racing data is invariably a much tougher nut to crack than anything they're doing anyway. (One thing that attracted me to the challenge of horse racing to begin with is that you can't cheat or fool yourself because we keep score with real money -- you're either making it or you're losing it.) The real progress with these technologies is happening in the financial world, which also keeps score with real money. But they tend to keep their discoveries to themselves, just as I am.

Bottom line: you're on your own to figure out some very thorny and often non-intuitive problems just to nail down what your exact methodology is going to be: data representation / modelling technique / what is being modelled. Each technique has its own set of problems, and it is probably up to you just to discover those problems as well as to solve them.


2) Overfitting.

The major stumbling block with non-parametric techniques is the same thing that makes them attractive in the first place -- they make no assumptions about the data (they don't force it to fit a particular distribution). But since they don't, the model is free to fit very tightly to the data, in effect memorizing the training samples. And then we have overfitting, and the model will not perform well on new data. Overfitting has to be addressed one way or another with any non-parametric technique (also with parametric, but much less so). Again, each technique has different appropriate solutions to the problem, some better than others.





You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?

Well, it depends what you're looking at I guess. But a simple example is the "too good to be true" scenario, like a standout in speed, class, & form in a lower or mid-level claiming race. What's he doing in this race? Losing, usually. (Whereas a speed standout that has been racing at a lower level might be a decent bet.) A graded stakes winner in an allowance race? No thanks.

Which relates to another interesting thing I've noticed. I've made hundreds of numerical rating systems over the years, and one almost universal property that I've NEVER seen mentioned by anyone else is this: a smaller margin of advantage usually wins more often than a large one. In other words let's say we're giving each horse a rating based on who-knows-what:


Scenario #1
-------------
Horse A: 100
Horse B: 98
Horse C: 85


Scenario #2
-------------
Horse A: 100
Horse B: 90
Horse C: 85


Does Horse A have a better chance of winning in Scenario #1 or #2? In most of my tests it would be Scenario #1, and that seems to be almost independent of what the rating is based on. Scenario #2 is usually too good to be true. That may be a function of the races I typically look at, which are usually claimers and regular non-winners allowance races. But usually there is a threshold somewhere where the positive advantage becomes a negative indicator.
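
One way to test a claim like this against your own data is to split races by the top-rated horse's margin over the second-rated horse and compare win rates on each side of a threshold. The sketch below uses the two scenarios above; the winners attached to them and the threshold of 5 points are invented for illustration, and the function names are hypothetical.

```python
def top_margin(ratings):
    """Gap between the best and second-best rating in a race."""
    ranked = sorted(ratings.values(), reverse=True)
    return ranked[0] - ranked[1]

def win_rate_by_margin(races, threshold):
    """Win rate of the top-rated horse for small-margin and large-margin races.

    `races` is a list of (ratings dict, winner name) pairs.
    """
    tallies = {"small": [0, 0], "large": [0, 0]}  # [wins, races]
    for ratings, winner in races:
        bucket = "small" if top_margin(ratings) < threshold else "large"
        tallies[bucket][1] += 1
        if max(ratings, key=ratings.get) == winner:
            tallies[bucket][0] += 1
    return {k: (w / n if n else None) for k, (w, n) in tallies.items()}

scenario_1 = {"A": 100, "B": 98, "C": 85}
scenario_2 = {"A": 100, "B": 90, "C": 85}
# Made-up winners, only so the tally has something to count.
rates = win_rate_by_margin([(scenario_1, "A"), (scenario_2, "B")], threshold=5)
```

Sweeping `threshold` over a large sample would show where, if anywhere, the advantage stops helping.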

Rick
02-24-2004, 07:32 PM
GT,

I agree 100% on what you've said. Another interesting thing about your "Scenario #2" is that many times neither of the top two horses will be the favorite and your ROI will be higher as a result. I think some would call that a "chaos" race. I call it a "very profitable" race.

How your selection runs relative to its odds seems to be a worthwhile thing to predict. But you also want to predict a good % of winners, not just lose by less on the average because you don't get paid for that.

Jeff P
02-24-2004, 08:02 PM
Originally posted by Game Theory-

Scenario #1
-------------
Horse A: 100
Horse B: 98
Horse C: 85

Such scenarios often turn out to be better from a betting perspective. The public always tends to overbet the more obvious situations (horse with higher apparent advantage) and tends to ignore the less obvious (horse with smaller apparent advantage.)

It is this overreaction to what a horse looks like on paper that makes this game so fascinating- and one that can be beaten in the first place.

That said, I have had success factoring things into my own models that tend to cause the public to look the other way (high morning line odds/low speed fig last race/loss in stretch last race/out of money finish last race, etc.). The result is that I tend to get a lower win percentage than most other players (12%-16%) with my own best plays- but I get a win mutuel that averages about $21.00 each time that I am right.
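
The arithmetic behind a low win rate paired with a high average mutuel can be sketched as below. It assumes the $21.00 figure is the payoff on a $2 win bet, which is the usual North American convention but is not stated in the post.

```python
def roi(win_rate, avg_mutuel, bet=2.0):
    """Return on investment per bet, as a fraction of the amount wagered."""
    return (win_rate * avg_mutuel - bet) / bet

def break_even_win_rate(avg_mutuel, bet=2.0):
    """Win rate at which the average mutuel exactly repays the bets."""
    return bet / avg_mutuel

roi_low = roi(0.12, 21.00)    # at a 12% win rate
roi_high = roi(0.16, 21.00)   # at a 16% win rate
needed = break_even_win_rate(21.00)
```

Under that assumption the break-even win rate is a little under 10%, so a 12% to 16% strike rate works out to roughly +26% to +68% per dollar wagered.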

Rick
02-24-2004, 08:18 PM
Jeff,

You could say that the public is more predictable than the horses!

arkansasman
02-24-2004, 09:31 PM
Derek,

I think that Bill Benter's betting has been confirmed by people who know him and also his rival computer teams in Hong Kong. I do not know him, but the first I ever heard of someone beating the horse races was in an Andy Beyer article that appeared in the Washington Post on December the 23rd and 24th, 1994. I found the article fascinating and was amused at how Andy Beyer wrote the article. Evidently, Beyer visited Benter in 1994 in Hong Kong.

He referred to Benter as Mr B and for whatever reason, said that Benter was from England. I guess the reason for that was misdirection, because in the article, Beyer says that Benter did not want his identity revealed.

From memory, Beyer says in the article something like "I always thought that a computer could not beat the horse races, I now know different."

I think that Andy Beyer is a person who tells it like it is and if he says that Benter was beating the races then, I believe it.

Jack

GameTheory
02-24-2004, 10:00 PM
Benter has to post picks now too?

It is possible to evaluate the worth of someone's idea without them posting picks first, you know. Do people who post winning selections automatically post good advice? One really doesn't have all that much to do with the other...

PaceAdvantage
02-24-2004, 10:16 PM
OK, any more posts that lead to severe thread drift will be deleted, just like recent ones by Derek have just been deleted. Don't even reply to MY post here, cause I'll just delete that too. If you've got a problem with what I'm doing, either start another thread, or write me privately....

Jeff P
02-24-2004, 11:21 PM
Not familiar with Hong Kong racing other than what I've read about it on this board. Curious about one thing. So I'll throw the question out there.

Why did Benter choose Hong Kong? Pool sizes over there are much larger than here. But is there something else about racing there that would make it easier to exploit than here? Field size, lack of published information/figures- dumb money in the pools- what?

GameTheory
02-24-2004, 11:43 PM
Closed circuit. There are only 1500 horses total or something. That, along with huge pools and most of the public betting with non-scientific methods (a data-based approach is antithetical to the general culture) and you've got an exploitable situation. Eventually, there were many competing computer teams, pretty much all of them created by foreigners (non-Chinese). The Hong Kong Jockey Club did not like the computer teams, and did their best to shut them out...

Tom
02-24-2004, 11:45 PM
I have a database of many factors.
What would you suggest I do to start a regression study?
Look at single factors, combinations?
I could use Excel to plot finish positions or beaten lengths of horses with certain factors by rank, look for positive correlation?
Is there a better program out there to run multiple studies?
What I do now is to query Access for things like

Running Style F or E
Quirin speed points >6
F1 velocity rank 1
EP velocity rank <3

Then figure out impact values and roi for the results.
It sounds like you guys are skinning the cat with a sharper knife.
I re-read the chapters in the back of Bill Quirin's first book, about the post position study and the formula he came up with.
My HTR data has some very predictive factors, and some factors that enhance the predictability of others (i.e., jockey change and workout rating are very potent).
I want to use my data to its fullest, and since some of these factors are not available to the general public, I think it will be worth the work.
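
A minimal sketch of the "impact values and roi" step described above, assuming the usual definition of impact value (share of winners showing the factor divided by share of starters showing it) and a flat $2 win bet on every horse that matches the query. The `Starter` record and its field names are hypothetical and only stand in for whatever the Access query returns.

```python
from dataclasses import dataclass

@dataclass
class Starter:
    has_factor: bool   # did the horse match the query (e.g. style E, speed points > 6)?
    won: bool          # did the horse win its race?
    win_payoff: float  # $2 win mutuel if it won, else 0.0 (hypothetical field)

def impact_value(starters):
    """Share of winners with the factor divided by share of starters with it."""
    n_winners = sum(s.won for s in starters)
    with_factor = [s for s in starters if s.has_factor]
    if not with_factor or n_winners == 0:
        return float("nan")
    pct_of_winners = sum(s.won for s in with_factor) / n_winners
    pct_of_starters = len(with_factor) / len(starters)
    return pct_of_winners / pct_of_starters

def flat_bet_roi(starters, stake=2.0):
    """Net return per dollar wagered when betting `stake` to win on every match."""
    bets = [s for s in starters if s.has_factor]
    if not bets:
        return float("nan")
    wagered = stake * len(bets)
    returned = sum(s.win_payoff for s in bets if s.won)
    return (returned - wagered) / wagered

# Made-up sample: two races, ten starters, four of them matching the query.
sample = (
    [Starter(True, True, 9.0)]
    + [Starter(True, False, 0.0)] * 3
    + [Starter(False, True, 5.0)]
    + [Starter(False, False, 0.0)] * 5
)
print("impact value:", round(impact_value(sample), 2))
print("flat-bet ROI:", round(flat_bet_roi(sample), 3))
```

An impact value above 1.0 means the factor wins more than its share; the ROI figure shows whether the public has already bet that edge away.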

LOU M.
02-25-2004, 12:12 AM
You've probably read these.

http://www.wired.com/wired/archive/10.03


http://www.asiaweek.com/asiaweek/te....computing.html


1200 horses racing against each other for 600 races at two tracks. Consistency would be very high.

Jeff P
02-25-2004, 01:40 AM
Originally posted by Tom-

I have a database of many factors.
What would you suggest I do to start a regression study?
Look at single factors, combinations?
I could use Excel to plot finish positions or beaten lengths of horses with certain factors by rank, look for positive correlation?
Is there a better program out there to run multiple studies?
What I do now is to query Access for things like

Running Style F or E
Quirin speed points >6
F1 velocity rank 1
EP velocity rank <3

Then figure out impact values and roi for the results.
It sounds like you guys are skinning the cat with a sharper knife.

Tom,

I've spent hundreds of, no- make that thousands of- man hours, over the last decade or so, coding out my own programs- always tinkering with my own database and the interface to it. I use Visual Basic and SQL.

One thing that I managed to do very well was create a good set of tools for myself. Take a single factor, say best figure last race. Say I want to know how that factor performs across x number of races or for x time period. Instead of writing and running a single query to test how the top rated horse in that category performed, and then writing and running a second query to see how the second ranked horse in that category performed, and then writing and running a third query to see how the third ranked horse in that category performed, etc- I instead took the time to develop and test a VB function that hits the database and returns the results of positions 1-20 by rank within that single category. So instead of 20 separate trips to the database, I click a couple of drop downs and buttons and make ONE TRIP through the database to get back information on how each of the 20 positions (by rank) performed in that category.

I also have another VB function that does the same thing by numeric ranges and difference from category leader instead of by rank. Again, the output I get is that for 20 separate numeric ranges (as opposed to one single range) for a single database trip or query.

Having stuff like that (once you've taken the time to create it) is a GREAT timesaver.
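
The "one trip through the database" idea can be sketched roughly as follows. This is not Jeff's VB code; it is an illustration in Python with the standard `sqlite3` module, and the table `factor_ranks` with its columns (`race_id`, `category`, `rank_in_race`, `won`, `win_payoff`) is a made-up schema. A single GROUP BY query returns starts, wins and flat-bet ROI for every rank position at once.

```python
import sqlite3

def performance_by_rank(conn, category, max_rank=20, stake=2.0):
    """One query: starts, wins, win % and ROI for each rank 1..max_rank."""
    sql = """
        SELECT rank_in_race,
               COUNT(*) AS starts,
               SUM(won) AS wins,
               SUM(CASE WHEN won = 1 THEN win_payoff ELSE 0 END) AS returned
        FROM factor_ranks
        WHERE category = ? AND rank_in_race <= ?
        GROUP BY rank_in_race
        ORDER BY rank_in_race
    """
    report = []
    for rank, starts, wins, returned in conn.execute(sql, (category, max_rank)):
        wagered = stake * starts
        report.append({
            "rank": rank,
            "starts": starts,
            "wins": wins,
            "win_pct": wins / starts,
            "roi": (returned - wagered) / wagered,
        })
    return report

def build_demo_db():
    """Tiny in-memory database with two made-up races, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE factor_ranks (race_id INTEGER, category TEXT, "
        "rank_in_race INTEGER, won INTEGER, win_payoff REAL)"
    )
    rows = [
        (1, "best_fig_last_race", 1, 1, 6.40),
        (1, "best_fig_last_race", 2, 0, 0.0),
        (1, "best_fig_last_race", 3, 0, 0.0),
        (2, "best_fig_last_race", 1, 0, 0.0),
        (2, "best_fig_last_race", 2, 1, 11.20),
        (2, "best_fig_last_race", 3, 0, 0.0),
    ]
    conn.executemany("INSERT INTO factor_ranks VALUES (?, ?, ?, ?, ?)", rows)
    return conn

for line in performance_by_rank(build_demo_db(), "best_fig_last_race"):
    print(line)
```

The same shape would work for numeric ranges: group on a bucketed value instead of `rank_in_race`.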


Sounds like you are an Access user. If you are so inclined you could accomplish the same thing by creating and storing queries in Access.

Personally, I never liked that approach. Even though SQL Query Analyzer (in my case) packs a strong punch in this area, I opted, instead, to write VB code to hit the database, store the output in public variables, and output those to a textbox when done. The numbers I generate are often of my own making. This approach seemed to give me better ability to tweak my own numbers as the need arose. You could also do the same thing in Access using VBA code if you are familiar with it.

Jeff P
02-25-2004, 01:49 AM
I have a database of many factors.
What would you suggest I do to start a regression study?
Look at single factors, combinations?
I could use Excel to plot finish positions or beaten lengths of horses with certain factors by rank, look for positive correlation?
Is there a better program out there to run multiple studies?
What I do now is to query Access for things like

Running Style F or E
Quirin speed points >6
F1 velocity rank 1
EP velocity rank <3

Then figure out impact values and roi for the results.


Tom,

Exactly. What you end up doing is running queries trying to spot combinations of things that work. But be prepared- very little of what you test actually does work. And a lot of what works seems contrary to what SHOULD work. Horses that look good on paper attract money at the windows. The trick, in my opinion, is to isolate things that do two things: 1. Cause bettors to run the other way when they are present, and 2. Don't adversely affect a horse's true chances of winning too much.

Jeff P
02-25-2004, 01:55 AM
Lou,

Thanks for posting the links. No. I hadn't stumbled across these yet and enjoyed the read.

ranchwest
02-25-2004, 02:07 AM
Originally posted by Jeff P
Originally posted by Ranchwest:


Ranchwest,

The answer to all of your questions is: No.

Unless somebody on this board is privy to the Bris algorithm for prime power and wants to share, I'm in the dark as to how it is calculated. I've contacted Bris a handful of times inquiring how they calculate it. Their answer has always been "It's proprietary. See our webpage for further explanation." Very helpful- no?

Rail position, although it seems to be a simple isolated factor, as you suggest, might not be. Before using it as such, further testing appears to be in order.

For all we know, the BRIS Prime Power algorithm could already have a factor for post position, so that if you're then testing for post position performance, you're actually "double dipping" to some extent.

Also, the BRIS algorithm might include calculations that are weighted within time frames, so that your own assumptions about time frames might introduce an inherent conflict.

I would anticipate that results might vary between wet and dry tracks.

The winterization of a track could alter the density of the surface, most notably on the rail. Also, some surfaces might perform differently even if not specifically treated with a substance for winterization. Temperature might actually have an effect.

At the point at which horses exit the chute, it is possible that the tractors could pile dirt in such a way as to hamper the inside posts at the point where the chute ends and the track begins. This could affect your findings.

I'm sure there could be other factors.

For instance, what is the profile of the horse that beats the Prime Power horse?

I think I'd come up with a power figure that would be at least somewhat comparable to the BRIS number to test against.

Rick
02-25-2004, 08:35 AM
I originally looked at BRIS Prime Power to see if I could find some factors contained within it that weren't obvious that I could use to develop my own method. I didn't find anything like that so I took a different approach. But, someone else may find things that I missed. There is some value in there somewhere because the top 3 ratings lose much less than average.

garyoz
02-25-2004, 09:26 AM
Tom wrote:

I have a database of many factors.
What would you suggest I do to start a regression study?
Look at single factors, combinations?
I could use Excel to plot finish positions or beaten lengths of horses with certain factors by rank, look for positive correlation?

##


Tom, check the above discussions on regression analysis. For a number of reasons this statistical approach is problematic in horse racing. In sum, unless you carefully control for multicollinearity (high correlation among the independent variables), the results are suspect, and their beta values certainly would be. Also, with a win/lose outcome you need to fit a probit or logistic curve to the dependent variable rather than a straight line. Finally, if you go on a fishing expedition, throwing independent variables into the models, you run the risk of interpretability problems. Remember that correlation is not causation. Correlation relationships will ultimately regress to the mean and would make a poor basis for wagering.

Trying to save you some time and maybe ($'s)
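
As a rough illustration of the multicollinearity warning above, here is a small sketch that screens a set of factors for highly correlated pairs before any regression is run. The factor names, the sample numbers and the 0.8 cutoff are all assumptions made for the example. A pairwise check like this is only a first screen; it will not catch a factor that is a blend of several others.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)

def collinear_pairs(factors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(factors.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Made-up values for five horses; speed and pace figures move together.
demo_factors = {
    "speed_fig": [80, 85, 90, 95, 100],
    "pace_fig": [78, 86, 89, 97, 101],
    "days_off": [30, 14, 45, 21, 60],
}
for name_a, name_b, r in collinear_pairs(demo_factors):
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

When two factors are flagged, one common choice is to keep only one of them, or combine them into a single rating, before estimating coefficients.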

MichaelNunamaker
02-26-2004, 01:30 AM
Hi Game Theory,

Thank you. That was a great and informative post! Your illustration of gap variation was great. I whipped up a new way (for me) to look at various ratings based on various gap sizes. It is too early to tell how well it will do overall, but on simple tests, it has a pretty good impact on winning percentage. Unfortunately, it doesn't appear to do anything for ROI, but I haven't thrown it into my "grinder" yet. I have great hopes that it will wind up making my odds-line significantly more accurate.

Thanks again!

entropy
03-24-2004, 05:17 PM
Why did Benter choose Hong Kong?

because i went there!!

Dave Schwartz
03-24-2004, 05:37 PM
Entropy,

And that would make you Allan?


Regards,
Dave Schwartz

entropy
03-24-2004, 05:44 PM
only 1 l. cheers!! :) ;) :)

Jaguar
03-24-2004, 06:17 PM
Game Theory, excellent post. Good insights.

All The Best,

Jaguar

arkansasman
03-24-2004, 07:21 PM
Since we are so interested in the computer teams in Asia and, for that matter, North America, why don't you enlighten us on what it has been like? Tell us about the early days and everything else you want to tell us. I will tell you, all of the people who post here find the computer teams fascinating. If it takes a novel to post it, please do it.

entropy
03-24-2004, 07:30 PM
you know the URL. and as i said only one l.

arkansasman
03-24-2004, 07:53 PM
Ok. LOL. What is your understanding of the reason why Benter left Hong Kong? It would not surprise me if he is betting somewhere else.

BillW
03-24-2004, 08:10 PM
Sorry for butting in, but he is trying to say "only one ell", i.e. Alan :cool:

arkansasman
03-24-2004, 08:12 PM
Thanks

entropy
03-24-2004, 08:18 PM
why Benter left Hong Kong?

maybe partly connected with the reason i left. don't ask me why --- do a search!!!

Derek2U
03-24-2004, 09:31 PM
Benter .... dont u guys learn anything i say ur worse than inner
city black & assorted non-white thugs who got 0 assets ...
that guy knows 0000 ... racing is NOT a math game .... accept
that .... but math can be used to turn a profit ... when at which
point .. well thats the trick ... neural nets .... lol ... maybe i should
stop saying what i do (since i get NO clear response) and stay at
this other board .... (maybe i just CANT forgive u know-it-alls for
not helping in that MAJOR board contest last year .. Boris was the
Captain LOL)

Dave Schwartz
03-24-2004, 09:56 PM
Alan,

lol - I consider myself a student of the game but never thought to Google "Benter." You suggested a search and I did one. Amazing what you find when you search for Benter. Thanks for the idea.


Welcome aboard.


Regards,
Dave Schwartz

Jaguar
03-24-2004, 10:19 PM
Derek, why would you say racing is not a math game?

In fact, that's exactly what racing is- finding patterns and betting overlays- a spot game which we analyze using mathematical tools.

Horses are athletes that demonstrate discernible patterns in their past performances, while trainers are manipulating their charges to win purses- in measurable, repeated behaviors.

We handicappers love the horses and the romance of the game, but we also know that racing is very much a math game.

All The Best,

Jaguar

entropy
03-24-2004, 10:35 PM
i certainly wasn't suggesting a search for "benter" but rather a search for myself given the final point was why i left HK ( but same is not mentioned in the article ( thankfully ) despite the fact i would have told the journalist the reason ).

in a country far, far away a couple of people troll BB's worldwide daily looking for snippets of information. likely what i post here will get reported to their "boss" who will then inform on me to others and i will get asked not to post "stuff".