Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board (http://www.paceadvantage.com/forum/index.php)
-   Handicapping Software (http://www.paceadvantage.com/forum/forumdisplay.php?f=3)
-   -   peter wagner, benter and multinomial logit (http://www.paceadvantage.com/forum/showthread.php?t=10365)

arkansasman 02-21-2004 02:18 PM

peter wagner, benter and multinomial logit
 
Does anyone know anything about the North Dakota high roller, Peter Wagner? Do Bill Benter and Peter Wagner both use Multinomial Logit to predict horse race probabilities? If someone is well schooled in Multinomial Logit, please comment. I have a model with 47 factors, but I am stuck on getting the probabilities.
I am very interested in hearing how some of you have arrived at probabilities for a model.

GameTheory 02-21-2004 02:24 PM

Logit works okay, as do a number of other methods for turning factors into probabilities. The effectiveness of the factors themselves, and how you relate them to the public odds/probabilities, are far more important questions.

What are you doing now, and in what way are you stuck?

arkansasman 02-21-2004 02:29 PM

Well, I have the coefficients via maximum likelihood. Is my next step to take the exponential of the sum of all the factors times their coefficients for each horse, and then divide each horse's exponential by the sum of the exponentials for all the horses in the race?

GameTheory 02-21-2004 03:12 PM

Maximum Likelihood is a general term. Logistic regression is one form of maximum likelihood estimation that uses a dichotomous dependent variable (which means either 1 or 0 -- usually won or didn't win in horseracing). Just look up "logistic regression" in Google and you'll find plenty to read. Of course you'll need some software to do the calculations...
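
For anyone who wants to experiment, here is a minimal sketch of that kind of binary won/didn't-win fit using scikit-learn. The factor columns and the tiny data array are invented purely for illustration, not taken from anyone's actual model.

Code:

# Logistic regression sketch: one row per starter, y = 1 if the horse won.
# The two factor columns and their values are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [speed_figure, class_rating] -- hypothetical factors
X = np.array([[95, 80],
              [88, 85],
              [102, 78],
              [90, 90],
              [85, 70],
              [99, 88]], dtype=float)
y = np.array([1, 0, 1, 0, 0, 1])  # 1 = won, 0 = didn't win

model = LogisticRegression()
model.fit(X, y)

print("coefficients:", model.coef_)                        # one weight per factor
print("win probabilities:", model.predict_proba(X)[:, 1])  # fitted P(win) per starter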

arkansasman 02-22-2004 06:45 AM

Game, thanks for the reply.

I have been wrong many times, but from everything I have read about multinomial logit, the dependent variable should not be binary. The reason that Chapman, Benter (if he uses multinomial logit), and others use multinomial logit, from what I have read, is that it accounts for the in-race competition: how did horse A compete against all the others in the race?

I think that once you have the coefficients for each factor, you then have a linear sum of a horse's attributes: the horse's factor vector times the coefficients.
If I am right (which I might not be), you then arrive at the probabilities by taking the exponential of that sum and dividing by the sum of the exponentials over all the horses' vectors in the race.
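
In code, that step looks something like the sketch below; the coefficients and factor values are invented just to show the mechanics.

Code:

# Conditional-logit probabilities for one race: exponentiate each horse's
# score (sum of factor * coefficient) and normalize over the field.
# Coefficients and factor values are invented for illustration only.
import numpy as np

beta = np.array([0.04, -0.02, 0.10])           # one coefficient per factor
factors = np.array([[100.0, 5.0, 1.0],         # horse A's factor values
                    [ 96.0, 3.0, 2.0],         # horse B
                    [ 92.0, 8.0, 0.0]])        # horse C

scores = factors @ beta                        # linear sum per horse
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over the race

print(probs, probs.sum())                      # the probabilities sum to 1.0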

Thanks for your help, Game.

garyoz 02-22-2004 11:31 AM

Logistic regression or probit analysis are the proper ways to express the dependent variable (an S-shaped probability curve). However, you still have the problem of multicollinearity in most horse racing variables. Predictor variables are highly correlated (e.g., back class and back speed, or speed and class). Multicollinearity violates the assumption of randomness in the error terms that is required by regression analysis. Thus, I am skeptical that regression analysis could work well using standard canned regression programs (e.g., SPSS). I do not know the multinomial logit methodology, but it likely addresses this problem. The multicollinearity problem (plus other statistical issues) is why I am skeptical about a program such as Allways claiming that it does multiple regression analysis.
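
For what it's worth, a quick way to see how bad the collinearity is before trusting any regression output is to look at pairwise correlations or variance inflation factors. A small sketch, with factor columns simulated (and deliberately correlated) for illustration:

Code:

# Rough multicollinearity check: pairwise correlations plus a simple VIF,
# computed by regressing each factor on the others. The factor data is simulated.
import numpy as np

rng = np.random.default_rng(0)
speed = rng.normal(90, 8, 500)
klass = 0.8 * speed + rng.normal(0, 4, 500)   # deliberately correlated with speed
pace = rng.normal(50, 5, 500)
X = np.column_stack([speed, klass, pace])

print("correlation matrix:")
print(np.corrcoef(X, rowvar=False).round(2))

def vif(X, j):
    # VIF for column j: 1 / (1 - R^2) from regressing column j on the rest
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j, name in enumerate(["speed", "class", "pace"]):
    print(name, "VIF =", round(vif(X, j), 1))   # values above roughly 5-10 signal trouble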

Derek2U 02-22-2004 04:51 PM

that pesky multi-coll****
 
yesterday i typed that this problem, which is VERY real, can be
corrected for. I said "IVs are very susceptible 2 it, but if you
have the raw data, you can estimate this error VERY well." --
of course, i was being attacked then so ONCE AGAIN, no comment.
But there are OTHER worse probs w/ regression than MC.
in fact, MC is the lesser prob by far. Now, what do YOU THINK
is the REAL issue w/ all regression formulas (in HorseRacing)????

Jeff P 02-22-2004 06:19 PM

My opinion is that horse racing data doesn't conform very well to regression formulas, for two reasons:

Complexity
Horse racing data is by nature very complex. Many single factors are intricately correlated with what, on the surface, appear to be other single factors. For example, try running some regression tests on speed figures. Now do the same thing with class. Higher class horses tend to have higher speed ratings. Before evaluating the true effect of class, wouldn't you first have to estimate and remove the effect that speed figures have on class? That is, of course, assuming you can find an effective way to measure class in the first place.
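
One way to do that "remove the effect of speed from class" step is to regress class on speed and keep the residual as a speed-adjusted class factor. A rough sketch with invented numbers; real figures would come out of a database:

Code:

# Strip the speed component out of a class rating: regress class on speed
# and keep the residuals as a "class beyond speed" factor. Numbers are invented.
import numpy as np

speed = np.array([95.0, 88.0, 102.0, 90.0, 85.0, 99.0])
klass = np.array([82.0, 80.0, 90.0, 84.0, 72.0, 88.0])

A = np.column_stack([np.ones_like(speed), speed])   # intercept + speed
coef, *_ = np.linalg.lstsq(A, klass, rcond=None)
class_adj = klass - A @ coef                        # residual class factor

print("fitted class = %.2f + %.3f * speed" % (coef[0], coef[1]))
print("speed-adjusted class:", class_adj.round(2))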

Posted by Rick yesterday in another thread:

Quote:

Well, here's the thing. To a large extent, speed=class, class=pace, and pace=speed, so when people say the most important factors are A, B, and C it really doesn't mean all that much. What really matters is finding relatively independent measures of performance.
I tend to think Rick's above statement makes a lot of sense.


Noise
Secondly, horse racing data tends to be noisy. What seems to work well during one time period often falls flat on its face when tested during a different time period.

For example, I have recently been trying to develop a play type that finds lots of plays with a very high win percentage and an essentially break-even ROI. I'm trying to do this to be in a better position to take full advantage of the rebates being offered. Okay, back to my point. I did some testing using the horse with the top Bris Prime Power rating in combination with a myriad of single factors. Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime Power horse, when drawn on the rail, wins better than 40% of its races and shows a positive ROI. But when I tested this same idea against a sample taken from last year's races, the results were horrible: 27 percent winners and a minus 20 percent ROI.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?

garyoz 02-22-2004 06:37 PM

Good points, Jeff P. I agree with you in terms of the difficulty of modeling. I don't think there is a very clean way around the highly correlated (collinear) variable problem except to try to combine the variables into some type of indices, but I think that would probably blunt their interpretability and precision. I never liked the concept of power figures. In terms of isolating the effects of single variables (or control variables) and then determining the "main effect" of subsequent variables, theoretically this can be accomplished through stepwise regression (at least as I remember it; I could be wrong). But if you have highly correlated variables in a stepwise regression, the first variable would be associated with most of the variance, and that wouldn't leave much to associate with subsequent variables (once again, as I remember the statistics). This would probably be problematic.

I have pretty much given up on trying to use statistical models; instead I use programs to measure, display, and organize handicapping variables. Then I grind out plays with pencil and paper. Not very efficient or elegant, and it doesn't always work.

Rick 02-22-2004 08:19 PM

Wow! See what I mean about there being some really smart guys here. I'm a little bit lost, but they're bringing up some really good points. Pay attention you mathematical geniuses. Not you Derek.

GameTheory 02-22-2004 08:55 PM

Horse racing data is also full of contradictions, which regression models don't handle well. For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships the variables have to each other...

Rick 02-23-2004 04:10 AM

GT,

It's possible to capture relationships like that, for example with an A x B variable, but you have to first guess that they exist and add them to the model. There are just too many nonlinear relationships that are possible.
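
Concretely, capturing that A x B case just means adding the product as an extra column before fitting. A tiny sketch with hypothetical factors and results:

Code:

# Adding an explicit A*B interaction column so the model can treat "both high"
# differently from either factor alone. The data is invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

A = np.array([1.0, 0.9, 0.2, 0.8, 0.1, 1.0])
B = np.array([0.9, 0.1, 0.8, 1.0, 0.2, 0.95])
y = np.array([0, 1, 1, 0, 0, 0])               # "both high" horses lost here

X = np.column_stack([A, B, A * B])             # the A*B column is the interaction
model = LogisticRegression().fit(X, y)
print("weights for A, B, A*B:", model.coef_.round(2))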

Rick 02-23-2004 04:39 AM

The thing that confused me about Benter's reference to using a multinomial logit model was that, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable, but not with more values. And I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishes as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression, or is there some difference that I'm missing?

Red Knave 02-23-2004 08:21 AM

I wish I understood more of this stuff
 
Quote:

Originally posted by GameTheory
... But when both A & B are high, they almost never win. Your accuracy is limited until you start to discover the relationships the variables have to each other...
Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?

Jeff P 02-23-2004 12:57 PM

I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that, isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?

GameTheory 02-23-2004 03:27 PM

Re: I wish I understood more of this stuff
 
Quote:

Originally posted by Red Knave
Is this not what neural networks are supposed to do?
Can you comment?
Anyone with any experience in this area?

Well, yes, although I haven't used neural networks much. I do prefer "non-parametric" methods (like NNs) that let the "data speak for itself" without imposing assumptions on it like most statistical techniques, and are able to discover more subtle relationships in the data. However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems...
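
Since GameTheory isn't sharing his approach, the sketch below is only a generic illustration of what a non-parametric fit looks like mechanically, with arbitrary simulated factors and an arbitrary network size; the caveats he mentions apply in full.

Code:

# Generic non-parametric sketch: a small neural net classifier on simulated
# factor data. This is NOT anyone's actual method -- just the bare mechanics.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))                  # 6 hypothetical factors
y = (X[:, 0] * X[:, 1] > 0.3).astype(int)      # some nonlinear target

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1)
net.fit(X, y)
print("training accuracy:", net.score(X, y))   # training fit only; it says nothing
                                               # about how it does on new races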

GameTheory 02-23-2004 03:33 PM

Quote:

Originally posted by Jeff P
I don't think the idea of increasing the data set by including 2nd or even 3rd place finishers as "winners" and considering only the horses below that position is a good idea. I say this purely from a logistical standpoint. Back away from statistics for a second and consider the way races are run in the first place.

The gate opens. One or more speed horses scramble for the lead. One speed horse gets the lead. The rest then take up positions behind the leader. They wait. Each makes a move at some point to challenge for the lead. Each challenge either succeeds or fails. That success or failure is only revealed to us when the first horse hits the wire. They load another field in the gate and the whole process is repeated.

Okay. Back to statistics. As soon as you remove the winner from the model and re-evaluate the race using only the horses below that, isn't your model now flawed because it is deviating from the way races are run? The winner that you just removed had some influence on the way the race was run. Probably a very strong one. Now remove the second place horse and re-evaluate using only the horses below that. Did the second place horse have an influence on the way the race was run? Again, very likely yes.

How valid can information obtained in this manner actually be?

I more or less agree, but I think some real tests need to be done.

The technique might have some usefulness. For instance, if you have only a very small sample of races (say from a particular track) it might improve your predictions.

Also, it depends on the factors you are extracting from the data. General factors like speed will hold up, but certain pace factors will fall apart as you described. Or will they? Throwing the winner out may just bring an almost equally likely but different pace scenario to the fore.

I've learned that most ideas in horseracing can't be defeated with logic, only with experimentation. Most of my best discoveries came from taking something I thought would work well that wasn't working well and doing the exact opposite. (I love it when I describe to somebody something I'm doing that works great, and they explain to me why it can't work.)

Rick 02-23-2004 03:34 PM

Jeff,

Well, I agree with you, but Benter says it works in Hong Kong racing. Of course there are some huge differences between there and here. I've only tried it using just the winner, and about all I can say is that it works better than linear regression. But there's something that prevents any of these techniques from working really well. I guess the term would be lack of "robustness". Outliers affect all of these techniques a lot, and there seem to be a lot of outliers in horse racing data.

GameTheory 02-23-2004 04:07 PM

Quote:

Originally posted by Rick
The thing that confused me about Benter's reference to using a multinomial logit model was that, according to what I've read, in a "multinomial" model you would have more than two values for the dependent variable. Now, I've used a logit model with the typical 0,1 dependent variable, but not with more values. And I'm not sure what the values would represent if I were to use more than two. Benter does refer to the interesting trick of effectively increasing the data by including 2nd or even 3rd place finishes as "winners" and considering only the horses below that position. But that wouldn't seem to create additional values for the dependent variable, only double or triple the data set. Also, can I assume that logit regression is the same as logistic regression, or is there some difference that I'm missing?
I don't remember Benter talking about multinomial, but Bolton and Chapman's original paper had "multinomial" in the title -- maybe they were predicting finish position instead of just won/lost? I don't feel like digging out the paper. The data replication trick was also Bolton & Chapman's.

Logit / Logistic are the same thing.

Rick 02-23-2004 04:18 PM

After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.

GameTheory 02-23-2004 04:21 PM

Quote:

Originally posted by Rick
After looking around a bit, it seems that logit and logistic aren't exactly the same. They use different methods to arrive at the same conclusion though so, for all practical purposes, they are equivalent.
Really? If you have any links to stuff that explains the difference I'd be interested. I've always seen them used interchangeably (e.g. a "logit model" is something derived via "logistic regression"). You're not thinking of logit & probit, are you?

MichaelNunamaker 02-23-2004 07:06 PM

Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?

You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?

Rick 02-23-2004 07:12 PM

GT,

Here's a link to something that seemed to me to be saying that:

http://www2.chass.ncsu.edu/garson/pa765/logistic.htm

See what you think. I've also seen people use the terms interchangeably, so I'm kind of confused about the whole thing. I understand the difference between probit and logit, though.

On the other hand, here's something that implies that they're the same:

http://support.sas.com/faq/014/FAQ01494.html

garyoz 02-23-2004 08:02 PM

I don't think there is a major issue in the difference between probit and logit regression, at least the way I was taught it many years ago at the Univ. of Wisconsin-Madison, by Draper. He was one of the heavy hitters in Applied Regression Analysis. The issue is that you can't regress directly against a dichotomous dependent variable (0,1 or win, lose); it violates error term distribution assumptions. So you use an S-shaped distribution curve (such as the cumulative normal) to map the prediction onto a probability. Thus, you can predict the probability of the occurrence with the regression model. Actually this is a simple but intuitive approach. My Ph.D. dissertation many years ago used logistic regression and I had to write a chapter on methods, which is why I remember this.
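
In code, the S-shaped step is just pushing the regression's linear predictor through a logistic curve or a normal CDF; a small sketch with arbitrary predictor values:

Code:

# The S-shaped link: the linear predictor is mapped into (0, 1) by either the
# logistic curve (logit) or the cumulative normal (probit).
import numpy as np
from scipy.stats import norm

z = np.linspace(-3, 3, 7)                # hypothetical linear predictor values
logit_p = 1.0 / (1.0 + np.exp(-z))       # logistic link
probit_p = norm.cdf(z)                   # probit link

for zi, lp, pp in zip(z, logit_p, probit_p):
    print(f"z={zi:+.1f}  logit={lp:.3f}  probit={pp:.3f}")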

My take on this thread is that we have been discussing a new approach which simply uses a logistic (or possibly probit) dependent variable in the form of some type of S-shaped curve. I think most of the questions and uncertainty are on the independent variable side of the equation, with the one exception of the use of two dependent variables, which I'm absolutely clueless on.

If any of the above is incorrect please correct me. This is the best of my rusty and outdated knowledge.

Rick 02-23-2004 08:22 PM

garyoz,

Yeah, I think you should come up with essentially the same results using either. Draper? Didn't he write a famous statistical book a loooong time ago? I see I'm not the only one here who's been around the block a few times.

ranchwest 02-24-2004 01:21 AM

Quote:

Originally posted by Jeff P
[snip]
Something interesting that I found was that in dirt sprints, at the tracks that I'm playing this year, the top Prime Power horse, when drawn on the rail, wins better than 40% of its races and shows a positive ROI. But when I tested this same idea against a sample taken from last year's races, the results were horrible: 27 percent winners and a minus 20 percent ROI.

Was it simple noise in my first sample? Or are other factors at work here? Perhaps there has been a rail bias at the tracks I have been playing so far this year and I'm just now becoming aware of it. How would anybody apply regression analysis to THAT?

Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?

arkansasman 02-24-2004 06:09 AM

multinomial logit
 
Thanks for all the posts. This is very interesting.

Here is what I know to do regarding Multinomial Logit.

All the big computer betting syndicates use Multinomial Logit or something similar. Bill Benter uses Multinomial Probit, as he says it gives him a 5 to 10 percent increase in profits; he used Multinomial Logit before he went to Multinomial Probit. Two factors that Benter uses are Average Normalized Finishing Position and Recency Weighted Finishing Position. In Hong Kong, he also uses ranking factors. One could not do this in the United States because of the number of horses running.

If anyone wants these formulas, email me and I will give them to you. Benter says that if you want to start a computer model for horse racing, these are two factors to start with.
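
To give a rough idea of the general shape of such factors, here is an illustration only: a guess at the idea of a normalized and a recency-weighted finishing position, NOT Benter's actual formulas.

Code:

# Illustrative only: one plausible way to build an "average normalized finishing
# position" and a recency-weighted version. These are guesses at the general
# idea, not Benter's actual formulas.
import numpy as np

# most recent race first: (finish position, field size) -- invented past lines
past = [(1, 8), (4, 10), (2, 6), (7, 12)]

norm_fp = np.array([(fs - pos) / (fs - 1) for pos, fs in past])  # 1.0 = won, 0.0 = last

avg_norm_fp = norm_fp.mean()

decay = 0.8                                  # arbitrary recency weight
w = decay ** np.arange(len(norm_fp))         # most recent race weighted highest
recency_weighted_fp = (w * norm_fp).sum() / w.sum()

print(round(avg_norm_fp, 3), round(recency_weighted_fp, 3))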

I have to go to work, I will post some more later.

Jeff P 02-24-2004 03:44 PM

Originally posted by Ranchwest:
Quote:

Are you certain of the scientific consistency of the formulation of the Prime Power numbers you are using?

Also, are you using only fast tracks?

Are you comparing the same time of year? Comparing a summer surface to a winterized track could make your results invalid.

Have you done any comparisons concerning races coming out of chutes and races not coming out of chutes?
Ranchwest,

The answer to all of your questions is: No.

Unless somebody on this board is privy to the Bris algorithm for Prime Power and wants to share, I'm in the dark as to how it is calculated. I've contacted Bris a handful of times inquiring how they calculate it. Their answer has always been "It's proprietary. See our webpage for further explanation." Very helpful, no?

Rail position, although it seems to be a simple isolated factor, might not be, as you suggest. Before using it as such, further testing appears to be in order.

Rick 02-24-2004 04:16 PM

Jeff,

Several years ago I did a study of the Prime Power rating. I used Race Rating, Class Rating, Speed Rating, and Odds (Log(Odds+1)) from the last two races to predict Prime Power and got a very good fit (very high r-squared). In fact, I think using just those things from the last race did pretty well. I'm sure there are a lot more factors that they use, but it seems to be heavily weighted toward recent performances.
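
The kind of fit described above is a plain least-squares regression of the composite rating on those inputs. A sketch of the mechanics, with simulated numbers standing in for real past-performance data:

Code:

# Sketch of regressing a composite rating (e.g. a Prime Power-style number) on
# a few inputs to see how much of it they explain. All data here is simulated.
import numpy as np

rng = np.random.default_rng(2)
race_rating = rng.normal(100, 5, 300)
class_rating = rng.normal(100, 5, 300)
speed_rating = rng.normal(90, 8, 300)
log_odds = np.log(rng.uniform(1, 30, 300) + 1)
rating = (0.3 * race_rating + 0.3 * class_rating
          + 0.35 * speed_rating - 2.0 * log_odds + rng.normal(0, 1, 300))

A = np.column_stack([np.ones(300), race_rating, class_rating, speed_rating, log_odds])
coef, *_ = np.linalg.lstsq(A, rating, rcond=None)

pred = A @ coef
ss_res = ((rating - pred) ** 2).sum()
ss_tot = ((rating - rating.mean()) ** 2).sum()
print("coefficients:", coef.round(2), " r-squared:", round(1 - ss_res / ss_tot, 3))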

GameTheory 02-24-2004 06:13 PM

Quote:

Originally posted by MichaelNunamaker
Hi GameTheory,

You wrote "However, these methods have their own set of (non-trivial) problems that need to be addressed before they will work well; a topic I could write all day about (but am not going to). But it took me 10 years to solve all those problems..."

Any hints?

Not really. I haven't made enough money yet to give away my hard-earned secrets.

But what are the problems? A couple of very broad categories:

1) Methodology

This applies to any automatic modelling technique, parametric (any kind of regression) or non-parametric (neural networks, genetic algorithms, etc). In either case you've got some historical data that you want to use to create a model that will predict future races. But just exactly how are you going to represent your data (code it numerically) to be presented to your modelling algorithm? And just what exactly are you going to model? (i.e. What are you going to predict?) Will it be probabilities, or finish position, or projected times, or beaten lengths? Or maybe you're going to do some sort of simulation? Each algorithm has its own strengths, weaknesses, and biases. You have to know what is appropriate to attempt to model, and you have to know how the data must be coded for that algorithm for it to work well. This is not a trivial question.

Many, many people conclude that such-and-such technique doesn't work for horse racing simply because they haven't prepared the data properly, or because they are trying to get it to answer the wrong question. There isn't much literature on the subject either, at least not much that is very helpful. Most published research papers on regression/prediction/classification methods have a common structure: first they introduce some new technique or a new variant on an old one; then they demonstrate how great it is with some empirical data. The problem is that most of their methodology is seriously flawed. For one thing, the internet has got all the researchers sharing the same known datasets, which in turn encourages them to make a point of coming up with results that are better than previously published on some known dataset. (If someone publishes a paper demonstrating a technique that gets the lowest known error on the "Boston housing dataset", it makes them look good in the academic community.) The problem is they are all "tuning" their techniques to perform well on these known datasets, and a mass group overfitting is occurring. The result is that a huge amount of the published research is total hogwash and won't hold up when you try it with new data. And horse racing data is invariably a much tougher nut to crack than anything they're doing anyway. (One thing that attracted me to the challenge of horse racing to begin with is that you can't cheat or fool yourself, because we keep score with real money -- you're either making it or you're losing it.) The real progress with these technologies is happening in the financial world, which also keeps score with real money. But they tend to keep their discoveries to themselves, just as I am.

Bottom line: you're on your own to figure out some very thorny and often non-intuitive problems just to nail down what your exact methodology is going to be: data representation / modelling technique / what is being modelled. Each technique has its own set of problems, and it is probably up to you to discover those problems as well as to solve them.
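
One concrete example of the data-representation question: raw figures often mean little until they are expressed relative to the rest of the field, e.g. standardized within each race. A sketch with invented factor values:

Code:

# Example of a data-representation choice: express each factor relative to the
# field by standardizing within the race, rather than feeding raw figures.
import numpy as np

race_factors = np.array([[104.0, 82.0],     # one row per horse: [speed, class]
                         [ 98.0, 88.0],
                         [ 95.0, 79.0],
                         [101.0, 85.0]])

mean = race_factors.mean(axis=0)
std = race_factors.std(axis=0)
within_race = (race_factors - mean) / std   # each column now has mean 0, std 1

print(within_race.round(2))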


2) Overfitting.

The major stumbling block with non-parametric techniques is the same thing that makes them attractive in the first place -- they make no assumptions about the data (they don't force it to fit a particular distribution). But since they don't, the model is free to fit very tightly to the data, in effect memorizing the training samples. And then we have overfitting, and the model will not perform well on new data. Overfitting has to be addressed one way or another with any non-parametric technique (also with parametric, but much less so). Again, each technique has different appropriate solutions to the problem, some better than others.
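
The usual first line of defense is the same one the parametric crowd uses: hold out races the model never saw and compare performance. A minimal sketch on simulated data:

Code:

# Minimal overfitting check: fit on one chunk of (simulated) data, score on a
# held-out chunk. A large gap between the two scores is the overfitting signal.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)
net = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=3)
net.fit(X_tr, y_tr)

print("train score:", round(net.score(X_tr, y_tr), 3))
print("holdout score:", round(net.score(X_te, y_te), 3))   # gap = overfitting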



Quote:



You also wrote "For instance, let's say that horses tend to win when factor A has a high value. And they also tend to win when factor B has a high value. But when both A & B are high, they almost never win. "

OK, I'll bite. Any examples? I've never seen this behavior and I've modeled loads of variable pairs. Perhaps I'm not looking at the correct pairings?

Well, it depends what you're looking at, I guess. But a simple example is the "too good to be true" scenario, like a standout in speed, class, and form in a lower or mid-level claiming race. What's he doing in this race? Losing, usually. (Whereas a speed standout that has been racing at a lower level might be a decent bet.) A graded stakes winner in an allowance race? No thanks.

Which relates to another interesting thing I've noticed. I've made hundreds of numerical rating systems over the years, and one almost universal property that I've NEVER seen mentioned by anyone else is this: a smaller margin of advantage usually wins more often than a large one. In other words, let's say we're giving each horse a rating based on who-knows-what:


Scenario #1
-------------
Horse A: 100
Horse B: 98
Horse C: 85


Scenario #2
-------------
Horse A: 100
Horse B: 90
Horse C: 85


Does Horse A have a better chance of winning in Scenario #1 or #2? In most of my tests it would be Scenario #1, and that seems to be almost independent of what the rating is based on. Scenario #2 is usually too good to be true. That may be a function of the races I typically look at, which are usually claimers and regular non-winners allowance races. But usually there is a threshold somewhere where the positive advantage becomes a negative indicator.
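
That threshold effect is easy to test on one's own data: bucket races by the gap between the top two ratings and tabulate the top horse's win rate. A sketch of the arithmetic, with invented race records:

Code:

# Testing the "smaller margin wins more often" observation: bucket races by the
# gap between the top two rated horses and tabulate how often the top-rated
# horse won. The race records below are invented.
from collections import defaultdict

# (rating gap between top two horses, 1 if top-rated horse won else 0)
races = [(2, 1), (1, 1), (12, 0), (3, 1), (9, 0), (5, 1), (15, 0), (2, 0), (8, 1)]

buckets = defaultdict(lambda: [0, 0])          # bucket -> [wins, races]
for gap, won in races:
    key = "small gap (<=5)" if gap <= 5 else "large gap (>5)"
    buckets[key][0] += won
    buckets[key][1] += 1

for key, (wins, n) in buckets.items():
    print(f"{key}: {wins}/{n} = {wins / n:.0%} top-rated winners")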

Rick 02-24-2004 06:32 PM

GT,

I agree 100% with what you've said. Another interesting thing about your "Scenario #2" is that many times neither of the top two horses will be the favorite, and your ROI will be higher as a result. I think some would call that a "chaos" race. I call it a "very profitable" race.

How your selection runs relative to its odds seems to be a worthwhile thing to predict. But you also want to predict a good percentage of winners, not just lose by less on average, because you don't get paid for that.


Jeff P 02-24-2004 07:02 PM

Originally posted by Game Theory-
Quote:

Scenario #1
-------------
Horse A: 100
Horse B: 98
Horse C: 85
Such scenarios often turn out to be better from a betting perspective. The public always tends to overbet the more obvious situations (the horse with the higher apparent advantage) and tends to ignore the less obvious (the horse with the smaller apparent advantage).

It is this overreaction to what a horse looks like on paper that makes this game so fascinating, and what makes it beatable in the first place.

That said, I have had success factoring into my own models things that tend to cause the public to look the other way (high morning line odds, low speed fig last race, loss in the stretch last race, out of the money finish last race, etc.). The result is that I tend to get a lower win percentage than most other players (12%-16%) with my own best plays, but I get a win mutuel that averages about $21.00 each time that I am right.

Rick 02-24-2004 07:18 PM

Jeff,

You could say that the public is more predictable than the horses!

arkansasman 02-24-2004 08:31 PM

Benter
 
Derek,

I think that Bill Benter's betting has been confirmed by people who know him and also by his rival computer teams in Hong Kong. I do not know him, but the first I ever heard of someone beating the horse races was in an Andy Beyer article that appeared in the Washington Post on December 23rd and 24th, 1994. I found the article fascinating and was amused at how Andy Beyer wrote it. Evidently, Beyer visited Benter in 1994 in Hong Kong.

He referred to Benter as Mr. B and, for whatever reason, said that Benter was from England. I guess the reason for that was misdirection, because in the article Beyer says that Benter did not want his identity revealed.

From memory, Beyer says in the article something like "I always thought that a computer could not beat the horse races, I now know different."

I think that Andy Beyer is a person who tells it like it is and if he says that Benter was beating the races then, I believe it.

Jack

GameTheory 02-24-2004 09:00 PM

Benter has to post picks now too?

It is possible to evaluate the worth of someone's idea without them posting picks first, you know. Do people who post winning selections automatically post good advice? One really doesn't have all that much to do with the other...

PaceAdvantage 02-24-2004 09:16 PM

OK, any more posts that lead to severe thread drift will be deleted, just like recent ones by Derek have been. Don't even reply to MY post here, because I'll just delete that too. If you've got a problem with what I'm doing, either start another thread or write me privately....

Jeff P 02-24-2004 10:21 PM

Not familiar with Hong Kong racing other than what I've read about it on this board. Curious about one thing. So I'll throw the question out there.

Why did Benter choose Hong Kong? Pool sizes over there are much larger than here. But is there something else about racing there that would make it easier to exploit than here? Field size, lack of published information/figures- dumb money in the pools- what?

GameTheory 02-24-2004 10:43 PM

Closed circuit. There are only 1,500 horses total, or something like that. Combine that with huge pools and most of the public betting with non-scientific methods (a data-based approach is antithetical to the general culture), and you've got an exploitable situation. Eventually there were many competing computer teams, pretty much all of them created by foreigners (non-Chinese). The Hong Kong Jockey Club did not like the computer teams, and did their best to shut them out...

Tom 02-24-2004 10:45 PM

I have a database of many factors.
What would you suggest I do to start a regression study?
Look at single factors, or combinations?
I could use Excel to plot finish positions or beaten lengths of horses with certain factors by rank, and look for positive correlation?
Is there a better program out there to run multiple studies?
What I do now is to query Access for things like:

Running Style F or E
Quirin speed points > 6
F1 velocity rank 1
EP velocity rank < 3

Then I figure out impact values and ROI for the results.
It sounds like you guys are skinning the cat with a sharper knife.
I re-read the chapters in the back of Bill Quirin's first book, about the post position study and the formula he came up with.
My HTR data has some very predictive factors, and some factors that enhance the predictability of others (e.g., jockey change and workout rating are very potent).
I want to use my data to its fullest, and since some of these factors are not available to the general public, I think it will be worth the work.
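
For the "figure out impact values and ROI" step, here is a plain-Python sketch of the arithmetic on a query result set; the column layout and the records are invented stand-ins for an Access/HTR query.

Code:

# Impact value and ROI for a factor: IV = (share of winners with the factor) /
# (share of starters with the factor); ROI from flat $2 win bets on qualifiers.
# The records below are invented.
records = [
    # (has_factor, won, win_payoff_for_$2)
    (True, True, 7.40), (True, False, 0.0), (False, True, 12.20),
    (True, False, 0.0), (False, False, 0.0), (True, True, 5.80),
    (False, False, 0.0), (True, False, 0.0), (False, True, 4.60),
]

starters = len(records)
winners = sum(1 for _, won, _ in records if won)
qualifiers = [(won, pay) for has, won, pay in records if has]

iv = (sum(1 for won, _ in qualifiers if won) / winners) / (len(qualifiers) / starters)
roi = sum(pay for _, pay in qualifiers) / (2.0 * len(qualifiers))   # 1.00 = break even

print(f"impact value: {iv:.2f}   ROI: {roi:.2f}")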

LOU M. 02-24-2004 11:12 PM

You've probably read these.

http://www.wired.com/wired/archive/10.03


http://www.asiaweek.com/asiaweek/te....computing.html


1200 horses racing against each other for 600 races at two tracks. Consistency would be very high.

