View Full Version : Logit Regression or Bayes Models?


robert99
03-05-2008, 07:48 AM
Bolton and Chapman applied logit regression to races using multiple factors similar to those handicappers might use. They used 500 races to find the statistical average weighting to those factors. That data took months to collect.

What to do, though, if you already know the individual probabilities for those individual factors to start with (that work has been done for larger samples than 500) but don't immediately know how they should be combined with all the other factors (the old add-them-all-up or multiply-them-together issue)? What methods give the best fit?

If I know, for example, that the average win probability of a last-time-out winner was 17%, and that horse is trained by a trainer who averages 11% winners, and so on - how do I combine all those factors into an equation that estimates today's overall probability?

Would a multiple-factor Bayes' theorem approach be better than logit? The theorem states that if Hn is one of a set Hi of mutually exclusive and exhaustive events, then P(Hn|D) = P(D|Hn)P(Hn)/Σi[P(D|Hi)P(Hi)].

That is complex enough for a single factor of given data D, and how does that actually apply to my example handicapping factors?
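As a rough numerical sketch of how the theorem could combine two such factors - naively assuming the factors are independent given the outcome - consider the following; every probability here is invented for illustration, not a real racing statistic:

```python
# Sketch only: combining two handicapping factors with Bayes' theorem,
# naively assuming the factors are independent given the outcome.
# Every probability below is invented for illustration.

def posterior(prior, likelihoods_win, likelihoods_lose):
    """P(win|D) = P(D|win)P(win) / [P(D|win)P(win) + P(D|lose)P(lose)]"""
    p_win, p_lose = prior, 1.0 - prior
    for lw, ll in zip(likelihoods_win, likelihoods_lose):
        p_win *= lw     # P(factor seen | winner)
        p_lose *= ll    # P(factor seen | loser)
    return p_win / (p_win + p_lose)

# Prior: one of 10 evenly matched horses -> 10%.
# Factor 1 ("won last out"): assume seen in 30% of winners, 15% of losers.
# Factor 2 ("11% trainer"):  assume seen in 20% of winners, 18% of losers.
p = posterior(0.10, [0.30, 0.20], [0.15, 0.18])
print(round(p, 3))
```

With those made-up numbers, the two factors together roughly double the horse's prior chance.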

Any advice on the best way to proceed or good references?

classhandicapper
03-05-2008, 09:48 AM
These advanced statistics and probabilities questions are way outside my range, but I've never understood how something like this could be as effective as a thorough analysis by a competent handicapper.

Even if you could identify all the overlaps between factors (IMO they are almost endless) and get at the core of what each factor contributes to the outcome, it seems to me that the "weight" of each factor changes a great deal depending on the specific circumstances and race conditions.

For example:

There are times when I see that a horse hasn't run for 8 weeks and consider it either irrelevant or a positive sign and others where I think it's a major negative. Some of that has to do with the specific trainer, the quality of the horse, whether he has been working well in between, whether he typically works well between races, whether his PPs seem to suggest he was a tired horse that needed a freshening or whether his PPs seem to suggest that something had to be wrong etc... I don't see how having statistics on recent activity and/or the trainer alone can analyze that situation as well as a human being that is familiar with the horse's entire recent record.

Maybe someone can explain that to me.

GameTheory
03-05-2008, 10:02 AM
There are times when I see that a horse hasn't run for 8 weeks and consider it either irrelevant or a positive sign and others where I think it's a major negative. Some of that has to do with the specific trainer, the quality of the horse, whether he has been working well in between, whether he typically works well between races, whether his PPs seem to suggest he was a tired horse that needed a freshening or whether his PPs seem to suggest that something had to be wrong etc... I don't see how having statistics on recent activity and/or the trainer alone can analyze that situation as well as a human being that is familiar with the horse's entire recent record.

Maybe someone can explain that to me.

Well, make it a fair test. Have the computer analyze the same information you are -- i.e. the horse's entire recent record. Why should the computer be restricted to a few summary stats?

Will the human do it better given the same data? In many cases, yes, but the computer can process many more races than the person can, and so even if the advantage of the computer is not as great as the person on a particular race the computer can identify many more playable races. In this way, the computer can be just as "effective" (or more so) than a single handicapper when considering total performance on "the races" rather than just a single race. Of course all this depends on the particular humans and computer programs involved, but I never understand why people always say things like, "Well the computer can't consider this factor or that factor." Why can't it? (Physical inspection of the horse computers can't do, but it can consider all the same stuff in a horse's or a trainer's record that a person can.)

Dave Schwartz
03-05-2008, 10:15 AM
Bolton and Chapman applied logit regression to races using multiple factors similar to those handicappers might use. They used 500 races to find the statistical average weighting to those factors. That data took months to collect.


Is there a book or paper that explains this?


Dave

rrbauer
03-05-2008, 10:17 AM
Isn't this an exercise in multivariate analysis? If you have SPSS, for instance, there are examples on the internet of how to use it in a variety of multivariate cases.

While I don't have the references in front of me, I recall that both Bill Quirin and Fred Davis developed an impact-value approach in which multiple variables were combined for each horse (and its statistical attributes) in a race and then normalized across all of the entrants to determine each horse's probability of winning. One of Quirin's books had a fairly complex regression formula in it, but I don't recall the nature of it.
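The normalization step described above might look something like this in practice; the combined scores here are invented for illustration, not real impact values:

```python
# Sketch of the normalization step: each horse gets a combined score
# (say, a product of impact values), then the scores are scaled so the
# field's win probabilities sum to 1. Scores below are invented.

def normalize(scores):
    total = sum(scores.values())
    return {horse: s / total for horse, s in scores.items()}

field = {"A": 1.8, "B": 1.2, "C": 0.6, "D": 0.4}  # hypothetical scores
probs = normalize(field)
print(round(probs["A"], 3))  # horse A's estimated win probability
```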

jonnielu
03-05-2008, 10:32 AM
Well, make it a fair test. Have the computer analyze the same information you are -- i.e. the horse's entire recent record. Why should the computer be restricted to a few summary stats?

Will the human do it better given the same data? In many cases, yes, but the computer can process many more races than the person can, and so even if the advantage of the computer is not as great as the person on a particular race the computer can identify many more playable races. In this way, the computer can be just as "effective" (or more so) than a single handicapper when considering total performance on "the races" rather than just a single race. Of course all this depends on the particular humans and computer programs involved, but I never understand why people always say things like, "Well the computer can't consider this factor or that factor." Why can't it? (Physical inspection of the horse computers can't do, but it can consider all the same stuff in a horse's or a trainer's record that a person can.)

And all of that calculating and crunching of information that may or may not indicate anything for today's race can't come close to the human ability to go down to the paddock and spend one minute answering the simple question, "Is the horse in question ready for a race today?"

It is not possible for software to tell you any more than what past performances can tell you.

jdl

Dave Schwartz
03-05-2008, 10:41 AM
And it is not possible for a human being to walk into the paddock and turn what he sees into a reasonably accurate probability/odds line for every horse in the field.


Please, we all use what we are capable of and best at using. It is all we know. Logically, we denigrate the tools that are not in our toolbox.

Personally, I'd like to know more about Bayesian analysis as well as logit regression. Unfortunately, there do not appear to be any "dummy" books.


Dave

GameTheory
03-05-2008, 10:47 AM
And all of that calculating and crunching of information that may or may not indicate anything for today's race, can't come close to the human ability to go down to the paddock to spend one minute on answering the simple question, " is the horse in question ready for a race today".

It is not possible for software to tell you any more than what past performances can tell you.

Isn't that exactly what I said?

Dave Schwartz
03-05-2008, 10:53 AM
GT,

Can you recommend a book on these topics?

Dave

classhandicapper
03-05-2008, 10:55 AM
Well, make it a fair test. Have the computer analyze the same information you are -- i.e. the horse's entire recent record. Why should the computer be restricted to a few summary stats?

Will the human do it better given the same data? In many cases, yes, but the computer can process many more races than the person can, and so even if the advantage of the computer is not as great as the person on a particular race the computer can identify many more playable races. In this way, the computer can be just as "effective" (or more so) than a single handicapper when considering total performance on "the races" rather than just a single race. Of course all this depends on the particular humans and computer programs involved, but I never understand why people always say things like, "Well the computer can't consider this factor or that factor." Why can't it? (Physical inspection of the horse computers can't do, but it can consider all the same stuff in a horse's or a trainer's record that a person can.)


I certainly understand the point about volume of bets. A computer can certainly handicap way more races. So if you assume there are many profitable situations out there, a computer should find way more of them.

The one minor question I have is whether a computer with "less than human skill" can actually translate into finding more "profitable" bets even though it is handicapping more races. It may be finding more bets, but producing a much lower ROI because of a bunch of extra unprofitable plays that resulted from less skill.

I'm not sure a computer can make the fine line subjective judgments about the details of a horse's total record (and I say that as a former computer programmer with some understanding).

I'd rather not re-debate that issue though.

I'd rather that someone that uses a computer to make an odds line explain how they overcome some of the difficulties because I may be able to apply some of their insights to my own human analysis.

rrbauer
03-05-2008, 11:02 AM
David,
A couple of books that deal with Bayes' theorem:

1. Introduction to Statistics for Business Decisions; Robert Schlaifer; 1961; McGraw-Hill

2. Marketing Decisions: A Bayesian Approach; Enis & Broome; 1971; Int'l Textbook Co.

Schlaifer also wrote an earlier book in 1959 and a later book in 1969, both dealing with decision making under conditions of uncertainty.

GameTheory
03-05-2008, 11:03 AM
Is there a book or paper that explains this?
Bolton and Chapman's original paper (1986) was titled "Searching for Positive Returns at the Track: A Multinomial Logit Model for Handicapping Horse Races." It is probably in that expensive Dr. Z book (which is a collection of papers), "Efficiency of Racetrack Betting Markets," although I'm not sure. I'm sure your local librarian can track it down for you, although I doubt you'll find much of interest in it.

Bolton and Chapman was where Benter took off from -- if you have his paper it goes over the same stuff and expands on it...

chickenhead
03-05-2008, 11:05 AM
Bayesian oddslines are easy, either 3/2 or 4/5

GameTheory
03-05-2008, 11:07 AM
The one minor question I have is whether a computer with "less than human skill" can actually translate into finding more "profitable" bets even though it is handicapping more races. It may be finding more bets, but producing a much lower ROI because of a bunch of extra unprofitable plays that resulted from less skill.

That's true, but lower ROI doesn't necessarily mean lower actual profit. Would you rather make 5 bets a day at +30% ROI or 100 bets at +10%?
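The flat-bet arithmetic behind that question, assuming a hypothetical $1 per bet, is simple enough to spell out:

```python
# Flat-bet arithmetic for ROI vs. volume, assuming $1 per bet.
few_good = 5 * 1.00 * 0.30     # 5 bets/day at +30% ROI
many_ok = 100 * 1.00 * 0.10    # 100 bets/day at +10% ROI
print(few_good, many_ok)       # the high-volume line wins on total profit
```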

DanG
03-05-2008, 11:12 AM
Great subject Robert; :ThmbUp:

I don’t have the education or frankly the intellect to discuss this subject with any technical expertise. However…I agree 10,000% with Dave’s statement about how horseplayers by nature “denigrate the tools that are not in their toolbox”. The statement that a properly programmed computer can’t compete with someone who doesn’t use one at all is blatantly false.

I, like Dave, would love this subject explored in somewhat “user friendly” jargon that doesn’t involve algebraic symbols that weren’t part of my ‘street’ education. (Although Dave has infinitely more knowledge of these mathematical subjects than I.)

Any link to any “layman’s” source on this subject would be most appreciated.

PS: Most times this subject is discussed I hold high hopes of furthering my education and unfortunately it normally ends up resembling a final exam on a day I cut class. (To go to the track no doubt!) :D

Dave Schwartz
03-05-2008, 11:15 AM
Most times this subject is discussed I hold high hopes of furthering my education and unfortunately it normally ends up resembling a final exam on a day I cut class.


Dan,

LOL - I can validate that statement.


Dave

robert99
03-05-2008, 11:56 AM
Thanks for all the replies.

My motive for the thread is to try to find out how to do these things and understand them better - the objective being to merge them with current classical handicapping skills. There is a lot on the internet, but very few have the knack of explaining things well, even if they understand the material themselves rather than just plugging data into a black box. It is either very simple cases or straight into advanced mathematics. I certainly am not saying any one approach is better than another. In fact, the Bolton paper (Dave S - an author version is available free on the internet) does not lead to any untold riches and would not pay back the effort involved.

Bill Benter adapted the logit approach to basic handicapping factors, and even after five years of trial and error it did not work until they added in the public odds estimate - that turned out to be the highest weighted factor. Another vital factor was the total number of races a horse had had - the logic of why a currently good horse with a lot of races was worth less than a currently good but more lightly raced horse could not be explained, but it had to be included in the model. It appears the money was made more by being first into a huge, unsophisticated betting market and leveraging a small edge than by the excellence of the software.

Bill Quirin explained multiple regression in his book "Winning at the Races," but the program gained a modest 8% at first and was not so good at other tracks or at keeping up with changes in racing, i.e., it best fitted the data it came from.
The Sports Judge and All-Ways have used similar methods.

Can anyone take this subject further as there is some interest?

Cangamble
03-05-2008, 12:08 PM
GT,

Can you recommend a book on these topics?

Dave
I just found this by doing a web search. Does it help?
Sara Dziech, Exploratory Analysis of Horse Racing Data, May 27, 2003 (Martin Levy, Norman Bruvold, Yan Yu)
Gambling or wagering is big business and is becoming even bigger in the greater Cincinnati area. State lotteries and the gambling boats have brought legal betting back into the spotlight. This surge of renewed interest in gambling has brought more attention to one of the oldest forms of wagering, horse racing. With the increase in the use of home computers and the internet, now more than ever, an overwhelming amount of data is available on individual horse performance, track entries, and results. Using these data, an exploratory statistical study was conducted to look for trends in the data and to create predictive models to help select the “winners.” The study also demonstrated whether the results were truly random, or if there were commonalities that would allow the astute handicapper to have an advantage over the common bettor. The ability to predict which horses would finish in the money (first, second, or third) would be key to actually making money at the track, since the bigger payoffs come from the exotic bets such as the Daily Double or Exacta. The models and analysis presented here may prove useful in successfully selecting the horses that will finish in the top three or in the money. Basic statistics were reviewed and key elements presented. Weighted general linear models were created using the percent of finishes in the money as the dependent variable. The logistic models were developed using a binary dependent variable -- finished in the money or did not finish in the money. CHAID analysis using Answer Tree was also performed. Each type of analysis was conducted from two views -- all tracks combined with the emphasis on overall trends, between-track differences, and track-specific models. The resulting models and their appropriateness were compared.
The final part of the project involved testing the predictive ability of the logistic model against the selection performance of a few average people to see if the model was more successful than random guesses.
***************************************
Never heard of the book before. Just got my old university text out: Mathematical Statistics by Freund and Walpole. I have no recollection of understanding what is in the book and I have no idea how I got a B in the course.

Dave Schwartz
03-05-2008, 12:16 PM
CanGamble,

It sure does help! Thanks!

Tell me... what search words did you use?

I have been plugging away for an hour trying to find something in Google.

Dave

GameTheory
03-05-2008, 12:23 PM
Understanding the basics is fairly easy if you can understand a simple linear regression equation:

y = a + b*x

where a is a constant and b is the slope of the line. For instance, a simple equation like this can convert Celsius temperatures to the Fahrenheit scale:

F = 32 + 1.8*C

We've got a constant (32) and are giving a "weight" (coefficient) of 1.8 to the Celsius factor.

We all did this stuff in school, I imagine.


Well, multiple linear regression does the same thing, except with many factors instead of just one (each factor is multiplied by a weight and everything is summed together along with the constant), and usually with problems that don't have an exact correspondence like the two temperature scales do. So your answers become predictions instead of conversions. The problem with linear regression is that it is not bounded -- it is a straight line that can go on forever. So the next step is to go non-linear, which is what logistic regression does -- now your final answer is on an S-shaped curve that is bounded between 0 and 1, which happens to be a good thing when you want to predict probabilities.

But you're still using a constant and then a coefficient/weight for each factor. It just adds some extra mathematical magic to put it on a log scale. So it is pretty easy to grasp the basic idea conceptually. If you want to understand how to actually arrive at and calculate the weights (besides just plugging it in to your statistical software, which is what you do in practice), then you've got to read those dummies books on statistics and probability to get some background.
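The constant-plus-weights idea can be sketched in a few lines; the two-factor "win" weights below are invented for illustration, not fitted to anything:

```python
import math

# Sketch of the constant-plus-weights idea. The linear part is
# y = a + b1*x1 + b2*x2 + ...; logistic regression then squashes y through
# the S-shaped curve 1/(1 + e^-y) so the output lands between 0 and 1.

def linear(constant, weights, factors):
    return constant + sum(w * x for w, x in zip(weights, factors))

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

# The temperature conversion is a pure linear model: F = 32 + 1.8*C
print(linear(32, [1.8], [100]))   # 212.0

# A toy two-factor "win" model (invented weights), squashed to a probability:
print(round(logistic(linear(-2.0, [0.5, 1.2], [1.0, 0.8])), 3))
```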

The Bayesian stuff is even easier to understand in concept. You simply take the conditional probability of each factor and multiply them all together. What's a conditional probability of a factor? Well, it is the probability of seeing one thing given that we've already got another thing. (Bayesian factors are simply "yes" or "no" -- we've got it or not, so if you're dealing with a factor like speed figures, you've got to threshold it or break it up into bins -- e.g. is the figure above 80? yes or no).

We're interested in picking winners, so that's our "given" thing -- that the horse won. So, as when we do impact values, we ask the question, "Among winners, what percentage of them had X factor?" The answer is the conditional probability of that factor, e.g. 50% of winners were in-the-money last race. So the conditional probability of the factor "being in-the-money last race" is 50%. Take a bunch of such factors, multiply them together, and you'll get a number for each horse that you can compare to the other horses in a race. In this extremely simplified explanation, we won't worry about the fact that the final numbers won't be normalized (add up to 1.0) or worry about which factors to choose, etc. In practice, the math would be slightly more complicated, but not much.

Now, Bayesian inference assumes that your factors are all independent (don't measure the same thing in any way), which they never are, but it works surprisingly well anyway for general applications -- it is used mainly in medical research. [ If more actual doctors understood it, there would be a lot fewer misdiagnoses, because unfortunately most doctors (much less patients) don't understand that if you take a medical test for some disease that is 90% accurate and it comes up positive, it does NOT mean you have a 90% chance of having that disease. It means that *if* you have the disease, there is a 90% chance of the test coming up positive. But when you consider the fact that most people don't have the disease, along with the "false positive" rate of the test, it usually means your chance of having the disease is still quite small if all you're going by is the test. ]
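Putting rough numbers on that diagnostic-test point makes it concrete; the prevalence and error rates below are invented for illustration:

```python
# Base-rate arithmetic for the diagnostic-test point. Assume a disease
# with 1% prevalence and a test that catches 90% of true cases but also
# flags 9% of healthy people (all rates invented for illustration).

prevalence = 0.01
sensitivity = 0.90        # P(positive | disease)
false_positive = 0.09     # P(positive | no disease)

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # well under 0.9
```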

The problem with using Bayesian inference in its simplest form as given here (called "naive Bayes") is the yes/no 1/0 nature of the factors, and of course choosing the factors themselves. It is probably more appropriate to use as a benchmark than for use in practice where other methods will likely work better.


The problem with any of these statistical handicapping methods is that it is hard to narrow in on "where the profit is". They do great at general handicapping, but you've got to beat the crowd, not just pick reasonable probabilities...

DeanT
03-05-2008, 12:34 PM
Forecasting Methods for Horseracing might help, Dave, if you have not read it. From reading your website it seems like something you'd be interested in. In the forecasting and model-building chapter he touches on error, Bayes, fuzzy logic, etc.

http://www.amazon.co.uk/Forecasting-Methods-Horseracing-Peter-May/dp/1843440024

TrifectaMike
03-05-2008, 12:48 PM
<QUOTE>Now, Bayesian inference assumes that your factors are all independent (don't measure the same thing in any way)...</QUOTE>

Unfortunately, horseracing is NOT a Bayesian process...period. In fact, it is NOT based on independent events.... barking up the wrong tree. Rather, one should look into non-Bayesian techniques.

Dave Schwartz
03-05-2008, 12:48 PM
Dean,

Thank you for that link. I will likely order it.


GT,

Get ready for your phone to ring. <G>


Dave

TrifectaMike
03-05-2008, 12:51 PM
And as to the concept of regression analysis...it's nearly useless. As one adds more and more variables (factors), the equation will regress to pointing out the favorite. Lots of work and no reward.

classhandicapper
03-05-2008, 12:54 PM
Great subject Robert; :ThmbUp:

I don’t have the education or frankly the intellect to discuss this subject with any technical expertise. However…I agree 10,000% with Dave’s statement about how horseplayers by nature “denigrate the tools that are not in their toolbox”. The statement that a properly programmed computer can’t compete with someone who doesn’t use one at all is blatantly false.

For me, I don't think it's a matter of denigrating other people's tools.

I also think it is probably theoretically possible to program a computer to consider all the information in the PPs as well as a person can. The best computers can now outplay the best chess players. So the potential must be there.

It's just that I was a computer programmer for over 25 years and I've also been a handicapper for over 30 years. I can't even begin to imagine the possibility of translating my own handicapping analysis into a computer system that could cope with all the combinations of variables that come up in very unique ways each day. Some of the situations I see each day I am seeing for the very first time. That's what makes handicapping so difficult. For me, some of the process is highly subjective, intuitive, and based on personal experience not easily translated to code and rules.

That's not meant to be a statement about the superiority of my method. It's a statement of confusion about how someone else could program for these problems.

I also understand the "number of bets" vs. the "quality of bets" issue that has been debated to death already. Even I am tired of that debate and I'm usually part of it. ;)

Right now, I see computers as more of an information tool than a handicapping tool, but I'd love to know how people that use them for handicapping are coping with some of these complex statistical questions and issues without veering too far off from reality too often.

Cangamble
03-05-2008, 12:59 PM
CanGamble,

It sure does help! Thanks!

Tell me... what search words did you use?

I have been plugging away for an hour trying to find something in Google.

Dave
If there was money in searching for stuff on the internet, I'd be a millionaire.

http://www.google.com/search?aq=f&num=20&hl=en&safe=off&rlz=1B3GGGL_en___CA215&q=multiple+regression+bayes+horse+racing&btnG=Search

But this one was easy.

GameTheory
03-05-2008, 01:04 PM
<QUOTE>Now, Bayesian inference assumes that your factors are all independent (don't measure the same thing in any way)...</QUOTE>

Unfortunately, horseracing is NOT a Bayesian process...period. In fact, it is NOT based on independent events.... barking up the wrong tree.

There is no such thing as a "Bayesian process," although there might be some processes that are easier than others to model with Bayesian inference. I said that of course the factors would never be independent, and I also said it works pretty well anyway. Everyone who uses Bayesian methods in any field acknowledges this, even though practically every field it is used in has the same problem. It is just that the mathematical theory is based on using independent factors, but it turns out that it works pretty well anyway even when they are not -- often better than other methods.

The much bigger problem with simple Bayesian inference is that you need to use discrete factors (1/0)...

GameTheory
03-05-2008, 01:06 PM
And as to the concept of regression analysis...it's nearly useless. As one adds more and more variables (factors), the equation will regress to pointing out the favorite. Lots of work and no reward.

That's true, but also true of most other methods. Nothing is easy.

Cangamble
03-05-2008, 01:07 PM
I've mentioned this before, but the first book I ever ordered (I was in university and couldn't get this book in any Toronto book store) was Winning At The Races.
Quirin in Appendix B at the end of the book describes Regression and then uses it for Post Positions and beaten lengths.
He doesn't go into multiple regression though. Just simply average beaten lengths versus post positions at a fixed distance.
You can make it multiple by factoring in the horse's running style, for example.
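A minimal least-squares fit in the spirit of Quirin's appendix could look like this, regressing average beaten lengths on post position; the data points are hypothetical, not Quirin's:

```python
# Simple least-squares regression: average beaten lengths (y) on post
# position (x). Data points below are invented for illustration.

posts = [1, 2, 3, 4, 5, 6]
beaten = [3.1, 3.4, 3.6, 4.0, 4.3, 4.8]   # hypothetical track averages

n = len(posts)
mean_x = sum(posts) / n
mean_y = sum(beaten) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(posts, beaten)) \
        / sum((x - mean_x) ** 2 for x in posts)
intercept = mean_y - slope * mean_x
print(round(intercept, 3), round(slope, 3))   # y = intercept + slope*post
```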

Cratos
03-05-2008, 01:17 PM
I'm not sure a computer can make the fine line subjective judgments about the details of a horse's total record (and I say that as a former computer programmer with some understanding).

I'd rather not re-debate that issue though.

I'd rather that someone that uses a computer to make an odds line explain how they overcome some of the difficulties because I may be able to apply some of their insights to my own human analysis.


There are many position papers out there on this subject and admittedly I am a Bayesian. Therefore the following papers might lend some insight:

Explaining the Favorite-Longshot Bias:
Is it Risk-Love, or Misperceptions?

INEFFICIENCIES IN PARIMUTUEL BETTING MARKETS
ACROSS WAGERING POOLS IN THE SIMULCAST ERA

Anomalies: Parimutuel Betting Markets: Racetracks and Lotteries

However I do agree that the human mind can do things that a computer can’t do and that is think, but the computer is a far more reliable calculator.

classhandicapper
03-05-2008, 01:23 PM
There was a book called "Classifying and Ranking Thoroughbreds: A Scientific Handicapping Method" by Kenneth Johnson PhD, and Lawrence G Archer PhD that used multiple regression analysis to study various handicapping factors and find the proper weight for each. The book is ancient. It was one of the first books on horses I ever bought. So it probably dates to mid 70s sometime.

It was very basic in its handicapping (used DRF speed ratings and track variant to measure speed etc..) , so it is very dated now, but the math seemed quite sophisticated.

I still have the book.

Cangamble
03-05-2008, 01:29 PM
The only thing a computer can't do is be subjective.

DanG
03-05-2008, 02:18 PM
Big thanks to Robert99 for getting the ball rolling and to Dave for keeping the fire hot.

I learned something from every post so far…but a major thanks to Dean, CanGamble, Cratos and...GameTheory…if you’re not a teacher…you should be. (Post #20) :ThmbUp: was the best explanation I’ve seen yet that actually sunk into my thick skull.

PS: Class; Even though we seem to disagree often that’s only a good thing and I always learn from your posts…thanks to all!!!

Now…best of luck at Aqueduct regardless of whether you’re using a computer, a crayon, or you’ve looked under Jan Rushton’s hat for her picks! :jump:

zerosky
03-05-2008, 02:34 PM
Is there a book or paper that explains this?
Dave

oddly enough I am currently reading one of the earlier books on this topic.

'Racetrack Betting' by Peter Asch and Richard Quandt

Not having a firm grasp of the mathematics involved, it's difficult for me to make an 'informed' decision about this, but using a logit model and computing log odds ratios is probably a better way of analysing the data than impact values. The ranking of the horses will be the same, but the oddsline should be more accurate.

I have found a lot of information from the medical side of things. I found this paper to be particularly useful.

http://www.childrensmercy.org/stats/Diagnostic.pdf

You are testing to see if the horse has the 'disease' (where in our case the disease is winning!) by applying a series of tests to all the horses in the field.

A true positive means the horse had the particular attribute we test for (for example, best last speed) and then goes on to win.

A false positive means the horse had the attribute but didn't win.

A true negative is where the horse did not have the attribute (i.e. tested negative) and did not win the race.

A false negative is where the horse did not have the attribute but went on to win the race.

With these four numbers and using the tests described in the paper you can compute the coefficients to plug into the regression equations.

This one explains how to get the coefficient:

http://www.ats.ucla.edu/stat/Stata/faq/oratio.htm
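The 2x2 layout described above, and the odds ratio and log odds ratio derived from it, can be sketched as follows; the counts are invented for illustration (with one binary attribute, the log odds ratio is the logit-scale coefficient a logistic regression would assign it):

```python
import math

# Sketch of the 2x2 counts for one yes/no attribute (e.g. "best last
# speed") and the odds ratio / log odds ratio derived from them.
# All counts are invented for illustration.

tp = 40    # had the attribute and won      (true positive)
fp = 160   # had the attribute, didn't win  (false positive)
fn = 60    # lacked the attribute but won   (false negative)
tn = 740   # lacked the attribute, didn't win (true negative)

odds_ratio = (tp * tn) / (fp * fn)
log_odds_ratio = math.log(odds_ratio)   # the logit-scale weight
print(round(odds_ratio, 3), round(log_odds_ratio, 3))
```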

classhandicapper
03-05-2008, 03:19 PM
Here's an example of a race that "I personally" would have had a tough time analyzing statistically. If anyone did analyze it with a statistical model, I'd like to know what you thought.

In the 2nd race at AQU today, Delta Breeze fit a pattern of mine that is often a play for me as long as the odds are appropriate.

I could have easily used a computer to identify the horse as a potential play.

However, the final decision was an agonizing intuitive analysis into why Mike Hushion (an outstanding trainer), who claimed the horse for 60K, dropped it to 35K and then to 25K and it didn't perform well at all. Without knowing for sure if the horse was on a permanent steep decline that was likely to continue or whether it was on a decline that looked worse than it actually was because of the conditions of its recent races, it was hard to know whether Julian Canet (good off the claim) might get a moderately better performance out of the horse today despite taking it from someone like Mike Hushion.

In order to make that determination, I spent about 10 minutes looking at a variety of details pertaining to the horse's overall record and the record of both trainers - none of which gave me a clear answer. Overall, they just gave me an impression of what I might expect today.

It doesn't really matter whether I was right or wrong.

Is there a way to model that kind of thing?

Cratos
03-05-2008, 03:37 PM
Big thanks to Robert99 for getting the ball rolling and to Dave for keeping the fire hot.

I learned something from every post so far…but a major thanks to Dean, CanGamble, Cratos and...GameTheory…if your not a teacher…you should be. (Post #20) :ThmbUp: was the best explanation I’ve seen yet that actually sunk into my think skull.

PS: Class; Even though we seem to disagree often that’s only a good thing and I always learn from your posts…thanks to all!!!

Now…best of luck at Aqueduct regardless of whether you're using a computer, a crayon, or you've looked under Jan Rushton's hat for her picks! :jump:

Dan,

This is a decent Bayesian overview

classhandicapper
03-05-2008, 04:23 PM
Dan,

This is a decent Bayesian overview

That would be a very interesting paper even if we weren't looking for ways to analyze horseracing. ;)

OTM Al
03-05-2008, 04:37 PM
If you can't get the book that collects the papers, you can always go to JSTOR. Here are some hits for a limited search on "horse racing". I was able to find pretty much every paper in that book through JSTOR.

http://www.jstor.org/search/AdvancedSearch?si=1&hp=25&q0=horse+racing&f0=&c0=AND&q1=&f1=&c1=AND&q2=&f2=au&c2=AND&q3=&f3=ti&wc=on&ar=on&sd=&ed=&la=&dc=Economics&dc=Psychology&dc=Statistics&Search=Search

Cangamble
03-05-2008, 04:45 PM
Dan,

This is a decent Bayesian overview
That hurt my poor widdle brain.

DanG
03-05-2008, 05:08 PM
Dan,

This is a decent Bayesian overview
Thanks Cratos!

Robert Fischer
03-05-2008, 05:34 PM
Bolton and Chapman applied logit regression to races using multiple factors similar to those handicappers might use.

The Public uses the Name of the horse, and the Form. They may remember the outcome of a famous race.
The winning player uses his talent to subjectively estimate the value of the different factors that he judges will actually affect the live sport in today's race.

You cannot calculate on the level of a winning-expectation player unless you have that player feeding you his evolving opinions of different factors on an accurate number scale - AND THEN a formula that combines these factors.

Last-out win% and trainer win% are not very warm in relation to what the winning horseplayer is using. At best, statistics like these could contain some correlations.




Any advice on the best way to proceed or good references?
You can get as complicated as you want, but it really boils down to weighting your factors according to importance before you combine them. If you can do those things, algebra or even basic math is fine.


weighting your factors according to importance

You have to have the ability to accurately write the formula. "Importance" is going to be dynamic. The win % of a last-out winner means less at x distance than y distance. Maybe it means more within a "meet" than a "circuit", more or less on dirt or turf or synthA or synthB or synthC or synthD... So your "weights" will be dynamic as well, different depending on the importance. In reality you then have a different "importance" for each narrow set of conditions (colts, 6f, fast dirt, Belmont, purse 20-50k), so you must sacrifice accuracy to expand the conditions (6f, fast dirt, purse 20-50k). Don't forget that all of this tinkering is really a showcase of your talent to subjectively estimate the value of different factors per condition. In this case the medium of "objectivity" is a falsehood. :bang::lol:
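A crude sketch of what condition-dependent weights look like in practice: a lookup table keyed by the condition bucket. All factor names and weights below are invented placeholders, purely for illustration.

```python
# Invented weights per condition bucket: (surface, distance_bucket).
# A real table would have one entry per narrow set of conditions.
WEIGHTS = {
    ("dirt", "sprint"): {"last_out_win": 0.6, "trainer_pct": 0.4},
    ("dirt", "route"):  {"last_out_win": 0.4, "trainer_pct": 0.6},
}

def rating(bucket, factors):
    # Weighted sum of factor values, using the weight set for this bucket
    w = WEIGHTS[bucket]
    return sum(w[name] * value for name, value in factors.items())

r = rating(("dirt", "sprint"), {"last_out_win": 0.17, "trainer_pct": 0.11})
```

The narrower the bucket, the fewer races you have to estimate each weight from, which is exactly the accuracy trade-off described above.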

Don't be discouraged. It is a good exercise and if you wanted to learn logit regression, it adds some spice to the practice. You may even find some spot plays that are profitable.

classhandicapper
03-05-2008, 07:40 PM
The Public uses the Name of the horse, and the Form. They may remember the outcome of a famous race.
The winning player uses his talent to subjectively estimate the value of the different factors that he judges will actually affect the live sport in today's race.

You cannot calculate on the level of a winning-expectation player unless you have that player feeding you his evolving opinions of different factors on an accurate number scale - AND THEN a formula that combines these factors.

Last-out win% and trainer win% are not very warm in relation to what the winning horseplayer is using. At best, statistics like these could contain some correlations.




You can get as complicated as you want, but it really boils down to weighting your factors according to importance before you combine them. If you can do those things, algebra or even basic math is fine.


weighting your factors according to importance

You have to have the ability to accurately write the formula. "Importance" is going to be dynamic. The win % of a last-out winner means less at x distance than y distance. Maybe it means more within a "meet" than a "circuit", more or less on dirt or turf or synthA or synthB or synthC or synthD... So your "weights" will be dynamic as well, different depending on the importance. In reality you then have a different "importance" for each narrow set of conditions (colts, 6f, fast dirt, Belmont, purse 20-50k), so you must sacrifice accuracy to expand the conditions (6f, fast dirt, purse 20-50k). Don't forget that all of this tinkering is really a showcase of your talent to subjectively estimate the value of different factors per condition. In this case the medium of "objectivity" is a falsehood. :bang::lol:

Don't be discouraged. It is a good exercise and if you wanted to learn logit regression, it adds some spice to the practice. You may even find some spot plays that are profitable.

I agree with what you are saying.

The one additional point I have been trying to make is that sometimes the race conditions and issues are so unique, the only way to answer the important value questions is with a subjective almost intuitive analysis of the details because there are no relevant programmable statistics. At least that's the way I feel about a lot of races.

Dave Schwartz
03-05-2008, 08:05 PM
The one additional point I have been trying to make is that sometimes the race conditions and issues are so unique, the only way to answer the important value questions is with a subjective almost intuitive analysis of the details because there are no relevant programmable statistics. At least that's the way I feel about a lot of races.

And then you still don't know if you are right or wrong.


Dave

TrifectaMike
03-05-2008, 08:28 PM
This might sound simplistic, but have you ever tried modeling yourself, and not worrying too much about understanding each and every race outcome?

Cratos
03-05-2008, 08:29 PM
And then you still don't know if you are right or wrong.


Dave

I am not trying to be sarcastic, but isn't predictive statistics about decisions under uncertainty, with the assigned conditional probability being the risk factor?

jonnielu
03-05-2008, 09:11 PM
And it is not possible for a human being to walk into the paddock and turn what he sees into a reasonably accurate probability/odds line for every horse in the field.


Please, we all use what we are capable and best at using. It is all we know. Logically, we denigrate the tools that are not in our toolbox.

Personally, I'd like to know more about Bayesian analysis as well as logit-order regression. Unfortunately, there do not appear to be any "dummy" books.


Dave

He can decide if he needs to, or not. I am simply saying that no matter how powerful your computer, or the extent of its programming, it is pretty much limited to analysis of past performance data. Within that scope, its value is dependent on the programming. The greatest of programming is still limited to an analysis of past performance data.

Horseracing differs from any other betting game because of the human element, and the role that it plays in this sport. When you can program the computer to analyze the human element involved in horseracing as well as it analyzes past performances, you will have some very worthwhile output.

I'll wager that you will have an accurate roulette program sooner.

jdl

Overlay
03-05-2008, 09:15 PM
I've mentioned this before, but the first book I ever ordered (I was in university and couldn't get this book in any Toronto book store) was Winning At The Races.
Quirin in Appendix B at the end of the book describes Regression and then uses it for Post Positions and beaten lengths.
He doesn't go into multiple regression though. Just simply average beaten lengths versus post positions at a fixed distance.
You can make it multiple by factoring in the horses running style for example.

While he didn't detail his background calculations as in the Appendix you cited, weren't the numerical sprint and route formulas that Quirin presented at the end of the book examples of the application of multiple regression?

Gambleonclaimers
03-05-2008, 09:24 PM
As far as regression analysis goes, I believe Benter took it the farthest and really applied it to the races. I was not able to find the papers, but I do remember Bill Benter going over this in detail in a few articles, which included formulas. He had another small article on cross-pool tabulation and cross-pool inefficiencies. These articles can be found on the web and viewed for free; I just can't seem to find them right now.

Dave Schwartz
03-05-2008, 09:47 PM
Jonnie,

There is no doubt in my mind that a human being can detect subtleties that a computer cannot. However, even though the human can detect those subtleties, he has no way to accurately apply them.

The important word there would be "accurately."

It is all well and good to say, "There is an outside bias at Turfway Park," but how much to adjust each horse for the impact of that bias is way outside the scope of human abilities.

IMHO, the same thing goes for viewing horseflesh. I have had enough contact with horses (years ago, in my back yard) to recognize a horse with an attitude today, or washiness, whatever. But turning that into a quantitative decision (i.e. "this horse's hit rate should be dropped from 37% to 20%") is just beyond most mortal humans.


You may feel differently.


Regards,
Dave Schwartz

TrifectaMike
03-05-2008, 10:23 PM
In my opinion, Monte Carlo simulation (random sampling from applicable distributions) would lead to better estimates of overall probability than regression analysis or Bayesian inference... one just has to define the distributions.

TrifectaMike
03-05-2008, 10:38 PM
For example:

There are times when I see that a horse hasn't run for 8 weeks and consider it either irrelevant or a positive sign and others where I think it's a major negative. Some of that has to do with the specific trainer, the quality of the horse, whether he has been working well in between, whether he typically works well between races, whether his PPs seem to suggest he was a tired horse that needed a freshening or whether his PPs seem to suggest that something had to be wrong etc... I don't see how having statistics on recent activity and/or the trainer alone can analyze that situation as well as a human being that is familiar with the horse's entire recent record.

Maybe someone can explain that to me.

That may be true for an individual race...a specific circumstance. However, for the purpose of deriving probability estimates, it is not necessary to know that information deterministically. In fact, that information might not even be necessary.

DeanT
03-06-2008, 12:16 AM
That hurt my poor widdle brain.

When I read GT's posts I realize that my minor in Stats should have been a Major.

There are some mighty bright people on this board.

Cangamble
03-06-2008, 12:22 AM
While he didn't detail his background calculations as in the Appendix you cited, weren't the numerical sprint and route formulas that Quirin presented at the end of the book examples of the application of multiple regression?
He did give the formula for multiple regression but his example only had to do with beaten lengths and post positions at specific distances (each their own calculation). So according to him, just regression, not multiple regression.

highnote
03-06-2008, 02:36 AM
Interesting discussion. Hope to see more contributions.

Overlay
03-06-2008, 06:37 AM
He did give the formula for multiple regression but his example only had to do with beaten lengths and post positions at specific distances (each their own calculation). So according to him, just regression, not multiple regression.

I understand what you're saying, but Chapter 25 of Winning at the Races (pages 272-285) was titled specifically "Computer-Generated Systems: Multiple Regression Formulas". In it, Quirin presented numerical handicapping formulas for dirt sprints and dirt routes that resulted from a search he had done for the combination of performance measures from major handicapping categories (class, speed figures, consistency, distance, early speed, and recent action) that produced the most effective results. While he did not show the actual mathematical calculations behind the optimum combinations of factors and the weights assigned to them (as he did in the simple regression example you're referring to in Appendix B, "Regression", pages 301-306), he presented in Chapter 25 a formula for sprint races using seven weighted factors, and a separate formula for route races using five weighted factors, that assigned an overall numerical rating to horses which could be used both to rank them and to calculate their fair odds.

Overlay
03-06-2008, 06:58 AM
I should also have included "recent form" as one of the handicapping categories that Quirin surveyed, as noted in the last post.

jonnielu
03-06-2008, 07:32 AM
Jonnie,

There is no doubt in my mind that a human being can detect subtleties that a computer cannot. However, even though the human can detect those subtleties, he has no way to accurately apply them.

The important word there would be "accurately."

It is all well and good to say, "There is an outside bias at Turfway Park," but how much to adjust each horse for the impact of that bias is way outside the scope of human abilities.

IMHO, the same thing goes for viewing horseflesh. I have had enough contact with horses (years ago, in my back yard) to recognize a horse with an attitude today, or washiness, whatever. But turning that into a quantitative decision (i.e. "this horse's hit rate should be dropped from 37% to 20%") is just beyond most mortal humans.


You may feel differently.


Regards,
Dave Schwartz

Dave,

I know that I'm just a human being, that is why I inquire. How does the outside bias anywhere get determined in the first place? Computer or human?
Where does the very idea of an outside bias come from in the first place?
Is the notion of a bias sound in the first place? If there is a natural bias, is it in the track surface, or at the starting gate? Does the distance of the race make any difference, if we can establish a bias?

Once you program the computer to accurately predict the pace of a race, do the human beings involved cooperate? It seems that the stupid humans might do many various things to compensate for factors real, imagined, or believed in, and that would present a real programming problem.

Can you program for roulette?

jdl

Cangamble
03-06-2008, 07:44 AM
To be honest, I didn't reread the book yesterday; I had forgotten about that chapter and that he applied multiple regression in the book.

classhandicapper
03-06-2008, 08:54 AM
And then you still don't know if you are right or wrong.


Dave

Dave,

I actually agree with you.

The thing is, over time I have apparently gained some level of experience that allows me to make those tough intuitive decisions a lot better than I did in my early years. I seem to be right more often than I used to be and that eventually translated into better results. I can't pin it down. At some subconscious level I have a better feel for how to interpret PPs and weigh factors even though in some cases I have no conscious stats or probabilities to work with. I'd bet that other experienced horse players feel the same way.

TrifectaMike
03-06-2008, 09:43 AM
Jonnie,

There is no doubt in my mind that a human being can detect subtleties that a computer cannot. However, even though the human can detect those subtleties, he has no way to accurately apply them.

The important word there would be "accurately."

It is all well and good to say, "There is an outside bias at Turfway Park," but how much to adjust each horse for the impact of that bias is way outside the scope of human abilities.

IMHO, the same thing goes for viewing horseflesh. I have had enough contact with horses (years ago, in my back yard) to recognize a horse with an attitude today, or washiness, whatever. But turning that into a quantitative decision (i.e. "this horse's hit rate should be dropped from 37% to 20%") is just beyond most mortal humans.


You may feel differently.


Regards,
Dave Schwartz

It is all well and good to say, "There is an outside bias at Turfway Park," but how much do you adjust each horse by the impact of that bias is way outside the scope of human abilities

You don't adjust each horse by the impact of that bias. In fact, there is no basis for adjustments. Firstly, it can't be quantified...by human or by computer application (which in reality is the same).

Let's not try to solve that problem directly. Instead, let us look at resultants instead of causes. Assume you can model a horse's performance under all "experienced" conditions (environments). Variation in performance (biases of all types) is nothing more than the variance of the horse's performance function... sample from that performance probability density function sufficiently, and observe the outcomes. Do the same for each horse in the race. The effect of the biases will be observed from the sampling.
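A minimal sketch of that sampling idea. The performance distributions here are arbitrary normals with invented means and spreads, purely for illustration; a real model would fit each horse's distribution from its record.

```python
import random

def simulate(horses, n=20000, seed=1):
    # horses: list of (mean_figure, std_dev) performance distributions.
    # Each trial samples one figure per horse; the highest figure wins.
    rng = random.Random(seed)
    wins = [0] * len(horses)
    for _ in range(n):
        figs = [rng.gauss(mu, sd) for mu, sd in horses]
        wins[figs.index(max(figs))] += 1
    return [w / n for w in wins]

# Invented field: the high-variance horse (97, 12) steals some wins from the
# steadier favorite (100, 8) -- variance effects fall out of the sampling
probs = simulate([(100, 8), (97, 12), (95, 6)])
```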

TrifectaMike
03-06-2008, 09:53 AM
I understand what you're saying, but Chapter 25 of Winning at the Races (pages 272-285) was titled specifically "Computer-Generated Systems: Multiple Regression Formulas", in which Quirin presented numerical handicapping formulas for dirt sprints and dirt routes that resulted from a search he had done as to which combination of performance measures from major handicapping categories (class, speed figures, consistency, distance, early speed, and recent action) produced the most effective results. While he did not show the actual mathematical calculations that went into how he arrived at the optimum combinations of factors and the weights assigned to them (as he did in the simple regression example you're referring to in the separate appendix on pages 301-306 (Appendix B)("Regression")), he presented in Chapter 25 a formula for sprint races using seven weighted factors, and a separate formula for route races using five weighted factors, that assigned an overall numerical rating to horses that could be used to both rank them and calculate their fair odds.

Ranking horses by regression analysis, and then extending those rankings to calculate odds, is inherently flawed and entirely subjective... in other words, nearly useless.

rrbauer
03-06-2008, 09:55 AM
Ranking horses by regression analysis, and then extending those rankings to calculate odds, is inherently flawed and entirely subjective... in other words, nearly useless.

You know this because.........?

TrifectaMike
03-06-2008, 10:12 AM
You know this because.........?

The dependent variable utilized in the regression analysis is beaten lengths. The assumption is that even money horses should win by larger margins than 2-1 horses, and so on. Beaten lengths is not a factor in estimating win probabilities. Winning or losing is binary...not a linear relationship.
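That binary-versus-linear distinction is the whole reason the logit model exists: the 0/1 outcome is passed through a logistic curve rather than fit as a straight line on beaten lengths. A minimal one-factor fit by gradient descent, on made-up data (the data and learning settings are placeholders, not a real sample):

```python
import math

def fit_logistic(xs, ys, lr=0.5, steps=4000):
    # ys are binary (1 = won, 0 = lost); fit p = 1/(1+exp(-(b0 + b1*x)))
    # by gradient descent on the log-likelihood
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Made-up sample: x = speed-figure edge, y = won or not (deliberately noisy)
b0, b1 = fit_logistic([-2, -1, 0, 1, 2, 3], [0, 0, 1, 0, 1, 1])
```

The fitted b1 comes out positive, meaning a bigger edge raises the modeled win probability, without ever pretending the outcome itself is a continuous quantity.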

Dave Schwartz
03-06-2008, 10:19 AM
Once you program the computer to accurately predict the pace of a race, do the human beings involved co-operate? It seems that if the stupid humans might do many various things to compensate for factors real, imagined, or believed in, that would present a real programming problem.

That would depend upon your definition of "accurate." It can be done to some degree of accuracy.



The real point is that computers and people both have advantages and disadvantages.

Computers have the advantage of being able to take a perceived edge and translate it into a numeric one. Humans lack that ability.

Humans have the ability to see subtleties that the computer cannot.

It is easier for a human to be "scientific" than for a computer to be "artful."

It is more difficult for a human to organize lots of factors than it is for a computer.

One can go on and on with the differences. Things that take a high form of artfulness are impossible for a computer. They are almost as impossible for a human.

My belief is that you can teach just about anyone to paint but almost impossible to teach them to be Picasso. If we all had to be Tom Brohamers, most would be (are?) disappointed.


Regards,
Dave Schwartz

rrbauer
03-06-2008, 10:46 AM
The dependent variable utilized in the regression analysis is beaten lengths. The assumption is that even money horses should win by larger margins than 2-1 horses, and so on. Beaten lengths is not a factor in estimating win probabilities. Winning or losing is binary...not a linear relationship.

I don't have the formula in front of me. But I believe that what Quirin was interested in was making money via the formula (at least that's what he told me...about three years after the book was published), and he indicated that it was "marginally" profitable when it was tested. Today, I don't know. What I do know is that there are a jillion ways to develop algorithms that can be used to determine a benchmark odds line, and the test is not whether they are binary or linear, but whether they will help you make money.

The thing that continuously gets lost in these discussions is the fact that all of these "probabilities" are ESTIMATES. They are not true probabilities. They are derived from sampled data that may/may not be valid when used to make predictions about the population based upon them.

As to the "computer can do" versus the "human can do" arguments, it's a "maybe it can" versus a "maybe it can't" situation. I have a computer. I have a brain. When we go to the track to bet against each other, you don't know what I know; and, I don't know what you know. But, I know what I know and whether it comes from my computer or from my brain, if I get my money out that is a binary fact; but, the amount I get out is linear!

rokitman
03-06-2008, 10:47 AM
That would depend upon your definition of "accurate." It can be done to some degree of accuracy.



The real point is that computers and people both have advantages and disadvantages.

Computers have the advantage of being able to take a perceived edge and translate it into a numeric one. Humans lack that ability.

Humans have the ability to see subtleties that the computer cannot.

It is easier for a human to be "scientific" than for a computer to be "artful."

It is more difficult for a human to organize lots of factors than it is for a computer.

One can go on and on with the differences. Things that take a high form of artfulness are impossible for a computer. They are almost as impossible for a human.

My belief is that you can teach just about anyone to paint but almost impossible to teach them to be Picasso. If we all had to be Tom Brohamers, most would be (are?) disappointed.


Regards,
Dave Schwartz

Isn't a decision to sell software/data a decision to sell paint brushes to the blind? :D

DeanT
03-06-2008, 10:56 AM
Although this discussion has come up before, I agree with Dan and Dave: computers standardize things nicely and put them in a neat little picture for you. To me, that saves time and gives one confidence that his contender plays and odds lines are good. Confidence in a play is important, imo.

Doing what humans do off that - double checking the play with the nuances in the form, paddock inspection and bet sizing is the rest of the picture, of course.

I think if you have all those down, you can become a winner. In the simulcast environment, if you can handicap horseflesh via the PPs on TV, I think computers are the most important part of handicapping, since a program can yield dozens of plays at dozens of tracks a day and give you a sample to play that cannot be matched with traditional handicapping - simply because you cannot properly handicap that many tracks in a day.

Dave Schwartz
03-06-2008, 11:05 AM
Isn't a decision to sell software/data a decision to sell paint brushes to the blind?


No. Rather it is offering a tool that can produce a "good picture" with a reasonable degree of technical expertise but without the need to have a high degree of artistry.

TrifectaMike
03-06-2008, 11:18 AM
The thing that continuously gets lost in these discussions is the fact that all of these "probabilities" are ESTIMATES. They are not true probabilities. They are derived from sampled data that may/may not be valid when used to make predictions about the population based upon them.

As to the "computer can do" versus the "human can do" arguments, it's a "maybe it can" versus a "maybe it can't" situation. I have a computer. I have a brain. When we go to the track to bet against each other, you don't know what I know; and, I don't know what you know. But, I know what I know and whether it comes from my computer or from my brain, if I get my money out that is a binary fact; but, the amount I get out is linear!

True, probabilities are estimates. But not all estimates are equally probable.

Wagering constructs may or may not be linear. A related subject, but not a necessary condition in estimating probabilities.

jonnielu
03-06-2008, 11:38 AM
That would depend upon your definition of "accurate." It can be done to some degree of accuracy.



The real point is that computers and people both have advantages and disadvantages.

Computers have the advantage of being able to take a perceived edge and translate it into a numeric one. Humans lack that ability.

Humans have the ability to see subtleties that the computer cannot.

It is easier for a human to be "scientific" than for a computer to be "artful."

It is more difficult for a human to organize lots of factors than it is for a computer.

One can go on and on with the differences. Things that take a high form of artfulness are impossible for a computer. They are almost as impossible for a human.

My belief is that you can teach just about anyone to paint but almost impossible to teach them to be Picasso. If we all had to be Tom Brohamers, most would be (are?) disappointed.


Regards,
Dave Schwartz

Dave,

Maybe you would agree that the trick then is in, combining the science with the art, with a flexible and reasonably accurate line of demarcation.

jdl

Jeff P
03-06-2008, 11:51 AM
FWIW, I've found that adding a degree of my own artfulness to algorithms based primarily on regression gets better results (not just past results but results going forward) than the same algorithm based on mainstream ideas alone. If I can give the algorithm a "twist"... make it emphasize something the public doesn't quite get... then I've come a long way towards boosting roi.

Examples:

FactorA = IVForFactor1 x IVForFactor2
FactorB = IVForFactor1 x IVForFactor2 x IVForFactor3

In the first example FactorA represents a value based on impact values for Factor1 and Factor2. If I let Factor1 and Factor2 be something mainstream predictive - like last race speed figure and morning line odds, the top ranked FactorA horse in the field might have a significant chance to win. But betting all top ranked FactorA horses everywhere is pointless because FactorA is likely to have a horrible roi. Why? Because FactorA is using components that are mainstream to the betting public. Put another way: A glance at the toteboard is very likely to tell you who the top FactorA horse in the race is.

What happens if I throw in a "twist?" In the second example let's say Factor3 represents something not mainstream uncovered by my own R&D... something that consistently throws the public off.

What ends up happening is that FactorB is every bit as predictive as FactorA, maybe even a little bit more so... but FactorB provides a much better roi for my final mix when I include FactorB in the final mix instead of FactorA.

FinalMix = IVForFactorB x IVForFactorC x IVForFactorD x IVForFactorE...


The real trick, IMHO, is to use (or create) factors that share two traits:

1. factor is causal and predictive
2. factor is overlooked by the public as they bet the races

I've discovered there is no magic bullet when it comes to assigning weight (importance) to factors in the mix. The best way (the only way for me) to correctly weight a factor in the final mix is through trial and error... use large samples and test different weights (importance) for the factors used in the mix... confirm the results going forward... and use what works best.
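The multiplied-impact-value mix described above can be sketched in a few lines. All factor IVs below are invented placeholders, not Jeff's actual factors or values.

```python
import math

# Each horse: a tuple of impact values (IV = win rate of horses with the
# factor divided by the overall win rate); numbers invented for illustration
field = [
    (1.4, 1.2, 0.9),
    (0.8, 1.0, 1.3),
    (1.1, 0.7, 1.0),
]

def final_mix(ivs_per_horse):
    # Multiply each horse's IVs, then normalize so the field sums to 1
    raw = [math.prod(ivs) for ivs in ivs_per_horse]
    total = sum(raw)
    return [r / total for r in raw]

probs = final_mix(field)
```

Weighting a factor then amounts to raising its IV to a power before multiplying, which is the trial-and-error knob described above.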

I've also come to believe that if a human being can envision it, and describe its components - then somebody somewhere can program a computer to do it.


-jp

.

traveler
03-06-2008, 11:52 AM
I think GameTheory hit the nail on the head "but you've got to beat the crowd, not just pick reasonable probabilities..."
My understanding is that a Bayesian system uses new information to revise probabilities based on previous information. So we create a probability based on one factor, and as we add new factors (information) we revise our probabilities, our "degree of belief". Given that over a large sample the public by and large creates the perfect odds line (because, in general, they have all the information available), if we keep adding factors we will arrive at the perfect odds line too and lose the track take over and over. We as handicappers have to assume the semi-strong version of Wall Street's random-walk theory is correct, in that not all the information is available to everyone at all times, and look for those opportunities where the public overestimates or underestimates the value of some information. That's why every race isn't playable.
I have no idea about the math of either approach, I just know I want to be different from the public often enough.
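In odds form, that revision is one multiplication per factor. A sketch, using the 17% last-out-winner base rate from the opening post and an invented likelihood ratio of 1.5 for a trainer factor:

```python
def bayes_update(prior_p, likelihood_ratio):
    # Bayes' theorem in odds form: posterior odds = prior odds * LR
    prior_odds = prior_p / (1.0 - prior_p)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

p = 0.17                  # base rate for a last-out winner (opening post)
p = bayes_update(p, 1.5)  # fold in a trainer factor with an invented LR
```

The catch is exactly the one raised above: multiplying likelihood ratios treats the factors as independent, and to the extent the public has already priced them in, the output just converges on the public line minus takeout.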
BTW, a good book on risk: "Against the Gods: The Remarkable Story of Risk".

Cangamble
03-06-2008, 12:16 PM
When you think about it, there isn't anything subjective that a horseplayer can come up with that can't be put into a formula.
This thread, regarding coming up with the perfect formula, boils down to coming up with the best odds line you possibly can.
The best a phenomenal computer program will come up with is an extraordinary odds line. Almost every horse in every race has a chance to win, though some may have a one in one hundred and seventy-five chance.
This literally means that if a race is run 175 times, this horse will miraculously win once based on all the information available to you before the race.
The key to winning at the game is to constantly bet on overlays.

TrifectaMike
03-06-2008, 12:49 PM
When you think about it, there isn't anything subjective that a horseplayer can come up with that can't be put into a formula.
This thread, regarding coming up with the perfect formula, boils down to coming up with the best odds line you possibly can.
The best a phenomenal computer program will come up with is an extraordinary odds line. Almost every horse in every race has a chance to win, though some may have a one in one hundred and seventy-five chance.
This literally means that if a race is run 175 times, this horse will miraculously win once based on all the information available to you before the race.
The key to winning at the game is to constantly bet on overlays.

This literally means that if a race is run 175 times, this horse will miraculously win once based on all the information available to you before the race.

Not true! You would have to run that race many more times than 175 for that statistic to become evident.

Minimizing the number of factors while maintaining integrity will yield an extraordinary odds line.
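For what it's worth, the disagreement is checkable with the binomial: a true 1-in-175 shot has only about a 63% chance of winning at least once in 175 independent runnings, so the win is far from guaranteed in that span.

```python
p_win = 1.0 / 175.0
# Chance the longshot wins at least once in 175 independent runnings
p_at_least_one = 1.0 - (1.0 - p_win) ** 175   # roughly 1 - 1/e, not certainty
```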

Tom
03-06-2008, 12:50 PM
Excellent post, Jeff. The not-widely-known component is a key.
Didn't you once post a way to measure a factor by combining its ROI and win% to get a value? I seem to remember you talking about it but couldn't find it in a search.

An HTR user once came up with plays that used things the public would understand, but liked to throw in a factor to make them look ugly and keep people off the horses - something that wouldn't hurt the effectiveness of the other factors.

asH
03-06-2008, 01:44 PM
No. Rather it is offering a tool that can produce a "good picture" with a reasonable degree of technical expertise but without the need to have a high degree of artistry.

like Beyer numbers..........

Dave Schwartz
03-06-2008, 02:05 PM
more like an expectation of what the Beyer numbers can produce.

Beyer numbers are "just numbers."

Computer output is actually "who to bet and for how much."

Dave Schwartz
03-06-2008, 02:20 PM
Maybe you would agree that the trick, then, is in combining the science with the art, with a flexible and reasonably accurate line of demarcation.

I could not agree to that statement, at least as it applies to me. I use absolutely zero artfulness in my play and I would say that very few (if any) of our users do.

It might certainly apply to you or others.



Lately I have been working on something that I call "decision enhancement." Think of it as "decision making on steroids."

Specifically, consider your top 3 contenders/picks/selections/ranks or whatever else you might call them. They should have an expectation of some hit rate - 50%, 60%, 70% and some ROI.

If you wager on those three horses, you average 20% winners on each horse. If you pick one at random, or you take the "best horse" you have little or no expectation of being profitable.

Now, imagine if you could take those (say) 60% winners that are spread between 3 horses and concentrate 1/2 to 2/3 of the winners into only 1/3 of the horses.

Let this sink in... in 100 races, you have 300 potential wagers. The 100 wagers you choose with the correct process will produce around 35 wins with a play in every race and the other two will split up the remaining 25 wins in 200 starts.



Put another way, suppose those 3 horses combined for a $1.80 (i.e. -10% ROI). In each race you could select one of those horses such that the selected horse was (say) $2.20 and the other two were $1.20.

In other words, this technology would consistently make a better final decision.



Thus far, this has proven to be exceptionally powerful with favorites, though not usually profitable. The favorites I am playing come in around $1.95 to $2.05, but more important are the favorites I don't play... they generally lose around 35%! Talk about "vulnerable favorites"!


BTW, no art whatsoever.



Dave
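For what it's worth, the arithmetic above can be checked in a few lines of Python. This is a sketch using the illustrative numbers from the post, not Dave's actual software; note that the quoted $2.20/$1.20 pair doesn't quite reconcile with the $1.80 combined return, so the last step derives the implied average instead.

```python
# Back-of-the-envelope check of the "decision enhancement" arithmetic above.
# All figures are the illustrative numbers from the post, per $2 wagered.
races = 100
combined_wins = 60       # the top-3 contenders win 60% of races combined
chosen_wins = 35         # wins captured by the one horse selected per race
passed_wins = combined_wins - chosen_wins

chosen_hit_rate = chosen_wins / races        # 100 bets, one per race
passed_hit_rate = passed_wins / (2 * races)  # the 200 bets not selected

print(f"selected horse hit rate: {chosen_hit_rate:.0%}")
print(f"non-selected hit rate:   {passed_hit_rate:.1%}")

# If all 300 bets return $1.80 per $2 (-10% ROI) and the 100 selected bets
# return $2.20 (+10%), the 200 passed bets must average this much per $2:
total_return = 300 * 1.80
selected_return = 100 * 2.20
other_avg = (total_return - selected_return) / 200
print(f"implied average return on passed bets: ${other_avg:.2f} per $2")
```

The implied average on the passed bets works out to $1.60 per $2 (-20% ROI) rather than the $1.20 quoted, but the direction of the effect is the same: concentrating the wins turns a -10% pool into a +10% subset.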

asH
03-06-2008, 02:20 PM
black box output......

asH
03-06-2008, 02:28 PM
Thus far, this has proven to be exceptionally powerful with favorites, though not usually profitable. The favorites I am playing come in around $1.95 to $2.05, but more important are the favorites I don't play... they generally lose around 35%! Talk about "vulnerable favorites"!


BTW, no art whatsoever.



Dave


No, Dave, I disagree... the art is making it work for profits.

Perhaps you could add some of that Bayesian logic to your algorithms... which, by the way, is how one would apply it.

Dave Schwartz
03-06-2008, 04:04 PM
Perhaps you could add some of that Bayesian logic to your algorithms... which, by the way, is how one would apply it.

Oooooh, you are the intelligent one. ;)

Now you see why this thread interested me so much.

GameTheory
03-06-2008, 04:10 PM
BTW, a good book on risk is "Against the Gods: The Remarkable Story of Risk".
Great book -- author is Peter Bernstein.

asH
03-06-2008, 05:32 PM
Oooooh, you are the intelligent one. ;)

Now you see why this thread interested me so much.

http://www.pr-owl.org/basics/bn.php

Bayesian Networks

http://www.pr-owl.org/images/bn_wisepilot_small.jpg (http://www.pr-owl.org/images/bn_wisepilot.jpg) What is a BN?

Bayesian networks provide a means of parsimoniously expressing joint probability distributions over many interrelated hypotheses. A Bayesian network consists of a directed acyclic graph (DAG) (http://en.wikipedia.org/wiki/Directed_acyclic_graph) and a set of local distributions. Each node in the graph represents a random variable (http://en.wikipedia.org/wiki/Random_variable). A random variable denotes an attribute, feature, or hypothesis about which we may be uncertain. Each random variable has a set of mutually exclusive and collectively exhaustive possible values. That is, exactly one of the possible values is or will be the actual value, and we are uncertain about which one it is. The graph represents direct qualitative dependence relationships; the local distributions represent quantitative information about the strength of those dependencies. The graph and the local distributions together represent a joint distribution over the random variables denoted by the nodes of the graph.
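For the curious, here is a toy Python illustration of the factorization idea in the passage above, using two made-up handicapping nodes (the 17% last-out-winner rate echoes robert99's opening example; every other probability is invented purely to show the mechanics):

```python
# Toy two-parent Bayesian network: Fit -> Win <- LastOutWin.
# All probabilities below are invented for illustration only.
P_fit = {True: 0.6, False: 0.4}           # P(horse is fit today)
P_lw = {True: 0.17, False: 0.83}          # P(won last time out)
# Local distribution P(Win | Fit, LastOutWin)
P_win = {(True, True): 0.30, (True, False): 0.15,
         (False, True): 0.10, (False, False): 0.05}

def joint(fit, lw, win):
    """BN factorization: product of the local distributions."""
    p = P_win[(fit, lw)]
    return P_fit[fit] * P_lw[lw] * (p if win else 1 - p)

# Marginal P(Win): sum the joint over the unobserved parent nodes.
p_win = sum(joint(f, l, True) for f in (True, False) for l in (True, False))

# Posterior P(Fit | Win): Bayes' theorem via the same joint.
p_fit_given_win = sum(joint(True, l, True) for l in (True, False)) / p_win

print(f"P(win) = {p_win:.4f}")
print(f"P(fit | win) = {p_fit_given_win:.3f}")
```

The DAG plus the local tables determine the whole joint distribution, so both the marginal win probability and any posterior fall out of the same factorization.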

Cratos
03-06-2008, 06:06 PM
http://www.pr-owl.org/basics/bn.php

Bayesian Networks

http://www.pr-owl.org/images/bn_wisepilot_small.jpg (http://www.pr-owl.org/images/bn_wisepilot.jpg) What is a BN?

Bayesian networks provide a means of parsimoniously expressing joint probability distributions over many interrelated hypotheses. A Bayesian network consists of a directed acyclic graph (DAG) (http://en.wikipedia.org/wiki/Directed_acyclic_graph) and a set of local distributions. Each node in the graph represents a random variable (http://en.wikipedia.org/wiki/Random_variable). A random variable denotes an attribute, feature, or hypothesis about which we may be uncertain. Each random variable has a set of mutually exclusive and collectively exhaustive possible values. That is, exactly one of the possible values is or will be the actual value, and we are uncertain about which one it is. The graph represents direct qualitative dependence relationships; the local distributions represent quantitative information about the strength of those dependencies. The graph and the local distributions together represent a joint distribution over the random variables denoted by the nodes of the graph.

Very neat, I wish BNs were around (and maybe they were, but I didn’t know of them) in the seventies when I was studying econometrics. However the sophisticated and powerful PC definitely wasn’t around. Thanks again for presenting the post.

Overlay
03-06-2008, 06:08 PM
The dependent variable utilized in the regression analysis is beaten lengths. The assumption is that even money horses should win by larger margins than 2-1 horses, and so on. Beaten lengths is not a factor in estimating win probabilities. Winning or losing is binary...not a linear relationship.

I take it that your comments are with reference to Quirin's two-factor regression analysis in Appendix B (as I mentioned). His discussion of multiple regression formulas in Chapter 25 was the portion that included a discussion of deriving true winning probabilities from multiple regression ratings.

asH
03-06-2008, 06:54 PM
Very neat, I wish BNs were around (and maybe they were, but I didn’t know of them) in the seventies when I was studying econometrics. However the sophisticated and powerful PC definitely wasn’t around. Thanks again for presenting the post.


Handicappers inherently follow Bayesian logic when they watch the tote board in MSW races... as a poster already stated, it's making better decisions based on new (good) information.

traveler
03-06-2008, 08:28 PM
Great book -- author is Peter Bernstein.
I barely passed algebra in high school, and this sits on my nightstand. Go figure. Not that I understand it, but it's interesting as all hell.

Gambleonclaimers
03-06-2008, 10:11 PM
Almost every horse in every race has a chance to win
The key to winning at the game is to constantly bet on overlays.
I feel this is where we can get bogged down, because when betting on overlays you are giving almost every horse in a race some chance of winning, whether it is a 2% or a 50% chance, yet you are only going to get the odds that are on the tote board at post time. I could be subjective and give my top three contenders the same odds line in every race (1/9 on choice 1, 1/5 on choice 2, and 2/5 on choice 3), and I can guarantee I will have plays in 90% of all races. Can that be successful? Yes, if I hit at 25-33% and my tote odds aren't less than 2-1. Ridiculous, but successful. So betting overlays is no more than a notion of perceived value: it may be an overlay to you but an underlay to me, so who is right? If you're able to pick winners for the types of bets you make at a given percentage, you can beat this game; the percentages on actual plays obviously vary with the type of play. Do overlays really exist? I guess you could argue both ways, but I choose just to play for value.

asH
03-06-2008, 10:14 PM
http://www.fuellesspower.com/Light%20Bulb%20Edison.gif
Thanks for the thread Robert99

jfdinneen
03-07-2008, 01:18 PM
Bill Benter adapted the logit approach to basic handicapping factors, and even after 5 years of trial and error it did not work until they added in the public odds estimation - that turned out to be the highest-weighted factor. Another vital factor was the total number of races a horse had had - the logic of why a currently good horse with a lot of races was worth less than a currently good, more lightly raced horse could not be explained, but it had to be included in the model. As Robert identified, one of the significant variables that surfaced from Benter's initial analysis was 'number of previous runs' - the lower the better. This factor is easily explained in terms of sampling sizes. The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see Law of Large Numbers). By contrast, horses that have five or fewer runs are more likely to spring a surprise (either positive or negative) - think maidens second time out! This also explains why, in most jurisdictions, horses are required to run at least three races before they are given an official handicap rating.

John
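A minimal sketch of the conditional-logit form Benter's team used, with hypothetical weights and factor values (the real model had many more factors, all fitted to data rather than hand-picked):

```python
import math

# Hypothetical conditional-logit sketch: each horse gets a linear score over
# its factors; win probabilities are a softmax over the horses in the race.
# Per horse: (model rating, log of public win probability, prior runs)
horses = {
    "A": (1.2, math.log(0.40), 30),
    "B": (1.0, math.log(0.35), 6),
    "C": (0.4, math.log(0.25), 18),
}
w_rating, w_public, w_runs = 0.5, 0.8, -0.02  # fewer prior runs is a plus

scores = {h: w_rating * r + w_public * lp + w_runs * n
          for h, (r, lp, n) in horses.items()}
z = sum(math.exp(s) for s in scores.values())
probs = {h: math.exp(s) / z for h, s in scores.items()}

for h, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.3f}")
```

With these toy weights, lightly raced horse B edges past A despite a lower public probability, which is the "number of previous runs" effect in miniature.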

asH
03-07-2008, 01:48 PM
As Robert identified, one of the significant variables that surfaced from Benter's initial analysis was 'number of previous runs' - the lower the better. This factor is easily explained in terms of sampling sizes. The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see Law of Large Numbers). By contrast, horses that have five or fewer runs are more likely to spring a surprise (either positive or negative) - think maidens second time out! This also explains why, in most jurisdictions, horses are required to run at least three races before they are given an official handicap rating.

John


Good real-time example - Aqu 2nd race: Alzehba ran the same as his first... don't let the infusion of speed fool you... needs a drop in class, 50K?


3rd Take The Bluff- overlay

classhandicapper
03-07-2008, 05:31 PM
As Robert identified, one of the significant variables that surfaced from Benter's initial analysis was 'number of previous runs' - the lower the better. This factor is easily explained in terms of sampling sizes. The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see Law of Large Numbers). By contrast, horses that have five or fewer runs are more likely to spring a surprise (either positive or negative) - think maidens second time out! This also explains why, in most jurisdictions, horses are required to run at least three races before they are given an official handicap rating.

John

Most lightly raced horses are young. Most young horses are still developing physically. They are also getting more experience, seasoning, in better shape etc.... So they tend to improve.

When an older (4yo) horse is lightly raced, even though it may be fully developed, it is still often learning, getting seasoning, and getting in better racing shape as time passes. One other difference is that the typical lightly raced older horse has had physical problems that could recur. That's why it is lightly raced.

Older horses that have had many starts have already learned everything they need to learn, have peaked, and have had a lot of wear and tear put on their bodies. So they are less likely to improve. If anything, they start becoming more likely to decline. Even heavily raced younger horses can start feeling the wear and tear.

Cratos
03-07-2008, 07:00 PM
The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see Law of Large Numbers). John

What you appear to be alluding to is the frequency curve, which I am skeptical about when it comes to horseracing. I believe when you speak of ability, you should also speak of placement. If a horse is placed at the wrong distance, the wrong surface, or the wrong class, then its ability may never come to fruition.

Examples of this are Aldebaran, who was so-so early in his career on the turf, but late in his career of 25 races he won 6 of his last 9, which was 75% of his lifetime wins. Another example would be Cigar, who was also tried often on the turf early in his career, but when his connections moved him to dirt he became at one point in his career virtually unbeatable.

However from an odds point of view a lightly raced horse might have greater odds (hence greater value), but that is because of the uncertainty coming from not having a long race record.

Dave Schwartz
03-07-2008, 09:37 PM
I am looking for a stats package.

Simple is better as I would not understand something as high-powered as SPSS.

Anyone got any suggestions?


Regards,
Dave Schwartz

jfdinneen
03-07-2008, 10:03 PM
Dave,

If you want an Excel Add-In, I can recommend StatTools from Palisade. Though not the most comprehensive package available, it generates reasonable reports by selecting the default options.

That said, the most important statistical tool I have in my armory is a Monte Carlo add-in.

John
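For anyone without an add-in, the core Monte Carlo idea can be sketched in a few lines of stand-alone Python (my own assumptions: each horse's performance is normally distributed around a hypothetical rating):

```python
import random

# Monte Carlo race simulation: sample each horse's figure around its rating,
# count who tops the sample.  Ratings and sigma are hypothetical.
ratings = {"A": 100, "B": 97, "C": 95}   # made-up speed figures
sigma = 5                                # race-to-race variability
trials = 20_000

random.seed(1)  # fixed seed for reproducibility
wins = {h: 0 for h in ratings}
for _ in range(trials):
    perf = {h: random.gauss(mu, sigma) for h, mu in ratings.items()}
    wins[max(perf, key=perf.get)] += 1   # winner of this simulated race

for h in ratings:
    print(f"{h}: {wins[h] / trials:.3f}")
```

The simulated win frequencies are the model's odds line; the same machinery extends naturally to place/show and exotic probabilities, which are awkward to get in closed form.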

Dave Schwartz
03-07-2008, 10:39 PM
John,

That was a great lead! Thank you.

I have downloaded the trial and will play with it next week.


Dave

DanG
03-08-2008, 06:13 AM
I wish I had something really clever to add, but I just wanted to again thank everyone for contributing to a great thread. The best series of thoughts on this subject I’ve read so far. :jump:

(And I’ve searched this forum...on this very topic more than I’ll admit!) :eek:

ryesteve
03-08-2008, 09:42 AM
I am looking for a stats package.

Simple is better as I would not understand something as high-powered as SPSS.

I can't believe that someone who understands HSH would have a problem with SPSS :D

jonnielu
03-08-2008, 10:41 AM
I could not agree to that statement, at least as it applies to me. I use absolutely zero artfulness in my play and I would say that very few (if any) of our users do.

It might certainly apply to you or others.



Lately I have been working on something that I call "decision enhancement." Think of it as "decision making on steroids."

Specifically, consider your top 3 contenders/picks/selections/ranks or whatever else you might call them. They should have an expectation of some hit rate - 50%, 60%, 70% and some ROI.

Okay, and I'm not picking on you, I'm just throwing up thoughts. We could arrive at three horses with a 60% - 70% hit rate by considering the top three ML. Granted, we would be missing the 8 -1 outsider that the computer might swap into this group, depending upon the programming. But, does that hurt us at this point?

If you wager on those three horses, you average 20% winners on each horse. If you pick one at random, or you take the "best horse" you have little or no expectation of being profitable.

I can agree, if taking one at random, or even at highest odds, or lowest odds... if you take that as denoting "best Horse".

Now, imagine if you could take those (say) 60% winners that are spread between 3 horses and concentrate 1/2 to 2/3 of the winners into only 1/3 of the horses.

Let this sink in... in 100 races, you have 300 potential wagers. The 100 wagers you choose with the correct process will produce around 35 wins with a play in every race and the other two will split up the remaining 25 wins in 200 starts.

It sunk in a long time ago: bet the right horse 1/3 of the time at an average of 3-1, and you have it made.



Put another way, suppose those 3 horses combined for a $1.80 (i.e. -10% ROI). In each race you could select one of those horses such that the selected horse was (say) $2.20 and the other two were $1.20.

In other words, this technology would consistently make a better final decision.



Thus far, this has proven to be exceptionally powerful with favorites, though not usually profitable. The favorites I am playing come in around $1.95 to $2.05, but more important are the favorites I don't play... they generally lose around 35%! Talk about "vulnerable favorites"!


BTW, no art whatsoever.



Dave

Maybe if you could program the machine with just a touch of art, it could learn to do what the human often can not, due to emotion. Pass those races where the 3 contenders are most equal, and bet the outsider when the favorite is dead. I believe that the dead favorite is programmable, within an artistic understanding of past performances.

It is the art of the outsider that you are missing. Think about the factors that bring him to the gate today, able and ready to win this race, although the calculations would say no, because that horse is resistant to measure, at least as we know measure. Remember that there is frequently someone who is measuring that horse accurately, and that is why he is going into this gate today.

jdl

Dave Schwartz
03-08-2008, 11:57 AM
I can't believe that someone who understands HSH would have a problem with SPSS

LOL - A.I. is so much easier than statistics.

I am so tempted to throw 50 hours of programming at a genetic algorithm to produce the weights rather than actually learning something new, but I know I need to bite the bullet and do it.


Dave

TrifectaMike
03-08-2008, 12:37 PM
As Robert identified, one of the significant variables that surfaced from Benter's initial analysis was 'number of previous runs' - the lower the better. This factor is easily explained in terms of sampling sizes. The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see Law of Large Numbers). By contrast, horses that have five or fewer runs are more likely to spring a surprise (either positive or negative) - think maidens second time out! This also explains why, in most jurisdictions, horses are required to run at least three races before they are given an official handicap rating.

John

Sample size is a function of confidence levels and standard deviation, not averages.

And the law of large numbers hardly comes into effect when observing a horse's entire past history... the number of observations is much too small. In fact, it's not a horse's entire record that highly correlates with today's expected performance; rather, it lies within its last four races... independent of whether a horse is lightly raced or has had 40 starts.

Dave Schwartz
03-08-2008, 12:51 PM
Maybe if you could program the machine with just a touch of art, it could learn to do what the human often can not, due to emotion. Pass those races where the 3 contenders are most equal, and bet the outsider when the favorite is dead. I believe that the dead favorite is programmable, within an artistic understanding of past performances.

It is the art of the outsider that you are missing. Think about the factors that bring him to the gate today, able and ready to win this race, although the calculations would say no, because that horse is resistant to measure, at least as we know measure. Remember that there is frequently someone who is measuring that horse accurately, and that is why he is going into this gate today.


Jonnie,

Okay, so here we go again.

Gotta do it your way, right? Otherwise, it just does not work.

Listen, you are certainly entitled to hanging your hat on your ability to be artful. I applaud you for it. I am sure that your handicapping skills are far beyond mine.

However, using art does not have to be everyone's choice.

The fact that you are successful with your art is not an indication that someone else will be successful with theirs.

It is also not an indication that someone who does not use any art cannot be successful.


You have chosen to step into a thread about applying regression, a clearly scientific approach, to preach that artfulness is the only way. I would suggest that your continuing argument is "off-topic," at the very least.


Regards,
Dave Schwartz

jonnielu
03-08-2008, 01:43 PM
Jonnie,

Okay, so here we go again.

Gotta do it your way, right? Otherwise, it just does not work.

Listen, you are certainly entitled to hanging your hat on your ability to be artful. I applaud you for it. I am sure that your handicapping skills are far beyond mine.

However, using art does not have to be everyone's choice.

The fact that you are successful with your art is not an indication that someone else will be successful with theirs.

It is also not an indication that someone who does not use any art cannot be successful.


You have chosen to step into a thread about applying regression, a clearly scientific approach, to preach that artfulness is the only way. I would suggest that your continuing argument is "off-topic," at the very least.


Regards,
Dave Schwartz

Gee Dave,

I didn't say any of that, but let me apologize for confusing your apparent need to be right with a desire to explore ideas.
jdl

robert99
03-08-2008, 04:05 PM
Thanks once again for all the replies.
We've reached over seven pages and still on topic with just minor skirmishes into disagreement.

The logit model that Bill Benter's team arrived at after 10 years of effort was accurate at all price ranges to within a standard error of only 1.4 over 3,198 races. Bolton et al. could achieve 3% profit in the USA and 11% profit in Hong Kong with the model being used on new races as a "dumb" machine. Who knows what they might have achieved, good or bad, if they had used some added opinion at race time?

The 10 years' work involved some huge amounts of hard graft, but at times there was inspiration to change things around, based on experience and creativity - is that not art? Man decided the parameters to test, not the machine. So, as possibly always, it is a combination of all sorts of inputs (stats, form, trial and error, model development, ingenuity, dedication, etc.) put together to achieve an intelligent purpose.

If I had that model and two horses were good things but closely tied on the numbers, and I saw one was just that bit (no numbers) fitter than last time out then I would go for that one and lay the other - is that art or common sense? Why not use all the tools you have?

Cratos
03-08-2008, 04:16 PM
Thanks once again for all the replies.
We've reached over seven pages and still on topic with just minor skirmishes into disagreement.

The logit model that Bill Benter's team arrived at after 10 years of effort was accurate at all price ranges to within a standard error of only 1.4 over 3,198 races. Bolton et al. could achieve 3% profit in the USA and 11% profit in Hong Kong with the model being used on new races as a "dumb" machine. Who knows what they might have achieved, good or bad, if they had used some added opinion at race time?

The 10 years' work involved some huge amounts of hard graft, but at times there was inspiration to change things around, based on experience and creativity - is that not art? Man decided the parameters to test, not the machine. So, as possibly always, it is a combination of all sorts of inputs (stats, form, trial and error, model development, ingenuity, dedication, etc.) put together to achieve an intelligent purpose.

If I had that model and two horses were good things but closely tied on the numbers, and I saw one was just that bit (no numbers) fitter than last time out then I would go for that one and lay the other - is that art or common sense? Why not use all the tools you have?

I believe it was the late Phil Bull who said that “handicapping is both an art and a science.” I also believe this is where the “geeks” go awry, they make handicapping too much of a “numbers crunching” game.

Dave Schwartz
03-08-2008, 04:48 PM
Jonnie,

I thought that was what you said, but apparently I misinterpreted your meaning.

I apologize for the misunderstanding.

Regards,
Dave Schwartz

TrifectaMike
03-08-2008, 04:55 PM
The tangential discussion of art and science is amusing. In fact, so amusing I hope it continues.

Forecasting by any means (including regression) is an art that exploits science in an effort to identify future events or conditions.

TrifectaMike
03-08-2008, 05:04 PM
LOL - A.I. is so much easier than statistics.

I am so tempted to throw 50 hours of programming at a genetic algorithm to produce the weights rather than actually learning something new, but I know I need to bite the bullet and do it.


Dave

In the manner in which you described your project of betting on either fav 1, 2, or 3, your time might be better spent reviewing some game theory.

jfdinneen
03-08-2008, 05:18 PM
Mike,

Belated welcome to the board!

In the context of handicapping, a good starting point is to ask the Bill Benter question (fundamental question of handicapping):
What additional variables (if any) explain a significant proportion of the variance in results that is not already accounted for by the public odds (Wisdom of Crowds)?

As Benter proved in real-world profits, the answer to that question helps identify which horses individually (Win / Place / Show) or collectively (Exotics) are overlays. As you already know, with any prediction process, how we arrive at the correct answer is irrelevant (science, art, magic, or other). However, answering the related question (which variables are most highly correlated with today's race result?) is an interesting intellectual exercise but of little practical value if that information (e.g., best last Beyer, or median of last five Beyers off a 45+ day layoff) is already included in the parimutuel prices! This is why identifying "number of previous runs" was so valuable to Benter's team - the public in Hong Kong underestimated its importance!

John
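One common way to act on that question, sketched below in Python, is to blend model and public probabilities in log space and renormalize. The weights and probabilities here are hypothetical, and this is in the spirit of, but not necessarily identical to, Benter's actual combination step:

```python
import math

# Blend a private model's probabilities with the public's (takeout removed
# is assumed) by weighting them in log space, then renormalizing.
model_p = {"A": 0.30, "B": 0.45, "C": 0.25}    # hypothetical model line
public_p = {"A": 0.40, "B": 0.35, "C": 0.25}   # hypothetical tote line
alpha, beta = 0.4, 0.6                          # hypothetical blend weights

raw = {h: math.exp(alpha * math.log(model_p[h]) + beta * math.log(public_p[h]))
       for h in model_p}
z = sum(raw.values())
combined = {h: raw[h] / z for h in raw}

for h, p in combined.items():
    # An overlay is a horse whose combined estimate exceeds the public's.
    tag = "overlay" if p > public_p[h] else ""
    print(f"{h}: {p:.3f} {tag}")
```

Because the public line carries most of the weight, the combined estimate only deviates from the tote where the model disagrees strongly, which is exactly where the (candidate) overlays live.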

Cangamble
03-08-2008, 06:07 PM
If I had that model and two horses were good things but closely tied on the numbers, and I saw one was just that bit (no numbers) fitter than last time out then I would go for that one and lay the other - is that art or common sense? Why not use all the tools you have?
****************************
This too can be put into a formula. You can do late programming using what you see in a post parade.
I don't think there is anything that can't be put into a super equation when it comes to handicapping horses. Though it may take time to do all the inputting required.
I don't think there is any "art" that can't be given a numerical value.

robert99
03-08-2008, 07:00 PM
If I had that model and two horses were good things but closely tied on the numbers, and I saw one was just that bit (no numbers) fitter than last time out then I would go for that one and lay the other - is that art or common sense? Why not use all the tools you have?
****************************
This too can be put into a formula. You can do late programming using what you see in a post parade.
I don't think there is anything that can't be put into a super equation when it comes to handicapping horses. Though it may take time to do all the inputting required.
I don't think there is any "art" that can't be given a numerical value.

So what number / formula would you give it and why?

Benter's team had to wait until the last minute to get the public consensus odds from the PMU, but that was "the" method, and the programming was done years before. My simple example was a choice between a tie, with one item of extra information that swung it one way or the other. That is a simple yes/no question, and I can't see that your comment has any relevance.

TrifectaMike
03-08-2008, 07:03 PM
Mike,

Belated welcome to the board!

In the context of handicapping, a good starting point is to ask the Bill Benter question (fundamental question of handicapping):
What additional variables (if any) explain a significant proportion of the variance in results that is not already accounted for by the public odds (Wisdom of Crowds)?

As Benter proved in real-world profits, the answer to that question helps identify which horses individually (Win / Place / Show) or collectively (Exotics) are overlays. As you already know, with any prediction process, how we arrive at the correct answer is irrelevant (science, art, magic, or other). However, answering the related question (which variables are most highly correlated with today's race result?) is an interesting intellectual exercise but of little practical value if that information (e.g., best last Beyer, or median of last five Beyers off a 45+ day layoff) is already included in the parimutuel prices! This is why identifying "number of previous runs" was so valuable to Benter's team - the public in Hong Kong underestimated its importance!

John

John, you've nailed it perfectly!! And that is EXACTLY why regression in the general sense is nearly useless as a profit forecaster. As you add factors (independent variables), you will regress to exactly what you see on the toteboard.

If you would like to deviate from the tote (general public betting), I would suggest you use ONE factor, model it as well as possible (distribution and variance)... understand it totally... and proceed to the Wagering Solution, which, by the way, is in all probability more critical than the selection process.

Mike

sjk
03-08-2008, 07:09 PM
I do just the opposite. I investigated every handicapping element that I could think of to see whether it added predictive value. I believe that if I left out anything that the other players were looking at, they would take me to the cleaners.

Using every piece of info I can I still find plenty of overlays to bet.

Dave Schwartz
03-08-2008, 07:23 PM
In the manner in which you described your project of betting on either fav 1, 2, or 3, your time might be better spent reviewing some game theory.

That would make it sound as if you think my efforts are wasted in this area.

LOL - Currently, that approach is working quite well for me and several others who have implemented it.

Of course, almost everyone I talk to about it starts by saying, "That doesn't sound like it would work."

All I know is that if I can identify the losers in a group of contenders that return $1.80 such that those 2 return (say) a -40% ROI, the remaining horses will do much better.

That means I am consistently making "the right decision." That is my goal.


Dave

robert99
03-08-2008, 07:41 PM
John, you've nailed it perfectly!! And that is EXACTLY why regression in the general sense is nearly useless as a profit forecaster. As you add factors (independent variables), you will regress to exactly what you see on the toteboard.

If you would like to deviate from the tote (general public betting), I would suggest you use ONE factor, model it as well as possible (distribution and variance)... understand it totally... and proceed to the Wagering Solution, which, by the way, is in all probability more critical than the selection process.

Mike

Not really, as Benter did exactly the opposite of what you state, and what you state did not happen, either in the USA or in Hong Kong.
Have a look at his paper "Computer Based Horse Race Handicapping and Wagering Systems: A Report". He spells it all out.

TrifectaMike
03-08-2008, 07:42 PM
That would make it sound as if you think my efforts are wasted in this area.

LOL - Currently, that approach is working quite well for me and several others who have implemented it.

Of course, almost everyone I talk to about it starts by saying, "That doesn't sound like it would work."

All I know is that if I can identify the losers in a group of contenders that return $1.80 such that those 2 return (say) a -40% ROI, the remaining horses will do much better.

That means I am consistently making "the right decision." That is my goal.


Dave

NO! Not at all. I'm suggesting a means to arrive at your final decision. I'm quite sure your efforts are not wasted.

jonnielu
03-08-2008, 08:48 PM
Why not use all the tools you have?

A sensible question. I can't think of a single reason not to have and use any tool that is productive. The computer is one of the most valuable; its use creates the time needed to apply those tools that fall outside its scope but are also productive when put to use. The man who can gain a working command of the full assortment of tools can be a very successful player.

No doubt, any empty drawers in the tool chest can have a man starting vast projects with half vast ideas.

jdl

Cangamble
03-08-2008, 09:59 PM
So what number / formula would you give it and why?

Benter's team had to wait until the last minute to get the public consensus odds on the PMU, but that was "the" method, and the programming was done years before. My simple example was a choice between a tie, with one item of extra information that swung it one way or the other. That is a simple yes/no question, and I can't see that your comment has any relevance.
In your example, another factor can be multiplied into the formula you derived for your original system, either a "fitness" number or an odds "number".
Those are two very valid ways handicappers who bet with a couple of minutes to post, or less, use to decide on their final bets.

singunner
03-08-2008, 11:38 PM
I find all these discussions fascinating. The general consensus is that more variables = better predictions. I developed an advanced comparative algorithm that only looks at one variable and returns a hefty theoretical profit three years running.

I also question studies on such small samples. I ran my program on 180,000 races, and I still approach my results with healthy, scientific skepticism.

I'll be looking for further peer review soon, so if anyone was interested, I'd be happy to exchange access to my stats for a little beta testing. Please PM me if you're interested so I can start making a list. (Of course, I'll make a more detailed post in the coming days as I get closer to releasing my results.)

-Sin

Cratos
03-09-2008, 12:49 AM
Mike,

Belated welcome to the board!

In the context of handicapping, a good starting point is to ask the Bill Benter question (fundamental question of handicapping):
What additional variables (if any) explain a significant proportion of the variance in results that is not already accounted for by the public odds (Wisdom of Crowds)?

John


There are many “things” that the public odds (i.e., the toteboard) don’t reflect, and it is to the benefit of the handicapper/gambler to find those “things” and exploit the public’s thinking.

In the seventies, when Andrew Beyer popularized his “speed figure concept,” the majority of the public was using the old speed rating system based on track records, which was published by the Morning Telegraph and Daily Racing Form at the time. Essentially, Beyer found “things” which allowed him to exploit the public’s thinking.

Understanding the Beyer experience and learning from it, it would be unwise for anyone who has some “things” to reveal those “things” to the betting public, who are his wagering adversaries. This game is about winning: the more you know, and the less you reveal, the more you will win.

asH
03-09-2008, 03:38 AM
I find all these discussions fascinating. The general consensus is that more variables = better predictions.

Might be the general consensus, but not necessarily the undisputed one (the majority is wrong most of the time). I subscribe to the notion that the cleaner the data, the better the predictions. That’s the whole idea behind Bayesian networks and Bayesian logic. What good is it if a variable’s inherent error rate is at 30%, 20% or even 10%, and I’m not understanding my data enough to filter, control, trap, sift, and clean the variable to perhaps 99% pure... the truth shall set you free
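The "Bayesian logic" mentioned here ties back to the thread's opening question of combining single-factor win rates (e.g. 17% for last-out winners, 11% for the trainer). A minimal naive-Bayes style sketch, multiplying each factor's odds ratio against a field base rate; it assumes the factors are independent, which is exactly the overlap objection raised earlier in the thread, and all numbers are hypothetical:

```python
# Naive-Bayes style combination of single-factor win rates against a
# base rate. Assumes factor independence (the overlap objection in
# this thread challenges exactly that), so treat the output as
# illustrative only. All numbers are hypothetical.

def combine(factor_rates, base_rate):
    """Combine factor win rates multiplicatively on the odds scale."""
    base_odds = base_rate / (1 - base_rate)
    odds = base_odds
    for p in factor_rates:
        # Each factor scales the odds by its lift over the base rate.
        odds *= (p / (1 - p)) / base_odds
    return odds / (1 + odds)

# 10-horse field (10% base rate): one horse is a last-out winner (17%)
# trained by an 11% trainer.
p = combine([0.17, 0.11], base_rate=0.10)
print(round(p, 3))
```

Note that a factor whose rate equals the base rate contributes nothing, which is the behaviour you would want from an unbiased combination.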

Someone once said “God does not play dice”... it has something to do with probability and uncertainty.

singunner
03-09-2008, 03:46 AM
I believe the quote is Einstein's, but I think he was trying to state that there is no truly random event, not necessarily to make a statement about probability.

asH
03-09-2008, 04:08 AM
Quantum mechanics doesn’t predict a definitive result for an observation; it predicts probable outcomes and the possibilities of those outcomes. Hence the statement.

asH
03-09-2008, 04:25 AM
Didn't like the way Pyro's ears were pinned to his head as he traveled down the stretch; he didn't appear comfortable... Bridgmohan next out?

badcompany
03-09-2008, 05:14 AM
I find all these discussions fascinating. The general consensus is that more variables = better predictions. I developed an advanced comparative algorithm that only looks at one variable and returns a hefty theoretical profit three years running.



-Sin

It's the old saying, "Look long, look wrong." For every race you can always come up with several scenarios. Eventually, you have to pick one and go with it.

You guys with your algorithms should be ashamed of yourselves. Real men know that all you need to handicap a race is a program and a toilet seat.

classhandicapper
03-09-2008, 09:54 AM
John, you've nailed it perfectly!! And that is EXACTLY why regression in the general sense is nearly useless as a profit forecaster. As you add factors (predictor variables) you will regress to exactly what you see on the toteboard.

If you would like to deviate from the tote (general public betting), I would suggest you use ONE factor, model it as well as possible (distribution and variance)... understand it totally... and proceed to the Wagering Solution, which by the way is in all probability more critical than the selection process.

Mike

I also think that as you add factors your odds line will tend to slowly move towards the odds on the board. But IMHO that's probably a good thing because the public consensus is pretty darn good at estimating the chances of horses most of the time. If your odds line doesn't match the public's reasonably well a lot of the time, it's probably YOU that are making a lot of mistakes.

The thing is, your line will not always equal the odds on the board because the public does make some mistakes by underestimating or overestimating some factors. When you know the public is wrong, you can exploit that.

That takes us into an entirely different area.

It's at least reasonable to simply focus all your attention on situations you know the public often gets wrong (master some individual factors or situations) and ignore the factors that the public values very well because they get built into the odds well. This is not the same as making an odds line, but it works fine also.

Overlay
03-09-2008, 10:50 AM
I also think that as you add factors your odds line will tend to slowly move towards the odds on the board. But IMHO that's probably a good thing because the public consensus is pretty darn good at estimating the chances of horses most of the time. If your odds line doesn't match the public's reasonably well a lot of the time, it's probably YOU that are making a lot of mistakes.

The thing is, your line will not always equal the odds on the board because the public does make some mistakes by underestimating or overestimating some factors. When you know the public is wrong, you can exploit that.

That takes us into an entirely different area.

It's at least reasonable to simply focus all your attention on situations you know the public often gets wrong (master some individual factors or situations) and ignore the factors that the public values very well because they get built into the odds well. This is not the same as making an odds line, but it works fine also.

I agree totally with the first part of your post. But it seems to me that the farther you move away from a comprehensive odds line toward betting individual factors (even those the public is underbetting at the moment), the more you have to depend on those factors remaining underbet, and (if and when the public does catch up with you) you have to find new factors or angles. I think it's better to go with a range of variables that are most predictive of race results (whether the public is keying on them or not), and then exploit those cases (as you said) where you can identify why the public has mistakenly overbet one horse and underbet another.

TrifectaMike
03-09-2008, 11:32 AM
Let's suppose a hypothetical. Let's assume you take into account a multiplicity of factors and generate an odds line. Now, this odds line tracks the combined opinion of the betting public, as reflected on the toteboard, to within +/- 10 cents on the dollar (1-sigma case) on the lowest four odds.

What can you do with this information?

sjk
03-09-2008, 12:00 PM
My odds line tracks the public odds for 80% of the horses but there are overlays among the other 20%.

What I can do is bet.

DanG
03-09-2008, 12:08 PM
This has been a great thread and a real revelation so far.

I went through my entire life thinking the dominant blood in my veins was Cherokee, and here I find out I’m much more “Bayesian”! :jump:

Seriously…Thanks to all and especially to Robert for firing the 1st shot. :ThmbUp: :ThmbUp:

classhandicapper
03-09-2008, 12:30 PM
I agree totally with the first part of your post. But it seems to me that the farther you move away from a comprehensive odds line toward betting individual factors (even those the public is underbetting at the moment), the more you have to depend on those factors remaining underbet, and (if and when the public does catch up with you) you have to find new factors or angles. I think it's better to go with a range of variables that are most predictive of race results (whether the public is keying on them or not), and then exploit those cases (as you said) where you can identify why the public has mistakenly overbet one horse and underbet another.

I agree with you 100% percent (especially from a theoretical point of view).

However, I do find that when I focus on a particular set of circumstances that I know the public often screws up, it makes the whole process a lot easier for me.

I'll give you an example.

Lately, I haven't been very sharp mentally because I was ill for a couple of weeks (flu), having trouble sleeping, dealing with a few extra stresses, etc. I think I was having a much tougher time analyzing the "whole ball of wax" as well as I usually do, and I made a few really stupid mistakes that cost me (both losses and missed wins). Yet I had no difficulty isolating a couple of good plays based on "situations" that I like.

I'm starting to think a more balanced approach might be the right path for me because I seem to have my own "form cycle". ;)

robert99
03-09-2008, 12:54 PM
I also think that as you add factors your odds line will tend to slowly move towards the odds on the board. But IMHO that's probably a good thing because the public consensus is pretty darn good at estimating the chances of horses most of the time. If your odds line doesn't match the public's reasonably well a lot of the time, it's probably YOU that are making a lot of mistakes.

The thing is, your line will not always equal the odds on the board because the public does make some mistakes by underestimating or overestimating some factors. When you know the public is wrong, you can exploit that.

That takes us into an entirely different area.

It's at least reasonable to simply focus all your attention on situations you know the public often gets wrong (master some individual factors or situations) and ignore the factors that the public values very well because they get built into the odds well. This is not the same as making an odds line, but it works fine also.

Benter had over 120 factors tested and those were whittled down to the best 20 and he still did not get sufficient agreement with the public consensus - he lost. It is agreement with the average consensus for each horse over many races that is important in the modelling. It is disagreement in a single race where the profits lie. When he added the last minute consensus as the 21st factor that added about 25% to the evidence and was far better than any other single form factor.

The amazing point is that form study, plus hiring the best brains in statistics, handicapping and software, still did not do as well as the consensus after 5 years of effort - there was no identifiable single factor from public form that was underbet. The consensus were presumably not using any of those sophisticated techniques, or it would have shown on the PMU board. How the public actually achieves such good results with such little input is a key area that is hardly touched upon. Systems only work because "the public does make some mistakes by underestimating or overestimating some factors". Identifying those factors is the other half of the issue - so having a logit model that is neutral in its bias gives a platform to identify the public's current biases towards error.

Exploiting when the public price for any horse was a little too long was the exact area Benter went into - it was the whole purpose of his efforts. Exactas and exotics were other tools in his exploitation kit and the overlay percentages compound on those bets.
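Benter's paper describes folding the late public probabilities into the model as an additional predictor. A minimal sketch of one way to blend the two lines on the log scale and renormalise; the weights and all probabilities here are invented for illustration (Benter estimated his weights by fitting past races):

```python
# Sketch of the combined-model idea described above: blend a
# fundamental model's win probabilities with the late public
# probabilities on the log scale, then renormalise across the field.
# The weights a and b are made up for illustration; Benter estimated
# his by maximum likelihood over past races.
import math

def combine(model_probs, public_probs, a=0.5, b=0.5):
    scores = [math.exp(a * math.log(m) + b * math.log(p))
              for m, p in zip(model_probs, public_probs)]
    total = sum(scores)
    return [s / total for s in scores]

model  = [0.32, 0.28, 0.25, 0.15]   # hypothetical fundamental model line
public = [0.25, 0.30, 0.25, 0.20]   # hypothetical late tote consensus
combined = combine(model, public)
print([round(p, 3) for p in combined])
```

With equal weights this pulls each model probability partway toward the tote, which matches the observation in the thread that adding factors moves an odds line toward the board while leaving the single-race disagreements that carry the profit.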

Cratos
03-09-2008, 01:09 PM
My odds line tracks the public odds for 80% of the horses but there are overlays among the other 20%.

What I can do is bet.


What I have failed to see is a pronouncement within this thread on how to exploit the odds line. Making an odds line that is consistent with the toteboard might be a good academic exercise, but it is not putting money in your pocket.

I never look at odds until I am ready to bet, and at that time my wagering choice must be 3-1 or better or I do not bet. I am never intimidated by the public because it is my job to beat them more times than they beat me.

Therefore, write the equation for the running curve of a horse; that curve is typically nonlinear and downward sloping. Sum the curves of all the horses in the race, and that aggregate curve will typically be the race profile curve. Then compare each entrant against that curve; typically, the best fit will be the winner of the race. The public does that 30-34 percent of the time at any track in North America, so you the bettor have a 66-70 percent opportunity to exploit the public.
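One possible reading of the race-profile idea above (my interpretation; Cratos does not give his actual equations) can be sketched as follows, with entirely hypothetical curves:

```python
# One reading of the race-profile idea: give each entrant a nonlinear,
# downward-sloping velocity curve, aggregate the curves into a race
# profile, and call the closest fit the likely winner. All curves and
# constants here are hypothetical.
import math

def velocity_curve(v0, k, points):
    """Nonlinear downward-sloping curve v(d) = v0 * exp(-k * d)."""
    return [v0 * math.exp(-k * d) for d in points]

calls = [0.25, 0.5, 0.75, 1.0]        # fractions of the race distance
horses = {
    "A": velocity_curve(58.0, 0.05, calls),
    "B": velocity_curve(60.0, 0.12, calls),
    "C": velocity_curve(57.0, 0.04, calls),
}

# Race profile curve: the pointwise mean of all entrants' curves.
profile = [sum(vs) / len(vs) for vs in zip(*horses.values())]

def misfit(curve):
    """Sum of squared deviations from the race profile."""
    return sum((v - p) ** 2 for v, p in zip(curve, profile))

best = min(horses, key=lambda h: misfit(horses[h]))
print(best)   # the closest-fitting entrant
```

With these invented curves the evenly paced horse tracks the aggregate profile best; whether "best fit" is the right scoring rule is exactly the kind of assumption Cratos leaves unstated.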

sjk
03-09-2008, 01:23 PM
I exploit the odds line as I expect most would. Look for horses where the odds offered on the tote board are some acceptably high multiple of your estimated odds and bet those horses.

Years ago I convinced myself that using a higher multiple corresponded mathematically to placing a higher weight on the public odds as a factor.

In actuality I bet mostly exactas, but given odds lines for the 1st and 2nd positions I calculate the required payoff and bet when the payoff shown on the board is as required.
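sjk's two rules (tote odds an acceptably high multiple of estimated odds, and a required exacta payoff) can be sketched as follows. The Harville formula for the 1-2 probability and the edge multiple are assumptions on my part; he doesn't say which second-place formula or multiple he uses, and all probabilities are hypothetical:

```python
# Sketch of the required-exacta-payoff idea. P(A wins, B second) is
# approximated with the Harville formula p_A * p_B / (1 - p_A); the
# required board payoff is the fair payoff times an edge multiple.
# The multiple and probabilities are hypothetical assumptions.

def exacta_prob(p_first, p_second):
    """Harville approximation for a specific 1-2 finish."""
    return p_first * p_second / (1 - p_first)

def required_payoff(p_first, p_second, edge=1.5, stake=2.0):
    """Minimum board payoff (per $2) needed to accept the bet."""
    fair = stake / exacta_prob(p_first, p_second)
    return edge * fair

p = exacta_prob(0.30, 0.20)          # 0.30 * 0.20 / 0.70
print(round(p, 4))                   # 0.0857
print(round(required_payoff(0.30, 0.20), 2))   # 35.0
```

The same `edge` multiple applied to win bets is sjk's "acceptably high multiple" rule: bet only when board odds exceed your estimated odds by that factor.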

classhandicapper
03-09-2008, 01:30 PM
Benter had over 120 factors tested and those were whittled down to the best 20 and he still did not get sufficient agreement with the public consensus - he lost. It is agreement with the average consensus for each horse over many races that is important in the modelling. It is disagreement in a single race where the profits lie. When he added the last minute consensus as the 21st factor that added about 25% to the evidence and was far better than any other single form factor.

The amazing point is that form study, plus hiring the best brains in statistics, handicapping and software, still did not do as well as the consensus after 5 years of effort - there was no identifiable single factor from public form that was underbet. The consensus were presumably not using any of those sophisticated techniques, or it would have shown on the PMU board. How the public actually achieves such good results with such little input is a key area that is hardly touched upon. Systems only work because "the public does make some mistakes by underestimating or overestimating some factors". Identifying those factors is the other half of the issue - so having a logit model that is neutral in its bias gives a platform to identify the public's current biases towards error.

Exploiting when the public price for any horse was a little too long was the exact area Benter went into - it was the whole purpose of his efforts. Exactas and exotics were other tools in his exploitation kit and the overlay percentages compound on those bets.

My guess is that one of Benter's initial problems was that even with 120 factors he couldn't do some of the fine line detailed analysis that individual races often require. That's what I was trying to say earlier in the thread. It is "theoretically possible" to program my handicapping, but it is "practically" impossible.

I could give you a specific profitable "circumstance" within a single factor (trainer, class etc...), but you'd never find it without digging really deeply and understanding the specifics of the situation. They don't come up all that often.

IMHO, the combinations of possibilities are almost endless.

In addition, what the public (and Benter) does not have in the PPs is inside information about the horse's current condition (things that may have changed since his last start), potential juicing, the ability of a FTS, whether a layoff horse is ready or not etc...

Most insiders may not win, but they do have access to information that is not in the PPs and they also bet (some heavily). So they can make the odds board more efficient from time to time. IMO, that's another thing that is hard to program even if you are trying to consider it.

I think there are a lot of terrible horse players, but most of them bet very small. Then there are very good horse players and insiders that adjust the odds to more reasonable levels. Some of them are big bettors (but even big bettors don't know everything).

TrifectaMike
03-09-2008, 01:31 PM
What I have failed to see is a pronouncement within this thread on how to exploit the odds line. Making an odds line that is consistent with the toteboard might be a good academic exercise, but it is not putting money in your pocket.

I never look at odds until I am ready to bet, and at that time my wagering choice must be 3-1 or better or I do not bet. I am never intimidated by the public because it is my job to beat them more times than they beat me.

Therefore, write the equation for the running curve of a horse; that curve is typically nonlinear and downward sloping. Sum the curves of all the horses in the race, and that aggregate curve will typically be the race profile curve. Then compare each entrant against that curve; typically, the best fit will be the winner of the race. The public does that 30-34 percent of the time at any track in North America, so you the bettor have a 66-70 percent opportunity to exploit the public.

I have many times asked myself the question... why haven't any of the pace "types" explored a growth/decay type continuous function for velocity as a function of distance, where the constants are related to the interior calls of the race?
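A minimal sketch of such a growth/decay function: fit v(d) = v0·exp(-k·d) through two interior calls and project the late-race velocity. The call figures are invented for illustration; the post names the idea but gives no data:

```python
# Sketch of a continuous decay function for velocity over distance,
# with the constants solved from two interior calls as the post
# suggests. The call data here are hypothetical.
import math

def fit_decay(d1, v1, d2, v2):
    """Fit v(d) = v0 * exp(-k * d) through two (distance, velocity) calls."""
    k = math.log(v1 / v2) / (d2 - d1)
    v0 = v1 * math.exp(k * d1)
    return v0, k

def velocity(v0, k, d):
    return v0 * math.exp(-k * d)

# Hypothetical calls: 58 ft/s at the 2f call, 54 ft/s at the 6f call.
v0, k = fit_decay(2.0, 58.0, 6.0, 54.0)
print(round(velocity(v0, k, 8.0), 2))   # projected final-furlong velocity
```

Average segment velocities would work similarly; the point is that two calls pin down both constants, so the whole curve (and its extrapolation) follows.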

sjk
03-09-2008, 01:34 PM
How would you know whether they have or have not done so?

TrifectaMike
03-09-2008, 01:38 PM
How would you know whether they have or have not done so?

I actually don't. My comment is based on the literature, and on discussions by pace types of race segments defined in terms of average velocities.

sjk
03-09-2008, 01:42 PM
Has there been anything new and useful in the literature in the past decade or so?

You would have to think that people are having more success developing methods for their own use than would be apparent from what has appeared in print.

classhandicapper
03-09-2008, 02:23 PM
I have many times asked myself the question... why haven't any of the pace "types" explored a growth/decay type continuous function for velocity as a function of distance, where the constants are related to the interior calls of the race?

Could you explain exactly what you mean here? Maybe give me an example.

I've done a lot of tinkering with pace figures for a very long time, but I can't give you any input unless I understand the question. ;)

Thanks.

asH
03-09-2008, 02:56 PM
Has there been anything new and useful in the literature in the past decade or so?

You would have to think that people are having more success developing methods for their own use than would be apparent from what has appeared in print.

If I may answer your first question: in my opinion, nothing. It does appear that most are playing 'close to the vest' these days. Your statement seems spot-on; the board and the odds of the winners suggest most may be using the same 'play theories'... And then there are the days when things fall apart (Aqu Friday).

Jeff P
03-09-2008, 03:21 PM
I have many times asked myself the question... why haven't any of the pace "types" explored a growth/decay type continuous function for velocity as a function of distance, where the constants are related to the interior calls of the race?

I can assure you some of us pace "types" have done this to a degree. But understand this is not an easy thing to implement. Decay varies. Decay is a function of a horse's stamina, velocity, distance, pace pressure, and track weight. Throw wind in there too - but I haven't made a serious attempt to model that yet.

Most players would tend to agree that decay at 6f is different than decay at 10f.

Most players would tend to agree that decay in a highly pace pressured race is different than decay in a paceless race.

But where many players tend to disagree is over the concept of track weight... whether or not track bias even exists... the idea that decay at 6f on one track can be vastly different than decay at 6f on another track... I can say from experience that is precisely what makes modeling decay worthwhile.


-jp

.
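Jeff's list of decay inputs can be sketched by making the decay constant itself a function of conditions. Every coefficient below is a made-up placeholder, since the post names the inputs (stamina, velocity, distance, pace pressure, track weight) but gives no numbers:

```python
# Sketch of condition-dependent decay: the decay constant k rises with
# distance, pace pressure, and a 'heavier' track, and falls with
# stamina. All coefficients are hypothetical placeholders.
import math

def decay_rate(stamina, distance_f, pace_pressure, track_weight):
    """Hypothetical decay constant built from Jeff's listed inputs."""
    base = 0.02
    return (base
            * (1 + 0.05 * (distance_f - 6))   # longer races decay more
            * (1 + 0.5 * pace_pressure)       # pressured pace decays more
            * (1 + 0.3 * track_weight)        # heavier track decays more
            / max(stamina, 0.1))              # stamina resists decay

def velocity(v0, k, d):
    return v0 * math.exp(-k * d)

# Same horse, same opening velocity, two hypothetical tracks:
k_light = decay_rate(stamina=1.0, distance_f=6, pace_pressure=0.5, track_weight=0.0)
k_heavy = decay_rate(stamina=1.0, distance_f=6, pace_pressure=0.5, track_weight=1.0)
print(velocity(58.0, k_light, 6) > velocity(58.0, k_heavy, 6))  # True
```

This captures Jeff's point that the same fractions can represent different trips on different tracks: the identical curve shape implies a tougher effort when the track-weight term is high.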

asH
03-09-2008, 03:50 PM
I can assure you some of us pace "types" have done this to a degree. But understand this is not an easy thing to implement. Decay varies. Decay is a function of a horse's stamina, velocity, distance, pace pressure, and track weight. Throw wind in there too - but I haven't made a serious attempt to model that yet.

Most players would tend to agree that decay at 6f is different than decay at 10f.

Most players would tend to agree that decay in a highly pace pressured race is different than decay in a paceless race.

But where many players tend to disagree is over the concept of track weight... whether or not track bias even exists... the idea that decay at 6f on one track can be vastly different than decay at 6f on another track... I can say from experience that is precisely what makes modeling decay worthwhile.


-jp

.

ah Jeff, I believe you just simplified the model

sjk
03-09-2008, 04:00 PM
He made it more complicated for me.

I have taken the point of view that a horse that runs 7 furlongs, going 2 lengths faster than par in each of the first two segments and then 4 lengths slower than par in the stretch, had the same difficult trip irrespective of what track he was running at. I guess Jeff is saying that over some tracks this is not such a tough trip after all.

It does seem reasonable but I'm not sure how I would go about putting it to use in practice.

jfdinneen
03-09-2008, 04:34 PM
I have many times asked myself the question... why haven't any of the pace "types" explored a growth/decay type continuous function for velocity as a function of distance, where the constants are related to the interior calls of the race?

Mike,

With respect, the issue of pace is off-topic in the current thread. Perhaps you could start a new topic, and, by way of contribution, I recommend you review the ideas presented in Steve Roman's article on "Time, Distance and Fatigue" (http://www.chef-de-race.com/articles/fatigue.htm).

John


Overlay
03-09-2008, 04:57 PM
What I have failed to see is a pronouncement within this thread on how to exploit the odds line. Making an odds line that is consistent with the toteboard might be a good academic exercise, but it is not putting money in your pocket.

I never look at odds until I am ready to bet and at that time my wagering choice must be 3-1 or better or I do not bet. I am never intimidated by the public because it is my job to beat them more times then they beat me.


Like you, I don't look at public odds until after I've arrived at my own line. My personal goal in making the line is not to be consistent with the tote board, but to select factors and weights that most closely approximate how horses will actually perform. As mentioned previously, this will often mimic the tote board to a great degree because of the public's overall record of accuracy. But it will also provide a basis for identifying and capitalizing on those individual occasions when the public has misjudged, and that is where the usefulness of the line in making money comes in. Your acceptable odds serve the same purpose, based on your experience with the return needed to produce an overall profit with your particular handicapping model. A full-field odds line just expands this and permits visibility of such opportunities at all odds levels.

TrifectaMike
03-09-2008, 04:58 PM
Mike,

With respect, the issue of pace is off-topic in the current thread. Perhaps you could start a new topic, and, by way of contribution, I recommend you review the ideas presented in Steve Roman's article on "Time, Distance and Fatigue" (http://www.chef-de-race.com/articles/fatigue.htm).

John


Thanks for the link. Excuse my wandering mind.

Mike

robert99
03-10-2008, 11:42 AM
What I have failed to see is a pronouncement within this thread on how to exploit the odds line. Making an odds line that is consistent with the toteboard might be a good academic exercise, but it is not putting money in your pocket.



Cratos,

Possibly a very simplified example might help anyone interested.

It is important not to confuse the overall long-term efficiency (over hundreds of races) of the public consensus with its efficiency in any one race - something economists often get wrong.

So a 3/1 public horse might win 100/(3+1) = 25 races in 100, or 21, or 26, whatever, but the average will always be close to 25.

Getting back to a particular race:
Joe Public might believe that the 3/1 Tote Board horse has a 25% chance of winning. (Neglect any takeout for simplicity here).

Actually, that same horse has an unknown true chance of winning, but its true chance might lie somewhere (for a sophisticated market) in a range between 17% and 33%, say (the average is still 25%). In other words, the 17% true chances will win around 17 races in a 100, the 33% true chances will win 33 races in a 100, so the 3/1 Tote Board priced horses will win 17 + 33 = 50 races in 200, and the average still remains at around 25%. The true price range remains completely hidden. This happens however good or bad the local track public consensus is - it is the range above and below 25% that varies and is the key thing to understand.

The odds-line model man has used hundreds of races to get his averages and predicts that this same 3/1 horse, against these other horses, has a long-term chance of 32% (that is, he expects it to win 64 times in 200 races, with all wins paid at 3/1, so the unit profit is +56 points).

He has no idea of the exact true odds in the actual race but knows that, long term, he is on the right side of 25%, and he bets only on those horses.

He does not bet on any 3/1 horses where his model predicts less than 25% chance. Further, if the consensus in that race should shorten to 2/1 (33% - the high end of the 17- 33% range), he misses out all those bad bets. If the market lengthens to the low end 17%, about 5/1, he has a chance to make additional profit in that race.

All the model has done is give an unbiased long term view of a horse's chance and given evidence of when to avoid a bet or to make it.

As explained previously, the Benter team ramped up (partial Kelly) these small advantages on exotics as well as win bets. Alan Woods, who funded Benter, is reported to have cleared HK$150 million over 16 years. There is probably a "me too" computer outfit operating this at your nearby track.
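The worked example above can be verified in a few lines. Only the 32%-at-3/1 figures come from the post; the Kelly fraction is arbitrary:

```python
# Sketch of the betting rule in the example above: bet a 3/1 (25%
# implied) horse only when the model's long-term estimate exceeds the
# implied probability, and size the bet with fractional Kelly as the
# post says Benter's team did. The Kelly fraction is arbitrary.

def expected_value(p_model, decimal_odds):
    """Expected profit per unit staked."""
    return p_model * (decimal_odds - 1) - (1 - p_model)

def fractional_kelly(p_model, decimal_odds, fraction=0.25):
    """Fraction of bankroll to stake (0 if there is no edge)."""
    b = decimal_odds - 1
    full = (p_model * b - (1 - p_model)) / b
    return max(0.0, fraction * full)

# The post's numbers: a 3/1 shot (decimal 4.0) the model rates at 32%.
print(round(expected_value(0.32, 4.0), 2))     # 0.28
print(round(fractional_kelly(0.32, 4.0), 4))
```

Note that +0.28 per unit over 200 races is exactly the post's +56 points, and a model estimate below the 25% implied probability gives a zero stake, matching the "do not bet" rule.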

singunner
03-10-2008, 03:53 PM
One of the reasons Benter can be so successful in HK is that he isn't betting against the "wisdom of crowds". There are multiple teams betting based on computer models in HK that make up almost the entirety of the mutuel pools.

That is to say, you don't have to beat the public, you just have to beat the majority of the other computer outfits. Of course, when he started, he was facing a highly superstitious public (the home of feng shui, where a hotel was built with a huge hole in the middle to allow "the dragon of the mountain" to pass through), but his success created a whole new market consisting almost entirely of algorithm-determined wagers. I'd imagine that the variance for this new market is much wider than it would be in America, where so many smaller wagers make up the pools.

Dave Schwartz
03-10-2008, 06:32 PM
That is to say, you don't have to beat the public, you just have to beat the majority of the other computer outfits.

You believe that to be easier than playing against the $2 public?

Fastracehorse
03-10-2008, 07:33 PM
Well, make it a fair test. Have the computer analyze the same information you are analyzing -- i.e., the horse's entire recent record. Why should the computer be restricted to a few summary stats?

Will the human do it better given the same data? In many cases, yes, but the computer can process many more races than the person can, and so even if the computer's advantage on a particular race is not as great as the person's, the computer can identify many more playable races. In this way, the computer can be just as "effective" as (or more effective than) a single handicapper when considering total performance on "the races" rather than just a single race. Of course all this depends on the particular humans and computer programs involved, but I never understand why people always say things like, "Well, the computer can't consider this factor or that factor." Why can't it? (Physical inspection of the horse computers can't do, but a computer can consider all the same stuff in a horse's or a trainer's record that a person can.)

Is a very important part of handicapping.

'Puter 'cappin lacks the ability to capitalize on short-term factors, no?

fffastt

GameTheory
03-10-2008, 07:47 PM
One of the reasons Benter can be so successful in HK is that he isn't betting against the "wisdom of crowds". There are multiple teams betting based on computer models in HK that make up almost the entirety of the mutuel pools.

That is to say, you don't have to beat the public, you just have to beat the majority of the other computer outfits. Of course, when he started, he was facing a highly superstitious public (the home of feng shui, where a hotel was built with a huge hole in the middle to allow "the dragon of the mountain" to pass through), but his success created a whole new market consisting almost entirely of algorithm-determined wagers. I'd imagine that the variance for this new market is much wider than it would be in America, where so many smaller wagers make up the pools.

You've got it backwards. That's why Benter *stopped* operating in HK: his success was diminishing from all the computer-team competition, some of whom were Benter-group defectors who took a copy of his program with them.

GameTheory
03-10-2008, 07:54 PM
Is a very important part of handicapping.

Puter 'cappn lacks that ability to capitalize on short-term factors no?
Absolutely not -- why can't the computer capitalize on short-term factors? All it takes is a willingness to program it to do so.

And I think there are people that try to quantify physical factors and possibly even enter those factors into a computer right before post time. (Although I don't know of any, it is certainly possible.) I've got a book that explores this (among other things), showing the results of data samples where the author observes certain tail and ear positions etc in the paddock or post parade. And Benter I believe incorporated subjective trip handicapping into his program, having people watch races and then assign subjective scores of some kind for the trips.

Fastracehorse
03-10-2008, 08:08 PM
Absolutely not -- why can't the computer capitalize on short-term factors? All it takes is a willingness to program it to do so.

And I think there are people that try to quantify physical factors and possibly even enter those factors into a computer right before post time. (Although I don't know of any, it is certainly possible.) I've got a book that explores this (among other things), showing the results of data samples where the author observes certain tail and ear positions etc in the paddock or post parade. And Benter I believe incorporated subjective trip handicapping into his program, having people watch races and then assign subjective scores of some kind for the trips.

The title isn't rhetorical but 'cool!' on the rest.

fffastt

singunner
03-10-2008, 08:20 PM
"You believe that to be easier than playing against the $2 public?"

I'm actually working on an analysis of the HK races right now, so I'm hesitant to speak authoritatively one way or the other, but I find it easier to theorize an advantage for a large group of individuals betting based on every variable under the sun versus a small number of mammoth bettors using programs to determine how to wager. In my mind, it's a question of more competition going after every dollar.

There are obviously strengths and weaknesses to each group. I'd be interested to know which market has better accuracy and precision. Does anyone have data for the HK circuit?

Dave Schwartz
03-10-2008, 08:52 PM
Well, I defer to your experience and hungrily await your updates.


Most interesting.


Dave

GameTheory
03-10-2008, 09:33 PM
There are obviously strengths and weaknesses to each group. I'd be interested to know which market has better accuracy and precision. Does anyone have data for the HK circuit?

I believe you can get all the data you want from the Hong Kong Jockey Club. For free, too, I think...

46zilzal
03-10-2008, 09:47 PM
I think it is a hoot listening to the Aussie crew talk at the Hong Kong simulcast. Not a thing about pace, post position, training schedule, previous trips and possible trouble. ALL Physicality and we know there is one heck of a lot more to a horse race than that.

bigmack
03-10-2008, 09:51 PM
Like you, I don't look at public odds until after I've arrived at my own line.
If I might ask, O'Lay, are your lines compooter generated?

asH
03-10-2008, 10:29 PM
doesn't anyone handicap races any more these days ...:D

rokitman
03-10-2008, 11:00 PM
I believe you can get all the data you want from the Hong Kong Jockey Club. For free, too, I think...
http://www.hkjc.com/english/hrc/hindex.asp

singunner
03-11-2008, 03:06 AM
Thanks for the info on the HKJC. I've actually been developing a little PHP to scrape the info off their pages into a nicer file for a little while now, but this conversation prompted me to look into them a little further. I imagine it's nice to have a non-profit taking care of things, but I wonder if that brings an increase in political pressure. At least on the surface, the HK races are really something.

steveb
03-11-2008, 05:31 AM
I think it is a hoot listening to the Aussie crew talk at the Hong Kong simulcast. Not a thing about pace, post position, training schedule, previous trips and possible trouble. ALL Physicality and we know there is one heck of a lot more to a horse race than that.

and strangely(?), isn't it the aussies making all the money in hk??? :cool:

signed .......an aussie! :)

Overlay
03-11-2008, 06:33 AM
If I might ask, O'Lay, are your lines compooter generated?

They could be, but in my case they aren't. (That is, although it wouldn't be difficult to convert my algorithm to a program format, I wanted to leave it accessible to those working without a computer.)

Fastracehorse
03-11-2008, 02:45 PM
I think it is a hoot listening to the Aussie crew talk at the Hong Kong simulcast. Not a thing about pace, post position, training schedule, previous trips and possible trouble. ALL Physicality and we know there is one heck of a lot more to a horse race than that.

I'll play a horse strictly on physicality.

fffastt

ezpace
05-03-2008, 10:53 AM
it brought me out of hibernation LOL

cappin for 40yrs, still know less than you guys .LOL

Financial markets, hracing, sports punting are my things (successful): futures, forex, indexes, baseball, feetballs

my markets.

For those that like visual aids (I use daily), I think the ultimate cappin tool for me would be a CHART of three to five factors, with a *MOVING REGRESSION* line for each that projects only the next variable (race).

i.e., all past performances would have one regression line; another regression line of much shorter duration would show current form on each factor.

i know it would work ... it does on almost everything. I'm hoping cj or somebody that can write code will put it together.

time series = *moving* regression is the ultimate IMVHO

...ezpace

.
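The "moving regression" chart ezpace describes is essentially a rolling least-squares trend line projected one race ahead. A minimal sketch in Python (the speed figures and window sizes are made up for illustration; this is not anyone's actual tool):

```python
import numpy as np

def rolling_trend_projection(values, window):
    """Fit a least-squares line to the last `window` observations and
    project it one step ahead (a 'moving regression' of form)."""
    recent = np.asarray(values[-window:], dtype=float)
    x = np.arange(len(recent))
    slope, intercept = np.polyfit(x, recent, 1)  # least-squares line
    return slope * len(recent) + intercept       # projected next value

# Hypothetical speed figures for one horse, oldest first
figs = [78, 80, 79, 83, 85, 84, 88]
long_term = rolling_trend_projection(figs, window=7)     # whole record
current_form = rolling_trend_projection(figs, window=3)  # recent form
```

Plotting both projections per factor, race by race, would give the two-line chart described above: one line for the full record, one for current form.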

bobphilo
05-03-2008, 03:28 PM
Bolton and Chapman applied logit regression to races using multiple factors similar to those handicappers might use. They used 500 races to find the statistical average weighting to those factors. That data took months to collect.

What to do though if you already do know the individual probabilities for those individual factors to start with (that work has been done for larger samples than 500), but don't immediately know how they should be mixed together with all the other factors (the old add them all up or multiply them together issue). What methods give that best fit?

If I know for example the average win probability of a last time out winner was 17%, and that horse is trained by a trainer who averages 11% winners, and etc etc - how to combine all those factors into an equation that estimates today's overall probability.

Would a multiple factor Bayes Theorem approach be better than logit, the theorem states that if Hn is one of a set Hi, of mutually exclusive and exhaustive events, then P(Hn|D) = P(D|Hn)P(Hn)/Σi[P(D|Hi)P(Hi)].

That is complex enough for a single factor of given D and how does that actually apply to my example handicapping factors?

Any advice on the best way to proceed or good references?

This literally screams out for a meta-analysis, which was created for exactly this kind of problem: combining the results of separate studies. I used meta-analysis methodology for my Masters thesis on a different topic, the effectiveness of Nonoxynol-9 in the prevention of HIV transmission, where there was a great deal of controversy surrounding the World Health Organization's unfair decision to come out against the medication.

Meta-analysis is a great way to combine different studies of varying sample size, weighting them appropriately. There have been a lot of good books on meta-analysis, and I suggest Googling the term to find some good sources. Good luck.

Bob
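For the original question of combining known factor probabilities (the 17% last-out winner, the 11% trainer, etc.), one textbook answer is a naive-Bayes combination: multiply the base win rate by each factor's impact value and renormalize across the field. A minimal sketch, assuming conditional independence between the factors (a strong and usually false assumption, given the overlaps classhandicapper mentions) and using made-up numbers:

```python
def combine_factors(base_rate, factor_probs):
    """Multiply the base win rate by each factor's impact value
    P(win | factor) / P(win) -- the naive-Bayes simplification."""
    p = base_rate
    for fp in factor_probs:
        p *= fp / base_rate
    return p

# Hypothetical 10-horse field, base win rate 10%; horse 0 is a last-out
# winner (17% strike rate) trained by an 11% trainer.
raw = [combine_factors(0.10, [0.17, 0.11])] + [0.10] * 9
total = sum(raw)
probs = [r / total for r in raw]  # renormalize so the field sums to 1
```

The renormalization step is what turns the per-factor averages into a usable line for today's specific field.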

gm10
09-05-2008, 07:17 AM
As Robert identified, one of the significant variables that surfaced from Benter's initial analysis was 'number of previous runs' - the lower the better. This factor is easily explained in terms of sample sizes. The more often a horse runs, the more likely its ability will be reflected in the average of its past performances (see the Law of Large Numbers). By contrast, horses that have five or fewer runs are more likely to spring a surprise (either positive or negative) - think maidens second time out! This also explains why, in most jurisdictions, horses are required to run at least three races before they are given an official handicap rating.

John

I disagree with you. You are quoting it correctly, but the interpretation is wrong. He optimizes his logit model by performing a maximum likelihood estimation to find the weights of his factors. This MLE maximizes the product of all the probabilities of the winners in your sample. It does not take profitability into account. It just 'finds' the weights of the factors that make the actual winners as probable as possible.

I do agree with what you are saying though, races with unexposed horses are less predictable for the majority of (speed) handicappers and there is a pocket of opportunity there.
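The maximum likelihood estimation described above, choosing factor weights so that the product of the fitted win probabilities of the actual winners is as large as possible, can be sketched as a conditional logit on toy data (random factor values and arbitrary winners; scipy's general-purpose optimizer stands in for whatever Benter actually used):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, races, winners):
    """Negative log of the product of the fitted win probabilities of
    the actual winners -- the quantity the MLE minimizes."""
    nll = 0.0
    for X, win in zip(races, winners):
        scores = X @ w                         # one score per horse
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                   # softmax over the field
        nll -= np.log(probs[win])
    return nll

# Toy data: 3 races of 8 horses, each horse with 2 standardized factors
rng = np.random.default_rng(0)
races = [rng.normal(size=(8, 2)) for _ in range(3)]
winners = [0, 3, 5]  # index of the winner in each race

fit = minimize(neg_log_likelihood, x0=np.zeros(2),
               args=(races, winners), method="BFGS")
weights = fit.x  # factor weights that make the winners most probable
```

Note that nothing in the objective mentions payoffs, which is exactly the point: the fit chases winners, not profit.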

gm10
09-05-2008, 07:43 AM
Benter had over 120 factors tested, and those were whittled down to the best 20, and he still did not get sufficient agreement with the public consensus - he lost. It is agreement with the average consensus for each horse over many races that is important in the modelling. It is disagreement in a single race where the profits lie. When he added the last-minute consensus as the 21st factor, that added about 25% to the evidence and was far better than any other single form factor.

The amazing thing is that form, plus hiring the best brains in statistics, handicapping and software, still did not do as well as the consensus after 5 years of effort: there was no identifiable single factor from public form that was underbet. The public were presumably not using any of those sophisticated techniques, or it would have showed on the PMU board. How the public actually achieve such good results with such little input is a key area that is hardly touched upon. Systems only work because "the public does make some mistakes by underestimating or overestimating some factors". Identifying those factors is the other half of the issue - so having a logit model that is neutral in its bias gives a platform to identify the public's current biases towards error.

Exploiting when the public price for any horse was a little too long was the exact area Benter went into - it was the whole purpose of his efforts. Exactas and exotics were other tools in his exploitation kit and the overlay percentages compound on those bets.

Completely agree with you.
So far, I've tried to add the Morning Line price and while this increased the accuracy of my estimates by quite a bit, it did not improve my ROI.
I think that adding live odds would improve the ROI but so far I haven't seen any downloadable Tote that didn't require web scraping. There is Betfair but those odds just follow the actual Tote odds.
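Folding the public consensus into a model, as discussed above, is often done by blending the two probability lines in log space and renormalizing. A minimal sketch with illustrative exponents (Benter fitted the actual weights with a further regression; the 0.5/0.5 below are placeholders, not his values):

```python
import numpy as np

def blend(model_probs, public_probs, alpha=0.5, beta=0.5):
    """Blend two probability lines by multiplying powers of them
    (log-space weighting) and renormalizing over the field."""
    combined = np.asarray(model_probs) ** alpha * np.asarray(public_probs) ** beta
    return combined / combined.sum()

model = [0.30, 0.25, 0.25, 0.20]   # hypothetical model line
public = [0.40, 0.25, 0.20, 0.15]  # hypothetical tote-implied line
final = blend(model, public)
```

The profit then comes from horses where `final` still disagrees with the tote after the blend, which is why stale Morning Line prices help accuracy but not ROI.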

gm10
09-05-2008, 08:10 AM
You've got it backwards. That's why Benter *stopped* operating in HK, because his success was diminishing from all the computer-team competition, some of which were Benter-group defectors who took a copy of his program with them.

No, it's because he fell out with the HK authorities over not paying any taxes on his winnings.

Dave Schwartz
09-05-2008, 11:05 AM
No, it's because he fell out with the HK authorities over not paying any taxes on his winnings.

According to one of my clients (and former Benter employee now working for another team), he quit playing because his advantage had gone away.

Seems that every guy who ever worked for him took a copy of the software (written in FoxPro) with him when he left. Each year the return dropped a couple of points until he checked out at +2%.

BTW, as a side note, the fact that there are so many copies of Benter's software around has caused an interesting phenomenon in the Asian racing world: Everybody uses FoxPro! :lol:


Dave

rrbauer
09-05-2008, 11:34 AM
Absolutely not -- why can't the computer capitalize on short-term factors? All it takes is a willingness to program it to do so.

And I think there are people that try to quantify physical factors and possibly even enter those factors into a computer right before post time. (Although I don't know of any, it is certainly possible.) I've got a book that explores this (among other things), showing the results of data samples where the author observes certain tail and ear positions etc in the paddock or post parade. And Benter I believe incorporated subjective trip handicapping into his program, having people watch races and then assign subjective scores of some kind for the trips.

Sat with a guy at Los Al, watching simulcasts a few years back, who was using a small PC and entering factors that represented jockeys' body language when they came on the track for the post parade. I guess I was asking him too many questions, because he got up and moved before I could see the results of his "AI" analysis. So, as GT says, there are people doing analysis from last-minute observations.

mwilding1981
09-05-2008, 11:41 AM
GM10, I think I may have read you wrong, but Betfair odds don't follow Tote odds. Exchange prices are determined by the amounts of money available to back and lay.

jfdinneen
09-05-2008, 12:05 PM
...You are quoting it correctly but the interpretation is wrong...

gm10,

It seems that you have misinterpreted my post. In that case, I was not clear enough in my explanation.

The statement on small sample sizes is statistically correct on its own merits - it was not meant to reflect how Benter used "number of past races" in his model. In fact, it was a comment on his admission that he had no idea why that particular variable was significant ("The author knows of no 'common sense' reason why this factor should be important..." [Ziemba et al (1994), p. 186]).

John

jasperson
09-06-2008, 05:56 AM
I certainly understand the point about volume of bets. A computer can certainly handicap way more races. So if you assume there are many profitable situations out there, a computer should find way more of them.

The one minor question I have is whether a computer with "less than human skill" can actually translate into finding more "profitable" bets even though it is handicapping more races. It may be finding more bets, but producing a much lower ROI because of a bunch of extra unprofitable plays that resulted from less skill.

I'm not sure a computer can make the fine line subjective judgments about the details of a horse's total record (and I say that as a former computer programmer with some understanding).

I'd rather not re-debate that issue though.

I'd rather that someone that uses a computer to make an odds line explain how they overcome some of the difficulties because I may be able to apply some of their insights to my own human analysis.

Here is the approach I use with my oddsline program. First I make a printout of most of the variables that I would normally use if I were handicapping the race strictly from the pp lines, such as speed last race, earnings per start, speed average last 2, last 4 speed ratings, pace average last 2, best speed at distance, and the trouble line of last race. In total I have a printout of 15 factors on the race for each horse. This, plus knowing the strengths and weaknesses of the program, lets me do a pretty good job of knowing whether I want to bet the race or not. The weaknesses are turf races and maiden races with a lot of first-timers. I don't know who the jockeys and trainers are, but the program adds the winning percentages together and prints them out.
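One crude way to turn summed factor scores like the ones described above into an odds line is simple proportional normalization (the scores below are invented, and this is not necessarily the poster's actual method):

```python
def odds_line(scores):
    """Convert summed factor scores to win probabilities by
    proportional normalization, then to fair 'X to 1' odds."""
    total = sum(scores)
    probs = [s / total for s in scores]
    fair_odds = [round((1 - p) / p, 1) for p in probs]  # 'X to 1'
    return probs, fair_odds

# Hypothetical 5-horse field with summed factor scores
probs, fair = odds_line([30, 25, 20, 15, 10])
```

Comparing `fair` against the board then flags overlays; the known weak spots (turf, maidens full of first-timers) are simply races where the scores themselves can't be trusted.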

gm10
09-06-2008, 07:38 PM
GM10 I think I may have read you wrong but Betfair odds don't follow Tote odds. Exchange prices are made up on the amount of money to back and lay.

I use Betfair every day. The odds follow the Tote odds. You are right about what exchange odds are, but this is what the result is.

There are a few exceptions now and then. Usually they are favs which are much higher on Betfair than on the Tote. And they are usually losing favs. I'm not going to name names, but some American trainers obviously use Betfair and bet their horse to lose.