Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 04-10-2002, 01:14 PM   #1
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Bagging & Stacking

[WARNING: CURE FOR INSOMNIA BELOW]

A few of you inquired via email what "bagging" & "stacking" were, which I mentioned in the Factor Models thread. The answer is incredibly long-winded, so I figured I'd post it here & see if I could get the prize for the MOST SLEEP-INDUCING POST EVER.

I don't have enough experience with them yet to intelligently comment on how well they work, but here's the gist of the theory:

Ok, so we have a "training set" of examples (a bunch of race data) that we want to use to create a "predictor" (something which will try to predict future races from new data). We probably also have a "validation set" (data not used in training to see how well our predictor does on data it hasn't seen).

The predictor can be anything: a neural network, a classifier system, a regression tree -- anything that can be trained with the training set and be fed new data in order to predict. The final predictor may just be a mathematical formula, a set of weights to apply to each factor, whatever.

Normally, when creating a predictor, you'd just train it on the training set, and that would be that. You might also use techniques like "early stopping" to avoid over-training so it generalizes better. (Stopping the training at the point that the error on the validation set stops going down & starts going up again.)
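For the programmers, here is a minimal sketch of that early-stopping rule in Python. The function name `early_stopping_epoch`, the `patience` parameter, and the error numbers are all made up for illustration and are not from any particular package. It assumes you have already recorded the validation error after each training pass.

```python
def early_stopping_epoch(val_errors, patience=2):
    """Return the index of the training pass with the lowest validation
    error seen before `patience` passes in a row failed to improve on it."""
    best_epoch = 0
    best_error = float("inf")
    bad_epochs = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # Validation error is still going down: remember this point.
            best_error = err
            best_epoch = epoch
            bad_epochs = 0
        else:
            # Validation error went up (or stalled): count it against us.
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Hypothetical validation errors, one per training pass.
errors = [0.50, 0.42, 0.37, 0.35, 0.36, 0.38, 0.33]
stop_at = early_stopping_epoch(errors, patience=2)
print(stop_at)  # 3
```

Note that the last value (0.33) is never looked at, because training has already been stopped by then. That is the trade-off with a small `patience`.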

Bagging & stacking are techniques designed to further improve generalization accuracy. They are both "ensemble" methods that involve somehow combining the results of multiple predictors to get your final prediction.

Bagging is short for "Bootstrap Aggregating", which means "combining the results of multiple bootstrap replicates." What's a bootstrap replicate? Take your training set, and create another training set of the same size by randomly picking samples out of the original. Do this WITHOUT regard to whether you repeatedly pick some of the same samples (i.e. sample with replacement). You'll end up with about 63% of the original samples in the replicate (1 - 1/e for a large set), with the remaining slots filled by duplicates. Make 10 to 100 of these replicates, all different. Now, make a predictor from EACH one of these replicates using your normal training procedure. (You could even use different learning algorithms -- one neural network, one regression tree, etc. -- as long as they all attempt to predict the same thing.)
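A quick sketch of making one bootstrap replicate, assuming the training set is just a Python list. The name `bootstrap_replicate` and the 10,000-row toy set are my own stand-ins. It also checks how much of the original actually lands in the replicate.

```python
import random


def bootstrap_replicate(samples, rng):
    """Draw len(samples) items from samples, with replacement."""
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)            # fixed seed so the run is repeatable
training_set = list(range(10_000))  # stand-in for 10,000 race records
replicate = bootstrap_replicate(training_set, rng)

n = len(training_set)
unique_fraction = len(set(replicate)) / n
expected_fraction = 1 - (1 - 1 / n) ** n  # approaches 1 - 1/e, about 0.632

print(f"unique: {unique_fraction:.3f}, expected: {expected_fraction:.3f}")
```

The share of distinct originals should come out close to 0.632, with the rest of the replicate being repeats.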

So now you've got multiple predictors, all made from replicates (none from the whole original training set). To get your final prediction, you do some sort of averaging (or "voting") of the individual predictions to get your final prediction. Whew!
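Here is a rough sketch of the whole bagging loop in plain Python. As the "unstable" learner I am assuming a one-split regression tree (a "stump") on a single factor. That choice, the function names, and the toy data are all mine, purely for illustration. Any trainable predictor could be dropped in instead.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data; returns a predict function."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:  # every x was identical, so just predict the mean
        m = mean(ys)
        return lambda x: m
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def bagged_predictors(xs, ys, n_replicates, rng):
    """Train one stump per bootstrap replicate of the training set."""
    n = len(xs)
    predictors = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        predictors.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return predictors


def bagged_predict(predictors, x):
    """Final prediction: the plain average of the individual predictions."""
    return mean(p(x) for p in predictors)


rng = random.Random(42)
xs = [i / 10 for i in range(60)]            # toy factor values
ys = [x + rng.gauss(0, 0.3) for x in xs]    # toy target: the factor plus noise
predictors = bagged_predictors(xs, ys, 25, rng)
low = bagged_predict(predictors, 1.0)
high = bagged_predict(predictors, 5.0)
print(low, high)
```

For a classifier you would take a majority vote in `bagged_predict` instead of the mean.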

Bagging only helps with "unstable" learning algorithms like neural networks or regression trees, where small changes in the training set & initial conditions can lead to totally different predictors. With "stable" algorithms like "nearest neighbor", which predicts from the most similar examples in its model, bagging can make performance worse, and you are better off using the biggest single training set you can.


So what's stacking? It is a fancy way of combining the results of multiple predictors, although it can also be used with a single predictor. It is well-suited to use in combination with bagging. Let's say we've got 10 "bagged" predictors, except that when we made them we saved a portion of the training set that wasn't used at all. Now instead of just averaging the output of our 10 predictors to get our final prediction, we do this: we take that leftover bit of training set, and feed it to our 10 predictors. For each sample then, we get 10 outputs. In simple bagging, we'd just average them and call that our prediction. But instead, we take those 10 outputs as INPUTS in a new training set, but using the original output variable still as our "target" (or dependent variable, i.e. that thing we are trying to predict). So we train a new predictor that way, one step removed from the original training set, but still trying to predict the same thing. You can do this for multiple levels if desired, hence the name stacking. The top level predictor is the one you use to get your final prediction. The idea is that it will pick up the biases of the 10 predictors toward the original training set, and correct for them.
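To make the stacking step concrete, here is a minimal two-level sketch in plain Python. Everything in it is an assumption for illustration: the level-0 learners are one-split regression trees trained on bootstrap replicates, the level-1 learner is a simple weighted sum fitted by gradient descent, and the data is a toy set. The 70/30 split stands in for the "leftover bit" of the training set.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Level-0 learner: a one-split regression tree on 1-D data."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:
        m = mean(ys)
        return lambda x: m
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_linear_combiner(rows, targets, steps=2000, lr=0.01):
    """Level-1 learner: weights and a bias fitted by gradient descent on
    squared error, starting from plain averaging (all weights 1/k)."""
    k, n = len(rows[0]), len(rows)
    w, b = [1.0 / k] * k, 0.0
    for _ in range(steps):
        errs = [sum(wj * xj for wj, xj in zip(w, row)) + b - t
                for row, t in zip(rows, targets)]
        grad_w = [2 * sum(e * row[j] for e, row in zip(errs, rows)) / n
                  for j in range(k)]
        grad_b = 2 * sum(errs) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return lambda row: sum(wj * xj for wj, xj in zip(w, row)) + b


def mse(preds, targets):
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


rng = random.Random(1)
xs = [rng.random() for _ in range(120)]
ys = [x * x + rng.gauss(0, 0.05) for x in xs]
train_x, train_y = xs[:84], ys[:84]   # used for the bagged level-0 predictors
held_x, held_y = xs[84:], ys[84:]     # the leftover portion, unseen by level 0

level0 = []
for _ in range(10):
    idx = [rng.randrange(len(train_x)) for _ in range(len(train_x))]
    level0.append(fit_stump([train_x[i] for i in idx], [train_y[i] for i in idx]))

# The 10 outputs per held-out sample become the INPUTS of the new training set;
# the target is still the original output variable.
level1_inputs = [[p(x) for p in level0] for x in held_x]
combiner = fit_linear_combiner(level1_inputs, held_y)


def stacked_predict(x):
    """Top-level prediction: feed the level-0 outputs to the level-1 predictor."""
    return combiner([p(x) for p in level0])


avg_mse = mse([mean(row) for row in level1_inputs], held_y)
stacked_mse = mse([combiner(row) for row in level1_inputs], held_y)
print(avg_mse, stacked_mse)
```

The two errors printed at the end are both measured on the same leftover data the combiner was fitted on, so they only show that the mechanics work. To judge generalization you would need a further validation set that neither level has seen.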


The literature on these two techniques claims significant generalization improvement from bagging over using a single predictor (usually by 20% or more), and with stacking added on a further improvement by a factor of two, with no result worse than simple bagging alone. Like I said, I don't know how well it is going to work on my stuff yet. You regression guys might want to give it a whirl and see what happens...

[END OF LONG & BORING POST. RETURN TO YOUR LIVES.]
Old 04-10-2002, 01:24 PM   #2
PaceAdvantage
PA Steward
 
Join Date: Mar 2001
Location: Del Boca Vista
Posts: 88,600
Long and boring?? Far from it....thanks for the food for thought!!


==PA
Old 04-10-2002, 01:34 PM   #3
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
I should have added the caveat that IF you find it interesting, as I do, then YOU are probably boring, as I am. In any case, I haven't had much luck picking up chicks with this sort of topic...
Old 04-10-2002, 11:40 PM   #4
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Bagging and Stacking-vis-a-vis neural nets

My experience in handicapping with several different neural net programs was disappointing.

It may be that because neural nets are so subtle in their analytic approach, they are unable to capture elements which are uniquely powerful predictors.

For example, we know that if we take Nunamaker's impact values and enhance them by adding pace analysis, we get a killer app when we make models based on linear regression analysis.

Whereas, with neural nets, we are by definition trying to discover very esoteric patterns.

In other words, while neural nets are brilliant at some types of analysis, when it comes to horse handicapping, they can't find the forest because they can't see beyond individual trees, or groups of trees.

Linear regression analysis seems a better mode of measuring a horse's performance potential, because regression weights large, discrete chunks of data, which might not even be observed by neural nets- which are looking for subtle threads.

In fact, vector analysis by itself is a stronger method of handicapping than a combination of neural nets. Furthermore, if we take vector analysis (acceleration-deceleration measured in fps) and combine it with a measurement of the horse's form cycle (23-day cycle; see David Brown's study) and add to that a "consistency within class" measurement, we wind up with a pretty darn good handicapping method, even without using Nunamaker.

Add Nunamaker, and you're in business.

The case against neural nets as applied to horse handicapping is best proved by RaceCom's program Analog 2.0, a huge and expensive flop.

Jaguar
Old 04-11-2002, 01:03 AM   #5
Dave Schwartz
 
Join Date: Mar 2001
Location: Reno, NV
Posts: 16,909
Jag,

Who is David Brown?

Dave
Old 04-11-2002, 01:58 AM   #6
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Bagging and Stacking reply

Dave, the late David Brown was quite a guy. He was an amateur and enthusiastic student of science, as applied to horse and greyhound handicapping.

David Brown was also an accomplished propeller plane pilot, a successful insurance executive, a millionaire, and a gentleman. Due to press coverage, he was widely known in the 1970s and 1980s as the most successful greyhound handicapper of the day.

Though modest by nature, David Brown would- on occasion- show a skeptic some of his IRS Form 99s issued to him by the greyhound tracks he frequented in his home State, Florida.

He was a pioneer in using a personal computer for handicapping.

In the days before legal lasix and bute, David Brown's studies in form cycles were very useful. Today form cycles have been somewhat adulterated by legal and illegal drugs, and no longer have the standalone utility they once had for the handicapper.

For example, not too long ago, 85% of winning horses had raced within 15 days of today's race.

Nowadays, juiced-up animals coming back after a 45-day layoff can sweep the field.

David Brown's achievements probably belong to the age that he lived in, the pre-internet era. But, looking back from today's vantage point, he was a heckuva handicapper for his day, a very wise man, and a great human being.

David Brown died around 1989. Anyone who knew him thinks of him fondly.

He certainly would have enjoyed meeting you.

Jaguar
Old 04-11-2002, 06:47 AM   #7
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Yeah,

I haven't had much success with getting neural nets to do much of anything using standard "backprop" training. I've had better success using genetic algorithms to create the neural net because you can really steer it where you want it to go, but I think NNs have been an over-hyped & under-performing approach in general...
Old 04-12-2002, 06:36 AM   #8
Arkle
Registered User
 
Join Date: Sep 2001
Posts: 14
Re: Bagging and Stacking-vis-a-vis neural nets

Jaguar, I thought I read somewhere that RaceCom's Analog 2.0 only did a post position analysis based on track, distance, and size of field?

If that's true, then it didn't have much chance of success to begin with.


Quote:
Originally posted by Jaguar

The case against neural nets as applied to horse handicapping is best proved by RaceCom's program Analog 2.0, a huge and expensive flop.

Jaguar
Old 04-12-2002, 09:34 AM   #9
wes
Registered User
 
Join Date: Nov 2001
Location: bama
Posts: 687
Bagging & Stacking

Jag

Did Mr. Brown have the SQUARE system for handicapping or was that someone else?


wes
Old 04-12-2002, 09:38 AM   #10
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Reply to Arkle re: RaceCom's Analog 2.0

Arkle, when RaceCom's Analog 2.0 appeared it was the successor to their earlier pace analysis software, but Analog 2.0 went way beyond that.

Analog 2.0 was touted as "the answer" for horse handicappers because it was produced by the men who revolutionized financial market analysis.

Joseph and David Shepard were the first computer scientists to use neural nets for the analysis of securities and other financial instruments. A book was written about them and their products were a sensation in the early and middle 90's.

While they were making a fortune selling to investment firms, they continued issuing half-hearted handicapping programs, which evolved into Analog 2.0.

The Analog 2.0 concept of pace analysis was very sophisticated and the idea of training and modeling behind it was brilliant. The Achilles heel of the product was that it used very subtle neural nets.

The end result was that one horse could not be compared to another horse at today's class and distance.

Their current program, Analog 5.0, has corrected all that and is the state-of-the-art handicapping program on the market. No other program comes close to it in its predictive power.

Anyone who bought Analog 2.0 is astonished.

Jaguar
Old 04-12-2002, 09:52 AM   #11
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Reply to Wes

Wes, David Brown's first handicapping method involved analyzing horses and dogs in groups or "teams"- as in PP 1-4 and 4-8- for example.

Once he had selected the best animal in each of two or three segments of the race, he "baseballed" or "key wheeled" those entrants in his bet, to wit: 1/234/234 and 234/1/234 etc.
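For anyone unsure what a ticket like 1/234/234 covers, here is a small sketch that expands it into the individual trifecta combinations. The function name `expand_ticket` is my own, and it assumes single-digit program numbers written in the slash notation used above.

```python
from itertools import product


def expand_ticket(ticket):
    """Expand a ticket such as '1/234/234' into its ordered combinations.
    Each slash-separated leg lists the single-digit program numbers allowed
    in that finishing position; no animal may fill two positions."""
    legs = [list(leg) for leg in ticket.split("/")]
    return ["-".join(combo) for combo in product(*legs)
            if len(set(combo)) == len(combo)]


key_on_top = expand_ticket("1/234/234")
key_second = expand_ticket("234/1/234")
print(key_on_top)  # ['1-2-3', '1-2-4', '1-3-2', '1-3-4', '1-4-2', '1-4-3']
print(len(key_on_top) + len(key_second))  # 12
```

So keying one animal first and then second over three others comes to 12 combinations in total.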

He used sound principles for selecting his key animals, but his handicapping made a quantum leap of improvement when he applied Huey Mahl's theorem of measuring expended energy.

David Brown had to withdraw his excellent hand-held Sharp handicapping calculator from the market, due to a lawsuit from a jealous handicapper.

Had he lived, his superb calculator would have been, I am sure, re-issued and revised, and many handicappers would be taking it to the track and OTB today.

David's widow told me that the calculator will not be sold. Pity.
Old 04-12-2002, 02:54 PM   #12
Arkle
Registered User
 
Join Date: Sep 2001
Posts: 14
Re: Reply to Arkle re: RaceCom's Analog 2.0

Jaguar:

According to their website, RaceCom is about to issue a new version of their handicapping program; their pricing structure is unclear - at least to me, other than that the software is expensive.

There doesn't seem to be any track record available on the website. I note that one of their conditions of sale is that there is no returns policy; you buy it, you're stuck with it.

Given the number of dabblers out there, I can understand this, but surely if anyone is thinking of buying it under these circumstances, they would have to have some form of record of past accomplishments?

Seems strange to me.

By the way, thank you for your patience in answering these queries; I'm sure they've been dealt with many times before here and elsewhere.

Regards,

Arkle




Quote:
Originally posted by Jaguar

Their current program, Analog 5.0, has corrected all that and is the state-of-the-art handicapping program on the market. No other program comes close to it in its predictive power.

Anyone who bought Analog 2.0 is astonished.

Jaguar
Old 04-12-2002, 11:04 PM   #13
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Reply to Arkle Re: Racecom Software

Arkle, until recently RaceCom was updating their results page on their website.

I assume that the imminent release of their latest update for Analog 5.0- which is called "Analog 7.0"- has caused them to lose interest in promoting the older software.

When dealing with RaceCom, one has to remember that the company has an installed customer base which has reached critical mass.

In other words, the expense and relative awkwardness of using a RaceCom handicapping program, has very likely resulted in RaceCom winding up in a low-growth stage.

Few people can afford RaceCom's products and services, and those people who can afford to be a RaceCom customer are dedicated to the company and its products.

While RaceCom burned out so many of their original customers in the last 15 years, they have acquired an entirely new group of customers- who know nothing of RaceCom's wretched earlier handicapping- and this new group is profiting mightily from Analog 5.0.

The reason I say that it is less than convenient to use RaceCom's handicapping products is that, while heretofore the company sold their programs on floppies, today they have a policy of emphasizing their in-house race-training and handicapping capability.

RaceCom has an enormous horse race database and they make incredibly accurate models. They even offer a "deluxe" handicapping service which will allow customers to access the company's server and train on it. Amazing.

Please note that RaceCom's very brief comments on their website seem to indicate that their planned policy of no longer selling the discs (CDs, I assume, these days, as opposed to floppies) has been reversed, and that RaceCom will continue to sell their software.

For a while, it appeared in recent years that RaceCom would simply allow their customers to access the company's server for a monthly, 6-month, or annual fee- as opposed to selling the software itself.

I have not been a RaceCom customer since the middle 90's, but I may be again someday. They win 66% of their trifectas.

That is, when they indicate which horses to bet in a trifecta, that bet is a winner two-thirds of the time. I have tracked their output for almost a year, and their accuracy is mind-boggling.

While the typical, pace-oriented, traditional programs pick an occasional brief series of winners at different tracks, their results are sporadic.

RaceCom's models just dominate horse racing in a way that old handicappers like myself have never imagined would be possible.

I am a former system developer myself, and I am an extremely harsh judge of the sloppy and inadequate programs on the market. (Anyone who is not using Horsesense, Joe T.'s new program, or Thorobrain, or Snapcapper, is going into battle under-gunned.)

Napoleon is quoted as having said, "Victory belongs to the big battalions."

Well, in the handicapping wars, RaceCom isn't a battalion- it's a Brigade- and a big one at that.
Old 04-12-2002, 11:33 PM   #14
Jaguar
Registered User
 
Join Date: Mar 2002
Location: Hamden, CT
Posts: 420
Correction of error made in reply to Arkle

Arkle, I made an error in my latest reply to your query about RaceCom. I referred to "Joe T.'s new program".

Apologies to Joe. I should have written, "Joe Z.'s new program."

Joe Zambuto's recently released handicapping method is a comprehensive, sophisticated, and altogether impressive package.

Jaguar
Old 04-13-2002, 12:12 AM   #15
PaceAdvantage
PA Steward
 
Join Date: Mar 2001
Location: Del Boca Vista
Posts: 88,600
Jaguar,

I find it mind-boggling that you have nothing but the highest praise for RaceCom's offerings, but yet you say this:

Quote:
I have not been a RaceCom customer since the middle 90's, but I may be again someday.
If you haven't been a customer since the middle 90's, how can you know that their recent crop of offerings performs so well?? And if their software performs as you claim, why the hell AREN'T you a customer right now???

How do you reconcile these incongruous statements?? Have I missed something you said that explains this?

In addition, you are perhaps one of only a handful of people I have ever heard or read about IN MY LIFE that had positive things to say about RaceCom and their software. Why do you think this is so??


==PA