PDA

View Full Version : Interactive Dichotomizer 3


Sly7449
09-01-2009, 09:32 PM
Greetings,

ID3 Technology (Ross Quinlan) or C4.5 algorithm application for Horse Racing.

Are there any known vendors that utilize this concept as a tool in their Program?

Google Search not much help.

Thanks

Sly

GameTheory
09-01-2009, 11:41 PM
Greetings,

ID3 Technology (Ross Quinlan) or C4.5 algorithm application for Horse Racing.

Are there any known vendors that utilize this concept as a tool in their Program?

Google Search not much help.

While I don't use those variants exactly, I built upon those concepts into something very similar, and I have used decision trees very heavily for horse racing data. I've used them (with different algorithms) for estimating probabilities (using 1/0 as the dependent variable) and also a regression version for real-valued data. They can work quite well, but only with quite a lot of tweaking (esp. for probabilities). I made "forests" -- 100s or 1000s of trees each trained "fully" (pure leaf nodes with no pruning) on (usually) bagged training samples with the answers all averaged together for the final value. What was nice about decision trees ("was" because I haven't used them lately) is that I created a clever way to deal with missing values (which are always such a pain with other methods) that allowed me to leave them as missing and not worry about it. So when you are dealing with first-time starters or other horses where you can't make this or that factor but you still want to include it, you can just ignore it.

As far as vendors with such a thing built-in, none that I know of specifically for horse-racing, although there are certainly a few general-purpose decision tree packages.

PaceAdvantage
09-02-2009, 04:26 AM
I vote this the coolest thread name of the year thus far....

Nets
09-02-2009, 08:17 AM
I vote this the coolest thread name of the year thus far....

I agree. Checked it out just to see what the heck it was. Still not sure if I do.

headhawg
09-02-2009, 08:49 AM
I thought it was something created by Ron Popeil and pitched by the late Billy Mays.

Or maybe something sold at Lover's Lane. :eek:

Tom
09-02-2009, 09:28 AM
I thought it was a new sex thread!

Dave Schwartz
09-02-2009, 11:51 AM
Near as I can tell, this is a modern "classifier" system.

Ever since John Holland wrote about his early ones in Hidden Order I have liked the idea. I wrote several early on, but they are very tough to get to work effectively.

Consider that rules often work together as an "assembly," much as a sports team.


The basic concept is you have a bunch of rules - created at random. Think of each rule as a player on a 200-man hockey team (still only a handful of players on the ice at a time). The question you are trying to answer is, "Which players have the greatest impact on their team winning?"

When one looks at hockey statistics, one sees goals, assists and the total points derived from those two stats. It is quite logical to assume that the player who scores the most points is the most important player. However, what about the unsung defenseman whose very presence on the ice helps his team immensely despite rarely scoring a point?

Classifiers attempt to address this through an economy of sorts. Using the hockey analogy, consider that each player starts with a bankroll. He pays (by the minute) for the opportunity to play. Whenever a team scores, money is paid out and all players on the ice share in it. In the more advanced classifiers, the actual scoring play is traced back through who touched the puck, with the most recent player touching the puck (i.e. the goal scorer) receiving more money than the one that passed it to him (the assister). Ultimately, all the players receive something, based upon their closeness to the goal.



Thus, the players who seem to contribute more directly are given more credit... as the rules which fired closest to the win-producer receive more credit.
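The payout mechanism described above can be sketched as a toy credit-assignment routine. This is only an illustration of the idea, not Holland's actual bucket-brigade algorithm, and the 50% decay rate per step back up the chain is an assumption:

```python
def settle_goal(chain, payout, decay=0.5):
    """Pay out a goal back through the chain of players who touched the puck.
    chain[-1] is the scorer; each earlier toucher gets a decayed share."""
    credit = {}
    share = payout
    for player in reversed(chain):
        credit[player] = credit.get(player, 0.0) + share
        share *= decay
    return credit

# The scorer gets the most, the assister half as much, and so on up the ice.
credit = settle_goal(["defenseman", "assister", "scorer"], payout=100.0)
```

Just as in the hockey analogy, any player (rule) who never touched the puck gets nothing here, which is exactly the weakness discussed next.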

The problem with this real-world example is precisely the same as faced by the classifiers: the guy that never touched the puck but decoyed the other team's defensemen out of position gets little or no credit.

In a similar analogy in basketball, consider the big scorer who gets double-teamed, resulting in a teammate always being left open. It appears that the normally big scorer had an ineffectual game because he generated no stats and rarely touched the ball.


Personally, my GAs - I am considering designing a new one - are always patterned after LS-1.
http://www.springerlink.com/content/n83r83v087685215/

The interesting thing about LS-1 versus all the others is that in LS-1 the entire SYSTEM is one entity. Whereas in the conventional GA each rule is a stand alone entity.

Think of a rule as being broken into parts. The parts fill out the following statement:

IF [field] is [sign] [value] then [action] [action value].

In English:
"If [Early Speed Points] are [greater than] [6] then [add] [10 points]."
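A rule in this form can be sketched as a small record plus an evaluator. This is a hypothetical sketch; the field name and the dictionary representation of a horse are assumptions made for illustration:

```python
import operator
from dataclasses import dataclass

@dataclass
class Rule:
    field: str           # [field], e.g. "early_speed_points"
    sign: str            # [sign]: ">", "<", or "=="
    value: float         # [value]
    action: str          # [action]: "add" or "subtract"
    action_value: float  # [action value]

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def apply_rule(rule, horse):
    """Return the points this rule contributes for one horse."""
    if OPS[rule.sign](horse[rule.field], rule.value):
        return rule.action_value if rule.action == "add" else -rule.action_value
    return 0.0

# "If Early Speed Points are greater than 6 then add 10 points."
rule = Rule("early_speed_points", ">", 6, "add", 10)
```

A horse's total score is then just the sum of `apply_rule` over every rule in the system.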



Think of the GA as a spreadsheet... where each rule is a line and the parts of the rule are columns. Thus, the columns are field, sign, value, action, action value.

The goal is to create a mechanism whereby the better, more effective rules move to the top of the spreadsheet.


The LS-1 version is different. In that version imagine that you take all of your rows of rules and put them on a single row. That is, if you had 500 rules x 5 columns (2,500 cells) as I described above, instead you would have a single row with 2,500 cells.

Now imagine (say) 200 rows x 2,500 cells.

Each row is a stand alone system. The goal is to get the best overall system to the top.

It is a much bigger solution, but all the problems mentioned above with classifiers are gone because it is a "total package" solution. You get a better answer, but the time it takes to produce a good answer increases exponentially.
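The two encodings can be sketched side by side. This is a minimal illustration of the representation only; the random genes stand in for actual rule parts:

```python
import random

N_RULES, GENES_PER_RULE = 500, 5          # 500 rules x 5 parts = 2,500 cells
POP_SIZE = 200

# Conventional GA: each rule is a stand-alone individual (one spreadsheet row).
conventional_pop = [[random.random() for _ in range(GENES_PER_RULE)]
                    for _ in range(N_RULES)]

# LS-1: the entire rule SET is one individual -- all 500 rules laid out on a
# single row of 2,500 cells; the population is 200 such complete systems.
ls1_pop = [[random.random() for _ in range(N_RULES * GENES_PER_RULE)]
           for _ in range(POP_SIZE)]

def nth_rule(individual, n):
    """Slice rule n back out of a flattened LS-1 chromosome."""
    start = n * GENES_PER_RULE
    return individual[start:start + GENES_PER_RULE]
```

In the LS-1 layout, fitness is assigned to a whole row (a whole system), so crossover and selection operate on complete rule sets rather than on individual rules.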


Regards,
Dave Schwartz

JBmadera
09-02-2009, 12:00 PM
oh good, now everything is clear......back to the game theory book Dan G recommended......:faint:

it's so embarrassing asking my kids for help with this stuff!

good luck today - not that any of ya'll that understand this stuff will need it.

JB

Sly7449
09-02-2009, 12:19 PM
Greetings,


First off, let me address the nomenclature of this Thread.

It was intended to be brief and to the point.

Based on such, only the well informed Professors would be able to help shed some light on the Topic.

In addition to those Geniuses, maybe an additional 3% may want to share their use of this Tool.

I don't think that the whales are using it because someone once stated that they (Mamu's) have hundreds of employees working for them.

I understand that this concept is by far superior to AI and KBS.

It helps lead us away from the Wisdom Of The Crowds because, as you and I know, their wisdom is at 33% and that has not changed in Decades.

Referencing a Software Vendor's Report, the ROI on this 33% is just about equivalent to throwing darts at the Race program.

O.K. Back to ID3/C4.5

GT, thanks for sharing your views on this. I suspected that you would be one of those with some exposure to this concept. Your suggestion for an Add-On could be the Best Option.

Thanks

Sly

Jeff P
09-02-2009, 01:23 PM
C4.5 in layman's terms:
http://en.wikipedia.org/wiki/C4.5_algorithm

Explained using a little more detail:
Building Classification Models: ID3 and C4.5:
http://www.cis.temple.edu/~ingargio/cis587/readings/id3-c45.html

Yes. I've done something similar. To be specific, I've been experimenting with an algorithm of my own design that arrives at something close to optimal factor weight or "importance" for individual factors in a mix when factors selected by the user are assembled together to create a user's own custom factors. In JCapper these are known as UserFactors.


-jp

.

ddog
09-02-2009, 01:54 PM
"I don't think that the whales are using it because someone once stated that they (Mamu's) have hundreds of employees working for them."



To gather more input to the process?

I can assure you many many "professors" use and did use this and more to model the risk profiles leading to the current mess. Good on them! :D

That's the problem more so than the process itself.
You can chase your tail with the routine. If I know a horse is not eating or not working or not feeling well ......BOOM.

Tom had it nailed in a previous post.
It's the Viagra for professors. ;)

Red Knave
09-02-2009, 01:59 PM
I vote this the coolest thread name of the year thus far....

Not only that, PA, but it's also interesting and thought provoking. I had not heard of these variants/algorithms before. Thanks, as usual, to Game Theory, Dave S, and Jeff P (and Sly for starting the thread). Any time I think I've read it all on this board "they pull me back in".


And, Tom, Interactive Dichotomizer 3 was my garage band's name back in '69.

;)

Red Knave
09-02-2009, 02:15 PM
Actually, I had heard of See5. I didn't get the connection with C4.5.
Thanks again to all for the info.

Sly7449
09-02-2009, 03:52 PM
Alright Now,


Let me explain what aroused my curiosity, but before that I must say Thanks to those well respected gentlemen that were Alerted to the Thread Title.

Also to PA for considering it "Cool" but it looks like this is Warming Up.<G>

History: Some three months ago, I ran into an Article which I then posted on a BBS expecting some kind of Feedback.

After 90 days with No Response, I decided to explore it here at PA. Well you see, I did not let too many folks in on this quite yet<GGG>.

Here goes the Link to the article :
http://ai.arizona.edu/papers/dog93/dog93.html

My Google search produced many thought provoking ideas which have been in my sub-conscious state for quite a while.

Words like Factor Pairs, Random Weights, Bootstrapping, EDDIE, and Boosting sure sound good.


Is this What's Missing in this generation of Handicapping Software? I think that Ron Tiller has a Thread here on PA addressing "What's Missing" or something like that.

I did happen to run into a site that pointed to DDSS but the Thread for that on PA (back then) did not point to pleasant reviews and they were talking Dogs then.

From the look of things, ID3 and C4.5 may be old and there may be even better stuff out there.

Thanks

Sly

Jeff P
09-02-2009, 04:44 PM
Is this what's missing from current generation handicapping software?

I say no.

A few days ago I was working on material for a seminar. I'm not sure when or where (or even if) I'll do it but I wanted to wrap my head around certain key areas of the game.

I came up with the following six general areas as possible seminar segments:

1. Horse Selection/Handicapping Methodologies

2. Play or Pass Decision Making

3. Ticket Structure/Backing An Opinion

4. Bet Sizing

5. Carrying an Edge into the Long Run

6. Discipline

One tangent that I went off on was that if the player performs areas 2 through 6 correctly it is possible to achieve profitable play - even if every horse selected is arrived at using a random number generator. After proving this point to my satisfaction using a database I moved on to other more important pieces of the seminar.

The point is this:

Most if not all players put almost all of their energy into area #1, the selection process. I believe this to be true for the overwhelming majority of software users no matter what went into the software they might be using.

Food for thought... IMHO the selection process might actually be the least important area on the above list when it comes to "putting it all together" for profitable play.


-jp

.

GameTheory
09-02-2009, 05:04 PM
Near as I can tell, this is a modern "classifier" system.

Ever since John Holland wrote about his early ones in Hidden Order I have liked the idea. I wrote several early on, but they are very tough to get to work effectively.

These are classifiers in the general sense (because they classify), but are nothing like the bucket-brigade Holland stuff. These are recursive partitioning algorithms.

Here's a simple version: Take a bunch of data, say with 20 features and one output variable (1/0 = won/lost). Now, find somewhere to split the data based on a single feature or some combination of features. Keeping it simple, we use one feature only. We find that the percentage of winners is significantly higher when feature A > {some threshold}. Now the entire set of data has been split into two groups -- each group with feature A above or below/equal to {some threshold}. And our output variable has naturally also been split into two groups, and if we picked a "good" split, then each group is "more pure" than it was before -- greater % of 1's in one group, greater % of 0's in the other group.

Now, recurse. i.e. take each of those groups in turn and split them some other way. Keep doing this until all the nodes (groups) are completely pure or as pure as possible. Remember all the rules for splitting.

Now, to classify a new row, follow the rules down the "branches" until you end up on a "leaf" (no more splits) and you end up with a prediction -- 1 or 0 (if the leaf is pure) or a percentage (i.e. 1 = 80%, 0 = 20%) if it wasn't possible to get a pure leaf. You're basically creating a big "if statement". e.g. "if feature A > thresholdA and feature B < thresholdB, etc. etc"
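The recursion described above can be sketched as a minimal splitting routine. This is a bare-bones illustration assuming numeric features and a Gini-style purity score; real implementations like C4.5 use the information gain ratio and handle categorical features, missing values, and more:

```python
def gini(labels):
    """Impurity of a node: 0 when pure, 0.5 at a 50/50 winner/loser mix."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels):
    """Recursively split until every leaf is pure (or can't be split)."""
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return {"prob": p}                     # pure leaf: 0% or 100% winners
    best = None
    for f in range(len(rows[0])):              # try every feature...
        for thr in {r[f] for r in rows}:       # ...at every observed value
            left = [i for i, r in enumerate(rows) if r[f] > thr]
            right = [i for i, r in enumerate(rows) if r[f] <= thr]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return {"prob": p}                     # impure but unsplittable leaf
    _, f, thr, L, R = best
    return {"feature": f, "threshold": thr,
            "left":  build_tree([rows[i] for i in L], [labels[i] for i in L]),
            "right": build_tree([rows[i] for i in R], [labels[i] for i in R])}

def predict(tree, row):
    """Follow the split rules down the branches to a leaf probability."""
    while "prob" not in tree:
        branch = "left" if row[tree["feature"]] > tree["threshold"] else "right"
        tree = tree[branch]
    return tree["prob"]
```

The nested dictionaries are exactly the "big if statement": each internal node is one `if feature > threshold` clause, and `predict` walks the chain.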

Obviously this is highly overfit. So either prune it according to some pruning scheme (cut off some of the final leaves and branches) or do what I do and just build a whole bunch of them on differing versions of the training sample, and average the results together.

And then add all the tweaks that the academics don't know about that will actually make it be useful!

What is very interesting about decision trees that is different from just "applying weights" or some such thing is that each new row to be classified is treated differently, taking a different path through the tree. Those groups where feature A > thresholdA may find the algorithm then concentrating on features B, C, D & F for the further rules, where those in the other feature A <= thresholdA group may never look at features B, C, D, & F at all!

raybo
09-04-2009, 11:34 PM
These are classifiers in the general sense (because they classify), but are nothing like the bucket-brigade Holland stuff. These are recursive partitioning algorithms.

Here's a simple version: Take a bunch of data, say with 20 features and one output variable (1/0 = won/lost). Now, find somewhere to split the data based on a single feature or some combination of features. Keeping it simple, we use one feature only. We find that the percentage of winners is significantly higher when feature A > {some threshold}. Now the entire set of data has been split into two groups -- each group with feature A above or below/equal to {some threshold}. And our output variable has naturally also been split into two groups, and if we picked a "good" split, then each group is "more pure" than it was before -- greater % of 1's in one group, greater % of 0's in the other group.

Now, recurse. i.e. take each of those groups in turn and split them some other way. Keep doing this until all the nodes (groups) are completely pure or as pure as possible. Remember all the rules for splitting.

Now, to classify a new row, follow the rules down the "branches" until you end up on a "leaf" (no more splits) and you end up with a prediction -- 1 or 0 (if the leaf is pure) or a percentage (i.e. 1 = 80%, 0 = 20%) if it wasn't possible to get a pure leaf. You're basically creating a big "if statement". e.g. "if feature A > thresholdA and feature B < thresholdB, etc. etc"

Obviously this is highly overfit. So either prune it according to some pruning scheme (cut off some of the final leaves and branches) or do what I do and just build a whole bunch of them on differing versions of the training sample, and average the results together.

And then add all the tweaks that the academics don't know about that will actually make it be useful!

What is very interesting about decision trees that is different from just "applying weights" or some such thing is that each new row to be classified is treated differently, taking a different path through the tree. Those groups where feature A > thresholdA may find the algorithm then concentrating on features B, C, D & F for the further rules, where those in the other feature A <= thresholdA group may never look at features B, C, D, & F at all!

Thanks for adding the "IF statement" analogy. This brings us Excel guys into the mix, concerning this highly complex idea centering on "decision processing" and those such processes able to "learn" from themselves, in real time.

I, for one, have been using "nested (in multiple "pathways") If,Then,Else statements" for several years. Excel 2007, because of that version's expanded "If" capabilities in a single cell, makes this form of decision processing more efficient, although somewhat more difficult to "remember the rules".
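As a trivial illustration of the point, a nested IF() chain in a single spreadsheet cell maps directly onto nested if/else in code, which is one decision path per horse. The factor names and thresholds here are invented:

```python
def class_rating(horse):
    """One decision path per horse -- the code analogue of a nested IF()
    formula in a single spreadsheet cell (factor names are made up)."""
    if horse["last_class"] >= 20000:
        if horse["finished_top3"]:
            return "contender"
        return "class dropper to watch"
    if horse["days_since_race"] > 60:
        return "needs a race"
    return "outclassed"
```

Structurally this is a tiny hand-built decision tree: each branch condition is a split, and each return value is a leaf.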

I use these processes for "form", "class", and "line selection" decision making, among other things.

This capability is, in my opinion, the reason that Excel/spreadsheet handicapping remains, after all these years, among the best tools for handicapping/wagering, available.

Dave Schwartz
09-05-2009, 12:44 AM
GT,

Good description!

Is this approach to tree-making not the ultimate backfit?

Is it not very sensitive to new information, very much destroying the entire tree structure on a re-train?

In case I didn't say that well, I'll try again a different way... isn't this type of learning approach very sensitive to new information?


Dave

GameTheory
09-05-2009, 01:52 AM
GT,

Good description!

Is this approach to tree-making not the ultimate backfit?

Is it not very sensitive to new information, very much destroying the entire tree structure on a re-train?

In case I didn't say that well, I'll try again a different way... isn't this type of learning approach very sensitive to new information?

Making a single tree, yes -- but non-pruned single trees are so overfit as to be worthless. And although there are methods of adding new data to existing trees, I've never bothered with that because maintaining what the previous version was like is just not a concern of mine. And since I make forests rather than just single trees, although all the components are likely to be completely different, the results won't be.

I make forests, usually "bagged" or something similar. (Bagging = "bootstrap aggregating" = training on many bootstrap replicates of the original training sample and averaging the results together. Bootstrap replicate = a sample drawn from the original sample at random WITH REPLACEMENT that is equal in size to the original. "With replacement" means we choose from the original randomly as many times as we need to, but we don't mind choosing some of the same rows multiple times. The result is that for each bootstrap replicate, about 1/3 of the original rows are left out and multiples of other rows are used instead.) All of the averaging makes it much more robust, and lets all the overfitting of the individual trees cancel itself out. There are also lots of complicated pruning mechanisms you can try, but this is much easier.
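The bagging procedure as described can be sketched generically. This is a minimal sketch; the toy "model" that just reports its sample's win rate stands in for a tree, purely to make the averaging visible:

```python
import random

def bootstrap_replicate(data):
    """Sample len(data) rows at random WITH replacement; on average
    roughly a third of the original rows are left out of each replicate."""
    return [random.choice(data) for _ in data]

def bagged_estimate(data, fit, predict, new_row, n_models=500):
    """Bagging: fit one model per bootstrap replicate, average the answers."""
    models = [fit(bootstrap_replicate(data)) for _ in range(n_models)]
    return sum(predict(m, new_row) for m in models) / n_models

# Toy stand-in for a tree: the "model" is just the win rate of its sample,
# so the bagged estimate should land near the overall 50% win rate.
data = [(i, i % 2) for i in range(100)]          # (row id, won/lost)
fit = lambda sample: sum(y for _, y in sample) / len(sample)
predict = lambda model, row: model
```

In a real forest, `fit` would grow a fully unpruned tree on each replicate and `predict` would walk that tree for the new row; the averaging step is identical.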

Now, there are a couple of other variables -- you've got to have a metric for how you decide if one split is better than another -- the heart of your algorithm. The metric you use will make a huge difference in the results, and there are dozens to choose from. (And you can make your own -- I invented one that seems to work better than any of the ones suggested by the academics.) Also, metrics used for trees that are to predict probabilities (1/0) are completely different than if you are doing regression (real-valued data). Probabilities from trees are quite tricky, and there are a few other tricks to get them working well, but they are too esoteric to get into here.
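Two of the standard split metrics can be sketched as follows. These are the textbook ones (Gini impurity, and the entropy behind ID3/C4.5's information gain), not the custom metric GT mentions, which isn't described here:

```python
import math

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 at a 50/50 mix of 1s and 0s."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def entropy(labels):
    """Shannon entropy, the basis of ID3/C4.5's information gain."""
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_impurity(groups, metric):
    """Score a candidate split: size-weighted impurity of the child nodes.
    Lower is better -- and swapping `metric` changes which splits win."""
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * metric(g) for g in groups)

# A split that cleanly separates winners from losers scores 0 either way.
weighted_impurity([[1, 1, 1], [0, 0]], gini)
```

Because the whole tree is grown greedily on this one number, different metrics steer the recursion toward different (and differently sized) trees, which is the point being made above.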

Bottom-line: trees can work very well for horse-racing, but you're not likely to conclude so based on the usual vanilla implementations, at least for probabilities. In general, they tend to be good at ranking (this horse is better than that one), but the actual values are very poorly calibrated (extreme highs and lows). There are a few secrets you need to know that will "snap them into focus" and make them highly accurate probability-wise. I worked pretty much exclusively with decision trees for a couple of years, and couldn't find another technique that worked better (or faster) on complex data. I don't use them as much now, but that is because I have found ways to make the data much less complex and more traditional methods work just as well. If you have high-dimensional data (lots of factors) where the features are not independent or simply unclear (factors are correlated, factor relevance is unknown, etc), non-parametric methods like decision trees, neural nets, etc will work better. If you have low-dimensional data (few factors) that are highly independent, then good old logistic regression works just as well as anything without all the fuss.

Dave Schwartz
09-05-2009, 09:20 AM
GT,

That was really well-explained. Made perfect sense.

The sampling techniques you mentioned are the same ones I use.

I recall that my idea for swarms came after a conversation with you regarding your "forest of trees." Perhaps you'd care to tell THAT story of the highly selected trees versus the many random trees.


Dave

mwilding1981
09-05-2009, 10:07 AM
What an interesting discussion, GT would you be able to point me in the direction of any good literature on trees?

Thanks

GameTheory
09-05-2009, 06:38 PM
I recall that my idea for swarms came after a conversation with you regarding your "forest of trees." Perhaps you'd care to tell THAT story of the highly selected trees versus the many random trees.

Uhh... which story would that be exactly?

Dave Schwartz
09-05-2009, 08:43 PM
The one about how you found that generating 10,000 trees at random outperformed the finely trained ones.

GameTheory
09-05-2009, 09:13 PM
The one about how you found that generating 10,000 trees at random outperformed the finely trained ones.

Ahh...yes. They did -- sometimes. I later found some circumstances where they didn't. The performance of decision trees is affected by how big the data set is, and bigger isn't always better. Bigger sets need more splits, which can cause more noise and adds more randomness to the leaf nodes (the averaging doesn't work as well). The splitting metric affects performance in large part based on how big or small it tends to make the resulting trees. The splitting metric I invented turned out to be very efficient, creating smaller trees, which is probably why it worked better on the large data sets I usually use. That's comparing forests to forests -- a forest using almost any method will beat out even the most painstakingly created single tree (or a very small ensemble of trees) almost every time in my experience.

But randomness in general works wonders on all these methods we talk about -- I read these detailed papers by these academics trying to come up with something clever and I ask myself "what if we just made this part completely random?" The resulting performance is often surprisingly good. That might be less true with less noisy data than horse racing provides. If there actually is in that data somewhere THE answer, then less random methods could work better. We have to realize with our data that the best performing models (best performing on the training sample) are ALWAYS getting a boost from luck, and in most cases one of our good but not best models will actually be the best generalizer in the real world on new data.

Dave Schwartz
09-05-2009, 09:24 PM
That is much as I have found... that probabilities produced by "higher-level" processes are generally better than those produced by lower-level ones.

For those who don't speak my version of the language, that means that probabilities derived from 50 (or 500) different factors are likely to be more consistent than those derived from (say) 5.


Dave

Jeff P
09-05-2009, 10:32 PM
For those who don't speak my version of the language, that means that probabilities derived from 50 (or 500) different factors are likely to be more consistent than those derived from (say) 5.

Which circles back to what I was trying to get across earlier when Sly asked "Is this What's Missing in this generation of Handicapping Software?"

Once you get the selection process past a certain point - or more correctly stated: once you understand what the relevant factors are (in a forest or in large data samples) and your selection process reflects that - the selection process can actually become quite good. Of course from there additional work and fine tuning often leads to incremental gains and improvements...

I think most current generation software is already pretty good when it comes to horse selection processes.

IMHO what players DO with their selections is far more important. And that's one area where I think opportunity for improvement in current generation software exists.


-jp

.