Competitive Learning [Archive] - Horse Racing Forum - PaceAdvantage.Com

mwilding1981

08-01-2009, 05:43 AM

I would like to test creating a probability model using some form of competitive learning and am not sure where to start, so I was hoping to get some pointers here. Every rating in the race will be horse-specific and will not take into account the other runners in the race. The ratings will all be adjusted for class, weight, etc. so that they are all on the same level.

The idea would then be for every horse in each race to start with an equal probability of winning, which is then adjusted based on its own and its competitors' ratings. Is there software that can do this, or where should I begin reading about it?

Thanks

GameTheory

08-01-2009, 12:20 PM

Isn't that what most software does -- uses ratings to determine the relative chances? (And they all implicitly assume each horse is equal in the absence of ratings.)

Competitive learning is a training method to determine how you might weight each of those ratings. It is not something that would "happen" for each race at handicapping time. The "competition" is between competing "solutions" (model weights) -- i.e. which one works better on the training sample? This process is continually repeated by replacing the poorest performers with new solutions based on slightly altered versions of the better performers so you can start with no knowledge and work towards a good solution via the competition. (More or less) Competitive learning covers a whole class of different strategies that all have that basic feature.
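
As a rough illustration of the loop described above, here is a minimal Python sketch. Everything specific in it is an assumption made up for the example: the toy races from `make_toy_races`, the hit-rate scoring, the population size, and the mutation step are hypothetical and not taken from any particular software.

```python
import random

def make_toy_races(n_races=200, n_horses=6, seed=1):
    """Hypothetical training sample: each race is (factor rows, index of winner)."""
    rng = random.Random(seed)
    races = []
    for _ in range(n_races):
        horses = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_horses)]
        # Hidden "truth" used only to generate toy results.
        ability = [2 * f[0] + f[1] + rng.gauss(0, 1) for f in horses]
        races.append((horses, ability.index(max(ability))))
    return races

def hit_rate(weights, races):
    """Fraction of races in which the top-scored horse actually won."""
    hits = 0
    for horses, winner in races:
        totals = [sum(w * x for w, x in zip(weights, factors)) for factors in horses]
        hits += totals.index(max(totals)) == winner
    return hits / len(races)

def compete(races, n_factors, generations=30, pop_size=20, step=0.2, seed=0):
    """Start from random weights; each round, replace the poorest half of the
    solutions with slightly altered copies of the better half."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_factors)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: hit_rate(w, races), reverse=True)
        history.append(hit_rate(population[0], races))
        survivors = population[: pop_size // 2]
        children = [[w + rng.uniform(-step, step) for w in parent] for parent in survivors]
        population = survivors + children
    best = max(population, key=lambda w: hit_rate(w, races))
    return best, history

if __name__ == "__main__":
    best, history = compete(make_toy_races(), n_factors=2)
    print("best weights:", best)
    print("hit rate by generation:", history)
```

The hit rate here is measured on the training sample only, so it says nothing yet about how the weights would do on races they have not seen.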

And the goal of it is no different than any other training method for making models. So the models it comes up with aren't any different (in kind) from a model made some other way -- for instance just setting the weights using only your handicapping judgment.

Can you elaborate on what you are looking for exactly?

Tom Barrister

08-01-2009, 01:08 PM

The whole thing sounds like another way to backfit. I'm willing to learn new things, so feel free to point out the difference, if one exists.

I don't know if HSH's "ant" feature would qualify as "competitive learning". It did seem to have some promise when I was testing it, but I didn't really give it a good workout.

The only other software that I can think of would be the QuickHorse/Quickdog Supertune feature, which as far as I'm concerned is glorified backfitting.

GameTheory

08-01-2009, 01:40 PM

Nothing wrong with backfitting if you do it right, and then validate the model on another set of data. (Most don't do it right of course.) That's pretty much the definition of the handicapping process.

mwilding1981

08-02-2009, 04:45 AM

GameTheory thank you for the good simple explanation of competitive learning. I may have misnamed it slightly as I am not 100% sure what it would be called. Competitive learning I am looking into as a way of weighting my factors at the moment.

Is there a way of creating an oddsline from different factors that are unique (as much as possible) and normalised so that the effects of class etc. have been removed? These factors would be completely about the horse and its preferences and have nothing to do with the other runners. Then start with an even probability for all horses in the race (e.g. with 10 runners every horse starts at 0.10). Look at horse 1 and his first rating, compare it to the other runners' first ratings, and say: he has never raced with horses like this before, but we would expect his rating in this field to increase his chances over the other runners by, say, 0.01. We then adjust his probability and remove that amount equally from the other runners. Go through all his ratings doing this, then move on to the next horse in the race, and so on. I could be talking about something that makes no sense, but that is the general direction of my thoughts.

Tom Barrister

08-02-2009, 12:03 PM

I thought I offered some helpful suggestions. Maybe I'm on ignore. I'm sure I have a lot of anti-fans that way.

For those of you who can read what I type, starting each horse with an equal percentage and then adding/subtracting will generally result in an odds line that tends to flatten out towards a mean/median. It's better to simply assign points to each horse by whatever method is used, add the points up, divide each horse's points by the sum of all points, and derive an odds line from that.
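
A minimal Python sketch of that points-to-odds arithmetic follows. The point totals are made up for illustration, and `odds_line` is a hypothetical helper name; fair odds are taken as 1/p - 1 (odds to 1).

```python
def odds_line(points):
    """Map each horse's points to (win probability, fair odds-to-1).

    Assumes every horse has a positive point total.
    """
    total = sum(points.values())
    line = {}
    for horse, pts in points.items():
        prob = pts / total              # share of all points in the race
        line[horse] = (prob, 1 / prob - 1)
    return line

points = {"A": 40, "B": 30, "C": 20, "D": 10}  # hypothetical point totals
for horse, (prob, odds) in odds_line(points).items():
    print(f"{horse}: p={prob:.2f}, fair odds {odds:.2f}-1")
```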

GameTheory

08-02-2009, 12:16 PM

Is there a way? Of course. You can make an oddsline by reading chicken guts if you want to. Will it be any good? That depends, not only on the process, but the factors. Let's face it, mainly the factors. Really good factors have a way of pointing you in the right direction.

So the question is -- why do it this way? What do you think will make it special/different than simply adding up points as Tom suggests? Are you trying to focus on the COMPARISON process of each factor from horse-to-horse? In other words, this horse has X amount of advantage over that horse, and therefore has Y percentage greater chance of winning? What is the kernel of insight you are trying to expand into a whole method here? I'm not quite clear where you're trying to go.

mwilding1981

08-03-2009, 08:01 AM

Hi Tom, thank you for your messages and sorry I didn't post to them before, they are indeed helpful. It makes sense that doing it in this way would flatten out towards the mean. If you are doing it live for each race though would it actually be backfitting?

GameTheory, the reason for this is the competitive angle for each factor, i.e. how much better this horse is on this factor in comparison with the other runners in the race, and whether that should increase or decrease the horse's probability and by how much.

gm10

08-03-2009, 08:55 AM

Imo this is not possible. Odds = 1/(probability of winning)-1.

The probability of winning .... how would you calculate this without knowing who you have to beat to win?

mwilding1981

08-03-2009, 11:25 AM

GM10, well, if the factors are horse-based only and don't relate to the other runners, could you not then use some kind of competing function to compete each horse's ratings against the other runners'?

GameTheory

08-03-2009, 11:54 AM

You can do this, but I don't see how you can do it "blind". In other words, you first have to take a big sample of these ratings and simply see how much each rating point is worth in terms of increased chances of beating a horse with that many fewer points. You can do that either with a table (1 point = X, 2 points = Y, etc.), which would work better if the change isn't "smooth" (although that probably indicates an unreliable factor), or by creating a mathematical function that transforms points into probability (again, not the probability of winning, but of simply finishing ahead of another horse who has that many fewer points for that rating).

Having done this, now when you get to a fresh race, you do head-to-head comparisons and make your adjustments. There are a few ways to do this, which can get quite complicated. One fairly easy method is outlined in the old book "Beating the Races With A Computer" from the early 80s, which described setting things up like a tennis tournament draw. (Don't have the details off the top of my head, but I can get them.) I've tried this before -- it works well enough. I also remember someone selling a neural network in which they had some sort of "competition" module that did head-to-head matchups to create oddslines.

But really, it comes down to the factors used and how independent they are from each other. For a method like this, they need to be as independent as possible.

Jake

08-03-2009, 08:11 PM

All the advice given so far has been excellent. I suggest you do a search here under neural nets, genetic algorithms, GAs, ants, or artificial intelligence. This has been discussed repeatedly under different guises over the years, and you can get a good indication of what has been tried, the problems involved, and how successful those attempts have been.

Jake

mwilding1981

08-04-2009, 05:06 AM

Thank you for all the replies. GT I would be very grateful for the details if you can find them :) I would probably go the route of making a mathematical function to model the difference in points into probability.

What are your thoughts on this method as opposed to the more often used weighting and regression?

GameTheory

08-04-2009, 02:15 PM

Actually, now that I think about it, the tennis tournament was not the final form of the method in that book. He hypothesized that the percentage of time a horse would win such a tournament (if played many times, randomizing the draw each time) would be equal to his probability of winning the race. And then he had a formula that gave you the equivalent of running that simulation without actually running it. (I'll get you those details -- I can't get to that book right now.)

Of course, he only had one factor -- he used multiple regression on several factors to come up with one final projected speed figure I think, and then was turning the gap in the speed figures into a probability. So if you want to go through that process with multiple factors, you're still left with the problem of how to combine it all into a single final probability. So you've got to weigh those factors somehow relative to each other.

Dave Schwartz

08-04-2009, 02:44 PM

GT,

The "tennis tournament" is the basis for the "experts" in HSH. Essentially, they turn each race into a tournament - drawing seeds at random. Then they "play" the games of the tournament.

A 3,000-"tournament" simulation produces reasonably strong output. Several of our users depend upon these experts.
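
For readers who want to picture such a simulation, here is a minimal Python sketch. It is not the HSH implementation: how each "game" is decided is an assumption (a random draw against a hypothetical linear `pair_prob` function of the rating gap), the ratings are made up, and the bye handling is a simplification.

```python
import random

def pair_prob(rating_a, rating_b, slope=0.01):
    """Hypothetical chance that A beats B, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.5 + slope * (rating_a - rating_b)))

def play_tournament(ratings, rng):
    """Play one single-elimination tournament with a random draw; return the winner."""
    field = list(ratings)
    rng.shuffle(field)                      # random assignment of initial matches
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field) - 1, 2):
            a, b = field[i], field[i + 1]
            a_wins = rng.random() < pair_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        if len(field) % 2 == 1:
            next_round.insert(0, field[-1])  # odd horse out gets a bye this round
        field = next_round
    return field[0]

def simulate(ratings, n_tournaments=3000, seed=0):
    """Fraction of tournaments won by each horse."""
    rng = random.Random(seed)
    wins = dict.fromkeys(ratings, 0)
    for _ in range(n_tournaments):
        wins[play_tournament(ratings, rng)] += 1
    return {horse: count / n_tournaments for horse, count in wins.items()}

print(simulate({"A": 89, "B": 92, "C": 80}))  # made-up ratings
```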

Dave

GameTheory

08-04-2009, 03:20 PM

And how is each "game" played? (In broad "strokes".) You've probably written this down somewhere that I should have read...

GameTheory

08-07-2009, 10:19 PM

Ok, as promised, the pair probability tennis tournament method from "Beating the Races with a Computer" (1980) by Steven L. Brecher. Now, despite this being a "computer method", obviously the author was pretty restricted in terms of computer power at the time, and the method is designed with those limitations in mind. (For instance, these days we might just run the simulation a million times like Dave does.)

At the point in the book where he gets to this method, the author has gone over how to use multiple regression with a number of factors to create predicted times for each horse in a race we want to play. But it could be any rating -- the point is we have a rating that is supposed to be monotonic (either higher is better or lower is better, not some weird middle ground) and more or less linear (a bigger gap equals a bigger advantage of about the same amount by which it is bigger). Most ratings fall into this category (in theory) -- predicted times certainly do.

This method also assumes this is your single final power rating -- you just want to convert ratings to odds. (How you might blend the results of this method after applying it on several factors is another subject.)

First, convert the gap in ratings between 2 horses to a probability of one horse beating the other (i.e. finishing ahead of). To do this, you'd go through each race in your database (or a sample of races to train your model on -- the training sample) and look at all the possible head-to-head matchups in the race. And then for each matchup, you'd note the gap in the rating in question, and which horse finished ahead. An example of a 5-horse race:

Horse A 95
Horse B 88
Horse C 80
Horse D 92
Horse E 88

Let's assume they finished in that order. So we look at all their rating gaps for each matchup:

A/B +7 1 (meaning +7 advantage for 1st horse in pair, and 1st horse in pair finished ahead -- 0 if he didn't)
A/C +15 1
A/D +3 1
A/E +7 1
B/C +8 1
B/D -4 1
B/E +0 1
C/D -12 1
C/E -8 1
D/E +4 1
Now represent all the reverse matchups, in which case the first horse will always lose (only because we are doing them in finish order, and these are the reverse permutations).

B/A -7 0
C/A -15 0
D/A -3 0
...etc

So we've got absolute symmetry there with a total of 20 possible permutations (including the reverses) for a 5-horse race. Do that for your whole sample of races and you've got a mess of data. Plug that data into your favorite "create an equation out of this data please" software and now you've got a formula for converting a gap in a rating into a probability of finishing ahead. A simple linear equation will probably do as long as you've got a decent sample to work with -- a logit equation would probably be better. In the book, he actually creates his equation (without much explanation) by graphing the data and just eyeballing it.
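
The bookkeeping above can be sketched in Python. This is only an illustration: `matchup_rows` and `fit_slope` are hypothetical helper names, and the fit is the simple linear form with the intercept pinned at 0.5 (which the symmetry of the data implies), not the logit fit suggested as the better option.

```python
from itertools import permutations

def matchup_rows(ratings_in_finish_order):
    """Return (rating_gap, finished_ahead) for every ordered pair in one race."""
    indexed = list(enumerate(ratings_in_finish_order))
    rows = []
    for (pos_a, rating_a), (pos_b, rating_b) in permutations(indexed, 2):
        # finished_ahead is 1 when the first horse of the pair beat the second.
        rows.append((rating_a - rating_b, 1 if pos_a < pos_b else 0))
    return rows

def fit_slope(rows):
    """Least-squares slope for p = 0.5 + slope * gap (intercept fixed at 0.5)."""
    num = sum(gap * (ahead - 0.5) for gap, ahead in rows)
    den = sum(gap * gap for gap, _ in rows)
    return num / den

rows = matchup_rows([95, 88, 80, 92, 88])  # the 5-horse example, in finish order
print(len(rows), "rows; first:", rows[0])
print("slope from this single race:", fit_slope(rows))
```

A single race is far too small a sample to trust the slope; in practice the rows from every race in the training sample would be pooled before fitting.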

Anyway, that's just the prerequisite -- how you come up with such an equation is your problem. The important thing is that your starting point is being able to match up two horses and use their ratings to determine the probability that one will finish in front of the other. Let's keep it simple and say we come up with a simple linear equation like so:

p = 0.5 + (rating_gap * 0.01) (each rating point of advantage adds one percentage point to the chance of finishing in front, so a 1-point edge means 51% vs. 49%, a 2-point swing between the two horses)

So, given a race, our ratings, and our equation to convert them, here's how to determine the final probabilities of winning with the tennis tournament method.

From the book:
Consider a tennis tournament ... organized as a tree-like structure:

[imagine standard tennis tournament (or NCAA basketball) single-elimination bracketed draw diagram here with the final on left and the first round on the right]

We hypothesize that the probability of a horse's winning a race be approximated by the fraction of the time he would win a tournament organized as shown with a random assignment of initial matches on the right. The expected number of matches the winner of such a tournament will have played is calculable for any given number of competitors. If there are three entrants, the expected number of matches the winner will have played is 1 2/3 (1/3 of the time the winner will have drawn a "bye" on the first round, and other 2/3 of the time he will have had to compete in both rounds.) If there are eight entrants, the winner will always have played three matches. Generally, the expected number of matches the winner will have played in a tournament of N entrants is bounded by log2(N). [base-2 logarithm of N]

The expected probability of winning a randomly selected match (the expected pair probability) is approximated by the geometric mean of pair probabilities, i.e., the (N-1)th root of the product of the entrant's pair probabilities with each of the other entrants. The estimated probability of winning the tournament (race) is thus this geometric mean raised to a power equal to the expected number of matches the winner will have played. Or, the win probability of a horse is approximated by the product of his pair probabilities with each of the other starters, to the Eth power, E >= log2(N)/(N-1). For each race, an exact value of E is selected such that the sum of the win probabilities is equal to 1.00; in practice this value of E was found by brute force (successive approximation), and convergence was obtained after a few iterations.

Got that?

Let's go through it.

Let's say we've got a 3-horse race:

Horse A 89
Horse B 92
Horse C 80

Now, we look at the head-to-head matchups, one horse at a time:

A/B => 89-92 => -3 => 0.5 + (-3 * 0.01) => 0.47 (Horse A has 47% chance of beating horse B)
A/C => 89-80 => +9 => 0.5 + (+9 * 0.01) => 0.59

So the "pair probabilities" for Horse A are {0.47,0.59}

Doing the same for the others, we get pair probabilities for each of them:

Horse A {0.47,0.59}
Horse B {0.53,0.62}
Horse C {0.41,0.38}

So, "the expected probability of winning a randomly selected match is approximated by the geometric mean of pair probabilities":

Geometric means for each:

Horse A => (0.47 * 0.59)^(1/2) = 0.52659
Horse B => (0.53 * 0.62)^(1/2) = 0.57324
Horse C => (0.41 * 0.38)^(1/2) = 0.39472

[raising something to 1/2 power is the same as taking the square root, so if you needed the Nth root, you'd raise it to 1/N. So the geometric mean of 3 items is (a*b*c)^(1/3)]

So, the chance of Horse A winning a random matchup in this race (i.e. could be B or C, we don't know) is 0.52659. Thus, his chance of winning the race is this amount raised to the power of the number of expected matches he'd have to play in the tennis tournament style bracket. The calculation log2(N) approximates this number of expected matches, although it is not exact. Above he gives the example that in a 3-horse race, the winner would be expected to "play" 1 2/3 matches (1/3 of the time he gets a bye in the first round) -- however log2(3) = 1.5849 rather than 1.6667, so we can see it is not exact for all numbers of entrants (for others it is, e.g. log2(8) = 3).
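
To see how far log2(N) drifts from the expected number of matches, here is a small Python comparison. It rests on an assumption about the bracket that the quoted passage only spells out for 3 and 8 entrants: a standard single-elimination draw where byes happen only in the first round and the eventual winner is equally likely to be any entrant in the draw. `expected_matches` is a hypothetical helper name.

```python
import math

def expected_matches(n):
    """Expected matches played by the winner of an n-entrant bracket.

    Assumes first-round byes only, with the winner equally likely to
    be any of the n entrants in the draw.
    """
    rounds = (n - 1).bit_length()   # smallest k with 2**k >= n
    byes = 2 ** rounds - n          # entrants who skip the first round
    return rounds - byes / n

for n in (3, 5, 8, 12):
    print(n, round(expected_matches(n), 4), round(math.log2(n), 4))
```

Under that assumption the two agree exactly when N is a power of 2, and log2(N) runs slightly low otherwise.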

Anyway, using that estimate for our sample race, we get:

Horse A => (0.52659 ^ 1.5849) = 0.362
Horse B => (0.57324 ^ 1.5849) = 0.414
Horse C => (0.39472 ^ 1.5849) = 0.229

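which add up to about 1.005: close to 1 but not exactly, since the whole method is an approximation (that is why the alternate method below tunes the exponent until the sum is exactly 1)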

So those are the final probabilities based on our rating, and the equation derived from it.
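
The same arithmetic as a short Python sketch, assuming the example's linear pair equation with a slope of 0.01; `pair_prob` and `tournament_probs` are hypothetical names used only here.

```python
import math

def pair_prob(rating_a, rating_b, slope=0.01):
    """Chance that A finishes ahead of B under the example's linear equation."""
    return 0.5 + slope * (rating_a - rating_b)

def tournament_probs(ratings):
    """Geometric mean of each horse's pair probabilities, raised to log2(N)."""
    n = len(ratings)
    matches = math.log2(n)          # approximate expected matches for the winner
    probs = {}
    for horse, rating in ratings.items():
        pairs = [pair_prob(rating, other_rating)
                 for other, other_rating in ratings.items() if other != horse]
        geo_mean = math.prod(pairs) ** (1 / (n - 1))
        probs[horse] = geo_mean ** matches
    return probs

probs = tournament_probs({"A": 89, "B": 92, "C": 80})
for horse, p in probs.items():
    print(horse, round(p, 3))
print("sum:", round(sum(probs.values()), 3))
```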

You see he also gives an alternate method above: "Or, the win probability of a horse is approximated by the product of his pair probabilities with each of the other starters, to the Eth power, E >= log2(N)/(N-1)." And he finds E by successive iteration search such that the results add up to 1.0.

In this case, we'd get E = 0.7961837702

Horse A => (0.47 * 0.59)^E = 0.360
Horse B => (0.53 * 0.62)^E = 0.412
Horse C => (0.41 * 0.38)^E = 0.228

Pretty much the same, although I used more precision in the second case.
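
One way to do that search for E is plain bisection, sketched below in Python. The book only says "brute force (successive approximation)", so the bisection itself is an assumption, and `solve_exponent` is a hypothetical name.

```python
def solve_exponent(pair_products, tol=1e-12):
    """Find E such that sum(product ** E) == 1.

    Assumes at least two horses and every product strictly between 0 and 1,
    so the sum falls steadily as E grows.
    """
    lo, hi = 0.0, 1.0
    while sum(p ** hi for p in pair_products) > 1.0:
        hi *= 2                     # widen the bracket until the sum drops below 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(p ** mid for p in pair_products) > 1.0:
            lo = mid                # sum too big: E must be larger
        else:
            hi = mid
    return (lo + hi) / 2

products = [0.47 * 0.59, 0.53 * 0.62, 0.41 * 0.38]  # Horses A, B, C
e = solve_exponent(products)
print("E =", round(e, 6))
print([round(p ** e, 3) for p in products])
```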

So there you have it.

Convert probabilities to odds as usual, and you've got a fair odds line based on your rating.

DanG

08-08-2009, 09:17 AM

You can’t explain complex subjects any better than Game Theory does imo. Post #17 is being digested by the computer cappers at warp speed. The man is a born teacher as several are on this site. :ThmbUp:

GameTheory

08-08-2009, 12:36 PM

Thanks, Dan.

mwilding1981 -- clean out your PM box -- it's full

mwilding1981

08-10-2009, 05:24 AM

Thank you for the excellent explanation GT, I will need to re-read a few times (always do with math) to make sure I understand it all correctly.

headhawg

08-10-2009, 09:19 AM

Yeah, thanks GT. Very nice explanation.

Jake

08-10-2009, 01:02 PM

Sorry to piggyback on DanG's reply, but I had the exact same reaction. Natural born teacher, GT. Excellent explanation, the whole nine yards. Thank you.

Jake

TurfRat

09-25-2009, 02:15 PM

This method also assumes this is your single final power rating -- you just want to convert ratings to odds. (How you might blend the results of this method after applying it on several factors is another subject.)

GT explains it better than the original author!!!

Tackling that other subject - suppose I have two (or more) probability lines derived from independent factors. How does one properly blend them?

Thanks

GameTheory

09-25-2009, 03:54 PM

A logit model would do if the factors were truly independent. The real problem, as usual, is not with making decent probabilities from factors, but decent probabilities that you can actually bet on when they show value compared to the tote odds, i.e. does betting on the overlays create profit? Creating good probabilities is actually not too difficult -- being able to bet on the overlays to create profit is very difficult.
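
One possible reading of "a logit model" here is a conditional (multinomial) logit on the log of each line's probability, sketched below in Python. That reading is an assumption, as are the weights, which would normally be fitted on past races rather than picked by hand; `blend_lines` is a hypothetical name.

```python
import math

def blend_lines(lines, weights):
    """Blend several probability lines for one race into a single line.

    lines:   list of dicts mapping horse -> probability (each > 0)
    weights: one hypothetical weight per line
    """
    horses = list(lines[0])
    scores = {h: sum(w * math.log(line[h]) for w, line in zip(weights, lines))
              for h in horses}
    top = max(scores.values())                      # subtract max for stability
    exp_scores = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exp_scores.values())
    return {h: v / total for h, v in exp_scores.items()}

line_a = {"A": 0.5, "B": 0.3, "C": 0.2}  # made-up line from one factor
line_b = {"A": 0.4, "B": 0.4, "C": 0.2}  # made-up line from another factor
print(blend_lines([line_a, line_b], [1.0, 1.0]))
```

With both weights at 1 this amounts to multiplying the two lines horse by horse and renormalising, which only makes sense to the extent that the factors really are independent.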

You might want to read over the "valueline" threads where the poor author of the associated book was lambasted (mostly by me) for failing to come to terms with this difficulty. We covered a lot of ground there in the back-and-forth. He was convinced that coming up with an "accurate" oddsline was enough. It is not. (In his case I suspect he just couldn't admit it because his book and product were out there for sale and he couldn't well just say "you're right, they're worthless" -- notice he never did come back with any answers.) But for the rest of us without a conflict of interest, the question of what to do about the oddsline/toteboard bias is the big question that we must honestly deal with. Can you make an oddsline that is independent from the toteboard? I doubt it -- not with traditional factors based on the usual stuff, no matter how massaged. So you've got to figure out not just how to make a line, but how to use it to bet profitably. The failure to answer the latter question successfully will usually send you back to the drawing board to make your lines some other way with some other factors, often a "less accurate" way that is nevertheless more profitable.

Valueline threads: (be prepared to wade)

http://www.paceadvantage.com/forum/showthread.php?t=29559

http://www.paceadvantage.com/forum/showthread.php?t=30434

Jake

09-25-2009, 04:28 PM

That was an extraordinary discussion because of the rigidity of the perspectives. What was most amazing to me is that the author had a Ph.D. in economics, which meant he should have had some inkling of econometrics and statistical biases -- but it never occurred to him that baseline probabilities are always relative to other data measurements. That was mind-boggling to me. Sorry he quit posting here; it was an interesting discussion.

Jake