Old 09-15-2014, 03:57 PM   #1
TravisVOX
Track Announcer
 
Join Date: Jan 2004
Posts: 675
Merging win probabilities

Let's say you have a set of win probabilities for a race...

Code:
HORSE           PUBLIC   USER    OVER%
V. E. DAY       0.0410   0.072    76%
WICKED STRONG   0.2290   0.170   -26%
TONALIST        0.2290   0.210    -8%
KID CRUZ        0.0630   0.063     0%
MR SPEAKER      0.0970   0.114    18%
VIVA MAJORCA    0.0270   0.020   -26%
CHARGE NOW      0.0250   0.083   232%
ULANBATOR       0.0100   0.073   630%
COMMAND. CURVE  0.0400   0.036   -10%
BAYERN          0.2420   0.160   -34%
Let's say the USER column is your probability of each horse winning the race. The right column is the "overlay percentage" of each horse relative to the public's win probability.

My question is this...

Obviously, overlays can carry a positive expectation if you account for odds. However, I'm not worried about that in this part of the process. What I'm trying to do is balance the user's win probabilities against the user's overlay percentages.

Said differently, the user thinks that Ulanbator is a tremendous overlay. However, he/she still thinks that Bayern is a more likely winner, even though Bayern is an underlay.

What mathematical and/or statistical techniques can be used to balance/merge/combine the USER and OVER% columns to reach a logical win percentage, or at least an ordinal ranking, from the analysis?

One way I could do it is to rank each column, then sum the rankings and order by the total. Then I thought, however, that I might want to weight each of those columns differently... at which point I figured I would come on here and see if anyone had thoughts or insight.
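In case it helps frame the question, here's a minimal sketch of that rank-and-sum idea in Python, with adjustable weights on the two columns. The 0.6/0.4 weights are pure placeholders, not a recommendation; the figures are the ones from the table above.

Code:
# Minimal sketch of the rank-and-sum idea: rank the USER and OVER% columns,
# take a weighted sum of the ranks, and sort.  The 0.6/0.4 weights are
# placeholders only.
user = {"V. E. DAY": 0.072, "WICKED STRONG": 0.170, "TONALIST": 0.210,
        "KID CRUZ": 0.063, "MR SPEAKER": 0.114, "VIVA MAJORCA": 0.020,
        "CHARGE NOW": 0.083, "ULANBATOR": 0.073, "COMMAND. CURVE": 0.036,
        "BAYERN": 0.160}
over = {"V. E. DAY": 0.76, "WICKED STRONG": -0.26, "TONALIST": -0.08,
        "KID CRUZ": 0.00, "MR SPEAKER": 0.18, "VIVA MAJORCA": -0.26,
        "CHARGE NOW": 2.32, "ULANBATOR": 6.30, "COMMAND. CURVE": -0.10,
        "BAYERN": -0.34}

W_USER, W_OVER = 0.6, 0.4   # placeholder weights

def ranks(values):
    # rank 1 = best (largest value)
    ordered = sorted(values, key=values.get, reverse=True)
    return {h: i + 1 for i, h in enumerate(ordered)}

r_user, r_over = ranks(user), ranks(over)
score = {h: W_USER * r_user[h] + W_OVER * r_over[h] for h in user}

for h in sorted(score, key=score.get):   # lowest combined score = best
    print(f"{h:15s} user-rank {r_user[h]:2d}  over-rank {r_over[h]:2d}  score {score[h]:4.1f}")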

Last edited by TravisVOX; 09-15-2014 at 04:02 PM.
Old 09-15-2014, 09:28 PM   #2
davew
Registered User
 
Join Date: May 2011
Posts: 22,629
How good is your line? Many people feel that even if your line is very good, the public line on the average is better.

How about, instead of a percentage overlay/underlay, making the third column the difference between your probability and the public's?

Then add a 4th column that moves your line a certain percentage towards the public's. I read somewhere about a guy moving it 15%, although up towards 25% seems reasonable.

This will give you a blended line that lessens the outliers (which may or may not be correct). You should be able to check, over a few thousand races, which line is closer (public, yours, blended).
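Something like this, as a rough sketch of those two columns. The 20% move toward the public is just a placeholder in the 15%-25% range mentioned above, and the three horses shown are only examples:

Code:
# Sketch of the suggested columns: the difference (yours - public's) and a
# line moved a fixed fraction of the way toward the public's.  The 0.20
# fraction is a placeholder in the 15%-25% range above.
public = {"TONALIST": 0.229, "BAYERN": 0.242, "ULANBATOR": 0.010}
user   = {"TONALIST": 0.210, "BAYERN": 0.160, "ULANBATOR": 0.073}

MOVE = 0.20   # how far to move your line toward the public's

for h in public:
    diff    = user[h] - public[h]
    blended = user[h] + MOVE * (public[h] - user[h])
    print(f"{h:10s} diff {diff:+.3f}  blended {blended:.3f}")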

Then use the best probabilities for your exactas and straight pools to form profitable 'dutch bets' that include all overlays and very minor underlays that are higher-frequency hits. The goal is to try to stay away from massive underlays (which pay the track takeout when they miss and give you profit on your overlay bets when they hit).
Old 09-16-2014, 01:06 PM   #3
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
I would bet in proportion to edge x win probability, as per fractional Kelly. It is likely that a linear combination of your line and the public's line is a more accurate estimate of win probability than your line on its own, so you might use a 50-50 blend of the two as your win probability.

For the edge part, you want to establish a base over%, which is the lowest you will bet. If you think the 50-50 blend is reasonable, a 100% threshold is probably OK. Pick a ceiling (say 300%), since the further you are from the public's line, the more suspect your overlay tends to be.

So here you bet the two who are over 100%:

Charge Now: 2.32 x 1/2 x (.025 + .083)
Ulanbator: 3.00 x 1/2 x (.01 + .073), since the 630% exceeds the cap.

Oddly both work out to around .125. Choose a base amount to multiply by to scale the bet, say $200. In this case you bet $25 on each.
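For what it's worth, here's a bare-bones Python sketch of that staking rule, using the 100% floor, 300% cap, 50-50 blend and $200 base from above (the variable names and the three example horses are just mine):

Code:
# Staking sketch: bet proportional to (capped over%) x (50-50 blended win
# probability), only when the over% clears the floor.  Floor, cap, blend and
# base figures are the ones quoted above.
public = {"CHARGE NOW": 0.025, "ULANBATOR": 0.010, "BAYERN": 0.242}
user   = {"CHARGE NOW": 0.083, "ULANBATOR": 0.073, "BAYERN": 0.160}

FLOOR, CAP, BASE = 1.00, 3.00, 200.0   # 100% minimum over%, 300% cap, $200 scale

for h in public:
    over = user[h] / public[h] - 1.0        # overlay as a multiple, e.g. 2.32
    if over < FLOOR:
        continue                             # not enough edge, pass the horse
    edge  = min(over, CAP)                   # cap the most extreme overlays
    blend = 0.5 * (public[h] + user[h])      # 50-50 blend as the win probability
    print(f"{h}: bet ${BASE * edge * blend:.0f}")
Run on this race, it prints roughly $25 for each of the two qualifiers, matching the figures above.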
Old 09-16-2014, 03:47 PM   #4
classhandicapper
Registered User
 
 
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,604
I never really understood combining your own line with the public's line.

I understand why it might be more predictive. But IMO all you need to do is give yourself a margin of safety based on your confidence level in that particular race before you bet.

Let's say you give a horse a 33% chance of winning and he's 2.1 - 1.

You know none of your lines are going to be perfect, so you don't bet. There's not enough margin of safety in 2.1-1.

If he's 5-2 and you are very confident in your analysis you might consider it.

If you aren't as confident, maybe 3-1 is not enough.
__________________
"Unlearning is the highest form of learning"
Old 09-16-2014, 04:32 PM   #5
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Incorporating the public odds into your line is arithmetically equivalent to manipulating the acceptable overlay percent. It doesn't really matter why you are doing it.
Old 09-16-2014, 06:15 PM   #6
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
When the idea of blending your own oddsline with the public's to make a "better" one comes up, this is always the response. What is the point? You're just averaging them and you'll end up somewhere in the middle, right? Just demand a bigger overlay. And that's all true.

But a simple weighted average is not the proper way to make a blended line -- the idea is to end up with a line that is unbiased relative to the public's. In other words, take your own oddsline and look at (for instance) all the horses it has assigned 3-1 odds to, over many races. We all know that even if that sample wins at the overall average rate of 25%, as it should, the subset where the public odds are below 3-1 is going to win more often than the subset where the public odds are higher than 3-1. An unbiased line doesn't have that property: its 3-1 horses win at 25% regardless of their public odds. A simple constant weighted linear average of the two lines will never achieve this (unless you set the weights to 100-0 in favor of the public).
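If you want to see that bias in your own line, here is a rough sketch of the check. The CSV file and column names are hypothetical, just to show the shape of it:

Code:
# Rough sketch of the bias check described above: take all horses your line
# made roughly 3-1 (about a 25% win probability) and split them by whether
# the public had them shorter or longer than you did.  With a biased line,
# the "public shorter" group wins more often.  File and fields hypothetical.
import csv

shorter = []   # your ~3-1 horses that the public made shorter than you did
longer  = []   # your ~3-1 horses that the public made longer than you did

with open("past_races.csv") as f:                 # hypothetical file
    for row in csv.DictReader(f):
        my_p, pub_p = float(row["my_prob"]), float(row["public_prob"])
        won = int(row["won"])
        if 0.23 <= my_p <= 0.27:                  # roughly the 3-1 bucket
            (shorter if pub_p > my_p else longer).append(won)

for name, group in (("public shorter", shorter), ("public longer", longer)):
    if group:
        print(f"{name}: {sum(group)/len(group):.1%} winners from {len(group)} horses")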

So I would suggest, instead of simple averaging, that you actually make a new model out of your line and the public's line. This is exactly what Benter originally suggested and presumably just what he did; I don't know why that always gets lost in the shuffle when talk of blending lines comes up. He said to make a new logistic regression model to create a new output line. (But there are various other forms it could take, and you're not strictly limited to only those two inputs either.)

In other words, build a new model using a training sample of your lines and the public's from actual races, and using actual results. The output from this model for each race would then need to be re-normalized (so the probabilities sum to 1.0 again) and/or further calibrated (sometimes spreading the probabilities is needed).

Now instead of just making some assumption about how much weight each line should have, you are using a training sample of both lines and using actual race results to predict a better number which will in some cases be higher or lower than EITHER of the input numbers (an amplification effect if you will, rather than just a blending to the middle), and then once re-normalized you'll have a (possibly) completely different line (an unbiased one hopefully). This is a much more powerful transformation. That is, it will be if the user line actually contains value-added information not contained in the public line. If it does not, the model will end up spitting out a copy of the public line. (And that's what often happens -- this will show you just what your line is really worth.)

OR, depending on how you came up with your line to begin with, you might be able to just throw the public probability into your base model as a handicapping factor, and it would have a similar effect. But in order to get an unbiased probability, you've got to deal with the public probability SOMEHOW, and a simple averaging at the end isn't it, as it is just too crude...
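To make that concrete, here is a bare-bones sketch of one way the combined model could be set up: a scikit-learn logistic regression on the two probabilities (putting them on a log scale is my own choice here), re-normalized within each race. The file name and column names are assumptions for illustration only:

Code:
# Bare-bones sketch of a combined model: fit a logistic regression on two
# inputs (your probability and the public's, on a log scale -- the log scale
# is my choice), then re-normalize within each race so the probabilities sum
# to 1.0.  File name and column names are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("training_lines.csv")   # columns: race_id, my_prob, public_prob, won
X = np.log(df[["my_prob", "public_prob"]].values)
y = df["won"].values

model = LogisticRegression()
model.fit(X, y)

# Raw win estimate for every horse, then re-normalize within each race.
# (In practice you would score races outside the training sample.)
df["raw"] = model.predict_proba(X)[:, 1]
df["blended"] = df.groupby("race_id")["raw"].transform(lambda p: p / p.sum())

print(df[["race_id", "my_prob", "public_prob", "blended"]].head())
If the coefficient on my_prob comes out near zero, that is the model telling you your line adds nothing beyond the public's -- the "copy of the public line" case mentioned above.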
Old 09-16-2014, 08:35 PM   #7
TrifectaMike
Registered User
 
Join Date: Feb 2008
Posts: 1,591
Quote:
Originally Posted by GameTheory
When the idea of blending your own oddsline with the public's to make a "better" one comes up, this is always the response. What is the point? You're just averaging them and you'll end up somewhere in the middle, right? Just demand a bigger overlay. And that's all true.

But a simple weighted average is not the proper way to make a blended line -- the idea is to end up with a line that is unbiased relative to the public's. In other words, take your own oddsline and look at (for instance) all the horses it has assigned 3-1 odds to, over many races. We all know that even if that sample wins at the overall average rate of 25%, as it should, the subset where the public odds are below 3-1 is going to win more often than the subset where the public odds are higher than 3-1. An unbiased line doesn't have that property: its 3-1 horses win at 25% regardless of their public odds. A simple constant weighted linear average of the two lines will never achieve this (unless you set the weights to 100-0 in favor of the public).

So I would suggest, instead of simple averaging, that you actually make a new model out of your line and the public's line. This is exactly what Benter originally suggested and presumably just what he did; I don't know why that always gets lost in the shuffle when talk of blending lines comes up. He said to make a new logistic regression model to create a new output line. (But there are various other forms it could take, and you're not strictly limited to only those two inputs either.)

In other words, build a new model using a training sample of your lines and the public's from actual races, and using actual results. The output from this model for each race would then need to be re-normalized (so the probabilities sum to 1.0 again) and/or further calibrated (sometimes spreading the probabilities is needed).

Now instead of just making some assumption about how much weight each line should have, you are using a training sample of both lines and using actual race results to predict a better number which will in some cases be higher or lower than EITHER of the input numbers (an amplification effect if you will, rather than just a blending to the middle), and then once re-normalized you'll have a (possibly) completely different line (an unbiased one hopefully). This is a much more powerful transformation. That is, it will be if the user line actually contains value-added information not contained in the public line. If it does not, the model will end up spitting out a copy of the public line. (And that's what often happens -- this will show you just what your line is really worth.)

OR, depending on how you came up with your line to begin with, you might be able to just throw the public probability into your base model as a handicapping factor, and it would have a similar effect. But in order to get an unbiased probability, you've got to deal with the public probability SOMEHOW, and a simple averaging at the end isn't it, as it is just too crude...
Nice post GT....well stated.

Mike
Old 09-16-2014, 11:30 PM   #8
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
I make a line for the race with no reference to the public line. At the end I determine whether a bet is to be made by comparing it with the public line and demanding an overlay percentage.

I have not found it necessary to do any other transformation involving the two lines.

I don't see how you can say it is not good enough. I have bet 63,000+ races over a period of years with good results. In my mind it is not broken so I am not going to try to fix it.

I never actually make a linear combination of lines but in the context of scaling the bet I think it is a reasonable way to go.
Old 09-16-2014, 11:45 PM   #9
DLigett
Registered User
 
 
Join Date: Aug 2014
Posts: 34
Quote:
Originally Posted by GameTheory
So I would suggest, instead of simple averaging, that you actually make a new model out of your line and the public's line. This is exactly what Benter originally suggested and presumably just what he did; I don't know why that always gets lost in the shuffle when talk of blending lines comes up. He said to make a new logistic regression model to create a new output line. (But there are various other forms it could take, and you're not strictly limited to only those two inputs either.)
GameTheory, thanks for the pointer. I just looked at Benter's "Computer Based Horse Race Handicapping and Wagering Systems: A Report" and I'm confused on one point... are his alpha and beta the result of a logistic regression or some other approach?

Thanks for any insight.
Old 09-17-2014, 12:03 AM   #10
thaskalos
Registered User
 
Join Date: Jan 2006
Posts: 28,546
Quote:
Originally Posted by sjk
I make a line for the race with no reference to the public line. At the end I determine whether a bet is to be made by comparing it with the public line and demanding an overlay percentage.

I have not found it necessary to do any other transformation involving the two lines.

I don't see how you can say it is not good enough. I have bet 63,000+ races over a period of years with good results. In my mind it is not broken so I am not going to try to fix it.

I never actually make a linear combination of lines but in the context of scaling the bet I think it is a reasonable way to go.
Have you noticed a difference in your results in recent years over the years prior?
__________________
Live to play another day.
Old 09-17-2014, 12:09 AM   #11
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Quote:
Originally Posted by thaskalos
Have you noticed a difference in your results in recent years over the years prior?
Absolutely. There are fewer plays in general and far fewer among the low and medium priced horses.
Old 09-17-2014, 12:15 AM   #12
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Here is what I don't get about the step where I would transform the two lines into one. Suppose I make a horse 4-1 and the public makes him 10-1. I don't see why there should be any useful information in those two numbers that would get me to a better line.

It appears to be an overlay. Sometimes I am right and sometimes I am wrong, but I don't think there is a single reason or set of reasons why I would be wrong. Maybe I have made a poor speed figure for the horse's last race; maybe I have totally misjudged some other horse; maybe the guys in the paddock have observed something and bet big.

There are a thousand possible reasons. Of course I could look at all of the horses I made 4-1 and the public made 10-1 over the last year and see how often they win. This strikes me as backfitting and I can't imagine there is a sustainable relationship.

Perhaps I will take a look at it when time permits and see.

Last edited by sjk; 09-17-2014 at 12:19 AM.
Old 09-17-2014, 12:15 AM   #13
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally Posted by sjk
I don't see how you can say it is not good enough. I have bet 63,000+ races over a period of years with good results. In my mind it is not broken so I am not going to try to fix it.
That's completely fair, I was not suggesting that if you do not blend the lines using a new model you are doomed to failure -- just addressing the "what is the point of blending the lines?" question because the assumption always seems to be you'd just be averaging and ending up in the middle somewhere (which is rather pointless, or equivalent to a simple overlay threshold percentage anyway, just as you describe). However, don't knock it until you try it, the results are more interesting and profound than you might think. (And since you've had good results for so long, we know your line has some powerful non-public information in it, and this may help leverage that even more than you are. And presumably you've got a large sample of your own lines you could use to make a training sample for such a model. So something you may want to toy with if you're bored.)

Of course you need to be able to recalculate the percentages as the odds change (or very close to post time anyway) to recalc the line before betting. (Or possibly even make a probability distribution of where the final line might end up to optimally compute your bet size.) So implementation is much trickier than just setting a threshold...
Old 09-17-2014, 12:18 AM   #14
sjk
Registered User
 
Join Date: Feb 2003
Posts: 2,105
Quote:
Originally Posted by GameTheory
That's completely fair, I was not suggesting that if you do not blend the lines using a new model you are doomed to failure -- just addressing the "what is the point of blending the lines?" question because the assumption always seems to be you'd just be averaging and ending up in the middle somewhere (which is rather pointless, or equivalent to a simple overlay threshold percentage anyway, just as you describe). However, don't knock it until you try it, the results are more interesting and profound than you might think. (And since you've had good results for so long, we know your line has some powerful non-public information in it, and this may help leverage that even more than you are. And presumably you've got a large sample of your own lines you could use to make a training sample for such a model. So something you may want to toy with if you're bored.)

Of course you need to be able to recalculate the percentages as the odds change (or very close to post time anyway) to recalc the line before betting. (Or possibly even make a probability distribution of where the final line might end up to optimally compute your bet size.) So implementation is much trickier than just setting a threshold...
Thanks for the suggestion. Recalculating the odds is not an issue at all. I will take a look at it.
Old 09-17-2014, 12:45 AM   #15
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
Quote:
Originally Posted by sjk
Here is what I don't get about the step where I would transform the two lines into one. Suppose I make a horse 4-1 and the public makes him 10-1. I don't see why there should be any useful information in those two numbers that would get me to a better line.
Well, remember you'd be doing this on a whole-race basis, not just for one horse. Some horses go up, some go down, but not in equal proportions like a simple weighted average would give. So you must renormalize (and possibly apply an additional "spreading" calibration, depending on how you make that model) and THEN you'll know how it all shakes out -- your 4-1 horse could end up as 3-1 (which cannot happen with linear averaging) and now maybe you have a bet you didn't have before (because it is a bigger overlay), or you can bet more on him, etc.

Quote:
It appears to be an overlay. Sometimes I am right and sometimes I am wrong, but I don't think there is a single reason or set of reasons why I would be wrong. Maybe I have made a poor speed figure for the horse's last race; maybe I have totally misjudged some other horse; maybe the guys in the paddock have observed something and bet big.
Since we are going to be modeling your line vs the public's, it is assumed that your line is made in some consistent fashion; you can't model randomness. (For instance, if you make lines for all races but your handicapping procedure for maidens or turf races or [whatever category] is radically different than how you do things for other races, then it might be appropriate to separate those into different models.) But generally the specific reasons don't really matter.

Quote:
There are a thousand possible reasons. Of course I could look at all of the horses I made 4-1 and the public made 10-1 over the last year and see how often they win. This strikes me as backfitting and I can't imagine there is a sustainable relationship.
Well of course there is SOME sustainable relationship or else you'd be a perpetual loser with your current method. And again, think whole race, not one horse. I will just repeat that the results are more interesting than you'd think at first, downright magical in some cases.

It would seem the number of people that have actually tried this instead of just deciding it is not helpful beforehand is startlingly small. But of course you DO have to have a method to implement it in practice -- i.e. the resulting model has to be hooked up to a tote feed and you must bet as late as possible.