Handicapping Model Questions [Archive] - Horse Racing Forum - PaceAdvantage.Com

mwilding1981

08-11-2008, 08:42 AM

I am in the process of putting together and testing my own handicapping model and would like to ask some advice please.

Recency weighting - I have a rather crude method at the moment, which involves subtracting the days since the last run from a figure and using the resulting number as a multiplier. Could anybody help me or point me in the direction of a better method?

Normalising - I am currently normalising a lot of factors but was wondering if it is possible to normalise Finishing Position and Lengths Beaten? This has me stumped.

Distance Preference - I have been testing various methods of calculating a figure to use as a preference number for the distance, so far quite unsuccessfully. Again, would anybody be able to point me towards possible ways to do this without using regression analysis?

Thanks in advance :)

sjk

08-11-2008, 11:44 AM

Recency weighting - I have a rather crude method at the moment, which involves subtracting the days since the last run from a figure and using the resulting number as a multiplier. Could anybody help me or point me in the direction of a better method?

Thanks in advance :)

Not clear how you are using the multiplier.

I would calculate the horse's chances of a successful outcome if he should run back to that figure and then multiply that probability by a factor like the one you have described - scaled by .3% per day or thereabouts.

robert99

08-11-2008, 04:42 PM

To normalise finishing position (FP) you could use this, where R = runners in race.

N = (1-FP/R)

so a horse finishing 2 in 5 runners gets an N of 0.6.
If there were 10 runners, then N rises to 0.8.
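The formula above is easy to check in code. This is a minimal Python sketch; the function name and the range check are illustrative additions, not part of the original suggestion.

```python
def normalise_finish_position(finish_pos: int, runners: int) -> float:
    """Return N = 1 - FP/R, where FP is finishing position and R is field size."""
    if runners < 1 or not 1 <= finish_pos <= runners:
        raise ValueError("finish_pos must be between 1 and runners")
    return 1.0 - finish_pos / runners


# The two cases from the post: second of five, then second of ten.
second_of_five = normalise_finish_position(2, 5)
second_of_ten = normalise_finish_position(2, 10)
print(round(second_of_five, 4), round(second_of_ten, 4))
```

Note that with this formula the winner scores 1 - 1/R rather than 1.0, and the last horse home always scores 0.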

Beaten lengths really need the track variant and race distance to be taken into account, but you could make a stab at it: sum the beaten lengths down to, say, 6th place, sum them down to wherever your horse finished, and take the ratio of the second sum to the first. Subtract that ratio from 1 so that the winner scores 1.0.
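One possible reading of the beaten-lengths idea, sketched in Python. It assumes you already hold each finisher's cumulative lengths behind the winner; the function name, the clamp at zero for horses behind the cutoff, and the sample margins are all hypothetical. It ignores track variant and distance, as the post warns.

```python
def normalise_beaten_lengths(cumulative_lengths, finish_pos, cutoff_pos=6):
    """Score 1.0 for the winner, falling to 0.0 at the cutoff position.

    cumulative_lengths[i] is the total lengths behind the winner for the
    horse finishing in position i + 1 (so the winner's entry is 0.0).
    """
    cutoff_index = min(cutoff_pos, len(cumulative_lengths)) - 1
    reference = cumulative_lengths[cutoff_index]
    if reference <= 0:
        return 1.0  # everyone down to the cutoff finished level with the winner
    ratio = cumulative_lengths[finish_pos - 1] / reference
    return max(0.0, 1.0 - ratio)  # assumption: horses behind the cutoff score 0


# Hypothetical seven-horse race, lengths behind the winner by position.
margins = [0.0, 1.5, 2.0, 4.0, 6.5, 8.0, 12.0]
print(normalise_beaten_lengths(margins, 3))
```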

Recency certainly is not linear and varies greatly with trainer and horse.
Perhaps this part really needs 3 logit variables - recency, prior race recency, trainer success rate with recent horses.

Distance preference can be determined in terms of average relative speed figures awarded at each distance.
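As a rough Python sketch of that last suggestion, assuming "relative" means each figure measured against the horse's own overall average (the post does not say which baseline to use). The function name and the sample runs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def distance_preference(runs):
    """Map each distance to the horse's average figure there, minus its overall average.

    runs is a list of (distance_in_furlongs, speed_figure) pairs for one horse.
    """
    overall = mean(fig for _, fig in runs)
    by_distance = defaultdict(list)
    for distance, fig in runs:
        by_distance[distance].append(fig - overall)
    return {d: mean(vals) for d, vals in sorted(by_distance.items())}


# Hypothetical horse: positive values suggest it runs above its norm at that trip.
print(distance_preference([(6, 80), (6, 84), (7, 90), (8, 86)]))
```

This only describes distances the horse has already run; it says nothing about an untried trip.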

mwilding1981

08-12-2008, 09:02 AM

Thank you for the replies. I thought beaten lengths would need those variants. I am assuming it would also need to take weight carried into account?

SJK, if I am using the weighting before calculating the horse's chance of winning, would I be right in assuming that I can weight the horse's figures, e.g. speed figure, by decreasing it at a rate of 0.3% per day? Just wondering why around 0.3%? At that rate after 34 days it would be going into negative effect, is that correct?

Robert99, if I am using speed figures to get the distance preference, can I use some kind of linear relationship to estimate a preference for a distance that the horse has not raced at yet?

Thanks again for all the help.

sjk

08-12-2008, 09:48 AM

I don't think you can apply that kind of scaling to the figure itself.

A horse that ran a 100 figure 80 days ago is somewhat less likely to run back to that figure again today (say 24% less likely) but is still more likely to run 100 than to run a 76.

The other issue is that applying a percentage to the figure has a more pronounced effect in higher class races with bigger numbers than in slower races.
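A minimal Python sketch of the distinction being drawn here, assuming a simple linear discount of 0.3% per day that is applied to the probability and not to the figure. The function names and the floor at zero are illustrative assumptions.

```python
def recency_factor(days_since_run, daily_rate=0.003):
    """Linear discount: 1.0 for a race run today, falling by daily_rate per day."""
    return max(0.0, 1.0 - daily_rate * days_since_run)


def discounted_probability(prob_if_repeats, days_since_run, daily_rate=0.003):
    """Scale the chance of success (given the horse repeats its figure) by recency."""
    return prob_if_repeats * recency_factor(days_since_run, daily_rate)


# An 80-day-old figure is treated as 24% less likely to be repeated.
print(round(recency_factor(80), 4))
print(round(discounted_probability(0.25, 80), 4))
```

At 0.3% per day the factor only reaches zero after about 333 days, and because it multiplies a probability, it has the same proportional effect whether the figure is 100 or 60.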

Jeff P

08-12-2008, 01:14 PM

In cases where you are evaluating the effectiveness of factors, I find it helpful to look at how a given factor has performed historically.

For example, using Bris data, if I were to evaluate the performance of horses having the top Bris speed figure last out, here is what I have in my Q3 2008 database:

Data Summary         Win      Place       Show
Mutuel Totals    5775.70    5687.70    5846.90
Bet             -6730.00   -6730.00   -6730.00
Gain             -954.30   -1042.30    -883.10

Wins                 866       1498       1998
Plays               3365       3365       3365
PCT                .2574      .4452      .5938

ROI               0.8582     0.8451     0.8688
Avg Mut             6.67       3.80       2.93

Almost 26 pct winners and reducing the takeout ever so slightly. That tells me something about the horse with the best last out speed fig.

However, not all horses with the best last-out speed fig are the same, which is where I think you were going when you posted: "...would I be right in assuming that I can weight the horse's figures, e.g. speed figure, by decreasing it at a rate of 0.3% per day?"
Here is what that same sample looks like broken out by days since last start, in 10-day chunks:

By: Recent Activity- Days Last Start

 >=Min   < Max      Gain       Bet      Roi  Wins  Plays     Pct   Impact
  -999       0      0.00      0.00   0.0000     0      0   .0000   0.0000
     0      10      5.10    226.00   1.0226    28    113   .2478   0.9628
    10      20   -317.50   2100.00   0.8488   267   1050   .2543   0.9881
    20      30   -322.00   2280.00   0.8588   309   1140   .2711   1.0532
    30      40   -125.40    832.00   0.8493   108    416   .2596   1.0088
    40      50    -58.30    504.00   0.8843    74    252   .2937   1.1410
    50      60    -20.10    162.00   0.8759    19     81   .2346   0.9115
    60      70    -11.00     84.00   0.8690     9     42   .2143   0.8326
    70      80     34.30     66.00   1.5197    10     33   .3030   1.1775

    80      90    -20.80     28.00   0.2571     2     14   .1429   0.5551
    90     100    -15.80     32.00   0.5063     3     16   .1875   0.7286
   100     110     -9.80     14.00   0.3000     1      7   .1429   0.5551
   110     120     -9.50     20.00   0.5250     2     10   .2000   0.7771
   120     130    -10.00     10.00   0.0000     0      5   .0000   0.0000
   130     140      4.20      4.00   2.0500     1      2   .5000   1.9428
   140     150      0.00     16.00   1.0000     2      8   .2500   0.9714
   150     160    -10.00     10.00   0.0000     0      5   .0000   0.0000
   160     170      6.00     16.00   1.3750     3      8   .3750   1.4571
   170     180      2.80     10.00   1.2800     2      5   .4000   1.5543
   180  999999    -76.50    316.00   0.7579    26    158   .1646   0.6394

Notice the way winners within the sample are distributed? The drop in performance isn't linear. From 1-79 days the performance is ok. But things tail off dramatically (at least in this sample) after 80 days. Not all factors behave this way.
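A breakdown like the one above can be reproduced from raw play records. The Python sketch below is an illustration only, not the software used for the table: it assumes a flat $2 win bet per play (which matches Bet = 2 x Plays in the summary), takes Impact to be the bucket win rate divided by the overall win rate, and uses made-up sample plays. It does not reproduce the table's open-ended 180+ bucket.

```python
from collections import defaultdict


def breakdown_by_days(plays, bucket_size=10, stake=2.0):
    """Summarise plays by days since last start.

    plays is an iterable of (days_since_last_start, win_payoff) pairs, where
    win_payoff is the mutuel returned on one stake (0.0 for a loser).
    """
    buckets = defaultdict(lambda: {"plays": 0, "wins": 0, "returned": 0.0})
    for days, payoff in plays:
        low = (days // bucket_size) * bucket_size  # lower edge of the bucket
        bucket = buckets[low]
        bucket["plays"] += 1
        bucket["returned"] += payoff
        if payoff > 0:
            bucket["wins"] += 1

    total_plays = sum(b["plays"] for b in buckets.values())
    total_wins = sum(b["wins"] for b in buckets.values())
    overall_pct = total_wins / total_plays if total_plays else 0.0

    rows = []
    for low in sorted(buckets):
        b = buckets[low]
        bet = b["plays"] * stake
        pct = b["wins"] / b["plays"]
        rows.append({
            "min_days": low,
            "max_days": low + bucket_size,
            "gain": b["returned"] - bet,
            "bet": bet,
            "roi": b["returned"] / bet,
            "wins": b["wins"],
            "plays": b["plays"],
            "pct": pct,
            "impact": pct / overall_pct if overall_pct else 0.0,
        })
    return rows


# Hypothetical plays: (days since last start, $2 win payoff).
sample = [(5, 6.4), (7, 0.0), (12, 0.0), (15, 9.0), (18, 0.0), (85, 0.0)]
for row in breakdown_by_days(sample):
    print(row)
```

Buckets with only a handful of plays, like several of the rows beyond 120 days above, will swing wildly, so it is worth checking the play count before reading anything into a row.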

When modeling handicapping data I've found it useful to take on the role of an actuary to some degree.

Hope that suggestion helps.

-jp

Tom Barrister

08-12-2008, 03:44 PM

It would make sense that factors based on the last race, such as "best last race BRIS" would correlate to days since last race, since the longer ago the figure was obtained, the less relevant it would be.

And as Jeff said, not all factors work that way. For example, I'd be surprised to see the same dropoff after 80 days for "Best BRIS figure in the past 10 races".

robert99

08-12-2008, 05:42 PM

Thank you for the replies. I thought beaten lengths would need those variants. I am assuming it would also need to take weight carried into account?

SJK, if I am using the weighting before calculating the horse's chance of winning, would I be right in assuming that I can weight the horse's figures, e.g. speed figure, by decreasing it at a rate of 0.3% per day? Just wondering why around 0.3%? At that rate after 34 days it would be going into negative effect, is that correct?

Robert99, if I am using speed figures to get the distance preference, can I use some kind of linear relationship to estimate a preference for a distance that the horse has not raced at yet?

Thanks again for all the help.

As regards predicting race distance changes, you cannot presume all horses/trainers/distance changes are equal for this factor - it is highly non-linear. You have to get to reality first then find a formula that best estimates reality.

Relative pace in the last quarter of a shorter/longer race, proven trainer methods, number of times raced and breeding may give "fuzzy" clues, but predicting/backing horses under such new conditions is a quick way to the poorhouse.

mwilding1981

08-13-2008, 07:21 AM

Once again, thank you for the excellent posts. It makes sense that in order to weight factors I would need to analyse each factor separately and then weight it accordingly based on the data. I am still a little confused about how to go about estimating a horse's preference for a distance that it has not yet run. I shall re-read everything, but if robert99 or anybody else can help me again with this I would be most grateful.

podonne

08-13-2008, 10:56 AM

mwilding1981,

I wish you the best of luck with your model. I've struggled with many of the questions you have so don't feel stupid coming on here and asking.

Negatively weighting past-performance speed figures is tough because it assumes that a horse's performance naturally decreases over time not spent racing, and I don't think it does. What you might consider is weighting that particular figure less in your overall model. For instance, if you were estimating speed based on the last three speed figures, you might use a weighted average and base the weights on how recent each race was.
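A minimal Python sketch of such a recency-weighted average. The exponential decay and the 60-day half-life are assumptions chosen purely for illustration; the post only says the weights should depend on how recent each race was.

```python
def recency_weighted_figure(figures, days_ago, half_life_days=60.0):
    """Weighted average of speed figures, with weight halving every half_life_days."""
    weights = [0.5 ** (days / half_life_days) for days in days_ago]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, figures)) / total


# Hypothetical last three figures, most recent first, with days since each race.
print(round(recency_weighted_figure([90, 84, 78], [20, 50, 110]), 2))
```

Unlike discounting the figure itself, this never pushes an estimate below the horse's lowest figure; an old race simply counts for less.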

This question though:

I am still a little confused about how to go about estimating a horse's preference for a distance that it has not yet run.

The simple answer is, you can't. Some horses who win at 6f won't at 7f. If you can get ahold of pedigree you might make a good guess, or if you see a horse improving from 6f to 7f, you might guess he will improve further at a mile. Whatever the case though, demand high odds to bet on these horses.

mwilding1981

08-14-2008, 03:52 AM

Hi Podonne, thank you for the message. A weighted average is an excellent idea; it is where I think I was heading, and you have just kicked me forward there :) Do you have any advice on calculating the weights before finding the average? I was looking at taking the info from everybody above and looking at past performance data to see how much of a decrease there is in the results based on how long ago the figure was calculated (not sure how I am going to do this yet though).

podonne

08-14-2008, 11:35 AM

Do you have any advice on calculating the weights before finding the average? I was looking at taking the info from everybody above and looking at past performance data to see how much of a decrease there is in the results based on how long ago the figure was calculated (not sure how I am going to do this yet though).

I always fall back on the Rnd function when faced with a question like that :)

Just guessing, but say you were looking at the last three races: pick three random numbers between 0 and 1, multiply them by the number of days since the first, second, and third pps respectively to get weights, normalize the weights so they add up to 1.0, and then multiply each speed figure by its weight and sum the results. That should give you an "estimated speed".

Do that a thousand and a half times and then see which set of weights gave you the best result (closest to the actual).
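The procedure above can be sketched in Python as a simple random search. This follows the description literally (random number times days since the race, then normalise); the function names, the use of mean absolute error as "closest to the actual", the 1,500 trials, and the sample history are all assumptions.

```python
import random


def estimate(figures, days, raw):
    """Weighted estimate: weight_i = raw_i * days_i, normalised to sum to 1."""
    weights = [r * d for r, d in zip(raw, days)]
    total = sum(weights)
    if total == 0:  # fall back to a plain average if every weight is zero
        return sum(figures) / len(figures)
    return sum(w * f for w, f in zip(weights, figures)) / total


def mean_abs_error(history, raw):
    """Average distance between the estimate and the figure actually run."""
    return sum(abs(estimate(f, d, raw) - actual) for f, d, actual in history) / len(history)


def random_weight_search(history, trials=1500, seed=0):
    """Try random multipliers and keep the set with the smallest error."""
    rng = random.Random(seed)
    best_raw, best_err = None, float("inf")
    for _ in range(trials):
        raw = [rng.random() for _ in range(3)]
        err = mean_abs_error(history, raw)
        if err < best_err:
            best_raw, best_err = raw, err
    return best_raw, best_err


# Hypothetical history: (last three figures, days since each race, actual next figure).
history = [
    ([88, 84, 80], [14, 40, 75], 86),
    ([72, 79, 75], [21, 49, 90], 74),
    ([95, 90, 93], [30, 58, 120], 94),
]
best_raw, best_err = random_weight_search(history)
print([round(r, 3) for r in best_raw], round(best_err, 3))
```

Because each random number is multiplied by the days since the race, an older race gets more weight for the same draw; dividing by the days instead would be an equally simple variant to test. Weights tuned this way should be checked against races that were not used in the search.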

I also came across this in my wanderings: a guy named Rosenbloom wrote a paper in 2000 trying to predict Beyers based on the pp Beyers, and came up with this formula. It's not exactly what you want, but it does have two different calculations depending on whether there was a layoff, so it may be close. 'M' is the mean of the oldest 6 figures, 'B' is the projected Beyer. (Note: it only works if you have 10 pps.)

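' L(1)..L(10) hold the 10 past figures; L(1) to L(6) are the oldest six
M = (L(1) + L(2) + L(3) + L(4) + L(5) + L(6)) / 6
If Val(horse("dayssincelastrace")) <= 45 Then
    ' Raced within the last 45 days
    B = 2.95 + 0.37 * L(10) + 0.12 * L(9) + 0.14 * L(8) + 0.15 * L(7) + 0.17 * M
Else
    ' Coming back from a layoff of more than 45 days
    B = -6.84 + 0.19 * L(10) + 0.12 * L(9) + 0.14 * L(8) + 0.15 * L(7) + 0.41 * M
End If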

podonne

08-14-2008, 06:23 PM

Sorry, Rosenbloom also ignored horses who rated a 0 speed figure in their last race.

E. S. Rosenbloom (2000), "A better probability model for the racetrack using Beyer speed numbers".
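For anyone not working in VB, here is a Python sketch of the same formula with both caveats built in (exactly 10 pps, and no projection when the last figure is 0). The coefficients are copied from the post and have not been checked against the paper; the function name, the oldest-first ordering, and returning None for excluded horses are assumptions.

```python
def projected_beyer(figures, days_since_last_race):
    """Project the next Beyer from ten past figures, ordered oldest first.

    Returns None when the formula does not apply: fewer or more than ten
    figures, or a 0 figure in the most recent race.
    """
    if len(figures) != 10 or figures[-1] == 0:
        return None
    m = sum(figures[:6]) / 6          # mean of the oldest six figures
    l7, l8, l9, l10 = figures[6:]     # the four most recent, l10 being the last race
    if days_since_last_race <= 45:
        return 2.95 + 0.37 * l10 + 0.12 * l9 + 0.14 * l8 + 0.15 * l7 + 0.17 * m
    return -6.84 + 0.19 * l10 + 0.12 * l9 + 0.14 * l8 + 0.15 * l7 + 0.41 * m


# A horse that has run 80 ten times: projection with and without a layoff.
print(round(projected_beyer([80] * 10, 20), 2))
print(round(projected_beyer([80] * 10, 90), 2))
```

After a layoff the weight on the most recent figure drops from 0.37 to 0.19 while the weight on the older mean rises from 0.17 to 0.41.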

raybo

08-15-2008, 01:32 PM

Does anyone know if there is any free modeling software that uses the Bris .drf data files? Bris' Neurax software uses $1 .nrx files but since I already have gobs of .drf and .xrd files on my computer I'd like to use them for this purpose, if possible.

mwilding1981

08-18-2008, 05:01 AM

Podonne and everyone, thank you for all your information. I shall find that paper and read it as well.

mwilding1981

08-19-2008, 05:33 AM

Thank you very much for the paper, I am just about to settle down and read it :)

raybo

08-19-2008, 12:42 PM

Thank you very much for the paper, I am just about to settle down and read it :)

Good luck! :confused:

mwilding1981

08-19-2008, 02:07 PM

There are some formulas missing because of the ASCII format of the Word file. You don't happen to have a PDF version, do you, gm10?

gm10

08-20-2008, 04:18 AM

There are some formulas missing because of the ASCII format of the Word file. You don't happen to have a PDF version, do you, gm10?

Don't think so ... sorry

mwilding1981

08-20-2008, 05:29 AM

No probs, worth asking though :)

PaceAdvantage

08-20-2008, 07:03 AM

Did the rules suddenly change regarding the posting of copyrighted material?

gm10

08-20-2008, 08:31 AM

Did the rules suddenly change regarding the posting of copyrighted material?

Apologies I didn't know there would be any problem with posting it.