The dilemma of line averaging


markgoldie
07-11-2010, 11:44 AM
A discussion in the software section of the forum brought this subject to mind. Since it is something which I struggle with (and I would suppose many others do as well), I thought I'd throw it out to see what opinions there are on the subject:

Background:

Many programs use a form of line averaging to create a probable strength-picture of an animal's ability. The rationale for this is clear. We all know that simply hanging your hat, so to speak, on a single line is a prescription for disaster. Clearly the last line on a horse's past performance is the most pertinent because it is the latest information we have about the animal. But we also know that horses "bounce" and show other form-pattern fluctuations and because of this, playing something like the highest last Beyer in the field will result in considerable losses.

So we must take other lines into consideration if we hope to be successful. If you look at a black-box construction such as Brisnet's Prime Power number, it's clear that such averaging is going on. It is weighted toward the last line, but is sharply influenced by previous efforts. So there is a form of line averaging going on here, even though it's adjusted for recency.

Personally, I try to find a line for the horse that I think will best represent his potential performance on the day at hand. I emphasize "try" because it's not always easy to find a pattern with which you can be comfortable. Here, I'm not talking about a horse running against its best distance or surface, where you might easily find a more indicative effort-line. The situations which are more "sticky" are when the consecutive lines are all with the same trainer, same track, same distance, comparable jockey, no overt trouble lines, etc.

Let's assume that you are using some form of numerical rating scale for a given performance. (In my case, it is an adjusted fig, but it could be other forms of "power" designations). What do you do with consecutive numbers that might look like this?:

Race 1: 75
Race 2: 68
Race 3: 72
Race 4: 67

Or possibly worse:

Race 1: 60
Race 2: 71
Race 3: 68
Race 4: 65

In case #1, the average is 70.5
In case #2, the average is 66

But in neither case has the horse ever run to the exact average, and most likely he will not today. If you are a firm believer in up-and-down patterns, then it's clear that in the first case the horse will likely regress from the last top number of 75. Okay. But if so, how far will the regression be? If not to the average, then where?

In case two, he may well better the recent low number. But by how much? Again, if not to the average, then where?

The bottom line is that I revert to recent averaging, more or less, even though I know this is probably not the correct answer. I was wondering how others who handicap and play a large number of races approach this dilemma.

Robert Goren
07-11-2010, 12:35 PM
If you do a multiple regression using past SRs to predict today's SR, you will find a pretty good correlation (although still a long way from 1) with the last race and almost none with any other race once the last race is factored in. I think you probably already knew that. I tried a bunch of things, such as best SR in the last 90 days. None of that stuff did much. I would send you the material, but it was lost when an old computer's hard drive went.

sjk
07-11-2010, 05:51 PM
If you weight the last 4 races 4-3-2-1, that will be a decent representation.

thaskalos
07-11-2010, 06:11 PM
(Best Beyer in the last 3...x 2) + (Second-best Beyer in the last 5...x 1.5) + (Best Beyer in the last 2...x 1) = POWER RATING.

The Beyers used should be from races similar to today's, as far as surface and approximate distance are concerned.
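A minimal Python sketch of that formula (the function name, and the requirement of at least five comparable figures, are my own assumptions; the post does not say what to do with shorter records):

```python
def thaskalos_power_rating(beyers):
    """Power rating from a list of Beyer figures, most recent first.

    (best of last 3) * 2 + (second-best of last 5) * 1.5 + (best of last 2) * 1.
    The caller is expected to pass only figures from races comparable to
    today's surface and approximate distance.
    """
    if len(beyers) < 5:
        raise ValueError("need at least five comparable figures")
    best_last_3 = max(beyers[:3])
    second_best_last_5 = sorted(beyers[:5], reverse=True)[1]
    best_last_2 = max(beyers[:2])
    return best_last_3 * 2 + second_best_last_5 * 1.5 + best_last_2 * 1

# Example: figures 75, 68, 72, 67, 70 (most recent first)
# best of last 3 = 75, second-best of last 5 = 72, best of last 2 = 75
# rating = 75*2 + 72*1.5 + 75 = 333
print(thaskalos_power_rating([75, 68, 72, 67, 70]))
```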

Overlay
07-11-2010, 06:20 PM
Ranking according to the average of those races out of the last three (or fewer, if three races are not available) that were run on the same surface as today's race, or the average of all three (or fewer) of the latest races if none of them were run on today's surface, has worked well for me from a winning probability standpoint (not as a stand-alone factor, but as one part of the overall handicapping picture).
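If it helps to see that rule concretely, here is a small sketch (the tuple layout and names are mine, not Overlay's):

```python
def overlay_average(races, todays_surface):
    """Average of the last-three figures, preferring races on today's surface.

    `races` is a list of (figure, surface) tuples, most recent first.  If none
    of the latest races were on today's surface, fall back to the plain
    average of the latest races, as described in the post above.
    """
    last_three = races[:3]
    same_surface = [fig for fig, surf in last_three if surf == todays_surface]
    pool = same_surface if same_surface else [fig for fig, _ in last_three]
    return sum(pool) / len(pool)

# Example: last three races 75 (dirt), 68 (turf), 72 (dirt); today is on dirt
print(overlay_average([(75, "dirt"), (68, "turf"), (72, "dirt")], "dirt"))  # 73.5
```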

Dave Schwartz
07-11-2010, 08:53 PM
In the book, How to Measure Anything...

http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470110120

... the author discusses what he calls the "Mathless CI," an approach for achieving a 90% confidence interval. Depending upon the sample size...

5 - 1st
8 - 2nd
11 - 3rd
13 - 4th

... you add the best/worst together. Thus, in a sample of 5, you take the 1st best and 1st worst and average them together.

In a sample of 8 you take the 2nd best and 2nd worst; in a sample of 11 you take 3rd best and 3rd worst.

My experience is that with pacelines you need to shade a little towards "best."

Thus, with 7 or fewer, I average the 1st-best and the 2nd-worst.
With 8 or 9, I use the 2nd-best and the 3rd-worst.

Try it.
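A minimal sketch of that shaded rule as I read it (how to handle samples outside the ranges Dave states is my own guess, so the sketch simply refuses them):

```python
def shaded_mathless_ci(figs):
    """Paceline estimate shaded toward "best", per the post above.

    With 7 or fewer figures: average the best and the 2nd-worst.
    With 8 or 9 figures: average the 2nd-best and the 3rd-worst.
    """
    n = len(figs)
    if n < 3:
        raise ValueError("need at least three figures")
    ranked = sorted(figs, reverse=True)            # ranked[0] = best, ranked[-1] = worst
    if n <= 7:
        best, worst = ranked[0], ranked[-2]        # 1st-best, 2nd-worst
    elif n <= 9:
        best, worst = ranked[1], ranked[-3]        # 2nd-best, 3rd-worst
    else:
        raise ValueError("rule only described for up to 9 pacelines")
    return (best + worst) / 2

# Example with the figures from the opening post: 75, 68, 72, 67
print(shaded_mathless_ci([75, 68, 72, 67]))        # (75 + 68) / 2 = 71.5
```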

markgoldie
07-12-2010, 10:54 AM
Interesting answers. As I say, by far the best approach is to find the line which has the most relevance to today's spot. But, as I also say, this is not always possible.

As for Dave's advice, I can't say I'll be able to take much advantage of it, because in dealing with horses and their form I will never go back further than 3 lines unless the recency pattern is badly broken up or there is different surface or distance information that appears more than 3 lines back. In my experience, these vestigial performances will not affect the current form to the degree that I can afford to give them any weight. (Along those lines, I've always wondered who is looking 25 or 30 lines back at what Formulator provides in its PPs. To me it's just wasted "scrolling" time. Even looking back for information on performance patterns following layoffs is fruitless, because that was then and this is now.)

As for pace lines, I have significantly less trouble because unless the horse is a straight E-type, I'm not overly interested in the consistency of these numbers. The object of the exercise for the jockey is to win the race, not to reach the animal's best pace capability in every event. So the bottom line for me here is final speed. I've tried looking at a recent pace-speed top for an indication of better performance, even if the final-speed number was down somewhat. But this has pretty much led to a dead end.

I suppose that Robert Goren's comments about the correlation of performance to the most recent number are hard to refute. In a way, I've come to that conclusion pretty much myself: that is, view the last number as the most probable, with preceding numbers representing a "threat" to the current number. The point I was driving at is that this "threat" scenario is not well expressed at all by averaging. For example, take a horse whose last number was 85 but who shows recent results that could make him a "threat" to perform as low as 68. Here, the average is 76.5, and if we give double weight to the most recent, we get a 79.3 number. But just what good is it calling this animal a 79.3 and proceeding to compare him to the other horses in the race whose numbers were similarly arrived at? Can I really ask this horse to perform at a 79.3 level in this event?

It's just not a satisfying way to handle the problem. And this is the "dilemma" I was referring to in the title of the post.
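For reference, the two numbers in that example come out as follows (a trivial Python sketch, just to make the weighting explicit):

```python
# markgoldie's example: last figure 85, "threat" level 68
last, threat = 85, 68

plain_average = (last + threat) / 2          # 76.5
recency_weighted = (2 * last + threat) / 3   # 79.33..., the "79.3" in the post

print(plain_average, round(recency_weighted, 1))
```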

Robert Goren
07-12-2010, 11:21 AM
A horse who runs an average race for him is usually not the winner.

aaron
07-12-2010, 11:37 AM
The sheets use various patterns to signal a bounce or a forward move. At HDW, they use a number based on many factors to predict what a horse will run today. With that said, as a player you must use instinct, trips, and other factors to predict whether a horse will go forward or backward. Sometimes the key could be the shape of the race; other times it could be a trainer change. Many times it's just the experience of encountering a situation you have seen many times.

markgoldie
07-12-2010, 12:07 PM
A horse who runs an average race for him is usually not the winner.
That's my point exactly. Averages just aren't satisfying. But what happens when you are handicapping, say, 40 races a day and (especially if you are playing verticals) you need a better-than-average assessment of the entire field? You must rely on something.

sjk
07-12-2010, 12:13 PM
Perhaps I should have said if you play as though there is a 40% chance he will run like his last race, a 30% chance he will run like the race before, a 20% chance he will run to his 3rd back and a 10% chance of running to his 4th back that would make a decent representation of what will happen.

At the end of the day it is a game of probabilities.
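Read that way, the earlier 4-3-2-1 weighting and the 40/30/20/10 chances are the same proportions; a small sketch of both uses (the function names are mine):

```python
import random

WEIGHTS = [0.4, 0.3, 0.2, 0.1]   # same proportions as the 4-3-2-1 weighting above

def expected_figure(last_four):
    """Probability-weighted expectation over the last four figures (most recent first)."""
    return sum(w * f for w, f in zip(WEIGHTS, last_four))

def sample_figure(last_four):
    """Draw one 'performance' according to the 40/30/20/10 chances."""
    return random.choices(last_four, weights=WEIGHTS, k=1)[0]

figs = [75, 68, 72, 67]               # the first pattern from the opening post
print(expected_figure(figs))          # 75*0.4 + 68*0.3 + 72*0.2 + 67*0.1 = 71.5
print(sample_figure(figs))            # e.g. 75 about 40% of the time
```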

markgoldie
07-12-2010, 12:13 PM
The sheets use various patterns to signal a bounce or a forward move. At HDW, they use a number based on many factors to predict what a horse will run today. With that said, as a player you must use instinct, trips, and other factors to predict whether a horse will go forward or backward. Sometimes the key could be the shape of the race; other times it could be a trainer change. Many times it's just the experience of encountering a situation you have seen many times.
Right. That's what I was saying about an informed extrapolation to a particular line, or even to a number outside any of the available lines. But (see my original post in this thread) that's not always possible; there are lots of very similar situations with "bobbing" numbers.

Robert Goren
07-12-2010, 12:33 PM
There is about a 60% chance that a horse will run a race somewhat close to his last race, about a 25% chance that he will run a race (either a good one or a bad one) resembling another of his races, and about a 15% chance that he will run a race unlike anything he has run before. That race could be better, but is most likely worse.

46zilzal
07-12-2010, 12:50 PM
Think of how limited this is. The horse, just like any sports team, was up against different challenges in each contest, and to average them as if they were somehow the same is not smart.

Light
07-12-2010, 01:14 PM
Think of how limited this is. The horse, just like any sports team, was up against different challenges in each contest, and to average them as if they were somehow the same is not smart.

Exactly. This is a reference to class, which nobody ever talks about when considering averaging races; it's like ignoring a pink elephant in the room. What good is a discussion without the obvious?

Dave Schwartz
07-12-2010, 01:35 PM
Think of how limited this is. The horse, just like any sports team, was up against different challenges in each contest, and to average them as if they were somehow the same is not smart.

This is a great description of why horse racing is so difficult.

Each race is unique enough to make every race different.

As we look at the last 10 races, they are at different distances than today, or the surface was different, or the pace of the race was different, or the field size was much different, or the track condition, post position, conditions of the race, level of competition...

It often boils down to a choice between a very low-level look at the past lines and the view from the 10,000-foot level.

Low Level = using Mark's approach - trying to pick the one that best describes "how the horse will run today"

High-Level = how the horse runs on average - using a much broader filtering approach


In recent years I have moved more to the 10,000-foot view because it correlates so much better than the low-level approach.

The "Benter time-decay" approach is also one that seems to have merit.

TrifectaMike
07-12-2010, 03:15 PM
This is a great description of why horse racing is so difficult. ...

Someone who, like yourself, has been at this game for so long understands the complexity of predicting winners. Unfortunately many, myself included, often look for solutions on a micro level: choosing the "more correct" data, making a better calculation, etc.

What we fail to do is look at the problem on a macro level: Are races primarily random events? Are races deterministic, but chaotic?

For several years, I have concluded that race outcomes are deterministic, but chaotic. I am not suggesting that anyone should run out and become an expert on Chaos Theory. However, I am suggesting that one lean in that direction.

That said, for many who will not venture into that area, I do have some other suggestions which can be useful. And no, it's not Decision Trees, Bayes Inference, NN, GA, GP, or Regression of any type. That is not to say that those techniques are unhelpful or unproductive. All these methods will help build a "mousetrap". Some better than others, but in most cases a useful "mousetrap".

Putting Chaos Theory aside, let's see if we can choose an approach which can be useful and can avoid the pitfalls of many other expert-system approaches. This approach would be of the simulation variety.

One of the basic (and by the way a very good one, although simple) simulation approaches is to simulate the result of the race from beginning to an end, without regard to interior points.

For this we need a "good" performance model for the horses. Let me digress a bit. The reason Mark asks his question is that he is trying to represent a horse by a single performance or the average of several performances. As one can see by the responses to his question, there are several ad-hoc answers.

In reality each horse has a performance distribution. This distribution can be modeled. Depending on how one chooses the performance criteria, it can in most cases be either a Normal (Gaussian) or a Triangular distribution. My suggestion is to start with these two.

Once you have a "good" distribution, you are closer to answering many unanswered questions. It might not be Chaos Theory, but for many it is close enough.

So, what exactly do we do with a horse's performance distribution?

We sample from it!! We make comparisons from all the horses entered in the race, and draw conclusions.

The process is not overly complex, yet it is powerful.

Mike
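A minimal sketch of the sampling idea Mike describes, assuming each horse is summarized by a Normal distribution fit to its recent figures (the distribution choice, the trial count, and the independence of horses are illustrative assumptions, not part of his post):

```python
import random
import statistics

def win_probabilities(field, trials=10_000):
    """Estimate win probabilities by sampling each horse's performance distribution.

    `field` maps a horse's name to its list of recent figures.  Each horse is
    modeled as Normal(mean, stdev) of those figures; on every trial one figure
    is drawn per horse and the highest draw "wins".  Horses are treated as
    independent, which (as gm10 points out below) ignores form-cycle correlation.
    """
    params = {
        name: (statistics.mean(figs), statistics.stdev(figs))
        for name, figs in field.items()
    }
    wins = {name: 0 for name in field}
    for _ in range(trials):
        draws = {name: random.gauss(mu, sigma) for name, (mu, sigma) in params.items()}
        wins[max(draws, key=draws.get)] += 1
    return {name: count / trials for name, count in wins.items()}

# The two patterns from the opening post, plus a steadier third horse
field = {
    "A": [75, 68, 72, 67],
    "B": [60, 71, 68, 65],
    "C": [70, 70, 69, 71],
}
print(win_probabilities(field))
```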

markgoldie
07-12-2010, 03:53 PM
Exactly. This is a reference to class, which nobody ever talks about when considering averaging races; it's like ignoring a pink elephant in the room. What good is a discussion without the obvious?
For the purposes of this post, I have not discussed the effects of class on numbers. However, if you search my archived posts, you will see that I always adjust my figures for both class and pace. So the dilemma of which I speak should be understood as totally independent of those adjustments. I think we all realize that "Beyer slaves" face a difficult, if not impossible, task.

thaskalos
07-12-2010, 04:32 PM
I believe it was in his book "Beyer on Speed" that Andy Beyer took a stand against the concept of averaging speed ratings.

I recall that his comment went something like:

"Let's say that a horse enjoys an easy front-running trip, and earns a speed rating of 85. Next race, the horse is part of a "contested" pace...and earns a speed rating of 71.

Going into today's race, an argument can be made about whether the horse is a true 85 or a 71.

But in no way should the horse be called a 78."

completebill
07-12-2010, 05:32 PM
There have been excellent contributions on this subject, particularly by Messrs. Goren and Schwartz. I don't know exactly what approach Dave uses in his excellent HSH program, but it is obviously successful.

A lot has to do with the use to which you are putting your paceline selection. If you are using a computer handicapping program, some programs are much more paceline-dependent than others.

I use the HTR program, which allows manual pace-line selection (or averaging of 2 lines) by the user, or a choice of 7 other pre-programmed methods, ranging from last line only to averaging of ALL of the last 10 lines (!!).

The default method uses artificial intelligence to select a line (or sometimes an average of 2 lines), considering many factors, including form cycle and pace: the horse's pace and the anticipated race shape/pace. It has been tested and proven best, overall, in large samples, although a couple of the other methods work slightly better in certain narrow situations (e.g., some turf races, races for very lightly raced horses).

Interestingly, a very recent test against a large database of "ALL" races showed that there were NOT greatly significant differences among ALL of the automated methods!!

Cratos
07-12-2010, 08:22 PM
A discussion in the software section of the forum brought this subject to mind. ...

A horse's past performances are in part a representation of its form cycle, and to average 10 races (the typical number shown in the DRF), which many times are at different distances and sometimes on different surfaces, is fraught with error.

A suggestion would be to normalize the races to the distance of today's race; then you will have a clearer picture of the "what if" you are attempting to predict.

This can be done manually, which is tedious and tiring, but I use an automated feature of my software program and it makes the effort much easier.
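Cratos doesn't say how his software performs the normalization; one simple way to sketch the idea is a par-based shift, where the par figures below are hypothetical:

```python
def normalize_figure(fig, race_distance_par, todays_distance_par):
    """Shift a figure earned at another distance onto today's distance scale.

    `race_distance_par` and `todays_distance_par` are par figures for the class
    at each distance (made-up values here).  The adjustment is simply the
    difference in pars, which is only one of several possible schemes.
    """
    return fig + (todays_distance_par - race_distance_par)

# Example with made-up pars: a 72 earned where par is 70, and today's par is 74
print(normalize_figure(72, race_distance_par=70, todays_distance_par=74))  # 76
```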

gm10
07-12-2010, 08:31 PM
In reality each horse has a performance distribution. ... We sample from it!! We make comparisons from all the horses entered in the race, and draw conclusions.

They aren't i.i.d. samples; there will be correlation because of the form cycle.

Sham
07-12-2010, 09:07 PM
There is about a 60% chance that a horse will run a race somewhat close to his last race, about a 25% chance that he will run a race (either a good one or a bad one) resembling another of his races, and about a 15% chance that he will run a race unlike anything he has run before. That race could be better, but is most likely worse.

Interesting you say that...as I was reading through this thread, I was thinking how often a horse doesn't repeat his last performance. Is your statement based on data, or just observation? (I have no data...yet anyway).

In fact, were I to go back to picking pacelines, I've always thought an interesting approach would be to choose any line other than the last race. You may be wrong more often, but I expect you'd get paid better when you're right.

TrifectaMike
07-13-2010, 01:23 AM
They aren't i.i.d. samples; there will be correlation because of the form cycle.

I'm not sure I understand what you are saying. What aren't IID samples? I haven't defined any random variables. I have suggested the same probability distribution for each horse, and all are mutually independent.

Please explain further.

Mike

gm10
07-13-2010, 05:54 AM
I'm not sure I understand what you are saying. What aren't IID samples? I haven't defined any random variables. I have suggested the same probability distribution for each horse, and all are mutually independent.

Please explain further.

Mike

If you have a probability distribution, it's because you have a random variable. You have not defined which random variable you are using. You said 'performance', but what is your measure of it?


I do not understand this

"I have suggested the same probability distribution for each horse, and all are mutually independent."

Perhaps you mean that their performances follow the same type of distribution with different parameters? In which case, I agree, although a Gaussian is not quite adequate if you're using final times. Something like Gamma will work better I think. But I guess you are dealing with this problem through your 'performance criteria'?

What I meant was: a horse's performances aren't i.i.d. samples from a normal RV. Any statistical inference needs to take this into account.

Are you using order statistics to estimate each horse's chance of winning?

TrifectaMike
07-13-2010, 10:39 AM
What I meant was: a horse's performances aren't i.i.d. samples from a normal RV. Any statistical inference needs to take this into account. ... Are you using order statistics to estimate each horse's chance of winning?

Yes, I am suggesting the use of order statistics.

The aim of my post is not to solve the problems which arise in a simulation environment. My aim is to steer people toward an approach different from the ones used by many in the past.

Formulas can be derived which extend the generalized IID case and cover various situations in which the underlying observations are not independent and/or not identically distributed.

Mike

Robert Goren
07-13-2010, 11:24 AM
Interesting you say that...as I was reading through this thread, I was thinking how often a horse doesn't repeat his last performance. Is your statement based on data, or just observation? (I have no data...yet anyway).

In fact, were I to go back to picking pacelines, I've always thought an interesting approach would be to choose any line other than the last race. You may be wrong more often, but I expect you'd get paid better when you're right.

I used to have data, but I lost it when the hard drive died on my old computer. It was no great loss, since I had already learned what I set out to learn. Good luck with your research.

gm10
07-13-2010, 11:48 AM
Yes, I am suggesting the use of order statistics. ...

My guess ... you'll be depending heavily on normal distributions if you go, or have gone, in that direction.