Mathematics of Combining Impact Values


podonne
10-02-2006, 07:15 PM
If I were going to build a system that calculated a very long list of impact values to determine the true odds of a horse winning a race, does anyone know the proper mathematics for combining Impact Values?

Say I have two impact values of 1.12 and 1.16 and a horse matches both. Is the combined Impact Value 1.14 (the mean), or is it a more complicated probability formula using 12% and 16%, or some kind of weighting using the values from the impact value formula (winners/runners)?

I'm on a bit of a math kick lately. Hope there are some Math majors out there.

Overlay
10-02-2006, 07:27 PM
People like Paul Peterson in his Morning-Liner series of written titles and software programs have developed techniques based on multiplying combinations of impact values to arrive at fair odds, but you have to watch out for things like redundancy (using multiple factors from any single handicapping area); dependent relationships between variables; and applying values too broadly (without regard for things like differences in distance or running surface), in order to avoid skewing your final composite odds figure too much in one direction or another.

ryesteve
10-02-2006, 07:36 PM
The answer is that there is no answer. Factors interact with one another, so you can't take them, mash them together, and assess their effect based on some kind of function of the individual IVs. An easy example that illustrates this: a class drop would be a positive factor; a good finish last race would be a positive factor; but a class drop off a good race is a negative factor.

Big Bill
10-02-2006, 11:28 PM
podonne,

James Jasper, in his book Basic Betting - The Micro-Computer Edge (St. Martin's Press, New York) treats this subject in depth. Get it if you can.

However, I tend to agree with ryesteve's comments.

Big Bill

BlueShoe
10-03-2006, 01:27 AM
One of my almost automatic throw-outs or go-againsts is the runner dropping off of a good race. Even worse is dropping off of a win. As ryesteve says, this is a big negative. Sure, some of them will win, but usually at a very short price, often odds-on. I am referring to claiming races here, with the runner racing for a tag in its last and today. It is a different situation entirely when a horse races well in a stake and today goes in an allowance for which it is still eligible.

PlanB
10-03-2006, 05:18 PM
Using IVs to get one performance # requires MULTIPLYING the IVs. Of course, almost certainly each IV overlaps in the set with another IV, and so MULTIPLYING them is step one of two. Removing their overlap is EASY, but ONLY IF you have the raw data. Then a chi-square on the raw data, two IVs at a time, will give a VERY GOOD estimate of the overlap. PS: the issue of a "class drop off a good last race" would be treated as any other IV; if it's negative, its impact on any equation would be to reduce the final rating, and no more. Chi-square is the answer to make highly correlated variables seem independent.
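
One way to read the pairwise check described here is as a chi-square test of independence on a 2x2 table of starters. The following is only a rough sketch: the function name and the sample counts are hypothetical, and it assumes you have raw per-starter counts for whether a horse matched factor A, factor B, both, or neither (with no empty row or column). A large statistic suggests the two factors overlap; it does not by itself say how much to discount their product.

```python
def chi_square_2x2(a_and_b, a_only, b_only, neither):
    """Pearson chi-square statistic for a 2x2 table of starter counts.

    Rows are "matches factor A" yes/no, columns are "matches factor B" yes/no.
    Assumes every row and column total is nonzero.
    """
    observed = [[a_and_b, a_only], [b_only, neither]]
    total = a_and_b + a_only + b_only + neither
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two factors were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical counts: the factors co-occur more often than independence predicts.
overlapping = chi_square_2x2(300, 200, 200, 300)
# Hypothetical counts: the factors look independent.
independent = chi_square_2x2(250, 250, 250, 250)
```

With one degree of freedom, a statistic above about 3.84 is significant at the 5% level, so the first hypothetical pair would be flagged as overlapping and the second would not.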

ryesteve
10-03-2006, 05:27 PM
He said he had "a very long list" of variables. I don't think you want to do a chisq with 500 dimensions.

PlanB
10-03-2006, 05:34 PM
I doubt any equation with 500 variables would work no matter what you did.
On the other hand, any computer w/ the right program could do it in less than
5 mins.

robert99
10-03-2006, 06:49 PM
Chi Squared only tells the probability of correlation. It tells nothing about what the physical condition is that causes the "overlap" or its race significance. If you do have all the original data then you can do IV3 = (IV1 "AND" IV2) directly, and get the "right" answer for a combination of IV1 and IV2.
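
A rough sketch of computing the combined IV directly from raw data, as opposed to multiplying two separate IVs. It assumes the usual definition of an impact value (the share of winners that match a factor divided by the share of starters that match it). The record layout, field names, and counts are all made up for illustration; they are chosen so that a class drop and a good last race each look positive alone but negative together.

```python
def impact_value(records, matches):
    """IV = (share of winners matching) / (share of starters matching)."""
    starters = len(records)
    winners = sum(1 for r in records if r["won"])
    matching = [r for r in records if matches(r)]
    matching_winners = sum(1 for r in matching if r["won"])
    return (matching_winners / winners) / (len(matching) / starters)


def make(n, wins, drop, good):
    """Build n hypothetical starters with the given flags, of which `wins` won."""
    return [{"class_drop": drop, "good_last": good, "won": i < wins}
            for i in range(n)]


# Hypothetical raw data: 200 starters, 20 winners.
records = (make(40, 8, True, False)      # class drop only
           + make(40, 8, False, True)    # good last race only
           + make(20, 1, True, True)     # both factors
           + make(100, 3, False, False))  # neither factor

iv_drop = impact_value(records, lambda r: r["class_drop"])
iv_good = impact_value(records, lambda r: r["good_last"])
iv_both = impact_value(records, lambda r: r["class_drop"] and r["good_last"])

print(f"IV1 = {iv_drop:.2f}, IV2 = {iv_good:.2f}")
print(f"IV1 x IV2 = {iv_drop * iv_good:.2f}, IV1 AND IV2 = {iv_both:.2f}")
```

In this invented data set each factor alone has an IV of about 1.5, so the product is about 2.25, while the IV measured directly on horses matching both factors is about 0.5.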

As ryesteve has stated in effect, IVs are not true probabilities, and their data comes from different source data sets. IVs normally only give information on winners; the losers might have run 2nd or last, and that is not known. That missing information is significant when judging whether a horse is going to be close enough to have a chance of winning.

You get a better answer by multiplying than by adding, but the result is still not the true relative probability. The manipulation has no physical relationship to the range of final race times each horse is capable of in the race for which you are trying to estimate the true odds.

For example, if two race-relevant IVs are used, and a horse has IV1 = 2.0 and IV2 = 0, adding and averaging gives (2 + 0)/2 = 1: the horse that has no chance (IV2 = 0) appears to have an "even" chance, which is illogical. Multiplying gives 2 x 0 = 0, which is no chance and more logical. However, based on the range of race times that horse has proved capable of, it may have a good chance or not; IVs tell you little about that. Using a computer to manipulate bad information more quickly does not make the answers any more believable.
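
The arithmetic in that example, plus the 1.12 and 1.16 pair from the original question, can be checked with a few lines. This is only an illustration of averaging versus multiplying; the variable names are arbitrary.

```python
from math import prod
from statistics import mean

# The example above: one strong factor and one "no chance" factor.
ivs = [2.0, 0.0]
averaged = mean(ivs)    # treats the horse as average
multiplied = prod(ivs)  # keeps the "no chance" verdict

# The pair from the original question.
question_ivs = [1.12, 1.16]
question_mean = mean(question_ivs)
question_product = prod(question_ivs)

print(averaged, multiplied)
print(round(question_mean, 4), round(question_product, 4))
```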

rrbauer
10-04-2006, 12:13 PM
Generally, I agree with multiplying the IV's as a means of "combination". I haven't thought through the process of using chi square to remove overlap, although I agree that overlap can be an issue. The final step, however, is to take the resultant combination figures for each horse and normalize them to 1 for the field. The normalized figure for each horse will yield a value that you can use as a probability estimate, or as a simple factor to weigh and rank order the horses based on the attributes that made up the IV factors that you started with. And, they can be of positive (identifying strength) or negative (identifying weakness) influence.

I would guess that there is some software floating around that uses some form of IV combination as a means to develop a "fair" odds line.

Bill Cullen
10-04-2006, 12:31 PM
The guy who wrote "Modern Impact Values" (I'm forgetting his name at the moment) said you multiplied the Impact Values. He warned, however, against using too many impact values at any one time. His suggestion was to forgo the work of making an odds line and just accept any horse at 4/1 or higher for your typical system.

That's my recollection of the gist of what he said on the subject.

Bill C

DJofSD
10-04-2006, 01:24 PM
FWIW, the attempt to integrate multiple IV's has caused me to recall some early experiments in modern physics.

I don't have any of my textbooks at hand (they are in storage because I don't have enough shelf space), but in any event there were experiments in which X-rays were shot at various substances and the occasional off-axis deflection was caught on film. The results were lots of "spots" that formed a pattern. Using the spots on the film and working backwards, the scientists were able to model a structure that could not be seen otherwise. This was first done by the Braggs in the 1910s. See the photograph in this Wiki article. (http://en.wikipedia.org/wiki/X-ray_crystallography)

podonne
10-04-2006, 04:15 PM
Not as simple as I'd hoped. But I read that once you had chosen the Impact Values that apply to the horse, you could run those IVs combined as a kind of Master IV against the data set and get a single IV for the horse. An IV of 1.30, for instance, would mean that a horse like this wins 30% more often than chance. Of course, how would you translate that into an odds line (or could you)?

The only problem I can see is that you would need a REALLY large dataset to get enough data points that met that Master IV to make it significant. That's interesting though, if the dataset was large enough. Hmmm

Overlay
10-04-2006, 06:21 PM
The author of "Modern Impact Values" whose name Bill Cullen couldn't recall is Mike Nunamaker, who posts here from time to time.

Overlay
10-07-2006, 05:50 PM
Just for the record, here were Mike Nunamaker's complete thoughts, verbatim:

"The first author that I am aware of to construct systems using Impact Values was Fred Davis back in 1974. The method produced a healthy winning percentage back then, and it still does. One of the beauties of Impact Value-based systems is that they allow the handicapper to combine as many factors as he or she wishes to see which horse is really better. Is a horse with the highest speed rating and no recent race better than the horse with the second highest speed rating and a recent race? Impact Value-based systems can provide the answer to this question.

"So how does it work? First you must identify which factors you are going to apply. For example, perhaps you will use the speed rating in the last race, recency, and last finish. Once you've decided on these, you look up the impact value for each horse for each factor you are using. When I used to handicap with paper and pencil, I wrote these numbers either in the margin of the form, or on a separate sheet of paper. Now you multiply each of the numbers together and you have a rating for each horse. The horse with the highest rating is your play. That's it!

"The worst pitfall is that the factors you include in a system overlap each other. Mathematically, the multiplying of impact values together is really only a valid thing to do if they are independent of each other. For example, putting a factor based on speed ratings and one based on workouts into the same system generally works very well because there is little similarity between the two factors. But combining two factors that measure the same thing generally doesn't work. For example, using the best speed rating in the last 30 days and the speed rating from the last race is probably a pretty bad combination. They are measuring things that are very close (speed in both cases) so that factor will tend to be overemphasized. Of course, if you want to strongly emphasize one type of factor over another, putting more than one type of factor into a system can make sense. But, in general, you want to be very cautious about putting more than one factor from a certain handicapping area into a system.

"Another pitfall that traps people is simply including too many factors in a system. It has been my experience that if you include more than about eight factors in a system, the results that you get will not be very accurate. It seems that with so many factors, there is bound to be so much overlapping that certain areas of handicapping will be dramatically over-represented.

"At this point, presumably you've picked which factors you are going to apply, looked up the impact values for each factor for each horse, and multiplied them together so you have a single rating for each horse. Now what do you do? If you always bet the top pick, you are done. Just bet the horse with the highest rating. But if you want to look for value, you can calculate an odds line from the ratings that you've just created. Simply add up the ratings for all the horses. Each horse's chance of winning is his rating divided by the sum of all the ratings. For example, say we have a field of five horses:

Horse, Rating, Chance to win, Break-even odds
Dry Water, 1.1, 17.2%, 4.81
Wet Sand, 0.7, 10.9%, 8.17
Super horse, 2.1, 32.8%, 2.05
Garbage Truck, 2.0, 31.2%, 2.20
Show Boat, 0.51, 7.9%, 11.65
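
The odds-line arithmetic for this five-horse field can be sketched as follows. The ratings are the ones quoted; the function name `odds_line` is just a label chosen for this sketch. Each horse's chance is its rating divided by the sum of the ratings, and the break-even odds are one divided by that chance, minus one.

```python
ratings = {
    "Dry Water": 1.1,
    "Wet Sand": 0.7,
    "Super horse": 2.1,
    "Garbage Truck": 2.0,
    "Show Boat": 0.51,
}


def odds_line(ratings):
    """Map each horse to (chance of winning, break-even odds to 1)."""
    total = sum(ratings.values())
    line = {}
    for horse, rating in ratings.items():
        chance = rating / total
        line[horse] = (chance, 1 / chance - 1)
    return line


line = odds_line(ratings)
for horse, (chance, odds) in line.items():
    print(f"{horse}: {chance:.1%}, {odds:.2f}-1")
```

Because this works from unrounded chances, a few figures may differ slightly in the last digit from the quoted ones, which appear to have been derived from the rounded percentages.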


"The total rating for the race, or the sum of the rating of all the horses, is 6.41. This was calculated by addition (1.1 + 0.7 + 2.1 + 2.0 + 0.51). We find the percentage chance each horse will win by dividing each horse's rating by 6.41, which is the sum of all the ratings. Now that we have the percentage chance, we can calculate the odds needed to break even by dividing one by the percent chance to win and subtracting one.

"Keep in mind that these odds will only have you breaking even, and even then only if the combination of factors that you have chosen to build a system is perfect. Because you probably want to do more than just break even and no one's system is perfect, you will maximize your profits or minimize your loss if you restrict yourself to horses that are going off at odds substantially higher than what you calculate. A margin for error of 50% is pretty safe. For example, if you calculate that a horse should go off at 2-1 odds, don't play it unless it is going off at 3-1 odds.

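The margin-for-error rule can be written as a small check. This sketch assumes the 50% margin is applied multiplicatively to the calculated odds, which matches the 2-1 versus 3-1 example; the function names are hypothetical.

```python
def minimum_acceptable_odds(fair_odds, margin=0.5):
    """Lowest tote odds (to 1) worth playing for a given fair-odds estimate."""
    return fair_odds * (1 + margin)


def is_playable(fair_odds, tote_odds, margin=0.5):
    """True when the tote odds clear the fair odds by at least the margin."""
    return tote_odds >= minimum_acceptable_odds(fair_odds, margin)


print(minimum_acceptable_odds(2.0))  # fair odds of 2-1
print(is_playable(2.0, 3.0), is_playable(2.0, 2.5))
```
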
"A reasonable substitute for all of this is to simply play any top ranked horse that is going off at odds of 4-1 or more. If a horse has the highest ranking, it will almost surely be profitable at that odds range, and all of the time needed to calculate an odds line is done away with."

garyoz
10-07-2006, 09:15 PM
Do a correlation matrix of the variables. If two of them have a positive correlation greater than .70 (and that is being very lenient), then you will be double counting in your model. I am certain that you will find many variables that are highly correlated. "Impact values" is a modified form of regression analysis, and pretty much an improperly applied one. This board has many threads and posts on the topic. I suggest that you do a search of the term "multiple regression". There are statistical reasons why you can't combine highly correlated variables (I believe they were referred to as "overlapping" in earlier posts).
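
A minimal sketch of that screening step, assuming each factor is available as a list of numeric values, one per starter. The factor names and numbers below are invented for illustration, and the Pearson correlation is written out by hand so the sketch needs no third-party library.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(factors, threshold=0.70):
    """Return (name_a, name_b, r) for every pair correlated above the threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(factors.items(), 2):
        r = pearson(xs, ys)
        if r > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical per-starter values for three factors.
factors = {
    "last_speed": [80, 85, 78, 90, 72, 88],
    "best_speed_30d": [82, 86, 80, 91, 75, 88],
    "days_since_workout": [5, 12, 7, 9, 11, 6],
}

flagged = correlated_pairs(factors)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} overlap (r = {r:.2f})")
```

With these made-up numbers only the two speed factors are flagged, which is the kind of pair that would be double counted if both IVs were multiplied in.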

IMHO all IV's will do is identify favorites.

podonne
10-08-2006, 02:38 AM
Well, I'm trying to use them to eliminate horses. I figure I can sort the list by IV and start at the bottom, eliminating horses until I get down to some number left.
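
That elimination idea could look something like the sketch below. It assumes each horse already has a single combined rating; the function name is made up, and the sample ratings are borrowed from the five-horse example quoted earlier in the thread.

```python
def keep_top(ratings, keep):
    """Sort horses by combined rating and drop everything below the top `keep`."""
    ranked = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    return [horse for horse, _ in ranked[:keep]]


ratings = {
    "Dry Water": 1.1,
    "Wet Sand": 0.7,
    "Super horse": 2.1,
    "Garbage Truck": 2.0,
    "Show Boat": 0.51,
}

contenders = keep_top(ratings, keep=3)
print(contenders)
```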