Ummmm another question


tilson
05-16-2001, 09:41 AM
Here is another question, which I guess is sort of an offshoot of my first question.
For the guys who say they have large databases, this might be a question you can get a good bead on.
Assuming one uses the same methodology in a selection process, at what point would the average win price not vary much?
I realise that in maybe only 50 races a run of favorites or longshots could skew an average win price greatly, but I am also fairly sure the ability of several winners to do the same would be greatly limited once the sample size became large enough.
I would think this question isn't ultra-dependent on the countless variables associated with handicapping; rather, it is almost exclusively tied to the functional principles of math and odds.
I truly believe that chaos is as much a part of reality as order is, yet there are averages that seem to be universal, independent of such things as weather, locale, etc.
For example: favorites will win 30 to 34% of all races at virtually any race meet if the sample size of races is large enough.
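A rough way to see what I mean is to simulate it. The sketch below is only a toy (made-up lognormal payoffs, not real race data), but it shows how much an average win price still wobbles at different sample sizes:

# Toy simulation: how much does the average win price still move
# at different sample sizes?  Payoffs are drawn from a made-up
# lognormal distribution just to give a skewed, longshot-heavy shape.
import random, statistics

random.seed(1)

def simulated_win_price():
    # hypothetical payoff model: most winners pay $4-$8, with the odd bomb
    return round(2.0 + random.lognormvariate(1.0, 0.9), 2)

for n in (50, 500, 5000, 50000):
    # repeat each sample size 100 times to see how much the average moves
    averages = [statistics.mean(simulated_win_price() for _ in range(n))
                for _ in range(100)]
    print(n, "races: average", round(statistics.mean(averages), 2),
          "+/-", round(statistics.pstdev(averages), 2))

The spread around the average shrinks roughly with the square root of the sample size, which is the usual law-of-large-numbers behaviour.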

Larry Hamilton
05-16-2001, 09:58 AM
You want to know at what point the daily events settle around the average. At least, that is how I read it. Let me answer a question with a question.
You state that you know the favorite wins 1/3 of the time, and that is maybe the oldest known stat around. Here is the question: even knowing there is a huge database behind "1/3 of favs win", haven't you seen a day go by at the track without a single favorite winning?

"How can this be?" you scream, " My stats say 1 in 3 is ordained to win"

It's all relative: 1 in 3 is actually inaccurate. A more correct restatement is 10 in 30, or 100 in 300, or 1,000 in 3,000, or 10,000 in 30,000. You may have noticed that the larger the sample you pick, the more "losers" you assemble along the way. So why would it be alarming for 9 favorites in a row to lose?
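(A rough back-of-the-envelope check, assuming independent races and a flat 1-in-3 favorite win rate: the chance of nine straight beaten favorites is (2/3)^9, about 2.6%, so a nine-race card with no winning favorites should show up roughly once in every forty cards, not never.)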

The answer to your question is: for the purposes of gambling, never.

I don't really know if I am answering the question that was asked, but what the hell, I never really know.

tilson
05-16-2001, 10:39 AM
I understand what you are saying, and I view it as only a half-truth. Yes, the number of losers grew, but so did the number of winners, and in your examples it was an algebraic scaling that symmetrically influenced both sides of the equation.
To illustrate my point: if you had a 100-yard dash with a one-second difference between the first- and second-place finishers, I don't think anyone would dispute that the one-second margin made the first-place finisher clearly superior. But had the race been one mile with the same one-second difference, the competitors would be viewed as VERY evenly matched. Thus, while the one-second difference remained the same, it meant something entirely different when measured against a greater distance. This illustrates that the relationship between a fixed time margin and the distance is not linear; it is more representative of a type of parabola.
The same should be true of average price vs. sample size: as the sample gets bigger, the ability of one winner to skew the average becomes greatly diminished.
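(A quick worked example with made-up numbers: a single $100 winner dropped into 50 races that otherwise average $6 pulls the average up by about (100 - 6) / 51, roughly $1.84; the same winner dropped into 5,000 such races moves it by about (100 - 6) / 5001, roughly 2 cents.)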

Dave Schwartz
05-16-2001, 10:46 AM
Tilson,

There is no easy answer to your question... But if I had to venture an educated guess, I would say that the number of races you would need before your database studies and reality stabilize, so that the future becomes very predictable (for lack of a better word), is huge, and well beyond anyone's ability to collect.

What I am saying is that we cannot PROVE anything!

Consider:

1. To prove something, one must have a "large" sample.
2. To beat the races, one must segment the races into "similar" groups for study.
3. Then, after one has built a "system", there must still be enough races left in the segment, races that have never been looked at, to prove the validity of the study.

It is this "segmentation" that destroys the entire validation process!

We have a database of over 160,000 races. Yet when we begin studying, we find that we typically need to break the data down by surface, distance, race type, age, sex, track condition and field size just to get started.

Consider those permutations alone:

Surface: 2 choices (dirt, turf)

Distances: Assume a minimum of 4 choices (dash, sprint, route, marathon) and one could easily use more.

Race Type: Can be very difficult. We see at least 7 if not more (CL, NW, AL, Hc/Gr, Starter, MSW, Md)

Age: 4 choices (2, 3, 3u, 4u)

Sex: 2 choices

Track Condition: 2 choices (fst/gd, off)

Field Size: At least 4 choices. (2-4, 5-6, 7-9, 10+)

Now, do the math: (i.e. multiply the choices together)

2 x 4 x 7 x 4 x 2 x 2 x 4 = 3584

Thus, we are saying there are 3,584 permutations. If we divide our 160,000 races by the number of permutations we get...

160,000 / 3,584 = 44.6

... an average of only 44.6 races per segment! Now, as you can imagine, there will be some segments that are very common (e.g. 6f claiming races on a fast dirt track with 7-9 horses for 4up males). But just as the sample size of the more common segments goes up, that of the less common ones goes down.
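(The arithmetic is easy to reproduce; a quick sketch using the counts above:)

# quick check of the segment arithmetic above
choices = {
    "surface": 2, "distance": 4, "race_type": 7,
    "age": 4, "sex": 2, "track_condition": 2, "field_size": 4,
}

segments = 1
for n in choices.values():
    segments *= n                      # 2 * 4 * 7 * 4 * 2 * 2 * 4 = 3584

races = 160000
print(segments, round(races / segments, 1))   # 3584, ~44.6 races per segment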

So, the question becomes, "How does one build and test any system against small segments?"

The answer is that you cannot!

If you are interested, I shall explain what we did (and will be doing in the future) to solve the problem.

Larry Hamilton
05-16-2001, 10:49 AM
If you are talking about the influence of today's race on an average (in the db), then I couldn't agree more. But I thought you wanted to know the effect of the db on today's results; for instance, can I expect the payoff of my winners to get close to the average? And that is a totally different question.

sorry

I just read Dave's response, so I must add the following addendum:

"what dave said"

Rick Ransom
05-16-2001, 02:31 PM
The distribution of winning prices is very asymmetrical (probably lognormal), causing there to be a large variance in the mean, even in large samples. It's like trying to pin down what the typical home sells for in your city. One home selling for a million dollars would make the average price totally unrepresentative of what homes usually sell for. That is why they use the median price more often in real estate. I guess you could try using the median price and assuming that the mean in a really large sample would be some multiple of that. I don't know what the multiplier would be or even whether it would be relatively constant for different types of methods. If you're the real estate agent who sells the million dollar house or the horse player who gets the 100 dollar winner, you don't care about this. If you're one of the rest, you might wait a long time for the averages to "even out". Sorry, but that's the way it really is. Horse racing makes a better investment than an income.
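(A toy simulation makes the point, with made-up lognormal-ish payoffs rather than real results:)

# toy illustration: with a skewed payoff distribution the median is stable,
# while the mean gets dragged around by the occasional huge price
import random, statistics

random.seed(7)
prices = [round(2.0 + random.lognormvariate(0.8, 1.0), 2) for _ in range(50)]

print("median win price:", statistics.median(prices))
print("mean win price  :", round(statistics.mean(prices), 2))

prices.append(100.0)   # one $100 winner dropped into ~50 races
print("after the bomb -> median:", statistics.median(prices),
      " mean:", round(statistics.mean(prices), 2))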

Big Bill
05-16-2001, 03:31 PM
Dave,

Your post to this thread stated, in part:

Consider those permutations alone:

Track Condition: 2 choices (fst/gd, off)

Am I wrong to conclude that you see little difference between a fast track and a good track, particularly in selecting a pace line for rating purposes? I've gone back and forth on this, first using both fast and good tracks, and later using only fast tracks.

In my desire to find out what others on the board thought about this, I started a thread last year but didn't get much of a response.

Big Bill

Dave Schwartz
05-16-2001, 03:36 PM
Bill,

No, actually I see a huge difference between fst and gd tracks; I just did not want to get into explaining that. In reality, our database contains about 20,000 race shapes because we split the hairs a little finer.

We have the capability to break it down by specific "off" track conditions as well. We find that it is a less-than-fruitful endeavor because you just do not get any meaningful sample size at all.