Stats Question Normal Distribution [Archive] - Horse Racing Forum - PaceAdvantage.Com

Robert Goren

01-02-2011, 03:25 PM

Let's say we have 10 horses, each of which on average runs an SR of 90 with a standard deviation of 5. If they run a hundred (or whatever) races, what will be the average high SR of the races, and what will be the SD of the high SR? How do you arrive at it?

TrifectaMike

01-02-2011, 04:36 PM

Robert,

Give this a bit more thought. I don't believe you are asking the correct question.

Mike

Robert Goren

01-02-2011, 05:01 PM

I don't know how else to word it. Maybe this way: if you had a class with an average of 90 on their test scores and the SD of the test scores is 5, and you drew a sample of 10 students' tests over and over again, what would be the average of the highest test score in each sample, and what would be the SD of those highest scores? How do you get the answers?

TrifectaMike

01-02-2011, 05:07 PM

Robert, back to your horse question.

Using the parameters you've described (the same mean and standard deviation for each horse), take a piece of paper and draw a bell curve on the same chart for each horse. Do that, and tell me what you see.

Mike

Robert Goren

01-02-2011, 05:18 PM

I see your point; it was very badly worded. But I still want to know what, on average, the top rating of that race would be. It will be above 90, but by how much? I figured this out in my stats class back in 1967, but have forgotten how.

Robert Goren

01-02-2011, 05:24 PM

I know this much: the average of the averages for all 10 horses (or test papers) in the samples would be 90, with an SD of 5 divided by the square root of 10 (or maybe 9).
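
For readers who want to check the figure above: for independent draws with a known SD, the SD of a sample mean is the SD divided by the square root of the sample size, so the divisor here is the square root of 10, not 9. The following is a minimal sketch that compares that formula with a simulation. The function name, seed, and number of races are illustrative assumptions, not anything from the thread.

```python
import math
import random
import statistics

def sample_mean_sd(n_horses=10, mean=90.0, sd=5.0, n_races=20000, seed=1):
    """Simulate many races and return (mean, SD) of the per-race average rating."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    race_means = [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(n_horses))
        for _ in range(n_races)
    ]
    return statistics.fmean(race_means), statistics.stdev(race_means)

theory_sd = 5.0 / math.sqrt(10)      # standard error of the mean, about 1.58
sim_mean, sim_sd = sample_mean_sd()  # simulated counterpart
print(f"theory SD {theory_sd:.3f}, simulated mean {sim_mean:.2f}, simulated SD {sim_sd:.3f}")
```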

CBedo

01-02-2011, 07:28 PM

I just did a quick simulation of 200 races with 10 horses in each race, all with a mean speed rating of 90 and a standard deviation of 5, and came out with an average max (winning) speed rating of 97.7.

Just reran with a sample of 3000 races and it comes up with:

mean 97.8
median 97.6
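
A simulation along these lines can be sketched in a few lines of Python. This is not CBedo's actual spreadsheet or code, just one way to reproduce the idea under the same assumptions (independent, normally distributed ratings with the same mean and SD for every horse). It also reports the SD of the top rating, which was the second half of the original question. The function name, seed, and race count are assumptions.

```python
import random
import statistics

def simulate_top_rating(n_horses=10, mean=90.0, sd=5.0, n_races=50000, seed=42):
    """Return (mean, median, SD) of the highest rating in each simulated race."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    tops = [
        max(rng.gauss(mean, sd) for _ in range(n_horses))
        for _ in range(n_races)
    ]
    return statistics.fmean(tops), statistics.median(tops), statistics.stdev(tops)

avg_top, median_top, sd_top = simulate_top_rating()
print(f"mean {avg_top:.1f}, median {median_top:.1f}, SD {sd_top:.2f}")
```

The mean and median should land close to the figures quoted above, and the SD of the top rating should come out at roughly 3 points, noticeably smaller than the 5-point SD of a single horse.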

CBedo

01-02-2011, 07:34 PM

FYI, for those who want to do this in Excel, the easy way to generate a random number from the normal distribution would be with the formula:

= NORM.INV(RAND(), mean, standard_deviation)

In older versions of Excel, the function was just NORMINV, but there were issues with the RAND() function as well as the NORMINV function that made it not quite accurate (but OK for most purposes). RAND() seems to be better now (it used to have autocorrelation issues), but NORM.INV still doesn't seem perfect (according to real math guys, not me, lol).

The more accurate way to generate a random number from the normal distribution is with this formula:

=SQRT(-2*LN(RAND()))*SIN(2*PI()*RAND())

Then multiply it by your standard deviation and add your mean.
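
The second formula is the sine branch of the Box-Muller transform. Here is a hedged Python equivalent for anyone not working in Excel. The function name `box_muller` and the seed are illustrative assumptions, and the small shift on the first uniform is only there to keep the logarithm away from zero.

```python
import math
import random
import statistics

def box_muller(rng, mean=0.0, sd=1.0):
    """One normal draw via the sine branch of the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # random() is in [0, 1); shift to (0, 1] so log(u1) is defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)
    return mean + sd * z     # scale by the SD, then add the mean

rng = random.Random(7)  # fixed seed (an assumption) for repeatability
draws = [box_muller(rng, 90.0, 5.0) for _ in range(100000)]
print(f"mean {statistics.fmean(draws):.2f}, SD {statistics.stdev(draws):.2f}")
```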

TrifectaMike

01-02-2011, 09:48 PM

Or the very cheap way...

Generate three uniform random numbers from -1 to 1 and sum them up.

Then multiply the sum by the standard deviation and add the mean. Good for paper and pencil too.

Mike
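
Why the cheap way works: a uniform number on -1 to 1 has variance 1/3, so the sum of three has mean 0 and variance 1, and the sum is already roughly bell-shaped. A minimal sketch follows, with the function name `cheap_normal` and the seed being assumptions. One caveat is that the sum can never go beyond 3 in either direction, so the tails are thinner than a true normal, which matters somewhat when the quantity of interest is the maximum of several draws.

```python
import random
import statistics

def cheap_normal(rng, mean=0.0, sd=1.0):
    """Approximate normal draw: sum of three uniforms on [-1, 1], then scale and shift."""
    total = sum(rng.uniform(-1.0, 1.0) for _ in range(3))  # mean 0, variance 3 * (1/3) = 1
    return mean + sd * total

rng = random.Random(11)  # fixed seed (an assumption) for repeatability
cheap_draws = [cheap_normal(rng, 90.0, 5.0) for _ in range(100000)]
print(f"mean {statistics.fmean(cheap_draws):.2f}, SD {statistics.stdev(cheap_draws):.2f}")
```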

CBedo

01-02-2011, 10:30 PM

That seems much easier than my way!

Just for fun, I simulated races with from 2 to 15 entries (sample size 1,000,000 races for each), each entry having the same average speed and the same standard deviation, to see how many standard units above the mean the average winner's speed would be. To find the average winning speed for a race with n entries, look up n in the table, multiply that value by the standard deviation, and add the mean speed.

n Max (standard units above the mean)
2 0.5646148
3 0.8457556
4 1.030093
5 1.163620
6 1.266904
7 1.351474
8 1.423697
9 1.484695
10 1.538076
11 1.586184
12 1.629051
13 1.667971
14 1.703604
15 1.735989

Mike, is there a function or other way of approximating these expected values without having to do a simulation?

Robert Goren

01-02-2011, 10:35 PM

Thanks, I get the picture now.

TrifectaMike

01-02-2011, 11:45 PM

What you've done is probably the easiest way to analyze the situation. A closed-form solution? I don't believe so. A more formal approach, generating a multivariate normal with mean mu and covariance matrix SIGMA, is not any different for the purpose at hand.

Mike
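
Short of a simple closed form for general n, the expected maximum of n independent standard normals can still be written as a one-dimensional integral, E[max] = integral of x * n * pdf(x) * cdf(x)^(n-1) dx, and evaluated numerically without simulating any races. The sketch below uses a plain trapezoid rule. The integration limits, step count, and function name are assumptions chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def expected_max(n, lo=-8.0, hi=8.0, steps=16000):
    """Expected maximum of n independent standard normals, by trapezoid integration."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        # density of the maximum of n draws: n * pdf(x) * cdf(x)^(n-1)
        density = n * STD_NORMAL.pdf(x) * STD_NORMAL.cdf(x) ** (n - 1)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x * density
    return total * h

for n in (2, 5, 10, 15):
    print(n, round(expected_max(n), 4))
```

The results are in standard units, so for the 10-horse example the average top rating would be 90 + 5 * expected_max(10). The values should agree with the simulated table to about three decimal places.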

Skanoochies

01-03-2011, 12:27 AM

Wow. I need a Tylenol Extra. :bang:

raybo

01-03-2011, 06:16 PM

Yeah, kinda like listening to 2 or 3 engineers talking shop, isn't it? Means absolutely nothing to the rest of us.

CBedo

01-03-2011, 06:26 PM

Interestingly, in a quick sample I just ran of claiming races in 2010 that had from 5 to 13 runners, the average winning speed stays roughly the same; it does not go up. What seems to happen is that the standard deviation drops as the number of runners increases.

raybo

01-03-2011, 06:31 PM

I'm not a stats guy, but wouldn't that be expected? More horses, same winning speeds = less deviation, vs. fewer horses, same speeds = more deviation?

sjk

01-03-2011, 07:08 PM

Mike, is there a function or other way of approximating these expected values without having to do a simulation?

It seems to me you can get these numbers by finding the point (in a lookup table) on the normal curve where the area under the curve to the left of the point is the nth root of 1/2, n being the number of horses.

The assumption that all of the horses have the same distribution simplifies matters here.
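
The lookup described above can be sketched with the inverse normal CDF. Strictly speaking it gives the median of the top rating rather than the mean: the top of n independent ratings is below x with probability cdf(x)^n, and setting that to 1/2 gives cdf(x) = (1/2)^(1/n). The function name `median_max` is an assumption.

```python
from statistics import NormalDist

def median_max(n, mean=0.0, sd=1.0):
    """Median of the highest of n independent normal ratings."""
    p = 0.5 ** (1.0 / n)                   # area to the left: the nth root of 1/2
    return NormalDist(mean, sd).inv_cdf(p)  # the point on the curve with that area

for n in (2, 5, 10, 15):
    print(n, round(median_max(n), 4))
print(round(median_max(10, 90.0, 5.0), 1))
```

For the 10-horse case this comes out near 97.5, in line with the simulated median reported earlier in the thread and slightly below the simulated mean.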

CBedo

01-03-2011, 07:10 PM

Seems logical to me that the variance would be reduced if the winning figure doesn't change.

Greyfox

01-03-2011, 09:06 PM

Sample sizes with only 10 horses lead to questionable findings.

Without doing any trials: on a normal curve, over 99% of the population is captured within 3 standard deviations either way.
Theoretically, if the average is 90, after thousands of those trials the most likely max would be 105, or 90 + 3 SD (3 × 5).
However, a normal curve would hardly work. Some horses will drop coming out of the gate and get an SR of 0.

CBedo

01-03-2011, 09:18 PM

Of course there will always be outliers (any one horse has roughly a 2.5% chance of running over 100, so in a 10-horse field roughly 20% of winning figures will be), but as you can see from the table in post 10, using these assumptions, even with 15 horses the average winning figure will be less than 100 (1.74 std units above the mean).

And the sample size isn't 10. That's the number of runners in each race.
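
To put a number on how often a winning figure tops a given rating under these assumptions, note that the winner is below the threshold only if every horse is. A minimal sketch, assuming independent normal ratings with mean 90 and SD 5 (the function name `prob_winner_above` is an assumption):

```python
from statistics import NormalDist

def prob_winner_above(threshold, n, mean=90.0, sd=5.0):
    """Chance that the best of n independent normal ratings exceeds the threshold."""
    p_single_below = NormalDist(mean, sd).cdf(threshold)  # one horse at or below threshold
    return 1.0 - p_single_below ** n                      # at least one horse above it

print(f"single horse over 100: {prob_winner_above(100, 1):.3f}")
print(f"winner of 10 over 100: {prob_winner_above(100, 10):.3f}")
print(f"winner of 10 over 105: {prob_winner_above(105, 10):.3f}")
```

A single horse is 2 SD above its mean only about 2.3% of the time, but with 10 runners the chance that somebody does it is close to 21%. A top figure of 105 (3 SD) is still rare, a bit over 1% of 10-horse races.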

Native Texan III

01-05-2011, 05:09 PM

As I understand it, the 3 deviations refers to the individual horse having a normal distribution of time ratings, whereas your most interesting test is against races with varying numbers of horses, where the fastest one wins but only has to run faster than the others, not necessarily its fastest, and you have taken the assumption of a normal distribution as fact.

Real races do not follow that result as the number of competitors increases. The error alone in making any individual time rating is of the order of +/- 5 to 10 points. There is also the controversial bounce theory, which says that horses will slow after an exceptional effort; but what counts as exceptional in the past results for that particular horse? So your model presumes accurate ratings and a normal distribution, with a horse with, say, a mean speed of 85 and an SD of 5 having a lowest rating of 70 and a high potential rating of 100.

There is no evidence that the time ratings follow a normal curve for any individual horse. Also, some folks are trying to predict that a horse that has never run anywhere near that predicted faster rating over a series of competitive race tests has some hidden reserve of "speed" that can be determined or even estimated mathematically without knowing anything about the horse's fitness, drug and injury history, etc. A horse may be able to run faster, but it is most unlikely.

CBedo

01-05-2011, 05:36 PM

I'm not sure what your point is, but I enjoyed reading it... :rolleyes:

The deviation was for the average winning speed rating, not an individual horse.

The whole discussion has been about race results, not individual horses, and as you may have noticed, a previous post already mentioned that reality doesn't sync up with the simulated results; the simulated results were produced to answer Robert's original question.