08-22-2017, 03:46 PM
|
#16
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by DeltaLover
Does this mean that you are developing separate models based on the number of the starters?
|
I thought about that, but the samples might start getting too small.
__________________
"Unlearning is the highest form of learning"
08-22-2017, 03:59 PM
|
#17
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by sjk
If you are doing a linear model, you want to limit the effect of a horse being beaten double-digit lengths: either cap the beaten-lengths parameter at a predetermined value, or use 10 minus beaten lengths and bottom out at 0.
On the theory that a horse that runs last in a 6-horse field hasn't really done anything better than a horse that runs last in a 12-horse field, you might cap the finish position (or use 6 minus finish position, as above).
The speed and class numbers should probably be in relation to what would be expected at that level.
I would be leery of the information that is only available for winners. That is probably not going to fit in a nice linear manner with the others.
|
1. Finding some cutoffs sounds like a good idea.
2. Field size of last race is one of the inputs I am using, so I would hope it would know, for example, that a horse that beat a 12-horse field did more than a horse that beat a 5-horse field. But perhaps I can turn that into a higher-quality, more expressive input. In my own "intuitive" model I use a fixed value adjustment per number of starters, but I was never satisfied I had a great way of doing it; it just works better than not doing it. That's the kind of thing I was hoping to learn here.
3. The way I did the figures was to find the maximum value in each race and set that to 100.
So for example:
If the top figure in a race was 115, I set it to 100 and then lowered the figure for each of the other horses in the race by 15 to keep the relationships the same.
If the top figure in the race was 75, I set it to 100 and then raised the figure for each of the other horses by 25 to keep the relationships the same.
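A minimal sketch of the additive normalization described above (pure Python; the function name is mine, not from the thread):

```python
def normalize_additive(figures, top=100):
    """Shift every figure in a race by one offset so the race's
    best figure becomes `top`. Differences between horses are
    preserved exactly, which is the stated goal above."""
    offset = top - max(figures)
    return [f + offset for f in figures]

# Top figure 115: every horse drops 15 points.
print(normalize_additive([115, 108, 99]))  # [100, 93, 84]
# Top figure 75: every horse rises 25 points.
print(normalize_additive([75, 70, 62]))    # [100, 95, 87]
```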
__________________
"Unlearning is the highest form of learning"
08-22-2017, 04:16 PM
|
#18
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by classhandicapper
1. Finding some cutoffs sounds like a good idea.
2. Field size of last race is one of the inputs I am using. So I would hope it would know for example that a horse that beat a 12 horse field did more than a horse that beat a 5 horse field. But perhaps I can change that into a somewhat higher quality/expressive input. In my own "intuitive" model I use a fixed value adjustment per number of starters, but I was never satisfied I had a great way of doing it. It just works better than not doing it. That's the kind of thing I was hoping to learn here.
3. The way I did the figures was to find the maximum value in each race and set that to 100.
So for example:
If the top figure in a race was 115, I set it to 100 and then lowered the figure for each of the other horses in the race by 15 to keep the relationships the same.
If the top figure in the race was 75, I set it to 100 and then raised the figure for each of the other horses by 25 to keep the relationships the same.
|
4. I think it might also help to add a figure rank as someone suggested earlier.
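The rank input mentioned in point 4 could be computed per race like this (a sketch; the function name is mine):

```python
def figure_ranks(figures):
    """Rank each horse's figure within its race, 1 = best.
    Tied figures share the better rank."""
    ordered = sorted(figures, reverse=True)
    return [ordered.index(f) + 1 for f in figures]

print(figure_ranks([100, 93, 84]))   # [1, 2, 3]
print(figure_ranks([100, 100, 84]))  # [1, 1, 3]
```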
__________________
"Unlearning is the highest form of learning"
08-22-2017, 04:25 PM
|
#19
|
Registered user
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
|
Quote:
Originally Posted by classhandicapper
I thought about that, but the samples might start getting too small.
|
So in this case how do you structure your patterns? I am asking because the input to a logistic regression must be of fixed size, so if the field size varies you will need to fill the missing values with some normalized data, which can become very challenging.
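One common workaround for the fixed-size-input problem is to pad every race out to a maximum field size with a neutral filler. A sketch; the cap of 12 and the fill value are illustrative assumptions, not anything from the thread:

```python
def pad_race(figures, max_field=12, fill=0.0):
    """Pad a race's per-horse inputs to a fixed length so every
    race produces the same-size vector for the regression."""
    if len(figures) > max_field:
        raise ValueError("field larger than max_field")
    return list(figures) + [fill] * (max_field - len(figures))

print(pad_race([100, 93, 84], max_field=6))
# [100, 93, 84, 0.0, 0.0, 0.0]
```

Choosing the fill value is exactly the hard part being pointed out here; 0.0 is only a placeholder.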
__________________
whereof one cannot speak thereof one must be silent
Ludwig Wittgenstein
08-23-2017, 04:38 PM
|
#20
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by DeltaLover
So in this case how do you structure your patterns? I am asking this because the input to the logistic regression must be of the same size, so if you vary it you will need to somehow fill the missing values with some normalized data, something that can become very challenging.
|
If every race has to have the same field size or the model breaks, that might be one of the issues I'm having. It can't be a major issue, though, because the results are good; they just aren't as good as my intuitive weights.
I'm still thinking about it. At least I'm at the stage where I am getting output and learning.
__________________
"Unlearning is the highest form of learning"
08-23-2017, 09:58 PM
|
#21
|
Registered User
Join Date: Feb 2003
Location: NY
Posts: 245
|
If you have 1000 races, study only the winners and the horses that finished, say, less than 1 length behind - so you've got roughly 1100 horses.
Take the factor that produced the most winners, remove those winners from your dataset, and see what factor grabs the most winners from the remaining data.
Now you've got 2 factors. Weight them the same, then raise and lower the weights to get your best answer; add a 3rd factor, rinse and repeat. You need to work from a sample of your total database so that once you have it built you can test it against different data.
You'd be better off studying what makes the favorites lose but few ever want to hear that.
Good luck.
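The greedy procedure described above could be sketched like this; the data layout and the factor predicates are hypothetical stand-ins, not anything from the thread:

```python
def greedy_factor_order(winners, factors):
    """Repeatedly pick the factor that flags the most
    still-unexplained winners, remove those winners from the
    pool, and repeat until winners or factors run out.
    `winners` is a list of records; `factors` maps a factor
    name to a predicate over one record."""
    remaining = list(winners)
    chosen = []
    pool = dict(factors)
    while pool and remaining:
        # Factor that catches the most remaining winners.
        best = max(pool, key=lambda name: sum(pool[name](w) for w in remaining))
        chosen.append(best)
        remaining = [w for w in remaining if not pool[best](w)]
        del pool[best]
    return chosen
```

After ordering the factors, you would hand-tune the weights exactly as described, holding out part of the database for testing.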
08-25-2017, 11:52 AM
|
#22
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by traveler
If you have 1000 races, study the winners only and horses who finished say less than 1 length behind - so you got say 1100 horses.
Take the factor that produced the most winners, remove those winners from your dataset, now what factor grabs the most winners from the remaining data.
You got 2 factors weight them the same and then raise and lower the weights to get your best answer, add a 3rd factor rinse and repeat. You need to use a "sample" of your total database so once you have it built you can test against some different data.
You'd be better off studying what makes the favorites lose but few ever want to hear that.
Good luck.
|
1. I like the idea of studying losing favorites. I already do that on some level via queries against my database. I hadn't thought about using regression.
2. On your first point, I've tried things like that, but you run into issues; that's why I'm trying to use a more formal regression.
It's very easy to find the appropriate weights when you have 2 factors, but when you add a 3rd, 4th, 5th, etc., it gets trickier.
To make it simple, let's say my research says that factor 1 and factor 2 should each be weighted at 50% to get the optimal result.
Next I use the combination of 1 and 2 with factor 3.
Let's say it says Factor 3 should be 20%.
That means factor 1 and 2 are 40% each and factor 3 is 20%.
That may be a good answer, but not necessarily the optimal one, because a lot of stats overlap to some degree. A formal regression might come up with a better result by using different weights.
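The arithmetic of folding a new factor into an existing set of weights, as described above, is just a rescale. A sketch; as the post says, a regression fitting all factors jointly can land on different weights because of the overlap:

```python
def blend_in(weights, new_weight):
    """Scale the existing weights down by (1 - new_weight) so
    the whole vector still sums to 1, then append the new
    factor's weight."""
    return [w * (1.0 - new_weight) for w in weights] + [new_weight]

# Factors 1 and 2 at 50% each, factor 3 comes in at 20%:
print(blend_in([0.5, 0.5], 0.20))  # [0.4, 0.4, 0.2]
```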
__________________
"Unlearning is the highest form of learning"
08-25-2017, 12:07 PM
|
#23
|
dGnr8
Join Date: Aug 2003
Location: Niagara, Ontario
Posts: 3,023
|
Quote:
Originally Posted by classhandicapper
If the top figure in a race was 115, I set it to 100 and then lowered the figure for each of the other horses in the race by 15 to keep the relationships the same.
If the top figure in the race was 75, I set it to 100 and then raised the figure for each of the other horses by 25 to keep the relationships the same.
|
This would work better if you use the rankings rather than the values.
Simply adding or subtracting in order to normalize the values will change their relationships to one another. In order to keep relationships the same you should calculate what is required to change your max value to equal 100 and then use that to modify the other values.
e.g., 100 / 115 ≈ 0.87 and 100 / 75 ≈ 1.33, so multiply the values by these quotients to get them into the same range.
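The multiplicative version, side by side with what it does to a lower-level race (a sketch; the function name is mine):

```python
def normalize_multiplicative(figures, top=100.0):
    """Scale every figure so the race's best becomes `top`.
    Ratios between horses are preserved rather than differences."""
    scale = top / max(figures)
    return [f * scale for f in figures]

# The same 15-point raw gap becomes very different after scaling:
print(normalize_multiplicative([115, 100]))  # [100.0, ~86.96]
print(normalize_multiplicative([30, 15]))    # [100.0, 50.0]
```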
__________________
.
The great menace to progress is not ignorance but the illusion of knowledge - Daniel J. Boorstin
The takers get the honey, the givers sing the blues - Robin Trower, Too Rolling Stoned - 1974
08-27-2017, 09:18 AM
|
#24
|
Veteran
Join Date: Sep 2003
Location: NEW YORK CITY
Posts: 3,670
|
Quote:
Originally Posted by traveler
Take the factor that produced the most winners, remove those winners from your dataset, now what factor grabs the most winners from the remaining data.
Good luck.
|
Would it be better to run another query with that top factor as part of it?
Doesn't a winning horse have some sort of relationship with each factor?
Say your top factor is a class-based one... if you take it out, it's like starting from scratch again. But if you leave it in, then the other factors will take that one into the mix...
just a thought..
mike
08-27-2017, 06:13 PM
|
#25
|
Registered User
Join Date: Mar 2005
Location: Queens, NY
Posts: 20,613
|
Quote:
Originally Posted by Red Knave
This would work better if you use the rankings rather than the values.
Simply adding or subtracting in order to normalize the values will change their relationships to one another. In order to keep relationships the same you should calculate what is required to change your max value to equal 100 and then use that to modify the other values.
i.e. - 100 / 115 = 0.87 and 100 / 75 = 1.33 so multiply the values by these quotients to get them in the same range.
|
Are you essentially saying that 30 to 15 is different than 100 to 85 even though both are a 15 point difference?
I thought about that and agree that's clearly the case mathematically, but I'm not so sure it's the case when we're talking about the difference between horses, because a point represents a fixed number of lengths.
If some horse at Finger Lakes is 5 lengths faster than his opposition is that different than if Arrogate is 5 lengths faster than Gun Runner?
I can try it.
__________________
"Unlearning is the highest form of learning"
08-29-2017, 07:46 AM
|
#26
|
dGnr8
Join Date: Aug 2003
Location: Niagara, Ontario
Posts: 3,023
|
Quote:
Originally Posted by classhandicapper
Are you essentially saying that 30 to 15 is different than 100 to 85 even though both are a 15 point difference?
|
Yes. And my thought was more that the 2nd, 3rd or 4th rank will be unduly rewarded or penalized by simple adding/subtracting. Especially if these ratings flow to impact other ratings.
__________________
.
The great menace to progress is not ignorance but the illusion of knowledge - Daniel J. Boorstin
The takers get the honey, the givers sing the blues - Robin Trower, Too Rolling Stoned - 1974
08-29-2017, 02:25 PM
|
#27
|
Registered User
Join Date: Jun 2011
Posts: 588
|
Quote:
Originally Posted by classhandicapper
1. I like the idea of studying losing favorites.
|
Study the winners in the races where the favorite lost, especially when the winner was the 3rd choice or higher in the post-time odds.
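Pulling that subset out of a results database is a simple filter; a sketch over a hypothetical list of race records (the field names are mine, not from the thread):

```python
def upset_races(races, min_winner_rank=3):
    """Races where the favorite lost and the winner was the 3rd
    choice or longer in the post-time odds."""
    return [r for r in races
            if r["favorite_finish"] != 1
            and r["winner_odds_rank"] >= min_winner_rank]

sample = [
    {"favorite_finish": 1, "winner_odds_rank": 1},  # favorite won
    {"favorite_finish": 3, "winner_odds_rank": 2},  # 2nd choice won
    {"favorite_finish": 4, "winner_odds_rank": 5},  # genuine upset
]
print(upset_races(sample))  # keeps only the last race
```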
|
|
#28
|
Buckle Up
Join Date: Apr 2014
Posts: 10,614
|
Quote:
Originally Posted by JJMartin
Study the winners in the losing favorites races. Especially when they are 3rd or higher ranking in post time odds.
|
Now we finally have something to delve into...