Weighing Factors


mikesal57
01-21-2015, 12:47 PM
Hi All...

Say you have some sort of algorithm that creates a number for Early, Late, Form, Class, and Speed.

How would you go about creating an overall number by weighting each factor?

Ex:

Early- 30
Late- 10
Form-15
Class-25
Sp #- 20
----------
100

P.S. Any misc. factors you can think of?

Thxs

Mike

cj
01-21-2015, 06:24 PM
I don't know if I'd leave out the trainer, definitely not miscellaneous.

mikesal57
01-21-2015, 07:13 PM
I don't know if I'd leave out the trainer, definitely not miscellaneous.


Thxs CJ...

I believe that can slip in there ...at what % ??

I thought I'd get some more responses with all the computer guys on here that create power figures :(

cj
01-21-2015, 07:28 PM
Thxs CJ...

I believe that can slip in there ...at what % ??

I thought I'd get some more responses with all the computer guys on here that create power figures :(

I think it depends to be honest. If it is a new trainer, I'd look at the old and new trainer and the bigger the difference the higher the percentage. There are times that is the ONLY factor I consider for a horse.

As for the rest, I think it varies by surface, class, distance, etc. I've done similar to what you are doing to create an odds line, not a power number, though in the end they are kind of doing the same thing.

I guess it depends on how complicated you want to make things. Usually in this game, simplistic doesn't cut it. I think a power rating is a good idea and a great start on making a good betting line. But I don't think the formula for a power rating can be generic. It should vary not only race to race, but horse to horse within the same race. Just my two cents...

Dave Schwartz
01-21-2015, 07:58 PM
I would start with a Fibonacci weighting, like:

100
62
38
24
15
9

I would put the factors in, with the factor I deem most important at the top, and progress downward from there.

Is the weighting "correct"? Of course not. But it is a good starting point for you to go out and handicap about 500 races, race-by-race.
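
To make the arithmetic concrete, here is a minimal VB.NET sketch (my illustration, not Dave's actual method) of folding five factor ratings into one overall number with a descending weight scale like the one above. The factor names, the ratings, and the assumption that every rating already sits on a 0-100 scale are all made up for the example:

Module FibWeightExample
    Sub Main()
        ' Hypothetical per-horse factor ratings, each assumed to be on a 0-100 scale.
        Dim factors() As String = {"Speed", "Class", "Early", "Form", "Late"}
        Dim ratings() As Double = {85, 70, 90, 60, 55}   ' made-up example values
        Dim weights() As Double = {100, 62, 38, 24, 15}  ' most important factor first

        Dim weightedSum As Double = 0
        Dim weightTotal As Double = 0
        For i As Integer = 0 To factors.Length - 1
            weightedSum += ratings(i) * weights(i)
            weightTotal += weights(i)
            Console.WriteLine(factors(i) & ": rating " & ratings(i) & " x weight " & weights(i))
        Next

        ' Dividing by the total weight keeps the overall number on the same 0-100 scale.
        Dim overall As Double = weightedSum / weightTotal
        Console.WriteLine("Overall rating: " & overall.ToString("F1"))
    End Sub
End Module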

big frank
01-21-2015, 09:01 PM
betmix ?

mikesal57
01-21-2015, 09:13 PM
I would start with a Fibonacci weighting, like:

100
62
38
24
15
9

I would put the factors in, with the factor I deem most important at the top, and progress downward from there.

Is the weighting "correct"? Of course not. But it is a good starting point for you to go out and handicap about 500 races, race-by-race.

Thanks Dave...good to hear from you again...it sounds like a plan :)

mike

traveler
01-21-2015, 10:01 PM
Take what you decide are the most predictive factors - let's say speed and early speed (and yes, trainer may be the most important) - and weight them at 50% each and see what your result is. Then raise the weight of speed: better result, raise it a little more; worse result, drop it back. Try raising or lowering the early speed factor too. When you've dialed it in as well as you can, add a third factor and repeat the process.

I don't believe you can start with 5 factors and assign weights randomly or with a Fibonacci sequence and proceed. The problem is: which one do you change the weight of first? You need to start small, dial in the weighting, and add factors.

Given that almost all factors are dependent on each other and each factor will mean more or less in any given race, as CJ points out, the task can quickly become daunting.

If you don't have a large database and the ability to run each newly weighted power number against your sample you will need a large supply of coffee.
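
That nudge-one-weight-at-a-time search is easy to skeleton out. The sketch below is only an illustration of the loop traveler describes: the Score function is a stub you would replace with a routine that replays your race sample under the given weights and returns whatever metric you are tuning for (win%, ROI, etc.):

Module WeightTuner
    ' Stub: replace with a routine that applies the weights to your race sample
    ' and returns the metric you are optimizing (ROI, win%, whatever).
    Function Score(weights() As Double) As Double
        Return 0 ' placeholder
    End Function

    Sub Main()
        Dim weights() As Double = {0.5, 0.5}   ' start: speed and early speed at 50/50
        Dim stepSize As Double = 0.05
        Dim best As Double = Score(weights)

        ' Nudge each weight up and down; keep any change that improves the result.
        For pass As Integer = 1 To 20
            For i As Integer = 0 To weights.Length - 1
                For Each delta As Double In New Double() {stepSize, -stepSize}
                    weights(i) += delta
                    Dim trial As Double = Score(weights)
                    If trial > best Then
                        best = trial            ' keep the improvement
                    Else
                        weights(i) -= delta     ' revert and try the other direction
                    End If
                Next
            Next
        Next

        Console.WriteLine("Best score: " & best & "  weights: " & String.Join(", ", weights))
    End Sub
End Module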

upthecreek
12-28-2015, 10:47 AM
I include BH (beaten horses) in my calculations. Here's the problem: a horse that finishes 3rd in a field of 6 has a beaten-horse % of 50%. A horse that finishes 6/12 has the same %, yet has done more by beating 6 horses. Any way to weight the 6/12 to count more than the 3/6? I thought of adding BH - 53 vs. 56 - but it doesn't seem to make much difference.
Thnx in advance!

cj
12-28-2015, 01:02 PM
I include BH (beaten horses) in my calculations. Here's the problem: a horse that finishes 3rd in a field of 6 has a beaten-horse % of 50%. A horse that finishes 6/12 has the same %, yet has done more by beating 6 horses. Any way to weight the 6/12 to count more than the 3/6? I thought of adding BH - 53 vs. 56 - but it doesn't seem to make much difference.
Thnx in advance!

There are lots of ways to do it, just depends on how significant you think it is. Personally I'd probably consider them equal but that doesn't help answer the question, so here are a few alternatives.

1. Make it more significant by doubling the horses beaten, so you'd have 56 and 62, or triple it for 59 and 68. You could use whatever factor you wanted.

2. Use the beaten lengths instead of finish position. Try 1 - beaten lengths / field size instead. So if a horse is beaten 3 lengths in a 12 horse field you get 1 - 3 / 12 = .75. If the horse is beaten 3 lengths in a 6 horse field you get 1 - 3 / 6 = .50.

3. Add field size to the percentage. So 50+6 = 56 and 50 + 12 = 62.

Just a few ideas to get you thinking, not really suggesting any of them.
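
For anyone who wants to see the arithmetic side by side, here is a small sketch that just reproduces the numbers from the three options above for the 3-of-6 and 6-of-12 cases (nothing here beyond what cj already wrote out):

Module BeatenHorseVariants
    ' Base figure: percentage of the field beaten.
    Function PctBeaten(fieldSize As Integer, finishPos As Integer) As Double
        Return 100.0 * (fieldSize - finishPos) / fieldSize
    End Function

    Sub Main()
        ' The two cases from the thread: 3rd of 6 and 6th of 12 (both beat 50% of the field).
        Dim base36 As Double = PctBeaten(6, 3)    ' 50
        Dim base612 As Double = PctBeaten(12, 6)  ' 50

        ' 1. Add a multiple of horses beaten (double here): 50+2*3=56 vs 50+2*6=62.
        Console.WriteLine("Option 1: " & (base36 + 2 * 3) & " vs " & (base612 + 2 * 6))

        ' 2. Use beaten lengths: 1 - lengths/fieldSize, e.g. beaten 3 lengths in each field.
        Console.WriteLine("Option 2: " & (1 - 3.0 / 6) & " vs " & (1 - 3.0 / 12))

        ' 3. Add field size to the percentage: 50+6=56 vs 50+12=62.
        Console.WriteLine("Option 3: " & (base36 + 6) & " vs " & (base612 + 12))
    End Sub
End Module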

raybo
12-28-2015, 01:22 PM
The last few months I have been messing around with a new program/Excel workbook that I named, "weighted factors". Right now I have 10 factors, each of which can be weighted, or not used at all.

It is not fully automated yet, so I don't currently have the ability to "test" tracks like we do in my Black Box program. I have been looking at individual races.

So far, I agree with CJ, the factors used, and the weighting of each is dependent on track, surface, distance, and class, at the very least. One size definitely DOES NOT fit all.

upthecreek
12-28-2015, 02:26 PM
There are lots of ways to do it, just depends on how significant you think it is. Personally I'd probably consider them equal but that doesn't help answer the question, so here are a few alternatives.

1. Make it more significant by doubling the horses beaten, so you'd have 56 and 62, or triple it for 59 and 68. You could use whatever factor you wanted.

2. Use the beaten lengths instead of finish position. Try 1 - beaten lengths / field size instead. So if a horse is beaten 3 lengths in a 12 horse field you get 1 - 3 / 12 = .75. If the horse is beaten 3 lengths in a 6 horse field you get 1 - 3 / 6 = .50.

3. Add field size to the percentage. So 50+6 = 56 and 50 + 12 = 62.

Just a few ideas to get you thinking, not really suggesting any of them.
Thnx! That's what I'm looking for. Going to experiment with the suggestions.

classhandicapper
12-28-2015, 03:08 PM
I include BH (beaten horses) in my calculations. Here's the problem: a horse that finishes 3rd in a field of 6 has a beaten-horse % of 50%. A horse that finishes 6/12 has the same %, yet has done more by beating 6 horses. Any way to weight the 6/12 to count more than the 3/6? I thought of adding BH - 53 vs. 56 - but it doesn't seem to make much difference.
Thnx in advance!

The dilemma is that sometimes the extra horses in a larger field are all long shot fillers and sometimes some of them are other solid contenders. If you are trying to put it into a simple formula, the answer is somewhere in between - Finishing 6/12 is better than 6/8, but not nearly as good as 3/6.

If you are doing a manual analysis, you can look at the PPs of the horses in the race and see who a horse finished ahead of and who he finished behind, with what trips. That will give you the best idea of what happened.

steveb
12-28-2015, 03:27 PM
The dilemma is that sometimes the extra horses in a larger field are all long shot fillers and sometimes some of them are other solid contenders. If you are trying to put it into a simple formula, the answer is somewhere in between - Finishing 6/12 is better than 6/8, but not nearly as good as 3/6.

If you are doing a manual analysis, you can look at the PPs of the horses in the race and see who a horse finished ahead of and who he finished behind, with what trips. That will give you the best idea of what happened.

with my compliments!

' function that normalises the finish position(or some other ranking)
' where NF is the number of runners and FP is the finish position.
Public Function NFP_HORSE(ByVal NF As Integer, ByVal FP As Integer) As Double
Return (NF + 1 - (FP * 2)) / (3 * Math.Sqrt((1 / 3) * (NF + 1) / (NF - 1)) * (NF - 1))
End Function
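
If you want to see what the function actually returns, a tiny driver like this prints the values for a 5-horse field; they match the table steveb posts later in the thread (0.4714 for the winner, 0 for the middle horse, and so on). This is just a usage check, with steveb's function dropped unchanged into a module:

Module NfpDemo
    ' steveb's normalisation, unchanged.
    Public Function NFP_HORSE(ByVal NF As Integer, ByVal FP As Integer) As Double
        Return (NF + 1 - (FP * 2)) / (3 * Math.Sqrt((1 / 3) * (NF + 1) / (NF - 1)) * (NF - 1))
    End Function

    Sub Main()
        ' Print the normalised value for every finish position in a 5-horse field.
        For fp As Integer = 1 To 5
            Console.WriteLine("5 runners, finish " & fp & ": " & NFP_HORSE(5, fp).ToString("F4"))
        Next
    End Sub
End Module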

cj
12-28-2015, 03:33 PM
The dilemma is that sometimes the extra horses in a larger field are all long shot fillers and sometimes some of them are other solid contenders. If you are trying to put it into a simple formula, the answer is somewhere in between - Finishing 6/12 is better than 6/8, but not nearly as good as 3/6.

If you are doing a manual analysis, you can look at the PPs of the horses in the race and see who a horse finished ahead of and who he finished behind, with what trips. That will give you the best idea of what happened.

How can you state as fact that finishing 6/12 is not nearly as good as 3/6?

Otherwise, I agree with your premise, but doing things manually would eliminate the need of formulas like this. I would argue that if you can think it, it can be programmed and be done more accurately and more efficiently. I can definitely program what you are talking about and I'm an amateur, as needed programmer.

classhandicapper
12-28-2015, 03:40 PM
How can you state as fact that finishing 6/12 is not nearly as good as 3/6?

Otherwise, I agree with your premise, but doing things manually would eliminate the need of formulas like this. I would argue that if you can think it, it can be programmed and be done more accurately and more efficiently. I can definitely program what you are talking about and I'm an amateur, as needed programmer.

I've tested it in my database, though granted it was not for all race types.

I think anything can be programmed also, but when it comes to trips, I don't think formulas work very well. The conditions, race flow, and horses vary so much.

cj
12-28-2015, 03:44 PM
I've tested it in my database, though granted it was not all race types.

OK, you confused me when you started talking about doing it manually.

classhandicapper
12-28-2015, 03:52 PM
OK, you confused me when you started talking about doing it manually.

My own handicapping is mostly manual analysis, but I've been creating numbers to mimic my thinking (especially on class) and then testing the concepts against races in my database. That has helped me verify some of my concepts and refine my thinking.

classhandicapper
12-28-2015, 03:54 PM
with my compliments!

' function that normalises the finish position(or some other ranking)
' where NF is the number of runners and FP is the finish position.
Public Function NFP_HORSE(ByVal NF As Integer, ByVal FP As Integer) As Double
Return (NF + 1 - (FP * 2)) / (3 * Math.Sqrt((1 / 3) * (NF + 1) / (NF - 1)) * (NF - 1))
End Function

Thanks. It's going to take me a while to grasp all that.

What code is that?

Is that a theoretical formula or has it been tested against real races?

steveb
12-28-2015, 04:06 PM
Thanks. It's going to take me a while to grasp all that.

Is that a theoretical formula or has it been tested against real races?

it's not mine (well, the code is, but not the formula), so i can't vouch for it, as i don't use it.
i think it came via entropy's (Alan Woods) syndicate, unless they pinched it from some place else.

i have folders full of stuff like that, and far far more detailed stuff, and part of me would love to release it all, as it may encourage them to pay their debts.

steveb
12-28-2015, 04:57 PM
What code is that?



it is just vb .net

BCOURTNEY
12-28-2015, 06:06 PM
Thanks. It's going to take me a while to grasp all that.

What code is that?

Is that a theoretical formula or has it been tested against real races?

I'm going to strongly recommend against this normalization approach.
It arbitrarily forces out integer values for speed in calculations (probably on older computers) and sacrifices precision in the process.

thaskalos
12-28-2015, 06:19 PM
If you are trying to put it into a simple formula, the answer is somewhere in between - Finishing 6/12 is better than 6/8, but not nearly as good as 3/6.



You are obviously more impressed by the 3/6 finish, than I am.

steveb
12-28-2015, 06:44 PM
I'm going to strongly recommend against this normalization approach.
It arbitrarily forces out integer values for speed in calculations (probably on older computers) and sacrifices precision in the process.

while i don't have any opinion on that formula, other than i think it is simplistic, i am wondering what you are saying?
what has it got to do with 'speed'

BCOURTNEY
12-28-2015, 06:57 PM
while i don't have any opinion on that formula, other than i think it is simplistic, i am wondering what you are saying?
what has it got to do with 'speed'

I wouldn't call it simplistic; it's logical, but better is available now. The older machines Woods' team used relied on integer-based calculations for speed purposes; this has no relation to the speed of the animals in question. I'm saying we are beyond the 386 and 486 hardware from 30 years back. Hopefully that is clear.

Dark Target
12-28-2015, 07:02 PM
I wouldn't call it simplistic, it's logical but better is available now. The older machines Woods team used used integer based calculations for speed purposes, this has no relation to the speed of the animals in question. I'm saying we are beyond the 386 and 486 hardware from 30 years back. Hopefully that is clear.

Is it supposed to, the algorithm normalizes finish position based on field size?

classhandicapper
12-28-2015, 07:07 PM
You are obviously more impressed by the 3/6 finish, than I am.

I just go with what my class ratings say after testing them.

Obviously, you will occasionally stumble across a 6 horse field with 5 slugs. So finishing 3rd would not be especially impressive. You will also find 12 horse fields that are 8 deep with contenders. So finishing 6th will be better than it looks. But in general, a 12 horse field, while tougher than a 6 horse field, will not be so much tougher that you can say 6/12 is close to 3/6.

What you should really be doing anyway is looking at the makeup of the field and where the horse finished relative to the quality of the other horses in the race given their trips (and not worrying so much about the field size).

If you are more speed figure oriented, the tests I have run indicated that field size has only a very marginal impact. At best it's a tiebreaker between two horses that otherwise look the same on speed figures.

I still have more work to do in this area.

steveb
12-28-2015, 07:12 PM
Is it supposed to, the algorithm normalizes finish position based on field size?

correct dt, it has nothing whatsoever to do with speed.

classhandicapper
12-28-2015, 07:17 PM
I'm going to strongly recommend against this normalization approach.
It arbitrarily forces out integer values for speed in calculations (probably on older computers) and sacrifices precision in the process.

I was playing with it a little and it produced a negative number in my first test. So either I screwed something up or I'm not sure how to use it.

The formula I use works well, but I know it has a flaw. I haven't made it a priority to try to work out something better. My initial efforts to tweak it all produced worse results than the one I am using now.

whodoyoulike
12-28-2015, 07:18 PM
I'm curious. This thread was started almost a year ago. How about some feedback just for info? ..... What did the OP decide and do?

JJMartin
12-28-2015, 07:19 PM
I just go with what my ratings say after testing them.

Obviously, you will occasionally stumble across a 6 horse field with 5 slugs. So finishing 3rd would not be especially impressive. You will also find 12 horse fields that are 8 deep with contenders. So finishing 6th may be better than it looks. But in general, a 12 horse field, while tougher than a 6 horse field, will not be so much tougher that you can say 6/12 is close to 3/6.

If you are more speed figure oriented, the tests I have run indicate that field size has very marginal impact. At best it's a tiebreaker between two horses that otherwise look the same.
Imo, what place they finish is not as important as their 1/4 and 1/2 times.
So a 6/12 finish has to be looked at in context. Was he racing against all bums or were they all stakes-class horses? That's the problem with looking at things in aggregate; you can get whacked-out results.

steveb
12-28-2015, 07:30 PM
I wouldn't call it simplistic, it's logical but better is available now. The older machines Woods team used used integer based calculations for speed purposes, this has no relation to the speed of the animals in question. I'm saying we are beyond the 386 and 486 hardware from 30 years back. Hopefully that is clear.

ok, i think i am following you now, and i have no idea what they used at the time.
alan though, did tell me they went to 3 decimal places in their calculations.
but i am intrigued by.......'but better is available now'?
in what context do you mean that?
example?

BCOURTNEY
12-28-2015, 07:45 PM
Here's the formula "decoded" for the non-math inclined. No gross calculations are needed; it can be satisfied with a lookup table.
I did, however, take (minor) issue with this approach because the values [-x,x] are not bounded [-1,1] or [0,1], and I'm a picky person.

Field Size, Finish Position, "Score"
5 1 4
5 2 2
5 3 0
5 4 -2
5 5 -4
6 1 5
6 2 3
6 3 1
6 4 -1
6 5 -3
6 6 -5
7 1 6
7 2 4
7 3 2
7 4 0
7 5 -2
7 6 -4
7 7 -6
8 1 7
8 2 5
8 3 3
8 4 1
8 5 -1
8 6 -3
8 7 -5
8 8 -7
9 1 8
9 2 6
9 3 4
9 4 2
9 5 0
9 6 -2
9 7 -4
9 8 -6
9 9 -8
10 1 9
10 2 7
10 3 5
10 4 3
10 5 1
10 6 -1
10 7 -3
10 8 -5
10 9 -7
10 10 -9
11 1 10
11 2 8
11 3 6
11 4 4
11 5 2
11 6 0
11 7 -2
11 8 -4
11 9 -6
11 10 -8
11 11 -10
12 1 11
12 2 9
12 3 7
12 4 5
12 5 3
12 6 1
12 7 -1
12 8 -3
12 9 -5
12 10 -7
12 11 -9
12 12 -11
13 1 12
13 2 10
13 3 8
13 4 6
13 5 4
13 6 2
13 7 0
13 8 -2
13 9 -4
13 10 -6
13 11 -8
13 12 -10
13 13 -12
14 1 13
14 2 11
14 3 9
14 4 7
14 5 5
14 6 3
14 7 1
14 8 -1
14 9 -3
14 10 -5
14 11 -7
14 12 -9
14 13 -11
14 14 -13

BCOURTNEY
12-28-2015, 07:53 PM
ok, i think i am following you now, and i have no idea what they used at the time.
alan though, did tell me they went to 3 decimal places in their calculations.
but i am intrigued by.......'but better is available now'?
in what context do you mean that?
example?

Completely agree with you. The reason for truncating values in the past was to accelerate the processing. Running complex conditional logit based systems used to take a lot of horsepower and time is of course of the essence.

I would say better is available now in the sense that you can't predict when the next machine learning breakthrough will occur - new formulas, tools, etc. So in that spirit I shy away from any formula that produces values outside of a range bounded by [-1,1] or [0,1], as this is standard for processing in today's machine learning methods and tools, and you never know when you will replace or upgrade your approach; particularly if you are successful, the speed at which you can scale your system is important. When I mentioned better, I tend to want to preserve floats, doubles, and all the precision I can get. Individually, per factor, it might not be critical, but errors have a tendency to amplify quickly when data becomes co-mingled as well.
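
As an aside, if someone did want that output squeezed into [0,1], a per-race min-max rescale is one generic way to do it; this is just a common normalization trick, not anything taken from the Woods formula itself:

Imports System.Linq

Module RescaleExample
    ' Map a race's raw scores onto [0,1]: best score becomes 1, worst becomes 0.
    ' (Assumes the scores are not all identical.)
    Function MinMax(raw() As Double) As Double()
        Dim lo As Double = raw.Min()
        Dim hi As Double = raw.Max()
        Dim scaled(raw.Length - 1) As Double
        For i As Integer = 0 To raw.Length - 1
            scaled(i) = (raw(i) - lo) / (hi - lo)
        Next
        Return scaled
    End Function

    Sub Main()
        ' The 5-runner values quoted later in the thread (0.4714 down to -0.4714).
        Dim raw() As Double = {0.4714, 0.2357, 0.0, -0.2357, -0.4714}
        Console.WriteLine(String.Join(", ", MinMax(raw)))
    End Sub
End Module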

Horse racing is a game of inches to be certain, but I have found that the game of inches happens at the window. Decimal percentages here and there add up quickly (in my crazy universe)

Cheers

steveb
12-28-2015, 09:45 PM
Here's the formula "decoded" for the non-math inclined. No gross calculations are needed; it can be satisfied with a lookup table.
I did, however, take (minor) issue with this approach because the values [-x,x] are not bounded [-1,1] or [0,1], and I'm a picky person.

Field Size, Finish Position, "Score"
5 1 4
5 2 2
5 3 0
5 4 -2
5 5 -4
etc


not sure how you figured that.
i just run the values and they are as follows.....

5 1 0.471404520791032
5 2 0.235702260395516
5 3 0
5 4 -0.235702260395516
5 5 -0.471404520791032
6 1 0.487950036474267
6 2 0.29277002188456
6 3 0.0975900072948533
6 4 -0.0975900072948533
6 5 -0.29277002188456
6 6 -0.487950036474267
7 1 0.5
7 2 0.333333333333333
7 3 0.166666666666667
7 4 0
7 5 -0.166666666666667
7 6 -0.333333333333333
7 7 -0.5
8 1 0.509175077217316
8 2 0.363696483726654
8 3 0.218217890235992
8 4 0.0727392967453308
8 5 -0.0727392967453308
8 6 -0.218217890235992
8 7 -0.363696483726654
8 8 -0.509175077217316
9 1 0.516397779494322
9 2 0.387298334620742
9 3 0.258198889747161
9 4 0.129099444873581
9 5 0
9 6 -0.129099444873581
9 7 -0.258198889747161
9 8 -0.387298334620742
9 9 -0.516397779494322
10 1 0.522232967867094
10 2 0.406181197229962
10 3 0.29012942659283
10 4 0.174077655955698
10 5 0.0580258853185659
10 6 -0.0580258853185659
10 7 -0.174077655955698
10 8 -0.29012942659283
10 9 -0.406181197229962
10 10 -0.522232967867094
11 1 0.52704627669473
11 2 0.421637021355784
11 3 0.316227766016838
11 4 0.210818510677892
11 5 0.105409255338946
11 6 0
11 7 -0.105409255338946
11 8 -0.210818510677892
11 9 -0.316227766016838
11 10 -0.421637021355784
11 11 -0.52704627669473
12 1 0.531085004543794
12 2 0.434524094626741
12 3 0.337963184709687
12 4 0.241402274792634
12 5 0.14484136487558
12 6 0.0482804549585268
12 7 -0.0482804549585268
12 8 -0.14484136487558
12 9 -0.241402274792634
12 10 -0.337963184709687
12 11 -0.434524094626741
12 12 -0.531085004543794
13 1 0.534522483824849
13 2 0.445435403187374
13 3 0.356348322549899
13 4 0.267261241912424
13 5 0.17817416127495
13 6 0.0890870806374748
13 7 0
13 8 -0.0890870806374748
13 9 -0.17817416127495
13 10 -0.267261241912424
13 11 -0.356348322549899
13 12 -0.445435403187374
13 13 -0.534522483824849
14 1 0.53748384988657
14 2 0.454794026827098
14 3 0.372104203767625
14 4 0.289414380708153
14 5 0.206724557648681
14 6 0.124034734589208
14 7 0.0413449115297361
14 8 -0.0413449115297361
14 9 -0.124034734589208
14 10 -0.206724557648681
14 11 -0.289414380708153
14 12 -0.372104203767625
14 13 -0.454794026827098
14 14 -0.53748384988657

.....although i personally have no use for it.
and it's only the z score divided by -3 anyway.
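
That relationship is easy to check numerically: treat the finish positions 1..NF as a uniform set of integers, take the ordinary z-score of a finish position, and divide by -3. A quick sketch (my arithmetic, not steveb's code) reproduces the 5-runner values above:

Module ZScoreCheck
    Sub Main()
        Dim NF As Integer = 5
        ' Mean and standard deviation of the integers 1..NF.
        Dim mean As Double = (NF + 1) / 2.0
        Dim sd As Double = Math.Sqrt((CDbl(NF) * NF - 1) / 12.0)

        For FP As Integer = 1 To NF
            Dim z As Double = (FP - mean) / sd
            ' Dividing the z-score by -3 reproduces the NFP_HORSE values (0.4714, 0.2357, ...).
            Console.WriteLine("FP " & FP & ": " & (z / -3).ToString("F4"))
        Next
    End Sub
End Module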

cj
12-28-2015, 10:27 PM
not sure how you figured that.
i just run the values and they are as follows.....



Attached is a spreadsheet that shows these if anyone is interested in seeing the math.

cj
12-28-2015, 10:30 PM
I'm also a little confused on the talk of integers. Field size and finish position will always be an integer (unless you want to split for dead heats) and the output wasn't posted to be an integer as I read it. It is a double (long float).

classhandicapper
12-29-2015, 08:19 AM
Attached is a spreadsheet that shows these if anyone is interested in seeing the math.


Thanks.

A quick glance at the results looks fairly encouraging, but I'll need a way of converting those numbers into something useful for my method.

Capper Al
12-29-2015, 09:27 AM
Recommend two books if you are into factors: Quirin's 'Winning at the Races' and Dave's 'Percentages and Probabilities 2012'. I just use the IVs (impact values) as weights.

mikesal57
12-29-2015, 10:50 AM
The last few months I have been messing around with a new program/Excel workbook that I named, "weighted factors". Right now I have 10 factors, each of which can be weighted, or not used at all.

It is not fully automated yet, so I don't currently have the ability to "test" tracks like we do in my Black Box program. I have been looking at individual races.

So far, I agree with CJ, the factors used, and the weighting of each is dependent on track, surface, distance, and class, at the very least. One size definitely DOES NOT fit all.


Hey Ray...


let's back up a little and talk about how you "constructed" the factor itself...

I'm figuring the usuals, like early, late, class, form, speed rating, etc.

How would you start?

How/what would you say is the difference between a 2nd-ranked horse vs. a 5th-ranked horse?

mike

Dave Schwartz
12-29-2015, 11:34 AM
Recommend two books if you are into factors: Quirin's 'Winning at the Races' and Dave's 'Percentages and Probabilities 2012'. I just use the IVs (impact values) as weights.

The IV's are the values. Raise those values to a power.

The POWER is the weight.
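
One way to read that - and this is only my sketch of the idea, not Dave's actual formula - is a multiplicative combination where each factor's impact value is raised to that factor's weight before the factors are combined. The IVs and exponents below are made up; an exponent of 1 leaves the IV as-is, and 0 effectively turns the factor off:

Module PowerWeights
    Sub Main()
        ' Made-up impact values for one horse on three factors.
        Dim ivs() As Double = {1.4, 1.0, 0.6}
        ' One weight (exponent) per factor; a higher power gives the factor more influence.
        Dim powers() As Double = {2.0, 1.0, 0.5}

        Dim combined As Double = 1.0
        For i As Integer = 0 To ivs.Length - 1
            combined *= Math.Pow(ivs(i), powers(i))
        Next

        Console.WriteLine("Combined rating: " & combined.ToString("F3"))
    End Sub
End Module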

jasperson
12-29-2015, 01:24 PM
Attached is the explanation of a program that I use. It was based on an idea posted by Trifect Mike. It uses class, speed, early pace, form, and trainer/jockey ratings. It has been working well for me.

Capper Al
12-29-2015, 01:48 PM
The IV's are the values. Raise those values to a power.

The POWER is the weight.

The IVs are added together to form the denominator. Then any individual IV is divided by the sum of the IVs to give you a percentage value for that particular IV. It's that simple.

Example:

A: Iv 1.3
B: iv 1.4
C: iv 1.25

Total 1.3 + 1.4 + 1.25 = 3.95

A's percentage would be 1.3/3.95
B's 1.4/3.95
C's 1.25/ 3.95

That's it.

mikesal57
12-29-2015, 02:12 PM
The IVs are added together to form the denominator. Then any individual IV is divided by the sum of the IVs to give you a percentage value for that particular IV. It's that simple.

Example:

A: Iv 1.3
B: iv 1.4
C: iv 1.25

Total 1.3 + 1.4 + 1.25 = 3.95

A's percentage would be 1.3/3.95
B's 1.4/3.95
C's 1.25/ 3.95

That's it.

Let's add an ROI column... what would you do now?
Example:
A: IV 1.3, ROI 0.98
B: IV 1.4, ROI 0.91
C: IV 1.25, ROI 1.03

Dave Schwartz
12-29-2015, 02:32 PM
The IVs are added together to form the denominator. Then any individual IV is divided by the sum of the IVs to give you a percentage value for that particular IV. It's that simple.



I completely understood.

Doing it your way waters down the weights.

Example:

You have

A= 1.4 46.7%
B=1.0 33.3%
C=0.6 20.0%

Works fine for a single factor. Now, create a 2nd factor with the same values.

Your approach keeps the same pcts.

classhandicapper
12-29-2015, 02:33 PM
Let's add an ROI column... what would you do now?
Example:
A: IV 1.3, ROI 0.98
B: IV 1.4, ROI 0.91
C: IV 1.25, ROI 1.03

That's an interesting dilemma that comes up for me sometimes.

When 2 weightings produce the same win%, but one has a higher ROI, I go with the ROI.

When 2 weightings produce the same ROI, but one has a higher win%, I go with the win%.

When 1 weighting produces a higher ROI and one produces a higher win%, I generally go with the win% on the assumption it will lead me to better value oriented decisions since I am not making "system" plays. I do keep that ROI weighting in mind though.

mikesal57
12-29-2015, 03:19 PM
That's an interesting dilemma that comes up for me sometimes.



I believe it's the #1 dilemma...

Say you want to use Bris Prime as your final speed factor...

It's at the top of the impact table but it produces an ROI of .86.

So weighting this as the top factor puts you in a bind: it's going to lose money for you.
How do people compensate for that?

Pick 'em Charlie
12-29-2015, 03:20 PM
I completely understood.

Doing it your way waters down the weights.

Example:

You have

A= 1.4 46.7%
B=1.0 33.3%
C=0.6 20.0%

Works fine for a single factor. Now, create a 2nd factor with the same values.

Your approach keeps the same pcts.

Why should a second factor not work? A second factor would have its own IV and be treated as such.

Pick 'em Charlie
12-29-2015, 03:24 PM
That's an interesting dilemma that comes up for me sometimes.

When 2 weightings produce the same win%, but one has a higher ROI, I go with the ROI.

When 2 weightings produce the same ROI, but one has a higher win%, I go with the win%.

When 1 weighting produces a higher ROI and one produces a higher win%, I generally go with the win% on the assumption it will lead me to better value oriented decisions since I am not making "system" plays. I do keep that ROI weighting in mind though.

I'll go with the high ROI too. But I keep hit ratio analysis separate from ROI analysis. First I want to know what hits, then I want to know what it pays. I imagine a mathematician could compound the formulas, but why? For analysis' sake and less confusion, it's better to keep them separate.

classhandicapper
12-29-2015, 04:02 PM
I believe it's the #1 dilemma...

Say you want to use Bris Prime as your final speed factor...

It's at the top of the impact table but it produces an ROI of .86.

So weighting this as the top factor puts you in a bind: it's going to lose money for you.
How do people compensate for that?

That's one of the reasons I've been trying to get away from commercial final time speed figures altogether. The times of races certainly matter, but final time figures generally aren't very good from an ROI perspective. Some of the angles I've found that outperform the track take do WORSE when combined with final time speed figures.

I don't have much of an issue with dropping final time figures from my turf handicapping because a lot of the time they aren't even really final time figures anyway. They are broken-out projected figures done because of extremely slow paces. It's more problematic for dirt, though, especially for some race types.

Pick 'em Charlie
12-29-2015, 04:27 PM
That's an interesting dilemma that comes up for me sometimes.

When 2 weightings produce the same win%, but one has a higher ROI, I go with the ROI.

When 2 weightings produce the same ROI, but one has a higher win%, I go with the win%.

When 1 weighting produces a higher ROI and one produces a higher win%, I generally go with the win% on the assumption it will lead me to better value oriented decisions since I am not making "system" plays. I do keep that ROI weighting in mind though.

Actually, with a split between win% and ROI, you should be able to multiply win% x ROI for each factor to determine the best ROI over the long run.

Dave Schwartz
12-29-2015, 05:40 PM
A= 1.4 46.7%
B=1.0 33.3%
C=0.6 20.0%

Works fine for a single factor. Now, create a 2nd factor with the same values.

Your approach keeps the same pcts.


Why should a second factor not work? A second factor would have its own IV and be treated as such.

Imagine they are exactly the same.

A=1.4 + 1.4 = 2.8 = 46.7% (same as before)
B=1.0 + 1.0 = 2.0 = 33.3% (same as before)
C=0.6 + 0.6 = 1.2 = 20.0% (same as before)
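
A tiny numeric illustration of the point (my sketch, not Dave's code): adding a duplicate factor leaves the percentage shares untouched, while raising the values to a power before normalizing does move them:

Imports System.Linq

Module WateredDown
    ' Turn a set of values into percentage shares of their total.
    Function Shares(values() As Double) As Double()
        Dim total As Double = values.Sum()
        Return values.Select(Function(v) 100.0 * v / total).ToArray()
    End Function

    Sub Main()
        Dim iv() As Double = {1.4, 1.0, 0.6}

        ' Adding an identical second factor: shares stay 46.7 / 33.3 / 20.0.
        Dim added() As Double = {1.4 + 1.4, 1.0 + 1.0, 0.6 + 0.6}
        Console.WriteLine("Added:   " & String.Join(", ", Shares(added).Select(Function(s) s.ToString("F1"))))

        ' Raising each IV to a power (2 here) before normalizing does change the shares.
        Dim powered() As Double = iv.Select(Function(v) Math.Pow(v, 2)).ToArray()
        Console.WriteLine("Powered: " & String.Join(", ", Shares(powered).Select(Function(s) s.ToString("F1"))))
    End Sub
End Module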

BCOURTNEY
12-29-2015, 07:05 PM
IV features should not be combined when they are highly collinear; this will increase the standard error of the coefficients. These errors in turn mean coefficients for some variables may be found to be near zero, so by increasing the errors, collinearity will make variables look statistically useless when they should not be. Ensuring you use features that are not collinear means fewer errors and better coefficients. This is the dilemma of trying to model non-linear problems in a linear manner. To anyone who believes that horse racing is simply linear analysis, I wish them the best of luck, and thank them for their continued generous contributions to the pools.

Dave Schwartz
12-29-2015, 07:25 PM
IV features should not be combined when they are highly collinear this will increase the standard error of the coefficients, these errors in turn mean coefficients for some variables may be found to be near zero, so by increasing the errors, collinearity will make variables statistically useless when they should not be. Ensuring you use features that are not collinear means less errors and better coefficients. This is the dilemma of trying to model non-linear problems in a linear manner. To anyone who believes that horse racing is simply linear analysis, I wish them the best of luck, and thank their continued generous contributions to the pools.


Everything is correlated to everything else to some degree. The question is, "How much?"

BCOURTNEY
12-29-2015, 07:47 PM
Everything is correlated to everything else to some degree. The question is, "How much?"

That is an easy question to answer. People should answer it often and retest it frequently.
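
Measuring the "how much" is indeed only a few lines once the factor values are in columns; a plain Pearson correlation between any two factors looks roughly like this (the speed and class figures are made-up sample data):

Imports System.Linq

Module FactorCorrelation
    ' Pearson correlation between two equal-length series.
    Function Pearson(x() As Double, y() As Double) As Double
        Dim mx As Double = x.Average()
        Dim my As Double = y.Average()
        Dim cov As Double = 0
        Dim vx As Double = 0
        Dim vy As Double = 0
        For i As Integer = 0 To x.Length - 1
            cov += (x(i) - mx) * (y(i) - my)
            vx += (x(i) - mx) ^ 2
            vy += (y(i) - my) ^ 2
        Next
        Return cov / Math.Sqrt(vx * vy)
    End Function

    Sub Main()
        ' Made-up speed and class figures for a handful of horses.
        Dim speedFig() As Double = {92, 85, 88, 79, 95, 81}
        Dim classFig() As Double = {90, 82, 91, 75, 93, 80}
        Console.WriteLine("Correlation: " & Pearson(speedFig, classFig).ToString("F2"))
    End Sub
End Module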

Cratos
12-29-2015, 09:00 PM
IV features should not be combined when they are highly collinear this will increase the standard error of the coefficients, these errors in turn mean coefficients for some variables may be found to be near zero, so by increasing the errors, collinearity will make variables statistically useless when they should not be. Ensuring you use features that are not collinear means less errors and better coefficients. This is the dilemma of trying to model non-linear problems in a linear manner. To anyone who believes that horse racing is simply linear analysis, I wish them the best of luck, and thank their continued generous contributions to the pools.
I agree; a very good post

classhandicapper
12-30-2015, 08:43 AM
IV features should not be combined when they are highly collinear this will increase the standard error of the coefficients, these errors in turn mean coefficients for some variables may be found to be near zero, so by increasing the errors, collinearity will make variables statistically useless when they should not be. Ensuring you use features that are not collinear means less errors and better coefficients. This is the dilemma of trying to model non-linear problems in a linear manner. To anyone who believes that horse racing is simply linear analysis, I wish them the best of luck, and thank their continued generous contributions to the pools.

How about this for an approach?

Rather than trying to calculate the weights in some theoretical way, create a model that best reflects what you think to be true from years of handicapping experience and then tweak it until it's producing the best possible results. Then keep testing it going forward.

I don't have a deep stats background, but I've been discussing advanced stats basketball research for years with guys that are about as expert as you can get at crunching numbers. A couple have gone on to work for NBA teams and others have produced peer-reviewed papers. Some of the stuff I observed was enlightening. I've seen peer-reviewed papers that were laughably wrong in basketball terms even though the math may have been brilliant.

Unless you have a lot of experience with the subject, doing the math correctly from a theoretical point of view does not always get you the answer you need from an application point of view. So trying to do it from a "what actually works" point of view may make more sense.

Dave Schwartz
12-30-2015, 11:01 AM
I don't have a deep stats background, but I've been discussing advanced stats basketball research for years with guys that are about as expert as you can get at crunching numbers. A couple have gone on to work for NBA teams and others have produced peer-reviewed papers. Some of the stuff I observed was enlightening. I've seen peer-reviewed papers that were laughably wrong in basketball terms even though the math may have been brilliant.

Classy,

If you like basketball stats, you will LOVE this Ayasdi Analysis (https://www.youtube.com/watch?v=i-49Nniwuiw).


Dave

Pick 'em Charlie
12-30-2015, 11:04 AM
I have a few fourth best factors that have a much better ROI than the top factor for the same element.

classhandicapper
12-30-2015, 12:21 PM
Classy,

If you like basketball stats, you will LOVE this Ayasdi Analysis (https://www.youtube.com/watch?v=i-49Nniwuiw).


Dave

Thanks.

I bet basketball for 4 years and netted a few dollars, but I felt like my edge was diminishing quickly. I was using the work of the basketball stats guys I was talking to for a few years. They weren't interested in gambling at all. They were looking for jobs in the NBA. So I took the data and ran with it. But as they started publishing and creating web sites to promote their work, some of the angles I had started getting discussed widely.

I was importing data into a spreadsheet that automatically generated an odds line, except for "today's injuries" and any specific matchup issues I might deem significant; I would make manual adjustments for those things based on other data. I'd find 1 or 2 games a day to play in the beginning.

It eventually got to the point where my lines were almost always within a point of the actual betting lines. When they weren't there would often be a late injury announcement that the bookmakers apparently knew about before it was made public. So it got really hard to pull the trigger.

I scrapped the project this year under the assumption that I had no edge anymore. I have 2 angles that "may" still be profitable, but they almost never come up. I made 5 bets this year so far and won 3.

Dave Schwartz
12-30-2015, 01:08 PM
That video is part of the methodology I am moving towards in our horse racing endeavors.

In case nobody watched the video, the idea (of that video) is to group NBA players together by statistical skill set into "positions," and then study the positions to see which players in a given position produce the greatest results.

("Statistical Skill Set" means that a guy that blocks a lot of shots and gets a lot of rebounds is 'forward/center-like" while a guy that gets a lot of assists and steals is more likely to be classified as a "guard-type."

The video indicates that there are 13 positions in basketball.

Here are a couple more from Ayasdi that help get the point across:
1. Quick Intro (under 3 minutes) (https://www.youtube.com/watch?v=XfWibrh6stw)
2. English Premiere League (short) (https://www.youtube.com/watch?v=jofwWlH4qZk)
3. English Premiere League (long version) (https://www.youtube.com/watch?v=WrHOLkLr_hA)

The application to horse racing would be to quantify horses in a database together for "likeness to each other." That is, build groups (subsets) of horses with relative similarities so that a group might be studied.

Of course, I am not interested in a graphical picture. Rather, I am interested in the groupings.

If anyone is interested (as a programmer) about how one goes about coding something like this, I would be happy to explain it. It is not THAT difficult to understand, nor to code.

The difficulty lies in which (and how many) factors are used for the fields. The English Premiere League example uses (I believe - from memory) 23 factors. Even then, with 23 factors and only a couple hundred players, the search takes a minute or two (as I recall).

Imagine what that would look like with (say) a year of entries (200,000) and 200 factors.

BTW, in the soccer example, there is one player who stands completely on his own. That is, he is so different (from a statistical standpoint) that he is his own group. Is it a coincidence that he is the highest-paid player by far?

JJMartin
12-30-2015, 01:58 PM
IV features should not be combined when they are highly collinear this will increase the standard error of the coefficients, these errors in turn mean coefficients for some variables may be found to be near zero, so by increasing the errors, collinearity will make variables statistically useless when they should not be. Ensuring you use features that are not collinear means less errors and better coefficients. This is the dilemma of trying to model non-linear problems in a linear manner. To anyone who believes that horse racing is simply linear analysis, I wish them the best of luck, and thank their continued generous contributions to the pools.
I see your point, good observation. It is better to find and implement distinctly uncorrelated factors if possible.

raybo
12-30-2015, 03:39 PM
I see your point, good observation. It is better to find and implement distinctly uncorrelated factors if possible.

Maybe a list of all "uncorrelated" factors would be interesting (and would probably be somewhat controversial to some, or maybe several). I suspect the list would be quite small.

Cratos
12-30-2015, 04:28 PM
How about this for an approach?

Rather than trying to calculate the weights in some theoretical way, create a model that best reflects what you think to be true from years of handicapping experience and then tweak it until it's producing the best possible results. Then keep testing it going forward.

I don't have a deep stats background, but I've been discussing advanced stats basketball research for years with guys that are about as expert as you can get at crunching numbers. A couple have gone on to work for NBA teams and others have produced peer-reviewed papers. Some of the stuff I observed was enlightening. I've seen peer-reviewed papers that were laughably wrong in basketball terms even though the math may have been brilliant.

Unless you have a lot of experience with the subject, doing the math correctly from a theoretical point of view does not always get you the answer you need from an application point of view. So trying to do it from a "what actually works" point of view may make more sense.

From your post I think you would want applied math and not theoretical math; and the easiest way to think of it is that theoretical math is math done for its own sake, while applied math is math with a practical use.

Applied math tries to model, predict and explain things in the real world: for example, one area of applied mathematics relating to horse race handicapping is the kinetic force of the horse, which analyses how the kinetic force of the horse is affected by the environmental resistance forces.

Cratos
12-30-2015, 04:40 PM
I see your point, good observation. It is better to find and implement distinctly uncorrelated factors if possible.
And it can be added that if two variables are uncorrelated, there is no linear relationship between them.

Cratos
12-30-2015, 04:50 PM
That video is part of the methodology I am moving towards in our horse racing endeavors.

In case nobody watched the video, the idea (of that video) is to group NBA players together by statistical skill set into "positions," and then study the positions to see which players in a given position produce the greatest results.

("Statistical Skill Set" means that a guy that blocks a lot of shots and gets a lot of rebounds is 'forward/center-like" while a guy that gets a lot of assists and steals is more likely to be classified as a "guard-type."

The video indicates that there are 13 positions in basketball.

Here are a couple more from Ayasdi that help get the point across:
1. Quick Intro (under 3 minutes) (https://www.youtube.com/watch?v=XfWibrh6stw)
2. English Premiere League (short) (https://www.youtube.com/watch?v=jofwWlH4qZk)
3. English Premiere League (long version) (https://www.youtube.com/watch?v=WrHOLkLr_hA)

The application to horse racing would be to quantify horses in a database together for "likeness to each other." That is, build groups (subsets) of horses with relative similarities so that a group might be studied.

Of course, I am not interested in a graphical picture. Rather, I am interested in the groupings.

If anyone is interested (as a programmer) about how one goes about coding something like this, I would be happy to explain it. It is not THAT difficult to understand, nor to code.

The difficulty lies in which (and how many) factors are used for the fields. The English Premiere League example uses (I believe - from memory) 23 factors. Even then, with 23 factors and only a couple hundred players, the search takes a minute or two (as I recall).

Imagine what that would look like with (say) a year of entries (200,000) and 200 factors.

BTW, in the soccer example, there is one player who stands completely on his own. That is, he is so different (from a statistical standpoint) that he is his own group. Is it a coincidence that he is the highest-paid player by far?
Dave, have you given thought to constructing a Bayesian predictive model?

classhandicapper
12-30-2015, 04:53 PM
From your post I think you would want applied math and not theoretical math; and the easiest way to think of it is that theoretical math is math done for its own sake, while applied math is math with a practical use.

Applied math tries to model, predict and explain things in the real world: for example, one area of applied mathematics relating to horse race handicapping is the kinetic force of the horse, which analyses how the kinetic force of the horse is affected by the environmental resistance forces.

I'm not sure of the correct term for what I am doing now, but let's say I think both a final time speed figure and my class rating matter.

I want to know the correct way to weight them to optimize the results.

I start with what I think is the correct weighting based on years of trial and error experience and run it against my database. Then I start tweaking the weighting in each direction to see what is happening to the win%, ITM%, ROI, and average finish position at different values. I keep moving it around until I have the optimal result. Then I continue testing those values going forward. After a few months I do it all again and keep going forward.

I also have my original intuitive values locked in so I can keep testing that because you never know if the changes you are making are actually correct or just working on this set of past data.

It's relatively simple with 2 factors. When it gets to 3 or more it gets very complex to do it this way, but I don't have the software or knowledge to do more formal regression analysis. So I operate intuitively. My goal is not to create a system anyway. It's to learn. I'm still going to dig through the PPs, charts and replays manually the way I do now.

Cratos
12-30-2015, 05:20 PM
I'm not sure of the correct term for what I am doing now, but let's say I think both a final time speed figure and my class rating matter.

I want to know the correct way to weight them to optimize the results.

I start with what I think is the correct weighting based on years of trial and error experience and run it against my database. Then I start tweaking the weighting in each direction to see what is happening to the win%, ITM%, ROI, and average finish position at different values. I keep moving it around until I have the optimal result. Then I continue testing those values going forward. After a few months I do it all again and keep going forward.

I also have my original intuitive values locked in so I can keep testing that because you never know if the changes you are making are actually correct or just working on this set of past data.

It's relatively simple with 2 factors. When it gets to 3 or more it gets very complex to do it this way, but I don't have the software or knowledge to do more formal regression analysis. So I operate intuitively. My goal is not to create a system anyway. It's to learn. I'm still going to dig through the PPs, charts and replays manually the way I do now.
This is not about terminology, but expected results and it is not clear to me what your expectation is.

Therefore I am not sure if you want to do a multivariate data analysis or a multivariate regression. A multivariate data analysis refers to any statistical technique used to analyze data that arises from more than one variable and a multivariate regression is a technique that estimates a single regression model with more than one outcome variable.

Dave Schwartz
12-30-2015, 05:21 PM
Dave, have you given thought to constructing a Bayesian predictive model?

I have done so. It is a strong approach but not unique enough IMHO.

Looking to do something that is very difficult for someone to duplicate.

Capper Al
12-30-2015, 05:55 PM
A problem with racing data is that it is pretty much all confounded variables. Class and speed correlate, for example. A better horse usually has a better trainer and/or jockey, etc.

Cratos
12-30-2015, 06:22 PM
I have done so. It is a strong approach but not unique enough IMHO.

Looking to do something that is very difficult for someone to duplicate.
I understand and the world is your “oyster” so to speak with the parametric or nonparametric, data mining algorithm or statistical models, etc out there for you to experiment with.

I am assuming that you are building a predictive model, but a descriptive model might be an option. This type of modeling is aimed at summarizing or representing the data structure in a compact manner.

Also, the reliance on causal theory is absent or minimized in the descriptive model, and the focus is at the measurable level rather than at the construct level.

Dave Schwartz
12-30-2015, 06:43 PM
Cratos,

What I am looking to do is complete a project I have been working on for over a year: unique (to the user) past performances.

The idea is that you fetch from the database "entries like this one" instead of the past 10 races for this horse.

There are just so many problems with looking at the last 10 races, beyond the obvious, which is that everyone else has them.

So, the goal is to quantify every horse into groups based upon "likeness." The race shape itself is part of the likeness.

Truthfully, I have no interest in a hard-wired set of groups. Rather, I am looking for something that will create a unique (to the user) "score" for each entry and then go to the database and fetch the (say) 200 entries most like this one to build a set of past performance lines.

Don't think "pacelines" as much as building models of lengths behind, odds, etc. to predict today's performance.

The Ayasdi example represents what the entries would look like topographically, while I am more interested in grabbing the entire group to study on-the-fly.
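
As a back-of-the-envelope version of "fetch the entries most like this one", the simplest possible approach is a straight distance search over the factor vectors. This is only a toy sketch of the idea, not anything from Dave's actual project; a real version would scale each factor first and work over far more rows and columns:

Imports System.Linq

Module SimilarEntries
    ' Euclidean distance between two factor vectors.
    Function Distance(a() As Double, b() As Double) As Double
        Dim total As Double = 0
        For i As Integer = 0 To a.Length - 1
            total += (a(i) - b(i)) ^ 2
        Next
        Return Math.Sqrt(total)
    End Function

    Sub Main()
        ' Tiny made-up "database": each row is one past entry's factor vector.
        Dim past As Double()() = {
            New Double() {88, 1.2, 6},
            New Double() {92, 0.9, 8},
            New Double() {75, 2.5, 12},
            New Double() {90, 1.1, 7}
        }
        Dim today() As Double = {89, 1.0, 7}

        ' Sort past entries by how close they are to today's entry; keep the nearest two.
        Dim nearest = past.OrderBy(Function(row) Distance(row, today)).Take(2)
        For Each row In nearest
            Console.WriteLine(String.Join(", ", row))
        Next
    End Sub
End Module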

BCOURTNEY
12-30-2015, 06:51 PM
Cratos,

What I am looking to do is complete a project I have been working on for over a year: unique (to the user) past performances.

The idea is that you fetch from the database "entries like this one" instead of the past 10 races for this horse.

There are just so many problems with looking at the last 10 races, beyond the obvious, which is that everyone else has them.

So, the goal is to quantify every horse into groups based upon "likeness." The race shape itself is part of the likeness.

Truthfully, I have no interest in a hard-wired set of groups. Rather, I am looking for something that will create a unique (to the user) "score" for each entry and then go to the database and fetch the (say) 200 entries most like this one to build a set of past performance lines.

Don't think "pacelines" as much as building models of lengths behind, odds, etc. to predict today's performance.

The Ayasdi example represents what the entries would look like topographically, while I am more interested in grabbing the entire group to study on-the-fly.


I think I'm confused, why not use TDA and cluster then?

BCOURTNEY
12-30-2015, 07:05 PM
The difficulty lies in which (and how many) factors are used for the fields. The English Premiere League example uses (I believe - from memory) 23 factors. Even then, with 23 factors and only a couple hundred players, the search takes a minute or two (as I recall).

Imagine what that would look like with (say) a year of entries (200,000) and 200 factors.


I'm not trying to be funny, but I can't understand the difficulty with 200,000 rows with 200 columns of data for analysis or query in real time, can you expound?

Dave Schwartz
12-30-2015, 11:58 PM
I think I'm confused, why not use TDA and cluster then?

I guess you did not watch the video. That was what I was suggesting.

I'm not trying to be funny, but I can't understand the difficulty with 200,000 rows with 200 columns of data for analysis or query in real time, can you expound?

LOL - Do you have about a month?

BCOURTNEY
12-31-2015, 01:49 AM
I guess you did not watch the video. That was what I was suggesting.



LOL - Do you have about a month?

Ah. well I ran some TDA clustering on my data, I have attached the results. (yes, this IS a real TDA cluster on speed and pace figures)

JJMartin
12-31-2015, 02:45 AM
Ah. well I ran some TDA clustering on my data, I have attached the results. (yes, this IS a real TDA cluster on speed and pace figures)
makes sense since it's green.

classhandicapper
12-31-2015, 08:15 AM
This is not about terminology, but expected results and it is not clear to me what your expectation is.


My expectation is to analyze all the factors and approaches that are part of my usual thinking about the "class" of a horse and to test them alone and in combination until I find what performs best. The idea would be to refine my own current thinking based on the results of actual tests against a large database instead of these things coming from intuition and trial and error experience.

I've already made progress using the approach I outlined, which is simply to tweak formulas and run them against the database over and over until I get the best answer. But some kind of formal regression test would probably be quicker and more accurate if I had the background.

It's unlikely that anyone will be able to duplicate what I have exactly because some of it is original and the core of it comes from 40 years of handicapping experience.

Cratos
12-31-2015, 10:19 AM
Ah. well I ran some TDA clustering on my data, I have attached the results. (yes, this IS a real TDA cluster on speed and pace figures)
I wasn't aware of TDA until I read your post; I subsequently read an overview about it and found it fascinating, with the possibility of it being used in our model.

Thanks for posting such an interesting idea for a handicapping application.

Cratos
12-31-2015, 10:51 AM
My expectation is to analyze all the factors and approaches that are part of my usual thinking about the "class" of a horse and to test them alone and in combination until I find what performs best. The idea would be to refine my own current thinking based on the results of actual tests against a large database instead of these things coming from intuition and trial and error experience.

I've already made progress using the approach I outlined, which is simply to tweak formulas and run them against the database over and over until I get the best answer. But some kind of formal regression test would probably be quicker and more accurate if I had the background.

It's unlikely that anyone will be able to duplicate what I have exactly because some of it is original and the core of it comes from 40 years of handicapping experience.
I now understand what you are attempting to achieve, but I am doubtful about your success because you are using a "trial and error" method which lacks controls and consistency, and that will make outcome repeatability very difficult.

JJMartin
12-31-2015, 01:40 PM
Ah. well I ran some TDA clustering on my data, I have attached the results. (yes, this IS a real TDA cluster on speed and pace figures)
Is TDA similar to a neural net?

JJMartin
12-31-2015, 01:41 PM
My expectation is to analyze all the factors and approaches that are part of my usual thinking about the "class" of a horse and to test them alone and in combination until I find what performs best. The idea would be to refine my own current thinking based on the results of actual tests against a large database instead of these things coming from intuition and trial and error experience.

I've already made progress using the approach I outlined, which is simply to tweak formulas and run them against the database over and over until I get the best answer. But some kind of formal regression test would probably be quicker and more accurate if I had the background.

It's unlikely that anyone will be able to duplicate what I have exactly because some of it is original and the core of it comes from 40 years of handicapping experience.
Beware of the back-fitting effect.

classhandicapper
12-31-2015, 02:05 PM
Beware of the back-fitting effect.

I'm trying to mitigate that risk the best I can.

I have a parameter file that allows me to enter all the values and weights for each factor and then do a quick run against the database.

When I started this project I entered a series of values for every input based on my years of experience as a handicapper and the way I actually think about these things in live betting now. I tested it against the database at that time. Those values are locked in and can be retested any time I want as I add races to the database. I frequently do that.

When I discover that changing one of the values is producing a better result, I lock that change in and continue testing.

So at any given time as the database grows I can track every stage of changes I have made. I can look at everything going forward from the date of a change or at the entire database.

The original values are producing a small flat bet profit. The changes I have made were for turf racing and were minor. Those changes tweaked the win% and ROI slightly higher going backwards and now forward. I have them locked in, but the results are so similar for a range of values on one factor I'm still not sure what the optimal value is.

This is without any trip analysis, distance preference analysis, trainer, jockey, race setup analysis etc... This is just a horse rating.

DeltaLover
12-31-2015, 02:25 PM
I'm trying to mitigate that risk the best I can.

I have a parameter file that allows me to enter all the values and weights for each factor and then do a quick run against the database.

When I started this project I entered a series of values for every input based on my years of experience as a handicapper and the way I actually think about these things in live betting now. I tested it against the database at that time. Those values are locked in and can be retested any time I want as I add races to the database. I frequently do that.

When I discover that changing one of the values is producing a better result, I lock that change in and continue testing.

So at any given time as the database grows I can track every stage of changes I have made. I can look at everything going forward from the date of a change or at the entire database.

The original values are producing a small flat bet profit. The changes I have made were for turf racing and were minor. Those changes tweaked the win% and ROI slightly higher going backwards and now forward. I have them locked in, but the results are so similar for a range of values on one factor I'm still not sure what the optimal value is.

This is without any trip analysis, distance preference analysis, trainer, jockey, race setup analysis etc... This is just a horse rating.

It could be better if you automated the weight assignment and built an algorithm to select the optimal values. I have been doing something like this for many years.


STILL, this procedure does not solve the back-fitting effect. Gimmick angles of the type "less than two weeks out with a single workout while stretching out for the first time and switching to a top jock" (LOL) that sometimes appear on paper to be a gold mine are nothing more than over-fitted illusions (the same applies to a large degree to most trainer stats as well)... The game is way more complicated than this...
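
As a rough illustration of "automate the weight assignment," here is a minimal Python sketch of a coordinate search: nudge one weight at a time and keep the change only when the back-test improves. The backtest callable, step size, and round count are assumptions, and, as the warning above says, whatever this finds still has to be validated on races it never saw or it is just back-fitting faster.

# Minimal sketch of automated weight selection by coordinate search.
# 'backtest' is assumed to be any callable that scores a weight dict (e.g. ROI).
def coordinate_search(weights, backtest, step=0.05, rounds=20):
    best = dict(weights)
    best_score = backtest(best)
    for _ in range(rounds):
        improved = False
        for factor in list(best):
            for delta in (step, -step):
                trial = dict(best)
                trial[factor] = max(0.0, trial[factor] + delta)
                score = backtest(trial)
                if score > best_score:          # keep the nudge only if it helps
                    best, best_score, improved = trial, score, True
        if not improved:                        # no single nudge helps any more
            break
    return best, best_score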

classhandicapper
12-31-2015, 03:21 PM
It could be better if you automated the weight assignment and built an algorithm to select the optimal values. I have been doing something like this for many years.


STILL, this procedure does not solve the back-fitting effect. Gimmick angles of the type "less than two weeks out with a single workout while stretching out for the first time and switching to a top jock" (LOL) that sometimes appear on paper to be a gold mine are nothing more than over-fitted illusions (the same applies to a large degree to most trainer stats as well)... The game is way more complicated than this...

These are not angles.

These are horse ratings based on the handicapping techniques I was already using. They were programmed in from day 1 before I could do any tests at all on my database.

The idea behind tweaking is to try to improve on what I am already doing.

Let's say it's July 1st and I test January through June with my original method and it produces 28% winners and a 2.05 ROI. I make a tweak and the win% goes up to 30% and the ROI to 2.10 for that same period.

That may or may not mean something.

At the end of December I test that tweak from July through December to see what the results were and compare them to July through December the old way. If the new way sustained its advantage going forward from the date of the change, I have a lot of reason to think it's the superior way of doing it. If not, it is rejected.
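
For illustration, a minimal Python sketch of that accept/reject rule, assuming a backtest(method, start, end) routine that returns (win%, ROI) for a date range; the dates and the "ROI must improve in both periods" rule are stand-ins for the procedure described above, not the poster's exact code.

from datetime import date

# Keep a tweak only if its edge survives on the forward, unseen period.
def keep_tweak(backtest, old_method, new_method):
    fit  = (date(2015, 1, 1), date(2015, 6, 30))    # period the tweak was found on
    hold = (date(2015, 7, 1), date(2015, 12, 31))   # forward period, not used for fitting
    better_in_fit  = backtest(new_method, *fit)[1]  > backtest(old_method, *fit)[1]
    better_forward = backtest(new_method, *hold)[1] > backtest(old_method, *hold)[1]
    return better_in_fit and better_forward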

raybo
12-31-2015, 04:42 PM
These are not angles.

These are horse ratings based on the handicapping techniques I was already using. They were programmed in from day 1 before I could do any tests at all on my database.

The idea behind tweaking is to try to improve on what I am already doing.

Let's say it's July 1st and I test January through June with my original method and it produces 28% winners and a 2.05 ROI. I make a tweak and the win% goes up to 30% and the ROI to 2.10 for that same period.

That may or may not mean something.

At the end of December I test that tweak from July through December to see what the results were and compare them to July through December the old way. If the new way sustained its advantage going forward from the date of the change, I have a lot of reason to think it's the superior way of doing it. If not, it is rejected.

Here's my thinking about testing. If it is December 31, 2015, today, and I want to get an idea of how a track will play starting January 1, 2016 and continuing through March 31, 2016 (for a track that runs during that time period), I would go back and import cards and results files, from that track, for Jan 1 through Mar 31, 2013 and run them through my program, then repeat for 2014 and 2015 (in effect, testing forward). My program automatically records those results, for whatever method(s) I am using. From those 3 years of results I have a pretty good idea of what I can expect for the time period from Jan 1, 2016 through Mar 31, 2016.

If each year's results are similar but produce a poor hit rate/$2 ROI/$2 flat-bet profit, I would then tweak a single factor, run the whole 3-year test again, and check the results stats again. I would continue this process until I either have exhausted all the possible tweaks and found no significant change on the positive side, or I find a set of tweaks that produces relatively consistent positive stats, in which case I would use that positive-stat method for my play in the first quarter of 2016.

Due to the app I use for testing (Excel), this kind of run could take 8 to 24 hours (or more), depending on what the initial tests produced (significantly negative, for example, or significantly positive, or somewhere in between). If I were just testing my "class" ratings, for example, the testing would take much less time; my program is currently set up to test 11 ranking methods at one time, which takes longer to run than a single attribute like "class".

The nature of my record-keeping method, during testing, is that I can visually see the results for every race run, on all the cards, as well as the combined results/stats for all those races. That allows me to look at dates, distances, surfaces, classes, surface conditions, etc., and to filter by each of those factors, removing types of races that are net losses and leaving only those race types that are net profitable (this filtering/deleting is a manual process, requiring me to manually delete all net-loss race types from the record-keeping sheet). The remaining profitable races are the only race types I will play in the same upcoming time period as the tested time periods, for that track. My program also allows me to record each new race played (using "manual record" mode), updating the current live stats as each race is played, so I can readily see current play stats and compare them with all the previously tested time periods, for significant changes in the "environment of the track", in near real time.
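
For illustration only, a minimal Python sketch of that workflow: run the same Jan-Mar window for each prior year, total the results by race type, and keep only the types that were net profitable. The run_cards() routine, the result fields, and the year list are assumptions standing in for the Excel workbook described above.

from collections import defaultdict

def profitable_race_types(run_cards, track, years=(2013, 2014, 2015)):
    # run_cards(track, year) is assumed to yield one dict per race played,
    # e.g. {"race_type": "turf route", "bet": 2.0, "returned": 5.40}
    totals = defaultdict(lambda: {"bet": 0.0, "returned": 0.0})
    for year in years:                                  # same calendar window, prior years
        for race in run_cards(track, year):
            t = totals[race["race_type"]]
            t["bet"] += race["bet"]
            t["returned"] += race["returned"]
    # keep only the race types that showed a net profit across all the tested windows
    return [rt for rt, t in totals.items() if t["returned"] > t["bet"]]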

ReplayRandall
12-31-2015, 04:57 PM
Here's my thinking about testing. If it is December 31, 2015, today, and I want to get an idea of how a track will play starting January 1, 2016 and continuing through March 31, 2016 (for a track that runs during that time period), I would go back and import cards and results files, from that track, for Jan 1 through Mar 31, 2013 and run them through my program, then repeat for 2014 and 2015 (in effect, testing forward). My program automatically records those results, for whatever method(s) I am using. From those 3 years of results I have a pretty good idea of what I can expect for the time period from Jan 1, 2016 through Mar 31, 2016.

If each year's results are similar but produce a poor hit rate/$2 ROI/$2 flat-bet profit, I would then tweak a single factor, run the whole 3-year test again, and check the results stats again. I would continue this process until I either have exhausted all the possible tweaks and found no significant change on the positive side, or I find a set of tweaks that produces relatively consistent positive stats, in which case I would use that positive-stat method for my play in the first quarter of 2016.

Due to the app I use for testing (Excel), this kind of run could take 8 to 24 hours (or more), depending on what the initial tests produced (significantly negative, for example, or significantly positive, or somewhere in between). If I were just testing my "class" ratings, for example, the testing would take much less time; my program is currently set up to test 11 ranking methods at one time, which takes longer to run than a single attribute like "class".

The nature of my record-keeping method, during testing, is that I can visually see the results for every race run, on all the cards, as well as the combined results/stats for all those races. That allows me to look at dates, distances, surfaces, classes, surface conditions, etc., and to filter by each of those factors, removing types of races that are net losses and leaving only those race types that are net profitable (this filtering/deleting is a manual process, requiring me to manually delete all net-loss race types from the record-keeping sheet). The remaining profitable races are the only race types I will play in the same upcoming time period as the tested time periods, for that track. My program also allows me to record each new race played (using "manual record" mode), updating the current live stats as each race is played, so I can readily see current play stats and compare them with all the previously tested time periods, for significant changes in the "environment of the track", in near real time.

Overall, just an excellent post.....especially the areas I put in bold. Effectively puts to rest the data conundrum of the "back-fitting" nonsense. :ThmbUp:

raybo
12-31-2015, 05:55 PM
Overall, just an excellent post.....especially the areas I put in bold. Effectively puts to rest the data conundrum of the "back-fitting" nonsense. :ThmbUp:

"Back testing", without forward testing, is a waste of time, and can be very expensive. Time of year and time of meet testing, IMO, is extremely important also (weather conditions as well as other "environment of the track" factors), as is testing by individual track (combining tracks in the database/testing muddles the data).

whodoyoulike
12-31-2015, 07:35 PM
... The idea would be to refine my own current thinking based on the results of actual tests against a large database instead of these things coming from intuition and trial-and-error experience.

I've already made progress using the approach I outlined, which is simply to tweak formulas and run them against the database over and over until I get the best answer. But some kind of formal regression test would probably be quicker and more accurate if I had the background.

It's unlikely that anyone will be able to duplicate what I have exactly because some of it is original and the core of it comes from 40 years of handicapping experience.

How do you know what the best answer is, and how many iterations are really good enough?

Capper Al
01-01-2016, 08:57 AM
Here's the problem. The racing data isn't linear. The race that best represents a horse's talent in today's race might be his last, his second race back, or his tenth race back. Averaging and summing may not paint an accurate picture either. Software needs to be able to read through the PPs somehow to develop a scenario for today's race.
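
As a toy illustration of "reading through the PPs" instead of averaging, here is a short Python sketch that picks the single past race most similar to today's conditions and uses that line; the fields and the similarity rule are made up for the example.

# Pick the most representative past race rather than averaging all of them.
def representative_race(past_races, today):
    # each past race is assumed to be a dict with "surface", "distance", "figure"
    def similarity(pp):
        score = 2 if pp["surface"] == today["surface"] else 0
        score -= abs(pp["distance"] - today["distance"])    # penalize distance mismatch
        return score
    return max(past_races, key=similarity)   # use this race's figure, not an average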

Magister Ludi
01-01-2016, 09:48 AM
...if you torture them long enough, they'll eventually tell you anything that you want to hear.

classhandicapper
01-01-2016, 11:55 AM
Here's the problem. The racing data isn't linear. The race that best represents a horse's talent in today's race might be his last, his second race back, or his tenth race back. Averaging and summing may not paint an accurate picture either. Software needs to be able to read through the PPs somehow to develop a scenario for today's race.

That's why I've generally been negative on mathematical models.

Like I said earlier in the thread, I've seen peer reviewed papers on basketball done by brilliant mathematicians that were laughably wrong in basketball terms because they didn't/couldn't account for some of the nuances of the game.

The idea for me was to recreate my existing thinking (primarily on class) as closely as possible and be able to test it against a database in an objective way. The advantage being that as a horse player, you typically continue learning by trial and error over years of play. With a database, you can test a new concept or change in a few minutes by writing a new query or adjusting a factor and running it against the existing data again. When you find something promising, you can then test it going forward. It's an automation of the learning process.
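
As a small illustration of "test a new concept in a few minutes by writing a new query," a Python/pandas sketch; the file name and column names (class_drop, last_race_days, won, payoff) are hypothetical, and the angle itself is only an example, not the poster's rating.

import pandas as pd

df = pd.read_csv("starters.csv")                     # hypothetical: one row per starter
idea = df[(df["class_drop"] >= 1) & (df["last_race_days"] <= 30)]   # the concept to test
win_pct = idea["won"].mean()                         # "won" assumed to be 0/1
roi = idea["payoff"].sum() / len(idea)               # average return per $2 flat bet, payoff = $2 win mutuel
print(f"{len(idea)} plays  win% {win_pct:.1%}  $2 ROI {roi:.2f}")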

Capper Al
01-01-2016, 12:25 PM
That's why I've generally been negative on mathematical models.

Like I said earlier in the thread, I've seen peer reviewed papers on basketball done by brilliant mathematicians that were laughably wrong in basketball terms because they didn't/couldn't account for some of the nuances of the game.

The idea for me was to recreate my existing thinking (primarily on class) as closely as possible and be able to test it against a database in an objective way. The advantage being that as a horse player, you typically continue learning by trial and error over years of play. With a database, you can test a new concept or change in a few minutes by writing a new query or adjusting a factor and running it against the existing data again. When you find something promising, you can then test it going forward. It's an automation of the learning process.

Trial and error, learning, and moving forward is the way I work also. It is informally a scientific approach. Variables are isolated as best we can, and testing forward confirms or rejects the hypothesis.

Dave Schwartz has done a lot of work in aggregating PPs in his book P&P 2012. My groupings (before Dave's) were maiden, claiming, allowance, graded, and lightly raced horses. My original groupings worked well also, but I believe for speed Dave's work better.

Robert Fischer
01-01-2016, 12:53 PM
The same factors may have different weights when used in different models. :confused:

...

For example, late speed may have a greater weight in a 'Pace Projector' model that happens to project an extreme pace duel for today's race, than it would in a 'power ranking' model.
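
In code form, the point is simply that the weight becomes a function of the model and the race context rather than a constant; a toy Python sketch (the numbers are invented for illustration):

# Same factor, different weight depending on the model and projected pace.
def late_speed_weight(model, projected_pace):
    if model == "pace_projector" and projected_pace == "extreme duel":
        return 0.40        # closers get much more credit if a meltdown is projected
    if model == "power_ranking":
        return 0.15        # a generic ranking uses a smaller, fixed weight
    return 0.20            # default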

Capper Al
01-01-2016, 01:21 PM
The same factors may have different weights when used in different models. :confused:

...

For example, late speed may have a greater weight in a 'Pace Projector' model that happens to project an extreme pace duel for today's race, than it would in a 'power ranking' model.

Absolutely. Each grouping is a different scenario.

classhandicapper
01-01-2016, 01:55 PM
Some of the results I am seeing in turf racing are blowing my mind.

There are basic factors I assumed would be fairly important for separating horses in turf racing, but when I change the weighting quite significantly in either direction it has almost no impact on the results. I get the same 25% top winners and a very similar ROI. The winning horses just change. That means some of these things don't matter much in my model unless it's extreme. That was shocking to me. Now I just have to figure out why and how to separate them better.
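
A quick way to see that kind of insensitivity is to sweep one factor's weight and record the stats at each setting; a minimal Python sketch, assuming a backtest(weights) routine that returns (win%, ROI):

# Sweep one factor's weight and record win% / ROI at each setting.
def sweep(backtest, weights, factor, values=(0.0, 0.1, 0.2, 0.3, 0.4)):
    results = []
    for v in values:
        trial = dict(weights)
        trial[factor] = v
        results.append((v, *backtest(trial)))
    return results   # a flat line of ROIs across the sweep means the factor barely separates horses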

raybo
01-01-2016, 02:14 PM
Some of the results I am seeing in turf racing are blowing my mind.

There are basic factors I assumed would be fairly important for separating horses in turf racing, but when I change the weighting quite significantly in either direction it has almost no impact on the results. I get the same 25% top winners and a very similar ROI. The winning horses just change. That means some of these things don't matter much in my model unless it's extreme. That was shocking to me. Now I just have to figure out why and how to separate them better.

Don't you find that, in turf racing, especially non-sprint turf racing, the pace is generally slower and more horses are within striking distance longer than on dirt? If so, then the variance in turf racing would tend to be higher than in dirt racing, so those "tried and true" turf separators won't have as much importance due to the higher variance. The more the variance, the less traditional factors will have to do with the result. So changing the weightings of those traditional factors will not produce significant changes in the results you see.

Capper Al
01-01-2016, 03:53 PM
Trial and error, learning, and moving forward is the way I work also. It is informally a scientific approach. Variables are isolated as best we can, and testing forward confirms or rejects the hypothesis.

Dave Schwartz has done a lot of work in aggregating PPs in his book P&P 2012. My groupings (before Dave's) were maiden, claiming, allowance, graded, and lightly raced horses. My original groupings worked well also, but I believe for speed Dave's work better.

I should have added to this that surface and distance also refine these groupings even more.

Capper Al
01-01-2016, 03:54 PM
Don't you find that, in turf racing, especially non-sprint turf racing, the pace is generally slower and more horses are within striking distance longer than on dirt? If so, then the variance in turf racing would tend to be higher than in dirt racing, so those "tried and true" turf separators won't have as much importance due to the higher variance. The more the variance, the less traditional factors will have to do with the result. So changing the weightings of those traditional factors will not produce significant changes in the results you see.

These are studies within themselves for each grouping.

classhandicapper
01-01-2016, 07:04 PM
Don't you find that, in turf racing, especially non-sprint turf racing, the pace is generally slower and more horses are within striking distance longer than on dirt? If so, then the variance in turf racing would tend to be higher than in dirt racing, so those "tried and true" turf separators won't have as much importance due to the higher variance. The more the variance, the less traditional factors will have to do with the result. So changing the weightings of those traditional factors will not produce significant changes in the results you see.

Yeah. The paces are slower, the finishes are tighter, and you can win from almost anywhere. So the difference between winning and finishing a few positions back can come down to very subtle differences in trip.

raybo
01-01-2016, 07:57 PM
Yeah. The paces are slower, the finishes are tighter, and you can win from almost anywhere. So the difference between winning and finishing a few positions back can come down to very subtle differences in trip.

Yeah, because of that variance, when you look at the results of a few of those races, you may think that a certain factor actually determined the winners, but in reality it could have been any of a number of other factors, or no "factor" at all, just racing variance.

upthecreek
03-22-2016, 12:12 PM
There are lots of ways to do it, just depends on how significant you think it is. Personally I'd probably consider them equal but that doesn't help answer the question, so here are a few alternatives.

1. Make it more significant by doubling the horses beaten, so you'd have 56 and 62, or triple it for 59 and 68. You could use whatever factor you wanted.

2. Use the beaten lengths instead of finish position. Try 1 - beaten lengths / field size instead. So if a horse is beaten 3 lengths in a 12 horse field you get 1 - 3 / 12 = .75. If the horse is beaten 3 lengths in a 6 horse field you get 1 - 3 / 6 = .50.

3. Add field size to the percentage. So 50+6 = 56 and 50 + 12 = 62.

Just a few ideas to get you thinking, not really suggesting any of them.
What would you give a horse that wins or is beaten less than a length? A 100?
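
For what it's worth, a minimal Python version of option 2 in the quote above (the clamp at zero is an added assumption so a badly beaten horse doesn't go negative); by that formula a winner, or anything scored at zero beaten lengths, simply tops out at 1.0, which is part of what the question is getting at:

# Option 2 from the quoted post: 1 - beaten_lengths / field_size
def beaten_length_score(beaten_lengths, field_size):
    return max(0.0, 1.0 - beaten_lengths / field_size)

print(beaten_length_score(3, 12))   # 0.75, as in the example
print(beaten_length_score(3, 6))    # 0.50, as in the example
print(beaten_length_score(0, 8))    # 1.00 for a winner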

cj
03-22-2016, 12:31 PM
What would you give a horse that wins or is beaten less than a length? A 100?

It depends how close the conditions match today's race.