InControlX
10-31-2012, 12:47 PM
Several PA Users have posted some interesting theoretical approaches to computer-based handicapping. I think the willingness to share concepts and ideas is positive, and that we can to a certain extent help each other out without giving up secrets. Sometimes, though, I think the shared concepts are too vague to be of practical use and a more defined suggestion would be better. To that aim I will attempt to illustrate a simplified short version of a specific computer analysis technique I call "Head to Head" with which I've had good success. This technique utilizes no new age thought processes nor any artificial intelligence algorithms, but is more of a cookbook approach for a "test, verify, and validate" framework than a mysterious black box for selection.
If there is interest I will go into further detail, and yes, there is a lot more detail.
Disclaimers: Nothing is for sale here, nor will be. No guarantees. You might pitch your laptop against the wall in frustration if you attempt this. This is not claimed to be the best method ever nor the total solution to handicapping. You don't have to join a cult or go to grad school to try it. This is a method of handicapping, not a wagering strategy.
First, several things are needed to start:
1. Two years of delimited Chart Files.
2. The same two years of home made or purchased delimited Past Performance Files.
3. Visual Basic (VB) and good familiarity with its use. VB is not hard to learn.
4. A relatively modern laptop or desktop computer with a 900+ GB hard drive.
5. Eight to Twelve key independent handicapping Boolean Parameters of YOUR OWN DESIGN.
6. A LOT of free time.
The method described has only been possible on commercially available laptop and desktop personal computers for the past eight or so years. Prior to that, the memory and speed requirements would bog down the machines. Perhaps future equipment advances will permit expansion of this process to larger arrays.
The basic Head to Head method is formulated around the concept that unlike Blackjack, Roulette, other games of chance and pure statistics, horse racing is a contest between competing entrant horses. I'm surprised how often this fact is ignored by approaches which import analysis methods from other studies. Playing cards don't compete to get to the top of the deck, have class levels, nor are (legally) manipulated by their owners.
Let's take a quick jump past 1 through 4 above, assume they are in hand or at least accessible, and delve into number 5.
If you've been at this game for a while I'm sure you have some favorite things to see in an entry's past performances which indicate a good performance is pending under the right conditions. The key in setting up a Head to Head run is to define eight to twelve of them in Boolean (true/false) format. I am not going to divulge the parameters I use and I don't recommend you do either. What I suggest is that you plug YOUR favorites into the Head to Head and see what you discover. What has usually happened for me is finding that I have to refine my initial parameter list and start over. Although tedious, this eventually generates a proven handicapping approach after three to five trials. There is no rule against using pre-filters. If your best filters fit one particular race type, say dirt routes, just run those. Note, however, that pre-filters will cut into your sample counts and if too restrictive will cause problems later. It's all a trade-off. You will note that this method does not pick spot plays, where a specific pattern is found which yields a high success rate, but rather finds races where the head-to-head competition stacks one entrant as a standout. In Head-to-Head analysis we automatically consider the strengths and weaknesses of all the entrants, not just the spot play pick. We also find the key combinations of Boolean Parameters that are best without having to guess ahead.
The Boolean Parameters must reduce down to true/false determinations about the horse's past performances prior to the race. Of course, the more predictive the parameters are, the better your eventual results will be. However, if you focus too much on speed figures or obvious indicators for the majority of your parameters you will likely end up with a "morning line favorite" picker with a fine winning percentage (40%+) but a poor ROI (80% or so). Try to use your more obscure handicapping edges which are not so obvious or well known. An example set of Boolean Parameters (definitely not my best, but not bad) is:
(1) 1 Has made early position gain at this class before at Sprint Distances.
(2) 2 Has made late position gain at this class before at Sprint Distances.
(3) 4 Has made early position gain at this class before at Route Distances.
(4) 8 Has made late position gain at this class before at Route Distances.
(5) 16 Has made early position gain at greater than this class before at Sprint Distances.
(6) 32 Has made late position gain at greater than this class before at Sprint Distances.
(7) 64 Has made early position gain at greater than this class before at Route Distances.
(8) 128 Has made late position gain at greater than this class before at Route Distances.
(9) 256 Has made early position gain at less than this class before at Sprint Distances.
(10) 512 Has made late position gain at less than this class before at Sprint Distances.
(11) 1024 Has made early position gain at less than this class before at Route Distances.
(12) 2048 Has made late position gain at less than this class before at Route Distances.
Each of the Boolean Parameters is assigned a value from the consecutive powers of 2. These are shown after the index numbers above as 1, 2, 4, 8, ... 2048.
The purpose in defining the Boolean Parameters is to differentiate the entrants into a fixed number of possible Preparation Groups consisting of all combinations. Each entrant's Preparation Group is identified by the sum of the values of its TRUE Boolean Parameters. In this example we have twelve Boolean Parameters which will yield 2^12 = 4096 possible Preparation Groups. Note that each Boolean Parameter MUST be answerable with a clear true/false response. There can be no gray area in the question.
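To make the group arithmetic concrete, here is a minimal sketch in Python (the method itself is described in terms of Visual Basic; Python is used here only for brevity). The function names prep_group and true_parameters are made up for illustration and are not part of the original programs.

```python
def prep_group(answers):
    """Sum the power-of-2 values of the TRUE Boolean Parameters.

    answers[0] is parameter (1) with value 1, answers[1] is parameter (2)
    with value 2, and so on up to answers[11] with value 2048.
    """
    return sum(1 << i for i, is_true in enumerate(answers) if is_true)


def true_parameters(pgrp):
    """Recover the 1-based index numbers of the TRUE parameters from a group number."""
    return [i + 1 for i in range(pgrp.bit_length()) if pgrp & (1 << i)]


# Parameters (1) and (11) true, everything else false: 1 + 1024
answers = [False] * 12
answers[0] = True
answers[10] = True
print(prep_group(answers))      # 1025
print(true_parameters(2060))    # [3, 4, 12], i.e. 4 + 8 + 2048
```

Because every parameter has its own power of 2, each group number maps back to exactly one combination of true/false answers, so nothing is lost by storing only the sum.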
Now a word or two about Type Discriminators is needed. We have some good evidence that not all races favor the same preparation. Therefore, we need to sub-categorize the race types by some logical division. For now, let's choose four types: dirt sprint, dirt route, turf sprint, and turf route. The artificial surfaces are included with "dirt", or you could add two more types for them.
The next step is to turn each Boolean Parameter into a Visual Basic program filter and go through the first database year's charts one by one, with the VB program keeping track of each Preparation Pattern Group's record against every other group. A "competitive victory" is defined as the PGrp(X) entrant finishing in at least the top three positions and the PGrp(Y) entrant finishing behind the PGrp(X) entrant. (PGrp(X) and PGrp(Y) are two Preparation Pattern Groups.) The past performance files are opened to yield the data for testing the Boolean Parameters and determining each entrant's Preparation Pattern Group. In English, the result is a scorecard of each Preparation Pattern AGAINST each other Preparation Pattern over the year's database. Because we are counting head-to-head outcomes we derive many more data samples than single-pattern win/loss analysis, which yields only one sample per race. A single six-horse field in this method provides fifteen samples; a field of nine yields twenty-four. A second run tabulates the results into a head-to-head comparison array over the first database year which holds the winning ratios of PGrp(X) vs. PGrp(Y).
HTH(PGrp(X), PGrp(Y), Type) where X = 0 to 4095, and Y = 0 to 4095 and Type = 0 for dirt sprints, 1 for dirt routes, 2 for turf sprints, and 3 for turf routes.
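The tally and the ratio array can be sketched roughly as follows, again in Python rather than VB. This is only one reading of the description: it assumes the winning ratio is victories of X over Y divided by all decided X/Y meetings, it ignores dead heats, and it keeps a dictionary of observed pairings instead of a full 4096 x 4096 x 4 array. The names wins, tally_race and hth are hypothetical, and the two finish orders are invented.

```python
from collections import defaultdict

# wins[(race_type, x, y)] = number of times a group-x entrant finished in the
# top three with a group-y entrant behind it (a "competitive victory").
wins = defaultdict(int)


def tally_race(race_type, finish_order):
    """finish_order: Preparation Group numbers listed from first place to last."""
    for pos, x in enumerate(finish_order[:3]):   # only top-three finishers can score
        for y in finish_order[pos + 1:]:         # every entrant that finished behind
            wins[(race_type, x, y)] += 1


def hth(x, y, race_type):
    """Winning ratio of group x over group y, or None if no decided meeting exists."""
    w = wins.get((race_type, x, y), 0)
    l = wins.get((race_type, y, x), 0)
    return w / (w + l) if (w + l) else None


# Two invented dirt sprints (Type = 0), groups listed in finish order.
tally_race(0, [1025, 3, 0, 1])
tally_race(0, [3, 1025, 1, 0])

print(hth(1025, 3, 0))   # 0.5, one victory each way
print(hth(1025, 0, 0))   # 1.0, group 1025 beat group 0 both times
print(hth(3, 2048, 0))   # None, these groups never met
```

Pairings with few or no meetings will be common with 4096 groups, so a real run would need some rule for thin samples; that choice is left open here.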
A third program run uses the head-to-head comparison array derived above in an attempt to predict race outcomes in the second year's database of charts. This essentially repeats the second program run but adds a ranking determination to sort each race by HTH matrix values to yield a Race Prediction Matrix on new data. We use the second year's charts to confirm or disprove predictability. If the first year's patterns result in a poor winning percentage and ROI as applied to the second year's data, we can hardly have confidence of success in future application to real time entries. The Race Prediction Matrix is easier to understand by example. The insert figure is the Race Prediction Matrix for Belmont Race 1 on October 26, 2012. In the matrix, the head-to-head winning percentage ratios are entered for each entrant vs. each other for the entered race type, in columns vs. rows alignment. These are the HTH array values previously determined. Each matrix column is summed, and this sum divided by the total number of entrants minus one yields the Average Competitive Ratio for each entrant, printed beneath the matrix. It is the Average Competitive Ratio that is used to rank the entries.
The Average Competitive Ratios are then used to rank the entrants' predicted finishing positions from first to last:
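As a rough illustration of the column-sum step, the Python sketch below builds a tiny three-horse matrix and ranks the entrants. The ratios are invented for the example and are not taken from the Belmont race, and the layout (column entrant vs. row entrant) is an assumption about the figure.

```python
def average_competitive_ratios(matrix):
    """matrix[r][c] = HTH ratio of the column entrant c over the row entrant r.

    The diagonal (an entrant against itself) is skipped.
    """
    n = len(matrix)
    acrs = []
    for c in range(n):
        column_sum = sum(matrix[r][c] for r in range(n) if r != c)
        acrs.append(column_sum / (n - 1))
    return acrs


# Invented ratios: A beats B 0.60, A beats C 0.70, B beats C 0.55.
names = ["A", "B", "C"]
matrix = [
    [None, 0.40, 0.30],   # row A: how B and C fare against A
    [0.60, None, 0.45],   # row B: how A and C fare against B
    [0.70, 0.55, None],   # row C: how A and B fare against C
]

acrs = average_competitive_ratios(matrix)
ranking = sorted(zip(names, acrs), key=lambda pair: pair[1], reverse=True)
for rank, (name, acr) in enumerate(ranking, start=1):
    print(rank, name, f"[{acr:.3f}]")
```

Here A's column sums to 1.30 over two opponents, giving an Average Competitive Ratio of about 0.650, so A is ranked first.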
Race 1 BEL 20121026 Predictive Rank
Dirt 1 M Purse 30,000
Fillies and Mares 3 Year Olds And Up CLAIMING ( $15,000 )
1 #6 pp 6 [0.739] PGrp= 1025 Coast of Sangria M/L= 2-1
2 #2 pp 2 [0.535] PGrp= 2060 Glynisthemenace M/L= 3-1
3 #1 pp 1 [0.498] PGrp= 3 Miss Brass Bonanza M/L= 20-1
4 #7 pp 7 [0.48] PGrp= 2176 Miss Libby M/L= 12-1
5 #8 pp 8 [0.474] PGrp= 2056 Destination Moon M/L= 15-1
6 #4 pp 4 [0.457] PGrp= 1 File Gumbo M/L= 8-5
7 #3 pp 3 [0.408] PGrp= 0 Katy's Office Girl M/L= 12-1
8 #5 pp 5 [0.408] PGrp= 0 So Much Heart M/L= 30-1
This race was chosen as an example because the first-place pick's Average Competitive Ratio (0.739 for #6) is much greater than the second-place pick's (0.535 for #2), indicating a considerable advantage (GAP = 0.739 - 0.535 = 0.204). Although in initial test runs I include all rankings, I later subdivide the results according to the GAP to establish a practical limit. An increasing winning percentage with GAP is a good sign that you're on to something.
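The GAP arithmetic for the example race can be reproduced with a few lines of Python. The Average Competitive Ratios are the ones listed above; the MIN_GAP cutoff is purely hypothetical, since the practical limit has to come from your own test runs.

```python
# Average Competitive Ratios from the Belmont Race 1 ranking, best to worst.
ranked_acrs = [0.739, 0.535, 0.498, 0.48, 0.474, 0.457, 0.408, 0.408]


def gaps(ranked):
    """Difference between each pick and the next one down the ranking."""
    return [round(a - b, 3) for a, b in zip(ranked, ranked[1:])]


race_gaps = gaps(ranked_acrs)
print(race_gaps[0])   # 0.204, the GAP between the first and second picks

MIN_GAP = 0.15        # hypothetical cutoff, not a recommended value
print(race_gaps[0] >= MIN_GAP)   # True
```

The same list also gives the gaps between the 2nd and 3rd picks, the 3rd and 4th picks, and so on, should you want to test them.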
A few other observations:
- In the matrix, the opposing row and column entries (e.g., 3 vs. 4 and 4 vs. 3) should add up to 1.000, because if PGrp(X) beats PGrp(Y) 0.600 or 60% of the time, PGrp(Y) must beat PGrp(X) 0.400 or 40% of the time.
- The last two rated entries in the example race have no "true" Boolean Parameters and thus are Preparation Group Zero. I use a race filter which skips any race which has a Group Zero ML of less than 5:1 and only one less than 10:1.
- The Normalized Predictive Odds includes a few other factors than just Average Competitive Ratio, too messy to include now.
- A large GAP between 2nd and 3rd picks, and 3rd and 4th picks, and so on can be tested and used for exotics in more elaborate wagering.
After test application over tens of thousands of races you get a good evaluation of the predictive quality of the original Boolean Parameters. If the results are not good, adjust and try again. I usually have a laptop or two running continuously over parameter or filter iterations.
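For the evaluation itself, a bare-bones Python sketch of the two numbers quoted earlier (winning percentage and ROI) might look like this. It assumes ROI means dollars returned per dollar wagered, so that 0.80 corresponds to the "80% or so" figure, and the payoffs are invented for illustration only.

```python
def evaluate(payoffs, bet=2.0):
    """payoffs: win payoff per `bet` wagered on each top pick, 0.0 for a loser.

    Returns (winning percentage, ROI), both as fractions.
    """
    n = len(payoffs)
    winners = sum(1 for p in payoffs if p > 0)
    return winners / n, sum(payoffs) / (bet * n)


# Five invented top picks at $2 each: two winners paying $6.40 and $4.20.
payoffs = [6.40, 0.0, 0.0, 4.20, 0.0]
win_pct, roi = evaluate(payoffs)
print(f"win% = {win_pct:.0%}, ROI = {roi:.2f}")
```

Running the same function separately on picks grouped by GAP range is one simple way to see whether the winning percentage rises with GAP.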
It's a good idea to continuously monitor the success rate of selection gap picks. I perform this on a monthly basis. It's also prudent to refine and optimize pre-filters and post-filters around a selected set of Boolean Parameters. In other words, it never ends.
ICX