Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Old 01-20-2017, 11:56 AM   #76
traynor
Registered User
 
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by JJMartin
I use it for everything. The program I built assembles the pp files, including results files. I can then run a single file through one of multiple models, or build a database to back test a model against a whole year of data or whatever range I choose to use. Mostly I use it for the latter. I try to automate to the absolute maximum.
What kind of data source do you use? From your description, it seems you are downloading PPs and results separately. I assume you create/generate your own track variants on the fly using the above data sources? That was a key issue in my own apps, and it took a bit of work to get right.

When you back test models, do you set filters mainly for accuracy (win%) or (possible) return (ROI)? Most lean toward the ROI side, to their disadvantage. In most cases, ROI (in samples of whatever size) is derived from what are essentially anomalies. Those rarely repeat going forward, yet everyone seems obsessed with "woulda coulda shoulda" modeling.

You might try parsing for winner attributes (ignoring ROI) and test that on future races. ROIs in the 90s (in past races) with relatively high win rates can be especially productive. My conjecture is that bettors "seeking value wagers" tend to avoid the obvious best choices for win, and those obvious best choices for win may often generate a decent profit going forward (that may be concealed or missing in the sample used to build the model).

The big advantage: a filter that hits winners at a frequency of 45% and up is much more likely to stay productive going forward than a lower-frequency filter is to reproduce a paper ROI from a sample of past races.
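
A minimal sketch of that win% vs. ROI comparison in Python, assuming a flat-file results database; the column names (finish_pos, win_payoff, speed_fig_rank, ml_odds) are hypothetical stand-ins for whatever your own parser produces:

Code:
import pandas as pd

def evaluate_filter(df: pd.DataFrame, mask) -> dict:
    """Report accuracy (win%) and $2 flat-bet ROI for one model filter."""
    picks = df[mask]
    if picks.empty:
        return {"bets": 0, "win_pct": 0.0, "roi": 0.0}
    wins = picks["finish_pos"] == 1
    returned = picks.loc[wins, "win_payoff"].sum()  # gross $2 mutuels
    wagered = 2.0 * len(picks)
    return {
        "bets": len(picks),
        "win_pct": round(100.0 * wins.mean(), 1),
        "roi": round(returned / wagered, 3),  # 1.0 = break even
    }

races = pd.read_csv("results_2016.csv")
# Parse for winner attributes (high win%) rather than chasing paper ROI:
print(evaluate_filter(races, (races["speed_fig_rank"] == 1) & (races["ml_odds"] <= 4.0)))

Sorting candidate filters by win% first, and only then checking that ROI stays in the 90s or better, matches the approach described above.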

Last edited by traynor; 01-20-2017 at 11:58 AM.
Old 01-20-2017, 02:02 PM   #77
JJMartin
Registered User
 
Join Date: Jun 2011
Posts: 588
Quote:
Originally Posted by traynor
What kind of data source do you use? [...]
The PP's and results are 2 separate files. I don't use track variants or create them, but I did write a program for someone who wanted to automate their own manually derived variant formula.

I have done thousands of tests over 12+ years. I look at both value and strike rate. In the beginning I was probably more focused on ROI. Any time I see an outlier I convert it to the win average of the rest of the group.

The main thing I have realized is that static testing of the most common or obvious factors such as speed figures, distance, surface, and class, or any combination of them, will generally result in a negative ROI in the long term--for example, looking at a long list of past data and just filtering factor "A" with factor "B", then moving on to factor "A" with "C", and so on. Without developing some external formula or calculation that creates a new metric--one that is not in the raw data per se but is derived from it--I would say there is no hope of developing anything of value (unless perhaps you are extremely selective with great discipline, or very intuitive with visual cues or something like that).

The novice and most handicapping software/services usually end up with a selection in the top 2 or 3 M/L or post time odds. Since this category is over bet, you end up with underlays consistently. I look at the more competitive races that are more confusing to the public and find an edge there. The general public will gravitate toward the easier or obvious choices without fail. So part of what I do is handicap the handicappers and use "outside" factors that can still be derived from the data through their theoretical (hopefully objective) implied meaning. A big part of the battle is overcoming the majority consensus, which dictates a high percentage of the finish order in the results. It is no secret that the post time odds are extremely efficient.

When analyzing data, the trick is to distinguish the things that have a real effect from the ones that are merely illusions. The problem is that the illusions can be very convincing when looking at patterns; the hardwired human ability to detect patterns can be detrimental in this scope. I would agree about trying to increase the win rate and looking at attributes.
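
A minimal sketch of that outlier conversion, under the same hypothetical column names as above (one freak payoff gets replaced by the win average of the rest of the group):

Code:
import pandas as pd

def tame_outliers(payoffs: pd.Series, z: float = 3.0) -> pd.Series:
    """Replace payoffs more than z standard deviations from the mean
    with the win average of the non-outlier group."""
    mu, sd = payoffs.mean(), payoffs.std()
    outliers = (payoffs - mu).abs() > z * sd
    return payoffs.mask(outliers, payoffs[~outliers].mean())

winners = pd.read_csv("results_2016.csv").query("finish_pos == 1")
winners["win_payoff_adj"] = tame_outliers(winners["win_payoff"])

The z = 3.0 cutoff is an assumption; any rule that keeps a single $80 horse from carrying an entire backtest does the same job.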
Old 01-20-2017, 03:38 PM   #78
traynor
Registered User
 
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by JJMartin
The PP's and results are 2 separate files. [...]
I agree with the difficulties, primarily because people tend to seek validation in the form of agreement. Every bettor wants the horse with "everything going for it" that wins frequently--but always goes off at long odds. The easy way out is to look for primaries, and diminish the reliance on (and impression with) secondaries. Not as many "sure things" but better returns.

An example would be "cheap speed"--almost always labeled AFTER the race (as an excuse for doing or not doing whatever). Many "speed handicappers" (and more than a few "pace handicappers") believe their area of specialization overcomes something that "class handicappers" consider an obvious deficiency. If speed is viewed as a primary, with secondary attributes of pace and class ignored or diminished in significance, results (and models) tend to vary significantly when compared to scenarios in which the latter are considered equivalent (or near-equivalent) in importance.

The primaries (in any given sample, regardless of size) may NOT be the same as the factors most bettors consider important. They also may exist (in equal or greater measure) in entries considered throw-outs.
Old 01-20-2017, 03:39 PM   #79
Cratos
Registered User
 
Join Date: Jan 2004
Location: The Big Apple
Posts: 4,252
Quote:
Originally Posted by JJMartin
The PP's and results are 2 separate files. [...]
I will agree with your statement, with the following caveat: it can be done (and we are doing it successfully), but it takes a rigorous understanding of force and motion, plus the mathematical ability to turn that theory into a useful and practical wagering model with profitable results.
__________________
Independent thinking, emotional stability, and a keen understanding of both human and institutional behavior are vital to long-term investment success – My hero, Warren Edward Buffett

"Science is correct; even if you don't believe it" - Neil deGrasse Tyson
Old 01-22-2017, 02:23 PM   #80
traynor
Registered User
 
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by Cratos
I will agree with your statement, with the following caveat: it can be done (and we are doing it successfully) [...]
I look forward to seeing how the (apparently) new spin-off project mentioned recently on this thread will integrate and implement your concepts in its study of the effects of pace on race outcomes. (If that is, in fact, the intent/purpose of the spin-off project.)

One of the major deficiencies of software developers is over-reliance on technology as a solution to every problem. Rather than seeking (and discovering, as you seem to have done) new concepts, new insights, and new understandings, they seem to believe that mindless number crunching of the same old same old everyone else uses will generate "better" results than everyone else is getting--if only they can design/code/build an app slick enough to do their thinking for them.
Old 01-26-2017, 07:41 PM   #81
BCOURTNEY
Registered User
 
Join Date: May 2008
Posts: 686
I encourage anyone posting in this thread to PM contact details if you are interested in participating.

Last edited by BCOURTNEY; 01-26-2017 at 07:42 PM.
Old 02-02-2017, 11:42 AM   #82
traynor
Registered User
 
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by traynor
Anyone interested in developing/working on developing/testing/contributing (time and effort--not money) to a collaborative pace analysis software app (FREE to anyone who wants it, even to clone/copy/steal it, otherwise known as "open source") intended to develop/increase/sharpen the user's pace analysis skills?

NOT a "horse picker" app. An app (or addition to/component of/module of an existing or other app) that creates a deeper level of understanding of pace (and how it affects race outcomes) than (whatever else is available). Something that a new or novice user can "fiddle with" for a relatively short period of time and gain insights into pace analysis that many experienced/"expert"/hardcore pace analysts lack.
I think what got lost in the shuffle was that the intent was to create instructional media--not marketable software--to increase the individual handicapper's skills, rather than promote his or her reliance on yet another silly software app. "Pattern recognition" in pace analysis implies that one can look at readily available PP data for a given race and "recognize" situations in which "pace" may play a strong role in the determination of which horse wins and which horses lose.

Somehow that was transformed into a "software development project"--which is the complete opposite of the original intent (to develop/improve/enhance NON-COMPUTER pattern recognition skills). The main use for the computer is/was assumed to be to present the instructional media--NOT to replace thinking. And to clearly establish that much (if not most) of what is assumed "known" about "pace analysis" varies between misleading and wrong, when reduced to algorithms and applied to the scenarios in a large set of races. (The latter is the easy part.)

I am a very patient person. If the (apparent) spinoff software development project produces something of value--free or otherwise--I applaud their success and strongly encourage anyone interested to use/purchase/lease/whatever the resulting application(s)/product(s). Pace analysis is an area of considerable opportunity for those willing to bypass the superficial nonsense generally accepted as such. Even modest insights into the realities (as opposed to the theories) can provide a LOT of leverage.

If there is still a need for a FREE software (instructional media training) application to develop pace analysis pattern recognition skills at some later date, I'll be back.
Old 02-02-2017, 12:34 PM   #83
DeltaLover
Registered user
 
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by traynor
I think what got lost in the shuffle was that the intent was to create instructional media--not marketable software [...]
I think that what is needed is a comprehensive solution instead of specialized software relying on black-boxed data as its input.

The pattern recognition (PR) software interacts with several other layers that need to be implemented in advance; the transformation of raw data to metrics must be capable of evolving based on the results of the higher levels of processing. I believe it is a mistake to think of the PR layer as a closed system; instead, constant interaction between the higher and lower components is required, and both need to be able to evolve in parallel.

Having said this, it becomes clear that for this project to succeed it must cover the whole spectrum of the necessary operations: raw data collection, storage to a database, metrics building, and finally the AI layer.
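
A minimal sketch of that layered, mutually evolving structure; every class and method name here is illustrative, not an existing API:

Code:
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MetricsLayer:
    # Each metric is a named function from a raw race record to a number.
    metrics: dict[str, Callable[[dict], float]] = field(default_factory=dict)

    def build(self, raw_race: dict) -> dict[str, float]:
        return {name: fn(raw_race) for name, fn in self.metrics.items()}

    def evolve(self, feedback: dict[str, float]) -> None:
        # Drop metrics the higher (AI) layer reports as carrying no signal.
        self.metrics = {n: f for n, f in self.metrics.items()
                        if feedback.get(n, 0.0) > 0.0}

@dataclass
class Pipeline:
    store: list = field(default_factory=list)          # stands in for the database
    metrics: MetricsLayer = field(default_factory=MetricsLayer)

    def ingest(self, raw_race: dict) -> None:
        self.store.append(self.metrics.build(raw_race))  # raw -> metrics -> storage

    def retrain(self, ai_layer) -> None:
        feedback = ai_layer.fit(self.store)   # e.g. per-metric importances
        self.metrics.evolve(feedback)         # the lower layer evolves in turn

The point of the sketch is the retrain() loop: the AI layer's results flow back down and reshape the metric layer, rather than the PR layer sitting as a closed system.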

As far as distributing this kind of software, I think BSD or MIT are probably the most applicable licenses.
__________________
whereof one cannot speak thereof one must be silent
Ludwig Wittgenstein
Old 02-02-2017, 12:58 PM   #84
ebcorde
Veteran
 
Join Date: Feb 2009
Posts: 1,950
Years ago I took a stab at pattern recognition.

Yeah, we all use the same data: pace, speed, etc.

I worked developing neural nets and a few algorithms; this was 15 years ago. I had about 8 years' experience with neural nets embedded in firmware, so I had a clue there.

It took 1 week to get it running, about 2 months of tweaking. I was a novice horse player.

What I recall: the best way to utilize a NN is to break the problem up into little chunks--separate NNs for class, surface, etc. That's a ton of work, so I stopped. One large NN will result in poor quality.

I would suggest a data-cloud algorithm of vectors, each vector a dimension, multiple dimensions. Back then I knew that was the way to go, but as a novice I had no clue which metrics to use as the right vectors.

The idea would have been projecting the best probability of a winner by the closest proximity to a learned M-distance (adjusted to track, maybe class).

And that is a lot of work too. Obviously you would need a proper set of winning races, and a lot of testing to verify your data set works.
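
A minimal sketch of the "little chunks" idea: one small winner/loser net per (surface, class) bucket instead of a single large net. The feature layout and bucket keys are hypothetical, and this assumes scikit-learn is available:

Code:
import numpy as np
from sklearn.neural_network import MLPClassifier

class BucketedNets:
    """One small winner/loser net per (surface, class) bucket."""
    def __init__(self):
        self.nets = {}

    def fit(self, X, won, buckets):
        # X: numpy (n_starters, n_metrics); won: 0/1 array; buckets: list of keys
        for b in set(buckets):
            rows = [i for i, bb in enumerate(buckets) if bb == b]
            net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
            net.fit(X[rows], won[rows])
            self.nets[b] = net
        return self

    def win_prob(self, x, bucket):
        # Assumes both winners and losers were present in the bucket's training set.
        net = self.nets[bucket]
        return float(net.predict_proba(np.asarray(x).reshape(1, -1))[0, 1])

Each bucket needs both winners and losers in training, and 300-odd carefully chosen races per net (as suggested later in the thread) is about the right order of magnitude for nets this small.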

Last edited by ebcorde; 02-02-2017 at 01:01 PM.
Old 02-02-2017, 01:19 PM   #85
DeltaLover
Registered user
 
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by ebcorde
Yeah, we all use the same data: pace, speed, etc. [...]
Good posting.


The problems I see with NNs are the following:

(1) The input is not straightforward. For example: you have horses with 1, 2, 3, ... N past performances. How do you handle them? Do you train separate nets for each number of pps? Do you pass all the pps for all horses, or each horse as an individual? (One possible encoding is sketched below.)

(2) What is your output supposed to be? For example: you can pass an individual horse and ask for the exact speed figure of the next race, whether it will improve or decline, whether it is a good bet or not, etc.

(3) Since your data are stochastic and contain a lot of contradictions, chances are that your network will never learn beyond a simple approximation of the betting crowd.
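
One possible answer to (1), sketched minimally: collapse each horse's variable-length pp history into a fixed-length vector (last-k figures, zero-padded, plus a presence mask and a few aggregates), so a single net can handle 1..N pps. The feature choices are illustrative only:

Code:
import numpy as np

def encode_pps(speed_figs, k=5):
    """Fixed-length encoding of a variable-length pp history
    (speed_figs listed most recent first)."""
    recent = list(speed_figs[:k])
    padded = recent + [0.0] * (k - len(recent))             # pad short histories
    mask = [1.0] * len(recent) + [0.0] * (k - len(recent))  # which slots are real
    aggregates = [float(np.mean(speed_figs)),   # career average
                  float(max(speed_figs)),       # best figure
                  float(len(speed_figs))]       # experience
    return np.array(padded + mask + aggregates)

print(encode_pps([88.0, 92.0, 85.0]))   # a horse with only 3 past races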
__________________
whereof one cannot speak thereof one must be silent
Ludwig Wittgenstein
Old 02-02-2017, 02:05 PM   #86
JJMartin
Registered User
 
Join Date: Jun 2011
Posts: 588
Quote:
Originally Posted by DeltaLover

(3) Since your data are stochastic and contain a lot of contradictions, chances are that your network will never learn beyond a simple approximation of the betting crowd.
That is about the result I got when I tested with NNs. The strongest factor in the data, in my case the speed figures, overpowered everything else and became the dominant determining factor in the NN. I would suggest that anyone testing with NNs not feed in any highly win-correlated metrics, and use the results in a secondary role.
Old 02-02-2017, 02:09 PM   #87
JJMartin
Registered User
 
JJMartin's Avatar
 
Join Date: Jun 2011
Posts: 588
Quote:
Originally Posted by traynor
I think what got lost in the shuffle was that the intent was to create instructional media--not marketable software [...]
This is why I think Excel would be at least a good starting point, if not the most practical choice, for this project. If you are able to explain the rules of the algorithm, I could make an assessment of the work involved in generating an output.
Old 02-02-2017, 03:21 PM   #88
Cratos
Registered User
 
Join Date: Jan 2004
Location: The Big Apple
Posts: 4,252
Quote:
Originally Posted by JJMartin
This is why I think Excel would be at least a good starting point if not the most practical and best choice for this project. If you are able to explain the rules of the algorithm I could make an assessment as to the work involved in generating an output.
I am on board as a contributor with you, BCourtney, Traynor, and others; just tell me how I can contribute without violating any confidential info.
__________________
Independent thinking, emotional stability, and a keen understanding of both human and institutional behavior are vital to long-term investment success – My hero, Warren Edward Buffett

"Science is correct; even if you don't believe it" - Neil deGrasse Tyson
Old 02-02-2017, 04:33 PM   #89
ebcorde
Veteran
 
Join Date: Feb 2009
Posts: 1,950
Quote:
Originally Posted by DeltaLover
The problems I see with NNs are the following: [...]
You could be right. I think it would take 2 guys and 6-8 months of full-time work just to start. A real job. It was fun, though.

I was a Sr. developer in Switzerland, working for 1 year with a team of Swiss co-workers (PhDs) transferring technology to the US. In the US we were using area-under-the-curve, triangle math, Euclid stuff. "STUPID AMERICANS," they would laugh.

Just guessing off the top of my head:

A series of small neural FRONT-END nets leading to 1 winner/loser net. In the front end you can discriminate class, sex, purse, track, ????.

Input: a team of people to select and agree on 300 ideal winning races per net.

One small neural net, 2 outputs: WINNER/LOSER. Or you can break it up by the running style of the horse: lead, presser, closer. The middle layer's "s" curve figures out the rest.

It's a real job, man!

And it's best to have a second algorithm to test validity. One I used is simple Mahalanobis; M-distance is perfect for horses.

I think you can skip the NN and just do Mahalanobis. Why? Because we all have our favorite metrics already, and you can run as many Mahalanobis tests as you want.

I'll look into it. I feel like doing it.
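
A minimal sketch of the M-distance idea: build the "cloud" from the metric vectors of past winners, then score today's field by Mahalanobis distance to it (smaller = closer to the ideal winner profile). Which metrics fill the columns is up to you; the arrays below are random placeholders:

Code:
import numpy as np

def mahalanobis_scores(winners, todays_field):
    """winners: (n_winners, n_metrics); todays_field: (n_horses, n_metrics)."""
    mu = winners.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(winners, rowvar=False))  # pinv: singular-safe
    diff = todays_field - mu
    # d_i^2 = (x_i - mu)^T S^-1 (x_i - mu) -- the quantity MATLAB's mahal() returns
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2)

rng = np.random.default_rng(1)
winners = rng.normal(size=(300, 4))   # e.g. 300 hand-picked winning races
field = rng.normal(size=(8, 4))       # today's 8 starters, same 4 metrics
print(mahalanobis_scores(winners, field))   # lowest score = best fit to the cloud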
Old 02-02-2017, 04:54 PM   #90
ebcorde
Veteran
 
Join Date: Feb 2009
Posts: 1,950
Mahalanobis

Using MATLAB shows a graph. Yikes, all that funny-looking math--it's basic mean and standard deviation math, it just looks hard.

https://www.mathworks.com/help/stats/mahal.html?requestedDomain=www.mathworks.com

Will it help? YES, it will verify the soundness of your handicapping. It's still about how good a handicapper you are. Garbage in = garbage out. lol.

Fun stuff though.