Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board > Thoroughbred Horse Racing Discussion > Handicapping Software


06-12-2017, 08:03 PM   #1
traynor
Registered User
Join Date: Jan 2005
Posts: 6,626
Studying Past Results

A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-ma...wsltr_20170607
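The memorization trap described in the quote can be sketched in a few lines of Python. This is a toy illustration on made-up data, not the article's experiment: the hypothetical `Memorizer` simply stores every training race, so it scores 100% on races it has seen and falls back to a default guess ("loss") on anything new.

```python
import random

def make_races(n, seed):
    """Made-up races as (features, won) pairs. The features include the seed
    and a race number, so every race is unique and can be memorized."""
    rng = random.Random(seed)
    races = []
    for race_no in range(n):
        edge = rng.randint(-5, 5)            # hypothetical handicapping factor
        p_win = 0.30 if edge > 0 else 0.20   # weak real signal
        races.append(((seed, race_no, edge), rng.random() < p_win))
    return races

class Memorizer:
    """Stores every training race verbatim; predicts 'loss' for anything unseen."""
    def fit(self, races):
        self.table = dict(races)
        return self

    def predict(self, features):
        return self.table.get(features, False)

def accuracy(model, races):
    return sum(model.predict(f) == won for f, won in races) / len(races)

train = make_races(1000, seed=1)
test = make_races(1000, seed=2)
model = Memorizer().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"training accuracy: {train_acc:.1%}")   # always 100.0%
print(f"test accuracy:     {test_acc:.1%}")
```

On the test races its accuracy is exactly the share of losers, i.e. no better than always predicting "loss", despite the perfect training score.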

Last edited by traynor; 06-12-2017 at 08:05 PM.
06-12-2017, 09:32 PM   #2
traynor
Registered User
Join Date: Jan 2005
Posts: 6,626
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.
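A minimal sketch of the training/control split, assuming each race is a dict with hypothetical `date`, `angle`, `won` and `win_payoff` ($2 win payoff) fields. Splitting by date rather than at random keeps the control races "in the future" relative to the training races. The synthetic data has no real edge, so any angle that looks profitable in training is noise, and the control ROI is the honest number.

```python
import random
from datetime import date, timedelta

def split_by_date(races, train_fraction=0.7):
    """Earliest races go to training; the latest act as 'future' control races."""
    ordered = sorted(races, key=lambda r: r["date"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def roi(bets):
    """ROI of flat $2 win bets as a fraction (0.0 = break even, -1.0 = lost it all)."""
    if not bets:
        return 0.0
    wagered = 2.0 * len(bets)
    returned = sum(r["win_payoff"] for r in bets if r["won"])
    return (returned - wagered) / wagered

# A few hundred synthetic races with NO real edge (hypothetical fields).
rng = random.Random(7)
start = date(2016, 1, 1)
races = [{"date": start + timedelta(days=i),
          "angle": rng.choice("ABCDEFGHIJ"),      # candidate pattern label
          "won": rng.random() < 1 / 6,
          "win_payoff": round(rng.uniform(6.0, 14.0), 2)}
         for i in range(400)]

train, control = split_by_date(races)
angles = "ABCDEFGHIJ"
# "Discover" the best-looking angle on the training races only...
best = max(angles, key=lambda a: roi([r for r in train if r["angle"] == a]))
print("best angle:", best)
print(f"training ROI: {roi([r for r in train if r['angle'] == best]):+.1%}")
# ...then check it on control races it never saw.
print(f"control ROI:  {roi([r for r in control if r['angle'] == best]):+.1%}")
```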
06-12-2017, 10:04 PM   #3
DeltaLover
Registered user
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by traynor
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe also needs to be proportional to the number of features you pass, as well as to the depth of your networks.
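To make "the way your training data are presented" concrete, here is a minimal sketch, assuming a hypothetical `PastRace` record holding only a date and a speed figure, of flattening a variable-length PP history into a fixed-length vector that a learning algorithm can consume. Each extra past race kept (`n_last`) adds two features, which is one reason the training universe has to grow with the feature count.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class PastRace:
    """Hypothetical, stripped-down past-performance line."""
    race_date: date
    speed_figure: Optional[int]  # None when no figure was assigned

def to_features(past: List[PastRace], today: date, n_last: int = 3) -> List[float]:
    """Fixed-length vector: the last n_last figures (0.0 if missing), a 0/1
    'missing' flag for each slot, then days since the latest start (-1.0 if none)."""
    recent = sorted(past, key=lambda p: p.race_date, reverse=True)[:n_last]
    figures, missing = [], []
    for i in range(n_last):
        fig = recent[i].speed_figure if i < len(recent) else None
        figures.append(float(fig) if fig is not None else 0.0)
        missing.append(1.0 if fig is None else 0.0)
    layoff = float((today - recent[0].race_date).days) if recent else -1.0
    return figures + missing + [layoff]

history = [PastRace(date(2017, 5, 1), 85), PastRace(date(2017, 5, 20), 90)]
print(to_features(history, today=date(2017, 6, 12)))
# -> [90.0, 85.0, 0.0, 0.0, 0.0, 1.0, 23.0]
```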
__________________
whereof one cannot speak thereof one must be silent
Ludwig Wittgenstein
06-12-2017, 10:33 PM   #4
whodoyoulike
Veteran
Join Date: Aug 2005
Posts: 3,428
Quote:
Originally Posted by traynor
A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-ma...wsltr_20170607
I have to ask you these questions.

What kind of computer system (PC??) do you think most people on here own?

and

What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.
06-12-2017, 11:37 PM   #5
traynor
Registered User
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by DeltaLover
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe also needs to be proportional to the number of features you pass, as well as to the depth of your networks.
Absolutely. If one is using standard PP data, finding stuff that everyone else misses or overlooks is almost impossible. Whatever one discovers is guaranteed to be found (or to have been found) by others.

One of the "depth" problems is that the more factors/attributes one includes, the more likely it is that others are using the same factors/attributes (more or less in combination with other factors/attributes that one may or may not be using). Trying to include too many factors often seems to be a bigger problem than including too few. Fewer factors, better prices.
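The "too many factors" problem is also a multiple-comparisons problem: the more candidate factors you screen, the better the best one looks purely by chance. A toy sketch, assuming factors with no real edge (about 1 winner in 6 at an assumed $10 average payoff, so the expected ROI is roughly -17%):

```python
import random

def random_factor_roi(rng, n_bets=200, win_prob=1 / 6, avg_payoff=10.0):
    """ROI of flat $2 win bets on a factor with no real edge.
    Assumed average $10 payoff at 1-in-6 wins gives an expected ROI near -17%."""
    wins = sum(1 for _ in range(n_bets) if rng.random() < win_prob)
    wagered = 2.0 * n_bets
    return (wins * avg_payoff - wagered) / wagered

rng = random.Random(42)
factor_rois = [random_factor_roi(rng) for _ in range(100)]

# Screen more and more candidate factors, keeping only the best-looking one.
for k in (1, 10, 100):
    print(f"best in-sample ROI among {k:>3} factors: {max(factor_rois[:k]):+.1%}")
```

With enough factors screened, the best in-sample ROI can easily look profitable even though every factor loses money in expectation; only a separate control set can tell you which case you are in.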
06-12-2017, 11:44 PM   #6
traynor
Registered User
Join Date: Jan 2005
Posts: 6,626
Quote:
Originally Posted by whodoyoulike
I have to ask you these questions.

What kind of computer system (PC??) do you think most people on here own?

and

What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.

Plain vanilla, standard laptop and desktop. Nothing spectacular. Some of the most useful data mining apps (and processes) are well-suited to pretty basic computer hardware.

The approach to data analysis is as important as (or more important than) any gee-whiz hardware or Big Data software.
06-12-2017, 11:52 PM   #7
barn32
tmrpots
Join Date: Jun 2008
Posts: 2,285
Quote:
Originally Posted by traynor
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.
Quote:
Originally Posted by DeltaLover
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe also needs to be proportional to the number of features you pass, as well as to the depth of your networks.
I still think you two guys are the same person.
06-13-2017, 10:13 AM   #8
lamboy
Registered User
Join Date: Oct 2012
Location: Big Apple
Posts: 52
ML is indeed difficult to apply to handicapping, especially since flow and trips need to be taken into account, and those factors are so subjective. In other fields where ML systems are applied, experts all say it takes subject-matter experts (SMEs) to interpret the data.

In the end, IMHO, an ensemble of algorithms works OK, but more importantly, a good visualization tool works best. After all, aren't the Bris, Timeform and DRF PPs nothing more than data dashboards?
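One simple form of the ensemble mentioned here is to average the win probabilities that several models assign to the same field and renormalize. A sketch with hypothetical model outputs; the equal default weights are an assumption, and real ensembles may weight or stack models differently.

```python
from typing import Dict, List, Optional

def ensemble(prob_sets: List[Dict[str, float]],
             weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Weighted average of several models' win probabilities for one race,
    renormalized so the probabilities for the field sum to 1."""
    if weights is None:
        weights = [1.0] * len(prob_sets)   # assumption: equal weights
    horses = prob_sets[0].keys()
    blended = {h: sum(w * p[h] for w, p in zip(weights, prob_sets)) for h in horses}
    total = sum(blended.values())
    return {h: v / total for h, v in blended.items()}

# Hypothetical outputs from two different models for a three-horse field.
model_a = {"Horse 1": 0.50, "Horse 2": 0.30, "Horse 3": 0.20}
model_b = {"Horse 1": 0.30, "Horse 2": 0.30, "Horse 3": 0.40}

for horse, prob in ensemble([model_a, model_b]).items():
    print(f"{horse}: {prob:.2f}")
# prints "Horse 1: 0.40", "Horse 2: 0.30", "Horse 3: 0.30", one per line
```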
06-13-2017, 10:45 AM   #9
DeltaLover
Registered user
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by lamboy
ML is indeed difficult to apply to handicapping, especially since flow and trips need to be taken into account, and those factors are so subjective. In other fields where ML systems are applied, experts all say it takes subject-matter experts (SMEs) to interpret the data.

In the end, IMHO, an ensemble of algorithms works OK, but more importantly, a good visualization tool works best. After all, aren't the Bris, Timeform and DRF PPs nothing more than data dashboards?
The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with representing the primitive handicapping factors, along with the derived metrics, and their behavior through time and across circuits.
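One possible way to represent a factor's behavior across circuits and through time, offered as an assumption rather than as the poster's method: restate each figure relative to its own circuit's average (a z-score), and summarize a horse's recent figures with a recency-weighted average. The circuit labels and the half-life are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def circuit_zscores(rows):
    """rows: (circuit, figure) pairs. Restate each figure as standard deviations
    above or below its own circuit's average so different circuits share a scale."""
    by_circuit = defaultdict(list)
    for circuit, fig in rows:
        by_circuit[circuit].append(fig)
    stats = {c: (mean(figs), pstdev(figs)) for c, figs in by_circuit.items()}
    result = []
    for circuit, fig in rows:
        avg, sd = stats[circuit]
        result.append((circuit, (fig - avg) / sd if sd else 0.0))  # 0.0 if no spread
    return result

def decayed_average(values_oldest_first, half_life=3.0):
    """Recency-weighted average: a value half_life starts older counts half as much."""
    n = len(values_oldest_first)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values_oldest_first)) / sum(weights)

rows = [("NYRA", 90), ("NYRA", 100), ("SoCal", 80), ("SoCal", 84)]
print(circuit_zscores(rows))
print(round(decayed_average([80, 85, 95]), 1))
```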
06-13-2017, 11:16 AM   #10
lamboy
Registered User
Join Date: Oct 2012
Location: Big Apple
Posts: 52
Quote:
Originally Posted by DeltaLover
The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with representing the primitive handicapping factors, along with the derived metrics, and their behavior through time and across circuits.
I use graph theory to represent the core handicapping factors, which allows me to see the relationships between different circuits and classes of horses.
06-13-2017, 11:24 AM   #11
DeltaLover
Registered user
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by lamboy
I use graph theory to represent the core handicapping factors, which allows me to see the relationships between different circuits and classes of horses.
What you say here is not very descriptive, though. Answers to questions like what you use as vertices and edges in your graph, how you calculate edge weights, and how you search the graph would clarify your statement.
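For concreteness, here is one hypothetical set of answers to those questions, not the poster's actual design: vertices are (circuit, class) levels, a directed edge records horses observed moving from one level to another, the edge weight is their average figure change, and a breadth-first search finds the fewest-hop chain linking two levels. The track/class labels and figure changes are made up.

```python
from collections import defaultdict, deque

def build_graph(moves):
    """moves: (from_level, to_level, figure_change) for horses observed changing
    circuit or class. Edge weight = average figure change along that move."""
    sums = defaultdict(lambda: [0.0, 0])
    for src, dst, change in moves:
        sums[(src, dst)][0] += change
        sums[(src, dst)][1] += 1
    graph = defaultdict(dict)
    for (src, dst), (total, count) in sums.items():
        graph[src][dst] = total / count
    return graph

def estimated_adjustment(graph, start, goal):
    """Breadth-first search for the fewest-hop path from start to goal.
    Returns (path, summed edge weight), or None if goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            path.reverse()
            return path, sum(graph[a][b] for a, b in zip(path, path[1:]))
        for nxt in graph.get(node, {}):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical observations: (circuit, class) levels and figure changes.
moves = [
    (("AQU", "ALW"), ("BEL", "ALW"), -2.0),
    (("AQU", "ALW"), ("BEL", "ALW"), -4.0),
    (("BEL", "ALW"), ("BEL", "G3"), -5.0),
    (("PRX", "ALW"), ("AQU", "ALW"), -6.0),
]
graph = build_graph(moves)
print(estimated_adjustment(graph, ("PRX", "ALW"), ("BEL", "G3")))
# -> a 4-level path from ('PRX', 'ALW') to ('BEL', 'G3') with summed adjustment -14.0
```

Summing weights along a multi-hop path assumes the adjustments are additive, which is itself a simplifying assumption.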
06-13-2017, 11:31 AM   #12
lamboy
Registered User
Join Date: Oct 2012
Location: Big Apple
Posts: 52
Quote:
Originally Posted by DeltaLover
What you say here is not very descriptive, though. Answers to questions like what you use as vertices and edges in your graph, how you calculate edge weights, and how you search the graph would clarify your statement.
LOL, that's why I stress building a great visualization tool!!
06-13-2017, 11:32 AM   #13
DeltaLover
Registered user
Join Date: Oct 2008
Location: FALIRIKON DELTA
Posts: 4,439
Quote:
Originally Posted by lamboy
LOL, that's why I stress building a great visualization tool!!
??
06-13-2017, 11:50 AM   #14
lamboy
Registered User
Join Date: Oct 2012
Location: Big Apple
Posts: 52
Quote:
Originally Posted by DeltaLover
??
I think the disconnect is that you're thinking along the lines of building a black box?

I parse the necessary data and run it through my algos, which spit it out in a GUI. It's up to me (SME/handicapper) to sculpt the data. IMHO, handicapping is sometimes an art.
06-13-2017, 01:16 PM   #15
ReplayRandall
Buckle Up
Join Date: Apr 2014
Posts: 10,614
Quote:
Originally Posted by lamboy
I think the disconnect is that you're thinking along the lines of building a black box?

I parse the necessary data and run it through my algos, which spit it out in a GUI. It's up to me (SME/handicapper) to sculpt the data. IMHO, handicapping is sometimes an art.
That's how I see it as well, good point Phil. BTW, congrats on your 4th place finish at the Belmont Stakes Challenge, $45K+prize money+NHC seat....

All times are GMT -4.


Copyright 1999 - 2023 -- PaceAdvantage.Com -- All Rights Reserved