Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board (http://www.paceadvantage.com/forum/index.php)
-   Handicapping Software (http://www.paceadvantage.com/forum/forumdisplay.php?f=3)
-   -   Studying Past Results (http://www.paceadvantage.com/forum/showthread.php?t=139031)

traynor 06-12-2017 08:03 PM

Studying Past Results
 
A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-ma...wsltr_20170607
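
For readers who want to see the idea in code, here is a minimal sketch of the split-and-test process the article describes, assuming Python with scikit-learn; the file name and factor columns are hypothetical placeholders, not anyone's actual data.

Code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# "races.csv" and the column names below are hypothetical placeholders.
races = pd.read_csv("races.csv")                          # one row per starter
X = races[["speed_fig", "pace_fig", "class_rating"]]      # whatever factors you use
y = races["won"]                                          # 1 if the horse won, else 0

# Hold out a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The gap between these two numbers is what overfitting looks like:
# near-perfect on the races the model has seen, much worse on the ones it hasn't.
print("training accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy:    ", accuracy_score(y_test, model.predict(X_test)))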

traynor 06-12-2017 09:32 PM

Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.
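
A rough sketch of that kind of split, assuming Python with pandas. Splitting by race keeps every starter from the same race on one side of the split; the file and column names are made up for illustration.

Code:

import pandas as pd

# One row per horse per race; "race_id" and the other columns are hypothetical.
starters = pd.read_csv("starters.csv")

# Shuffle the race ids, then take 70% of races for training and hold the rest back.
race_ids = starters["race_id"].drop_duplicates().sample(frac=1, random_state=7)
train_ids = set(race_ids.iloc[:int(len(race_ids) * 0.7)])

training = starters[starters["race_id"].isin(train_ids)]    # find patterns here
control  = starters[~starters["race_id"].isin(train_ids)]   # test them here, once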

DeltaLover 06-12-2017 10:04 PM

Quote:

Originally Posted by traynor (Post 2184165)
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.

The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course the data transformation can itself require ML, so we can say the process is recursive to some extent. The size of your training universe needs to grow in proportion to the number of features you are going to pass in, as well as the depth of your network.
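
As one illustration of that "presentation" step, here is a hedged sketch of turning raw factors into derived features before they reach the learner, assuming Python with scikit-learn; the raw columns and derived features are hypothetical examples.

Code:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Raw columns and derived features below are hypothetical examples.
starters = pd.read_csv("starters.csv")

# Derived features: the learner sees these transformed values,
# not the raw past-performance lines.
starters["pace_vs_par"] = starters["pace_fig"] - starters["track_pace_par"]
starters["days_off"] = starters["days_since_last_race"].clip(upper=90)

X = starters[["pace_vs_par", "days_off", "class_rating"]]
y = starters["won"]

# Scaling is part of the "presentation": keeping it inside a pipeline means
# the same transformation gets applied to training and unseen data alike.
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression(max_iter=1000))]).fit(X, y)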

whodoyoulike 06-12-2017 10:33 PM

Quote:

Originally Posted by traynor (Post 2184134)
A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-ma...wsltr_20170607

I have to ask you these questions.

1. What kind of computer system (PC??) do you think most people on here own?

and

2. What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.

traynor 06-12-2017 11:37 PM

Quote:

Originally Posted by DeltaLover (Post 2184183)
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course the data transformation can itself require ML, so we can say the process is recursive to some extent. The size of your training universe needs to grow in proportion to the number of features you are going to pass in, as well as the depth of your network.

Absolutely. If one is using standard PP data, finding stuff that everyone else misses or overlooks is almost impossible. Whatever one discovers is guaranteed to be found (or to have been found) by others.

One of the "depth" problems is that the more factors/attributes included, the more likely it is that others will be using the same factors/attributes (more or less in combination with other factors/attributes that one may or may not be using). It often seems that trying to include too many factors is a bigger problem than including too few. Fewer factors, better prices.

traynor 06-12-2017 11:44 PM

Quote:

Originally Posted by whodoyoulike (Post 2184194)
I have to ask you these questions.

1. What kind of computer system (PC??) do you think most people on here own?

and

2. What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.


Plain vanilla, standard laptop and desktop. Nothing spectacular. Some of the most useful data mining apps (and processes) are well-suited to pretty basic computer hardware.

It is the approach to data analysis that is as important as (or more important than) any gee-whiz hardware or Big Data software.

barn32 06-12-2017 11:52 PM

Quote:

Originally Posted by traynor (Post 2184165)
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.

Quote:

Originally Posted by DeltaLover (Post 2184183)
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course the data transformation can itself require ML, so we can say the process is recursive to some extent. The size of your training universe needs to grow in proportion to the number of features you are going to pass in, as well as the depth of your network.

I still think you two guys are the same person.

lamboy 06-13-2017 10:13 AM

ML is indeed difficult to apply to handicapping, especially since race flow and trips need to be taken into account -- and these factors are so subjective. In other fields where ML systems are applied, the experts all say it takes SMEs to interpret the data.

In the end, imho, an ensemble of algorithms works OK, but more importantly a good visualization tool works best. After all -- aren't the BRIS, Timeform, and DRF PPs nothing more than data dashboards?
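
A minimal sketch of an ensemble of algorithms in that spirit, assuming Python with scikit-learn; the estimators and inputs are placeholders, not a description of any poster's actual setup.

Code:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# X_train / y_train would be your factor matrix and win/loss labels;
# nothing here is specific to how any poster actually builds his models.
ensemble = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=25)),
    ],
    voting="soft",   # average the predicted win probabilities across the models
)
# ensemble.fit(X_train, y_train)
# win_prob = ensemble.predict_proba(X_test)[:, 1]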

DeltaLover 06-13-2017 10:45 AM

Quote:

Originally Posted by lamboy (Post 2184277)
ML is indeed difficult to apply to handicapping, especially since race flow and trips need to be taken into account -- and these factors are so subjective. In other fields where ML systems are applied, the experts all say it takes SMEs to interpret the data.

In the end, imho, an ensemble of algorithms works OK, but more importantly a good visualization tool works best. After all -- aren't the BRIS, Timeform, and DRF PPs nothing more than data dashboards?

The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with the representation of the primitive handicapping factors, along with the derived metrics and their behavior across time and circuits.

lamboy 06-13-2017 11:16 AM

Quote:

Originally Posted by DeltaLover (Post 2184291)
The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with the representation of the primitive handicapping factors, along with the derived metrics and their behavior across time and circuits.

I use graph theory to represent the core handicapping factors, which allows me to see the relationships between different circuits and classes of horses.

DeltaLover 06-13-2017 11:24 AM

Quote:

Originally Posted by lamboy (Post 2184303)
I use graph theory to represent the core handicapping factors, which allows me to see the relationships between different circuits and classes of horses.

What you say here is not very descriptive, though. Answering questions like what you use as vertices and edges in your graph, how you calculate the edge weights, and how you search the graph would clarify your statement.
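
Purely as an illustration of the kind of answer being asked for here, one hypothetical way such a graph might be set up, assuming Python with networkx; the vertices, edges, and weights are invented and are not a description of lamboy's actual method.

Code:

import networkx as nx

G = nx.Graph()

# One possible choice of vertices: (circuit, class level) pairs.
# One possible choice of edges: horses that ran at both endpoints, with the
# weight reflecting how those shippers / class-movers fared. All values invented.
G.add_edge(("SoCal", "Alw"), ("NYRA", "Alw"), weight=0.62)
G.add_edge(("NYRA", "Alw"), ("NYRA", "G3"), weight=0.41)
G.add_edge(("SoCal", "Alw"), ("NYRA", "G3"), weight=1.50)

# "Searching the graph" could then mean finding the cheapest path between spots.
path = nx.shortest_path(G, ("SoCal", "Alw"), ("NYRA", "G3"), weight="weight")
print(path)   # [('SoCal', 'Alw'), ('NYRA', 'Alw'), ('NYRA', 'G3')]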

lamboy 06-13-2017 11:31 AM

Quote:

Originally Posted by DeltaLover (Post 2184307)
What you say here is not very descriptive, though. Answering questions like what you use as vertices and edges in your graph, how you calculate the edge weights, and how you search the graph would clarify your statement.

LOL, that's why I stress building a great visualization tool!!

DeltaLover 06-13-2017 11:32 AM

Quote:

Originally Posted by lamboy (Post 2184315)
LOL, that's why I stress building a great visualization tool!!

??

lamboy 06-13-2017 11:50 AM

Quote:

Originally Posted by DeltaLover (Post 2184316)
??

I think the disconnect is that you're thinking along the lines of building a black box?

I parse the necessary data and run it through my algos, which spit the results out into a GUI. It's up to me (SME/handicapper) to sculpt the data. Imho, handicapping is sometimes an art.

ReplayRandall 06-13-2017 01:16 PM

Quote:

Originally Posted by lamboy (Post 2184321)
I think the disconnect is that you're thinking along the lines of building a black box?

I parse the necessary data and run it through my algos, which spit the results out into a GUI. It's up to me (SME/handicapper) to sculpt the data. Imho, handicapping is sometimes an art.

That's how I see it as well; good point, Phil. BTW, congrats on your 4th-place finish in the Belmont Stakes Challenge: $45K+ in prize money plus an NHC seat.

