
Studying Past Results


traynor
06-12-2017, 09:03 PM
A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-machine-learning-paradox?imm_mid=0f2702&cmp=em-data-na-na-newsltr_20170607
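For anyone who wants to see that point in code, here is a minimal sketch using scikit-learn on synthetic data (nothing racing-specific is assumed): an unconstrained decision tree memorizes the training set, scoring 100% there, while its accuracy on the held-out test set drops.

# Minimal sketch of the train/test split idea from the article, using
# synthetic data (not race data) and scikit-learn. An unconstrained
# decision tree can memorize the training set (100% accuracy there)
# yet score noticeably worse on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42)   # no depth limit: free to memorize
model.fit(X_train, y_train)

print("training accuracy:", model.score(X_train, y_train))  # typically 1.00
print("test accuracy:    ", model.score(X_test, y_test))    # typically well below 1.00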

traynor
06-12-2017, 10:32 PM
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.
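To make the split concrete, here is a rough sketch in Python/pandas, assuming a hypothetical DataFrame named races with made-up columns 'fits_pattern', 'won', and 'payoff' (the $2 win-bet return, 0 on a loss); none of those names come from any particular program.

# A rough sketch of the training-set / control-set split described above.
# Patterns are "found" on the training half and their ROI is then checked
# on the untouched control half.
import pandas as pd

def roi(bets: pd.DataFrame) -> float:
    """Return on investment for flat $2 win bets on every row in `bets`."""
    cost = 2.0 * len(bets)
    returned = bets["payoff"].sum()
    return (returned - cost) / cost if cost else 0.0

def split_train_control(races: pd.DataFrame, frac: float = 0.5, seed: int = 1):
    """Randomly split races into a training set and a control (holdout) set."""
    train = races.sample(frac=frac, random_state=seed)
    control = races.drop(train.index)
    return train, control

# Usage sketch:
# train, control = split_train_control(races)
# pattern_train = train[train["fits_pattern"]]
# pattern_control = control[control["fits_pattern"]]
# print("ROI on training set:", roi(pattern_train))   # often looks dazzling
# print("ROI on control set: ", roi(pattern_control)) # the number that matters

Splitting chronologically (earlier races for training, later races for control) is often an even stricter check, since it mirrors how the patterns would actually be bet.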

DeltaLover
06-12-2017, 11:04 PM
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.

The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe should also be proportional to the number of features you pass in and to the depth of your networks.

whodoyoulike
06-12-2017, 11:33 PM
A good explanation of why handicappers "studying" their clumps of races so often go astray, and wind up chasing rainbows that don't exist in the real world. "Overfitting to a specific clump of races" should not be dismissed lightly.

"To train a machine learning system, you start with a lot of training data: millions of photos, for example. You divide that data into a training set and a test set. You use the training set to "train" the system so it can identify those images correctly. Then you use the test set to see how well the training works: how good is it at labeling a different set of images? The process is essentially the same whether you're dealing with images, voices, medical records, or something else. It's essentially the same whether you're using the coolest and trendiest deep learning algorithms, or whether you're using simple linear regression.

But there's a fundamental limit to this process, pointed out in Understanding Deep Learning Requires Rethinking Generalization. If you train your system so it's 100% accurate on the training set, it will always do poorly on the test set and on any real-world data. It doesn't matter how big (or small) the training set is, or how careful you are. 100% accuracy means that you've built a system that has memorized the training set, and such a system is unlikely to identify anything that it hasn't memorized."

https://www.oreilly.com/ideas/the-machine-learning-paradox?imm_mid=0f2702&cmp=em-data-na-na-newsltr_20170607

I have to ask you these questions.

1. What kind of computer system (PC??) do you think most people on here own?

and

2. What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.

traynor
06-13-2017, 12:37 AM
The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe should also be proportional to the number of features you pass in and to the depth of your networks.

Absolutely. If one is using standard PP data, finding stuff that everyone else misses or overlooks is almost impossible. Whatever one discovers is guaranteed to be found (or to have been found) by others.

One of the "depth" problems is that the more factors/attributes included, the more likely it is that others will be using the same factors/attributes (more or less in combination with other factors/attributes that one may or may not be using). It often seems that trying to include too many factors is a bigger problem than including too few. Fewer factors, better prices.

traynor
06-13-2017, 12:44 AM
I have to ask you these questions.

1. What kind of computer system (PC??) do you think most people on here own?

and

2. What kind of computer system do you own? Is it also a PC?

Then, maybe your recent posts would make some sense to me.


Plain vanilla, standard laptop and desktop. Nothing spectacular. Some of the most useful data mining apps (and processes) are well-suited to pretty basic computer hardware.

It is the approach to data analysis that is as important as (or more important than) any gee-whiz hardware or Big Data software.

barn32
06-13-2017, 12:52 AM
Well, so what? If you are using a piece of software (yours or someone else's) that "builds models" using all the races in a specific clump, or all the races that fit a specific set of filters, it might be wise to view the output with a healthy bit of skepticism. Especially those dazzling ROIs that never quite seem to work out when you bet on the recommended patterns.

It is relatively trivial to split data into training sets (to find the patterns) and control sets (to test the patterns). A jillion races is not necessary. Even if you are building models from a few hundred races, it might be much to your advantage to split it into training sets and control sets.

The biggest challenge lies in the way your training data are presented to your learning algorithm. Of course, the data transformation can itself require ML, so we can say that the process is recursive to some extent. The size of your training universe should also be proportional to the number of features you pass in and to the depth of your networks.

I still think you two guys are the same person.

lamboy
06-13-2017, 11:13 AM
ML is indeed difficult to apply to handicapping, especially since flow and trips need to be taken into account -- and these factors are so subjective. In other fields where ML systems are applied, the experts all say it takes SMEs to interpret the data.

In the end, imho, an ensemble of algorithms works OK, but more importantly a good visualization tool works best. After all--aren't the BRIS, Timeform and DRF PPs nothing more than data dashboards?

DeltaLover
06-13-2017, 11:45 AM
ML is indeed difficult to apply to handicapping, especially since flow and trips need to be taken into account -- and these factors are so subjective. In other fields where ML systems are applied, the experts all say it takes SMEs to interpret the data.

In the end, imho, an ensemble of algorithms works OK, but more importantly a good visualization tool works best. After all--aren't the BRIS, Timeform and DRF PPs nothing more than data dashboards?

The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with the representation of the primitive handicapping factors, along with the derived metrics and their behavior through time and across circuits.
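To make that concrete, here is a minimal sketch in Python/pandas of what such a representation can look like; the column names ('date', 'circuit', 'horse', 'speed_fig') are hypothetical, not taken from any particular data file.

# A minimal sketch of the representation problem: turning "primitive"
# factors into derived metrics that carry time and circuit context.
import pandas as pd

def add_derived_metrics(pps: pd.DataFrame) -> pd.DataFrame:
    pps = pps.sort_values("date").copy()
    # Horse-level trend: rolling mean of the last 3 speed figures,
    # shifted so today's race never sees its own result (no leakage).
    pps["speed_last3"] = (
        pps.groupby("horse")["speed_fig"]
           .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())
    )
    # Circuit context: expanding mean of figures earned at that circuit,
    # so the same raw figure is read relative to where it was earned.
    pps["circuit_mean"] = (
        pps.groupby("circuit")["speed_fig"]
           .transform(lambda s: s.shift(1).expanding().mean())
    )
    pps["speed_vs_circuit"] = pps["speed_fig"] - pps["circuit_mean"]
    return pps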

lamboy
06-13-2017, 12:16 PM
The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with the representation of the primitive handicapping factors, along with the derived metrics and their behavior through time and across circuits.

i use graph theory to represent the core handicapping factors which allows me to see the relationships between different circuits and classes of horses.

DeltaLover
06-13-2017, 12:24 PM
i use graph theory to represent the core handicapping factors which allows me to see the relationships between different circuits and classes of horses.

What you say here is not very descriptive, though. Answering questions like what you use as vertices and edges in your graph, how you calculate edge weights, and how you search the graph would clarify your statement.
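As an illustration of the level of detail I mean, here is a sketch of one possible (entirely hypothetical) encoding using networkx: each vertex is a (circuit, class) pair, each directed edge collects horses that moved from one to the other, weighted by how those movers fared.

# One possible, entirely hypothetical, graph encoding of circuit/class moves.
import networkx as nx

G = nx.DiGraph()

def record_move(g, from_node, to_node, won: bool):
    """Accumulate mover counts and wins on the edge between two (circuit, class) vertices."""
    if g.has_edge(from_node, to_node):
        g[from_node][to_node]["moves"] += 1
        g[from_node][to_node]["wins"] += int(won)
    else:
        g.add_edge(from_node, to_node, moves=1, wins=int(won))

# Hypothetical moves: a horse ships from a NYRA allowance to a Parx allowance and wins, etc.
record_move(G, ("NYRA", "ALW"), ("PRX", "ALW"), won=True)
record_move(G, ("NYRA", "ALW"), ("PRX", "ALW"), won=False)
record_move(G, ("GP", "MSW"), ("NYRA", "MSW"), won=True)

for u, v, data in G.edges(data=True):
    print(u, "->", v, "win rate of movers:", data["wins"] / data["moves"])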

lamboy
06-13-2017, 12:31 PM
What you say here is not very descriptive, though. Answering questions like what you use as vertices and edges in your graph, how you calculate edge weights, and how you search the graph would clarify your statement.

LOL, that's why i stress building a great visualization tool!!

DeltaLover
06-13-2017, 12:32 PM
LOL, that's why i stress building a great visualization tool!!

??

lamboy
06-13-2017, 12:50 PM
??

i think the disconnect is you're thinking along the lines of building a blackbox?

i parse the necessary data and run it through my algos which spit it out in a gui. it's up to me (SME/handicapper) to sculpt the data. imho, handicapping is sometimes an art.

ReplayRandall
06-13-2017, 02:16 PM
i think the disconnect is you're thinking along the lines of building a blackbox?

i parse the necessary data and run it through my algos which spit it out in a gui. it's up to me (SME/handicapper) to sculpt the data. imho, handicapping is sometimes an art.

That's how I see it as well, good point Phil. BTW, congrats on your 4th place finish at the Belmont Stakes Challenge, $45K+prize money+NHC seat....:ThmbUp:

lamboy
06-13-2017, 02:24 PM
That's how I see it as well, good point Phil. BTW, congrats on your 4th place finish at the Belmont Stakes Challenge, $45K+prize money+NHC seat....:ThmbUp:

lol, thanks Randall. Happy with the results, esp on the heels of the Monmouth tourney, but still need to grow a bigger set of balls to compete in these high roller tourneys. Had a 10k lead going into the finale and got passed like a broken-down claimer :rant:

traynor
06-13-2017, 08:03 PM
I think many of those who use computers to handicap races hope to beat the races (generically) rather than beat each specific race. The broad brush strokes derived from AI, ML, and NN tend to be useful generically, but not so useful when applied to individual races. Specifically, it is far easier to develop models that will predict X wins per however many races, with a probable (hopefully positive) ROI in some range, than it is to predict that Horse A will beat Horses B, C, D, E, F, and G in the next race.

Toward that end, qualitative (rather than quantitative) data (trip notes, observation, many others) is less useful because it is ambiguous and event-specific. Essential when betting individual races, not so much when betting a (relatively) high volume of races.

Different tactics for different strategies. The trick is not to confuse the strategies (or the tactics needed to implement the selected strategy).
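As a small sketch of that "X wins per however many races" framing, assuming a hypothetical record of flat $2 win bets with made-up columns 'won' and 'payoff':

# Summarize wins per N races and a rough range for the hit rate,
# rather than trying to call any single race.
import math
import pandas as pd

def volume_summary(bets: pd.DataFrame) -> dict:
    n = len(bets)
    wins = int(bets["won"].sum())
    hit = wins / n
    # Crude 95% binomial interval on the hit rate; it says nothing about any one race.
    se = math.sqrt(hit * (1 - hit) / n)
    cost = 2.0 * n
    roi = (bets["payoff"].sum() - cost) / cost
    return {
        "races_bet": n,
        "wins": wins,
        "hit_rate": round(hit, 3),
        "hit_rate_95pct": (round(hit - 1.96 * se, 3), round(hit + 1.96 * se, 3)),
        "roi": round(roi, 3),
    }

# Usage sketch:
# print(volume_summary(record_of_bets))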

Partsnut
06-18-2017, 09:48 AM
Plain vanilla, standard laptop and desktop. Nothing spectacular. Some of the most useful data mining apps (and processes) are well-suited to pretty basic computer hardware.

It is the approach to data analysis that is as important as (or more important than) any gee-whiz hardware or Big Data software.

Traynor, I totally agree with what you say.
Excellent thoughts and thinking.

I was never a proponent of data mining until recently.
I was a pace handicapper.

I use a basic and old XP laptop and have no problem using a data mining application that I find to be one of the best, or possibly the best, value out there.
I am not a programmer. I don't have to be. All the work, support and videos are there at my disposal.
I would prefer not to mention what I use in deference to the board's policies and advertisers.
For some, the learning curve is overwhelming and they just can't handle it.
However, for me, it works.
When I start a project that I believe in I try to make it work.
I am obsessive in that regard.

The game is about making money and not picking winners.

Just for fun, and without much value, let's use today's (06-18-17) Race 1 at Gulfstream as an example race.
I see the (3), which will probably go off as the favorite, finishing within the top 2 ....... 3,4,6.
Good luck and happy Father's Day. :bang:

zerosky
06-18-2017, 04:27 PM
The difficulty lies in the problem definition more than anything else. One of the core challenges has to do with the representation of the primitive handicapping factors along with the derived metrics and their through time and circuit behavior.

Good point. I think it was Einstein who said that he spent 80% of his time thinking up the question and only 20% finding an answer.

traynor
06-18-2017, 04:55 PM
Good point. I think it was Einstein who said that he spent 80% of his time thinking up the question and only 20% finding an answer.

With all due respect to both Einstein and Pareto, the percentages may be even more extreme.

"The Pareto principle (also known as the 80/20 rule, the law of the vital few, or the principle of factor sparsity) states that, for many events, roughly 80% of the effects come from 20% of the causes. ... Pareto developed both concepts in the context of the distribution of income and wealth among the population."

A carefully formulated question not only defines the problem, but (often clearly) indicates the direction in which an answer lies. That is rarely more evident than in spurious research (a label that applies to most "studies"), where the researcher(s) are assured of "discovering" the desired results by carefully formulating the research question. The problem is so pervasive that it slips under the cognitive radar of readers as if the questions actually make sense.

traynor
06-18-2017, 05:08 PM
Traynor, I totally agree with what you say.
Excellent thoughts and thinking.

I was never a proponent of data mining until recently.
I was a pace handicapper.

I use a basic and old XP laptop and have no problem using a data mining application that I find to be one of the best, or possibly the best, value out there.
I am not a programmer. I don't have to be. All the work, support and videos are there at my disposal.
I would prefer not to mention what I use in deference to the board's policies and advertisers.
For some, the learning curve is overwhelming and they just can't handle it.
However, for me, it works.
When I start a project that I believe in I try to make it work.
I am obsessive in that regard.

The game is about making money and not picking winners.

Just for fun, and without much value, let's use today's (06-18-17) Race 1 at Gulfstream as an example race.
I see the (3), which will probably go off as the favorite, finishing within the top 2 ....... 3,4,6.
Good luck and happy Father's Day. :bang:

Thank you for clearly stating points I have been trying to make for a long time. "Basic programming skills" doesn't necessarily mean learning all the syntax of a programming language to write your own software applications. It can mean something as simple as understanding what (readily available, free, very useful) existing data analysis or data mining apps are actually doing, and what steps are needed (before processing) to assure that the results are useful.

A good question for handicappers (especially pace handicappers) to ask is the most basic in business analysis, "Does this (stuff) actually mean what I/we/they think it means?" Very often the answer is "no."
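As one sketch of the kind of "before processing" steps I mean, assuming a hypothetical race DataFrame with a 'date' column (nothing here is tied to any particular data-mining app):

# Know what the data looks like before any app ever sees it.
import pandas as pd

def pre_checks(races: pd.DataFrame) -> pd.DataFrame:
    races = races.drop_duplicates()      # the same race loaded twice inflates results
    races = races.sort_values("date")    # keep time order so later splits are honest
    print("rows:", len(races))
    print("date range:", races["date"].min(), "to", races["date"].max())
    print("missing values per column:")
    print(races.isna().sum())
    return races

def chronological_split(races: pd.DataFrame, frac: float = 0.7):
    """Patterns found on earlier races, checked on later ones -- closer to how they would be bet."""
    cut = int(len(races) * frac)
    return races.iloc[:cut], races.iloc[cut:]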

DeltaLover
06-18-2017, 05:40 PM
Thank you for clearly stating points I have been trying to make for a long time. "Basic programming skills" doesn't necessarily mean learning all the syntax of a programming language to write your own software applications. It can mean something as simple as understanding what (readily available, free, very useful) existing data analysis or data mining apps are actually doing, and what steps are needed (before processing) to assure that the results are useful.

A good question for handicappers (especially pace handicappers) to ask is the most basic in business analysis, "Does this (stuff) actually mean what I/we/they think it means?" Very often the answer is "no."

Traynor, what you are describing here is far from simple and, in my opinion, more challenging than simply knowing how to be a programmer. ML and AI are branches of mathematics, and applying them successfully requires an understanding of things like calculus, linear algebra, Bayesian stats, etc., which are well above the layman's level. Understanding what data-mining apps are doing is by no means a simple task!

thaskalos
06-18-2017, 08:23 PM
Thank you for clearly stating points I have been trying to make for a long time. "Basic programming skills" doesn't necessarily mean learning all the syntax of a programming language to write your own software applications. It can mean something as simple as understanding what (readily available, free, very useful) existing data analysis or data mining apps are actually doing, and what steps are needed (before processing) to assure that the results are useful.

A good question for handicappers (especially pace handicappers) to ask is the most basic in business analysis, "Does this (stuff) actually mean what I/we/they think it means?" Very often the answer is "no."

I don't understand why this applies especially to "pace handicappers". It seems to me to be equally applicable to the entirety of LIFE.

traynor
06-18-2017, 11:23 PM
Traynor, what you are describing here is far from simple and, in my opinion, more challenging than simply knowing how to be a programmer. ML and AI are branches of mathematics, and applying them successfully requires an understanding of things like calculus, linear algebra, Bayesian stats, etc., which are well above the layman's level. Understanding what data-mining apps are doing is by no means a simple task!

It isn't necessary to know HOW ML, NN, and AI do what they do to understand WHAT they do. Most people are "mechanically incompetent" yet manage to drive various vehicles around with some degree of skill. Knowing the nuts-and-bolts details of a device (or software application) is not necessary to use it.

I agree wholeheartedly that the more one knows about the inner workings of a given process, the more competently one can utilize that process. However, for most uses involving horse racing, a basic understanding of the processes involved--especially what those processes can and cannot do--is well within the reach of most everyone.

traynor
06-18-2017, 11:26 PM
I don't understand why this applies especially to "pace handicappers". It seems to me to be equally applicable to the entirety of LIFE.

Perhaps even more than you realize.

lamboy
06-19-2017, 11:56 AM
Perhaps even more than you realize.

a good read is "Algorithms to Live By" by Christian and Griffiths

traynor
06-19-2017, 06:55 PM
a good read is "Algorithms to Live By" by Christian and Griffiths

Great recommendation! Thanks!
https://www.amazon.com/Algorithms-Live-Computer-Science-Decisions/dp/1627790365

You might also consider:


"Amazon Best Books of the Month, November 2011: Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely word pairs like "vomit and banana." System 1 and System 2, the fast and slow types of thinking, become characters that illustrate the psychology behind things we think we understand but really don't, such as intuition. Kahneman's transparent and careful treatment of his subject has the potential to change how we think, not just about thinking, but about how we live our lives. Thinking, Fast and Slow gives deep--and sometimes frightening--insight about what goes on inside our heads: the psychological basis for reactions, judgments, recognition, choices, conclusions, and much more."

“A tour de force. . . Kahneman's book is a must read for anyone interested in either human behavior or investing. He clearly shows that while we like to think of ourselves as rational in our decision making, the truth is we are subject to many biases. At least being aware of them will give you a better chance of avoiding them, or at least making fewer of them.” ―Larry Swedroe, CBS News

“Daniel Kahneman demonstrates forcefully in his new book, Thinking, Fast and Slow, how easy it is for humans to swerve away from rationality.” ―Christopher Shea, The Washington Post

https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555/ref=pd_sim_14_4/131-7153743-2001229?_encoding=UTF8&pd_rd_i=037453

DeltaLover
06-19-2017, 07:09 PM
Talking about books, one of the best I can suggest is:

http://www.paceadvantage.com/forum/attachment.php?attachmentid=20393&stc=1&d=1497910071

https://www.amazon.com/Machine-Learning-Optimization-Perspective-Developers/dp/0128015225/ref=sr_1_1/145-5718447-7876254?ie=UTF8&qid=1497910423&sr=8-1&keywords=machine+learning+sergios


I strongly suggest it to anyone interested in ML as it covers the topic pretty extensively and well.

lamboy
06-19-2017, 07:49 PM
thanks Traynor and DL -- read Think Fast a while back, will take a look at the ML title. always welcome good titles, esp since there haven't been any good hcappin books in a very long time.

traynor
06-19-2017, 08:20 PM
Talking about books, one of the best I can suggest is:

http://www.paceadvantage.com/forum/attachment.php?attachmentid=20393&stc=1&d=1497910071

https://www.amazon.com/Machine-Learning-Optimization-Perspective-Developers/dp/0128015225/ref=sr_1_1/145-5718447-7876254?ie=UTF8&qid=1497910423&sr=8-1&keywords=machine+learning+sergios


I strongly suggest it to anyone interested in ML as it covers the topic pretty extensively and well.

Great recommendation! Thanks!

Might be a bit over the head of the pencil-and-paper handicappers, but looks like a good time investment for anyone using a computer for data analysis.

DeltaLover
06-19-2017, 08:25 PM
thanks Traynor and DL -- read Think Fast a while back, will take a look at the ML title. always welcome good titles, esp since there haven't been any good hcappin books in a very long time.

Also, The Black Swan is a natural companion to TFAS.