PDA

View Full Version : Big Data... Thick Data


Dan Montilion
07-23-2017, 05:11 PM
https://www.ted.com/talks/tricia_wang_the_human_insights_missing_from_big_data

I found this to be a very informative handicapping presentation, albeit not about handicapping. I look forward to the thoughts of others, if it produces any. Hypothetically, and in the simplest of terms: big data shows best effort when a horse is run back in 28 days. Thick data notes the horse was entered on day 27 and the race did not fill, but it is written back on day 30 and fills.

Jeff P
07-23-2017, 06:11 PM
Thanks for posting that. (I really enjoyed watching the presentation.)

Ok. Sticking with your days since last raced example...

Suppose, hypothetically, that big data suggests optimal returns occur (could be thousands of parimutuel tickets cashed or thousands of checks for purse money earned) when race day occurs on the 28th day after the most recent start.

It should be obvious that thick data -- if put in the right context -- has the ability to completely overrule whatever observations might have been gleaned from big data.

Big data example: You have a database and can generate large sample stats for horses returning off a 180 day layoff since their most recent start.

Thick data example: You have insight into what transpired during the layoff.

What if you are able to make the thick data observation that a specific horse was turned out to a private farm for six months? And was given steroids and worked vigorously on the half mile track there?

And shows up in the paddock today carrying muscle mass and confidence he didn't have before?

When you are able to connect the dots in a thick data way you'd be crazy to just blindly go with your big data model.
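The big-data half of the example above amounts to bucketing results by days since the last start and looking at the return per bucket. A minimal sketch (all figures invented, not real race data):

```python
# Rough sketch of the big-data side: group starters by days since their
# most recent start and compute return per dollar for each bucket.
from collections import defaultdict

# (days_since_last_start, payoff_per_dollar_wagered) -- invented records
starters = [(28, 2.1), (28, 0.0), (28, 3.4), (180, 0.0),
            (180, 0.0), (180, 6.0), (45, 0.0), (45, 1.6)]

buckets = defaultdict(list)
for days, payoff in starters:
    buckets[days].append(payoff)

for days in sorted(buckets):
    plays = buckets[days]
    print(f"{days:>3} days: {len(plays)} plays, ROI = {sum(plays) / len(plays):.2f}")
```

The thick-data observation (the private-farm layoff story) is exactly what a table like this cannot see: every 180-day horse lands in the same bucket regardless of what happened during the layoff.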


-jp

.

zerosky
07-23-2017, 07:04 PM
Interesting lecture. I just wish they would stop using the term 'Big Data' -- it's statistics!
I found some good insights on the following pages.
http://psychclassics.yorku.ca/topic.htm

Gamblor
07-24-2017, 06:02 AM
So what's better? Big or thick? How about both big AND thick?

acorn54
07-24-2017, 07:39 AM
Thanks for posting that. (I really enjoyed watching the presentation.)

Ok. Sticking with your days since last raced example...

Suppose, hypothetically, that big data suggests optimal returns occur (could be thousands of parimutuel tickets cashed or thousands of checks for purse money earned) when race day occurs on the 28th day after the most recent start.

It should be obvious that thick data -- if put in the right context -- has the ability to completely overrule whatever observations might have been gleaned from big data.

Big data example: You have a database and can generate large sample stats for horses returning off a 180 day layoff since their most recent start.

Thick data example: You have insight into what transpired during the layoff.

What if you are able to make the thick data observation that a specific horse was turned out to a private farm for six months? And was given steroids and worked vigorously on the half mile track there?

And shows up in the paddock today carrying muscle mass and confidence he didn't have before?

When you are able to connect the dots in a thick data way you'd be crazy to just blindly go with your big data model.


-jp

.

I think the lecturer mentioned the fact that companies are going the way of the dodo bird by blindly following what the big data tells them.

Jeff P
07-24-2017, 12:31 PM
That's exactly the point I was trying to make.

If a once lofty company like Nokia can fall off the face of the map because their managers chose to ignore thick data and were utterly blind to emerging trends in their market space:

What does that say about the horseplayer who ignores thick data?

Or for that matter -- What does that say about track management and horsemen who choose to ignore thick data?

See the horse racing slowly dying in SoCal (http://www.paceadvantage.com/forum/showthread.php?t=139761) thread.


-jp

.

ReplayRandall
07-24-2017, 01:39 PM
What does that say about the horseplayer who ignores thick data?

Defining what "thick data" is to the horseplayer is a subjective undertaking, to say the least. As an example of thick data: what are the public's betting tendencies as we get towards the middle of the card at a specific track? At the beginning, at the end? Which pools are affected the most to extract value? The least? How do you gather this info from the player's perspective? Is it based on whether a lot of chalk has been winning, medium prices, or bombs as we go through the card, or on viewing yesterday's charts/replays? Or is it based on perceived biases on the dirt, turf, routes, or sprints as the card progresses?.....The list is quite long and very subjective for establishing "what is good thick data" versus mediocre data.....Lastly, what percentage of "blend" do you give big data when combined with thick data for optimal results/profits?

Jeff P
07-26-2017, 11:34 AM
Imo, valid questions -- every one of them.

But the last one is of particular interest to me:

.....Lastly, what percentage of "blend" do you give big data when combined with thick data for optimal results/profits?

You mentioned something that I think is a valid point:

.....The list is quite long and very subjective for establishing "what is good thick data", versus mediocre data.

One approach that seems to be working (for me) has been to get both big data and thick data into a data set.

And from there run a statistical analysis (mlr, tda, what have you) on the intersection of big data and thick data.

If you've made a valid thick data observation: Your stat analysis should suggest that incremental improvement can be had by adding a thick data observation to an existing big data model.
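That "incremental improvement" test can be sketched very simply: score a big-data angle on its own, then score the subset where the thick-data observation also holds, and see whether the thick-data flag consistently moves the needle. All records below are invented for illustration:

```python
# Hypothetical sketch of the "blend" idea: does a thick-data flag add
# incremental value on top of a big-data angle? Data are made up.

# Each record: (big_data_angle_hit, thick_data_flag, payoff_per_dollar)
bets = [
    (True,  True,  2.40), (True,  False, 0.00), (True,  True,  0.00),
    (True,  False, 0.00), (True,  True,  3.10), (True,  False, 1.80),
    (True,  True,  2.10), (True,  False, 0.00), (True,  True,  0.00),
    (True,  False, 0.00),
]

def roi(records):
    """Net return per dollar wagered across flat $1 plays (1.0 = break even)."""
    if not records:
        return 0.0
    return sum(payoff for _, _, payoff in records) / len(records)

base = [b for b in bets if b[0]]        # big-data angle alone
blended = [b for b in base if b[1]]     # big-data angle + thick-data flag

print(f"angle alone:        ROI = {roi(base):.2f}")
print(f"angle + thick flag: ROI = {roi(blended):.2f}")
```

If the thick-data observation is valid, the blended ROI should beat the base ROI across large samples, not just in a toy set like this; the same comparison generalizes to adding the flag as a feature in a regression.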


-jp

.

ReplayRandall
07-26-2017, 12:07 PM
Imo, valid questions -- every one of them.

But the last one is of particular interest to me:

One approach that seems to be working (for me) has been to get both big data and thick data in a data set.

And from there run a statistical analysis on the intersection of big data and thick data.



-jp

.

I use converging/intersection points which recur, as there is more than just one "intersection" in my analysis....BTW, each and every track has its own unique data stats and betting mentality (thick data), thus there is NO universal format that works across all venues/circuits.....Except for one, which works in tourneys only -- which is what 3.5 years of hit and miss will finally get you, but the end result was worth the time invested.

DeltaLover
07-26-2017, 12:26 PM
I cannot see how horse racing can be approached using big data. On the contrary, I think the related data do not qualify either by size or by type. Using a single modern computer, we can easily load millions of races into memory (covering many years' worth of complete data), represented in a structured format that can be processed as such.

ReplayRandall
07-26-2017, 12:44 PM
I cannot see how horse racing can be approached using big data. On the contrary, I think the related data do not qualify either by size or by type. Using a single modern computer, we can easily load millions of races into memory (covering many years' worth of complete data), represented in a structured format that can be processed as such.

IMO, simply stated, there is NO EDGE left in a structured, formatted data process/analysis; it's been picked clean....You must go outside the box, using creative contrarian concepts to find an edge. There are exceptions, but the actual number of plays is so limited and subject to variance droughts that it's just not worth the time invested.

DeltaLover
07-26-2017, 12:53 PM
IMO, simply stated, there is NO EDGE left in a structured, formatted data process/analysis; it's been picked clean....You must go outside the box, using creative contrarian concepts to find an edge. There are exceptions, but the actual number of plays is so limited and subject to variance droughts that it's just not worth the time invested.

What you are saying here is correct although I have the following questions:

- Why is it not possible to create "contrarian concepts" (I like the term!) based on the existing data? After all, these are the data that dictate the formation of the pools, and they must be responsible for the existence of betting inefficiencies.

- What is the source of the (potentially unstructured) data to use? Are they the product of web searches (including social data like Twitter or FB, for example), or do they require custom collection, meaning dedicated on-site observers?

ReplayRandall
07-26-2017, 01:16 PM
What you are saying here is correct although I have the following questions:

- Why is it not possible to create "contrarian concepts" (I like the term!) based on the existing data? After all, these are the data that dictate the formation of the pools, and they must be responsible for the existence of betting inefficiencies.

- What is the source of the (potentially unstructured) data to use? Are they the product of web searches (including social data like Twitter or FB, for example), or do they require custom collection, meaning dedicated on-site observers?

Here's an example of using big data sets at a slightly losing ROI of 93-95%. If a specific data set has consistently shown these numbers for the last 3 years, I look to see how they are doing after 100 plays. If they are severely under-performing, say at a 60% rate of return, I will have the confidence, based on the data, to bet these specific sets HARD until they return close to their mean performance, like an under-valued stock that has a great balance sheet, good fundamentals, and a solid product line, but for some unknown reason has fallen out of favor with the market crowd......A contrarian concept using data which most operators throw away for lack of a +ROI, but which is as consistent at 93-95% as they come...$$$
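The mean-reversion trigger described above could be sketched like this. The baseline ROI, window size, and trigger threshold are taken from the numbers in the post; the sample payoffs are invented:

```python
# Hedged sketch of the contrarian idea: a data set with a documented
# long-run ROI near 0.94 gets bet hard when its recent sample falls far
# below that baseline. All figures here are illustrative only.

LONG_RUN_ROI = 0.94   # the "consistent 93-95%" documented over 3 years
WINDOW = 100          # "I look to see how they are doing after 100 plays"
TRIGGER = 0.60        # "severely under-performing, say at a 60% rate"

def recent_roi(payoffs, window=WINDOW):
    """ROI over the most recent `window` flat $1 plays."""
    recent = payoffs[-window:]
    return sum(recent) / len(recent)

def bet_hard(payoffs):
    """True when the recent sample sits well below its long-run mean."""
    return recent_roi(payoffs) <= TRIGGER

# 100 plays: 90 losers plus a few small scores -> recent ROI = 0.55
cold_stretch = [0.0] * 90 + [5.5] * 10
print(bet_hard(cold_stretch))   # trigger fires; bet hard until reversion
```

The implicit assumption, as with the under-valued stock analogy, is that the documented long-run ROI is stable and the recent drought is variance rather than a real change in the data set.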

DeltaLover
07-26-2017, 01:22 PM
Here's an example of using big data sets at a slightly losing ROI of 93-95%. If a specific data set has consistently shown these numbers for the last 3 years, I look to see how they are doing after 100 plays. If they are severely under-performing, say at a 60% rate of return, I will have the confidence, based on the data, to bet these specific sets HARD until they return close to their mean performance, like an under-valued stock that has a great balance sheet, good fundamentals, and a solid product line, but for some unknown reason has fallen out of favor with the market crowd......A contrarian concept using data which most operators throw away for lack of a +ROI, but which is as consistent at 93-95% as they come...$$$

Great! This is the way to go. Still, this approach has nothing to do with big data, which is the theme of this thread. "Big data" is not (necessarily) about the absolute size of the data but about the processing methodology.

ReplayRandall
07-26-2017, 01:26 PM
Great! This is the way to go. Still, this approach has nothing to do with big data, which is the theme of this thread. "Big data" is not (necessarily) about the absolute size of the data but about the processing methodology.

I know, but this subject is basically dead to me, while thick data is still alive and well, and I thought I'd just give you something interesting to chew on..;)

Jeff P
07-26-2017, 01:33 PM
DeltaLover, I think the context I had in mind when I used the term big data and the context you have in mind for big data are likely two very different things.

The context I had in mind when I used the term big data was along the lines of:

a. Several years of horse racing data sitting in a table.

b. Approximately 50 percent of the columns in the table designed to store basic data collected by Equibase... date, track, race, saddlecloth, horsename, rider, trainer, owner, breeder, odds, finish position, parimutuel payoffs, positional calls, beaten lengths, surface, distance, track condition, class level, purse, etc.

c. Approximately 25 percent of the columns in the table designed to store derived data that was generated by the data reseller... for example a rating like Brisnet's Prime Power or HDW's PSR, early pace figs, late pace figs, class ratings, pars, rider stats, trainer stats, sire stats, workout history, data from past running lines, etc.

d. Approximately 25 percent of the columns in the table designed to store custom data points derived by the player based on statistical analysis of items a, b, and c (above.)

I've found that implementing a few of my own thick data observations into item d (above) goes a long way.
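A minimal sketch of one row in a table laid out along the lines of items a-d. The field names here are hypothetical stand-ins, not actual column names from Jeff's database:

```python
# Hypothetical row layout: (b) basic Equibase-style data, (c) reseller-
# derived ratings, (d) the player's own derived points, including thick-
# data observations encoded as queryable columns.
from dataclasses import dataclass

@dataclass
class StarterRow:
    # (b) basic data collected by Equibase
    race_date: str
    track: str
    horse_name: str
    finish_pos: int
    odds: float
    # (c) reseller-derived data (e.g. a Prime Power / PSR style rating)
    power_rating: float
    early_pace_fig: int
    # (d) player-derived points; this is where a thick-data observation
    #     like "worked vigorously at a private farm during the layoff"
    #     becomes a column the stat analysis can see
    layoff_days: int
    thick_private_farm_work: bool

row = StarterRow("2017-07-23", "DMR", "Hypothetical Hero", 1, 4.20,
                 112.5, 88, 182, True)
print(row.thick_private_farm_work)
```

Once the observation lives in a column like this, checking whether it adds incremental value to the big-data model is an ordinary query or regression over item d against items a-c.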

Admittedly, that's a very different thing than something like Google's DeepMind project.



-jp

.

DeltaLover
07-26-2017, 01:50 PM
DeltaLover, I think the context I had in mind when I used the term big data and the context you have in mind for big data are likely two very different things.

The context I had in mind when I used the term big data was along the lines of:

a. Several years of horse racing data sitting in a table.

b. Approximately 50 percent of the columns in the table designed to store basic data collected by Equibase... date, track, race, saddlecloth, horsename, rider, trainer, owner, breeder, odds, finish position, parimutuel payoffs, positional calls, beaten lengths, surface, distance, track condition, class level, purse, etc.

c. Approximately 25 percent of the columns in the table designed to store derived data that was generated by the data reseller... for example a rating like Brisnet's Prime Power or HDW's PSR, early pace figs, late pace figs, class ratings, pars, rider stats, trainer stats, sire stats, workout history, data from past running lines, etc.

d. Approximately 25 percent of the columns in the table designed to store custom data points derived by the player based on statistical analysis of items a, b, and c (above.)

I've found that implementing a few of my own thick data observations into item d (above) goes a long way.

Admittedly, that's a very different thing than something like Google's DeepMind project.



-jp

.

Sure, I completely understand what you mean here, Jeff, albeit the data you describe are not Big Data, at least not what that term has come to mean during the last decade, and this is what confused me when reading the thread. (Of course this is a matter of definition that does not really matter once we agree upon what we mean by it! :))