Big Data... Thick Data - Horse Racing Forum - PaceAdvantage.Com

Dan Montilion · 07-23-2017, 05:11 PM

https://www.ted.com/talks/tricia_wan..._from_big_data

I found this to be a very informative handicapping presentation, albeit not about handicapping. I look forward to the thoughts of others, if it produces any. Hypothetically, and in the simplest of terms: big data shows a horse's best effort comes when run back in 28 days. Thick data notes the horse was entered on day 27 and the race did not fill, but it was written back on day 30 and filled.

Jeff P · 07-23-2017, 06:11 PM

Thanks for posting that. (I really enjoyed watching the presentation.)

Ok. Sticking with your days since last raced example...

Suppose, hypothetically, that big data suggests optimal returns occur (could be thousands of parimutuel tickets cashed or thousands of checks for purse money earned) when race day occurs on the 28th day after the most recent start.

It should be obvious that thick data -- if put in the right context -- has the ability to completely overrule whatever observations might have been gleaned from big data.

Big data example: You have a database and can generate large sample stats for horses returning off a 180 day layoff since their most recent start.

Thick data example: You have insight into what transpired during the layoff.

What if you are able to make the thick data observation that a specific horse was turned out to a private farm for six months? And was given steroids and worked vigorously on the half mile track there?

And shows up in the paddock today carrying muscle mass and confidence he didn't have before?

When you are able to connect the dots in a thick data way you'd be crazy to just blindly go with your big data model.

-jp

zerosky · 07-23-2017, 07:04 PM

Interesting lecture, I just wish they would stop using the term 'Big Data'; it's statistics!
I found some good insights on the following pages.
http://psychclassics.yorku.ca/topic.htm

Gamblor · 07-24-2017, 06:02 AM

So what's better? Big or thick? How about both big AND thick?

acorn54 · 07-24-2017, 07:39 AM

Quote:

Originally Posted by Jeff P

Thanks for posting that. (I really enjoyed watching the presentation.)

Ok. Sticking with your days since last raced example...

Suppose, hypothetically, that big data suggests optimal returns occur (could be thousands of parimutuel tickets cashed or thousands of checks for purse money earned) when race day occurs on the 28th day after the most recent start.

It should be obvious that thick data -- if put in the right context -- has the ability to completely overrule whatever observations might have been gleaned from big data.

Big data example: You have a database and can generate large sample stats for horses returning off a 180 day layoff since their most recent start.

Thick data example: You have insight into what transpired during the layoff.

What if you are able to make the thick data observation that a specific horse was turned out to a private farm for six months? And was given steroids and worked vigorously on the half mile track there?

And shows up in the paddock today carrying muscle mass and confidence he didn't have before?

When you are able to connect the dots in a thick data way you'd be crazy to just blindly go with your big data model.

-jp

I think the lecturer mentioned the fact that companies are going the way of the dodo bird by blindly following what the big data tells them.

Jeff P · 07-24-2017, 12:31 PM

That's exactly the point I was trying to make.

If a once lofty company like Nokia can fall off the face of the map because their managers chose to ignore thick data and were utterly blind to emerging trends in their market space:

What does that say about the horseplayer who ignores thick data?

Or for that matter -- What does that say about track management and horsemen who choose to ignore thick data?

See the horse racing slowly dying in SoCal thread.

-jp

ReplayRandall · 07-24-2017, 01:39 PM

Quote:

Originally Posted by Jeff P

What does that say about the horseplayer who ignores thick data?

Defining what "thick data" is to the horseplayer is a subjective undertaking, to say the least. As an example of thick data, what are the public's betting tendencies as we get towards the middle of the card at a specific track? At the beginning, at the end? Which pools are affected the most, so that value can be extracted? The least? How do you gather this info from the player's perspective? Is it based on whether a lot of chalk has been winning, medium prices or bombs as we go through the card, or on viewing yesterday's charts/replays? Or is it based on perceived biases on the dirt, turf, routes or sprints as the card progresses?.....The list is quite long and very subjective for establishing "what is good thick data", versus mediocre data.....Lastly, what percentage of "blend" do you give big data when combined with thick data for optimal results/profits?

Jeff P · 07-26-2017, 11:34 AM

Imo, valid questions -- every one of them.

But the last one is of particular interest to me:

Quote:

Originally Posted by ReplayRandall

.....Lastly, what percentage of "blend" do you give big data when combined with thick data for optimal results/profits?

You mentioned something that I think is a valid point:

Quote:

Originally Posted by ReplayRandall

.....The list is quite long and very subjective for establishing "what is good thick data", versus mediocre data.

One approach that seems to be working (for me) has been to get both big data and thick data into a data set.

And from there run a statistical analysis (mlr, tda, what have you) on the intersection of big data and thick data.

If you've made a valid thick data observation: Your stat analysis should suggest that incremental improvement can be had by adding a thick data observation to an existing big data model.
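A minimal sketch of that check, in Python with the standard library only. Everything in it is an assumption for illustration: the data are synthetic, `speed_fig` stands in for a big data factor, `paddock_flag` for a 0/1 thick data note, and a plain least-squares fit stands in for whatever analysis (mlr, tda, what have you) is actually used. The only point is the comparison: fit with and without the thick data column and see whether held-out error drops.

```python
import random

def fit_ols(rows, targets):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting, then clear this column from every other row
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def mse(rows, targets, coefs):
    """Mean squared prediction error of a fitted linear model."""
    errors = [t - sum(c * x for c, x in zip(coefs, r)) for r, t in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)

def make_synthetic(n, seed=1):
    """Synthetic starters: (speed_fig, paddock_flag, outcome). Not real race data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        speed_fig = rng.gauss(80, 10)                      # hypothetical big data factor
        paddock_flag = 1.0 if rng.random() < 0.3 else 0.0  # hypothetical thick data note
        outcome = 0.5 * speed_fig + 5.0 * paddock_flag + rng.gauss(0, 2)
        data.append((speed_fig, paddock_flag, outcome))
    return data

def compare_models(data):
    """Fit on the first half, score on the second half, with and without the flag."""
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def design(rows, use_flag):
        return [[1.0, s] + ([f] if use_flag else []) for s, f, _ in rows]

    y_train = [y for _, _, y in train]
    y_test = [y for _, _, y in test]
    base = fit_ols(design(train, False), y_train)
    thick = fit_ols(design(train, True), y_train)
    return (mse(design(test, False), y_test, base),
            mse(design(test, True), y_test, thick))

base_mse, thick_mse = compare_models(make_synthetic(400))
print(f"held-out MSE, big data only: {base_mse:.2f}")
print(f"held-out MSE, big + thick:   {thick_mse:.2f}")
```

Scoring on held-out rows matters here: adding any column lowers in-sample error, so only an out-of-sample drop suggests the thick data observation is carrying real information.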

-jp

ReplayRandall · 07-26-2017, 12:07 PM

Quote:

Originally Posted by Jeff P

Imo, valid questions -- every one of them.

But the last one is of particular interest to me:

One approach that seems to be working (for me) has been to get both big data and thick data in a data set.

And from there run a statistical analysis on the intersection of big data and thick data.

-jp

I use converging/intersection points which recur, as there is more than just one "intersection" in my analysis....BTW, each and every track has its own unique data stats and betting mentality (thick data), thus there is NO universal format that works across all venues/circuits.....Except for one, that works in tourneys only, which is what 3.5 years of hit and miss will finally get you, but the end result was worth the time invested.

DeltaLover · 07-26-2017, 12:26 PM

I cannot see how horse racing can be approached using big data. On the contrary, I think the related data do not qualify, either by size or by type. Using a single modern computer we can easily load millions of races in memory (covering many years' worth of complete data), represented in a structured format that can be processed as such.
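A rough back-of-envelope check of the in-memory claim, as a Python sketch. The per-race figures (8 starters, 50 numeric fields per starter, 8 bytes per field) are assumptions picked for illustration, not numbers from any real racing database.

```python
def estimate_memory_gib(n_races, starters_per_race=8, fields_per_starter=50,
                        bytes_per_field=8):
    """Approximate size of a dense numeric table of race data, in GiB.

    All default values are illustrative assumptions.
    """
    total_bytes = n_races * starters_per_race * fields_per_starter * bytes_per_field
    return total_bytes / 2**30

# One million races under the assumed layout: roughly 3 GiB
print(f"{estimate_memory_gib(1_000_000):.1f} GiB")
```

Under these assumptions a million races fit comfortably in the RAM of an ordinary desktop, which is consistent with the point that size alone does not push this data into big data territory.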

ReplayRandall · 07-26-2017, 12:44 PM

Quote:

Originally Posted by DeltaLover

I cannot see how horse racing can be approached using big data. On the contrary, I think the related data do not qualify, either by size or by type. Using a single modern computer we can easily load millions of races in memory (covering many years' worth of complete data), represented in a structured format that can be processed as such.

IMO simply stated, there is NO EDGE left in a structured formatted data process/analysis, it's been picked clean....You must go outside the box, using creative contrarian concepts to find an edge. There are exceptions, but the actual number of plays is so limited and subject to variance droughts, it's just not worth the time invested.

DeltaLover · 07-26-2017, 12:53 PM

Quote:

Originally Posted by ReplayRandall

IMO simply stated, there is NO EDGE left in a structured formatted data process/analysis, it's been picked clean....You must go outside the box, using creative contrarian concepts to find an edge. There are exceptions, but the actual number of plays is so limited and subject to variance droughts, it's just not worth the time invested.

What you are saying here is correct although I have the following questions:

- Why is it not possible to create "contrarian concepts" (I like the term!) based on the existing data? After all, these are the data that dictate the formation of the pools, and they must be responsible for the existence of betting inefficiencies.

- What is the source of the (potentially unstructured) data to use? Are they the product of web search (including social data like Twitter or FB, for example), or do they require custom collection, meaning dedicated on-site observers?

ReplayRandall · 07-26-2017, 01:16 PM

Quote:

Originally Posted by DeltaLover

What you are saying here is correct although I have the following questions:

- Why is it not possible to create "contrarian concepts" (I like the term!) based on the existing data? After all, these are the data that dictate the formation of the pools, and they must be responsible for the existence of betting inefficiencies.

- What is the source of the (potentially unstructured) data to use? Are they the product of web search (including social data like Twitter or FB, for example), or do they require custom collection, meaning dedicated on-site observers?

Here's an example of using big data sets at a slightly losing ROI of 93-95%. If this specific data set has consistently shown these numbers for the last 3 years, I look to see how they are doing after 100 plays. If they are severely under-performing, say at a 60% rate of return, I will have the confidence based on the data to bet these specific sets HARD, until they return close to their mean performance, like an under-valued stock that has a great balance sheet, good fundamentals, product line, but for some unknown reason has fallen out of favor with the market crowd......A contrarian concept using data which most operators throw away for lack of a +ROI, but is consistent at 93-95% as they come...$$$
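The trigger described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: flat $2 stakes, a hypothetical `should_press` helper, and made-up payout histories. It encodes the rule (long-run ROI holding at 93% or better, latest 100 plays at 60% or worse) and nothing more; whether returns then drift back toward the mean is the premise of the post, not something this code tests.

```python
def roi(payouts, stake=2.0):
    """Total returned divided by total wagered (1.0 means break even)."""
    return sum(payouts) / (stake * len(payouts))

def should_press(history, window=100, baseline_floor=0.93, trigger=0.60, stake=2.0):
    """True when the plays before the window held the baseline ROI
    and the most recent `window` plays fell to the trigger level or below."""
    if len(history) <= window:
        return False
    earlier, recent = history[:-window], history[-window:]
    return roi(earlier, stake) >= baseline_floor and roi(recent, stake) <= trigger

def plays(n, winners, payout=10.0):
    """Made-up history: `winners` plays paying `payout`, the rest paying nothing."""
    return [payout] * winners + [0.0] * (n - winners)

# 1000 earlier plays at a 0.94 ROI, then a cold streak of 100 plays at 0.55
cold = plays(1000, 188) + plays(100, 11)
# Same baseline, but the latest 100 plays ran at 0.95
normal = plays(1000, 188) + plays(100, 19)

print(should_press(cold))    # True
print(should_press(normal))  # False
```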

DeltaLover · 07-26-2017, 01:22 PM

Quote:

Originally Posted by ReplayRandall

Here's an example of using big data sets at a slightly losing ROI of 93-95%. If this specific data set has consistently shown these numbers for the last 3 years, I look to see how they are doing after 100 plays. If they are severely under-performing, say at a 60% rate of return, I will have the confidence based on the data to bet these specific sets HARD, until they return close to their mean performance, like an under-valued stock that has a great balance sheet, good fundamentals, product line, but for some unknown reason has fallen out of favor with the market crowd......A contrarian concept using data which most operators throw away for lack of a +ROI, but is consistent at 93-95% as they come...$$$

Great! This is the way to go. Still, this approach has nothing to do with big data, which is the theme of this thread. "Big data" is not (necessarily) about the absolute size of the data but about the processing methodology.

ReplayRandall · 07-26-2017, 01:26 PM

Quote:

Originally Posted by DeltaLover

Great! This is the way to go. Still, this approach has nothing to do with big data, which is the theme of this thread. "Big data" is not (necessarily) about the absolute size of the data but about the processing methodology.

I know, but this subject is basically dead to me, while thick data is still alive and well, and I thought I'd just give you something interesting to chew on..