Quote:
Originally Posted by Cratos
Statistics are just data; statistical analysis is another story, and I believe that is what the poster, “Traynor,” was referring to.
Statistical analysis involves both science and math, and when properly applied it is the most powerful tool known to man for reaching a probable conclusion about the future, or about why something happened in the past.
In statistical analysis it is not just sample size that matters, although that is a significant parameter; it is also the data type, the domain, and the distribution.
In the Bayesian framework, which is useful for horse race handicapping, we define probability distributions not only over the data and world state, but also over the parameters of those distributions.
The Bernoulli, categorical, univariate normal, and multivariate normal distributions describe the probabilities of the data and world state.
Yes, “Horse racing is a complex and fascinating game,” but it is no more complex than medical science, aerospace, telecommunications, or many other endeavors that require statistical analysis.
|
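The Bayesian point in the quote, putting a distribution over a parameter and not just over the data, can be sketched for a single win probability. This is a minimal sketch, assuming wins are modeled as Bernoulli trials with a Beta prior on the win rate (the standard conjugate pairing). The `BetaPrior` class and the prior and sample numbers are hypothetical illustrations, not anything from Cratos's own method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) distribution over an unknown win probability p."""
    alpha: float
    beta: float

    def update(self, wins: int, starts: int) -> "BetaPrior":
        # Conjugate Beta-Bernoulli update: each win adds 1 to alpha,
        # each loss adds 1 to beta.
        if wins < 0 or starts < wins:
            raise ValueError("need 0 <= wins <= starts")
        return BetaPrior(self.alpha + wins, self.beta + (starts - wins))

    def mean(self) -> float:
        # Expected value of p under this Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Spread of our belief about p; shrinks as more starts are observed.
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical numbers: a prior centred on a 10% win rate,
# then 12 wins observed in 80 starts for some angle.
prior = BetaPrior(1.0, 9.0)
posterior = prior.update(wins=12, starts=80)
print(posterior, round(posterior.mean(), 4))
```

The raw strike rate here is 12/80 = 0.15, while the posterior mean is 13/90, about 0.144: the prior pulls a small sample back toward the baseline, and that pull weakens as the sample grows.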
I think the above quote contains a lot of insight.
Speaking from personal experience here - when setting up a database - sometimes it is all too easy to end up ignoring the bolded parts of the above quote.
About three weeks ago I created a four-year database consisting of Saratoga only - so that I could run stats for the current issue of the HANA Monthly.
The domain of that database differs quite a bit from that of the databases I've been using for live play the past couple of years.
The domain of my regular databases has (mostly) been a rolling 12-15 months of all tracks everywhere - the intent being to help me identify newly overlooked areas in the odds, be one of the first to exploit them, and be one of the first to bail when it becomes clear everybody else is catching on.
And while that approach has mostly worked for me - it does have some built-in weaknesses.
For example, the way those databases are set up means that I don't get to see multi-year trends - unless I go out of my way and take the extra time to look at data from older databases.
As soon as I started thinking about the Saratoga stats I had compiled - it hit me that here I was looking at multi-year trends (something I had gotten away from in my regular databases).
It also hit me that the way I've been using my regular databases has mostly had me ignoring one of the basics - a concept called a study domain, which I first ran into in a stat class I took in college.
After reading your post I made an effort to create a handful of multi-year domains - and it started paying dividends almost immediately.
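A study domain, in the sense used above, is simply the rule deciding which records go into an analysis. This is a minimal sketch, assuming each start is stored with a track, a race date, and a win flag; the `Start` and `StudyDomain` types, the `win_rate_by_year` helper, and the sample records are all hypothetical. It contrasts a multi-year single-track domain with a one-year all-tracks domain over the same data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Start:
    track: str
    race_date: date
    won: bool


@dataclass(frozen=True)
class StudyDomain:
    """Hypothetical study domain: a date window plus an optional track set."""
    start: date
    end: date
    tracks: Optional[frozenset] = None  # None means all tracks

    def contains(self, s: Start) -> bool:
        in_window = self.start <= s.race_date <= self.end
        in_tracks = self.tracks is None or s.track in self.tracks
        return in_window and in_tracks


def win_rate_by_year(starts: Iterable[Start], domain: StudyDomain) -> dict:
    # Tally [wins, starts] per calendar year, counting only records in the domain.
    tally = defaultdict(lambda: [0, 0])
    for s in starts:
        if domain.contains(s):
            t = tally[s.race_date.year]
            t[0] += int(s.won)
            t[1] += 1
    return {year: wins / n for year, (wins, n) in sorted(tally.items())}


# Made-up sample records, for illustration only.
starts = [
    Start("Saratoga", date(2010, 8, 1), True),
    Start("Saratoga", date(2010, 8, 2), False),
    Start("Saratoga", date(2011, 8, 1), False),
    Start("Saratoga", date(2011, 8, 5), False),
    Start("Saratoga", date(2012, 8, 3), True),
    Start("Belmont", date(2012, 6, 1), True),
]

multi_year = StudyDomain(date(2010, 1, 1), date(2013, 12, 31), frozenset({"Saratoga"}))
one_year_all = StudyDomain(date(2012, 1, 1), date(2012, 12, 31))
print(win_rate_by_year(starts, multi_year))
print(win_rate_by_year(starts, one_year_all))
```

The one-year, all-tracks view only ever shows a single year, so a year-over-year swing like the one in the multi-year Saratoga view never shows up in it.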
You never know when something you read on this site is going to be useful.
Just wanted to say thanks,
-jp