Question for Database Dudes [Archive] - Horse Racing Forum - PaceAdvantage.Com

Handiman

12-21-2011, 12:32 AM

As I have posted before, I have never been a database guy. But now that I get JCapper files, I am building a database without even intending to. So here is my question.

When, as a rule, does data outlive its relevance? How far back can you go before the data is no longer viable? Or does it always stay viable as part of the law of large numbers?

Are races run 5 years ago, and their results, still significant? If so, how much weight should they carry relative to races and results from this past year?

I ask these questions out of complete ignorance in relation to database handicapping.

Thanks,
Handi:)

OverlayHunter

12-21-2011, 03:14 AM

I believe it depends at least in part on the nature of the data. Here are some examples:

If you are looking at something that is very sensitive to field size and 5 years ago the average field size was 10 and for some dramatic reason today it's 8, the older data is probably irrelevant.
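
To put rough arithmetic on the field-size point: a randomly chosen starter wins about 1/N of the time in an N-horse field, so the same raw win percentage means different things in different eras. The sketch below is only an illustration; the function names and the 15% win rate are hypothetical and do not come from JCapper or any other product.

```python
def random_win_rate(field_size: int) -> float:
    """Chance that a randomly chosen starter wins in a field of this size."""
    if field_size < 1:
        raise ValueError("field_size must be at least 1")
    return 1.0 / field_size


def edge_over_random(win_rate: float, field_size: int) -> float:
    """Ratio of an observed win rate to the random baseline for that field size."""
    return win_rate / random_win_rate(field_size)


# Hypothetical factor that wins 15% of the time in both eras.
old_edge = edge_over_random(0.15, 10)  # average field of 10
new_edge = edge_over_random(0.15, 8)   # average field of 8
print(f"10-horse fields: {old_edge:.2f}x random")
print(f"8-horse fields:  {new_edge:.2f}x random")
```

Under these made-up numbers the factor is 1.50 times random in 10-horse fields but only 1.20 times random in 8-horse fields, even though the raw win rate never moved.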

If you are looking at something that is surface dependent or surface sensitive at SA, for example, the switches from dirt to synthetic and back to dirt again will be significant and render the old data obsolete.

If you are looking at something class related and slot money or something like Monmouth's 2010 experiment changes the "strength" of the typical race, the data may not be comparable from year to year. (In this example, let's say that MTH got stronger fields; it's also possible that some other tracks simultaneously got statistically weaker fields.)

Winter meets at the same track might have some factors that are not comparable from year to year because of abnormally bad weather in one or more of the years.

On the other hand, regardless of whether some or all of the above was true at a particular track, a particular trainer angle might stay steady and unaffected.

There is also the certainty that various factors will just randomly be significantly stronger some years and significantly weaker other years.
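
One way to check whether a year-to-year swing is bigger than random noise is a two-proportion z-test on the win rates. This is a standard textbook test, not something the posters describe using, and the win and start counts below are hypothetical. It also assumes the pooled win rate is neither 0 nor 1, since the standard error would then be zero.

```python
import math


def two_proportion_z(wins_a: int, starts_a: int, wins_b: int, starts_b: int) -> float:
    """z-statistic for the difference between two win rates (pooled standard error)."""
    p_a = wins_a / starts_a
    p_b = wins_b / starts_b
    pooled = (wins_a + wins_b) / (starts_a + starts_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / starts_a + 1 / starts_b))
    return (p_a - p_b) / se


# Hypothetical counts: 40 wins from 200 starts one year, 30 from 200 the next.
z = two_proportion_z(40, 200, 30, 200)
print(f"z = {z:.2f}")
# |z| below roughly 1.96 means the gap is within what chance alone
# would often produce at the conventional 5% level.
print("within chance" if abs(z) < 1.96 else "unlikely to be chance alone")
```

With these counts a drop from 20% to 15% winners gives z of about 1.32, so a swing that looks large on paper is still within ordinary random variation at this sample size.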

Handiman

12-21-2011, 07:21 AM

So given your response, how on earth can anyone really trust any database?

Handi:)

sjk

12-21-2011, 07:27 AM

I wrote a program 15 years ago and at the time I thought it would need to change over time to stay ahead of my competition. That has not turned out to be the case. It is still very effective for me and I would not ever try to improve it because of the likelihood of making it less effective.

DJofSD

12-21-2011, 08:26 AM

There is a thread somewhere on this board where Dave breaks down different queries and how much data is needed in your database to get statistically significant results. As I recall, it is a surprising number -- huge.
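
The figures from that thread are not reproduced here, but the standard margin-of-error formula for a proportion, n = z^2 * p * (1 - p) / E^2, gives a feel for why the numbers get large. The sketch below assumes a normal approximation and a 95% confidence level; the 15% win rate and the margins are hypothetical.

```python
import math


def starts_needed(win_rate: float, margin: float, z: float = 1.96) -> int:
    """Starts needed to estimate a win rate to within +/- margin.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / margin^2.
    """
    if not 0 < win_rate < 1:
        raise ValueError("win_rate must be between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(z * z * win_rate * (1 - win_rate) / (margin * margin))


# Hypothetical factor that wins about 15% of the time.
for margin in (0.02, 0.01):
    n = starts_needed(0.15, margin)
    print(f"+/- {margin:.0%}: about {n} starts")
```

Pinning a 15% win rate down to within two percentage points takes about 1,225 qualifying starts, and halving the margin roughly quadruples that to about 4,899. A narrow query can take years to accumulate that many plays.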

pondman

12-21-2011, 09:04 AM

When, as a rule, does data outlive its relevance? How far back can you go before the data is no longer viable? Or does it always stay viable as part of the law of large numbers?:)

Until a macro event occurs... Changes in the supply of horses. Changes in the number of days of racing. Changes in the number of tracks in your region. Changes in the availability of experienced jockeys.

windoor

12-21-2011, 09:36 AM

I asked this very same question earlier in the year.

I have come to the conclusion that it depends on the type of data.

Time of year has become an important factor (one of the seven) for me as it relates to the age of the horse, the track surface and shippers moving in and out for certain tracks.

My HDW data is only for the year 2011, but I have PTD files that go back to 1996.

Trainer and jock stats are pretty much useless, as is anything that has to do with human performance. Human nature is another thing entirely, and you can see some consistent patterns in when and where a horse is placed in a race that results in a good effort.

I am thinking maybe three to five years for any track that has not changed the racing surface or the “type” of races being run.

I have nothing to back that up, just a gut feeling for what is needed to see how a track plays over time.
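
A lookback rule like the one described above can be sketched as a simple date filter: keep races inside the window, and if the surface changed more recently than that, start from the change instead. The record layout, the track code "XYZ", and the dates are all hypothetical; real HDW or PTD files use their own fields.

```python
from datetime import date, timedelta


def usable_races(races, today, max_years=5, surface_changed_on=None):
    """Keep races inside the lookback window and after any surface change."""
    cutoff = today - timedelta(days=round(365.25 * max_years))
    # A surface change inside the window moves the cutoff forward.
    if surface_changed_on is not None and surface_changed_on > cutoff:
        cutoff = surface_changed_on
    return [r for r in races if r["date"] >= cutoff]


# Hypothetical race records for a made-up track.
races = [
    {"date": date(2005, 3, 1), "track": "XYZ"},
    {"date": date(2008, 6, 15), "track": "XYZ"},
    {"date": date(2011, 11, 20), "track": "XYZ"},
]
today = date(2011, 12, 21)
recent = usable_races(races, today, max_years=5)
after_change = usable_races(races, today, max_years=5,
                            surface_changed_on=date(2010, 1, 1))
print(len(recent), len(after_change))
```

With a five-year window the 2005 race drops out, and a surface change at the start of 2010 leaves only the 2011 race.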

Regards,

Windoor

Rigger

12-21-2011, 10:47 AM

One thing noticeable with a database is the time of the winning horse in each class of race. This can also apply to maiden races. A trainer may place his horse where he thinks the horse can compete. In maiden special weight races, an unraced horse has won if the expected winning time of the raced horses does not meet the standard winning time allowing for a +5 learning capability of the raced horses taking into consideration all other factors.

OverlayHunter

12-26-2011, 09:24 AM

So given your response, how on earth can anyone really trust any database?

My response pointed out (i.e., attempted to alert you) that, like most things in life, what you use something for matters. A database is good for some things and not so good, or even dangerous, for other things. The user has to discern whether the findings are likely to be helpful going forward and whether they may be synergistic in some way or are absolutely irrelevant to what they are being considered for.

For example, except for those just introduced to thoroughbreds and handicapping, I don't believe anyone frequenting this site would take database data from races 10 furlongs and longer and apply it universally (or very much) to races 6 furlongs and shorter. Just because it's in a database doesn't mean that its value to a handicapper is universal or static and unchanging.

All my post was attempting to do (perhaps poorly) was to alert you to the fact that racing isn't static and some significant changes in racing can (sometimes quickly) render database data either obsolete (and/or misleading) or less valuable.

Having said that, much data (including data that is years old) can be very valuable (if for no other reason than that it may confirm or contradict more recent data) and can remain valuable going forward for years.

In short, databases can provide clues to positive ROIs, to significant impact values, to changing habits of the wagering public, to high win % but low ROI situations, to helpful trainer stats, to track biases, to spot play development, to important but difficult to discern relationships, and many, many other useful (and potentially profitable) handicapping ideas. But they can't be used mindlessly and can't be accepted as the 100%, gospel, unchanging truths of handicapping.

They can be trusted for some things but not for all things and you have to analyze and confirm (as much as possible) the data and the circumstances that produced it as part of your determination of whether to trust it (and your conclusions) or not.
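
Two of the measures mentioned above, ROI and impact value, are simple to compute once the counts are in hand. The sketch below uses the usual handicapping definitions (ROI as net return per dollar wagered; impact value as the share of winners with a factor divided by the share of starters with it). The function names and every number are hypothetical.

```python
def roi(total_returned: float, total_wagered: float) -> float:
    """Net return per dollar wagered; 0.15 means a 15% profit."""
    return (total_returned - total_wagered) / total_wagered


def impact_value(winners_with: int, winners_total: int,
                 starters_with: int, starters_total: int) -> float:
    """Share of winners with the factor divided by share of starters with it."""
    return (winners_with / winners_total) / (starters_with / starters_total)


# Hypothetical: 100 bets of $2 each, 40 winners at an average $4.20 payoff.
wagered = 100 * 2.0
returned = 40 * 4.20
print(f"win rate 40%, ROI {roi(returned, wagered):+.0%}")

# Hypothetical: 30 of 100 winners had the factor; 200 of 1000 starters did.
iv = impact_value(30, 100, 200, 1000)
print(f"impact value {iv:.2f}")
```

The first case is the "high win % but low ROI" situation in miniature: 40% winners still lose 16 cents on the dollar because the payoffs are short. The second gives an impact value of 1.50, meaning the factor shows up among winners one and a half times as often as it does among starters.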