Flies and Pepper


Larry Hamilton
03-28-2004, 12:01 PM
The statements "This race has never been" and "There will never be another race exactly like this one" are unarguably true, though no proof can be cited other than common sense. However, to also say, "There will never be another race nearly like this one," is to say that handicapping of ANY sort is a waste of time.

Our handicapping goals, whether with computer or pencil, are the same--we look for patterns. We often lose sight of what measure identifies a good pattern, but we are all looking for a pattern. The difference between "exactly" and "nearly" is where we live. To use the term exactly when we really mean nearly does a disservice to the sport and shows a profound misunderstanding of statistics, probability and dynamics of a living event. Fact is, the whole comparison of exact and nearly is like picking fly shit out of pepper.

Larry Hamilton
03-28-2004, 12:28 PM
Here's a thought for some of you guys building dbs that struck me hard this morning. For one reason or another, I took the last few days off to rebuild my db. (My coding and relationships were beginning to look like spaghetti.)

1. Step 1 - Rerun the downloads. No problem: 24 hours of constant machine running later, all loaded with 2 years of data.

2. Step 2 - Connect the results sheet so that scratches can be removed before calculations.

3. Step 3 - Identify all your filters, such as average of qualified last two, etc.

'''''all accomplished'''''''''''''''

This morning, gazing at the monitor and trying to get an image of how to attack the problem of indexes, it occurred to me that when the indexes are calculated on the finished db, ALL backward-looking questions will be guilty of back fitting. For instance, if your db has a value of 55 in the 2f pace number field for the horse named A B C Copper, that number is only good in two instances: today, as his current average, and as our guess of his average in the future. It is not to be used in any calculation in the past.

So what? It means that the proper way to assemble a db that can be parsed by time is to assemble it a day at a time. That means that the 2f and all other numbers to be used in calculations must be assembled and recalculated every day. In my case, as this is a two year db I assembled, 90,000 potential horses each day multiplied by two years (700 days) gives me a db that far exceeds the limits of Access. There is a way to do it--assemble by day and delete duplicate entries. As you might guess, this is a TEDIOUS task, though necessary if you wish to determine how good your factors are.
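The "assemble by day and delete duplicate entries" idea can be sketched roughly as below. This is only an illustration in Python, not the Access setup described in the post. The record layout, the function name build_snapshots, and the sample figures are all made up for the example. Each day gets a snapshot of every horse's average built only from races run before that day, and a row is stored only when the average has changed since the last stored row.

```python
from datetime import date

# Hypothetical race records: (race_date, horse, two_furlong_pace_figure).
races = [
    (date(2004, 1, 1), "A B C Copper", 50),
    (date(2004, 1, 3), "A B C Copper", 60),
    (date(2004, 1, 3), "Other Horse", 70),
]


def build_snapshots(races, days):
    """Build day-by-day averages that never look at races on or after the day.

    A row (as_of_day, horse, average) is stored only when the average differs
    from the last row stored for that horse, which removes the duplicates.
    """
    snapshots = []
    last_stored = {}
    for day in days:
        totals = {}
        for race_date, horse, figure in races:
            if race_date < day:  # only races already run as of this day
                running_sum, count = totals.get(horse, (0, 0))
                totals[horse] = (running_sum + figure, count + 1)
        for horse, (running_sum, count) in totals.items():
            average = running_sum / count
            if last_stored.get(horse) != average:
                snapshots.append((day, horse, average))
                last_stored[horse] = average
    return snapshots


days = [date(2004, 1, d) for d in range(1, 6)]
snaps = build_snapshots(races, days)
for row in snaps:
    print(row)
```

Five days and two horses could produce ten rows, but only three are stored here, because most days change nothing. A question asked about any past date then reads the latest row on or before that date instead of the finished-db value.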

For those of you who are buying a db, it should be essential information to know how the data was assembled and what is the time range of the data. I think you can assume that the three major players here: Dave, Nathan, and Ken have considered this. About the others--ask. When someone comes on here and actually claims to be backfitting--run.

Larry Hamilton
03-28-2004, 12:50 PM
OK, I am sure some of you are saying, "What the hell is the deal with back fitting, why is it important?"

By way of simulated example maybe I can help you understand the dangers. Let's say your db says that Bailey is 10 for 30 riding these kinds of horses. Let's also say that in your calculations you say that jocks with a .3 or greater win rate get extra points.

Now for the simulation, let's say that Bailey rides 10 in a row while losing. His win rate thus far is 0. The extra points you assign should NOT have been assigned UNTIL he reached .3. By assigning him extra points on all races before he attained .3, you are actually doctoring your own db by giving him credit for races he has not run yet.
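A minimal Python sketch of that simulation, for anyone who wants to see the numbers. The order of the wins and losses is invented for illustration; only the 10-for-30 total, the opening run of 10 losers, and the .3 cutoff come from the example above.

```python
# Hypothetical ride-by-ride results: 10 losses, then 10 wins in the next 20 rides.
results = [0] * 10 + [1, 0] * 10  # 1 = win, 0 = loss; 10 for 30 overall

THRESHOLD = 0.3


def backfitted_bonus(results):
    """Give every ride the bonus based on the FINAL win rate (the trap)."""
    final_rate = sum(results) / len(results)
    return [final_rate >= THRESHOLD] * len(results)


def as_of_bonus(results):
    """Give a ride the bonus only if the win rate BEFORE that ride was >= .3."""
    flags = []
    wins = 0
    for rides_so_far, outcome in enumerate(results):
        rate = wins / rides_so_far if rides_so_far else 0.0
        flags.append(rate >= THRESHOLD)
        wins += outcome  # the result is only known after the ride
    return flags


backfit = backfitted_bonus(results)
honest = as_of_bonus(results)
print("rides given bonus, backfitted:", sum(backfit))
print("rides given bonus, as-of:     ", sum(honest))
```

With this made-up ordering the backfitted version hands out the bonus on all 30 rides, including the 10 straight losers at the start, while the as-of version hands it out on far fewer.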

Hope that helps. Backfitting is a trap we can all fall into. If your results appear too good to be true, this is the usual culprit.

kitts
03-28-2004, 01:26 PM
You know, that was good stuff. I am totally incompetent in the care, feeding and use of a database but that was good and I understood a lot of it. Thanks

hdcper
03-28-2004, 01:56 PM
Like always Larry, excellent post!!!

Bill

Tom
03-28-2004, 02:34 PM
Well you lost me here :D

OK, my db for 2000-2003 says Bailey hits 30% on factor X.

Day 1 2004, Bailey is riding a Factor X horse. I can't give him credit until he once again hits 30%?

sjk
03-28-2004, 03:20 PM
Larry,

I don't think the 2nd call pace figure is quite as bad as you indicate as I have done that calculation several times over the years in Access on more than 2 years of data.

I calculated the pace figure value for each race starter and date. Then for each starter and date averaged those values for race dates previous to the chosen date within a given time parameter (or not as you prefer).
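Outside of Access, the same calculation might look something like the Python sketch below. The function name prior_average, the tuple layout, and the sample figures are assumptions for illustration, not the actual queries. It averages a starter's figures from race dates strictly before the chosen date, with an optional look-back window standing in for the "given time parameter".

```python
from datetime import date, timedelta

# Hypothetical records: (starter, race_date, pace_figure).
figures = [
    ("A B C Copper", date(2003, 6, 1), 40),
    ("A B C Copper", date(2004, 1, 10), 50),
    ("A B C Copper", date(2004, 2, 10), 60),
    ("A B C Copper", date(2004, 3, 10), 70),
]


def prior_average(figures, starter, as_of, window_days=None):
    """Average a starter's figures from races strictly before `as_of`.

    If `window_days` is given, races older than that many days are ignored.
    Returns None when the starter has no qualifying prior race.
    """
    earliest = None
    if window_days is not None:
        earliest = as_of - timedelta(days=window_days)
    values = [
        figure
        for name, race_date, figure in figures
        if name == starter
        and race_date < as_of
        and (earliest is None or race_date >= earliest)
    ]
    return sum(values) / len(values) if values else None


race_day = date(2004, 3, 10)
print(prior_average(figures, "A B C Copper", race_day))
print(prior_average(figures, "A B C Copper", race_day, window_days=90))
```

The figure earned on the race day itself (70 here) never enters either average, which is the whole point.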

When dealing with large datasets the 2 gig limit gets to be a real pain. I have been reduced to running the queries 1 or 2 at a time with the calculation in one database linked to the data in another.

I would wholeheartedly agree that with any back-testing, it is imperative that all your programs and data be carefully chosen to exclude subsequent events, or at least be based on a large enough dataset that the time-frame tested cannot have a significant effect on the values tested.

When I back-test program changes, I test one date at a time. This is a slow process (takes several days and many database compacts) to back-test a year of data. However, given the constraints you have discussed I don't know any better way.

Larry Hamilton
03-28-2004, 03:41 PM
No, Tom. You can always PROJECT a number forward. What you cannot do is look back and say that on 1 Jan 2000 he was a 30-percenter. On 1 Jan 2000 he was whatever he calculates to be with numbers from that date and earlier, not after.

Larry Hamilton
03-28-2004, 03:42 PM
sjk, we are in sync, you just said it better

sjk
03-28-2004, 03:48 PM
Larry,

Nice to find someone who agrees with one of my posts. Thanks.

Tom
03-28-2004, 03:51 PM
Larry,
Thanks...gotcha.

Dave Schwartz
03-28-2004, 03:54 PM
Larry,

Great thread.

What you have alluded to here about the current state of the database on a given day is very true. It tremendously complicates the issue of testing back through the database.

That is one reason why we chose to create a "dynamic" filtering system for handicapping purposes. That is, we effectively "drop a dynamic filter" on a race which creates a custom "race" filter just for that race. The race filter automatically starts looking at races the day before the race date, going backward.

A good description of a dynamic filter would be "a filter that creates filters."
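One way to picture "a filter that creates filters" is a function that returns a function. The Python sketch below is only a guess at the general shape and says nothing about how the actual software is built. The name dynamic_filter, the dictionary fields (track, distance), and the sample races are all hypothetical.

```python
from datetime import date


def dynamic_filter(match_fields):
    """Return a function that builds a custom filter for any given race."""

    def make_race_filter(race):
        def race_filter(past_race):
            # Only look at races run before the race date, going backward.
            if past_race["date"] >= race["date"]:
                return False
            # Keep past races that match today's race on the chosen fields.
            return all(past_race[f] == race[f] for f in match_fields)

        return race_filter

    return make_race_filter


# Hypothetical history and a hypothetical race being handicapped.
history = [
    {"date": date(2004, 3, 20), "track": "AQU", "distance": 6.0},
    {"date": date(2004, 3, 27), "track": "AQU", "distance": 6.0},
    {"date": date(2004, 3, 28), "track": "AQU", "distance": 6.0},
    {"date": date(2004, 3, 21), "track": "GP", "distance": 6.0},
]
todays_race = {"date": date(2004, 3, 28), "track": "AQU", "distance": 6.0}

same_track_and_distance = dynamic_filter(["track", "distance"])
race_filter = same_track_and_distance(todays_race)
matches = [r for r in history if race_filter(r)]
print(len(matches))
```

Because the cutoff date comes from the race the filter is dropped on, the same dynamic filter can be reused on any race in the database without ever seeing that race's own day or anything after it.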



Regards,
Dave Schwartz