Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

01-18-2019, 07:51 PM   #12
Jeff P
Registered User

Join Date: Dec 2001
Location: JCapper Platinum: Kind of like Deep Blue... but for horses.
Posts: 5,292
Quote:
Originally Posted by Dave Schwartz
Jeff,

I love the idea of local models. However, there are some real issues I've not been able to get past.

1. There are simply too many models to develop.
It is one thing to say, "Here's an AQU model." But, logically, the slicing and dicing continues after that.

Obviously, we're not going to handicap a 4f dash for 2yr olds the same way we'd handicap a Graded Stakes on the turf at 9f.

This slicing/dicing is what really expands the system/model list.


2. There are too many models to support.
We had a user a few years ago who was managing 1,300 "systems." Each one was built for a specific track-surface-distance.

The HSH software automatically handles which system fires in a given race. That part was easy.

The problem comes in when you have to decide when a model needs to be rebuilt. Is it because of a random downturn that is just a normal aberration, or is the model itself fundamentally flawed?

When you start tossing around system counts like 1,300, it becomes a full-time job just to manage those models -- just to decide when something is really wrong.


What most of us have done...
Most of our HSH users have gone to a completely dynamic, race-by-race modelling system. That is, when you open a race, HSH queries the database for races "like this one" (based upon whatever filtering parameters you've created).

Then the software builds a system from that data, based upon the factors you've selected. (There can be static factors involved as well.)

In this way, we really only have a single system to maintain!

Of course, we can still slice and dice the results to determine where our strengths and weaknesses are, and then address them across the entire approach.


Dave

Dave,

In JCapper there's something called a Prob Expression, which is basically a user-defined SQL expression that gets executed during number crunching.

The query results from the executed SQL expression are scored based on a set of rules, called a Behavior, which the user defines for the individual Prob Expression.

The scored query results from the Prob Expression are transformed by the Interface into a factor's numeric value, gap, and rank.

Just like any other factor (an HDW final time speed fig, early pace fig, late pace fig, distance pedigree rating, etc.), the results from Prob Expressions can be used as inputs for models.
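
To make the value/gap/rank part concrete, here's a rough sketch in Python. This is NOT actual JCapper code -- the Behavior below is just a made-up win-rate rule and all of the names are invented for illustration:
Code:
# Rough sketch only -- not JCapper code. A made-up "Behavior" scores the
# sample returned by a Prob Expression, then the field is turned into a
# factor numeric value, gap, and rank.

def score_sample(rows):
    """Stand-in Behavior: win rate of the sample the query returned."""
    if not rows:
        return 0.0
    wins = sum(1 for r in rows if r["finish_pos"] == 1)
    return wins / len(rows)

def factor_from_scores(scores):
    """scores: {horse: raw_score} -> {horse: (value, gap, rank)}."""
    ordered = sorted(scores, key=scores.get, reverse=True)   # best score first
    factor = {}
    for pos, horse in enumerate(ordered):
        value = scores[horse]
        # gap = distance back to the next-best horse (0 for the last-ranked horse)
        nxt = scores[ordered[pos + 1]] if pos + 1 < len(ordered) else value
        factor[horse] = (value, round(value - nxt, 4), pos + 1)
    return factor
Run that over one raw score per starter and you get the same kind of value, gap, and rank triple the Interface produces for any other factor.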

I mostly use Prob Expressions for scoring rider, trainer, post position, or how early or late pace have been performing recently at today's track-surface-dist, etc.

You mentioned there are a LOT of models.

Of course I agree with that.

When the idea of using Prob Expressions first came to me, the objective was to create an Interface that could, on its own, learn from the data -- and make decisions from there. Very much like AI.

Early on, my Prob Expressions looked something like this:
Code:
SELECT TOP 150 * FROM STARTERHISTORY 
       WHERE TRACK = 'AQU'
       AND DIST = 1320
       AND INTSURFACE = 1
       AND RANKF20 = 1
       AND [DATE] < #01-18-2019#
       ORDER BY [DATE] DESC
Basically, the above SQL expression tells the Interface to go back in time (prior to today's date), pull up the most recent 150 starters matching the defined parameters (AQU 6F Dirt and CPace Rank = 1), and score the query results.

Of course, structuring Prob Expressions like this means you need a distinct stored SQL expression for every permutation of track-surface-dist and factor you want your model to handle.

That's way too much overhead.

I eventually learned to structure them like this:
Code:
SELECT TOP 150 * FROM STARTERHISTORY
       WHERE TRACK = {TRACK} 
       AND DIST = {DIST} 
       AND INTSURFACE = {INTSURFACE} 
       AND RANKF20 = {RANKCPACE} 
       AND [DATE] < {TODAY} 
       ORDER BY [DATE] DESC

Along the way I spent lots of programming hours writing code to make the Interface recognize and respond to the characters inside the curly brackets --

For example, if the current race is at AQU, the Interface recognizes the curly brackets and replaces {TRACK} with 'AQU'...

Likewise, the Interface recognizes {RANKCPACE} and replaces that with the actual CPace rank of the current horse.
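
In loose terms (a Python sketch again -- NOT the actual Interface code, and the race/horse field names are made up), the substitution step works something like this:
Code:
# Made-up sketch of the curly-bracket substitution -- not the actual
# Interface code. Race/horse field names are invented for illustration.

stored_expression = (
    "SELECT TOP 150 * FROM STARTERHISTORY "
    "WHERE TRACK = {TRACK} AND DIST = {DIST} AND INTSURFACE = {INTSURFACE} "
    "AND RANKF20 = {RANKCPACE} AND [DATE] < {TODAY} ORDER BY [DATE] DESC"
)

def resolve_prob_expression(template, race, horse):
    """Replace each {TOKEN} placeholder with a value from the current race/horse."""
    tokens = {
        "{TRACK}":      "'" + race["track"] + "'",   # e.g. 'AQU'
        "{DIST}":       str(race["dist"]),           # e.g. 1320 (6F)
        "{INTSURFACE}": str(race["intsurface"]),     # e.g. 1 = dirt
        "{RANKCPACE}":  str(horse["rank_cpace"]),    # this horse's CPace rank
        "{TODAY}":      "#" + race["date"] + "#",    # e.g. #01-18-2019#
    }
    sql = template
    for token, value in tokens.items():
        sql = sql.replace(token, value)
    return sql

race = {"track": "AQU", "dist": 1320, "intsurface": 1, "date": "01-18-2019"}
horse = {"rank_cpace": 1}
print(resolve_prob_expression(stored_expression, race, horse))
Feed it the stored expression plus the current race and horse and you get back the literal AQU query shown earlier.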

It took me a while, but in the end, I accomplished the objective.

The Interface now has the ability to learn from the data -- and make decisions from there.

It's similar to the concept of a generative query in AI (introduced by Geoffrey Hinton).

Imo, it's done in a way that doesn't require the user to do too much in the way of overhead.

For example, a track-dist-surface-cpace specific Prob Expression structured like the one with the curly brackets above works no matter what the track-dist-surface-cpace of the current horse in the current race happens to be.
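
So during number crunching the same stored expression just gets resolved and scored once per starter. Loosely (again, a made-up sketch building on the ones above, not the real crunching loop):
Code:
# Hypothetical per-race loop, building on the sketches above. run_query()
# is a stand-in for the real database call; the starter data is invented.

def run_query(sql):
    return []   # placeholder -- the real call would hit STARTERHISTORY

todays_race = {
    "track": "AQU", "dist": 1320, "intsurface": 1, "date": "01-18-2019",
    "starters": [{"name": "Horse A", "rank_cpace": 1},
                 {"name": "Horse B", "rank_cpace": 2}],
}

raw = {}
for horse in todays_race["starters"]:
    sql = resolve_prob_expression(stored_expression, todays_race, horse)
    raw[horse["name"]] = score_sample(run_query(sql))
factor = factor_from_scores(raw)   # value, gap, and rank for every starter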

Of course the next step is to get the Interface to make meaningful follow-up queries based on observations gleaned from these initial generative queries.

I'll stop here and post more as free time allows.



-jp

__________________
Team JCapper: 2011 PAIHL Regular Season ROI Leader after 15 weeks
www.JCapper.com
