How would you quantify being held up for clear running for inclusion in model? - Horse Racing Forum - PaceAdvantage.Com

hpollock · 09-03-2020, 12:32 PM

How would you quantify or factor this into a model

CBYRacer · 09-03-2020, 12:43 PM

Quote:

Originally Posted by hpollock

How would you quantify or factor this into a model

What does 'being held up for clear running' mean? The jockey holding back the horse?

cj · 09-03-2020, 02:33 PM

Quote:

Originally Posted by CBYRacer

What does 'being held up for clear running' mean? The jockey holding back the horse?

Sounds to me like stuck in traffic.

thaskalos · 09-03-2020, 02:42 PM

Some people post as if these are telegrams where they charge you by the letter.

CBYRacer · 09-03-2020, 02:43 PM

Quote:

Originally Posted by cj

Sounds to me like stuck in traffic.

Got it.

For model inclusion, I use Python to parse through the race comments of each horse. I have a lexicon (dictionary) of trip terms that I developed manually and classify these terms into buckets like 'Encountered traffic', 'Wide', 'Very Wide', etc. I then one-hot encode these buckets and feed those values into the model.
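A minimal sketch of that workflow, for anyone who wants to try it. The patterns and the `classify` / `one_hot` helpers below are made up for illustration and are not my actual lexicon; a real one would be far larger and tuned by hand to each circuit's chart-caller vocabulary.

```python
import re

# Bucket order fixes the column order of the one-hot vector.
BUCKETS = ["Encountered traffic", "Wide", "Very Wide"]

# Hypothetical lexicon: (regex pattern, bucket). Illustrative only.
LEXICON = [
    (r"\b(steadied|checked|blocked|shuffled back|in tight|held up)\b",
     "Encountered traffic"),
    (r"\b[34][- ]?(wide|w)\b", "Wide"),
    (r"\b[5-9][- ]?(wide|w)\b", "Very Wide"),
]
COMPILED = [(re.compile(p, re.IGNORECASE), b) for p, b in LEXICON]


def classify(comment):
    """Return the set of buckets whose patterns appear in a chart comment."""
    return {bucket for pattern, bucket in COMPILED if pattern.search(comment)}


def one_hot(comment):
    """Encode a chart comment as one 0/1 value per bucket, in BUCKETS order."""
    hits = classify(comment)
    return [1 if bucket in hits else 0 for bucket in BUCKETS]


if __name__ == "__main__":
    for text in ["Steadied 1/4 pole, 5w into lane",
                 "3-wide far turn",
                 "Drew off, ridden out"]:
        print(text, "->", one_hot(text))
```

A comment can land in more than one bucket, so strictly speaking this is a multi-hot encoding; each bucket still becomes its own 0/1 column in the model.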

Curious how others handle this as well?

headhawg · 09-03-2020, 08:36 PM

IMO, if you are going to model anything that has to do with trips you better be watching the videos and taking your own notes. Parsing from charts will simply add noise and have no ROI advantage because everyone sees the same thing.

CBYRacer · 09-03-2020, 10:48 PM

Quote:

Originally Posted by headhawg

IMO, if you are going to model anything that has to do with trips you better be watching the videos and taking your own notes. Parsing from charts will simply add noise and have no ROI advantage because everyone sees the same thing.

No doubt that watching the race replay firsthand is optimal if you have the time. At the same time, unless you statistically test the parsed comments, you can't conclude that there will not be an ROI advantage. The public may systematically over- or underweight certain chart comments based on their own personal biases. They may THINK a certain comment means something but not KNOW that it does. Also, I inferred from headhawg's post that he wants to incorporate something into his computer model. Notes from watching replays, while useful, can't be tested in a model unless you code them somehow or use natural language processing on your notes. Again, this process would be extremely time consuming unless you hired someone to do it. Are there ways to do this efficiently?
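As a rough illustration of what I mean by testing, here is a flat-bet ROI check for a single comment flag. Everything in it is an assumption for the sake of the example: the field names (`traffic`, `wide`, `odds`, `won`), the convention that odds are quoted "to 1", and the three toy records, which are invented and not real results.

```python
def flat_bet_roi(records, flag):
    """ROI of betting one unit on every starter where `flag` is set.

    Assumes `odds` is the to-1 payoff, so a winner returns odds + 1 units.
    Returns None when no starter carries the flag.
    """
    bets = [r for r in records if r[flag]]
    if not bets:
        return None
    returned = sum(r["odds"] + 1.0 for r in bets if r["won"])
    return (returned - len(bets)) / len(bets)


# Invented toy records, for illustration only.
records = [
    {"traffic": 1, "wide": 0, "odds": 3.5, "won": 1},
    {"traffic": 1, "wide": 0, "odds": 6.0, "won": 0},
    {"traffic": 1, "wide": 0, "odds": 2.0, "won": 0},
    {"traffic": 0, "wide": 0, "odds": 1.5, "won": 1},
]

if __name__ == "__main__":
    print("traffic ROI:", flat_bet_roi(records, "traffic"))
    print("wide ROI:", flat_bet_roi(records, "wide"))
```

A number like this only means something over a large sample, and it says nothing about significance on its own.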

jay68802 · 09-03-2020, 11:10 PM

One of my best days at the track came about because of comments. The comments were:

Rank, stopped.
Rated, no response

After looking at the replays, I came to the conclusion that the comments should have read.

Again????
Why is he rating this horse, put him on the lead.

Horse went wire to wire at 38-1.

I rarely make any adjustments to pace and speed figures because of comments. Only make adjustments after watching replays.

headhawg · 09-03-2020, 11:15 PM

Quote:

Originally Posted by CBYRacer

No doubt that watching the race replay firsthand is optimal if you have the time. At the same time, unless you statistically test the parsed comments, you can't conclude that there will not be an ROI advantage. The public may systematically over- or underweight certain chart comments based on their own personal biases. They may THINK a certain comment means something but not KNOW that it does. Also, I inferred from headhawg's post that he wants to incorporate something into his computer model. Notes from watching replays, while useful, can't be tested in a model unless you code them somehow or use natural language processing on your notes. Again, this process would be extremely time consuming unless you hired someone to do it. Are there ways to do this efficiently?

I have no desire to model chart comments as the pursuit doesn't seem to be worthwhile to me. Does the same person make all of the comments in all of the charts? Of course not. Then how do we know that all chart creators use the same words to describe identical (or near-identical) trips? We don't, so to me that seems akin to GIGO. Much of the data that handicappers use already has built-in inaccuracy. I think that trying to model too many things just keeps adding more error into the mix. Just my .02.

Jeff P · 09-04-2020, 12:03 AM

Assuming you are using something like conditional or multinomial logistic regression for your model --

Classify your trip types. To keep things simple, letter codes like TripA and TripB, etc. should work just fine.

Imo, it doesn't matter what your trip types are (at least not at first.) As you move forward with your model, the data will tell you which of your trip types (if any) are significant and which you can safely discard.

The important thing is to classify your trip types, give each a distinct letter code, compile the data for each of your trip types in a consistent manner, and include a column for your trip types in the history table you are using to accumulate data for purposes of building your model.

For example purposes, below is a simple history table that contains data for Remington Park R1 on 09-03-2020.

The Track, rDate, Race, Surf, Dist, and Odds columns should be self explanatory.

The Horse column contains the horse's position in the starting gate from the rail out.

The Speed column contains the horse's HDW final time speed fig from its most recent running line. (Imo, there's nothing magic about last race running line speed fig. Just using it here for example purposes.)

The TripA column contains a value of 1 for True in cases where the horse qualifies as a Trip Type A. Otherwise it contains a value of 0 for False. In this case Trip Type A describes a poor start last out.

The TripB column contains a value of 1 for True in cases where the horse qualifies as a Trip Type B. Otherwise it contains a value of 0 for False. In this case Trip Type B describes a horse that was making an outside closing move on the far turn (or trying to) last out.

The Wnr column contains a value of 1 for True for the horse that won the race. All other horses are assigned a value of 0 for False.

The table structure with data looks something like this:

Code:

Track  rDate     Race  Surf  Dist  Horse  Speed  TripA  TripB   Odds  Wnr
-----  --------  ----  ----  ----  -----  -----  -----  -----  -----  ---
  RPX  9/3/2020     1     1  1210      1     67      0      0    4.5    0
  RPX  9/3/2020     1     1  1210      2     61      0      0   29.5    0
  RPX  9/3/2020     1     1  1210      3     55      1      0    2.5    1
  RPX  9/3/2020     1     1  1210      4     70      0      0    1.3    0
  RPX  9/3/2020     1     1  1210      5     63      1      0   13.4    0
  RPX  9/3/2020     1     1  1210      6     71      0      1    3.4    0
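A minimal sketch of loading a table like the one above into Python, assuming whitespace-separated columns exactly as shown. The function names (`load_history`, `winners_per_race`, `trip_summary`) are made up for illustration and are not part of any tool mentioned in this thread.

```python
from collections import defaultdict

TABLE = """\
Track  rDate     Race  Surf  Dist  Horse  Speed  TripA  TripB   Odds  Wnr
-----  --------  ----  ----  ----  -----  -----  -----  -----  -----  ---
  RPX  9/3/2020     1     1  1210      1     67      0      0    4.5    0
  RPX  9/3/2020     1     1  1210      2     61      0      0   29.5    0
  RPX  9/3/2020     1     1  1210      3     55      1      0    2.5    1
  RPX  9/3/2020     1     1  1210      4     70      0      0    1.3    0
  RPX  9/3/2020     1     1  1210      5     63      1      0   13.4    0
  RPX  9/3/2020     1     1  1210      6     71      0      1    3.4    0
"""

INT_COLUMNS = ("Race", "Surf", "Dist", "Horse", "Speed", "TripA", "TripB", "Wnr")


def load_history(text):
    """Parse the whitespace-separated history table into a list of dicts."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("-")]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        for col in INT_COLUMNS:
            record[col] = int(record[col])
        record["Odds"] = float(record["Odds"])
        rows.append(record)
    return rows


def winners_per_race(rows):
    """Count winners per (Track, rDate, Race); normally 1, barring dead heats."""
    counts = defaultdict(int)
    for r in rows:
        counts[(r["Track"], r["rDate"], r["Race"])] += r["Wnr"]
    return dict(counts)


def trip_summary(rows, column):
    """Starts and wins for horses flagged with the given trip type column."""
    flagged = [r for r in rows if r[column] == 1]
    return {"starts": len(flagged), "wins": sum(r["Wnr"] for r in flagged)}


rows = load_history(TABLE)

if __name__ == "__main__":
    print(winners_per_race(rows))
    print("TripA:", trip_summary(rows, "TripA"))
    print("TripB:", trip_summary(rows, "TripB"))
```

The winners-per-race check is a cheap way to catch data entry mistakes before the table grows to thousands of races.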

After your history contains data for a few thousand races, and if you've done a good job of compiling your trip type data in a consistent manner, run the data through a third party stat tool such as SPSS, Stata, or one of the logistic regression packages in R. The tool should be able to display significance for your trip types.

From there you should be in a position to make an informed decision whether or not to include your trip types in your model.
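For readers without one of those packages, here is a rough pure-Python sketch of what "display significance" amounts to. Note the assumptions: it fits a plain binary logit by Newton-Raphson rather than the conditional or multinomial logit mentioned above (so it ignores that exactly one horse wins each race), the data is simulated with a built-in TripA effect and no TripB effect, and `invert` and `fit_logit` are illustrative helpers, not any package's API.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gradient_and_information(X, y, beta):
    """Score vector and information matrix of the logit log-likelihood."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        for a in range(k):
            grad[a] += (yi - p) * xi[a]
            for b in range(k):
                info[a][b] += p * (1.0 - p) * xi[a] * xi[b]
    return grad, info


def fit_logit(X, y, names, iterations=15):
    """Fit by Newton-Raphson; report coef, std error, Wald z, two-sided p."""
    beta = [0.0] * len(names)
    for _ in range(iterations):
        grad, info = gradient_and_information(X, y, beta)
        cov = invert(info)
        beta = [b + sum(c * g for c, g in zip(row, grad))
                for b, row in zip(beta, cov)]
    _, info = gradient_and_information(X, y, beta)
    cov = invert(info)
    report = {}
    for i, name in enumerate(names):
        se = math.sqrt(cov[i][i])
        z = beta[i] / se
        report[name] = {"coef": beta[i], "se": se, "z": z,
                        "p": math.erfc(abs(z) / math.sqrt(2))}
    return report


# Simulated starters: TripA raises the win probability, TripB does nothing.
random.seed(0)
X, y = [], []
for _ in range(4000):
    trip_a = 1 if random.random() < 0.15 else 0
    trip_b = 1 if random.random() < 0.15 else 0
    win_prob = 1.0 / (1.0 + math.exp(-(-2.0 + 0.8 * trip_a)))
    X.append([1.0, trip_a, trip_b])
    y.append(1 if random.random() < win_prob else 0)

report = fit_logit(X, y, ["Intercept", "TripA", "TripB"])

if __name__ == "__main__":
    for name, s in report.items():
        print(f"{name:10s} coef={s['coef']:+.3f} se={s['se']:.3f} "
              f"z={s['z']:+.2f} p={s['p']:.4f}")
```

A small p-value for a trip type column is the kind of evidence that would justify keeping it; a large one suggests it can be dropped.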

Hope I managed to type most of that out in a way that makes sense,

-jp

CBYRacer · 09-04-2020, 12:12 AM

This is exactly what I was referring to. Jeff, are your Trip A, Trip B, etc. from watching race replays or parsing chart comments? If the former, how long did it take you to accumulate enough data (i.e., watch that many replays) for your model?

sjk · 09-04-2020, 06:40 AM

At one time I thought about parsing the comments, but they looked so different from one circuit to the next that it would be difficult to classify them by machine, and I had no interest in looking at races one by one by one.

As I recall it looked like you could do pretty well betting horses that lost their jockey last out.

Robert Fischer · 09-04-2020, 08:52 AM

I think it's worth it to pay a competent 'trip guy'.

Even if it's just a simple 'neutral' -1 or +1 ...

at least you then have a model and can see if there is a potential synergy (agrees) or tradeoff (disagrees) with a horse that is a potential play.

You get horses that were best and were race-ridden and shuffled back and trapped

you got horses who had a drive or rally that was 'muted'

then you got horses who in reality 'saved' a bunch of energy and got a dream trip while it superficially looks like trouble

classhandicapper · 09-04-2020, 09:17 AM

Jeff,

That's a nice approach.

I've been able to test computer generated race flow and bias notes, but nothing beyond that. The rest has been trial and error experience. I've concluded T & E is a risky way to learn. A short flurry of random successes or failures can cause you to come to an incorrect conclusion that lasts for years.

CBYRacer · 09-04-2020, 10:43 AM

Quote:

Originally Posted by Robert Fischer

I think it's worth it to pay a competent 'trip guy'.

Even if it's just a simple 'neutral' -1 or +1 ...

at least you then have a model and can see if there is a potential synergy (agrees) or tradeoff (disagrees) with a horse that is a potential play.

You get horses that were best and were race-ridden and shuffled back and trapped

you got horses who had a drive or rally that was 'muted'

then you got horses who in reality 'saved' a bunch of energy and got a dream trip while it superficially looks like trouble

I like this approach, Robert. Have you done this before? If so, any suggestions on where to find the person?