DAYS SINCE LAST START - Page 4 - Horse Racing Forum - PaceAdvantage.Com

traynor · 11-21-2012, 12:41 AM

Quote:

Originally Posted by raybo

Your assumption is correct. We get a set of predictive attributes from the database, the program then uses those attributes to eliminate horses as win contenders, and then a further elimination process is undertaken based on adjusted early fractional velocities.

The number of cards needed for a particular track varies according to the average number of races per card, but approximately 240-250 recent races (roughly 24-25 cards) are kept in the database, constantly updated, of course. That number can vary for some tracks, depending on how the spread of pace pressure groupings works out and how many races of each type actually occur. Some pace pressure readings happen an insignificant number of times, as you can imagine, while others occur in significant numbers. There are 20 different groupings, and not all tracks have the same spread of pace pressure ratings, so some need a few more, or fewer, races in the database. Once we find the number where predictiveness is at an acceptable level, the program keeps the database there automatically.

I am impressed at the innovative way you view the data. Thank you for explaining.

raybo · 11-21-2012, 07:25 AM

You're welcome. I think the fact that the program takes Randy Giles' work a step or two further makes it more valuable. While Randy's work tells you a particular PPG gives an advantage to an early, or a late, horse, etc., this program tells you which of the specific running styles have won those races and at what percentage. Also, it tells you the same thing for those styles' early speed point ranges. Both of those things are track specific and based on a recent time frame.
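A rough sketch of that kind of tally, in Python, for anyone who wants to experiment. This is not the actual program. The class name `TrackWindow`, the integer PPG codes, and the running-style codes (E, EP, P, S) are all assumptions made for illustration, and "percentage" is read here as the share of races in a given PPG that were won by each style.

```python
from collections import Counter, deque

WINDOW = 250  # assumed size, roughly the 240-250 recent races mentioned earlier


class TrackWindow:
    """Rolling window of recent races for one track (hypothetical helper)."""

    def __init__(self, window=WINDOW):
        # deque with maxlen drops the oldest race automatically when full
        self.races = deque(maxlen=window)

    def add_race(self, ppg, winner_style):
        """Record the pace pressure group and the winner's running style."""
        self.races.append((ppg, winner_style))

    def win_pct_by_style(self, ppg):
        """Share of races in this PPG won by each running style."""
        winners = Counter(style for group, style in self.races if group == ppg)
        total = sum(winners.values())
        if total == 0:
            return {}
        return {style: count / total for style, count in winners.items()}


# Invented results, only to show the call pattern.
track = TrackWindow()
for ppg, style in [(3, "E"), (3, "EP"), (3, "E"), (12, "S")]:
    track.add_race(ppg, style)

print(track.win_pct_by_style(3))
```

The same tally could be keyed on early speed point ranges instead of running style by storing that value for each winner.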

Capper Al · 11-21-2012, 07:41 AM

Quote:

Originally Posted by Cratos

I disagree because each race is an independent event and the margin of victory in one event has nothing to do with the margin in another event. Additionally, the field of horses will probably be different in each event.

It is never the attribute (fact) like DSLR; it is how it is used. Days has to be applied in conjunction with other factors or it is meaningless. The more dominant attributes like speed and pace are considered primary factors and may stand as a single source (variable) on their own for selecting horses, while Days Since Last Raced is a secondary attribute and has to be interpreted in a comprehensive manner together with other factors.

bob60566 · 11-21-2012, 10:25 AM

Quote:

Originally Posted by Capper Al

It is never the attribute (fact) like DSLR; it is how it is used. Days has to be applied in conjunction with other factors or it is meaningless. The more dominant attributes like speed and pace are considered primary factors and may stand as a single source (variable) on their own for selecting horses, while Days Since Last Raced is a secondary attribute and has to be interpreted in a comprehensive manner together with other factors.

In today's world there are only so many times you can administer apple juice in a cycle and make the horse effective.

Capper Al · 11-21-2012, 10:53 AM

Quote:

Originally Posted by bob60566

In today's world there are only so many times you can administer apple juice in a cycle and make the horse effective.

If you are referring to cheating then there isn't much one can do except hope there's a pattern that can be flagged.

DeltaLover · 11-21-2012, 11:33 AM

In my opinion the concept of recency is one of the most naively treated in handicapping literature and related software.

Recency analysis is a classical example of unsupervised learning where the input vector consists of the intervals between consecutive races and the output is a classification universe. It is a cluster analysis that can be optimized by various fitness functions: PNL, ROI or winning frequency maximizers or minimizers.

The objective of this classification is to derive group monikers in such a way that each starter is assigned one, creating a recency shape for each race. The value of a classification schema can be evaluated by a selection method (GA, NN, linear regression or other) that is able to show profitability by utilizing it in some way...

bob60566 · 11-21-2012, 01:11 PM

Quote:

Originally Posted by Capper Al

If you are referring to cheating then there isn't much one can do except hope there's a pattern that can be flagged.

Wrong thread. I should have posted under Pattern Recognition.

TrifectaMike · 11-21-2012, 07:18 PM

Quote:

Originally Posted by DeltaLover

In my opinion the concept of recency is one of the most naively treated in handicapping literature and related software.

Recency analysis is a classical example of unsupervised learning where the input vector consists of the intervals between consecutive races and the output is a classification universe. It is a cluster analysis that can be optimized by various fitness functions: PNL, ROI or winning frequency maximizers or minimizers.

The objective of this classification is to derive group monikers in such a way that each starter is assigned one, creating a recency shape for each race. The value of a classification schema can be evaluated by a selection method (GA, NN, linear regression or other) that is able to show profitability by utilizing it in some way...

You might discover a "bounce" classification.

Delta, you may want to give an example of what you have stated in paragraph 2. It is a good idea.

Mike (Dr Beav)

DeltaLover · 11-21-2012, 11:07 PM

Sure Doc..

I will try to present my approach:

Each starter has an array of day intervals between its starts:

s1: 25 15 53 212 ....

s1: d1, d2, ..., dn

For simplicity, let's consider only the days off before today's race and the days off before the previous race.

In this case we have

classification = f(d1, d2)

Each interval adds one dimension, so for our example we are talking about two dimensions.

Each starter can be represented as a point (x, y) on a two-dimensional plane.

To make the algorithm easier we might need some transformation logic for the days off:

for example:
T(d) = log(d) (or whatever)

Using the Euclidean distance between starters, we will be looking for clusters that show some similar behavior:

for example, winning frequency.

The objective of the algorithm will be to find two-dimensional clusters that behave similarly.

For example, we might find a cluster c1 that has the higher winning frequency and a cluster c2 that has the lower.

Each cluster will be assigned an arbitrary label: C1, C2, C3, etc.

Then the whole race can be described as a composite of clusters, based on where each starter belongs:

C1
C1
C2
C3
C7

Now the race can be matched against similar races, from which we might be able to conclude (for example) that this type of race is most frequently won by a C1 type.
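The steps above can be sketched in a few lines of Python. No particular clustering algorithm is named in the post, so plain k-means is swapped in here as an assumption. It groups starters by distance only and measures winning frequency afterwards, rather than optimizing a fitness function such as ROI. The starters, the choice of k = 2, and the helper names are all invented for illustration.

```python
import math
from collections import defaultdict


def transform(days):
    """T(d) = log(d), so long layoffs are compressed."""
    return math.log(days)


def kmeans(points, k, iterations=50):
    """Plain k-means with Euclidean distance; returns (centroids, labels)."""
    # Deterministic start: first point, then repeatedly the farthest point.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every starter.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def win_rate_by_cluster(labels, won):
    """Winning frequency of the starters that fall in each cluster."""
    starts, wins = defaultdict(int), defaultdict(int)
    for lab, w in zip(labels, won):
        starts[lab] += 1
        wins[lab] += int(w)
    return {lab: wins[lab] / starts[lab] for lab in starts}


def race_shape(race_labels):
    """Describe a race as the sorted cluster monikers of its starters."""
    return sorted(f"C{lab + 1}" for lab in race_labels)


# Invented starters: (d1 = days off before today, d2 = days off before the
# previous race, won flag). Not real data.
starters = [(14, 21, 1), (10, 15, 0), (21, 14, 1), (18, 25, 0), (12, 30, 0),
            (180, 20, 0), (212, 53, 0), (150, 28, 1), (240, 35, 0)]

points = [(transform(d1), transform(d2)) for d1, d2, _ in starters]
centroids, labels = kmeans(points, k=2)
rates = win_rate_by_cluster(labels, [won for _, _, won in starters])

print(rates)
print(race_shape(labels[3:7]))  # treat four of the starters as one race
```

With more intervals per starter, each point simply gets more coordinates; `math.dist` works in any number of dimensions.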

SmarterSig · 11-22-2012, 04:08 AM

Hi Delta

Do you use any other inputs to your cluster analysis? Trainer springs to mind.

DeltaLover · 11-22-2012, 07:16 AM

Trainer will add a categorical dimension to the space, which will subsequently narrow clusterization to this dimension only.

Although I am not currently doing it in any of my systems, we can analyze trainers after we have completed the recency clusterization. Based on it, we will then be able to assign to each trainer a distribution of classifications and rate him based on them.

For example:

Trainer: T1

C1 : T1-C1-Stats win% roi
C2 : T1-C2-Stats win% roi
C3 : T1-C3-Stats win% roi
C4 : T1-C4-Stats win% roi

Trainer: T2

C1 : T2-C1-Stats win% roi
C2 : T2-C2-Stats win% roi
C3 : T2-C3-Stats win% roi
C4 : T2-C4-Stats win% roi

etc

Now we can use each trainer's vector :

[ Ti-Ci .... ]

To perform another classification...
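As a minimal sketch of the trainer table above, assuming each start has already been given a recency cluster: the rows below are invented, the flat $2 stake is an assumption, and ROI is taken here to mean net return per dollar staked. The helper names are hypothetical.

```python
from collections import defaultdict

CLUSTERS = ["C1", "C2", "C3", "C4"]
STAKE = 2.0  # assumed flat win bet on every starter

# (trainer, recency cluster, payoff on a $2 win bet; 0.0 if the horse lost)
# Invented rows, not real results.
starts = [
    ("T1", "C1", 6.40), ("T1", "C1", 0.0), ("T1", "C2", 0.0), ("T1", "C2", 0.0),
    ("T2", "C1", 0.0), ("T2", "C3", 9.20), ("T2", "C3", 0.0), ("T2", "C3", 5.00),
]


def trainer_cluster_stats(rows):
    """Return {(trainer, cluster): (win rate, ROI)} from per-start rows."""
    totals = defaultdict(lambda: {"starts": 0, "wins": 0, "returned": 0.0})
    for trainer, cluster, payoff in rows:
        cell = totals[(trainer, cluster)]
        cell["starts"] += 1
        cell["wins"] += int(payoff > 0)
        cell["returned"] += payoff
    stats = {}
    for key, cell in totals.items():
        staked = STAKE * cell["starts"]
        win_rate = cell["wins"] / cell["starts"]
        roi = (cell["returned"] - staked) / staked
        stats[key] = (win_rate, roi)
    return stats


def trainer_vector(stats, trainer):
    """Flatten one trainer's per-cluster (win rate, ROI) pairs into a list.

    None marks clusters where the trainer has no starts.
    """
    vector = []
    for cluster in CLUSTERS:
        vector.extend(stats.get((trainer, cluster), (None, None)))
    return vector


stats = trainer_cluster_stats(starts)
for trainer in ("T1", "T2"):
    print(trainer, trainer_vector(stats, trainer))
```

Before these vectors go into a second classification, the empty cells would need some treatment (dropping thin cells or filling with track-wide averages, for instance); which one is appropriate is left open here.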