Small Sample Trend Demo


traynor
11-20-2013, 10:19 AM
This is a current wagering template for Maywood. I will re-visit this template after an additional number of new (current) races have been added, to illustrate how "snapshot" small sample models can be used--and cannot be used. In general, I would like to illustrate the basic principles of distinguishing predictive models from those which are merely descriptive--and essentially worthless for wagering purposes.

The model:
Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Nov20 In 102 NON-SELECT May Pace 13 12 17 19 in 21 20.59 % MIQR 5.16 Won 47.62 % ROI 1.47 105.40 4
Nov20 In 102 NON-SELECT May Pace 13 14 17 19 in 21 20.59 % MIQR 5.16 Won 47.62 % ROI 1.47 105.40 4
Nov20 In 102 NON-SELECT May Pace 13 15 17 19 in 17 16.67 % MIQR 5.45 Won 47.06 % ROI 1.52 105.40 3
Nov20 In 102 NON-SELECT May Pace 13 17 19 36 in 19 18.63 % MIQR 5.56 Won 47.37 % ROI 1.55 105.40 4
Nov20 In 102 NON-SELECT May Pace 14 10 17 19 in 22 21.57 % MIQR 5.20 Won 45.45 % ROI 1.41 94.40 4
Nov20 In 102 NON-SELECT May Pace 14 11 15 19 in 18 17.65 % MIQR 5.50 Won 44.44 % ROI 1.44 94.40 4
Nov20 In 102 NON-SELECT May Pace 14 11 17 19 in 20 19.61 % MIQR 5.20 Won 50.00 % ROI 1.55 94.40 4
Nov20 In 102 NON-SELECT May Pace 14 12 17 19 in 23 22.55 % MIQR 5.73 Won 47.83 % ROI 1.61 105.40 4
Nov20 In 102 NON-SELECT May Pace 14 13 17 19 in 23 22.55 % MIQR 5.73 Won 47.83 % ROI 1.61 105.40 4
Nov20 In 102 NON-SELECT May Pace 14 15 17 19 in 20 19.61 % MIQR 6.11 Won 45.00 % ROI 1.60 105.40 5
Nov20 In 102 NON-SELECT May Pace 14 17 19 36 in 20 19.61 % MIQR 5.60 Won 45.00 % ROI 1.49 105.40 6
Nov20 In 102 NON-SELECT May Pace 15 12 13 19 in 18 17.65 % MIQR 5.38 Won 44.44 % ROI 1.42 94.40 4
Nov20 In 102 NON-SELECT May Pace 15 13 14 19 in 18 17.65 % MIQR 5.38 Won 44.44 % ROI 1.42 94.40 4

What the numbers mean:
Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Template code designation (ID)

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Percentage of this category (NON-SELECT) races this template fits.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
"Average mutuel"--high values truncated to the mean of the interquartile range.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Percentage of matched races won.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
ROI. Not to be confused with POI, or some other expression. ROI means that for every $100 wagered, $143 was returned. This is a VERY conservative estimate, based on the MIQR, NOT the actual return, which is considerably higher.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
High mutuel in this series.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Missouts. The maximum number of losses between wins in this series.

DeltaLover
11-20-2013, 11:07 AM
I am not sure I understand what each field means. Can you make it a bit more descriptive?

TexasDolly
11-20-2013, 01:01 PM
I don't understand the numbers either. When you get a few minutes explain the columns for us. Thank you.
TD

traynor
11-20-2013, 02:49 PM
What the numbers mean:
Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Template code designation (ID). The sequence of numbers is an identifier of the template--they don't "mean" anything other than as an identifier in the output.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Percentage of this category (NON-SELECT) races this template fits. The template selected 18 matches in the 102 races as fitting the template--17.65%

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
"Average mutuel"--high values truncated to the mean of the interquartile range. The MIQR is an automated calculation applied to all models. If it is a four-race model, the mean of the middle half (second and third values) is used to determine the value. If it is a 400-race model, the same process is applied. It is not "exactly" MIQR--it is the average of the middle half of the races--the total of the middle half mutuels divided by the number of races in that middle half. I like that way of calculating it better than the more conventional (but laborious) calculation.
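For anyone who wants to try the middle-half average on their own numbers, here is a minimal sketch. The function name and the rule for samples that do not divide evenly by four (drop n // 4 values from each end) are assumptions for illustration, not the actual code behind the model; the four mutuels in the example are made up.

```python
def middle_half_mean(mutuels):
    """Mean of the middle half of the sorted mutuels (the post's "MIQR")."""
    ordered = sorted(mutuels)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one mutuel")
    # Assumption: drop n // 4 values from each end. For n = 4 this keeps
    # the second and third values, matching the four-race example.
    cut = n // 4
    middle = ordered[cut:n - cut]
    return sum(middle) / len(middle)


# Hypothetical four-race sample: only 5.00 and 6.00 are averaged.
print(middle_half_mean([2.40, 5.00, 6.00, 105.40]))  # 5.5
```

Note that the 105.40 has no effect on the result, which is the whole point of using the middle half.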

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Percentage of matched races won. Again, this is an automated calculation, and is the number of winners divided by the number of matches. The accuracy of this value increases substantially as the size of the sample increases.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
ROI. Not to be confused with POI, or some other expression. ROI means that for every $100 wagered, $143 was returned. This is a VERY conservative estimate, based on the MIQR, NOT the actual return, which is considerably higher. It is calculated based on the truncated mutuels (MIQR), rather than on actual mutuels. Dumping a $105.40 mutuel into anything less than a 500-1000 (or more) race model produces seriously screwy results. The list of mutuels is iterated through, and any value above 1.5 times the MIQR is truncated to that cap (in this case, 1.5 x 5.45, or about 8.17) before calculating the ROI. Meaning that for modeling purposes, the 105.40 mutuel was entered as 8.17 in calculating the ROI.
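A rough sketch of the truncation step, for illustration only. It assumes a flat $2.00 bet on every matched race (mutuels are quoted per $2), and the eight winning mutuels are hypothetical apart from the 105.40 high; only the 18 matches, the 5.45 MIQR, and the 1.5 multiplier come from the row above. The function name is made up.

```python
def capped_roi(win_mutuels, matches, miqr, bet=2.00, cap_factor=1.5):
    """ROI with every winning mutuel capped at cap_factor * miqr."""
    cap = cap_factor * miqr
    capped = [min(m, cap) for m in win_mutuels]
    # Amount returned divided by amount wagered on all matched races.
    return sum(capped) / (bet * matches)


# Hypothetical winners for an 18-match template (8 wins, 10 losses).
wins = [3.20, 4.00, 4.60, 5.20, 5.60, 6.40, 9.80, 105.40]

raw = sum(wins) / (2.00 * 18)                 # uncapped, inflated by the 105.40
conservative = capped_roi(wins, matches=18, miqr=5.45)
print(round(raw, 2), round(conservative, 2))
```

The gap between the two figures shows how much of a small sample's apparent return can come from a single payoff.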

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
High mutuel in this series.

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Missouts. The maximum number of losses between wins in this series.
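As a sketch of how a missouts figure could be computed from a sequence of results (the function name is made up, and counting a losing run at the very start or end of the series is an assumption):

```python
def max_missouts(results):
    """Longest run of consecutive losses; results is a sequence of booleans."""
    longest = current = 0
    for won in results:
        if won:
            current = 0          # a win ends the losing run
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical series: win, three losses, win, loss, win.
print(max_missouts([True, False, False, False, True, False, True]))  # 3
```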

I hope this clarifies.
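For anyone who would rather read the rows programmatically, here is a minimal parsing sketch. The field names are my own labels for the columns described above, and the regular expression is only fitted to the rows shown in this thread; it is not the format specification of the software that produced them.

```python
import re

ROW = re.compile(
    r"^(?P<date>\S+) In (?P<races>\d+) (?P<category>\S+) (?P<label>.+?) "
    r"(?P<template_id>\d+(?: \d+){3}) in (?P<matches>\d+) "
    r"(?P<match_pct>[\d.]+) % MIQR (?P<miqr>[\d.]+) "
    r"Won (?P<win_pct>[\d.]+) % ROI (?P<roi>[\d.]+) "
    r"(?P<high_mutuel>[\d.]+) (?P<missouts>\d+)$"
)


def parse_row(line):
    """Split one template row into named fields (names are assumptions)."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError("line does not look like a template row")
    row = m.groupdict()
    for key in ("races", "matches", "missouts"):
        row[key] = int(row[key])
    for key in ("match_pct", "miqr", "win_pct", "roi", "high_mutuel"):
        row[key] = float(row[key])
    return row


row = parse_row(
    "Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % "
    "MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3"
)
recomputed = 100 * row["matches"] / row["races"]      # match percentage
wins = round(row["win_pct"] / 100 * row["matches"])   # implied number of wins
print(row["template_id"], round(recomputed, 2), wins)  # 12 15 17 19 17.65 8
```

Recomputing the match percentage from the two counts is a cheap sanity check that a row was read correctly.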

traynor
11-20-2013, 02:59 PM
I probably should have explained that this is "the Maywood model" rather than a single template. It is composed of 14 complementary and (partially) overlapping templates. That is, each of the 14 rows of figures is a separate and distinct template, extracted from analysis of the latest batch of Maywood pace races. In theory, each could be used (or viewed) or analyzed separately. I use them in this fashion, so it is simpler for me to monitor the results using the compound model, rather than the individual components of that model.

This may seem an unusual approach, with different components pointing to different entries. The way I build models, that rarely happens. I will explain further as new results become available.

traynor
11-20-2013, 03:29 PM
This is the type of pattern I want to find. The list below is from the current model for Northfield (the Maywood model does not have enough values yet to generate this data--that is why I am using the Nfld sample to illustrate).

2.40
2.40
2.60
2.80
3.00
3.60
4.00
5.00
5.20
5.40
5.40
5.40
6.00
7.00
7.40
7.40
8.80
11.00
12.80
14.60
15.80
16.00
25.20

The range is smooth, and builds to fairly generous mutuels, indicating that the model is slipping under the radar of both the heavy hitters and the average bettor. Six double digit mutuels in a model this size is something to note carefully.

mrroyboy
11-20-2013, 03:36 PM
Tray
You have explained your methodology but not what you are trying to determine.

traynor
11-20-2013, 03:42 PM
In contrast, this is a list of the missed mutuels--races that the template matched, but the selection did NOT win: (everyone studies their losses as well as their wins, right?)

2.20
2.60
2.80
3.20
3.20
3.60
4.20
4.40
5.00
5.00
5.40
5.60
5.60
6.00
6.40
6.40
7.00
7.60
10.80
12.60
13.40
14.00
15.00
17.60
21.80
24.00
24.60
25.40
166.00

Note the pattern compared to the winning races. At least in the case of this template, the "wisdom of crowds" is a myth. That crowd is letting some very good horses go to post at very generous odds.

Why should this matter, and why should anyone care? Using a truncated MIQR mutuel as a guide, I can virtually ignore the post time odds and all the nonsense about "value betting" (that always seems to work out the wrong way in the final odds). That is, the projected return is sufficiently generous using truncated mutuels that I don't need to worry much about the odds in individual races. The ROI projection is based on conservative estimates, NOT on aberrant outliers that are unlikely to repeat.

That means that using this type of template and mutuel distribution, I will almost always earn more than the projected ROI, and very rarely earn less.

traynor
11-20-2013, 03:45 PM
Tray
You have explained your methodology but not what you are trying to determine.

How to distinguish predictive models from descriptive models.

traynor
11-20-2013, 03:53 PM
Most "handicapping" software and most handicapping methods do not distinguish between description and prediction--there is a tacit (and erroneous) assumption that merely describing (or looking at) the past will enable one to predict the future. That is rarely the case, in anything less than broad strokes with a mega-database--and then only if applied to a huge number of races.

traynor
11-20-2013, 04:52 PM
Perhaps it would be more interesting to ask what you would do with a small sample model. Do you believe small samples are "predictive"? That is, do you believe that a sample based on 100 or so races is significant enough to be worth betting on? Or that the results of such a small sample can be reasonably expected to represent the distribution of events in a larger sample?

Bear in mind that when all the slicing and dicing by track, distance, surface, and whatever is done, a typical sample of several hundred races can be whacked down to a "sample" no larger than that I posted above for Maywood (which actually represents something like the most recent six to eight weeks or so of races at Maywood). "Finely layered" is too often a euphemism for "backfitted to the results of last month's races." Do you believe such a small sample can be used profitably for wagering? If so, why? And if not, why not?

I am not interested in debate, but rather in understanding how and why other bettors use (or do not use) small samples as the basis for their wagers. If I can provide a few tips on the way on how to distinguish a (possibly) predictive sample from a sample that only describes a small subset of (more than likely) unrepeatable events, great.

mrroyboy
11-20-2013, 05:31 PM
Well I know you do plenty of research on various angles etc so I will believe anything you post.

traynor
11-20-2013, 05:41 PM
I don't know about belief, but I really like symmetric mutuel distributions. Meaning, if it is mostly chalk with one or two biggies, it makes me skeptical, because I expect mostly chalk. I much prefer models that indicate I am looking at something other than what everyone else is looking at (or for).

Similarly, I have little or no faith in models that include outliers in the calculations of ROI. If I can't toss the biggest payoffs and still earn a profit, I don't want to bet it. Chasing rainbows is not in my job description.

What I want to show is the exact same model after another 20-30 or whatever races have been added, to see how close the (current) figures are replicated (or not) when applied to races that have not been run yet. In short, what would happen if one were to actually bet on such models, on the assumption that a small set of past events can be used to predict the outcome of future events?

I want to use a "new" model, which is why I chose Maywood.

traynor
11-20-2013, 05:57 PM
In case the "symmetric" statement seemed obscure, it is common in model-making and data analysis. In the context of horse races, it can best be understood by looking at "unsymmetric" patterns. If a pattern is mostly low-end prices with a couple of big payoffs, the big payoffs are most likely the result of something else happening in the race--such as a way overbet favorite not firing and finishing out of the money, or some other "confounding variable" that caused the high mutuel (on some other horse) in that race. Similarly, an accident that wipes out the top three choices in the race, creating a situation for an entry with little or no chance of winning otherwise to win the race at a huge price. If that huge price is mindlessly tossed in to the calculations, all it does is create nonsense and junk models.

Unfortunately, because including such events (and ignoring the effect of confounding variables) makes otherwise clunky and boring "models" seem profitable, it is a common practice to include them. There is a reason for NOT including them (at least for me). I actually bet on the models I make.
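The "toss the biggest payoffs and see if it still shows a profit" check can be sketched in a few lines. This uses the Northfield winning mutuels posted earlier in the thread and assumes (it was not stated) that the 23 wins and 29 misses listed are the complete set of matches for that template, with a flat $2.00 bet on each. The function name is made up.

```python
# Winning mutuels from the Northfield list posted earlier in the thread.
wins = [2.40, 2.40, 2.60, 2.80, 3.00, 3.60, 4.00, 5.00, 5.20, 5.40, 5.40,
        5.40, 6.00, 7.00, 7.40, 7.40, 8.80, 11.00, 12.80, 14.60, 15.80,
        16.00, 25.20]
misses = 29                      # count of the missed-mutuel list (assumed complete)
matches = len(wins) + misses     # 52 matched races


def roi_without_top(win_mutuels, matches, drop=0, bet=2.00):
    """ROI after discarding the `drop` largest payoffs.

    The discarded races still count as bets, so they are treated as losses.
    """
    kept = sorted(win_mutuels)[:len(win_mutuels) - drop]
    return sum(kept) / (bet * matches)


for drop in (0, 1, 2):
    print(drop, round(roi_without_top(wins, matches, drop), 2))
```

Under those assumptions the return stays above 1.00 even with the two largest payoffs thrown out, which is what a reasonably symmetric mutuel distribution should look like. A pattern that is mostly chalk plus one or two bombs would fall below 1.00 as soon as the bombs are dropped.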

DeltaLover
11-20-2013, 06:25 PM
To clarify:

Descriptive model : over trained and over fitted to the point of memorizing what has happened in the past

Predictive model: trained up to the point of behaving beyond a specific level of acceptance in events unseen during the training process

Do we agree?

traynor
11-20-2013, 06:51 PM
Yes. Forecasting possible trends, rather than simply describing past events.

DeltaLover
11-20-2013, 07:30 PM
OK, we are on common ground so far.

The next thing I would like to clarify is what you are reporting, and why:

- I like your definition of ROI, which is essentially PNL.

- I am not sure why you are reporting win%; I think it is not needed.

- If I am not missing anything, the percentage of found bets is already sufficiently reflected in the ROI. I think it confuses me...

- I like the idea of reporting MIQR, although again I view it as an intermediate parameter already reflected in the final ROI.

- Have you tried, instead of using the final mutuel price, adjusting it based on pre-takeout values? I have found that this approach yields fitter results.

traynor
11-20-2013, 08:28 PM
This is a (very) small model, intended to illustrate principles. Some things may seem odd or inconvenient in a small sample, but the same processes are applied to (much) larger samples.

I use win% ("strike rate") because it is a value filter on the data mining app I use. That is, I can set it to locate whatever ratios I want. Same with other values--I set the criteria for acceptable parameters (usually set to default values), but with the option to easily change search parameters. It is also set to flag anomalies. If I have a fully developed model that hits 40-45% consistently, and it suddenly drops to 25-30% I want to know why. Using win% allows me to set a simple flag.
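A simple flag of that kind might look like the sketch below. The 30% floor and the 20-race minimum are illustrative assumptions, not the actual settings in the data mining app, and the function name is made up.

```python
def strike_rate_alert(results, floor=0.30, min_races=20):
    """Return True when the observed strike rate has dropped below `floor`.

    `results` is a sequence of booleans (True = win) for recent matched races.
    """
    if len(results) < min_races:
        return False             # too few races to say anything either way
    rate = sum(results) / len(results)
    return rate < floor


# Hypothetical recent results: 4 wins in 20 matched races (20%).
recent = [True] * 4 + [False] * 16
print(strike_rate_alert(recent))  # True
```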

For matches, if I set matched races to 20%, I get a lot of matches, and lots of chalk. At 10%, not enough matches, and the missouts go up dramatically. Most of the models I am using now are developed with 16% (and over) matches. That seems (at least currently) to work the best (for me). I define it in the output so I don't build a model using criteria that I don't remember, or whatever. It is more a reminder (note) to myself of the specific search parameters I used in that search than something intended for public display.

I really don't like monitoring actual mutuel prices when modeling--they seem more a distraction (at this point) than anything else. I just started building the hits and misses mutuel files for selections. The preliminary results suggest they may be a valuable source of information, and my use of mutuels may change radically.

Understand that the template description(s) I posted are for my own use, and contain notations and values that may seem overkill, but allow me to evaluate new models and model components at a glance. For example, the current betting model I use for The Meadows has almost 100 components, that are categorized by race type (some used for one type of race, some for another, developed over many months). I find it much easier to scan a field of numbers parsed into discrete categories than to use other methods. For example, scanning (that pattern-recognition thing again) the mutuel prices makes a 6.46 "jump out" in a column comprised primarily of values in the 5.12 to 5.57 (or whatever) range.

DeltaLover
11-20-2013, 09:51 PM
Sounds like a simple and straightforward approach. As far as pattern recognition goes, maybe you should consider a NN; if your data are clean and, more than anything else, you get it right in eliminating conflicts, it might work. What I think might be of more value is to create some sort of voting mechanism and base your betting on it. Although clearly an empirical approach with no sound theoretical foundation, it seems to work.

traynor
11-20-2013, 10:15 PM
A model is like a snapshot--a single image. A series of such snapshots enable (relatively) early detection of trends (as opposed to chaotic scatters of data points). That is one of the reasons I use several different "views" defining the overall "snapshot."

For example, a lower frequency of matches and wins with higher mutuels is more likely to produce an equivalent (or better) ROI, but may be less dependable (and ultimately less profitable) than a higher frequency of matches and wins with the same--or even lower--ROI. Most of the models tend to suggest that focusing on higher mutuels (average, mean, MIQR) is counter productive. The higher the mutuels, the more likely they are to be (or to contain) anomalies--and not to be sustainable in actual use.

DeltaLover
11-20-2013, 10:21 PM
I completely agree. The whole idea is to jump on the wagon before anyone else. Early trend realization is the name of the game.

traynor
11-21-2013, 10:41 AM
Well, whoopie, so now there is a small model of a small sample of races at a track. What comes next? For some, the tendency seems to be to add more races, recalculate everything, backfit it to the newly added races, and declare it as a "change to reflect the realities of racing." That is utter nonsense, and worthless for wagering.

The next step is to test the model on new races, not to tweak it so it would have picked yesterday's (or today's) winner(s)--if it had been tweaked into the new configuration before those races were run.

The model--exactly as created, no tweaking, no adjustment, no modification---should be applied to a fresh sample of races to determine if it performs as expected. That testing process is the essence of model making. Anyone can look back and say "oh, blah blah happened. Therefore blah blah will continue happening. I will bet on blah blah, and fortune and glory are just over the next hill." That type of "logic" is a major factor in why 98% of bettors lose.

Once a model has been developed that indicates a potential for a positive ROI based on past performances, there is zero guarantee that the model will perform the same way on future races. That is the point I am trying to make--that the basic premise of "handicapping" races--in particular the use of "regression studies" of small samples of races--is conceptually flawed. It is not just that small samples can be misleading. It is that reliance on small samples as predictive, when they are only descriptive, is a fundamental error that misleads many to select (and bet on) the wrong horse(s) while firmly believing they are using the knowledge gained from their years of experience at analyzing races to pick the right horse(s).

Cleaning the data is a necessary prerequisite for further study. All the nonsense about MIQR and outliers and so on is essential to the process. Unless those steps are taken initially, there is little hope of producing models that are other than misleading foolishness fit only for bragging--not for betting.

Step Two--the testing process--follows. I will apply the model to a new clump of races at Maywood. Tonight's races, if the model fits any of the entries. I won't know that until I run the races later today. This is the essential caveat: do NOT bet on those selections!!!

The model is untested, unverified, and could be junk, pure and simple. To wager on it at this stage would be foolish in the extreme. By applying it to a new set of races, it may provide clues that can be used (by you or anyone else) to determine when a model is predictive (and possibly profitable), and when a model is only descriptive (of a past set of anomalies unlikely to repeat in future events).

As I have mentioned several times in the recent past, the software I use turns up 100 or more preliminary models (that show a healthy win%, ROI, and other attributes of a "winning approach") on a normal day. At this point in time, the Maywood model is just that--preliminary. Please do not regard it as anything other than a vehicle I am using to illustrate the process of testing preliminary models.

traynor
11-21-2013, 03:13 PM
Ideally, preliminary testing should be done on a sample the same size or larger. That may take a while--there are only two entries that fit the Maywood model defined earlier in tonight's races:

May21PACE R02_6 ShiftingInterlude

May21PACE R03_2 WolfiesSportster

These are to watch, NOT to bet! At this stage (as with most other models built by backfitting to past races) the only thing that distinguishes this model from many, many others (that prove to be unprofitable or non-predictive or both) is the relative symmetry of mutuels.

At this stage (still very preliminary) pursuing "profit" as a goal is foolish. A model that loses 30 races in a row, then picks one $70 winner could be considered "profitable"--and still be a really useless model. What I am looking for now is a model that may be predictive, not just one that backfitting to past results made seem predictive.

In real world terms, I am looking for a model that--when applied to a new sample of races--performs approximately as it did in the initial model. NOT just "profit" but match rate, strike rate, mutuel size, and all the bells and whistles of the original. Then--and only then--should the small sample model be considered a possible contender for continued (and more specific) testing.

Without the testing and validation, my small sample models--just like everyone else's small sample models--are only descriptions of past events that in no way guarantee accuracy when applied to future events.
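One way to make "performs approximately as it did in the initial model" concrete is to compare the same handful of figures on the old and new samples. The sketch below is only an illustration: the 20% relative tolerance, the class and function names, and the fresh-sample figures are all assumptions. The original figures are taken from the first row of the Maywood model.

```python
from dataclasses import dataclass, fields


@dataclass
class Snapshot:
    match_rate: float    # matched races / races examined
    strike_rate: float   # wins / matched races
    miqr: float          # middle-half mean mutuel


def replicates(original, fresh, tolerance=0.20):
    """True only if every figure in `fresh` is within tolerance of `original`."""
    for f in fields(Snapshot):
        old = getattr(original, f.name)
        new = getattr(fresh, f.name)
        if abs(new - old) > tolerance * old:
            return False
    return True


original = Snapshot(match_rate=0.1765, strike_rate=0.4444, miqr=5.45)
fresh = Snapshot(match_rate=0.18, strike_rate=0.15, miqr=3.40)  # hypothetical
print(replicates(original, fresh))  # False
```

Requiring all of the figures to hold up, not just the profit, is what keeps one lucky $70 winner from validating a useless model.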

traynor
11-22-2013, 10:33 AM
May22PACE R10_8 Shabalabadingdong

May22PACE R11_1 RealHero

Same caveats as yesterday--this is a test of small sample modeling techniques, NOT race selections! Just for looking, not for betting.

traynor
11-23-2013, 11:04 AM
Results so far: Four selections, three losses, one win on an odds-on favorite. No conclusions (even tentative) until the test has been applied to 20 or so races.

traynor
11-29-2013, 03:56 PM
The purpose of this posting is a continuation of the small sample test. This is only a test, NOT betting selections!

May29PACE R03_2 Goldennugget

May29PACE R06_1 Pippi

May29PACE R07_6 FoxValleyMatinee

May29PACE R09_1 Firstclassallthway

May29PACE R10_7 SealarkHanover

traynor
11-30-2013, 12:26 AM
May29PACE R03_2 Goldennugget (place)

May29PACE R06_1 Pippi (FOOM)

May29PACE R07_6 FoxValleyMatinee (FOOM)

May29PACE R09_1 Firstclassallthway WON $3.40

May29PACE R10_7 SealarkHanover (place)

This is actually pretty typical of small samples. In 9 races, two wins--both at less than even money. A cluster of wins in the model created the illusion of a "pattern" that may in fact be no more than illusion. While a cluster of wins could bring the ROI and win percentage both back up to the projections, there is no guarantee that the ROI and win percentage of such a small model can be used for wagering. If you want to make a profit, that is.

I have no conclusions at this point, because I need approximately 20 additional races (11 more than already posted) to make a preliminary determination if the model is worth pursuing.

As indicated previously, 95% of such small sample models turn out to be worthless in the real world. Especially worthless (for wagering purposes) are small sample models that are continually tweaked and modified by adding additional races and re-calculating.

It should be noted that if the 9 "new" races are added into the mix, and the model re-calculated, it will still show a substantial ROI and healthy win percentage over the (extended) sample. That is one of the problems associated with descriptive models--they only describe the past, not what will happen in the future.

traynor
12-05-2013, 06:11 PM
Same caveats--this is only a test.

May05PACE R01_6 SixPackAnnie

May05PACE R02_6 BuckeyeDragonlady

May05PACE R05_4 EdenWay

May05PACE R08_1 WesMantooth

traynor
12-05-2013, 06:21 PM
In general, finding something that shows a positive ROI over a chunk of past races is fairly easy. It is similarly easy to continually tweak that something by adding more races, adding and deleting races, or whatever process is used, and re-calculating it. In general, it will continue--at each tweaking--to appear to have been "profitable" in those past races.

The point I am trying to make is that this is a big reason why people lose--their "somethings" are utter nonsense when applied in the real world (by actually betting on them).

The reason is that "events" occur in clusters--they are not symmetrically distributed. In plain English, there are clumps of wins, clumps of losses, more clumps of wins, more clumps of losses--all scattered throughout a perfectly normal distribution of race results. The "patterns" that small sample users believe they are seeing are pure illusion and only exist on paper in retrospect. Betting those illusory patterns typically results in betting into a cluster of losses--exactly what is happening with the small sample model I posted earlier for Maywood.
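The clumping is easy to see in a simulation. The sketch below assumes each race is an independent event with a fixed 45% chance of a win (a simplification, and the 45% is just a round number near the model's win percentages), then reports the longest winning and losing runs in 100 simulated races. The function name and the seed are arbitrary.

```python
import random


def longest_runs(results):
    """Return (longest winning run, longest losing run) in a boolean sequence."""
    best = {True: 0, False: 0}
    current_value, current_len = None, 0
    for r in results:
        if r == current_value:
            current_len += 1
        else:
            current_value, current_len = r, 1
        best[r] = max(best[r], current_len)
    return best[True], best[False]


rng = random.Random(2013)                       # fixed seed so the run repeats
results = [rng.random() < 0.45 for _ in range(100)]
win_run, loss_run = longest_runs(results)
print(sum(results), win_run, loss_run)
```

Try a few different seeds. Nothing about the underlying win rate changes from one run to the next, yet the streaks move around, and a 20-race window that happens to land on a winning clump looks like a "pattern."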

traynor
12-06-2013, 12:45 AM
May05PACE R01_6 SixPackAnnie Place

May05PACE R02_6 BuckeyeDragonlady FOOM

May05PACE R05_4 EdenWay FOOM

May05PACE R08_1 WesMantooth Place

Again, this is a fairly typical result of believing a small sample is predictive of "even distribution of events over the entire sample." The next 13 races showed two wins--both returning less than even money.

Even more interesting--if I re-run the model with the new races added, it will still appear to be quite profitable. On paper. Not because it is profitable, or even predictive, but because a cluster of anomalies were picked up by the computer in the small sample of races (100+). As long as I include that cluster in the modeling process, this looks like the greatest thing since sliced bread. Until one starts to bet on it. Just like all the other small models.

traynor
12-06-2013, 02:07 PM
Modeling is weird, and rarely works out for the dogmatic developers intent on leveraging complex analytical techniques to gain a "statistical advantage." Specifically, building predictive models (in contrast to building close-to-worthless descriptive models) requires a much higher level of pattern-recognition skills than most dogmatic developers are able to muster.

The current model in this demonstration is a good example. It showed a good ROI, good match rate, and good win percentage on those matches. Most bettors would jump all over such a model, and would have been betting with both hands on the last 13 races--in which 11 lost, and the two that won returned less than even money. This is the time in the life of a model in which the timid and the I-only-trust-statistics bettors tend to stop and think, "I don't understand--my numbers aren't working. What could be wrong?"
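For a rough sense of scale, here is how often 2 or fewer wins in 13 races would turn up if the true win rate really were 45%. This is a plain binomial calculation that assumes independent races and a fixed win probability, both simplifications; the 45% is a round figure near the model's win percentages, and the function name is made up.

```python
from math import comb


def prob_at_most(wins, races, p):
    """Binomial probability of `wins` or fewer successes in `races` trials."""
    return sum(
        comb(races, k) * p**k * (1 - p) ** (races - k)
        for k in range(wins + 1)
    )


print(round(prob_at_most(2, 13, 0.45), 3))
```

The answer comes out at a few percent. A result that unlikely says less about bad luck than about the 45% figure itself, which was taken from the same small sample the model was fitted to.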

I have yet to decide if this model can be tweaked into being predictive. However, at this exact point, I would not be surprised in the least if the current group of races generated better results than the previous groups of test races. That is not an implied validation of small sample models. It is an indication that the tendency to get caught in the switches is very strong in building such models.

Again, these are not "betting recommendations"--this is still only a test. However, I would suggest one might learn something of value about building small sample models by watching the results of tonight's races.

May06PACE R02_1 BaksidebarNlounge

May06PACE R04_4 YaUBetCha

May06PACE R13_1 Justified

SchagFactorToWin
12-06-2013, 03:13 PM
In general, finding something that shows a positive ROI over a chunk of past races is fairly easy. It is similarly easy to continually tweak that something by adding more races, adding and deleting races, or whatever process is used, and re-calculating it. In general, it will continue--at each tweaking--to appear to have been "profitable" in those past races.

The point I am trying to make is that is a big reason why people lose--their "somethings" are utter nonsense when applied in the real world (by actually betting on them).

Every handicapper should read this repeatedly until it sinks in.

traynor
12-07-2013, 12:47 PM
Results:

May06PACE R02_1 BaksidebarNlounge place

May06PACE R04_4 YaUBetCha FOOM

May06PACE R13_1 Justified place 39.40/1 paid $19.00

Models that go beyond the obvious are worth much more (to a bettor) than those indicating what everyone else sees. The results of the 13th race indicate this model may be one of those. A horse that the public lets go off at nearly 40/1--and that comes close to winning--means the model should not be abandoned as worthless just yet.

As for the "wisdom of crowds," the 1.10/1 favorite--Kansas Wildcat--finished third. I won't bore you with conjecture about exacta and trifecta payoffs. I assume you can work out for yourself what kind of leverage a 40/1 shot in one of the top two positions can give your bottom line.

traynor
12-07-2013, 04:22 PM
Of the components of the original small sample demo, NONE met the basic criteria after additional races were added. That simulates building a model from a small sample of races, then wagering on the selections of that model as if a small sample represented a rational view of "reality." The ONLY thing a small sample represents is a description of a small number of events. There is nothing "predictive" about it at all--that is an illusion created by what seems to be a pattern, but is only anomalies in an otherwise perfectly normal distribution of discrete events sprinkled around a baseline.

To give you some idea of how thoroughly one can deceive himself or herself (and anyone foolish enough to take her or his "advice"), a "new" model--based on the addition of more races--has emerged that meets the original criteria set for the original small sample demo:

Dec07 In 188 NON-SELECT Pace races 11 14 17 18 in 32 17.02 % MIQR 6.25 Won 40.63 % ROI 1.47 72.60 7

Dec07 In 188 NON-SELECT Pace races 20 21 23 25 in 32 17.02 % MIQR 5.73 Won 53.13 % ROI 1.79 18.00 6
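
The point that "new" models will always emerge from a small sample can be checked with a simulation. What follows is only a sketch under stated assumptions, not the software used for this demo: every number in it (match rate, win probability, payoff, the 1.40 threshold, the count of 1000 rules) is hypothetical. Each simulated rule is a losing proposition by construction, with a long-run ROI of 0.80, yet some of them still look profitable over 102 races.

```python
import random


def in_sample_roi(rng, n_races=102, match_rate=0.20, win_prob=0.20,
                  payoff=8.00, stake=2.00):
    """ROI of one rule with no real edge over a small sample of races.

    All defaults are hypothetical. The long-run ROI is
    win_prob * payoff / stake = 0.80, i.e. a losing rule.
    """
    bets = 0
    returned = 0.0
    for _ in range(n_races):
        if rng.random() < match_rate:      # the rule "matches" this race
            bets += 1
            if rng.random() < win_prob:    # the selection happens to win
                returned += payoff
    return returned / (bets * stake) if bets else 0.0


rng = random.Random(2013)                  # fixed seed so runs are repeatable
rois = [in_sample_roi(rng) for _ in range(1000)]

true_roi = 0.20 * 8.00 / 2.00              # identical for every simulated rule
looks_profitable = sum(r >= 1.40 for r in rois)

print(f"true ROI of every rule: {true_roi:.2f}")
print(f"best in-sample ROI:     {max(rois):.2f}")
print(f"rules with ROI >= 1.40: {looks_profitable} of {len(rois)}")
```

The exact counts depend on the seed. Whatever they come to, every rule that clears the threshold did so by chance alone, because no rule in the simulation has any edge.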

traynor
12-07-2013, 06:05 PM
The below should make clear why I think anyone risking money on a small sample of races is in for a fall. At any point, "positive ROI models" can be extracted from a set of races. Those models are worthless for wagering purposes.

Similarly, by continually tweaking the set of races used as the sample (adding new races, deleting old races) the illusion of profitability is maintained. "New" patterns (just as illusionary as the old patterns) will continue to appear. Just like the "new" patterns in the post above. It all looks good on paper, and in the computer output, but is pretty much worthless for betting.

The most basic test should be applying the model to a different set of races. If the model is predictive, it will select (approximately) the same number of matches, the same number of wins, the same MIQR, and the same ROI in a new group of races.
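
A rough version of that test can be run on the two snapshots of component 14 13 17 19 listed further down in this post. The sketch below backs out what happened in the added races alone. It rests on two assumptions that the post implies but does not state: that the Dec07 sample is the 102 Nov20 races plus 86 new ones with none dropped, and that win counts can be recovered by rounding the quoted percentages. The function and dictionary names are made up for illustration.

```python
def added_race_stats(first, combined):
    """Back out counts for the races added between two cumulative snapshots.

    Assumes `combined` contains every race in `first` plus new ones.
    """
    races = combined["races"] - first["races"]
    matches = combined["matches"] - first["matches"]
    wins = combined["wins"] - first["wins"]
    return {
        "races": races,
        "matches": matches,
        "wins": wins,
        "match_rate": matches / races,
        "win_rate": wins / matches if matches else None,
    }


# Component 14 13 17 19, figures as quoted in this post.
nov20 = {"races": 102, "matches": 23, "wins": round(0.4783 * 23)}
dec07 = {"races": 188, "matches": 33, "wins": round(0.3939 * 33)}

new = added_race_stats(nov20, dec07)
print(f"added races: {new['races']}, matches: {new['matches']}, wins: {new['wins']}")
print(f"match rate {new['match_rate']:.2%} vs 22.55% in the Nov20 sample")
print(f"win rate   {new['win_rate']:.2%} vs 47.83% in the Nov20 sample")
```

If the assumptions hold, the 86 added races produced 10 matches and 2 wins, so neither the match rate nor the win rate stayed near its Nov20 level in the new races.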

Bear in mind the demo model is not some wild-eyed, wishin' and hopin' nonsense that some seem to think passes for "handicapping." In particular, the distortion of unusually high mutuels (that is guaranteed to make any sample less than 5000 or so races extremely misleading) has been eliminated. I think anyone who fails to take that fundamental step in analyzing his or her data only does so because she or he knows out of the gate that it is worthless.
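
The post does not say how the high mutuels were eliminated, so the following is only one possible approach, a sketch that caps each winning payoff at a fixed ceiling before computing ROI. The list of payoffs, the $20 ceiling, and the function name are hypothetical; 105.40 is borrowed from the models in this thread purely to show the size of the effect on 18 bets.

```python
def roi_from_payoffs(winning_mutuels, n_bets, stake=2.00, cap=None):
    """ROI as total returned / total wagered, optionally capping each payoff.

    `cap` is a hypothetical ceiling on any single winning mutuel; it is an
    assumption for illustration, not the method used in the demo.
    """
    if cap is not None:
        winning_mutuels = [min(m, cap) for m in winning_mutuels]
    return sum(winning_mutuels) / (n_bets * stake)


# 18 bets, 8 winners, one of them a 105.40 longshot (made-up sample).
wins = [4.20, 5.60, 3.80, 6.40, 105.40, 5.00, 4.80, 7.20]

raw = roi_from_payoffs(wins, n_bets=18)
capped = roi_from_payoffs(wins, n_bets=18, cap=20.00)

print(f"raw ROI:    {raw:.2f}")
print(f"capped ROI: {capped:.2f}")
```

In a sample this small a single longshot accounts for most of the raw ROI, which is why an uncleaned figure says very little about what the next 18 bets will return.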

Nov20 In 102 NON-SELECT May Pace 12 15 17 19 in 18 17.65 % MIQR 5.45 Won 44.44 % ROI 1.43 105.40 3
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 13 12 17 19 in 21 20.59 % MIQR 5.16 Won 47.62 % ROI 1.47 105.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 13 14 17 19 in 21 20.59 % MIQR 5.16 Won 47.62 % ROI 1.47 105.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 13 15 17 19 in 17 16.67 % MIQR 5.45 Won 47.06 % ROI 1.52 105.40 3
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 13 17 19 36 in 19 18.63 % MIQR 5.56 Won 47.37 % ROI 1.55 105.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 14 10 17 19 in 22 21.57 % MIQR 5.20 Won 45.45 % ROI 1.41 94.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 14 11 15 19 in 18 17.65 % MIQR 5.50 Won 44.44 % ROI 1.44 94.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 14 11 17 19 in 20 19.61 % MIQR 5.20 Won 50.00 % ROI 1.55 94.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 14 12 17 19 in 23 22.55 % MIQR 5.73 Won 47.83 % ROI 1.61 105.40 4
Dec07 In 188 NON-SELECT May Pace 14 12 17 19 in 33 17.55 % MIQR 5.73 Won 33.33 % ROI 1.12 105.40 4

Nov20 In 102 NON-SELECT May Pace 14 13 17 19 in 23 22.55 % MIQR 5.73 Won 47.83 % ROI 1.61 105.40 4
Dec07 In 188 NON-SELECT Pace races 14 13 17 19 in 33 17.55 % MIQR 6.54 Won 39.39 % ROI 1.48 105.40 5

Nov20 In 102 NON-SELECT May Pace 14 15 17 19 in 20 19.61 % MIQR 6.11 Won 45.00 % ROI 1.60 105.40 5
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 14 17 19 36 in 20 19.61 % MIQR 5.60 Won 45.00 % ROI 1.49 105.40 6
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

Nov20 In 102 NON-SELECT May Pace 15 12 13 19 in 18 17.65 % MIQR 5.38 Won 44.44 % ROI 1.42 94.40 4
Dec07 In 188 NON-SELECT Pace races 15 12 13 19 in 41 21.81 % MIQR 4.84 Won 39.02 % ROI 1.14 72.60 6

Nov20 In 102 NON-SELECT May Pace 15 13 14 19 in 18 17.65 % MIQR 5.38 Won 44.44 % ROI 1.42 94.40 4
Dec07 In 188 NON-SELECT May Pace races--ROI negative over entire sample

traynor
12-07-2013, 06:16 PM
So, is it all hopeless? Not at all. I am still looking at one component that may just turn out to be predictive. This component is the one that selected Justified last night. It also selected the 105.40 winner in the sample chunk of races used initially. I especially like that last number--5--which is the maximum number of losses between wins.

Nov20 In 102 NON-SELECT May Pace 14 13 17 19 in 23 22.55 % MIQR 5.73 Won 47.83 % ROI 1.61 105.40 4
Dec07 In 188 NON-SELECT Pace races 14 13 17 19 in 33 17.55 % MIQR 6.54 Won 39.39 % ROI 1.48 105.40 5
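
For anyone tracking that last number on their own selections, here is a minimal sketch of how the maximum number of losses between wins could be computed from a sequence of results. It is an assumption that leading and trailing losing runs count as well; the demo software may treat them differently. The function name and the sample sequence are made up.

```python
def max_losses_between_wins(results):
    """Longest run of consecutive losses in a sequence of bet outcomes.

    `results` is an iterable of booleans, True for a win. Losing runs at
    the start or end of the sequence are counted too (an assumption).
    """
    longest = 0
    current = 0
    for won in results:
        if won:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical sequence: W = win, L = loss.
sequence = [c == "W" for c in "WLLWLLLLLWLW"]
print(max_losses_between_wins(sequence))
```

A low value over 23 or 33 matches is easy to get by chance, so this number needs the same test on new races as the ROI does.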

traynor
12-12-2013, 06:48 PM
May12PACE R06_4 KennansNancyLee

traynor
12-12-2013, 10:23 PM
May12PACE R06_4 KennansNancyLee

Won $4.80. Not exactly something to brag about, but a winner is still better than a loser.

eurocapper
12-13-2013, 03:37 AM
Bear in mind the demo model is not some wild-eyed, wishin' and hopin' nonsense that some seem to think passes for "handicapping." In particular, the distortion of unusually high mutuels (that is guaranteed to make any sample less than 5000 or so races extremely misleading) has been eliminated. I think anyone who fails to take that fundamental step in analyzing his or her data only does so because she or he knows out of the gate that it is worthless.


Maybe I'm wrong but I get the impression you are saying the crowd can be both right and wrong, by taking it into account in the model building stage but (as I gather from other posts) ignoring it in the analysis/betting stage. Why not ignore it and mutuels consistently?

traynor
12-13-2013, 06:21 PM
Maybe I'm wrong but I get the impression you are saying the crowd can be both right and wrong, by taking it into account in the model building stage but (as I gather from other posts) ignoring it in the analysis/betting stage. Why not ignore it and mutuels consistently?

Easy. That is the only way to build a realistic model for wagering. If it is strictly for win percentage--ignoring mutuel prices--it is rarely profitable. ROI is a compound of (average, mean, cleaned, adjusted, whatever) mutuel price and win percentage. Ignoring mutuel prices completely would effectively mean ignoring ROI--not a good thing for betting.
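
As a worked example of that compound, here is a minimal sketch assuming a $2 base wager and ROI expressed as dollars returned per dollar bet. Both sets of figures are hypothetical.

```python
def roi(win_rate, avg_mutuel, stake=2.00):
    """ROI as return per unit wagered: win rate times average payoff / stake.

    `avg_mutuel` is whatever average winning payoff is used (mean, cleaned,
    adjusted); `stake` is the base wager the mutuel is quoted on.
    """
    return win_rate * avg_mutuel / stake


# A model tuned for win percentage alone: many winners, short prices.
high_win_rate = roi(win_rate=0.55, avg_mutuel=3.20)

# A model with fewer winners at better prices.
better_prices = roi(win_rate=0.40, avg_mutuel=7.00)

print(f"55% winners at $3.20 average: ROI {high_win_rate:.2f}")
print(f"40% winners at $7.00 average: ROI {better_prices:.2f}")
```

With these made-up numbers the higher win percentage loses money (ROI below 1.00) while the lower one shows a profit, which is the reason for keeping mutuel prices in the model.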

traynor
12-13-2013, 06:22 PM
Maywood 12/13 -- Nothing fits the model in tonight's races.

traynor
12-19-2013, 11:18 AM
May19PACE R04_5 KennansNancyLee

traynor
12-19-2013, 09:49 PM
May19PACE R04_5 KennansNancyLee

Won at $2.40. Ugh.

traynor
12-20-2013, 03:27 PM
Maywood 12/20 -- Nothing fits the model in tonight's races.

traynor
12-27-2013, 05:02 PM
Nothing fit the model last night. One race tonight.

May27PACE R03_4 RichessNestor

traynor
12-27-2013, 09:22 PM
Nothing fit the model last night. One race tonight.

May27PACE R03_4 RichessNestor

May27PACE R03_4 RichessNestor FOOM 20/1

I am really, really glad that I am not foolish enough, desperate enough, or whatever else enough to bet on small samples that indicate a "positive ROI." I can tack a program to the wall and throw darts at it blindfolded and do better than this.

Hmm. Maybe if I didn't mention that the spiffy-appearing models are absolutely worthless for betting, I could sell it as the greatest thing since sliced bread? Perhaps as a black box app for hobbyists who expect to lose anyway? Nah. Life is too short for such nonsense.

End of small sample demo for Maywood.