Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

04-06-2018, 01:47 PM   #76
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote:
Originally Posted by acorn54
To determine if a model exceeds randomness, it must exceed the critical z score under the null hypothesis for the model you are working on.
When you have such a model, assuming you ever do get to that point of development, you use it in real time going forward to see if it continues to exceed randomness, and, of course, whether it produces a profit commensurate with the risk you are taking on. That is how you "test overlays."
What metric do you use, and over what time horizon? E.g., weekly ROI, daily profit, etc.?
04-06-2018, 06:01 PM   #77
acorn54
Registered User
 
Join Date: Dec 2003
Location: new york
Posts: 1,631
Quote: Originally Posted by JerryBoyle (see previous post)

There are statistical tables that give the critical score for various confidence levels; I highly suggest a statistics textbook, which you can pick up at your local college, for elaboration. As to the time horizon and sample size: that is exactly what the tables are for. They tell you whether your results exceed random occurrence, and with what certainty (confidence level) the results will be repeated going forward. As a caveat, there is no absolute certainty that whatever model you use, even if it exceeds randomness, is going to be a "sure thing"; there will ALWAYS be a chance of failure. The best that statisticians can achieve is MAYBE 90 percent certainty.
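A minimal sketch, in Python, of the test acorn54 is describing, assuming flat $1 win bets and a null hypothesis of zero edge; the sample and all numbers below are invented for illustration:

[code]
# One-sample z-test on per-bet returns (null: true mean return <= 0).
# A return is profit per $1 staked, e.g. -1.0 for a loss, +2.5 for a 5/2 win.
import math

def z_score(returns):
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# One-tailed critical values from a standard normal table:
# 90% -> 1.282, 95% -> 1.645, 99% -> 2.326
returns = [-1.0] * 70 + [2.5] * 30   # made-up sample: 30 wins at 5/2 in 100 bets
z = z_score(returns)
print(f"z = {z:.2f}; exceeds the 95% critical value: {z > 1.645}")
# Prints z = 0.31 -- a 5% ROI over 100 bets is nowhere near significance,
# which is why sample size matters as much as the edge itself.
[/code]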
04-06-2018, 06:43 PM   #78
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote: Originally Posted by acorn54 (see previous post)
Sorry, I didn't mean the metric in the z-score sense. I meant: what metric are you measuring the distribution of? E.g., profit, ROI, etc.?
04-07-2018, 02:59 AM   #79
acorn54
Registered User
 
Join Date: Dec 2003
Location: new york
Posts: 1,631
Quote: Originally Posted by JerryBoyle (see previous post)
I understood what you meant. I keep the factors that I use to myself; however, I am sure you will be able to find your own unique factors.
04-09-2018, 12:55 AM   #80
lansdale
Registered User
 
Join Date: Jan 2006
Posts: 1,506
Quote:
Originally Posted by JerryBoyle
Made some modifications and improved my model a bit. The attached chart includes the testing and validation ranges. Clearly the validation range is much better than the previous one I posted. Results in general are also much better with respect to drawdowns.

There are many simplifying assumptions made in the backtest which would likely cause real results to be significantly worse. Not sure how many people on the forum are doing much modeling, but I'd be interested in starting a chat about going from test results to actual live betting.

Not sure what you're doing here or whether you've tested against holdout samples, but your growth chart very much resembles that of a professional blackjack player, or, as I said, Bill Benter -- a random walk with an upward drift. If so, you're on your way to multi-millionaire status.
04-09-2018, 11:15 AM   #81
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote: Originally Posted by lansdale (see previous post)
That is my holdout sample, though rather than having one single training period with one single holdout period, I use a walk-forward backtest (https://www.amibroker.com/guide/h_walkforward.html).

Unfortunately, some optimizations are applied to speed up the process, so results in the backtest aren't entirely reproducible. For example, the effect of my bet size on the final payoff isn't accounted for. Using a relatively small bankroll and bet size partially accounts for this, but real profits will still be less than tested.
04-10-2018, 12:45 AM   #82
lansdale
Registered User
 
Join Date: Jan 2006
Posts: 1,506
?

Quote: Originally Posted by JerryBoyle (see previous post)
Since you're using what look like financial models for this, I would assume you have some background in STEM or quantitative finance (neither of which I have), but what you've written is confusing enough to suggest that you don't. It sounds like what you might be doing is using a machine learning/SVD program to create a handicapping model, which is fine. And you seem to realize that you shouldn't have started to simulate betting before you completed testing the model, which is true. But it also sounds like you don't completely understand the difference between a training (or in-sample) model and a holdout (or out-of-sample) model. The holdout model is 'always' less powerful (or has higher RMSE) than the training (or optimization) model -- this is a given. The holdout model's results are the ones that count. So if you had really already validated your training model, you would be much more confident in your results. As a side comment, your bet size and bank size are irrelevant.

I won't write more until I get a better idea of what you're doing, but I would say this much -- if the results you post are accurate, and the dataset does include ca. 3,600 races, which works out to ca. 240 races per month, your model has to be quite robust. In addition, although it was a mistake to start simulating betting, if you were using full Kelly based on whatever edge your model was providing, that in itself can be considered a kind of validation -- the betting and handicapping act as a check on each other. Put another way, if your model really sucked, full Kelly would have wiped out your bank in a flash.

I'm not averse to responding if you can elaborate further, but it would help if you could answer some of these questions. Last but not least: don't discuss the parameters, inputs, factors, variables (or whatever you want to call them) that you're using with anyone, in case this model does turn out to be valid. I know people who have lost a lot of money that way.
04-10-2018, 10:01 AM   #83
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote: Originally Posted by lansdale (see previous post)
Apologies for the confusion. I will try to explain the design of the backtest I run a little more clearly. I believe I'm reasonably comfortable with the ideas of in-sample and out-of-sample forecasting, so it might just be a difference in terminology that's causing our confusion.

Let's say my data goes from 2016-01-01 to 2018-01-01, covering 2 full years and, say, 10,000 races after all filtering is done. So our total dataset is 10k races. The most familiar backtest, and what I think you're alluding to, would be to fit a model on one subset, for example the first 2k races, and then run a betting simulation on the holdout sample of 8k. I agree that performance on the holdout sample (whether performance is measured via a betting simulation or some measure of prediction error) will be worse than if we measured performance on the in-sample data.

Rather than doing the above, I apply a walk-forward backtest. This is done as follows. Given our sample of 10k races, I take, say, the first 2k races and train my model on those. I then run a betting simulation on a relatively small out-of-sample portion following the 2k (say 500 races, from race 2001 to 2500) and stash the results from those 500. I then shift the 2k-race training window forward by 500 races, dropping the earliest 500 and adding the original out-of-sample 500, so the next training window is races 500 to 2500. I then run a betting simulation on the next 500 races (in this case races 2501 to 3000), and so on until I'm out of data. The result is that I end up training many models with many corresponding 500-race, non-overlapping out-of-sample tests that I then stitch together to form the graph I attached earlier. The motivation for doing this is that I don't believe the model coefficients for horse racing remain stable within a year or across years.
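A minimal sketch of that windowing scheme, assuming the races are in chronological order; fit_model and simulate_bets are placeholder names standing in for the poster's actual model fitting and betting simulation:

[code]
# Walk-forward backtest: 2,000-race training window, 500-race out-of-sample
# test, sliding forward 500 races at a time until the data runs out.
TRAIN, TEST = 2000, 500

def walk_forward(races, fit_model, simulate_bets):
    results = []
    start = 0
    while start + TRAIN + TEST <= len(races):
        train = races[start : start + TRAIN]                 # e.g. races 0-1999
        test = races[start + TRAIN : start + TRAIN + TEST]   # e.g. races 2000-2499
        model = fit_model(train)                   # refit on every window
        results.extend(simulate_bets(model, test)) # bet out-of-sample only
        start += TEST                              # slide both windows forward
    return results  # stitched together, this is the plotted equity curve
[/code]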

*A note about the betting simulation I run on the out-of-sample races: I do use full Kelly; however, I do not simulate reinvesting profits or losses after each race. Rather, at the start of each race I assume a $1,000 bankroll. So the cumulative profit graph will never flatline, but if it dips far below $1,000, I would effectively have gone broke.
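And a sketch of that staking rule: full Kelly for a single win bet, with the bankroll reset to $1,000 before every race as described above. The probability and odds in the example are made up:

[code]
# Full-Kelly stake for one win bet; bankroll resets to $1,000 every race.
BANKROLL = 1000.0

def kelly_stake(p, odds):
    """p: model win probability; odds: net odds-to-1 (4.0 means 4/1)."""
    edge = p * (odds + 1.0) - 1.0    # expected profit per $1 staked
    if edge <= 0:
        return 0.0                   # no overlay -> no bet
    return (edge / odds) * BANKROLL  # Kelly fraction f = edge / odds

# A horse the model makes 25% to win, offered at 4/1:
# f = (0.25 * 5 - 1) / 4 = 6.25%, so the stake is $62.50.
print(kelly_stake(0.25, 4.0))
[/code]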
04-11-2018, 01:28 AM   #84
lansdale
Registered User
 
Join Date: Jan 2006
Posts: 1,506
Dynamic coefficients?

Quote: Originally Posted by JerryBoyle (see previous post)
Okay -- I see my first instinct was right -- you do have some background in probability, statistics, sampling, etc., which I do not, aside from what I've picked up informally, as a blackjack player, from those far more knowledgeable. So I'm somewhat mystified by what seems to be your skepticism about your model. I'm not clear how you arrived at the initial parameter estimates you used to derive Kelly bets -- possibly there was no intermediate step testing against final results, and instead you began using the simulated bets as part of your out-of-sample model right away. Whatever you did, it's obviously working very well.

What's most amazing, and I'm not sure you mentioned this before, is that you did not reinvest in the bank, and therefore did no bet resizing, as would be normal. Even so, you made 4,000% in a little over a year -- what if you had reinvested and resized? This is an issue for blackjack teams -- whether to double the bet size when the bank doubles, or whether to halve it given a diminished bank.

The training/validation split seems to be consistent with common practice, and your sample size is more than large enough. If you're looking for further testing, I know some people who use K-fold cross-validation, though not for racing data -- but your take on this is surely better than mine.
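For reference, a generic K-fold sketch using scikit-learn -- not anything either poster actually runs; the feature matrix X and outcomes y below are random stand-ins. It also shows why K-fold sits awkwardly with racing data: each fold's training set contains races run after the test fold, leaking future information, which is exactly what the walk-forward design above avoids:

[code]
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X = np.random.rand(1000, 5)        # stand-in features for 1,000 "races"
y = np.random.randint(0, 2, 1000)  # stand-in win/lose outcomes

scores = []
for train_idx, test_idx in KFold(n_splits=5).split(X):  # 5 folds, unshuffled
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(f"mean out-of-fold accuracy: {np.mean(scores):.3f}")
[/code]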

You are the first person I've ever seen mention using dynamic coefficients, which I understand are used to model turbulence; possibly this is what's really making the difference for your model. People talk about racing as being complex -- dynamic coefficients may be a solution to that problem.

So, if I were you, I'd start betting this with money you can afford to lose. If it continues to go well and you start betting more serious coin, I would think about contacting some of the people -- like Benter -- who have done this successfully. As far as this site goes, sjk is a very smart guy who has built a successful model and might be responsive.
04-11-2018, 11:37 AM   #85
JerryBoyle
Veteran
 
Join Date: Feb 2018
Posts: 845
Quote: Originally Posted by lansdale (see previous post)
Thanks for your thoughts, lansdale. I've started betting it with very small amounts of money. So far, so good, but I don't expect it to perform nearly as well as it has in the backtest.

Regarding reinvestment: I suspect the profit chart would increase exponentially until my wagers were restricted by pool size, which would happen fairly quickly at smaller tracks.
04-11-2018, 09:01 PM   #86
lansdale
Registered User
 
Join Date: Jan 2006
Posts: 1,506
Quote: Originally Posted by JerryBoyle (see previous post)

Cool. Re pool-size-limited betting: you may well already have an algorithm for this, but if not, you might want to take a look at Benter's comments on the subject, which will hopefully become increasingly relevant for you. He doesn't state the algorithm, but he does refer to it in the endnotes to his well-known article.
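Since Benter refers to the algorithm without stating it, the following is only a generic illustration of the underlying effect, with invented numbers: in a pari-mutuel pool your own money depresses the payoff, so expected profit stops growing, and eventually turns negative, as the stake rises:

[code]
# With takeout t, win pool W, and S already bet on your horse, a bet of b
# pays roughly (1 - t) * (W + b) / (S + b) per $1 if the horse wins.
def effective_div(bet, pool, on_horse, takeout=0.18):
    return (1 - takeout) * (pool + bet) / (on_horse + bet)

def expected_profit(bet, p, pool, on_horse):
    win = p * bet * (effective_div(bet, pool, on_horse) - 1.0)
    return win - (1 - p) * bet

# A 15%-chance horse in a small $5,000 pool with $400 already bet on it:
for bet in (10, 100, 500, 1000):
    print(bet, round(expected_profit(bet, p=0.15, pool=5000, on_horse=400), 2))
# EV rises from ~$5 at a $10 bet to ~$25 at $100, then turns negative by
# $500: past some stake, you are mostly betting against your own money.
[/code]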

If you only get half of your model's projection going forward, which is common among those I know who do this, it would still be very nice.

Best of luck.

lansdale
05-17-2018, 09:57 AM   #87
mikesal57
Veteran
 
Join Date: Sep 2003
Location: NEW YORK CITY
Posts: 3,670
Quote:
Originally Posted by Dave Schwartz
Generally, I do not play during the winter. Last November, just before Thanksgiving, I did a live play session (i.e. with an audience) and the racing was just horrible. My handicapping was not so hot, either.

I dedicated several weeks to figuring out just how different winter racing really is.

For over a decade I have used month-of-year in gathering races from the database to build a model of "races like this one." Specifically, +/- 2 months.

But this was different.

The first thing I determined was that racing is "different" from the week before Thanksgiving to around Jan. 21. The races are just far less predictable and/or do not match my handicapping approaches.

Ironically, many of the usual factor values - like recent speed ratings, for example - actually perform BETTER! But somehow the puzzle just doesn't fit together properly; the picture of the winner's circle just does not match the puzzle box.

I THINK... but have no proof of this... that it stems from the trainer patterns changing for the holidays. I THINK that the best trainers simply take time off from racing and hand the reins over to assistants for about two months. This causes differences.

Very open to other ideas. BTW, it is not just weather because things change even in the warmer climates. (Not as much, but still different.)


Dave
I'd like to expand on this some more...

I usually create a power number from the past year's worth of data.
I make it with ALL tracks, not track-specific.
Someone asked me why I don't use just the last month or two; I tested that with Gulfstream and found that using the past year came out better.

My questions, Dave, are:

Do you find that +/- 2 months is better than the whole year,
or maybe +2/-1?
Track-specific or all tracks?
What about class?

I find that a handful of factors usually prevail year after year, but the weights sometimes change.

Comments?

Thanks
Mike
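A sketch of the month-of-year window being discussed, assuming a hypothetical pandas DataFrame of races with a 'date' column (both the frame and the column name are invented for illustration); width=2 gives Dave's +/- 2 months, and widening it to 6 reproduces Mike's whole-year sample:

[code]
import pandas as pd

def month_window(races, target_month, width=2):
    """Keep races within +/- `width` months of `target_month` (1-12)."""
    months = pd.to_datetime(races["date"]).dt.month
    dist = (months - target_month) % 12
    dist = dist.where(dist <= 6, 12 - dist)  # wrap around: Dec is 1 from Jan
    return races[dist <= width]

races = pd.DataFrame({"date": ["2017-12-15", "2018-01-05", "2018-06-01"]})
print(month_window(races, target_month=1))  # keeps Dec and Jan, drops June
[/code]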
05-17-2018, 04:59 PM   #88
BCOURTNEY
Registered User
 
Join Date: May 2008
Posts: 686
Quote: Originally Posted by JerryBoyle (see post #85)
.. next stop Hong Kong ..