Quote:
Originally Posted by lansdale
Since you're using what looks like financial models for this, I would assume you had some background in STEM or quantitative finance (neither of which I have), but what you've written is so confusing it suggests that you don't. It sounds like what you might be doing is using a machine learning/SVD program to create a handicapping model, which is fine. And you seem to realize that you shouldn't have started simulating betting before you finished testing the model, which is true. But it also sounds like you don't completely understand the difference between a training (or in-sample) model and a hold-out (or out-of-sample) model. The hold-out model is 'always' less powerful (or has higher RMSE) than the training (or optimization) model; this is a given. The hold-out model's results are the ones that count. So if you had really already validated your training model, you would be much more confident in your results. As a side comment, your bet size and bank size are irrelevant.
I won't write more until I get a better idea of what you're doing, but I would say this much: if the results you post are accurate, and the dataset does include ca. 3600 races, which works out to ca. 240 races per month, your model has to be quite robust. In addition, although it was a mistake to start simulating betting, if you were using full Kelly based on whatever edge your model was providing, that in itself can be considered a kind of validation; the betting and handicapping act as a check on each other. Put another way, if your model really sucked, full Kelly would have wiped out your bank in a flash.
I'm not averse to responding if you can elaborate further, but it would help if you could answer some of these questions. Last but not least, don't discuss the parameters, inputs, factors, variables (or whatever you want to call them) that you're using with anyone, in case this model does turn out to be valid. I know people who have lost a lot of money that way.
Apologies for the confusion. I will try to explain the design of my backtest a little more clearly. I believe I'm reasonably comfortable with the ideas of in-sample and out-of-sample forecasting, so it might just be a difference in terminology that is causing our confusion.
Let's say my data goes from 2016-01-01 to 2018-01-01, covering 2 full years and, let's say, 10,000 races after all filtering is done. So our total dataset is 10k races. The most familiar backtest, and what I think you're alluding to, would be to fit a model on one subset, for example the first 2k races, and then run a betting simulation on the hold-out sample of 8k. I agree that performance on the hold-out sample (whether it is measured via a betting simulation or some measure of prediction error) will be worse than if we measured performance on the in-sample data.
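To make the in-sample vs. hold-out comparison concrete, here is one common prediction-error measure (the RMSE mentioned in the quote). The target $y_i$ (for example a win indicator or finishing position) is only an assumption for illustration; it is not necessarily what either model here predicts. $T$ is the training set, $H$ the hold-out set, and $\hat{y}_i$ the prediction of a model fitted on $T$ only:

```latex
\mathrm{RMSE}(S) = \sqrt{\frac{1}{|S|} \sum_{i \in S} \left( y_i - \hat{y}_i \right)^2},
\qquad
\text{typically } \mathrm{RMSE}(H) \ge \mathrm{RMSE}(T)
```

The in-sample figure is $\mathrm{RMSE}(T)$ and the hold-out figure is $\mathrm{RMSE}(H)$; the gap between them is roughly how much the model has fitted noise in the training races.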
Rather than doing the above, I use a walk-forward backtest. It works as follows. Given our sample of 10k races, I take, say, the first 2k races and train my model on those. I then run a betting simulation on a relatively small out-of-sample block following the 2k (let's say 500 races, from race 2001 to 2500). I stash the results from those 500. I then shift the 2k training window forward by 500 races, dropping the earliest 500 and adding the original out-of-sample 500, so the next training window covers races 501 to 2500. I then run a betting simulation on the next 500 races (in this case races 2501 to 3000). And so on until I run out of data. The result is that I end up training many models, each with a corresponding 500-race, non-overlapping out-of-sample test, and I stitch those tests together to form the graph I attached earlier. The motivation for doing this is that I don't believe the model coefficients for horse racing remain stable within a year, let alone across years.
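The rolling scheme described above can be sketched in Python roughly as follows. This is a minimal sketch, not the poster's actual code: `fit_model` and `simulate_bets` are hypothetical callbacks standing in for whatever model-fitting and betting-simulation steps are really used, and races are assumed to be in chronological order with 0-based indices (so range 500 to 2500 below corresponds to races 501 to 2500 in 1-based terms).

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def walk_forward_windows(n_races: int, train_size: int = 2000,
                         test_size: int = 500) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for a rolling walk-forward backtest.

    The training window has a fixed length and rolls forward by test_size
    after each fold, so the out-of-sample test blocks never overlap.
    """
    start = 0
    while start + train_size < n_races:
        test_start = start + train_size
        test_end = min(test_start + test_size, n_races)  # last block may be short
        yield range(start, test_start), range(test_start, test_end)
        start += test_size  # drop the oldest test_size races, add the block just tested


def run_walk_forward(races: Sequence, fit_model: Callable, simulate_bets: Callable,
                     train_size: int = 2000, test_size: int = 500) -> List:
    """Train one model per fold and stitch the out-of-sample results together.

    fit_model and simulate_bets are hypothetical placeholders.
    """
    stitched: List = []
    for train, test in walk_forward_windows(len(races), train_size, test_size):
        model = fit_model([races[i] for i in train])
        stitched.extend(simulate_bets(model, [races[i] for i in test]))
    return stitched


folds = list(walk_forward_windows(10_000))
print(len(folds))  # 16 folds: test blocks start at 2000, 2500, ..., 9500
train, test = folds[1]
print(train.start, train.stop, test.start, test.stop)  # 500 2500 2500 3000
```

With 10k races, a 2k training window and 500-race test blocks, this gives 16 models whose test blocks together cover races 2001 to 10,000 exactly once.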
*A note about the betting simulation I run on the out-of-sample races: I do use full Kelly; however, I do not simulate reinvesting profits or losses after each race. Rather, at the start of each race I assume a $1000 bankroll. So the cumulative profit graph can never flatline the way a real bankroll would after going bust, but if it ever falls $1000 or more below zero, I would effectively have gone broke.
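A rough sketch of a fixed-bankroll, full-Kelly simulation, under several assumptions that are not from the post: one win bet per race, a model win probability `p`, and fixed decimal odds at which the bet is settled (real pari-mutuel payouts move until the off, and Kelly staking across several horses in the same race is more involved). `kelly_fraction`, `simulate_fixed_bankroll` and `went_broke` are hypothetical names for illustration.

```python
from typing import Iterable, List, Tuple


def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full-Kelly fraction for a single bet: f* = (b*p - q) / b, with b = odds - 1."""
    b = decimal_odds - 1.0
    if b <= 0:
        return 0.0
    f = (b * p - (1.0 - p)) / b
    return max(f, 0.0)  # no edge means no bet


def simulate_fixed_bankroll(bets: Iterable[Tuple[float, float, bool]],
                            bankroll: float = 1000.0) -> List[float]:
    """Cumulative profit when every stake is sized off the same fixed bankroll.

    bets: (model win probability, decimal odds, whether the horse won).
    Profits and losses are NOT reinvested, so the stake never shrinks.
    """
    cumulative = 0.0
    path: List[float] = []
    for p, odds, won in bets:
        stake = kelly_fraction(p, odds) * bankroll
        cumulative += stake * (odds - 1.0) if won else -stake
        path.append(cumulative)
    return path


def went_broke(path: List[float], bankroll: float = 1000.0) -> bool:
    """True if cumulative losses ever used up the whole notional bankroll."""
    return any(x <= -bankroll for x in path)


path = simulate_fixed_bankroll([(0.5, 3.0, True), (0.5, 3.0, False)])
print(path)  # [500.0, 250.0]: stake is 0.25 * 1000 = 250 on both races
```

Because the stake is always computed from the same $1000, a losing streak never reduces bet size, which is why the curve keeps going instead of flatlining; `went_broke` is one way to flag where a real, compounding bankroll would have been wiped out first.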