A general question about data analysis.


vnams
04-04-2012, 04:20 PM
First, I discovered this forum just last week. It's great to find other people interested in what I'm interested in.

I am writing for thoughts on data analysis.

There are pluses and minuses to using standard statistical tools to analyse racing data. The plus is that there are a lot of very good general tools that smart people have figured out. The minus is that racing data has a structure that is difficult to represent in those tools. Specifically, there are races, and there are individual horses within each race.

For those of you who are using statistical methods, my question is, what are you considering to be one sample? One horse in one race, or one race?

The difficulty is that if you consider it to be one horse, then you also need to somehow indicate how well this horse has done in the past compared to the other horses in this race.

But if you consider it to be one race, then it is difficult to know how to include variables for each individual horse. For example, if you include 5 past performance variables for each horse, and you have 10 horses per race, then you have 50 past performance variables in each sample. And what order do you place the variables in?
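To make the two layouts concrete, here is a minimal sketch in Python. The record shape (`race_id`, `horses`, `pp`) and the variable names are made up for illustration and are not from any particular data vendor.

```python
def to_long(race):
    """One sample per horse: each row carries the race id plus that horse's own variables."""
    return [
        {"race_id": race["race_id"], "horse": h["name"], **h["pp"]}
        for h in race["horses"]
    ]


def to_wide(race):
    """One sample per race: every horse's variables concatenated in listing order."""
    row = {"race_id": race["race_id"]}
    for slot, h in enumerate(race["horses"], start=1):
        for key, value in h["pp"].items():
            row[f"h{slot}_{key}"] = value
    return row


# Hypothetical race: 3 horses, 2 past performance variables each.
race = {
    "race_id": "R1",
    "horses": [
        {"name": "A", "pp": {"last_speed": 88, "days_off": 21}},
        {"name": "B", "pp": {"last_speed": 92, "days_off": 45}},
        {"name": "C", "pp": {"last_speed": 85, "days_off": 14}},
    ],
}

long_rows = to_long(race)  # 3 rows, one per horse
wide_row = to_wide(race)   # 1 row with 3 * 2 variable columns plus race_id
```

In the wide layout the same race produces a different row if the horses are listed in a different order, which is the ordering problem. In the long layout the order does not matter, but each row knows nothing about the rest of the field unless you add that information yourself.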

Any thoughts?

pondman
04-04-2012, 06:26 PM
I think you are heading up the wrong road, at least from my point of view. To be able to single out a horse in a race, the horse first needs a set of variables that will give you a 40% confidence level. At that point, you can decide, given the many pools available, whether there is enough advantage to cover the 60% unknown. I recommend finding the variables that indicate the connections are serious enough about the horse, and forgetting about trying to analyze a race as a whole, unless you are a mad (as in crazy) superfecta player.

I don't think the structure of racing is evident until you look at the aggregate. And at that point you begin asking odd questions, such as "How many horses does this movie star own? How many win? And how can I make a profit from it? And did they just fire their trainer and want to win at all costs?"

When you do it the way you are describing, and the variables are present for everyone to see, the result will be:

Hot performance * Hot Trainer * Hot Jockey = Poor bet.

bob60566
04-04-2012, 06:31 PM
Have fun, and put in the effort to find the variables. I am a win bettor most of the time. Only you can find them, as it is your money that is on the line.

Mac:)

InControlX
04-05-2012, 11:47 AM
Vnams...

What you're describing is the trade-off between spot play searches (look for a good preparation pattern in a single entry) and handicapping methods (look for a ranking scale between entries in a race). I believe most database horseplayers use both to varying degrees. Even the best spot plays warrant an examination of the competition, and even the best handicapping rankings must be aware of special entrants.

Two suggestions which might help in your quest...

1. Don't redboard yourself. If you find that a correlation of parameters tests profitable over a range of race dates, test it on another season or two of data before believing it. It is very easy, and in fact mathematically predictable, to get many false-positive correlations with both spot plays and ranking methods.

2. You will be tempted to "tweak" parameter correlations to maximize tested win percentages or ROI. Be careful to maintain parameter minimum resolutions in your trial ranges. For example, if split times need to be carved down to 1/100th of a second to qualify a correlation, there is no play.
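The false-positive point in suggestion 1 can be illustrated with a small simulation. This is only a sketch under made-up assumptions: every "rule" is pure noise with a 10% win rate at 7.2-to-1 net odds (roughly an 18% takeout), bets are flat 1-unit win bets, and the function names are hypothetical.

```python
import random


def roi(payoffs):
    """Average net return per 1-unit flat bet."""
    return sum(payoffs) / len(payoffs)


def simulate_rule(rng, n_bets, win_prob=0.10, net_odds=7.2):
    """Net results of n_bets flat bets on a rule with no real edge.

    Each bet returns +net_odds with probability win_prob, otherwise -1.
    """
    return [net_odds if rng.random() < win_prob else -1.0 for _ in range(n_bets)]


def count_false_positives(n_rules=1000, n_bets=200, seed=0):
    """Count no-edge rules that look profitable in one season, and how many
    of those still look profitable in a second, independent season."""
    rng = random.Random(seed)
    in_sample_winners = 0
    confirmed = 0
    for _ in range(n_rules):
        season_one = simulate_rule(rng, n_bets)
        if roi(season_one) > 0:
            in_sample_winners += 1
            season_two = simulate_rule(rng, n_bets)
            if roi(season_two) > 0:
                confirmed += 1
    return in_sample_winners, confirmed


winners, confirmed = count_false_positives()
print(f"{winners} of 1000 no-edge rules showed a profit in season one")
print(f"{confirmed} of those also showed a profit in season two")
```

Every rule here is a loser by construction, yet a fair number will show a positive ROI over 200 bets. Requiring the result to repeat on a separate season cuts that number down sharply, though it does not take it to zero.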

ICX

pondman
04-05-2012, 12:29 PM
I neither use rankings nor am at all concerned about the competition. When the connections place their horse into the right spot and are serious about winning, then I swing. I'm more than willing to lose on over 50% of my bets if the margins are large. Fortunately for me, I'm able to play the large margins between what the crowd thinks (which is usually speed), and how a horse is trained and prepared in real life to win a race. If you are going to survive very long as a handicapper, you've got to find methods which are not obvious. The margins are too small on the horses with stellar performances. You are better off looking for improvements, such as on young horses, or horses with a recent problem. However, for both of these there isn't any easy way to formulate an instant pizza in a cup rating system. You are not going to find this in a black box. You've got to have the data, experience, and a reasonable understanding of gambling to make this type of wager. The purpose of my data is to distil the game into profitable, high-margin wagers, not to handicap 1000s of races.

AITrader
04-05-2012, 12:59 PM
Most of the factors I have found to be significant work out to identifying general characteristics that make certain horses stand out from others and then creating a mathematical formula that quantifies this and makes it known to the statistical methods. Each horse is unique, just as each person is. Some people like cotton candy, some don't. If I were analyzing a cotton candy eating contest I would guess at what factors might lead a person to be successful in that endeavor. Then I would create a formula to quantify these factors one by one in light of the data I have available. Then I would test each factor singly against a set of known contests and note the factors that show significance for successful versus unsuccessful participants.

Once I have identified a set of factors that are significant, I need to put them together and adjust them to work successfully as a set with the statistical methods I use.

As to your question about horse versus race, I treat a race as a unit during training. Each of the horses is competing against the others in the current race. That being said, some factors are best emphasized 'within-race' while others are best within the set of horses as a whole, though often by class, winnings, etc. (Specifically, I am speaking of in-race data normalization versus normalization within the entire set of horses, or by class, by winnings, etc.)
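As a rough illustration of the difference between the two kinds of normalization, here is a minimal Python sketch using z-scores. The row layout and the `speed` factor are hypothetical, and z-scoring is just one common choice, not necessarily the formula used above or in any paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and unit spread (all zeros if no spread)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_within_race(rows, factor):
    """Z-score a factor against the other horses in the same race only.

    Returns a list aligned with rows.
    """
    positions_by_race = defaultdict(list)
    for pos, row in enumerate(rows):
        positions_by_race[row["race_id"]].append(pos)
    out = [0.0] * len(rows)
    for positions in positions_by_race.values():
        scores = zscores([rows[p][factor] for p in positions])
        for p, z in zip(positions, scores):
            out[p] = z
    return out


def normalize_across_all(rows, factor):
    """Z-score a factor against every horse in the data set."""
    return zscores([row[factor] for row in rows])


# Hypothetical data: R1 is a stronger field than R2.
rows = [
    {"race_id": "R1", "horse": "A", "speed": 80},
    {"race_id": "R1", "horse": "B", "speed": 90},
    {"race_id": "R1", "horse": "C", "speed": 100},
    {"race_id": "R2", "horse": "D", "speed": 60},
    {"race_id": "R2", "horse": "E", "speed": 70},
    {"race_id": "R2", "horse": "F", "speed": 80},
]

within = normalize_within_race(rows, "speed")
overall = normalize_across_all(rows, "speed")
```

In this toy data, horses C and F get the same within-race score because each is the fastest in its own field by the same relative margin, while the whole-population score still ranks C well above F. Which view suits a given factor is a judgment call.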

A good paper that touches on this is "Identifying winners of competitive events: A SVM-based classification model for horserace prediction" by Lessmann, Sung, and Johnson. I haven't found SVMs to be as useful as the authors of the paper claimed, but the data normalization has been.

vnams
04-05-2012, 09:35 PM
Thanks - looking at this paper has led me to others in a similar vein. I hadn't heard of these kinds of techniques before, which combine both within-race and between-horse comparisons. I had been trying to cobble together such a comparison in a crude way before (e.g. for each horse, having variables that relate its past performance to the other horses' past performances, such as a ranking, or difference from the best, etc). I may be back with questions after I do some reading and figuring.
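For anyone wanting to try the crude version, the ranking and difference-from-best variables can be sketched like this in Python. The row layout and the `last_speed` name are hypothetical, and the sketch assumes a higher value of the factor is better.

```python
from collections import defaultdict


def add_relative_features(rows, factor):
    """Return copies of rows with two extra within-race variables:
    <factor>_rank (1 = best, ties share a rank) and <factor>_gap_to_best.

    Output is grouped by race, in order of each race's first appearance.
    """
    by_race = defaultdict(list)
    for row in rows:
        by_race[row["race_id"]].append(row)

    out = []
    for race_rows in by_race.values():
        best = max(r[factor] for r in race_rows)
        # Distinct values, best first, so tied horses get the same rank.
        ordered = sorted({r[factor] for r in race_rows}, reverse=True)
        for r in race_rows:
            enriched = dict(r)
            enriched[factor + "_rank"] = ordered.index(r[factor]) + 1
            enriched[factor + "_gap_to_best"] = best - r[factor]
            out.append(enriched)
    return out


# Hypothetical past performance figures for two races.
rows = [
    {"race_id": "R1", "horse": "A", "last_speed": 95},
    {"race_id": "R1", "horse": "B", "last_speed": 88},
    {"race_id": "R1", "horse": "C", "last_speed": 95},
    {"race_id": "R2", "horse": "D", "last_speed": 70},
    {"race_id": "R2", "horse": "E", "last_speed": 82},
]

features = add_relative_features(rows, "last_speed")
```

Each horse stays its own sample, but now carries information about where it stands in its field, so there is no need to line up every horse's variables in one long row.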

InControlX
04-09-2012, 04:38 PM
Maybe you got a shock from a computer once?

You apparently have an edge by means of shrewd observation of "connections" which tells you "when they place their horse into the right spot" and yields a 50% winning percentage with "large margins". If you've achieved that, you've hands-down beaten the game. Congrats. However, most of us aren't as well connected, psychic, related, or so gifted as to have any such inside information. A few other corrections: 1000s of races aren't handicapped, 100,000s are, with all entries at all tracks automatically scanned each day in about 10 minutes without getting ink on our thumbs. The working element isn't "an instant pizza in a cup", but preparation patterns refined and researched for the past ten years. "Obvious" horses are not sought, ROI is. Yes, we care about competitive entries and constantly work on ranking methods.

ICX