Self introduction and data query


prank
04-29-2006, 06:34 PM
Thanks for reading and for any suggestions.

I'm a statistics PhD student (please, hold the jeers), and working on research in the field of rank prediction. My favorite example when explaining this to others is horse racing. Naturally, asset allocation (betting ;-)) is a follow-on issue when gambling or making decisions about the ranking, but I'm sticking to ranking for now.

I've been collecting data for a ranking project for this summer, from fields like computer science, sports, economics, marketing, entertainment, etc., but I'm struggling to identify high-dimensional data for horse racing (i.e. beyond just name, jockey, post, track, date, weather, and a few other variables - say 15 or so in total, but lacking breeding, prior performance, etc.). I've scouted sources like Equibase, etc., but I doubt that many places will provide academic access to their data. I also know that distributing their data is against the license agreements, so I'm at a loss to see if there is a way forward in developing and validating models with racing data.

So, maybe I seem like a fish out of water here. What do you think? I could see potentially working in collaboration or an internship with some company, and using my research with their data. I've done that in the past with several well known technology firms; and I worked in applied statistics for years (actually, mostly working on ranking :)) before returning to grad school. So, I've got a bit more experience than most random grad students.

If this is the wrong forum, board, or alley for this pursuit, I'm glad to hear of where else I might seek information. If seeking racing data is a bad idea, that's also good to know early.

Thank you very much for your time and assistance.

Prank (or pRank :cool: )

kenwoodallpromos
04-29-2006, 06:49 PM
Two of the best possible sources of data used to make rankings are members of this forum: CJ, with past figures based on time and distance; and Dave, with his every-track pars for track condition. Maybe there is a way they can get you some past data.

sjk
04-29-2006, 07:52 PM
Some of us have numerous graduate degrees so whatever level of education you have apart from handicapping is probably not of much relevance. In fact, here you are swimming with the sharks.

Many of us enjoy a good discussion of handicapping from a scientific point of view. Bring it on.

prank
04-29-2006, 09:08 PM
You shouldn't take anything I said as a threat. Instead, I'm focusing on research. As for who has what level of education, I'm not concerned with that, either.

So, do you have any constructive suggestions on data?

Cheers.

sjk
04-29-2006, 09:11 PM
Do you?

I don't need them.

Even so I enjoy a good discussion.

sjk
04-29-2006, 09:22 PM
Please bet as much as you can afford. I look forward to the competition.

prank
04-29-2006, 09:41 PM
I'm not sure where you're going with this.

You know what I'm looking for and my interests (also see my post over in multi logit regression, which is perhaps more appropriate for this thread).

I don't think either of us needs to compete. You're probably thinking of winning races, I'm thinking of research and data. The motivation is entirely different. Unless you're also working on a PhD in ranking. :D But then I'd rather have a couple of beers and discuss the topic. It is fascinating what people in different fields do. The CS guys do one thing, the econ guys another, it seems I've seen a few more methods here in racing, and so on.

Anyhow, the articles I read on Bill Benter and so on make me believe that betting in HK would make more sense than in the US, where there is too much variability (i.e. harder to get narrow confidence intervals on the ranking prediction) and possibly too little money wagered in each race (so, any meaningful wager can change the odds by too much). Hell, maybe I'm wrong about gambling, but I've got other things to spend a graduate research stipend on. Seriously, though, a research stipend sucks compared to getting done and getting out.
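
The pool-size point above can be sketched with simple pari-mutuel arithmetic. This is only an illustration, assuming a single win pool, a flat 17% takeout, and no breakage; the pool sizes, the takeout rate, and the function names are all hypothetical and not taken from any real track.

```python
def payout_per_dollar(pool_total, on_horse, takeout=0.17):
    """Gross return per $1 win bet in a simple pari-mutuel pool.

    Assumes a flat takeout and ignores breakage (both are simplifications).
    """
    return (1.0 - takeout) * pool_total / on_horse


def payout_after_bet(pool_total, on_horse, bet, takeout=0.17):
    """Return per $1 after our own bet is added to the pool and to the horse."""
    return payout_per_dollar(pool_total + bet, on_horse + bet, takeout)


# Hypothetical small pool: $20,000 total, $2,000 on our horse.
small_before = payout_per_dollar(20_000, 2_000)        # about 8.30
small_after = payout_after_bet(20_000, 2_000, 1_000)   # about 5.81

# Hypothetical large pool with the same proportions, 100 times the money.
large_after = payout_after_bet(2_000_000, 200_000, 1_000)  # about 8.26

print(small_before, small_after, large_after)
```

Under these assumptions the same $1,000 bet cuts the payout by roughly 30% in the small pool and by well under 1% in the large one, which is the sense in which a meaningful wager "can change the odds by too much".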

Anyhow, as I said, if I'm in the wrong place for discussion on data, let me know.

Best wishes.

prank

sjk
04-29-2006, 09:47 PM
Forget the competition angle.

My main premise is that ranking won't work for betting races. Happy to discuss it.

prank
04-29-2006, 09:58 PM
Cheers. I agree that just ranking isn't enough - you do need probability estimates, etc. And of course the money isn't in the estimates but being smarter than the average bear about how to bet using your estimated probabilities. :) But, that's another project for another time or another person. I'll just try one thing for now. :) You guys can get rich, I'm just trying to get done. I have no interest in sticking around in grad school for longer than necessary.
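
One way to see why a ranking alone is not enough: two sets of win probabilities can give the same ranking and very different prices for an exotic bet. The sketch below uses the Harville formula, a standard textbook approximation that is brought in here only for illustration and is not something proposed in this thread; the horses and probabilities are made up.

```python
def harville_exacta(win_probs, first, second):
    """Harville approximation: P(first wins and second runs second).

    win_probs maps horse name -> estimated win probability (summing to 1).
    """
    p_first = win_probs[first]
    p_second = win_probs[second]
    return p_first * p_second / (1.0 - p_first)


# Two hypothetical fields with the SAME ranking (A, then B, then C).
close_field = {"A": 0.40, "B": 0.35, "C": 0.25}
lopsided_field = {"A": 0.80, "B": 0.15, "C": 0.05}

print(harville_exacta(close_field, "A", "B"))     # about 0.233
print(harville_exacta(lopsided_field, "A", "B"))  # about 0.600
```

Both fields rank the horses identically, yet the estimated chance of an A-B exacta differs by more than a factor of two, so any betting decision needs the probabilities and not just the order.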

Dave Schwartz
04-29-2006, 10:10 PM
Prank,

I am the "Dave" that KenWood was referring to. Alas, I must tell you that "data sharing" is highly frowned upon by the data providers.

But I do have a question for you... What, exactly, would a company that invited you in as an intern gain from such an arrangement?


Regards,
Dave Schwartz

prank
04-29-2006, 11:16 PM
Hi Dave,

Thanks for writing.

In this case, it would involve working on better ranking for the horses or at least better probability estimates or odds. I've already worked at and with a couple of well known internet and software firms to improve ranking in their systems. For them, this has led or could lead to better ranking in applications and sites used by hundreds of millions of users. In the past, I've worked on everything from experimental design to applications of many machine learning systems.

Now, my own work is more on developing new methodology, rather than just applying existing methods. The goal is always to beat what's out there - that's what drives the research forward.

Specifically for horse racing data, I am convinced that we can look at not only dozens of possible predictor variables, but can push things up to the many thousands of possible variables. My work also involves model selection - how to select the best models in a very high dimensional space. How would we get these features, from a set of dozens? By looking at various nonlinear combinations (assorted interactions; these can be simple polynomials to things like indicator functions & decision trees).
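
To make the "dozens to thousands" step concrete, here is a minimal sketch of that kind of expansion, assuming polynomial interaction terms plus threshold indicator functions. The feature names, the threshold, and both function names are hypothetical, and this is not the actual methodology referred to above.

```python
from itertools import combinations_with_replacement
from math import comb


def expand_features(row, degree=2, thresholds=None):
    """Add polynomial/interaction terms and indicator features to one row.

    row: dict of base feature name -> numeric value.
    thresholds: optional dict of feature name -> cut point for an indicator.
    """
    names = sorted(row)
    out = dict(row)
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(names, d):
            value = 1.0
            for name in combo:
                value *= row[name]
            out["*".join(combo)] = value
    for name, cut in (thresholds or {}).items():
        out[f"{name}>{cut}"] = 1.0 if row[name] > cut else 0.0
    return out


def n_monomials(n_features, degree):
    """Number of monomials of degree 1..degree in n_features variables."""
    return comb(n_features + degree, degree) - 1


# Hypothetical two-feature row: post position and carried weight.
print(expand_features({"post": 3.0, "weight": 120.0}, thresholds={"post": 5}))

# Starting from 15 base variables:
for d in (2, 3, 4):
    print(d, n_monomials(15, d))  # 135, 815, 3875
```

With 15 base variables, polynomial terms up to degree 4 alone give 3,875 candidate features before any indicator or tree-based features are added, which is why model selection becomes the central problem.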

What would this mean to the company? It depends on the company:
- For companies selling tips: potentially more accurate predictions.

- For companies selling software: better modeling algorithms for their users

- For companies selling data: better evaluation of variables to collect (model selection); better indices that aggregate many dimensions (please forgive me - I'm trying to walk between the terms I see used in handicapping & terms used in statistics; the general idea here would be something akin to principal component analysis); and possibly the identification of anomalous data (I'm not sure if this comes up much in data collection for race histories, but it is a problem in many other fields, and errors & outliers can cause all kinds of problems for sensitive models).
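
As an illustration of the anomalous-data idea in the last item, a minimal sketch using the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff). The technique is swapped in here as one simple, robust option; the sample times and the function name are made up.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return indices whose modified z-score exceeds the cutoff.

    Uses the median and the median absolute deviation (MAD), which are far
    less sensitive to the bad values themselves than mean and standard
    deviation.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > cutoff
    ]


# Hypothetical final times in seconds; 17.1 looks like a transposed 71.1.
times = [71.2, 70.8, 71.5, 72.0, 70.9, 17.1, 71.3]
print(flag_outliers(times))  # [5]
```

A check like this only flags candidates for review; whether a flagged value is a recording error or a genuinely unusual race still needs someone who knows the data.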

So, these are general ideas, and I'd be happy to talk about specifics for particular situations and objectives. Because I have some specific research plans already lined up for this summer (I'm working on several other data sets from CS and economics), I'm not in a rush on this arrangement, but I am very open to such work in later semesters, as well.

Best wishes,

prank (Now I'm starting to feel foolish about that handle :) C'est la vie - life should be fun. If you're interested - it comes from a title of a paper I read awhile back: "Pranking with Ranking", by Yoram Singer and one of his students.)

AwolAtPA
04-30-2006, 12:15 AM
Sat 29 Apr 6

hi pRank,

consider looking at HTR data

With any handicapping software, you have the problem of learning the software and how to use the data. Well, HTR is no exception and has MANY choices to export data for research. Ken Massa, HTR developer, has nine (9) different exports and six different Pace Line modes to generate different data sets!! I have not looked at JeffP's or DaveS's software but expect you would have similar choices in your search for data.

The general export from HTR is Hx4 using Pace Line five (PL5). The Hx4 export has many fields which are the rank of different factors. You can download a sample and look at the Hx4 specification at www.htr2.com

If you are interested, then contact Ken about a weekend trial during which you could download forty-five (45) days of data from HDW. After the trial, your data expense for two months of data could be 'averaged' to sixty dollars a month. That is: sign up, download 45 days, sign off, and wait for the second month! A simple, cost-effective way to avoid SHARING data from another HTR user.

If you decide to check out HTR, then do not hesitate to email me when you encounter an issue. Or, post your question about other exports because there are other HTR users on this forum who would have experience with the other exports.

I enjoy research and spend most of my time searching for the combination of rankings that will define a good bet.

---aaah disclaimer--
oh, I am just a satisfied HTR user, i.e., my only personal gain is from sharing this data opportunity with you, and I am not connected to the HTR company.
------------------

good luck with your data search.

duane/Awol

Dave Schwartz
04-30-2006, 12:28 AM
Prank,

That was a good answer.

Check your PM.

Regards,
Dave Schwartz