Academic Research Project : seeking data


akgandhi
11-03-2005, 11:08 PM
Hello group :

John Swetye recommended I post my inquiry on this discussion board, and he in turn was recommended to me by William Ziemba.

I am a PhD student at the University of Chicago in Economics, and along with my collaborators at Columbia University, we have a new modelling approach to thinking about how parimutuel odds at horse tracks are competitively determined via a "supply=demand" type process. We also have a new way to conceptualize the relationship between odds in a race and the underlying rates of return on the horses.

I need data on as many races as possible, possibly in the 60,000 plus range, where in each race I get to see the number of horses running, the odds on each horse, and which horse wins.

I tried going through Equibase to obtain this type of information, but it is rather expensive. I was wondering whether anyone in this group would have data of this nature available. If so, would you be willing to share the data and/or collaborate on the project at hand? Thanks in advance for any advice any of you could offer on how to proceed in carrying out this research.

Thanks,
Amit Gandhi
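
For readers less familiar with how parimutuel odds arise from the betting pool, here is a minimal sketch of the standard calculation. It is not the modelling approach described in the post above. The pool amounts and the takeout rate are made-up values for illustration, and breakage (rounding of payouts) is ignored.

```python
def parimutuel_odds(win_pool, takeout=0.17):
    """Return odds-to-1 for each horse from the dollars bet to win on it.

    win_pool: dict of horse name -> dollars bet to win (hypothetical data).
    takeout:  fraction of the pool kept by the track (assumed value).
    Breakage (rounding payouts down) is ignored in this sketch.
    """
    total = sum(win_pool.values())
    net = total * (1.0 - takeout)  # money returned to winning bettors
    # A winning $1 bet gets back net / bet, so the profit per $1 is net / bet - 1.
    return {horse: net / bet - 1.0 for horse, bet in win_pool.items() if bet > 0}


# Hypothetical three-horse win pool with a 20% takeout.
pool = {"A": 5000.0, "B": 3000.0, "C": 2000.0}
odds = parimutuel_odds(pool, takeout=0.20)
```

The odds on each horse depend on how much was bet on every other horse, which is why the request above asks for the full set of odds in each race and not only the winner's price.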

BillW
11-03-2005, 11:44 PM
Almost all data available for racing in the USA has an origin with Equibase, which is copyrighted. It would be more than likely illegal for you to obtain this data from a free source. I would seriously recommend that you explore the impact this would have on any academic project before you proceed.

Bill

MichaelNunamaker
11-03-2005, 11:51 PM
Hi Amit,

I agree with Bill. All this data is copyright of Equibase and they get downright grumpy about people sharing it. However, anyone who had the data could run your analysis for you and give you the results without any issues. I'd be willing to do that if what you had in mind isn't too crazy in terms of my time. I have all North American T-bred data going back to 1996, so there are over half a million races involving over four million horses.

What information would you like to see?

Mike Nunamaker

traynor
11-04-2005, 12:53 AM
akgandhi wrote: <I have a need for data on as many races as possible, possibly in the 60,000 plus range, where in each race I get to see the number of horses running, the odds on each horse, and which horse wins. >

With all due respect to Dr. Ziemba, I think you are looking at the wrong thing, if you are interested in some kind of positive result. I assume you are already aware of the concept of market inefficiencies resulting from "the principle of maximum confusion." In short, the odds of any given winner are dictated by the odds of the other winners, and the less certain bettors are in a given situation, the higher the average mutuels. If you do an academic search in peer-reviewed journals, you should turn up a stack of relevant studies.

As for your specific problem, it might be a lot simpler to find someone who can program in a scripting language (Perl, PHP, Python, even Visual Basic) and ask them to write a utility routine to parse the Daily Racing Form, newspaper, or whatever source file. Specifically, a number of sources are available that are free, and only require that you download and parse the data yourself.
Good Luck
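
A minimal sketch of the kind of utility routine suggested above, in Python. The CSV layout (columns `race_id`, `horse`, `odds`, `finish`) is a hypothetical format invented for this example and does not correspond to any real data source; a real source file would need its own parsing rules. The output keeps the three things the original request asked for: field size, odds on each horse, and the winner.

```python
import csv
import io
from collections import defaultdict


def load_races(fileobj):
    """Group rows of a hypothetical CSV (race_id, horse, odds, finish) by race."""
    runners_by_race = defaultdict(list)
    for row in csv.DictReader(fileobj):
        runners_by_race[row["race_id"]].append(
            {
                "horse": row["horse"],
                "odds": float(row["odds"]),    # odds-to-1
                "finish": int(row["finish"]),  # 1 = winner
            }
        )

    races = {}
    for race_id, runners in runners_by_race.items():
        winners = [r["horse"] for r in runners if r["finish"] == 1]
        races[race_id] = {
            "field_size": len(runners),
            "odds": {r["horse"]: r["odds"] for r in runners},
            # Dead heats are not handled: only the first listed winner is kept.
            "winner": winners[0] if winners else None,
        }
    return races


# Made-up sample data in the assumed layout.
sample = io.StringIO(
    "race_id,horse,odds,finish\n"
    "R1,Alpha,1.5,2\n"
    "R1,Bravo,3.0,1\n"
    "R1,Charlie,8.0,3\n"
)
races = load_races(sample)
```

Whether a given source may be downloaded and parsed this way depends on its terms of use, which is the question the rest of this thread argues about.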

DJofSD
11-04-2005, 12:59 AM
BillW wrote: "Almost all data available for racing in the USA has an origin with Equibase, which is copyrighted. It would be more than likely illegal for you to obtain this data from a free source. I would seriously recommend that you explore the impact this would have on any academic project before you proceed."

Why would the fair use clause not apply?

DJofSD

akgandhi
11-04-2005, 09:59 AM
Michael -
so if I understand you correctly, if I gave you the code I built for running an analysis, it would be okay for you to run the code on your data and send me the output? And what is the fair use clause?

RonTiller
11-04-2005, 10:13 AM
akgandhi,

Horse racing data analysis, for anything other than personal use (curiosity, betting), is a tricky area and not industry friendly at all. If you read the Terms of Use by Equibase and the DRF, it looks like telling the guy next to you in the racebook the program number of a horse or her Beyer speed figure is a copyright violation. Whether Fair Use applies is a matter for lawyers and I am not aware of any test cases in this industry.

Published statistical studies are now almost exclusively done by people associated with Equibase or the DRF or value added resellers like ourselves. If this study is to be published, I would guess the respective universities would have some vetting mechanism to make sure there are no legal minefields. It would tickle me to death to see a headline in Thoroughbred Times "Equibase sues University of Chicago and Columbia University".

My recommendation is to take Michael Nunamaker up on his offer, if it passes the legal sniff test. He is well known in this industry, a first rate programmer, knowledgeable about statistics and horse racing, and has himself published a massive statistical survey, Modern Impact Values. Plus he has the data. We have been approached several times for research data like you requested and have had to turn down each request.

Ron Tiller
HDW

andicap
11-04-2005, 10:55 AM
So,
I could not be sued for sending him the result charts, but he could be for publishing whatever results he finds from them??

If it's the case that the racing industry is discouraging any form of original research on handicapping, that's awful.

Dave Schwartz
11-04-2005, 11:00 AM
Andy,

You wrote "I could not be sued for sending him the result charts, but he could be for publishing whatever results he finds from them??"

That is not what he said. What he said was more like, "You could analyze the result charts and give him the analysis but you could not give him the results charts."


Dave

MichaelNunamaker
11-04-2005, 12:47 PM
Hi Amit,

You wrote "so if I understand you correctly, if I gave you the code I built for running an analysis, it would be okay for you to run the code on your data and send me the output?"

Yes, that is correct.

You also asked "And what is the fair use clause?"

It relates to copyright law. A great resource to learn about it is at http://fairuse.stanford.edu/

To me, the question of whether sending someone a database excerpt is fair use or not is beside the point. I'm sure Equibase does not believe it is fair use. They might be right, they might be wrong. But I don't want to fight about it.

Mike Nunamaker

GameTheory
11-04-2005, 01:05 PM
Equibase would probably also say that you running the analysis and giving him the analysis is not allowed either -- a derivative work. If you look at the terms of use closely, creating a handicapping service where you sell picks is not allowed (another derivative work), because you used the data to come up with the picks. Since they're never legally challenged I think they've come up with a policy that basically says, "Use of the data is not allowed."

twindouble
11-04-2005, 01:14 PM
I just can't help saying that reading all that is posted here on computerized horse racing systems and stats, which in my mind fundamentally have little to do with any race in hand, makes me feel like a lone border collie nipping at the heels of an out-of-control herd that may very well run off a cliff. But who am I to say? Who would ever think Wall Street would come up with a program that would prevent another October meltdown, when computer programs were selling off for no good reason? Those programs were built around historical statistics and trends but lacked human evaluation of the prevailing conditions.

Two things can happen: I can keep nipping at your heels to the point where I'm not welcomed, or just be ignored because I can't contribute anything relevant to the path the majority here are on.

I come here to socialize but it's not an enjoyable experience being the odd man out. Besides, this place, like I said before, has products to sell, and believe me, if I had anyone on my crew knocking or questioning my product he would be sent down the road real quick.

This is no place for an old horse like me. I'm no John Henry or a computer programmer, but I do have current form; even so, I doubt I'll ever run back to my best race on this forum running so wide. I will say this: to anyone I encounter that's into modern day handicapping I'll highly recommend this place. It's loaded with that type of knowledge and a bunch of gentlemen to boot. The latter is a far cry from my experience on other forums.

Contrary to what TLG might say, this isn't a cry for attention, I just need a glove that fits and only I can make that determination. With all due respect I'll stop in now and then to see what materializes and post where I think the glove fits but I doubt that will happen very often from what I gather so far.

Good luck,

T.D.

GameTheory
11-04-2005, 01:43 PM
We see posts like this every once in a while, but I never understand them. Have you been "shunned" by someone here for not being a "modern computer handicapper"? If you feel you're an oddball, then I guarantee any topics you'd like to bring up will be greeted with enthusiasm, as people get tired of talking about the same old thing. Personally, I'd like to see more talk about trainer handicapping and how the business side of horse racing affects who wins the races. Far too little of that around here I think.

But I suggest we do that in another thread so this one doesn't get too off-track. The guy who wants to do the academic research probably isn't solely interested in handicapping and "coming up with a positive result" as is being suggested, but is probably interested in market dynamics in general, and horse racing happens to be an excellent example of such markets. They should be fairly easy to study if it weren't for the RETARDATION OF THE INDUSTRY WITH ITS IDIOTIC STRANGLEHOLD ON THE DATA. (It would tickle *me* to see the University of Chicago sue Equibase rather than the other way around.)

PaceAdvantage
11-04-2005, 02:10 PM
twindouble, I don't understand your post. This place has products to sell? We don't sell anything, except perhaps some advertising space. I don't endorse any of the products you may see advertised. The stuff comes through Google. If I see something patently absurd, I will block the ad.

Other than that, we don't sell a thing. Feel free to knock those who use a computer to HELP them handicap, but be warned, that is considered rude behavior, the same way if I knocked you and your method of handicapping, which of course, I would never do.

As for variety of topics, I think this board has loads of topics.....there is plenty of speed handicapping, pace handicapping, some trainer talk, some software talk....

Just look at the handicapping software section of the board, it's not the most popular place, so I don't really understand your belief that this board is dominated by those talking about playing this game with a computer. (But let's face it, if you didn't have a computer, you wouldn't be on this board...and it's only natural to want to SEE what might happen if one were to use the computer to assist in the handicapping process).

I agree with GameTheory though, I too would like to see some more trainer handicapping talk.....if you want to continue this, let's move to a new topic...I don't want to mess with akgandhi's thread here...

mcikey01
11-04-2005, 03:07 PM
took a look at the "fair use clause" website mentioned by Mike Nunamaker....

Here's a scenario:

A researcher publishes a study that tests the hypothesis that "class is a directly proportional function of the mean of the average odds/expected odds ratio of all horses", i.e. the higher the mean A/E ratio of the field, the higher the class of the race.
The researcher randomly assigns a number from 1 to 20 to designate the class of a race and never identifies the actual class of the race represented by its random number.
The researcher never identifies the individual tracks, racing dates or races covered in the study but gives the broad range of dates and broad geographical area where the tracks are located.
The researcher states the intention to show that class is extrinsic to race conditions, measures of relative speed, the dosage index of the entrants, etc., without having to factor these elements into the testing.
Could this scenario satisfy the "transformative nature" aspect of "fair use" guidelines cited in the website?

GameTheory
11-04-2005, 03:12 PM
Lots of research would be allowable under the fair use clause. The Equibase Terms of Service (that you supposedly agree to) are not about copyright law -- they are about contract law. In other words, Equibase is in effect saying (although they wouldn't admit it I'm sure), "Yes, we know you could use this data for legitimate purposes under the fair use clause without it being a copyright violation, but we don't want you doing it just the same. Therefore, we want you to be bound by this separate 'license agreement' that states you won't do any of the stuff, fair use or not." So what we need is a legal opinion on whether their TOS holds any water, because it seems pretty clear that this type of research would be allowed under fair use....

twindouble
11-04-2005, 03:15 PM
"I agree with GameTheory though, I too would like to see some more trainer handicapping talk.....if you want to continue this, let's move to a new topic...I don't want to mess with akgandhi's thread here..."


PA. Feel free to move my post where you see fit, I won't be offended. I'll carry on from there. Thanks.


T.D.

HEY DUDE
11-04-2005, 03:32 PM
Too bad you just couldn't share this data on Napster or Kazaa like they do with music and video.

RonTiller
11-04-2005, 06:01 PM
I believe GameTheory has it exactly right: all data downloaded from Equibase or one of their VARs is downloaded with a TOS (Terms of Service). The Equibase and DRF TOSs are quite elaborately written. The TOS on all of our websites reads: "You may use the information hereunder provided for personal purposes only. You may not re-sell this information or an analysis thereof, either directly or indirectly, nor may you re-distribute this information in any form without written permission from Equibase Company LLC."

I believe that a large data provider (not Equibase) has filed some lawsuits against individuals for reselling their data (albeit in a modified form) and I believe it was not a copyright issue but a contractual issue - violation of the TOS at a minimum.

Equibase has enforced their TOS, as far as I know, by cease and desist orders or the threat of a cease and desist order, against persons reselling data in violation of the TOS. Apparently, this is enough to deal with the issue. Equibase has also enforced their TOS against entities giving away handicapping data derived from their data. These matters get settled long before a lawsuit becomes necessary and in fact they sometimes get resolved by the person(s) getting a contract with Equibase to resell or distribute data.

I am no lawyer, nor do I work for Equibase, but I believe the copyright issue is not the driving force here. Hence the elaborate TOS agreements that one implicitly accepts by using the website and the data.

For an educational research project like this, who knows? We live in a country where the governments make so many activities illegal that we are all criminals - then people are selectively prosecuted on the whim of those in charge. Likewise, the TOS seems to make ANY use beyond personal handicapping a violation of the TOS, including presumably downloading a free pdf chart and emailing it to a couple of friends (redistribution of the data). Is Equibase gunning for THIS person, salivating at the thought of prosecution? No way.

That being said, akgandhi must find this thread quite bizarre and perhaps disappointing. The business world of horse data is very far removed from the much more open world of academia. But being from the University of Chicago, perhaps he has a greater than normal appreciation of free market capitalism and an understanding of how it can lead to all THIS!

Ron Tiller
HDW

Dave Schwartz
11-04-2005, 06:25 PM
Hey Dude,

You wrote "Too bad you just couldn't share this data on Napster or Kazaa like they do with music and video."

You mean the two services that were deemed illegal in their original, "free" form?

<G>

Regards,
Dave Schwartz

MichaelNunamaker
11-04-2005, 07:06 PM
Hi Gametheory,

You wrote "Equibase would probably also say that you running the analysis and giving him the analysis is not allowed either -- a derivative work. If you look at the terms of use closely, creating a handicapping service where you sell picks is not allowed (another derivative work), because you used the data to come up with the picks. Since they're never legally challenged I think they've come up with a policy that basically says, "Use of the data is not allowed.""

I'm sure they would not say that to me. This is really substantively not different than the analysis I did in my books, and there was never any objection to them. Furthermore, Trackmaster/Equibase and myself have an excellent relationship (at least I think so<G>) because of my selections that they resell. I'd be shocked if they objected to this. Indeed, I'll e-mail now and find out for sure.

Mike Nunamaker

GameTheory
11-04-2005, 07:26 PM
I'm not saying they will necessarily object to you doing what you propose, but I bet they "reserve the right" to object on a case-by-case basis and you aren't going to get them to say, "Of course, anyone with a bunch of data like you have can provide all the analysis they want to someone doing research." They are much more likely to say, "Sure Michael, YOU go ahead and do that -- we're not going to fuss about it with YOU."

sjk
11-04-2005, 07:32 PM
Indeed it is well that Equibase has excellent relationships with someone because they are making permanent enemies elsewhere.

traynor
11-04-2005, 08:41 PM
RonTiller wrote: <For an educational research project like this, who knows? We live in a country where the governments make so many activities illegal that we are all criminals - then people are selectively prosecuted on the whim of those in charge. Likewise, the TOS seems to make ANY use beyond personal handicapping a violation of the TOS, including presumably downloading a free pdf chart and emailing it to a couple of friends (redistribution of the data). Is Equibase gunning for THIS person, salivating at the thought of prosecution? No way.>

There seems to be a LOT of confusion about copyright law. I am not an attorney, nor am I giving legal advice. The opinions expressed are only opinions, and are not intended to induce anyone to perform any acts that may or may not prove to be a violation of copyright law. Caveats so stated, my opinion is that Equibase would like everyone to believe they have a mortal lock on data. That is simply not true.

The data is "collected reality," not intellectual property. It is simply collected, just as telephone directories and business directories are collected. That specific collection--not the data itself--may be copyrighted. Specifically, I cannot cut-and-paste Equibase data, repackage, and re-sell it. To believe that I (or anyone else) cannot take that data, run it through another application, fundamentally change it, and then market it, is so silly I find it difficult to believe that anyone actually accepts it.

The issue is not copyright "rights." The issue is whether the data is created, or simply gathered. The test is, specifically, whether or not someone else could gather the same data. If it is possible for someone else to replicate the data, however difficult it may be to do so, that data is not privileged--only the format in which the data is delivered is privileged. Back to the original premise--if someone tweaks the data, it is THEIR data now, not Equibase's data. While this may violate some carefully crafted paragraph in a lease agreement, unless that provision is tested in court, it is pretty much wishful thinking.

If you are really interested, do some serious research on the topic of database sanctity, particularly in relation to directories, as opposed to "original data" that cannot be replicated or gathered by others. There is no glory in allowing someone to manipulate you into believing you cannot use their data for your own purposes--commercial or otherwise--because some provision of a lease agreement so states. Such provisions are the contractual equivalent of "liability limitations"--and almost as meaningless.

However, Equibase, or any other seller, may arguably be able to restrict your access to that data for violation of the lease terms. In which case you might have to obtain the same data from another source.
Good Luck

Dave Schwartz
11-04-2005, 08:57 PM
traynor wrote: "However, Equibase, or any other seller, may arguably be able to restrict your access to that data for violation of the lease terms. In which case you might have to obtain the same data from another source."

And what might that other source be?

Regards,
Dave Schwartz

john spencer
11-04-2005, 09:44 PM
Hi Amit, not sure if this is what you are looking for, but one of the betting agencies in Australia has a CD-ROM available presenting, in csv and Excel format, the starting price, results etc. of each and every race that they cover over 12 month intervals. Although 90% Australian racing based, you might find some use from this CD for the purpose of your study. This is available from www.nswtab.com.au (http://www.nswtab.com.au/). Good luck with your research.

John

Jeff P
11-04-2005, 10:42 PM
Dave Schwartz wrote: "And what might that other source be?"

Let's just play devil's advocate for a minute and ask "What if?"

What if somebody was to get a hold of track video - maybe even homemade track video (there's been a lot of improvements recently in the world of digital cameras - they're cheap, small, reliable, and video quality on my $250.00 Minolta is surprisingly clear if that's any indication) - and do their own chart calling from careful analysis of said video. Points of call, beaten lengths, fractional splits (we all KNOW it's not rocket science right?) could be entered into a database. A software app could be written to retrieve today's entries from the database and create a standardized set of past performances. I'm also guessing there's enough brain power among those who regularly visit the PA site to get more than a smidgeon of value added info into those past performances. Track Variants, Adjusted Splits, Bias Info, Pace Figures, Speed Figures, Power Ratings, etc - there ARE some good ones around here.

What would that really involve? A handful of people each day whose job it is to capture races digitally at a handful of circuits we'd choose to follow? Then email said video files on to somebody else whose job it is to note points of call and determine fractional splits? Then that goes on to somebody else who keys it into a database along with odds and payoffs? Then one of us clicks a button on the software app which then assists in cranking out tomorrow's past performances.

Question: How far fetched of an idea is that?

Because in reality, from what I can see, that's basically what Equibase does.

-jp

akgandhi
11-04-2005, 10:51 PM
Group :

I want to thank you guys for an excellent discussion - I have learned a lot about how data in this world works. Let me just follow up on an observation made by GameTheory (my area of research by the way), which is that there is A LOT of interesting economics involved in racetrack wagering markets.

One way to think about it is that a racetrack is like a "one period" stock market - a financial market that has a start and an end. In normal financial markets, you don't quite get to see the "end" of the market (the stock market never quite ends). Racetracks provide an excellent ground for understanding the forces that determine market prices (i.e. the odds) in a very ideal setting.

Issues of "efficiency", with which many here are probably familiar, really only scratch the surface. There is the deeper issue of "equilibrium", which is essentially the idea that supply=demand in an appropriate sense for these markets. Things that seem like an "inefficiency", such as the favorite longshot bias, can be understood as the result of an equilibrium.

In any event - I am trying to "test" between two theories about investor behavior that are very hard to distinguish in the stock market, but I think the track is a testing ground where we can distinguish, because of the one period nature of things.

I look forward to participating more on the board and learning more from you guys - the industry experts!

Thanks a lot
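
To make the favorite longshot bias mentioned above concrete, here is a rough sketch of how it is commonly measured from exactly the data requested at the top of the thread (odds on each horse plus the winner). It is not the equilibrium model described in this post. The function names, the odds bucket edges and the two toy races are all assumptions made up for illustration.

```python
def implied_probabilities(odds):
    """Convert odds-to-1 into win probabilities that sum to 1 within a race.

    Normalizing removes the track takeout baked into the raw 1 / (odds + 1).
    """
    raw = {horse: 1.0 / (o + 1.0) for horse, o in odds.items()}
    total = sum(raw.values())
    return {horse: p / total for horse, p in raw.items()}


def returns_by_odds_bucket(races, edges=(2.0, 5.0, 10.0)):
    """Average profit per $1 win bet on every horse, grouped by odds bucket.

    races: iterable of (odds dict, winner name) pairs.
    edges: bucket boundaries in odds-to-1 (arbitrary choice for this sketch).
    Returns one value per bucket, or None for a bucket with no runners.
    """
    stakes = [0] * (len(edges) + 1)
    payouts = [0.0] * (len(edges) + 1)
    for odds, winner in races:
        for horse, o in odds.items():
            bucket = sum(o >= e for e in edges)  # index of the odds bucket
            stakes[bucket] += 1
            if horse == winner:
                payouts[bucket] += o + 1.0       # stake back plus winnings
    return [p / s - 1.0 if s else None for p, s in zip(payouts, stakes)]


# Two made-up races with identical odds, one won by each horse.
toy_races = [
    ({"A": 1.0, "B": 4.0}, "A"),
    ({"A": 1.0, "B": 4.0}, "B"),
]
probs = implied_probabilities(toy_races[0][0])
returns = returns_by_odds_bucket(toy_races)
```

On a real sample, the bias would show up as average returns that get steadily worse as the odds bucket gets longer. Two toy races say nothing either way; they only exercise the code.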

Steve 'StatMan'
11-04-2005, 11:06 PM
For what it's worth - Equibase does have people on track, not just at the meets we care about, but apparently the bulk of the picayune places as well (there are some county fairs that aren't out there, esp. if there isn't any wagering going on). At least for the major markets, they do pay their trackpersons/chartcallers a living wage, and apparently they have health benefits, something few people in the racing industry outside of the top management levels have.

I've thought about the 'own' charts as well off the videos. Beaten lengths are currently just estimates anyway. The ones in the far back, out of the picture, would get missed. Frankly, if a horse is more than 15 lengths back, does one really need to know more than he's way way back at that point, and that if he does close, he closed a helluva lot? But incomplete data, and missing tracks might look 'schlocky'. I think it's something a limited group of users can understand and deal with, but if you ever wanted to take on paying customers, it'd need to be a lot better. Not sure how much people want to do for free and essentially give away, when industry people are getting paid with health benefits to do it. I do wish I had health benefits with my racing-related job (haven't 'needed' it yet, Thank God.) I bet most full-time handicappers wish they had health coverage too.

I've thought about getting a modern digital camera myself, but mainly for recording the horses on the track and examining and learning body language, eventually storing and retrieving my evaluations to supplement the pps. We can get race replays, but I doubt many have replays of the post parade and the warmups. Best one might do is to get a copy of the 'Day Tape', if available, but I know they don't sell them cheap, maybe $20-$50 a day I imagine, where available.

Remember what they say during baseball and football games, etc. "All pictures, descriptions and accounts of this game, may not be transmitted or rebroadcast without the express written consent of (league and team names)."

Tom
11-04-2005, 11:40 PM
So what is happening basically is EB is saying, "We can sell you our data, but we will have to kill you!"


What would we call an Equibase version of Napster....CRAPster? :D

Really hard to have any sympathy for EB.....no, make that impossible.

traynor
11-05-2005, 01:21 AM
Dave Schwartz wrote: <And what might that other source be?>

Not difficult to find your own; pick any decent programmer and they can probably write a utility app to parse the freely available data posted on various sites, or available in electronic format with a little digging. That is what we started with several years ago. Your question seems to beg an answer like, "Ta Dum! I have this spiffy ripoff of Equibase data." I do not, nor have I looked too deeply into what sources might be currently available. The conversion processes are so simple that I don't understand why anyone would want to pay a monthly fee for something he or she could parse and accumulate himself (or herself).

In blunt terms, the information provided by Equibase is too crude and imprecise for our purposes. We use our own videocameras to record races, so the entries can be digitally isolated in the data stream, and each entry's actual performance calculated in an almost infinite number of data points using (there is that evil word again ... the one that causes spasms of anguish in the faint of heart) calculus. Overkill? Of course it is. The upside is that we don't need to quibble about a few bucks of rebate here or there to show a profit, or pretend we are in some kind of battlefield environment with the other bettors. Does it work? Yes, quite well. Is it for sale? Absolutely not. We are bettors, not system sellers.
Good Luck

traynor
11-05-2005, 01:40 AM
akgandhi wrote: <Issues of "efficiency", with which many here are probably familiar, really only scratch the surface. There is the deeper issue of "equilibrium", which is essentially the idea that supply=demand in an appropriate sense for these markets. Things that seem like an "inefficiency", such as the favorite longshot bias, can be understood as the result of an equilibrium.>

Now I'm curious. "Efficiency" is Ziemba's "argument," which at last look had died a natural death as "interesting but not especially useful." In fact, I spent several squirrely months betting huge sums of money to just about break even using the "Dr. Z Method" on a cute little Sharp pocket computer.

Equilibrium is a rather simplistic concept that argues that supply and demand will "equalize" at some point, because as supply increases, demand (and price) will decrease, and as supply decreases, demand (and price) will increase. How that could be applied to horse racing seems "a bit of a reach." I understand grand academic theory; I am in my last semester of an MBA program, on a short trip to a doctorate in my chosen field. Many of the concepts that seem to irresistibly compel instructors to "pass them on" to others have little substantiation outside the hallowed halls of academia. A great deal of what was considered almost writ in economics and business is being trashed on a daily basis because the models are obsolete.

That said, I applaud your innovation; I don't think many handicappers would consider equilibrium or market efficiencies applicable to predicting the winner of the 7th race at Aqueduct. I suspect they may be right. If you manage to put together a credible argument, please post a link to your paper on this site. I think many would be interested in the viewpoint, whether or not they agree with it.
Good Luck

GameTheory
11-05-2005, 02:13 AM
Dave Schwartz wrote: <And what might that other source be?>
Not difficult to find your own; pick any decent programmer and they can probably write a utility app to parse the freely available data posted on various sites, or available in electronic format with a little digging. That is what we started with several years ago. Your question seems to beg an answer like, "Ta Dum! I have this spiffy ripoff of Equibase data." I do not, nor have I looked too deeply into what sources might be currently available. The conversion processes are so simple that I don't understand why anyone would want to pay a monthly fee for something he or she could parse and accumulate himself (or herself).

Any "freely available data" posted on the internet is Equibase data. BRIS, TSN, DRF, Trackmaster are all Equibase resellers. There is no other source for chart data except making them yourself...

traynor
11-05-2005, 04:16 AM
Game Theory wrote: <Any "freely available data" posted on the internet is Equibase data. BRIS, TSN, DRF, Trackmaster are all Equibase resellers. There is no other source for chart data except making them yourself...>

Understood. When that information is placed on the internet, it is freely available to anyone who wants to use it, or to tweak it and resell it, if so inclined. There is no obligation to comply with any "fine print" from Equibase unless the format and data are directly copied. Processing that data through another application and re-selling the data in another form is perfectly legitimate.
Good Luck

Rook
11-05-2005, 09:02 AM
Steve StatMan wrote:

"I bet most full-time handicappers wish they had health coverage too."

If they move to Canada, full time handicappers get health coverage for free and don't have to pay income or withholding tax for it. If they miss the warm weather, they can buy a place in the South and spend 5 months and 29 days there.

cj
11-05-2005, 09:16 AM
Question: How far fetched of an idea is that?

Because in reality, from what I can see, that's basically what Equibase does.

-jp

.

One thing that would be missing is information on first time starters and foreign shippers. Where does that info come from?

highnote
11-05-2005, 09:52 AM
One thing that would be missing is information on first time starters and foreign shippers. Where does that info come from?

Most foreign racing jurisdictions place their charts for free on the internet. I have never seen any terms of service clauses. However, since I don't read many foreign languages, I could be overlooking something.

Jeff P
11-05-2005, 10:20 AM
One thing that would be missing is information on first time starters and foreign shippers. Where does that info come from?

Now that does present a problem.

Let's say for the sake of argument we start covering New York, Florida, Chicago, Kentucky, and Northern and Southern California only. After a few months of compiling data we are ready to go live with the project.

A horse shipping in from a track we don't cover is not going to be in our database. Perhaps we need to cover a few more tracks? A horse that has never raced before is not going to be in our database. Perhaps we need to have someone ask the trainer about the horse? A horse with a really long layoff is not going to be in our database, well at least not until we've been doing it for a while.

I still maintain that this project would be a doable thing if we really wanted to get it done. My vision here is based mainly on making use of the PPs ourselves and not trying to sell PPs to outsiders. That said, our own PPs could contain a set of very high quality numbers available nowhere else. The occasional odd shipper that comes in from MTH, DEL, PIM, or PHA and beats a quality field at SAR, well... when that happens it'll be egg on our faces for sure. :bang: Is that something most of us around here can live with?

-jp

.

DJofSD
11-05-2005, 11:17 AM
Regarding first time starters and foreign horses.

I've always thought that somewhere like the Jockey Club there was a list of horses registered by year of foaling along with either a name or the breeding. I don't remember all the rules but to run in a T'bred race the horse has to be registered with the JC. If someone has real world knowledge of the JC rules perhaps they can fill us in. Get this list from the JC and you've got all the potential first time starters.

One of the projects on my list of things I'd eventually like to program would be a 'bot to scrape the results of the various web sites I've found for the names of the horses that have run in races overseas.

Between the two suggestions that should eliminate most of the "information gap".

DJofSD

twindouble
11-05-2005, 12:01 PM
PaceAdvantage;

"We see posts like this every once in a while, but I never understand them. Have you been "shunned" by someone here for not being a "modern computer handicapper"? If you feel you're an oddball, then I guarantee any topics you'd like to bring up will be greeted with enthusiasm as people get tired of talking about the same old thing. Personally, I'd like to see more talk about trainer handicapping and how the business side of horse racing affects who wins the races. Far too little of that around here I think."

I know PA doesn't "Sell handicapping software." I believe that was pointed out to me along the way. Correct me if I'm wrong but I think a good percentage of people here are deep into pace figures, Beyer figures, Stats and support in one way or another what's advertised here and elsewhere; the links that crop up now and then point in that direction. So, where I was coming from, I have very little in common because I use none of the above, with the exception of what I read in the DRF's past performances excluding the Beyer figures. I rely on my knowledge of the track, the horses stabled there, jocks and "trainers."

I'm not so sensitive as to think I'm being shunned; as a matter of fact I don't recall any post I've made where anyone said I wasn't making any sense. Maybe they are just being polite and that wouldn't surprise me. Here again, correct me if I'm wrong, threads of this nature get a higher response than most others; heck, I can't even understand the language they use, let alone give some input. Who wouldn't feel like the odd man out? LOL.

On the latter, I would have more questions than answers. I started in this game with just the DRF in hand and lived every facet of one track for 12 or 14 years or so, never looked beyond that one with the exception of trekking to other New England tracks now and then when the meet was finished. So, I have my limitations when it comes to historical knowledge, breeding, mathematical equations, scientific study or research. It's just not my language. Like I said, the glove don't fit.

As far as questions go on this topic, I feel the need to keep on subject so you won't delete or move what I said.

One that cropped up last night was, how much control do the tracks have over their product when it comes to past performances?

Why would anyone use thousands of races to create stats that have nothing to do with the race in hand? Every race is different with different players.

Considering the above, the tote reflects to some degree those different players based on their past performances, new conditions and in some cases different distances, along with young improving horses, notwithstanding changes in track conditions, jocks, equipment, form and troubled races? Plus other factors, as we know.

I doubt that this study would produce anything of value. To me, distorted numbers and stats lead people in the wrong direction; just think of Wall Street. When that happens you can be led anywhere, and that spells disaster in my opinion.

Good Luck,

T.D.

GameTheory
11-05-2005, 12:30 PM
Game Theory wrote: <Any "freely available data" posted on the internet is Equibase data. BRIS, TSN, DRF, Trackmaster are all Equibase resellers. There is no other source for chart data except making them yourself...>

Understood. When that information is placed on the internet, it is freely available to anyone who wants to use it, or to tweak it and resell it, if so inclined. There is no obligation to comply with any "fine print" from Equibase unless the format and data are directly copied. Processing that data through another application and re-selling the data in another form is perfectly legitimate.
The problem is Equibase doesn't agree with you, will take steps to block you, or possibly even sue you. Even if you're right you've got to be prepared to defend that position with time, money, and lawyers if you plan to openly challenge their policies (by publicly reselling data, for instance).

RonTiller
11-05-2005, 12:56 PM
Equibase is NOT the only data collector in this industry. Witness:

A. Today's Racing Digest collects their own data for California tracks and publishes Today's Racing Digest using their own data, NOT Equibase's - just for California tracks though.

B. Ragozin's Sheets, a very popular high quality and high end product, do not rely on Equibase data. They have built up a decent sized infrastructure at a lot of tracks and they use their own data to bet with as well as publish and sell it. Sheets players can correct me, but I believe they time all the races themselves, and I have heard Ragozin speak on the superiority of their timing and anomaly correction procedures.

C. Logic Dictates has been selling trip notes for NY tracks for years. Their selling point is precisely that their trip notes are superior to the Equibase chart caller's notes. Since I have been in the business, I've seen several other alternative trip notes vendors, wholly unreliant on Equibase data.

D. There are numerous private workout reports that purport to deliver much more comprehensive and meaningful workout data than that provided by Equibase.

E. I have spoken with 2 individuals since I have been in the business who have done precisely what Jeff P has suggested as a thought experiment - extract their own PPs from video. In one case, back when both the DRF and Equibase were collecting data independently (and often coming up with significantly different beaten lengths and even positions!), he had some pretty interesting results on who was the better chart caller at his track. These individuals were private bettors and not selling or distributing their privately accumulated data.

Equibase was formed in the early 1990s to GET IN the data collection business, precisely as a way for tracks to be in control of their own data destinies, independent of the DRF (track programs compete with the DRF for sales at the track). The DRF, under new ownership in the late 1990s, decided to GET OUT of the data collection business and let Equibase deal with that mess. Any individual, group or private syndicate can jump in at any time and start collecting their own data. There needs to be a big payoff at the betting window though cause as a BUSINESS proposition it probably sucks. Of course tracks can always prohibit video cameras, binoculars and laptops! And I suppose there are legal issues with distributing the analysis of video taken on the track's property. Sigh...

Ron Tiller
HDW

highnote
11-05-2005, 02:50 PM
Of course tracks can always prohibit video cameras, binoculars and laptops! And I suppose there are legal issues with distributing the analysis of video taken on the track's property. Sigh...

Ron Tiller
HDW


Therein lies the rub. If at any time tracks or Equibase can shut you down then what is the point in trying to compete with a monopoly?

I have not been able to bet over the internet for about a month now because I am prohibited from doing so in Connecticut. I refuse to bet by phone with the monopolistic CT-OTB.

I gotta say, this has been liberating. Now, I don't even think about trying to start a company to compete with Equibase or DRF or gathering data that might be gotten in an illegal manner.

I figure it's the industry's loss. I never wanted to be a system seller, data seller or a tout -- just a bettor. There are a lot of inefficient markets in the world that can be exploited. I will find another one somewhere. Maybe even in an industry that welcomes the revenue I generate for them. It's a big world full of opportunities.

traynor
11-05-2005, 03:29 PM
Rook wrote: <If they move to Canada, full time handicappers get health coverage for free and don't have to pay income or withholding tax for it. If they miss the warm weather, they can buy a place in the South and spend 5 months and 29 days there.>

Exactly. That is why many of the "real" Sartin users (the ones who were actually winning) moved to either Canada or England, and why a whole schlonk of successful blackjack players followed them. For a professional, the overwhelming advantage for either place is the fact that winnings are not taxed as income.
Good Luck

thoroughbred
11-05-2005, 03:31 PM
In response to the original post in this thread, I was just wondering. Dr. Ziemba, together with two others, published "Efficiency of Racetrack Betting Markets." An excellent compendium of a few dozen papers by different researchers, addressing, among other things, the economics of horse race wagering. Those authors managed to obtain, I believe, data from thousands of races. So, couldn't the same methods to obtain data, whatever they were, be used again? And, also, couldn't the same data themselves, that were used before, be used again?

Maybe I'm misunderstanding something.

traynor
11-05-2005, 03:41 PM
Game Theory wrote: <The problem is Equibase doesn't agree with you, will take steps to block you, or possibly even sue you. Even if you're right you've got to be prepared to defend that position with time, money, and lawyers if you plan to openly challenge their policies (by publicly reselling data, for instance).>

There is an interesting situation going on in Venezuela. The U.S. has apparently refused to sell spare parts for F-16s, and blocked an attempt by Venezuela to obtain the same parts from Israel. Venezuela threatened to give the F-16s to China or Cuba, something U.S. Ambassador William Brownfield said a few days ago Venezuela was "contractually forbidden" from doing. It will be interesting to see if the Venezuelan government considers that contractual obligation to be as impressive as the U.S. does.

How is that related? The idea that fine print legalese is compelling is a very sore issue with many U.S. corporations attempting to deal on a playing field that is not heavily biased in their favor--particularly in regard to intellectual property and outsourced software development. As impressive as the "intent" of Equibase may be, it is restricted to the U.S. court system, if at all, and then only at considerable expense to Equibase. That same "intent" would be close to comical if the "defendant" happened to be a developer in St. Petersburg or Shanghai.

It is in the best interest of Equibase to perpetuate the mortal fear of being sued that grips most in the United States. That fear is largely a myth created to foster a state of fear and learned helplessness in the general public, or in prospective competitors.
Good Luck

traynor
11-05-2005, 03:51 PM
RonTiller wrote: <Any individual, group or private syndicate can jump in at any time and start collecting their own data. There needs to be a big payoff at the betting window though cause as a BUSINESS proposition it probably sucks.>

That depends whether you consider your business to be wagering or marketing. We got started after "becoming involved" with a group of trip handicappers in upstate New York (Yes, Tom, right there at Finger Lakes) who spotted from different places on the track, and used recorders to note their verbal descriptions of the race. Those descriptions were subsequently compared, edited, and placed in a file for future races. To say the least, they did well.

The obvious conclusion is that almost any restricted access data is more profitable than information that is widely disseminated; every user with access to the data diminishes the potential value of that data. The advantage is to those willing to exert the effort and time to gather proprietary information. It has worked well for the group I am associated with, and continues to work well with little sign of diminishing returns.
Good Luck

traynor
11-05-2005, 04:08 PM
swetyejohn wrote: <I gotta say, this has been liberating. Now, I don't even think about trying to start a company to compete with Equibase or DRF or gathering data that might be gotten in an illegal manner.>

Personal opinion--I don't think it would be especially profitable to attempt to compete with Equibase in scope. They prosper because of the extent of their product, not because of its accuracy or usefulness. That is, they crank out numbers and words that people have been led to believe are "all they need to win." If that were true, then anyone subscribing to their data, or the data of other sellers of the same basic information, would be rolling in winning tickets.

I already hear the comments about how the individual has to "use the data correctly." In reality, it is like trying to earn an income as a day trader in stocks based on the information published in a daily newspaper; if everyone has access to the information, it is generally not worth having in the first place. If you think otherwise, take a close look at Gartner's "briefs" for CEOs and CIOs in the IT industry; 2 to 10 page synopses by experts that are priced from $200 apiece to way up, and worth every penny.

That said, there is tremendous earning potential for a small group of bettors willing to gather their own information, for their own use, for their own wagering. That earning potential is in direct proportion to the restrictions on access to the information. That last part is critical; if someone thinks they can "compete" with Equibase as a disseminator, they had best have very deep pockets. Similarly, if the basic purpose of gathering the information is to sell it to others, rather than use it, I suspect that such a grand gesture would only take place after the seller had milked the cash cow dry, and wanted to squeeze a few more dollars out of something that was running in the red for wagering purposes.
Good Luck

highnote
11-05-2005, 04:17 PM
That last part is critical; if someone thinks they can "compete" with Equibase as a disseminator, they had best have very deep pockets.

Actually, I have much different plans on how to compete with Equibase and the other racetracks. If my investors and I can pull it off, a lot of handicappers will be happy, but I don't think Equibase or the other tracks will be. It's a real longshot that it will come to fruition, but, like Trump says, "If you have to think, you might as well think big."

PaceAdvantage
11-05-2005, 09:59 PM
Why would anyone use thousands of races to create stats that have nothing to do with the race in hand? Every race is different with different players.

How do you know what stats they are creating, and how they may, or may not be related to the race at hand?

The beautiful thing about this hobby (or profession, for the lucky few) of ours is that it can be attacked from a multitude of angles. There is no ONE way to arrive at the answer of "who do ya like today?" There are hundreds and thousands of ways to arrive at that answer, from the most basic, to the most complicated.

Every race is different, with different players, but there are always constants involved. They may be subtle, but they are there. Why would you want to wholly dismiss the work (or the future work) of a man when you don't even know what direction his studies might take him? It truly is the death knell of a horseplayer to become complacent with his methods, even if those methods are long-term successful. ALWAYS be searching for greater and newer edges! Then there will always be a reason to get up in the morning!

twindouble
11-05-2005, 11:09 PM
How do you know what stats they are creating, and how they may, or may not be related to the race at hand?

The beautiful thing about this hobby (or profession, for the lucky few) of ours is that it can be attacked from a multitude of angles. There is no ONE way to arrive at the answer of "who do ya like today?" There are hundreds and thousands of ways to arrive at that answer, from the most basic, to the most complicated.

Every race is different, with different players, but there are always constants involved. They may be subtle, but they are there. Why would you want to wholly dismiss the work (or the future work) of a man when you don't even know what direction his studies might take him? It truly is the death knell of a horseplayer to become complacent with his methods, even if those methods are long-term successful. ALWAYS be searching for greater and newer edges! Then there will always be a reason to get up in the morning!

Pace; That was good, hits home with anyone who is still truly alive and not ready to hang it up. Well, I hope to think I haven't reached that point. I'm not one to argue a point just for the sake of argument but I did say I'll see how things materialize. Besides, as a contractor, I'm no longer using the same tools I used thirty years ago, even though those tools could still do a good job, just not as accurately or as efficiently. So, your point is well taken.

I don't know, I still like the feel of driving a nail home rather than pull a trigger. :cool:

Thanks,

T.D.

cosmicway
11-05-2005, 11:17 PM
You seem to dwell upon the copyright issue a great deal.
In my opinion there are a few types of potential misuse:

1 - the data are not for publication but somehow came to your possession
2 - pretend the data are yours whereas they belong to a company
3 - publish unauthorised copies
4 - publish the data in publications other than those approved by the originator
5 - use the data for research / other material purpose

(1)-(2)-(3)-(4) are straightforward but (5) is a dark area of the law and
in my opinion no one has the right to stop you if you give proper credits to the source.

What the original poster seems to be looking at -if I understood properly- is to correlate the market sizes with the level of difficulty of the races.

To do that you can define the difficulty level of a race with a formula like:

Q = Exp ( Sum ( p x log (p) )

where p are horses win probabilities based on the odds:

p = constant / (odds + 1)

So you can tell at which values of "Q" the market peaks, and you can do the
same trick for quinellas, forecasts, trios and so on.
That's a likely approximation at any rate.
I have the prices for old races but not the markets on a race by race basis
and there was talk of the race course wanting to carry out such a study but maybe it has petered out. I did not see any results published.

The popular belief is that the races with the highest level of difficulty are those wagered most, but I don't know if that's 100% true.
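
A minimal Python sketch of the Q calculation described above, under two assumptions that the post does not spell out: odds are quoted as X-to-1, and the "constant" is whatever makes the probabilities in one race sum to 1. The function names and the sample odds are made up for illustration.

```python
import math

def implied_probabilities(odds):
    """p = constant / (odds + 1), with the constant chosen so the p sum to 1.

    Assumes odds are quoted as X-to-1 (e.g. 7.0 means 7-1).
    """
    raw = [1.0 / (o + 1.0) for o in odds]
    total = sum(raw)
    return [r / total for r in raw]

def difficulty_q(odds):
    """Q = Exp(Sum(p * log(p))) over the horses in one race."""
    p = implied_probabilities(odds)
    return math.exp(sum(pi * math.log(pi) for pi in p if pi > 0))

# Hypothetical races: eight horses all at 7-1, and a field with a heavy favorite.
even_field = [7.0] * 8
lopsided = [0.5, 6.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0]

print(difficulty_q(even_field))  # about 0.125, i.e. 1/8
print(difficulty_q(lopsided))    # larger, because one horse dominates
```

With the formula written this way, Q runs from 1/n for a perfectly even n-horse field up to 1 when one horse is a certainty, so a smaller Q means a more wide-open race and 1/Q can be read as an "effective number of contenders".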

traynor
11-06-2005, 12:05 AM
swetyejohn wrote: <Actually, I have much different plans on how to compete with Equibase and the other racetracks. If my investors and I can pull it off, a lot of handicappers will be happy, but I don't think Equibase or the other tracks will be.>

About 20 years ago Jim Selvidge (best known for arguing with anyone who disagreed with him and being married to Trillis Parker, who produced one of the very best equine body language videotapes made) tried to organize a group to do body language, inspection, and trip handicapping, take notes, which would then be made available to others in the group for other tracks. In short, to place 3 or 4 trained observers at each of what he considered to be the important tracks for wagering purposes.

It was a great idea, incredibly poorly organized; there was a lot of smoke, mirrors, and promises, but he just couldn't put it together. There is a LOT of opportunity in that area.

I don't know exactly what you have in mind, but my argument was based on a full-blown replication of Equibase--every race, every day, at every track. For our own use, we concentrate on very few tracks, which makes it manageable (or at least creates the illusion of being manageable). For just a half dozen tracks, the work is monumental, unrelenting, and close to overwhelming. The upside is that it is also very profitable, so it is time and effort well-spent.
Good Luck

traynor
11-06-2005, 12:15 AM
cosmicway wrote: <To do that you can define the difficulty level of a race with a formula like:

Q = Exp ( Sum ( p x log (p) )

where p are horses win probabilities based on the odds:

p = constant / (odds + 1)

So you can tell at which values of "Q" the market peaks, and you can do the
same trick for quinellas, forecasts, trios and so on.
That's a likely approximation at any rate.>

I am missing something here. If you define the win probabilities as a factor of the odds, you are arguing that the mutuel odds are "accurate" in predicting the outcome of the race. A lot of research disagrees.

As far as "confusing races" creating larger mutuel pools, that is not quite accurate; the average mutuels are higher (bets spread over more entries), but the mutuel pools are usually smaller.

One last point. Shouldn't the exponent go to the right of the parentheses? I assume that is the meaning of exp.
Good Luck

traynor
11-06-2005, 12:25 AM
PaceAdvantage wrote: <The beautiful thing about this hobby (or profession, for the lucky few) of ours is that it can be attacked from a multitude of angles. There is no ONE way to arrive at the answer of "who do ya like today?" There are hundreds and thousands of ways to arrive at that answer, from the most basic, to the most complicated.>

Well said. It is also clearly the way to gain an advantage; by looking for new solutions and new ways to define the problems. There is an old saying in NLP, "If you always do what you have always done, all you will ever get is what you have always gotten." That is a rather inelegant way of expressing the view that new ideas--even ones that seem totally toasted at first blush--can provide an advantage. That advantage changes on an almost hourly basis, requiring continually new methods, new approaches, new insights, and new ideas to stay ahead. Personally, I wouldn't want it any other way. The people who are looking for The Answer that will let them stop thinking and just make money month after month are in the wrong field.
Good Luck

cosmicway
11-06-2005, 12:54 AM
traynor says
I am missing something here. If you define the win probabilities as a factor of the odds, you are arguing that the mutuel odds are "accurate" in predicting the outcome of the race. A lot of research disagrees.

As far as "confusing races" creating larger mutuel pools, that is not quite accurate; the average mutuels are higher (bets spread over more entries), but the mutuel pools are usually smaller.

One last point. Shouldn't the exponent go to the right of the parentheses? I assume that is the meaning of exp.
Good Luck

The mutuel odds suffice for a batch job relating to the market research project described here.

Can you improve upon the mutuel odds ?

To test such a hypothesis you have to compute something like

I = Exp ( sum ( log (p) ) / N )

where p = probabilities from mutuel odds , N = number of races

This is "posterior information" whereas the formula before was "prior information" (nb. the outer bracket is missing in the formula two posts above).

Can you say that I(my method) > I(mutuel) ?

It's a difficult task for a modeller to achieve, so if you do you deserve high marks.
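
A rough Python sketch of that comparison, assuming p in each race is the probability the method gave to the horse that actually won, so that I is the geometric mean of those probabilities over N races. The numbers below are invented purely to show the mechanics and are not real results.

```python
import math

def posterior_information(winner_probs):
    """I = Exp(Sum(log(p)) / N), where p is the probability assigned to each race's winner."""
    n = len(winner_probs)
    return math.exp(sum(math.log(p) for p in winner_probs) / n)

# Hypothetical probabilities given to the eventual winner in four races.
mutuel = [0.40, 0.10, 0.25, 0.05]      # from the tote odds
my_method = [0.45, 0.12, 0.20, 0.08]   # from some other model

i_mutuel = posterior_information(mutuel)
i_method = posterior_information(my_method)

print(i_mutuel, i_method)
print("beats the mutuel odds" if i_method > i_mutuel else "does not beat the mutuel odds")
```

A handful of races proves nothing either way; the comparison only means something over a large sample, and a method that ever gives a winner zero probability cannot be scored at all because log(0) is undefined.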

Overlay
11-06-2005, 12:55 AM
It truly is the death knell of a horseplayer to become complacent with his methods, even if those methods are long-term successful. ALWAYS be searching for greater and newer edges! Then there will always be a reason to get up in the morning!

I'm not saying that horseplayers shouldn't be open to revising their thinking and handicapping approach based on new ideas and information. However, it also seems to me that there's something to be said from an effectiveness standpoint for having a certain degree of stability in one's handicapping model, and for staying away from a "pick-the-winner" orientation that leads to the endless kind of cycle you note, where the player is always having to search for new angles and edges as former ones get overbet and lose their profitability. And by the time the player has enough data to detect a new variable that wagers can confidently be based on, it, too, is on its way out through overplay. I know I get a lot more satisfaction and enjoyment out of the game now from sticking with a variety of fundamental factors that have retained their validity as predictors of actual performance over time (considered apart from their pari-mutuel effectiveness), and using them to examine every horse's chance of winning so I can find value, than I used to when I was always scrambling from one variable to another looking for a way to stay ahead of the public in narrowing a field down by elimination to the one horse that was likeliest to win today.

GameTheory
11-06-2005, 01:07 PM
About 20 years ago Jim Selvidge (best known for arguing with anyone who disagreed with him...

Pot. Kettle. Black.

I am missing something here. If you define the win probabilities as a factor of the odds, you are arguing that the mutuel odds are "accurate" in predicting the outcome of the race. A lot of research disagrees.

He is using an information-theoretic formula there that is measuring the amount of "entropy" as given by the public odds. (Or something close to it, as entropy uses a log2 base and measures information in bits. Entropy = -sum(p*log2(p)) where p is a vector of probabilities summing to 1.) The less spread in the odds (spread meaning the highs of the highs and the lows of the lows) the more confused you might say the public is. Entropy is a quantification of the amount of confusion.

[BTW, "exp" is the exponential function, or antilog, i.e. exp(log(x)) = x]
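For anyone who wants to try the entropy formula on a tote board, here is a minimal Python sketch. It is an illustration only, not the formula from the paper being discussed. The odds are made up, the helper names `odds_to_probs` and `entropy_bits` are hypothetical, and it assumes the raw probability implied by odds of X-to-1 is 1/(X+1), rescaled so the field sums to 1.

```python
import math

def odds_to_probs(odds_to_one):
    """Turn tote odds (4.0 means 4/1) into probabilities summing to 1.

    Assumption: the raw implied probability is 1 / (odds + 1). On a real
    board the raw values usually sum to more than 1 because of takeout,
    so they are rescaled here.
    """
    raw = [1.0 / (o + 1.0) for o in odds_to_one]
    total = sum(raw)
    return [r / total for r in raw]

def entropy_bits(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping p == 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical five-horse tote board, odds to 1.
odds = [1.0, 3.0, 4.0, 6.0, 15.0]
probs = odds_to_probs(odds)
h_bits = entropy_bits(probs)

# Same entropy in natural-log units. exp() of it is the "effective number"
# of equally likely horses, which is one place an exp can show up.
h_nats = h_bits * math.log(2)
effective_field = math.exp(h_nats)

print(round(h_bits, 3), round(effective_field, 2))
```

The effective field size always falls between 1 and the number of starters, so it may be an easier number to read than raw bits.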

twindouble
11-06-2005, 02:53 PM
Pot. Kettle. Black.

He is using an information-theoretic formula there that is measuring the amount of "entropy" as given by the public odds. (Or something close to it, as entropy uses a log2 base and measures information in bits. Entropy = -sum(p*log2(p)) where p is a vector of probabilities summing to 1.) The less spread in the odds (spread meaning the highs of the highs and the lows of the lows) the more confused you might say the public is. Entropy is a quantification of the amount of confusion.

[BTW, "exp" is the exponential function, or antilog, i.e. exp(log(x)) = x]

Can you determine to any degree how confused I am at this point? :confused:

Are you saying confusion determines the odds in some way? To me it's just the lack of knowledge of the game. Ask anyone why they made any type of wager and you'll get answers with no confusion like, it's my lucky number, I like the jock, I'm betting a long shot in hopes he wins, I got a tip on the horse, the horse has good breeding, it's my wife's birthday numbers, I always bet the outside horses at this distance, the horse had trouble last out, he's dropping in class, he's an overlay, I can go on and on with three or 4 paragraphs, that would very well include some experienced handicappers. Most are unproductive angles or wishful thinking, gambling in other words.

Take the guy who boxes the 5 longest shots on the board in all the supers he plays, $120 play for a buck, that's his "system of wagering". You or I would come out of our skin when he takes down half or all the pool along the way, he's far from being confused, maybe after a few drinks celebrating. LOL. Take the guy that hit that monster pick 6 last year, if I recall right he did quick picks, to top it off he lost the ticket and a teller found it, lucky him. How the heck can anyone make sense of the tote or the public in general. I can't after many years.

Good luck,

T.D.

GameTheory
11-06-2005, 03:10 PM
Can you determin to any degree how confused I am at this point. :confused:

Are you saying confusion determins the odds in some way? To me it's just the lack of knowledge of the game. Ask anyone why they made any type of wager and you'll get answers with no confusion like, it's my lucky number, I like the jock, I'm betting a long shot in hopes he wins, I got a tip on the horse, the horse has good breeding, it's my wifes birthday numbers, I always bet the outside horses at this distance, the horse had trouble last out, he's dropping in class, he's an overlay, I can go on and on with three or 4 paragraphs, that would very well include some experienced handicappers. Most are unproductive angles or wishfull thinking, gambling in other words.

Take the guy who boxes the 5 longest shots on the board in all the supers he plays, $120 play for a buck, that's his "system of wagering". You or I would come out of our skin when he takes down half or all the pool along the way, he's far from being confused, maybe after a few drinks celebrating. LOL. Take the guy that hit that monster pick 6 last year, if I recall right he did quick picks, to top it off he lost the ticket and a teller found it, lucky him. How the heck can anyone make sense of the tote or the public in general. I can't after many years.
In this case, we're treating the public as a single entity, and the odds (which we convert to probabilities) are that entity's assessments of the chances of winning of each horse, no different than if you or I assigned a probability to each horse. The bit about confusion means that if the public can't pick any standout favorites, they are more confused than if they could -- they are confused about who will win. If they make one horse 1/9 they are not confused at all -- they are pretty sure that horse is going to win.

So in a 5 horse race, if they assign probabilities like this:

Horse A => 20%
Horse B => 20%
Horse C => 20%
Horse D => 20%
Horse E => 20%

Then that is the most confused they could possibly get since they aren't giving any horse any edge over any other horse. Whereas the most "unconfused" they could possibly get is to bet all the money on one horse and none on the others, giving a horse 100% and the rest 0%. It doesn't mean the public is right, it just shows how confident they are. The entropy formula mentioned earlier is a single-number quantification of this confidence/confusion. Users of the HTR program have a rating called the "Volatility Index" which does the same thing (although I think it uses the morning line). Races with high entropy values have higher average mutuels than races with lower entropy values (which makes sense because high entropy races don't have strong favorites), and are good races for betting on longshots...
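To put numbers on the two extremes above, here is a small Python sketch. The helper name `entropy_bits` and the split of the remaining 10% in the 1/9 case are assumptions for illustration, and takeout is ignored.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping p == 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Every horse at 20%: the most "confused" a five-horse field can be.
even_field = [0.20] * 5

# All the money on one horse: no confusion at all.
lone_standout = [1.0, 0.0, 0.0, 0.0, 0.0]

# A 1/9 favorite implies 1 / (1/9 + 1) = 0.9; the other four horses
# share the remaining 0.1 equally (hypothetical split).
heavy_favorite = [0.90, 0.025, 0.025, 0.025, 0.025]

for name, probs in [("even field", even_field),
                    ("1/9 favorite", heavy_favorite),
                    ("lone standout", lone_standout)]:
    print(name, round(entropy_bits(probs), 3))
```

The even field comes out at log2(5), about 2.32 bits, which is the ceiling for five horses. The lone standout is 0 bits, and the 1/9 favorite lands in between, much closer to 0.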

twindouble
11-06-2005, 03:38 PM
In this case, we're treating the public as a single entity, and the odds (which we convert to probabilities) are that entity's assesments of the chances of winning of each horse, no different than if you or I assigned a probabilitity to each horse. The bit about confusion means that if the public can't pick any standout favorites, they are more confused than if they could -- they are confused about who will win. If they make one horse 1/9 they are not confused at all -- they are pretty sure that horse is going to win.

So in a 5 horse race, if they assign probabilities like this:

Horse A => 20%
Horse B => 20%
Horse C => 20%
Horse D => 20%
Horse E => 20%

Then that is the most confused they could possibly get since they aren't giving any horse any edge over any other horse. Whereas the most "unconfused" they could possibly get is to bet all the money on one horse and none on the others, giving a horse 100% and the rest 0%. It doesn't mean the public is right, it just shows how confident they are. The entropy formula mentioned earlier is a single-number quantification of this confidence/confusion. Users of the HTR program have a rating called the "Volatility Index" which does the same thing (although I think it uses the morning line). Races with high entropy values have higher average mutuels than races with lower entropy values (which makes sense because high entropy races don't have strong favorites), and are good races for betting on longshots...

Wouldn't it just be easier to look at the tote to determine the public's confidence level in the race? That and when it comes right down to it, isn't your confidence level the most important part when it comes to wagering? I'm not picking here, just trying to understand what these studies are all about, in other words where's the silver bullet? If you say there isn't any, I'll get off your case. :)

Thanks,

T.D.

GameTheory
11-06-2005, 03:53 PM
Wouldn't it just be easyer to look at the tote to determin the public's confidence level in the race? That and when it comes right down to it, isn't your conficence level the most important part when it comes to wagering? I'm not picking here, just tring to understand what these studies are all about, in other words where's the silver bullet? If you say there isn't any, I'll get off your case. :)
We are looking at the tote. That's where the probabilities come from. And the point of "all these studies" is not necessarily to "pick winners", but simply to gain a greater understanding of how the horse racing market works, how financial markets work in general, psychology of the masses, etc. That may be helpful in making money betting on horses, or it may not. But as science, it is science for the sake of greater knowledge and understanding.

twindouble
11-06-2005, 04:36 PM
We are looking at the tote. That's where the probabilities come from. And the point of "all these studies" is not necessarily to "pick winners", but simply to gain a greater understanding of how the horse racing market works, how financial markets work in general, psychology of the masses, etc. That may be helpful in making money betting on horses, or it may not. But as science, it is science for the sake of greater knowledge and understanding.

Ok, thanks. I'll wait for the final thesis to be posted. :) If that's the right word. LOL

T.D.

BillW
11-06-2005, 04:52 PM
About the 45th post I was starting to get concerned that this thread was drifting off topic. Thankfully I was wrong. :rolleyes:


Amit, I hope you got your question answered.

Bill

cosmicway
11-06-2005, 05:39 PM
It's a process of trial and error.
There are just TOO MANY folks wagering on any particular race horse on any day and most of them are regulars and have long time experience.
If you address the same question (i.e. who will win the race) to people who have a scant knowledge of things they'll give stupid answers but not so in the racecourse.
Information theory helps you to appreciate the level of difficulty of a race but also to evaluate the potential of any given prediction model.
By adding "angles" to your model you increase the information - if first you make sure that the angle works and it carries information as opposed to background noise. For instance if you add a disadvantage factor to the outside draw in a steep track, it's an angle - but you have to measure the extent to which it works that way.

MichaelNunamaker
11-08-2005, 02:08 PM
Hi Gametheory,

You wrote "They are much more likely to say, "Sure Michael, YOU go ahead and do that -- we're not going to fuss about it with YOU.""

That's what I thought as well. I got an answer today. It was, do not do it. So, I guess I won't be doing the research, and Amit's best option is probably the Australian data that another member mentioned.

Mike Nunamaker

GameTheory
11-08-2005, 02:19 PM
That's what I thought as well. I got an answer today. It was, do not do it.
Reason given?

MichaelNunamaker
11-08-2005, 02:45 PM
Hi Gametheory,

The reason they gave was essentially that they want researchers to pay for their own data.

Mike Nunamaker

rokitman
11-08-2005, 02:49 PM
Ridiculous. Somebody has to do something about this monopoly.

Tonight, I add an "Off-Shore Data Source" to my prayers.

GameTheory
11-08-2005, 02:58 PM
Add it to the list:

Academic Research Studies -- not allowed!


Equibase should be SPONSORING this type of research.

akgandhi
11-08-2005, 03:14 PM
Thanks everyone for all of your input -

My hunch is that the trackmaster people feel that they already understand every feature of their data that interests them (which is the sense I got when I phoned the CEO last week), and so have nothing to gain from being part of a non-profit research effort (not to mention losing the sale). Of course I would vehemently disagree with the idea that all features of their data are well understood, but their main empirical interests seem to be taking ex-ante race information (information about a race that comes from the entry data and excludes odds), and producing optimal betting strategies. However even the way they pursue this from a statistical point of view, which the CEO shared a bit with me on the phone, is not as far as one can go, and my research suggests a number of other strategies (including a nonparametric take on the prediction problem.)

In any event, I can smell a non-competitive industry, and there seem to exist significant profits that would accrue to a new entrant in the trackdata market - an invitation to all of the entrepreneurs in the forum.

Thanks to all,
Amit

GameTheory
11-08-2005, 03:56 PM
If Nunamaker or other who already possessed such data (i.e. paid for) were to become a co-researcher (or the primary researcher) in a study (at least as far as published credit -- who actually did what behind the scenes would be "unknown"), wouldn't that in effect fit the bill of having the researcher pay for his own data? In other words, instead of transferring the data from non-researcher (who has the data) to the researcher (that doesn't have the data); why not just switch researchers?

Of course, I suppose then the powers that be would say you're not allowed to publish the results. But Nunamaker has already published his own massive study and no one stopped him. (But then that study predated Equibase, didn't it? Would an update of MODERN IMPACT VALUES actually not be allowed without permission from Equibase?) So if Nunamaker were to get his name on this new study?

Excuse me Michael for abusing your name this way, just using you as a convenient example...

akgandhi
11-08-2005, 04:12 PM
Game Theory - good point. If you purchased the data under a license agreement that did not cover academic publishing, then the sense I got from them is that in order to get an official permission to publish would require a new license agreement, which I am quite sure Trackmaster would charge to obtain (hopefully not the same amount to obtain the data in the first place). I wonder if that would pass the legal smell test.

OTM Al
11-08-2005, 04:22 PM
Hey Amit. I was an Econ PhD student at NYU a few years back who never quite finished that last paper to get the degree. I've been playing around with doing some work on horse racing markets for a while as my advisor still says all I need to do is finish that paper.... anyway a lot of these guys have bigger and probably more intricate dbs than I do but I do have one that has between 30 and 40,000 races in it. Wouldn't mind helping out. I'm sure you've also gone through the literature pretty well, but I do have several papers off JSTOR as well.

GameTheory
11-08-2005, 04:26 PM
Game Theory - good point. If you purchased the data under a license agreement that did not cover academic publishing, then the sense I got from them is that in order to get an official permission to publish would require a new license agreement, which I am quite sure Trackmaster would charge to obtain (hopefully not the same amount to obtain the data in the first place). I wonder if that would pass the legal smell test.

The thing is, many people have loads of data that wasn't PURCHASED at all, but wasn't stolen either. They simply archived free charts as they were posted on the internet. Equibase does have a restrictive "Terms of Use" posted on their site about use of this data, but is such a thing legally binding when no money has exchanged hands and most people probably aren't even aware of these Terms of Use? Is data acquired before they posted this (back when it just had a copyright notice) then fair game to use under normal fair use copyright law? (If so, then data from charts saved from 2000-2003 or so would be fair game.) I don't think half of what they do passes the legal smell test, but then I'm not a lawyer so I can't say. Sure seems shaky though -- imagine if I went into the drug store and bought a can of Coke. The guy at the counter says to me, "You are only allowed to drink this out of the can. No glass may be used." If I go home and pour it into a glass, have I broken my "license agreement" for this can of Coke? Are they going to sue me now? That's basically what it seems Equibase is saying -- they want to place whatever arbitrary restrictions they want on something that otherwise wouldn't have such restrictions...

akgandhi
11-08-2005, 04:39 PM
The thing is, many people have loads of data that wasn't PURCHASED at all, but wasn't stolen either. They simply archived free charts as they were posted on the internet. Equibase does have a restrictive "Terms of Use" posted on their site about use of this data, but is such a thing legally binding when no money has exchanged hands and most people probably aren't even aware of these Terms of Use? Is data acquired before they posted this (back when it just had a copyright notice) then fair game to use under normal fair use copyright law?

Oh I see - I was wondering how people here could afford all this data that trackmaster wants to charge me tens of thousands to obtain (which is just a tad bit over my student stipend).

You ask an extremely interesting question - and I would love to understand the answer. Like I said before - this is a shockingly high margin business. In contrast, Financial data sets constitute a very competitive business, and for garden variety numbers, they are virtually free. The only data that is getting somewhat costly in this arena is transactional data - so if you want to see every trade and traded price on all equity, bond, and option securities (which is millions of observations per week), then some data companies are starting to charge a premium for this information.

cosmicway
11-08-2005, 05:34 PM
I don't understand.
If I copy down the list of "all night pharmacists" from my newspaper and later use it to conduct some research, am I breaking the law ?
Why did they ever publish it then ?
They have a case only if I cut and paste it in a rival journal.

GameTheory
11-08-2005, 08:14 PM
Oh I see - I was wondering how people here could afford all this data that trackmaster wants to charge me tens of thousands to obtain (which is just a tad bit over my student stipend).

You ask an extremely interesting question - and I would love to understand the answer. Like I said before - this is a shockingly high margin business. In contrast, Financial data sets constitute a very competitive business, and for garden variety numbers, they are virtually free. The only data that is getting somewhat costly in this arena is transactional data - so if you want to see every trade and traded price on all equity, bond, and option securities (which is millions of observations per week), then some data companies are starting to charge a premium for this information.

Most people do actually purchase data, just not as a big lump. I pay $140 a month for data, for instance. Month by month, year by year, and eventually you'll have a big database.

But the fact that they want to charge high prices for OLD data -- this is not just something that keeps you from doing an academic study -- this is something that keeps people from getting into horse racing at all. People that otherwise would be inclined to bet on horses -- to give the industry their money on a regular basis, possibly for the rest of their lives, take up football betting instead. Because if you decide to take up horse racing and want to do some research when you get started, you find it is going to cost you thousands of dollars just to do some research on past races. In any other arena (including the stock market, as you say), historical data is basically free. Horse racing data is free in many other parts of the world. If you want to do your study on Hong Kong racing, for instance, I believe you might be able to get the data straight from the HK Jockey Club, and at little to no cost. (That used to be true, don't know if it is now.) Getting data for UK racing may be fairly easy as well. I have noticed that a high percentage of the studies on horse racing markets use UK data, so that probably is not a coincidence. (Ask John Swetye to introduce you to Nick Mordin.)

rokitman
11-08-2005, 09:04 PM
Try this site. http://www.flatstats.co.uk/

Take that Equibase! :rolleyes:

akgandhi
11-08-2005, 09:23 PM
The economics of the UK markets are going to work a little differently than US markets because you are comparing bookmaker odds to parimutuel odds. While similar - there are some important differences. Steven Levitt, the author of Freakonomics, just wrote a paper detailing the differences. That said - I think the UK has the right idea about growing their markets, as GameTheory suggests, and providing access to data.

highnote
11-08-2005, 10:51 PM
Also, check out the website of the Hong Kong Jockey Club -- arguably one of the best, if not the best, racing websites in the world. Lots of good, free data, too.

Check out this link for the free entire lifetime past performance of the horse.

http://www.hkjc.com/english/racing/Horse.asp?HorseNo=C140

I assume every horse's lifetime past performance is available. I haven't checked, but it looks to be the case.

American racing is odd in that the executives who run the racetracks don't want their customers to be regular winners. They try to make it as hard as possible for anyone to win. Does anyone else find that strange?

MichaelNunamaker
11-08-2005, 11:02 PM
Hi Gametheory,

You wrote "If Nunamaker or other who already possessed such data (i.e. paid for) were to become a co-researcher (or the primary researcher) in a study (at least as far as published credit -- who actually did what behind the scenes would be "unknown"), wouldn't that in effect fit the bill of having the researcher pay for his own data? In other words, instead of transferring the data from non-researcher (who has the data) to the researcher (that doesn't have the data); why not just switch researchers?"

As I understand it, yes, then it would be OK.

I specifically asked about the books I've written and those are not only OK, they would be happy for me to do more of them.

Mike Nunamaker

DJofSD
11-08-2005, 11:10 PM
Very interesting site that Hong Kong page!

It never occurred to me to actually post the photo of each point of call. Perhaps I didn't look hard enough but I didn't find any data for the points of call -- the split or the equivalent beaten lengths or fractional time for each horse.

Yes, it does seem strange to me that race track operators apparently are more interested in protecting a monopoly than they are supporting the betting public.

DJofSD

rokitman
11-08-2005, 11:26 PM
I like the very last note on that Hong Kong page. "Don't bet with illegal bookmakers."

Quite charming.

jfdinneen
11-09-2005, 10:09 AM
swetyejohn/DJofSD/rokitman,

HKJC Racing is the most handicapper-friendly racing in the world as well as the most intellectually challenging - 95% of all races are handicaps!

Here are some important links to the different kinds of information supplied for all races (Use IE to access all features of web site):



Sha Tin Race 08 09-Nov-2005 (http://www.hkjc.com/english/racing/startersR8_e.asp) - Starters List - customize details (including body weight)
- Horses menu contains trackwork and official veterinary records (including injuries, recovery dates, and gelding reports)
- Easy Form (web page may be slow to load) contains excellent graphics and tabular displays
- Speed Power contains last ten past performance lines including sectional times (Times recorded by horse over three sections of previous races. First section (varies in length according to distance of race) measured from start to 800m point, second section is from 800m to 400m, and third from 400m to finish. All sectional times adjusted proportionally by daily track variant.) and speed maps (graphic showing projected running position after 300m). Look, for example, at fitness report for 4. Lucky Sixer (Not fully fit; has done mainly gallop work; moving well but not fully fit.)
- Statistics (Draw, Trainer, and so on)
- Free race replays and points of call photos.
I have included the above details by way of encouraging other handicappers to support Hong Kong racing as, I believe, it is in our own best interests to support the kind of handicapper-friendly racing we would like to see in the US by voting with our investments - power of the ballot box, if you like!

By way of a challenge to the best handicappers on this forum, I propose the motion that "if we cannot take our handicapping expertise on the road to Europe, Asia, and Australia and use exactly the same fundamental, handicapping factors in all geographical locations then we are not expert handicappers, merely glorified apprentices (myself included)"

Best wishes,

John

DJofSD
11-09-2005, 08:24 PM
John,

Thanks for the additional information.
Ya, I'm an apprentice too. And too much dependent upon the traditional U.S. points of call/fractional time/beaten lengths style of data for my pace handicapping.

Considering that California racing is on one of the last swirls around the porcelain fixtures and soon to disappear from my RADAR screen, I'd love to be able to find competitive grass races with full fields. You will not find any of those west of the Mississippi river for the next couple of months. But I'm sure there'd be more to overcome than just the difference in time zones.

DJofSD

Tom
11-09-2005, 08:40 PM
Screw EB....give the data to the guy and let THEM prove it. Does anybody care if EB or DRF loses money? (They are NOT losing money by stifling research - their crying the blues is BS in the first degree.) Somebody put all the data on a CD and "lose" it in front of his house. Let the data whores prove it. How many DRF printed editions are thrown away every day - anyone who picks one up from the trash is free to do with it what he pleases - there is NO user agreement. In fact, there is NO USER agreement for anyone who gets a file from somebody else. I buy one, give it to andicap - HE is not bound by any user agreement.

These cheap bastards are really pathetic.

DJofSD
11-09-2005, 08:44 PM
Tom,

I don't know if they're data whores or data pimps.

Either way it's all about getting screwed.

DJofSD

rokitman
11-09-2005, 09:30 PM
Screw EB....give the data to the guy and let THEM prove it. Does anybody care if EB or DRF loses money? (They are NOT losing money by stiffling researrch - their crying the blues is BS in the first degree. Somebody put all the data on a CD and "lose" it in front of his house. Let the data whores prove it. How many DRF printed editions are thrown away everyday - anyone who picks one up from the trash is free to do with it what he pleases - there is NO user agreement. In fact, there is NO USER agreement to anypone who gets a file from somebody else. I buy one, give to andicap - HE is not bound by any user agreement.

These cheap bastards are really pathetic.

Here here, Tom! When did everyone get hired as Equibase lawyers around here?

This is WAR! Fire a salvo at those sonsabitches!!

You go first.

highnote
11-09-2005, 09:36 PM
Here here, Tom! When did everyone get hired as Equibase lawyers around here?

This is WAR! Fire a salvo at those sonsabitches!!

You go first.


These are the times that try men's souls. -- Thomas Paine

Trainer_is_Key
11-30-2005, 01:57 PM
Casinos will teach you how to play any game they offer for free. They have a statistical edge and are unafraid, card counters notwithstanding, of skilled players.
Horse tracks have no interest in which horse wins. Their money is made from the handle.
SO, why don't they give the equibase data away for free? Why aren't they doing everything in their power to encourage a bigger handle?

yak merchant
12-10-2005, 03:55 AM
Because we are all idiots. We can't do without their precious service, and we'll line up to give them our money. Or....we'll stop playing the horses and watch the sport of kings die a horrible death right before our eyes. I've already chosen option B.