Equibase chart parser


osophy_junkie
11-23-2005, 10:55 PM
For those interested, I have written a parser for the Equibase PDF charts. It can be found at: http://lamedomain.net/horses/chartparser/chartparse.zip. Parsing a chart can take up to a minute on an Athlon 1.3 GHz. During early testing I have seen it get hung up on charts and never stop running. If it runs for more than a couple of minutes, there was an error parsing the chart and you will need to kill the program.
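A minimal sketch of how a hang like that could be handled automatically, assuming the conversion step is launched as a separate command-line process. The function name `run_with_timeout` and the command passed to it are hypothetical; the parser as posted is a GUI program and is not known to work this way.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and report whether it finished within the time limit.

    Returns True if the process exited on its own, False if it ran past
    the limit and had to be killed.
    """
    try:
        subprocess.run(cmd, timeout=seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False


if __name__ == "__main__":
    # Stand-in for a chart conversion that never finishes.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("finished" if run_with_timeout(stuck, 1) else "killed after timeout")
```

A limit of a couple of minutes would match the advice above; the one-second limit here only keeps the demonstration short.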

This program converts the PDF to text and then parses the text for relevant information. The text file I get from the PDF is very difficult to parse due to the way PDF files are formatted. This program has known bugs and will not parse all charts correctly.

To run it, unzip the zip file, double-click parser.exe, click the "Convert PDF" button, choose a PDF file, then click Open. It will then parse the PDF and output a CSV file with the same filename and a "csv" extension.
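The text-to-CSV step described above might look roughly like the sketch below. The line layout matched by `LINE_RE` (program number, horse, jockey in parentheses, odds) is invented for illustration and is not the real Equibase chart layout, and the helper names are hypothetical rather than taken from the actual parser source.

```python
import csv
import re
from pathlib import Path

# Hypothetical layout: "<program> <horse> (<jockey>) <odds>"
LINE_RE = re.compile(
    r"^\s*(?P<program>\d+[A-Z]?)\s+(?P<horse>.+?)\s+"
    r"\((?P<jockey>[^)]+)\)\s+(?P<odds>\d+\.\d+)\s*$"
)
FIELDS = ["program", "horse", "jockey", "odds"]


def csv_path_for(pdf_path):
    """Same filename as the PDF, with a csv extension."""
    return Path(pdf_path).with_suffix(".csv")


def parse_lines(text):
    """Keep only the lines that match the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def write_csv(rows, out_path):
    """Write the parsed rows with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


sample = "Race 1\n1  Fast Lad (Smith, J.)  3.40\n2A Slow Poke (Doe, A.) 12.10\ngarbage line\n"
rows = parse_lines(sample)
```

Lines that do not match are silently skipped here, which is one way a parser of this kind can miss data without reporting an error, so the output still needs checking against the chart.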

I have only tested it on Windows XP and with the box I developed it on. Please let me know if you have any problems running it. NOTE: The user uses this program at their own risk and is responsible for verifying the data is correct before making any handicapping decisions using it.

Enjoy!
Ed

Brian Flewwelling
11-24-2005, 05:53 AM
I get an error running parser.exe

"libglib-2.0-0.dll cannot be found"

headhawg
11-24-2005, 08:38 AM
Me too. Using Windows XP.

douglasw32
11-24-2005, 08:49 AM
yep me three

nomadpat
11-24-2005, 10:08 AM
Same error here and I'm on 2000.

DJofSD
11-24-2005, 10:59 AM
Dudes, try this link. (http://www.dlldump.com/download-dll-files.php/dllfiles/L/libglib-2.0-0.dll)

Good luck. Hopefully there won't be any more after this.

DJofSD
11-24-2005, 11:31 AM
Not lucky.

Now it's looking for iconv.dll.

DJofSD
11-24-2005, 11:48 AM
It goes on and on.

After the iconv.dll it complains about something it found in my Interbase (Borland data base) folder.

So much for turkey day computer fun.

Tom
11-24-2005, 12:44 PM
It goes on and on.

After the iconv.dll it complains about something it found in my Interbase (Borland data base) folder.

So much for turkey day computer fun.

Go parse the turkey! :D

rokitman
11-24-2005, 01:52 PM
It worked on the turkey. Getting lotsa errors on the pies.

osophy_junkie
11-24-2005, 08:57 PM
The runtime files were supposed to be included in the zip file. Apparently they are not, or I added them incorrectly. I am out of town and do not have access to a computer that I test with, so it's just a guess.

The runtime files can be found at http://prdownloads.sourceforge.net/gladewin32/gtk-win32-devel-2.8.6-rc3.exe?download

douglasw32
11-24-2005, 10:28 PM
Genius...not sure what to do with it....some chart person let me know, but it worked...opened in Excel....WOW!!! thanks for sharing :jump:

rokitman
11-25-2005, 10:41 AM
Hey Doug, did you download the whole file that link above wants to send? Or did you do it one dll at a time?

douglasw32
11-25-2005, 08:12 PM
Downloaded the file above, the 2nd link in the post...
Installed the RUNTIME LIBRARIES, the whole setup.

Then downloaded the parser, first link over again, unzipped it and it ran fine.

:)

highnote
11-28-2005, 12:02 AM
I know it's asking a lot, but it would be great if someone would write a parser that is open source. That way the racing community could add to it. Perhaps the originator of the code would act as a moderator and approve any worthwhile additions to the code.

Just a thought.

osophy_junkie
11-28-2005, 12:55 PM
The source code has been released[1]. It is written in Python[2] and includes a test suite which uses Twisted[3] trial. The base directory is the original Unix version. winport/ contains changes that make it run under Windows, a modified pdftotext.exe[4] binary, and setup files to generate the exe. Even if you don't know Python, the regex and call handling code should be interesting or beneficial.

Ed

[1] http://lamedomain.net/horses/chartparser/
[2] http://python.org/
[3] http://twistedmatrix.com/
[4] http://www.foolabs.com/xpdf/

traynor
12-02-2005, 03:25 AM
This looks like a really interesting utility, but won't go. I am using XP on a Pentium. I downloaded the GTK development kit, that works fine, downloaded the recommended dll from the link listed, opened with Python, all I got was a blink from Python before it (Python) closed. Assuming it had installed the dll file, tried again, still no go. Double-clicking parser.exe still asks for the same missing dll it originally complained about. Suggestions would be greatly appreciated!
Thanks

douglasw32
12-02-2005, 08:09 AM
All I did to get it going on cp was run this first: the runtime files that can be found at http://prdownloads.sourceforge.net/...c3.exe?download

Then unzipped and clicked on the parser program as instructed.

If that helps for the order?

highnote
12-02-2005, 08:27 AM
Suggestions would be greatly appreciated!
Thanks


Here's a suggestion:

subscribe to a data service. :D

traynor
12-02-2005, 03:40 PM
swetyejohn wrote: <Here's a suggestion: subscribe to a data service. :D >

That would be good advice if I could find a data service that provides the information I want (and need). Unfortunately, the data services available mainly pander to the lowest common denominator; simplistic computations with simplistic applications.

I think the purpose of text conversions is more for those inclined to do their own work, create their own information, and use their own applications. The value of data is in direct proportion to the availability of that data; if every kid on the block has access to the same data, all they will ever get is the same poor results that everyone massaging the same set of numbers gets.

This may seem shocking, but there is a wealth of information available (for a rather stiff price, quite likely beyond your means) that is substantially more useful than "data services" provide to anyone with a couple of bucks and a computer. It may be even more shocking that my interest in the indicated application has very little to do with information provided by data services for thoroughbred racing (or harness racing, for that matter).

Specifically, I have no specific interest in "pirating" anyone else's data files. That is not rampant honesty and ethical constraints on my part, but rather that I don't consider the information worth stealing, let alone worth a subscription.

In any event, thanks for the comment.
Good Luck

osophy_junkie
12-02-2005, 04:32 PM
I've removed the dependency on GTK and released a new version. If you were not having problems with the old version, nothing has changed and you won't need to download this version. If you were having problems, this should clear them up.

thanks for the interest!
Ed

traynor
12-02-2005, 11:51 PM
osophy_junkie wrote: <thanks for the interest!>

The thanks are to you, for avoiding the Lone Ranger philosophy and sharing your insights and abilities. Ultimately, it is to the benefit of all, including you.
Again, thank you, and good luck.

traynor
12-03-2005, 01:49 AM
For some obscure reason, the parser.exe file still hangs, pops an exception handler that says "LoadLibraryPythondll" failed. Any suggestions would be appreciated.
Thanks

highnote
12-03-2005, 03:34 AM
Traynor,
I was joking. I'm in your camp. This is an information game. The better and more unique your information the more profitable it should be.

I'd be interested in knowing what information is available that is more useful than the common data service providers, but is possibly beyond my means.

If there is data out there that can be purchased and also produces a profit over and above the cost of the data, then the data can not be beyond my means. If expensive data is available that does not help produce profits then I can not afford it because I don't like throwing my money down a rat hole.

(btw -- the rat hole line is a tribute to Dick Mitchell. He used that expression on occasion.)

Lastly, what information is available by parsing the charts yourself that is not available from the common data providers? Maybe the comments in the actual charts are more detailed? That's the only thing that comes to mind.

js


osophy_junkie
12-03-2005, 03:56 AM
For some obscure reason, the parser.exe file still hangs, pops an exception handler that says "LoadLibraryPythondll" failed.

What specific platform, OS version and architecture, are you using?

traynor
12-03-2005, 10:05 PM
swetyejohn wrote <If there is data out there that can be purchased and also produces a profit over and above the cost of the data, then the data can not be beyond my means. If expensive data is available that does not help produce profits then I can not afford it because I don't like throwing my money down a rat hole.>

"Cost" is a relative term; sometimes it is more in time and effort than dollars. Specifically, the information I use is generated by multiple sources, each "expert" in a specific category of analysis. The information is not sold for the reasons you mentioned; it is worth more when access is limited.

To "explain" the data would not be particularly useful to you. Not because it is too complex (most of it is not), but that it would require lengthy explanation that might not really explain it to you. Example; one aspect of our analysis uses "body language." The immediate response might be, "Oh, yeah, Joe Takach, Bonnie Ledbetter, Paul Mellos ... I know all about that stuff. I even watched one of Joe Takach's videos."

In our use, that aspect is handled by "coders," mostly graduate students who have been specifically trained in a coding discipline, commonly psychology or communication. They have a high degree of "intercoder reliability," which means that Coder A at Arlington grades a particular entry on specific dimensions of appearance and behavior within a few percentage points of how Coder B, using videotapes of the same entry, rates that entry at a given point in time.

It is essential that subjective opinion be kept to a minimum. There are specific criteria we use to evaluate physical appearance and behavior, and those are quantified. In short, each of the dimensions has a number associated with it. With practice, and training, coders can objectively evaluate entries and come up with essentially the same numbers. Not identical, but close.
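One simple way to put a number on "essentially the same numbers" is the share of dimensions on which two coders land within a set tolerance of each other. This is only an illustration: the scores, the tolerance, and the function name `agreement_rate` are made up, and it is not necessarily the reliability measure actually used here.

```python
def agreement_rate(scores_a, scores_b, tolerance):
    """Fraction of dimensions where two coders differ by at most `tolerance`."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both coders must score the same, non-empty set of dimensions")
    close = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return close / len(scores_a)


# Hypothetical scores from two coders on four dimensions of one entry.
coder_a = [7, 5, 8, 6]
coder_b = [7, 6, 8, 3]
rate = agreement_rate(coder_a, coder_b, tolerance=1)
```

With these made-up scores the coders agree on three of the four dimensions, so `rate` comes out to 0.75.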

The simplistic explanation is that we "look at the horses before betting," and use that inspection as a component of the final selection process. The "real" explanation is that unless you are wagering fairly large sums, it is too much effort. Most bettors are not particularly interested in exerting the effort necessary, and taking the time necessary, to do more than superficial analysis.

I could not possibly do alone what we do to analyze a race for wagering. It is simply too complex, and involves a number of approaches that are not as "easily" quantified as physical inspection. That is, they cannot be reduced to algorithms and automated. Very minor example: consider the difference in the output of the average software application when the criteria for pace line selection are "automated," or when individual pace lines are selected subjectively for each entry. Unless specific criteria are used consistently, the results cannot be modeled in any meaningful way. Each event is unique, and essentially an anomaly.

I don't want to seem obscure, or to dust off your very reasonable questions. I want to emphasize that the "cost" of the ratings I use is in more than dollars. A lot of that cost is in time and effort.
Good luck

traynor
12-03-2005, 10:13 PM
I am using XP on a Pentium. I use Boa Constructor for wxPython, with Python 2.3, and Python 2.4 as standalone. The GTK popped up without any problem, but every time I click Parser.exe it bonks with an exception handler "LoadLibrary(pythondll)". Tried the "missing dll" link you provided, but that doesn't seem to help much.
Thanks

highnote
12-04-2005, 12:43 AM
In our use, that aspect is handled by "coders," mostly graduate students who have been specifically trained in a coding discipline, commonly psychology or communication. The simplistic explanation is that we "look at the horses before betting," and use that inspection as a component of the final selection process. The "real" explanation is that unless you are wagering fairly large sums, it is too much effort. Most bettors are not particularly interested in exerting the effort necessary, and taking the time necessary, to do more than superficial analysis.

I could not possibly do alone what we do to analyze a race for wagering. It is simply too complex, and involves a number of approaches that are not as "easily" quantified as physical inspection.

I can relate. I used to spend every Saturday in the paddock looking at horses. I was trained by Nick Mordin. There is a tremendous edge in paddock handicapping and it is something that can be taught and two people who work together will come up with the same opinion frequently.

You're right. It is a lot of work. It is not how I wanted to spend my Saturdays once I started a family. Plus, unless we hired and trained other people we could only do one track at a time. We could have made a living, but not a fortune doing what we did. I didn't want to become a manager.

If I'm going to work for a living, I may as well do something I enjoy and make good wages. I enjoyed paddock handicapping and the wages were OK. But there are better opportunities for me outside of racing that are more conducive to family life.

Your approach sounds very interesting. Good luck, too.

js

osophy_junkie
12-04-2005, 11:42 PM
I am using XP on a Pentium. I use Boa Constructor for wxPython, with Python 2.3, and Python 2.4 as standalone. The GTK popped up without any problem, but every time I click Parser.exe it bonks with an exception handler "LoadLibrary(pythondll)". Tried the "missing dll" link you provided, but that doesn't seem to help much.

Try downloading the source code and running parser.py.

Alan Wight
01-18-2006, 06:07 PM
Has anyone gotten Ed's parser to work with the Equibase PDFs? If so, on what platform? Has anyone revised it to work with HTML charts?