Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

01-22-2012, 12:36 AM   #1
guckers
Registered User
 
Join Date: Sep 2011
Posts: 77
PDF Parser

I've been doing some homework on programming a parser and was wondering if anyone else has successfully parsed a PP in PDF (Equibase or DRF)?
01-25-2012, 12:53 PM   #2
openhorse
Registered User
 
Join Date: Jul 2011
Location: Cardiff by the Sea
Posts: 65
initial research

By breaking the data out of the document model, you can get reliable results.

Regex and text parsing won't be as reliable as something that knows the lengths of varying data from reading the header.

C#, php

http://itextpdf.com/
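
To illustrate the point about lengths: PDF stream objects declare their payload size in a /Length entry, so a reader that honors it is not fooled when the payload happens to contain the end marker. Below is a minimal Python sketch under stated assumptions. It handles only a direct integer /Length (real files may use an indirect reference, which this does not resolve), and `read_stream` and `naive_stream` are hypothetical helper names, not part of iText.

```python
import re

# Matches a stream dictionary's /Length value and the "stream" keyword after it.
# Assumption: /Length is a direct integer, not an indirect reference like "5 0 R".
_LENGTH_THEN_STREAM = re.compile(rb"/Length\s+(\d+).*?stream\r?\n", re.S)


def read_stream(buf: bytes, offset: int = 0) -> bytes:
    """Return one stream payload, sized by the /Length its dictionary declares."""
    m = _LENGTH_THEN_STREAM.search(buf, offset)
    if m is None:
        raise ValueError("no stream dictionary with a direct /Length found")
    length = int(m.group(1))
    return buf[m.end():m.end() + length]


def naive_stream(buf: bytes) -> bytes:
    """Delimiter-based read: stops at the first 'endstream', wherever it is."""
    m = re.search(rb"stream\r?\n(.*?)endstream", buf, re.S)
    if m is None:
        raise ValueError("no stream found")
    return m.group(1)


# Hypothetical object whose payload contains the word "endstream".
payload = b"BT (endstream inside text) ET"
obj = (b"1 0 obj\n<< /Length %d >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")

print(read_stream(obj) == payload)   # length-based read gets the whole payload
print(naive_stream(obj) == payload)  # delimiter-based read is cut short
```

A full library does far more than this (cross-reference tables, filters, fonts), which is the argument for using one instead of hand-rolled text matching.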
02-04-2012, 02:44 AM   #3
Greybase
Registered User
 
Join Date: Jul 2009
Posts: 71
Just caught this... I've done some pretty extensive PDF parsing, though for greyhound programs. Fully automated, using various command-line text extraction tools. When I looked at doing this with DRF and Equibase PDFs there were a number of complications. You have lots of embedded tiny fractions, and even worse, symbols embedded in horse PPs... which make text conversion difficult. I still say it COULD be done!!
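
For the fraction problem, here is a rough Python sketch of one possible cleanup step (not necessarily what was used for the greyhound programs). It assumes the extraction tool emits single Unicode fraction characters such as ½; if the PDF draws fractions as tiny separate digits, this will not help. `to_number` is a hypothetical helper name.

```python
import unicodedata

ASCII_DIGITS = "0123456789"


def to_number(token: str) -> float:
    """Convert a token such as '6½' or '¾' to a float.

    Assumption: fractions arrive as single Unicode vulgar-fraction characters.
    """
    whole = ""
    frac = 0.0
    for ch in token:
        if ch in ASCII_DIGITS:
            whole += ch
        elif unicodedata.decomposition(ch).startswith("<fraction>"):
            # unicodedata.numeric gives the character's numeric value, e.g. 0.5
            frac += unicodedata.numeric(ch)
        else:
            raise ValueError(f"unexpected character {ch!r} in {token!r}")
    return (int(whole) if whole else 0) + frac


print(to_number("6½"))
print(to_number("¾"))
```

Raising on unknown characters is deliberate: embedded symbols then show up as errors to handle explicitly instead of being silently dropped.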
__________________
*
The Dogs = Man's Best Bet!
02-04-2012, 03:15 PM   #4
guckers
Registered User
 
Join Date: Sep 2011
Posts: 77
Quote:
Originally Posted by Greybase
Just caught this... I've done some pretty extensive PDF parsing, though for greyhound programs. Fully automated, using various command-line text extraction tools. When I looked at doing this with DRF and Equibase PDFs there were a number of complications. You have lots of embedded tiny fractions, and even worse, symbols embedded in horse PPs... which make text conversion difficult. I still say it COULD be done!!
Greybase, would you mind telling me at a high level how you achieved this? It seems you would have to understand the chart's layout and the order things appear in, while still accounting for the anomalies that come up, and then go through line by line, parsing by location and keyword. Is that similar to what you were doing?
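
To make the line-by-line idea concrete, here is a small Python sketch of keyword-driven grouping. The "RACE n" header keyword and the sample lines are hypothetical, not a real Equibase or DRF layout, and `split_by_race` is an illustrative name.

```python
import re

# Hypothetical section keyword: a line starting with "RACE <number>".
RACE_HEADER = re.compile(r"^RACE\s+(\d+)\b", re.IGNORECASE)


def split_by_race(lines):
    """Group extracted text lines under the most recent race header seen."""
    races = {}
    current = None
    for raw in lines:
        line = raw.strip()
        m = RACE_HEADER.match(line)
        if m:
            current = int(m.group(1))
            races[current] = []
        elif current is not None and line:
            # Non-blank lines after a header belong to that race.
            races[current].append(line)
    return races


# Made-up sample text standing in for extracted PDF lines.
sample = [
    "RACE 1  Six Furlongs",
    "1 Horse A",
    "2 Horse B",
    "",
    "RACE 2  One Mile",
    "4 Horse C",
]
print(split_by_race(sample))
```

Anomalies would need extra rules per keyword; the structure above only shows the header-then-rows skeleton.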
02-04-2012, 03:46 PM   #5
GameTheory
Registered User
 
Join Date: Dec 2001
Posts: 6,128
I used to parse the Equibase PDF charts for a while after they switched over to PDF. So it used to be possible, anyway. They've added some stuff since then (the PP preview box of the running lines, for instance).
02-14-2012, 03:48 PM   #6
togatrigger
Registered User
 
Join Date: Nov 2009
Posts: 26
I tried for a while, but the compressed sections I tried to convert were often also encrypted. Maybe that isn't the case anymore, but I doubt it. The format is notoriously difficult to parse and convert. Good luck.
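
For anyone wanting to check a file before investing time, here is a crude Python sketch. It relies on two general PDF facts: encrypted files carry an /Encrypt entry in the trailer dictionary, and /FlateDecode streams are zlib-compressed. Scanning raw bytes for /Encrypt is only a heuristic, and `looks_encrypted` and `inflate` are hypothetical helper names.

```python
import zlib


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: an /Encrypt key anywhere in the raw bytes suggests encryption."""
    return b"/Encrypt" in pdf_bytes


def inflate(stream_payload: bytes) -> bytes:
    """Decompress a /FlateDecode payload. Only works if the file is not encrypted."""
    return zlib.decompress(stream_payload)


# Synthetic data standing in for pieces of a PDF file.
compressed = zlib.compress(b"BT (Race 1) Tj ET")
print(inflate(compressed))
print(looks_encrypted(b"trailer\n<< /Root 1 0 R /Encrypt 5 0 R >>"))
```

If `looks_encrypted` returns True, `zlib.decompress` on the raw payload will generally fail, because the bytes must be decrypted first.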
02-15-2012, 12:55 AM   #7
guckers
Registered User
 
Join Date: Sep 2011
Posts: 77
PDF manipulation is no easy task. Thanks for the input, everyone. I will post updates on my progress as it comes along (or doesn't).
All times are GMT -4.