Quote:
What kind of coding and OCR would be required to take the DRF PDF files we download from the Daily Racing Form and convert them to CSV or some other file format?
|
There are many programs that convert PDF files to various other formats, many free and many quite pricey. You can Google search on "PDF to Text".
However, be prepared to be disappointed. Much of the formatting is normally lost going straight to Text, especially for any document with a complicated layout, like PPs. The amount of string and text manipulation needed to turn what can be a mess into usable data can be enormous, and the result is often unsatisfying and frustrating.
I do some of this myself with relatively simple PDFs, extracting maybe 10% of the data. Even then, some converted text documents require manual editing, and all of them require extensive automated error checking.
Going PDF to Text to CSV is 100 times worse. If anybody has done this with full PDF PPs, you have my respect, with a hashtag #AreYouCrazy.
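To give a sense of the string handling and error checking involved, here is a minimal Python sketch. It assumes the PDF has already been converted to text and that each line of interest holds a date, track code, race number and speed rating separated by spaces. That layout, the pattern, and the 0 to 150 plausibility range are all made up for illustration; real converted PP text will not look this clean.

```python
import csv
import io
import re

# Hypothetical line layout: date, track code, race number, speed rating.
# This is an assumption for illustration, not the real DRF text layout.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{2}[A-Za-z]{3}\d{2})\s+"
    r"(?P<track>[A-Z]{2,3})\s+"
    r"(?P<race>\d{1,2})\s+"
    r"(?P<speed>\d{1,3})$"
)
FIELDS = ["date", "track", "race", "speed"]


def parse_lines(lines):
    """Split lines into parsed rows and rejects that need manual review."""
    rows, rejects = [], []
    for number, line in enumerate(lines, start=1):
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            # The line does not fit the expected layout at all.
            rejects.append((number, line))
            continue
        row = match.groupdict()
        # Sanity check: the 0-150 range is an assumed plausibility limit.
        if not 0 <= int(row["speed"]) <= 150:
            rejects.append((number, line))
            continue
        rows.append(row)
    return rows, rejects


def to_csv(rows):
    """Return the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    "12Mar23 GP 7 88",
    "O5Feb23 GP 3 91",    # letter O where a zero should be: pattern fails
    "15Jan23 AQU 9 412",  # fits the pattern but the rating is implausible
]
rows, rejects = parse_lines(sample)
print(to_csv(rows))
print("Lines needing manual review:", [number for number, _ in rejects])
```

Even in this toy case two of the three lines end up in the reject pile, which is the manual editing and error checking described above in miniature.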
Having said this, I am aware of a professional programmer who has partially done that, extracting a small subset of data (speed and pace ratings, I believe) from PDF PPs for importing into a database table (and no, I cannot make referrals).
On the other hand, converting PDF to Word format with the Adobe Acrobat program works so well that the two documents look identical. Unfortunately, this does not help you at all.
Quote:
What kind of coding needs to be done in order for these files to download into a sheet all nice and neat, with everything separated and ready to be manipulated? Similar to Raybo and his project, whereby Bris files are comma delimited for a spreadsheet.
|
As other posters have said, the way to go is to use files that are specifically designed as DATA FILES, comma delimited being the easiest to work with. I assume you asked the original question based on not wanting to purchase comma delimited files like BRIS sells. If this is the case, you are probably out of luck.
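For comparison, here is a minimal Python sketch of how little code a true comma delimited file needs. The sample rows and column positions are invented for illustration; a purchased data file will have its own documented field layout.

```python
import csv
import io

# Made-up sample data: track, date, race, jockey, speed rating.
# A real data file has its own (much larger) documented layout.
SAMPLE = (
    'GP,20230312,7,"SMITH, J",88\n'
    'GP,20230312,7,"JONES, R",91\n'
)


def load_rows(handle):
    """Read every record; the csv module handles quoted commas for us."""
    return list(csv.reader(handle))


rows = load_rows(io.StringIO(SAMPLE))
speeds = [int(row[4]) for row in rows]
best = max(rows, key=lambda row: int(row[4]))
print("Speed ratings:", speeds)
print("Top rated:", best[3])
```

With a real file you would replace `io.StringIO(SAMPLE)` with `open(path, newline="")`. There is no pattern matching and no reject pile, because every field is already separated.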
Ron Tiller
HDW