Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

Thread: If
Old 02-16-2023, 05:21 PM   #5
RonTiller
Registered User
 
Join Date: Oct 2003
Posts: 253
Quote:
What kind of coding and OCR would be required to take the DRF PDF files we download from the Daily Racing Form and convert them to CSV or some other file format?
There are many programs that convert PDF files to other formats, some free and some quite pricey. A Google search on "PDF to Text" will turn them up. However, be prepared to be disappointed. Much of the formatting is normally lost going straight to text (especially for any document with a complicated layout, like PPs), and the amount of string manipulation needed to turn what can be a mess into usable data can be enormous, with an ultimately unsatisfying and frustrating result.
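To give a feel for the string manipulation involved, here is a minimal Python sketch. The line layout, the field names and the `parse_line` helper are all invented for illustration; real converted PP text is far messier and will not look like this.

```python
import re

# Hypothetical running line as a PDF-to-text converter might emit it.
# This layout is an assumption, not the real DRF format.
SAMPLE = "12Jan23 AQU 6f :22.45 :45.80 1:10.92 85"

# Same line after the converter dropped a space and fused two columns.
GARBLED = "12Jan23 AQU6f :22.45 :45.80 1:10.92 85"

LINE_RE = re.compile(
    r"^(?P<date>\d{2}[A-Z][a-z]{2}\d{2})\s+"   # e.g. 12Jan23
    r"(?P<track>[A-Z]{2,3})\s+"                # e.g. AQU
    r"(?P<dist>\d+(?:\.\d+)?f)\s+"             # e.g. 6f
    r".*?\s(?P<speed>\d{1,3})\s*$"             # last number on the line
)

def parse_line(line):
    """Return a dict for one line, or None if the layout is not recognized."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "date": m["date"],
        "track": m["track"],
        "distance": m["dist"],
        "speed": int(m["speed"]),
    }

print(parse_line(SAMPLE))
print(parse_line(GARBLED))  # None: one lost space is enough to break the pattern
```

Every layout variation the converter produces needs its own pattern or special case, which is where the effort piles up.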

I do some of this myself with relatively simple PDFs, extracting maybe 10% of the data. Even then, some converted text documents require manual editing, and all of them require a good deal of automated error checking.
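As a rough idea of what automated error checking can look like, the Python sketch below runs a few sanity tests on one extracted record. The record fields, the `check_record` name and the 0 to 130 rating range are assumptions made for this example, not anything taken from an actual PP layout.

```python
def check_record(rec):
    """Return a list of problems found in one extracted record (empty if none)."""
    problems = []

    if not rec.get("track"):
        problems.append("missing track")

    # Assumed plausible range for a speed rating; adjust to the figures you use.
    speed = rec.get("speed")
    if not isinstance(speed, int) or not 0 <= speed <= 130:
        problems.append(f"implausible speed rating: {speed!r}")

    # Fractional times (in seconds) must get larger as the race goes on.
    fractions = rec.get("fractions", [])
    if not all(a < b for a, b in zip(fractions, fractions[1:])):
        problems.append(f"fractions not increasing: {fractions}")

    return problems

good = {"track": "AQU", "speed": 85, "fractions": [22.45, 45.80, 70.92]}
bad = {"track": "", "speed": 850, "fractions": [22.45, 21.0]}

print(check_record(good))
print(check_record(bad))
```

Records that come back with problems are the ones to set aside for manual editing.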

Going PDF to Text to CSV is 100 times worse. If anybody has done this with full PDF PPs, you have my respect, with a hashtag #AreYouCrazy.

Having said this, I am aware of a professional programmer who has partially done that, extracting a small subset of data (speed and pace ratings, I believe) from PDF PPs for import into a database table (and no, I cannot make referrals).

On the other hand, converting PDF to Word format with Adobe Acrobat works so well that the two documents look identical. Unfortunately, this does not help you at all.

Quote:
What kind of coding needs to be done in order for these files to download into a sheet all nice and neat, with everything separated and ready to be manipulated? Similar to Raybo and his project whereby Bris files are Comma Delimited for Spreadsheet.
As other posters have said, the way to go is to use files that are specifically designed as DATA FILES, comma delimited being the easiest to work with. I assume you asked the original question because you do not want to purchase comma delimited files like the ones BRIS sells. If that is the case, you are probably out of luck.
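For comparison, this is roughly all it takes to read a comma delimited file with Python's standard `csv` module. The sample rows and the column names are made up for illustration and do not follow the actual BRIS field layout.

```python
import csv
import io

# Hypothetical comma delimited data; real files have many more fields.
SAMPLE = (
    'AQU,20230112,3,"Some Horse",85\n'
    'AQU,20230112,3,"Other, Horse",79\n'
)

# Assumed column names, in file order.
FIELDS = ["track", "date", "race", "horse", "speed"]

def load(text):
    """Parse comma delimited text into a list of dicts with numeric fields converted."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return [
        dict(row, race=int(row["race"]), speed=int(row["speed"]))
        for row in reader
    ]

rows = load(SAMPLE)
for row in rows:
    print(row["horse"], row["speed"])
```

The fields arrive already separated, and quoted values with embedded commas are handled by the `csv` module, so none of the pattern matching and layout guessing is needed.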

Ron Tiller
HDW