Horse Racing Forum - PaceAdvantage.Com - Horse Racing Message Board

02-16-2023, 11:29 AM   #1
wiretowire68
Registered User
 
Join Date: Jan 2020
Posts: 303
If

This is a software question. What kind of coding and OCR would be required to take the DRF PDF files we download from the Daily Racing Form and convert them to CSV or some other data file? What kind of coding needs to be done so that these files load into a sheet, all nice and neat, with everything separated and ready to be manipulated? Something similar to Raybo's project, in which BRIS files are comma delimited for a spreadsheet.

Hope this can be answered.
02-16-2023, 12:06 PM   #2
Stevecsd2
Registered User
 
Join Date: Jan 2018
Posts: 664
W2W,

I have done software development and programming for many years.

At one time I did "decode" the raw DRF files into the program I use. It was a lot of work and very time consuming. And I did that before I came across Raybo's documentation. Even with Raybo's documentation it would still be a lot of work.

Since the data is in a text file format, it can be imported into pretty much any software you want to use. It may be a little large to work with in Excel, and I don't think you could do much sophisticated analysis using Excel. However, there are probably people who have done that.
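A minimal sketch of that import step, using only Python's standard `csv` module. The sample line layout here is made up for illustration; a real data file has far more fields, and their meaning has to come from the vendor's field documentation.

```python
import csv
import io

def load_rows(text_file):
    """Read a comma-delimited text file into a list of rows (lists of strings)."""
    return list(csv.reader(text_file))

# Made-up sample standing in for a real data file.
# Quoted fields may contain commas; csv.reader handles that correctly.
sample = io.StringIO(
    'AQU,20230216,1,"SMITH, J",118\n'
    'AQU,20230216,1,"JONES, R",120\n'
)
rows = load_rows(sample)
print(len(rows), rows[0][3])
```

For a file on disk, the same function works with `open(path, newline="")` in place of the `io.StringIO` sample.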

If you aren't doing this yourself, it would be very expensive to hire someone to do it. I'm talking thousands of dollars.

I took a look at Raybo's files. He did a good job decoding the DRF files.

A better approach might be to work out what you think are the important factors for a horse and a race. Then you may be able to work with a developer to pull out those specific factors from the DRF files. The big problem with that approach is if it doesn't yield the results you want, you have to start over.
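As a rough sketch of that "pick your factors first" idea: keep a small table of factor names and column positions, and pull only those columns from each row. The factor names and positions below are hypothetical; the real positions would come from the file's published field layout.

```python
import csv
import io

# Hypothetical mapping of factor name -> zero-based column index.
# These positions are assumptions for illustration only.
FACTORS = {"track": 0, "race": 2, "horse": 3, "speed": 4}

def extract_factors(text_file, factors=FACTORS):
    """Return one dict per row containing only the chosen factors."""
    picked = []
    for row in csv.reader(text_file):
        picked.append({name: row[idx] for name, idx in factors.items()})
    return picked

# Made-up sample rows.
sample = io.StringIO(
    "AQU,20230216,1,FAST HORSE,88\n"
    "AQU,20230216,1,SLOW HORSE,71\n"
)
picked = extract_factors(sample)
print(picked[0])
```

Because the factor list lives in one dictionary, adding or dropping a factor later means changing that table rather than starting the extraction over.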
02-16-2023, 12:29 PM   #3
ranchwest
Registered User
 
Join Date: Oct 2001
Location: near Lone Star Park
Posts: 5,152
Quote:
Originally Posted by wiretowire68 View Post
This is a software question. What kind of coding and OCR would be required to take the DRF PDF files we download from the Daily Racing Form and convert them to CSV or some other data file? What kind of coding needs to be done so that these files load into a sheet, all nice and neat, with everything separated and ready to be manipulated? Something similar to Raybo's project, in which BRIS files are comma delimited for a spreadsheet.

Hope this can be answered.
I would think deconstructing a PDF would be taking the long way around. Why not just work from the BRIS files?
__________________
Ranch West
Equine Performance Analyst, Quick Grid Software
02-16-2023, 03:03 PM   #4
NormanTD
Registered User
 
Join Date: Sep 2001
Posts: 117
Quote:
Originally Posted by ranchwest View Post
I would think deconstructing a PDF would be taking the long way around. Why not just work from the BRIS files?
I concur with Ranch here. Starting with an existing .csv file would be far easier than what you're talking about, and the files are only $1 each.
02-16-2023, 05:21 PM   #5
RonTiller
Registered User
 
Join Date: Oct 2003
Posts: 253
Quote:
What kind of coding and OCR would be required to take the DRF PDF files we download from the Daily Racing Form and convert them to CSV or some other data file?
There are many programs that convert PDF files to various other formats, many free and many quite pricey. You can Google "PDF to Text". However, be prepared to be disappointed. Much formatting is normally lost going straight to text (especially for any document with a complicated layout and formatting, like PPs), and the amount of string and text manipulation needed to convert what can be a mess into usable data can be enormous, with an ultimately unsatisfying and frustrating result.
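To give a feel for the string manipulation involved, here is a small sketch that turns already-extracted text lines into CSV with a regular expression. The line layout (date, track, distance, speed figure) is invented for illustration and is far simpler than real PP text; lines that do not fit the pattern are set aside rather than guessed at.

```python
import csv
import io
import re

# Assumed layout of one extracted line: "12Jan23  AQU   6f   88".
# Real past-performance text is much messier; this pattern is illustrative only.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}[A-Za-z]{3}\d{2})\s+"
    r"(?P<track>[A-Z]{2,3})\s+"
    r"(?P<distance>\S+)\s+"
    r"(?P<speed>\d{1,3})\s*$"
)

def lines_to_csv(lines):
    """Return (csv_text, rejected_lines) for a list of extracted text lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "track", "distance", "speed"])
    rejected = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            writer.writerow([m["date"], m["track"], m["distance"], m["speed"]])
        else:
            rejected.append(line)  # keep for manual review
    return buf.getvalue(), rejected

sample_lines = ["12Jan23  AQU   6f   88", "Life 12 3 2 1 $84,000", "3Feb23 GP 1m 91"]
csv_text, rejected = lines_to_csv(sample_lines)
print(csv_text)
print("rejected:", rejected)
```

Every distinct kind of line in a PP page would need its own pattern, which is where the effort described above piles up.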

I do some of this myself with relatively simple PDFs, only extracting maybe 10% of the data. Even then, some converted text documents require manual editing and they all require much automated error checking.
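A sketch of what that automated error checking might look like for one converted row. The expected field count, the position of the speed figure, and the plausible range are all assumptions chosen for this example.

```python
def check_row(row, expected_fields=4, speed_index=3, speed_range=(0, 130)):
    """Return a list of problems found in one converted row (empty if none).

    The defaults are illustrative assumptions, not real file specifications.
    """
    problems = []
    if len(row) != expected_fields:
        problems.append(f"expected {expected_fields} fields, got {len(row)}")
        return problems  # positions are unreliable, so stop here
    value = row[speed_index]
    if not value.isdigit():
        problems.append(f"speed {value!r} is not a whole number")
    elif not speed_range[0] <= int(value) <= speed_range[1]:
        problems.append(f"speed {value} outside {speed_range}")
    return problems

print(check_row(["12Jan23", "AQU", "6f", "88"]))
print(check_row(["12Jan23", "AQU", "6f", "8B"]))  # e.g. a digit misread as a letter
```

Rows that come back with problems are the ones to route to manual editing.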

Going PDF to Text to CSV is 100 times worse. If anybody has done this with full PDF PP, you have my respect, with a hashtag #AreYouCrazy.

Having said this, I am aware of a professional programmer who has partially done that, extracting a small subset of data (speed and pace ratings, I believe) from PDF PPs for importing into a database table (and no, I cannot make referrals).

On the other hand, converting PDF to Word format using the Adobe Acrobat program works so well that the two documents look identical. Unfortunately, this does not help you at all.

Quote:
What kind of coding needs to be done so that these files load into a sheet, all nice and neat, with everything separated and ready to be manipulated? Something similar to Raybo's project, in which BRIS files are comma delimited for a spreadsheet.
As other posters have said, the way to go is to use files that are specifically designed as DATA FILES, comma delimited being the easiest to work with. I assume you asked the original question based on not wanting to purchase comma delimited files like BRIS sells. If this is the case, you are probably out of luck.

Ron Tiller
HDW
02-16-2023, 06:07 PM   #6
ranchwest
Registered User
 
Join Date: Oct 2001
Location: near Lone Star Park
Posts: 5,152
You might reach the break even point on cost about 25 years down the road.
__________________
Ranch West
Equine Performance Analyst, Quick Grid Software
02-17-2023, 09:38 AM   #7
Tom
The Voice of Reason!
 
Join Date: Mar 2001
Location: Canandaigua, New York
Posts: 112,819
32 years with inflation!
__________________
Who does the Racing Form Detective like in this one?
02-18-2023, 10:45 AM   #8
wiretowire68
Registered User
 
Join Date: Jan 2020
Posts: 303
I have Ray's Version

I have used Ray's version and it is excellent. However, I hate Excel; I find Google Sheets just as good (opinion). Ray's spits out what's programmed, and if you mess around with it, it breaks his code. The reason I am working on this is that I think it can be done by the brainiacs out there in software land, with the proper OCR. I am wondering whether an excellent JavaScript script could do it. Just out of curiosity, I have put in a request to the developers of DRF Formulator, an Aussie company, to see what it would cost. Not cheap, but I could be onto something. lol
02-20-2023, 01:31 AM   #9
ranchwest
Registered User
 
Join Date: Oct 2001
Location: near Lone Star Park
Posts: 5,152
There are 1,435 elements in a BRIS file. I don't think any programmer is going to want to try to get that much data by using OCR or deconstructing a PDF.
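For a sense of scale, a quick sanity check on a comma-delimited file is to confirm that every row really has the expected number of fields. This sketch takes the 1,435 figure from the post above as its default; the function name and sample data are made up.

```python
import csv
import io

EXPECTED_FIELDS = 1435  # field count quoted in the post above

def bad_rows(text_file, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for each row with the wrong field count."""
    return [
        (n, len(row))
        for n, row in enumerate(csv.reader(text_file), start=1)
        if len(row) != expected
    ]

# Made-up data: one full-width row and one truncated row.
good = ",".join(["x"] * 1435)
short = ",".join(["x"] * 1400)
result = bad_rows(io.StringIO(good + "\n" + short + "\n"))
print(result)
```

Any output produced by OCR or PDF conversion would have to pass the same kind of check for every one of those fields, on every row.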
__________________
Ranch West
Equine Performance Analyst, Quick Grid Software
02-23-2023, 02:25 PM   #10
wiretowire68
Registered User
 
Join Date: Jan 2020
Posts: 303
I already have Raybo's version; I am creating my own.