Why don't computer


BIG HIT
07-22-2009, 03:51 PM
Why don't computer programmers ever make their programs read PDF files? Can't they just make it so you put in the extension and, wham, you've got it? Really, I'm speaking of the TSN, BRIS, and ESPN PDFs; that way no one is limited if, say, they have PDF files and results. Then you buy a program, and you can't test or model with it because it doesn't read them, so you have to start from scratch. I think more people would consider it if they did not have to start over.

ryesteve
07-22-2009, 04:17 PM
Have you ever looked at a pdf file in notepad?

BIG HIT
07-22-2009, 07:26 PM
Well, I had to ask. Thanks.

ryesteve
07-22-2009, 07:29 PM
If you take a look, you'll see... a pdf file isn't intended to be a data file.
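To make ryesteve's point concrete: inside a PDF page, text is stored as drawing instructions (move to an x,y spot on the page, then paint a string), not as records with fields. Below is a minimal Python sketch under heavy assumptions: a made-up, uncompressed content stream that only uses the `Tm` and `Tj` operators, and hypothetical helper names. Real files are usually compressed and use many more operators, so this is an illustration, not a PDF parser.

```python
import re

# Hypothetical, hand-written fragment of an uncompressed page content stream.
# "a b c d x y Tm" puts the text cursor at (x, y); "(string) Tj" paints text there.
# The price is deliberately drawn before the program number.
STREAM = """
BT /F1 9 Tf
1 0 0 1 400 700 Tm (5.20) Tj
1 0 0 1 72 700 Tm (1) Tj
1 0 0 1 100 700 Tm (Example Horse) Tj
1 0 0 1 72 688 Tm (2) Tj
ET
"""

# Six numbers, then Tm, then a literal string, then Tj. Escapes are left as-is.
TM_TJ = re.compile(
    r"(?:[-\d.]+\s+){4}([-\d.]+)\s+([-\d.]+)\s+Tm\s*\(((?:[^()\\]|\\.)*)\)\s*Tj"
)


def positioned_text(stream):
    """Return (x, y, text) for every 'Tm ... Tj' pair, in drawing order."""
    return [(float(x), float(y), s) for x, y, s in TM_TJ.findall(stream)]


def rebuild_lines(fragments):
    """Guess at lines: group fragments by y (top of page first), sort by x."""
    rows = {}
    for x, y, text in fragments:
        rows.setdefault(y, []).append((x, text))
    return [" ".join(t for _, t in sorted(rows[y])) for y in sorted(rows, reverse=True)]
```

Here `rebuild_lines(positioned_text(STREAM))` returns `["1 Example Horse 5.20", "2"]`, but that "row" only exists because the sketch guessed that fragments sharing a y value belong together. Nothing in the file marks where one field ends and the next begins.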

Dave Schwartz
07-22-2009, 07:39 PM
Big Hit,

The creator of the PDF obviously does not want the handicapping information shared via a software program. If they did, they would put it into a format that is easily read. By putting it into a PDF the creator of said PDF actually has the ability to prevent most programs from reading the data.

As a programmer, there are some tools that I can purchase which will read them anyway. However, the cost for such a package is about $500 per user (or more). That is not MY cost; the license demands that EVERY user of MY software pays for a copy of THEIR software.

So, the bottom line is that programmers don't even consider PDF imports without the cooperation of the PDF's creator.


Regards,
Dave Schwartz

BIG HIT
07-23-2009, 08:34 AM
Great explanation. I never understood that. Thanks.

Tom Barrister
07-23-2009, 11:04 AM
What would stop the programmer from writing his/her own application to read/translate/convert the PDF file into usable form?

Dave Schwartz
07-23-2009, 12:30 PM
Tom,

That is precisely what these 3rd party vendors have done. It is a huge problem to get right... not just 100 or 200 hours but much more.

Personally, I have never attempted to read such a file, so I have no idea what level of difficulty it is. I just know that programmers do not typically write such things. They purchase tools.

Sometimes it is just easier (or more cost-effective) to purchase a solution.


Dave

dutchboy
07-23-2009, 07:20 PM
I use a program at work that will convert pdf files to Excel, Word, PowerPoint, and text. It is a simple program to use and learn, and the cost per computer is about $125.00. That is a one-time charge, and everyone with access to my computer can use it. We use it to convert pdf files to Excel worksheets so they can then be imported into a workflow system. It will convert a pdf that was created electronically or one that was created by scanning a document to pdf.

The problem is always the page layout of the pdf file. If everything is in tabular columns, it will convert error-free in a few minutes. If the data is stacked, as it is in a Brisnet PP pdf, you will have problems. They do have a 7-day free trial, which allows you to convert up to 3 pages an unlimited number of times within the seven-day timeframe, until you buy the license.
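A rough Python illustration of why layout matters, assuming the converter has already produced plain text. With tabular columns, every line splits into the same number of fields; with stacked data, the same rule gives ragged rows that still have to be stitched back together by hand. The sample text, the helper names, and the "two or more spaces means a new column" rule are assumptions for illustration, not taken from any real converter or Brisnet file.

```python
import re

# Made-up converter output, laid out in tabular columns.
TABULAR = """\
1   Example Horse   5.20
2   Another Horse   12.40
"""

# Made-up "stacked" output: one entry's fields spread over several lines.
STACKED = """\
1 Example Horse
   5.20   last race notes
2 Another Horse
   12.40
"""

COLUMN_GAP = re.compile(r"\s{2,}")  # assumption: columns separated by 2+ spaces


def split_rows(text):
    """Split each non-blank line into fields on runs of 2+ spaces."""
    return [COLUMN_GAP.split(line.strip()) for line in text.splitlines() if line.strip()]


def is_regular(rows):
    """True when every row has the same number of fields."""
    return len({len(r) for r in rows}) == 1
```

The tabular sample gives three fields per line, ready to drop into a spreadsheet; the stacked sample gives rows of one or two fields, and nothing in the text says which lines belong to which entry.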

twobet
08-02-2009, 12:53 PM
I just purchased Nuance PDF Converter 6 for $39.95. It converts PDF to Word, Excel, PowerPoint, etc.

Dave Schwartz
08-02-2009, 05:47 PM
Does it work with the PDFs you need to convert?

At least in the past, not all PDFs would convert.


Dave

dutchboy
08-02-2009, 06:55 PM
PDF files can be protected/encrypted by the creator so they cannot be converted. Users who think their pdf files are secure should know that you can buy software online that cracks those passwords in a few seconds. If you receive a protected pdf that cannot be converted, you can always print it, scan it to pdf yourself, and then convert that.

Some PDF conversion software vendors sell two versions. The less expensive versions usually will not convert a pdf that was created by scanning, but will convert pdf files that were created electronically. The more expensive versions usually will convert scanned pdfs as well, since they include OCR. If the majority of the data is not in well-defined tabular columns, you will have a lot of problems.

Good luck; it is not as simple as it looks.

CBedo
08-02-2009, 11:55 PM
If you have Microsoft Office, you already have a pretty good converter. Just print the pdf to the Microsoft Office Document Image Writer as a .tif file. Then open it with MODI (Microsoft Office Document Imaging), which, if installed, will be in the Office Tools folder. The last step is to have the OCR software do its stuff and send the output to Word.

It works better than you would think.

Dave Schwartz
08-03-2009, 12:08 AM
LOL - Chris - that is way too hard for me.

But it certainly would not work from a software development standpoint.


Dave

CBedo
08-03-2009, 12:17 AM
Actually, from a development standpoint, it's way easier! I learned about the manual procedure while checking out some Windows OLE programming techniques (in Ruby). It's pretty slick. If you are interested, I'll try to find where I picked it up. (You still have to convert the pdf to an image file, since the code was designed to handle images, but that shouldn't be hard to code either.)

I used to struggle with ways to convert the pdfs (pps and charts) to try to cut down on data acquisition costs, but now with unlimited chart plans from BRIS and others, and unlimited pp data (ProCaps data files) from TSN, it's not quite as big an issue.

Dave Schwartz
08-03-2009, 01:13 AM
I am not saying that I couldn't do it. I am saying that in a piece of commercial software it is not practical.

Warren Henry
08-03-2009, 02:19 AM
A lot of folks don't understand that adding capabilities like that to commercial software would push the cost of the software well beyond what anyone would be willing to pay.

Back when I did custom programming for businesses, one of my standard replies was:

"Yeah, I can do that. But it is a function of economics as to whether or not you really want me to do it."

CBedo
08-03-2009, 02:49 AM
I'm not a developer, so forgive my ignorance, but I'm interested in understanding what you mean, especially economically. From my standpoint, this methodology would only be impractical for two reasons (not counting that there is probably a more elegant way to do it): 1) there are better data sources than pdfs for not much more cost, if any (as I said in my earlier post), and 2) locking a program (and customer) into not only a specific platform but also requiring Microsoft Office on the customer's machine doesn't seem to make any sense.

Other than that, the coding itself would be easy. Read a pdf, process the pdf, parse the resulting text file to get the data you want.

What else am I missing, especially from a cost standpoint? As I said earlier, forgive my inexperience in this area.

Dave Schwartz
08-03-2009, 04:21 AM
Anytime one sells a product that demands other products be owned or purchased by the purchaser, it hurts sales.

So, imagine if a guy wants to buy my software and I tell him that to use it correctly he must also have a PDF reader and MS Office - together (say) a $400 purchase. To him my product just went up $400 in price but I did not get any of that revenue.

In the long run it costs the developer money. Note that in the corporate world this has little or no impact, but in the end-user world it is huge.


Dave

CBedo
08-03-2009, 11:55 AM
Thanks Dave, that makes total sense. When you asked about it, I definitely wasn't thinking of commercial development; I was thinking of "quick and dirty" personal use.

harntrox
08-05-2009, 11:54 AM
If someone doesn't want you reading their pdfs, they can throw invisible characters and format commands into the document that don't show up when viewed but make parsing impossible, because each field is a random length delimited by random (garbage) characters.

Once they don't want their PDF document parsed, they can easily start adding this in. So if your software depends on a specific PDF format, the document's owner can modify it regularly and guarantee your software will never work reliably for any length of time.

This is different from HTML, where you can always see the plain text, even if it's wrapped in JavaScript indirection. With PDFs, tokenized commands can be hidden in the code, referencing internal parameters that will never be exposed to developers.
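One real mechanism that fits what harntrox describes is the PDF text rendering mode: after `3 Tr`, text is neither filled nor stroked, so it never appears on screen but is still in the content stream. The Python toy below assumes a tiny uncompressed stream using only `Tr` and `Tj`; the sample stream and the `extract` function are made up. It shows a naive extractor picking up the hidden junk, while one that tracks the rendering mode skips it.

```python
import re

# Hypothetical content stream: "q7#" is painted in rendering mode 3 (invisible).
STREAM = "BT 0 Tr (5.20) Tj 3 Tr (q7#) Tj 0 Tr (Example Horse) Tj ET"

# Tokens of interest: "<n> Tr" (set rendering mode) and "(<string>) Tj" (show text).
TOKEN = re.compile(r"(\d+)\s+Tr|\(((?:[^()\\]|\\.)*)\)\s*Tj")


def extract(stream, skip_invisible):
    """Collect shown strings, optionally skipping text drawn after '3 Tr'."""
    mode, out = 0, []
    for m in TOKEN.finditer(stream):
        if m.group(1) is not None:  # an "n Tr" operator changes the mode
            mode = int(m.group(1))
        elif not (skip_invisible and mode == 3):
            out.append(m.group(2))
    return out
```

The catch is the one harntrox raises: the owner can switch to a different trick whenever they like, so handling one operator doesn't make a parser reliable.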

CBedo
08-05-2009, 12:09 PM
That's why I've been playing with the method I am using now. By turning the pdf into an image file first, you can avoid that issue.

Red Knave
08-05-2009, 05:59 PM
As part of the previously mentioned encryption, a PDF creator can also prevent printing as well as copy/paste. This makes it that much more difficult to turn it into an image.
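For what it's worth, those restrictions are stored as bit flags in the `/P` entry of the PDF's encryption dictionary. In the PDF specification's user access permissions table, bit 3 (counting from 1) allows printing, bit 4 modifying, and bit 5 copying or otherwise extracting text and graphics. The Python sketch below decodes a `/P` value; the function name and sample values are illustrative, and whether a flag is actually honored is up to the viewer software.

```python
# Bit positions (1-based) from the PDF spec's user access permissions table.
FLAGS = {
    3: "print",
    4: "modify",
    5: "copy/extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def permissions(p):
    """Map each known permission to True/False for a /P value.

    /P is stored as a signed 32-bit integer, so negative values are normal;
    masking to 32 bits lets the bits be tested the same way either way.
    """
    bits = p & 0xFFFFFFFF
    return {name: bool((bits >> (pos - 1)) & 1) for pos, name in FLAGS.items()}
```

For example, `-3904` clears every permission bit listed above, while `-3900` sets only bit 3 among them, so printing is allowed but copying is not.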

TrifectaMike
08-05-2009, 06:26 PM
If it's displayed on your screen, it can be captured.
Mike

CBedo
08-05-2009, 08:05 PM
I've never seen a pdf I couldn't print a hard copy of (that doesn't mean there aren't any). That is basically what I am doing, "hard copy printing" it, but instead of going to paper, it goes through an image writer. Instead of a hard copy on paper, I get a .tiff image file that I then run OCR software on. That's basically the original manual process I described. The only difference now is that instead of doing it manually step by step, I do it in about 10 lines of code.
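CBedo's version drives Office's image writer and MODI through OLE from Ruby. As a rough stand-in, the Python sketch below does the same render-then-OCR idea with two different, freely available command-line tools, Poppler's `pdftoppm` and Tesseract, and assumes both are installed and on the PATH. The function names are hypothetical, and the output is only as good as the OCR and the page layout allow.

```python
import subprocess
from pathlib import Path


def render_cmd(pdf, prefix, dpi=300):
    """pdftoppm command that writes one TIFF per page as <prefix>-<n>.tif."""
    return ["pdftoppm", "-r", str(dpi), "-tiff", str(pdf), str(prefix)]


def ocr_cmd(image):
    """tesseract command that writes <image without extension>.txt."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def pdf_to_text(pdf, workdir):
    """Render every page to TIFF, OCR each page, and return the combined text."""
    workdir = Path(workdir)
    prefix = workdir / Path(pdf).stem
    subprocess.run(render_cmd(pdf, prefix), check=True)
    pages = []
    for image in sorted(workdir.glob(prefix.name + "-*.tif")):
        subprocess.run(ocr_cmd(image), check=True)
        pages.append(image.with_suffix(".txt").read_text())
    return "\n".join(pages)
```

As far as I know, `pdftoppm` zero-pads page numbers for longer documents, which is what lets a plain sort keep the pages in order.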