PHP Read PDF (PHP Language forums, PHP Programming Forums)
|
|||
|
I had to do my first investigation regarding PDF files. Surprisingly, I found that the only functions in PHP were for creating PDF files.

The potential customer receives order forms from the corporate headquarters, and they are PDF forms. What we want to do is extract information from these forms and process the data into a database. To do this we need to read certain set fields. Nowhere did I find a function to read PDF files, let alone extract information from them.

My thought, in the absence of such a function, is to find a way to open the file, strip the formatting, and then work on the text stream. The key unknown for me is how to strip the formatting.

So, do I hear any suggestions for either of these?
(1) How to read predetermined field entries from a PDF file, or
(2) How to convert a PDF into an unformatted text stream

Shelly
|
|||
|
Any suggestions?
|
|||
|
"Shelly" <sheldonlg.news@asap-consult.com> wrote in
news:13f2ro925ga7teb@corp.supernews.com:

> Any suggestions?
> [original question snipped]

Yikes, found this expensive option via the folks at PDFlib:

http://www.pdflib.com/products/tet/

... also found a link that suggests PDF files are just gzipped XML, so maybe you could write your own extractor:

http://www.thescripts.com/forum/thread631837.html
|
|||
|
"Good Man" <heyho@letsgo.com> wrote in message news:Xns99B0A095D3484sonicyouth@216.196.97.131...

> yikes, found this expensive option via the folks at pdflib:
>
> http://www.pdflib.com/products/tet/

Yikes is an understatement.

> ... also found a link that suggests PDF files are just gzipped XML, so
> maybe you could write your own extractor:
>
> http://www.thescripts.com/forum/thread631837.html

Hmm.
|
|||
|
> "Good Man" <heyho@letsgo.com> wrote in message
>> ... also found a link that suggests PDF files are just gzipped XML, so
>> maybe you could write your own extractor:
>>
>> http://www.thescripts.com/forum/thread631837.html
>
> hmm.

I tried a very simple test with a very small PDF file. The code is:

    <?php
    $pdfFile = "./images/Postcard.pdf";
    echo $pdfFile . "<br>";
    $fp = gzopen($pdfFile, "r");
    $rawStream = gzread($fp, 5000000);
    gzclose($fp);
    echo "**" . $rawStream . "**<br>";
    $stream = gzuncompress($rawStream);
    echo $stream;
    ?>

It came up with a "data error" on the line

    $stream = gzuncompress($rawStream);

so the error was in the gzuncompress call.
|
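For what it's worth, a quick way to see why the test above fails is to look at the first few bytes of the file. gzopen() will read a file that is not gzip-compressed and simply hand back the raw bytes, so $rawStream is the PDF exactly as it sits on disk, and gzuncompress() then rejects it because the data does not begin with a zlib header. A minimal sketch, assuming a made-up helper name sniffFileType() (it is not a PHP built-in):

```php
<?php
// Hypothetical helper (not part of PHP): classify data by its leading
// "magic" bytes. A PDF starts with "%PDF-", a gzip file with 0x1f 0x8b.
function sniffFileType(string $bytes): string
{
    if (strncmp($bytes, "%PDF-", 5) === 0) {
        return 'pdf';
    }
    if (strncmp($bytes, "\x1f\x8b", 2) === 0) {
        return 'gzip';
    }
    return 'unknown';
}

// Usage, with the path from the post above (only the first 8 bytes are read):
// echo sniffFileType(file_get_contents('./images/Postcard.pdf', false, null, 0, 8));
```

If this reports 'pdf' rather than 'gzip', the file as a whole was never gzip-compressed, and running gzuncompress() over all of it cannot work.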
|||
|
On 19.09.2007 22:41 Shelly wrote:
> I tried a very simple test with a very small PDF file. The code is:
> [code snipped]
>
> It came up with a "data error" in the line with
> $stream = gzuncompress($rawStream);
> The error was in the gzuncompress.

Some parts of a PDF are compressed using the zip algorithm, but a PDF itself is not a ZIP file. You cannot read it with the gz functions.

--
gosha bine

makrell ~ http://www.tagarga.com/blok/makrell
php done right ;) http://code.google.com/p/pihipi
|
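To make the point above concrete: the compressed parts are individual stream objects inside the file, each marked in its dictionary with /FlateDecode, and those can be inflated one at a time. What follows is only a rough sketch under several assumptions: inflatePdfStreams() is a made-up name, the file is treated as a plain byte string split on "endobj", and anything beyond a single plain /FlateDecode filter (encryption, filter chains, predictors) is simply skipped.

```php
<?php
// Sketch only: hypothetical helper, not part of PHP or of any PDF library.
// Splits raw PDF bytes into objects, picks the stream objects whose
// dictionary names /FlateDecode, and inflates their data with gzuncompress().
function inflatePdfStreams(string $pdfBytes): array
{
    $inflated = [];
    foreach (explode('endobj', $pdfBytes) as $chunk) {
        // A stream object looks like: << dictionary >> stream EOL data EOL endstream
        if (!preg_match('/>>\s*stream(\r\n|\n)/', $chunk, $m, PREG_OFFSET_CAPTURE)) {
            continue; // this object has no stream
        }
        $dict = substr($chunk, 0, $m[0][1]);
        if (strpos($dict, '/FlateDecode') === false) {
            continue; // uncompressed, or a filter this sketch does not handle
        }
        $start = $m[0][1] + strlen($m[0][0]);
        $end = strrpos($chunk, 'endstream');
        if ($end === false || $end < $start) {
            continue; // malformed object, skip it
        }
        $data = substr($chunk, $start, $end - $start);
        // The EOL before "endstream" is not part of the data, so try the bytes
        // as found, then with one or two trailing bytes removed.
        foreach ([0, 1, 2] as $trim) {
            $candidate = $trim ? substr($data, 0, -$trim) : $data;
            $result = @gzuncompress($candidate);
            if ($result !== false) {
                $inflated[] = $result;
                break;
            }
        }
    }
    return $inflated;
}

// Usage, with the path from earlier in the thread:
// foreach (inflatePdfStreams(file_get_contents('./images/Postcard.pdf')) as $s) {
//     echo $s, "\n";
// }
```

Note that what comes out is still not plain text. Page content streams hold PDF drawing operators, with the visible text tucked into string operands of operators such as Tj and TJ, so a further parsing step is needed. Values of fillable form fields usually sit in the field dictionaries (the /T and /V entries) rather than in page content, so this sketch alone will probably not answer question (1).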
|||
|
Shelly wrote:
> "Good Man" <heyho@letsgo.com> wrote in message
> news:Xns99B0A095D3484sonicyouth@216.196.97.131...
>> ... also found a link that suggests PDF files are just gzipped XML, so
>> maybe you could write your own extractor:
>>
>> http://www.thescripts.com/forum/thread631837.html
>
> hmm.

No, that is incorrect.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
|
|||
|
Well, you do have an option for reading PDF files from PHP.

First, you will need to install PDFlib, written by Thomas Merz. Then, using the statements below:

    <?php
    $pdf = PDF_new();
    PDF_open_file($pdf);

you can read the contents of the PDF file:

"Where $var represents the variable to store the PDF object reference (to be used in the next function in place of <pdf object>) and [filename] represents an optional parameter specifying a already existing PDF file to open. If no filename is specified, then a new PDF document is created."

For more reference, please visit: http://www.zend.com/zend/spotlight/creatingpdfmay1.php

On Sep 19, 11:48 pm, "Shelly" <sheldonlg.n...@asap-consult.com> wrote:
> Any suggestions?
> [original question snipped]