PHP Read PDF

#1
09-19-2007
Shelly

PHP Read PDF

I had to do my first investigation into PDF files. Surprisingly, I
found that the only PDF functions in PHP are for creating PDF files.

The potential customer receives order forms from its corporate headquarters,
and they are PDF forms. We want to extract information from these forms
and load the data into a database, which means reading certain predefined
fields. Nowhere did I find a function that can read PDF files, let alone
extract information from them.

In the absence of such a function, my idea was to open the file, strip
the formatting, and then work on the text stream. The key unknown for me
is how to strip the formatting.
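On the "strip the formatting" idea: the visible text of a PDF page is stored in the page's content stream as string operands of the text-showing operators Tj and TJ, so stripping the formatting roughly means collecting those operands. Below is a minimal PHP sketch, not a complete extractor. It assumes the content stream has already been decompressed (most are Flate-compressed) and that the fonts use a simple single-byte encoding. The function name extractShownStrings is hypothetical, and hex strings, nested parentheses, octal escapes and text positioning are deliberately ignored.

```php
<?php
/**
 * Collect the strings painted by the Tj and TJ operators in ONE
 * already-decompressed PDF content stream (hypothetical helper, sketch only).
 * Assumes simple single-byte fonts; skips hex strings <...>, nested
 * parentheses, octal escapes and all text positioning.
 */
function extractShownStrings(string $content): array
{
    // A literal string: "(" ... ")" where a backslash escapes the next byte.
    $lit = '\((?:\\\\.|[^\\\\)])*\)';
    // Either "(...) Tj" or "[ (...) 120 (...) ] TJ".
    $pattern = '/(' . $lit . ')\s*Tj|\[((?:' . $lit . '|[^\]])*)\]\s*TJ/s';

    $out = [];
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if ($m[1] !== '') {
            $strings = [$m[1]];                       // a single Tj string
        } else {
            preg_match_all('/' . $lit . '/s', $m[2], $inner);
            $strings = $inner[0];                     // the strings inside a TJ array
        }
        $text = '';
        foreach ($strings as $s) {
            // Drop the outer parentheses and undo the common escapes.
            $text .= strtr(substr($s, 1, -1), [
                '\\(' => '(', '\\)' => ')', '\\\\' => '\\', '\\n' => "\n",
            ]);
        }
        $out[] = $text;
    }
    return $out;
}
```

Each Tj call or TJ array becomes one entry; the kerning numbers inside a TJ array are dropped. Text from fonts with custom or multi-byte encodings will not come out readable this way, which is where a dedicated extraction tool earns its keep.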

So, does anyone have suggestions for either of these?
(1) How to read predetermined field entries from a PDF file, or
(2) How to convert a PDF into an unformatted text stream

Shelly
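For question (1), one route is an external tool rather than a PHP function. The sketch below assumes the free pdftk command-line tool (not mentioned in this thread) is installed: running "pdftk form.pdf dump_data_fields" prints every form field as a block of "Key: value" lines such as FieldName and FieldValue, with blocks separated by "---" lines. The PHP function parseDumpDataFields is a hypothetical helper that turns that text into a FieldName => FieldValue array; fields without a value map to an empty string. This only applies if the order forms are real fillable (AcroForm) forms.

```php
<?php
/**
 * Turn the text printed by "pdftk form.pdf dump_data_fields" into an
 * array of FieldName => FieldValue (hypothetical helper, sketch only).
 */
function parseDumpDataFields(string $dump): array
{
    $fields = [];
    $record = [];
    // A trailing separator makes sure the last record is flushed too.
    foreach (preg_split('/\R/', $dump . "\n---") as $line) {
        if (trim($line) === '---') {
            // End of a field block: keep it if it named a field.
            if (isset($record['FieldName'])) {
                $fields[$record['FieldName']] = $record['FieldValue'] ?? '';
            }
            $record = [];
            continue;
        }
        $pos = strpos($line, ': ');
        if ($pos === false) {
            continue;                       // blank or unexpected line
        }
        $key = substr($line, 0, $pos);
        // Keys such as FieldStateOption can repeat; keep the first one.
        if (!isset($record[$key])) {
            $record[$key] = substr($line, $pos + 2);
        }
    }
    return $fields;
}
```

Usage, assuming pdftk is on the PATH: exec('pdftk ' . escapeshellarg($pdfFile) . ' dump_data_fields', $out); then $fields = parseDumpDataFields(implode("\n", $out)); gives one array entry per form field, ready to be mapped onto database columns.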


#2
09-19-2007
Shelly

Re: PHP Read PDF

Any suggestions?



#3
09-19-2007
Good Man

Re: PHP Read PDF

"Shelly" <sheldonlg.news@asap-consult.com> wrote in
news:13f2ro925ga7teb@corp.supernews.com:

> Any suggestions?


Yikes, I found this expensive option from the folks at PDFlib:

http://www.pdflib.com/products/tet/


... I also found a link suggesting that PDF files are just gzipped XML, so
maybe you could write your own extractor:

http://www.thescripts.com/forum/thread631837.html


#4
09-19-2007
Shelly

Re: PHP Read PDF


"Good Man" <heyho@letsgo.com> wrote in message
news:Xns99B0A095D3484sonicyouth@216.196.97.131...
>
> yikes, found this expensive option via the folks at pdflib:
>
> http://www.pdflib.com/products/tet/


"Yikes" is an understatement.

>
>
> ... also found a link that suggests PDF files are just gzipped XML, so
> maybe you could write your own extractor:
>
> http://www.thescripts.com/forum/thread631837.html


hmm.


#5
09-19-2007
Shelly

Re: PHP Read PDF

> "Good Man" <heyho@letsgo.com> wrote in message
>> ... also found a link that suggests PDF files are just gzipped XML, so
>> maybe you could write your own extractor:
>>
>> http://www.thescripts.com/forum/thread631837.html

>
> hmm.


I tried a very simple test with a very small PDF file. The code is:

<?php
$pdfFile = "./images/Postcard.pdf";
echo $pdfFile . "<br>";
$fp = gzopen($pdfFile, "r");
$rawStream = gzread($fp, 5000000);
gzclose($fp);
echo "**" .$rawStream . "**<br>";
$stream = gzuncompress($rawStream);
echo $stream;
?>


It failed with a "data error" on the line
$stream = gzuncompress($rawStream);
so the error comes from gzuncompress().


#6
09-20-2007
gosha bine

Re: PHP Read PDF

On 19.09.2007 22:41 Shelly wrote:
> It came up with a "data error" in the line with
> $stream = gzuncompress($rawStream);
> The error was in the gzuncompress.
>
>


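Some parts of a PDF (the streams marked /FlateDecode) are compressed with
the deflate algorithm that zip also uses, but the PDF file itself is not a
gzip or ZIP file. You cannot decompress the whole file with the gz functions.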
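To illustrate the point: the individual streams marked /FlateDecode hold zlib data, which is the format gzuncompress() expects, so it succeeds on those streams even though it fails on the whole file. A rough PHP sketch, assuming an unencrypted PDF whose streams sit in ordinary "N G obj ... endobj" objects. The function name inflatePdfStreams is hypothetical, and /DecodeParms predictors, other filters and the /Length entry are ignored (it relies on the end-of-line before "endstream" instead, so a stream whose data happens to end in a CR or LF byte may be cut short).

```php
<?php
/**
 * Find every stream whose dictionary names the FlateDecode filter and
 * inflate it with gzuncompress(), which accepts the zlib (RFC 1950)
 * format that FlateDecode uses. Sketch only: unencrypted PDFs, no
 * predictors, no other filters, and /Length is not consulted.
 */
function inflatePdfStreams(string $pdf): array
{
    // "obj" <dictionary> "stream" EOL <data> [EOL] "endstream", never crossing an endobj.
    $pattern = '/\bobj\b((?:(?!endobj).)*?)\bstream\r?\n(.*?)(?:\r\n|\r|\n)?endstream/s';
    $decoded = [];
    if (preg_match_all($pattern, $pdf, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            if (strpos($m[1], '/FlateDecode') === false) {
                continue;                    // not Flate-compressed, leave it alone
            }
            $data = @gzuncompress($m[2]);    // false on corrupt or unsupported data
            if ($data !== false) {
                $decoded[] = $data;
            }
        }
    }
    return $decoded;
}
```

Applied to Shelly's test file, inflatePdfStreams(file_get_contents('./images/Postcard.pdf')) would return the decompressed page contents (and any other Flate streams) as strings; the page text inside them still needs a further step to pick out the strings shown by the text operators.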


--
gosha bine

makrell ~ http://www.tagarga.com/blok/makrell
php done right ;) http://code.google.com/p/pihipi

#7
09-21-2007
Jerry Stuckle

Re: PHP Read PDF

Shelly wrote:
> "Good Man" <heyho@letsgo.com> wrote in message
> news:Xns99B0A095D3484sonicyouth@216.196.97.131...
>> ... also found a link that suggests PDF files are just gzipped XML, so
>> maybe you could write your own extractor:
>>
>> http://www.thescripts.com/forum/thread631837.html

>
> hmm.
>
>


No, that is incorrect. PDF files are not gzipped XML.
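A quick way to check what a file really is, before trying gz or XML functions on it, is to look at its first bytes. This PHP sketch (sniffFormat is a hypothetical helper) compares them against the usual signatures: a PDF starts with the header "%PDF-", gzip data with the bytes 0x1F 0x8B, and an XML document typically with an "<?xml" declaration. Some PDF readers tolerate junk before the header; this sketch does not.

```php
<?php
/**
 * Classify data by its leading "magic" bytes (hypothetical helper).
 * Only the three formats discussed in this thread are recognised.
 */
function sniffFormat(string $bytes): string
{
    if (strncmp($bytes, '%PDF-', 5) === 0) {
        return 'pdf';                 // PDF header, e.g. "%PDF-1.4"
    }
    if (strncmp($bytes, "\x1f\x8b", 2) === 0) {
        return 'gzip';                // gzip member header
    }
    if (strncmp(ltrim($bytes), '<?xml', 5) === 0) {
        return 'xml';                 // XML declaration
    }
    return 'unknown';
}
```

For example, sniffFormat(file_get_contents('./images/Postcard.pdf', false, null, 0, 8)) reads only the first 8 bytes and should report 'pdf' for an ordinary PDF, never 'gzip'.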

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

#8
09-21-2007
dshesnicky@yahoo.com

Re: PHP Read PDF


> I had to do my first investigation regarding PDF files. Surprisingly, I
> found that the only functions in PHP were for creating PDF files.


How about pdf2text? Google it if you're interested.
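Don's pdf2text is not shown here; as a similar, swapped-in option, this PHP sketch assumes the pdftotext command-line program from Xpdf or Poppler is installed and on the PATH. The "-layout" option asks it to keep the physical layout of the page, and "-" as the output file sends the text to standard output. The helper names buildPdftotextCommand and pdfToText are hypothetical.

```php
<?php
/**
 * Build the shell command for pdftotext; escapeshellarg() protects
 * file names containing spaces or shell metacharacters.
 */
function buildPdftotextCommand(string $pdfFile): string
{
    return 'pdftotext -layout ' . escapeshellarg($pdfFile) . ' -';
}

/**
 * Convert a PDF to plain text via pdftotext (assumed to be installed).
 * Returns null if the file is missing or the command fails.
 */
function pdfToText(string $pdfFile): ?string
{
    if (!is_file($pdfFile)) {
        return null;                  // nothing to convert
    }
    $lines = [];
    exec(buildPdftotextCommand($pdfFile), $lines, $status);
    return $status === 0 ? implode("\n", $lines) : null;
}
```

For example, pdfToText('./images/Postcard.pdf') returns the page text as one string, which answers question (2) without having to decode the PDF structure in PHP.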

Don

#9
10-15-2007
atulkapoor@gmail.com

Re: PHP Read PDF

Well, you do have an option for reading PDF files from PHP.

First, you will need PDFlib, written by Thomas Merz, to be installed.

Then, using the statements below:
<?php

$pdf = PDF_new();
PDF_open_file($pdf);

you can read the contents of the PDF file.

"Where $var represents the variable to store the PDF object reference
(to be used in the next function in place of <pdf object>) and
[filename] represents an optional parameter specifying an already
existing PDF file to open. If no filename is specified, then a new PDF
document is created."

For more information, please see:
http://www.zend.com/zend/spotlight/creatingpdfmay1.php