Large XML manipulation within PHP

#1
04-23-2008
Steve Gula
Large XML manipulation within PHP

I work for a company that has chosen to use XML (Software AG Tamino XML
database) as its storage system for an enterprise application. We need to
make a system wide change to information within the database that isn't
feasible to do through our application's user interface. My solution was to
unload the XML collection in question, open it, manipulate it, then write it
back out. Problem is it's a 230+MB file and even with PHP's max mem set to
4096MB (of 8GB available to the system) SimpleXML claims to still run out of
memory. Can anyone recommend a better way for handling a large amount of XML
data? Thanks.

--
--Steve Gula

(this email address is used for list communications only, direct contact at
this email address is not guaranteed to be read)

#2
04-23-2008
Bastien Koert
Re: [PHP] Large XML manipulation within PHP

On 4/23/08, Steve Gula <sg-lists@stevegula.net> wrote:
>
> I work for a company that has chosen to use XML (Software AG Tamino XML
> database) as its storage system for an enterprise application. We need to
> make a system wide change to information within the database that isn't
> feasible to do through our application's user interface. My solution was
> to
> unload the XML collection in question, open it, manipulate it, then write
> it
> back out. Problem is it's a 230+MB file and even with PHP's max mem set to
> 4096MB (of 8GB available to the system) SimpleXML claims to still run out
> of
> memory. Can anyone recommend a better way for handling a large amount of
> XML
> data? Thanks.
>
> --
> --Steve Gula
>
> (this email address is used for list communications only, direct contact
> at
> this email address is not guaranteed to be read)
>


Can you chunk the data in any way, and break it into smaller, more manageable
pieces?

--

Bastien

Cat, the other other white meat

#3
04-23-2008
Steve Gula
Re: [PHP] Large XML manipulation within PHP

I could, but it would make things very difficult. Some of the entities around
ID #100 could be affected by entities around ID #11000, so more than one chunk
would need to be manipulated at the same time. Unfortunately, I don't think
this is a top-to-bottom change for the information at hand.

#4
04-23-2008
Stut
Re: [PHP] Large XML manipulation within PHP

On 23 Apr 2008, at 21:41, Steve Gula wrote:

> I could but it would make things very difficult. Some of the entities around
> id # 100 could be affected by entities around id #11000 and would result in
> a file needing to be manipulated at the same time. Unfortunately, I don't
> think this is a top to bottom change for the information at hand.


Can you not do it with a text processor like sed? That would be a lot
easier than trying to do it with SimpleXML.
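If the change can be written as a pattern on the raw text, the sed idea can also be done from PHP without any XML parser, by streaming the file line by line the way sed does. A minimal sketch, assuming the edit fits within a single line and needs no knowledge of the surrounding XML structure; the function name `streamReplace` and the `<status>` element in the usage comment are hypothetical.

```php
<?php
// Minimal sed-like filter: copy $in to $out line by line, applying one
// regex substitution per line. Only one line is held in memory at a time.
// Caveat: a regex cannot see XML structure, so this is only safe for
// changes that fit on a single line and do not depend on context.
function streamReplace(string $in, string $out, string $pattern, string $replacement): int
{
    $src = fopen($in, 'rb');
    $dst = fopen($out, 'wb');
    if ($src === false || $dst === false) {
        throw new RuntimeException("cannot open $in or $out");
    }
    $changed = 0;
    while (($line = fgets($src)) !== false) {
        // $count receives the number of replacements made on this line
        $new = preg_replace($pattern, $replacement, $line, -1, $count);
        $changed += $count;
        fwrite($dst, $new);
    }
    fclose($src);
    fclose($dst);
    return $changed; // total number of substitutions made
}

// Usage (hypothetical file and element names):
// streamReplace('export.xml', 'export-new.xml',
//     '#<status>old</status>#', '<status>new</status>');
```

Memory use stays flat no matter how large the file is; the trade-off is that the pattern is blind to nesting, so changes where one entity depends on another elsewhere in the file still need a parser.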

-Stut

--
http://stut.net/

#5
04-23-2008
@4u
Re: [PHP] Large XML manipulation within PHP

Hi,

How about expat with custom XML handlers? That should work even with a 32 MB
memory limit. It will just take some time...
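PHP's `xml_parser_*` functions expose this expat-style event API: the parser calls your handlers for each start tag, end tag and text node while you feed it the file in pieces, so memory use depends on the buffer size rather than on the document size. A minimal sketch, assuming the `<feed>`/`<item id=...>` layout from the mockup later in this thread; `collectItemIds` is a hypothetical name, and a real tool would do its actual work inside the handlers instead of only collecting ids.

```php
<?php
// Event-based (expat-style) parsing with PHP's xml_parser_* functions.
// The file is fed to the parser in $bufferSize pieces and the handlers
// fire as tags are seen, so the document is never built in memory.
// This sketch only collects the id attribute of each <item> element.
function collectItemIds(string $path, int $bufferSize = 8192): array
{
    $ids = [];
    $parser = xml_parser_create();
    // report tag names as written (the default upper-cases them)
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$ids) {
            if ($name === 'item' && isset($attrs['id'])) {
                $ids[] = $attrs['id'];
            }
        },
        function ($parser, string $name) {
            // closing tags are not needed for this sketch
        }
    );

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    while (!feof($fh)) {
        $data = fread($fh, $bufferSize);
        // the third argument tells the parser whether this is the last piece
        if (!xml_parse($parser, $data, feof($fh))) {
            $error = sprintf('XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser));
            fclose($fh);
            throw new RuntimeException($error);
        }
    }
    fclose($fh);
    return $ids;
}
```

Tags that straddle two buffers are handled by the parser itself, so the buffer size only trades speed against memory.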

Have fun

#6
04-23-2008
Bojan Tesanovic
Re: [PHP] Large XML manipulation within PHP

In that case you may want to try XMLReader, as it doesn't load the whole XML
document into memory.
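A minimal sketch of that approach, assuming the `<item>` elements from the mockup further down: `XMLReader` walks the file one node at a time, and each `<item>` is turned into a small `SimpleXMLElement` with `readOuterXml()`, so only one item is in memory at once. The function name `eachItem` and its callback interface are hypothetical.

```php
<?php
// Stream a large file with XMLReader, handing each <item> element to
// $callback as its own small SimpleXMLElement. Only the current item is
// held in memory. Returns the number of items visited.
function eachItem(string $path, callable $callback, string $itemName = 'item'): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("cannot open $path");
    }

    // advance to the first <item> start tag
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === $itemName) {
            $found = true;
            break;
        }
    }

    $count = 0;
    while ($found) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // copy just this element (and its children) into SimpleXML
            $callback(new SimpleXMLElement($reader->readOuterXml()));
            $count++;
        }
        // jump over the item's subtree to the next sibling <item>
        $found = $reader->next($itemName);
    }

    $reader->close();
    return $count;
}

// Usage (hypothetical file name):
// eachItem('export.xml', function (SimpleXMLElement $item) {
//     echo $item['id'], "\n";
// });
```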

If that doesn't help, then you will need to write a custom parser application
for your needs: use XMLReader to read through the whole XML file, split it into
chunks of, e.g., 5000 items each, and store those chunks on disk.

Then use SimpleXML to read and manipulate those chunks and save them
back to disk.

It would help if you could provide an XML mockup, e.g.:
<feed>
<item id='1'>
.......
</item>
<item id='2'>
.......
</item>
<item id='3'>
.......
</item>
.....
<item id='278172'>
</item>
</feed>

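<?php

// Split the large file into chunk files xml-1.xml, xml-2.xml, etc.
// makeChunksWithXmlReader() is not shown here and still has to be written
// (see the note below). Set $pathToLargeXmlFile to your exported file.
makeChunksWithXmlReader($pathToLargeXmlFile, CustomXmlManipulator::$SPLITAT);


class CustomXmlManipulator
{
    // number of <item> elements stored in each chunk file
    public static $SPLITAT = 5000;

    // load the chunk file that contains item $id
    public function getXmlChunk($id)
    {
        return simplexml_load_file($this->getXmlFile($id));
    }

    // write a (modified) chunk back to the file that holds item $id
    public function storeXml($id, SimpleXMLElement $simpleXmlObject)
    {
        file_put_contents($this->getXmlFile($id), $simpleXmlObject->asXML());
        // to free memory, unset() the object in the caller: nulling this
        // parameter would only drop the local reference
    }

    // map an item id to its chunk file, assuming ids are numbered 1, 2, 3, ...
    // in document order: ids 1-5000 are in xml-1.xml, 5001-10000 in xml-2.xml
    public function getXmlFile($id)
    {
        $chunk = (int) (($id - 1) / self::$SPLITAT) + 1;
        return 'xml-' . $chunk . '.xml';
    }
}


$XMLM = new CustomXmlManipulator();
$first = $XMLM->getXmlChunk(1);

foreach ($first as $x) {
    // ... inspect $x ...
    $needsChange = false; // placeholder: replace with your own test on $x
    if ($needsChange) {
        // here you need to manipulate ID 23493, which lives in another chunk
        $tmpX = $XMLM->getXmlChunk(23493);
        // ... change $tmpX here ...
        $XMLM->storeXml(23493, $tmpX);
        unset($tmpX); // release that chunk before loading the next one
    }
}

?>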


This is just the basic logic; it can be extended further depending
on your needs. The function makeChunksWithXmlReader() still has to be
written: it needs to go through the XML file and write the chunks to disk.
More on XMLReader: http://www.php.net/manual/en/class.xmlreader.php
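One possible implementation of `makeChunksWithXmlReader()`, sketched under the assumptions of the mockup above (a `<feed>` root holding `<item>` elements, ids numbered from 1 in document order): it streams the source with `XMLReader` and copies every `$splitAt` items into `xml-1.xml`, `xml-2.xml`, and so on, the file names given in the comment at the top of the code. The extra `$itemName`, `$rootName` and `$outDir` parameters are additions for this sketch, and attributes or namespace declarations on the original root element are not carried over into the chunks.

```php
<?php
// Sketch of makeChunksWithXmlReader(): stream $path with XMLReader and
// copy every $splitAt <item> elements into $outDir/xml-1.xml,
// $outDir/xml-2.xml, ... Each chunk is a complete document with its own
// <feed> root. Returns the number of chunk files written.
function makeChunksWithXmlReader(string $path, int $splitAt, string $itemName = 'item',
                                 string $rootName = 'feed', string $outDir = '.'): int
{
    if ($splitAt < 1) {
        throw new InvalidArgumentException('$splitAt must be at least 1');
    }
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("cannot open $path");
    }

    $chunk = 0;    // number of the chunk file currently open
    $inChunk = 0;  // items written to that file so far
    $out = null;   // handle of the chunk file currently open

    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== $itemName) {
            $more = $reader->read();   // not an item: keep walking
            continue;
        }
        if ($out === null || $inChunk === $splitAt) {
            if ($out !== null) {       // close the full chunk
                fwrite($out, "</$rootName>\n");
                fclose($out);
            }
            $chunk++;
            $inChunk = 0;
            $out = fopen("$outDir/xml-$chunk.xml", 'wb');
            if ($out === false) {
                throw new RuntimeException("cannot write chunk $chunk");
            }
            fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$rootName>\n");
        }
        fwrite($out, $reader->readOuterXml() . "\n");
        $inChunk++;
        $more = $reader->next();       // skip this item's subtree
    }

    $reader->close();
    if ($out !== null) {
        fwrite($out, "</$rootName>\n");
        fclose($out);
    }
    return $chunk;
}
```

With the defaults, `makeChunksWithXmlReader($pathToLargeXmlFile, 5000)` writes the chunks into the current directory, so chunk k holds items (k-1)*5000+1 through k*5000 by position.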





Bojan Tesanovic
http://www.carster.us/

#7
04-24-2008
Per Jessen
Re: [PHP] Large XML manipulation within PHP

Steve Gula wrote:

> I work for a company that has chosen to use XML (Software AG Tamino
> XML database) as its storage system for an enterprise application. We
> need to make a system wide change to information within the database
> that isn't feasible to do through our application's user interface. My
> solution was to unload the XML collection in question, open it,
> manipulate it, then write it back out. Problem is it's a 230+MB file
> and even with PHP's max mem set to 4096MB (of 8GB available to the
> system) SimpleXML claims to still run out of memory. Can anyone
> recommend a better way for handling a large amount of XML data?


xalan.


/Per Jessen, Zürich
