Large XML manipulation within PHP (PHP General forum, PHP Programming Forums)
I work for a company that has chosen to use XML (the Software AG Tamino XML database) as its storage system for an enterprise application. We need to make a system-wide change to information within the database that isn't feasible to do through our application's user interface. My solution was to unload the XML collection in question, open it, manipulate it, then write it back out. The problem is that it's a 230+ MB file, and even with PHP's memory limit set to 4096 MB (of 8 GB available to the system), SimpleXML still reports running out of memory. Can anyone recommend a better way of handling a large amount of XML data?

Thanks.

--
Steve Gula
(this email address is used for list communications only; direct contact at this email address is not guaranteed to be read)
On 4/23/08, Steve Gula <sg-lists@stevegula.net> wrote:
> ... The problem is that it's a 230+ MB file, and even with PHP's memory limit set to 4096 MB (of 8 GB available to the system), SimpleXML still reports running out of memory. Can anyone recommend a better way of handling a large amount of XML data?

Can you chunk the data in any way, i.e. break it into smaller, more manageable pieces?

--
Bastien

Cat, the other other white meat
I could, but it would make things very difficult. Some of the entities around ID #100 could be affected by entities around ID #11000, which would mean manipulating more than one file at the same time. Unfortunately, I don't think this is a top-to-bottom change for the information at hand.

On Wed, Apr 23, 2008 at 4:36 PM, Bastien Koert <phpster@gmail.com> wrote:
> Can you chunk the data in any way, i.e. break it into smaller, more manageable pieces?

--
Steve Gula
On 23 Apr 2008, at 21:41, Steve Gula wrote:
> I could, but it would make things very difficult. Some of the entities around ID #100 could be affected by entities around ID #11000 ...

Can you not do it with a text processor like sed? That would be a lot easier than trying to do it with SimpleXML.

-Stut

--
http://stut.net/
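Stut's sed suggestion works because a stream editor only ever holds one line in memory. A minimal PHP sketch of the same line-at-a-time idea is below. `streamReplace()` and its parameters are hypothetical names, and the approach only helps if each change can be written as a regex that never spans lines and does not depend on other records, which may not hold for Steve's cross-record change.

```php
<?php
/**
 * Hypothetical helper: stream-edit a large file one line at a time, the way
 * sed does, so memory use stays roughly constant regardless of file size.
 * Returns the total number of substitutions made.
 */
function streamReplace(string $inPath, string $outPath, string $pattern, string $replacement): int
{
    $in  = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException("Cannot open $inPath or $outPath");
    }

    $total = 0;
    // fgets() without a length argument reads up to and including the next newline.
    while (($line = fgets($in)) !== false) {
        // The 5th argument receives the number of replacements made on this line.
        $line = preg_replace($pattern, $replacement, $line, -1, $count);
        $total += $count;
        fwrite($out, $line);
    }

    fclose($in);
    fclose($out);
    return $total;
}
```

For example, `streamReplace('collection.xml', 'collection-new.xml', '/status="legacy"/', 'status="archived"')` (file names and attribute are hypothetical) writes a new file and returns the number of substitutions. Writing to a separate file keeps the original intact in case the pattern turns out to be wrong.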
Hi,

How about expat with custom XML handlers? It should work even with a 32 MB memory limit. It will just take some time...

Have fun

Bastien Koert wrote:
> Can you chunk the data in any way, i.e. break it into smaller, more manageable pieces?
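PHP's event-based XML functions (`xml_parser_create()`, `xml_set_element_handler()`, `xml_parse()`) expose the expat-style handler model suggested here: the file is fed to the parser in blocks and a callback fires for each tag, so memory stays small however large the file is. A minimal sketch, assuming records are `<item id="...">` elements (a hypothetical layout); `collectItemIds()` is a hypothetical helper that only gathers IDs.

```php
<?php
/**
 * Hypothetical helper: collect the id attribute of every <item> element by
 * feeding the file to an event-based parser in fixed-size blocks.
 */
function collectItemIds(string $path, string $itemTag = 'item', int $blockSize = 65536): array
{
    $ids = [];
    $parser = xml_parser_create('UTF-8');
    // Case folding is on by default and would upper-case element and attribute names.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // Start-tag handler: record the id of each matching element.
        function ($parser, string $name, array $attrs) use (&$ids, $itemTag) {
            if ($name === $itemTag && isset($attrs['id'])) {
                $ids[] = $attrs['id'];
            }
        },
        // End-tag handler: nothing to do in this sketch.
        function ($parser, string $name) {
        }
    );

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (!feof($fh)) {
        $data = fread($fh, $blockSize);
        // The third argument tells the parser whether this is the last block.
        if (!xml_parse($parser, (string)$data, feof($fh))) {
            $msg = sprintf('XML error at line %d: %s',
                xml_get_current_line_number($parser),
                xml_error_string(xml_get_error_code($parser)));
            fclose($fh);
            throw new RuntimeException($msg);
        }
    }
    fclose($fh);
    return $ids;
}
```

A pass like this could build an index of whatever data the change depends on (here only the IDs), so that a second streaming pass can apply changes that reach across distant records.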
In that case you may want to try XMLReader, as it doesn't load all of the XML into memory. If that doesn't help, then you will need to write a custom parser application for your needs: use XMLReader to read through the whole XML file, split it into chunks of, e.g., 5000 items each, and store those chunks on disk. Then use SimpleXML to read and manipulate those chunks and save them back to disk.

It would help if you could provide an XML mockup, e.g.:

```xml
<feed>
  <item id='1'> ....... </item>
  <item id='2'> ....... </item>
  <item id='3'> ....... </item>
  .....
  <item id='278172'> </item>
</feed>
```

```php
<?php
// This will make the files xml-1.xml, xml-2.xml, etc.
// makeChunksWithXmlReader() is not shown here; it still has to be written.
makeChunksWithXmlReader($pathToLargeXmlFile, CustomXmlManipulator::$SPLITAT);

class CustomXmlManipulator
{
    public static $SPLITAT = 5000;

    // Load the chunk file that contains the item with this ID.
    function getXmlChunk($id)
    {
        return simplexml_load_file($this->getXmlFile($id));
    }

    // Write a (modified) chunk back to the file it came from.
    function storeXml($id, $simpleXmlObject)
    {
        $file = $this->getXmlFile($id);
        file_put_contents($file, $simpleXmlObject->asXML());
    }

    // Map an item ID to its chunk file. Assumes IDs start at 1 and are
    // contiguous, so IDs 1..5000 live in xml-1.xml, 5001..10000 in xml-2.xml, ...
    function getXmlFile($id)
    {
        $chunk = (int)(($id - 1) / self::$SPLITAT) + 1;
        return 'xml-' . $chunk . '.xml';
    }
}

$XMLM = new CustomXmlManipulator();
$first = $XMLM->getXmlChunk(1);
foreach ($first as $x) {
    // .... inspect $x ....
    if ($something) { // placeholder for your own condition
        // here you need to manipulate ID 23493, which lives in another chunk
        $tmpX = $XMLM->getXmlChunk(23493);
        // $tmpX->.... = .....;   change the XML here
        $XMLM->storeXml(23493, $tmpX);
        unset($tmpX); // release this chunk before loading another one
    }
}
?>
```

This is just the basic logic; it can be extended further depending on your needs. The function makeChunksWithXmlReader() needs to go through the XML file and write the chunks to disk. More on XMLReader: http://www.php.net/manual/en/class.xmlreader.php

On Apr 23, 2008, at 10:41 PM, Steve Gula wrote:
> I could, but it would make things very difficult. Some of the entities around ID #100 could be affected by entities around ID #11000 ...

Bojan Tesanovic
http://www.carster.us/
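Bojan leaves `makeChunksWithXmlReader()` unwritten. One possible implementation is sketched below, assuming the flat `<feed><item>...</item>...</feed>` layout from his mockup and naming the chunks `xml-1.xml`, `xml-2.xml`, ... as his comment describes. The `$outDir`, `$itemTag` and `$rootTag` parameters are hypothetical additions. XMLReader is a pull parser, so only the current `<item>` subtree is expanded when `readOuterXml()` is called, and `next()` moves to the following sibling without building a tree.

```php
<?php
/**
 * Hypothetical implementation: stream $path with XMLReader and write every
 * $splitAt <item> elements to xml-1.xml, xml-2.xml, ... in $outDir.
 * Assumes the items are siblings directly under one root element.
 * Returns the number of chunk files written.
 */
function makeChunksWithXmlReader(string $path, int $splitAt, string $outDir = '.',
                                 string $itemTag = 'item', string $rootTag = 'feed'): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first <item> start tag.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $itemTag) {
            $found = true;
            break;
        }
    }

    $chunk = 0;    // number of chunk files written so far
    $inChunk = 0;  // items buffered for the current chunk
    $buffer = '';

    // Write the buffered items as one well-formed chunk file.
    $flush = function () use (&$chunk, &$inChunk, &$buffer, $outDir, $rootTag) {
        if ($inChunk === 0) {
            return;
        }
        $chunk++;
        file_put_contents("$outDir/xml-$chunk.xml",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$rootTag>\n$buffer</$rootTag>\n");
        $buffer = '';
        $inChunk = 0;
    };

    while ($found) {
        // Serialize the current <item> and its subtree; only this item is expanded.
        $buffer .= $reader->readOuterXml() . "\n";
        if (++$inChunk === $splitAt) {
            $flush();
        }
        // Skip the current subtree and move to the next sibling named $itemTag.
        $found = $reader->next($itemTag);
    }
    $flush(); // write any remaining partial chunk

    $reader->close();
    return $chunk;
}
```

Chunks are numbered by position in the file, so finding an item's chunk from its ID only works if IDs are contiguous and in document order; otherwise an ID-to-chunk map could be recorded while chunking.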
Steve Gula wrote:
> ... Can anyone recommend a better way of handling a large amount of XML data?

Try Xalan (the Apache XSLT processor).

/Per Jessen, Zürich