This is a discussion on Good XML Parser within the PHP General forums, part of the PHP Programming Forums category; What's the best way to pull down XML from a URL? fopen($URL), then using xml_parse? Or should I ...
|
2008/5/12 Waynn Lue <waynnlue@gmail.com>:
> What's the best way to pull down XML from a URL? fopen($URL), then
> using xml_parse? Or should I be using XML_Parser or SimpleXML?

XML parsers fall into two general camps - DOM and SAX. DOM parsers represent an entire XML document as a tree, in-memory, when they are first instantiated. They are generally more memory-hungry and take longer to instantiate, but they can answer queries like "what is the path to this node" or "give me the siblings of this node".

SAX parsers are stream- or event-based, and are much more lightweight - they parse the XML in a JIT fashion, and can't answer much more than "give me the next node".

If you just need the data, a SAX parser will probably do everything you need. If you need the tree structure implicit in an XML document, use a DOM parser. Expat, which XML Parser (http://uk3.php.net/manual/en/book.xml.php) is based on, is a SAX parser. DOM XML (http://uk3.php.net/manual/en/book.domxml.php) is, obviously, a DOM parser. I don't know, off the top of my head, which camp SimpleXML falls into.
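To make the DOM side of that comparison concrete, here is a minimal sketch. It assumes PHP 5's bundled DOM extension (the DOMDocument class) rather than the older DOM XML extension linked above, and the sample document is made up for illustration.

```php
<?php
// Hypothetical sample document; in practice the XML would come from the URL.
$xml = '<head><a><b>First</b></a><a><b>Second</b></a></head>';

$doc = new DOMDocument();
$doc->loadXML($xml);   // the whole tree is built in memory at this point

// "What is the path to this node?"
$paths = array();
foreach ($doc->getElementsByTagName('b') as $b) {
    $paths[] = $b->getNodePath();
}

// "Give me the siblings of this node."
$firstA  = $doc->documentElement->firstChild;   // first <a> (no whitespace nodes in the sample)
$sibling = $firstA->nextSibling;                // second <a>

print_r($paths);
echo $sibling->nodeName, ': ', $sibling->textContent, "\n";
```

Both questions are answered from the in-memory tree, which is exactly what a streaming parser does not keep around.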
|
So if I'm looking to parse certain attributes out of an XML tree, if I use SAX, it seems that I would need to keep track of state internally. E.g., if I have a tree like

<head>
  <a>
    <b></b>
  </a>
  <a>
    <b></b>
  </a>
</head>

and say I'm interested in all that's between <b> underneath any <a>, I'd need to have a state machine that looked for an <a> followed by a <b>. If I'm doing that, though, it seems like I should just start using a DOM parser instead?

Thanks for any insight,
Waynn
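The state tracking described here can be sketched with PHP's Expat-based XML Parser functions. This is only an illustration under assumptions: the sample document, the variable names, and the choice of an element stack as the "state" are all made up, and the closures need PHP 5.3 or later.

```php
<?php
// Hypothetical sample: two <b> elements under <a>, plus one stray <b> to ignore.
$xml = '<head><a><b>First</b></a><a><b>Second</b></a><b>ignored</b></head>';

$stack = array();    // names of the currently open elements
$found = array();    // text of each <b> that sits directly under an <a>
$targetDepth = 0;    // stack depth of the <b> being collected, 0 when outside one

$parser = xml_parser_create();
// Keep element names as written; by default the parser upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$stack, &$found, &$targetDepth) {
        $parent = end($stack);          // false when the stack is empty
        $stack[] = $name;
        if ($targetDepth === 0 && $name === 'b' && $parent === 'a') {
            $targetDepth = count($stack);
            $found[] = '';
        }
    },
    function ($p, $name) use (&$stack, &$targetDepth) {
        if (count($stack) === $targetDepth) {
            $targetDepth = 0;           // leaving the <b> we were collecting
        }
        array_pop($stack);
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$found, &$targetDepth) {
        if ($targetDepth > 0) {
            $found[count($found) - 1] .= $data;   // text may arrive in several chunks
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
}

print_r($found);
```

The state here is just the stack of open elements plus one depth counter, which is about as much bookkeeping as SAX needs for a query this simple.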
|
For simpler XML documents that are not too large (up to about 1 MB), I would use

$X = simplexml_load_file($URL);

SimpleXML is fairly fast and very easy to use: it accepts foreach loops, lets you access attributes array-style, etc.

On May 12, 2008, at 9:02 AM, Waynn Lue wrote:

> What's the best way to pull down XML from a URL? fopen($URL), then
> using xml_parse? Or should I be using XML_Parser or SimpleXML?
>
> Thanks,
> Waynn
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

Bojan Tesanovic
http://www.carster.us/
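A slightly more defensive version of that one-liner might look like the sketch below. The helper name load_xml() is hypothetical, a temporary file stands in for the remote URL so the example runs offline, and loading from a real URL assumes allow_url_fopen is enabled.

```php
<?php
// Load XML from a path or URL; returns a SimpleXMLElement, or null on failure.
// load_xml() is a made-up helper name, not part of SimpleXML.
function load_xml($source)
{
    // Collect libxml errors instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_file($source);
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log('XML error: ' . trim($error->message));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $xml === false ? null : $xml;
}

// A temporary file stands in for $URL here.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    "<head><a href='/asas'><b>First</b></a><a href='/bla'><b>Second</b></a></head>"
);

$links = array();
$X = load_xml($file);
if ($X !== null) {
    foreach ($X->a as $a) {
        // Cast to string: attributes and children are SimpleXMLElement objects.
        $links[(string) $a['href']] = (string) $a->b;
    }
}
unlink($file);

print_r($links);
```

Checking for a false return matters with remote sources, since a timeout or a malformed feed would otherwise turn into a fatal error at the first foreach.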
|
Here is the very simple way ;)

<?php
// Sample document: two <a> elements, the second <b> carries a class attribute.
$XML = <<<XMLL
<head>
  <a href='/asas'>
    <b>First</b>
  </a>
  <a href='/bla'>
    <b class='klas'>Second</b>
  </a>
</head>
XMLL;

$X = simplexml_load_string($XML);

// Iterate over every <a> child of the root element.
foreach ($X->a as $a) {
    echo $a->b . "\n";
    // Attributes are accessed array-style on the element.
    if (isset($a->b['class'])) {
        echo 'B has class - ' . $a->b['class'] . "\n";
    }
}
?>

Bojan Tesanovic
http://www.carster.us/
|
2008/5/12 Waynn Lue <waynnlue@gmail.com>:
> So if I'm looking to parse certain attributes out of an XML tree, if I
> use SAX, it seems that I would need to keep track of state internally.
> E.g., if I have a tree like
>
> <head>
>   <a>
>     <b></b>
>   </a>
>   <a>
>     <b></b>
>   </a>
> </head>
>
> and say I'm interested in all that's between <b> underneath any <a>,
> I'd need to have a state machine that looked for an <a> followed by a
> <b>. If I'm doing that, though, it seems like I should just start
> using a DOM parser instead?

Yeah, I think you've got it nailed, although your example is simple enough (you're only holding one state value - "am I a child of <a>?") that I'd probably still reflexively reach for the lightweight solution.

I use SAX for lightweight hacks, one step up from regexes - I know the information I want is between <tag> and </tag>, and I don't care about the rest of the document. The more I need to navigate the document, the more likely I am to use DOM. I could build my own data structures on top of a SAX parser, but why bother reinventing the wheel? Of course, you have to factor document size into that - parsing a big XML document into a tree can be slow.

You might also want to explore XPath (http://uk.php.net/manual/en/function...ment-xpath.php http://uk.php.net/manual/en/class.domxpath.php)... XPath is to XML as regexes are to text files. There's a good chance you'll be able to roll all your parsing up into a couple of XPath queries.

I probably should have added that simple parsers come in two flavours - Push Parsers and Pull Parsers. I tend to think (lazily) of Push and Pull as variations on SAX, but strictly speaking they are different.
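As a rough illustration of the XPath suggestion, the "<b> underneath any <a>" question from earlier in the thread collapses into a single query. The sample document and variable names below are made up; DOMXPath and SimpleXMLElement::xpath() are the standard PHP 5 interfaces.

```php
<?php
// Hypothetical sample: two <b> under <a>, plus one stray <b> directly under <head>.
$xml = "<head><a href='/asas'><b>First</b></a>"
     . "<a href='/bla'><b class='klas'>Second</b></a><b>stray</b></head>";

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Every <b> that is a direct child of an <a> under the root <head>.
$texts = array();
foreach ($xpath->query('/head/a/b') as $b) {
    $texts[] = $b->textContent;
}

// The class attribute of any such <b> that has one.
$classes = array();
foreach ($xpath->query('//a/b[@class]/@class') as $attr) {
    $classes[] = $attr->value;
}

// SimpleXML exposes the same query language through its xpath() method.
$sx = simplexml_load_string($xml);
$viaSimpleXml = array_map('strval', $sx->xpath('/head/a/b'));

print_r($texts);
print_r($classes);
print_r($viaSimpleXml);
```

The stray <b> under <head> is skipped without any state tracking, because the path expression itself encodes the parent/child requirement.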
|
Chetan Rane wrote:
> Hi All
>
> I am using a PHP Mailer to send mass mails.
> How can I Identify how mails have bounced.
>

Hi,

I guess you have to read some RFCs to get an idea about e-mail protocols.

--
Aschwin Wesselius

/'What you would like to be done to you, do that to the other....'/
|
Seems like the general way is to create a mailbox (POP3 or IMAP) to accept the bounces, then check it periodically and mark the emails as invalid in your local database.

I would set thresholds so you don't mark something failed that only bounced once - it could have been a mail setup error or something else; I'd say wait for 3 failures in a 7 day period at least. If you get 3 bounces by that point, the address is probably safely dead.

You can use PHP's IMAP functions to check the mailbox (even for POP3), or one of a million classes, or your own functions working directly on the socket (POP3 is a simple protocol). It also helps if you parse the bounced email message to extract the return address and the mail status code; perhaps build something better than just "3 failures = invalid", and actually determine whether they're outright failures or just temporary bounces, etc.

Another method: you could just parse the mail logs, if you have access to them.
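The parsing and threshold ideas above could be sketched roughly as follows. This assumes the bounce arrives as a standard delivery status notification (RFC 3464) with Final-Recipient and Status fields, which many real bounces do not follow; the function names parse_bounce() and is_dead() are hypothetical, and fetching the message from the mailbox is left out.

```php
<?php
// Extract the failed recipient and bounce type from a DSN-style bounce body.
// Returns null when the expected fields are missing.
function parse_bounce($body)
{
    if (!preg_match('/^Final-Recipient:\s*rfc822;\s*(\S+)/mi', $body, $rcpt) ||
        !preg_match('/^Status:\s*([245])\.\d+\.\d+/mi', $body, $status)) {
        return null;
    }
    return array(
        'address' => strtolower($rcpt[1]),
        'hard'    => $status[1] === '5',   // 5.x.x is permanent, 4.x.x is temporary
    );
}

// Apply the "3 failures in a 7 day period" rule to a list of bounce timestamps.
function is_dead(array $bounceTimes, $now, $limit = 3, $windowDays = 7)
{
    $cutoff = $now - $windowDays * 86400;
    $recent = 0;
    foreach ($bounceTimes as $t) {
        if ($t >= $cutoff && $t <= $now) {
            $recent++;
        }
    }
    return $recent >= $limit;
}

// Made-up bounce fragment for illustration.
$sample = "Reporting-MTA: dns; mx.example.org\r\n\r\n"
        . "Final-Recipient: rfc822; Someone@Example.com\r\n"
        . "Action: failed\r\n"
        . "Status: 5.1.1\r\n";

$bounce = parse_bounce($sample);
$now    = time();
$day    = 86400;
$twoRecent   = is_dead(array($now - $day, $now - 2 * $day, $now - 10 * $day), $now);
$threeRecent = is_dead(array($now - $day, $now - 2 * $day, $now - 3 * $day), $now);

var_dump($bounce, $twoRecent, $threeRecent);
```

Separating hard from temporary bounces lets you mark a 5.x.x address invalid sooner, while 4.x.x bounces only count toward the threshold.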
|
Chetan Rane wrote:
> Hi All
>
> I am using a PHP Mailer to send mass mails.
> How can I Identify how mails have bounced.
>

You send them with a bounce-address that uniquely identifies the recipient - when the email bounces, you know exactly which recipient it was. I typically have my mailserver do a quick database update to set a status for such bounces.

/Per Jessen, Zürich
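One common way to build such a per-recipient bounce address is to fold the recipient into the local part of the envelope sender. The sketch below is an assumption about the scheme, not necessarily the one used above: the "bounce+user=domain@..." layout, the function names, and the lists.example.org domain are all made up for illustration.

```php
<?php
// Encode a recipient into a unique bounce address,
// e.g. bounce+user=example.com@lists.example.org
function verp_encode($recipient, $bounceDomain, $prefix = 'bounce')
{
    $pos = strrpos($recipient, '@');
    if ($pos === false) {
        return null;   // not an email address
    }
    $local  = substr($recipient, 0, $pos);
    $domain = substr($recipient, $pos + 1);
    return $prefix . '+' . $local . '=' . $domain . '@' . $bounceDomain;
}

// Recover the original recipient from the address a bounce was delivered to.
function verp_decode($bounceAddress, $prefix = 'bounce')
{
    // The last "=" before the "@" separates the original local part and domain.
    $pattern = '/^' . preg_quote($prefix, '/') . '\+(.+)=([^=@]+)@[^@]+$/';
    if (!preg_match($pattern, $bounceAddress, $m)) {
        return null;
    }
    return $m[1] . '@' . $m[2];
}

$sender    = verp_encode('user@example.com', 'lists.example.org');
$recipient = verp_decode($sender);

echo $sender, "\n", $recipient, "\n";
```

The encoded address has to be used as the envelope sender (Return-Path), not just the From header; with PHP's mail() on a sendmail-compatible setup that is typically done through the fifth argument, for example '-f' followed by the address.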