This is a discussion on "regex question" within the PHP Language forums, part of the PHP Programming Forums category.
Hi folks,
I have to do the following: match everything between "start match after this text:" and "</td>".
My problem is that there are other HTML tags in between, so [^<] doesn't work.
How can I do something like [^<\/td>] (yes, I know this means "not < or / or ..."), but do it right?

Many thanks in advance,
yours Henri
--
Henri Schomäcker - BYTECONCEPTS, VIRTUAL HOMES
Data design for Internet and intranet
http://www.byteconcepts.de
http://www.virtual-homes.de
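One way to express "any character, as long as it does not begin </td>" is a negative lookahead, which is supported by PCRE and therefore by preg_match. The following is only a minimal sketch: the sample string in $html is made up for illustration, and the s modifier is added on the assumption that the cell content may span several lines.

```php
<?php
// Hypothetical sample input; only the marker text comes from the question.
$html = '<td>start match after this text: <b>bold</b> and <i>more</i></td><td>next cell</td>';

// (?:(?!<\/td>).)* consumes one character at a time, but only while the
// text at the current position is not the start of "</td>".
// The s modifier lets "." also match newlines.
$pattern = '/start match after this text:((?:(?!<\/td>).)*)<\/td>/s';

$captured = null;
if (preg_match($pattern, $html, $m)) {
    $captured = $m[1]; // everything up to, but not including, the first </td>
    echo $captured, "\n";
}
```

Other tags such as </b> or </i> pass through because the lookahead rejects only the exact sequence </td>.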
Henri Schomaecker wrote:
> I have to do the following:
>
> match everything between "start match after this text:" and "</td>".
> My problem is that there are other html-tags between, so [^<] doesn't work.
> How can I do something like [^<\/td>] (yes, I know this means not < or /
> or ...), but do it right?

What's wrong with preg_match('/STARTTEXT(.*)<\/td>/', $text, $array)? Here STARTTEXT stands for the start-match text.

Maybe I'm misunderstanding your requirement, in which case you would need to post some explicit examples of what you want.

--
Oli
BKDotCom wrote:
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </TD>

In that case, use /STARTTEXT(.*?)<\/td>/ instead.

> use this regex:
> '|STARTTEXT([^<>]*)</td>|'
>
> you shouldn't have any < or >s (other than those that make up tags)
> right. :)

The OP stated that there *will* be other HTML tags in between, so this regex won't work.

--
Oli
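The three patterns discussed so far can be compared side by side. This is only a sketch: the sample string in $text is invented for illustration (two cells, the first with a nested tag), and STARTTEXT is the placeholder used in the replies above.

```php
<?php
// Hypothetical sample: two table cells, the first containing a nested tag.
$text = '<td>STARTTEXT one <b>two</b></td><td>three</td>';

// Greedy: .* first runs to the end of the string, then backtracks
// until the rest of the pattern fits, which is at the LAST </td>.
preg_match('/STARTTEXT(.*)<\/td>/', $text, $greedy);

// Lazy: .*? grows one character at a time and stops at the FIRST </td>.
preg_match('/STARTTEXT(.*?)<\/td>/', $text, $lazy);

// [^<>]* cannot cross the nested <b> tag, so the whole pattern fails.
$found = preg_match('|STARTTEXT([^<>]*)</td>|', $text, $noTags);

echo $greedy[1], "\n";
echo $lazy[1], "\n";
var_dump($found); // preg_match returns 0 when nothing matched
```

Note that the parentheses only mark which part of the match is captured. They do not limit how far .* may run.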
Thanks to all of you!
I solved it. It was a greedy problem.
I just don't understand why in PHP .* catches far over the (...) when I don't set the N (non-greedy) Option. - In my Opinion it should at least stop matching, when the match-making ) is reached. - But it doesn't!

In perl, this is no problem, I tried a few one-liners with the g option (perl's greedy option) with my example now.
PHP seems to match, and match ..., and does not stop with matching until the end of the subject string is found.

I recently wrote a (unfortunately closed-source at the moment) C++ API for libpcre. Because the PHP API seems to be more or less copied from pcre, I think I'll have to run some tests: if this behaviour is also present in the pcre API, it will really be a problem for me.

Question: Is it correct PHP pcre behaviour to match right past the match delimiter ")"?

Many thanks for every answer,
yours Henri
*** BKDotCom wrote (11 May 2005 14:57:11 -0700):
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </TD>

Unless you turn greediness off:

'/STARTTEXT(.*)<\/td>/U'

or just

'#STARTTEXT(.*)</td>#U'

--
-- Álvaro G. Vicario - Burgos, Spain
-- http://bits.demogracia.com - My site about web programming
-- Don't e-mail me your questions, post them to the group
--
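To illustrate the U modifier: it inverts the default greediness for the whole pattern, so a plain quantifier becomes lazy and a quantifier followed by ? becomes greedy again. A short sketch with an invented sample string follows. Note that U is a PCRE feature and has no Perl equivalent.

```php
<?php
// Hypothetical sample: two table cells, the first containing a nested tag.
$text = '<td>STARTTEXT one <b>two</b></td><td>three</td>';

// With U, the plain .* is lazy and stops at the first </td>.
preg_match('#STARTTEXT(.*)</td>#U', $text, $ungreedy);

// With U, the trailing ? switches the quantifier back to greedy,
// so this one runs to the last </td>.
preg_match('#STARTTEXT(.*?)</td>#U', $text, $inverted);

echo $ungreedy[1], "\n";
echo $inverted[1], "\n";
```

Because U flips the meaning of every quantifier in the pattern, writing .*? without the modifier is often easier to read than relying on U.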
Henri Schomaecker <hs@byteconcepts.de> wrote:
>I solved it. It was a greedy problem.
>I just don't understand why in PHP .* catches far over the (...) when I
>don't set the N (non-greedy) Option. - In my Opinion it should at least
>stop matching, when the match-making ) is reached. - But it doesn't!

That's your opinion, because it conveniently suits your current requirement. Regular expressions have been greedy right from the start.

>In perl, this is no problem, I tried a few one-liners with the g option
>(perl's greedy option) with my example now.

Perl is greedy by default (as are all regular expression matchers). Perhaps you should post your test so we can figure out what you really did.

>PHP seems to match, and match ..., and does not stop with matching until the
>end of the subject string is found.

Please post your exact tests. I want to make sure we can explain this to everyone.

--
- Tim Roberts, timr@probo.com
  Providenza & Boekelheide, Inc.