regex question

This is a discussion on regex question within the PHP Language forums, part of the PHP Programming Forums category.


#1  05-11-2005
Henri Schomaecker

Hi folks,


I have to do the following:

Match everything between "start match after this text:" and "</td>".
My problem is that there are other HTML tags in between, so [^<] doesn't work.
How can I do something like [^<\/td>] (yes, I know this means "not < or /
or ..."), but do it right?

Many thanks in advance,
yours Henri

--
| Henri Schomäcker - BYTECONCEPTS, VIRTUAL HOMES
|   Data design for Internet and intranet
|     http://www.byteconcepts.de
|     http://www.virtual-homes.de

#2  05-11-2005
Oli Filth

Henri Schomaecker wrote:
> I have to do the following:
>
> match everything between "start match after this text:" and "</td>".
> My problem is that there are other html-tags between, so [^<] doesn't work.
> How can do something like [^<\/td>] (yes, I know this means not < or /
> or ...), but do it right?
>


What's wrong with preg_match('/STARTTEXT(.*)<\/td>/', $text, $array)?
Where STARTTEXT is the start match.

Maybe I'm misunderstanding your requirement, in which case you would
need to post some explicit examples of what you want.

--
Oli

#3  05-11-2005
BKDotCom

'/STARTTEXT(.*)<\/td>/' will continue matching until the last
occurrence of </td>.

Use this regex instead:
'|STARTTEXT([^<>]*)</td>|'

You shouldn't have any < or > characters (other than those that make up tags),
right? :)
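
To make the difference concrete, here is a minimal sketch run against a made-up one-line input (the `$html` string and variable names are assumptions for illustration, not the OP's data). It only shows what each of the two patterns captures when there are no tags inside the wanted text.

```php
<?php
// Hypothetical sample input: two table cells on one line, no inner tags.
$html = '<td>STARTTEXT first</td><td>second</td>';

// Greedy: .* takes as much as it can, then backtracks to the LAST </td>.
preg_match('/STARTTEXT(.*)<\/td>/', $html, $greedy);
// $greedy[1] is ' first</td><td>second'

// Negated character class: stops at the first "<", i.e. at the first </td>.
preg_match('|STARTTEXT([^<>]*)</td>|', $html, $noTags);
// $noTags[1] is ' first'
```

The negated class only behaves this way as long as the text between STARTTEXT and </td> contains no tags of its own.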

#4  05-12-2005
Oli Filth


BKDotCom wrote:
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </TD>
>


In that case, /STARTTEXT(.*?)<\/td>/ instead.

> use this regex:
> '|STARTTEXT([^<>]*)</td>|'
>
> you shouldn't have any < or >s (other than those that make up tags)
> right. :)


The OP stated that there *will* be other HTML tags in between, so this
regex won't work.
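
A small sketch of that case, using an invented input with inline tags and a line break inside the cell (the sample string and variable names are assumptions). The `s` modifier is added here because `.` does not match newlines by default in PCRE; drop it if the cell never spans lines.

```php
<?php
// Hypothetical sample: the wanted text contains inline tags and a newline.
$html = "<td>STARTTEXT <b>bold</b>\nand <i>italic</i></td><td>other</td>";

// [^<>]* cannot cross the "<" of <b>, so the whole pattern fails to match.
$classMatched = preg_match('|STARTTEXT([^<>]*)</td>|', $html, $m1);
// $classMatched is 0

// Lazy .*? grows one character at a time until </td> can match.
// The s modifier lets "." match the newline as well.
$lazyMatched = preg_match('/STARTTEXT(.*?)<\/td>/s', $html, $m2);
// $m2[1] is " <b>bold</b>\nand <i>italic</i>"
```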

--
Oli

#5  05-12-2005
Henri Schomaecker

Thanks to all of you!

I solved it. It was a greediness problem.
I just don't understand why, in PHP, .* captures far beyond the (...) when I
don't set the N (non-greedy) option. In my opinion it should at least
stop matching when the matching ) is reached. But it doesn't!
In Perl, this is no problem; I have now tried a few one-liners with the g option
(Perl's greedy option) on my example.
PHP seems to match, and match ..., and does not stop matching until the
end of the subject string is reached.

I recently wrote a (unfortunately closed-source at the moment) C++ API for
libpcre. Because the PHP API seems to be more or less copied from PCRE, I think
I'll have to run some tests: if this behaviour is also present in the PCRE
API, it will really be a problem for me.

Question: Is it correct PHP PCRE behaviour to match right past the
match delimiter ) ?

Many thanks for every answer,
yours Henri

#6  05-12-2005
Alvaro G Vicario

*** BKDotCom wrote (11 May 2005 14:57:11 -0700):
> '/STARTTEXT(.*)<\/td>/' will continue matching until the last
> occurrence of </TD>


Unless you turn greediness off:

'/STARTTEXT(.*)<\/td>/U'

or just

'#STARTTEXT(.*)</td>#U'
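
For illustration, a quick sketch on a made-up input (the `$html` string and variable names are assumptions). One thing worth knowing about U (PCRE_UNGREEDY): it inverts greediness for the whole pattern, so a quantifier followed by ? becomes greedy again.

```php
<?php
// Hypothetical sample input: two table cells on one line.
$html = '<td>STARTTEXT first</td><td>second</td>';

// With U, plain .* is lazy and stops at the first </td>.
preg_match('#STARTTEXT(.*)</td>#U', $html, $ungreedy);
// $ungreedy[1] is ' first'

// With U, .*? is flipped back to greedy and runs to the last </td>.
preg_match('#STARTTEXT(.*?)</td>#U', $html, $flipped);
// $flipped[1] is ' first</td><td>second'
```

Because of that inversion, mixing U with explicit .*? in the same pattern is easy to get wrong; picking one style per pattern is probably the safer habit.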


--
-- Álvaro G. Vicario - Burgos, Spain
-- http://bits.demogracia.com - My site about web programming
-- Don't e-mail me your questions, post them to the group
--
#7  05-14-2005
Tim Roberts

Henri Schomaecker <hs@byteconcepts.de> wrote:
>
>I solved it. It was a greedy problem.
>I just don't understand why in PHP .* catches far over the (...) when I
>don't set the N (non-greedy) Option. - In my Opinion it should at least
>stop matching, when the match-making ) is reached. - But it doesn't!


That's your opinion, because it conveniently suits your current
requirement. Regular expressions have been greedy right from the start.

>In perl, this is no problem, I tried a few one-liners with the g option
>(perl's greedy option) with my example now.


Perl is greedy by default (as are all regular expression matchers).
Perhaps you should post your test so we can figure out what you really did.

>PHP seems to match, and match ..., and does not stop with matching until the
>end of the subject string is found.


Please post your exact tests. I want to make sure we can explain this to
everyone.
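
In the meantime, here is a rough sketch of what is probably going on, using an invented input (the `$html` string and variable names are assumptions, not the OP's test). Parentheses only mark what gets captured; they do not limit how far the quantifier inside them runs. Also, Perl's g modifier means "global" (find every match), not "greedy"; the closest PHP analogue is preg_match_all().

```php
<?php
// Hypothetical sample input: two cells, each with its own start marker.
$html = '<td>STARTTEXT one</td><td>STARTTEXT two</td>';

// The group captures whatever .* consumed; greedy .* runs to the last </td>.
preg_match('/STARTTEXT(.*)<\/td>/', $html, $single);
// $single[1] is ' one</td><td>STARTTEXT two'

// Rough analogue of Perl's /g: collect every non-overlapping lazy match.
preg_match_all('/STARTTEXT(.*?)<\/td>/', $html, $all);
// $all[1] is [' one', ' two']
```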
--
- Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.