ereg_replace question

This is a discussion on ereg_replace question within the PHP Language forums, part of the PHP Programming Forums category.


#11
05-23-2006
Andy Jeffries
Re: two kinds of regular expression

On Mon, 22 May 2006 14:38:52 -0700, John Dunlop wrote:
>> So it would seem that while [^0-9-] works in PHP/Perl, it's actually not
>> standard and I am correct to use [^0-9\-] in order to ensure maximum
>> compatibility with future versions which may implement the standard more
>> strictly.

>
> I'd not say you're correct


OK, so ignoring the latter part about referring to standards, why am I not
correct?

> and I'd shy away from speaking about
> *the* "standard", whatever you mean by that. Where there's two kinds of
> regular expression, claiming that one is standard implies the other is
> not, forcing upon it gratuitous negative connotations. If you do feel the
> urge to think in terms of standard/non-standard, don't think of there
> being one standard and one non-standard but rather of there being two
> standards.


OK, considering the two main standards out there (PCRE and POSIX), both of
them suggest literal hyphens should be quoted within metacharacter classes.
The main book most people use to refer to Regular Expressions suggests the
same thing.

While I understand there's no one true standard for regexes, the nearest
things we have say it should be done one way. So although method B also
works, if it's not in any reference it may just be an oversight that will
be removed in a later revision of the code.

Cheers,


Andy


--
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

#12
05-23-2006
Alan Little
Re: ereg_replace question

Carved in mystic runes upon the very living rock, the last words of Andy
Jeffries of comp.lang.php make plain:

> On Mon, 22 May 2006 19:52:24 -0500, Alan Little wrote:
>>> On p79 of Mastering Regular Expressions by Jeffrey E F Friedl (ISBN
>>> 1-56592-257-3) it says:
>>>
>>> "In limited-metacharacter-class implementations, other metacharacters
>>> (including in most tools, even backslashes) are not recognized. So,
>>> for example, you can't use \- or \] to insert a hyphen or a closing
>>> bracket in to the class." This precedes a list of characters that
>>> are available in these limited implementations which are
>>> specifically: a leading caret, the closing bracket and a dash as a
>>> range operator.
>>>
>>> I'm sure that book details the "standard" for regular expressions in
>>> most people's eyes and that book (as quoted above) uses \- as the
>>> syntax to insert a literal hyphen with a metacharacter class
>>> ([...]).
>>>
>>> So it would seem that while [^0-9-] works in PHP/Perl, it's actually
>>> not standard and I am correct to use [^0-9\-] in order to ensure
>>> maximum compatibility with future versions which may implement the
>>> standard more strictly.

>>
>> That's a good reference, but I don't follow you. The part you quoted
>> from the book says you *can't* use \- to insert a hyphen in the
>> class.

>
> In case it's not clear, that's a book on Regular Expressions and not
> specifically about PHP regexes.


I understand.

> In a *limited-metacharacter-class implementation*. Those
> implementations can only accept leading caret, closing bracket and a
> hyphen as a range character (i.e. there's no way to find a hyphen, a
> slash or any other non-alphanumeric character). PHP is not a
> limited-metacharacter-class implementation.


Pardon my density, but I still don't follow you. The book says:

>>> "So, for example, you can't use \- or \] to insert a hyphen or a
>>> closing bracket in to the class."


You say:

>>> I am correct to use [^0-9\-] in order to ensure


The book says it's incorrect, but you're saying it's correct? Am I
missing something?

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/

#13
05-23-2006
Andy Jeffries
Re: ereg_replace question

On Tue, 23 May 2006 06:11:02 -0500, Alan Little wrote:
>>>> "In limited-metacharacter-class implementations, other metacharacters
>>>> (including in most tools, even backslashes) are not recognized. So,
>>>> for example, you can't use \- or \] to insert a hyphen or a closing
>>>> bracket in to the class." This precedes a list of characters that are
>>>> available in these limited implementations which are specifically: a
>>>> leading caret, the closing bracket and a dash as a range operator.

>
> Pardon my density, but I still don't follow you. The book says:
>
>>>> "So, for example, you can't use \- or \] to insert a hyphen or a
>>>> closing bracket in to the class."

>
> You say:
>
>>>> I am correct to use [^0-9\-] in order to ensure

>
> The book says it's incorrect, but you're saying it's correct? Am I missing
> something?


The book is saying that in limited (non-full) implementations you cannot
use "\-" to insert a hyphen, because there you cannot search for a hyphen
as one of the characters in a metacharacter class at all.

It gives an example (which I paraphrased) of the only acceptable
characters in a limited implementation and basically you can't include (in
any shape or form) hyphens or square brackets in the class.

So the book said "in these limited forms you can't use \- to insert a
hyphen", which by the phrasing indicates that's the normal way of doing it
in a full implementation.

PHP is a full PCRE implementation with all bells and whistles (including
backreferences).
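
As a minimal sketch of the two spellings in a Perl-style engine: Python's re
module is used here purely as a stand-in (an assumption; it is neither PHP's
ereg functions nor PCRE itself), and the sample string is hypothetical. In
this engine both classes strip exactly the same characters:

```python
import re

# Hypothetical input, chosen only for illustration.
sample = "Tel: 555-0199 ext. 7"

# Unescaped hyphen placed last in the class, where it cannot form a range.
unescaped = re.sub(r"[^0-9-]", "", sample)

# Escaped hyphen, the spelling argued for above.
escaped = re.sub(r"[^0-9\-]", "", sample)

print(unescaped)  # 555-01997
print(escaped)    # 555-01997
```

This only shows that the two forms agree in one Perl-style engine; it says
nothing about how a POSIX engine treats the backslash.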

Does that make more sense?

Cheers,


Andy


--
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

#14
05-23-2006
Alan Little
Re: ereg_replace question

Carved in mystic runes upon the very living rock, the last words of Andy
Jeffries of comp.lang.php make plain:

> So the book said "in these limited forms you can't use \- to insert a
> hyphen", which by the phrasing indicates that's the normal way of
> doing it in a full implementation.
>
> PHP is a full PCRE implementation with all bells and whistles
> (including backreferences).


OK, I see what you're saying.

--
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/

#15
05-23-2006
John Dunlop
Re: two kinds of regular expression

Andy Jeffries:

> OK, so ignoring the latter part about referring to standards, why am I not
> correct?


I didn't say you weren't correct. I just don't see why saying that
you are correct helps.

> OK, considering the two main standards out there (PCRE and POSIX), both of
> them suggest literal hyphens should be quoted within metacharacter classes.


I don't know what you mean. The notation of POSIX regular
expressions does not suggest anything of the sort but actually *rules*
*out* backslashes as escape characters in character classes. The man
pages are quite explicit: backslashes lose their metacharacter
function there. The notation of PCREs does allow backslashes as escape
characters in character classes but also allows literal hyphens to
occur in certain positions unescaped. I don't see how it follows from
that that the notation used by either kind of regular expression, let
alone both, suggests that literal hyphens *should* be escaped.
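
The "certain positions" point can be sketched for the Perl-style half of the
argument. Python's re module is again only a stand-in here (an assumption;
it is not PCRE, and it cannot show the POSIX behaviour, where a backslash
inside a class is just a literal backslash). The test string is hypothetical:

```python
import re

# Hypothetical input: the characters a, hyphen, c, space, b.
text = "a-c b"

# Hyphen between two characters: a range, so this matches a, b and c.
as_range = re.findall(r"[a-c]", text)

# Hyphen last or first in the class: taken literally, no escape needed.
hyphen_last = re.findall(r"[ac-]", text)
hyphen_first = re.findall(r"[-ac]", text)

# Escaped hyphen in the middle: also literal in a Perl-style engine.
hyphen_escaped = re.findall(r"[a\-c]", text)

print(as_range)        # ['a', 'c', 'b']
print(hyphen_last)     # ['a', '-', 'c']
print(hyphen_first)    # ['a', '-', 'c']
print(hyphen_escaped)  # ['a', '-', 'c']
```
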

--
Jock
