This is a discussion on ereg_replace question within the PHP Language forums, part of the PHP Programming Forums category.
On Mon, 22 May 2006 14:38:52 -0700, John Dunlop wrote:
>> So it would seem that while [^0-9-] works in PHP/Perl, it's actually not
>> standard and I am correct to use [^0-9\-] in order to ensure maximum
>> compatibility with future version which may implement the standard more
>> strictly.
>
> I'd not say you're correct

OK, so ignoring the latter part about referring to standards, why am I not
correct?

> and I'd shy away from speaking about
> *the* "standard", whatever you mean by that. Where there's two kinds of
> regular expression, claiming that one is standard implies the other is
> not, forcing upon it gratuitous negative connotations. If you do feel the
> urge to think in terms of standard/non-standard, don't think of there
> being one standard and one non-standard but rather of there being two
> standards.

OK, considering the two main standards out there (PCRE and POSIX), both of
them suggest literal hyphens should be quoted within metacharacter classes.
The main book most people use to refer to Regular Expressions suggests the
same thing.

While I understand there's no one true standard for regexes, the nearest
things we have say it should be done one way, therefore although method B
also works, if it's not in any references it may be just an oversight that
will be removed in a later revision of the code.

Cheers,

Andy

-- 
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

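For anyone who wants to check the claim that both spellings work in PHP, here is a minimal sketch. It assumes PHP's PCRE function preg_replace rather than the ereg_replace of the thread title (the ereg_* functions were removed in PHP 7.0), and the sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical input: keep only digits and hyphens.
$input = 'tel: 555-1234 x9';

// Unescaped hyphen at the end of the class: taken literally by PCRE.
$unescaped = preg_replace('/[^0-9-]/', '', $input);

// Escaped hyphen: also a literal hyphen in PCRE.
$escaped = preg_replace('/[^0-9\-]/', '', $input);

// Both calls strip everything except digits and hyphens.
var_dump($unescaped === $escaped);
```

Under PCRE the two patterns give the same result on this input, so the choice between them is one of style and portability, not of correctness.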
Carved in mystic runes upon the very living rock, the last words of Andy
Jeffries of comp.lang.php make plain:
> On Mon, 22 May 2006 19:52:24 -0500, Alan Little wrote:
>>> On p79 of Mastering Regular Expressions by Jeffrey E F Friedl (ISBN
>>> 1-56592-257-3) it says:
>>>
>>> "In limited-metacharacter-class implementations, other metacharacter
>>> (including in most tools, even backslashes) are not recognized. So,
>>> for example, you can't use \- or \] to insert a hyphen or a closing
>>> bracket in to the class." This precedes a list of characters that
>>> are available in these limited implementations which are
>>> specifically: a leading caret, the closing bracket and a dash as a
>>> range operator.
>>>
>>> I'm sure that book details the "standard" for regular expressions in
>>> most people's eyes and that book (as quoted above) uses \- as the
>>> syntax to insert a literal hyphen with a metacharacter class
>>> ([...]).
>>>
>>> So it would seem that while [^0-9-] works in PHP/Perl, it's actually
>>> not standard and I am correct to use [^0-9\-] in order to ensure
>>> maximum compatibility with future version which may implement the
>>> standard more strictly.
>>
>> That's a good reference, but I don't follow you. The part you quoted
>> from the book says you *can't* use \- to insert a hyphen in the
>> class.
>
> In case it's not clear, that's a book on Regular Expressions and not
> specifically about PHP regexes.

I understand.

> In a *limited-metacharacter-class implementation*. Those
> implementations can only accept leading caret, closing bracket and a
> hyphen as a range character (i.e. there's no way to find a hyphen, a
> slash or any other non-alphanumeric character). PHP is not a
> limited-metacharacter-class implementation.

Pardon my density, but I still don't follow you. The book says:

>>> "So, for example, you can't use \- or \] to insert a hyphen or a
>>> closing bracket in to the class."

You say:

>>> I am correct to use [^0-9\-] in order to ensure

The book says it's incorrect, but you're saying it's correct?
Am I missing something?

-- 
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/

On Tue, 23 May 2006 06:11:02 -0500, Alan Little wrote:
>>>> "In limited-metacharacter-class implementations, other metacharacter
>>>> (including in most tools, even backslashes) are not recognized. So,
>>>> for example, you can't use \- or \] to insert a hyphen or a closing
>>>> bracket in to the class." This precedes a list of characters that are
>>>> available in these limited implementations which are specifically: a
>>>> leading caret, the closing bracket and a dash as a range operator.
>
> Pardon my density, but I still don't follow you. The book says:
>
>>>> "So, for example, you can't use \- or \] to insert a hyphen or a
>>>> closing bracket in to the class."
>
> You say:
>
>>>> I am correct to use [^0-9\-] in order to ensure
>
> The book says it's incorrect, but you're saying it's correct? Am I missing
> something?

The book is saying that in limited (non-full) implementations you cannot
use "\-" to insert a hyphen, because you cannot search for a hyphen as one
of the characters in a metaclass at all. It gives an example (which I
paraphrased) of the only acceptable characters in a limited implementation
and basically you can't include (in any shape or form) hyphens or square
brackets in the class.

So the book said "in these limited forms you can't use \- to insert a
hyphen", which by the phrasing indicates that's the normal way of doing it
in a full implementation.

PHP is a full PCRE implementation with all bells and whistles (including
backreferences).

Does that make more sense?

Cheers,

Andy

-- 
Andy Jeffries MBCS CITP ZCE | gPHPEdit Lead Developer
http://www.gphpedit.org | PHP editor for Gnome 2
http://www.andyjeffries.co.uk | Personal site and photos

Carved in mystic runes upon the very living rock, the last words of Andy
Jeffries of comp.lang.php make plain:
> So the book said "in these limited forms you can't use \- to insert a
> hyphen", which by the phrasing indicates that's the normal way of
> doing it in a full implementation.
>
> PHP is a full PCRE implementation with all bells and whistles
> (including backreferences).

OK, I see what you're saying.

-- 
Alan Little
Phorm PHP Form Processor
http://www.phorm.com/

Andy Jeffries:
> OK, so ignoring the latter part about referring to standards, why am I not
> correct?

I didn't say you weren't correct. I just don't see why saying that you
are correct helps.

> OK, considering the two main standards out there (PCRE and POSIX), both of
> them suggest literal hyphens should be quoted within metacharacter classes.

I don't know what you mean. The notation of POSIX regular expressions
does not suggest anything of the sort but actually *rules* *out*
backslashes as escape characters in character classes. The man pages
are quite explicit: backslashes lose their metacharacter function there.
The notation of PCREs does allow backslashes as escape characters in
character classes but also allows literal hyphens to occur in certain
positions unescaped. I don't see how it follows from that that the
notation used by either kind of regular expression, let alone both,
suggests that literal hyphens *should* be escaped.

-- 
Jock
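
The PCRE half of this point can be sketched in PHP as below. This is only an illustration under stated assumptions: it uses preg_match, the array keys and the sample date string are invented for the example, and the POSIX half is not run because the ereg_* functions are gone from PHP 7.0 onward.

```php
<?php
// Three PCRE spellings of "one or more digits or literal hyphens".
// The labels are hypothetical names used only for this sketch.
$patterns = [
    'hyphen first'   => '/^[-0-9]+$/',
    'hyphen last'    => '/^[0-9-]+$/',
    'hyphen escaped' => '/^[0-9\-]+$/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$label] = preg_match($pattern, '2006-05-22');
}

// In PCRE the backslash in [0-9\-] escapes the hyphen, so the class
// does not contain a backslash. '\\' is a single backslash character.
$matchesBackslash = preg_match('/[0-9\-]/', '\\');

var_dump($results, $matchesBackslash);
```

All three spellings accept the hyphenated string under PCRE, and the escaped form does not match a lone backslash. Under POSIX bracket-expression rules the same backslash would count as an ordinary member of the class, which is the difference described above.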