Re: Re: preg_match() returns false but no documentation why

05-31-2007
Jared Farrish

> If the pattern delimiter character appears in the pattern it must be
> escaped so that the regexp processor will correctly interpret it as a
> pattern character and not as the end of the pattern.
>
> This would produce a regexp error:
>
> /ldap://*/
>
> but this is OK:
>
> /ldap:\/\/*/
>
> Therefore if you choose another delimiter altogether you don't have
> to escape the slashes:
>
> #ldap://*#
>
> Cleaner and more clear.


Ok, that makes sense.
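
The unescaped-delimiter case is also one way to end up with the false return this thread started with. A minimal sketch, assuming a made-up subject string ($subject is not from the original posts):

```php
<?php
// Hypothetical subject string, for illustration only.
$subject = 'ldap://example';

// Unescaped delimiter: PCRE reads the pattern as "ldap:" followed by the
// "modifiers" "/*/", so compilation fails. preg_match() emits a warning
// (suppressed here with @) and returns false rather than 0.
$broken = @preg_match('/ldap://*/', $subject);

// Escaping the slashes, or switching to another delimiter, both compile.
$escaped = preg_match('/ldap:\/\/*/', $subject);
$hashed  = preg_match('#ldap://*#', $subject);

var_dump($broken);   // bool(false)
var_dump($escaped);  // int(1)
var_dump($hashed);   // int(1)
```

Comparing the result with === false (instead of a loose truthiness check) is what separates "the pattern did not compile" from "the pattern did not match".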

> >preg_match('|^ldap(s)?://[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$|', $this->server)
> >>
> >>I also recommend using single quotes instead of double quotes here.
> >
> >Single Quotes: Noted. Any reason why? I guess you might be a little out of
> >luck putting $vars into a regex without . concatenating.
>
> Both PHP and regexp use the backslash as an escape. Inside double
> quotes, PHP interprets \ as escape, while inside single quotes PHP
> interprets \ as a simple backslash character.
>
> When working with regexp in PHP you're dealing with two interpreters,
> first PHP and then regexp. To support PHP's interpretation with
> double quotes, you have to escape the escapes:
>
> Single quotes: '/ldap:\/\/*/'
> Double quotes: "/ldap:\\/\\/*/"
>
> PHP interprets "\\/" as \/
> RegExp interprets \/ as /


Oh. Duh! I wasn't even considering PHP parsing the string due to the double
quoted string.
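
A quick way to see the two layers of interpretation is to compare the strings PHP actually hands to the regexp engine. This is only a sketch; the variable names are made up for illustration:

```php
<?php
// Single quotes: PHP leaves \/ exactly as typed.
$single = '/ldap:\/\/*/';

// Double quotes: PHP collapses each \\ into one backslash first.
$double = "/ldap:\\/\\/*/";

// Both literals produce the same pattern string for the regexp engine.
var_dump($single === $double);  // bool(true)
echo $double, "\n";             // /ldap:\/\/*/
```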

> So, for a pattern like this that contains slashes, it's best to use a
> non-slash delimiter AND single quotes (unless, as you say, you need
> to include PHP variables in the pattern):
>
> $pattern = '#ldap://*#';
>
> Personally I favor heredoc syntax for such situations because I don't
> have to worry about the quotes:
>
> $regexp = <<<_
> #ldap://*$var#
> _;


Yeah, I just wish there were some way heredoc could work on one line.
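
For the one-line case, single quotes with . concatenation build the same string as the heredoc. The sketch below assumes a hypothetical $host value and adds preg_quote(), which the posts above do not mention, so that any regexp metacharacters in the variable are treated literally:

```php
<?php
// Hypothetical host name, for illustration only.
$host = 'dir.example.com';

// preg_quote() escapes regexp metacharacters in the variable (the dots here);
// the second argument also escapes the chosen delimiter.
$quoted = preg_quote($host, '#');

// Heredoc form: variables interpolate, no PHP quote characters to escape.
$heredoc = <<<EOT
#^ldaps?://$quoted$#
EOT;

// One-line form: single quotes plus . concatenation.
$oneLine = '#^ldaps?://' . $quoted . '$#';

var_dump($heredoc === $oneLine);                           // bool(true)
var_dump(preg_match($oneLine, 'ldap://dir.example.com'));  // int(1)
var_dump(preg_match($oneLine, 'ldap://dirXexample.com'));  // int(0)
```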

> >>why is there a period in the second pattern?
> >
> >The period comes from the original article on SitePoint (linked earlier). Is
> >it unnecessary? I can't say I'm real sure what this means for the '.' in
> >regex's:
> >
> >"Matches any single character except line break characters \r and \n. Most
> >regex flavors have an option to make the dot match line break characters
> >too."
> >- http://www.regular-expressions.info/reference.html
>
> Inside of a bracketed character class, the dot means a literal period
> character and not a wildcard.
>
> "All non-alphanumeric characters other than \, -, ^ (at the start)
> and the terminating ] are non-special in character classes"


So what does the definition I posted mean for non-bracketed periods? Does it
mean it will match anything but a line or return break character? How in
practice is this useful?

> PHP PREG
> Pattern Syntax
> http://www.php.net/manual/en/referen...ern.syntax.php
> scroll down to 'Square brackets'
>
>
> >>Also, why are you allowing for uppercase letters
> >>when the RFC's don't allow them?
> >
> >I hadn't gotten far enough to strtolower(), but that's a good point, I
> >hadn't actually considered it yet.
>
> Perhaps it has to do with the source of the string: can you guarantee
> that the URIs passed to this routine conform to spec?


I just prefer to use strtolower(). I have to use the server address
anyways...

Breaking News: I had a thought (surprise!). Are LDAP servers ever on
localhost? Or at least at an address with no dots in it
(ldap://directoryname)? The pattern we've been looking at won't match that,
I think.
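
That suspicion is easy to check. The sketch below runs the pattern from this thread against a few made-up addresses; the looser $relaxed pattern is only one possible assumption about what should be accepted, not an RFC-complete validator:

```php
<?php
// Pattern as discussed in this thread.
$pattern = '|^ldap(s)?://[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$|';

// Hypothetical relaxation: one or more dot-separated labels, no length limit.
$relaxed = '|^ldaps?://[a-z0-9-]+(\.[a-z0-9-]+)*$|i';

// Made-up addresses, for illustration only.
$hosts = [
    'ldap://example.com',
    'ldap://localhost',
    'ldap://directoryname',
    'ldap://ldap.example.com',
];

$strict = [];
$loose  = [];
foreach ($hosts as $host) {
    $strict[$host] = preg_match($pattern, $host);
    $loose[$host]  = preg_match($relaxed, $host);
    printf("%-26s strict=%d relaxed=%d\n", $host, $strict[$host], $loose[$host]);
}
```

With the thread's pattern only ldap://example.com matches. The dotless names fail, and so does ldap://ldap.example.com, because everything after the first dot has to fit into {2,5} characters.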

> Another way to handle this would be to simply accept case-insensitive strings:
>
> |^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i


I actually read about that a little while ago, I just didn't know where to
put the i. Thanks!
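
The modifier goes after the closing delimiter. A short sketch comparing it with the strtolower() approach, using a made-up mixed-case address:

```php
<?php
// Hypothetical mixed-case input, for illustration only.
$subject = 'LDAP://Example.COM';

// Lowercase-only pattern without a modifier: uppercase input does not match.
$sensitive = preg_match('|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|', $subject);

// Same pattern with the i modifier after the closing delimiter.
$insensitive = preg_match('|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i', $subject);

// Alternative: normalise the input first, then use the lowercase-only pattern.
$lowered = preg_match('|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|', strtolower($subject));

var_dump($sensitive);    // int(0)
var_dump($insensitive);  // int(1)
var_dump($lowered);      // int(1)
```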

> Pattern Modifiers
> http://www.php.net/manual/en/referen....modifiers.php
>
> "i (PCRE_CASELESS)
> If this modifier is set, letters in the pattern match both upper
> and lower case letters."


How do you test regex's against any known variants? I suppose I need to
build a test function to make arbitrary strings and then test and print the
results. I just don't know if my regex is going to be that great in
practice.

This would be in addition to the program Richard alluded to in the code
checker.
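
Such a test function could be quite small. The sketch below is only one way to do it; checkPattern() and the sample cases are hypothetical names and values, not something from the thread:

```php
<?php
/**
 * Run a pattern against subjects with known expected outcomes.
 * Prints one line per case and returns the subjects whose result
 * differed from the expectation.
 */
function checkPattern(string $pattern, array $cases): array
{
    $failures = [];
    foreach ($cases as $subject => $expected) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        $actual = preg_match($pattern, $subject) === 1;
        printf(
            "%-26s expected=%-5s actual=%s\n",
            $subject,
            var_export($expected, true),
            var_export($actual, true)
        );
        if ($actual !== $expected) {
            $failures[] = $subject;
        }
    }
    return $failures;
}

// Made-up cases: subject => whether it should match.
$cases = [
    'ldap://example.com'    => true,
    'ldaps://example.co.uk' => true,
    'LDAP://EXAMPLE.COM'    => true,
    'http://example.com'    => false,
    'ldap://localhost'      => false,
];

$failures = checkPattern('|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i', $cases);
```

An empty $failures array means every case behaved as expected; new edge cases can be added to $cases as they come up.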

Thanks!

--
Jared Farrish
Intermediate Web Developer
Denton, Tx

Abraham Maslow: "If the only tool you have is a hammer, you tend to see
every problem as a nail." $$

05-31-2007
Paul Novitski

At 5/30/2007 05:08 PM, Jared Farrish wrote:
>So what does the definition I posted mean for non-bracketed periods? Does it
>mean it will match anything but a line or return break character? How in
>practice is this useful?


Read the manual:

Pattern Syntax
http://www.php.net/manual/en/referen...ern.syntax.php

. match any character except newline (by default)
...
Full stop

Outside a character class, a dot in the pattern matches any one
character in the subject, including a non-printing character, but not
(by default) newline. If the PCRE_DOTALL option is set, then dots
match newlines as well. The handling of dot is entirely independent
of the handling of circumflex and dollar, the only relationship being
that they both involve newline characters. Dot has no special meaning
in a character class.

etc.
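
The behaviour described in that excerpt can be checked with a few one-liners. A minimal sketch with made-up subjects; the s modifier is the pattern-level way to set PCRE_DOTALL:

```php
<?php
// Outside a character class the dot matches any one character...
$any = preg_match('/a.c/', 'abc');

// ...except a newline by default.
$newline = preg_match('/a.c/', "a\nc");

// With the s modifier (PCRE_DOTALL) the dot matches newlines too.
$dotall = preg_match('/a.c/s', "a\nc");

// Inside a character class the dot is a literal period.
$classMiss = preg_match('/a[.]c/', 'abc');
$classHit  = preg_match('/a[.]c/', 'a.c');

var_dump($any, $newline, $dotall, $classMiss, $classHit);
// int(1) int(0) int(1) int(0) int(1)
```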


>How do you test regex's against any known variants? I suppose I need to
>build a test function to make arbitrary strings and then test and print the
>results. I just don't know if my regex is going to be that great in
>practice.


rework - an online regular expression workbench
by Oliver Steele
http://osteele.com/tools/rework/

The RegEx Coach (a downloadable Windows application)
by Edi Weitz
http://weitz.de/regex-coach/


Regards,

Paul
__________________________

Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com