SQL parsing

This is a discussion on SQL parsing within the PHP Language forums, part of the PHP Programming Forums category; I’ve written a simple results cache for an Oracle driven application. When I run a query I need to ...


Go Back   Usenet Forums > PHP Programming Forums > PHP Language

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-20-2008
Álvaro G. Vicario
 
Posts: n/a
Default SQL parsing

I’ve written a simple results cache for an Oracle driven application.
When I run a query I need to know which tables it uses so I can find out
if they have been modified and thus the cached data must be discarded
(as I said, it’s a very simple cache and the DB is basically static).

I didn’t like the idea of passing a table list as argument for each
query because they are dynamically built and, anyway, I thought it’d
become unmaintainable sooner or later. So I wrote an even simpler SQL
parser that looks like this:

$tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS';
$regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i';
preg_match_all($regex, $sql, $matches);

So far, it works fine enough for my purposes, but I think it’s not a
long term solution. Is it possible to build a generic reliable SQL
parser to fetch the table names of a query without diving in the regex
hell? Do you know about an existing library?* Can you think of any other
solution?

Thank you in advance,
[*] I’ve browsed some code out there that does what I need (typically
via regex soup), but it always fails with the queries I test.

--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://bits.demogracia.com
-- Mi web de humor al baño María: http://www.demogracia.com
--
Reply With Quote
  #2 (permalink)  
Old 05-20-2008
C. (http://symcbean.blogspot.com/)
 
Posts: n/a
Default Re: SQL parsing

On May 20, 12:43 pm, "Álvaro G. Vicario"
<alvaroNOSPAMTHA...@demogracia.com> wrote:
> I’ve written a simple results cache for an Oracle driven application.
> When I run a query I need to know which tables it uses so I can find out
> if they have been modified and thus the cached data must be discarded
> (as I said, it’s a very simple cache and the DB is basically static).
>
> I didn’t like the idea of passing a table list as argument for each
> query because they are dynamically built and, anyway, I thought it’d
> become unmaintainable sooner or later. So I wrote an even simpler SQL
> parser that looks like this:
>
> $tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS';
> $regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i';
> preg_match_all($regex, $sql, $matches);
>
> So far, it works fine enough for my purposes, but I think it’s not a
> long term solution. Is it possible to build a generic reliable SQL
> parser to fetch the table names of a query without diving in the regex
> hell? Do you know about an existing library?* Can you think of any other
> solution?
>
> Thank you in advance,
>
>[*] I’ve browsed some code out there that does what I need (typically
> via regex soup), but it always fails with the queries I test.
>


Certainly its **possible** to write an SQL parser - lots of other
people have managed - but stepping back a bit this seems like
premature optimization - even for an Oracle DBMS. Maybe you should
start by looking at the DBMS. Surprisingly it looks like Oracle didn't
introduce result caching until 11g but prior to that it would still be
possible to minimise the disk I/O with a large buffer cache.

For large result sets you may get better performance by going back to
the DBMS - this the sort of thing DBMS are designed to do well - more
so than PHP - which (IME) can be a bit slow handling large arrays.

For small result sets (except maybe aggregates) SQL tuning is the way
to go. While the CBO can be very, very smart - it can also be very,
very dumb at times.

As you have discovered, your regex fails to deal with a lot of cases,
but even a complete parser won't cope with stored procedures. Much
though I dislike SQL stored procedures this might be the most
convenient way to solve the problem - add a parameter for
last_cache_timestamp and pass in the value from the previous query -
in the PL/SQL, first check if any of the tables have been modified
since then - if not return a custom error code, otherwise redo the
query.

But the simplest option might be to just apply a default timeout on
the local query cache.

Alternatively build your own abstraction in PHP whereby developers
never enter SQL commands directly, but it makes it easier for you to
extract the table names.

C.
Reply With Quote
  #3 (permalink)  
Old 05-20-2008
Rik Wasmus
 
Posts: n/a
Default Re: SQL parsing

Álvaro G. Vicario wrote:
> I’ve written a simple results cache for an Oracle driven application.
> When I run a query I need to know which tables it uses so I can find out
> if they have been modified and thus the cached data must be discarded
> (as I said, it’s a very simple cache and the DB is basically static).
>
> I didn’t like the idea of passing a table list as argument for each
> query because they are dynamically built and, anyway, I thought it’d
> become unmaintainable sooner or later. So I wrote an even simpler SQL
> parser that looks like this:
>
> $tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS';
> $regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i';
> preg_match_all($regex, $sql, $matches);
>
> So far, it works fine enough for my purposes, but I think it’s not a
> long term solution. Is it possible to build a generic reliable SQL
> parser to fetch the table names of a query without diving in the regex
> hell? Do you know about an existing library?* Can you think of any other
> solution?
>
> Thank you in advance,
>
>[*] I’ve browsed some code out there that does what I need (typically
> via regex soup), but it always fails with the queries I test.


Of course it is possible to build a PARSER. This is generally done
WITHOUT regexes. There are some examples available on the net, none that
I tested though. Google for "PHP SQL parser". If you want to right your
own, Matt Zandstra's book ("PHP5 objects and patterns" or something like
that) has a fine example & starting point for you.
--
Rik Wasmus
....spamrun finished
Reply With Quote
  #4 (permalink)  
Old 05-21-2008
Álvaro G. Vicario
 
Posts: n/a
Default Re: SQL parsing

C. (http://symcbean.blogspot.com/) escribió:
>> So far, it works fine enough for my purposes, but I think it’s not a
>> long term solution. Is it possible to build a generic reliable SQL
>> parser to fetch the table names of a query without diving in the regex
>> hell? Do you know about an existing library?* Can you think of any other
>> solution?


> Certainly its **possible** to write an SQL parser - lots of other
> people have managed - but stepping back a bit this seems like
> premature optimization - even for an Oracle DBMS. Maybe you should
> start by looking at the DBMS. Surprisingly it looks like Oracle didn't
> introduce result caching until 11g but prior to that it would still be
> possible to minimise the disk I/O with a large buffer cache.


Can you speak of premature optimization when it was an application that
had been in production for several months? :)

DBMS-level solutions are not as promising as you may think for several
reasons:

- The app consists basically in a collection of SUM() and COUNT()
statements over multi-million row tables spiced with on-the-fly
user-defined filters that often imply joins or subqueries. There's no
fast way to sum 300,000 values of non-consecutive non-indexed random
rows. And there's no need to do it 100 times a day when data won't
change until next month.

- I only take care of the PHP part. The Oracle server is a separate host
somewhere in the LAN; I don't have the least control over it.

- The DB design is as-is. I mean, I'd really *love* to add primary keys
and such fashionable stuff to all tables, but I just can't ;-)


> For large result sets you may get better performance by going back to
> the DBMS - this the sort of thing DBMS are designed to do well - more
> so than PHP - which (IME) can be a bit slow handling large arrays.


Result sets are normally small. The problem is that they're calculated
from too many rows.

> As you have discovered, your regex fails to deal with a lot of cases,
> but even a complete parser won't cope with stored procedures.


Actually, my regex works without a flaw because I search for the
specific table names I handle. It's third-party code (such as PEAR's
SQL_Parser) which fails.

> But the simplest option might be to just apply a default timeout on
> the local query cache.


That was version one. It was fine, but I wanted to make new data
available in a reasonable time since it was dumped, while not needing to
re-run several times a day all the slow queries on data that had not
changed in a month.

As I said, I have a system that works well, I'm just looking for
improvements. Anyway, thanks for your comments.


--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://bits.demogracia.com
-- Mi web de humor al baño María: http://www.demogracia.com
--
Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 09:51 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0