This is a discussion on SQL parsing within the PHP Language forums, part of the PHP Programming Forums category; I’ve written a simple results cache for an Oracle driven application. When I run a query I need to ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I’ve written a simple results cache for an Oracle driven application.
When I run a query I need to know which tables it uses so I can find out if they have been modified and thus the cached data must be discarded (as I said, it’s a very simple cache and the DB is basically static). I didn’t like the idea of passing a table list as argument for each query because they are dynamically built and, anyway, I thought it’d become unmaintainable sooner or later. So I wrote an even simpler SQL parser that looks like this: $tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS'; $regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i'; preg_match_all($regex, $sql, $matches); So far, it works fine enough for my purposes, but I think it’s not a long term solution. Is it possible to build a generic reliable SQL parser to fetch the table names of a query without diving in the regex hell? Do you know about an existing library?* Can you think of any other solution? Thank you in advance, [*] I’ve browsed some code out there that does what I need (typically via regex soup), but it always fails with the queries I test. -- -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain -- Mi sitio sobre programación web: http://bits.demogracia.com -- Mi web de humor al baño María: http://www.demogracia.com -- |
|
|||
|
On May 20, 12:43 pm, "Álvaro G. Vicario"
<alvaroNOSPAMTHA...@demogracia.com> wrote: > I’ve written a simple results cache for an Oracle driven application. > When I run a query I need to know which tables it uses so I can find out > if they have been modified and thus the cached data must be discarded > (as I said, it’s a very simple cache and the DB is basically static). > > I didn’t like the idea of passing a table list as argument for each > query because they are dynamically built and, anyway, I thought it’d > become unmaintainable sooner or later. So I wrote an even simpler SQL > parser that looks like this: > > $tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS'; > $regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i'; > preg_match_all($regex, $sql, $matches); > > So far, it works fine enough for my purposes, but I think it’s not a > long term solution. Is it possible to build a generic reliable SQL > parser to fetch the table names of a query without diving in the regex > hell? Do you know about an existing library?* Can you think of any other > solution? > > Thank you in advance, > >[*] I’ve browsed some code out there that does what I need (typically > via regex soup), but it always fails with the queries I test. > Certainly its **possible** to write an SQL parser - lots of other people have managed - but stepping back a bit this seems like premature optimization - even for an Oracle DBMS. Maybe you should start by looking at the DBMS. Surprisingly it looks like Oracle didn't introduce result caching until 11g but prior to that it would still be possible to minimise the disk I/O with a large buffer cache. For large result sets you may get better performance by going back to the DBMS - this the sort of thing DBMS are designed to do well - more so than PHP - which (IME) can be a bit slow handling large arrays. For small result sets (except maybe aggregates) SQL tuning is the way to go. While the CBO can be very, very smart - it can also be very, very dumb at times. As you have discovered, your regex fails to deal with a lot of cases, but even a complete parser won't cope with stored procedures. Much though I dislike SQL stored procedures this might be the most convenient way to solve the problem - add a parameter for last_cache_timestamp and pass in the value from the previous query - in the PL/SQL, first check if any of the tables have been modified since then - if not return a custom error code, otherwise redo the query. But the simplest option might be to just apply a default timeout on the local query cache. Alternatively build your own abstraction in PHP whereby developers never enter SQL commands directly, but it makes it easier for you to extract the table names. C. |
|
|||
|
Álvaro G. Vicario wrote:
> I’ve written a simple results cache for an Oracle driven application. > When I run a query I need to know which tables it uses so I can find out > if they have been modified and thus the cached data must be discarded > (as I said, it’s a very simple cache and the DB is basically static). > > I didn’t like the idea of passing a table list as argument for each > query because they are dynamically built and, anyway, I thought it’d > become unmaintainable sooner or later. So I wrote an even simpler SQL > parser that looks like this: > > $tables = 'PRODUCTS|CATEGORIES|CUSTOMERS|ORDERS'; > $regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i'; > preg_match_all($regex, $sql, $matches); > > So far, it works fine enough for my purposes, but I think it’s not a > long term solution. Is it possible to build a generic reliable SQL > parser to fetch the table names of a query without diving in the regex > hell? Do you know about an existing library?* Can you think of any other > solution? > > Thank you in advance, > >[*] I’ve browsed some code out there that does what I need (typically > via regex soup), but it always fails with the queries I test. Of course it is possible to build a PARSER. This is generally done WITHOUT regexes. There are some examples available on the net, none that I tested though. Google for "PHP SQL parser". If you want to right your own, Matt Zandstra's book ("PHP5 objects and patterns" or something like that) has a fine example & starting point for you. -- Rik Wasmus ....spamrun finished |
|
|||
|
C. (http://symcbean.blogspot.com/) escribió:
>> So far, it works fine enough for my purposes, but I think it’s not a >> long term solution. Is it possible to build a generic reliable SQL >> parser to fetch the table names of a query without diving in the regex >> hell? Do you know about an existing library?* Can you think of any other >> solution? > Certainly its **possible** to write an SQL parser - lots of other > people have managed - but stepping back a bit this seems like > premature optimization - even for an Oracle DBMS. Maybe you should > start by looking at the DBMS. Surprisingly it looks like Oracle didn't > introduce result caching until 11g but prior to that it would still be > possible to minimise the disk I/O with a large buffer cache. Can you speak of premature optimization when it was an application that had been in production for several months? :) DBMS-level solutions are not as promising as you may think for several reasons: - The app consists basically in a collection of SUM() and COUNT() statements over multi-million row tables spiced with on-the-fly user-defined filters that often imply joins or subqueries. There's no fast way to sum 300,000 values of non-consecutive non-indexed random rows. And there's no need to do it 100 times a day when data won't change until next month. - I only take care of the PHP part. The Oracle server is a separate host somewhere in the LAN; I don't have the least control over it. - The DB design is as-is. I mean, I'd really *love* to add primary keys and such fashionable stuff to all tables, but I just can't ;-) > For large result sets you may get better performance by going back to > the DBMS - this the sort of thing DBMS are designed to do well - more > so than PHP - which (IME) can be a bit slow handling large arrays. Result sets are normally small. The problem is that they're calculated from too many rows. > As you have discovered, your regex fails to deal with a lot of cases, > but even a complete parser won't cope with stored procedures. Actually, my regex works without a flaw because I search for the specific table names I handle. It's third-party code (such as PEAR's SQL_Parser) which fails. > But the simplest option might be to just apply a default timeout on > the local query cache. That was version one. It was fine, but I wanted to make new data available in a reasonable time since it was dumped, while not needing to re-run several times a day all the slow queries on data that had not changed in a month. As I said, I have a system that works well, I'm just looking for improvements. Anyway, thanks for your comments. -- -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain -- Mi sitio sobre programación web: http://bits.demogracia.com -- Mi web de humor al baño María: http://www.demogracia.com -- |