This is a discussion on Query meltdown: caching results within the PHP Language forums, part of the PHP Programming Forums category; I'm working on a CMS that, in addition to the database-stored version of articles for easy searching, sorting, ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I'm working on a CMS that, in addition to the database-stored version
of articles for easy searching, sorting, etc, also stores a HTML file version so pages can be fetched with the minimum of overhead (browsing articles has no more overhead than accessing any other HTML file on the server). As I've been trying to keep the system modular I've taken to using an OO approach to the server side scripts, which are all written in PHP 5 and use PDO for database access. I've also been using prepared sequences almost exclusively for security and performance reasons. I've tried to wrap sequences of queries in transactions as well, to eliminate the "every query is its own transaction" overhead. With previous projects which I wrote using structured programming methods it was quite easy to hold caches of results and keep database queries to a minimum, but I've found this extremely difficult to pull off when using the OO approach, and now it's starting to have some real performance consequences. The biggest one comes when publishing a document that has siblings. CMS content is organized in a tree with folders, subfolders and documents. A document can be published, where both a HTML and database copy exist, or unpublished, where only the database version exists, thus denying visitors to the site access to it. Documents in a folder get a sidebar with links to the other documents in the same folder, and when you change the published status of a document then all the other documents that are also published in that folder have to be republished in order to update their sidebars. This means fetching a list of all the documents with the same parent and that have a published flag status of true, using the text stored in the database to generate the HTML page and saving it to disk. Documents have an associated template, which also has to be fetched from the database. And all documents have data such as their path, which is a chain of the document's parents back to the root so that things like breadcrumbs can be generated. In the structured approach I'd have just cached stuff like the trail back to the root as I know it'll be the same for all documents, so I'd only have to run the sequences of queries to get the full trail once. But as each instance of a document is independent of all the others doing things like this is proving really difficult. I need to find a way of not running queries that I don't need to, either in the PHP script, or in the Postgres database. What I need is for a result set to be cached somewhere, either by Postgres or PHP, so when it sees the same query again in a given session it just returns the previously fetched result set. The cache also needs to be able to disregard its cached result sets when an event that changes a table occurs (insert, update, delete, etc). On the PHP side I've written a simple Database class that extends PDO and that I use in its place. It's a simple class that basically I use to allow me to nest calls to beginTransaction(), commit () and rollback () (It only starts an actual transaction of a counter is 0. Otherwide it just increments the counter. Commit only actually commits when the counter is 1, and decrements it otherwise. Rollback sets an error flag and decrements the counter, and only rolls back when the counter is 1. If the error flag is set then commit will actually roll back instead. ) My options are, as far as I can tell, 1) replace the Database PDO extending class with something else that provides query results caching in PHP, or 2) get Postgres itself to cache the results of queries to avoid running them repeatedly during a session. I seem to remember MySQL providing some kind of results caching, can Postgres do the same? Has anyone else run into similar problems and how did they overcome them? |
|
|||
|
On Feb 26, 11:11 am, Gordon <gordon.mc...@ntlworld.com> wrote:
> I'm working on a CMS that, in addition to the database-stored version > of articles for easy searching, sorting, etc, also stores a HTML file > version so pages can be fetched with the minimum of overhead (browsing > articles has no more overhead than accessing any other HTML file on > the server). > > As I've been trying to keep the system modular I've taken to using an > OO approach to the server side scripts, which are all written in PHP 5 > and use PDO for database access. I've also been using prepared > sequences almost exclusively for security and performance reasons. > I've tried to wrap sequences of queries in transactions as well, to > eliminate the "every query is its own transaction" overhead. > > With previous projects which I wrote using structured programming > methods it was quite easy to hold caches of results and keep database > queries to a minimum, but I've found this extremely difficult to pull > off when using the OO approach, and now it's starting to have some > real performance consequences. The biggest one comes when publishing > a document that has siblings. CMS content is organized in a tree with > folders, subfolders and documents. A document can be published, where > both a HTML and database copy exist, or unpublished, where only the > database version exists, thus denying visitors to the site access to > it. Documents in a folder get a sidebar with links to the other > documents in the same folder, and when you change the published status > of a document then all the other documents that are also published in > that folder have to be republished in order to update their > sidebars. > > This means fetching a list of all the documents with the same parent > and that have a published flag status of true, using the text stored > in the database to generate the HTML page and saving it to disk. > Documents have an associated template, which also has to be fetched > from the database. And all documents have data such as their path, > which is a chain of the document's parents back to the root so that > things like breadcrumbs can be generated. > > In the structured approach I'd have just cached stuff like the trail > back to the root as I know it'll be the same for all documents, so I'd > only have to run the sequences of queries to get the full trail once. > But as each instance of a document is independent of all the others > doing things like this is proving really difficult. > > I need to find a way of not running queries that I don't need to, > either in the PHP script, or in the Postgres database. What I need is > for a result set to be cached somewhere, either by Postgres or PHP, so > when it sees the same query again in a given session it just returns > the previously fetched result set. The cache also needs to be able to > disregard its cached result sets when an event that changes a table > occurs (insert, update, delete, etc). > > On the PHP side I've written a simple Database class that extends PDO > and that I use in its place. It's a simple class that basically I use > to allow me to nest calls to beginTransaction(), commit () and > rollback () (It only starts an actual transaction of a counter is 0. > Otherwide it just increments the counter. Commit only actually > commits when the counter is 1, and decrements it otherwise. Rollback > sets an error flag and decrements the counter, and only rolls back > when the counter is 1. If the error flag is set then commit will > actually roll back instead. ) > > My options are, as far as I can tell, > > 1) replace the Database PDO extending class with something else that > provides query results caching in PHP, or > 2) get Postgres itself to cache the results of queries to avoid > running them repeatedly during a session. > > I seem to remember MySQL providing some kind of results caching, can > Postgres do the same? Has anyone else run into similar problems and > how did they overcome them? I have an idea for how to do it but I'm not quite sure how to accomplish it fully. Aspects involving modifications to the tables are going to be particularly problematic. My idea is to extend the PDOStatement class with an internal result cache. I'm already caching PDOStatements in order to prevent the script from trying to prepare the same queries over and over again. The cache will be an array. The execute(), fetch(). fetchall() etc methods will be aware of the array and return values from it if possible. Things risk getting really tricky really quickly, however. If a modification is made to a table, then any or all of the cached data in all the PDOStatements may no longer be valid and will need to be flushed. This is leading me to suspect that this is a far from ideal way of doing things. I know that Postgres can cache query plans, but what about results? Can/do they get cached too? |
|
|||
|
Gordon wrote:
> I'm working on a CMS that, in addition to the database-stored version > of articles for easy searching, sorting, etc, also stores a HTML file > version so pages can be fetched with the minimum of overhead (browsing > articles has no more overhead than accessing any other HTML file on > the server). > > As I've been trying to keep the system modular I've taken to using an > OO approach to the server side scripts, which are all written in PHP 5 > and use PDO for database access. I've also been using prepared > sequences almost exclusively for security and performance reasons. > I've tried to wrap sequences of queries in transactions as well, to > eliminate the "every query is its own transaction" overhead. > > With previous projects which I wrote using structured programming > methods it was quite easy to hold caches of results and keep database > queries to a minimum, but I've found this extremely difficult to pull > off when using the OO approach, and now it's starting to have some > real performance consequences. The biggest one comes when publishing > a document that has siblings. CMS content is organized in a tree with > folders, subfolders and documents. A document can be published, where > both a HTML and database copy exist, or unpublished, where only the > database version exists, thus denying visitors to the site access to > it. Documents in a folder get a sidebar with links to the other > documents in the same folder, and when you change the published status > of a document then all the other documents that are also published in > that folder have to be republished in order to update their > sidebars. > > This means fetching a list of all the documents with the same parent > and that have a published flag status of true, using the text stored > in the database to generate the HTML page and saving it to disk. > Documents have an associated template, which also has to be fetched > from the database. And all documents have data such as their path, > which is a chain of the document's parents back to the root so that > things like breadcrumbs can be generated. > > In the structured approach I'd have just cached stuff like the trail > back to the root as I know it'll be the same for all documents, so I'd > only have to run the sequences of queries to get the full trail once. > But as each instance of a document is independent of all the others > doing things like this is proving really difficult. > > I need to find a way of not running queries that I don't need to, > either in the PHP script, or in the Postgres database. What I need is > for a result set to be cached somewhere, either by Postgres or PHP, so > when it sees the same query again in a given session it just returns > the previously fetched result set. The cache also needs to be able to > disregard its cached result sets when an event that changes a table > occurs (insert, update, delete, etc). > > On the PHP side I've written a simple Database class that extends PDO > and that I use in its place. It's a simple class that basically I use > to allow me to nest calls to beginTransaction(), commit () and > rollback () (It only starts an actual transaction of a counter is 0. > Otherwide it just increments the counter. Commit only actually > commits when the counter is 1, and decrements it otherwise. Rollback > sets an error flag and decrements the counter, and only rolls back > when the counter is 1. If the error flag is set then commit will > actually roll back instead. ) > > My options are, as far as I can tell, > > 1) replace the Database PDO extending class with something else that > provides query results caching in PHP, or > 2) get Postgres itself to cache the results of queries to avoid > running them repeatedly during a session. > > I seem to remember MySQL providing some kind of results caching, can > Postgres do the same? Has anyone else run into similar problems and > how did they overcome them? > Don't know about Postgres but yes MySQL does have caching. You could also take a look at APC (Alternative PHP Cache) depending on your setup. Very easy to use. And very easy to monitor what's actually going on with your pages in the cache. Once a page is generated you can store it in the cache and give it a time to live before going back to the db. -- Norman Registered Linux user #461062 |
|
|||
|
(Sorry for the repost but I thought this would be appropriate to both
groups. I did tell Google to delete my first post but odds are some guys got that copy already anyway) After a lot of hairpulling, I finally found a mechanism in PHP for doing what I wanted. I just had to know 2 things: 1) How to get the PDO engine to use my customized prepared statement class instead of PDOStatement 2) Extending PDOStatement to transparently add results caching is too difficult and complex in the timeframe required Once I knew these things I made a PDOStatement extension class that instead of trying to transparently add caching to the existing methods added a couple of new ones instead. Code below: <?php class Statement extends PDOStatement { private $resultCache = array (); private $database = NULL; public $hits = 0; public $misses = 0; public function ask (array $params = array ()) // Executes a prepared statement on the database that fetches data { $hash = md5 (implode (',', $params)); if (!$this -> resultCache [$hash]) { $this -> misses++; // Execute the query if ($this -> execute ($params)) { // Cache the results $this -> resultCache [$hash] = $this - > fetchAll (PDO::FETCH_ASSOC); } } else { $this -> hits++; } return ($this -> resultCache [$hash]); } public function tell (array $params = array ()) // Execute a prepared statement that causes the database to be modified { // Execute the query if ($this -> execute ($params)) { $rowCount = $this -> rowCount (); if ($rowCount) { // Tell the parent Database object to clear statement caches $this -> database -> clearResults (); } return ($rowCount); } } public function clearResults () // Clear cache { $this -> resultCache = array (); } private function __construct ($db) // Class constructor { $this -> database = $db; //print_r ($this); } } class Database extends PDO // Adds some extra functionality to the built in PDO class { private $statementCache = array (); private $txCount = 0; private $txErr = false; // Prepared statement cacheing public function prepare ($statement, array $options = array ()) { $hash = md5 ($statement); if ((!isset ($this -> statementCache [$hash])) || (!is_object ($this -> statementCache [$hash]))) { //echo ('Preparing statement "'. $statement .'"<br>'); $this -> statementCache [$hash] = parent::prepare ($statement, $options); } else { //echo ('Statement "' . $statement . '" already prepared<br>'); } return ($this -> statementCache [$hash]); } public function clearResults () // Clear the results cache of all associated prepared statements { foreach ($this -> statementCache as $thisStatement) { $thisStatement -> clearResults (); } } // Add support for transaction nesting public function beginTransaction () { if (($this -> txCount == 0) && (!$this -> txErr)) { $result = parent::beginTransaction (); } $this -> txCount ++; if (DEBUG_TX) { echo ('begin: ' . $this -> txCount . ' transaction(s)<br />'); } return ($result); } public function commit () { $this -> txCount --; if ($this -> txCount <= 0) { $this -> txErr? $result = parent::rollback (): $result = parent::commit (); $this -> txErr = false; } if (DEBUG_TX) { echo ('commit: ' . $this -> txCount . ' transaction(s)<br />'); } return ($result); } public function rollback () { $this -> txErr = true; $this -> txCount --; if ($this -> txCount <= 0) { $result = parent::rollback (); $this -> txErr = false; } if (DEBUG_TX) { echo ('rollback: ' . $this -> txCount . ' transaction(s)<br />'); } return ($result); } // Housekeeping private function removeExpiredLocks () { $query = 'DELETE FROM cms_locks WHERE lck_timestamp + lck_duration < NOW();'; $preparedQuery = $this -> prepare ($query); if ($preparedQuery -> execute ()) { return ($preparedQuery -> rowCount ()); } } // Class constructor public function __construct () { parent::__construct ( 'pgsql:host=' . CFG_DB_HOST .' port=' . CFG_DB_PORT .' dbname=' . CFG_DB_DBNAME, CFG_DB_USERNAME, CFG_DB_PASSWORD); /* The documentation for the following line on php.net is really bad! What is does is set what class will be used to handle prepared statements. By default the PDO -> prepare() command returns a prepared statement as a PDOStatement class object. We want to extend PDOStatements to provide some extra functionality, so when we run the prepare () method, we want to return something other than a PDOStatement. This line tells the database to use our Statement class for prepared statements instead of the PDOStatement class. It also passes a reference to the database object that spawned it to the constructor as an argument. We'll use this reference in the Statement class to send messages back to the Database class */ $this -> setAttribute (PDO::ATTR_STATEMENT_CLASS, array ('Statement', array ($this))); $this -> setAttribute (PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING); $this -> query ('SET search_path = ' . CFG_DB_PATH . ';'); $this -> removeExpiredLocks (); } } The system works like this: I use my Database class where I would have used PDO to create a database connection. When I prepare() a statement I either get a new prepared statement back, or the previously created one is returned. The prepared statement contains an array that caches the results produced with various parameters. My prepared statement class also adds 2 database querying methods, ask () and tell (), that I use where I normally would have used an execute (). If I want to retrieve information from the database I ask () it to return the data matching my parameters. Whenever I want to make a change I tell () it the data I want saved to the database. The ask () method checks the statement's local cache of results and if it finds one matching the passed parameters, it returns it. If not, then it execute()s the query to retrieve them. When I tell () the database to make a change the statement that received the message to do so sends a message to its parent Database object. The object goes through its list of cached prepared statements and sends them messages telling them to dump the contents of their caches. It's not an ideal solution, if two different queries produce identical result sets or ones that overlap in some way then they get cached twice, and the cache invalidation mechanism is extremely primitive (if any change is made, dump everything). Nonetheless this little caching system has had quite a dramatic effect on performance. It's still not as fast as I would like though. :) I reposted the code here in the hope that somebody finds it useful, and/or in case somebody has suggestions for improvement. While the code itself was quite simple in the end, finding the information needed to make it work wasn't. The ATTR_STATEMENT_CLASS parameter is not well documented. Hopefully others can learn from my difficulties here and learn from them. |
|
|||
|
Gordon wrote:
> (Sorry for the repost but I thought this would be appropriate to both > groups. I did tell Google to delete my first post but odds are some > guys got that copy already anyway) > > After a lot of hairpulling, I finally found a mechanism in PHP for > doing what I wanted. I just had to know 2 things: > > 1) How to get the PDO engine to use my customized prepared statement > class instead of PDOStatement > 2) Extending PDOStatement to transparently add results caching is too > difficult and complex in the timeframe required > > Once I knew these things I made a PDOStatement extension class that > instead of trying to transparently add caching to the existing methods > added a couple of new ones instead. Code below: > > <?php > > class Statement extends PDOStatement > { > private $resultCache = array (); > private $database = NULL; > public $hits = 0; > public $misses = 0; > > public function ask (array $params = array ()) > // Executes a prepared statement on the database that fetches > data > { > $hash = md5 (implode (',', $params)); > if (!$this -> resultCache [$hash]) > { > $this -> misses++; > // Execute the query > if ($this -> execute ($params)) > { > // Cache the results > $this -> resultCache [$hash] = $this - >> fetchAll > (PDO::FETCH_ASSOC); > } > } > else > { > $this -> hits++; > } > return ($this -> resultCache [$hash]); > } > public function tell (array $params = array ()) > // Execute a prepared statement that causes the database to be > modified > { > // Execute the query > if ($this -> execute ($params)) > { > $rowCount = $this -> rowCount (); > if ($rowCount) > { > // Tell the parent Database object to > clear statement caches > $this -> database -> clearResults (); > } > return ($rowCount); > } > } > public function clearResults () > // Clear cache > { > $this -> resultCache = array (); > } > private function __construct ($db) > // Class constructor > { > $this -> database = $db; > //print_r ($this); > } > > } > > class Database extends PDO > // Adds some extra functionality to the built in PDO class > { > private $statementCache = array (); > private $txCount = 0; > private $txErr = false; > > // Prepared statement cacheing > public function prepare ($statement, array $options = array > ()) > { > $hash = md5 ($statement); > if ((!isset ($this -> statementCache [$hash])) > || (!is_object ($this -> statementCache [$hash]))) > { > //echo ('Preparing statement "'. > $statement .'"<br>'); > $this -> statementCache [$hash] = > parent::prepare ($statement, > $options); > } > else > { > //echo ('Statement "' . $statement . '" > already prepared<br>'); > } > return ($this -> statementCache [$hash]); > } > public function clearResults () > // Clear the results cache of all associated prepared > statements > { > foreach ($this -> statementCache as $thisStatement) > { > $thisStatement -> clearResults (); > } > } > // Add support for transaction nesting > public function beginTransaction () > { > if (($this -> txCount == 0) && (!$this -> txErr)) > { > $result = parent::beginTransaction (); > } > $this -> txCount ++; > if (DEBUG_TX) > { > echo ('begin: ' . $this -> txCount . ' > transaction(s)<br />'); > } > return ($result); > } > public function commit () > { > $this -> txCount --; > if ($this -> txCount <= 0) > { > $this -> txErr? $result = > parent::rollback (): $result = > parent::commit (); > $this -> txErr = false; > } > if (DEBUG_TX) > { > echo ('commit: ' . $this -> txCount . ' > transaction(s)<br />'); > } > return ($result); > } > public function rollback () > { > $this -> txErr = true; > $this -> txCount --; > if ($this -> txCount <= 0) > { > $result = parent::rollback (); > $this -> txErr = false; > } > if (DEBUG_TX) > { > echo ('rollback: ' . $this -> txCount . ' > transaction(s)<br />'); > } > return ($result); > } > // Housekeeping > private function removeExpiredLocks () > { > $query = 'DELETE FROM cms_locks > WHERE lck_timestamp + > lck_duration < NOW();'; > $preparedQuery = $this -> prepare ($query); > if ($preparedQuery -> execute ()) > { > return ($preparedQuery -> rowCount ()); > } > } > // Class constructor > public function __construct () > { > parent::__construct ( 'pgsql:host=' . CFG_DB_HOST > .' > port=' . CFG_DB_PORT > .' > dbname=' . CFG_DB_DBNAME, > > CFG_DB_USERNAME, > > CFG_DB_PASSWORD); > /* > The documentation for the following line on php.net is > really bad! > > What is does is set what class will be used to handle > prepared > statements. By default > the PDO -> prepare() command returns a prepared > statement as a > PDOStatement class object. > We want to extend PDOStatements to provide some extra > functionality, > so when we run the > prepare () method, we want to return something other > than a > PDOStatement. > > This line tells the database to use our Statement > class for prepared > statements instead > of the PDOStatement class. It also passes a reference > to the > database object that > spawned it to the constructor as an argument. We'll > use this > reference in the Statement > class to send messages back to the Database class > */ > $this -> setAttribute (PDO::ATTR_STATEMENT_CLASS, > array > ('Statement', array ($this))); > $this -> setAttribute (PDO::ATTR_ERRMODE, > PDO::ERRMODE_WARNING); > $this -> query ('SET search_path = ' . CFG_DB_PATH . > ';'); > $this -> removeExpiredLocks (); > } > > } > > The system works like this: I use my Database class where I would > have used PDO to create a database connection. When I prepare() a > statement I either get a new prepared statement back, or the > previously created one is returned. The prepared statement contains > an array that caches the results produced with various parameters. My > prepared statement class also adds 2 database querying methods, ask () > and tell (), that I use where I normally would have used an execute > (). If I want to retrieve information from the database I ask () it > to return the data matching my parameters. Whenever I want to make a > change I tell () it the data I want saved to the database. The ask () > method checks the statement's local cache of results and if it finds > one matching the passed parameters, it returns it. If not, then it > execute()s the query to retrieve them. > > When I tell () the database to make a change the statement that > received the message to do so sends a message to its parent Database > object. The object goes through its list of cached prepared > statements and sends them messages telling them to dump the contents > of their caches. > > It's not an ideal solution, if two different queries produce identical > result sets or ones that overlap in some way then they get cached > twice, and the cache invalidation mechanism is extremely primitive (if > any change is made, dump everything). Nonetheless this little caching > system has had quite a dramatic effect on performance. It's still not > as fast as I would like though. :) > > I reposted the code here in the hope that somebody finds it useful, > and/or in case somebody has suggestions for improvement. While the > code itself was quite simple in the end, finding the information > needed to make it work wasn't. The ATTR_STATEMENT_CLASS parameter is > not well documented. Hopefully others can learn from my difficulties > here and learn from them. Gordon, Thought you might be looking to cache the completed pages (as they are requested). I would have thought something like APC would fit the bill: $key = $_SERVER['HTTP_REFERER']; if ($cached_html = apc_fetch($key) { echo $cached_html; exit; } <rest of your code> ....when you output a page down here you store it like: echo $html; // your output apc_store($key,$html,300); // 300 secs = 5 mins exit; This way you never even touch a query unless the time limit is up. If a page is changed, all you need to do is: apc_delete($key); ....which will automatically re-cache the new page (due to the logic flow) on the next request. -- Norman Registered Linux user #461062 |
![]() |
| Thread Tools | |
| Display Modes | |
|
|