This is a discussion on Importing pages within the PHP Language forums, part of the PHP Programming Forums category.
Hi all
I've written a content management system that I'm now selling to my customers. It's very nice when we have a blank canvas of a site, but a pain in the arse when there is already a site in place.

What I'm in the process of *trying* to put together is a script that would do the following:

A simple form where you put the address of the site with the static pages.
The script then spiders through the site, takes everything between <body> and </body> and chucks the rest away.
It would then take out all class definitions and all embedded styles like font tags etc, but leave tables, <p>, <H?> etc.

This would leave a very plain page of HTML that would be inserted into a database. CSS would control the fonts etc.

I'm aware that there would need to be some tidying up if there was any JavaScript or anything, and also some basic formatting.

What I want to know is:

1. Has it been done and, if so, where might I find something like this?
2. Might it have any commercial value to other developers?

Regarding 2, I'm thinking how much time something like this might save me if I have to convert anything more than a few pages of static HTML into something that I can put in a database.

Your thoughts would be appreciated.

Andy
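
The clean-up step described above (keep what sits between <body> and </body>, drop class and style attributes, unwrap font tags, keep tables, paragraphs and headings) could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the poster's script and not PHP; the names `BodyCleaner` and `clean_body` and the three policy sets at the top are assumptions made for the example. It also assumes the page has an explicit <body> tag.

```python
from html import escape
from html.parser import HTMLParser

# Assumed policy (not from the original post): tags to unwrap (keep their
# text), tags to drop together with their contents, and attributes treated
# as presentational.
UNWRAP_TAGS = {"font", "center"}
SKIP_TAGS = {"script", "style"}
STRIP_ATTRS = {"class", "style", "align", "bgcolor", "color", "face", "size"}


class BodyCleaner(HTMLParser):
    """Collects a plain version of the markup found inside <body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of cleaned markup
        self.in_body = False   # True between <body> and </body>
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if not self.in_body:
            return
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP_TAGS:
            return
        # Re-emit the tag without presentational attributes.
        kept = [(k, v) for k, v in attrs if k not in STRIP_ATTRS]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br /> are emitted once, with no end tag.
        if tag in SKIP_TAGS or tag in UNWRAP_TAGS:
            return
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
            return
        if not self.in_body:
            return
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped because the parser has already decoded entities.
        if self.in_body and not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_body(html_text):
    """Return the cleaned body markup of one page as a string."""
    parser = BodyCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out).strip()
```

Calling `clean_body(page_source)` on the text of one page returns a string that could then be stored in a database column. Comments and the doctype are dropped because the parser's default handlers ignore them.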
"AJ" <nospam@redcatmedia.net> wrote in message
news:<cirrgo$egu$1@hercules.btinternet.com>...

> The script then spiders through the site, takes everything between
> <body> and </body> and chucks the rest away

Bad idea. As of HTML 4.0, <head> and <body> tags are optional... Also, why spider the site, if you can (theoretically, at least) crawl the local file system?

> 1. Has it been done and, if so, where might I find something like this

The spidering part along with storing in databases is what search engines do. What you need to add is the processing in-between.

> 2. Might it have any commercial value to other developers?

Developers, I doubt it. Content managers, possibly...

Cheers,
NC
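
Both points in this reply can be illustrated with a short sketch: walking the local file system instead of spidering over HTTP, and not assuming that every page has a <body> tag. The example is in Python with the standard library only and is an illustration under stated assumptions, not anything posted in the thread. The function names `find_static_pages` and `body_or_whole`, the file extensions, and the regex-based fallback are all hypothetical choices made for the example. When <head> and <body> are both omitted, head-only elements such as <title> stay in the result and would need separate handling.

```python
import re
from pathlib import Path

# Assumed file extensions for static pages.
HTML_SUFFIXES = {".html", ".htm"}

# Matches the body contents; tolerates a missing </body> by running to the end.
BODY_RE = re.compile(
    r"<body\b[^>]*>(.*?)(?:</body\s*>|\Z)", re.IGNORECASE | re.DOTALL
)
# Matches an explicit <head>...</head> block so it can be removed.
HEAD_RE = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def find_static_pages(root):
    """Yield every HTML file below root, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in HTML_SUFFIXES:
            yield path


def body_or_whole(html_text):
    """Return the <body> contents, or the whole page minus any explicit
    <head> block when the <body> tag has been omitted."""
    match = BODY_RE.search(html_text)
    if match:
        return match.group(1).strip()
    return HEAD_RE.sub("", html_text).strip()
```

A driver loop would read each path yielded by `find_static_pages` and pass the text to `body_or_whole` before any further clean-up. A regular expression is only a rough tool here; a real HTML parser would cope better with malformed pages.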