This is a discussion on CSV speed within the PHP General forums, part of the PHP Programming Forums category; I have about 10 csv files I need to open to access data. It takes a lot of time to ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have about 10 csv files I need to open to access data. It takes a lot
of time to search each file for the values I need. Would it be best to just dump all the cvs files to an SQL db and then just grab what I need from there? I'm starting to think it would make a lot of sense. What do you guys think? Thanks, Dan -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. |
|
|||
|
Danny Brow wrote:
> I have about 10 csv files I need to open to access data. It takes a lot > of time to search each file for the values I need. Would it be best to > just dump all the cvs files to an SQL db and then just grab what I need > from there? I'm starting to think it would make a lot of sense. What do > you guys think? > > Thanks, > Dan Dan, I can tell you that depending on the size of your files is going to dictate the route you want to go. I have a CSV with 568,000+ lines with 19 different pieces to each line. The files are around 180M apiece and it takes my server about 2 seconds to run a system grep against the files. I can run a recursive call 7 times against a MySQL database with the same information and it takes it about 4 seconds. IF you have system call ability, a grep wouldn't be bad, otherwise I'd suggest loading the csv files into MySQL tables and checking them for the information, then dropping the tables when you get the next files. You can backup the databases such as a cron job overnight even. HTH, Wolf |
|
|||
|
On Mon, 2008-03-10 at 22:36 -0400, Wolf wrote:
> Danny Brow wrote: > > I have about 10 csv files I need to open to access data. It takes a lot > > of time to search each file for the values I need. Would it be best to > > just dump all the cvs files to an SQL db and then just grab what I need > > from there? I'm starting to think it would make a lot of sense. What do > > you guys think? > > > > Thanks, > > Dan > > > Dan, > > I can tell you that depending on the size of your files is going to > dictate the route you want to go. I have a CSV with 568,000+ lines with > 19 different pieces to each line. The files are around 180M apiece and > it takes my server about 2 seconds to run a system grep against the > files. I can run a recursive call 7 times against a MySQL database with > the same information and it takes it about 4 seconds. > > IF you have system call ability, a grep wouldn't be bad, otherwise I'd > suggest loading the csv files into MySQL tables and checking them for > the information, then dropping the tables when you get the next files. > You can backup the databases such as a cron job overnight even. > > HTH, > Wolf > > Thanks that sounds like a good idea. I'm still plugging away with how I started. I want to know how much faster it will be going with a db. I was actually thinking of using diff for each updated file to and upload that to the DB. Dan -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. |
|
|||
|
On Mon, Mar 10, 2008 at 8:09 PM, Danny Brow <maillist@fullmotions.com> wrote:
> I have about 10 csv files I need to open to access data. It takes a lot > of time to search each file for the values I need. Would it be best to > just dump all the cvs files to an SQL db and then just grab what I need > from there? I'm starting to think it would make a lot of sense. What do > you guys think? grep foo whatever.csv | php ./script.php -- Greg Donald http://destiney.com/ |
|
|||
|
Danny Brow wrote:
> On Mon, 2008-03-10 at 22:36 -0400, Wolf wrote: >> Danny Brow wrote: >>> I have about 10 csv files I need to open to access data. It takes a lot >>> of time to search each file for the values I need. Would it be best to >>> just dump all the cvs files to an SQL db and then just grab what I need >>> from there? I'm starting to think it would make a lot of sense. What do >>> you guys think? >>> >>> Thanks, >>> Dan >> >> Dan, >> >> I can tell you that depending on the size of your files is going to >> dictate the route you want to go. I have a CSV with 568,000+ lines with >> 19 different pieces to each line. The files are around 180M apiece and >> it takes my server about 2 seconds to run a system grep against the >> files. I can run a recursive call 7 times against a MySQL database with >> the same information and it takes it about 4 seconds. >> >> IF you have system call ability, a grep wouldn't be bad, otherwise I'd >> suggest loading the csv files into MySQL tables and checking them for >> the information, then dropping the tables when you get the next files. >> You can backup the databases such as a cron job overnight even. >> >> HTH, >> Wolf >> >> > > > Thanks that sounds like a good idea. I'm still plugging away with how I > started. I want to know how much faster it will be going with a db. I > was actually thinking of using diff for each updated file to and upload > that to the DB. > > Dan Running a diff and loading the changes wouldn't be a bad way to go, one thing to take into account as well would be any tracking of the changes that you would need to do as well. IE, if you update an entry, write it to a changes table with a date attached as to when it happened. Wolf |
|
|||
|
Greg Donald wrote:
> On Mon, Mar 10, 2008 at 8:09 PM, Danny Brow <maillist@fullmotions.com> wrote: >> I have about 10 csv files I need to open to access data. It takes a lot >> of time to search each file for the values I need. Would it be best to >> just dump all the cvs files to an SQL db and then just grab what I need >> from there? I'm starting to think it would make a lot of sense. What do >> you guys think? > > grep foo whatever.csv | php ./script.php While that could work if the server is set up to allow you to run system calls, but you don't find that available outside of test environments. Also depending on the file sizes, you'll see some differences. Wolf |
|
|||
|
I should say that running 7 quueries against 200MB table with ~600 000 rows
for 4 seconds looks to me as if there was no indexes and very poor database design (i mean that CVS file is dumped into database incorrectly). Or you'r server is Pentium 100 :) P.S. Having myself a 1.5GB database with some tables 300MB+, having 1070 queries/sec average for last month. 2008/3/11, Wolf <LoneWolf@nc.rr.com>: > > Danny Brow wrote: > > I have about 10 csv files I need to open to access data. It takes a lot > > of time to search each file for the values I need. Would it be best to > > just dump all the cvs files to an SQL db and then just grab what I need > > from there? I'm starting to think it would make a lot of sense. What do > > you guys think? > > > > Thanks, > > Dan > > > > Dan, > > I can tell you that depending on the size of your files is going to > dictate the route you want to go. I have a CSV with 568,000+ lines with > 19 different pieces to each line. The files are around 180M apiece and > it takes my server about 2 seconds to run a system grep against the > files. I can run a recursive call 7 times against a MySQL database with > the same information and it takes it about 4 seconds. > > IF you have system call ability, a grep wouldn't be bad, otherwise I'd > suggest loading the csv files into MySQL tables and checking them for > the information, then dropping the tables when you get the next files. > You can backup the databases such as a cron job overnight even. > > HTH, > > Wolf > > > > -- > PHP General Mailing List (http://www.php.net/) > To unsubscribe, visit: http://www.php.net/unsub.php > > |
|
|||
|
Wolf wrote:
> Danny Brow wrote: >> I have about 10 csv files I need to open to access data. It takes a lot >> of time to search each file for the values I need. Would it be best to >> just dump all the cvs files to an SQL db and then just grab what I need >> from there? I'm starting to think it would make a lot of sense. What do >> you guys think? >> >> Thanks, >> Dan > > > Dan, > > I can tell you that depending on the size of your files is going to > dictate the route you want to go. I have a CSV with 568,000+ lines with > 19 different pieces to each line. The files are around 180M apiece and > it takes my server about 2 seconds to run a system grep against the > files. I can run a recursive call 7 times against a MySQL database with > the same information and it takes it about 4 seconds. > > IF you have system call ability, a grep wouldn't be bad, otherwise I'd > suggest loading the csv files into MySQL tables and checking them for > the information, then dropping the tables when you get the next files. > You can backup the databases such as a cron job overnight even. > > HTH, > Wolf > > If you do go the MySQL route, MySQL can import CSV files natively, and it'll be a lot faster than doing it through PHP. Just look up the syntax for the LOAD DATA INFILE command... or look here. http://dev.mysql.com/doc/refman/5.1/en/load-data.html Here's an example: LOAD DATA LOCAL INFILE '/importfile.csv' INTO TABLE test_table FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' (field1, filed2, field3); I've had to do imports of a million or more records from CSV files, and PHP is a lot slower than MySQL at importing them :) -- Ray Hauge www.primateapplications.com |