This is a discussion on Best way to optimize this database? within the MySQL Database forums, part of the Database Forums category; I have a database that I put together that is about 11+ Million Rows. The .sql dump is about 2....
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I have a database that I put together that is about 11+ Million Rows.
The .sql dump is about 2.5 gigs. I have done NO optimization to this database yet - I want to make sure I'm doing the correct thing first. Ideally I'd like to edit the .sql dump to create any indexes/ partitions that are needed, so that when I execute /. the script everything will happen automatically. I believe the two colums that will be searched upon most will be the "name" column and "city" column, possibly a "state" column as well. i.e. looking for "smith" in "Brooklyn" or "NY" something like that. Here is my question: Because many queries on my Website will be search terms, and the database is so big, I will create indexes for the name + city columns. But is that it? Is that all I need to do for an optimized database? Some of the Oracle guys I work with on a daily basis suggesting looking into partitions... is that necessary on a Database this size? Second part to the question: if I am editing a .sql dump, where in the code can I put the CREATE INDEX code? I apologize if this seems basic, I am not a DBA be any means and until recently had only been working with small databases an no need for advanced SQL syntax. Thanks much! |
|
|||
|
Derrick Shields wrote:
> > Because many queries on my Website will be search terms, and the > database is so big, I will create indexes for the name + city > columns. But is that it? Is that all I need to do for an optimized > database? indexing will improve the search time, also searching for a string takes longer time than searching for a value, so it could be a point to break out the state to it's own table and you leave a "state key" column instead. In the new table you can have "state key","state name" and "state short form", this way the person can search on both NJ and New Jersey. This will also help you to uniform the data, so you won't have "New Jersey", "new Jersey", "New jersey", "NewJersey", "NJ" and so on... > Some of the Oracle guys I work with on a daily basis suggesting > looking into partitions... is that necessary on a Database this > size? Of course the file system affects the speed of the database, I would recommend JFS, it has a quite good performance overall and is stable too. > Second part to the question: if I am editing a .sql dump, where in the > code can I put the CREATE INDEX code? I apologize if this seems > basic, I am not a DBA be any means and until recently had only been > working with small databases an no need for advanced SQL syntax. The .sql file is just a load of SQL statements, create your indexes after you stored all your data. -- //Aho |
|
|||
|
Derrick Shields <derrick.shields@gmail.com> wrote:
> I have a database that I put together that is about 11+ Million Rows. > The .sql dump is about 2.5 gigs. I have done NO optimization to this > database yet - I want to make sure I'm doing the correct thing > first. > > Ideally I'd like to edit the .sql dump to create any indexes/ > partitions that are needed, so that when I execute /. the script > everything will happen automatically. Loading databases from a dump is something that is done in an emergency (i.e. recovering from data loss) or to transport a database. If you want to add indexes, add them to your live database. Then the next dump will contain those indexes as well. > I believe the two colums that will be searched upon most will be the > "name" column and "city" column, possibly a "state" column as well. > i.e. looking for "smith" in "Brooklyn" or "NY" something like that. "searched upon" is a *very* broad field. What exactly do you call a "search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of query will not use an ordinary BTREE index. For some such cases of "search" a FULLTEXT index will help (with special syntax, consult the manual please). For many other cases a "real" search engine like Lucene will be needed. > Here is my question: > > Because many queries on my Website will be search terms, and the > database is so big, I will create indexes for the name + city > columns. But is that it? Is that all I need to do for an optimized > database? Of course not. The typical workflow to optimize a database is like so: 1. normalize your data 2. make sure your data is normalized 3. check again to be 100% sure your data is normalized 4. measure speed of your application; if anything is fast enough: STOP 5. identify[2] bottlenecks[1] 6. find the bottleneck with the most impact and optimize[3] it 7. go back to 4. [1] A bottleneck can be a single SQL statement that executes slowly or it can be a table with too much contention. Or else. [2] MySQL can log queries that need longer than a certain time or are not using indexes to the slow-query-log. [3] this can be anything from simply adding indexes, query rewriting, server parameter tuning to purchasing hardware. Or even denormalizing. > Some of the Oracle guys I work with on a daily basis suggesting > looking into partitions... is that necessary on a Database this > size? Partitions are no silver bullet. A DBA with brain is. XL -- Axel Schwenke, Support Engineer, MySQL AB Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/ MySQL User Forums: http://forums.mysql.com/ |
|
|||
|
On Jun 23, 3:29 am, Axel Schwenke <axel.schwe...@gmx.de> wrote:
> Derrick Shields <derrick.shie...@gmail.com> wrote: > > I have a database that I put together that is about 11+ Million Rows. > > The .sql dump is about 2.5 gigs. I have done NO optimization to this > > database yet - I want to make sure I'm doing the correct thing > > first. > > > Ideally I'd like to edit the .sql dump to create any indexes/ > > partitions that are needed, so that when I execute /. the script > > everything will happen automatically. > > Loading databases from a dump is something that is done in an emergency > (i.e. recovering from data loss) or to transport a database. > If you want to add indexes, add them to your live database. Then the > next dump will contain those indexes as well. > > > I believe the two colums that will be searched upon most will be the > > "name" column and "city" column, possibly a "state" column as well. > > i.e. looking for "smith" in "Brooklyn" or "NY" something like that. > > "searched upon" is a *very* broad field. What exactly do you call a > "search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of > query will not use an ordinary BTREE index. For some such cases of > "search" a FULLTEXT index will help (with special syntax, consult the > manual please). For many other cases a "real" search engine like > Lucene will be needed. > > > Here is my question: > > > Because many queries on my Website will be search terms, and the > > database is so big, I will create indexes for the name + city > > columns. But is that it? Is that all I need to do for an optimized > > database? > > Of course not. The typical workflow to optimize a database is like so: > > 1. normalize your data > 2. make sure your data is normalized > 3. check again to be 100% sure your data is normalized > 4. measure speed of your application; if anything is fast enough: STOP > 5. identify[2] bottlenecks[1] > 6. find the bottleneck with the most impact and optimize[3] it > 7. go back to 4. > > [1] A bottleneck can be a single SQL statement that executes slowly > or it can be a table with too much contention. Or else. > > [2] MySQL can log queries that need longer than a certain time or > are not using indexes to the slow-query-log. > > [3] this can be anything from simply adding indexes, query rewriting, > server parameter tuning to purchasing hardware. > Or even denormalizing. > > > Some of the Oracle guys I work with on a daily basis suggesting > > looking into partitions... is that necessary on a Database this > > size? > > Partitions are no silver bullet. A DBA with brain is. > > XL > -- > Axel Schwenke, Support Engineer, MySQL AB > > Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/ > MySQL User Forums: http://forums.mysql.com/ Thanks for the responses guys. A typical search might be on "Target" in "New Jersey" for example. So, I would imagine a query would look like this: SELECT * FROM places WHERE name LIKE '%target%' AND state = 'NJ'; Given this syntax, however, I know that putting wildcards on both sides of "target" won't allow the index to work (correct?) So I need to figure out how to have the search results return both "Super Target" and "Target Greatland". Would I still be doing an index on the "name" column? Would it make sense to do something like this: SELECT * FROM places WHERE name LIKE '%target' OR name LIKE 'target%', AND state = 'NJ'; I really appreciate the help on this. This is pretty new to me, so it's greatly appreciated. |
|
|||
|
Derrick Shields wrote:
> On Jun 23, 3:29 am, Axel Schwenke <axel.schwe...@gmx.de> wrote: >> Derrick Shields <derrick.shie...@gmail.com> wrote: >>> I have a database that I put together that is about 11+ Million >>> Rows. The .sql dump is about 2.5 gigs. I have done NO optimization >>> to this database yet - I want to make sure I'm doing the correct >>> thing first. >> >>> Ideally I'd like to edit the .sql dump to create any indexes/ >>> partitions that are needed, so that when I execute /. the script >>> everything will happen automatically. >> >> Loading databases from a dump is something that is done in an >> emergency (i.e. recovering from data loss) or to transport a >> database. >> If you want to add indexes, add them to your live database. Then the >> next dump will contain those indexes as well. >> >>> I believe the two colums that will be searched upon most will be the >>> "name" column and "city" column, possibly a "state" column as well. >>> i.e. looking for "smith" in "Brooklyn" or "NY" something like that. >> >> "searched upon" is a *very* broad field. What exactly do you call a >> "search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of >> query will not use an ordinary BTREE index. For some such cases of >> "search" a FULLTEXT index will help (with special syntax, consult the >> manual please). For many other cases a "real" search engine like >> Lucene will be needed. >> >>> Here is my question: >> >>> Because many queries on my Website will be search terms, and the >>> database is so big, I will create indexes for the name + city >>> columns. But is that it? Is that all I need to do for an optimized >>> database? >> >> Of course not. The typical workflow to optimize a database is like >> so: >> >> 1. normalize your data >> 2. make sure your data is normalized >> 3. check again to be 100% sure your data is normalized >> 4. measure speed of your application; if anything is fast enough: >> STOP >> 5. identify[2] bottlenecks[1] >> 6. find the bottleneck with the most impact and optimize[3] it >> 7. go back to 4. >> >> [1] A bottleneck can be a single SQL statement that executes slowly >> or it can be a table with too much contention. Or else. >> >> [2] MySQL can log queries that need longer than a certain time or >> are not using indexes to the slow-query-log. >> >> [3] this can be anything from simply adding indexes, query rewriting, >> server parameter tuning to purchasing hardware. >> Or even denormalizing. >> >>> Some of the Oracle guys I work with on a daily basis suggesting >>> looking into partitions... is that necessary on a Database this >>> size? >> >> Partitions are no silver bullet. A DBA with brain is. >> >> XL >> -- >> Axel Schwenke, Support Engineer, MySQL AB >> >> Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/ >> MySQL User Forums: http://forums.mysql.com/ > > Thanks for the responses guys. A typical search might be on "Target" > in "New Jersey" for example. So, I would imagine a query would look > like this: > > SELECT * FROM places WHERE name LIKE '%target%' AND state = 'NJ'; > > Given this syntax, however, I know that putting wildcards on both > sides of "target" won't allow the index to work (correct?) > > So I need to figure out how to have the search results return both > "Super Target" and "Target Greatland". > > Would I still be doing an index on the "name" column? Would it make > sense to do something like this: > > SELECT * FROM places WHERE name LIKE '%target' OR name LIKE 'target%', > AND state = 'NJ'; > > I really appreciate the help on this. This is pretty new to me, so > it's greatly appreciated. Axel already answered this (hint: FULLTEXT) |