Best way to optimize this database?

This is a discussion on Best way to optimize this database? within the MySQL Database forums, part of the Database Forums category; I have a database that I put together that is about 11+ Million Rows. The .sql dump is about 2....


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 06-23-2007
Derrick Shields
 
Posts: n/a
Default Best way to optimize this database?

I have a database that I put together that is about 11+ Million Rows.
The .sql dump is about 2.5 gigs. I have done NO optimization to this
database yet - I want to make sure I'm doing the correct thing
first.

Ideally I'd like to edit the .sql dump to create any indexes/
partitions that are needed, so that when I execute /. the script
everything will happen automatically.

I believe the two colums that will be searched upon most will be the
"name" column and "city" column, possibly a "state" column as well.
i.e. looking for "smith" in "Brooklyn" or "NY" something like that.

Here is my question:

Because many queries on my Website will be search terms, and the
database is so big, I will create indexes for the name + city
columns. But is that it? Is that all I need to do for an optimized
database?

Some of the Oracle guys I work with on a daily basis suggesting
looking into partitions... is that necessary on a Database this
size?

Second part to the question: if I am editing a .sql dump, where in the
code can I put the CREATE INDEX code? I apologize if this seems
basic, I am not a DBA be any means and until recently had only been
working with small databases an no need for advanced SQL syntax.

Thanks much!

Reply With Quote
  #2 (permalink)  
Old 06-23-2007
J.O. Aho
 
Posts: n/a
Default Re: Best way to optimize this database?

Derrick Shields wrote:
>
> Because many queries on my Website will be search terms, and the
> database is so big, I will create indexes for the name + city
> columns. But is that it? Is that all I need to do for an optimized
> database?


indexing will improve the search time, also searching for a string takes
longer time than searching for a value, so it could be a point to break out
the state to it's own table and you leave a "state key" column instead.
In the new table you can have "state key","state name" and "state short form",
this way the person can search on both NJ and New Jersey. This will also help
you to uniform the data, so you won't have "New Jersey", "new Jersey", "New
jersey", "NewJersey", "NJ" and so on...


> Some of the Oracle guys I work with on a daily basis suggesting
> looking into partitions... is that necessary on a Database this
> size?


Of course the file system affects the speed of the database, I would recommend
JFS, it has a quite good performance overall and is stable too.


> Second part to the question: if I am editing a .sql dump, where in the
> code can I put the CREATE INDEX code? I apologize if this seems
> basic, I am not a DBA be any means and until recently had only been
> working with small databases an no need for advanced SQL syntax.


The .sql file is just a load of SQL statements, create your indexes after you
stored all your data.


--

//Aho
Reply With Quote
  #3 (permalink)  
Old 06-23-2007
Axel Schwenke
 
Posts: n/a
Default Re: Best way to optimize this database?

Derrick Shields <derrick.shields@gmail.com> wrote:

> I have a database that I put together that is about 11+ Million Rows.
> The .sql dump is about 2.5 gigs. I have done NO optimization to this
> database yet - I want to make sure I'm doing the correct thing
> first.
>
> Ideally I'd like to edit the .sql dump to create any indexes/
> partitions that are needed, so that when I execute /. the script
> everything will happen automatically.


Loading databases from a dump is something that is done in an emergency
(i.e. recovering from data loss) or to transport a database.
If you want to add indexes, add them to your live database. Then the
next dump will contain those indexes as well.

> I believe the two colums that will be searched upon most will be the
> "name" column and "city" column, possibly a "state" column as well.
> i.e. looking for "smith" in "Brooklyn" or "NY" something like that.


"searched upon" is a *very* broad field. What exactly do you call a
"search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of
query will not use an ordinary BTREE index. For some such cases of
"search" a FULLTEXT index will help (with special syntax, consult the
manual please). For many other cases a "real" search engine like
Lucene will be needed.

> Here is my question:
>
> Because many queries on my Website will be search terms, and the
> database is so big, I will create indexes for the name + city
> columns. But is that it? Is that all I need to do for an optimized
> database?


Of course not. The typical workflow to optimize a database is like so:

1. normalize your data
2. make sure your data is normalized
3. check again to be 100% sure your data is normalized
4. measure speed of your application; if anything is fast enough: STOP
5. identify[2] bottlenecks[1]
6. find the bottleneck with the most impact and optimize[3] it
7. go back to 4.


[1] A bottleneck can be a single SQL statement that executes slowly
or it can be a table with too much contention. Or else.

[2] MySQL can log queries that need longer than a certain time or
are not using indexes to the slow-query-log.

[3] this can be anything from simply adding indexes, query rewriting,
server parameter tuning to purchasing hardware.
Or even denormalizing.

> Some of the Oracle guys I work with on a daily basis suggesting
> looking into partitions... is that necessary on a Database this
> size?


Partitions are no silver bullet. A DBA with brain is.


XL
--
Axel Schwenke, Support Engineer, MySQL AB

Online User Manual: http://dev.mysql.com/doc/refman/5.0/en/
MySQL User Forums: http://forums.mysql.com/
Reply With Quote
  #4 (permalink)  
Old 06-23-2007
Derrick Shields
 
Posts: n/a
Default Re: Best way to optimize this database?

On Jun 23, 3:29 am, Axel Schwenke <axel.schwe...@gmx.de> wrote:
> Derrick Shields <derrick.shie...@gmail.com> wrote:
> > I have a database that I put together that is about 11+ Million Rows.
> > The .sql dump is about 2.5 gigs. I have done NO optimization to this
> > database yet - I want to make sure I'm doing the correct thing
> > first.

>
> > Ideally I'd like to edit the .sql dump to create any indexes/
> > partitions that are needed, so that when I execute /. the script
> > everything will happen automatically.

>
> Loading databases from a dump is something that is done in an emergency
> (i.e. recovering from data loss) or to transport a database.
> If you want to add indexes, add them to your live database. Then the
> next dump will contain those indexes as well.
>
> > I believe the two colums that will be searched upon most will be the
> > "name" column and "city" column, possibly a "state" column as well.
> > i.e. looking for "smith" in "Brooklyn" or "NY" something like that.

>
> "searched upon" is a *very* broad field. What exactly do you call a
> "search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of
> query will not use an ordinary BTREE index. For some such cases of
> "search" a FULLTEXT index will help (with special syntax, consult the
> manual please). For many other cases a "real" search engine like
> Lucene will be needed.
>
> > Here is my question:

>
> > Because many queries on my Website will be search terms, and the
> > database is so big, I will create indexes for the name + city
> > columns. But is that it? Is that all I need to do for an optimized
> > database?

>
> Of course not. The typical workflow to optimize a database is like so:
>
> 1. normalize your data
> 2. make sure your data is normalized
> 3. check again to be 100% sure your data is normalized
> 4. measure speed of your application; if anything is fast enough: STOP
> 5. identify[2] bottlenecks[1]
> 6. find the bottleneck with the most impact and optimize[3] it
> 7. go back to 4.
>
> [1] A bottleneck can be a single SQL statement that executes slowly
> or it can be a table with too much contention. Or else.
>
> [2] MySQL can log queries that need longer than a certain time or
> are not using indexes to the slow-query-log.
>
> [3] this can be anything from simply adding indexes, query rewriting,
> server parameter tuning to purchasing hardware.
> Or even denormalizing.
>
> > Some of the Oracle guys I work with on a daily basis suggesting
> > looking into partitions... is that necessary on a Database this
> > size?

>
> Partitions are no silver bullet. A DBA with brain is.
>
> XL
> --
> Axel Schwenke, Support Engineer, MySQL AB
>
> Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/
> MySQL User Forums: http://forums.mysql.com/


Thanks for the responses guys. A typical search might be on "Target"
in "New Jersey" for example. So, I would imagine a query would look
like this:

SELECT * FROM places WHERE name LIKE '%target%' AND state = 'NJ';

Given this syntax, however, I know that putting wildcards on both
sides of "target" won't allow the index to work (correct?)

So I need to figure out how to have the search results return both
"Super Target" and "Target Greatland".

Would I still be doing an index on the "name" column? Would it make
sense to do something like this:

SELECT * FROM places WHERE name LIKE '%target' OR name LIKE 'target%',
AND state = 'NJ';

I really appreciate the help on this. This is pretty new to me, so
it's greatly appreciated.

Reply With Quote
  #5 (permalink)  
Old 06-23-2007
Paul Lautman
 
Posts: n/a
Default Re: Best way to optimize this database?

Derrick Shields wrote:
> On Jun 23, 3:29 am, Axel Schwenke <axel.schwe...@gmx.de> wrote:
>> Derrick Shields <derrick.shie...@gmail.com> wrote:
>>> I have a database that I put together that is about 11+ Million
>>> Rows. The .sql dump is about 2.5 gigs. I have done NO optimization
>>> to this database yet - I want to make sure I'm doing the correct
>>> thing first.

>>
>>> Ideally I'd like to edit the .sql dump to create any indexes/
>>> partitions that are needed, so that when I execute /. the script
>>> everything will happen automatically.

>>
>> Loading databases from a dump is something that is done in an
>> emergency (i.e. recovering from data loss) or to transport a
>> database.
>> If you want to add indexes, add them to your live database. Then the
>> next dump will contain those indexes as well.
>>
>>> I believe the two colums that will be searched upon most will be the
>>> "name" column and "city" column, possibly a "state" column as well.
>>> i.e. looking for "smith" in "Brooklyn" or "NY" something like that.

>>
>> "searched upon" is a *very* broad field. What exactly do you call a
>> "search"? SELECT ... WHERE ... LIKE '%search term%'? This kind of
>> query will not use an ordinary BTREE index. For some such cases of
>> "search" a FULLTEXT index will help (with special syntax, consult the
>> manual please). For many other cases a "real" search engine like
>> Lucene will be needed.
>>
>>> Here is my question:

>>
>>> Because many queries on my Website will be search terms, and the
>>> database is so big, I will create indexes for the name + city
>>> columns. But is that it? Is that all I need to do for an optimized
>>> database?

>>
>> Of course not. The typical workflow to optimize a database is like
>> so:
>>
>> 1. normalize your data
>> 2. make sure your data is normalized
>> 3. check again to be 100% sure your data is normalized
>> 4. measure speed of your application; if anything is fast enough:
>> STOP
>> 5. identify[2] bottlenecks[1]
>> 6. find the bottleneck with the most impact and optimize[3] it
>> 7. go back to 4.
>>
>> [1] A bottleneck can be a single SQL statement that executes slowly
>> or it can be a table with too much contention. Or else.
>>
>> [2] MySQL can log queries that need longer than a certain time or
>> are not using indexes to the slow-query-log.
>>
>> [3] this can be anything from simply adding indexes, query rewriting,
>> server parameter tuning to purchasing hardware.
>> Or even denormalizing.
>>
>>> Some of the Oracle guys I work with on a daily basis suggesting
>>> looking into partitions... is that necessary on a Database this
>>> size?

>>
>> Partitions are no silver bullet. A DBA with brain is.
>>
>> XL
>> --
>> Axel Schwenke, Support Engineer, MySQL AB
>>
>> Online User Manual:http://dev.mysql.com/doc/refman/5.0/en/
>> MySQL User Forums: http://forums.mysql.com/

>
> Thanks for the responses guys. A typical search might be on "Target"
> in "New Jersey" for example. So, I would imagine a query would look
> like this:
>
> SELECT * FROM places WHERE name LIKE '%target%' AND state = 'NJ';
>
> Given this syntax, however, I know that putting wildcards on both
> sides of "target" won't allow the index to work (correct?)
>
> So I need to figure out how to have the search results return both
> "Super Target" and "Target Greatland".
>
> Would I still be doing an index on the "name" column? Would it make
> sense to do something like this:
>
> SELECT * FROM places WHERE name LIKE '%target' OR name LIKE 'target%',
> AND state = 'NJ';
>
> I really appreciate the help on this. This is pretty new to me, so
> it's greatly appreciated.


Axel already answered this (hint: FULLTEXT)


Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 08:30 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0