This is a discussion on separate table vs. array-like list field within the MySQL Database forums, part of the Database Forums category; I'm very, very new to mysql and I've been toying with it the last couple weeks. I've ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I'm very, very new to mysql and I've been toying with it the last
couple weeks. I've hit upon a theoretical question that I would like to run by more experienced users. One of the things I've been doing to teach myself is creating databases and tables that might be similar to databases powering big, popular sites like Flickr or Digg. For example, I created a user table (id [primary key], name, nickname, email, location, birthday, profile, etc.) and then played around with querying it in various ways. One of the things I did to learn how to use "join" was to create a separate table for blocked users. Sites like Flickr and Digg often give users the ability to block other users. The way I implemented this was to create a table with the blocker's id and the "blockee's" id. Using that, I can query the blocked users table and cross-reference it to the user list and come up with the list of blocked users for any specified user. Pretty cool. So I wonder. On a big site with a million+ users, would having to query a separate table for a list of blocked users like that bog down the site or would it be preferable (in terms of speed and efficiency) to add a field to the user table that lists the id numbers of that user's blocked users, sort of a pseudo-array that I collapse and expand when needed (e.g., a blocked user array-like list might look like "150|11516|61878|9091|41064" etc. Any thoughts? Or am I still so new at this that this scenario doesn't make sense? --Rick |
|
|||
|
"Rick" <inkswamp@hotmail.com> wrote in message
news:1142819964.329383.55800@t31g2000cwb.googlegro ups.com... > So I wonder. On a big site with a million+ users, would having to query > a separate table for a list of blocked users like that bog down the > site or would it be preferable (in terms of speed and efficiency) to > add a field to the user table that lists the id numbers of that user's > blocked users, sort of a pseudo-array that I collapse and expand when > needed (e.g., a blocked user array-like list might look like > "150|11516|61878|9091|41064" etc. It's better to use another table. There are several reasons for this. - You can create constraints so that you know each blocker/blockee id in the table corresponds to an existing entry in the users table. - You don't have to worry about the string growing too long to fit in the field; just add records as needed. - You create and use your database based on the data model, not the physical model (this is kind of an abstract reason and hard to appreciate if you're new to database programming, but it more makes sense after you've had to manage badly-designed databases for a while). - Doing the join is something an RDBMS is designed to do efficiently. Expanding/collapsing a long array of numbers into a string is not. There are internal features (indexes) that may actually make the join be more speedy and efficient than implementing the same model using a string like you describe. Also, adding and deleting elements by inserting/deleting records is easier and more efficient. Take for example deleting. You'd have to fetch a really long string, search for the relevant id number, make a new string consisting of the prefix and suffix of the string, and then update the table to save the new string. Compare with deleting a row simply by using DELETE with appropriate WHERE criteria. No doubt there are other reasons why it's better to use the separate table instead of the list-in-a-string. Regards, Bill K. |
|
|||
|
Rick wrote:
> I'm very, very new to mysql and I've been toying with it the last > couple weeks. I've hit upon a theoretical question that I would like to > run by more experienced users. > > One of the things I've been doing to teach myself is creating databases > and tables that might be similar to databases powering big, popular > sites like Flickr or Digg. For example, I created a user table (id > [primary key], name, nickname, email, location, birthday, profile, > etc.) and then played around with querying it in various ways. > > One of the things I did to learn how to use "join" was to create a > separate table for blocked users. Sites like Flickr and Digg often give > users the ability to block other users. The way I implemented this was > to create a table with the blocker's id and the "blockee's" id. Using > that, I can query the blocked users table and cross-reference it to the > user list and come up with the list of blocked users for any specified > user. Pretty cool. > > So I wonder. On a big site with a million+ users, would having to query > a separate table for a list of blocked users like that bog down the > site or would it be preferable (in terms of speed and efficiency) to > add a field to the user table that lists the id numbers of that user's > blocked users, sort of a pseudo-array that I collapse and expand when > needed (e.g., a blocked user array-like list might look like > "150|11516|61878|9091|41064" etc. > > Any thoughts? Or am I still so new at this that this scenario doesn't > make sense? > > --Rick > First of all, you should check into database normalization. It helps understand how to design databases. Additionally, it is much more efficient to let the database do the searches rather than try to explode a field yourself, especially when you get into a lot of id's. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
|
|||
|
>I'm very, very new to mysql and I've been toying with it the last
>couple weeks. I've hit upon a theoretical question that I would like to >run by more experienced users. > >One of the things I've been doing to teach myself is creating databases >and tables that might be similar to databases powering big, popular >sites like Flickr or Digg. For example, I created a user table (id >[primary key], name, nickname, email, location, birthday, profile, >etc.) and then played around with querying it in various ways. > >One of the things I did to learn how to use "join" was to create a >separate table for blocked users. Sites like Flickr and Digg often give >users the ability to block other users. The way I implemented this was >to create a table with the blocker's id and the "blockee's" id. Using >that, I can query the blocked users table and cross-reference it to the >user list and come up with the list of blocked users for any specified >user. Pretty cool. You can also do other things joining this table against other tables, like "get me a list of articles whose authors are NOT on the blocked list of this user". >So I wonder. On a big site with a million+ users, would having to query >a separate table for a list of blocked users like that bog down the >site or would it be preferable (in terms of speed and efficiency) to >add a field to the user table that lists the id numbers of that user's >blocked users, sort of a pseudo-array that I collapse and expand when >needed (e.g., a blocked user array-like list might look like >"150|11516|61878|9091|41064" etc. Consider what happens when you have a million+ users, EACH OF WHOM BLOCK HALF OF THE OTHER USERS. All that exploding and unexploding can get expensive. And you have to be prepared for the huge string lengths even if one user does that. Now, how do you retrieve a list of articles by unblocked authors? Assuming there's a reason to do it, how do you retrieve a list of users and how many other users blocked them? >Any thoughts? Or am I still so new at this that this scenario doesn't >make sense? Indexes can speed up lookups enormously. Your blocked user list with an index in (blocking user, blocked user) can quickly check the presence of an entry even if there's half a trillion entries in the table. Gordon L. Burditt |
|
|||
|
Thank you very much, all three of you.
Strangely, I would have bet (prior to this) that the pseudo-array approach would have been faster, but I appreciate the three of you taking the time to answer so thoroughly. What you've written makes a lot of sense and shows me where I'm making assumptions I ought not to make. |
|
|||
|
Rick wrote:
>Thank you very much, all three of you. > >Strangely, I would have bet (prior to this) that the pseudo-array >approach would have been faster, but I appreciate the three of you >taking the time to answer so thoroughly. What you've written makes a >lot of sense and shows me where I'm making assumptions I ought not to >make. > > > And thanks for asking this question. As a novice user of databases I have run into situations where storing comma separated data in a column seemed to make sense, and I have wondered about this myself. -- ***************************** Chuck Anderson • Boulder, CO http://www.CycleTourist.com Integrity is obvious. The lack of it is common. ***************************** |