This is a discussion on consolidation of multiple rows within the MySQL Database forums, part of the Database Forums category; Greetings all, I've got a table raw_Companies --nname --address --phone --fax --email --homepage with no keys defined and a ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Greetings all,
I've got a table raw_Companies --nname --address --phone --fax --homepage with no keys defined and a table Companies with the same structure but with nname declared as a primary key. Once raw_Companies gets loaded up I see where there are multiple rows with the same value for nname but missing data in the other fields. One row may have an address and phone while another will have only an email, while a third may have an address and a homepage. I need a way to take the rows in raw_Companies with the same nname and consolidate them into one composite row inserted into Companies. So from IBM----1234 Main Street,Rochester,NY----\N----\N----sales@ibm.com---- www.ibm.com IBM----\N----212-745-9324----212-745-0158----\N----\N in raw_Companies to IBM----1234 Main Street,Rochester,NY----212-745-9324----212-745-0158---- sales@ibm.com----www.ibm.com in Companies Any help/hints/suggestions/code would be greatly appreciated. TIA, Still-learning Steve |
|
|||
|
Perhaps a grouping/max setup? Something like this (untested):
select nname, max(address), max(phone)... from raw_companies group by nname -- Kevin G. Boles Indicium Resources, Inc. SQL Server MVP kgboles a earthlink dt net <canoe414@telecom-digest.zzn.com> wrote in message news:4f04fa64-b865-4bf1-a7ed-498872b6bd74@u10g2000prn.googlegroups.com... > Greetings all, > > I've got a table raw_Companies > --nname > --address > --phone > --fax > --homepage > > with no keys defined > and a table Companies > with the same structure but with nname declared as a primary key. > > Once raw_Companies gets loaded up I see where there are multiple rows > with the same value for nname but missing data in the other fields. > One row may have an address and phone while another will have only an > email, while a third may have an address and a homepage. > > I need a way to take the rows in raw_Companies with the same nname and > consolidate them into one composite row inserted into Companies. > > So from > > IBM----1234 Main Street,Rochester,NY----\N----\N----sales@ibm.com---- > www.ibm.com > IBM----\N----212-745-9324----212-745-0158----\N----\N > > in raw_Companies to > > IBM----1234 Main > Street,Rochester,NY----212-745-9324----212-745-0158---- > sales@ibm.com----www.ibm.com > > in Companies > > Any help/hints/suggestions/code would be greatly appreciated. > > TIA, > > Still-learning Steve |
|
|||
|
On Mar 5, 11:19 am, canoe...@telecom-digest.zzn.com wrote:
> Greetings all, > > I've got a table raw_Companies > --nname > --address > --phone > --fax > --homepage > > with no keys defined > and a table Companies > with the same structure but with nname declared as a primary key. > > Once raw_Companies gets loaded up I see where there are multiple rows > with the same value for nname but missing data in the other fields. > One row may have an address and phone while another will have only an > email, while a third may have an address and a homepage. > > I need a way to take the rows in raw_Companies with the same nname and > consolidate them into one composite row inserted into Companies. > > So from > > IBM----1234 Main Street,Rochester,NY----\N----\N----sa...@ibm.com----www.ibm.com > IBM----\N----212-745-9324----212-745-0158----\N----\N > > in raw_Companies to > > IBM----1234 Main > Street,Rochester,NY----212-745-9324----212-745-0158---- > sa...@ibm.com----www.ibm.com > > in Companies > > Any help/hints/suggestions/code would be greatly appreciated. > > TIA, > > Still-learning Steve A table like raw_Companies is called a staging table and Companies is the production table. If you want the data to load automatically into companies, a PL/SQL procedure or trigger is one way to do it. You can audit the changes (e.g. put in a log table messages like 'ABC Inc. : street address changed from 1234 Main Street to 123A Main Street") as well as logging errors (e.g., Update failed), checking data consistency (Zip code 11234 is invalid for Ohio), and even outside data validation (look up address in the USPS yields Zip code but also indicates the address is at least valid). So it's time to step out of straight SQL and into PL/SQL. HTH, Ed |
|
|||
|
1) If two of the rows in your raw data staging table have different
values for the same attribute, say a phone number, which one do you pick or does it matter? If not, then use this skeleton: SELECT company_name, MAX(phone_nbr), MAX(..), .. FROM RawData GROUP BY company_name; else find the conflicts: SELECT company_name AS conflicting_data FROM RawData GROUP BY company_name HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT (DISTINCT ..), 1), .. ) IS NOT NULL ; 2) If two of the rows in your raw data staging table have the same value for the same attribute, do you need to know or do you just use it? 3) If all of the rows in your raw data have NULLs for an attribute, do you use the NULL or want to now about it? There are a lot of games we can play with a HAVING clause on the raw data. You might want to get a copy of THINKING IN SETS and look at Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more ideas. |
|
|||
|
On Mar 5, 12:49*pm, --CELKO-- <jcelko...@earthlink.net> wrote:
> 1) If two of the rows in your raw data staging table have different > values for the same attribute, say a phone number, which one do you > pick or does it matter? *If not, then use this skeleton: > > SELECT company_name, MAX(phone_nbr), MAX(..), .. > * FROM RawData > *GROUP BY company_name; > > else find the conflicts: > > SELECT company_name AS conflicting_data > * FROM RawData > *GROUP BY company_name > HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT > (DISTINCT ..), 1), .. ) IS NOT NULL ; > Makes sense ... perhaps the OP can give us some more background of how and why info gets into the raw table. > There are a lot of games we can play with a HAVING clause on the raw > data. *You might want to get a copy of THINKING IN SETS and look at > Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more > ideas. What is "Thinking in sets"? and who wrote it ... not familiar with that title. |
|
|||
|
hpuxrac wrote:
> On Mar 5, 12:49 pm, --CELKO-- <jcelko...@earthlink.net> wrote: >> 1) If two of the rows in your raw data staging table have different >> values for the same attribute, say a phone number, which one do you >> pick or does it matter? If not, then use this skeleton: >> >> SELECT company_name, MAX(phone_nbr), MAX(..), .. >> FROM RawData >> GROUP BY company_name; >> >> else find the conflicts: >> >> SELECT company_name AS conflicting_data >> FROM RawData >> GROUP BY company_name >> HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT >> (DISTINCT ..), 1), .. ) IS NOT NULL ; >> > > Makes sense ... perhaps the OP can give us some more background of > how and why info gets into the raw table. > >> There are a lot of games we can play with a HAVING clause on the raw >> data. You might want to get a copy of THINKING IN SETS and look at >> Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more >> ideas. > > What is "Thinking in sets"? and who wrote it ... not familiar with > that title. One could hardly be familiar with the title, as it is copyrighted in 2008 (any decent internet bookstore will show you). Quite unsurprisingly, it is written by Joe Celko. I don't think he has any reason to be ashamed of some shameless self-promotion... Keep up the good work, mr. Celko! Ruud. |
|
|||
|
>> What is "Thinking in sets"? and who wrote it <<
http://www.elsevierdirect.com/produc...=9780123741370 and me. |
|
|||
|
> Nor do I! *I remember being asked by a priest if I was ever bothered
> by evil thoughts. *"**Bothered** by them? *No." <troll> It's good to go to the experts - I've often wondered if they keep the really good ones all to themselves. </troll> |
![]() |
| Thread Tools | |
| Display Modes | |
|
|