consolidation of multiple rows

This is a discussion on consolidation of multiple rows within the MySQL Database forums, part of the Database Forums category; Greetings all, I've got a table raw_Companies --nname --address --phone --fax --email --homepage with no keys defined and a ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 03-05-2008
canoe414@telecom-digest.zzn.com
 
Posts: n/a
Default consolidation of multiple rows

Greetings all,

I've got a table raw_Companies
--nname
--address
--phone
--fax
--email
--homepage

with no keys defined
and a table Companies
with the same structure but with nname declared as a primary key.

Once raw_Companies gets loaded up I see where there are multiple rows
with the same value for nname but missing data in the other fields.
One row may have an address and phone while another will have only an
email, while a third may have an address and a homepage.

I need a way to take the rows in raw_Companies with the same nname and
consolidate them into one composite row inserted into Companies.

So from

IBM----1234 Main Street,Rochester,NY----\N----\N----sales@ibm.com----
www.ibm.com
IBM----\N----212-745-9324----212-745-0158----\N----\N

in raw_Companies to

IBM----1234 Main
Street,Rochester,NY----212-745-9324----212-745-0158----
sales@ibm.com----www.ibm.com

in Companies

Any help/hints/suggestions/code would be greatly appreciated.

TIA,

Still-learning Steve
Reply With Quote
  #2 (permalink)  
Old 03-05-2008
TheSQLGuru
 
Posts: n/a
Default Re: consolidation of multiple rows

Perhaps a grouping/max setup? Something like this (untested):

select nname, max(address), max(phone)...
from raw_companies
group by nname


--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net


<canoe414@telecom-digest.zzn.com> wrote in message
news:4f04fa64-b865-4bf1-a7ed-498872b6bd74@u10g2000prn.googlegroups.com...
> Greetings all,
>
> I've got a table raw_Companies
> --nname
> --address
> --phone
> --fax
> --email
> --homepage
>
> with no keys defined
> and a table Companies
> with the same structure but with nname declared as a primary key.
>
> Once raw_Companies gets loaded up I see where there are multiple rows
> with the same value for nname but missing data in the other fields.
> One row may have an address and phone while another will have only an
> email, while a third may have an address and a homepage.
>
> I need a way to take the rows in raw_Companies with the same nname and
> consolidate them into one composite row inserted into Companies.
>
> So from
>
> IBM----1234 Main Street,Rochester,NY----\N----\N----sales@ibm.com----
> www.ibm.com
> IBM----\N----212-745-9324----212-745-0158----\N----\N
>
> in raw_Companies to
>
> IBM----1234 Main
> Street,Rochester,NY----212-745-9324----212-745-0158----
> sales@ibm.com----www.ibm.com
>
> in Companies
>
> Any help/hints/suggestions/code would be greatly appreciated.
>
> TIA,
>
> Still-learning Steve



Reply With Quote
  #3 (permalink)  
Old 03-05-2008
Ed Prochak
 
Posts: n/a
Default Re: consolidation of multiple rows

On Mar 5, 11:19 am, canoe...@telecom-digest.zzn.com wrote:
> Greetings all,
>
> I've got a table raw_Companies
> --nname
> --address
> --phone
> --fax
> --email
> --homepage
>
> with no keys defined
> and a table Companies
> with the same structure but with nname declared as a primary key.
>
> Once raw_Companies gets loaded up I see where there are multiple rows
> with the same value for nname but missing data in the other fields.
> One row may have an address and phone while another will have only an
> email, while a third may have an address and a homepage.
>
> I need a way to take the rows in raw_Companies with the same nname and
> consolidate them into one composite row inserted into Companies.
>
> So from
>
> IBM----1234 Main Street,Rochester,NY----\N----\N----sa...@ibm.com----www.ibm.com
> IBM----\N----212-745-9324----212-745-0158----\N----\N
>
> in raw_Companies to
>
> IBM----1234 Main
> Street,Rochester,NY----212-745-9324----212-745-0158----
> sa...@ibm.com----www.ibm.com
>
> in Companies
>
> Any help/hints/suggestions/code would be greatly appreciated.
>
> TIA,
>
> Still-learning Steve


A table like raw_Companies is called a staging table and Companies is
the production table.
If you want the data to load automatically into companies, a PL/SQL
procedure or trigger is one way to do it.

You can audit the changes (e.g. put in a log table messages like 'ABC
Inc. : street address changed from 1234 Main Street to 123A Main
Street") as well as logging errors (e.g., Update failed), checking
data consistency (Zip code 11234 is invalid for Ohio), and even
outside data validation (look up address in the USPS yields Zip code
but also indicates the address is at least valid).

So it's time to step out of straight SQL and into PL/SQL.
HTH,

Ed



Reply With Quote
  #4 (permalink)  
Old 03-05-2008
--CELKO--
 
Posts: n/a
Default Re: consolidation of multiple rows

1) If two of the rows in your raw data staging table have different
values for the same attribute, say a phone number, which one do you
pick or does it matter? If not, then use this skeleton:

SELECT company_name, MAX(phone_nbr), MAX(..), ..
FROM RawData
GROUP BY company_name;

else find the conflicts:

SELECT company_name AS conflicting_data
FROM RawData
GROUP BY company_name
HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
(DISTINCT ..), 1), .. ) IS NOT NULL ;

2) If two of the rows in your raw data staging table have the same
value for the same attribute, do you need to know or do you just use
it?

3) If all of the rows in your raw data have NULLs for an attribute, do
you use the NULL or want to now about it?

There are a lot of games we can play with a HAVING clause on the raw
data. You might want to get a copy of THINKING IN SETS and look at
Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
ideas.
Reply With Quote
  #5 (permalink)  
Old 03-05-2008
--CELKO--
 
Posts: n/a
Default Re: consolidation of multiple rows

>> So it's time to step out of straight SQL and into PL/SQL. <<

I'll go one step further and tell the guy took at Melissa Data address
scrubing products.
Reply With Quote
  #6 (permalink)  
Old 03-05-2008
hpuxrac
 
Posts: n/a
Default Re: consolidation of multiple rows

On Mar 5, 12:49*pm, --CELKO-- <jcelko...@earthlink.net> wrote:
> 1) If two of the rows in your raw data staging table have different
> values for the same attribute, say a phone number, which one do you
> pick or does it matter? *If not, then use this skeleton:
>
> SELECT company_name, MAX(phone_nbr), MAX(..), ..
> * FROM RawData
> *GROUP BY company_name;
>
> else find the conflicts:
>
> SELECT company_name AS conflicting_data
> * FROM RawData
> *GROUP BY company_name
> HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
> (DISTINCT ..), 1), .. ) IS NOT NULL ;
>


Makes sense ... perhaps the OP can give us some more background of
how and why info gets into the raw table.

> There are a lot of games we can play with a HAVING clause on the raw
> data. *You might want to get a copy of THINKING IN SETS and look at
> Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
> ideas.


What is "Thinking in sets"? and who wrote it ... not familiar with
that title.
Reply With Quote
  #7 (permalink)  
Old 03-05-2008
Ruud de Koter
 
Posts: n/a
Default Re: consolidation of multiple rows

hpuxrac wrote:
> On Mar 5, 12:49 pm, --CELKO-- <jcelko...@earthlink.net> wrote:
>> 1) If two of the rows in your raw data staging table have different
>> values for the same attribute, say a phone number, which one do you
>> pick or does it matter? If not, then use this skeleton:
>>
>> SELECT company_name, MAX(phone_nbr), MAX(..), ..
>> FROM RawData
>> GROUP BY company_name;
>>
>> else find the conflicts:
>>
>> SELECT company_name AS conflicting_data
>> FROM RawData
>> GROUP BY company_name
>> HAVING COALESCE (NULLIF(COUNT (DISTINCT phone_nbr), 1), NULLIF(COUNT
>> (DISTINCT ..), 1), .. ) IS NOT NULL ;
>>

>
> Makes sense ... perhaps the OP can give us some more background of
> how and why info gets into the raw table.
>
>> There are a lot of games we can play with a HAVING clause on the raw
>> data. You might want to get a copy of THINKING IN SETS and look at
>> Chapter 10, "Scrubbing Data with Non-1NF Tables" for some more
>> ideas.

>
> What is "Thinking in sets"? and who wrote it ... not familiar with
> that title.


One could hardly be familiar with the title, as it is copyrighted in
2008 (any decent internet bookstore will show you). Quite
unsurprisingly, it is written by Joe Celko. I don't think he has any
reason to be ashamed of some shameless self-promotion...

Keep up the good work, mr. Celko!

Ruud.

Reply With Quote
  #8 (permalink)  
Old 03-05-2008
--CELKO--
 
Posts: n/a
Default Re: consolidation of multiple rows

>> What is "Thinking in sets"? and who wrote it <<

http://www.elsevierdirect.com/produc...=9780123741370

and me.

Reply With Quote
  #9 (permalink)  
Old 03-05-2008
--CELKO--
 
Posts: n/a
Default Re: consolidation of multiple rows

> I don't think he has any reason to be ashamed of some shameless self-promotion... <<

Nor do I! I remember being asked by a priest if I was ever bothered
by evil thoughts. "**Bothered** by them? No."

Reply With Quote
  #10 (permalink)  
Old 03-05-2008
canoe414@telecom-digest.zzn.com
 
Posts: n/a
Default Re: consolidation of multiple rows

> Nor do I! *I remember being asked by a priest if I was ever bothered
> by evil thoughts. *"**Bothered** by them? *No."


<troll>
It's good to go to the experts - I've often wondered if they keep the
really good ones all to themselves.
</troll>

Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 09:19 AM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0