Bluehost.com Web Hosting $6.95

Duplicate Records

This is a discussion on Duplicate Records within the MySQL Database forums, part of the Database Forums category; On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote: > Here's one record ........... ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 02-06-2007
Captain Paralytic
 
Posts: n/a
Default Re: Duplicate Records

On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote:
> Here's one record ........... exported this from phpmyadmin so that
> you can simply import in your table
>
> -- phpMyAdmin SQL Dump
> -- version 2.8.2.4
> --http://www.phpmyadmin.net
> --
> -- Server version: 5.0.24
> -- PHP Version: 5.1.6
> --
> -- Database: `temp`
> --
>
> -- --------------------------------------------------------
>
> --
> -- Table structure for table `edx_talent`
> --
>
> CREATE TABLE `edx_talent` (
> `i_talent_id` mediumint(9) NOT NULL auto_increment,
> `s_legal_name` varchar(60) default NULL,
> `s_first_name` varchar(150) default NULL,
> `s_last_name` varchar(150) default NULL,
> `s_middle_name` varchar(150) default NULL,
> `s_long_description` text,
> `s_short_description` varchar(255) default NULL,
> `d_dob` date NOT NULL default '0000-00-00',
> `s_sex` varchar(6) NOT NULL default '',
> `s_address_1` varchar(60) default NULL,
> `s_address_2` varchar(60) default NULL,
> `s_city` varchar(30) default NULL,
> `s_state_province` varchar(30) default NULL,
> `s_zip_route_code` varchar(10) default NULL,
> `s_country` varchar(20) default NULL,
> `s_drivers_license` varchar(15) default NULL,
> `s_drivers_license_state` varchar(5) default NULL,
> `i_view_on_website` int(1) default '0',
> `s_notes` text,
> `s_company` varchar(255) default NULL,
> `i_active` int(1) NOT NULL default '-1',
> PRIMARY KEY (`i_talent_id`),
> KEY `s_first_name` (`s_first_name`),
> KEY `s_last_name` (`s_last_name`),
> KEY `d_dob` (`d_dob`)
> ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
> AUTO_INCREMENT=10 ;
>
> --
> -- Dumping data for table `edx_talent`
> --
>
> INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`,
> `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`,
> `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`,
> `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`,
> `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`,
> `s_notes`, `s_company`, `i_active`) VALUES
> (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0),
> (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1),
> (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1);


OK, now I'm not sure about the significance of the i_active criteria
and you'll need to add the join conditions for first name if you want
them but this seems to take you in more like the right direction.

I have written it based on the table you posted rather than the first
query which seems to have a different table name and some different
field names.

Wierd about that missing quote???

Anyway, here it is:

SELECT DISTINCT t1. *
FROM edx_talent t1
JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name
AND t2.i_talent_id <> t1.i_talent_id

Reply With Quote
  #12 (permalink)  
Old 02-06-2007
strawberry
 
Posts: n/a
Default Re: Duplicate Records

On 6 Feb, 10:26, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote:
>
>
>
> > Here's one record ........... exported this from phpmyadmin so that
> > you can simply import in your table

>
> > -- phpMyAdmin SQL Dump
> > -- version 2.8.2.4
> > --http://www.phpmyadmin.net
> > --
> > -- Server version: 5.0.24
> > -- PHP Version: 5.1.6
> > --
> > -- Database: `temp`
> > --

>
> > -- --------------------------------------------------------

>
> > --
> > -- Table structure for table `edx_talent`
> > --

>
> > CREATE TABLE `edx_talent` (
> > `i_talent_id` mediumint(9) NOT NULL auto_increment,
> > `s_legal_name` varchar(60) default NULL,
> > `s_first_name` varchar(150) default NULL,
> > `s_last_name` varchar(150) default NULL,
> > `s_middle_name` varchar(150) default NULL,
> > `s_long_description` text,
> > `s_short_description` varchar(255) default NULL,
> > `d_dob` date NOT NULL default '0000-00-00',
> > `s_sex` varchar(6) NOT NULL default '',
> > `s_address_1` varchar(60) default NULL,
> > `s_address_2` varchar(60) default NULL,
> > `s_city` varchar(30) default NULL,
> > `s_state_province` varchar(30) default NULL,
> > `s_zip_route_code` varchar(10) default NULL,
> > `s_country` varchar(20) default NULL,
> > `s_drivers_license` varchar(15) default NULL,
> > `s_drivers_license_state` varchar(5) default NULL,
> > `i_view_on_website` int(1) default '0',
> > `s_notes` text,
> > `s_company` varchar(255) default NULL,
> > `i_active` int(1) NOT NULL default '-1',
> > PRIMARY KEY (`i_talent_id`),
> > KEY `s_first_name` (`s_first_name`),
> > KEY `s_last_name` (`s_last_name`),
> > KEY `d_dob` (`d_dob`)
> > ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
> > AUTO_INCREMENT=10 ;

>
> > --
> > -- Dumping data for table `edx_talent`
> > --

>
> > INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`,
> > `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`,
> > `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`,
> > `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`,
> > `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`,
> > `s_notes`, `s_company`, `i_active`) VALUES
> > (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0),
> > (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1),
> > (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1);

>
> OK, now I'm not sure about the significance of the i_active criteria
> and you'll need to add the join conditions for first name if you want
> them but this seems to take you in more like the right direction.
>
> I have written it based on the table you posted rather than the first
> query which seems to have a different table name and some different
> field names.
>
> Wierd about that missing quote???
>
> Anyway, here it is:
>
> SELECT DISTINCT t1. *
> FROM edx_talent t1
> JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name
> AND t2.i_talent_id <> t1.i_talent_id


You could also add an ORDER BY (to group duplicates together) and a
WHERE clause (to discount NULLs or blanks) - something like this:

SELECT DISTINCT t1. *
FROM edx_talent t1
JOIN edx_talent t2 ON t2.s_first_name = t1.s_first_name
AND t2.s_last_name = t1.s_last_name
AND t2.i_talent_id <> t1.i_talent_id
WHERE s_first_name <> ''
AND s_last_name <> ''
ORDER BY s_first_name, s_last_name,i_talent_id
LIMIT 0 , 30

Reply With Quote
  #13 (permalink)  
Old 02-07-2007
Gordon Burditt
 
Posts: n/a
Default Re: Duplicate Records

>Yes, but if you use primary keys, /per definition/ no duplicate records
>can exist.
>
>But of course, in a person database with let's say one milliard records,
>there may be more than one male holding the name John Smith :-).


In a person database with records of people living on *ONE BLOCK*
of a particular street (which I also lived on, about three dozen
people and two dozen Smiths), there were four John Smiths, three
with the same middle name. All related. Two had the same address,
middle name, and phone number. Expand that to several surrounding
blocks, and there were a lot more John Smiths, along with more first
names that were duplicated in the Smith family.

>In your country you may not have a Central Person Register number
>uniquely defining every citizen and metic, and for other reasons it may
>not be wise to use such a field as the primary key.


Governments may assume that they have not handed out duplicate
numbers. They are probably wrong. In the case of the USA, I believe
the Social Security Administration is on record as saying they don't
attempt to make it unique, and in any case there are a number of
people who bought wallets with sample SSNs which people assumed
were real, and then there's massive identity theft by illegal
immigrants getting jobs, and regular identity theft. And there
have been cases where it's apparent that the SSA *assigned* duplicate
numbers to two different people, not that they were mistaken about
their number.

Oh, yes, nobody says that the identity numbers from country A won't
overlap those of country B.

And in any case, it's a bad idea to refuse a customer because they
have the same identity number as an existing customer (the real one
is likely to be the second one). Also, corporate entities, trusts,
etc. may not have identity numbers of the same kind but they might
want to be your customers also.

>Then just define one
>column or a set of columns as being 'unique'.


When in doubt, issue your own account numbers. Possibly you do this
with an auto_increment field. If the account numbers can be used to
spend money, more security is needed (guessing a valid account number
shouldn't be easy).

>Perhaps I should add that I'm more used to Oracle than to MySQL.


The issue of what to make a unique key is one of database design that
is essentially the same in Oracle and MySQL.

Reply With Quote
  #14 (permalink)  
Old 02-08-2007
Per Rønne
 
Posts: n/a
Default Re: Duplicate Records

Gordon Burditt <gordonb.l97u0@burditt.org> wrote:
> Per Rønne <per@RQNNE.invalid> wrote:


> >Yes, but if you use primary keys, /per definition/ no duplicate records
> >can exist.
> >
> >But of course, in a person database with let's say one milliard records,
> >there may be more than one male holding the name John Smith :-).


> >In your country you may not have a Central Person Register number
> >uniquely defining every citizen and metic, and for other reasons it may
> >not be wise to use such a field as the primary key.

>
> Governments may assume that they have not handed out duplicate
> numbers. They are probably wrong.


For sure not in Denmark. Everybody is given a unique CPR number when
being registered at birth or when being allowed permanent residence in
the country.

<http://www.workindenmark.dk/CPR>

> In the case of the USA, I believe
> the Social Security Administration is on record as saying they don't
> attempt to make it unique, and in any case there are a number of
> people who bought wallets with sample SSNs which people assumed
> were real, and then there's massive identity theft by illegal
> immigrants getting jobs, and regular identity theft. And there
> have been cases where it's apparent that the SSA *assigned* duplicate
> numbers to two different people, not that they were mistaken about
> their number.
>
> Oh, yes, nobody says that the identity numbers from country A won't
> overlap those of country B.


Certainly not - we would have to define universal person numbers first -
through ISO, I think. Like:

yyyymmdd-cccssssg

y: year.
m: month.
d: day.
c: country.
s: sequential number.
g: gender.

> And in any case, it's a bad idea to refuse a customer because they
> have the same identity number as an existing customer (the real one
> is likely to be the second one). Also, corporate entities, trusts,
> etc. may not have identity numbers of the same kind but they might
> want to be your customers also.


Use their cell phone number, then ...

> >Then just define one column or a set of columns as being 'unique'.

>
> When in doubt, issue your own account numbers. Possibly you do this
> with an auto_increment field. If the account numbers can be used to
> spend money, more security is needed (guessing a valid account number
> shouldn't be easy).


Yes.

> >Perhaps I should add that I'm more used to Oracle than to MySQL.

>
> The issue of what to make a unique key is one of database design that
> is essentially the same in Oracle and MySQL.


Yes, but how it is done in different databases is different.
--
Per Erik Rønne
http://www.RQNNE.dk
Reply With Quote
  #15 (permalink)  
Old 02-08-2007
Captain Paralytic
 
Posts: n/a
Default Re: Duplicate Records

On 8 Feb, 10:15, p...@RQNNE.invalid (Per Rønne) wrote:
> > And in any case, it's a bad idea to refuse a customer because they
> > have the same identity number as an existing customer (the real one
> > is likely to be the second one). Also, corporate entities, trusts,
> > etc. may not have identity numbers of the same kind but they might
> > want to be your customers also.

>
> Use their cell phone number, then ...


Yes certainly you can buy that...
Please give me your cell phone number.
What! You don't have a cell phone!
Sorry, we only do business with cell phone owners.

Reply With Quote
  #16 (permalink)  
Old 02-08-2007
Per Rønne
 
Posts: n/a
Default Re: Duplicate Records

Captain Paralytic <paul_lautman@yahoo.com> wrote:

> On 8 Feb, 10:15, p...@RQNNE.invalid (Per Rønne) wrote:
> > > And in any case, it's a bad idea to refuse a customer because they
> > > have the same identity number as an existing customer (the real one
> > > is likely to be the second one). Also, corporate entities, trusts,
> > > etc. may not have identity numbers of the same kind but they might
> > > want to be your customers also.

> >
> > Use their cell phone number, then ...

>
> Yes certainly you can buy that...
> Please give me your cell phone number.
> What! You don't have a cell phone!
> Sorry, we only do business with cell phone owners.


Of course they should use a sequence to generate primary keys. But also,
it should be possible to search on other fields, and have such fields
indexed.
--
Per Erik Rønne
http://www.RQNNE.dk
Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 03:06 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0