This is a discussion on Duplicate Records within the MySQL Database forums, part of the Database Forums category; On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote: > Here's one record ........... ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote:
> Here's one record ........... exported this from phpmyadmin so that > you can simply import in your table > > -- phpMyAdmin SQL Dump > -- version 2.8.2.4 > --http://www.phpmyadmin.net > -- > -- Server version: 5.0.24 > -- PHP Version: 5.1.6 > -- > -- Database: `temp` > -- > > -- -------------------------------------------------------- > > -- > -- Table structure for table `edx_talent` > -- > > CREATE TABLE `edx_talent` ( > `i_talent_id` mediumint(9) NOT NULL auto_increment, > `s_legal_name` varchar(60) default NULL, > `s_first_name` varchar(150) default NULL, > `s_last_name` varchar(150) default NULL, > `s_middle_name` varchar(150) default NULL, > `s_long_description` text, > `s_short_description` varchar(255) default NULL, > `d_dob` date NOT NULL default '0000-00-00', > `s_sex` varchar(6) NOT NULL default '', > `s_address_1` varchar(60) default NULL, > `s_address_2` varchar(60) default NULL, > `s_city` varchar(30) default NULL, > `s_state_province` varchar(30) default NULL, > `s_zip_route_code` varchar(10) default NULL, > `s_country` varchar(20) default NULL, > `s_drivers_license` varchar(15) default NULL, > `s_drivers_license_state` varchar(5) default NULL, > `i_view_on_website` int(1) default '0', > `s_notes` text, > `s_company` varchar(255) default NULL, > `i_active` int(1) NOT NULL default '-1', > PRIMARY KEY (`i_talent_id`), > KEY `s_first_name` (`s_first_name`), > KEY `s_last_name` (`s_last_name`), > KEY `d_dob` (`d_dob`) > ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 > AUTO_INCREMENT=10 ; > > -- > -- Dumping data for table `edx_talent` > -- > > INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`, > `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`, > `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`, > `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`, > `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`, > `s_notes`, `s_company`, `i_active`) VALUES > (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0), > (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1), > (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1); OK, now I'm not sure about the significance of the i_active criteria and you'll need to add the join conditions for first name if you want them but this seems to take you in more like the right direction. I have written it based on the table you posted rather than the first query which seems to have a different table name and some different field names. Wierd about that missing quote??? Anyway, here it is: SELECT DISTINCT t1. * FROM edx_talent t1 JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name AND t2.i_talent_id <> t1.i_talent_id |
|
|||
|
On 6 Feb, 10:26, "Captain Paralytic" <paul_laut...@yahoo.com> wrote:
> On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote: > > > > > Here's one record ........... exported this from phpmyadmin so that > > you can simply import in your table > > > -- phpMyAdmin SQL Dump > > -- version 2.8.2.4 > > --http://www.phpmyadmin.net > > -- > > -- Server version: 5.0.24 > > -- PHP Version: 5.1.6 > > -- > > -- Database: `temp` > > -- > > > -- -------------------------------------------------------- > > > -- > > -- Table structure for table `edx_talent` > > -- > > > CREATE TABLE `edx_talent` ( > > `i_talent_id` mediumint(9) NOT NULL auto_increment, > > `s_legal_name` varchar(60) default NULL, > > `s_first_name` varchar(150) default NULL, > > `s_last_name` varchar(150) default NULL, > > `s_middle_name` varchar(150) default NULL, > > `s_long_description` text, > > `s_short_description` varchar(255) default NULL, > > `d_dob` date NOT NULL default '0000-00-00', > > `s_sex` varchar(6) NOT NULL default '', > > `s_address_1` varchar(60) default NULL, > > `s_address_2` varchar(60) default NULL, > > `s_city` varchar(30) default NULL, > > `s_state_province` varchar(30) default NULL, > > `s_zip_route_code` varchar(10) default NULL, > > `s_country` varchar(20) default NULL, > > `s_drivers_license` varchar(15) default NULL, > > `s_drivers_license_state` varchar(5) default NULL, > > `i_view_on_website` int(1) default '0', > > `s_notes` text, > > `s_company` varchar(255) default NULL, > > `i_active` int(1) NOT NULL default '-1', > > PRIMARY KEY (`i_talent_id`), > > KEY `s_first_name` (`s_first_name`), > > KEY `s_last_name` (`s_last_name`), > > KEY `d_dob` (`d_dob`) > > ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 > > AUTO_INCREMENT=10 ; > > > -- > > -- Dumping data for table `edx_talent` > > -- > > > INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`, > > `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`, > > `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`, > > `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`, > > `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`, > > `s_notes`, `s_company`, `i_active`) VALUES > > (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0), > > (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1), > > (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '', > > '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1); > > OK, now I'm not sure about the significance of the i_active criteria > and you'll need to add the join conditions for first name if you want > them but this seems to take you in more like the right direction. > > I have written it based on the table you posted rather than the first > query which seems to have a different table name and some different > field names. > > Wierd about that missing quote??? > > Anyway, here it is: > > SELECT DISTINCT t1. * > FROM edx_talent t1 > JOIN edx_talent t2 ON t2.s_last_name = t1.s_last_name > AND t2.i_talent_id <> t1.i_talent_id You could also add an ORDER BY (to group duplicates together) and a WHERE clause (to discount NULLs or blanks) - something like this: SELECT DISTINCT t1. * FROM edx_talent t1 JOIN edx_talent t2 ON t2.s_first_name = t1.s_first_name AND t2.s_last_name = t1.s_last_name AND t2.i_talent_id <> t1.i_talent_id WHERE s_first_name <> '' AND s_last_name <> '' ORDER BY s_first_name, s_last_name,i_talent_id LIMIT 0 , 30 |
|
|||
|
>Yes, but if you use primary keys, /per definition/ no duplicate records
>can exist. > >But of course, in a person database with let's say one milliard records, >there may be more than one male holding the name John Smith :-). In a person database with records of people living on *ONE BLOCK* of a particular street (which I also lived on, about three dozen people and two dozen Smiths), there were four John Smiths, three with the same middle name. All related. Two had the same address, middle name, and phone number. Expand that to several surrounding blocks, and there were a lot more John Smiths, along with more first names that were duplicated in the Smith family. >In your country you may not have a Central Person Register number >uniquely defining every citizen and metic, and for other reasons it may >not be wise to use such a field as the primary key. Governments may assume that they have not handed out duplicate numbers. They are probably wrong. In the case of the USA, I believe the Social Security Administration is on record as saying they don't attempt to make it unique, and in any case there are a number of people who bought wallets with sample SSNs which people assumed were real, and then there's massive identity theft by illegal immigrants getting jobs, and regular identity theft. And there have been cases where it's apparent that the SSA *assigned* duplicate numbers to two different people, not that they were mistaken about their number. Oh, yes, nobody says that the identity numbers from country A won't overlap those of country B. And in any case, it's a bad idea to refuse a customer because they have the same identity number as an existing customer (the real one is likely to be the second one). Also, corporate entities, trusts, etc. may not have identity numbers of the same kind but they might want to be your customers also. >Then just define one >column or a set of columns as being 'unique'. When in doubt, issue your own account numbers. Possibly you do this with an auto_increment field. If the account numbers can be used to spend money, more security is needed (guessing a valid account number shouldn't be easy). >Perhaps I should add that I'm more used to Oracle than to MySQL. The issue of what to make a unique key is one of database design that is essentially the same in Oracle and MySQL. |
|
|||
|
Gordon Burditt <gordonb.l97u0@burditt.org> wrote:
> Per Rønne <per@RQNNE.invalid> wrote: > >Yes, but if you use primary keys, /per definition/ no duplicate records > >can exist. > > > >But of course, in a person database with let's say one milliard records, > >there may be more than one male holding the name John Smith :-). > >In your country you may not have a Central Person Register number > >uniquely defining every citizen and metic, and for other reasons it may > >not be wise to use such a field as the primary key. > > Governments may assume that they have not handed out duplicate > numbers. They are probably wrong. For sure not in Denmark. Everybody is given a unique CPR number when being registered at birth or when being allowed permanent residence in the country. <http://www.workindenmark.dk/CPR> > In the case of the USA, I believe > the Social Security Administration is on record as saying they don't > attempt to make it unique, and in any case there are a number of > people who bought wallets with sample SSNs which people assumed > were real, and then there's massive identity theft by illegal > immigrants getting jobs, and regular identity theft. And there > have been cases where it's apparent that the SSA *assigned* duplicate > numbers to two different people, not that they were mistaken about > their number. > > Oh, yes, nobody says that the identity numbers from country A won't > overlap those of country B. Certainly not - we would have to define universal person numbers first - through ISO, I think. Like: yyyymmdd-cccssssg y: year. m: month. d: day. c: country. s: sequential number. g: gender. > And in any case, it's a bad idea to refuse a customer because they > have the same identity number as an existing customer (the real one > is likely to be the second one). Also, corporate entities, trusts, > etc. may not have identity numbers of the same kind but they might > want to be your customers also. Use their cell phone number, then ... > >Then just define one column or a set of columns as being 'unique'. > > When in doubt, issue your own account numbers. Possibly you do this > with an auto_increment field. If the account numbers can be used to > spend money, more security is needed (guessing a valid account number > shouldn't be easy). Yes. > >Perhaps I should add that I'm more used to Oracle than to MySQL. > > The issue of what to make a unique key is one of database design that > is essentially the same in Oracle and MySQL. Yes, but how it is done in different databases is different. -- Per Erik Rønne http://www.RQNNE.dk |
|
|||
|
On 8 Feb, 10:15, p...@RQNNE.invalid (Per Rønne) wrote:
> > And in any case, it's a bad idea to refuse a customer because they > > have the same identity number as an existing customer (the real one > > is likely to be the second one). Also, corporate entities, trusts, > > etc. may not have identity numbers of the same kind but they might > > want to be your customers also. > > Use their cell phone number, then ... Yes certainly you can buy that... Please give me your cell phone number. What! You don't have a cell phone! Sorry, we only do business with cell phone owners. |
|
|||
|
Captain Paralytic <paul_lautman@yahoo.com> wrote:
> On 8 Feb, 10:15, p...@RQNNE.invalid (Per Rønne) wrote: > > > And in any case, it's a bad idea to refuse a customer because they > > > have the same identity number as an existing customer (the real one > > > is likely to be the second one). Also, corporate entities, trusts, > > > etc. may not have identity numbers of the same kind but they might > > > want to be your customers also. > > > > Use their cell phone number, then ... > > Yes certainly you can buy that... > Please give me your cell phone number. > What! You don't have a cell phone! > Sorry, we only do business with cell phone owners. Of course they should use a sequence to generate primary keys. But also, it should be possible to search on other fields, and have such fields indexed. -- Per Erik Rønne http://www.RQNNE.dk |