Bluehost.com Web Hosting $6.95

Duplicate Records

This is a discussion on Duplicate Records within the MySQL Database forums, part of the Database Forums category; I am using this query to find duplicate records (first_name, last_name and i_active) . However, this query takes a lot of ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-05-2007
Pankaj
 
Posts: n/a
Default Duplicate Records

I am using this query to find duplicate records (first_name, last_name
and i_active) . However, this query takes a lot of time to execute
when executed for the first time. When I execute it for the second
time, it is very fast.What could be the reason ? and can anyone
recommend a way to optimise this?

SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name
FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING
Count(*)>1 And last_name = talent.last_name and i_active != '0')))
ORDER BY talent.first_name, talent.last_name

TIA

Reply With Quote
  #2 (permalink)  
Old 02-05-2007
strawberry
 
Posts: n/a
Default Re: Duplicate Records

On Feb 5, 8:39 am, "Pankaj" <panah...@gmail.com> wrote:
> I am using this query to find duplicate records (first_name, last_name
> and i_active) . However, this query takes a lot of time to execute
> when executed for the first time. When I execute it for the second
> time, it is very fast.What could be the reason ? and can anyone
> recommend a way to optimise this?
>
> SELECT * FROM talent WHERE (((talent.first_name) In (SELECT first_name
> FROM talent As Tmp GROUP BY first_name,last_name,i_active HAVING
> Count(*)>1 And last_name = talent.last_name and i_active != '0')))
> ORDER BY talent.first_name, talent.last_name
>
> TIA



The query is being cached - which is why it's quick the second time.

Use a join instead of a sub select. It will be much faster. There are
many examples of using joins in this way within these NGs.

Reply With Quote
  #3 (permalink)  
Old 02-05-2007
Pankaj
 
Posts: n/a
Default Re: Duplicate Records

> The query is being cached - which is why it's quick the second time.
>
> Use a join instead of a sub select. It will be much faster. There are
> many examples of using joins in this way within these NGs.


Thanks.... I use this query and the results are already faster

SELECT *
FROM talent as e1
LEFT JOIN talent AS e2
ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
and e1.i_active=e2.i_active)
WHERE (e2.i_talent_id IS NOT NULL)
GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
HAVING COUNT(*) > 1

but this displays the results only once. I want to view the duplicates
as well. Any suggestions ?

Reply With Quote
  #4 (permalink)  
Old 02-05-2007
strawberry
 
Posts: n/a
Default Re: Duplicate Records

On 5 Feb, 11:59, "Pankaj" <panah...@gmail.com> wrote:
> > The query is being cached - which is why it's quick the second time.

>
> > Use a join instead of a sub select. It will be much faster. There are
> > many examples of using joins in this way within these NGs.

>
> Thanks.... I use this query and the results are already faster
>
> SELECT *
> FROM talent as e1
> LEFT JOIN talent AS e2
> ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
> and e1.i_active=e2.i_active)
> WHERE (e2.i_talent_id IS NOT NULL)
> GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
> HAVING COUNT(*) > 1
>
> but this displays the results only once. I want to view the duplicates
> as well. Any suggestions ?


What happens if you leave out the GROUP BY clause?

Reply With Quote
  #5 (permalink)  
Old 02-05-2007
Captain Paralytic
 
Posts: n/a
Default Re: Duplicate Records

On 5 Feb, 11:59, "Pankaj" <panah...@gmail.com> wrote:
> > The query is being cached - which is why it's quick the second time.

>
> > Use a join instead of a sub select. It will be much faster. There are
> > many examples of using joins in this way within these NGs.

>
> Thanks.... I use this query and the results are already faster
>
> SELECT *
> FROM talent as e1
> LEFT JOIN talent AS e2
> ON (e1.s_first_name=e2.s_first_name and e1.s_last_name=e2.s_last_name
> and e1.i_active=e2.i_active)
> WHERE (e2.i_talent_id IS NOT NULL)
> GROUP BY e1.s_first_name,e1.s_last_name,e1.i_active
> HAVING COUNT(*) > 1
>
> but this displays the results only once. I want to view the duplicates
> as well. Any suggestions ?


Could you supply an exort of the structure and some data for us to
play with.

Reply With Quote
  #6 (permalink)  
Old 02-05-2007
Per Rønne
 
Posts: n/a
Default Re: Duplicate Records

Pankaj <panahuja@gmail.com> wrote:

> I am using this query to find duplicate records (first_name, last_name
> and i_active).


Make sure to use a primary key. In this way duplicate records will never
occur.
--
Per Erik Rønne
http://www.RQNNE.dk
Reply With Quote
  #7 (permalink)  
Old 02-06-2007
Pankaj
 
Posts: n/a
Default Re: Duplicate Records

On Feb 6, 3:42 am, p...@RQNNE.invalid (Per Rønne) wrote:
> Pankaj <panah...@gmail.com> wrote:
> > I am using this query to find duplicate records (first_name, last_name
> > and i_active).

>
> Make sure to use a primary key. In this way duplicate records will never
> occur.
> --
> Per Erik Rønnehttp://www.RQNNE.dk


I am using primary keys. but some fields like first_name and last_name
can be duplicate :-)

Reply With Quote
  #8 (permalink)  
Old 02-06-2007
Pankaj
 
Posts: n/a
Default Re: Duplicate Records

Here's one record ........... exported this from phpmyadmin so that
you can simply import in your table

-- phpMyAdmin SQL Dump
-- version 2.8.2.4
-- http://www.phpmyadmin.net
--
-- Server version: 5.0.24
-- PHP Version: 5.1.6
--
-- Database: `temp`
--

-- --------------------------------------------------------

--
-- Table structure for table `edx_talent`
--

CREATE TABLE `edx_talent` (
`i_talent_id` mediumint(9) NOT NULL auto_increment,
`s_legal_name` varchar(60) default NULL,
`s_first_name` varchar(150) default NULL,
`s_last_name` varchar(150) default NULL,
`s_middle_name` varchar(150) default NULL,
`s_long_description` text,
`s_short_description` varchar(255) default NULL,
`d_dob` date NOT NULL default '0000-00-00',
`s_sex` varchar(6) NOT NULL default '',
`s_address_1` varchar(60) default NULL,
`s_address_2` varchar(60) default NULL,
`s_city` varchar(30) default NULL,
`s_state_province` varchar(30) default NULL,
`s_zip_route_code` varchar(10) default NULL,
`s_country` varchar(20) default NULL,
`s_drivers_license` varchar(15) default NULL,
`s_drivers_license_state` varchar(5) default NULL,
`i_view_on_website` int(1) default '0',
`s_notes` text,
`s_company` varchar(255) default NULL,
`i_active` int(1) NOT NULL default '-1',
PRIMARY KEY (`i_talent_id`),
KEY `s_first_name` (`s_first_name`),
KEY `s_last_name` (`s_last_name`),
KEY `d_dob` (`d_dob`)
) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
AUTO_INCREMENT=10 ;

--
-- Dumping data for table `edx_talent`
--

INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`,
`s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`,
`s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`,
`s_city`, `s_state_province`, `s_zip_route_code`, `s_country`,
`s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`,
`s_notes`, `s_company`, `i_active`) VALUES
(1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
'', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0),
(1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
'', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1),
(1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
'', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1);


Reply With Quote
  #9 (permalink)  
Old 02-06-2007
Per Rønne
 
Posts: n/a
Default Re: Duplicate Records

Pankaj <panahuja@gmail.com> wrote:

> On Feb 6, 3:42 am, p...@RQNNE.invalid (Per Rønne) wrote:
> > Pankaj <panah...@gmail.com> wrote:
> > > I am using this query to find duplicate records (first_name, last_name
> > > and i_active).

> >
> > Make sure to use a primary key. In this way duplicate records will never
> > occur.


> I am using primary keys. but some fields like first_name and last_name
> can be duplicate :-)


Yes, but if you use primary keys, /per definition/ no duplicate records
can exist.

But of course, in a person database with let's say one milliard records,
there may be more than one male holding the name John Smith :-).

In your country you may not have a Central Person Register number
uniquely defining every citizen and metic, and for other reasons it may
not be wise to use such a field as the primary key. Then just define one
column or a set of columns as being 'unique'.

Perhaps I should add that I'm more used to Oracle than to MySQL.
--
Per Erik Rønne
http://www.RQNNE.dk
Reply With Quote
  #10 (permalink)  
Old 02-06-2007
Captain Paralytic
 
Posts: n/a
Default Re: Duplicate Records

On Feb 6, 6:40 am, "Pankaj" <panah...@gmail.com> wrote:
> Here's one record ........... exported this from phpmyadmin so that
> you can simply import in your table
>
> -- phpMyAdmin SQL Dump
> -- version 2.8.2.4
> --http://www.phpmyadmin.net
> --
> -- Server version: 5.0.24
> -- PHP Version: 5.1.6
> --
> -- Database: `temp`
> --
>
> -- --------------------------------------------------------
>
> --
> -- Table structure for table `edx_talent`
> --
>
> CREATE TABLE `edx_talent` (
> `i_talent_id` mediumint(9) NOT NULL auto_increment,
> `s_legal_name` varchar(60) default NULL,
> `s_first_name` varchar(150) default NULL,
> `s_last_name` varchar(150) default NULL,
> `s_middle_name` varchar(150) default NULL,
> `s_long_description` text,
> `s_short_description` varchar(255) default NULL,
> `d_dob` date NOT NULL default '0000-00-00',
> `s_sex` varchar(6) NOT NULL default '',
> `s_address_1` varchar(60) default NULL,
> `s_address_2` varchar(60) default NULL,
> `s_city` varchar(30) default NULL,
> `s_state_province` varchar(30) default NULL,
> `s_zip_route_code` varchar(10) default NULL,
> `s_country` varchar(20) default NULL,
> `s_drivers_license` varchar(15) default NULL,
> `s_drivers_license_state` varchar(5) default NULL,
> `i_view_on_website` int(1) default '0',
> `s_notes` text,
> `s_company` varchar(255) default NULL,
> `i_active` int(1) NOT NULL default '-1',
> PRIMARY KEY (`i_talent_id`),
> KEY `s_first_name` (`s_first_name`),
> KEY `s_last_name` (`s_last_name`),
> KEY `d_dob` (`d_dob`)
> ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
> AUTO_INCREMENT=10 ;
>
> --
> -- Dumping data for table `edx_talent`
> --
>
> INSERT INTO `edx_talent` (`i_talent_id`, `s_legal_name`,
> `s_first_name`, `s_last_name`, `s_middle_name`, `s_long_description`,
> `s_short_description`, `d_dob`, `s_sex`, `s_address_1`, `s_address_2`,
> `s_city`, `s_state_province`, `s_zip_route_code`, `s_country`,
> `s_drivers_license`, `s_drivers_license_state`, `i_view_on_website`,
> `s_notes`, `s_company`, `i_active`) VALUES
> (1224, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, 0),
> (1225, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', '', -1),
> (1226, NULL, 'J, 'Antone', '', '', '', '1971-02-19', 'male', '', '',
> '', 'CA', '12345', 'US', 'A12345', 'CA', 0, '', NULL, -1);


Interesting, the Js are all missing their closing quote.

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 03:11 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0