nth Occurence of String in Set

This is a discussion on nth Occurence of String in Set within the MySQL Database forums, part of the Database Forums category; Hi all, I'm afraid I need some help with a string manipulation function in mysql. Suppose I have a ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 1 Week Ago
cvh@LE
 
Posts: n/a
Default nth Occurence of String in Set

Hi all,

I'm afraid I need some help with a string manipulation function in
mysql.

Suppose I have a column which contains a comma-separated set of unique
strings
e.g.

a,aab,b,bc,de
c,de,dd,dfr
xo,ff,fr

I would like to retrieve the nth element of a set in the same simple
way that find_in_set allows me retrieve the index of an element within
the list/set
e.g.
mysql> SELECT FIND_IN_SET('c','a,b,c,d');
+----------------------------+
| FIND_IN_SET('c','a,b,c,d') |
+----------------------------+
| 3 |
+----------------------------+
1 row in set (0.00 sec)

what I want is a "reverse find_in_set lookup"
like
mysql>SELECT GETELEMENT_AT_INDEX_IN_SET(3.'a,b,c,d');
....
answer would be
+------------------------------------------------------+
| GETELEMENT_AT_INDEX_IN_SET(3.'a,b,c,d');|
+-------------------------------------------------------+
| c |
+----------------------------+
1 row in set (0.00 sec)

(thats just made up)

I ' ve found away to achieve this by using various string function but
I'm quite sure there must be a better and more effective and simpler
way:

mysql> SELECT MID('a,b,c,d',
length(substring_index('a,b,c,d',",",3)),
length(substring_index('a,b,c,d',",",4))-
length(substring_index('a,b,c,d',",",3))-1
) complicated;
+-------------+
| complicated |
+-------------+
| c |
+-------------+
1 row in set (0.00 sec)

I know I could store that into a function and have my own function to
accomplish this rather nicely but isn't there a built in function to
achieve this?

Any hints or tips or reference I didnt find?

TIA

Chris



Reply With Quote
  #2 (permalink)  
Old 1 Week Ago
Jerry Stuckle
 
Posts: n/a
Default Re: nth Occurence of String in Set

cvh@LE wrote:
> Hi all,
>
> I'm afraid I need some help with a string manipulation function in
> mysql.
>
> Suppose I have a column which contains a comma-separated set of unique
> strings
> e.g.
>
> a,aab,b,bc,de
> c,de,dd,dfr
> xo,ff,fr
>
> I would like to retrieve the nth element of a set in the same simple
> way that find_in_set allows me retrieve the index of an element within
> the list/set
> e.g.
> mysql> SELECT FIND_IN_SET('c','a,b,c,d');
> +----------------------------+
> | FIND_IN_SET('c','a,b,c,d') |
> +----------------------------+
> | 3 |
> +----------------------------+
> 1 row in set (0.00 sec)
>
> what I want is a "reverse find_in_set lookup"
> like
> mysql>SELECT GETELEMENT_AT_INDEX_IN_SET(3.'a,b,c,d');
> ...
> answer would be
> +------------------------------------------------------+
> | GETELEMENT_AT_INDEX_IN_SET(3.'a,b,c,d');|
> +-------------------------------------------------------+
> | c |
> +----------------------------+
> 1 row in set (0.00 sec)
>
> (thats just made up)
>
> I ' ve found away to achieve this by using various string function but
> I'm quite sure there must be a better and more effective and simpler
> way:
>
> mysql> SELECT MID('a,b,c,d',
> length(substring_index('a,b,c,d',",",3)),
> length(substring_index('a,b,c,d',",",4))-
> length(substring_index('a,b,c,d',",",3))-1
> ) complicated;
> +-------------+
> | complicated |
> +-------------+
> | c |
> +-------------+
> 1 row in set (0.00 sec)
>
> I know I could store that into a function and have my own function to
> accomplish this rather nicely but isn't there a built in function to
> achieve this?
>
> Any hints or tips or reference I didnt find?
>
> TIA
>
> Chris
>
>
>
>


Normalize your database.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Reply With Quote
  #3 (permalink)  
Old 1 Week Ago
toby
 
Posts: n/a
Default Re: nth Occurence of String in Set

On May 5, 3:17 pm, "cvh@LE" <christian.han...@cpi-service.com> wrote:
> Hi all,
> ...
> Any hints or tips or reference I didnt find?


RTFM would reveal SUBSTRING_INDEX():
http://dev.mysql.com/doc/refman/5.0/...ubstring-index

> TIA
>
> Chris


Reply With Quote
  #4 (permalink)  
Old 1 Week Ago
cvh@LE
 
Posts: n/a
Default Re: nth Occurence of String in Set

On May 5, 7:13*pm, toby <t...@telegraphics.com.au> wrote:
> On May 5, 3:17 pm, "cvh@LE" <christian.han...@cpi-service.com> wrote:
>
> > Hi all,
> > ...
> > Any hints or tips or reference I didnt find?

>
> RTFM would reveal SUBSTRING_INDEX():http://dev.mysql.com/doc/refman/5.0/....html#function...
>
> > TIA

>
> > Chris


RTFM??!? If you had read what I 've written you would have realised
that I am well aware of substring_index:

>mysql> SELECT MID('a,b,c,d',
> length(substring_index('a,b,c,d',",",3)),
> length(substring_index('a,b,c,d',",",4))-
>length(substring_index('a,b,c,d',",",3))-1
> ) complicated;


It rather seems to me that you dont know what a SET is! So RTFM to
you. If you dont have valueble input than just leave it to others to
give advice.

substring index is helpful function I do use to achieve the goal of
returning a string

As said , I solved the problem but am wondering if there does exist a
built-in function like FIND_IN_SET which takes a index as parameter
and not a string-element. I created my own function as follows :

CREATE FUNCTION element_in_set
(mystring varchar(4096),sep varchar(25), myindex int(4))
RETURNS varchar(4096) DETERMINISTIC
BEGIN
RETURN
replace(substr(mystring,length(substring_index(mys tring,sep,myindex))
+1,length(substring_index(mystring,sep,myindex+1))-
length(substring_index(mystring,sep,myindex))),sep ,"");
END
$$




mysql> -- Select the 5th (index=4) element from a space separated set
mysql> select element_in_set('So this is a sample set separated by
spaces'," ",4);
+---------------------------------------------------------------------
+
| element_in_set('So this is a sample set separated by spaces'," ",4)
|
+---------------------------------------------------------------------
+
| sample
|
+---------------------------------------------------------------------
+
1 row in set (0.00 sec)

mysql> -- finding the 7th ( index=6) element in aa comma separated set
mysql> select element_in_set('this,is,a,comma,separated,set,with ,
9,elements',",",6);
+-----------------------------------------------------------------------
+
| element_in_set('this,is,a,comma,separated,set,with ,9,elements',",",
6) |
+-----------------------------------------------------------------------
+
|
with
|
+-----------------------------------------------------------------------
+
1 row in set (0.00 sec)


So is there a built-in function?

@Jerry; Unfortunately this is not an option in this case. The database
is to be taken as is, but I am afraid I cant provide more information
on this. So just trust me on this.
Reply With Quote
  #5 (permalink)  
Old 1 Week Ago
Jerry Stuckle
 
Posts: n/a
Default Re: nth Occurence of String in Set

cvh@LE wrote:
> On May 5, 7:13 pm, toby <t...@telegraphics.com.au> wrote:
>> On May 5, 3:17 pm, "cvh@LE" <christian.han...@cpi-service.com> wrote:
>>
>>> Hi all,
>>> ...
>>> Any hints or tips or reference I didnt find?

>> RTFM would reveal SUBSTRING_INDEX():http://dev.mysql.com/doc/refman/5.0/....html#function...
>>
>>> TIA
>>> Chris

>
> RTFM??!? If you had read what I 've written you would have realised
> that I am well aware of substring_index:
>
>> mysql> SELECT MID('a,b,c,d',
>> length(substring_index('a,b,c,d',",",3)),
>> length(substring_index('a,b,c,d',",",4))-
>> length(substring_index('a,b,c,d',",",3))-1
>> ) complicated;

>
> It rather seems to me that you dont know what a SET is! So RTFM to
> you. If you dont have valueble input than just leave it to others to
> give advice.
>
> substring index is helpful function I do use to achieve the goal of
> returning a string
>
> As said , I solved the problem but am wondering if there does exist a
> built-in function like FIND_IN_SET which takes a index as parameter
> and not a string-element. I created my own function as follows :
>
> CREATE FUNCTION element_in_set
> (mystring varchar(4096),sep varchar(25), myindex int(4))
> RETURNS varchar(4096) DETERMINISTIC
> BEGIN
> RETURN
> replace(substr(mystring,length(substring_index(mys tring,sep,myindex))
> +1,length(substring_index(mystring,sep,myindex+1))-
> length(substring_index(mystring,sep,myindex))),sep ,"");
> END
> $$
>
>
>
>
> mysql> -- Select the 5th (index=4) element from a space separated set
> mysql> select element_in_set('So this is a sample set separated by
> spaces'," ",4);
> +---------------------------------------------------------------------
> +
> | element_in_set('So this is a sample set separated by spaces'," ",4)
> |
> +---------------------------------------------------------------------
> +
> | sample
> |
> +---------------------------------------------------------------------
> +
> 1 row in set (0.00 sec)
>
> mysql> -- finding the 7th ( index=6) element in aa comma separated set
> mysql> select element_in_set('this,is,a,comma,separated,set,with ,
> 9,elements',",",6);
> +-----------------------------------------------------------------------
> +
> | element_in_set('this,is,a,comma,separated,set,with ,9,elements',",",
> 6) |
> +-----------------------------------------------------------------------
> +
> |
> with
> |
> +-----------------------------------------------------------------------
> +
> 1 row in set (0.00 sec)
>
>
> So is there a built-in function?
>
> @Jerry; Unfortunately this is not an option in this case. The database
> is to be taken as is, but I am afraid I cant provide more information
> on this. So just trust me on this.
>


Like I said, there is a way to do it. Normalize your database. Your
table violates first normal form.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Reply With Quote
  #6 (permalink)  
Old 1 Week Ago
cvh@LE
 
Posts: n/a
Default Re: nth Occurence of String in Set


@Jerry : As said, I am neither the admin nor have I the privileges to
normalize the db. Furthermore normalizing this column would create
more overhead than performance gain could be won.

@tobi: After reconsidering my function and rereading substring_index
documentation I changed the function to

CREATE FUNCTION element_in_set(mystring varchar(4096),sep varchar(25),
myindex int(4))
RETURNS VARCHAR(4096) DETERMINISTIC
BEGIN
RETURN substring_index(substring_index(mystring,sep,myind ex),sep,-1);
END

So thanks for the hint. I nonetheless consider a "RTFM" and a oneliner
rather rude, unhelpful and a totally unnecessary personal attack.
Groups are for seeking and giving advice not for lecturing,
downtalking or slagging so. off
Reply With Quote
  #7 (permalink)  
Old 1 Week Ago
Jerry Stuckle
 
Posts: n/a
Default Re: nth Occurence of String in Set

cvh@LE wrote:
> @Jerry : As said, I am neither the admin nor have I the privileges to
> normalize the db. Furthermore normalizing this column would create
> more overhead than performance gain could be won.
>


Not having the authority is one thing. Claiming it will create more
overhead is another - and probably FALSE.

Very seldom have I actually SEEN a degradation of performance when
normalizing a database. And virtually never when normalizing a column
with multiple values. That almost always improves performance across
the board. And it definitely improves performance on this particular query.

It was true that normalization hurt database performance - 25 years ago.
But things have come a long ways since then, and RDBMS's typically do
better with normalized data than not.

Of course, there are always exceptions. But until you actually take
performance benchmarks with your statements and your data, you can't say
that.



--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Reply With Quote
  #8 (permalink)  
Old 6 Days Ago
cvh@LE
 
Posts: n/a
Default Re: nth Occurence of String in Set

On 6 Mai, 13:09, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> cvh@LE wrote:
> > @Jerry : As said, I am neither the admin nor have I the privileges to
> > normalize the db. Furthermore normalizing this column would create
> > more overhead than performance gain could be won.

>
> Not having the authority is one thing. *Claiming it will create more
> overhead is another - and probably FALSE.
>
> Very seldom have I actually SEEN a degradation of performance when
> normalizing a database. *And virtually never when normalizing a column
> with multiple values. *That almost always improves performance across
> the board. *And it definitely improves performance on this particular query.
>
> It was true that normalization hurt database performance - 25 years ago.
> * But things have come a long ways since then, and RDBMS's typically do
> better with normalized data than not.
>
> Of course, there are always exceptions. *But until you actually take
> performance benchmarks with your statements and your data, you can't say
> that.
>
> --
> ==================
> Remove the "x" from my email address
> Jerry Stuckle
> JDS Computer Training Corp.
> jstuck...@attglobal.net
> ==================


@Jerry I absolutely agree with you on normalisation. And I, too, have
seldomly (although I do have) experienced that normalization is a
performance downer.

However, the application-database which makes use of this column-
containing-a-set problem is normalized already, except for this column
which happens to contain data similar to comments. Comments mostly are
formulated as sentences, which happens to be nothing more than an
ordered space separated set. This column contains exactly the same. I
doubt anybody would normalize a short one-sentence comment field in a
table holding, let's say, additional remarks (3NF) given as part of
responses (2NF) by respondents (1NF) in surveys.

Anyway, the whole question of mine targeted a function allowing me to
retrieve the nth element in a set (I agree, the word occurence was the
wrong choice - but, hey, English isnt my native tongue).

To illustrate the problem a bit:

The column in question containing the set could itself be the result
of an aggregate function :

imagine :
mysql> create table test1 (pk int(1),col1 varchar(20));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test1(pk,col1) values (1,"a"),(2,"b"),(1,"c"),
(2,"e"),(1,"f"),(2,"g");
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0

mysql> select * from test1;
+------+------+
| pk | col1 |
+------+------+
| 1 | a |
| 2 | b |
| 1 | c |
| 2 | e |
| 1 | f |
| 2 | g |
+------+------+
6 rows in set (0.01 sec)

mysql> select pk,group_concat(col1 separator ",") from test1 group by
1;
+------+----------------------------------+
| pk | group_concat(col1 separator ",") |
+------+----------------------------------+
| 1 | a,c,f |
| 2 | b,e,g |
+------+----------------------------------+
2 rows in set (0.00 sec)

mysql> select pk,metadaten.element_in_set(group_concat(col1 separator
","),",",2) 2ndElement from test1 group by 1;
+------+------------+
| pk | 2ndElement |
+------+------------+
| 1 | c |
| 2 | e |
+------+------------+
2 rows in set (0.00 sec)

mysql>



Reply With Quote
  #9 (permalink)  
Old 6 Days Ago
Jerry Stuckle
 
Posts: n/a
Default Re: nth Occurence of String in Set

cvh@LE wrote:
> On 6 Mai, 13:09, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>> cvh@LE wrote:
>>> @Jerry : As said, I am neither the admin nor have I the privileges to
>>> normalize the db. Furthermore normalizing this column would create
>>> more overhead than performance gain could be won.

>> Not having the authority is one thing. Claiming it will create more
>> overhead is another - and probably FALSE.
>>
>> Very seldom have I actually SEEN a degradation of performance when
>> normalizing a database. And virtually never when normalizing a column
>> with multiple values. That almost always improves performance across
>> the board. And it definitely improves performance on this particular query.
>>
>> It was true that normalization hurt database performance - 25 years ago.
>> But things have come a long ways since then, and RDBMS's typically do
>> better with normalized data than not.
>>
>> Of course, there are always exceptions. But until you actually take
>> performance benchmarks with your statements and your data, you can't say
>> that.
>>
>> --
>> ==================
>> Remove the "x" from my email address
>> Jerry Stuckle
>> JDS Computer Training Corp.
>> jstuck...@attglobal.net
>> ==================

>
> @Jerry I absolutely agree with you on normalisation. And I, too, have
> seldomly (although I do have) experienced that normalization is a
> performance downer.
>
> However, the application-database which makes use of this column-
> containing-a-set problem is normalized already, except for this column
> which happens to contain data similar to comments. Comments mostly are
> formulated as sentences, which happens to be nothing more than an
> ordered space separated set. This column contains exactly the same. I
> doubt anybody would normalize a short one-sentence comment field in a
> table holding, let's say, additional remarks (3NF) given as part of
> responses (2NF) by respondents (1NF) in surveys.
>
> Anyway, the whole question of mine targeted a function allowing me to
> retrieve the nth element in a set (I agree, the word occurence was the
> wrong choice - but, hey, English isnt my native tongue).
>
> To illustrate the problem a bit:
>
> The column in question containing the set could itself be the result
> of an aggregate function :
>
> imagine :
> mysql> create table test1 (pk int(1),col1 varchar(20));
> Query OK, 0 rows affected (0.00 sec)
>
> mysql> insert into test1(pk,col1) values (1,"a"),(2,"b"),(1,"c"),
> (2,"e"),(1,"f"),(2,"g");
> Query OK, 6 rows affected (0.01 sec)
> Records: 6 Duplicates: 0 Warnings: 0
>
> mysql> select * from test1;
> +------+------+
> | pk | col1 |
> +------+------+
> | 1 | a |
> | 2 | b |
> | 1 | c |
> | 2 | e |
> | 1 | f |
> | 2 | g |
> +------+------+
> 6 rows in set (0.01 sec)
>
> mysql> select pk,group_concat(col1 separator ",") from test1 group by
> 1;
> +------+----------------------------------+
> | pk | group_concat(col1 separator ",") |
> +------+----------------------------------+
> | 1 | a,c,f |
> | 2 | b,e,g |
> +------+----------------------------------+
> 2 rows in set (0.00 sec)
>
> mysql> select pk,metadaten.element_in_set(group_concat(col1 separator
> ","),",",2) 2ndElement from test1 group by 1;
> +------+------------+
> | pk | 2ndElement |
> +------+------------+
> | 1 | c |
> | 2 | e |
> +------+------------+
> 2 rows in set (0.00 sec)
>
> mysql>
>
>
>
>


Yes, I understand your design. But there are no functions to retrieve
information form a set because the set itself violates 1NF. And if the
database is normalized properly, there is no need.

And yes, I would normalize such a field in such circumstances.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Reply With Quote
  #10 (permalink)  
Old 6 Days Ago
Peter H. Coffin
 
Posts: n/a
Default Re: nth Occurence of String in Set

On Wed, 07 May 2008 07:55:48 -0400, Jerry Stuckle wrote:
> cvh@LE wrote:
>> @Jerry I absolutely agree with you on normalisation. And I, too, have
>> seldomly (although I do have) experienced that normalization is a
>> performance downer.
>>
>> However, the application-database which makes use of this column-
>> containing-a-set problem is normalized already, except for this column
>> which happens to contain data similar to comments. Comments mostly are
>> formulated as sentences, which happens to be nothing more than an
>> ordered space separated set. This column contains exactly the same. I
>> doubt anybody would normalize a short one-sentence comment field in a
>> table holding, let's say, additional remarks (3NF) given as part of
>> responses (2NF) by respondents (1NF) in surveys.
>>

>
> Yes, I understand your design. But there are no functions to retrieve
> information form a set because the set itself violates 1NF. And if the
> database is normalized properly, there is no need.
>
> And yes, I would normalize such a field in such circumstances.


To generalize a bit off of what I think Jerry's saying is that since
your requirement to access parts of comments exists, then "comment"
isn't a discrete element anymore. "Foo-part of comment" becomes then the
discrete element and you must then define what each normal form looks
like in light of that discrete element in order to decide whether it's
normalized or not. For example, if all I care about is counts in a set
of users' mailboxes, a single email is a discrete element, and tracking
an ID for each email, and what user it belongs to is normalized enough
for the purpose. If I need to know how many emails are in each
conversation thread, and how many were sent only internally to this mail
system, then it's NOT normalized, even though the data I have to work
with hasn't changed at all.

--
The Write Many, Read Never drive. For those people that don't know
their system has a /dev/null already.
-- Rik Steenwinkel, singing the praises of 8mm Exabytes
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 06:25 PM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0