
sed on a multiline block: strip spaces and newlines within tags



#1 Rahul, 04-02-2008

I have a command that spews descriptive output with attributes (CWD,
TIME, NAME, etc.), each followed by its value enclosed in <>.

Unfortunately the original designer thought it to be "smart" [sic.] to
implement his own "word-wrap" at a certain number of columns.
Consequently many of my lines are split at strange locations. He also
inserted spaces to make the output "pretty" by aligning columns.

This makes it awfully inconvenient to process system paths etc. because
of the intervening spaces and newlines.

Example:
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff
/zzz/eee/rrr
/ttt/aaa/qqq>

I need:
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff/zzz/eee/rrr/ttt/aaa/qqq>

How do I sed this so that any spaces and newlines are stripped, BUT
only within <>? I'm stumped at how to write a sed one-liner that will
handle this multi-line pattern space! Any ideas?

--
Rahul
#2 Ed Morton, 04-02-2008

On 4/2/2008 3:54 PM, Rahul wrote:
> How do I sed this so that any space-characters and newlines are stripped
> BUT only if within a <>? I'm stumped at how to write a sed one liner that
> will handle this multi-line pattern space! Any ideas?
>


sed is the wrong choice for multi-line input. Try this:

$ cat file
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff
/zzz/eee/rrr
/ttt/aaa/qqq>
$ awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' file
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff/zzz/eee/rrr/ttt/aaa/qqq>

Ed.
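For readers puzzling over how the one-liner works, here is a commented sketch of the same program wrapped in a shell function. The name `join_tag_values` is hypothetical, and the sketch assumes gawk or mawk, both of which treat a multi-character `RS` such as `">\n"` as a regular expression (POSIX leaves that case unspecified).

```shell
# join_tag_values is a hypothetical name for Ed's RS/FS one-liner.
# Assumes gawk or mawk, which treat a multi-character RS as a regex.
join_tag_values() {
  awk '
    BEGIN {
      RS = ORS = ">\n"   # a record ends at a ">" that closes a line
      FS = OFS = "<"     # so $NF is the text after the last "<" (the value)
    }
    {
      # Delete every whitespace character in the value, embedded newlines
      # included; changing $NF makes awk rebuild $0 joined with OFS ("<").
      gsub(/[[:space:]]+/, "", $NF)
    }
    1                    # print each record, followed by ORS (">\n")
  ' "$@"
}

# Prints the desired output shown in the original question.
printf 'rrrr rrr\ncruft cruft\nblah blah CWD </xxx/yyy/fff\n/zzz/eee/rrr\n/ttt/aaa/qqq>\n' |
  join_tag_values
```

One caveat with this approach: any text after the last line-ending `>` in the input becomes a record with no `<`, so `$NF` is the whole record; its whitespace is stripped as well, and `ORS` appends a `>` to it.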

#3 Rahul, 04-02-2008

Ed Morton <morton@lsupcaemnt.com> wrote in
news:47F3F442.4020208@lsupcaemnt.com:


Thanks Ed! That seems to work great. I was barking up the wrong tree! I
should have looked at awk. When to 'sed' and when to 'awk' is still a
pretty confusing choice for me...

--
Rahul
#4 pk, 04-03-2008

Rahul wrote:

> How do I sed this so that any space-characters and newlines are stripped
> BUT only if within a <>? I'm stumped at how to write a sed one liner that
> will handle this multi-line pattern space! Any ideas?


I must admit that I'm still in the process of figuring out how /exactly/
this works :-), but anyway it seems to do what you want with your input:

sed ':s; /<[^>]*$/ {$!{N; b s;};}; s/\n[[:space:]]*//g' file.txt

(yes, I really should dig deeper into sed)
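Since the sed version is hard to read on one line, here is a hedged, multi-line sketch of the same N-and-branch idea in a hypothetical function `join_tag_values_sed`. It keeps appending lines while the pattern space ends inside an unfinished value, which it detects with `/<[^>]*$/` (a `<` with no `>` after it). GNU sed is assumed, as in the thread; there `[^>]` also matches an embedded newline, which the test relies on once lines have been joined.

```shell
# join_tag_values_sed is a hypothetical name; GNU sed assumed.
join_tag_values_sed() {
  sed '
    :s
    # While more input remains and the pattern space ends inside an
    # unfinished value (a "<" with no ">" after it), append the next
    # line and test again.
    $!{
      /<[^>]*$/{
        N
        b s
      }
    }
    # Every newline now in the pattern space was appended inside a value,
    # so delete it together with the indentation that follows it.
    s/\n[[:space:]]*//g
  ' "$@"
}

# Prints three lines: "NAME <foo>", "cruft", "TIME <1234>"
printf 'NAME <foo>\ncruft\nTIME <12\n    34>\n' | join_tag_values_sed
```

Because the loop only starts on a line that leaves a value open, a complete single-line value such as `NAME <foo>` passes through untouched, and the `$!` guard keeps `N` from running on the last line.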

If you want to use awk, here's another one:

awk '/</ && !/>/ {ok=1; line=$0; next}
ok {sub(/^[[:space:]]*/, ""); line=line $0}
ok && />/ {print line; line=""; ok=0; next}
ok {next}
1
END {if (ok) print line}' file.txt

Both solutions assume no "<" or ">" between the opening "<" and the
closing ">".

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.
#5 pk, 04-03-2008

Ed Morton wrote:

> $ cat file
> rrrr rrr
> cruft cruft
> blah blah CWD </xxx/yyy/fff
> /zzz/eee/rrr
> /ttt/aaa/qqq>
> $ awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' file


Maybe

$ gawk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/\n[[:space:]]*/,"",$NF)}1' file

to avoid removing possible spaces in the names (since IIUC those between <
and > are paths).
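The same idea can be sketched without relying on a multi-character `RS`. The following POSIX awk version, under the hypothetical name `join_tag_values_keep_spaces`, reads one line at a time, joins the continuation lines of an unfinished value, and removes only the leading blanks of each continuation line (spaces and tabs, assumed here to be the alignment padding), so spaces inside a path survive.

```shell
# join_tag_values_keep_spaces is a hypothetical name; POSIX awk only.
join_tag_values_keep_spaces() {
  awk '
    {
      if (pending) {
        sub(/^[ \t]+/, "")   # remove the alignment padding only
        buf = buf $0
      } else {
        buf = $0
      }
      # Still inside a value while the last "<" has no ">" after it.
      pending = (buf ~ /<[^>]*$/)
      if (!pending) { print buf; buf = "" }
    }
    END { if (pending) print buf }   # flush a value left open at end of input
  ' "$@"
}

# prints: CWD </home/u/my docs/sub dir>
printf 'CWD </home/u/my docs\n     /sub dir>\n' | join_tag_values_keep_spaces
```

The limitation is inherent to the input: a real space that lands at the very start of a continuation line cannot be told apart from padding, so it is removed too.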
