sed on a multiline block: strip spaces and newlines within tags (Linux General, Linux Forums)
I have a command that spews descriptive output consisting of attributes (CWD, TIME, NAME, etc.), each followed by its value enclosed in <>.

Unfortunately the original designer thought it "smart" [sic] to implement his own "word-wrap" at a certain number of columns. Consequently many of my lines are split at strange locations. He also inserted spaces to make the output "pretty" by aligning columns.

This makes it awfully inconvenient to process system paths etc., because of the intervening spaces and newlines.

Example:
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff
               /zzz/eee/rrr
               /ttt/aaa/qqq>

I need:
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff/zzz/eee/rrr/ttt/aaa/qqq>

How do I sed this so that any space characters and newlines are stripped, BUT only if they fall within a <>? I'm stumped at how to write a sed one-liner that will handle this multi-line pattern space! Any ideas?

-- 
Rahul
On 4/2/2008 3:54 PM, Rahul wrote:
> How do I sed this so that any space-characters and newlines are stripped
> BUT only if within a <>? I'm stumped at how to write a sed one liner that
> will handle this multi-line pattern space! Any ideas?

sed is the wrong choice for multi-line input. Try this:

$ cat file
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff
               /zzz/eee/rrr
               /ttt/aaa/qqq>
$ awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' file
rrrr rrr
cruft cruft
blah blah CWD </xxx/yyy/fff/zzz/eee/rrr/ttt/aaa/qqq>

Ed.
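For readers who want to see why the awk one-liner above works, here is the same program spread over several lines with comments. This is only a sketch: the wrapper name `join_tags` is made up for illustration, and it assumes an awk that accepts a multi-character `RS` (gawk and mawk do; POSIX awk only guarantees a single character).

```shell
# join_tags is a hypothetical wrapper name; the awk program is the one from
# the post, only expanded and commented.
join_tags() {
  awk '
    BEGIN {
      # A record ends at ">" followed by a newline, and the same text is
      # put back on output. Multi-character RS is an extension.
      RS = ORS = ">\n"
      # Split each record on "<", so the last field is the tag body.
      FS = OFS = "<"
    }
    {
      # Delete every run of whitespace (newlines included) in the last
      # field only; assigning to a field rebuilds the record with OFS.
      gsub(/[[:space:]]+/, "", $NF)
    }
    # A pattern of 1 is always true, so the rebuilt record is printed.
    1
  ' "$@"
}

printf '%s\n' 'rrrr rrr' 'cruft cruft' \
  'blah blah CWD </xxx/yyy/fff' \
  '               /zzz/eee/rrr' \
  '               /ttt/aaa/qqq>' | join_tags
```

Note that a record without any "<" has only one field, so `$NF` is then the whole record and all of its whitespace would be removed; the sketch relies on every ">" at end of line closing a tag, as in the sample.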
Ed Morton <morton@lsupcaemnt.com> wrote in news:47F3F442.4020208@lsupcaemnt.com:
> sed is the wrong choice for multi-line input. Try this:
>
> $ awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' file

Thanks Ed! That seems to work great. I was barking up the wrong tree! I should have looked at awk. When to use sed and when to use awk is still a pretty confusing choice for me...

-- 
Rahul
Rahul wrote:
> How do I sed this so that any space-characters and newlines are stripped
> BUT only if within a <>? I'm stumped at how to write a sed one liner that
> will handle this multi-line pattern space! Any ideas?

I must admit that I'm still in the process of figuring out how /exactly/ this works :-), but anyway it seems to do what you want with your input:

sed ':s; /</,/>/ {/>/! {$! {N; b s}; }; s/\n[[:space:]]*//g;}' file.txt

(yes, I really should dig deeper into sed)

If you want to use awk, here's another one:

awk '/</ {ok=1;line=$0;next} ok {gsub(/^[[:space:]]*/,"");line=line $0} />/ {print line;line="";ok=0;next} !ok' file.txt

Both solutions assume no "<" or ">" between the opening "<" and the closing ">".

-- 
All the commands are tested with bash and GNU tools, so they may use nonstandard features. I try to mention when something is nonstandard (if I'm aware of that), but I may miss something. Corrections are welcome.
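Since the sed one-liner above is hard to read at a glance, here is a commented expansion of it, followed by a small variation. Both are sketches that assume GNU sed; the function names are invented for illustration, and the second function is not from the thread. The variation exists because, as far as I can tell, a `/</,/>/` range does not test its end pattern on the line that opened it, so a tag that opens and closes on one line would leave the range open and pull in the following lines.

```shell
# Hypothetical helper names; GNU sed is assumed, as in the post.

# The one-liner from the post, spread out and commented.
join_tags_range() {
  sed '
    # Label to jump back to after appending a line.
    :s
    # Range: from a line containing "<" to a later line containing ">".
    /</,/>/ {
      # No ">" collected yet ...
      />/! {
        # ... and this is not the last input line:
        $! {
          # append the next input line (preceded by a newline) and retest.
          N
          b s
        }
      }
      # Tag complete: drop each embedded newline plus the indentation after it.
      s/\n[[:space:]]*//g
    }
  ' "$@"
}

# Variation (an assumption, not from the thread): keep appending lines only
# while the last "<" in the pattern space has no ">" after it.
join_tags_unclosed() {
  sed '
    :a
    /<[^>]*$/ {
      $! {
        N
        b a
      }
    }
    # Only wrapped tags contain a newline here, so other lines are untouched.
    s/\n[[:space:]]*//g
  ' "$@"
}

printf '%s\n' 'blah blah CWD </xxx/yyy/fff' \
  '               /zzz/eee/rrr' \
  '               /ttt/aaa/qqq>' | join_tags_range
```

Both functions share the stated assumption that no stray "<" or ">" appears inside a tag.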
Ed Morton wrote:
> $ awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' file

Maybe

$ gawk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/\n[[:space:]]+/,"",$NF)}1' file

to avoid removing possible spaces in the names (since IIUC those between < and > are paths).
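The difference between the two patterns only shows up when a value itself contains a space. The sketch below compares them on a made-up NAME entry; the function names and the sample path are hypothetical. It also uses `\n[[:space:]]*` rather than `\n[[:space:]]+`, which is my own variation so that a continuation line with no indentation is still joined, and it calls plain `awk` on the assumption that the installed awk accepts a multi-character `RS`.

```shell
# Hypothetical helper names and sample data, for illustration only.

# Removes every whitespace character inside the tag.
strip_all() {
  awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/[[:space:]]+/,"",$NF)}1' "$@"
}

# Removes only a line break and the indentation that follows it.
strip_wrap() {
  awk 'BEGIN{RS=ORS=">\n";FS=OFS="<"}{gsub(/\n[[:space:]]*/,"",$NF)}1' "$@"
}

sample() {
  printf '%s\n' 'NAME </home/my docs' '      /report.txt>'
}

sample | strip_all    # NAME </home/mydocs/report.txt>
sample | strip_wrap   # NAME </home/my docs/report.txt>
```

Neither version can tell whether a wrap happened exactly at a space in the name, so the output of such a command remains ambiguous in that case.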