Sed: the Stream EDitor

Sed was developed in the mid-1970s, some time after the creation of grep.
It works like a filter: deleting, inserting, and changing characters, words,
and lines of text. The user gives sed a script of editing instructions, plus
the name of the file to edit (or the text to be edited may come as output from
a pipe). By default, sed prints every line it reads to the standard output
(stdout). The basic command structure is as follows:

sed  [options] [commands] file
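As a small sketch of that structure, the commands below run the same script once against a file and once against a pipe. The file name demo.txt and its contents are made up for illustration, and the script is deliberately empty so that only sed's default printing is visible.

```shell
# Create a small throwaway file (demo.txt is a hypothetical name).
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/demo.txt"

# An empty script: sed simply prints every line it reads.
from_file=$(sed '' "$tmpdir/demo.txt")

# Same script, but the text arrives through a pipe instead of a file.
from_pipe=$(printf 'one\ntwo\nthree\n' | sed '')

# With -n the automatic printing is switched off, so nothing comes out.
silent=$(sed -n '' "$tmpdir/demo.txt")

printf '%s\n' "$from_file"
rm -r "$tmpdir"
```

Both the file and the pipe variants give the same three lines back, which is why sed drops so easily into the middle of a pipeline.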

Regular Expressions ("regexes" for short) are sets of symbols or metacharacters
used to match patterns of text. In sed a regex usually acts as an address that
selects the lines a command works on. Believe it or not, you probably use
something like Regular Expressions all the time.

Example:

    bash$# ls *.html                       <---- the bash shell interprets *.html as
                                                 any file ending with .html

Using sed:

    bash$# ls | sed -n '/.*\.html$/p' -    <---- same as above :)
                       ^------------^      If this line does not work
                                           for you, read the end of this file

Note that the sed example has the string '/.*\.html$/p'. The single quotes just
tell the shell not to parse the contents of the string. The string contains
the regex we use to pluck .html files listed by the piped ls program. The regex
itself matches the pattern [anything].html.

"-n " option implies that sed should not print.   " / " marks the beginning of the Regular Expression. / can be substituted by any            character (eg. %regex% )   " . " is a single character metacharacter that matches any one character.   " * " is a quantifier metacharacter what ever precedes it must match 0 or more.   "  " is a escape char. . is treated as a literal instead of metacharacter.   " $ " is an Anchor, this anchor matches at the end of the line.   " / " marks the the end of the regex.   " p " is a Flag that prints the line if the regex exists in that line.   " - " at the end is used instead of a file name because we are using the   standard input (the piped output). The sed example can be further simplified   with the regex '/.html$/p'

Substitution.

bash$# echo 'Ted is Dead, but Dead is not Ted' | sed 's/Ted/Fred/g' -

With the string 's/x/y/g', the beginning " s " stands for substitution, so
sed replaces text matching the regex x with the replacement y. The g flag means
global (ie. on each line of text all occurrences of x will be substituted by y,
not just the first one).
So the example above would output " Fred is Dead, but Dead is not Fred "
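A short sketch of the same substitution with and without the g flag. The last command is an extra illustration, not from the example above: the s command accepts other delimiters than /, which helps when the text itself contains slashes (the path is invented).

```shell
line='Ted is Dead, but Dead is not Ted'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/Ted/Fred/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/Ted/Fred/g')

# Using | as the delimiter avoids escaping the slashes in a path.
path=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')

printf '%s\n%s\n%s\n' "$first_only" "$all" "$path"
```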

    bash$# echo "Teddy is Dead, but Dead is not Ted" | sed 's/Ted/Fred/' -
    output> Freddy is Dead, but Dead is not Ted

As the stream is processed by sed, the regex matches the Ted in Teddy at the
beginning of the line instead of the Ted at the end of the line, because
without the g flag only the first match on a line is replaced. One workaround
for this would be to use the $ anchor.

    bash$# echo "Teddy is Dead, but Dead is not Ted" | sed 's/Ted$/Fred/' -

or

    bash$# echo "Teddy is Dead, but Dead is not Ted" | sed 's/Ted/Fred/2' -

The second example uses the number flag (replace only the NUMBERth match of the
regex on each line). The output is " Teddy is Dead, but Dead is not Fred " :)
for both of the above commands.
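The two workarounds can be compared side by side in a small sketch. The last line is an added illustration of a detail worth knowing: the number flag counts matches per line, so a line with fewer matches is left alone.

```shell
line='Teddy is Dead, but Dead is not Ted'

# $ anchors the match to the end of the line.
anchored=$(printf '%s\n' "$line" | sed 's/Ted$/Fred/')

# The number flag 2 replaces only the second match on the line.
second=$(printf '%s\n' "$line" | sed 's/Ted/Fred/2')

# Only one match on this line, so asking for the second changes nothing.
unchanged=$(printf 'Teddy is here\n' | sed 's/Ted/Fred/2')

printf '%s\n%s\n%s\n' "$anchored" "$second" "$unchanged"
```

The anchor only helps when Ted really is the last thing on the line, while the number flag depends on how many matches come before it, so which one fits depends on the input.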

Regular Expressions.

Single character metacharacters

    .          Matches any one character
    [...]      Matches any character listed between the brackets
    [^...]     Matches any character except those listed between the brackets

Quantifier metacharacters

    ?          Matches the preceding element zero or one times
    *          Matches the preceding element zero or more times
    +          Matches the preceding element one or more times
    {num}      Matches the preceding element num times
    {min,max}  Matches the preceding element at least min, but no more than
               max times

Anchor metacharacters

    ^          Matches at the start of the line
    $          Matches at the end of the line

Grouping

    ( ... )    Grouping allows you to store references to each consecutive match
               in a pattern. You can reference each match by \1 \2 \3 depending
               on group order. eg ( s/^\(Ted\).Kat$/\1dy K/ ) where \1 = Ted,
               so " Ted Kat " becomes " Teddy K ".

In sed's default (basic) syntax the characters ? + { } ( ) only act as
metacharacters when preceded by a backslash, as in the eg above (\? and \+ are
GNU extensions). With sed -E they are written bare, as listed in the table.
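A few of these metacharacters in action, as a rough sketch. It assumes a sed that accepts the -E option (GNU and BSD sed both do), and all the input strings are invented for the demonstration.

```shell
# [...] plus +: keep lines made only of digits.
digits=$(printf '2024\nabc\n42x\n7\n' | sed -E -n '/^[0-9]+$/p')

# {min,max}: an f followed by two or three o's, nothing else.
oo=$(printf 'fo\nfoo\nfooo\n' | sed -E -n '/^fo{2,3}$/p')

# ?: the u is optional, so both spellings match.
colour=$(printf 'color\ncolour\ncolouur\n' | sed -E -n '/^colou?r$/p')

# Grouping with back-references: swap two words.
swapped=$(printf 'Ted Kat\n' | sed -E 's/^([A-Za-z]+) ([A-Za-z]+)$/\2, \1/')

# The same kind of grouping in the default (basic) syntax needs \( and \).
basic=$(printf 'Ted Kat\n' | sed 's/^\(Ted\).Kat$/\1dy K/')

printf '%s\n' "$digits" "$oo" "$colour" "$swapped" "$basic"
```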

Regular Expressions are an art and a complete explanation of them would be well
beyond the scope of this paper.

Commands and Internals.

Besides the regex there are other commands that come in handy when using sed.
But before we get into those I would like to point out how sed "really" works.
Input is read one line at a time into what sed calls the "pattern space". This
pattern space is where commands like s, d, and N do their work. Some commands
depend on what is called the "hold space". The hold space is an area that you
can copy the pattern space to for safekeeping. Normally, though, the hold space
is barely used. Commands:

  'd'  Delete the pattern space; immediately start the next cycle.
  'D'  Delete text in the pattern space up to the first newline. If any text is
       left, restart the cycle with the resultant pattern space.
  'N'  Add a newline to the pattern space, then append the next line of input to
       the pattern space.
  'x'  Exchange the contents of the hold and pattern spaces.
  'h'  Replace the contents of the hold space with the contents of the pattern
       space.
  'g'  Replace the contents of the pattern space with the contents of the hold
       space.
  'G'  Append a newline to the contents of the pattern space, and then append the
       contents of the hold space to that of the pattern space.
  '{ commands }'
       A group of commands may be enclosed between { and } characters.
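Here is a minimal sketch of a few of these commands at work on invented input. The reverse-the-lines one-liner is a well-known sed idiom used here only to show the hold space; it is not part of the list above.

```shell
text=$(printf 'alpha\n\nbeta\ngamma\n')

# d: the address /^$/ selects blank lines and d deletes them.
no_blank=$(printf '%s\n' "$text" | sed '/^$/d')

# N: append the next input line to the pattern space, then replace the
# embedded newline with a space, joining the lines in pairs.
pairs=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

# { }: group two commands under one address; -n leaves printing to p.
grouped=$(printf '%s\n' "$text" | sed -n '/beta/{s/beta/BETA/;p;}')

# h and G: on every line but the first, G appends the hold space to the
# pattern space; h then saves the result. On the last line ($) p prints
# the accumulated text, which is the input in reverse order.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

printf '%s\n' "$no_blank" "$pairs" "$grouped" "$reversed"
```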

Sed is a very powerful, lightweight program for quick jobs or small shell
scripts. It works best for substitution or for adding double or triple spacing
to text files, but for heavy duty jobs or scripts I would recommend using awk
or perl instead.
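The double and triple spacing mentioned above can be sketched with the G command alone. Because nothing has been copied to the hold space it is empty, so each G just appends a newline. The two-line input is made up.

```shell
# G appends a newline plus the (empty) hold space: one blank line after each line.
double=$(printf 'a\nb\n' | sed 'G')

# Two G commands add two blank lines after each line.
triple=$(printf 'a\nb\n' | sed 'G;G')

printf '%s\n' "$double"
```

Note that the $( ) capture strips the trailing blank lines, so run the sed commands directly in a pipe to see the full spacing.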

References: (all used to set up this document)

  1. Using Regular Expressions {http://etext.lib.virginia.edu/helpsheets/regex.html}
  2. Sed FAQ {http://www.ptug.org/sed/sedfaq.htm}
  3. Gnu sed manual {http://www.gnu.org/manual/sed/html_mono/sed.html}
  4. Gnu awk user's guide {http://www.gnu.org/manual/gawk-3.1.1/html_mono/gawk.html}

Questions/Comments

Ted Kat - dark_house_666@yahoo*com

*For shells that are color coded, or aliases of ls that specify the --color option:

    bash$# ls | sed -n '/.*\.html[[:cntrl:]]\[0m$/p' -

 `ls --color` spits out ansi escape sequences enclosed in
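A rough way to try this without a colorized ls is to fake the input. The sketch below assumes each name is followed by the reset sequence ESC[0m; what a real `ls --color` emits varies with its configuration, so treat the sample lines as an assumption.

```shell
# Simulated colorized entries: each name is followed by ESC [ 0 m.
colored=$(printf 'index.html\033[0m\nnotes.txt\033[0m\n')

# The plain regex fails because the line no longer ends in ".html".
plain_hits=$(printf '%s\n' "$colored" | sed -n '/\.html$/p' | wc -l | tr -d ' ')

# [[:cntrl:]] matches the ESC byte and \[ matches the literal bracket.
color_hits=$(printf '%s\n' "$colored" | sed -n '/\.html[[:cntrl:]]\[0m$/p' | wc -l | tr -d ' ')

printf 'plain: %s  with escape handling: %s\n' "$plain_hits" "$color_hits"
```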
