Rich Megginson (richmegginson) wrote,

Using sed to unwrap ldif lines

LDIF is the ASCII format used to represent LDAP data. Tools such as ldapsearch produce LDIF output. LDIF wraps long lines.  Continuation lines begin with a space.  For example:
...
description: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
 ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
someotherattribute: ....

Programs that read LDIF must concatenate these lines into a single value, dropping each newline+space pair. This gets in the way whenever you want to parse the output with standard *nix tools such as grep, e.g.
ldapsearch .... "attr=something" | grep pattern

This is a problem if the attribute value is very long: the pattern may or may not match, depending on where the value is wrapped. There is a very simple perl one-liner that unwraps the lines:
perl -p0e 's/\n //g' file.ldif

but sometimes perl is not the right tool for the job. Since sed is part of the standard *nix toolkit, and is very powerful, it would be nice to be able to use sed for this.
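
To make the grep problem concrete, here is a small sketch using a made-up entry (the DN, the attribute values, and the pattern xyz123 are all invented for illustration). The description value is wrapped in the middle of the string grep is looking for, so grep finds nothing until the lines are unwrapped:

```shell
# Hypothetical sample entry: the description value is wrapped between "xyz" and "123".
ldif='dn: cn=test,dc=example,dc=com
description: some long text ending in xyz
 123 and continuing on the next line
cn: test'

# grep works line by line, so the wrapped pattern is not found (count is 0).
# "|| true" only keeps grep's non-zero exit status from aborting a strict script.
wrapped_hits=$(printf '%s\n' "$ldif" | grep -c 'xyz123' || true)

# After removing every newline+space pair, the same pattern matches one line.
unwrapped_hits=$(printf '%s\n' "$ldif" | perl -p0e 's/\n //g' | grep -c 'xyz123' || true)

echo "wrapped: $wrapped_hits, unwrapped: $unwrapped_hits"
```

With perl installed, this should report no matching lines for the wrapped input and one for the unwrapped input.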

I started with http://www.shell-fu.org/lister.php?id=234 :
/^ / {; H; d; }; /^ /! {; x; s/\n //; };

Which is a good place to start, but has a few problems:
* prints a blank line as the first line
* does not handle more than one continuation line
* chops off the last line
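
All three problems can be reproduced with a short made-up entry. The sample data below is invented for illustration, and the program is assumed to run without -n, since it has no p command and so relies on sed's automatic printing:

```shell
# Hypothetical sample: one attribute value wrapped over three lines.
sample='dn: cn=test,dc=example,dc=com
description: aaa
 bbb
 ccc
cn: test'

# The starting-point program quoted above, run without -n (an assumption).
original_out=$(printf '%s\n' "$sample" | sed '/^ / {; H; d; }; /^ /! {; x; s/\n //; };')

printf '%s\n' "$original_out"
```

The output starts with an empty line, joins only the first continuation line (" ccc" is left on a line of its own), and never prints "cn: test".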

Here is my solution, which works with sed -n (to suppress printing - the sed program does the printing):
1 {h; $ !d}; $ {x; s/\n //g; p}; /^ / {H; d}; /^ /! {x; s/\n //g; p}

For those unfamiliar with sed, here is a good introduction: http://www.grymoire.com/Unix/Sed.html


Notes: { and } group commands, and ; separates them. The sed command above is really four separate address/command pairs.
1 {h; $ !d};

On the first line, store the line in the hold buffer. If this is not the last line, delete it and go to the next line. If it is the last line, fall through to the next statement ($ ...). This (with sed -n) suppresses printing the first line.
$ {x; s/\n //g; p};

On the last line, swap the hold buffer with the current pattern buffer. Delete all (/g) occurrences of newline+space in the current pattern buffer. Print the current pattern buffer. Execution then falls through to the final statement, which swaps the last line back in and prints it. This solves the problem that the original had with printing the last line.
/^ / {H; d};

If the line is a continuation line, just add it to the hold space, delete it, and go to the next line.
/^ /! {x; s/\n //g; p}

If the line is not a continuation line, swap the hold buffer with the current pattern buffer. Delete all (/g) occurrences of newline+space in the current pattern buffer. Print the current pattern buffer.


To use this in a shell command:
$ ldapsearch .... '(something=otherthing)' | sed -n '1 {h; $ !d}; $ {x; s/\n //g; p}; /^ / {H; d}; /^ /! {x; s/\n //g; p}' | grep somepattern
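
As a quick check without an LDAP server, the same sed program can be wrapped in a shell function and fed a made-up entry. The function name unwrap_ldif and the sample data are hypothetical, chosen only for this sketch:

```shell
# unwrap_ldif is a hypothetical helper name: it reads LDIF on stdin and
# writes it to stdout with continuation lines joined to the line they continue.
unwrap_ldif() {
    sed -n '1 {h; $ !d}; $ {x; s/\n //g; p}; /^ / {H; d}; /^ /! {x; s/\n //g; p}'
}

# Hypothetical sample: one attribute value wrapped over three lines.
sample='dn: cn=test,dc=example,dc=com
description: aaa
 bbb
 ccc
cn: test'

unwrapped=$(printf '%s\n' "$sample" | unwrap_ldif)
printf '%s\n' "$unwrapped"
# prints:
# dn: cn=test,dc=example,dc=com
# description: aaabbbccc
# cn: test
```

Tracing the program by hand suggests one caveat: if the very last line of the input is itself a continuation line, it is printed on its own rather than joined, so it is worth checking that your input does not end in the middle of a wrapped value.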

Another common task is getting the value of a single-valued attribute. This is similar to the above, but slightly more complicated. It assumes you have a shell variable called attrname set to the name of the attribute you are interested in. The sed program, quoted for the shell, looks like this:
'/^'$attrname':/,/^$/ { /^'$attrname':/ { s/^'$attrname': *// ; h ; $ !d}; /^ / { H; $ !d}; /^ /! { x; s/\n //g; p; q}; $ { x; s/\n //g; p; q} }'

The blow-by-blow description:
/^'$attrname':/,/^$/ {

Only perform the following actions on lines from the one that begins with attrname: through the end of the entry. In LDIF, entries end with an empty line or EOF.
/^'$attrname':/ { s/^'$attrname': *// ; h ; $ !d};

If the line begins with attrname:, delete the attrname, the colon, and any white space after the colon, so that we have just the attribute value without the attribute name. Save the result to the hold buffer. If this is not EOF, delete the pattern buffer, and go to the next line. If this is EOF, fall through to the EOF statement ($ ...). The rest is similar to the previous, except that the last one ends with q - this assumes the attribute is single valued.
/^ / { H; $ !d};

If this is a continuation line, add it to the hold buffer. If not EOF, delete it, and go to the next line. If EOF, fall through to the EOF case.
/^ /! { x; s/\n //g; p; q};

If the line is not a continuation line, swap the hold buffer with the current pattern buffer. Delete all (/g) occurrences of newline+space in the current pattern buffer. Print the current pattern buffer. Quit the sed program.
$ { x; s/\n //g; p; q} }

If we are at EOF, swap the hold buffer with the current pattern buffer. Delete all (/g) occurrences of newline+space in the current pattern buffer. Print the current pattern buffer. Quit the sed program. Note that this only works with the first occurrence of the attribute, and only works with single-valued attributes.

Here is an example:
$ attrname=svattrname
$ attrval=`ldapsearch ... '(somefilter)' $attrname | sed -n '/^'$attrname':/,/^$/ { /^'$attrname':/ { s/^'$attrname': *// ; h ; $ !d}; /^ / { H; $ !d}; /^ /! { x; s/\n //g; p; q}; $ { x; s/\n //g; p; q} }'`

Assuming (somefilter) is a valid LDAP search filter which returns a single entry, and svattrname is the name of a single-valued attribute in that entry, attrval will contain the value of that attribute, with the continuation lines unwrapped.
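
The same extraction can be tried offline on a made-up entry. In this sketch the function name get_single_attr and the sample data are hypothetical, and the attribute name is passed as the first argument (double-quoted inside the sed program) rather than read from the attrname variable:

```shell
# get_single_attr is a hypothetical helper name: $1 is the attribute name,
# LDIF is read from stdin, and the unwrapped value is written to stdout.
get_single_attr() {
    sed -n '/^'"$1"':/,/^$/ { /^'"$1"':/ { s/^'"$1"': *// ; h ; $ !d}; /^ / { H; $ !d}; /^ /! { x; s/\n //g; p; q}; $ { x; s/\n //g; p; q} }'
}

# Hypothetical sample: description is wrapped, cn is the last line before EOF.
sample='dn: cn=test,dc=example,dc=com
description: aaa
 bbb
 ccc
cn: test'

attrval=$(printf '%s\n' "$sample" | get_single_attr description)
echo "$attrval"    # prints: aaabbbccc

lastval=$(printf '%s\n' "$sample" | get_single_attr cn)
echo "$lastval"    # prints: test
```

The second call exercises the EOF path: cn is on the last line of the input, so the value is printed without a following line to trigger it.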