Xmlstarlet
XMLStarlet is a command line tool for manipulation of XML. It exposes XPath, XSLT, and validation among many other capabilities. On FreeBSD, it is generally invoked as xml and is available from the Ports Collection in textproc/xmlstarlet; on Linux, it is generally invoked as xmlstarlet. Throughout the rest of this article, I will use xml in the text; the example sessions show the Linux name, xmlstarlet. XMLStarlet invocations take the form: xml command [command options] [filename]. It is intended for use within pipelines; any diagnostic output generated will be placed on stderr. To get help on a command, run xml command --help.
As more and more tools generate XML-based output (Subversion, nmap, etc.), XMLStarlet becomes a useful and flexible tool for working with XML in a stream-editing context.
Escaping and Unescaping
Because characters such as <, > and & are special in XML, text must be processed prior to being embedded within an element as data. Conversion to and from entity references is managed with xml esc and xml unesc. Neither of these commands takes any options. The input has no well-formedness requirements.
    [diaconicon](0) $ xmlstarlet esc \<\>\& && echo \<\>\& | xmlstarlet esc
    &lt;&gt;&amp;
    &lt;&gt;&amp;
    [diaconicon](0) $ xmlstarlet esc \<\>\& | xmlstarlet unesc && echo \<\>\& | xmlstarlet esc | xargs xmlstarlet unesc
    <>&
    <>&
    [diaconicon](0) $
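As a minimal sketch of the embedding step described above, the following escapes a string, wraps it in an element, and then reverses the conversion. The sample text and the note element name are assumptions made up for illustration, and the Linux command name xmlstarlet is assumed to be on the PATH.

```shell
# Hypothetical input containing XML-special characters.
raw='Tom & Jerry <3'

# Escape the text, then embed it as element data.
escaped=$(printf '%s\n' "$raw" | xmlstarlet esc)
printf '<note>%s</note>\n' "$escaped"

# Reverse the conversion to recover the original text.
roundtrip=$(printf '%s\n' "$escaped" | xmlstarlet unesc)
printf '%s\n' "$roundtrip"
```

Single-quoting the input avoids the per-character backslash escaping used in the session above.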
Document Presentation
Canonical Representation
xml c14n can be used to canonicalize XML (this primarily consists of removing whitespace before or after the root node, and expanding empty elements written in the <foo/> notation to <foo></foo>). The input must be well-formed. Comments can either be preserved or stripped via the --with-comments and --without-comments command-line options, respectively.
    [diaconicon](0) $ echo -e "\n<foo> <bar />\n<qux> crap\tcrap </qux></foo>" | xmlstarlet c14n
    <foo> <bar></bar>
    <qux> crap crap </qux></foo>[diaconicon](0) $
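The two comment-handling modes can be compared side by side. This is a sketch under the assumption that the document is supplied on stdin by passing - as the file name; the sample document is made up for illustration.

```shell
# Hypothetical document with a comment and an empty element.
doc='<foo><!-- note --><bar/></foo>'

# Keep comments in the canonical form ("-" reads the document from stdin).
with=$(printf '%s\n' "$doc" | xmlstarlet c14n --with-comments -)

# Strip comments from the canonical form.
without=$(printf '%s\n' "$doc" | xmlstarlet c14n --without-comments -)

printf '%s\n' "$with" "$without"
```

In both cases the empty <bar/> element should come back in its expanded form.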
Element Structure
xml el displays the element structure of well-formed XML. Attributes and their values can also be displayed; use the -a option to enable attribute output, and the -v option to enable attribute and value output. Each element and its complete ancestry appears on its own line, with generations separated by forward slashes. Attributes and their values, when enabled with -v, follow the element path in square brackets.
    [diaconicon](0) $ echo '<foo><bar qux="quux" fux="fuux">quuux</bar><baz/></foo>' | xmlstarlet el -v
    foo
    foo/bar[@qux='quux' and @fux='fuux']
    foo/baz
    [diaconicon](0) $
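For documents with many repeated elements, the per-occurrence listing gets long. A sketch using the -u option (sorted unique paths), which is not covered above; the sample document is an assumption for illustration.

```shell
# Hypothetical document with repeated structure.
doc='<log><entry id="1"><msg>a</msg></entry><entry id="2"><msg>b</msg></entry></log>'

# Default output: one line per element occurrence.
printf '%s\n' "$doc" | xmlstarlet el

# -u collapses repeats into a sorted list of unique paths.
paths=$(printf '%s\n' "$doc" | xmlstarlet el -u)
printf '%s\n' "$paths"
```

This gives a quick outline of an unfamiliar document before writing XPath queries against it.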
Line-oriented (PyX)
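The PyX notation (a line-oriented representation of XML that originated with the Pyxie project; it is unrelated to the PyX graphics package for Python) can be used to prepare XML data for processing by line-oriented tools such as grep or wc. Each (preserved) syntactic element in the XML source results in a single line of text prefaced with a PyX classifier. The mapping is reversible, except that the distinction between <foo/> and <foo></foo> is lost: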
XML Syntax | PyX Classifier | XML Example | PyX Equivalent |
---|---|---|---|
start-tag | '(' | <foo>, <foo/> | (foo |
end-tag | ')' | </foo>, <foo/> | )foo |
attribute | 'A' | foo="bar" | Afoo bar |
character data | '-' | <foo>bar</foo> | -bar |
processing instruction | '?' | <?A4TypeSetter PageBreak?> | ?A4TypeSetter PageBreak |
Use xml pyx to convert from XML to PyX and xml p2x to convert from PyX to XML. The input must be well-formed in either case.
    [diaconicon](0) $ echo '<foo><?xml-stylesheet href="headlines.css" type="text/css"?><bar qux="quux">quuux</bar><baz/></foo>' | xmlstarlet pyx
    (foo
    ?xml-stylesheet href="headlines.css" type="text/css"
    (bar
    Aqux quux
    -quuux
    )bar
    (baz
    )baz
    )foo
    [diaconicon](0) $ echo '<foo><?xml-stylesheet href="headlines.css" type="text/css"?><bar qux="quux">quuux</bar><baz/></foo>' | xmlstarlet pyx | xmlstarlet p2x
    <foo><?xml-stylesheet href="headlines.css" type="text/css"?>
    <bar qux="quux">quuux</bar><baz></baz></foo>
    [diaconicon](0) $
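Because each PyX line begins with its classifier, ordinary text tools can answer simple structural questions. A sketch, with a made-up sample document, that counts start-tags and lists the attribute names in use:

```shell
# Hypothetical document: four elements, one attribute.
doc='<foo><bar qux="quux">quuux</bar><baz/><bar/></foo>'

# Count start-tags: lines whose classifier is "(".
start_tags=$(printf '%s\n' "$doc" | xmlstarlet pyx | grep -c '^(')
printf '%s\n' "$start_tags"

# List distinct attribute names: "A" lines, first field, classifier removed.
attrs=$(printf '%s\n' "$doc" | xmlstarlet pyx | grep '^A' | cut -d' ' -f1 | cut -c2- | sort -u)
printf '%s\n' "$attrs"
```

Character data cannot be mistaken for a tag here, since data lines always begin with the - classifier.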
Pretty Printing
xml fo can pretty-print well-formed input using either spaces or tabs. Consult xml fo --help; it's pretty self-explanatory.
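A short sketch of the two indentation styles, using the -s (indent with N spaces), -t (indent with tabs) and -o (omit the XML declaration) options; the sample document is made up for illustration.

```shell
# Hypothetical single-line document.
doc='<foo><bar>quux</bar><baz/></foo>'

# Indent with four spaces; -o omits the XML declaration.
pretty=$(printf '%s\n' "$doc" | xmlstarlet fo -o -s 4)
printf '%s\n' "$pretty"

# Indent with tabs instead.
printf '%s\n' "$doc" | xmlstarlet fo -t
```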
XML Input
Directory Listings
xml ls will output a verbose listing of the current directory's contents, displaying all times in ISO 8601 format. It takes no options. The return value is, for reasons unfathomable, the number of files listed as opposed to a success/error indicator -- don't get caught unawares.
    [diaconicon](0) $ xmlstarlet ls
    <dir>
    <d p="rwxr-xr-x" a="20070421T065820Z" m="20070415T190332Z" s="4096" n="trunk"/>
    <d p="rwxr-xr-x" a="20070421T044013Z" m="20070419T181636Z" s="4096" n=".svn"/>
    </dir>
    [diaconicon](2) $
Recovering Documents
xml fo -R attempts to magically fix up malformed XML. It's pretty badass.
    [diaconicon](0) $ echo "<fucked> this is all <fubar> </fukked>" | xmlstarlet fo -R
    -:1: parser error : Opening and ending tag mismatch: fubar line 1 and fukked
    <fucked> this is all <fubar> </fukked>
                                          ^
    -:2: parser error : Premature end of data in tag fucked line 1
    ^
    <?xml version="1.0"?>
    <fucked> this is all <fubar> </fubar> </fucked>
    [diaconicon](0) $
Subversion
svn log and svn status both accept the --xml command line option.
    [diaconicon](0) $ svn status --xml
    <?xml version="1.0"?>
    <status>
    <target path=".">
    <entry path="trunk/clients/log.xml">
    <wc-status props="none" item="unversioned">
    </wc-status>
    </entry>
    </target>
    </status>
    [diaconicon](0) $ svn log -r299 --xml
    <?xml version="1.0"?>
    <log>
    <logentry revision="299">
    <author>dank</author>
    <date>2007-04-19T16:47:10.226838Z</date>
    <msg>reenable the exitcode etc buildreport handlers</msg>
    </logentry>
    </log>
    [diaconicon](0) $
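Once the log is in XML form, xml sel can pull out individual fields. The sketch below feeds a saved copy of the log entry shown above through a template that prints one summary line per logentry; in real use the input would come straight from svn log --xml, and the output layout here is only an illustrative choice.

```shell
# Saved copy of the svn log --xml output shown above.
log='<?xml version="1.0"?>
<log>
<logentry revision="299">
<author>dank</author>
<date>2007-04-19T16:47:10.226838Z</date>
<msg>reenable the exitcode etc buildreport handlers</msg>
</logentry>
</log>'

# For each logentry, print "revision author: message" on its own line.
summary=$(printf '%s\n' "$log" |
    xmlstarlet sel -t -m '//logentry' -v '@revision' -o ' ' -v 'author' -o ': ' -v 'msg' -n)
printf '%s\n' "$summary"
```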
Validation
All forms of the xml val command as used below can be provided the -e command line option to verbosely print the causes of validation failures on stderr.
Well-formedness
Well-formedness of an XML document can be verified with xml val. Well-formedness mandates a single root node, balancing of all element start- and end-tags, proper syntax, proper use of character entities where necessary, etc.
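A sketch of using the check in a script, relying on the exit status rather than the printed report. The file names are hypothetical, and the -w (well-formedness only) and -q (quiet) options are used in addition to what is described above.

```shell
# Hypothetical scratch files: one well-formed, one with an unbalanced tag.
tmp=$(mktemp -d)
printf '<foo><bar/></foo>\n' > "$tmp/good.xml"
printf '<foo><bar></foo>\n' > "$tmp/bad.xml"

# -w checks well-formedness only; -q suppresses the per-file report.
xmlstarlet val -w -q "$tmp/good.xml"
good_rc=$?
xmlstarlet val -w -q "$tmp/bad.xml" 2>/dev/null
bad_rc=$?

printf 'good.xml: %s\nbad.xml: %s\n' "$good_rc" "$bad_rc"
rm -rf "$tmp"
```

A zero status indicates the document passed; anything else indicates a failure.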
RELAX-NG
If a RELAX-NG schema has been provided, validation can be performed against it using xml val -r schema.
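As a minimal sketch, the following writes a tiny RELAX-NG schema (a foo element containing one bar element with text) and validates two documents against it. The schema, documents, and file names are all assumptions made up for illustration.

```shell
# Hypothetical schema and documents in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/schema.rng" <<'EOF'
<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="bar">
    <text/>
  </element>
</element>
EOF
printf '<foo><bar>quux</bar></foo>\n' > "$tmp/ok.xml"
printf '<foo><baz/></foo>\n' > "$tmp/wrong.xml"

# Validate each document; -e reports the causes of any failure on stderr.
xmlstarlet val -e -r "$tmp/schema.rng" "$tmp/ok.xml"
ok_rc=$?
xmlstarlet val -e -r "$tmp/schema.rng" "$tmp/wrong.xml" 2>/dev/null
wrong_rc=$?

rm -rf "$tmp"
```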
Transformation
XSLT is awesomely powerful.
Dump all content
Delimited by spaces:
xml sel -t -m //\*\[not\(\*\)\] -v . -o ' '
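The same template, sketched against a made-up sample document and with the XPath expression single-quoted rather than backslash-escaped. It matches every element that has no child elements and prints its text followed by a space.

```shell
# Hypothetical document with two leaf elements at different depths.
doc='<foo><bar>one</bar><baz><qux>two</qux></baz></foo>'

# Match every leaf element and print its value followed by a space.
content=$(printf '%s\n' "$doc" | xmlstarlet sel -t -m '//*[not(*)]' -v . -o ' ')
printf '%s\n' "$content"
```

Each value is followed by the delimiter, so the output ends with a trailing space.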
Querying
XPath can be used to programmatically explore XML.
Counting
Count all nodes in a document (elements, text, comments, and processing instructions; attributes are not included):
xml sel -t -v "count(//node())"
Count all occurrences of somenode nodes:
xml sel -t -v "count(//somenode)"
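Both counts, sketched against a made-up sample document containing four elements and one text node:

```shell
# Hypothetical document: foo, bar, two somenode elements, and the text "x".
doc='<foo><somenode/><bar><somenode>x</somenode></bar></foo>'

# Count only the somenode elements.
n_some=$(printf '%s\n' "$doc" | xmlstarlet sel -t -v 'count(//somenode)')

# Count every node below the document root.
n_all=$(printf '%s\n' "$doc" | xmlstarlet sel -t -v 'count(//node())')

printf 'somenode: %s\nall nodes: %s\n' "$n_some" "$n_all"
```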
See Also