Xidel
XPath
The easiest method is a plain XPath query:
xidel --xpath "//guid" a.xml
For dealing with Atom namespaces:
xidel --xpath "//*:id" a.atom.xml
(Atom declares its namespace as the default, with no prefix, so the `*:` wildcard is used to match `id` elements in any namespace.)
XQuery
For anything more involved, you may want to use XQuery instead.
Extract Data from OPML and Write to a TSV
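Uses a single-line XQuery statement to output the Feed Name and Feed URL on alternating lines, then uses awk (see power-of-awk) to join each pair into one TSV row. The `[@xmlUrl]` filter skips folder outlines that have no feed URL, and the sequence `($x/@text, $x/@xmlUrl)` keeps the name ahead of the URL regardless of attribute order.
xidel feeds.opml --extract 'for $x in //*:outline[@xmlUrl] return ($x/@text, $x/@xmlUrl)' | awk 'NR % 2 == 1 {printf "%s\t", $0}; NR % 2 == 0 {printf "%s\n", $0}' >feeds.tsv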
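The awk step can be tried on its own, without xidel. This is a minimal sketch, assuming made-up feed names and URLs as input; `pair_lines` is a hypothetical helper name, not part of xidel or awk.

```shell
#!/bin/sh

# pair_lines (hypothetical helper): join every two input lines
# into one tab-separated row.
# Odd lines (feed name) are followed by a tab,
# even lines (feed URL) by a newline.
pair_lines() {
  awk 'NR % 2 == 1 { printf "%s\t", $0 }
       NR % 2 == 0 { printf "%s\n", $0 }'
}

# Made-up sample input standing in for the output of xidel.
printf '%s\n' \
  'Example Blog' 'https://example.com/feed.xml' \
  'Other Blog' 'https://example.org/atom.xml' | pair_lines
```

Note the comma between the format string and `$0`: without it, awk concatenates the two into a single format string and `%s` has no argument to print.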
XQuery Script to Turn RSS or Atom Feed into an HTML File
let $c := count(//*:entry)
let $cr := count(//*:item)
return
  (: Atom feed: entries are sorted by date in the table of contents :)
  if ($c > 0) then (
    <a name="toc"></a>,
    <nav><ol>{
      for $x at $n in //*:entry
      let $xt := inner-html($x/*:title)
      let $xda := (
        if ($x/*:published != '') then inner-html($x/*:published)
        else if ($x/*:updated != '') then inner-html($x/*:updated)
        else ()
      )
      order by $xda descending
      return <li><a href="#{$n}">{$xt}</a></li>
    }</ol></nav>,
    <main>{
      for $x at $n in //*:entry
      let $xc := (
        if ($x/*:content != '') then inner-html($x/*:content)
        else if ($x/*:summary != '') then inner-html($x/*:summary)
        else ()
      )
      let $xt := inner-html($x/*:title)
      let $xl := (
        if ($x/*:link/@href != '') then $x/*:link/@href
        else ()
      )
      let $xda := (
        if ($x/*:published != '') then inner-html($x/*:published)
        else if ($x/*:updated != '') then inner-html($x/*:updated)
        else ()
      )
      return
        <article>
          <a name="{$n}"></a>
          <h1>{$xt}</h1>
          <p><strong>Date: {$xda}</strong></p>
          {parse-html(parse-html($xc))//body/*}
          <a href="{$xl}">Article</a>
          <a href="#toc">Back to TOC</a>
        </article>
    }</main>
  )
  (: RSS feed: items without a title fall back to their pubDate :)
  else if ($cr > 0) then (
    <h1>{//channel/title}</h1>,
    <a name="toc"></a>,
    <nav><ol>{
      for $x at $n in //*:item
      let $xt := (
        if ($x/*:title != '') then inner-html($x/*:title)
        else inner-html($x/pubDate)
      )
      let $xd := inner-html($x/pubDate)
      return <li><a href="#{$n}">{if ($xt = $xd) then $xt else ($xd, ": ", $xt)}</a></li>
    }</ol></nav>,
    <main>{
      for $x at $n in //*:item
      let $xc := (
        if ($x/*:description != '') then inner-html($x/*:description)
        else inner-html($x/*:title)
      )
      let $xt := (
        if ($x/*:title != '') then inner-html($x/*:title)
        else inner-html($x/pubDate)
      )
      let $xd := inner-html($x/pubDate)
      let $xl := (
        if ($x/*:link != '') then inner-html($x/*:link)
        else ()
      )
      return
        <article>
          <a name="{$n}"></a>
          <h1>{if ($xt = $xd) then $xt else ($xd, ": ", $xt)}</h1>
          {parse-html(parse-html($xc))//body/*}
          <a href="{$xl}">Article</a>
          <a href="#toc">Back to TOC</a>
        </article>
    }</main>
  )
  else ()
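The doubly nested `parse-html()` appears to be necessary: feed content is usually entity-escaped HTML, so the first call likely only decodes the entities and the second parses the resulting markup.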
Save this as something like feed-html.xq and run it with the command:
xidel "feed.xml" --extract-file=feed-html.xq --output-format=html
Or, to convert all the feeds in a folder called xml, write the results to a folder called html, and create an index for them:
#!/bin/sh

# Convert one feed in xml/ to an HTML file of the same name in html/.
tohtml() {
  xidel "xml/$(basename "$1")" --extract-file=feed-html.xq --output-format=html > "html/$(basename "$1" .xml).html"
}

# Print the size of a file in bytes.
fsize() {
  ls -l "$1" | awk '{print $5}'
}

# Print the feed's last build date (RSS) or last update (Atom).
getlbd() {
  xidel "$1" --extract '(
    if (//*:lastBuildDate != "") then inner-html(//*:lastBuildDate)
    else if (//*:channel/*:pubDate) then inner-html(//*:channel/*:pubDate)
    else inner-html(//*:feed/*:updated)
  )'
}

# Print the feed title (RSS channel or Atom feed).
gettitle() {
  xidel "$1" --extract '(
    if (//*:channel != "") then inner-html(//*:channel/*:title)
    else inner-html(//*:feed/*:title)
  )'
}

echo "<!DOCTYPE html>" > index.html
echo "<html><head><title>Feeds $(date)</title></head><body><h1>Feeds $(date)</h1><ol>" >> index.html

# Iterate with a glob rather than parsing ls, so file names with spaces survive.
for i in xml/*; do
  echo "$i"
  wcl="$(wc -l "$i" | awk '{print $1}')"
  echo "$wcl"
  # Skip empty or near-empty files and anything of 25000 lines or more.
  if (test "$wcl" -gt 1 || test "$(fsize "$i")" -gt 10) && test "$wcl" -lt 25000; then
    tohtml "$i"
    echo "<li><a href=\"html/$(basename "$i" ".xml").html\">$(gettitle "$i") ($(getlbd "$i"))</a></li>" >> index.html
  fi
done

echo "</ol></body></html>" >> index.html
The result is a very simplistic HTML file, and the script probably doesn't follow any style guide, but it works.
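The size check inside the loop is fairly dense, so here it is isolated as a sketch. `worth_converting` is a hypothetical helper name and the sample files are made up; the thresholds are the ones used in the script above. It takes the byte count from `wc -c` instead of `ls -l`, which is a swapped-in technique rather than what the script does.

```shell
#!/bin/sh

# worth_converting FILE (hypothetical helper): succeed if FILE has
# some content (more than 1 line, or more than 10 bytes for feeds
# stored on a single line) and has fewer than 25000 lines.
worth_converting() {
  lines=$(wc -l < "$1" | tr -d ' ')
  bytes=$(wc -c < "$1" | tr -d ' ')
  { [ "$lines" -gt 1 ] || [ "$bytes" -gt 10 ]; } && [ "$lines" -lt 25000 ]
}

# Made-up sample files: one empty, one single-line feed without a newline.
tmp=$(mktemp -d)
: > "$tmp/empty.xml"
printf '%s' '<rss><channel><title>t</title></channel></rss>' > "$tmp/oneline.xml"

for f in "$tmp"/*.xml; do
  if worth_converting "$f"; then
    echo "convert: $(basename "$f")"
  else
    echo "skip: $(basename "$f")"
  fi
done

rm -rf "$tmp"
```

The byte test exists because a feed served as one long line has a line count of 0 or 1 and would otherwise be skipped.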