Xidel
XPath
The easiest way to extract data is with an XPath expression:
xidel --xpath "//guid" a.xml
For dealing with Atom namespaces:
xidel --xpath "//*:id" a.atom.xml
(Atom declares its namespace as the default one, without a prefix, so the `*:` wildcard is used to match the element name in any namespace.)
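To make the namespace point concrete, here is a minimal sketch that writes a tiny, made-up Atom feed and queries it. The file name `sample.atom.xml` and the feed contents are assumptions for illustration only; the query is skipped when xidel is not installed.

```shell
# Write a tiny, made-up Atom feed; note the default (unprefixed) xmlns.
cat > sample.atom.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <entry>
    <title>First post</title>
    <id>urn:example:post-1</id>
  </entry>
</feed>
EOF

# Only run the query when xidel is available on this machine.
if command -v xidel >/dev/null 2>&1; then
  # *:id matches an element named "id" in any namespace.
  xidel sample.atom.xml --xpath "//*:id"
fi
```

Every element in the sample inherits the Atom namespace from the `xmlns` attribute on `feed`, which is why the element names carry no prefix.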
XQuery
For anything more involved, XQuery may be a better fit.
Extract Data from OPML and Write to a TSV
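Uses a single-line XQuery expression to output the feed name and feed URL on alternating lines, then uses awk (see power-of-awk) to join each pair into one TSV row. Only outlines that have an xmlUrl attribute are selected, so folder entries don't break the pairing, and the two attributes are returned as an explicit sequence so the name always comes first.
xidel feeds.opml --extract 'for $x in //*:outline[@xmlUrl] return ($x/@text, $x/@xmlUrl)' | awk 'NR % 2 == 1 {printf "%s\t", $0}; NR % 2 == 0 {printf "%s\n", $0}' > feeds.tsv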
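The awk pairing step can be tried on its own, without xidel. This is a small sketch: the function name `pairs_to_tsv` and the sample feed names and URLs are made up for illustration, standing in for the alternating lines that the query prints.

```shell
# Join alternating "name" / "URL" lines from stdin into tab-separated rows.
pairs_to_tsv() {
  awk 'NR % 2 == 1 { printf "%s\t", $0 }
       NR % 2 == 0 { printf "%s\n", $0 }'
}

# Made-up input in the same shape as the xidel output: name, URL, name, URL.
printf '%s\n' 'Example Blog' 'https://example.org/feed.xml' \
              'Other Feed' 'https://example.com/atom.xml' | pairs_to_tsv
```

Odd-numbered lines (names) are followed by a tab and even-numbered lines (URLs) by a newline, so each name and URL pair ends up on one row.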
XQuery Script to Turn RSS or Atom Feed into an HTML File
let $c := count(//*:entry)
let $cr := count(//*:item)
return
if ($c > 0)
then
(
<a name="toc"></a>,<nav><ol>{
for $x at $n in //*:entry
let $xt:=inner-html($x/*:title)
let $xda:=(
if ($x/*:published != '')
then inner-html($x/*:published)
else if ($x/*:updated != '')
then inner-html($x/*:updated)
else ()
)
order by $xda descending
return
<li><a href="#{$n}">{$xt}</a></li>
}</ol></nav>,<main>{
for $x at $n in //*:entry
let $xc:=(
if ($x/*:content != '')
then inner-html($x/*:content)
else if ($x/*:summary != '')
then inner-html($x/*:summary)
else ()
)
let $xt:=inner-html($x/*:title)
let $xl:=(
if ($x/*:link/@href != '')
then $x/*:link/@href
else ()
)
let $xda:=(
if ($x/*:published != '')
then inner-html($x/*:published)
else if ($x/*:updated != '')
then inner-html($x/*:updated)
else ()
)
return
<article>
<a name="{$n}"></a>
<h1>{$xt}</h1>
<p><strong>Date: {$xda}</strong></p>
{parse-html(parse-html($xc))//body/*}
<a href="{$xl}">Article</a>
<a href="#toc">Back to TOC</a>
</article>
}</main>
)
else if ($cr > 0)
then
(
<h1>{string(//*:channel/*:title)}</h1>,<a name="toc"></a>,<nav><ol>{
for $x at $n in //*:item
let $xt:=(
if ($x/*:title != '')
then inner-html($x/*:title)
else inner-html($x/pubDate)
)
let $xd:=inner-html($x/pubDate)
return
<li><a href="#{$n}">{if ($xt = $xd) then $xt else concat($xd, ": ", $xt)}</a></li>
}</ol></nav>,<main>{
for $x at $n in //*:item
let $xc:=(
if ($x/*:description != '')
then inner-html($x/*:description)
else inner-html($x/*:title)
)
let $xt:=(
if ($x/*:title != '')
then inner-html($x/*:title)
else inner-html($x/pubDate)
)
let $xd:=inner-html($x/pubDate)
let $xl:=(
if ($x/*:link != '')
then inner-html($x/*:link)
else ()
)
return
<article>
<a name="{$n}"></a>
<h1>{if ($xt = $xd) then $xt else concat($xd, ": ", $xt)}</h1>
{parse-html(parse-html($xc))//body/*}
<a href="{$xl}">Article</a>
<a href="#toc">Back to TOC</a>
</article>
}</main>
)
else
(
)
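The doubly nested `parse-html()` call seems to work around a bug: presumably the inner call decodes the entity-escaped markup returned by `inner-html()`, and the outer call then parses the decoded text as HTML.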
Save the script as something like feed-html.xq and run it with:
xidel "feed.xml" --extract-file=feed-html.xq --output-format=html
Or, to convert every feed in a folder called xml, write the results to a folder called html (which must already exist), and build an index of them:
#!/bin/sh
tohtml() {
xidel "xml/$(basename "$1")" --extract-file=feed-html.xq --output-format=html > "html/$(basename "$1" .xml).html"
}
fsize() {
wc -c < "$1" | awk '{print $1}'
}
getlbd() {
xidel "$1" --extract '(
if (//*:lastBuildDate != "")
then inner-html(//*:lastBuildDate)
else if (//*:channel/*:pubDate)
then inner-html(//*:channel/*:pubDate)
else inner-html(//*:feed/*:updated)
)'
}
gettitle() {
xidel "$1" --extract '(
if (//*:channel != "")
then inner-html(//*:channel/*:title)
else inner-html(//*:feed/*:title)
)'
}
echo "<!DOCTYPE html>">index.html
echo "<html><head><title>Feeds $(date)</title></head><body><h1>Feeds $(date)</h1><ol>">>index.html
for i in xml/*; do
echo "$i"
wcl="$(wc -l "$i"|awk '{print $1}')"
echo "$wcl"
if { test "$wcl" -gt 1 || test "$(fsize "$i")" -gt 10; } && test "$wcl" -lt 25000; then
tohtml "$i"
echo "<li><a href=\"html/$(basename "$i" ".xml").html\">$(gettitle "$i") ($(getlbd "$i"))</a></li>">>index.html
fi
done
echo "</ol></body></html>">>index.html
The result is a very simple HTML file, and the script probably doesn't follow any style guide, but it works.
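The size check inside the loop is the least obvious part of the script, so here it is isolated as a sketch. The function name `should_convert` and the sample numbers are hypothetical; the thresholds are the ones used in the loop (more than 1 line or more than 10 bytes, and fewer than 25000 lines).

```shell
# should_convert LINES BYTES
# Succeeds when a feed is non-trivial (more than 1 line or more than 10 bytes)
# and not huge (fewer than 25000 lines).
should_convert() {
  lines="$1"
  bytes="$2"
  { [ "$lines" -gt 1 ] || [ "$bytes" -gt 10 ]; } && [ "$lines" -lt 25000 ]
}

# Made-up "lines bytes" pairs to show which feeds would be converted.
for sample in "0 5" "1 2048" "120 9000" "30000 900000"; do
  set -- $sample
  if should_convert "$1" "$2"; then
    echo "$1 lines, $2 bytes: convert"
  else
    echo "$1 lines, $2 bytes: skip"
  fi
done
```

The byte check matters for feeds served as a single long line: they have a line count of 0 or 1 but are still worth converting.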