The decision involves finding out which Magnatune artists I found through iRATE and rated highly. How I did that is what this post is about: cool CLI hacks to get info out of your iRATE XML files (they should help with any XML file, I suppose).
iRATE stores rating info in ~/irate/trackdatabase.xml. I needed to process this file and answer one question: which tracks rated 10, 7 or 5 are by Magnatune artists?
To answer this I needed to understand the structure of the file. It turns out iRATE does not insert end-of-line characters or indent the file in any way, so it was pretty hard to read. I needed a tool to indent XML files.
$ xmlindent ~/irate/trackdatabase.xml | head
<?xml version="1.0"?>
<TrackDatabase serial="663">
<User port="2278" name="alaa" password="OUCH" host="server.irateradio.org"/><AutoDownload setting="37" count="5012"/>
<Player path="madplay"/>
<PlayList length="49" UnratedRatio="97"/>
All right, so there is a Track element and it has url, artist and rating properties. That's all we need.
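With only three attributes to care about, even plain text tools could pull the artists out of a file this regular. Here is a rough sketch of that route, run on a made-up two-track sample rather than the real file. The sample data, the attribute order and the function names are assumptions for illustration, and regex-based XML parsing is fragile in general.

```shell
# Hypothetical sample: the whole document on one line, the way iRATE writes it.
sample_xml() {
  printf '%s\n' '<TrackDatabase><Track artist="Artist A" url="http://magnatune.example/a.mp3" rating="10.0"/><Track artist="Artist B" url="http://other.example/b.mp3" rating="7.0"/></TrackDatabase>'
}

# Print the artist of every Track whose url mentions magnatune.
magnatune_artists() {
  # 1. grep -o puts each <Track ...> tag on its own line
  # 2. keep only tags whose url attribute contains "magnatune"
  # 3. sed prints just the value of the artist attribute
  grep -o '<Track [^>]*>' \
    | grep 'url="[^"]*magnatune' \
    | sed -n 's/.*artist="\([^"]*\)".*/\1/p'
}

sample_xml | magnatune_artists
```

This breaks as soon as an attribute value contains a `>` or a double quote, which is one reason to reach for real XML tools.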
From this point on I could use grep, but I decided to find out what I can do with XML CLI tools. So first I asked which XML CLI tools are available on Mandrake:
$ urpmf --summary xml | grep -i 'command\|cli'
xmlclitools:Command-line xml tools
xmlstarlet:Command Line XML Toolkit
OK, only 2 packages. Let's install them and find out what commands they add:
# urpmi xmlclitools xmlstarlet
It turns out the whole thing was very simple: xmlgrep selects XML elements based on the values of their properties, and xmlfmt displays only the info I need. Here it is, step by step.
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track
...
OK, this selects Track elements only. -f tells xmlgrep which file to read, and the TrackDatabase.Track argument is the path to the elements we want to select (TrackDatabase being the parent of all Track elements). We could use -g to make searches global and not bother about parents.
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*'
...
OK, now the search is narrowed down to Tracks whose url attribute matches the regular expression .*magnatune.*. xmlclitools has this quirk where all periods have to be escaped, so that regular expression has to be written as '\.*magnatune\.*'.
property~=regexp matches regular expressions, property=value matches exact value and property!=value selects items that don't match the value.
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0'
...
This narrows the search down further to only tracks with ratings of 10, 7 or 5. Note how I had to escape the period again.
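One thing to watch with a pattern like 10|7|5\.0 is that the alternation is unanchored, so it matches any value that merely contains 10, 7 or 5.0. That is harmless as long as the ratings only ever take a few fixed values, but it is easy to check with ordinary grep -E. This is only a sketch with standard extended regular expressions. Whether xmlgrep accepts anchors and groups the same way is an assumption I have not verified, and the function names are made up.

```shell
# Loose pattern, as used above (written for grep -E, so no xmlclitools escaping quirk).
matches_loose() {
  printf '%s\n' "$1" | grep -Eq '10|7|5\.0'
}

# Stricter variant: the whole value must be exactly 10.0, 7.0 or 5.0.
matches_strict() {
  printf '%s\n' "$1" | grep -Eq '^(10|7|5)\.0$'
}

# Compare both patterns on a few values; 7.5 and 17.0 are made-up ratings.
for rating in 10.0 7.0 5.0 2.0 7.5 17.0; do
  if matches_loose "$rating"; then loose=yes; else loose=no; fi
  if matches_strict "$rating"; then strict=yes; else strict=no; fi
  echo "$rating loose=$loose strict=$strict"
done
```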
You can search for any combination of attributes. I'm not sure whether we can do OR searches instead of the default AND search, though.
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist
...
Seismic Anamoly
Seismic Anamoly
Human Response
Barbara Leoni
Seismic Anamoly
Thursday Group
Jade Leary
Curandero
Falik
...
Finally we pipe it all to xmlfmt and ask it to show us only the artist property of Track elements.
Notice that some artists are repeated because I rated multiple tracks by them. That's easy to remedy:
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist | sort | uniq
Artemis
Barbara Leoni
Beth Quist
C. Layne
Curandero
Drop Trio
Ed Martin
Falik
Four Stones.Net
Human Response
Jacob Heringman and Catherine
Jade Leary
Jay Kishor
Jeff Wahl
Kyiv Chamber Choir
Norine Braun
Paul Berget
Rapoon
Seismic Anamoly
Solace
SoulPrint
Stellamara
The Napoleon Blown Aparts
Thursday Group
Tim Rayborn
Tom Paul
touchingGrace
Version
Very Large Array
Good ol' GnuTextUtils to the rescue.
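Two small variations on the same trick, sketched on a handful of names taken from the output above instead of the real xmlfmt pipeline: sort -u does the job of sort | uniq in one step, and uniq -c counts how many highly rated tracks each artist has. The function and variable names are made up for the example.

```shell
# Stand-in for the xmlfmt output: one artist per line, with repeats.
artists() {
  printf '%s\n' 'Seismic Anamoly' 'Seismic Anamoly' 'Human Response' \
    'Barbara Leoni' 'Seismic Anamoly' 'Thursday Group'
}

# sort -u gives the same result as sort | uniq.
unique_artists=$(artists | sort -u)

# uniq -c prefixes each name with its count; sort -rn puts the biggest first.
# The sed call strips the leading count to leave only the name.
top_artist=$(artists | sort | uniq -c | sort -rn | head -n 1 | sed 's/^ *[0-9]* //')

printf '%s\n' "$unique_artists"
echo "most rated tracks: $top_artist"
```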
Finally, maybe I want to see the rating next to each artist:
$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist Track:rating | sort
Artemis 10.0
Barbara Leoni 7.0
Beth Quist 10.0
Beth Quist 7.0
C. Layne 5.0
Curandero 7.0
Curandero 7.0
Drop Trio 10.0
Ed Martin 5.0
Falik 7.0
...
So iRATE, xmlgrep, xmlfmt, sort, uniq, sed, awk, grep, wget and all the usual suspects are my bleeding-edge music purchase decision support system.
Comments
add to your todo list
but hey, i thought bash scripting and xml files don't mix. very cool.
pipe the last command to uniq -f 1
not a good idea
Since in iRATE you rate by track and not by artist, this is not a good idea: you want to look at all the ratings you gave to an artist.
However, something that tells you the maximum, minimum and average rating would be nice.
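Something along these lines might do it. This is a rough awk sketch, assuming input lines shaped like the "artist rating" output in the post, with the rating as the last whitespace-separated field. The sample lines are copied from that output and the function names are made up. For real use you would feed it all ratings, not only the 10, 7 and 5 ones.

```shell
# Sample "artist rating" lines, taken from the output shown in the post.
ratings() {
  printf '%s\n' 'Beth Quist 10.0' 'Beth Quist 7.0' 'C. Layne 5.0' \
    'Curandero 7.0' 'Curandero 7.0'
}

# Print min, max, average and track count per artist.
rating_stats() {
  awk '{
    rating = $NF + 0                      # last field is the rating
    artist = $0
    sub(/[ \t]+[^ \t]+$/, "", artist)     # everything before it is the artist
    if (!(artist in n) || rating < min[artist]) min[artist] = rating
    if (!(artist in n) || rating > max[artist]) max[artist] = rating
    sum[artist] += rating
    n[artist]++
  }
  END {
    for (a in n)
      printf "%s\tmin=%.1f max=%.1f avg=%.2f n=%d\n", a, min[a], max[a], sum[a] / n[a], n[a]
  }' | sort
}

ratings | rating_stats
```

Taking the artist as "everything except the last field" keeps multi-word names like Beth Quist in one piece, which a plain field-based tool would split.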
Alaa
"i`m feeling for the 2nd time like alice in wonderland reading el wafd"