Ever wanted to download a huge number of files that have enumerated filenames like foo1.png, foo2.png, foo3.png, and so on?
Well, Alaa wrote two nice little tools to help you do so. That was back in 2002.
Compiling
First, compile them:
g++ -o series series.cc
g++ -o nseries nseries.cc
Series
Introduction
series simply takes some string like foo then generates a list like
foo01 foo02 foo03 ..... ..... foo13
then adds a string to each item in the list like .png so the end product would be
foo01.png foo02.png foo03.png ......... foo13.png
To generate the previous list you type:
./series 1 13 2 '.png' 'foo'
- 1 is the start of the list
- 13 is the end of the list
- 2 is the pad (minimum number of digits)
(If the pad is 3, then the number 1 is generated as 001, while the number 23 is generated as 023)
- '.png' is the postfix (The string after the numbers)
- 'foo' is the prefix (The string before the numbers)
To sum things up, the syntax for series is:
./series start end pad postfix prefix
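The source of series is not shown on this page, so the following is only a minimal Python sketch of the behaviour described above, not Alaa's C++ code. The function name `series` and its signature are assumptions made for illustration.

```python
def series(start, end, pad, postfix, prefix):
    """Return prefix + zero-padded number + postfix for start..end inclusive.

    Hypothetical helper: the argument order copies the command line
    (start end pad postfix prefix), nothing else is taken from series.cc.
    """
    return [f"{prefix}{n:0{pad}d}{postfix}" for n in range(start, end + 1)]


# Mirrors: ./series 1 13 2 '.png' 'foo'
names = series(1, 13, 2, ".png", "foo")
print("\n".join(names))
```

The pad is only a minimum width, so a number with more digits than the pad is printed in full.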
No prefix, or no postfix
You can have a blank prefix or postfix by putting '' in its place.
Note that this is two single quotes ('), not one double quote (").
For example, to generate a list from 001 to 999 you just type
./series 1 999 3 '' ''
Multiple prefixes
series can generate one list for many prefixes in one command. The syntax then becomes:
./series start end pad postfix prefix1 prefix2 prefix3 etc
For example, to generate the following list
foo1 foo2 foo3 foo4 fubar1 fubar2 fubar3 fubar4
You type:
./series 1 4 1 '' 'foo' 'fubar'
Nested lists
series has at least 4 arguments. The start, end, pad and postfix.
The fifth argument is the prefix(es), it is either:
- Explicitly specified.
- Left empty. In this case it will ask/wait for you to input a prefix, then it'll generate a list for this prefix and then ask/wait for another one and so on until you press Ctrl-D.
- Pipelined. This can be used to redirect the program's output as its input and so accounts for the ability of this tool to generate complex nested lists.
Suppose you want to generate the following list
601 602 603 604 701 702 703 704 801 802 803 804
This is done by typing
./series 6 8 1 '' '' | ./series 1 4 2 ''
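To see why the pipe produces a nested list, here is a small Python sketch in which every line of the first stage becomes a prefix for the second stage. It only imitates the described behaviour; the generator name `series_lines` and its parameters are assumptions, not taken from series.cc.

```python
def series_lines(prefixes, start, end, pad, postfix):
    """For every prefix, yield prefix + zero-padded number + postfix.

    `prefixes` stands in for the prefix arguments or for the lines
    read from standard input (hypothetical helper).
    """
    for prefix in prefixes:
        for n in range(start, end + 1):
            yield f"{prefix}{n:0{pad}d}{postfix}"


# Mirrors: ./series 6 8 1 '' '' | ./series 1 4 2 ''
outer = series_lines([""], 6, 8, 1, "")           # yields 6, 7, 8
nested = list(series_lines(outer, 1, 4, 2, ""))   # each line is used as a prefix
print(" ".join(nested))

# Mirrors: ./series 1 4 1 '' 'foo' 'fubar'
multi = list(series_lines(["foo", "fubar"], 1, 4, 1, ""))
print(" ".join(multi))
```

The same function covers both the multiple-prefix case and the piped case, because in both the tool simply loops over a sequence of prefixes.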
Nseries
nseries adds an incrementing number to each input. The inputs are either piped in or entered one by one.
nseries' syntax is:
./nseries start pad postfix
so if you want to generate the following list
a01.png b02.png c03.png d04.png . .
you type
./nseries 1 2 '.png'
and then press Enter. nseries will wait for the first input: type 'a' and press Enter, then 'b', 'c', 'd' and so on.
or you can have a file named "alphabets" containing the following
a b c d . .
and you type
cat alphabets | ./nseries 1 2 '.png'
Note how different nseries is from series.
If series is given multiple input lines, it generates a full list from "start" to "end" for each input line.
If nseries is given multiple input lines, it appends a single number to each line, beginning at "start" and incrementing by one per line.
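The difference is easy to see in a short Python sketch of nseries. As with series, this imitates the described behaviour only; the function name `nseries` and its signature are assumptions.

```python
from itertools import count


def nseries(lines, start, pad, postfix):
    """Append one incrementing, zero-padded number plus postfix to each line.

    Hypothetical helper: `lines` stands in for the piped or typed input.
    """
    return [f"{line}{n:0{pad}d}{postfix}" for line, n in zip(lines, count(start))]


# Mirrors: cat alphabets | ./nseries 1 2 '.png'
result = nseries(["a", "b", "c", "d"], 1, 2, ".png")
print(" ".join(result))
```

Four input lines give four output lines here, whereas series would give (end - start + 1) output lines for every input line.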
Garfield for all of us
Now, as a practical example, we use series and nseries to download all Garfield comic strips since 1978. The first strip can be found here:
Notice that the format is foo/year1/year2_month_day.gif
where year2 is the last two digits in year1
cat years | ./series 1 12 2 '' | ./series 1 31 2 '.gif' >> all_garfield
Now all_garfield has the URLs of all Garfield comic strips. Remove the dates before 19/6/1978.
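The removal step can be done by hand, but a rough Python sketch of the date check may help. It assumes you can recover the year, month and day of each generated entry; the helper `is_wanted` is hypothetical. It also rejects impossible dates such as 31 February, which a plain 1 to 31 day range produces for every month. That extra filtering is a suggestion that goes beyond the original instructions.

```python
from datetime import date

# From the text: dates before 19/6/1978 are removed.
FIRST_STRIP = date(1978, 6, 19)


def is_wanted(year, month, day):
    """True if year-month-day is a real date on or after the first strip."""
    try:
        d = date(year, month, day)
    except ValueError:  # not a calendar date, e.g. 31 February
        return False
    return d >= FIRST_STRIP


# Every month/day combination the two series calls would produce for 1978.
kept = [(1978, m, d) for m in range(1, 13) for d in range(1, 32)
        if is_wanted(1978, m, d)]
print(kept[0], kept[-1])
```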
To download with wget
wget -i all_garfield
to pause press Ctrl-C
to resume type:
wget -nc -i all_garfield
Recitation of the whole Quran.
Now, to download Yasser Salama's recitation of the Quran from Islamway.com
Now yasser_list has the URLs of all the Suras.
To download with wget
wget -i yasser_list
to pause press Ctrl-C
to resume type:
wget -nc -i yasser_list
Note that if you kill wget with Ctrl-C and then resume the list, the partially downloaded file (the one being downloaded when you pressed Ctrl-C) is assumed to be complete, and wget starts downloading the next file on the list.
Note
Can't put both series.cc and nseries.cc on eglug.org yet.
Both get deformed when using the pre tag or the wiki syntax
Comments
nobody judge the code
this was a very quick hack to solve a problem while demonstrating the power of software as tools concept (pipes and redirects as part of the design), the code is naive.
I've used these two nifty tools thousands of times, but of course no tool was really needed, gnu seq combined with other tools can do the same, languages like awk are perfect for this, etc.
to quote from my original email, here is a cool thing you can do with series and nseries. so let's say I want to generate all three-digit octal numbers and their decimal values:
./series 0 7 1 '' '' | ./series 0 7 1 '' | ./series 0 7 1 '=' | ./nseries 0 1 ''
cool eh
Alaa
"context is over-rated. who are you anyway?"
Pastebin deleted
It seems that pastebin deleted the source code of both tools that was posted there. It's probably auto-deleted after several hours or so.
As noted, I couldn't put the code here. I've tried the pre and code tags and it didn't work.
reinstalled code filter
Still borked
here they are:
What's "now"?
fixed using temporary hack
the 4.6 wiki module doesn't have this bug, the real solution is to upgrade eglug to 4.6
we have to do that soon anyways because once drupal 4.7 gets released 4.5 will not be supported anymore.
Alaa
OMG!!!
man curl