Generating download lists

Ever wanted to download a huge number of files with enumerated filenames like foo1.png, foo2.png, foo3.png and so on?

Well, Alaa wrote two nice little tools, series and nseries, to help you do just that. That was back in 2002.


Compiling

First, compile them:


g++ -o series series.cc
g++ -o nseries nseries.cc


Series

Introduction

series simply takes a string such as foo and generates a list like:


 foo01
 foo02
 foo03
 .....
 .....
 foo13

It then appends a string such as .png to each item in the list, so the end product is:


 foo01.png
 foo02.png
 foo03.png
 .........
 foo13.png

To generate the previous list you type:


./series 1 13 2 '.png' 'foo'

To sum things up, the syntax for series is:


./series start end pad postfix prefix

No prefix or no postfix

You can leave the prefix or the postfix blank by putting '' in its place.
Note that this is two single quotes ('), not one double quote (").
For example, to generate a list from 000 to 999, just type:


./series 0 999 3 '' ''

Multiple prefixes

series can generate one list for many prefixes in a single command. The syntax then becomes:


./series start end pad postfix prefix1 prefix2 prefix3 etc

For example, to generate the following list:


foo1
foo2
foo3
foo4
fubar1
fubar2
fubar3
fubar4

You type:


./series 1 4 1 '' 'foo' 'fubar'
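
The behaviour described so far can be modelled in a few lines of Python. This is only a rough sketch of what series appears to do, judging from the examples above; it is not Alaa's C++ source. It assumes that pad is the minimum number of digits, filled with leading zeros, and that the lists for the prefixes are printed one after another.

```python
def series(start, end, pad, postfix, prefixes):
    """Yield prefix + zero-padded number + postfix, one full run per prefix.

    Assumption: pad is a minimum width, so pad=2 turns 1 into "01".
    """
    for prefix in prefixes:
        for n in range(start, end + 1):
            yield f"{prefix}{n:0{pad}d}{postfix}"


# Modelled equivalent of: ./series 1 13 2 '.png' 'foo'
for name in series(1, 13, 2, ".png", ["foo"]):
    print(name)

# Modelled equivalent of: ./series 1 4 1 '' 'foo' 'fubar'
for name in series(1, 4, 1, "", ["foo", "fubar"]):
    print(name)
```

A blank prefix or postfix is simply an empty string in this model, which mirrors the '' arguments on the command line.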

Nested lists

series takes at least 4 arguments: start, end, pad and postfix.
The prefix(es) come after these and can be supplied in one of three ways:

  1. Explicitly specified on the command line.
  2. Omitted. In this case series waits for you to type a prefix, generates a list for it, then waits for another one, and so on until you press Ctrl-D.
  3. Piped in. The output of one series command can be fed in as the prefixes of another, which is what lets this tool generate complex nested lists.

Suppose you want to generate the following list:


601
602
603
604
701
702
703
704
801
802
803
804

This is done by typing:


./series 6 8 1 '' '' | ./series 1 4 2 ''
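
To see why the pipe produces a nested list, here is a small Python model of that command. It is a sketch under the same assumption as before (pad is a minimum zero-padded width) and is not the tool's real source: every line produced by the first stage becomes a prefix for the second stage.

```python
def series(start, end, pad, postfix, prefixes):
    """Rough model of series: one start..end run for every prefix it receives."""
    for prefix in prefixes:
        for n in range(start, end + 1):
            yield f"{prefix}{n:0{pad}d}{postfix}"


# First stage, modelled on: ./series 6 8 1 '' ''
outer = series(6, 8, 1, "", [""])

# Second stage, modelled on: ./series 1 4 2 ''  (prefixes arrive through the pipe)
nested = list(series(1, 4, 2, "", outer))

for item in nested:
    print(item)
```

Each extra stage in the pipe adds one more level of nesting in the same way.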


Nseries

nseries appends an incrementing number to each input line. The inputs are either piped in or entered one by one.

nseries' syntax is:


./nseries start pad postfix

So if you want to generate the following list:


a01.png
b02.png
c03.png
d04.png
.
.

you type


./nseries 1 2 '.png'

and then press Enter. The program then waits for the first input: type a and press Enter, then b, c, d and so on.


Alternatively, you can have a file named "alphabets" containing the following:


a
b
c
d
.
.

and you type


cat alphabets | ./nseries 1 2 '.png'

Note how different nseries is from series.
If series is given multiple input lines, it generates a complete list from "start" to "end" for each input line.
If nseries is given multiple input lines, it begins at "start" and increments the number by one for each input line.
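
The difference is easiest to see side by side. The following Python sketch models both tools on the same two input lines. It is based only on the behaviour described on this page, with pad again assumed to be a minimum zero-padded width, and is not the tools' actual code.

```python
def series(start, end, pad, postfix, lines):
    """Model of series: a full start..end run for every input line."""
    for line in lines:
        for n in range(start, end + 1):
            yield f"{line}{n:0{pad}d}{postfix}"


def nseries(start, pad, postfix, lines):
    """Model of nseries: one number per input line, counting up from start."""
    for n, line in enumerate(lines, start):
        yield f"{line}{n:0{pad}d}{postfix}"


inputs = ["a", "b"]
print(list(series(1, 2, 2, ".png", inputs)))   # two names for each input line
print(list(nseries(1, 2, ".png", inputs)))     # one name for each input line
```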

Garfield for all of us

Now, as a practical example, we use series and nseries to download all Garfield comic strips since 1978. The first strip can be found here:

Notice that the format is foo/year1/year2_month_day.gif

where year2 is the last two digits of year1.

cat years | ./series 1 12 2 '' | ./series 1 31 2 '.gif' >> all_garfield

Now all_garfield has the URLs of all Garfield comic strips. Remove the entries dated before 19/6/1978 (19 June 1978).
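
Removing those entries by hand is tedious, and since the command above generates days 01 to 31 for every month, the list also contains dates that do not exist (such as 30 February). One possible way to clean it up is sketched below in Python. Everything specific here is an assumption for illustration: the example.com base URL is a placeholder, and the pattern expects each URL to end in /year1/ followed by a file name whose last two digit pairs are the month and the day, with or without underscores.

```python
import re
from datetime import date

FIRST_STRIP = date(1978, 6, 19)

# Assumed URL shape: .../<4-digit year>/<anything><MM><DD>.gif, underscores optional.
URL_DATE = re.compile(r"/(\d{4})/[^/]*?(\d{2})_?(\d{2})\.gif$")


def keep(url):
    """Return True if the URL carries a real calendar date on or after the first strip."""
    match = URL_DATE.search(url)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        strip_date = date(year, month, day)
    except ValueError:          # impossible date, e.g. 30 February
        return False
    return strip_date >= FIRST_STRIP


def filter_urls(lines):
    """Keep only the lines whose date passes keep(); trailing newlines are ignored."""
    return [line for line in lines if keep(line.strip())]


# Placeholder URLs, not the real location of the strips.
sample = [
    "http://example.com/garfield/1978/78_06_18.gif",
    "http://example.com/garfield/1978/78_06_19.gif",
    "http://example.com/garfield/1979/79_02_30.gif",
]
print(filter_urls(sample))
```

To apply it to the real list, you could read the lines of all_garfield, pass them through filter_urls and write the result back out. If the actual URLs are laid out differently, the pattern needs adjusting.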

To download with wget


wget -i all_garfield

To pause, press Ctrl-C.
To resume, type:


wget -nc -i all_garfield


Recitation of the whole Quran

Now, to download Yasser Salama's recitation of the Quran from Islamway.com

Now yasser_list has the URLs of all the Suras.

To download with wget


wget -i yasser_list

To pause, press Ctrl-C.
To resume, type:


wget -nc -i yasser_list

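Just note that when you kill wget with Ctrl-C and then resume with -nc, the partially downloaded file (the one being downloaded when you pressed Ctrl-C) is assumed to be complete, and wget starts downloading the next file on the list. To avoid ending up with a truncated file, delete the partial file before resuming, or resume with wget -c -i yasser_list instead, which tells wget to continue partially downloaded files.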