Wednesday, February 22, 2017

2017-02-22: Archive Now (archivenow): A Python Library to Integrate On-Demand Archives

Examples: Archive Now (archivenow) CLI
A small part of my research is to ensure that certain web pages are preserved in public web archives so that they remain available and retrievable in the future. As archivists believe that "lots of copies keep stuff safe", I have created a Python library (Archive Now) to push web resources into several on-demand archives, such as The Internet Archive and WebCite. If, for any reason, one archive stops serving temporarily or permanently, it is likely that copies can be fetched from other archives. With Archive Now, one command like:
$ archivenow --all

is sufficient for the current CNN homepage to be captured and preserved by all configured archives in this Python library.

Archive Now allows you to accomplish the following major tasks:
  • Pushing a web page into one archive
  • Pushing a web page into multiple archives
  • Pushing a web page into all archives
  • Adding new archives
  • Removing existing archives
Install Archive Now from PyPI:
    $ pip install archivenow

To install from the source code:
    $ git clone
    $ cd archivenow
    $ pip install -r requirements.txt
    $ pip install ./

"pip", "archivenow", and "docker" may require "sudo"

Archive Now can be used through:

   1. The CLI

To list the available options, pass the -h or --help flag:
   $ archivenow -h
   usage: archivenow [-h][--cc][--cc_api_key [CC_API_KEY]]
                        [--host [HOST]][--port [PORT]][URI]
   positional arguments:
     URI                   URI of a web resource
   optional arguments:
     -h, --help            show this help message and exit
     --cc                  Use The Archive
     --cc_api_key [CC_API_KEY]
                           An API KEY is required by The

     --ia                  Use The Internet Archive
     --is                  Use The
     --wc                  Use The WebCite Archive
     -v, --version         Report the version of archivenow
     --all                 Use all possible archives
     --server              Run archiveNow as a Web Service
     --host [HOST]         A server address
     --port [PORT]         A port number to run a Web Service
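
The option list above maps naturally onto Python's argparse module. The following is a minimal sketch of how such a parser could be declared, written only from the help text shown here. It is not archivenow's actual source, build_parser is a hypothetical name, and the -v/--version option and the help strings are left out for brevity.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options in the help text (illustrative only)."""
    parser = argparse.ArgumentParser(prog="archivenow")
    # Archive selectors and --server are plain on/off switches.
    for flag in ("--cc", "--ia", "--is", "--wc", "--all", "--server"):
        parser.add_argument(flag, action="store_true")
    # "[CC_API_KEY]" in the usage line means the value itself is optional.
    parser.add_argument("--cc_api_key", nargs="?")
    parser.add_argument("--host", nargs="?")
    parser.add_argument("--port", nargs="?", type=int)
    # The URI is optional too, because --server runs without one.
    parser.add_argument("URI", nargs="?", help="URI of a web resource")
    return parser


args = build_parser().parse_args(["--ia", "--is", "http://example.com/"])
# "is" is a Python keyword, so that flag has to be read with getattr().
print(args.ia, getattr(args, "is"), args.wc, args.URI)
```

Because every selector is an independent switch, any combination of archives can be requested in one invocation.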

To archive a web page in the Internet Archive:

$ archivenow --ia

By default, if no optional arguments are provided, the web page is saved in the Internet Archive:

$ archivenow

To save a web page in both the Internet Archive (--ia) and the archive selected by the --is flag:

$ archivenow --ia --is

To save a web page in all configured web archives:

$ archivenow --all --cc_api_key $Your-Perma-CC-API-Key

To run it as a Docker container (you need to do "docker pull" first):

$ docker pull maturban/archivenow

$ docker run -it --rm maturban/archivenow -h
$ docker run -p 80:12345 -it --rm maturban/archivenow --server
$ docker run -p 80:11111 -it --rm maturban/archivenow --server --port 11111
$ docker run -it --rm maturban/archivenow --ia

   2. A Web Service

You can run archivenow as a web service. You can specify the server address and/or the port number (e.g., --host localhost --port 11111).

$ archivenow --server
  * Running on (Press CTRL+C to quit)

To save a web page in The Internet Archive through the web service:

$ curl -i

     HTTP/1.0 200 OK
     Content-Type: application/json
     Content-Length: 95
     Server: Werkzeug/0.11.15 Python/2.7.10
     Date: Thu, 09 Feb 2017 14:29:23 GMT

      "results": [

To save a web page in all configured archives through the web service:

$ curl -i

    HTTP/1.0 200 OK
    Content-Type: application/json
    Content-Length: 172
    Server: Werkzeug/0.11.15 Python/2.7.10
    Date: Thu, 09 Feb 2017 14:33:47 GMT

      "results": [
        "Error (The Archive): An API KEY is required"

You can supply the API key as follows:

$ curl -i$Your-Perma-CC-API-Key

   3. Python Usage

>>> from archivenow import archivenow

To save a web page in The WebCite Archive:

>>> archivenow.push("","wc")

To save a web page in all configured archives:

>>> archivenow.push("","all")
['','','','Error (The Archive): An API KEY is required']
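
As the output above shows, push returns a list of strings in which a failure appears as a message beginning with "Error". A minimal sketch of separating the two kinds of entries follows. split_results is a hypothetical helper, not part of archivenow, and the example.org address is a placeholder rather than a real capture. It assumes every failure message starts with "Error", which is inferred from the single error shown above.

```python
def split_results(results):
    """Partition push() results into (archived_uris, error_messages)."""
    archived, errors = [], []
    for item in results:
        # Assumption: failures are reported as strings starting with "Error".
        if item.startswith("Error"):
            errors.append(item)
        else:
            archived.append(item)
    return archived, errors


sample = [
    "https://archive.example.org/copy/1",  # placeholder, not a real memento
    "Error (The Archive): An API KEY is required",
]
archived, errors = split_results(sample)
print(len(archived), "archived;", len(errors), "failed")
```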

To save a web page in the archive identified by "cc", which requires an API key:

>>> archivenow.push("","cc","cc_api_key=$Your-Perma-cc-API-KEY")

To start the server from Python, do the following. The host and port number can be passed as arguments (e.g., start(port=1111, host='localhost')):
>>> archivenow.start()

* Running on (Press CTRL+C to quit)

Configuring a new archive or removing an existing one

Adding a new archive is as simple as adding a handler file to the folder "handlers". For example, if I want to add a new archive named "My Archive", I would create a file "" and store it in the folder "handlers". The "ma" will be the archive identifier, so to push a web page to this archive through the Python code, I would write ">>>archivenow.push("","ma")". In the file "", the name of the class must be "MA_handler". This class must have at least one function called "push" which has one argument. It might be helpful to see how the other "*" files are organized.
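
The naming rules above can be sketched as a class. This is an illustrative skeleton only: the text states just the class name and the push function, so the return value, the myarchive.example.org address, and the handler_class_name helper (which generalizes from the single "ma" to "MA_handler" example) are assumptions. A real handler would contact the archive over HTTP instead of building a string.

```python
class MA_handler(object):
    """Skeleton handler for a hypothetical archive named "My Archive"."""

    def push(self, uri_org):
        # A real handler would submit uri_org to the archive and return the
        # URI of the new copy; this stub only builds a fake one.
        return "https://myarchive.example.org/" + uri_org


def handler_class_name(identifier):
    """Derive the expected class name from an archive identifier (assumed rule)."""
    return identifier.upper() + "_handler"


handler = MA_handler()
print(handler_class_name("ma"))
print(handler.push("http://example.com/"))
```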

Removing an archive can be done by one of the following options:
  • Removing the archive handler file from the folder "handlers"
  • Renaming the archive handler file to another name that does not end with ""
  • Setting the variable "enabled" to "False" inside the handler file
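
The last option can be pictured as a filter applied before any request is sent. The sketch below is an assumption about how a dispatcher might honour the "enabled" variable, not archivenow's actual loading code. StubHandler, push_to_enabled, the identifiers "aa" and "bb", and the example.org addresses are all hypothetical.

```python
class StubHandler(object):
    """Stand-in for an archive handler; returns a fake copy URI."""

    def __init__(self, base, enabled=True):
        self.base = base
        self.enabled = enabled

    def push(self, uri_org):
        return self.base + uri_org


def push_to_enabled(handlers, uri_org):
    """Call push() on every handler whose "enabled" flag is true."""
    results = []
    for identifier in sorted(handlers):
        handler = handlers[identifier]
        if not getattr(handler, "enabled", False):
            continue  # a disabled archive is skipped entirely
        results.append(handler.push(uri_org))
    return results


handlers = {
    "aa": StubHandler("https://a.example.org/"),
    "bb": StubHandler("https://b.example.org/", enabled=False),
}
print(push_to_enabled(handlers, "http://example.com/"))
```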


The Internet Archive (IA) enforces a time gap of at least two minutes between creating different copies of the same resource. For example, if you send a request to the IA to capture the CNN homepage at 10:00pm, the IA will create a new memento (let's call it M1) of the CNN homepage. The IA will then return M1 for all requests to archive the CNN homepage received before 10:02pm. Another archive sets this time gap to five minutes.
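
The effect of the time gap can be simulated with a small in-memory model. This is a toy sketch of the behaviour described above, not the Internet Archive's implementation. GapArchive and its memento labels are hypothetical, and times are plain seconds since midnight so the example stays deterministic.

```python
class GapArchive(object):
    """Toy archive that reuses a memento requested within `gap` seconds."""

    def __init__(self, gap_seconds):
        self.gap = gap_seconds
        self.latest = {}   # uri -> (capture_time, memento_label)
        self.count = 0

    def capture(self, uri, now):
        previous = self.latest.get(uri)
        if previous is not None and now - previous[0] < self.gap:
            return previous[1]          # still inside the gap: reuse the copy
        self.count += 1
        label = "M%d" % self.count
        self.latest[uri] = (now, label)
        return label


ia = GapArchive(gap_seconds=120)        # two minutes, as stated for the IA
ten_pm = 22 * 3600
first = ia.capture("http://example.com/", ten_pm)         # creates M1
second = ia.capture("http://example.com/", ten_pm + 90)   # 10:01:30pm, inside the gap
third = ia.capture("http://example.com/", ten_pm + 120)   # 10:02pm, gap has elapsed
print(first, second, third)
```

A caller that needs a genuinely fresh copy would therefore have to wait out the gap before sending the next request for the same URI.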

Updates and pull requests are welcome:

--Mohamed Aturban
