Introduction
============

This module implements a daemon that monitors a CouchDB database and runs Zonemaster tests when requests appear. It has a ceiling on how many tests run concurrently, and daemon processes on several machines can split a request queue among them via CouchDB replication.

The daemon is started (and its other functions are accessed) via the `zonemaster-distributed` script. For details on how to use it, see its own POD documentation.

Requirements
============

* Perl 5.14.2 or later, with some modules
    * Zonemaster (and all of its prerequisites)
    * IPC::Run
    * Daemon::Control
    * Log::Log4perl
* Enough permissions on a CouchDB server to create a database and optionally install replications.

Requests
========

Requests to perform Zonemaster tests are made by adding documents to the CouchDB database that the daemon monitors. In the minimal case, the document needs only two keys (apart from those required by CouchDB itself): `name` and `request`. The value of the first is a UTF-8 string with the name of the domain to be tested (the daemon will convert it to IDNA before passing the name on to Zonemaster, if needed). The value of the second holds optional parameters for the test, such as fake NS and DS records.

An example of a minimal request looks like this:

```JSON
{
    "name": "räksmörgås.se",
    "request": {}
}
```
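
Since a request is just a CouchDB document, it can be submitted with a plain HTTP POST. The sketch below shows one way to do that from Python; the URL and database name (`zonemaster`) are assumptions for illustration, and authentication is omitted. Use whatever database your daemon is configured to monitor.

```python
# Minimal sketch of submitting a test request by POSTing a document
# to the CouchDB database the daemon monitors. The URL and database
# name are placeholders, not part of the project itself.
import json
import urllib.request

COUCH_URL = "http://localhost:5984/zonemaster"  # assumed CouchDB URL and database name

request_doc = {
    "name": "räksmörgås.se",
    "request": {},
}

req = urllib.request.Request(
    COUCH_URL,
    data=json.dumps(request_doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    # CouchDB answers with the id and revision of the newly created document
    print(json.loads(resp.read()))
```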

A request with all possible options set looks like this:

```JSON
{
    "name": "räksmörgås.se",
    "request": {
        "ds": [
            {
                "keytag": 4711,
                "algorithm": 5,
                "type": 2,
                "digest": "40079DDF8D09E7F10BB248A69B6630478A28EF969DDE399F95BC3B39F8CBACD7"
            },
            {
                "keytag": 4711,
                "algorithm": 5,
                "type": 1,
                "digest": "EF5D421412A5EAF1230071AFFD4F585E3B2B1A60"
            }
        ],
        "ns": {
            "ns.nic.se.":  ["212.247.7.228", "2a00:801:f0:53::53"],
            "ns3.nic.se.": ["212.247.8.152", "2a00:801:f0:211::152"]
        },
        "ipv4": true,
        "ipv6": true
    }
}
```

It is likely that more options will be added in the future, for example to select which test modules to run or which language to translate messages to.

Responses
=========

When the daemon starts processing a request, it adds a third top-level key called `results`. The value of this key is an _array_ whose elements are hashes with test results. Ordinarily there will be only one element, but in the event of certain replication conflicts there might be more. Each result hash looks like this shortened example.

```JSON
{
    "alabel": "xn--hgskoleverket-imb.se",
    "ulabel": "högskoleverket.se",
    "nodename": "scanner.example.org",
    "end_time": 1409047357.08075,
    "start_time": 1409047319.49044,
    "messages": [
        {
            "args": {
                "version": "v0.0.1",
                "module": "Zonemaster::Test::Basic"
            },
            "level": "DEBUG",
            "module": "SYSTEM",
            "tag": "MODULE_VERSION",
            "timestamp": 0.00281381607055664
        },
        {
            "module": "SYSTEM",
            "timestamp": 0.00579690933227539,
            "tag": "DEPENDENCY_VERSION",
            "args": {
                "version": 0.66,
                "name": "Net::LDNS"
            },
            "level": "DEBUG"
        },
        {
            "args": {
                "retry": 900,
                "required_retry": 3600
            },
            "level": "WARNING",
            "timestamp": 37.181578874588,
            "tag": "RETRY_MINIMUM_VALUE_LOWER",
            "module": "ZONE"
        }
    ]
}
```

`alabel` is the IDNA-converted version of the requested name. `ulabel` is the requested name without conversion. In the case of a pure-ASCII name, they will be the same. `nodename` is the name of the node that ran (or is running) the test. `start_time` is the time at which the daemon started the scan, expressed as ordinary Unix time_t with sub-second resolution. `end_time` is the time at which the scan ended. `messages` is an array of log messages from Zonemaster.

If `start_time` is present but `end_time` is not, the scan is still in progress.
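
A client can use these fields to follow a request through its lifecycle. The sketch below is one way to do that, assuming direct HTTP access to the request document; the URL and document id are placeholders.

```python
# Sketch of classifying a request document as queued, running or done,
# based on the presence of "results", "start_time" and "end_time".
import json
import urllib.request

def scan_state(doc_url):
    """Return 'queued', 'running' or 'done' for a request document."""
    with urllib.request.urlopen(doc_url) as resp:
        doc = json.loads(resp.read())
    results = doc.get("results")
    if not results:
        return "queued"      # no "results" key yet: not picked up by any daemon
    latest = results[0]
    if "start_time" in latest and "end_time" not in latest:
        return "running"     # scan started but not finished
    return "done"

# Database name and document id below are placeholders.
print(scan_state("http://localhost:5984/zonemaster/some-document-id"))
```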

Replication
===========

The `--peer` flag to the control script adds replication documents to the primary CouchDB instance. Two replications are added for each remote CouchDB instance: one pushing local changes and one pulling remote changes. Any changes on either end will therefore quickly reach the other.
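
The control script manages these documents for you, but they are ordinary continuous replications in CouchDB's standard `_replicator` database. The sketch below illustrates roughly what such a push/pull pair amounts to; the hostnames and database name are placeholders, and the exact documents the control script writes may differ.

```python
# Rough illustration of a push/pull replication pair between two
# CouchDB instances, written to the standard _replicator database.
import json
import urllib.request

LOCAL = "http://localhost:5984"           # local CouchDB instance
REMOTE = "http://peer.example.org:5984"   # hypothetical remote peer
DB = "zonemaster"                         # assumed database name

def add_replication(source, target):
    # A continuous replication document: CouchDB keeps source and target in sync.
    doc = {"source": source, "target": target, "continuous": True}
    req = urllib.request.Request(
        LOCAL + "/_replicator",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))

add_replication(LOCAL + "/" + DB, REMOTE + "/" + DB)   # push local changes
add_replication(REMOTE + "/" + DB, LOCAL + "/" + DB)   # pull remote changes
```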

When picking an entry to start scanning, the daemon asks CouchDB for the first ten documents that do not have a `results` key. It picks one of them at random and starts running it. Once the test is fully started and the document has been written back to the database with a `results` key added, the daemon waits between 0.09 and 0.11 seconds before repeating the procedure. This whole process is intended to reduce the probability that two nodes both start running the same test.
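
The daemon itself is written in Perl; the Python sketch below only restates the selection strategy described above. `fetch_pending`, `start_test` and `slots_free` are hypothetical helpers: the first would return up to ten documents lacking a `results` key, the second would launch a Zonemaster run and write the document back with a `results` entry, and the third would check the concurrency ceiling.

```python
# Sketch of the randomised pick-and-pause scheduling loop described above.
import random
import time

def scheduler_loop(fetch_pending, start_test, slots_free):
    while True:
        if slots_free():
            candidates = fetch_pending(limit=10)   # documents without a "results" key
            if candidates:
                start_test(random.choice(candidates))
        # Randomised pause, making it unlikely that two nodes grab
        # the same document at the same moment.
        time.sleep(random.uniform(0.09, 0.11))
```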

If they _do_ start the same test, both of them will write versions of the document to their primary CouchDB. When replicated, this causes a conflict. CouchDB guarantees that when a conflicted document is requested by ID only (without a specific revision), it is deterministic which of the conflicting versions is returned. The daemon therefore repeatedly fetches the documents for all tests it is currently running, and if the nodename registered in the first hash in the `results` array is not its own, it kills the process running the test. Thus only one node will run its Zonemaster scan to completion, and any others should terminate within a couple of seconds.
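
The duplicate check boils down to re-fetching each running test's document and comparing nodenames. The sketch below illustrates that logic; `fetch_doc` and `kill_test` are hypothetical helpers, and `my_nodename` would be this machine's hostname.

```python
# Sketch of the duplicate-detection pass: give up on any test whose
# document now shows another node as the first entry in "results".
def prune_duplicates(running_tests, fetch_doc, kill_test, my_nodename):
    for doc_id, process in running_tests.items():
        doc = fetch_doc(doc_id)             # CouchDB returns a deterministic winner
        results = doc.get("results") or []
        if results and results[0].get("nodename") != my_nodename:
            kill_test(process)              # another node won; stop our run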

If communication between nodes is lost after request documents have been added but before they are picked up by the daemons, it is possible that more than one node will run scans to completion. When communication and replication are restored, this results in two conflicting _complete_ result documents. The intention is that those be resolved by merging the results, so that the `results` array gets more than one member. This conflict-resolution functionality is not yet implemented.
