#########################
Byzantine dataset servers
#########################

*******
Summary
*******

DAP servers use dataset servers to download and pre-filter requested
datasets.

Dataset servers *may* (i.e. *will*) fail to transmit data correctly,
whether through memory or disk corruption, a malicious user, or other
causes.

This document describes a way to protect PyCTOH from byzantine dataset
servers.

*********
Rationale
*********

.. graphviz::

   digraph datastream {
      rankdir=LR;
      "Store" -> "DatasetServer" -> "DAP Server";
   }

Storage servers provide intrinsic error checking. Dataset servers can
trust them without compromising data integrity: if an undetected error
occurs while retrieving data, the DAP server will detect it at the
next stage.

The critical part of the path from data store to DAP server is the
transmission from dataset server to DAP server: a dataset server may
run on an untrusted host and return forged, though well-formed, data.

A byzantine server detection system should validate data received from
untrusted hosts and be able to assign each host a "trust level" score.

******
Design
******

Dataset requests are deterministic: whichever server is queried, the
response should be identical, bit for bit.

When a DAP server requests a dataset, it may [*]_ ask *another*
dataset server to perform the same request and return only a
checksum of the response [*]_.
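
The decision to cross-check a request could be sketched as follows.
The ``byzantine_check_rate`` name comes from the footnote below; the
``should_check`` helper and the reading of the rate as a probability
in ``[0, 1]`` are assumptions made for illustration only.

.. code-block:: python

   import random


   def should_check(byzantine_check_rate, rng=random):
       """Return True when this request should be cross-checked.

       Assumption: the rate is a probability between 0 and 1.
       """
       if not 0.0 <= byzantine_check_rate <= 1.0:
           raise ValueError("byzantine_check_rate must be within [0, 1]")
       # rng.random() is uniform over [0, 1), so a rate of 0 never
       # triggers a check and a rate of 1 always does.
       return rng.random() < byzantine_check_rate
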

The DAP server computes the checksum of the dataset received from the
first server and compares it with the value returned by the second one.
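
A minimal sketch of this comparison is shown here. The function names
are hypothetical, and the hash algorithm is left as a parameter;
SHA-256 as the default is an assumption of this example, not a choice
made by this document.

.. code-block:: python

   import hashlib


   def digest(payload, algorithm="sha256"):
       """Return the hexadecimal checksum of the raw response bytes."""
       return hashlib.new(algorithm, payload).hexdigest()


   def responses_agree(payload, witness_digest, algorithm="sha256"):
       """Compare the locally computed checksum with the one reported
       by the second dataset server."""
       return digest(payload, algorithm) == witness_digest.lower()

Both sides must hash exactly the same byte stream with the same
algorithm, which is why the determinism of dataset requests matters.
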

If the values don't match, both servers are tagged with a byzantine
warning flag, and the same request is re-issued to other hosts. Once
a validated response is available, the flag is removed from any
original host whose response was valid.
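
The flag, retry and unflag cycle might look like the sketch below. It
is an illustration under stated assumptions, not the PyCTOH
implementation: ``validated_fetch``, ``fetch`` and ``fetch_digest``
are hypothetical names, servers are tried in pairs, and SHA-256 is
assumed as the checksum.

.. code-block:: python

   import hashlib


   def validated_fetch(request, servers, fetch, fetch_digest):
       """Return ``(payload, flagged)`` once two servers agree.

       ``fetch(server, request)`` returns the response bytes and
       ``fetch_digest(server, request)`` returns a hex checksum; both
       are hypothetical callables supplied by the caller.
       """
       flagged = set()
       seen = {}  # server -> checksum attributed to that server
       pool = list(servers)
       while len(pool) >= 2:
           primary, witness = pool.pop(0), pool.pop(0)
           payload = fetch(primary, request)
           local = hashlib.sha256(payload).hexdigest()
           remote = fetch_digest(witness, request)
           seen[primary], seen[witness] = local, remote
           if local == remote:
               # Validated: clear the flag of every earlier host whose
               # checksum matches the validated response.
               flagged -= {s for s, d in seen.items() if d == local}
               return payload, flagged
           # Mismatch: we cannot tell which side is wrong yet.
           flagged |= {primary, witness}
       raise LookupError("no two servers agreed on the response")

With four servers where only the second one is faulty, the first pair
is flagged, the second pair validates the response, and the first
server is then unflagged.
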

.. rubric:: Footnotes

.. [*] Whether to perform a byzantine check can be determined by some
       ``byzantine_check_rate`` config value.

.. [*] This saves bandwidth. A strong checksum is used so that a
       malicious server can't forge a fake response with a correct
       checksum. md5 is widely available, but it has known collision
       attacks, so a stronger hash such as SHA-256 is a safer choice.

**************
Implementation
**************