The Update Process
The purpose of the updater is to securely pull the authentication repository and the specified target repositories. Authentication repositories are git repositories which contain metadata files defined by TUF (The Update Framework) and target files required by the Open Law Collections framework. For more details, take a look at the specification document.
Calling the updater
To invoke the updater, install the package and call the following command:
olc repo update auth_repo_url filesystem_path --clients-root-dir root-dir --scripts-root-dir scripts-root-dir --error-if-unauthenticated
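When the updater is driven from another program, the same command can be assembled as an argument list. The following is a minimal sketch: `build_update_command` is a hypothetical helper, not part of the package, and it only uses the options shown in the command above. The URL and path in the usage lines are placeholders.

```python
import shlex


def build_update_command(auth_repo_url, filesystem_path, clients_root_dir=None,
                         scripts_root_dir=None, error_if_unauthenticated=False):
    """Assemble the argument list for `olc repo update` (hypothetical helper)."""
    command = ["olc", "repo", "update", auth_repo_url, filesystem_path]
    if clients_root_dir is not None:
        command += ["--clients-root-dir", clients_root_dir]
    if scripts_root_dir is not None:
        command += ["--scripts-root-dir", scripts_root_dir]
    if error_if_unauthenticated:
        command.append("--error-if-unauthenticated")
    return command


# Placeholder values, for illustration only
command = build_update_command(
    "https://example.com/namespace/auth_repo.git",
    "/tmp/root/namespace/auth_repo",
    error_if_unauthenticated=True,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the package is installed.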
Names of the repositories and the root directory
Only the remote authentication repository's URL and its filesystem path need to be specified when calling this command. If the authentication repository and the target repositories are in the same root directory, the locations of the target repositories will be calculated correctly based on the authentication repository's path, with no further input. If that is not the case, it is necessary to override this default value using the --clients-root-dir option.
Names of target repositories (as defined in repositories.json) are appended to the root path (think of it as the library root), thus defining the location of each target repository. If the names of the target repositories are namespace/repo1, namespace/repo2 etc. (the names have to be in the namespace/repo_name format) and the root directory is E:\\root, the paths of the target repositories will be calculated as E:\\root\\namespace\\repo1, E:\\root\\namespace\\repo2 etc.
If the authentication repository's path is, say, E:\\root\\namespace\\auth_repo, it will be assumed that its name is namespace/auth_repo and that the root directory is E:\\root.
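The path calculation described above can be sketched with `pathlib`. This is an illustration under stated assumptions, not the updater's own code: `split_auth_repo_path` and `target_repo_paths` are hypothetical helpers, and `PureWindowsPath` is used so that the Windows-style example paths behave the same on any operating system.

```python
from pathlib import PureWindowsPath


def split_auth_repo_path(auth_repo_path):
    """Return (root_dir, repo_name), assuming the last two path components
    are the namespace and the repository name."""
    path = PureWindowsPath(auth_repo_path)
    root_dir = path.parent.parent
    repo_name = f"{path.parent.name}/{path.name}"
    return root_dir, repo_name


def target_repo_paths(root_dir, repo_names):
    """Append each namespace/repo_name to the root directory."""
    return {name: PureWindowsPath(root_dir, *name.split("/")) for name in repo_names}


root_dir, auth_name = split_auth_repo_path(r"E:\root\namespace\auth_repo")
paths = target_repo_paths(root_dir, ["namespace/repo1", "namespace/repo2"])

print(auth_name)  # namespace/auth_repo
print(root_dir)   # E:\root
for name, path in paths.items():
    print(name, "->", path)
```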
Dependencies
As described in the specification, one authentication repository can reference other authentication repositories. This is defined using a special target file called dependencies.json. These repositories will be cloned inside the same directory as the top authentication repository and its targets. So, if the path of the top authentication repository (the one which contains dependencies.json) is E:\\root\\top-namespace\\auth_repo and the names of the other repositories in dependencies.json are set as namespace1/auth_repo and namespace2/auth_repo, these authentication repositories will be located at E:\\root\\namespace1\\auth_repo and E:\\root\\namespace2\\auth_repo.
error-if-unauthenticated
This flag raises an error if the repository allows unauthenticated commits and the updater detected authenticated commits newer than the local head commit. Whether a repository allows unauthenticated commits or not is specified in repositories.json. If unauthenticated commits are allowed, the repository can have commits in between two authenticated commits. The updater will still check that all authenticated commits exist and are in the right order.
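The ordering check mentioned above amounts to testing whether the authenticated commits form an ordered subsequence of the repository's commits. Here is a simplified sketch of that idea; the function name and the short commit identifiers are made up for illustration, and the updater's actual implementation may differ.

```python
def authenticated_commits_in_order(repo_commits, authenticated_commits):
    """Return True if every authenticated commit appears in repo_commits,
    in the same relative order; other commits may sit in between."""
    remaining = iter(repo_commits)
    # `commit in remaining` consumes the iterator up to the match, so each
    # search resumes where the previous one stopped
    return all(commit in remaining for commit in authenticated_commits)


# Commits listed from oldest to newest; "u*" stand for unauthenticated commits
repo_commits = ["a1", "u1", "a2", "u2", "u3", "a3"]

print(authenticated_commits_in_order(repo_commits, ["a1", "a2", "a3"]))  # True
print(authenticated_commits_in_order(repo_commits, ["a2", "a1", "a3"]))  # False
print(authenticated_commits_in_order(repo_commits, ["a1", "a4"]))        # False
```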
Hooks
Every authentication repository can contain target files inside the targets/scripts folder. These are expected to be Python scripts, which will be executed after a successful or failed update of that repository. Scripts can also be defined at the host level, in which case they will be executed after the update of all repositories belonging to that host.
If a repository was successfully pulled and updated, the changed, succeeded and completed handlers will be called. If there were no new changes, unchanged, succeeded and completed will be executed. If the update failed, the failed and completed handlers will be invoked. Scripts are linked to the mentioned events by being put into a folder of the corresponding name inside targets/scripts. Each folder can contain an arbitrary number of scripts, and they will be called in alphabetical order.
Here is a sketch of the scripts folder:
/scripts
    /repo
        /succeeded - every time a repo is successfully pulled
        /changed
        /unchanged
        /failed - every time a repo is not successfully pulled
        /completed - like finally (called in both cases)
    /host
        /succeeded - once for each host, after host's repositories have been successfully pulled
        /changed
        /unchanged
        /failed - if one repository failed
        /completed - like finally (called in both cases)
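The mapping between the outcome of an update and the handler folders described above can be written down as a small function. This is only a sketch: `events_for` is a hypothetical name, and the order of the returned list simply mirrors the order in which the handlers are listed in the text.

```python
def events_for(update_succeeded, changed=False):
    """Return the handler folder names that apply to one update outcome."""
    if not update_succeeded:
        return ["failed", "completed"]
    first = "changed" if changed else "unchanged"
    return [first, "succeeded", "completed"]


print(events_for(update_succeeded=True, changed=True))   # ['changed', 'succeeded', 'completed']
print(events_for(update_succeeded=True, changed=False))  # ['unchanged', 'succeeded', 'completed']
print(events_for(update_succeeded=False))                # ['failed', 'completed']
```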
Each script is expected to return JSON containing persistent and transient data. Persistent data will automatically be saved to a file called persistent.json after every execution and passed to the next script, while transient data will be passed to the next script without being stored anywhere. In addition to transient and persistent data, scripts receive information about the repositories (both the auth repo and its target repositories), as well as about the update.
For more information about the data which is passed to the scripts, take a look at their json schemas and the corresponding documentation.
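To make the difference between persistent and transient data more concrete, here is a rough sketch of how state could be threaded through a chain of scripts. It rests on several assumptions: plain Python functions stand in for the real scripts, the initial state is empty, and `run_scripts` is a hypothetical helper rather than the framework's own runner. Only the state/transient/persistent layout is taken from this document.

```python
import json
import tempfile
from pathlib import Path


def run_scripts(scripts, persistent_path):
    """Call each script in turn, threading state from one to the next."""
    state = {"transient": {}, "persistent": {}}
    for script in scripts:
        result = script({"state": state})
        state = {"transient": result["transient"], "persistent": result["persistent"]}
        # Persistent data is written out after every execution
        persistent_path.write_text(json.dumps(state["persistent"]))
    return state


def first_script(data):
    state = data["state"]
    state["persistent"]["runs"] = 1
    state["transient"]["note"] = "passed on, but never saved"
    return state


def second_script(data):
    state = data["state"]
    state["persistent"]["runs"] += 1
    return state


with tempfile.TemporaryDirectory() as tmp:
    persistent_path = Path(tmp) / "persistent.json"
    final_state = run_scripts([first_script, second_script], persistent_path)
    saved = json.loads(persistent_path.read_text())

print(final_state["transient"])  # {'note': 'passed on, but never saved'}
print(saved)                     # {'runs': 2}
```

Only the persistent part ends up in the file; the transient part exists just for the duration of the run.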
Scripts root dir
While writing the scripts, it is unrealistic to expect that everything will work on the first try. Since scripts are target files, they cannot be added to an authentication repository without being signed (meaning that the corresponding target files should be updated and committed). To avoid having to go through that process while still in the development phase, the scripts can be read from any directory on the filesystem by passing the path to it using the --scripts-root-dir option. This is only possible if development mode is turned on, which is currently the case. Further work will focus on making it possible to turn development mode on and off without having to modify the code.
Inside the scripts root directory, the framework expects to find the same directory structure:
scripts-root-dir
    - namespace
        - auth-repo-name
            - repo
                - changed
                - unchanged
                ...
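As an illustration of the layout above, the following sketch looks up the scripts for a single event under a scripts root directory and returns them in alphabetical order. `scripts_for_event`, the script file names and the restriction to `.py` files are assumptions made for this example; the temporary directory only exists to make it runnable.

```python
import tempfile
from pathlib import Path


def scripts_for_event(scripts_root_dir, auth_repo_name, scope, event):
    """Return the scripts for one event, sorted alphabetically by file name."""
    folder = Path(scripts_root_dir, *auth_repo_name.split("/"), scope, event)
    if not folder.is_dir():
        return []
    return sorted(folder.glob("*.py"), key=lambda path: path.name)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate scripts-root-dir/namespace/auth-repo-name/repo/changed
    changed_dir = Path(tmp, "namespace", "auth-repo-name", "repo", "changed")
    changed_dir.mkdir(parents=True)
    for file_name in ["b_notify.py", "a_backup.py"]:
        (changed_dir / file_name).write_text("")

    found = scripts_for_event(tmp, "namespace/auth-repo-name", "repo", "changed")
    names = [path.name for path in found]
    missing = scripts_for_event(tmp, "namespace/auth-repo-name", "repo", "failed")

print(names)    # ['a_backup.py', 'b_notify.py']
print(missing)  # []
```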
Script example
import sys
import json


def process_stdin():
    # the updater passes the input data to the script through stdin
    return sys.stdin.read()


def do_something(data):
    transient = data["state"]["transient"]
    persistent = data["state"]["persistent"]

    # add this script's own entries to both kinds of state
    transient.update({"script1": {"namespace/law": "this is transient"}})
    persistent.update({"script1": {"namespace/law": "this is persistent"}})

    return {
        "transient": transient,
        "persistent": persistent
    }


def send_state(state):
    # printed data will be sent from the script back to the updater
    print(json.dumps(state))


if __name__ == '__main__':
    data = process_stdin()
    data = json.loads(data)
    state = do_something(data)
    send_state(state)
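While developing a script, it can be useful to exercise it outside of the updater by feeding it JSON on stdin and reading what it prints. The sketch below does that with `subprocess`. The `run_script` helper, the stand-in script and the sample input are all made up for illustration; in particular, the real input contains more than the "state" key used here.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# A tiny stand-in script; any script that reads JSON from stdin and prints
# JSON to stdout could be used in its place
SCRIPT_SOURCE = """\
import json
import sys

data = json.loads(sys.stdin.read())
state = data["state"]
state["transient"]["checked"] = True
print(json.dumps(state))
"""


def run_script(script_path, data):
    """Feed `data` to the script as JSON on stdin and parse the JSON it prints."""
    completed = subprocess.run(
        [sys.executable, str(script_path)],
        input=json.dumps(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "script1.py"
    script_path.write_text(SCRIPT_SOURCE)
    # Only the "state" key is modelled here (assumption for the example)
    sample_input = {"state": {"transient": {}, "persistent": {"runs": 3}}}
    result = run_script(script_path, sample_input)

print(result)
```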