The manuals for KM3NeT members cover two main areas: how to provide and publish data, and how to operate, maintain and contribute to the development of the open science servers and pipelines.
How to Develop Software
KM3NeT defines guidelines and recommendations for software development which help to maintain a consistent project structure and development process.
Continuous Integration
Each project on Git has to implement at least a basic continuous integration (CI) routine based on GitLab CI, which is stored in a single YAML file (by default named .gitlab-ci.yml) in the root of the project’s repository. This file defines workflows that automate a set of jobs whenever the project is updated; some of these jobs are mandatory, others are optional. The mandatory CI jobs are:
- Compilation: the most basic requirement is that the software actually compiles on at least one of the target systems. In the case of KM3NeT, the main target system at the time of writing is Scientific Linux 7, which is used in most HPC environments.
- Running the test suite: the test suite consists of testing routines which target specific parts of the software and make sure that they work as expected. Tests are categorised into different levels: unit tests (function level), integration tests (interfaces between components), system tests (the complete, integrated software) and acceptance tests (which define the readiness of a product).
- Installation: the actual installation routine as described in the user guide.
- Documentation: generating the documentation, including the API description.
Optional but highly recommended CI jobs to automate are:
- Benchmarks: to measure the actual performance and detect potential regressions.
- Dynamically created tutorials: comprehensive guides which describe how to perform specific tasks with the software. The code examples must not be static but executed dynamically, which ensures that the tutorials stay up to date and can be run flawlessly by the user.
- Publishing: this includes the creation of Docker and Singularity images, as well as other publishing routines, e.g. uploading a release to PyPI.
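The lowest test level listed above, the function-level unit test, can be illustrated with a minimal sketch based on Python's built-in unittest module. The function weighted_mean is a hypothetical stand-in for real project code and is not part of any KM3NeT package.

```python
import unittest


def weighted_mean(values, weights):
    """Hypothetical function under test: weighted arithmetic mean."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sum of weights must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


class TestWeightedMean(unittest.TestCase):
    """Unit tests: each test checks one behaviour of a single function."""

    def test_equal_weights_give_plain_mean(self):
        self.assertAlmostEqual(weighted_mean([1.0, 2.0, 3.0], [1, 1, 1]), 2.0)

    def test_weights_shift_the_mean(self):
        # (0 * 3 + 10 * 1) / (3 + 1) = 2.5
        self.assertAlmostEqual(weighted_mean([0.0, 10.0], [3, 1]), 2.5)

    def test_mismatched_lengths_raise(self):
        with self.assertRaises(ValueError):
            weighted_mean([1.0], [1, 2])

    def test_zero_total_weight_raises(self):
        with self.assertRaises(ValueError):
            weighted_mean([1.0, 2.0], [0, 0])
```

In a CI job, tests like these would typically be collected and run with python -m unittest or with pytest, which also picks up unittest.TestCase classes.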
Development Process
The main development branch is the master branch on Git and is meant to refer to the latest stable version of the software. New features, experimental implementations and bug fixes are developed on separate branches and eventually merged back into master after a code review and the approval of at least two additional developers. The CI makes sure that the software works as expected and also indicates when the code coverage (the fraction of the code exercised by the test suite) decreases upon a merge.
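To make the notion of coverage concrete, here is a minimal sketch of line coverage and of a check for a decrease between two pipeline runs. In practice these numbers come from a coverage tool such as coverage.py; the function names and the tolerance parameter below are hypothetical.

```python
def line_coverage(executed: set[int], executable: set[int]) -> float:
    """Fraction of executable lines that were executed by the test suite."""
    if not executable:
        return 1.0  # nothing to cover
    return len(executed & executable) / len(executable)


def coverage_decreased(before: float, after: float, tolerance: float = 0.0) -> bool:
    """True if coverage after a merge fell below the previous value minus tolerance."""
    return after < before - tolerance
```

A small tolerance avoids flagging negligible fluctuations, at the cost of letting coverage erode slowly over many merges.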
A separate job in the pipeline checks the coding style automatically and ensures a consistent style across the whole project.
To keep track of important additions, changes, bug fixes and deprecations, a CHANGELOG is maintained and updated accordingly. This file gives users an overview of the development process without them having to browse through closed issues and merge requests.
Versioning follows the Semantic Versioning 2.0.0 (https://semver.org) conventions, whose version numbers signal possible incompatibilities when upgrading or downgrading. The version number consists of three parts in the form MAJOR.MINOR.PATCH, where MAJOR is increased on incompatible API changes, MINOR when new functionality is added in a backwards-compatible manner, and PATCH for anything else, such as backwards-compatible bug fixes or cosmetic changes.
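The MAJOR.MINOR.PATCH rules can be sketched in a few lines of Python, assuming plain version strings without the pre-release or build-metadata suffixes that SemVer also allows. The Version class and the safe_upgrade helper are hypothetical illustrations, not part of any KM3NeT tool.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Assumes exactly "MAJOR.MINOR.PATCH"; suffixes like "-rc.1" are not handled.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"

    def bump(self, change: str) -> "Version":
        """Return the next version for a 'major', 'minor' or 'patch' change."""
        if change == "major":  # incompatible API change
            return Version(self.major + 1, 0, 0)
        if change == "minor":  # new backwards-compatible functionality
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":  # backwards-compatible fixes
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change!r}")


def safe_upgrade(current: Version, candidate: Version) -> bool:
    """True if candidate is newer and keeps the same MAJOR version.

    SemVer treats 0.y.z as initial development where anything may change,
    so no upgrade is considered safe while MAJOR is 0.
    """
    return current.major >= 1 and candidate.major == current.major and candidate > current
```

Because Version is a tuple, comparisons are numeric per field; comparing raw strings would be wrong, since "1.10.0" sorts before "1.9.3" as text.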
Python Project Template
The most commonly used programming language in KM3NeT is Python. A Python
project template has been defined by KM3NeT and is based on the cookiecutter
(https://cookiecutter.readthedocs.io/) template framework. The template is
publicly available under https://git.km3net.de/templates/python-project and is
specifically designed to fit the KM3NeT GitLab CI environment. It includes a
skeleton Python project which is populated with the meta information
obtained during the template creation process (project name, description,
authors, Git project URL, etc.) and provides, out of the box:
- a clear project folder structure
- a testing suite
- automatic documentation generation
- API documentation
- a PyPI-compliant setup
- a Makefile with commonly used routines (running the test suite, checking the code style, creating a Python virtual environment, etc.).
Providing data
KM3NeT defines guidelines and recommendations for physics analyses and for the requirements on public plots. Templates are used to reinforce the KM3NeT guidelines for the production of physics analyses and public plots, and to ensure the reproducibility of analyses as well as consistency in their archiving and documentation. In the following, we describe the KM3NeT public plot template and the KM3NeT analysis template in detail.
Public plot template
The public plot template automatically creates and populates a KM3NeT public plot repository with all the meta information needed to archive the plot on the KM3NeT GitLab server together with the corresponding documentation and analysis repository. In other words, the public plot template is a ready-to-use CI setup for creating KM3NeT public plots.
Motivation
- Standardize the archiving of public plots
- Easily track the history of a public plot
- A ready-to-use CI to create public plot(s)
- Ensure reproducibility and flexibility (easy to annotate, ability to hide graphs, change look and aspect ratio, colours, fonts, etc.)
- Store and provide high-level data (with the corresponding units and explanation) so that the plot can be reused and its generation easily adapted for conference presentations, publications, outreach events, posters, etc.
- Easily share results for collaborative work and integrate the data into common workflows, also for sharing with other experiments
- Ensure consistency in public plots archiving
Template project structure and use
A public plot template has been defined by KM3NeT and is based on the cookiecutter (https://cookiecutter.readthedocs.io/) template framework. The template is publicly available under https://git.km3net.de/templates/km3net-public-plots and is specifically designed to fit the KM3NeT GitLab CI environment. It is automatically populated with the meta information obtained during the template creation process (plot version, description, analysis repository, authors, etc.).
- A demonstration of what a public plot repository looks like on GitLab is available at https://git.km3net.de/templates/km3net-public-plots-demo/
- A step-by-step guide to producing this demonstration is available at https://git.km3net.de/templates/km3net-public-plots/-/blob/master/README.md
- The official public plots which have been approved and produced with the public plot template are made available to the public in a designated namespace.
- The public plot template contains the following files/directories:
- LICENSE: a file containing the official KM3NeT license for public plots.
- Makefile: a Makefile with commands like make dependencies and make plot.
- README: a file populated with metadata (authors, contributors, analysis repository, etc.).
- data: data folder to store high level data to produce the plot.
- doc: a documentation folder to document the public plot following KM3NeT official data classes.
- plots: a folder to store public plots.
- src: a folder to store a script to produce the public plot(s).
- requirements: a text file for software environment.
- CHANGELOG: a file to keep track of the history (versions) of the public plot.
This setup makes rebuilding the plot extremely easy. Reproducing the public plot comes down to:
- cloning the public plot project.
- typing make dependencies.
- typing make plot.
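To check that a cloned public plot repository follows the layout listed above, a short script can compare the directory contents against the expected entries. This is a hypothetical helper, not part of the template; since the exact file names are not specified here, files are matched by their name without extension (so README.md counts as README and requirements.txt as requirements).

```python
from pathlib import Path

# Entries listed for the public plot template. Matching files by the part of
# their name before the first dot is an assumption about the real file names.
PUBLIC_PLOT_LAYOUT = {
    "LICENSE": "file",
    "Makefile": "file",
    "README": "file",
    "requirements": "file",
    "CHANGELOG": "file",
    "data": "dir",
    "doc": "dir",
    "plots": "dir",
    "src": "dir",
}


def missing_entries(repo: Path, layout: dict[str, str] = PUBLIC_PLOT_LAYOUT) -> list[str]:
    """Return the sorted names of expected files/directories not found in repo."""
    entries = list(repo.iterdir())
    files = {p.name.split(".", 1)[0] for p in entries if p.is_file()}
    dirs = {p.name for p in entries if p.is_dir()}
    return sorted(
        name
        for name, kind in layout.items()
        if name not in (dirs if kind == "dir" else files)
    )
```

Running missing_entries(Path("path/to/clone")) on a complete checkout returns an empty list; any names it returns point to files or folders that still need to be added.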
Analysis template
The analysis template automatically creates and populates a KM3NeT analysis repository with all the meta information needed to understand, reproduce and archive an analysis on the KM3NeT GitLab server.
Motivation
In addition to the motivation already described in the public plot template, analysis-specific aspects of the template generation are:
- Tracking changes/operations performed on data (with the corresponding units and explanation) to understand and reproduce the analysis and processing steps.
- Allow for a standardized review process for the publication of analysis-related high-level data through common structures for storing workflow-related information
- Link an analysis repository to a KM3NeT publication and facilitate the data publication process.
- Ensure consistency in analysis archiving
The analysis template
- An analysis template has been defined and is based on the cookiecutter template framework. The template is publicly available under https://git.km3net.de/templates/km3net-analysis-template.
- The analysis template contains the following files/directories:
- LICENSE: a file containing the official KM3NeT license for analysis.
- Makefile: a Makefile with commands like make dependencies and make analysis.
- README: a file populated with metadata (authors, contributors, etc.).
- data: data folder to store data (and/or processed data) to (re)produce the analysis.
- docs: a documentation folder to document the analysis following KM3NeT official data classes.
- plots: a folder to store analysis plots.
- src: a folder to store any source code for the analysis.
- scripts: a folder to store scripts to reproduce the analysis and the plots.
- notebooks: a folder to store Jupyter notebooks.
- requirements: a text file for software environment.
- CHANGELOG: a file to keep track of the history (versions) of the analysis.
Again, provided that no restricted external dependencies or processing steps are involved, this reduces the main steps to:
- cloning the analysis repository.
- typing make dependencies.
- typing make analysis.
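The manual steps (clone, make dependencies, make analysis) can also be scripted, for example to rerun an analysis or a public plot from a clean checkout. The sketch below is a hypothetical helper, not part of either template; it assumes git and make are installed and that the repository's Makefile provides the dependencies target plus analysis (or plot, for public plot repositories).

```python
import subprocess


def reproduce(repo_url: str, dest: str, target: str = "analysis",
              dry_run: bool = False) -> list[list[str]]:
    """Clone a template-based repository and run its Makefile targets.

    Mirrors the manual steps: git clone, make dependencies, make <target>.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = [
        ["git", "clone", repo_url, dest],
        ["make", "-C", dest, "dependencies"],  # -C: run make inside dest
        ["make", "-C", dest, target],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError at the first failing step
            subprocess.run(cmd, check=True)
    return commands
```

Stopping at the first failing command means a broken dependency installation is not silently followed by a build against a stale environment.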