KM3NeT Open Science Portal

The manuals for KM3NeT members cover two main topics: how to provide and publish data, and how to operate, maintain and contribute to the development of the open science servers and pipelines.

How to Develop Software

KM3NeT defines guidelines and recommendations for software development which help to maintain a consistent project structure and development process.

Continuous Integration

Each project on Git has to implement at least a basic continuous integration (CI) routine based on GitLab CI, which is stored in a single YAML file in the root of the project’s repository. This file contains workflows that automate a set of jobs whenever the project is updated. Some of these jobs are mandatory, others are optional. The mandatory CI jobs are:

Compilation: the most basic requirement is that the software actually compiles on at least one of the target systems. In the case of KM3NeT, the main target system at the time of writing is Scientific Linux 7, which is used on most of the HPC environments.
Running the test suite: the test suite consists of testing routines which target specific parts of the software and make sure that they are working as expected. Tests are categorised into different levels: unit tests (function level), integration tests (interfaces between components), system tests (complete integration of the software) and acceptance tests (which define the readiness of a product).
Installation: the actual installation routine as described in the user guide
Documentation: generating the documentation including the API description

Optional but highly recommended CI jobs to automate are:

Benchmarks: to measure the actual performance and potential regressions
Dynamically created tutorials: comprehensive guidelines which describe how to perform specific tasks with the software. The code examples must not be static but dynamically executed. This makes sure that the tutorial is up to date and can be executed flawlessly by the user.
Publishing: this includes the creation of Docker and Singularity images, as well as other publishing routines such as uploading a release to PyPI.
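The split between mandatory and recommended jobs can be illustrated with a small check. This is only a sketch: the job names below are hypothetical placeholders, not names prescribed by KM3NeT, and a real project would read its job names from its GitLab CI YAML file.

```python
# Hypothetical job names; real projects may name their CI jobs differently.
MANDATORY_JOBS = {"compile", "test", "install", "docs"}
RECOMMENDED_JOBS = {"benchmarks", "tutorials", "publish"}


def check_ci_jobs(defined_jobs):
    """Return (missing mandatory jobs, missing recommended jobs), both sorted."""
    defined = set(defined_jobs)
    missing_mandatory = sorted(MANDATORY_JOBS - defined)
    missing_recommended = sorted(RECOMMENDED_JOBS - defined)
    return missing_mandatory, missing_recommended


if __name__ == "__main__":
    # Example: a project that defines only some of the jobs.
    missing, optional = check_ci_jobs(["compile", "test", "docs", "publish"])
    print("missing mandatory jobs:", missing)
    print("missing recommended jobs:", optional)
```

A project would treat a non-empty first list as an error and the second list as a reminder.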

Development Process

The main development branch is the master branch on Git and is meant to refer to the latest stable version of the software. New features, experimental implementations and bug fixes are developed on separate branches and eventually merged back into master after a code review and approval by at least two additional developers. The CI makes sure that the software is working as expected and also indicates when the code coverage (the fraction of the software exercised by tests) decreases upon a merge. A separate job in the pipeline checks the coding style automatically and ensures a common style across the whole project.
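As a rough illustration of the coverage check, the sketch below takes coverage to be the fraction of executable lines exercised by the test suite and flags a merge that lowers it. The function names and the line counts are illustrative assumptions; in practice the numbers come from the coverage tool used in the CI pipeline.

```python
def coverage(covered_lines, total_lines):
    """Fraction of executable lines exercised by the test suite."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return covered_lines / total_lines


def coverage_decreased(before, after, tolerance=0.0):
    """True if coverage after the merge is lower than before, beyond tolerance."""
    return after < before - tolerance


if __name__ == "__main__":
    before = coverage(850, 1000)   # master branch
    after = coverage(860, 1050)    # merge adds 50 lines, only 10 of them tested
    print(f"before: {before:.1%}, after: {after:.1%}")
    print("coverage decreased:", coverage_decreased(before, after))
```

Note that coverage can drop even though the number of tested lines grows, because new untested code increases the denominator.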

To keep track of important additions, changes, bug fixes and deprecations, a CHANGELOG is available and updated accordingly. This file gives users an overview of the development process without having to browse through closed issues and merge requests.

The versioning follows the Semantic Versioning 2.0.0 (https://semver.org) conventions, which indicate possible complications when up- or downgrading. The version number consists of three parts in the form MAJOR.MINOR.PATCH, where MAJOR is increased on incompatible API changes, MINOR when new functionality is added in a backwards-compatible manner, and PATCH for anything else, such as bug fixes or cosmetic changes.
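The version bump rules can be sketched in a few lines of Python. The helper name bump is an assumption made for illustration, and the sketch handles only plain MAJOR.MINOR.PATCH strings, not the pre-release or build metadata suffixes that Semantic Versioning also allows.

```python
def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change.

    change is one of "major", "minor" or "patch".
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Incompatible API change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backwards-compatible new functionality: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fixes, cosmetic changes and the like.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump("1.4.2", "major"))  # 2.0.0
    print(bump("1.4.2", "minor"))  # 1.5.0
    print(bump("1.4.2", "patch"))  # 1.4.3
```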

Python Project Template

The most commonly used programming language in KM3NeT is Python. A Python project template has been defined by KM3NeT and is based on the cookiecutter (https://cookiecutter.readthedocs.io/) template framework. The template is publicly available under https://git.km3net.de/templates/python-project and is specifically designed to fit the KM3NeT GitLab CI environment. It includes a skeleton Python project which is populated with the meta information obtained during the template creation process (project name, description, authors, Git project URL, etc.) and provides the following out of the box:

a clear project folder structure
a testing suite
automatic documentation generation
API documentation
PyPI compliant setup
a Makefile with commonly used routines (running the test suite, checking the code-style, creating a Python virtual environment…).
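A new project is typically generated with the cookiecutter command-line tool, which accepts a template location followed by optional key=value pairs. The sketch below only assembles such a command. The context keys project_name and author are hypothetical, since the variable names actually defined by the KM3NeT template are not listed here, and cookiecutter is assumed to be installed.

```python
import shlex

TEMPLATE_URL = "https://git.km3net.de/templates/python-project"


def build_cookiecutter_command(template, context):
    """Assemble a non-interactive cookiecutter call as an argument list."""
    command = ["cookiecutter", "--no-input", template]
    # Extra context is passed as key=value pairs after the template.
    command += [f"{key}={value}" for key, value in context.items()]
    return command


if __name__ == "__main__":
    # Hypothetical keys; check the template's cookiecutter.json for the real ones.
    context = {"project_name": "my-analysis-tools", "author": "Jane Doe"}
    command = build_cookiecutter_command(TEMPLATE_URL, context)
    print(shlex.join(command))
    # To actually generate the project, pass `command` to subprocess.run.
```

With --no-input, any template variable that is not given on the command line falls back to the default defined by the template.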

Providing data

KM3NeT defines guidelines and recommendations for physics analyses and public plots. Templates are used to reinforce these guidelines and to ensure analysis reproducibility as well as consistency in the archiving and documentation of analyses and plots. In the following, we describe the KM3NeT public plot template and the KM3NeT analysis template in detail.

Public plot template

The public plot template automatically creates and populates a KM3NeT public plot repository with all the meta information needed to archive the plot on the KM3NeT GitLab server, together with the corresponding documentation and analysis repository. In other words, the public plot template provides a ready-to-use CI setup for creating KM3NeT public plots.

Motivation

Standardize the archiving of public plots
Easily track the history of a public plot
A ready-to-use CI to create public plot(s)
Ensure reproducibility and flexibility (easy to annotate, ability to hide graphs, change look and aspect ratio, colours, fonts etc.)
Store and provide high-level data (with the corresponding units and explanation) to reuse the plot and easily modify the plot generation for customized use (in conference presentations, publications, outreach events, posters etc.)
Easily share results for collaborative work and easily integrate the data in common workflows, also for sharing with other experiments
Ensure consistency in public plots archiving

Template project structure and use

A public plot template has been defined by KM3NeT and is based on the cookiecutter (https://cookiecutter.readthedocs.io/) template framework. The template is publicly available under https://git.km3net.de/templates/km3net-public-plots and is specifically designed to fit the KM3NeT GitLab CI environment. It is automatically populated with the meta information obtained during the template creation process (plot version, description, analysis repository, authors, etc.).

A demonstration of what a public plot repository looks like on Git is available at https://git.km3net.de/templates/km3net-public-plots-demo/
A step by step guide to produce this demonstration is here: https://git.km3net.de/templates/km3net-public-plots/-/blob/master/README.md
The official public plots which have been approved and produced with the public plot template are made available to the public in a designated namespace.
The public plot template contains the following files/directories:
- LICENSE: a file containing the official KM3NeT license for public plots.
- Makefile: a Makefile with commands like make dependencies and make plot.
- README: a file populated with meta data (authors, contributors, analysis repository etc.).
- data: data folder to store high level data to produce the plot.
- doc: a documentation folder to document the public plot following KM3NeT official data classes.
- plots: a folder to store public plots.
- src: a folder to store a script to produce the public plot(s).
- requirements: a text file for software environment.
- CHANGELOG: a file to keep track of the history (versions) of the public plot.

This setup makes rebuilding the plot very easy. Reproducing the public plot reduces to the following steps:

cloning the public plot project.
typing make dependencies.
typing make plot.
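The three steps above can be sketched as a small Python helper built on git and make. The function names are assumptions made for illustration, and the demonstration repository is used as an example URL on the assumption that it provides the Makefile targets described above. The helper defaults to a dry run that only prints the commands.

```python
import subprocess


def reproduction_steps(repo_url, target="plot"):
    """Return (command, working directory) pairs for clone, dependencies, target."""
    # git clone creates a directory named after the last URL component.
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return [
        (["git", "clone", repo_url], None),
        (["make", "dependencies"], name),
        (["make", target], name),
    ]


def reproduce(repo_url, target="plot", dry_run=True):
    """Print the steps, or execute them when dry_run is False."""
    for command, cwd in reproduction_steps(repo_url, target):
        if dry_run:
            print(f"[{cwd or '.'}] {' '.join(command)}")
        else:
            # check=True stops at the first failing step.
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    reproduce("https://git.km3net.de/templates/km3net-public-plots-demo")
```

Calling reproduce with dry_run=False runs the commands and requires git, make and network access to the repository.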

Analysis template

The analysis template automatically creates and populates a KM3NeT analysis repository with all the meta information needed to understand, reproduce and archive an analysis on the KM3NeT GitLab server.

Motivation

In addition to the motivation already described in the public plot template, analysis-specific aspects of the template generation are:

Tracking changes/operations performed on data (with the corresponding units and explanation) to understand and reproduce the analysis and processing steps.
Allow for a standardized review process for the publication of analysis-related high-level data through common structures in the storage of workflow-related information
Link an analysis repository to a KM3NeT publication and facilitate the data publication process.
Ensure consistency in analysis archiving

The analysis template

An analysis template has been defined and is based on the cookiecutter template framework. The template is publicly available under https://git.km3net.de/templates/km3net-analysis-template.
The analysis template contains the following files/directories:
- LICENSE: a file containing the official KM3NeT license for analysis.
- Makefile: a Makefile with commands like make dependencies and make analysis.
- README: a file populated with meta data (authors, contributors, etc.).
- data: data folder to store data (and/or processed data) to (re)produce the analysis.
- docs: a documentation folder to document the analysis following KM3NeT official data classes.
- plots: a folder to store analysis plots.
- src: a folder to store any source code for the analysis.
- scripts: a folder to store scripts to reproduce the analysis and the plots.
- notebooks: a folder to store jupyter notebooks.
- requirements: a text file for software environment.
- CHANGELOG: a file to keep track of the history (versions) of the analysis.
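Whether a repository follows this layout can be checked with a short script. This is a sketch under assumptions: the helper missing_entries is hypothetical, and entries are matched by name without file extension and ignoring case, since the exact file names (for example README.md or requirements.txt) are not specified above.

```python
import tempfile
from pathlib import Path

# Entries listed for the analysis template, without file extensions.
EXPECTED_ENTRIES = [
    "LICENSE", "Makefile", "README", "data", "docs", "plots",
    "src", "scripts", "notebooks", "requirements", "CHANGELOG",
]


def missing_entries(repo_dir, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories not found in repo_dir."""
    # Path.stem drops one extension, so README.md matches README.
    present = {path.stem.lower() for path in Path(repo_dir).iterdir()}
    return [entry for entry in expected if entry.lower() not in present]


if __name__ == "__main__":
    # Build a small, incomplete example layout in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for folder in ("data", "docs", "src"):
            (root / folder).mkdir()
        for filename in ("README.md", "Makefile", "requirements.txt"):
            (root / filename).touch()
        print("missing:", missing_entries(root))
```

Such a check could run as an additional CI job to catch repositories that drift away from the template.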

Again, when no restricted external dependencies or processing steps are involved, this reduces the main steps to:

cloning the analysis repository
typing make dependencies
typing make analysis