KM3NeT Open Science Portal

@Gitlab

KM3NeT uses a self-hosted GitLab instance as the main platform to develop and discuss software, analysis tools, papers and other private or collaborative creations. GitLab offers professional and advanced features to keep track of development history and its rich feature set allows to exchange and archive thoughts and ideas easily. The continuous integration (CI) that is part of the GitLab distribution, proves to be a powerful automation tool and is utilised to generate consistently up-to-date test reports, documentation and software releases in a transparent way. The CI pipeline is triggered every time when changes are pushed to a project. Each job runs in an isolated Docker container which makes them fully reproducible.

CI Pipelines of the km3pipe projects showing a failing test in a Python 3.6 environment

In case of test reports for example, failing tests are signalised in merge requests and prevent changes that broke them to be applied accidentally. The documentation is also built in a dedicated pipeline job and is published to the web upon successful generation. A tight integration of the documentation into the software projects is mandatory and highly improves its up-to-dateness.

The KM3NeT GitLab server is accessible to the public but only projects which are marked as global are visible to a regular visitor without a KM3NeT account. They can download the projects and all the public branches, access the issues, documentation and Wiki, they however are not allowed to collaborate, i.e. to comment or contribute in any way. To circumvent this problem, open source projects are mirrored to a yet inofficial GitHub group (https://github.com/KM3NeT) where everyone with a GitHub account is allowed to interact.

Docker

Due to the huge variety of operating systems, languages and frameworks, the number of possible system configurations has grown rapidly in the past decades. Operating-system-level virtualisation is one of the most successful techniques to tackle this problem and allows the conservation of environments, making them interoperable and reproducible in an almost system agnostic way. KM3NeT utilises Docker (https://www.docker.com) for this task, which is the most popular containerisation solution with high interoperability. Docker containers run with negligible performance overhead and create an isolated environment in a fully reproducible manner, regardless of the host system as long as Docker itself is supported (Linux, macOS and Windows). These containers are used in the GitLab CI to run test suites in many different configurations. Python based projects for example can easily be tested under different Python versions.

List of accessible docker images

ToDo: add list here!

Python environment

KM3NeT develops open source Python software for accessing and working with data taken by the detector, produced in simulations or in other analysis pipelines e.g. event reconstructions, and a number of other types like metadata, provenance history and environmental data. The software is following the Semantic Versioning 2.0 (https://semver.org) conventions and releases are automatically triggered on the GitLab CI by annotated Git tags. These releases including alpha and beta releases are uploaded to the publicly accessible Python Package Index, which is the main repository of software for the Python programming language. The installation of these packages is as simple as executing pip install PACKAGE_NAME. Additionally, the packages can also be installed directly from the GitLab repositories for example in case of experimental branches.

Preferred python packages

The general philosophy behind all Python packages is to build a bridge to commonly use open source scientific tools, libraries and frameworks. While the common base is built on NumPy, the de facto standard for scientific, numerical computing in Python, other popular packages from the SciPy stack are highly preferred. Examples are Matplotlib to create publication quality plots, Pandas which is used to work with tabular data, Astropy for astronomical calculations or numba for high-performance low level optimisations.

Preferred formats for interoperability

The output format is preferably CSV and JSON to maximise interoperability. For larger or more complex datasets, two additional formats are supported. HDF5 which is a widely used data format in science and accessible in many popular computer languages is used to store data from every tier, including uncalibrated low-level data and high-level reconstruction summaries. Additionally and mainly for astronomical data, the FITS dataformat is considered if required due to its high popularity among astronomers.

Python interface to KM3NeT data

In addition to offering services and data through the KM3NeT Open Data Center, the openkm3 python client was developed to directly use open data in python from local computer and within e.g. Jupyter notebooks. It interlinks with the ODC REST-API and allows to query metadata of the resources and collections. In additionn to that, it offers function to interpret the data according to its KM3NeT type description (ktype), e.g. returning tables in a required format. These interface options will be expanded according to the requirements of data integrated to the ODC.

In addition to that, basic functions relevant for astrophysics are offered in the km3astro package. As development of the python environment is an ongoing process, the number of packages offered for KM3NeT data interpretation will surely grow in the future.