Licensing policy for KM3NeT open science components

Why care about licenses?

Publishing FAIR data

Publishing data and source code to ensure reproducibility of scientific results is not only a matter of professional courtesy and an adherence to the call to return value to civil society for publicly funded endeavors, but also a necessity to ensure scientific diligence.

The FAIR principles governing open data publication list the licensing of published resources such as data, software and supplementary material as one of the key principles (see go-fair.org). A license provides legal certainty for the user of the resources and is therefore considered an essential part of the “reusability” requirement of FAIR data 1. These licenses should be machine-readable and distributed with the products to allow easy use.

Open Science initiatives

Open science, including open access publications, open software and open data, is increasingly becoming a key element in funding schemes. Developing the underlying policies is therefore a necessity. Beyond politics, open science (and the corresponding licensing) can also promote or hinder the involvement of external scientists, developers and civil society in development and education. Using widely recognized licensing schemes reduces uncertainty, and thus barriers, in the reuse of publications.

Genesis of this document

The considerations here draw on discussions with members of CTA and ASTRON, and relate to documents provided and discussions held within the ESCAPE project and the IVOA. General guidelines are listed in the bibliography.

Core concepts

Legal concepts can vary due to the different legal and societal traditions in the various countries, although international copyright is progressively being standardized through international treaties and organizations 2. Note, however, that the following definitions follow an EU-based conceptualization and draw heavily on 3 as an exemplary point of view.

Open Science / Open Software Licensing

Licensing domains

If no license is provided, a work is generally protected by copyright. To ease the restrictions that copyright places on reuse and modification, the “open” environment requires licensing in various domains, which are customarily denoted as

  • OER: Open Educational Resources for education material

  • FOSS: Free and Open Source Software

  • Open Content (cultural content) and Open Access (scientific content)

License types

License templates are provided by various organisations, the texts of which are published as open content themselves, allowing free use of the texts.

  • Open Source Licenses for software: e.g. Free Software Foundation (General Public License, GPL)

  • Open Content Licenses: e.g. Creative Commons

  • Open Data Licenses for databases: Open Knowledge Foundation (e.g. ODbL).

  • Public domain declarations: used to mark work in the public domain, e.g. applicable to raw data (not a license in the strict sense).

Licenses should be used according to their respective domain and can only be issued by the holder of the copyright. Note that although many of the licenses by the above organisations are provided as “international” licenses, some of their terms might contradict national laws, in which case the applicable national law takes priority.

Open licenses in general allow free dissemination, duplication and reproduction of the product. For Open Content, commercial use can be prohibited, while FOSS licensing generally allows commercialization. Note that this means that “open” content does not necessarily have to be “free”: If commercialization is allowed, it is e.g. possible to run open software on a server and require payment for the use of the service.

User duties

Licenses can impose duties on users that define the conditions under which the work can be used, namely

  • attribution: author or copyright holder must be named

  • copyright notice: the copyright notice must be reproduced alongside the work

  • share-alike: derivatives of the work must be published under the same license (in software called “copyleft”)

  • change documentation: changes to the work must be documented and published accordingly.

Warranty

The full exclusion of liability for the reuse of the published work (“no warranty clause”) is not binding in all jurisdictions and is therefore limited by the applicable law. In practice, however, the liability risk for open source/open content work is considered minor.

Embargo periods and revoking of license

Embargo periods before the publication of material do not affect the license per se, as the copyright is independent of the actual access to the work. Therefore, an open license does not imply a duty for the copyright holder to allow free access to the work for the wider public. Licenses can therefore already be applied to material that is still unpublished without any implications for the publishing process.

Note, however, that an open license is difficult to revoke in practice once a work has been published under it, as redistribution of the work is permitted without “validating” the current status of the license with the copyright holder. Although some national jurisdictions technically allow revoking an open license, retracting the distribution of a work once it is “out there” is close to unachievable. Moving from a stricter license to a more open license over time is therefore easier than limiting permissions at a later stage. Discussion Issue #2. Relicensing of software is an established practice, but it should nevertheless be avoided.

Common practices

As a guideline, practices in other science projects and collaborations and recommendations by relevant standard-setting authorities are summarized here.

Software

For software, licensing can quickly become difficult, as the spectrum from fully copyright-protected commercial software to “as free as possible” open software is wide and various versions of widely used licenses exist.

Software types

The SKA software guidelines, for example, classify software according to its development process, as this determines how freely the software license can be chosen.

  • Off-the-shelf software is used “as-is”, in which case the license distributed with the software is outside the control of KM3NeT.

  • Derived software is developed within KM3NeT but includes software packages which come with a license. As these licenses might require redistribution, the derived software carries a mixture of licenses.

  • “Bespoke” software is developed within KM3NeT without additional dependencies. Here, the license can be chosen freely.
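For derived software written in Python, the mixture of licenses it carries can be inspected from the metadata of its dependencies. The following is a minimal sketch, assuming the dependencies are installed Python packages that declare their license in the standard metadata fields; the helper names and the demo packages (package_a, package_b) are hypothetical and not part of any KM3NeT tooling.

```python
from importlib import metadata
from types import SimpleNamespace


def license_of(meta):
    """Return the license declared in a package's metadata, or 'UNKNOWN'."""
    # "License-Expression" and "License" are core metadata fields; both are optional.
    return meta.get("License-Expression") or meta.get("License") or "UNKNOWN"


def collect_licenses(distributions=None):
    """Map package name -> declared license for the given distributions."""
    if distributions is None:
        # Default: every distribution installed in the current environment.
        distributions = metadata.distributions()
    licenses = {}
    for dist in distributions:
        name = dist.metadata.get("Name")
        if name:
            licenses[name] = license_of(dist.metadata)
    return licenses


# Hypothetical stand-ins for installed packages, used only for illustration.
demo = [
    SimpleNamespace(metadata={"Name": "package_a", "License": "MIT"}),
    SimpleNamespace(metadata={"Name": "package_b"}),
]
print(collect_licenses(demo))
```

Calling collect_licenses() without arguments lists the packages of the current environment. Declared metadata can be missing or imprecise, so the result is a starting point for a review rather than a definitive statement.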

Standard licenses

The choice of licenses varies in the community, including the 3-clause BSD license (SKA, CTA), the Apache 2.0 license (ASTRON), the GNU General Public License (Aladin, CDS) and others. This shows that software licensing follows no specific standard, as long as the chosen license falls into the category of Open Source Licenses.

Relevant properties

Core aspects of software licenses in the open domain are the compatibility of various licenses, as software packages are commonly combined in derived software, and the protection against patent claims if the software is used (and patented) in a commercial context.

For the first aspect, the compatibility of the software license, especially with the widely used GNU license, has to be considered (for a compatibility matrix, see Wikipedia). To address the patenting issue, which is most relevant for the US market, some licenses carry a patent claims protection clause. Discussion Issue #4
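The compatibility question can be illustrated with a small check of dependency licenses against the license intended for the derived software. This is a deliberately simplified sketch, assuming SPDX-style identifiers and only three rules: permissive licenses can be included anywhere, GPL-licensed packages require the derived work to carry the same GPL license, and Apache 2.0 cannot be combined with GPL version 2. The function and package names are hypothetical, and a full compatibility matrix remains the reference for real decisions.

```python
# Simplified illustration only; not a complete model of license compatibility.
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-or-later"}


def can_include(dependency_license, outbound_license):
    """Return True if a dependency may be part of a work under outbound_license."""
    known = PERMISSIVE | COPYLEFT
    if dependency_license not in known or outbound_license not in known:
        raise ValueError("license not covered by this simplified table")
    if dependency_license in COPYLEFT:
        # Copyleft: the derived work has to be published under the same license.
        return outbound_license == dependency_license
    if dependency_license == "Apache-2.0" and outbound_license == "GPL-2.0-only":
        # The patent clause of Apache 2.0 conflicts with GPL version 2.
        return False
    return True


def incompatible_dependencies(dependencies, outbound_license):
    """List the dependency names that cannot be used under outbound_license."""
    return [
        name
        for name, dep_license in dependencies.items()
        if not can_include(dep_license, outbound_license)
    ]


# Hypothetical dependency list of a derived software package.
deps = {"package_a": "MIT", "package_b": "GPL-3.0-or-later"}
print(incompatible_dependencies(deps, "MIT"))
```

In this example the GPL-licensed package is reported when the derived software is meant to be released under the MIT license, which reflects the share-alike duty described above.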

In KM3NeT

Some software packages in KM3NeT already carry licenses chosen by their developers. These include the MIT license (km3pipe, jpp) and the GNU General Public License (gSeaGen). Others do not carry a license yet.

Data

Licensing of data is still in development within the community, as the need for licensing was not recognized for quite some time (and indeed does not exist for “basic” data sets, see above). For data sets and derivatives, the GAVO-DaCHS software, for example, offers Creative Commons 0, attribution or attribution/share-alike licenses as standard values. Reference developments within the Research Data Alliance also indicate the possible use of Creative Commons.

In 4, Open Data Commons is introduced in addition to the Creative Commons licensing scheme (ODC-By, ODC-ODbL). The licenses provided here are more specific to databases, and their copyleft clauses are more flexible regarding reuse and re-licensing. These specifications might therefore be advantageous for databases. However, the Open Data Commons licenses only cover the sui generis database rights and the arrangement of the data. For the data offered in the database, a separate content license by Open Data Commons or a Creative Commons 4.0 license can be applied, as the latter also specifically applies to data.

Texts, graphs and supplementary material

Creative Commons was designed for use on text, graphics and audio-visual media. It is therefore well suited to this context and is already widely applied.

However, for scientific publications, restrictions on licensing might be imposed by the publisher. The possibility to apply an open license is therefore sometimes limited, although open access initiatives are generally moving towards open schemes.

Policy and recommendations

Policy principles

Licensing in KM3NeT should, if not limited by other circumstances, adhere to the following principles:

1) Permissive license

Copyright licenses to products produced within the KM3NeT collaboration and for the KM3NeT collaboration should put minimal restrictions on the use of the products and should therefore be permissive.

2) Attribution

It should be ensured that the use of KM3NeT products is attributed to the creators, i.e. to the copyright holder according to the best current understanding of the legal situation and to the KM3NeT collaboration where possible. Where attribution to the KM3NeT collaboration is not possible, the association should be indicated by additional metadata provided with the product according to standards set by the collaboration.

3) No Share-Alike

As share-alike clauses might lead to compatibility issues at a later stage, they should be carefully considered and are not recommended at this point. Following principle 1 to publish “as open as possible”, there is currently no reason to impose share-alike as a general restriction.

4) No Warranty

Also, liability for the use of KM3NeT products should be limited as far as the national jurisdiction allows. Licenses should therefore carry a “no warranty” clause.

5) Standard application

In addition to that, the license should be machine-readable to allow the easy distribution of the licensing information alongside the product. Standard licenses are provided within KM3NeT.
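One common way to make licensing information machine-readable is to tag each file with an SPDX short identifier. The sketch below assumes this convention; the mapping covers licenses discussed in this document, while the helper names, the copyright holder string and the year are hypothetical examples and not a KM3NeT standard.

```python
# SPDX short identifiers for licenses discussed in this document.
SPDX_IDS = {
    "MIT license": "MIT",
    "3-clause BSD license": "BSD-3-Clause",
    "Apache 2.0": "Apache-2.0",
    "CC-BY 4.0 International": "CC-BY-4.0",
    "ODC-By": "ODC-By-1.0",
}


def spdx_header(license_name, holder, year, comment="#"):
    """Build a two-line, machine-readable license header for a source file."""
    spdx_id = SPDX_IDS[license_name]
    return (
        f"{comment} SPDX-FileCopyrightText: {year} {holder}\n"
        f"{comment} SPDX-License-Identifier: {spdx_id}\n"
    )


def find_spdx_id(text):
    """Return the SPDX identifier tagged in a text, or None if there is none."""
    marker = "SPDX-License-Identifier:"
    for line in text.splitlines():
        if marker in line:
            return line.split(marker, 1)[1].strip()
    return None


# Hypothetical holder and year, chosen only for illustration.
header = spdx_header("MIT license", "the KM3NeT collaboration", 2020)
print(header)
print(find_spdx_id(header))
```

A header of this kind travels with the file, so tools can read the license without parsing the full license text, which would still be distributed alongside the product.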

Licensing by product

As there need not be a “one-size-fits-all” approach to licensing, licenses should be considered according to the actual use of the KM3NeT product and the related circumstances.

Software licensing

For software, the very widely used MIT License is a compact license that allows reuse under the condition that the license text is passed on with all copies and derived software, while disclaiming all warranty. Although it is not provided by a central organization in versions for various jurisdictions, its wide use makes it well established in practically all jurisdictions.

As community use does not indicate any preferred open software license, three different options are considered at this point: Apache 2.0, 3-clause BSD and MIT. Discussion Issue #4

Recommendation: MIT license, 3-clause BSD license or Apache 2.0, according to the decision by the Software Working Group

Documentation and supplementary material

For supplementary material, the Creative Commons Attribution License is widely used and comes in all necessary flavours. It grants all baseline rights such as redistribution and modification, but under conditions like author attribution. The CC 4.0 International licenses act as international licenses by incorporating clauses allowing for deviation from the license according to national regulations.

Recommendation: CC-BY 4.0 International license or later

Note that one might consider other licensing for promotional material of KM3NeT, see Discussion Issue #3

Data and databases

For databases, following the above principles and opting for a license that is specific to data, a license by Open Data Commons including attribution is recommended. Discussion Issue #1. For data, the use of CC-BY is recommended, as it is already offered as a default option within the Virtual Observatory software and is widely used for datasets.

Recommendation for databases: ODC-By

Recommendation for data: CC-BY 4.0 International license

Roadmap to Licensing in KM3NeT

Current status in the KM3NeT collaboration

Currently, various licenses have been applied to KM3NeT publications (e.g. on Zenodo), software packages (e.g. on Git) and supplementary material on webpages and social media. In addition, the copyright holder named varies widely, from individuals to institutes or to “the KM3NeT collaboration”.

Starting with licenses

  • The use of standard licenses and the citation of the copyright holder should be agreed upon by the Institute Board of KM3NeT.

  • Short guidelines on how to apply the chosen licenses will be made available.

  • For future approved publications of research results, data and software, authors should be encouraged to apply the licenses.

  • Where relicensing is appropriate or no license has yet been assigned to the research product, authors are encouraged to also apply the licenses retroactively.

Attribution

Following the considerations on creators in the KM3NeT context outlined above, copyright notices should cite the actual copyright holder. In most cases this is the contracting institute rather than the creator/researcher, or several institutes for joint efforts, or the full list of KM3NeT members and institutes for high-level products into which the full collaborative effort was invested.

Note that the copyright holder will in most cases not be identical to the author and/or contributors, as the rights to material dedicated to KM3NeT are mostly governed by the employment relationship between the creator and their institute. For practical reasons, however, it is recommended to attribute the copyright to “the KM3NeT collaboration” where possible, and to link this, for completeness, to the full list of authors and member institutes at the time of licensing. In addition, (primary) authors and contributors should be listed separately from the license.

For the full list of authors and institutes, a file citing the current author list should be made available for each year and linked to the license; a first version is available here.

It is also recommended that licensing to the KM3NeT Collaboration follows internal reviewing processes, e.g. approval by the Publication Committee for open access material and an internal review of software and data by an open science committee, which has yet to be established. Discussion Issue #5

1

Indicator RDA-R1.1, in RDA (2020): FAIR Data Maturity Model: specification and guidelines, DOI: 10.15497/rda00045

2

e.g. by the WTO TRIPS agreement and the Berne Convention, which regulate international copyright

3

Kreutzer, T., Lahmann, H., Rechtsfragen bei Open Science, 2019, https://dx.doi.org/10.15460/HUP.195

4

Ball, A. (2014). ‘How to License Research Data’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides

5

Including a Commission Recommendation on the management of intellectual property.