Data guidelines - F1000Research


Data Guidelines

1. Background

1.1 Open Data Policy
1.2 FAIR Data Principles

2. Share Your Data in 3 Steps

2.1 Prepare Your Data for Sharing
2.2 Select a Repository
2.3 Add a Data Availability Statement to Your Manuscript

1. Background

This page explains which data you need to include when publishing an article in F1000Research, where your data can be stored, and how your data should be presented. In accordance with our data policies, authors are required on submission either to submit their data or to provide details of where the data are hosted, except where ethical, data protection or confidentiality considerations prevent this. Please note that adherence to our data policies is compulsory only for articles, not for posters or slides.

Many journals and publishers have confirmed that they welcome research articles reporting analyses and conclusions based on previously published datasets: they do not consider the publication of a dataset with a DOI and associated protocol information to be a ‘prior publication’ that would preclude the subsequent publication of new results obtained from that dataset.

1.1 Open Data Policy

F1000Research advocates an Open Data policy. All articles should be submitted together with the data underlying the results and details of any software used to process them. It is essential that others can access the raw data so that they can replicate your study and analysis and, in some circumstances, reuse the data. Publishing your data also clearly establishes that you did the work first. Anyone who then reuses your data in their own studies will be required to cite it (your data can be cited separately from your article if appropriate). Failure to provide data openly without good justification is likely to result in your article being rejected.

Exceptions: We recognise that openly sharing data may not be feasible in some cases, either because of ethical, data protection or confidentiality considerations, or because the data have been obtained from a third party and access restrictions apply.

1.2 FAIR Data Principles

F1000Research endorses the FAIR Data Principles, alongside an Open Data policy, as a framework to promote the broadest reuse of research data.

Findable

For data to be reused, they must be findable. To ensure that others can find your data, we ask that they be hosted by a stable and recognised open repository (where it is safe to do so) and assigned a globally unique persistent identifier (such as a DOI). Using such a repository and identifier helps ensure that your dataset remains available to both humans and machines in a usable form.

To aid discoverability, data should also be described using appropriate metadata. The content and format of metadata are often guided by a specific discipline and/or repository through the use of a metadata standard. When depositing data in a repository, fill in as many fields as possible, as this information usually contributes to the metadata record(s). In some cases, particularly when using a discipline-specific repository, metadata files may need to be submitted alongside the data.

For practical guidance please see Select a Repository.

Accessible

Data accessibility is defined by the presence of a user license. Data supporting F1000Research articles should be openly published under the CC0 licence, which facilitates data reuse. For software and source code, we strongly advise the use of an OSI-approved license.

However, we recognise that openly sharing data may not be feasible in some cases (due to ethical or confidentiality considerations). In these cases, we have policies in place that allow papers associated with such data to be published while maintaining the appropriate level of security.

For practical guidance, please see Add a Data Availability Statement to Your Manuscript.

Interoperable

Interoperable data can be compared and combined with data from different sources by both humans and machines – promoting integrative analyses. To bolster interoperability, data supporting F1000Research articles should be stored in a non-proprietary open file format and described using a standard vocabulary (where available). In some cases, the preferred file formats and vocabularies will be dictated by the repository you choose to host your data.

For practical guidance, please see Prepare Your Data for Sharing.

Reusable

Data that is findable, accessible, and interoperable is generally fit for reuse. On occasion, the inclusion of additional documentation alongside the data may be required to ensure that the data are understandable and thus reusable. As a general rule, someone who is not familiar with the data should be able to understand what it is about using only the metadata and documentation provided.

By extension, the same practices that enable data reuse also support reproducibility.

2. Share Your Data in 3 Steps

2.1 Prepare Your Data for Sharing

Before you begin, we strongly suggest that you consult FAIRsharing.org for details of data standards specific to the topic of your research. Depending on your field of study, standards may already exist that will help guide how your data should be structured, formatted and annotated.

When sharing data involving human participants, authors must ensure that all datasets have been de-identified in accordance with the Safe Harbor method before submission.
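The Safe Harbor method is defined in the US HIPAA Privacy Rule and requires the removal of 18 categories of identifiers. As a rough illustration only, the Python sketch below applies four of its transformations to a single record: it drops direct identifiers, keeps only the year of a date, truncates ZIP codes to their first three digits and groups ages over 89 into one category. The function and column names (`deidentify_record`, `visit_date` and so on) are hypothetical, and the sketch is not a substitute for a full de-identification review; in particular, it does not apply the Safe Harbor rule that three-digit ZIP areas with 20,000 or fewer residents must be replaced with 000.

```python
from datetime import date

# Hypothetical column names: adapt these sets to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "medical_record_number"}
DATE_FIELDS = {"visit_date"}


def deidentify_record(record: dict) -> dict:
    """Apply a few Safe Harbor-style transformations to one record.

    Illustrative only: covers a small subset of the 18 Safe Harbor
    identifier categories and skips the ZIP-code population check.
    """
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field in DATE_FIELDS and isinstance(value, date):
            cleaned[field] = value.year  # keep only the year of a date
        elif field == "zip":
            # ZIP codes are assumed to be stored as strings so leading zeros survive
            cleaned[field] = str(value)[:3]
        elif field == "age" and isinstance(value, int) and value > 89:
            cleaned[field] = "90+"  # ages over 89 are grouped together
        else:
            cleaned[field] = value
    return cleaned


record = {"name": "A. Participant", "email": "a@example.org",
          "visit_date": date(2020, 5, 17), "zip": "02139", "age": 93, "score": 12}
print(deidentify_record(record))
# {'visit_date': 2020, 'zip': '021', 'age': '90+', 'score': 12}
```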

If you are submitting files directly to F1000Research (an option reserved for small datasets for which no appropriate subject-specific repository exists), please label all files clearly so that readers can understand the contents of each file and how the files differ. Similar files (e.g. groups of FASTA files) may be grouped together; if you do this, please tell us how the files should be grouped and include data legends at the end of the main article file. For each file/group, please provide:

A single short title describing the content of the files;
A more detailed legend describing each dataset, making clear that the files are distinct and downloadable and explaining any acronyms used in the dataset. Legends should be placed at the end of the manuscript, after any figure legends, and not within the dataset itself.

Please also number each dataset (e.g. Dataset 1, Dataset 2) and cite them in the text in the same way as standard tables and figures.

2.1.1 Spreadsheet data

To increase the accessibility and reusability of spreadsheet data (i.e. large tables or raw data), spreadsheets should follow these best practices:

Give each column a descriptive heading.
Use a single header row.
Start the table in the first cell (A1).
Include a title and a legend to describe each spreadsheet (please put this at the end of the accompanying manuscript file, after any figure legends).
Save each data file with a name that appropriately reflects the content of that file.
Submit each table that is part of the dataset as a separate file.
Submit each worksheet as a separate file.

DO NOT

Embed charts, comments or tables within a spreadsheet.
Use color coding (machine-based data mining cannot interpret this).
Include special (i.e. non-alphanumeric) characters, including commas, within the spreadsheet.
Use merged cells.
Submit multiple worksheets within a single spreadsheet file (such as a multi-sheet Microsoft Excel workbook), as CSV and TAB formats do not support them.

Spreadsheets should be submitted in CSV or TAB format, except where the spreadsheet contains variable labels, code labels or defined missing values; such spreadsheets should be submitted in SAV, SAS or POR format, with the variables defined in English.
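Some of these rules can be checked automatically once a table has been exported. The Python sketch below, which uses only the standard `csv` module, reads TAB-delimited text and reports an empty A1 cell, missing or duplicate column headings, rows whose length differs from the header, and cells containing commas. The function name `check_tab_table` is hypothetical; the sketch checks only commas rather than every non-alphanumeric character, and it does not cover rules such as colour coding, merged cells or embedded charts, which cannot be represented in a plain-text TAB file anyway.

```python
import csv
import io


def check_tab_table(text: str) -> list[str]:
    """Report problems in a TAB-delimited table against some of the rules above."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    # The single header row should start in cell A1
    if not header or not header[0].strip():
        problems.append("cell A1 is empty: the header should start in the first cell")
    # Every other column needs its own descriptive heading
    for col, name in enumerate(header[1:], start=2):
        if not name.strip():
            problems.append(f"column {col} has no heading")
    if len(set(header)) != len(header):
        problems.append("column headings are not unique")
    # Data rows should line up with the header row and avoid commas
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no} has {len(row)} fields, expected {len(header)}")
        for cell in row:
            if "," in cell:
                problems.append(f"row {row_no} contains a comma: {cell!r}")
    return problems


sample = "gene\tcount\nBRCA1\t12\nTP53\t7,5\n"
for problem in check_tab_table(sample):
    print(problem)
# row 3 contains a comma: '7,5'
```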

2.1.2 Software source code

All articles should include details of any software required to view the datasets described or to replicate the analysis. For all software used, please state the version, where the software can be accessed, and any variable parameters that could affect the results.
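If your analysis was carried out in Python, one way to capture the version information requested above is to record it programmatically when the analysis runs. The sketch below uses the standard-library `importlib.metadata` module; the function name `software_versions`, the package list and the parameter values are illustrative assumptions, not requirements of F1000Research.

```python
import json
import platform
from importlib import metadata


def software_versions(packages: list[str]) -> dict[str, str]:
    """Return the Python version and the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Hypothetical package list and analysis parameters: record whichever
# tools and settings could change the outcome of your own analysis.
report = {
    "software": software_versions(["numpy", "pandas"]),
    "parameters": {"random_seed": 42, "p_value_threshold": 0.05},
}
print(json.dumps(report, indent=2))
```

Saving this report alongside the results gives reviewers the versions and parameters in one place.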

Where software has been coded by the authors of the paper, the source code should be made available. If ethical or privacy considerations prevent the source code from being made available, please contact the editorial team.

2.2 Select a Repository

Where it is possible to do so, data should be deposited in a stable and recognised open repository under a CC0 license prior to manuscript submission. Please check that the DOI(s) and/or accession number(s) you provide us are publicly available.
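Before submitting, it can help to confirm that each DOI is well formed and then open its resolver URL in a private (logged-out) browser window to check that the record is publicly visible. The Python sketch below performs only the syntax step, using a regular expression that Crossref has suggested for matching most modern DOIs; some older, valid DOIs will not match it. The function name `doi_resolver_url` is hypothetical, and passing the check does not prove that a DOI is registered or public.

```python
import re

# A pattern Crossref has suggested for most modern DOIs:
# "10." + a 4-9 digit registrant code + "/" + a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def doi_resolver_url(text: str) -> str:
    """Normalise a DOI and return its https://doi.org/ resolver URL."""
    doi = text.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a well-formed DOI: {text!r}")
    return "https://doi.org/" + doi


print(doi_resolver_url("doi:10.5256/f1000research.4591.d34639"))
# https://doi.org/10.5256/f1000research.4591.d34639
```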

F1000Research strongly encourages the use of community-recognised repositories. For some data types, such as genetic sequences and protein structures, it is essential that the data are deposited in GenBank and the Protein Data Bank, respectively. For X-ray crystal structures, please submit your validation reports to F1000Research alongside your manuscript.

Where a community-recognised repository does not exist, prepare the files according to the guidelines above and submit to a general data repository. Alternatively, small datasets (such as those composed solely of CSV files) can be hosted and published directly by F1000Research. Please include descriptive legends and, where applicable, coding schemas alongside your datasets.

Some types of data benefit from visualization within the article. F1000Research welcomes the submission of manuscripts featuring Plot.ly interactive figures and Code Ocean compute capsules; for further details, please contact us. Videos and images can be displayed through a widget provided by the Figshare repository. If you think your dataset should be displayed through the Figshare viewer, please do not submit your data directly to Figshare; instead, include the datasets with your submission, and we will advise whether such visualization is suitable for your data.

2.2.1 Non-exhaustive list of F1000Research-approved repositories

Below is a list of repositories that have already been approved for hosting data alongside an F1000Research article.

If you are an author who wishes to use a repository not already on this list, please contact us. If you manage a repository and would like to be included on the list, please complete our Repository Evaluation form and return it to us.

General data, research materials and supporting documents

Data Type	Where to submit*	What to include in the data availability section of your article
Any, but especially data in SAV and POR formats	Dataverse	Title, DOI
Any	Figshare$	Title, DOI
Any, but especially deposits with mixed data, materials and documents	Open Science Framework†	Title, DOI
Any, but especially deposits with mixed data and code	Zenodo	Title, DOI
Deposits of mixed data and code	Code Ocean	Title, DOI, embed code for interactive reanalysis tool
Any biological data, but especially data linked to studies in other databases	BioStudies	Title, accession number

* Please note that many repositories have a limit on the size (usually 2 or 5 GB) of single file uploads and charge for larger data files.
$ If you think your data are suitable for visualization within your article through the Figshare viewer, please send your data to us first (via our article submission form), so we can advise (please note that we may have to charge an additional fee for data files greater than 5 GB).
† Deposits must be made public.
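Because many repositories cap single-file uploads (usually at 2 or 5 GB), it can be worth listing oversized files before you start a deposit. The Python sketch below walks a folder with `pathlib` and reports every file above a chosen limit. The function name `files_over_limit` is hypothetical, and the sketch assumes 1 GB = 1024³ bytes; check how your chosen repository defines its limit, since some use 10⁹ bytes.

```python
from pathlib import Path

# Assumption: 1 GB = 1024**3 bytes; some repositories use 10**9 bytes instead.
GB = 1024 ** 3


def files_over_limit(folder, limit_bytes=2 * GB):
    """List (relative path, size in bytes) for each file larger than limit_bytes."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    oversized = []
    for path in sorted(root.rglob("*")):  # includes files in subfolders
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized
```

For example, `files_over_limit("my_dataset", limit_bytes=5 * GB)` would list files above a 5 GB threshold; `my_dataset` is a placeholder folder name.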

3D-printable models

Data Type	Where to submit	What to include in the data availability section of your article
All 3D-printable models (including molecular, cellular, medical/anatomical and labware models)	NIH 3D Print Exchange	Title, model ID, URL

Environmental and ecological data

Data Type	Where to submit	What to include in the data availability section of your article
Complex environmental and ecological data	The Knowledge Network for Biocomplexity*	Title, DOI
Environmental data collected by NERC-funded researchers	NERC data centres	Data centre name, title and DOI
Geospatial	PANGAEA	Title, DOI

* Data entries must be made public.

Sequence and omics data

Data Type	Where to submit	What to include in the data availability section of your article
Expression and sequence data (including nucleotide/protein sequence, microarray, SNP/SNV, GWAS, phenotype or sequence-based reagent data); systems and chemical biology data (including chemical entities, chemical reactions, computational models, metabolic profiles or molecular interactions)	Any appropriate NCBI- or EBI-based repository*	Accession number(s). For SNP/SNV data, please provide HGVS name(s), local ID(s) and rs/ss number(s)
Metabolomic data	Metabolomics Workbench$	Project DOI, Study ID
Proteomic data	PeptideAtlas$	Accession number(s)

* Some higher-level repositories, such as BioProject and BioStudies, provide access to data deposited in various archival databases. In these cases, please cite the accession numbers that are assigned to the data submissions by the archival databases in addition to the higher-level identifier.
$ Or any appropriate NCBI- or EBI-based repository, see above.

Health data (restricted access to protect anonymity of participants)

Data Type	Where to submit	What to include in the data availability section of your article
Addiction and HIV data	National Addiction & HIV Data Archive Program	Title, DOI
Cancer imaging	Cancer Imaging Archive	Title, DOI

Macromolecule structures

Data Type	Where to submit	What to include in the data availability section of your article
3D protein structures	Protein Data Bank	PDB number
Crystallography*	Crystallography Open Database	COD ID
X-ray images	Coherent X-ray Imaging Data Bank	Title, DOI

* X-ray crystallography validation reports should be submitted (as a PDF) directly to F1000Research via the submission system.

Neuroimaging data

Data Type	Where to submit	What to include in the data availability section of your article
Raw fMRI datasets	OpenNeuro	Title and accession number(s)
MRI and PET unthresholded statistical maps	NeuroVault*	Title and URL (which includes a unique data ID)

* Please note that authors will still be expected to deposit their raw neuroimaging data in an appropriate repository. Also, once data have been submitted to NeuroVault, administrative rights will be transferred to F1000Research. This is necessary to ensure the stability of the dataset; the transfer does not affect the CC0 licence assigned to all NeuroVault submissions.

Software & source code

Data Type	Where to submit	What to include in the data availability section of your article
Latest source code	GitHub, BitBucket, SourceForge or Google Code	URL
Archived source code	Zenodo	Title, DOI and licence* used
Deposits of mixed data and code	Code Ocean	Title, DOI, embed code for interactive reanalysis tool
Software	Authors may host software where they wish, though it is strongly recommended to use a stable URL	URL

* An open licence must be assigned and we strongly advise authors to use an OSI-approved license.

F1000Research can accept papers with the underlying data being hosted by an approved institutional data repository, such as Edinburgh DataShare. For other institutional repositories, please contact us if you would like to discuss further.

2.3 Add a Data Availability Statement to Your Manuscript

All articles must include a Data Availability statement, even where there is no data associated with the article. This statement should be added to the end of the manuscript prior to submission. The Data Availability statement should not refer readers or referees to contact an author to obtain the data, but should instead include the applicable details listed below.

No associated or additional data

For articles which have no associated data, the statement should read:

“No data is associated with this article.”

For articles where all associated data are presented in the article itself, please include the statement:

“All data underlying the results are available as part of the article and no additional source data are required.”

Repository-hosted data

Where underlying and/or extended data are hosted in a repository, please include the name of the repository and the license, along with the details listed in the ‘What to include in the data availability section of your article’ column of the tables above. For example:

Repository: Dataset 1. Manually annotated miRNA-disease and miRNA-gene interaction corpora, 10.5256/f1000research.4591.d34639

License: CC0 1.0 Universal
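When several datasets are hosted in repositories, generating the statement from a structured list can keep the entries consistent. The Python sketch below reproduces the layout of the example above; the class and function names are hypothetical, and the details required for each repository still follow the ‘What to include’ column of the tables.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    repository: str   # name of the repository hosting the data
    title: str        # dataset title as given in the repository
    identifier: str   # DOI or accession number
    licence: str      # licence under which the data are published


def format_availability_statement(entries: list[DatasetEntry]) -> str:
    """Lay out repository-hosted datasets in the style of the example above."""
    blocks = []
    for entry in entries:
        blocks.append(f"{entry.repository}: {entry.title}, {entry.identifier}")
        blocks.append(f"License: {entry.licence}")
    return "\n\n".join(blocks)


example = DatasetEntry(
    repository="Repository",
    title="Dataset 1. Manually annotated miRNA-disease and miRNA-gene interaction corpora",
    identifier="10.5256/f1000research.4591.d34639",
    licence="CC0 1.0 Universal",
)
print(format_availability_statement([example]))
```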

Each dataset mentioned in the manuscript, including those in the Data Availability statement, must also be referenced using a formal data citation.

For more information on how to structure the Data Availability section, please see our Article Guidelines.

Data that cannot be shared

Ethical and security considerations

If data access is restricted for ethical or security reasons, please include a description of the restrictions on the data and all necessary information required for a reader or referee to apply for access to the data and the conditions under which access will be granted.

Data protection issues

Where human data cannot be sufficiently de-identified, please include: an explanation of the data protection concern; what, if anything, the relevant Institutional Review Board (IRB) or equivalent said about data sharing; and, where applicable, all necessary information required for a reader or referee to apply for access to the data and the conditions under which access will be granted.

Large data

Where data is too large to be feasibly hosted by an F1000Research-approved repository, please include all necessary information required for a reader or referee to access the data alongside a description of this process.

Data under license by a third party

In cases where data has been obtained from a third party and restrictions apply to the availability of the data, the manuscript must include: all necessary information required for a reader or referee to access the data by the same means as the authors; and publicly available data that is representative of the analysed dataset and can be used to apply the methodology described in the manuscript (please see Repository-hosted data above).

If you are unable to share your data for any reason not included here, or have additional questions about data sharing, please let our editorial team know and we will be happy to advise.
