The Bioconductor project promotes high-quality, well-documented, and interoperable software. These guidelines help to achieve this objective; they are not meant to put undue burden on package authors, and authors having difficulty satisfying the guidelines should seek advice on the bioc-devel mailing list.
Package maintainers are urged to follow these guidelines as closely as possible when developing Bioconductor packages.
General instructions for producing packages can be found in the Writing R Extensions manual, available from within R (RShowDoc("R-exts")) or on the R web site.
Most packages contributed by users are software packages that perform analytic calculations. Users also contribute annotation and experiment data packages.
Annotation packages are database-like packages that provide information linking identifiers (e.g., Entrez gene names or Affymetrix probe ids) to other information (e.g., chromosomal location, Gene Ontology category).
Experiment data packages provide data sets that are used, often by software packages, to illustrate particular analyses. These packages contain curated data from an experiment, teaching course or publication and in most cases contain a single data set. Collections of related data sets can be hosted in the ExperimentHub resource. Instructions for adding data to ExperimentHub are in the ExperimentHubData vignette.
An excellent practice is to develop a software package, and to provide or use an existing experiment data package to give a comprehensive illustration of the methods in the software package. If the data files of a package are larger than 100 MB but less than 2 GB, Bioconductor now supports the use of Git Large File Storage (Git LFS) during package contribution. Please be aware that Git LFS is free for all users up to 1 GB of data and a monthly usage of 1 GB of bandwidth; more data and bandwidth can be purchased at the contributor's expense.
The guidelines below apply to all packages, but annotation and experiment data packages are not required to conform to the space limitations of software packages. Developers wishing to contribute annotation or experiment data packages should seek additional support associated with package submission.
Package developers should always use the devel version of Bioconductor when developing and testing packages to be contributed.
Depending on the R release cycle, using Bioconductor devel may or may not also involve using the devel version of R. See the how-to on using the devel version of Bioconductor for up-to-date information.
Bioconductor packages must pass R CMD build (or R CMD INSTALL --build) and pass R CMD check with no errors and no warnings using a recent R-devel.
Authors should also try to address all notes that arise during build or check.
Packages must also pass R CMD BiocCheck with no errors and no warnings. The BiocCheck package is a set of tests that encompass Bioconductor Best Practices. Every effort should be made to address any notes that arise during this build or check.
Do not use filenames that differ only in case, as not all file systems are case sensitive.
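One way to spot such clashes before submission is to compare lower-cased file names. The following is a minimal sketch; `find_case_conflicts` is a hypothetical helper name, not part of R or Bioconductor.

```r
# Return groups of file names that differ only in letter case.
# `find_case_conflicts` is a hypothetical helper, not part of any package.
find_case_conflicts <- function(files) {
    key <- tolower(files)
    dup_keys <- unique(key[duplicated(key)])
    lapply(dup_keys, function(k) files[key == k])
}

# Typical use from the top-level directory of a package source tree:
# find_case_conflicts(list.files(".", recursive = TRUE))
```

An empty list means no two file names collide on a case-insensitive file system.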
The source package resulting from running R CMD build should occupy less than 4 MB on disk. The package should require less than 5 minutes to run R CMD check --no-build-vignettes. Using the --no-build-vignettes option ensures that the vignette is built only once.
Vignette and man page examples should not use more than 3 GB of memory since R cannot allocate more than this on 32-bit Windows.
These requirements are the minimum for package acceptance; packages will still be subject to the other guidelines below and a formal technical review by a Bioconductor team member.
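The size limit can be checked from R once the source tarball has been built. This is a sketch only: `tarball_under_limit` is a hypothetical helper, and it assumes that "4 MB" means 4 × 1024^2 bytes.

```r
# Check that a built source tarball is under a size limit (in MB).
# `tarball_under_limit` is a hypothetical helper; the 4 MB default
# mirrors the guideline above and assumes 1 MB = 1024^2 bytes.
tarball_under_limit <- function(path, limit_mb = 4) {
    stopifnot(file.exists(path))
    size_mb <- file.size(path) / 1024^2
    message(sprintf("%s: %.2f MB (limit %g MB)",
                    basename(path), size_mb, limit_mb))
    size_mb < limit_mb
}

# e.g., after R CMD build (the file name is a placeholder):
# tarball_under_limit("MyPackage_0.99.0.tar.gz")
```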
Choose a descriptive name. An easy way to check whether your name is already in use is to check that the following command fails:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("MyPackage")
Avoid names that are easily confused with existing package names, or
that imply a temporal (e.g., ExistingPackage2) or qualitative (e.g.,
ExistingPackagePlus) relationship.
The “License:” field in the DESCRIPTION file should preferably refer to a standard license (see wikipedia) using one of R’s standard specifications. Be specific about any version that applies (e.g., GPL-2). Core Bioconductor packages are typically licensed under Artistic-2.0. To specify a non-standard license, include a file named LICENSE in your package (containing the full terms of your license) and use the string “file LICENSE” (without the double quotes) in the “License:” field of your DESCRIPTION file.
Packages must
Packages you depend on must be available via Bioconductor or CRAN; users and the automated build system have no way to install packages from other sources.
Reuse, rather than re-implement or duplicate, well-tested functionality from other packages. Specify package dependencies in the DESCRIPTION file, listed as follows:
- Depends: for example, the GenomicRanges package is listed in the Depends: field of GenomicAlignments. It is unusual for more than three packages to be listed as ‘Depends:’.
- Enhances: packages such as Rmpi or parallel that enhance the performance of your package, but are not strictly needed for its functionality.
A package may rarely offer optional functionality, e.g., visualization with rgl when that package is available. Authors then list the package in the Suggests field, and use requireNamespace() (or loadNamespace()) to condition code execution. Functions from the loaded namespace should be accessed using :: notation, e.g.,
x <- sort(rnorm(1000))
y <- rnorm(1000)
z <- rnorm(1000) + atan2(x, y)
if (requireNamespace("rgl", quietly=TRUE)) {
    rgl::plot3d(x, y, z, col=rainbow(1000))
} else {
    ## code when "rgl" is not available
}
This approach does not alter the user search() path, and ensures
that the necessary function (plot3d(), from the rgl package) is
used. Such conditional code increases complexity of the package and
frustrates users who do not understand why behavior differs between
installations, so is often best avoided.
Re-use existing functionality, especially for S4 input methods and S4 classes. This encourages interoperability and simplifies your own package development.
If your data requires a new representation or function, carefully design an S4 class or generic so that other package developers with similar needs will be able to re-use your hard work, and so that users of related packages will be able to seamlessly use your data structures. Do not hesitate to ask on the Bioc-devel mailing list for advice. Be sure to implement the essential S4 interface.
Implement a constructor (typically a simple function) if the user is supposed to be able to create an instance of your class. Write short accessors (functions or methods) if the user needs to extract from or assign to slots in the class. Constructors and accessors help separate the interface seen by the user from the implementation details relevant to the developer.
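As a minimal sketch of this interface, the following defines a small S4 class with a validity check, a constructor, a getter, a setter, and a show method. The class name `ScoreSet`, its slots, and the accessor name `scores` are all hypothetical and chosen only for illustration.

```r
library(methods)

# Class definition: a hypothetical container for numeric scores.
setClass("ScoreSet",
    slots = c(scores = "numeric", label = "character"))

# Validity check, run by new() and validObject().
setValidity("ScoreSet", function(object) {
    if (length(object@label) != 1L)
        "'label' must be a single string"
    else
        TRUE
})

# Constructor: a plain function that hides the call to new().
ScoreSet <- function(scores = numeric(), label = "unnamed") {
    new("ScoreSet", scores = as.numeric(scores), label = label)
}

# Accessor (getter) for the 'scores' slot.
setGeneric("scores", function(x, ...) standardGeneric("scores"))
setMethod("scores", "ScoreSet", function(x, ...) x@scores)

# Replacement method (setter); re-validates the modified object.
setGeneric("scores<-", function(x, ..., value) standardGeneric("scores<-"))
setReplaceMethod("scores", "ScoreSet", function(x, ..., value) {
    x@scores <- as.numeric(value)
    validObject(x)
    x
})

# show method: controls how the object is displayed to the user.
setMethod("show", "ScoreSet", function(object) {
    cat("ScoreSet '", object@label, "' with ", length(object@scores),
        " scores\n", sep = "")
})
```

Users then write `s <- ScoreSet(c(1, 2, 3), "demo")` and `scores(s)` without touching slots directly, so the internal representation can change later without breaking their code.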
The following layout is sometimes used to organize classes and methods; other approaches are possible and acceptable.
For example, show methods would go in R/show-methods.R.
A Collate: field in the DESCRIPTION file may be necessary to order class and method definitions appropriately during package installation.
Many R operations are performed on the whole object, not just the elements of the object (e.g., sum(x), not x[1] + x[2] + …). In particular, relatively few situations require an explicit for loop. See the Vectorize section of Robust and Efficient Code for additional detail. See also Coding Style for advice on common coding syntax.
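The contrast can be illustrated with a short sketch on simulated data; the variable names are arbitrary.

```r
set.seed(1)
x <- rnorm(1e4)

# Explicit loop: works, but is verbose and slow for large inputs.
total_loop <- 0
for (xi in x) {
    total_loop <- total_loop + xi
}

# Vectorized: one call operates on the whole vector.
total_vec <- sum(x)

# Element-wise arithmetic and logical comparison are vectorized too.
squared <- x^2
n_positive <- sum(x > 0)

# The two totals agree up to floating-point rounding.
isTRUE(all.equal(total_loop, total_vec))
```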
Packages that rely on access to web resources need to be written carefully. Web resources can change location, can be temporarily unavailable, or can be very slow to access and retrieve. Functions that query web resources should anticipate and handle such situations gracefully, failing quickly and clearly when the resource is not available in a reasonable time frame. See Querying Web Resources for additional detail and examples of robust web-query functions.
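A minimal sketch of this pattern, using only base R, is shown here. The function name `read_remote_lines` and its `timeout_sec` argument are hypothetical; the sketch treats any warning raised while reading as a failure, which may be stricter than some packages want.

```r
# Read lines from a URL (or file path), failing quickly and clearly
# if it cannot be retrieved. `read_remote_lines` is a hypothetical name.
read_remote_lines <- function(url, timeout_sec = 10) {
    # The 'timeout' option applies to R's internet connections;
    # restore the previous value when the function exits.
    old <- options(timeout = timeout_sec)
    on.exit(options(old), add = TRUE)

    fail <- function(cond) {
        stop("resource '", url, "' could not be retrieved: ",
             conditionMessage(cond), call. = FALSE)
    }
    # Convert both errors and warnings into one clear error message.
    tryCatch(readLines(url, warn = FALSE), error = fail, warning = fail)
}
```

Callers see a single, informative error rather than a low-level connection message, and the time spent waiting is bounded by `timeout_sec`.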
We recommend using BiocParallel which
provides a consistent interface to the user and supports the major
parallel computing styles: forks and processes on a single computer,
ad hoc clusters, batch schedulers and cloud computing. By default,
BiocParallel chooses a parallel back-end appropriate for the OS and
is supported across Unix, Mac and Windows. Coding requirements for
BiocParallel are:
- Use lapply()-style iteration instead of explicit for loops.
- The FUN argument to bplapply() must be a self-contained function; all symbols used in the function are from default R packages, from packages require()’ed in the function, or passed in as arguments.
- Call bplapply() without specifying BPPARAM; the user can then override the default choice with BiocParallel::register().
For more information see the BiocParallel vignette.
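The coding requirements for BiocParallel can be sketched as follows. `parallel_sqrt` is a hypothetical function name; the sketch assumes the BiocParallel package is installed.

```r
# Apply sqrt() to each element in parallel. The worker function is
# self-contained: it uses only its argument and base R.
# `parallel_sqrt` is a hypothetical function name.
parallel_sqrt <- function(x) {
    # BPPARAM is deliberately not specified, so the registered
    # default back-end is used.
    result <- BiocParallel::bplapply(x, function(xi) sqrt(xi))
    unlist(result)
}

# A user can override the default back-end before calling the function:
# BiocParallel::register(BiocParallel::SerialParam())
# parallel_sqrt(1:4)
```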
- message() communicates diagnostic messages (e.g., progress during lengthy computations) during code evaluation.
- warning() communicates unusual situations handled by your code.
- stop() indicates an error condition.
- cat() or print() are used only when displaying an object to the user, e.g., in a show method.
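A short sketch of how these functions divide the work inside one function; `scale_counts` and its arguments are hypothetical.

```r
# Hypothetical function showing each signalling mechanism.
scale_counts <- function(counts, verbose = TRUE) {
    # stop(): an error condition the function cannot recover from.
    if (!is.numeric(counts))
        stop("'counts' must be numeric")

    # warning(): an unusual situation that the code handles itself.
    if (anyNA(counts)) {
        warning("removing ", sum(is.na(counts)), " missing value(s)")
        counts <- counts[!is.na(counts)]
    }

    # message(): diagnostic output the user can silence.
    if (verbose)
        message("scaling ", length(counts), " value(s)")

    counts / sum(counts)
}
```

Because the diagnostics use message() rather than cat(), callers can wrap the call in suppressMessages() or suppressWarnings() when they want quiet output.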
Use dev.new() to start a graphics device if necessary. Avoid using x11() or X11(), because they can only be called on machines that have access to an X server.
A vignette demonstrates how to accomplish non-trivial tasks embodying
the core functionality of your package. There are two common types of
vignettes. A Sweave vignette is an .Rnw file that contains LaTeX and
chunks of R code. Each R code chunk starts with a line <<>>=, and ends
with @. Each chunk is evaluated during R CMD build, prior to LaTeX
compilation to a PDF document. An R markdown vignette is similar to
a Sweave vignette, but uses
markdown instead of
LaTeX for structuring text sections and resulting in HTML output. The
knitr package can process most Sweave and
all R markdown vignettes, producing pleasing output. Refer to
Writing package vignettes
for technical details. See the
BiocStyle package for a
convenient way to use common macros and a standard style.
A vignette provides reproducibility: the vignette produces the same results as copying the corresponding commands into an R session. It is therefore essential that the vignette embed R code between <<>>= and @; short-cuts (e.g., using a LaTeX verbatim environment, or using the Sweave eval=FALSE flag, or equivalent tricks in markdown) undermine the benefit of vignettes.
All packages are expected to have at least one vignette. Vignettes go
in the vignettes directory of the package. Vignettes are often used
as stand-alone documents, so best practices are to include an
informative title, the primary author of the vignette, the last
modified date of the vignette, and a link to the package landing page.
Appropriate citations must be included in help pages (e.g., in the see also section) and vignettes; this aspect of documentation is no different from any scientific endeavor. The file inst/CITATION can be used to specify how a package is to be cited.
Whether or not a CITATION file is present, an automatically-generated citation will appear on the package landing page on the Bioconductor web site. For optimal formatting of author names (if a CITATION file is not present), specify the package author and maintainer using the Authors@R field as described in Writing R Extensions.
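As a sketch, the Authors@R field holds an R expression built with person(), and an inst/CITATION file can contain a bibentry() call; both functions come from the utils package. Every name, email address, and bibliographic detail here is a placeholder.

```r
# Value for the Authors@R field in DESCRIPTION.
# Names and email address are placeholders.
authors <- c(
    person("Jane", "Doe", email = "jane.doe@example.org",
           role = c("aut", "cre")),   # author and maintainer
    person("John", "Roe", role = "aut")
)

# A possible entry for inst/CITATION.
# All bibliographic details are placeholders.
cite <- bibentry(
    bibtype = "Article",
    title   = "MyPackage: an example analysis package",
    author  = authors,
    journal = "Journal of Placeholder Results",
    year    = "2020"
)
```

In the DESCRIPTION file itself, only the `c(person(...), ...)` expression is written after `Authors@R:`.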
All Bioconductor packages use an x.y.z version scheme. The following rules apply:
When first submitted to Bioconductor, a package usually has version 0.99.0. For more details, see Version Numbering.
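A version string can be checked against the x.y.z scheme with a small sketch like the following; `is_xyz_version` is a hypothetical helper name, while package_version() is part of base R.

```r
# TRUE when a version string has exactly three numeric components (x.y.z).
# `is_xyz_version` is a hypothetical helper name.
is_xyz_version <- function(version) {
    grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)
}

is_xyz_version("0.99.0")   # typical version at first submission
is_xyz_version("1.2")      # only two components

# package_version() parses versions so that they compare
# component by component, not as text.
package_version("0.99.10") > package_version("0.99.9")
```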
If the package contains C or Fortran code, it should adhere to the standards and methods described in the System and foreign language interfaces section of the Writing R Extensions manual. In particular:
During package development, enable all warnings and disable optimizations. If you plan to use a debugger, tell the compiler to include debugging symbols. The easiest way to enforce these settings is to create a user-level Makevars file (in the user’s home directory, in a sub-directory called ‘.R’). See the examples below for flags for common toolchains. Consult the Writing R Extensions manual for details about Makevars files.
Example for gcc/g++:
CFLAGS=-Wall -Wextra -pedantic -O0 -ggdb
CXXFLAGS=-Wall -Wextra -pedantic -O0 -ggdb
FFLAGS=-Wall -Wextra -pedantic -O0 -ggdb
Example for clang/clang++:
CFLAGS=-Weverything -O0 -g
CXXFLAGS=-Weverything -O0 -g
FFLAGS=-Wall -Wextra -pedantic -O0 -g
Use of external libraries whose functionality is redundant with libraries already supported is strongly discouraged. In cases where the external library is complex the author may need to supply pre-built binary versions for some platforms.
By including third-party code a package maintainer assumes responsibility for maintenance of that code. Part of the maintenance responsibility includes keeping the code up to date as bug fixes and updates are released for the mainline third-party project.
For guidance on including code from some specific third-party sources, see the external code sources section of the C++ Best Practices guide.
Unit tests are highly recommended. We find them indispensable for both package development and maintenance. Examples and explanations are provided here.
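A minimal sketch of a unit test follows. The text above does not name a testing framework, so the use of testthat here is an assumption; `gc_content` is a hypothetical function under test.

```r
# Function under test (hypothetical): fraction of G or C bases.
gc_content <- function(seq) {
    bases <- strsplit(toupper(seq), "")[[1]]
    mean(bases %in% c("G", "C"))
}

# A testthat-style unit test; in a package this would typically live
# under tests/testthat/. The guard lets the sketch run where testthat
# is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
    testthat::test_that("gc_content handles simple sequences", {
        testthat::expect_equal(gc_content("GGCC"), 1)
        testthat::expect_equal(gc_content("atgc"), 0.5)
        testthat::expect_equal(gc_content("AATT"), 0)
    })
}
```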
You can submit an instructional video along with your package. In the DESCRIPTION file of your package, add a “Video:” line which contains the link to your video. We will then feature your video on our Bioconductor YouTube Channel.
Authors are strongly discouraged from placing their package into both CRAN and Bioconductor. This avoids burdening the author with extra work and confusing the user.
Acceptance of packages into Bioconductor brings with it ongoing responsibility for package maintenance. These responsibilities include:
- Add a BugReports: field to the DESCRIPTION file if reports should be directed to a particular web page rather than the package maintainer. You should register on the support site and edit your profile, changing the “Watched Tags” field to include all packages you maintain, so you will be notified when anyone posts a question about your package.
All authors mentioned in the package DESCRIPTION file are entitled to modify package source code. Changes to package authorship require consent of all authors.