

The Compact Muon Solenoid (CMS) Collaboration at CERN is excited to announce the public release of the first batch of high-level, analysable and open data from the Large Hadron Collider (LHC), recorded by the CMS detector. The datasets are available on the new CERN Open Data Portal and are being released into the public domain under the Creative Commons CC0 waiver, in keeping with CMS’s commitment to data preservation and open data. “We have a duty to society to do so,” says Tiziano Camporesi, CMS Spokesperson. “The scientific knowledge we produce is for everyone to share and we hope educational tools built on top of our data will inspire the next generation of scientists.”
The first batch corresponds to approximately half of the total data volume recorded in 2010, the first year of LHC operations. While the data are in a processed format that is suitable for analysis, they are still quite complex, and performing an analysis with them is difficult. It takes CMS scientists, working in groups and relying upon each other’s expertise, many months or even years to perform a single analysis, which must then be scrutinised by the whole collaboration before a scientific paper can be published. A first-time analysis typically takes about a year from the start of preparation to publication, not counting the six months it takes newcomers to learn the analysis software.
Acknowledging these challenges, CMS is providing basic documentation to accompany the data, in the form of some simple (open-source) analysis examples that let users familiarise themselves with the CMS analysis environment and get started with the data. The portal also hosts simplified analysis examples that can be used in a classroom environment, at both the high-school and the university level, as well as example applications that developers can learn from and build upon. Since the collaboration’s experts are devoted to new physics analyses, CMS has only limited resources, provided on a voluntary basis, for additional support.
Data samples and analysis tools used in the international Physics Masterclass exercise developed by QuarkNet, and in the CMS e-Lab run by I2U2, are suited to the high-school classroom setting. CMS collaborators have also developed applications that use CMS open data for university students. The HEP Tutorial by Christian Sander and Alexander Schmidt of Hamburg University provides an introduction to particle physics through an analysis of top quarks. The web-based VISPA tool developed at RWTH Aachen University is used by third-year undergraduate students to perform simple CMS analyses, such as calculating the mass of particles produced at the LHC.
“Several people have invested time and effort in bringing us this far,” adds Kati Lassila-Perini, CMS Data Preservation and Open Data coordinator, “but we know that this is just the beginning. The first of our open data are now available for the whole world to enjoy, and we look forward to hearing from developers and educators on how they are used!”
CMS would also like to thank the following in particular for their efforts in bringing our open data and tools to the portal: CERN’s IT Department, Scientific Information Service and PH-SFT group; Tom McCauley from the University of Notre Dame, USA; Adam Huffman and David Colling from Imperial College London, UK; Ana Rodríguez Marrero, Alicia Calderón Tazón and Jesús Marco from IFCA, CSIC-University of Cantabria, Spain; all the CMS Physics Object Group (POG) conveners; Andreas Pfeiffer and the CMS AlCaDB team; the QuarkNet programme, USA; and the Lapland University of Applied Sciences, Finland.
The following are provided through the portal:
2014-11-20, by Achintya Rao