Open Archives Initiative (OAI)
arXiv supports and participates in the Open Archives Initiative (OAI). arXiv is a registered OAI-PMH data-provider and provides metadata for all submissions; the metadata is updated each night shortly after new submissions are announced. Metadata for arXiv articles may be reused in non-commercial and commercial systems.
Notes for harvesters
- Base URL
arXiv supports OAI-PMH v2.0 at the baseURL
http://export.arxiv.org/oai2.
- Identify response and policies
Policy links and various other details are included in the Identify response.
- Item = Article
Each article in arXiv is modeled as an Item in the OAI-PMH interface. Only the most recent version of each article is exposed via this interface (some metadata formats include the version history).
- Metadata formats
Metadata for each item (article) is available in several formats, and all formats are supported for all articles. The available formats include:
oai_dc - Simple Dublin Core. See example in oai_dc format.
arXiv - arXiv-specific metadata format which includes author names separated out, category and license information. See example in arXiv format.
arXivRaw - arXiv-specific metadata format which is very close to the internal format stored at arXiv. Includes version history. See example in arXivRaw format.
You may request a list of all the metadata formats supported with the ListMetadataFormats verb.
- Datestamps
Every OAI-PMH metadata record has a datestamp associated with it, which is the last modification time of that record. Because arXiv has updated metadata records in bulk on several occasions, the OAI-PMH datestamp values do not correspond to the original submission or replacement times for older articles, and may not for newer articles because of administrative and bibliographic updates. The earliest datestamp is given in the <earliestDatestamp> element of the Identify response.
The OAI-PMH interface does not support selective harvesting based on submission date. The datestamps are designed to support incremental harvesting of updates on an ongoing basis. It is not possible to selectively harvest only, say, articles submitted in February 2001 (identifiers 0102.xxxx). Except for selective harvesting based on subject areas (see description of Sets below), the interface is designed to support copying and synchronization of a complete set of arXiv metadata. In order to harvest metadata for all articles, either make requests without a datestamp range (recommended), or make requests from the <earliestDatestamp> through to the present (but beware that because of bulk updates there are some dates on which there were large numbers of updates).
Once an initial harvest has been completed, the copy may be maintained by making incremental harvesting requests with the from date set to the date of the last harvest (from is best taken from the last server response; don't set the until date).
- Sets
Each archive is available for selective harvesting as a separate set. This means that there are sets for math, cs, nlin and q-bio. All the physics archives are exposed as sub-sets of a physics set. For example, just hep-th can be harvested by harvesting the set physics:hep-th. Alternatively, all physics archives can be harvested via the set physics, or all of arXiv can be harvested by not specifying a setSpec. You may request a list of all the sets supported with the ListSets verb.
- Update schedule
New papers are accepted daily and metadata is made available via the OAI-PMH interface by 10pm EST Sunday through Thursday.
- Play nice
arXiv uses 503 Retry-After replies to implement flow control; be sure to abide by these responses (see OAI-PMH: 3.5 Flow Control).
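The notes above can be sketched as a small harvester. This is a minimal illustration and not arXiv's own tooling: the helper names (build_request_url, retry_delay, fetch, harvest), the 30-second fallback delay and the retry limit are all assumptions made for the example. It uses only the Python standard library, builds requests against the baseURL given above, follows resumption tokens, and waits as instructed when the server answers 503 with a Retry-After header.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://export.arxiv.org/oai2"  # baseURL from the notes above
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH v2.0 XML namespace


def build_request_url(verb, **params):
    """Return an OAI-PMH request URL, skipping parameters that are None."""
    query = {"verb": verb}
    query.update({k: v for k, v in params.items() if v is not None})
    return BASE_URL + "?" + urllib.parse.urlencode(query)


def retry_delay(headers, default=30):
    """Seconds to wait per Retry-After (delta-seconds form only).

    The 30-second fallback is an assumption for this sketch.
    """
    value = str(headers.get("Retry-After", "")).strip()
    return int(value) if value.isdigit() else default


def fetch(url, max_tries=5):
    """Fetch a URL, sleeping and retrying whenever the server replies 503."""
    for _ in range(max_tries):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise
            time.sleep(retry_delay(err.headers))
    raise RuntimeError("gave up after repeated 503 responses")


def harvest(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Yield <record> elements from ListRecords, following resumption tokens."""
    url = build_request_url(
        "ListRecords",
        metadataPrefix=metadata_prefix,
        set=set_spec,
        **{"from": from_date},  # 'from' is a Python keyword, so pass it via a dict
    )
    while url:
        root = ET.fromstring(fetch(url))
        yield from root.iter(OAI_NS + "record")
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is not None and token.text:
            # A resumptionToken is an exclusive argument: send it on its own.
            url = build_request_url("ListRecords", resumptionToken=token.text)
        else:
            url = None
```

For example, harvest("arXiv", set_spec="physics:hep-th") would iterate over records in the hep-th set, and passing from_date on later runs gives an incremental harvest. In line with the notes above, the sketch never sets an until date.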
Chronology
- 12 April 2007
- The arXiv OAI baseURL changed to http://export.arxiv.org/oai2 from http://arxiv.org/oai2. The old URL will issue a redirect for some time, but please update your harvester to use the new baseURL.
- 1 April 2007
- Support for the long-deprecated OAI-PMH v1.1 at baseURL http://arXiv.org/oai1 has been discontinued. Please use our v2.0 interface instead.
- 29 December 2006
- arXiv Dienst interface disabled. The Dienst protocol was replaced by OAI-PMH, and arXiv's Dienst interface had not been used regularly by any service for a few years, and not at all in the last few months.
- 2 July 2003
- Open Archives Initiative Protocol for Metadata Harvesting v2.0 is released. arXiv supports both OAI-PMH v1.1 and v2.0; v1.1 is deprecated.
- 20 June 2001
- Minor update of the OAI protocol to follow changes in the XML Schema specification; arXiv updated to support OAI-PMH v1.1.
- 21 January 2001
- Open Archives Initiative Protocol for Metadata Harvesting v1.0 released; the Santa Fe Convention is discontinued. See OAI website for details of the latest protocol.
- 15 February 2000
- The Santa Fe Convention officially released; arXiv is compliant.
- 27 January 2000
- arXiv Dienst implementation for Santa Fe Convention compliance announced to participants in the Open Archives Initiative.
- 21-22 October 1999
- The Santa Fe Convention was the result of a meeting of the Open Archives Initiative held in Santa Fe, New Mexico, USA.