Overview


The DAP uses the term "collection" rather than "dataset" for the following reasons:

  • A collection can contain more than one dataset.
  • A collection may contain software.
  • A collection may reference externally hosted services instead of, or in addition to, containing data.

DAP collections are version controlled.  Each version of a collection can have its own files and metadata.  Different versions of a collection can have different restrictions, and a version can also be embargoed for a defined period.

The structure of a DAP collection


Every DAP collection can have several identifiers.  The identifier you use with the web services determines the response, e.g. the latest version of a collection vs. a specific version.

[Diagram: DAP Collection]

Identifiers


Identifier (alternate names), followed by its description:

DOI

Digital Object Identifier.  These identifiers can be resolved using doi.org (see the example under Resolving Identifiers).

DOIs are designed as persistent identifiers.  When the DAP mints a DOI, it will always refer to a specific version of the relevant DAP collection and the exact files.

All publicly accessible DAP collections will have a DOI, unless the collection only contains metadata and no files.

Using a DOI with the DAP web services will resolve a specific version of the collection.  For example, in the diagram above, version 1 and version 2 have the same DOI: once version 2 is published, the DOI that previously resolved to version 1 resolves to version 2 instead.  Version 3 contains different files, so it has a new DOI.  The DOI for version 2 still resolves to version 2, and the DOI for version 3 resolves to version 3.

ANDS PID (alternate name: Handle)

The ANDS PID is a Handle.  These identifiers can be resolved using hdl.handle.net or doi.org (see the example under Resolving Identifiers).

Handles are designed as persistent identifiers.  In the DAP web services, a collection's ANDS PID will always resolve to the most recent version of a collection.

All publicly accessible DAP collections will have an ANDS PID, including collections that only contain metadata.

Fedora PID (alternate name: Fedora ID)

The Fedora ID is a local identifier in the DAP.  It refers to the collection as a whole, including all versions.  In the DAP web services a Fedora ID will always resolve to the latest version of a collection.

All DAP collections will have a Fedora ID, including collections with restricted access.

Data Collection Common ID

The Data Collection Common ID is a local identifier in the DAP.  It refers to the collection as a whole, including all versions.  For many DAP collections the Data Collection Common ID will be identical to the numeric component of the Fedora PID, though there are exceptions.  The DAP web services do not resolve Data Collection Common IDs; if you attempt this, the value will be interpreted as a Data Collection ID (i.e. version 1, see below).

Data Collection ID

The Data Collection ID is a local identifier in the DAP.  It refers to a specific version of a collection.  The DAP web services will resolve a Data Collection ID to the exact version it relates to.

All versions of DAP collections have a Data Collection ID.

For version 1 of a given collection, the Data Collection Common ID will be identical to the Data Collection ID.
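
The resolution rules above can be summarised in a small lookup.  The following is a minimal sketch only: the function name `classify_identifier` is hypothetical, and the patterns are assumptions inferred from the example identifiers on this page (such as the "csiro:" and "102.100.100/" prefixes), not an official DAP grammar.

```python
import re

# Patterns are assumptions inferred from the example identifiers on this
# page; they are not an official DAP specification.
_PATTERNS = [
    ("DOI", re.compile(r"10\.\d{4,9}/\S+"), "specific version"),
    ("ANDS PID", re.compile(r"102\.100\.100/\d+"), "latest version"),
    ("Fedora PID", re.compile(r"csiro:\d+"), "latest version"),
    # A Data Collection Common ID looks the same and is treated as a
    # Data Collection ID by the web services.
    ("Data Collection ID", re.compile(r"\d+"), "specific version"),
]


def classify_identifier(identifier):
    """Return (identifier type, what the web services resolve it to)."""
    for name, pattern, resolves_to in _PATTERNS:
        if pattern.fullmatch(identifier):
            return name, resolves_to
    raise ValueError(f"Unrecognised identifier: {identifier!r}")


if __name__ == "__main__":
    for example in ("10.25919/5b6ae6b515850", "102.100.100/72268",
                    "csiro:34437", "35997"):
        print(example, "->", classify_identifier(example))
```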


Resolving Identifiers


Identifier, followed by a conventional usage example, a /ws/v2 example, and notes:

DOI

  • Conventional usage: https://doi.org/10.25919/5b6ae6b515850
  • /ws/v2: https://data.csiro.au/dap/ws/v2/collections/10.25919%2F5b6ae6b515850
  • Notes: You must URL encode the identifier, e.g. convert "/" to "%2F".  Resolves to a specific version of a collection.

ANDS PID

  • Conventional usage: https://hdl.handle.net/102.100.100/72268
  • /ws/v2: https://data.csiro.au/dap/ws/v2/collections/102.100.100%2F72268
  • Notes: You must URL encode the identifier, e.g. convert "/" to "%2F".  Resolves to the latest version of a collection.

Fedora PID

  • Conventional usage: https://data.csiro.au/dap/landingpage?pid=csiro:34437 or https://data.csiro.au/collections/#collection/CIcsiro:34437
  • /ws/v2: https://data.csiro.au/dap/ws/v2/collections/csiro:34437
  • Notes: Resolves to the latest version of a collection.

Fedora PID + version

  • Conventional usage: https://data.csiro.au/dap/landingpage?pid=csiro:34437&v=2 or https://data.csiro.au/collections/#collection/CIcsiro:34437v002
  • /ws/v2: https://data.csiro.au/dap/ws/v2/collections/csiro:34437v002
  • Notes: You can pad the version number with leading zeros (or omit them) to a maximum of four characters (e.g. "0049" is acceptable, but "00049" is not).  Note that this example points to version 2 of a collection that had a third version published at the time of writing: https://data.csiro.au/dap/ws/v2/collections/csiro:34437v3

Data Collection ID

  • Conventional usage: https://data.csiro.au/collections/#collection/CI35997
  • /ws/v2: https://data.csiro.au/dap/ws/v2/collections/35997
  • Notes: Resolves to a specific version.  Note that this example points to version 2 of a collection that had a third version published at the time of writing: https://data.csiro.au/dap/ws/v2/collections/36589
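
The /ws/v2 example URLs above can be built programmatically.  The sketch below uses Python's standard `urllib.parse.quote` to encode "/" as "%2F" while leaving ":" untouched, as in the examples.  The helper name `collection_url` and its `version` parameter are hypothetical, and the three-digit zero padding is just one of the accepted forms.

```python
from urllib.parse import quote

BASE_URL = "https://data.csiro.au/dap/ws/v2/collections"


def collection_url(identifier, version=None):
    """Build a /ws/v2 collection URL for a DAP identifier.

    `version` is only meaningful for a Fedora PID (e.g. "csiro:34437").
    """
    if version is not None:
        # The version suffix may be at most four characters long.
        if not 1 <= version <= 9999:
            raise ValueError("version must be between 1 and 9999")
        identifier = f"{identifier}v{version:03d}"
    # Encode "/" as "%2F"; keep ":" so Fedora PIDs match the examples.
    return f"{BASE_URL}/{quote(identifier, safe=':')}"


if __name__ == "__main__":
    print(collection_url("10.25919/5b6ae6b515850"))
    print(collection_url("102.100.100/72268"))
    print(collection_url("csiro:34437", version=2))
```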


Listing Versions


For a given identifier, you can find the Data Collection ID values for all versions that you have access to through the /ws/v2/collections/{id}/versions endpoint.

e.g. https://data.csiro.au/dap/ws/v2/collections/10.25919%2F5b6ae6b515850/versions
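
As a minimal sketch, the versions endpoint could be called from Python's standard library as follows.  The function names `versions_url` and `fetch_versions` are hypothetical.  The response format is not described on this page, so the body is returned as text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://data.csiro.au/dap/ws/v2/collections"


def versions_url(identifier):
    """Build the versions URL, encoding "/" in the identifier as "%2F"."""
    return f"{BASE_URL}/{quote(identifier, safe=':')}/versions"


def fetch_versions(identifier, timeout=30):
    """Return the raw response body listing the versions you can access.

    Assumption: the response decodes as UTF-8 text. It is left unparsed
    because its structure is not documented here.
    """
    with urlopen(versions_url(identifier), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch_versions("10.25919/5b6ae6b515850"))
```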


Files


The DAP stores files on a hierarchical file system.  All files are stored on tape, and files that are used frequently or recently have a copy on disk.

Collections with a total file size of less than 1 GB are typically kept on disk.  Larger collections may also be on disk, but may need to be "mounted" to disk before they can be accessed.

You can see if a collection is mounted by looking at the /ws/v2/collections/{id} endpoint and finding the "mounted" key/value pair, e.g.

https://data.csiro.au/dap/ws/v2/collections/csiro:9989

You can also check the mount status using the /ws/v2/collections/{id}/mountstatus endpoint.  Note that you must use a Data Collection ID as the identifier with this endpoint, e.g.:

GET https://data.csiro.au/dap/ws/v2/collections/29934/mountstatus
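
Because this endpoint only accepts a Data Collection ID, a client may want to reject other identifier types before making the request.  The sketch below assumes that Data Collection IDs are purely numeric, as in the examples on this page.  The helper name `mountstatus_url` is hypothetical.

```python
BASE_URL = "https://data.csiro.au/dap/ws/v2/collections"


def mountstatus_url(data_collection_id):
    """Build the mountstatus URL for a Data Collection ID.

    Assumption: Data Collection IDs consist only of ASCII digits, so
    DOIs, ANDS PIDs and Fedora PIDs are rejected up front.
    """
    text = str(data_collection_id)
    if not (text.isascii() and text.isdigit()):
        raise ValueError(
            f"mountstatus needs a Data Collection ID, got {text!r}"
        )
    return f"{BASE_URL}/{text}/mountstatus"


if __name__ == "__main__":
    print(mountstatus_url(29934))
```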

Collections can be mounted either by requesting them in the DAP user interface, or by a POST request to the /ws/v2/collections/{id}/mount endpoint.
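
A mount request could be prepared as in the sketch below, using Python's standard `urllib.request`.  This page does not describe the authentication, request body or response of the mount endpoint, so the empty body is an assumption and the helper name `mount_request` is hypothetical.  The request is built but only sent if you uncomment the final lines.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://data.csiro.au/dap/ws/v2/collections"


def mount_request(identifier):
    """Prepare (but do not send) a POST request to the mount endpoint.

    Assumptions: an empty request body is acceptable, and no extra
    headers are required. Authentication, if needed, is not handled.
    """
    url = f"{BASE_URL}/{quote(str(identifier), safe=':')}/mount"
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    request = mount_request(29934)
    print(request.get_method(), request.full_url)
    # To actually send the request:
    # with urlopen(request, timeout=30) as response:
    #     print(response.status)
```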



Need help with the Data Access Portal? Please contact Research Data Service Support at researchdatasupport@csiro.au or phone: +61 2 4960 6086
