Blog

Research Data Support has set up a DAP announcements mailing list for system administration updates such as system outages and software updates.

Subscription is via https://lists.csiro.au/mailman/listinfo/dap-announcements or by sending an email to dap-announcements-request@lists.csiro.au with “subscribe your.email@csiro.au” in the body of the email.

If you know of other people who may be interested in subscribing to this list then please let them know about it.

This week the CSIRO DAP passed the 1 Petabyte (PB) mark. This means that it now holds over 1 PB of CSIRO data and software files which are preserved together with their metadata for long-term discovery. Over 900TB of the data in the DAP is astronomy data from the Parkes and ASKAP radio telescopes.

Not all the data in the DAP is publicly accessible. CSIRO researchers can also use it to archive data for internal use. Every record in the DAP has a persistent URL, and public records have a Digital Object Identifier (DOI), used for data publication and citation.

Since the first public release in 2011, storage growth has jumped from an average of 18 GB a week during Sept 2012-Feb 2013 to an average of 8 TB a week during Sept 2017-Feb 2018.
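
As a rough check on the scale of that change, the two weekly averages quoted above can be compared directly. The sketch below assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), which the post does not specify, and the variable names are illustrative only.

```python
# Weekly averages quoted in the post.
# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
GB_PER_TB = 1000
TB_PER_PB = 1000

early_gb_per_week = 18                # Sept 2012 - Feb 2013 average
recent_gb_per_week = 8 * GB_PER_TB    # Sept 2017 - Feb 2018 average

# How many times larger the recent weekly intake is.
growth_factor = recent_gb_per_week / early_gb_per_week

# Weeks needed to add a further 1 PB at the recent rate.
weeks_per_pb_recent = (TB_PER_PB * GB_PER_TB) / recent_gb_per_week

print(f"Weekly intake grew roughly {growth_factor:.0f}-fold")
print(f"At 8 TB a week, another 1 PB takes about {weeks_per_pb_recent:.0f} weeks")
```

With binary units the growth factor would be slightly higher, so the result is best read as an order-of-magnitude figure.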

For more information see CSIRO DAP Help or contact researchdatasupport@csiro.au

 


An exciting partnership between CSIRO Mineral Resources postdoctoral researcher Anusuriya Devaraju and a number of IMT teams has resulted in a notable enhancement to the DAP: the Recommender System, designed to help find similar content based on a range of weighted criteria.

Check out this software collection in the DAP, https://doi.org/10.4225/08/5a56c7a9e4c2a, which shows the two categories delivered by the Recommender System: “Similar collections”, based on matching metadata, and “Others were also interested in”, based on usage data.
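
The post does not describe how the Recommender System weighs its criteria. Purely as an illustration of the general idea of weighted metadata matching, the sketch below scores two records by a weighted sum of Jaccard overlaps. The field names, the weights and the sample records are all hypothetical and are not taken from the DAP.

```python
# Illustrative only: NOT the DAP Recommender System's actual method.
# Field names and weights below are hypothetical assumptions.
WEIGHTS = {"keywords": 0.5, "fields_of_research": 0.3, "contributors": 0.2}


def jaccard(a, b):
    """Share of distinct values that two lists have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(rec_a, rec_b, weights=WEIGHTS):
    """Weighted sum of per-field Jaccard overlaps between two metadata records."""
    return sum(
        weight * jaccard(rec_a.get(field, []), rec_b.get(field, []))
        for field, weight in weights.items()
    )


# Made-up sample records for demonstration.
rec_a = {
    "keywords": ["waves", "hindcast", "ocean"],
    "fields_of_research": ["Oceanography"],
    "contributors": ["A. Author"],
}
rec_b = {
    "keywords": ["waves", "ocean", "climate"],
    "fields_of_research": ["Oceanography"],
    "contributors": ["B. Author"],
}

print(f"similarity = {similarity(rec_a, rec_b):.2f}")
```

A usage-based category such as “Others were also interested in” would need access logs rather than metadata, so it is not covered by this sketch.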

A new version of the DAP (V2.20.1647) was released on 23 January 2018.

Included in this release were:

  • Enhancements to the web services
    • Added “Activities” list to web service API.
    • Added more detail to Collection search web service.
    • Contributors included in web service search results.
    • Added OAuth2 authentication to web service API.
    • Added field that indicates if data is restricted in web service search.
    • Collection search web service can now be filtered by date.
    • Added facet filtering to collection web service search.
  • Enhancements for depositors:
    • “Related Materials” have been renamed “Related Links”.
    • Added function to clear file level metadata during collection deposit.
    • REST user password is now visible on the user's "My Defaults" page.
    • Commas can now be used as delimiters for the collection keywords.
  • Enhancements for users:
    • Better handling of repeated file mount requests.
    • DOI links are now HTTPS.
    • Featured collection functionality enhancement on the home page.
    • The "Access" statement on the landing page was updated.
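
For anyone holding older citations, the switch to HTTPS DOI links can be mirrored with a small rewrite. This is a minimal sketch, not DAP code: the helper name `to_https_doi` is hypothetical, and it assumes that only links on the `doi.org` or legacy `dx.doi.org` resolver hosts should be rewritten.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the only resolver hosts we want to normalise.
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def to_https_doi(url: str) -> str:
    """Rewrite a DOI resolver link to https://doi.org/...; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.netloc.lower() in DOI_HOSTS:
        return urlunsplit(("https", "doi.org", parts.path, parts.query, parts.fragment))
    return url


print(to_https_doi("http://doi.org/10.4225/08/523168703DCC5"))
```

The DOI itself (the path after the host) is left untouched, so the identifier being cited does not change.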

For complete release information see Release History.

This release was part of the DMCEP project stage 2

For more information about the program including details on the program's outcomes and deliverables please visit the DMCEP Wiki.

The DMCEP program is highly researcher-driven regarding what new capabilities are developed and when. Feedback and improvement suggestions are welcome.

The current focus of the project is a refresh of the DAP User interface. If you would like to contribute please contact the RDS team via researchdatasupport@csiro.au.

Also please contact us if:

  • you would like a demonstration or further information about any of these changes.
  • you have a particular use case that you think would be suitable for extending our data management capabilities.
  • you have any other comment or feedback to make.

Most viewed collection:

Durrant, Thomas; Hemer, Mark; Trenham, Claire; Greenslade, Diana (2013): CAWCR Wave Hindcast 1979-2010. v7. CSIRO. Data Collection. http://doi.org/10.4225/08/523168703DCC5

Most downloaded collection:

Yang, Sam; Chu, Clement; Hogan, Dominic (2016): Visualization of the hydrogen atom wave functions. v5. CSIRO. Data Collection. http://doi.org/10.4225/08/5808734d43852

Numbers at the end of October:

Public collections 2003

Encapsulated storage 857TB


CSIRO staff can see the full history of DAP usage reports via Data Access Portal Usage Statistics

A new version of the DAP (V2.19.1489) was released on 5 October 2017.

The big improvements from this release are:

  • DAP depositors can now add records for web services either in conjunction with data collection records or as standalone service collection records. For more information see https://confluence.csiro.au/display/daphelp/Link+to+Web+Services (internal users only). Service records will be identified as such in Research Data Australia and in the attribution statement. DAP users will now be able to discover and use these service records. For more information see https://confluence.csiro.au/display/daphelp/Access+Services+for+Data+or+Software.
  • File level metadata can now be added to any file type by depositors and be viewed by data users.
  • Depositors of acquired data collections for internal use are now enabled and encouraged to upload any pre-existing licence or terms and conditions documentation.
  • Approvers now have the option to approve all future versions of a collection.
  • Landing pages for collection records now have the Project title, Field of research, Keywords, Lead researcher and Contributor metadata hyperlinked to allow easy browsing to related records.
  • Further enhancements to the DAP API.
  • The “Activity” field is no longer mandatory.

For complete release information see Release History.

This release was part of the DMCEP project stage 2

For more information about the program including details on the program's outcomes and deliverables please visit the DMCEP Wiki.

The DMCEP program is highly researcher-driven regarding what new capabilities are developed and when. Feedback and improvement suggestions are welcome.

We are currently undertaking a refresh of the DAP User interface. If you would like to contribute please contact the RDS team via researchdatasupport@csiro.au.

Also please contact us if:

  • you would like a demonstration or further information about any of these changes.
  • you have a particular use case that you think would be suitable for extending our data management capabilities.
  • you have any other comment or feedback to make.

The CSIRO Data Access Portal will now issue a DOI for restricted data and software collections.

A recent change to the DAP has enabled DOIs (Digital Object Identifiers) to be issued for data and software collections where the associated files cannot yet be made public, so long as the metadata about the collection can be made public.

In the past both the metadata and the files needed to be publicly available.

This change will enable a researcher to have a DOI when:

  • a researcher wants to publish the data or software in association with a journal article, and needs a DOI to include the citation for published data/software in their reference list, but does not wish to make the files available before the article is published.
  • ethical considerations mean that the data can only be available to selected users.
  • the data has other sensitivity concerns.

Embargos can be automatically applied to make the data public at a later date without changing the DOI.

Use Ask a Librarian or Research Data Support for assistance using the CSIRO Data Access Portal

The CSIRO DAP has been recently added to the list of recommended repositories by PLOS. We were able to demonstrate that the DAP adheres "to best practices pertaining to responsible data sharing, sustainable digital preservation, proper citation, and openness ..."

To comply with the PLOS data policy authors must select the CC-BY licence when creating their DAP record and depositing their data.

The CSIRO Data Access Portal (DAP) is an institutional data repository supporting the publication of data and software for its organisation, CSIRO, Australia’s national science agency. The aim of the CSIRO Data Access Portal is to provide reliable, long-term access to managed digital resources for CSIRO.

It complies with the Force 11 Data Citation Principles.

An attribution statement and a DOI are assigned to all data deposited and made publicly available in the CSIRO Data Access Portal. DOIs are minted via the Australian National Data Service (ANDS) and DataCite. The CSIRO DAP complies with ANDS service policies.

Data is stored persistently within the repository with full metadata including unique identifiers and a licence. It is mirrored in at least two separate datacentres by default. Data and metadata are versioned.

See our entry on re3data.

A new version of the DAP (V2.17.1151) was released on 16 March 2017.

Highlights of this release for users are:

  • A clearer option for downloading all files in a collection on the files tab.
  • Downloads now include the collection metadata and licence information.
  • Improvements for viewing and selecting files.
  • Depositors uploading via SFTP will notice that upload speeds have improved at least fourfold.
  • Depositors now get a warning if publishing a collection with no files uploaded.
  • Improvements to the DAP API allow authenticated users to create collections.

 

For further information see Release History.

This release was part of the DMCEP project

For more information about the program including details on the program's outcomes and deliverables please visit the DMCEP Wiki.

The DMCEP program is highly researcher-driven regarding what new capabilities are developed and when. Feedback and improvement suggestions are welcome.

Please contact the RDS team via researchdatasupport@csiro.au :

  • If you would like a demonstration or further information about any of these changes
  • If you have a particular use case that you think would be suitable for extending our data management capabilities.
  • Or if you have any other comment or feedback to make.

A new version of the DAP (v2.16.1063) was released on 15 December 2016. Some highlights of this release are:

    • Users are returned to the same page after successfully logging in to the DAP.
    • Improved refine and filter options for search results.
    • Additional RESTful web services for creating and updating collections.
    • File search and paginated file loading enabled on the Provide Your Data page.
    • Map view provided on the Describe Your Data page.
    • 'Rejected' option removed from the Approver Decision options.
    • New non-CSIRO authors are validated on the Create Your Citation page to reduce duplication.
    • General infrastructure improvements and bug resolutions, including further stages of the implementation of MongoDB and Elasticsearch.

For further information see Release History.

This release was part of the DMCEP project

For more information about the program including details on the program's outcomes and deliverables please visit the DMCEP Wiki.

The DMCEP program is highly researcher-driven regarding what new capabilities are developed and when. Feedback and improvement suggestions are welcome.

Please contact the RDS team via researchdatasupport@csiro.au :

  • If you would like a demonstration or further information about any of these changes
  • If you have a particular use case that you think would be suitable for extending our data management capabilities.
  • Or if you have any other comment or feedback to make.

 

Anusuriya Devaraju is conducting a short survey (https://www.surveymonkey.com/r/NP7MPSV) to identify the important metadata elements when searching datasets on the CSIRO Data Access Portal (https://data.csiro.au/). The survey results will be used to support her research on developing a recommender system of research datasets.

  • The survey includes 2 questions, and should only take about 5 minutes of your time.
  • Participation in this survey is completely voluntary.
  • Your responses will remain strictly confidential.

 

Thank you in advance for taking the time to complete this survey. Please forward this post to potential users of the DAP.

 

New CSIRO Data Access Portal version 2.13 released

A new version of the DAP (v2.13.754) was released on 29 April 2016. Highlights of this release:

  • Deposit now includes a Collection Type field, with options Data or Software; the selected type is included in the published Attribution Statement
  • Where a collection contains more than 25 files, the file list is initially displayed collapsed and a Search for Files option allows filtering/finding within list
  • Web services API enhancements:
    • Can request a specific version of a collection
    • Retrieving metadata for a collection includes more fields in response
    • New endpoint /collections/{id}/versions to list available versions of a collection
  • In Deposit Data Collection 'Citation' tab, contributors can be reordered by dragging and dropping
  • Department of Education and Training (Australia) added as a funding source
  • Bug fixes include:
    • Special/extended characters now encoded correctly in web services API XML responses
    • Team and Business Unit fields now saved as part of draft collections
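
The path of the new versions endpoint is given above, but the post does not state the API's base URL, authentication or response format. The sketch below therefore only assembles the request URL. `API_BASE` is a placeholder and `versions_url` is a hypothetical helper, so both would need replacing with values from the DAP web services documentation.

```python
from urllib.parse import quote

# Placeholder only: the real DAP web services base URL is not given in the post.
API_BASE = "https://example.org/dap/api"


def versions_url(collection_id, base=API_BASE):
    """Build the URL for the /collections/{id}/versions endpoint."""
    safe_id = quote(str(collection_id), safe="")  # escape anything unsafe in the id
    return f"{base.rstrip('/')}/collections/{safe_id}/versions"


# 12345 is a made-up collection id used for illustration.
print(versions_url(12345))
```

No request is sent here. Fetching and parsing the response is left out because the response schema is not described in the release notes.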

Data Access Portal - scheduled outage 28 April 2016

The CSIRO Data Access Portal and the CSIRO ASKAP Science Data Archive (CASDA) will be unavailable Thursday 28 April 2016 10:30-17:30 AEST to release new versions of these services.

IMT formed the Data Management Capability Enhancement Program (DMCEP) to further develop the DAP and related data management services across CSIRO and build an integrated data management ecosystem servicing the needs of CSIRO and the Australian Innovation System.

To date the DMCEP has delivered 6 development sprints and 4 production releases including approximately 100 enhancements (new functionality and bug fixes) to the DAP.

These enhancements have included:

Changes for depositors:

    • Removed the need for further approval for metadata edits to records already published. This means that any edits that depositors have been wanting to make to records, for example to correct spatial coordinates or typos, can now be done without an additional approval step. Any changes to the data will still require re-approval.
    • Changes to the attribution statement to make it easier to reuse
      • all contributors are now listed
      • contributor name order has been changed to last name, first name
      • the DOI is a full link
    • CSIRO Groups can be used as Contacts
    • Corporate Names can be included in the Contributors list
    • Related materials 
      • can now include a relevant Attribution Statement
      • now have a type: website, publication, collection
      • drag and drop to reorder
    • New section added for tracking Funding Sources
    • Added search for Field of Research (FOR) codes
    • Co-ordinates entry section in Location details now a spatial representation
    • Start and end dates can now be a year, or a month and a year
    • Files display in order on the data tab
    • Image metadata is no longer mandatory
    • The Collaborating Organisations element and Rights Statement are co-located and it can now be seen how they relate
      • Collaborating Organisations can be reordered using drag and drop
    • Project Leader is now auto-populated
    • Can add documentation for approvers' eyes only
    • A record submitted to an approver can be changed to another approver by the depositor
    • Shareable persistent link is now on the description tab and the data tab

Changes for Approvers:

    • Reminder emails are now sent monthly
    • Removed mandatory Data Deposit Checklist question
    • A record submitted to an approver can be changed to another approver by the depositor
    • Approval is no longer required for metadata-only updates
    • Can receive documentation from depositors

Changes for Data Users:

    • Improved interface for requesting large collections
    • Shareable persistent link now on the description tab and the data tab

Large collection changes:

    • Increased the default large collection mount time from 48 hours to 1-2 weeks
    • Large collections can now be mounted for extended periods on request to RDS

Web services enhancements:

Other:

    • Improvements to the feed to ANDS Research Data Australia
    • Better support for notifications about outages and similar events, for example a banner in the header of the DAP

CASDA:

CASDA version 1.1 has now been released.

The enhancements in version 1.1 over version 1.0 (released Nov 2015) are:

  • Scripted access to large data files via Virtual Observatory (VO) protocols, including authenticated access
  • 3-d image cube cut-outs including spatial, spectral and polarisation filtering
  • Example script for producing bulk image cut-outs based on a catalogue
  • Team member access to unreleased data products via VO protocols
  • Support for direct transfers within Pawsey Supercomputing Centre for users with Pawsey accounts
  • Administration of project roles (e.g. allocation of validation rights for a project)

More to come ...

We have also been working on analyses and precursor work for:

  • more improvements to large data collection capabilities
  • enabling linking to data hosted externally
  • support for file/object level metadata (investigating implementation of MongoDB)
  • development of a Data Management Plan online tool
  • better indexing to support improved discovery, and discovery via web services (investigating moving from Solr to Elasticsearch)
  • better reliability and stability monitoring
  • more enhancements to web services
  • better statistical reports for depositors
  • Provenance capabilities.
  • Semantic capabilities.

More information and getting involved

For more information about the program, including details on the program's outcomes and deliverables, please visit the DMCEP Wiki.

The DMCEP program is highly researcher-driven regarding what new capabilities are developed and when. Feedback and improvement suggestions are welcome.

Please contact the RDS team via researchdatasupport@csiro.au :

  • If you would like a demonstration or further information about any of these changes
  • If you have a particular use case that you think would be suitable for extending our data management capabilities.
  • Or if you have any other comment or feedback to make.

CSIRO Astronomy and Space Science has just passed a major milestone in data archiving. The total number of Parkes pulsar projects published in the CSIRO Data Access Portal has just passed the 100 mark - there are now 104 projects with ~405,000 files spanning 23 years (1991 to 2014). Those ~405,000 files represent ~190TB of data.

There is a collection for each semester of a project so each project has more than one associated data collection. There are currently 569 Parkes Pulsar collections publicly accessible.
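
To put those totals in perspective, a few averages can be derived from the figures quoted above. The sketch assumes decimal units (1 TB = 1,000,000 MB) and treats the approximate counts as exact, so the results are indicative only.

```python
# Approximate figures quoted in the post.
projects = 104
collections = 569
files = 405_000
total_tb = 190

MB_PER_TB = 1_000_000  # assumption: decimal units

avg_file_mb = total_tb * MB_PER_TB / files
files_per_project = files / projects
collections_per_project = collections / projects

print(f"Average file size: about {avg_file_mb:.0f} MB")
print(f"Files per project: about {files_per_project:.0f}")
print(f"Collections per project: about {collections_per_project:.1f}")
```

The per-project collection average is consistent with the post's note that each project has one collection per semester.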