Publishing Data#

Note

Materials Commons follows the FAIR principles outlined below.

Best Practices Guide for Publishing Research Data#

Following FAIR Principles#

Research data should follow the FAIR principles:

Findable#

  • Use persistent identifiers (DOIs) for datasets

  • Provide rich metadata descriptions

  • Register data in searchable resources

  • Include clear version information

Accessible#

  • Store data in a trusted repository such as Materials Commons

  • Ensure data can be retrieved using standard protocols

  • Maintain metadata even if data is no longer available

  • Provide clear access conditions and licenses

Interoperable#

  • Use standard formats and vocabularies

  • Include qualified references to other data

  • Document data structure and relationships

  • Provide machine-readable metadata
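One common way to provide machine-readable metadata is a JSON-LD record using the schema.org `Dataset` vocabulary, which dataset search engines can index. The sketch below is illustrative only: the helper name `build_dataset_jsonld`, the title, the authors, and the DOI are hypothetical placeholders, and it is not a description of the record Materials Commons itself generates.

```python
import json


def build_dataset_jsonld(name, description, doi, license_url, authors, keywords):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Persistent identifier expressed as a resolvable DOI URL.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "creator": [{"@type": "Person", "name": author} for author in authors],
        "keywords": list(keywords),
    }


record = build_dataset_jsonld(
    name="Example tensile test dataset",            # hypothetical title
    description="Stress-strain curves for illustrative samples.",
    doi="10.0000/example-doi",                       # placeholder, not a real DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
    authors=["A. Researcher", "B. Collaborator"],   # hypothetical authors
    keywords=["tensile test", "aluminum"],
)
print(json.dumps(record, indent=2))
```

Keeping the record as a plain dictionary makes it easy to validate required fields before serializing it.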

Reusable#

  • Include detailed documentation

  • Specify clear data usage licenses

  • Provide provenance information

  • Meet domain-relevant community standards

Data Preparation#

Documentation#

  • Create comprehensive README files

  • Document methodology and collection procedures

  • Include data dictionaries

    • If your dataset includes a Materials Commons formatted spreadsheet and you select samples or computations, a data dictionary is built from the spreadsheet automatically

  • Describe variables and units

  • Document software versions and parameters used
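Software versions are easy to record automatically at analysis time. The following is a minimal sketch using only the Python standard library; the function name `capture_environment` and the package names passed to it are assumptions chosen for illustration.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Collect interpreter, OS, and package versions for a README or metadata file."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The package list is illustrative; list the packages your analysis depends on.
environment = capture_environment(["numpy", "scipy"])
print(json.dumps(environment, indent=2))
```

Saving this output next to the README gives readers a concrete record of the environment that produced the data.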

Quality Control#

  • Validate data integrity

  • Check for completeness

  • Verify accuracy

  • Remove sensitive information

  • Review for errors
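Integrity checks are often implemented as a checksum manifest that is built before upload and re-verified later. The sketch below assumes SHA-256 and a simple path-to-digest dictionary; the function names and the sample file are hypothetical, and nothing here reflects how Materials Commons verifies files internally.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the paths that are missing or whose digest no longer matches."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "raw" / "run1.csv"   # hypothetical file
    data_file.parent.mkdir()
    data_file.write_text("time_s,load_N\n0,0\n")
    manifest = build_manifest(tmp)
    print(verify_manifest(tmp, manifest))        # no changes yet
    data_file.write_text("time_s,load_N\n0,1\n")
    print(verify_manifest(tmp, manifest))        # the modified file is reported
```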

File Organization#

  • Use consistent file naming conventions

  • Organize files logically

  • Include version control information

  • Separate raw and processed data

  • Document file relationships
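A naming convention is easier to keep consistent when it can be checked automatically. The sketch below assumes a hypothetical convention, `<project>_<sample>_<YYYYMMDD>_v<NN>.<ext>`; the pattern and the file names are illustrative and should be replaced with whatever convention your project documents.

```python
import re

# Hypothetical convention: <project>_<sample>_<YYYYMMDD>_v<NN>.<ext>, all lowercase.
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def nonconforming(filenames):
    """Return the file names that do not follow the naming convention."""
    return [name for name in filenames if not NAME_PATTERN.match(name)]


files = [
    "alloy_s01_20240115_v01.csv",
    "Final data (2).xlsx",
    "alloy_s02_20240116_v02.tif",
]
print(nonconforming(files))
```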

Metadata Requirements#

  • Follow repository-specific metadata standards

  • Include all required fields

  • Provide additional optional metadata when relevant

  • Use controlled vocabularies where applicable

Best Practices for Specific Data Types#

Tabular Data#

  • Use standard formats (CSV, TSV)

  • Include column headers

  • Document missing value codes

  • Specify units of measurement

  • Provide data dictionaries
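The tabular-data points above can be sketched in a few lines: a CSV with a header row, units encoded in the column names, one documented missing-value code, and a small data dictionary that drives the column order. The column names, the `NA` code, and the sample values are assumptions made for illustration.

```python
import csv
import io

MISSING = "NA"  # assumed missing-value code; document whichever code you choose

# Data dictionary: one entry per column, with its unit and description.
DATA_DICTIONARY = [
    {"column": "sample_id", "unit": "", "description": "Sample identifier"},
    {"column": "temperature_K", "unit": "K", "description": "Test temperature"},
    {"column": "yield_strength_MPa", "unit": "MPa", "description": "Yield strength"},
]


def write_table(rows, handle):
    """Write rows as CSV with a header row and a single missing-value code."""
    fieldnames = [entry["column"] for entry in DATA_DICTIONARY]
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, restval=MISSING, lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Replace explicit None values with the documented missing-value code;
        # absent keys are filled by restval.
        writer.writerow({key: MISSING if value is None else value for key, value in row.items()})


rows = [
    {"sample_id": "S1", "temperature_K": 293, "yield_strength_MPa": 276.0},
    {"sample_id": "S2", "temperature_K": 573, "yield_strength_MPa": None},
    {"sample_id": "S3", "temperature_K": 293},
]
buffer = io.StringIO()
write_table(rows, buffer)
print(buffer.getvalue())
```

Driving the header from the data dictionary keeps the two from drifting apart as columns are added.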

Images and Media#

  • Use non-proprietary formats

  • Include calibration information

  • Provide resolution details

  • Document processing steps

  • Include scale information

Code and Scripts#

  • Include version information

  • Document dependencies

  • Provide usage instructions

  • Include example data

  • Specify system requirements
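For scripts, version information and usage instructions can live in the command-line interface itself. The sketch below uses the standard-library `argparse` module; the program name `analyze`, the version string, the `--column` option, and the file name are hypothetical.

```python
import argparse

__version__ = "1.0.0"  # hypothetical version string


def build_parser():
    """Build a command-line parser that documents usage and reports its version."""
    parser = argparse.ArgumentParser(
        prog="analyze",
        description="Summarize one column of a CSV file of measurements.",
        epilog="Example: analyze example_data.csv --column load_N",
    )
    parser.add_argument("input", help="path to the input CSV file")
    parser.add_argument("--column", default="load_N", help="column to summarize")
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["example_data.csv", "--column", "load_N"])
print(args.input, args.column)
```

Running the script with `--help` then prints the description, the arguments, and the example from the epilog.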

Ensuring Long-term Access#

Preservation#

  • Choose sustainable file formats

  • Include sufficient documentation

  • Plan for format migration

  • Document preservation strategy

Maintenance#

  • Update contact information

  • Monitor data accessibility

  • Address user questions

  • Fix reported issues

  • Track citations and reuse

Note

Remember that good data publishing practices enhance the visibility, impact, and reusability of your research.

Warning

Always check institutional and funding requirements before publishing research data.

Additional Considerations#

  • Privacy and ethical concerns

  • Data protection regulations

  • Intellectual property rights

  • Embargo periods

  • Citation requirements

Resources and Tools#

  • Data management plan templates

  • Metadata creation tools

  • File format validators

  • Repository directories

  • Documentation guidelines

How To Publish On Materials Commons#

This guide describes how to use the Create Dataset interface for entering and managing dataset metadata.

You can publish data on Materials Commons from any project to which you have uploaded data, and you can publish all of your data and metadata or only a subset. Materials Commons follows the FAIR principles: it assigns a DOI to your dataset and makes the dataset findable in Google Dataset Search.

Creating a Dataset#

Materials Commons publishes datasets. A dataset is a subset of the data in your project, together with metadata that describes it: tags, a description, authors, associated papers, funding, and other descriptive information.

Dataset Creation Steps#

  • Navigate to your project

  • Click on “Datasets” in the sidebar

  • Click on “Create Dataset” in the upper right of the card

  • Fill out the dataset details:

    • Authors Management:

      • By default all project members are included in the authors list

      • Reorder authors using drag and drop

      • Add new authors (Materials Commons account not required)

    • Content Selection:

      • Choose Files, Samples, and Computations to include

      • Note: Selecting a Sample or Computation automatically includes its associated files

        • To exclude specific files: Go to the Files tab and uncheck the files or directories you do not want to include

Publishing Your Dataset#

  • Select your dataset and click “Publish”

    • Publishing process:

      • Runs in the background

      • Aggregates all files

      • Creates a Globus download location

      • Creates a ZIP file (for datasets under 4 GB)
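Because a ZIP file is created only for datasets under the size limit, it can be useful to estimate a dataset's size locally before publishing. The sketch below is a rough local check, not part of Materials Commons: the function names are hypothetical, and the limit is assumed to be 4 × 1024³ bytes because the text does not say how the limit is measured.

```python
import tempfile
from pathlib import Path

# Assumption: the guide states a 4 GB limit without defining the unit;
# 4 * 1024**3 bytes is used here for illustration only.
ZIP_LIMIT_BYTES = 4 * 1024**3


def total_size_bytes(root):
    """Return the combined size in bytes of all files under root."""
    return sum(path.stat().st_size for path in Path(root).rglob("*") if path.is_file())


def likely_gets_zip(root, limit=ZIP_LIMIT_BYTES):
    """Return True if the files under root total less than the assumed limit."""
    return total_size_bytes(root) < limit


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.bin").write_bytes(b"\0" * 1024)  # hypothetical 1 KiB file
    print(total_size_bytes(tmp), likely_gets_zip(tmp))
```

Datasets over the limit remain available through the Globus download location described above.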

Dataset Management#

  • Version Control:

    • Datasets are snapshots of your project data

    • Project files can be modified without affecting published datasets

  • Update Options:

    • Quick update: Click “Refresh” to sync with the latest file changes

    • Full update:

      1. Click “Unpublish”

      2. Make necessary changes

      3. Click “Publish” to create a new version

Note

Changes to project files after dataset publication won’t affect the published dataset unless you explicitly republish the dataset.