Publishing Data#
Note
Materials Commons follows the FAIR principles outlined below.
Best Practices Guide for Publishing Research Data#
Following FAIR Principles#
Research data should follow the FAIR principles:
Findable#
Use persistent identifiers (DOIs) for datasets
Provide rich metadata descriptions
Register data in searchable resources
Include clear version information
Accessible#
Store data in a trusted repository such as Materials Commons
Ensure data can be retrieved using standard protocols
Maintain metadata even if data is no longer available
Provide clear access conditions and licenses
Interoperable#
Use standard formats and vocabularies
Include qualified references to other data
Document data structure and relationships
Provide machine-readable metadata
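As an illustration of machine-readable metadata, the sketch below builds a schema.org Dataset record, the vocabulary that Google Dataset Search reads. The function name, the field selection, and all example values (including the placeholder DOI) are assumptions made for this example. They do not describe the record that Materials Commons itself generates.

```python
import json


def build_dataset_jsonld(name, description, doi, license_url, authors, keywords, version):
    """Return a schema.org Dataset description as a plain dict (JSON-LD)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Persistent identifier expressed as a resolvable DOI URL.
        "identifier": f"https://doi.org/{doi}",
        "license": license_url,
        "version": version,
        "keywords": list(keywords),
        "creator": [{"@type": "Person", "name": author} for author in authors],
    }


# All values below are hypothetical placeholders.
record = build_dataset_jsonld(
    name="Example tensile test dataset",
    description="Stress-strain curves for an illustrative aluminum alloy study.",
    doi="10.0000/example",  # placeholder, not a real DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
    authors=["A. Researcher", "B. Collaborator"],
    keywords=["tensile test", "aluminum"],
    version="1.0",
)

print(json.dumps(record, indent=2))
```

Keeping the record as a plain dict makes it easy to serialize to JSON and to check that required fields are present before publishing.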
Reusable#
Include detailed documentation
Specify clear data usage licenses
Provide provenance information
Meet domain-relevant community standards
Data Preparation#
Documentation#
Create comprehensive README files
Document methodology and collection procedures
Include data dictionaries
If you include a Materials Commons-formatted spreadsheet and select samples or computations, a data dictionary is built from it automatically for your dataset
Describe variables and units
Document software versions and parameters used
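One way to document software versions is to capture them programmatically and store the result next to the data. The following is a minimal sketch using only the Python standard library. The function name and the package list are assumptions for illustration, so substitute the packages your analysis actually uses.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Record the Python version, platform, and installed package versions."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for package in packages:
        try:
            env["packages"][package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than skipping the entry.
            env["packages"][package] = None
    return env


# Hypothetical package list for illustration.
snapshot = capture_environment(["numpy", "pip"])
print(json.dumps(snapshot, indent=2))
```

Saving this output as a JSON file alongside the README gives readers a precise record of the environment that produced the data.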
Quality Control#
Validate data integrity
Check for completeness
Verify accuracy
Remove sensitive information
Review for errors
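Data integrity can be checked by recording a checksum for every file before upload and comparing again later. The sketch below builds a SHA-256 manifest and reports files that are missing or changed. The function names and the demo file are assumptions for illustration, and this is not a Materials Commons feature.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return relative paths that are missing or whose checksum differs."""
    current = build_manifest(root)
    return sorted(rel for rel, digest in manifest.items() if current.get(rel) != digest)


# Demo on a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "raw" / "run01.csv"
    data_file.parent.mkdir()
    data_file.write_text("strain,stress_MPa\n0.01,210\n")

    manifest = build_manifest(tmp)
    unchanged = verify_manifest(tmp, manifest)

    # Simulate accidental modification of the file.
    data_file.write_text("strain,stress_MPa\n0.01,999\n")
    changed = verify_manifest(tmp, manifest)

print(unchanged, changed)
```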
File Organization#
Use consistent file naming conventions
Organize files logically
Include version control information
Separate raw and processed data
Document file relationships
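A naming convention is easier to keep consistent when it can be checked automatically. The sketch below assumes a hypothetical convention of the form sample_YYYYMMDD_stage_vNN.ext, where stage is either raw or processed. The pattern and the example file names are illustrative only, so adapt them to whatever convention your project adopts.

```python
import re

# Hypothetical convention: <sample>_<YYYYMMDD>_<raw|processed>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<sample>[a-z0-9]+(?:-[a-z0-9]+)*)_"
    r"(?P<date>\d{8})_"
    r"(?P<stage>raw|processed)_"
    r"v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def check_names(filenames):
    """Split filenames into (conforming, nonconforming) lists."""
    conforming, nonconforming = [], []
    for name in filenames:
        target = conforming if NAME_PATTERN.match(name) else nonconforming
        target.append(name)
    return conforming, nonconforming


ok, bad = check_names(
    [
        "al-6061_20240115_raw_v01.csv",
        "al-6061_20240115_processed_v02.csv",
        "Final data (2).xlsx",
    ]
)
print("conforming:", ok)
print("needs renaming:", bad)
```

Encoding the raw or processed stage and the version in the name also supports the separation of raw and processed data described above.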
Metadata Requirements#
Follow repository-specific metadata standards
Include all required fields
Provide additional optional metadata when relevant
Use controlled vocabularies where applicable
Best Practices for Specific Data Types#
Tabular Data#
Use standard formats (CSV, TSV)
Include column headers
Document missing value codes
Specify units of measurement
Provide data dictionaries
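The tabular practices above can be combined in a few lines: write a CSV with a header row, use one documented code for missing values, and keep a data dictionary that lists each column with its unit. This is a minimal sketch. The column names, the "NA" missing-value code, and the sample rows are assumptions for illustration.

```python
import csv
import io

MISSING = "NA"  # assumed missing-value code; document whichever code you use

# Hypothetical data dictionary: one entry per column, with units.
DATA_DICTIONARY = [
    {"column": "sample_id", "description": "Sample identifier", "unit": ""},
    {"column": "temperature_K", "description": "Anneal temperature", "unit": "K"},
    {"column": "hardness_HV", "description": "Vickers hardness", "unit": "HV"},
]

rows = [
    {"sample_id": "S1", "temperature_K": 673.0, "hardness_HV": 112.5},
    {"sample_id": "S2", "temperature_K": 773.0, "hardness_HV": None},
]


def to_csv(rows, dictionary, missing=MISSING):
    """Serialize rows to CSV text, with columns taken from the data dictionary."""
    columns = [entry["column"] for entry in dictionary]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Replace absent or None values with the documented missing-value code.
        writer.writerow({c: missing if row.get(c) is None else row[c] for c in columns})
    return buffer.getvalue()


csv_text = to_csv(rows, DATA_DICTIONARY)
print(csv_text)
```

Deriving the column order from the data dictionary keeps the two in step, so a column cannot appear in the file without a documented description and unit.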
Images and Media#
Use non-proprietary formats
Include calibration information
Provide resolution details
Document processing steps
Include scale information
Code and Scripts#
Include version information
Document dependencies
Provide usage instructions
Include example data
Specify system requirements
Ensuring Long-term Access#
Preservation#
Choose sustainable file formats
Include sufficient documentation
Plan for format migration
Document preservation strategy
Maintenance#
Update contact information
Monitor data accessibility
Address user questions
Fix reported issues
Track citations and reuse
Note
Remember that good data publishing practices enhance the visibility, impact, and reusability of your research.
Warning
Always check institutional and funding requirements before publishing research data.
Additional Considerations#
Privacy and ethical concerns
Data protection regulations
Intellectual property rights
Embargo periods
Citation requirements
Resources and Tools#
Data management plan templates
Metadata creation tools
File format validators
Repository directories
Documentation guidelines
How To Publish On Materials Commons#
This guide describes how to use the Create Dataset interface for entering and managing dataset metadata.
You can publish data on Materials Commons from any project to which you have uploaded data, and you can publish all of your data and metadata or only a subset. Materials Commons follows the FAIR principles: it assigns a DOI to your dataset and ensures that the dataset is findable in Google Dataset Search.
Creating a Dataset#
Materials Commons publishes datasets. A dataset is a subset of the data in your project, together with metadata that describes it: tags, a description, authors, associated papers, funding, and other descriptive information.
Dataset Creation Steps#
Navigate to your project
Click on “Datasets” in the sidebar
Click on “Create Dataset” in the upper right of the card
Fill out the dataset details:
Authors Management:
By default all project members are included in the authors list
Reorder authors using drag and drop
Add new authors (Materials Commons account not required)
Content Selection:
Choose Files, Samples, and Computations to include
Note: Selecting a Sample or Computation automatically includes associated files
To exclude specific files: Go to the Files tab and uncheck the files or directories you want to leave out
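The content selection rules can be pictured as simple set operations: directly selected files, plus the files associated with each selected sample or computation, minus anything unchecked in the Files tab. The sketch below is only a mental model of that behavior. The function, its parameters, and the example names are hypothetical and are not part of the Materials Commons interface or API.

```python
def resolve_dataset_files(selected_files, selected_entities, entity_files, excluded_files=()):
    """Model which files end up in a dataset.

    selected_files:    files chosen directly
    selected_entities: selected samples or computations
    entity_files:      mapping of sample/computation name to its associated files
    excluded_files:    files unchecked afterwards
    """
    included = set(selected_files)
    for entity in selected_entities:
        # Selecting a sample or computation pulls in its associated files.
        included.update(entity_files.get(entity, ()))
    # Explicit exclusions win over automatic inclusion.
    return sorted(included - set(excluded_files))


# Hypothetical project contents.
entity_files = {
    "sample-A": ["raw/a_scan.tif", "processed/a_scan.csv"],
    "calc-1": ["calc/input.json", "calc/output.json"],
}

files = resolve_dataset_files(
    selected_files=["README.md"],
    selected_entities=["sample-A", "calc-1"],
    entity_files=entity_files,
    excluded_files=["calc/input.json"],
)
print(files)
```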
Publishing Your Dataset#
Select your dataset and click “Publish”
Publishing process:
Runs in the background
Aggregates all files
Creates a Globus download location
Creates a ZIP file (for datasets under 4 GB)
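Because a ZIP file is created only for datasets under the size limit, it can help to estimate a dataset's total size locally before publishing. The sketch below sums file sizes under a directory and compares the total with the limit. It assumes the limit means 4 GiB (4 × 1024³ bytes), which the text does not specify, and the function names are illustrative.

```python
import tempfile
from pathlib import Path

# Assumption: the 4 GB limit is read as 4 GiB; the exact byte threshold is not documented here.
ZIP_LIMIT_BYTES = 4 * 1024**3


def dataset_size_bytes(root):
    """Sum the sizes of all regular files under root."""
    return sum(path.stat().st_size for path in Path(root).rglob("*") if path.is_file())


def expect_zip(total_bytes, limit=ZIP_LIMIT_BYTES):
    """Return True if a dataset of this size falls under the ZIP limit."""
    return total_bytes < limit


# Demo on a temporary directory containing one small hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.bin").write_bytes(b"0123456789")
    size = dataset_size_bytes(tmp)

print(size, expect_zip(size))
```

For datasets at or above the limit, the Globus download location remains the way to retrieve the files.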
Dataset Management#
Version Control:
Datasets are snapshots of your project data
Project files can be modified without affecting published datasets
Update Options:
Quick update: Click “Refresh” to sync with latest file changes
Full update:
Click “Unpublish”
Make necessary changes
Click “Publish” to create new version
Note
Changes to project files after a dataset is published do not affect the published dataset unless you explicitly republish it.