Using Spreadsheets For Data#
Overview#
Materials Commons allows you to load data from spreadsheets through an ETL (Extract, Transform, Load) process. By following a few simple rules, Materials Commons can automatically create samples, computations, processing steps, measurements, and process attributes from your spreadsheet.
File Format Support#
Excel files (.xlsx, .xls)
Google Sheets
Best Practices for Spreadsheet Data#
Data Organization
Use consistent column headers
One type of information per column
Avoid merging cells
Keep raw data separate from analysis
Include units in column headers where applicable
File Naming
Use descriptive names
Include dates (YYYY-MM-DD format)
Example:
2024-03-15_tensile_test_results.xlsx
Data Quality
Validate data before uploading
Use consistent units throughout
Document any data transformations
Include metadata sheets when necessary
Tip
Include a metadata sheet or separate file containing:
Data collection methods
Equipment used
Units of measurement
Date of collection
Researcher name
Any relevant calibration information
Spreadsheet Structure Guidelines#
Header Row
Clear, descriptive column names
Units in parentheses where applicable
Example:
Temperature (°C)
,Stress (MPa)
Samples/Computations Column
Document any naming conventions captured in sample/computation name
Avoid special characters
First column is for sample/computation
Use c: in header to specify column is for computations, otherwise defaults to samples
Worksheets Are Processing/Workflow Steps
The name of the work sheet is the processing/workflow step for the samples/computations
By default workflow order follows order of worksheets
Data Rows
Consistent data format
No empty rows between data
Use standard numerical formats
Avoid special characters
Files
Header rows starting with f: denote file references
Include full path to file in Materials Commons Project (not laptop path)
Project paths in Materials Commons start with a ‘/’
Most Important Data
Don’t include all metadata, only the most important
Use file references to point to large sets of data, don’t include the data in the spreadsheet
Documentation
Include a ‘i:README’ or ‘i:Notes’ sheet
The i: tells Materials Commons not to process that sheet
Document any calculations or formulas
Explain any color coding or formatting
Use i: Headers to add documentation to rows/worksheets
Warning
Excel and LibreOffice have compatibility issues with colons (‘:’) in worksheet names. To ensure cross-platform compatibility:
Use a dash (‘-’) instead of a colon
Examples: Use ‘i-README’ or ‘i-Notes’ instead of ‘i:README’ or ‘i:Notes’
This applies to all ignored worksheets and documentation sheets
This will prevent potential data corruption or worksheet access issues when sharing files between different spreadsheet applications.
Tips for Large Datasets#
Organization
Split large datasets into logical chunks
Use separate worksheets for different test conditions
Keep raw and processed data separate
Performance
Remove unnecessary formatting
Limit formulas in large datasets
Common Issues to Avoid#
Embedding multiple tables in one sheet
Using color alone to convey information
Inconsistent date formats
Hidden rows or columns
Complex nested formulas
Hardcoded values in formulas
Best Practices for Data Analysis#
Data Processing
Document all processing steps
Keep original data unchanged
Use separate sheets for analysis
Include version information
Quality Control
Verify formulas and calculations
Check for data consistency
Validate automated processes
Document any data filtering
Note
Always maintain a backup of your original spreadsheet files before importing or making modifications.
Note
Materials Commons includes automatic version control for files:
All files are automatically versioned when new versions are uploaded
Previous versions remain accessible and can be restored if needed
Warning
Google Sheets limitations:
Materials Commons cannot version Google Sheets documents
Changes made in Google Sheets are not tracked by Materials Commons
For version control, consider: - Using Excel files instead - Downloading periodic copies of your Google Sheet as Excel files - Using Google Sheets’ built-in version history feature
Important data should be backed up regularly if using Google Sheets
Basic Requirements#
Three mandatory rules must be followed when creating a spreadsheet for Materials Commons:
The first row must contain headers
The first column must contain sample or computation names
Each processing step must be in a separate worksheet, with the worksheet name becoming the processing step name
Example of basic spreadsheet format:
Sample Name |
Attribute 1 |
Attribute 2 |
---|---|---|
Sample 1 |
4.0 |
5 |
Sample 2 |
4.1 |
5.01 |
Note
By default, all attributes are considered process attributes unless specified otherwise.
Units Specification#
To specify units, include them in parentheses in the attribute header. For example:
Sample Name |
Attribute 1 (mm) |
Attribute 2 (s) |
---|---|---|
Sample 1 |
4.0 |
5 |
Sample 2 |
4.1 |
5.01 |
Column Types#
Materials Commons uses keywords to interpret column types. Keywords:
Must appear as the first word in the column header
Are followed by a colon (:)
Are case-insensitive
Can have leading whitespace
Supported Keywords:
Keyword |
Description |
Example |
---|---|---|
s, sample |
Specifies a sample attribute |
s:grain size |
p, process |
Specifies a process attribute |
p:temperature |
f, file, files |
Specifies file entries |
|
i, ignore, note, notes |
Marks column to be ignored |
note:comments |
Note
To specify computations instead of samples in the first column, use the c:
prefix in the header.
Example with column types:
Sample |
s:GS(mm) |
p:Time(sec) |
p:Temp(k) |
note: |
i:valid? |
|
---|---|---|---|---|---|---|
S1 |
42 |
100 |
400 |
comment |
yes |
image.jpg |
Ignoring Columns#
Materials Commons provides several ways to mark columns for exclusion from processing:
Using ignore keywords: -
i:
orignore:
-note:
ornotes:
Special case: When these keywords are the only content in a header, the colon is optional
Example of ignored columns:
Sample |
i:Valid? |
notes |
ignore:Comments |
---|---|---|---|
S1 |
No |
Check settings |
Initial run |
S2 |
Yes |
Verified |
Final run |
Note
All values in ignored columns are preserved in the spreadsheet but not processed by Materials Commons.
Ignoring Worksheets#
Materials Commons allows you to mark specific worksheets as ignored, enabling you to maintain documentation, charts, analysis, or other non-processable content within your spreadsheet without affecting the data processing workflow.
Methods to Ignore Worksheets#
There are two ways to mark a worksheet as ignored:
Prefix the worksheet name with
i:
Example:i:Documentation
Prefix the worksheet name with
i-
Example:i-Charts
Common Use Cases#
Ignored worksheets are useful for:
Documentation and readme information
Data visualization and charts
Analysis notes
Temporary calculations
Reference materials
Metadata sheets
Change logs
Best Practices#
When using ignored worksheets:
Use descriptive names that clearly indicate the worksheet’s purpose
Keep documentation organized and up-to-date
Consider creating a standardized
i:README
sheet for project documentationUse consistent prefixes throughout your workbook
Note
Ignored worksheets are preserved in the spreadsheet but completely skipped during Materials Commons processing.
Tip
You can use ignored worksheets to maintain important project context without interfering with your data processing workflow.
Handling Blank Values#
Materials Commons handles missing or inapplicable data in several ways:
Empty cells are automatically ignored during processing
Special values that indicate missing data: -
blank
-n/a
orN/A
Example of blank value handling:
Sample |
S:grain size(mm) |
P:Temperature(k) |
---|---|---|
S1 |
N/A |
400 |
S2 |
3 |
blank |
S3 |
450 |
Note
Case is not significant for special values - “N/A”, “n/a”, and “Blank” are all valid.
File References in Spreadsheets#
Overview#
Materials Commons allows you to associate files with samples and computations in your spreadsheets through a flexible referencing system. This enables you to link experimental data, images, analysis results, and other files directly to your samples or computations.
File Association Requirements#
Before setting up file references, ensure:
File Availability * All referenced files must be uploaded to your Materials Commons project * Files cannot be uploaded through the spreadsheet itself
Path Structure * All paths must start with a forward slash (‘/’) * Paths are relative to your Materials Commons project root * Example:
/microscopy/sem/sample1.jpg
File Selection Options * Individual files:
/data/result.csv
* Multiple files: Use semicolons as separators * Wildcards: Use*
for pattern matching
Reference Syntax#
Files are referenced using either the file:
or f:
keyword prefix in column headers:
Pattern |
Example |
Usage |
---|---|---|
Single file |
|
Link one specific file |
Multiple files |
|
Link several specific files |
Directory wildcard |
|
Link all files in a directory |
Extension filter |
|
Link all files of specific type |
With description |
|
Add context to file references |
Common Patterns#
Here are some typical file association scenarios:
Experimental Data
` f:/raw_data/experiment_*.csv `
Multiple Result Files
` f:/results/thermal.csv;/results/mechanical.csv `
Image Sets
` f:Microscopy Images:/sem/sample1/*.tif `
Documentation
` f:Protocol Document:/protocols/procedure.pdf `
Tip
Use descriptive file paths and meaningful directory structures to maintain organized and easily navigable projects.
Warning
Ensure all referenced files exist in your project before processing the spreadsheet to avoid errors.
Prerequisites#
Warning
All referenced files must be pre-uploaded to your Materials Commons project before processing the spreadsheet.
Path Configuration#
All file paths are relative to your Materials Commons project root (‘/’)
Paths are independent of your local computer file system
Always use forward slashes (‘/’) in paths
Paths must start with a forward slash
Supported File Keywords#
You can reference files using either file:
or the shorter f:
prefix:
Basic Path Reference
` file:/path/to/file.txt f:/path/to/file.txt `
Path with Description
` file:Description of file:/path/to/file.txt f:Description of file:/path/to/file.txt `
Empty Path (column will be processed but no files associated)
` file: f: `
Examples#
Here are some common usage patterns:
Purpose |
Syntax |
Example |
---|---|---|
Single file |
|
|
Multiple files |
|
|
With description |
|
|
All files in directory |
|
|
Files by pattern |
|
|
Best Practices#
Use consistent path structures across your project
Include descriptive file names that indicate content
Keep related files in organized directories
Use descriptions for better file context
Verify all files exist in the project before processing
Tip
Plan your project directory structure before setting up file references in your spreadsheets.
Example of file associations:
Sample |
||
---|---|---|
S1 |
data.xlsx |
sample1.jpg;sample2.jpg |
S2 |
results.csv |
/Other/special.png |
Workflow Construction#
By default, processing steps follow worksheet order. For complex workflows:
Add a “From” column to specify preceding steps
Each worksheet represents a processing step
Use worksheet names as step identifiers
Example workflow:
Sheet: SEM
Sample |
From |
S:grain size(mm) |
---|---|---|
S1 |
Heat Treatment |
4 |
Sheet: Heat Treatment
Sample |
From |
P:Time(m) |
P:Temp(k) |
---|---|---|---|
S1 |
30 |
400 |
Global Constants#
Use the ‘mc constants’ worksheet to specify values shared across all entries:
Sheet |
Attribute |
Value |
---|---|---|
Heat Treatment |
p:Temperature(c) |
400 |
SEM |
p:Magnification |
2x |
Loading Spreadsheets#
To load a spreadsheet:
Upload the spreadsheet to your Materials Commons project
Select “Create Experiment”
Choose your spreadsheet from the dropdown
For Google Sheets, share with “Anyone with the link” and paste the URL
Note
Materials Commons ignores blank cells and cells containing “blank” or “n/a”.