Using Spreadsheets For Data#

Overview#

Materials Commons allows you to load data from spreadsheets through an ETL (Extract, Transform, Load) process. By following a few simple rules, Materials Commons can automatically create samples, computations, processing steps, measurements, and process attributes from your spreadsheet.

File Format Support#

  • Excel files (.xlsx, .xls)

  • Google Sheets

Best Practices for Spreadsheet Data#

  1. Data Organization

    • Use consistent column headers

    • One type of information per column

    • Avoid merging cells

    • Keep raw data separate from analysis

    • Include units in column headers where applicable

  2. File Naming

    • Use descriptive names

    • Include dates (YYYY-MM-DD format)

    • Example: 2024-03-15_tensile_test_results.xlsx

  3. Data Quality

    • Validate data before uploading

    • Use consistent units throughout

    • Document any data transformations

    • Include metadata sheets when necessary

Tip

Include a metadata sheet or separate file containing:

  • Data collection methods

  • Equipment used

  • Units of measurement

  • Date of collection

  • Researcher name

  • Any relevant calibration information

Spreadsheet Structure Guidelines#

  1. Header Row

    • Clear, descriptive column names

    • Units in parentheses where applicable

    • Example: Temperature (°C), Stress (MPa)

  2. Samples/Computations Column

    • Document any naming conventions captured in sample/computation name

    • Avoid special characters

    • First column is for sample/computation

    • Use c: in header to specify column is for computations, otherwise defaults to samples

  3. Worksheets Are Processing/Workflow Steps

    • The name of the work sheet is the processing/workflow step for the samples/computations

    • By default workflow order follows order of worksheets

  4. Data Rows

    • Consistent data format

    • No empty rows between data

    • Use standard numerical formats

    • Avoid special characters

  5. Files

    • Header rows starting with f: denote file references

    • Include full path to file in Materials Commons Project (not laptop path)

    • Project paths in Materials Commons start with a ‘/’

  6. Most Important Data

    • Don’t include all metadata, only the most important

    • Use file references to point to large sets of data, don’t include the data in the spreadsheet

  1. Documentation

    • Include a ‘i:README’ or ‘i:Notes’ sheet

    • The i: tells Materials Commons not to process that sheet

    • Document any calculations or formulas

    • Explain any color coding or formatting

    • Use i: Headers to add documentation to rows/worksheets

Warning

Excel and LibreOffice have compatibility issues with colons (‘:’) in worksheet names. To ensure cross-platform compatibility:

  • Use a dash (‘-’) instead of a colon

  • Examples: Use ‘i-README’ or ‘i-Notes’ instead of ‘i:README’ or ‘i:Notes’

  • This applies to all ignored worksheets and documentation sheets

This will prevent potential data corruption or worksheet access issues when sharing files between different spreadsheet applications.

Tips for Large Datasets#

  1. Organization

    • Split large datasets into logical chunks

    • Use separate worksheets for different test conditions

    • Keep raw and processed data separate

  2. Performance

    • Remove unnecessary formatting

    • Limit formulas in large datasets

Common Issues to Avoid#

  • Embedding multiple tables in one sheet

  • Using color alone to convey information

  • Inconsistent date formats

  • Hidden rows or columns

  • Complex nested formulas

  • Hardcoded values in formulas

Best Practices for Data Analysis#

  1. Data Processing

    • Document all processing steps

    • Keep original data unchanged

    • Use separate sheets for analysis

    • Include version information

  2. Quality Control

    • Verify formulas and calculations

    • Check for data consistency

    • Validate automated processes

    • Document any data filtering

Note

Always maintain a backup of your original spreadsheet files before importing or making modifications.

Note

Materials Commons includes automatic version control for files:

  • All files are automatically versioned when new versions are uploaded

  • Previous versions remain accessible and can be restored if needed

Warning

Google Sheets limitations:

  • Materials Commons cannot version Google Sheets documents

  • Changes made in Google Sheets are not tracked by Materials Commons

  • For version control, consider: - Using Excel files instead - Downloading periodic copies of your Google Sheet as Excel files - Using Google Sheets’ built-in version history feature

  • Important data should be backed up regularly if using Google Sheets

Basic Requirements#

Three mandatory rules must be followed when creating a spreadsheet for Materials Commons:

  1. The first row must contain headers

  2. The first column must contain sample or computation names

  3. Each processing step must be in a separate worksheet, with the worksheet name becoming the processing step name

Example of basic spreadsheet format:

Sample Name

Attribute 1

Attribute 2

Sample 1

4.0

5

Sample 2

4.1

5.01

Note

By default, all attributes are considered process attributes unless specified otherwise.

Units Specification#

To specify units, include them in parentheses in the attribute header. For example:

Sample Name

Attribute 1 (mm)

Attribute 2 (s)

Sample 1

4.0

5

Sample 2

4.1

5.01

Column Types#

Materials Commons uses keywords to interpret column types. Keywords:

  • Must appear as the first word in the column header

  • Are followed by a colon (:)

  • Are case-insensitive

  • Can have leading whitespace

Supported Keywords:

Keyword

Description

Example

s, sample

Specifies a sample attribute

s:grain size

p, process

Specifies a process attribute

p:temperature

f, file, files

Specifies file entries

file:/path

i, ignore, note, notes

Marks column to be ignored

note:comments

Note

To specify computations instead of samples in the first column, use the c: prefix in the header.

Example with column types:

Sample

s:GS(mm)

p:Time(sec)

p:Temp(k)

note:

i:valid?

file:/Dir1/Dir2

S1

42

100

400

comment

yes

image.jpg

Ignoring Columns#

Materials Commons provides several ways to mark columns for exclusion from processing:

  1. Using ignore keywords: - i: or ignore: - note: or notes:

  2. Special case: When these keywords are the only content in a header, the colon is optional

Example of ignored columns:

Sample

i:Valid?

notes

ignore:Comments

S1

No

Check settings

Initial run

S2

Yes

Verified

Final run

Note

All values in ignored columns are preserved in the spreadsheet but not processed by Materials Commons.

Ignoring Worksheets#

Materials Commons allows you to mark specific worksheets as ignored, enabling you to maintain documentation, charts, analysis, or other non-processable content within your spreadsheet without affecting the data processing workflow.

Methods to Ignore Worksheets#

There are two ways to mark a worksheet as ignored:

  1. Prefix the worksheet name with i: Example: i:Documentation

  2. Prefix the worksheet name with i- Example: i-Charts

Common Use Cases#

Ignored worksheets are useful for:

  • Documentation and readme information

  • Data visualization and charts

  • Analysis notes

  • Temporary calculations

  • Reference materials

  • Metadata sheets

  • Change logs

Best Practices#

When using ignored worksheets:

  • Use descriptive names that clearly indicate the worksheet’s purpose

  • Keep documentation organized and up-to-date

  • Consider creating a standardized i:README sheet for project documentation

  • Use consistent prefixes throughout your workbook

Note

Ignored worksheets are preserved in the spreadsheet but completely skipped during Materials Commons processing.

Tip

You can use ignored worksheets to maintain important project context without interfering with your data processing workflow.

Handling Blank Values#

Materials Commons handles missing or inapplicable data in several ways:

  1. Empty cells are automatically ignored during processing

  2. Special values that indicate missing data: - blank - n/a or N/A

Example of blank value handling:

Sample

S:grain size(mm)

P:Temperature(k)

S1

N/A

400

S2

3

blank

S3

450

Note

Case is not significant for special values - “N/A”, “n/a”, and “Blank” are all valid.

File References in Spreadsheets#

Overview#

Materials Commons allows you to associate files with samples and computations in your spreadsheets through a flexible referencing system. This enables you to link experimental data, images, analysis results, and other files directly to your samples or computations.

File Association Requirements#

Before setting up file references, ensure:

  1. File Availability * All referenced files must be uploaded to your Materials Commons project * Files cannot be uploaded through the spreadsheet itself

  2. Path Structure * All paths must start with a forward slash (‘/’) * Paths are relative to your Materials Commons project root * Example: /microscopy/sem/sample1.jpg

  3. File Selection Options * Individual files: /data/result.csv * Multiple files: Use semicolons as separators * Wildcards: Use * for pattern matching

Reference Syntax#

Files are referenced using either the file: or f: keyword prefix in column headers:

Pattern

Example

Usage

Single file

f:/data/sample1.csv

Link one specific file

Multiple files

f:/data/sample1.csv;/images/photo1.jpg

Link several specific files

Directory wildcard

f:/data/*

Link all files in a directory

Extension filter

f:/images/*.jpg

Link all files of specific type

With description

f:SEM Images:/microscopy/*.tif

Add context to file references

Common Patterns#

Here are some typical file association scenarios:

  1. Experimental Data ` f:/raw_data/experiment_*.csv `

  2. Multiple Result Files ` f:/results/thermal.csv;/results/mechanical.csv `

  3. Image Sets ` f:Microscopy Images:/sem/sample1/*.tif `

  4. Documentation ` f:Protocol Document:/protocols/procedure.pdf `

Tip

Use descriptive file paths and meaningful directory structures to maintain organized and easily navigable projects.

Warning

Ensure all referenced files exist in your project before processing the spreadsheet to avoid errors.

Prerequisites#

Warning

All referenced files must be pre-uploaded to your Materials Commons project before processing the spreadsheet.

Path Configuration#

  • All file paths are relative to your Materials Commons project root (‘/’)

  • Paths are independent of your local computer file system

  • Always use forward slashes (‘/’) in paths

  • Paths must start with a forward slash

Supported File Keywords#

You can reference files using either file: or the shorter f: prefix:

  1. Basic Path Reference ` file:/path/to/file.txt f:/path/to/file.txt `

  2. Path with Description ` file:Description of file:/path/to/file.txt f:Description of file:/path/to/file.txt `

  3. Empty Path (column will be processed but no files associated) ` file: f: `

Examples#

Here are some common usage patterns:

Purpose

Syntax

Example

Single file

f:/directory/filename

f:/images/sample1.jpg

Multiple files

f:/directory/file1;/directory/file2

f:/data/test1.csv;/data/test2.csv

With description

f:Description:/path/to/file

f:SEM Image:/microscopy/sem1.tif

All files in directory

f:/directory/*

f:/raw_data/*

Files by pattern

f:/directory/*.extension

f:/images/*.png

Best Practices#

  • Use consistent path structures across your project

  • Include descriptive file names that indicate content

  • Keep related files in organized directories

  • Use descriptions for better file context

  • Verify all files exist in the project before processing

Tip

Plan your project directory structure before setting up file references in your spreadsheets.

Example of file associations:

Sample

file:/Data

file:Images:/Pictures

S1

data.xlsx

sample1.jpg;sample2.jpg

S2

results.csv

/Other/special.png

Workflow Construction#

By default, processing steps follow worksheet order. For complex workflows:

  1. Add a “From” column to specify preceding steps

  2. Each worksheet represents a processing step

  3. Use worksheet names as step identifiers

Example workflow:

Sheet: SEM

Sample

From

S:grain size(mm)

S1

Heat Treatment

4

Sheet: Heat Treatment

Sample

From

P:Time(m)

P:Temp(k)

S1

30

400

Global Constants#

Use the ‘mc constants’ worksheet to specify values shared across all entries:

Sheet

Attribute

Value

Heat Treatment

p:Temperature(c)

400

SEM

p:Magnification

2x

Loading Spreadsheets#

To load a spreadsheet:

  1. Upload the spreadsheet to your Materials Commons project

  2. Select “Create Experiment”

  3. Choose your spreadsheet from the dropdown

  4. For Google Sheets, share with “Anyone with the link” and paste the URL

Note

Materials Commons ignores blank cells and cells containing “blank” or “n/a”.