ObjTables builds on a community initiative to make supplementary spreadsheets more reusable. The long-term goal is to create an ecosystem of reusable data for comparative and integrative research. We invite the community to share feedback or get involved.
ObjTables provides simple markup for indicating the class and attribute encoded into each worksheet and column.
ObjTables provides a simple format for describing the types of objects encoded in a spreadsheet, their attributes, and their possible relationships.
ObjTables provides software for using schemas to parse, validate, compare, compose, and analyze annotated spreadsheets.
!!!ObjTables objTablesVersion='1.0.0' date='2020-03-14 13:19:04' |
---|
!!ObjTables type='Data' class='Person' tableFormat='row' | |||
---|---|---|---|
!Name | !Company | !Email address | !Phone number |
Mark Zuckerberg | Facebook | zuck@fb.com | 650-543-4800 |
Reed Hastings | Netflix | reed.hastings@netflix.com | 408-540-3700 |
!!ObjTables type='Schema' tableFormat='row' | | | | | |
---|---|---|---|---|---|
!Name | !Type | !Parent | !Format | !Verbose name | !Verbose name plural |
Person | Class | | row | Person | People |
name | Attribute | Person | String(primary=True, unique=True) | Name | |
company | Attribute | Person | String | Company | |
email_address | Attribute | Person | Email | Email address | |
phone_number | Attribute | Person | String | Phone number | |
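To give a sense of what a schema like the one above expresses, the following standard-library Python sketch mirrors the `Person` class as a dataclass and checks rows against two of its constraints. This is not the ObjTables API: the `Person` dataclass, the `parse_rows` helper, and the email pattern are hypothetical stand-ins for what the ObjTables software derives from a schema, and the second input row is deliberately invalid.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for the Person class described by the schema table.
@dataclass(frozen=True)
class Person:
    name: str  # primary, unique attribute in the schema
    company: str
    email_address: str
    phone_number: str


# Deliberately loose email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_rows(rows):
    """Build Person objects from rows of cells and collect validation errors."""
    people, errors, seen = [], [], set()
    for i, cells in enumerate(rows, start=1):
        if len(cells) != 4:
            errors.append(f"row {i}: expected 4 cells, got {len(cells)}")
            continue
        person = Person(*(c.strip() for c in cells))
        if not person.name:
            errors.append(f"row {i}: name is required")
        elif person.name in seen:
            errors.append(f"row {i}: duplicate name {person.name!r}")
        if person.email_address and not EMAIL_RE.match(person.email_address):
            errors.append(f"row {i}: invalid email {person.email_address!r}")
        seen.add(person.name)
        people.append(person)
    return people, errors


rows = [
    ["Reed Hastings", "Netflix", "reed.hastings@netflix.com", "408-540-3700"],
    ["Reed Hastings", "Netflix", "not-an-email", "408-540-3700"],  # invalid on purpose
]
people, errors = parse_rows(rows)
for message in errors:
    print(message)
```

The point of the sketch is the division of labor: the schema states the constraints once, and every row is checked against them, rather than each reader re-deriving the rules from the spreadsheet.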
ObjTables enables users to leverage programs such as Microsoft Excel and LibreOffice Calc as graphical interfaces for viewing and editing datasets. ObjTables provides the following features:
To make it easy to build datasets, the ObjTables software can generate template spreadsheets for schemas with a table of contents, skeletons for the tables and columns, inline help, dropdown menus, and basic validation.
The ObjTables software leverages Git to make it easy to build datasets iteratively, version datasets, and track their provenance, including when each revision was made, who made it, and why it was made.
To make it easy to build schemas iteratively, the ObjTables software can version schemas, as well as migrate datasets between different versions of schemas (e.g., adding, removing, and renaming tables and columns).
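As a rough illustration of what migrating a dataset between schema versions involves, the sketch below renames, adds, and removes columns in rows represented as dictionaries. The `migrate_rows` function and the column names `Employer` and `Title` are hypothetical and are not part of the ObjTables software, which handles migrations with its own tooling.

```python
def migrate_rows(rows, renamed=None, added=None, removed=()):
    """Return copies of rows (dicts keyed by column name) migrated to a new schema version.

    renamed: mapping of old column name -> new column name
    added:   mapping of new column name -> default value
    removed: collection of column names to drop
    """
    renamed = renamed or {}
    added = added or {}
    migrated = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in removed:
                continue  # drop columns that no longer exist in the new schema
            new_row[renamed.get(column, column)] = value
        for column, default in added.items():
            new_row.setdefault(column, default)  # fill new columns with defaults
        migrated.append(new_row)
    return migrated


old_rows = [
    {"Name": "Reed Hastings", "Company": "Netflix", "Phone number": "408-540-3700"},
]
new_rows = migrate_rows(
    old_rows,
    renamed={"Company": "Employer"},
    added={"Title": None},
    removed={"Phone number"},
)
print(new_rows)
```

Expressing a migration as data (a rename map, defaults, and a removal set) rather than as hand edits is what makes it repeatable across every file written against the old schema.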
ObjTables makes it easy to validate and debug datasets at multiple levels.
To help users build large datasets, the ObjTables software can merge datasets by identifying and fusing common objects. ObjTables can also decompose datasets into smaller, more manageable pieces by cutting relationships.
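The idea of merging datasets by fusing common objects can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ObjTables implementation: records are plain dictionaries, objects are considered "common" when they share the same value of a key attribute (here `name`), and the `merge_datasets` helper is hypothetical.

```python
def merge_datasets(*datasets, key="name"):
    """Merge lists of dict records, fusing records that share the same key value."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            target = merged.setdefault(record[key], {})
            for attr, value in record.items():
                if value in (None, ""):
                    continue  # missing values never override known ones
                if attr in target and target[attr] != value:
                    raise ValueError(
                        f"conflict for {record[key]!r}.{attr}: "
                        f"{target[attr]!r} != {value!r}"
                    )
                target[attr] = value
    return list(merged.values())


contacts_a = [{"name": "Reed Hastings", "company": "Netflix", "phone_number": ""}]
contacts_b = [{"name": "Reed Hastings", "phone_number": "408-540-3700"}]
merged = merge_datasets(contacts_a, contacts_b)
print(merged)
```

Raising on conflicting values, rather than silently preferring one source, is a design choice in this sketch; a real merge tool may resolve conflicts differently.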
To help users compare and review changes to datasets, the ObjTables software can determine if datasets are semantically equal and identify their differences.
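Semantic comparison differs from a textual diff in that row order and layout do not matter; only the objects and their attribute values do. The following sketch illustrates the idea with plain dictionaries keyed by a primary attribute. The `diff_datasets` helper is hypothetical and is not the comparison routine used by ObjTables.

```python
def diff_datasets(left, right, key="name"):
    """Compare two lists of dict records, ignoring record order."""
    left_by_key = {record[key]: record for record in left}
    right_by_key = {record[key]: record for record in right}
    differences = []
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        if k not in right_by_key:
            differences.append(f"{k}: only in left")
        elif k not in left_by_key:
            differences.append(f"{k}: only in right")
        else:
            l, r = left_by_key[k], right_by_key[k]
            for attr in sorted(l.keys() | r.keys()):
                if l.get(attr) != r.get(attr):
                    differences.append(f"{k}.{attr}: {l.get(attr)!r} != {r.get(attr)!r}")
    return differences


v1 = [
    {"name": "Reed Hastings", "company": "Netflix"},
    {"name": "Mark Zuckerberg", "email_address": "zuck@fb.com"},
]
v2 = list(reversed(v1))  # same objects, different row order
v3 = [{"name": "Reed Hastings", "company": "Netflix, Inc."}]

print(diff_datasets(v1, v2))  # an empty list means semantically equal
print(diff_datasets(v1, v3))
```

Two datasets are treated as semantically equal here exactly when the list of differences is empty.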
The ObjTables Python package makes it easy to find objects in datasets and use Python to conduct complex analyses of datasets such as numerical simulations.
ObjTables can pretty print datasets with tables of contents, formatted table titles and column headings, and inline help.
To help users understand schemas, ObjTables can generate UML diagrams.
ObjTables schemas capture the format of each table, including the name and data type of each column, which cells represent relationships among the entries in the tables, and constraints on the value of each cell. ObjTables supports three modes of encoding relationships into cells in tables.
ObjTables was designed to help users work with complex data with the ease of spreadsheets and the rigor of schemas. ObjTables excels at cases where datasets need to be both human- and machine-readable, such as supplementary materials of journal articles. ObjTables is also well-suited to emerging fields that need to quickly build new formats for new types of data.
Although supplementary spreadsheets contain valuable data, they are hard to reuse because they often contain errors and capture data ad hoc.
ObjTables enables authors to create high-quality datasets that are both human- and machine-readable: (a) authors can use ObjTables to debug their data, (b) authors can use ObjTables to pretty print data with tables of contents and inline help, (c) authors can publish schemas for parsing their data, and (d) readers can use these schemas to parse and analyze published data with minimal effort.
Research often involves novel datasets and models that require new formats. Unfortunately, the substantial effort needed to reuse these custom formats is a frequent barrier to collaboration.
ObjTables makes it easier to share data and models with collaborators by (a) enabling researchers to clearly describe the structure of their data or model with a schema, (b) enabling researchers to capture metadata about their data or model, (c) providing researchers software tools for validating their data, and (d) enabling collaborators to use these schemas to quickly parse data from colleagues.
Many fields aim to understand how behaviors emerge from complex networks. This often requires integrating diverse data. For example, systems biology aims to understand how cellular behavior emerges from genotype, often using genomics and other data. Spreadsheets are a popular tool for merging data because they are easy to use. However, spreadsheets support only a few data types, offer limited support for multi-dimensional data, and are difficult to debug.
ObjTables makes it easy to build, validate, and analyze complex datasets: (a) users can use spreadsheets to assemble diverse data, (b) users can quickly define schemas for their data, and (c) users can use these schemas to validate their data and parse it into object-oriented data structures for further analysis in languages such as Python. For example, we have used ObjTables to integrate data about the biochemistry of H1 human embryonic stem cells.
ObjTables also makes it easy to build datasets iteratively over time by helping users version their data with Git and migrate their data as they revise their schemas.
New areas of science often require new types of data and new kinds of models. In turn, this often requires new formats to capture these data and models and new software for working with these formats. Creating these formats is often an obstacle for new domains that have limited resources. Furthermore, evolving these formats as new approaches emerge is challenging because this often requires updating the software tools and converting old files to the new format.
ObjTables addresses this issue by making it easy to define schemas for domain-specific data and providing software tools for parsing, manipulating, and validating data encoded in these schemas. For example, we have used ObjTables to create WC-KB, a format for the experimental omics, biochemical, and physiological data needed to model cellular biochemistry. We have also used ObjTables to create WC-Lang, a format for whole-cell models of all of the biochemical activity in a cell. Creating these formats required minimal code.
Extensive examples, interactive tutorials, and documentation for the ObjTables formats and software tools are available through the links below.
The getting started page contains quick guides for (a) authors for creating reusable spreadsheets and (b) readers for reusing spreadsheets from other investigators.
Installation instructions for the command-line program and the Python package are available at docs.karrlab.org. A Dockerfile for building an Ubuntu Linux image with ObjTables is also available.
A Jupyter notebook with interactive tutorials is available at sandbox.karrlab.org.
Documentation for the formats for schemas and the formats for datasets is available at objtables.org/docs.
Documentation for the command-line program is available inline by running obj-tables --help.
An introduction to the Python package is available at objtables.org/docs. Detailed documentation is available at docs.karrlab.org.