JANUARY 28 2025
Case study: Hypermode + MIT
How the Massachusetts Institute of Technology uses graphs & AI to enable collaborative polymer research at scale
In the field of polymer research, structuring and sharing data presents a number of unique technical challenges. The CRIPT platform - powered by Hypermode's Dgraph - helps researchers across academia and industry collaborate on polymer research at a speed and scale never before imagined.
About the Project
The Massachusetts Institute of Technology (MIT) is home to a large portfolio of research projects and consortia. One such project - funded in part by the National Science Foundation (NSF) - is the Community Resource for Innovation in Polymer Technology or CRIPT.
Within the field of chemistry, polymers are a type of large, complex molecule that present unique challenges when it comes to communicating their size, structure, and interactions, especially in the form of research data. CRIPT is an ecosystem of tools for polymer scientists to capture and share their data with colleagues at both academic institutions and commercial companies.
As the project ends its third year in 2024, CRIPT is preparing to spin off from MIT and become its own entity.
The Challenge
One of the biggest difficulties facing polymer researchers is capturing, sharing, and accessing each other's data.
Different labs and research teams use different tools to capture and process their data, which makes collaboration difficult even within the same organization, let alone replicating other labs' experiments or building on existing material data.
For example, if an enterprise like Dow or BASF is collaborating with an academic institution on a specific piece of polymer research, the industrial and academic labs often struggle to understand the structure of each other's data. While Electronic Lab Notebooks (ELNs) have helped industry-academic collaborations in other fields of chemistry, ELNs often don't work well - or at all - for the size and complexity of polymer molecules.
Then, once research results are published, polymer scientists can find it difficult to structure and share the underlying data efficiently.
“We have a lot of institutions who are just trying to structure their data prior to publication,” said Ardiana Osmani, the Project Manager for CRIPT.
In addition, scientists and research teams face other data challenges, including:
- Losing track of data between different systems and tools
- Processing huge amounts of molecular data
- Capturing their own experiment data in a structured way, which is difficult because of the stochastic nature of polymer molecules
When Osmani joined the CRIPT project at MIT, the team was using MongoDB as the platform's primary data store. However, as a document database, MongoDB wasn't delivering the results the CRIPT team needed.
From there, the CRIPT team switched to a generic SQL database. It was easy for the team to use, but Osmani said they ended up storing a lot of empty rows and objects in the relational database because their data processing needs didn't match the relational data model.
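To make the mismatch concrete, here is a minimal sketch of how sparse measurement data looks in a fixed-column table versus a graph-style edge list. The sample names, property names, and values are hypothetical and are not taken from CRIPT's schema; the point is only that a wide table must reserve a cell for every property of every record, while an edge list stores only the facts that exist.

```python
# Hypothetical measurements: each sample reports only some properties.
MEASUREMENTS = {
    "sample-1": {"molar_mass": 52000.0, "dispersity": 1.8},
    "sample-2": {"glass_transition_temp": 373.0},
    "sample-3": {"molar_mass": 8100.0, "melting_temp": 410.0, "density": 0.95},
}


def as_wide_table(measurements):
    """Relational-style layout: one column per property, None where unmeasured."""
    columns = sorted({prop for props in measurements.values() for prop in props})
    rows = [
        [name] + [props.get(col) for col in columns]
        for name, props in measurements.items()
    ]
    return columns, rows


def as_edges(measurements):
    """Graph-style layout: one (node, predicate, value) edge per recorded fact."""
    return [
        (name, prop, value)
        for name, props in measurements.items()
        for prop, value in props.items()
    ]


columns, rows = as_wide_table(MEASUREMENTS)
# Skip the first cell of each row, which holds the sample name.
empty_cells = sum(cell is None for row in rows for cell in row[1:])
total_cells = len(rows) * len(columns)
edges = as_edges(MEASUREMENTS)

print(f"wide table: {empty_cells} of {total_cells} cells empty")
print(f"edge list: {len(edges)} facts, none empty")
```

In this toy data set, more than half of the table cells are placeholders. The more heterogeneous the experiments become, the wider and emptier the table gets, which is the kind of overhead the team describes.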
In order to meet the requirements of the researchers, institutions, and companies participating in CRIPT, the team needed a new database to power their platform.
The Solution
Osmani had experience with Dgraph before joining MIT.
“When we weren't getting the results we wanted, I started pushing for Dgraph from day one,” said Osmani. “Chemists think visually, so a graph database just made sense since it has a flexible and easy-to-visualize data model.”
In the second year of the project, the CRIPT team migrated to Dgraph. Switching to a graph database allowed them to build out a number of the needed tools and features of the CRIPT platform while also scaling to meet the demands of the growing number of institutions and enterprises joining the project.
Some of the features of the CRIPT platform that Dgraph enables include:
- A data model that lets chemists and other researchers structure their data in a way that's easy for fellow scientists to understand.
- BigSMILES line notation, which enables researchers to create a string representation of a polymer using a unique identifier.
- A polymer-specific collaboration tool that allows scientists to work simultaneously on the same dataset while recording who added or modified different aspects of the data.
- A scalable search engine where researchers can search for any polymer - in whole or in part - using the proprietary BigSMARTS query language, with results ranked by similarity and recency of relevant research papers.
- Python software development kits (SDKs) for technical researchers and data scientists, and a user-friendly web interface for researchers who don't have technical expertise.
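As a rough illustration of the collaboration feature above, the following sketch models a tiny in-memory graph whose nodes keep a history of who changed which property. It is not CRIPT's implementation and does not use Dgraph or the CRIPT SDK; the class names, node kinds, user names, and values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Change:
    """One recorded edit: who changed which property, and when."""
    user: str
    predicate: str
    old_value: object
    new_value: object
    at: datetime


@dataclass
class Node:
    uid: str
    kind: str
    properties: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def set(self, user, predicate, value):
        """Update one property and append an audit record for the edit."""
        old = self.properties.get(predicate)
        self.properties[predicate] = value
        self.history.append(
            Change(user, predicate, old, value, datetime.now(timezone.utc))
        )


class Graph:
    def __init__(self):
        self.nodes = {}
        self.edges = []  # (source uid, label, target uid)

    def add_node(self, uid, kind):
        self.nodes[uid] = Node(uid, kind)
        return self.nodes[uid]

    def link(self, source, label, target):
        self.edges.append((source, label, target))

    def neighbors(self, uid, label):
        """Return the uids reachable from `uid` over edges with this label."""
        return [t for s, l, t in self.edges if s == uid and l == label]


graph = Graph()
material = graph.add_node("m1", "Material")      # hypothetical node kinds
graph.add_node("e1", "Experiment")
graph.link("e1", "produced", "m1")

# Two hypothetical collaborators edit the same material record.
material.set("alice", "name", "polystyrene batch 7")
material.set("bob", "molar_mass", 52000.0)
material.set("alice", "molar_mass", 51500.0)

contributors = sorted({change.user for change in material.history})
print(contributors)
print(graph.neighbors("e1", "produced"))
```

Keeping the history on the node means the latest value and the trail of contributors can be read together. A production system would also need access control and conflict handling, which this sketch leaves out.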
For the CRIPT team, the results of these innovations - in terms of the volume of data and the resulting collaborations of their ecosystem - have been astounding.
The Results
CRIPT has drawn a lot of interest from academic institutions and commercial enterprises that want to join the platform to manage, organize, and share their data. For university researchers, CRIPT makes their data accessible to other polymer researchers after a paper is published, and having all of the data on the same platform benefits everyone.
Industry partners - including the likes of Dow Chemical, BASF, P&G, and DuPont - have already shared their non-proprietary datasets on the CRIPT platform. In addition to competitive prestige, these enterprises are keen to have their names listed as contributors to future research based on their data.
One enterprise team said the primary value of CRIPT for them was being ahead of the curve: As early adopters of the platform, they were in a better position to leverage the shared data more efficiently as the ecosystem of polymer researchers grew.
The end goal for many industry partners, said Osmani, is to run CRIPT locally against their own data, using property prediction, machine learning, and more to speed up material discovery and other innovations, sometimes by as much as 10-fold.
“But none of this collaborative research, whether academic or industrial, would be possible if we were trying to run the CRIPT platform on a legacy database model,” said Osmani. “Dgraph helped make all of this possible.”
Future Plans
In addition to spinning off from MIT, the CRIPT team has a number of future plans for the platform and polymer research ecosystem. Here are a few:
- Building out a machine learning suite for the platform
- Adding AI outlier analysis to the data validation process
- Having a BigSMILES string auto-generate a depiction of the molecule; conversely, if a user draws a molecule, having CRIPT auto-generate the BigSMILES string
- Creating advanced search features that allow users to filter based on temperature, repeat substructures, and more
“We have a lot of interest from researchers across both academia and industry,” said Osmani. “CRIPT is an ecosystem for all, and we're happy to serve as that connecting point.”