NSF Backs Clowder Open-Source Data Management Tool With $5M

2/28/2019 By the National Center for Supercomputing Applications

Clowder, designed to manage large collections of data now essential to science, is a key part of the 4CeeD framework developed by Illinois CS Professor Klara Nahrstedt.


The National Science Foundation has awarded $5 million to extend the community of researchers who use Clowder, an open-source data management tool that was developed at the National Center for Supercomputing Applications.


Clowder was designed to help preserve, share, navigate, and reuse the large collections of data that are now essential to scientific discovery, and it is a key part of the 4CeeD framework developed by Illinois Computer Science Professor Klara Nahrstedt. The data navigation needs that Clowder addresses are also important in the growing number of research areas where data and tools must span multiple domains.

“Clowder was built from common needs across different research communities,” said Kenton McHenry, a principal research scientist at NCSA who is a principal investigator on the project along with Nahrstedt and Civil and Environmental Engineering Professor Praveen Kumar. NCSA Senior Research Programmer Luigi Marini is Clowder’s software architect.

Clowder has already had a major impact on materials and semiconductor research through the 4CeeD system, whose development was funded by NSF and led by Nahrstedt. She is also director of the Coordinated Science Laboratory.

With NSF support, Kumar will lead an effort to work with nine NSF Critical Zone Observatories across the United States to help organize their data and demonstrate the system's applicability to their interdisciplinary work. Researchers at these observatories study what they call the critical zone: the surface of the Earth, where atmosphere, soil, water, and ecosystems meet.

“We want to use this system to do scientific investigation using this cross-observatory data, and the purpose is to make sure that the systems put in place are designed to support valuable science investigation rather than being arbitrarily stacked together,” Kumar said. “The scientific investigation may create requirements for the organization of the system, the architecture of the system.”

Working with the large datasets common to all kinds of research depends on metadata for searching and sorting. Clowder helps automate part of that often laborious and tedious work.

Clowder also makes it easier for users to access advanced high-performance computing (HPC) and cloud resources and analysis tools. Its auto-curation feature lets researchers upload data through a Dropbox-like interface and trigger complex analysis tools that run in the background.

Materials scientists and semiconductor fabrication researchers will continue to use 4CeeD to capture, curate, coordinate, and distribute data from scientific instruments, such as microscopes, to private cloud infrastructure.

“Discovering new materials can take decades, in part due to the time it takes to conduct research, thanks to the loss of knowledge that occurs when vital information is tossed out or is inaccessible,” Nahrstedt said. “4CeeD enables researchers to capture, curate, analyze, and correlate instrument data during experiments in real-time, search for experimental data with specific instrument parameters and receive insights into their own work, a task that would not be possible without the power of the Clowder system.”

The Clowder project will be conducted in conjunction with the Coordinated Science Laboratory and the Department of Civil and Environmental Engineering.

Before the NSF grant, Clowder had never been funded on its own.

“All of us are looking forward to building this roadmap for future software needs from the research community by bringing together these partners,” McHenry said.


This story was published February 28, 2019.