The Data-Driven Discovery Initiative (DDD) is beginning a multi-month external evaluation later this month, and the foundation will share the report and our responses publicly. We hire professional evaluation consultants to conduct these external evaluations, and have done so for many years for all of our major initiatives. We use their reports to gain independent assessments of progress, challenges, and opportunities, and then develop revised strategic plans accordingly.
This is the first of a two-part post and will describe the original motivations for funding DDD. In the next post, I will describe the research questions we are focusing on for this “mid-term” external evaluation, and how these questions are linked with our funding choices — or what foundation folk often call “strategies” or “portfolios”.
In the beginning
The foundation’s history with data-intensive science started with a series of grants made by former foundation employees John Marchioni, Steve Cheng, Ed Yoon, Jim Omura, and sometimes me (in those early years I weighed in as an IT consultant on some of these projects). These grants were meant to help bridge conservation science with the information technologies that were modern at the time. Marchioni coined the term Eco-Intelligence, a clever play on business intelligence (what some call “proto” data science) combined with ecological informatics. Building on this work, Jim Omura, David Kingsbury, and I began funding computer clusters, sensor networks, database projects, and other heavy-duty technology infrastructure for data-intensive science efforts. These one-off projects were successful, but they were always limited in scope, as we simply couldn’t support every new lab that began to have these data-intensive needs. Around 2008, David Kingsbury brought me over to the science team full time to begin looking at how we might build a cohesive effort to move things forward.
If you think back to 2008, “big data” was still a relatively new term, as was “data deluge” (with pictures of the iconic Japanese tsunami print at the front of everyone’s slide deck). Data science wasn’t sexy yet, and informatics-everything was on the ascendant. We initially looked at environmental science data, partly because we have a sister program at the foundation focused on Environmental Conservation. We quickly realized that the challenges of environmental science data were shared by other disciplines, and we broadened our scope to all scientific data.
Shortly after I started looking into this, the 2009 Great Recession happened, which curtailed new Moore Foundation investments for a while, but also provided time and space for us to dig into the issues surrounding the data deluge in more detail. By 2011 Vicky Chandler was at the helm of the Science Program, the foundation had renewed purchasing power, and there was a lot more research on a suite of new phenomena sweeping academic research: the popularity of the Fourth Paradigm concept, artificial intelligence emerging from its hiatus, and the idea of data-driven research to name a few. We started to see a pattern in groups that were leveraging the data deluge for new scientific discoveries.
The rise of the academic data scientist, aka the data-driven researcher
At the center of any lab, department, or other group working with massive, complex, and/or fast-moving data, there is an individual or tight-knit group that plays an active role in harnessing the information for discovery. These multi-/inter-/trans-disciplinary people and teams are the driving force behind data-driven research, and the tools, methods, and practices of these people are the bedrock for all of modern data science. They are not just computer scientists applying their bag of tricks, though some have deep formal training and accomplishments in CS. They aren’t just natural/social scientists with only enough CS and stats training to be dangerous. They also aren’t just statisticians with a cursory knowledge of domain science. Somewhere in the middle of that now-famous Venn diagram lie data-driven researchers, the key to success for leveraging modern scientific data. I wrote a bit about career paths for these folks in another blog post.
These data scientists are developing new practices (i.e., methods) that others can use to harness their own data. We think the people and practice of data-driven research need to be recognized as first-class participants in the scientific enterprise. We think they should be rewarded and recognized in the same way as the tools and people in the deep domain silos. And so we set out in 2012 to do just that: first, to mark a path towards how institutions can support data-driven researchers; second, to shine a spotlight on the kinds of people who exemplify this new kind of research; and finally, to amplify and stimulate the kinds of practices (i.e., tools, techniques, and training) needed to help everyone leverage their data for new discoveries.
Part II will dive into why we think these three “strategies” or approaches are a good start, and also what we, under our third champion for data-intensive science, Dr. Bob Kirshner, hope to learn about our portfolio in the coming months.
*thanks to Carly Strasser for reviewing and suggesting changes