Earlier this month, researchers from across industry, academia, and Department of Energy (DOE) national laboratories gathered in Santa Fe to discuss data-intensive applied science at CoDA (Conference on Data Analysis) 2016.
Co-sponsored by the Center for Nonlinear Studies and the Information Science and Technology Institute at Los Alamos National Laboratory, the conference invited speakers to explore six distinct themes over three days:
- Power Grid Data
- Subsurface Modeling
- Cyber Security
- Data Analysis at Exascale
- Multisource Data
- Really Expensive Data
For instance, Nina Lanza (Los Alamos National Laboratory) discussed her use of the ChemCam onboard Curiosity to analyze the composition of Mars’ surface (images here). Ruby Leung (Pacific Northwest National Laboratory) discussed the challenges of modeling climate change using multisource data sets. And, with humor, Earl Lawrence (Los Alamos National Laboratory) reminded us of the very dangerous ramifications of unsecured networks for cyber-physical security issues in the 21st century.
Data management was a continuing concern for DOE presenters, whether working with multiple CSV files or performing data analysis at exascale.
Such data management challenges will sound familiar to the data librarian: the need for data citation standards, standardized metadata within and across scientific disciplines, and access to and preservation of both ephemeral and big data sets. These issues, of course, intersect with the need for transparency, reproducibility, and ethical scientific research. Although you may not realize it, opportunities abound for data librarians across the DOE national laboratories.
At the conference banquet, Rayid Ghani (program director of Data Science for Social Good at the University of Chicago) discussed the intersection between data science, public policy, and social good. Perhaps you remember Rayid as the Chief Scientist for the 2012 Obama campaign.
Blending computational analysis and public policy, the Data Science for Social Good Fellowships bring together data scientists, nonprofits, and government on social impact projects such as building early warning systems to prevent negative police-public interactions and reducing blight in Cincinnati neighborhoods.
Bluegrass, Blight, and the Future of Cities
How a fiddler and an astrophysicist introduced predictive analytics to Cincinnati
Listening to Rayid speak passionately about the need for data scientists to define a problem from a myriad of perspectives before coding a solution made me reflect on an initiative at the Graduate School of Library and Information Science (GSLIS) at the University of Illinois. The Community Informatics and Socio-technical Data Analytics programs at GSLIS focus on how data can both empower and disenfranchise communities. (The geographic proximity of GSLIS and the University of Chicago also makes for an appealing trajectory for a budding data librarian.)
Perhaps the best takeaway from the conference was Kary Myers minting June 13th as Statistics and Beer Day in honor of William Sealy Gosset’s birthday. As a Guinness employee, Gosset applied statistics to select the best barley for brewing and is remembered by his pseudonym, Student, in Student’s t-distribution.
I fully support Kary’s call to action. Let us unite, as data librarians, on June 13th and hoist a drink and a data problem in unison!
If you’d like to learn more, Statistical Analysis and Data Mining will publish a special journal issue featuring work presented at CoDA 2016. Of course, the submission deadline for presenters is June 13th, 2016.
Joshua Finnell (@joshuafinnell) is a librarian at Los Alamos National Laboratory working on issues related to scholarly communication and data management. His work has appeared in New Library World, Library Philosophy and Practice, and Digital Library Perspectives. He is the recipient of the 2015 DLF+FORCE11 Cross-Pollinator Award. As a corollary, you can hoist a drink with him in person at the FORCE2016 Conference in Portland.