On Data: Perspectives From An Incoming MLIS Candidate & A Recent Graduate

This post is the brainchild of Celia Emmelhainz, who, after asking me if I’d like to produce a biweekly blog roundup for this site, asked if I would be interested in sharing my own and other students’ perspectives on data librarianship. I, of course, jumped at the chance, and after tweeting and emailing students, I settled on a post that would feature my own thoughts as an incoming MLIS candidate at the University of Washington, and those of outgoing MLIS candidate (and now Information Literacy Instructor) Sarah Crissinger, from the University of Illinois Urbana-Champaign.

While Sarah’s comments will appear as a formal interview, my own will appear as a standard narrative. As we are at separate points in our education, with different backgrounds, our input is quite different... and yet similar. Let’s begin:

I don’t believe I had ever thought about data and librarianship until I attended a mock class of Professor Jevin West’s “Intro to Big Data” course at the iSchool Preview event in November 2014. In that short, cursory introduction, West opened my mind to an important role for librarians in the 21st century – data literacy. While others around me discussed how to use big data to design new products, or sell the data as a product itself, my mind instinctively went to two places: education and research.

To explain why I intuitively shied away from turning data into money and gravitated instead toward turning data into literacy and action, let me elaborate on the circuitous path that life took me on to get to, well, here.

As an undergraduate I took a paraprofessional role in a community college library that I grew in and excelled at for three and a half satisfying years. I knew when I went to university that I wanted a role in education; what I didn’t know was that librarianship in higher education would be the role for me. That said, despite those thoughts, I ditched the idea of library school in the final year of my undergraduate career and was instead seduced into the world of communications. I had taken a public relations internship with an urban public library system the summer after my junior year and six months later found myself accepted into the NASA Langley Aerospace Research Student Scholars (LARSS) program, in another public relations internship role, this time for the Department of Education & Outreach at the National Institute of Aerospace. This, as it turns out, was another role in which I could serve the world of higher ed. My NASA scholarship evolved into a role as a technical writer and then a broader communications role. Those years were truly a seminal time of my life. I was given access to research and researchers that I would otherwise, I am sure, have known nothing about. It was during this time that I began to understand how much of our world is built on the back of data. Raw data. Data that can be ugly and unwieldy, yet wildly important. As a communicator working among researchers I understood both how data moved research forward and how data impacts what we see online, how we communicate, and how companies (and networks) categorize us and target information for us. As I became increasingly aware of how technology would connect more people, and more devices, my interest in life as a communicator diminished (although it is certainly an exciting field to be in) and I once again returned to my interest in education, specifically librarianship.

And now here I am, once again a library professional in a community college setting, soon to start an MLIS program.

I think we live at an exciting turning point in human history. With advancements in sensors and IoT technologies, we will have more information about our environment than ever, along with new information pitfalls. As more and more people come online, the potential for innovation could be (and has been) exponential. However, it’s not enough to be connected to information; a person has to know what to do with it, why it exists, who created it, and the merit it may or may not possess. There are, of course, also ethical concerns regarding privacy that are complicated by how efficiently modern technologies can identify not only the producer of a work, or data set, but where and when it was created/collected. In a blog post I put out on the future of academic libraries, I delve into why those very things will make databrarians and data literacy commonplace in the immediate future.

For now, for me, I believe data literacy can be as complex or as simple as a patron needs it to be. In my current role, I am given opportunities to provide reference support and I am happy to say I have had a few encounters where data literacy has played a central role. For “my students,” this often translates to encounters where we review charts of statistical data and discuss what the numbers mean, how they relate to one another, and the merit behind them and the organization that has published them. These often seem to be moments of panic for students. I think the panic comes from a number of sources: 1) Data can be intimidating, mostly because numbers are often intimidating; 2) The information comes to them in a large web-based spreadsheet, a PDF, or an Excel file (leading to the question: should I even open this?!); and 3) It’s often from a seemingly ominous source from which they are not sure they should be downloading spreadsheets of information. Students sometimes already suffer from a level of information/library anxiety; as the world of data becomes ever more accessible, we will need to ease that anxiety. It is my belief that, at the community college level, all of those concerns can, on a broad level, be assuaged by including data literacy in information literacy instruction.

In the meantime, I look forward to growing within my MLIS program at UW. As I wait for the program to begin I have begun to dabble in R, and have toyed a bit with data visualization tools. While in the program I intend to take advantage of courses on information management and information technology, and to build a relationship with the data services team at the UW libraries (this is your warning!). All the while, I intend to continue to be a point of reference for data literacy for my current students and, hopefully, to impact how the library I work in approaches these topics in the future.

Those are my thoughts. Now for my interview with Sarah Crissinger, recent MLIS graduate from the iSchool at the University of Illinois Urbana-Champaign, and newly minted Information Literacy Librarian at Davidson College in Davidson, North Carolina.

What are your thoughts on data and its role in the 21st century information environment?

Data is becoming a more integral part of the information environment and, I would add, a more important part of our profession. For better or worse, data is becoming a major thread in the world we live in. It is a huge player in commercial and governmental decisions. Quantitative data is, in many disciplines, the backbone of research and securing grants. These new uses of data—many of which are used to support or preclude very important societal movements and innovations—are creating new ethical issues in areas like confidentiality, privacy, consent, and de-identification (for more about this, see one of my professor’s awesome publications on striking a balance between progress and protection in the context of data here).

Thus, I see librarians playing several roles in our new information environment. The obvious, practical role is to help the academy adjust to funding changes, navigate data management plans, and learn the best practices for reproducibility. But there is also an instructional/advocacy role for librarians to play in this conversation. We can teach students about privacy issues, decision-making processes, and how to be skeptical of certain data collection practices (more on this soon). We have a great amount of expertise in metadata, documentation, searching, and evaluation. But, I believe, we are also in a position to utilize our instructional design and information literacy skills to teach data literacy in a critical, active way.

Going into your MLIS program, were you concerned at all with data literacy or data management?

Actually, no. I came into UIUC’s program hoping to learn more about reference and instruction. I used my first year to really focus on core classes on topics like academic libraries, instruction, reference, and technology.

During the fall of my second year, I took a course on Scholarly Communication. Everything changed! I found a topic area that really honed in on all of the things I had cared about all along: knowledge production, assessing the value of knowledge, the application of knowledge to society and community, and crowdsourcing all of these processes to make the world a better place. It was in that course that I learned about DMPs, IRs, OA, and other topics. I used the spring semester of my final year to take two deep dives into topics that interested me.

Do you feel that your MLIS program at UIUC provided options for exploring topics related to data?

Yes, fortunately!

If so, can you tell me about the courses you may have taken, or had the option to take?

The two courses that allowed me to take a deeper dive into data were entitled “Metadata in Theory and Practice” and “Scientific Data Policy Seminar.” These courses were almost polar opposites of each other but they both covered very important aspects of working with data. The metadata course gave me more insight into what a DMP would actually look like and why metadata is an important component of that. We covered a range of metadata schemas, including CDWAlite, Simple DC, METS, LOM, PREMIS, and others. I think that working more closely with these schemas has enabled me to be more conversant with more technical audiences about DMPs, IRs, and other scholarly communication topics.

The data policy seminar was more focused on discussion and exploration. My professor, Victoria Stodden, is a new faculty member at GSLIS. Her work focuses on reproducibility and transparency in the computational sciences. Thus, she cares deeply not only about sharing data but also the community sharing and building upon software and code. Her seminar was one of the best classes I have taken at GSLIS. We explored Congress documents, NSF policies, IP and copyright issues, funding structures, conference proceedings from transparency experts, and data, code, and software sharing mechanisms. We spent the course looking at the major players of the system (publishers, repositories, tenure committees, funding agencies, etc.), how their systems functioned and how they were rewarded, and then how they could potentially contribute to a culture of reproducibility. I have no training in scientific research and I think that one of the important lessons the course taught me is that understanding faculty workflows, reward systems, and academic hierarchies is just as important as telling faculty or administrators about reproducibility and why it is important. Both parties have to learn from each other.

I should also mention that GSLIS has a lot of data courses I didn’t have the opportunity to take. We also have two rigorous specializations, one in socio-technical data analytics (SODA) and one in data curation. I probably would have taken advantage of one of these if I had come in hoping to do something more advanced with data.

As you seek postgraduate professional roles, would you feel comfortable in a role that requires significant work with data?

Yes and no. I recently took an instruction position and I interviewed for positions in the realm of scholarly communication. I would absolutely be comfortable helping faculty walk through DMPs or walking them through metadata creation for the IR. I think I’m also pretty fluent in some of the language behind reproducibility and openness.

I’m not sure, however, I would be completely comfortable taking the reins on cleaning and analyzing data. I think most data visualization tools are approachable and I could figure them out with enough time. But I haven’t had to collect data, share inputs and outputs, write code or scripts to clean or manipulate data, share documentation with others, etc. I hope to develop this skillset further as I begin to do my own research. I also hope to continue to learn more tools that could be helpful in this area, like R and SPSS. I think, though, that being able to converse with different stakeholders about data and data practices while thinking critically about data analysis, curation, manipulation, etc. is just as (if not more) important as knowing a tool that might be obsolete in the next five years.

Your ePortfolio discusses a desire to serve underserved communities. Do you see data literacy as part of serving those communities? If so, elaborate.

 This is a really great question! I hope to write more extensively about this soon. But one of the modules in Dr. Stodden’s class covered ethics and privacy. There are so many examples I could discuss here—especially in the context of social justice and protecting all people—but I’d like to highlight just one because I think it’s approachable and could definitely be used to talk to undergraduate audiences about critical data practices.

As reproducibility, community, and crowdsourcing become more commonplace, our research workflows and habits will undoubtedly change. This means a lot of things for the academy: IRBs might need to redefine their protocols, policies might need to change to address commercial interests, research subjects might need more extensive informational sessions or counseling. I definitely don’t have all of the answers here! But Apple recently created a “research kit” where medical researchers can create apps so that iPhone users anywhere can participate in their studies. Apple’s explanation for developing the tool is as follows:

“ResearchKit gives the scientific community access to a diverse, global population and more ways to collect data than ever before…ResearchKit also makes it easier to recruit participants for large-scale studies, accessing a broad cross-section of the population—not just those within driving distance of an institution. Study participants can complete tasks or submit surveys right from the app, so researchers spend less time on paperwork and more time analyzing data. ResearchKit also enables researchers to present an interactive informed consent process” (Apple Introduces ResearchKit, Giving Medical Researchers the Tools to Revolutionize Medical Studies)

Sounds great, right? Maybe at face value. Don’t get me wrong: this work definitely has the potential to put medical research on the fast track and I think that finding cures is something we all want to contribute to. I just think there are still a lot of issues we, as library and information professionals, should consider. First and foremost, by crowdsourcing medical research to only smartphone users (assuming that Droid might soon have an equivalent), we leave a huge portion of the population out of this research, especially if it becomes a commonplace model for medical research. These populations might have specific health issues that just won’t be found, researched, or cured through this lens. In short, I’m guessing that we will probably just continue to learn more about the health issues of white, affluent males. (Note: This will just contribute to the existing trend of medical studies being focused on this specific population. To learn more, simply search for “medical research” and white and/or male. The results are pretty astounding.)

Moreover, this type of research is in some ways breaking the protocols and boundaries we have in place. It is still unclear whether a participant’s previous data is wiped if they decide to end the study (it’s my understanding that it would be in a regular medical study). Also, how is information de-identified, especially with GPS coordinates almost definitely being collected? How is privacy maintained? How rigorous is consent and how different is consent through a phone vs. talking to someone in person?

My point is that these advancements in research and data collection are exciting but also dangerous in some ways. It is not my assertion that we should give up on them or rule them out as an option for improving our world. I instead think that we have a responsibility to teach students about how these processes work and how they can think critically to help improve them so that we can have more equitable systems. I find this work fascinating and that’s where I hope to focus my interest in both data and underserved populations. Be on the lookout for a #critlib discussion I am moderating on this topic as well as a publication on critical open practices.


