Data Science and the Art of Dogfooding

Why should library professionals learn about data?

Dogfooding and Data Science

“Eating your own dogfood,” or “dogfooding,” is the act of using the product you are designing. When the designer of a product actually uses the product in their everyday life, they hit the pain points that a user would experience, before the product ever goes to market. A good designer will keep improving the product until it meets her or his own standards, and only then is it fit for consumption.

“Data Science” is a phrase you may be more familiar with. Like “Big Data,” data science is a buzzword that’s reached critical mass. Suddenly, everyone is aware of it even if they aren’t exactly sure what “it” is. There are lots of rival definitions, but for the sake of simplicity let’s just say that the main tasks of a data scientist are finding data and using it to learn new things, usually with nifty visualizations that make their findings easier for others to understand.

A huge part of our work as librarians is in finding information, organizing it and making it more accessible to others. Whether we do that through collection development, tech services, reference, scholarly communication or digital scholarship, we are all tied to that core work with data and information.

While some may take this as an argument that librarians are actually a kind of data scientist, I think our job is more important than that. Good data science depends on good data, and the role of libraries is rapidly shifting toward a new goal: producing top-quality web data. Think about it: libraries have always been at the epicenter of the art of describing things. We even standardized the way we describe things so other libraries could read our descriptions. The entire history of libraries has been leading up to the semantic web, where things are described so consistently that even machines can read and make sense of these descriptions. This is no trivial task. But while Web 3.0 is still far off on the horizon, libraries are working with the rest of the world to lay the foundation of the semantic web with technology like BIBFRAME and linked data.

What does this have to do with dogfooding and data science? While it’s not crucial for most librarians to learn data science, we do our work in a world of data, information, and metadata. A grounding in the concepts and methods of data science could be incredibly helpful to many librarians engaged in producing quality data for the web. Learning about data science expands the way we think about data and its uses by patrons and consumers, which in turn expands the way we think about our own library data. The librarian who can put on her data scientist hat and actually use the data she is producing is the ultimate dogfooder.

How Do I Get Involved?

If you would like to learn more about data science, there has never been a better time. Take a look at the data science specialization at Coursera, a free 9-class curriculum that is aimed at beginners and repeats monthly. You can also join the Data Science Study Group on Google Groups. We’re a friendly bunch of librarians discussing data science and how it applies to our work. We’re just starting the course on R now, but you can take any of the data science courses at any time and chime in with your questions and comments.

The possibilities are endless for learning about data science and applying it to your work as a librarian (data literacy! assessment! charts that show you deserve a raise!). As our world becomes more and more data-driven, data skills will only increase in value, both for us and for everyone we work with.

~~~

[Ed. note: for more thoughts on work and useful apps/tools, see Bryan’s recent profile on the ACRL TechConnect blog.]

2 Comments

  1. Sally said:

    Thanks for a thoughtful post, Bryan. I 100% agree with your thoughts on dogfooding and the need for librarians to “eat their own dog food” when it comes to working with data. We need to know what we’re talking about and we only get there by getting our hands dirty and honing our skills.

    I caution, though, against saying that our role is more important than that of the data scientist. I think it’s better to say that we have different roles. I also think it’s fair to say that our roles overlap in many areas. I know an awful lot of data scientists who can teach me a thing or two about good data management. Perhaps this is because they’ve been dogfooding longer than I have. It’s good to remember that. 🙂

    January 9, 2015
    • Hi Sally! I completely agree that librarians are not more important than data scientists, although I can see how one might interpret what I said that way. What I really meant was that looking at traditional librarianship as a kind of data science is possible (librarians do gather information, organize and analyze it to some extent), but it misses the actual value of librarianship to see it that way. The primary role of libraries, at least in my opinion, is that of data provider as opposed to consumer. Librarianship isn’t inherently data science so much as it should use the concepts and methods of data science to evaluate its own products. We need to periodically taste our own kibble to make sure it’s still good.

      January 9, 2015