What is Data Literacy?

Ed.: Find more on information literacy instruction on Adam’s blog, and check out his ACRL presentation this spring in Portland. 

This March I’ll be presenting at the ACRL 2015 conference with Christine Murray (Bates College) on teaching data literacy in the library. To help me prepare and perhaps preview our discussion, I thought I’d post a few thoughts on the blog to get the juices flowing. Let’s begin with some definitions as they appear in both the library literature and the scholarship of statistics education in order to answer the question: what is data literacy?

In Libraryland, “data literacy” seems to be the most popular term (over statistical literacy, quantitative literacy, and numeracy), and consists of two aspects: information literacy and data management. From an information literacy perspective, the emphasis is on statistics, which are considered a special form of information but one that still falls under the information literacy umbrella. For example, Schield (2004:6) describes statistical literacy as the critical consumption of statistical information when used as evidence in arguments. Similarly, Stephenson and Caravello (2007) advocate for librarians to promote statistical literacy by helping learners locate and evaluate authoritative statistical sources, recalling Standards 2 and 3 of the 2000 ACRL Information Literacy Standards, as well as reference classics like the annual Statistical Abstract of the United States.

Figure: Literacy framework (data literacy, information literacy, statistical literacy), posted by Justin Grimes on Flickr

From the data management perspective, the emphasis is on data rather than statistics, and the focus is on the organizational skills needed to create, process, and preserve original data sets. Schield (2004:7), for his part, defines data literacy as the ability to obtain and manipulate data, but reserves these skills for certain fields of study such as business or the social sciences. Carlson et al. (2011:631), drawing on interviews with faculty and GIS students, emphasize the importance of the data management and curation skills required to “store, describe, organize, track, preserve, and interoperate data.”

There is plenty of literature on data management, a hot topic in Libraryland fueled by interest in e-science initiatives, new data requirements for federal grants, and the creation of institutional repositories. In my experience, though, discussion of data management is often divorced from statistical literacy, perhaps due to its focus on faculty and other experts rather than data novices. Calzada Prado and Marzal (2013) do attempt to unify the information literacy and data management aspects under one rubric, although their proposal for five data literacy standards is largely derivative of the soon-to-be-sunsetted 2000 ACRL Information Literacy Standards, which doesn’t bode well for their wider adoption.

Turning away from librarianship, we find that statisticians and statistics educators typically use the term “statistical literacy” to describe the knowledge, skills, and dispositions surrounding their field. One widely cited exposition of statistical literacy is that of Iddo Gal (2002:2-3), who identifies two interrelated components: the ability to interpret and critically evaluate statistical information, as well as the ability to discuss and communicate one’s understanding, opinions, and concerns regarding such statistical information. Gal (2002:4) further describes a model of interrelated knowledge elements and dispositions that together enable statistically literate behavior. Gal’s definition will no doubt look familiar to information literacy librarians, incorporating the evaluative and communicative aspects of information literacy along with the dispositions and affective components we find highlighted under the new ACRL Framework.

But what is the nature of statistical information, the object of Gal’s model for statistical literacy? It may be helpful to consider this in the terms put forth by George Cobb and David S. Moore (1997). In their oft-cited article on statistics pedagogy, they break down statistical analysis into three interrelated phases: data production, data analysis, and formal inference. Each of these phases produces statistical information requiring varying levels of contextual and mathematical knowledge.

Data production includes aspects of the research process related to designing a study, creating a data set, and preparing the data for short-term and long-term analysis. Viewed from the library, the data production phase is most closely associated with data management skills. Data analysis, next in Cobb and Moore’s schema, consists of the exploratory and descriptive phase of data-driven research. This includes examining the data set to discover trends or outliers, and using descriptive statistics to reduce large amounts of data into summary information such as measures of central tendency and variability (e.g., mean, median, mode, range, percentiles, standard deviation). Through this analysis, researchers can make hypotheses or predictions about phenomena revealed by the data. Finally, formal inference can be used to draw conclusions about a population from findings in sample data. Here we find the notorious formulas full of Greek letters behind procedures such as Student’s t-test, the chi-square test, ANOVA, and regression models. I’ll return to Cobb and Moore’s pedagogical advice in a future post.
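To make the data analysis phase a little more concrete, here is a minimal sketch of the descriptive statistics mentioned above, using Python’s built-in statistics module. The data set is hypothetical (made-up counts of daily reference desk questions), and the variable names are my own; it is meant only as an illustration, not as anything prescribed by Cobb and Moore.

```python
import statistics

# Hypothetical data: questions asked at a reference desk on 14 days.
# The value 40 is a deliberate outlier (say, the day before finals).
questions = [12, 15, 11, 15, 18, 40, 9, 14, 15, 13, 16, 12, 17, 15]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(questions),
    "median": statistics.median(questions),
    "mode": statistics.mode(questions),
    # Measures of variability
    "range": max(questions) - min(questions),
    "quartiles": statistics.quantiles(questions, n=4),  # 25th, 50th, 75th percentiles
    "stdev": statistics.stdev(questions),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up example the single outlier pulls the mean above the median, which is exactly the kind of pattern the exploratory phase is meant to surface.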

So back to the original question: what is data literacy?

I suggest librarians borrow heavily from statistics educators when trying to answer this question. To paraphrase Gal and apply his definition to Cobb and Moore’s three phases of statistical analysis, the simplest definition of data literacy is the ability to interpret, evaluate, and communicate statistical information. Central to this ability is an understanding of how statistical information is created, encompassing data production, data analysis, and formal inference. In other words, data literacy includes the ability to evaluate the modes of data production, including the underlying research design and means of sampling, and how these affect the possible findings. Data literacy also includes the ability to interpret the results of formal inference tests, including confidence intervals and the probability that findings are representative of a population rather than coincidental to the given sample. And finally, data literacy includes the ability to interpret and communicate about the descriptive statistics learners and citizens encounter every day, from unemployment rates to political polling.
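As a small illustration of what interpreting a confidence interval can look like with political polling, here is a rough sketch in Python. The poll numbers are hypothetical, the function name is my own, and the calculation assumes a simple random sample and uses the common normal-approximation interval for a proportion; real polls often involve weighting and other adjustments that this ignores.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion.

    z=1.96 corresponds to a roughly 95% confidence level.
    Assumes a simple random sample (an assumption for this sketch).
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical poll: 520 of 1,000 respondents favor Candidate A.
low, high = proportion_confidence_interval(520, 1000)
print(f"Estimated support: 52.0% (95% CI: {low:.1%} to {high:.1%})")
```

Because the interval in this made-up poll extends below 50 percent, a data literate reader would hesitate to conclude that the candidate holds a majority, even though the headline figure is 52 percent.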

And what about data management? Ultimately it belongs to the data production phase of Cobb and Moore’s schema, and is perhaps one aspect of data literacy that, as Schield intimated, can be reserved for the specialists. While the data literate person can identify and evaluate the soundness of a research design and data collection methods, perhaps only trained practitioners need the specialized skills to carry out a full-fledged project involving data curation and advanced tools. And in most instances, teaching these skills is beyond the purview of librarians. Stay tuned to Library Instruction Lagniappe for more on this and data literacy instruction in the library.


  • Calzada Prado, Javier and Miguel Ángel Marzal. 2013. “Incorporating Data Literacy into Information Literacy Programs: Core Competencies and Contents.” Libri: International Journal of Libraries & Information Services 63(2):123–34.
  • Carlson, Jacob, Michael Fosmire, C. C. Miller, and Megan Sapp Nelson. 2011. “Determining Data Information Literacy Needs: A Study of Students and Research Faculty.” portal: Libraries and the Academy 11(2):629–57.
  • Cobb, George W. and David S. Moore. 1997. “Mathematics, Statistics, and Teaching.” The American Mathematical Monthly 104(9):801–23.
  • Gal, Iddo. 2002. “Adults’ Statistical Literacy: Meanings, Components, Responsibilities.” International Statistical Review 70(1):1–25.
  • Schield, Milo. 2004. “Information Literacy, Statistical Literacy and Data Literacy.” IASSIST Quarterly 28(2):6–11.
  • Stephenson, Elizabeth and Patti Schifter Caravello. 2007. “Incorporating Data Literacy into Undergraduate Information Literacy Programs in the Social Sciences: A Pilot Project.” Reference Services Review 35(4):525–40.

3 Comments

  1. Milo Schield said:

    Nice summary. Your short definition of data literacy (the ability to interpret, evaluate, and communicate statistical information) seems more like a definition of statistical literacy. I’d expect a definition of data literacy to include something about data selection (use of AND/OR) and data summarization (e.g., Excel pivot tables).
    The Literacy graphic posted by Grimes looks a lot like Figure 1 in Schield (2004). The Schield (2004) paper is available in two formats:
    * With hyperlinks: http://www.iassistdata.org/downloads/iqvol282_3shields.pdf
    * Without hyperlinks: http://www.statlit.org/pdf/2004-Schield-IASSIST.pdf

    February 21, 2015
    • Thanks, Milo. I found your work very helpful in working out my understanding of data literacy, and I agree with your assessment that I’m leaning heavily into the statistical literacy side of the equation here. I’m working on a follow-up post to help clarify and elaborate on my position. I originally wrote this post on my information literacy instruction blog with a different audience in mind, so I look forward to continuing the conversation with folks better versed in advanced data services than I am.
      And yes, the graphic bears a strong resemblance to the one in Schield (2004). It was added by one of the blog editors.

      February 22, 2015
      • Milo Schield said:

        Defining “data literacy” requires a precise definition of “data.” “Data” is an ambiguous word. For computer systems engineers, data is electrical: analog (continuous voltages) or digital (discrete binary bits). For computer systems developers, data is code: ASCII or EBCDIC characters. For computer applications programmers, data are values of variables (columns) generally involving something common (rows). To a statistician, data is typically a category (M/F) or a measurement (ht, wt) for a single subject. To a social scientist, data are typically summary statistics (counts, totals, averages, rates or percents) for a group of subjects having a common characteristic.
        From my perspective, any definition of data literacy should be different from one’s definition of statistical literacy or information literacy. Good luck!

        February 24, 2015
