Qualitative Data at the ICPSR Social Science Data Archive

Ed: There’s a lot out there on quantitative data management, but qualitative researchers may find archiving and sharing their data harder to navigate. Last September, I sat down with Justin Noble, the acquisitions manager at ICPSR, to talk about the major social science data archive in America. We discussed how his archive handles qualitative data, and I’ve checked in with him to confirm that this is still accurate as of March 2016. But because archives are always in development, do see the ICPSR website for the latest guidelines!

Celia: So what’s ICPSR collecting now?

Justin Noble: We have a collection development policy for data that’s especially high priority… right now we are seeking more datasets on sexual orientation, bullying, social media, immigration, well-being, longitudinal data, and international data.

Are there topics which people often search for, that you don’t have?

We did a detailed review of search behavior on our website—over 500,000 searches in 2014. We looked at the most popular terms, as well as those that had a high search exit rate, where the user searched and then closed the browser or left our site. Key areas in which we see a lot of interest and want more studies include:

  • Facebook, Twitter, and other social media usage and how it affects relationships, especially surveys about social media.
  • International data, especially comparative data or data on Asian countries.

What about qualitative data?

We certainly accept qualitative data. We have 65 studies with some type of qualitative data component, and we recently deployed a new feature on our website to filter search results by a “Type of Analysis” facet, which allows users to identify studies with quantitative, qualitative, and/or GIS data. [ed.: before submitting qualitative data to ICPSR, researchers should contact deposit@icpsr.umich.edu].

Are your qualitative data mostly anonymous surveys, or do you have full interview texts as well?

Pure qualitative studies that are just open ended? That’s pretty small, especially compared to the total number of studies in ICPSR’s collection of holdings. It’s more common to see open-ended string variables or other things that could be analyzed qualitatively, but within quantitative data sets.

How do you decide what researchers can archive at ICPSR?

If the study fits within our scope of acceptable data, our staff will do a quality review, check the data, and decide whether it should be restricted. We have to make decisions about which studies we accept or reject, and which we process right away or leave in our queue. When collections come in, whether quantitative or qualitative, we assess them to decide what amount of resources we’re going to devote to supporting that data.

Our criteria for involvement in data curation include the quality of the files, whether the topic is high priority, the reputation of the PI and the study, whether it’s highly cited, whether there is a lot of potential for secondary analysis, and whether much of the value has already been used up or there is still something that secondary users might be able to publish. We would hold discussions about the merits of the study based on these criteria.

Additional information about what data are in scope and out of scope, along with the selection criteria for our two levels of curation services, is also available in the ICPSR Collection Development Policy.

What if it doesn’t fit within scope? What other options do you have for depositing data at ICPSR?

We have two options for data deposit. The first is “Curated ICPSR,” where ICPSR staff perform member-funded curation and enhancement on data deposits, as long as they are within scope for us and are considered to be valuable (either in the present or future) to the membership of ICPSR. This involves the review, enhancement, and quality checking of the data for usability and findability. There is no charge for ICPSR members to access this data curated using member funds, but nonmembers would have to pay a fee to access this enhanced collection of data.

The second option is “openICPSR,” an open-access repository for data that is within scope and that researchers want to make available to the general public, not just to ICPSR members. There is no fee for members to deposit, but it costs $600 for non-members to deposit… This collection is not curated, so any researcher with data that fits our Collection Development Policy can self-deposit and publish data in openICPSR by doing the work to upload files and write metadata themselves. We release this as-is.

Do people have to anonymize qualitative data before they deposit at ICPSR?

Yes, it’s always the researcher’s responsibility to anonymize. But even if qualitative data are anonymized, they are often still restricted access, because all the contextual information could allow someone to create a profile and figure out who that individual is. The confidentiality of the research subjects is the number one priority from ICPSR’s perspective.

So are there multiple levels of restriction for qualitative data at ICPSR?

Our two options are “open” (downloadable) or “restricted.” For restricted data, the method for dissemination is typically the ICPSR virtual data enclave. The user connects to a remote machine, and never gets access to the full raw data on their own network or computer. They do their analyses virtually. But that’s mostly for quantitative data. We don’t have many qualitative studies set up for that.

Why can’t ICPSR offer researchers remote access to restricted qualitative data?

The main barrier is the licensing of the software programs. For qualitative data, we investigated NVivo, MAXQDA, and ATLAS.ti. But because users do their analysis online by connecting to a virtual machine, we have to overcome the software licensing fees. With quantitative data, vendors are reasonable and we build the licensing costs into the fees a researcher pays for restricted access, say $350 for one user for one year. But the pricing models for qualitative data analysis… it’s just too big of a barrier right now.

So there’s currently no way to analyze qualitative data via remote access?

Remote access through the ICPSR virtual data enclave may be an option for accessing restricted qualitative data. However, qualitative software programs are not currently available virtually, which limits the types of analysis that can be performed in the virtual environment.

Even with IRB approval, you couldn’t download restricted qualitative data?

Access to restricted qualitative studies at ICPSR varies by project and study, and could include secure download or visiting a physical data enclave. We would need IRB documentation from the researcher, as well as an approved data security plan. Researchers who want to disseminate qualitative data through ICPSR should contact us before data deposit to discuss sharing options.

The ICPSR topical archive with the most restricted qualitative data collections is the National Archive of Criminal Justice Data. That data can be accessed by approved users via secure download… for NACJD as well as other sponsored projects, the costs of managing applications for restricted data are supported by federal agencies. Once users have approval, they can access the data according to the data’s restricted use agreement. Restricted data housed in the ICPSR membership archive can also be provided by secure download, once the data protection plan is approved.

So do you ever refer qualitative data to other archives?

If it’s not a good fit, we’re still here to help people and refer them to another archive… I’d refer people to the Qualitative Data Repository at Syracuse… if they have a big qualitative data project and we don’t have the resources to dedicate to processing that.

Have you connected with NSF over this issue? I know that researchers like Lisa Cliggett have been looking at ways to share qualitative data.

We review their memorandums, and ICPSR has been awarded NSF and foundation grants for data curation and archiving, but we don’t currently have a contract with NSF to curate specific NSF-funded data collections in the social and behavioral sciences.

Historically, the ICPSR acquisitions team does go after data resulting from NSF and NIH grants. We created a database with thousands of records, and did a lot of outreach, including email blasts to those researchers. We also looked at how frequently data from these grants were archived, and whether the researchers even have the data anymore. My colleague Amy Pienta has done some papers on this.

I’ve referenced the criminal justice archive; I’ve been at ICPSR for eight years and the first seven were in that department. You can find criminal justice data in ICPSR at NACJD. They have guidelines for depositing qualitative data and dealing with sensitive content, and that applies more generally to any researchers who are archiving qualitative data—on issues like pseudonyms, confidentiality, coding.

Can you help researchers with data management in any other ways?

For information about best practices throughout the data lifecycle, we refer data producers to the ICPSR Guide to Social Science Data Preparation and Archiving.

We also offer letters of support for researchers applying for NSF or NIH grants. Now, we’re trying to assist researchers with the Data Management Plan process. Researchers can contact us at ICPSR and if it’s within scope, we’ll write a letter of support saying that we’ll archive it in openICPSR or in ICPSR.

We also encourage researchers to do professional data curation with us. If they write us into their grant, we can use that funding to have our staff do full processing and make the resulting data open access to the public, instead of to ICPSR members only.

That seems great—a very sustainable funding model for data curation!

The problem is that the cycle takes so long. We’re writing letters of support now, but we wouldn’t get the data until a couple years down the road at the soonest.

Another question—when researchers are writing consent forms, should they ask for permission to archive interview data?

Right now, we just rely on the researchers to make sure their research has consent. But if they only asked participants to consent to being studied by the original research team—that’s what we’re worried about. Or if they said they’d only release data in the aggregate—that terminology may preclude someone from archiving their qualitative data. We’re aware there is a potential need out there, and we do talk about consent and maintaining confidentiality on our website, what to include and not include when applying for IRB approval. ICPSR staff are currently evaluating ways to better educate the research community on these types of issues.

Is there anything else that researchers should be aware of when applying for IRB on these projects?

One thing we talked about in a recent meeting is that people were writing on the consent forms that they were going to archive the data at a particular archive. That was something that we didn’t think was best practice. We didn’t want them to write on the consent form itself that they’d archive at ICPSR. The consent form should use generic language.

Thank you!

Thank you.
