In this post, I share my notes on Caroline Kuo’s presentation on data security at the Seventh Annual eScience Symposium, which was attended by New England librarians on April 9, 2015. Better yet, do yourself a favor and watch Kuo’s full presentation here:
I knowww… the last thing you want to read is someone else’s notes on a presentation! Yet I’m sharing my notes here, because data security is an area that continually comes up in my conversations with anthropologists, ethnographers, and any qualitative researchers who work in remote areas with data that may be contextual, sensitive, or able to identify research participants. This gives me an easy URL to share my notes with others… and if this is interesting, you can watch Kuo’s whole presentation, above!
Notes from the presentation: Kuo’s research is on international public health, and specifically on data collection around HIV in epidemic/deprived areas in South Africa, especially among children who are orphaned or whose parents are ill. This is a vulnerable population, so Kuo’s standards of data collection can be a guide for the rest of us.
Kuo’s key challenge was developing ‘robust data systems’ that could handle ‘audio, cross-sectional and longitudinal data,’ including sensitive data about children and multiple family members moving across regions over time.
Kuo: The Challenges of Digital Data Collection
In the early stages, Kuo’s research team stuck with paper collection of records, as carrying electronics wasn’t safe in a high-theft region. Yet to ensure their data wasn’t lost or damaged on paper, they eventually switched to digital data collection.
The team also wanted to invest in local communities with 50% unemployment, rather than fly in a large foreign research team. This meant they needed to build capacity in local research teams, creating clear standards and documentation for non-professionals to assist with research, while also finding ways to securely back up and transmit data to secure servers.
Other challenges included:
- International research over many time zones and sites
- Working in 3+ countries with 25 different people
- Transmitting data securely (VPN didn’t work b/c of slow internet speed)
- Huge audio and visual files added to the challenge of transmitting data
They stored basic project files on Dropbox, but put no human subjects data in Dropbox or other cloud storage. When they did upload non-identifying data, they sent encrypted files to an encrypted cloud that sat “like a fence” on top of Dropbox, double-encrypting the upload and download of large, compressed files. They also installed software on local machines that let them remotely wipe all data in case of loss. Again, much of this was still not human subjects data, which was not stored locally; just other project files, questionnaire texts, and documentation!
Mobile Data Collection
Kuo’s team was afraid that mobile data collection would put the team and participants in danger through theft, violence, or data exposure. Yet they needed to collect sensitive behavioral data, even asking explicit questions about sexuality that people might not want to answer in a face-to-face conversation.
“People ask, where’s the private interview room? But that’s not the context – South African people are used to being together all the time. So here, privacy means you lower your voice and take them to private space under a tree” (Kuo)
These mobile devices were equipped with ACASI (audio computer-assisted self-interviewing) software, which allowed a respondent to listen to interview questions through headphones and select their answers on the device, so that the interviewer wasn’t directly listening in. When asked about stigmatized issues, people tend to give socially desirable answers, so removing the ‘human’ can allow them to answer more honestly.
They also used redundant security, protecting their mobile data collection phones with Oasis usernames and passwords and with encrypted access. They chose the Survey to Go software because it could transmit data securely to Amazon servers immediately after collection, so that no data was stored on the local phone. Researchers could also log in remotely and wipe data from local phones, or remotely set a stolen phone to ring at high volume in a way that could only be shut off with a password and might frighten off would-be thieves.
Mobile Data Collection Specs
In remote field sites, Kuo’s team faced a lot of “paper disasters,” such as water damage when crossing a lake. Because of the potential for damage or theft, they used the lowest-cost pay-as-you-go phones that would support their software, topped up with limited data and minutes from township stores. Because rural areas don’t have path, road, or house addresses, they collected automated GPS data using GPS Auto trace technology to track their locations.
Given limited electricity and internet, Kuo’s team chose systems with long-life batteries and offline data storage. The systems were set up to send data at the beginning and end of each interview, only holding data offline until they could sync it off of local systems and onto a secure server at Kuo’s university. Mobizen let them use remote troubleshooting to assist from anywhere in the world.
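For readers who want to picture the "hold it offline, then sync it off the device" pattern, here is a minimal Python sketch. It is my own illustration, not the software Kuo's team used: the `OfflineQueue` class, the folder layout, and the `upload` callback are all hypothetical names, and a real field setup would also need encryption at rest and secure deletion, neither of which is shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


class OfflineQueue:
    """Hypothetical helper: keep records on disk only until an upload succeeds."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder
        self.folder.mkdir(parents=True, exist_ok=True)

    def save(self, record_id: str, record: dict) -> Path:
        """Write one record to local storage while there is no connection."""
        path = self.folder / f"{record_id}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        return path

    def pending(self) -> List[Path]:
        """List the records still waiting on the device, in filename order."""
        return sorted(self.folder.glob("*.json"))

    def sync(self, upload: Callable[[dict], bool]) -> int:
        """Try to upload each pending record; delete the local copy on success.

        `upload` is an assumed callback that returns True once the server
        has the record. Returns the number of records sent.
        """
        sent = 0
        for path in self.pending():
            record = json.loads(path.read_text(encoding="utf-8"))
            try:
                ok = upload(record)
            except OSError:
                ok = False  # treat network errors as "still offline"
            if not ok:
                break  # keep this record and the rest for the next attempt
            path.unlink()  # plain delete, NOT a secure wipe
            sent += 1
        return sent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        queue = OfflineQueue(Path(tmp) / "queue")
        queue.save("interview-001", {"site": "A", "answers": [1, 0, 1]})
        # Stand-in for a real secure upload; always "succeeds" here.
        print(queue.sync(lambda record: True), "record(s) synced")
```

The point of the design is simply that a record leaves the device only after the server confirms it has it, so a dropped connection never costs you an interview.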
A wish: reviews of data tools for the social and health sciences
Kuo ended with a wish that there were a source for researchers to learn what software for data storage, security, and transmission would work well in their context, as well as a place they could share reviews of software, systems, and tools outside of the silos of individual disciplines.
This sure seems like something that both qualitative data librarians and quantitative data librarians could help with. I hope to see us develop such resources for researchers in the future.
If this extract has been of interest, feel free to watch Caroline Kuo’s whole presentation above–she did a great job!