Tag Archives: data access
Self-knowledge through numbers. Personal meaning from personal data. These are the guiding principles of the work we do here at Quantified Self Labs. Through our editorial work, our events, and our support of a worldwide network of meetups, we are focused on shaping the culture of personal data and its impact on our lives. We realized some time ago that impact is determined not only by data analysis skills, scientific training, or even the use of new tools and technologies (although all of these play an important role). Rather, impact is directly related to our ability to access the data we’re creating and collecting during the course of our lives.
We’re happy to announce our new QS Access Program with support from the Robert Wood Johnson Foundation. We’re working together to bring issues, ideas, and insights related to personal data access for personal and public health to the forefront of this evolving conversation. We hope you join us.
You can read the full release here. Below are two quotes from the release that embody our current and future work.
“The Robert Wood Johnson Foundation is working with many partners to build a Culture of Health in the U.S., and in that culture of health, people are attuned to the factors that influence their health and the health of their communities,” said Stephen Downs, Chief Information and Technology Officer at the Robert Wood Johnson Foundation. “The explosion of data on day-to-day life creates tremendous potential for new insights about health at both the personal and population levels. To realize this potential, people need access to their data — so they can use services that surface the connections between symptoms, behaviors and community environments and so they can choose to contribute their data to important research efforts.”
“We believe that when individuals, families, and communities are able to ask their own questions of their own data, everybody benefits,” said Gary Wolf, Director of QS Labs. “We look forward to doing our part to build a culture of health with the support of the Robert Wood Johnson Foundation, and we invite anybody who has an access story to tell to get in touch.”
If you’d like to learn more or get involved, please contact:
Quantified Self Labs
This past fall we learned about a unique study, conducted at Stanford University, designed to contribute to the understanding of the human microbiome. This study also has a component not common to academic research — data is being returned to the participants. Intrigued, I contacted the principal investigator, Les Dethlefsen, to learn more.
Ernesto: Tell me about the Dynamics of Human Microbiota study.
Les Dethlefsen: Since I joined the Relman Lab at Stanford, I’ve been looking at the human gut microbiota, focusing on what affects it and how it changes over time. In our study, we are looking at three different perturbations, deliberate changes to the gut ecology, to see how the microbiota population is affected.
We are very interested in the patterns that emerge. In people who have very stable gut microbiota, does their microbiota remain that way when they undergo diet shifts, a colon cleanout, or an antibiotic? Or maybe people who have a stable gut microbiota most of the time are the ones who are most affected by something unusual such as taking antibiotics. We just don’t know enough to understand these patterns right now. So, we’re really looking for basic ecological information.
Ernesto: If you look at the popular press, it seems the microbiome is the new golden child of biological life sciences. We’re even seeing companies in Silicon Valley get involved with this kind of work.
Les: It is broader than that. It really is a worldwide interest on the part of both the scientific community and the public. And unfortunately, we are probably going to see some overhype, just as we did with the Human Genome Project. But I do believe this is a very important area. I think there will be a lot of payoffs and health impacts from this research, although it’s not going to be everything.
The shift that, I think, would be good for us to make intellectually is to get rid of the “us vs. them” thinking, because we are symbiotic organisms.
We have evolved with a native gut microbiota, and native microbiota is pretty much everywhere. We have evolved together, so failing to think of our microbes as part of our physiology is a fallacy, an artifact of our past ignorance.
Ernesto: It seems like exploring the deep sea, an unknown world that we’re just starting to peek into.
Les: It’s along those lines. You’re not wrong about that. But unlike, let’s say, the deep waters surrounding an undersea hydrothermal vent, we already know a lot about human physiology. There are a lot of molecular details and genetic pathways that we already have worked out. The context is somewhat understood.
And now, we have a reasonable start on the initial research: What microbes are present, and where? What’s the range of what we think is the normal distribution? We certainly don’t know enough, because we only know about people in the developed world, which may not represent all of human diversity or a very natural state of the gut microbiota.
Ernesto: Let’s get back to your study. You are asking participants to send microbiome data in the form of fecal matter and urine to your lab. What are you doing with those samples?
Les: We ask participants to provide both stool and urine samples. With the stool sample, we apply four different methodologies to turn it into data. One is the very common 16S ribosomal RNA (16S rRNA) gene sequencing approach. It’s relatively standard and inexpensive. It acts like an ID card for microbial taxa — telling us approximately what strains are present and in what relative abundance. We have a lot of data like that already for comparison.
The second approach we will be applying is metagenomic sequencing, wherein we will be sequencing a random selection of all the genomes of the microbial types that are present. We can’t take this to completion, even with the dropping cost of sequencing, especially because there are some very, very rare microbes that we barely even have the chance to see at all. But we can get a pretty good swathe of genetic sequence data from all the microbes.
The third approach is even more ambitious. It’s called metatranscriptomics. Genes can be carried by any critter, you and I included, but not expressed. Knowing which genes are turned on, and to what extent they’re turned on is a better measure of the biological activity that is actually happening. The metagenome is a measure of potential activities, what the bugs can do. The metatranscriptome shows what the microbes are actually doing. Metatranscriptomics is even more challenging than metagenomics partly because of the nature of messenger RNA (mRNA). It’s a highly unstable molecule. There are technical challenges, but we’re ambitious enough to try to collect information on gene expression.
The fourth approach is not based on gene sequences, but on chemical composition. Metabolomics is the name given to a number of these approaches that are not directed to a specific chemical. These are techniques that try to measure a broad swathe of chemicals present in the environment and their relative abundance. This is a technology that we, in the Relman Lab, know very little about. We’re collaborating with the Nicholson Lab at Imperial College London, and they will be doing the metabolomic analyses on the stool samples. That may be even closer to where the rubber meets the road — knowing not just the gene expression but also the resulting chemical changes that are happening in the environment.
Metabolomics takes us to the other type of sample we’re collecting: the urine samples. We aren’t doing this because we have an interest in the urinary microbiome itself, but because, as the Nicholson Lab suggested, the urine provides a more complete, integrated picture of the co-metabolism between the human host and most of the gut microbiota. So while metabolomics for the stool samples would primarily measure the gut microbial activity and what they contribute to the host’s physiology, the urine provides a more integrated picture about how the host metabolism works in concert with the gut microbiota.
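The relative-abundance idea behind the first, 16S-based approach can be sketched in a few lines of code: raw read counts per taxon are normalized so they sum to one, allowing comparison across samples. The taxa and counts below are hypothetical, for illustration only, and real 16S pipelines involve many more steps (quality filtering, clustering, taxonomic assignment).

```python
# Sketch: turning raw 16S read counts into relative abundances.
# Taxon names and counts are hypothetical, for illustration only.

def relative_abundance(read_counts):
    """Normalize per-taxon read counts so the fractions sum to 1."""
    total = sum(read_counts.values())
    return {taxon: count / total for taxon, count in read_counts.items()}

# One hypothetical stool sample, summarized as reads per genus
sample = {"Bacteroides": 5200, "Faecalibacterium": 3100, "Escherichia": 700}
abundances = relative_abundance(sample)
for taxon, fraction in sorted(abundances.items()):
    print(f"{taxon}: {fraction:.1%}")
```

Comparing these fractions across time points is one simple way to see how a perturbation like an antibiotic shifts the community.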
Ernesto: If a participant is going to be contributing all of that data, will they have access to it?
Les: As someone with similar interests, I certainly knew that a huge motivation for people to join the study would be the access to their own data. We offer monetary compensation, but for the amount of time that will be spent in contributing samples, it is probably trivial. We knew we would attract the curious, scientifically inclined, and practicing scientists. Of course, they would want to see their data.
The Institutional Review Board (IRB) was quite open to us sharing information with the participants about their own microbiota. It probably helped that there’s publicity about ways people can get this information. There is the American Gut project, offering an assessment of your microbiota for a donation, and uBiome, a private company offering the same kind of service.
I, or another staff member of the study, will share this microbiota data with each participant in a conference call. So in effect, I’m going to be a microbiota counselor. It’s nowhere near as high-stakes as sharing genome information. We don’t know enough to say, for example, that this microbiome is definitively healthy, or that it’s unhealthy, or what the exact risks of diseases are due to this particular composition. So we will be putting this information in context, and we will be available as interpreters of the scientific literature. We may be able to say that there is a statistical association between a particular microbial group that someone may have in their gut and some health-related outcome.
Ernesto: Will participants be getting a copy of their data as well?
Les: Yes, we will provide that. I have an open source mentality. Added to that is the fact that there are many practicing scientists signing up for the study and saying they want data, not just a PDF summary. I am happy to provide the data in as raw a format as people want. They can get the raw sequence information, a low-level summary (which is the result of the first pass of data processing), or the final summary. I have permission and full intention to share all the data derived from a person’s samples with that person.
Ernesto: Do you think we will see this happening more in the future?
Les: I think we will probably see more of it in the future. We’re moving in the direction of access to information. The open source movement has reached the health and medical realm from its origins in tech and computing. I think the participatory nature of access to data and scientific information is a good thing. It has started, and I don’t see any way of reversing the trend. I would hope that it becomes the norm that there is some appropriate level of sharing, that research participants have access to their data if they wish, and in a way that lets them interpret that data appropriately.
I believe that people have a right to that level of knowledge about their bodies, and if we, scientists, are generating that knowledge, there’s no reason not to share it with the individuals.
The Dynamics of Human Microbiota study is currently recruiting participants. If you’re interested in learning more about the ecosystem within, read more about the study and check to see if you’re eligible to participate here.
We hope you enjoy this week’s What We’re Reading list!
The Wow of Wearables by Joseph Kvedar. An excellent post here in the wake of the “Smartphones vs. Wearables” hype in the past weeks. Favorite part:
“I’d have to say that reports of the death of wearables have been greatly exaggerated. The power of sensor-generated data in personal health and chronic illness management is simply too powerful to ignore.”
Survival of the Fittest: Health Care Accelerators Evolve Toward Specialization by Lisa Suennen. If you’re at all interested in the recent surge in health- and healthcare-focused accelerators, this is for you. Excellent reporting. (Thanks for sharing, Maarten!)
Your Brain Is Primed To Reach False Conclusions by Christie Aschwanden. Fascinating piece here about the nature of the “illusion of causality.”
A Few Thoughts About Patient Health Data by Emil Chiauzzi. Emil, Research Director at PatientsLikeMe, lays out four points to consider when thinking about how to best use and grow self-collected patient data.
Having Parkinson’s since I was 13 has made me an expert in self-care by Sara Riggare.
I am the only person with the whole picture. To me, self-care is everything I do to stay as healthy as possible with a disease that is a difficult life companion. It entails everything from making sure I take my medication in the optimal way, to eating healthily, getting enough sleep, to making sure I stay physically active. I also make an effort to learn as much as I can about my condition; my neurologist says that I know more about Parkinson’s research than he does. I don’t find that odd, since he needs to try to stay on top of research in probably hundreds of neurological diseases, whereas I focus on just one.
From Bathroom to Healthroom: How Magical Technology will Revolutionize Human Health by Juhan Sonin. A beautifully written and illustrated essay on the design of our personal healthcare future.
Experimenting with sprints at the end of exercise routines by Gustavo M. Gustavo is a person with type 1 diabetes. After reading that post-exercise high-intensity exertion might have an effect on blood glucose, he put it to the test.
On Using RescueTime to Monitor Activity and Increase Productivity by Tamara Hala. Tamara walks us through the last three years of her RescueTime data and how she used that information to understand her work and productivity.
How Do You Find Time to Write? by Jamie Todd Rubin. Jamie has been writing for 576 consecutive days. How does he do it? A mixture of data and insight of course!
Say “I Love You” With Mapping by Daniel Rosner. Wonderful to see CHI papers ending up on Medium. This seems like a fun self-tracking/art project.
Cleaning up and visualizing my food log data with JMP 12 by Shannon Conners. Once again, Shannon displays a wonderful ability to wow us with her data analysis and visualization. Above is four years of food tracking data!
Two Trains: Sonification of Income Inequality on the NYC Subway by Brian Foo. Brian created this data-driven musical composition based on income data from neighborhoods that border the 2 train. Beautiful work.
Walgreens adds PatientsLikeMe data on medication side effects
How Open Data Can Reveal—And Correct—The Faults In Our Health System
Big Data is our Generation’s Civil Rights Issue, and We Don’t Know It.
We are happy to welcome this guest post by Madeleine Ball. Madeleine is the Senior Research Scientist at PersonalGenomes.org, co-founder of the upcoming Open Humans project, and the Director of Research at the Harvard Personal Genome Project. She can be found online @madprime.
Unpacking “data ownership”
It’s worth unpacking this phrase. What do we mean by “data ownership”? If we want to see changes, we need to start with a little more clarity.
Legally, data is not property. There is no copyright ownership of facts, as they are not “creative work”: the United States Supreme Court famously established this in the landmark case Feist v. Rural Telephone. Data is not patentable; there is no invention. It is not a trademark. There is no “intellectual property” framework for data.
Yes, data is controlled: through security measures, access control, and data use agreements that legally restrict its usage. But it’s not owned. So let’s set aside the word “ownership” and talk about what we really want.
Control over what others do
One thing we might want is: “to control what others do with our data”. Whom they share it with, what they use it for. Practically this can be difficult to enforce, but the legal instruments exist.
Think about what you really want. Are you opposed to commercial use of your data? Look for words like “sell”, “lease”, and “commercial”. Are you concerned about privacy? Look for words like “share”, “third-parties”, and “aggregate” – and if individual data is shared, find out what that data is.
Companies won’t change if nobody is paying attention and nobody knows what they want. We can encourage change by getting specific, and by paying more attention to current policies. Raise awareness, criticize the bad actors, and praise the good ones.
Personal data access and freedom
The flip side of “data control” is our own rights: what can we do with our own data? We want access to our personal data, and the right to use it.
This idea is newer, and it has a lot of potential. This was what Tim Berners-Lee called for, when he called for data ownership last fall.
“That data that [firms] have about you isn’t as valuable to them as it is to you.”
I think it’s worth listening, when the inventor of the world wide web thinks we should have a right to our data.
So let’s spell it out. Let’s turn this into a list of freedoms we demand. We should be inspired by the free software and free culture movements, which advocate for the freedoms of users and consumers. In particular, inspired by Richard Stallman’s “Four Freedoms” for free software, I have a suggested list.
Three Freedoms of Personal Data Rights
Raw data access – Access to digital files in standard, non-proprietary file formats.
Without raw data, we are captive to the “interface” to data that a data holder provides. Raw data is the “source code” underlying this experience. Access to raw data is fundamental to giving us the freedom to use our data in other ways.
Freedom to share – No restriction on how we share our data with others.
Typically, when data holders provide access to data, their data use agreements limit how this data may be shared. These agreements are vital to protecting user privacy rights when third parties have access, but we have the right to make our own sharing decisions about our own data.
Unrestricted use – Freedom to modify and use our data for any purpose.
Data use agreements can also impose other limitations on what individuals can do with data. Any restriction imposed on our use of our data impinges on our personal data rights. Freedom for personal data means having the right to do anything we wish with data that came from us.
In the short term, access to raw data can seem obscure and irrelevant: most users cannot explore this data. But like the source code to software, access to this data has great potential: a few will be able to use it, and they can share their methods and software to create new tools.
Raw data access is also an opportunity for us to share for the greater good, on our own terms. We could share this data with research studies, to advance knowledge and technology. We could share data with developers, to develop software around it. We could share it with educators, with artists, with citizen scientists. We could even cut the red tape: dedicate our data to the public domain and make it a public good.
This work is licensed under a Creative Commons Attribution 4.0 International License.
As you may know, we’re very interested in how HealthKit is shaping and extending the reach of personal self-tracking data. Last week, during Apple’s quarterly earnings call, Tim Cook mentioned that “There’s also been incredible interest in HealthKit, with over 600 developers now integrating it into their apps.” (emphasis mine).
@fat32io just a heads up. All the data in HealthKit is NOT backed up into iCloud. Unless encrypted local backup all data lost on reset
— Daniel Yates (@astralpilgrim) February 3, 2015
@astralpilgrim yikes, didn’t know that.
— fat32io (@fat32io) February 3, 2015
@fat32io yup. Lost all my data history last week doing a reset. Spoke to apple Genius Bar who told me about it.
— Daniel Yates (@astralpilgrim) February 3, 2015
@fat32io it’s the encryption that is key. They said its due to future sharing of health data with docs etc, requires encryption
— Daniel Yates (@astralpilgrim) February 3, 2015
For those of you who are unfamiliar with backup options for your iOS device, here’s a quick gif to walk you through the process of encrypting your iOS backup so that you can restore your HealthKit data if anything happens to your device:
Quantified Self Labs is dedicated to the idea that data access matters. Moving forward, we’re going to be exploring different aspects of how data access affects our personal and public lives. Stay tuned to our QS Access channel for more news, thoughts, and insights.
On January 13th Uber, a wildly popular and often scrutinized ride-share company, announced that it has entered into an agreement with the City of Boston to share anonymized data generated by users of the service. This is the first partnership between Uber and a local government body, but it points to the potential for partnerships with cities that want to take a peek at the vast amount of data about when and where people are traveling within their municipality. Our first reaction was to explore whether Uber provides any method for its own users to access and export their trip data. Surely if they are able to export and pass along data to a third party, they can pass that data to their own users?
In our exploration of the mobile and web user platforms, we found that Uber currently does not offer users an easy way to access their data. As an Uber customer, you are provided with email receipts of your trips that include travel information, a route of the ride, and cost. This information is also available through their online user account page. However, it is not exportable in a consistent and machine-readable format (such as a csv file). In our search for methods to assist in exporting Uber ride data, we stumbled upon this data scraper on Github developed by Josh Hunt. It’s useful to know that Uber has a standard no-scraping clause in its Terms of Service, but individual users accessing their own data for their own reasons is probably not what these clauses are meant to protect against.
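Even without an official export, trip details transcribed from email receipts can be put into the kind of consistent, machine-readable format described above. A minimal sketch using Python’s standard csv module follows; the trip records and field names are hypothetical and are not Uber’s actual schema.

```python
import csv

# Hypothetical trip records, e.g. transcribed by hand from email receipts.
# Field names are illustrative, not Uber's actual data schema.
trips = [
    {"date": "2015-01-10", "pickup": "Home", "dropoff": "Office", "fare_usd": "11.50"},
    {"date": "2015-01-12", "pickup": "Office", "dropoff": "Airport", "fare_usd": "32.00"},
]

# Write the records to a CSV file that any spreadsheet or script can read.
with open("uber_trips.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "pickup", "dropoff", "fare_usd"])
    writer.writeheader()
    writer.writerows(trips)
```

The point is not the few lines of code but the format: once trips live in a plain CSV file, they can be totaled, mapped, or merged with other personal data without depending on the service’s interface.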
Aside from data access issues, there are of course open questions about how Uber will implement privacy protections governing sensitive user data. Uber is not without fault in this space. The now-infamous blog post pointing to their ability to track one-night stands (archived here) was enough for some users to question ethical standards within Uber. In their announcement, Uber touched on this issue by stating that they will provide some privacy protections by only offering anonymized, aggregated data to third-party partners. Protecting user privacy through data aggregation and anonymization is a step in the right direction, but open issues remain around data access for users. Uber and the cities they partner with will learn a lot about how we travel, but the partnership between Uber and their users could be improved by allowing easier access to the data we contribute when we use the service, helping users (myself included) understand their own data and behavior.
We’re interested to hear from our readers about their experiences using the above mentioned tool, or similar tools to access and export their Uber trip data. Please let us know. We’ve also reached out to Uber for comment.
I reached out to Uber Support over Twitter and received the following response:
“Unfortunately this is not currently a feature, however we’re always looking to improve and I’ll pass your suggestion along! *NM” (link)
We’re posting a quick note today to let you know that we’ve updated our “How To Download Your Fitbit Data” post. It now includes separate instructions for both the old and new versions of Google Spreadsheets. This is just the first in a series of planned updates. We hope to post additional updates to give you deeper access to your Fitbit data, including heart rate, blood pressure, and daily goal data.
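Once the data is in a spreadsheet, even a few lines of code can start answering questions about it. Here is a minimal sketch in Python, assuming a CSV export with “Date” and “Steps” columns; the column names and number formatting are assumptions and may differ in your own export.

```python
import csv

def step_summary(path):
    """Total and average daily steps from a CSV export.

    Assumes 'Date' and 'Steps' columns; step counts may contain
    thousands separators (e.g. "8,000"), which are stripped here.
    """
    with open(path, newline="") as f:
        steps = [int(row["Steps"].replace(",", "")) for row in csv.DictReader(f)]
    return {"days": len(steps), "total": sum(steps), "average": sum(steps) / len(steps)}
```

A summary like this is a natural starting point for the kinds of conversations mentioned below, whether with yourself or with your health care team.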
If you’re using this how-to we’d love to hear from you! Are you learning something new? Making interesting data visualizations? Discussing the data with your health care team? Let us know. You can email us or post here in the comments.
As part of the Quantified Self Public Health Symposium, we invited a variety of individuals from the research and academic community. These included visionaries and new investigators in public health, human-computer interaction, and medicine. One of these was Jason Bobe, the Executive Director of the Personal Genome Project. When we think of the intersection of self-tracking and health, it’s hard to find anything more definitive and personal than one’s own genetic code. The Personal Genome Project has operated since 2005 as a large-scale research project that aims to “bring together genomic, environmental and human trait data.”
We asked Jason to talk about his experience leading a remarkably different research agenda than what is commonly observed in health and medical research. From the outset, the design of the Personal Genome Project was intended to fully involve and respect the autonomy, skills, and knowledge of its participants. This is manifested most clearly in one of its defining characteristics: each participant receives a full copy of their genomic data upon participation. It may be surprising to learn that this is an anomaly in most, if not all, health research. As Jason noted at the symposium, we live in an investigator-centered research environment where participants are called on to give up their data for the greater good. In Jason’s talk below, these truths are exposed, along with a few examples and insights related to how the research community can move toward a more participant-centered design as it begins to address the large amounts of personal self-tracking data being gathered around the world.
I found myself returning to this talk recently when the NIH released a new Genomic Data Sharing Policy that will be applied to all NIH-funded research proposals that generate genomic data. I spent the day attempting to read through some of the policy documents and was struck by the lack of mention of participant access to research data. After digging a bit I found the only mention was in the “NIH Points to Consider for IRBs and Institutions“:
[...] the return of individual research results to participants from secondary GWAS is expected to be a rare occurrence. Nevertheless, as in all research, the return of individual research results to participants must be carefully considered because the information can have a psychological impact (e.g., stress and anxiety) and implications for the participant’s health and well-being.
It will come as no surprise to learn that the Personal Genome Project submitted public comments during the comment period. Among these comments was a recommendation to require “researchers to give these participants access to their personal data that is shared with other researchers.” Unfortunately, this recommendation appears not to have been implemented. As Jason mentioned, we still have a long way to go.
Today’s post comes to us from Laurie Frick. Laurie led a breakout session at the 2014 Quantified Self Europe Conference that opened up a discussion about what it would mean to access all the data being gathered about yourself and then share it with full transparency. In the summary below, Laurie describes that discussion and her ideas around living an open and transparent life. If these ideas interest you, we invite you to join the conversation on our forum.
by Laurie Frick
Fear of surveillance is high, but what if societies with the most openness develop faster culturally, creatively and technically?
Open privacy turns out to be an incredibly loaded term; something closer to data transparency seems to create less consternation. We opened the discussion with the idea, “What if in the future we had access to all the data collected about us, and sharing that data openly was the norm?”
Would that level of transparency gain an advantage for that society or that country? What would it take to get there? For me personally, I want access to ALL the data gathered about me, and would be willing to share lots of it; especially to enable new apps, new insights, new research, and new ideas.
In our breakout, with an international group of about 21 progressive self-trackers in the Quantified Self community, I was curious to hear how this conversation would go. In the US, data privacy discussions always get hung up on fears of being denied health-care coverage. With a heavily EU group, all covered by socialized medicine, would the health issue fall away?
Turns out in our discussion, health coverage was barely mentioned, but paranoia over ‘big-brother’ remained. The shift seemed to focus the fear toward not-to-be-trusted corporations instead of government. The room split about 18 against and 3 for transparency. An attorney from Denmark suggested that the only way to manage that amount of personal data was to open everything, and simply enforce penalizing misuse. All the schemes for authorizing use of data one-at-a-time are non-starters.
“Wasn’t it time for fear of privacy to flip?” I asked everyone, and recalled the famous Warren Buffett line “…be fearful when others are greedy and greedy when others are fearful”. It’s just about to tip the other way, I suggested. Some very progressive scientists like John Wilbanks at the non-profit Sage Bionetworks are activists for open sharing of health data for research. Respected researchers like danah boyd and the smart folks at the Berkman Center for Internet and Society at Harvard are pushing on this topic, and the Futures Company consultancy writes “it’s time to rebalance the one-sided handshake” and describes the risk of hardening of public attitudes as a result of the imbalance.
Once you start listing the types of personal data that are realistically gathered and known about each of us TODAY, the topic of open transparency gets very tricky.
- Time online
- Online clicks, search
- Physical location, where have you been
- Money spent on anything, anywhere
- Credit history
- Do you exercise
- What you eat
- Sex partners
- Bio markers, biometrics
- Health history
- School grades/IQ
- Driving patterns, citations
- Criminal behavior
For those at the forefront of open privacy and data transparency it’s better to frame it as a social construct rather than a ‘right’. It’s not something that can be legislated, but rather an exchange between people and organizations with agreed upon rules. It’s also not the raw data that’s valuable – but the analysis of patterns of human data.
I’m imagining one country or society will lead the way, and it will be evident that an ecosystem of researchers and apps can innovate given access to pools of cheap data. I don’t expect this research will lessen the value to the big-corporate data gatherers, and companies will continue to invest. A place to start is to give individuals the right to access, download, view, correct, and delete data about them. In the meantime I’m sticking with my motto: “Don’t hide, get more”.
If you’re interested in the idea of open privacy, data access, and transparency please join the conversation on our forum or here in the comments.
Today’s post comes to us from Dawn Nafus and Robin Barooah. Together they led an amazing breakout session at the 2014 Quantified Self Europe Conference on the topic of understanding and mapping data access. We have a longstanding interest in observing and communicating how data moves in and out of the self-tracking systems we use every day. That interest, and support from partners like Intel and the Robert Wood Johnson Foundation, has helped us start to explore different methods of describing how data flows. We’re grateful to Dawn and Robin for taking this important topic on at the conference, and to all the breakout attendees who contributed their thoughts and ideas. If mapping data access is of interest to you we suggest you join the conversation on the forum or get in touch with us directly.
Mapping Data Access
By Dawn Nafus and Robin Barooah
One of the great pleasures of the QS community is that there is no shortage of smart, engaged self-trackers who have plenty to say. The Mapping Data Access session was no different, but before we can tell you what actually happened, we need to explain a little about how the session came into being.
Within QS, there has been a longstanding conversation about open data. Self-trackers have not been shy about raising complaints about closed systems! Some conversations take the form of “how can I get a download of my own data?” while others ask us to imagine what could be done with more data interoperability, and clear ownership over one’s own data, so that people (and not just companies) can make use of it. One of the things we noticed about these conversations is that when they start from a notion of openness as a Generally Good Thing, they sometimes become constrained by their own generality. It becomes impossible not to imagine a big pot of data in the sky. It becomes impossible not to wonder where the single unifying standard is going to come from that would glue all this data together in a sensible way. If only the world looked something like this…
We don’t have a big pot of data in the sky, and yet data does, more or less, move around one way or another. If you ask where data comes from, the answer is “it depends.” Some data come to us just a few noise-reducing hops from the sensors that produced them, while other data are shipped around through multiple services, making their provenance more difficult to track. Some points of data access come with terms and conditions attached, and others less so. The system we have looks less like a pot and more like this…
… a heterogeneous system where some things connect, but others don’t. Before the breakout session, QS Labs had already begun a project to map the current system of data access through APIs and data downloads. It was an experiment to see whether a more concrete sense of where data actually comes from could help improve data flows. These maps were drawn from publicly available information and our own sense of the systems that self-trackers are likely to encounter.
Any map has to make choices about what to represent and what to leave out, and ours was no different. The more we pursued them, the more it became clear that one map was not going to be able to answer every single question about the data ecosystem, and that the choices about what to keep in, and what to edit out, would have to reflect how people in the community would want to use the map. Hence, the breakout session: what we wanted to know was, what questions did self-trackers and toolmakers have that could be answered with a map of data access points? Given those questions, what kind of a map should it be?
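One way to make this kind of map concrete is to treat it as a directed graph: services are nodes annotated with the access points they expose, and edges record how data flows between them. The sketch below is purely illustrative; the service names and attributes are made up for the example and do not describe any real product’s policies.

```python
# A minimal sketch of a data-access map as a directed graph.
# All service names and attributes are hypothetical placeholders.

# Each node records the access points a service exposes.
services = {
    "wearable_device": {"api": False, "export": False},
    "vendor_cloud":    {"api": True,  "export": True},
    "aggregator_app":  {"api": True,  "export": False},
}

# Each edge records one data flow: (source, destination, what moves).
flows = [
    ("wearable_device", "vendor_cloud",   "raw sensor samples"),
    ("vendor_cloud",    "aggregator_app", "processed daily summaries"),
]

def reachable_from(origin):
    """Follow flows outward to list everywhere data from `origin` can end up."""
    seen, frontier = set(), [origin]
    while frontier:
        node = frontier.pop()
        for src, dst, _ in flows:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

print(reachable_from("wearable_device"))
```

Even a toy structure like this makes the editorial choices visible: deciding which attributes belong on a node (export? pricing? terms of service?) is exactly the question the breakout session set out to answer.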
Participants in the breakout session were very clear about the questions they needed answers to. Here are some of the main issues that participants thought a mapping exercise could tackle:
Tool development: If a tool developer is planning to build an app, and that app cannot generate all the data it needs on its own, it is a non-trivial task to find out where to get what kind of data, whether the frequency of data collection suits their purposes, whether the API is stable enough, and so on. A map can ease this process.
Making good choices as consumers: Many people thought they could use a map to better understand whether the services they currently used cohered with their own sense of ‘fair dealings.’ This took a variety of forms. Some people wanted to know the difference between what a company might be capable of knowing about them versus the data they actually get back from the service. Others wanted a map that would explicitly highlight where companies were charging for data export, or the differences between what you can get as a developer working through an API and what you can get as an end user downloading your own data. Still others wanted the map clustered around which services are easy or difficult to get data out of at all, because (to paraphrase one participant) “you don’t want to end up in a data roach motel. People often don’t know beforehand whether they can export their own data, or even that that’s something they should care about, and then they commit to a service. Then they find they need the export function, but can’t leave.” People also wanted the ability to see clearly the business relationships in the ecosystem so they could identify the opposite of the ‘roach motel’—“I want a list of all the third party apps that rely on a particular data source, because I want to see the range of possible places it could go.”
Locating where data is processed: Many participants care deeply about the quality of the data they rely on, and need a way of interpreting the kinds of signals they are actually getting. What does the data look like when it comes off the sensor, as opposed to what you see on the service’s dashboard, as opposed to what you see when you access it through an API or export feature? Some participants have had frustrating conversations with companies about what data could fairly be treated as ‘raw’ versus where the company had cleaned it, filtered it, or even created its own metric that they found difficult to interpret without knowing what, exactly, goes into it. While some participants did indeed want a universally applicable ‘quality assessment,’ as conveners we would point out that ‘quality’ is never absolute: noisy data at a high sample rate can be more useful for some purposes than less noisy but infrequently collected data. We interpreted the discussion to be, at minimum, a call for greater transparency in how data is processed, so that self-trackers have a basis on which to draw their own conclusions about what it means.
Supporting policymaking: Some participants had a sense that maps highlighting the legal terms of data access, including the privacy policies of service use, could support an analysis of how the technology industry is handling digital rights in practice, and that such an analysis could have public policy implications. Sometimes this idea didn’t take the form of a map, but rather a chart that would make the various features of the terms of service comparable. The list mentioned earlier of which devices and services rely on which other services was important not just for assessing the extent of data portability, but also for assessing which systems pose a greater risk of data leaking from one company to another without the person’s knowledge or consent. As part of the breakout, the group drew their own maps—maps that either they would like to exist in the world even if they didn’t have all the details, or maps of what they thought happened to their own data. One person, who drew a map of where she thought her own data goes, commented (again, a paraphrase) “All I found on this map was question marks, as I tried to imagine how data moves from one place to the next. And each of those question marks appeared to me to be an opportunity for surveillance.”
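Several of the questions above reduce to simple queries over such a map. As a sketch, again with entirely hypothetical service names and attributes, here is how a map could surface the ‘roach motels’ (services with no export) and the third parties that depend on a given data source:

```python
# Illustrative queries over a toy data-access map.
# Service names and attributes are hypothetical, not real products.

services = {
    "service_a": {"export": True},
    "service_b": {"export": False},  # a potential "data roach motel"
    "service_c": {"export": True},
}

# Each edge is a (source, destination) data flow.
flows = [
    ("service_a", "service_b"),
    ("service_a", "service_c"),
]

# Which services can you not get your own data out of?
roach_motels = [name for name, attrs in services.items() if not attrs["export"]]

# Which third parties rely on a given data source?
def dependents(source):
    return [dst for src, dst in flows if src == source]

print(roach_motels)
print(dependents("service_a"))
```

The same structure, with real terms-of-service attributes attached to nodes, could feed the comparison chart participants asked for.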
What next for mapping?
If you were a participant and you drew a map, it would help continue the discussion if you talked a little more about what you drew on the breakout forum page. If you would like to get involved in the effort, please do chime in on the forum, too.
Clearly, these ecosystems are liable to change more rapidly than they can be mapped. But given the decentralized nature of the current system (which many of us see as a good thing) we left the breakout with the sense that some significant social and commercial challenges could in fact be solved with a better sense of the contours and tendencies of the data ecosystem as it works in practice.
This work was supported by Intel Labs and the Robert Wood Johnson Foundation. One of us (Dawn) was involved in organizing support for this work, and the other (Robin) worked on the project. We are biased accordingly.