Data Portability

February 2, 2011

I started self-tracking a long time ago, and I recently ran into a problem I would like to share. I wanted to switch away from one of the services I use, but I could not find a way to export my data. I searched Google for a way to get my data out of that specific service and found nothing.

My data was stuck on a platform I no longer wanted to use. This brings me to my main point: data portability. In the case of self-tracking, all the data you generate essentially belongs to you, but services often make it hard to take that data with you. There are several reasons for this, including a deliberate strategy to keep you locked in as a customer.

If you are stuck with a service, there are several things you can do to get your own data back:

  • Manual copy-paste (mindless work that takes a very long time with bigger data sets, so most of the time it isn’t an option).
  • Build a scraper to get the data from the web service (not everybody has the knowledge to build one). If you do build one, please share it on GitHub or a similar service so other people can use it too.
  • Contact support to see if they can do anything (development is often slow, so you will need some patience).
  • Search Google to see whether someone else got their data out, and how they did it.

To prevent data lock-in, remember to think about data portability when choosing a service. I learned this the hard way and would like to spare you the trouble it caused me.

Luckily, most of my data has now been set free by scrapers I built with Python. It took me a while to learn and build them, but in the end it saved me a lot of time and gave me an extra skill that is useful for other projects!
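
To give an idea of what such a scraper can look like, here is a minimal sketch using only Python's standard library. It is an illustration, not the code used for any particular service. It assumes the service shows your entries in a plain HTML table; the sample page, the column names, and the names TableScraper and html_table_to_csv are all made up for this example. A real service would also need its own login and page-fetching code, which is left out here.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text pieces of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Turn the table cells found in an HTML string into CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample standing in for a page saved from a tracking service.
SAMPLE_PAGE = """
<table>
  <tr><th>date</th><th>weight_kg</th></tr>
  <tr><td>2011-01-30</td><td>71.4</td></tr>
  <tr><td>2011-01-31</td><td>71.1</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_PAGE))
```

Writing the result as CSV keeps the exported data in a format that almost any other tool or service can read, which is the whole point of getting it out. Before scraping a real service, check its terms of use and whether it already offers an export option or an API.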

Have you run into trouble, or found useful solutions, when dealing with data portability? Please feel free to share your opinions or knowledge in the comments!
