Data Portability

I started self-tracking a long time ago, but I recently ran into a problem I would like to share. I wanted to switch away from one of the services I use, but I could not find a way to export my data. I searched Google for a way to get my data out of that specific service, but found nothing.

My data was stuck on a platform I didn’t want to use anymore… This brings me to my main point: data portability. In the case of self-tracking, all the data you generate essentially belongs to you, but services often make it hard for you to actually own it. There are several reasons for this, including a deliberate strategy to keep you locked in as a customer.

If you are stuck with a service, there are several things you can do to get your own data back:

  • Copy and paste by hand (mindless work that takes a very long time with bigger data sets, so most of the time it isn’t an option).
  • Build a scraper to get the data from the web service (not everybody has the knowledge to build one). If you do build one, please share it on GitHub or a similar service so other people can use the solution too.
  • Contact support to see if they can do anything (development is often slow, so you will need some patience).
  • Try Google to see if someone else got their data out, and how they did it.

To prevent data lock-in, remember to think about data portability when choosing a service. I learned this from experience and would like to spare you the trouble it caused me.

Luckily most of my data has now been set free by building scrapers with Python. It took me a while to learn and build, but in the end it saved me lots of time and I got an extra skill that is useful for other projects!

Have you encountered troubles or useful solutions when dealing with data portability? Please feel free to share your opinions or knowledge in the comments!

About Joost Plattel

Joost is the co-founder of the Quantified Self Meetup in Amsterdam. He has been tracking metrics about his life for more than 2 years and combines QS with lifehacking to improve his life. You can follow him on Twitter.
This entry was posted in Discussions.

13 Responses to Data Portability

  1. Matthew Cornell says:

    Excellent point, Joost. In the forthcoming simple data layer in Edison (end of this month, hopefully!) I’m implementing a simple text-based export. Your data is your data.

  2. Pingback: Tweets that mention Data Portability | Quantified Self -- Topsy.com

  3. Syler W. says:

    You mention building your own scrapers… Where would be a good place to start learning how to do so? Care to share any resources in particular that helped you?

    • Joost Plattel says:

      ScraperWiki (http://scraperwiki.org/) is an excellent place to start and try your first dabblings with scraping. If you are not yet familiar with programming, or are having a hard time learning the basics, I recommend Python or Ruby, because in my opinion those languages have excellent documentation for learners.

      • Syler W. says:

        Thanks! I took a short course in Python a year or two ago, so your post has provided the perfect nudge to refresh and deepen my understanding with practical applications.

  4. Rob Myers says:

    autonomo.us have some good standards for data portability, and for free network services in general:

    http://autonomo.us/

  5. Jscott says:

    Data control has been the core issue that has prevented me from using many services. With data entered manually this is not that big of an issue, as you can set up a simple platform for self-tracking. However, automated services (where the REAL play is) present the same issue that the Digital ID movement has been addressing for the past several years.

    Open and exportable platforms will make the process easier from the beginning, and since this self-tracking gig is user driven, it might behoove all of us to be rather loud about what we want from vendors.

    As far as scraping…

    O’Reilly put out a book called “PHP Hacks” that had some great stuff on scraping content and pulling data.

    Scraping is win (as long as the website you are hitting is okay with it).

  6. Pingback: #quantifiedself data portability http://j.mp/qsdata via @brianoberkirch. Use #hAtom for time series web data! longurls: http://quantifiedself.com/2011/02/data-portability/ and http://microformats.org/wiki/hatom | 香港新媒體協會

  7. Have a look at what Mydex is doing. The concept is a personal data store (PDS), where users can store all the data they generate. I think we will see a complete ecosystem of such stores emerge, with data portability between them.

  8. Pingback: Quantified Self Show and Tell 2011 #1 « Café Numérique

  9. Pingback: Questions for a Self-Tracking Service | Quantified Self

  10. Pingback: Quantified Self Show and Tell 2011 #1

  11. PRYV says:

    There is a way to keep all your data in one place, visualized and private! Never face the problem of data portability again. We made it possible at Pryv! You can find out more about Pryv on Indiegogo: http://www.indiegogo.com/projects/pryv-your-life-at-a-glance/x/5289451
    Evelina
