Data Portability
Joost Plattel
February 2, 2011
I started self-tracking a long time ago, and I recently ran into a problem I would like to share. I wanted to switch away from one of the services I use, but I could not find a way to export my data. I searched Google for a way to get my data out of that specific service, but didn’t find one.
My data was stuck on a platform I didn’t want to use anymore… This brings me to my main point: data portability. In the case of self-tracking, all the data you generate essentially belongs to you, but services often make it hard for you to actually own it. There are several reasons for this, including a strategy of keeping you locked in as a customer.
If you are stuck with a service, there are several things you can do to get your own data back:
- Manual copy-paste (mindless work that takes so long with bigger data sets that most of the time it isn’t an option).
- Build a scraper to get the data from the web service (not everybody has the knowledge to build things like scrapers). If you do build one, please share it on GitHub or a similar service so other people can use it too.
- Contact support to see if they can do anything (often development is slow, so you need to have some patience).
- Search Google to see whether someone else has gotten their data out, and how they did it.
To prevent data lock-in, remember to think about data portability when choosing a service. I learned this from experience and would like to spare you the trouble it caused me.
Luckily, I have now set most of my data free by building scrapers with Python. It took me a while to learn and build them, but in the end it saved me a lot of time, and I gained an extra skill that is useful for other projects!
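To give an idea of what such a scraper can look like, here is a minimal sketch using only the Python standard library. It is not one of my actual scrapers: the sample page, the column names, and the names TableRowParser and html_table_to_csv are all made up for illustration, and it assumes the service shows your data as a plain HTML table. It reads the table rows and turns them into CSV.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page standing in for a real service's HTML.
SAMPLE_PAGE = """
<table>
  <tr><th>date</th><th>weight_kg</th></tr>
  <tr><td>2011-01-30</td><td>81.4</td></tr>
  <tr><td>2011-01-31</td><td>81.1</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_PAGE))
```

In a real scraper you would first have to download the page, for example with urllib.request.urlopen, and usually log in as well. That part depends entirely on the service, so it is left out here.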
Have you encountered troubles or useful solutions when dealing with data portability? Please feel free to share your opinions or knowledge in the comments!