Share Your Method of Analysis Without Sharing Your Data
Bastian Greshake Tzovaras
May 10, 2018
We are happy to have a guest post from Bastian Greshake Tzovaras, the director of research at the Open Humans project, on a new way to share personal data analysis methods. Read to the end to learn about a data analysis contest happening this month. Bastian can be found online at @gedankenstuecke. -Steven
The Quantified Self community builds its collective knowledge from individuals sharing insights gleaned from their own n-of-1 data. Not only do we learn from these projects, we also get inspired to do the same or similar projects of our own. But it’s easy to get tripped up when trying to do the same analysis on your own data. Is your input data in the same format? Are you running the code on the same operating system? Can you get all the dependencies installed? What if you have never really written code before or executed analysis scripts?
In the realm of academic science these issues are grouped under the label “reproducibility”. One solution to many of these issues is Jupyter Notebooks, which can be used to share code for analyzing data. JupyterHubs make it easy to host these notebooks online and overcome the difficulties that come with different operating systems, software packages, etc. Open Humans, a non-profit foundation that helps people donate their data to research, is using this technology to make the analysis of self-collected data reproducible for other members of the Quantified Self community.
We just released Open Humans’ Personal Data Notebooks. These run in the browser and give people access to the data that they have stored in Open Humans. Data from Fitbit, Apple Health, Moves, Twitter, and a selection of genetic data providers is currently supported. People can write their personal data analysis in Python, R, or Julia right in their web browser and see the results there, without having to install any packages on their own computer. If you are proficient in any of these programming languages, it is easy to write your data analysis from scratch. If you are unfamiliar with coding in general, or with Python, R, or Julia in particular, the Personal Data Notebooks offer well-documented example notebooks. These run without any modifications, so no prior knowledge is needed, and they can serve as a great way to start coding.
For all notebooks the resulting analysis and visualizations can be shared easily with other users who then plug in their own data. We have made it easy to decouple the data analysis from the underlying data. You can share your data analysis code without having to share your personal data itself. Since data sources inside Open Humans are standardized, someone else’s Fitbit data will work just as well as your own.
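To make the idea of decoupling concrete, here is a minimal Python sketch. It is not the actual Open Humans API or file format: the record layout (a list of entries with `date` and `steps` fields), the function name `average_daily_steps`, and the sample values are all assumptions made for illustration. The point is only that the analysis function contains no data of its own, so the same code can be shared and run against anyone's records.

```python
from statistics import mean


def average_daily_steps(records):
    """Return the mean step count over all days that have a value.

    `records` is assumed to be a list of dicts shaped like
    {"date": "YYYY-MM-DD", "steps": int or None}. This layout is
    hypothetical and stands in for a standardized data source.
    """
    steps = [r["steps"] for r in records if r.get("steps") is not None]
    return mean(steps) if steps else None


# Made-up sample data. In a shared notebook this would be replaced by
# whatever records the person running the notebook has stored.
sample_records = [
    {"date": "2018-05-01", "steps": 8000},
    {"date": "2018-05-02", "steps": 10000},
    {"date": "2018-05-03", "steps": None},  # a day with no recording
]

print(average_daily_steps(sample_records))
```

Because the function only depends on the shape of the records, sharing the notebook shares the method, while each person supplies their own data when they run it.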
There are step-by-step guides to get started with Personal Data Notebooks and example notebooks which can analyze your activity data from Fitbit and Apple Health or perform a sentiment analysis of your Twitter data.
To celebrate the launch of the Personal Data Notebooks, Open Humans and Quantified Self are running a notebook competition.
To take part, all you have to do is:
- create a data analysis of a data source of your choice with the Personal Data Notebooks
Gary Wolf, Steven Jonas, and Azure Grant of Quantified Self will judge and rank the submitted notebooks. The most interesting notebooks will be highlighted and added to the set of existing samples that are preinstalled for each user. The winning notebooks will be featured here, on the Quantified Self blog. If you want to share and discuss your notebook ideas, the Open Humans community on Slack is eager to have you.