Bastian Greshake Tzovaras

Berkeley, United States
Bastian Greshake Tzovaras is the co-founder of openSNP, an open data repository that invites people to openly share their genetic data with the internet. After submitting his PhD thesis in evolutionary bioinformatics he joined Open Humans as a Director of Research. At Open Humans he is working on enabling academic and citizen science alike by shifting research paradigms, putting research participants into the driver's seat.

Share Your Method of Analysis Without Sharing Your Data

This visualization from an example Personal Data Notebook shows how someone used sad, loving, and joyful emoji on Twitter over time.

We are happy to have a guest post from Bastian Greshake Tzovaras, the director of research at the Open Humans project, on a new way to share personal data analysis methods. Read to the end to learn about a data analysis contest happening this month. Bastian can be found online at @gedankenstuecke. -Steven

The Quantified Self community builds its collective knowledge from individuals sharing insights gleaned from their own n-of-1 data. Not only do we learn from these projects, we also get inspired to do the same or similar projects of our own. But it’s easy to get tripped up when trying to do the same analysis on your own data. Is your input data in the same format? Are you running the code on the same operating system? Can you get all the dependencies installed? What if you have never really written code before or executed analysis scripts?

In the realm of academic science these issues are grouped under the label “reproducibility”. One solution to many of them is the Jupyter Notebook, a format for sharing code that analyzes data. JupyterHubs make it easy to host these notebooks online and overcome the difficulties that come with different operating systems, software packages, etc. Open Humans, a non-profit foundation that helps people donate their data to research, is using this technology to make the analysis of self-collected data reproducible for other members of the Quantified Self community.

We just released Open Humans’ Personal Data Notebooks. These run in the browser and give people access to the data that they have stored in Open Humans. Data from Fitbit, Apple Health, Moves, Twitter, and a selection of genetic data providers is currently supported. People can write their personal data analysis in Python, R or Julia right in their web browser and see the results there – without having to install any packages on their own computer. If you are proficient in any of these programming languages, it is easy to write your data analysis from scratch. If you are unfamiliar with coding in general – or with Python, R or Julia in particular – the Personal Data Notebooks offer well-documented example notebooks. These need no modifications, so they can be run without any prior knowledge, and they are a great way to start coding.

Code from an example Personal Data Notebook.

For all notebooks the resulting analysis and visualizations can be shared easily with other users who then plug in their own data. We have made it easy to decouple the data analysis from the underlying data. You can share your data analysis code without having to share your personal data itself. Since data sources inside Open Humans are standardized, someone else’s Fitbit data will work just as well as your own.
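To make the idea of decoupling concrete, here is a minimal Python sketch. It is not taken from an actual Personal Data Notebook: the record layout (a list of dicts with `date` and `steps` keys) and the function name `weekly_step_averages` are assumptions made for illustration. The point is that the analysis depends only on an agreed format, so the code can be shared with no data attached.

```python
from collections import defaultdict
from datetime import date


def weekly_step_averages(records):
    """Return the average daily step count per ISO week.

    `records` is assumed to be a list of dicts such as
    {"date": "2018-01-08", "steps": 9500}. This layout is hypothetical
    and stands in for whatever standardized format a data source uses.
    """
    steps_by_week = defaultdict(list)
    for record in records:
        day = date.fromisoformat(record["date"])
        iso_year, iso_week, _ = day.isocalendar()
        steps_by_week[(iso_year, iso_week)].append(record["steps"])
    return {
        week: sum(values) / len(values)
        for week, values in sorted(steps_by_week.items())
    }


# Made-up sample data; each user would pass in their own records instead.
sample_records = [
    {"date": "2018-01-08", "steps": 8000},
    {"date": "2018-01-09", "steps": 10000},
    {"date": "2018-01-15", "steps": 6000},
]
weekly = weekly_step_averages(sample_records)
```

Anyone whose data follows the same layout can call `weekly_step_averages` on their own records and get their own result, without ever seeing the sample data above.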

There are step-by-step guides to get started with Personal Data Notebooks and example notebooks which can analyze your activity data from Fitbit and Apple Health or perform a sentiment analysis of your Twitter data.

To celebrate the launch of the Personal Data Notebooks, Open Humans and Quantified Self are running a notebook competition.

To take part, all you have to do is:

Gary Wolf, Steven Jonas, and Azure Grant of Quantified Self will judge and rank the submitted notebooks. The most interesting notebooks will be highlighted and added to the set of existing samples that are preinstalled for each user. The winning notebooks will be featured here, on the Quantified Self blog. If you want to share and discuss your notebook ideas, the Open Humans community on Slack is eager to have you.


Use TwArχiv to analyze your Twitter archives


We are happy to welcome this guest post on a community tool by Bastian Greshake Tzovaras. Bastian is the director of research at the Open Humans project. He can be found online at @gedankenstuecke. -Steven

I’ve built a Twitter analysis web application that’s open to everyone to use and learn from. Often the best data for learning something about yourself are data you’ve already collected, sometimes without even being explicitly aware of collecting it. Social media activity is one example. We often send off Facebook posts or tweets with very little thought about the metadata that we generate in doing so. Where was I when I made that post? What time was it? What type of content did it contain? Did I retweet or reply to another person’s post? And, of course, what did my post contain?

This data can be extremely powerful – for others. The language you use in your Tweets can be used to predict your age as well as your income. Twitter uses the data to gather information about your likes, dislikes, and possessions – among other topics. But what if you want to learn about yourself with your own Twitter data?

The tool I created allows anybody to explore their own Twitter archive in detail. First, you’ll want to request your archive from Twitter. It will contain all the tweets you have ever sent, with not only the text but all the metadata as well. To look at these metadata, go to my small web application called TwArχiv (pronounced tw-archive), which allows you to upload your data and explore it using interactive graphs.

For instance, you can see how the nature of the tweets you send changes over time. Are you replying more to people than you used to, or is it all just retweets by now? For my own data it seems that finishing up my PhD work had quite an impact, starting in late 2016. With less procrastination I wrote fewer unprompted tweets. Instead, replying to people became more central to my Twitter experience.
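A breakdown like this can be sketched in a few lines of Python. This is not TwArχiv’s actual code. The field names (`timestamp`, `in_reply_to_status_id`, `retweeted_status_id`) and the timestamp format are assumptions about how the archive rows look once loaded, so adjust them to match your own export.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "2016-03-01 10:00:00 +0000".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def classify_tweet(tweet):
    """Label a tweet as 'retweet', 'reply', or 'original'."""
    if tweet.get("retweeted_status_id"):
        return "retweet"
    if tweet.get("in_reply_to_status_id"):
        return "reply"
    return "original"


def tweet_types_by_year(tweets):
    """Count tweet types per calendar year."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        year = datetime.strptime(tweet["timestamp"], TIMESTAMP_FORMAT).year
        counts[year][classify_tweet(tweet)] += 1
    return {year: dict(counter) for year, counter in sorted(counts.items())}


# Made-up rows standing in for a real archive.
sample_tweets = [
    {"timestamp": "2016-03-01 10:00:00 +0000",
     "in_reply_to_status_id": "", "retweeted_status_id": ""},
    {"timestamp": "2016-06-12 18:30:00 +0000",
     "in_reply_to_status_id": "123", "retweeted_status_id": ""},
    {"timestamp": "2017-02-03 09:15:00 +0000",
     "in_reply_to_status_id": "456", "retweeted_status_id": ""},
    {"timestamp": "2017-02-04 09:15:00 +0000",
     "in_reply_to_status_id": "", "retweeted_status_id": "789"},
]
yearly = tweet_types_by_year(sample_tweets)
```

Plotting the per-year counts as stacked bars or lines would then show whether replies or retweets are taking over from unprompted tweets.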


There is also plenty of research on gender bias in social media usage and whose voices are being amplified, with men being overwhelmingly favored. TwArχiv allows one to do some soul searching on this. It tries to predict the gender of the people you interact with based on their first names and shows you whether your reply and retweet behaviour is gender-balanced.

My own graphs show that I had (and have) a good way to go here. The year 2010 in particular is wildly off when it comes to the gender representation in my Twitter interactions. What happened during that time? I was politically active in the German Pirate Party, which was infamous for being a “boys club”.
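The gender-balance check described above can be approximated with a very small Python sketch. How TwArχiv actually predicts gender is not spelled out here beyond “based on their first names”, so everything below is an assumption: the tiny `NAME_TO_GENDER` table, the function names, and the choice to leave unrecognized names out of the shares are all hypothetical.

```python
from collections import Counter

# Tiny illustrative lookup table (hypothetical). A real tool would need a
# much larger name dataset, and any name-based guess remains a coarse proxy.
NAME_TO_GENDER = {
    "anna": "female",
    "maria": "female",
    "peter": "male",
    "john": "male",
}


def guess_gender(display_name):
    """Guess a gender label from the first word of a display name."""
    parts = display_name.strip().split()
    if not parts:
        return "unknown"
    return NAME_TO_GENDER.get(parts[0].lower(), "unknown")


def gender_shares(display_names):
    """Return the female/male shares among names that could be classified."""
    counts = Counter(guess_gender(name) for name in display_names)
    known = counts["female"] + counts["male"]
    if known == 0:
        return {}
    return {
        "female": counts["female"] / known,
        "male": counts["male"] / known,
    }


# Made-up display names of accounts someone replied to.
replied_to = ["Anna Schmidt", "Peter Müller", "John Doe", "xX_cat_Xx"]
shares = gender_shares(replied_to)
```

In this made-up sample, one of the three classifiable accounts is guessed as female, so the reply behaviour would show up as skewed. Accounts without a recognizable first name are ignored rather than guessed.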


If you have geolocation enabled on your tweets, you can get an idea of where you tweet. With a fully zoomable map, TwArχiv allows you to explore the globe on all scales to see the broader picture as well as street-level tweet distributions. As a first attempt at seeing movement patterns, you can also get a time-stamped version of the map that highlights locations one tweet at a time.

If you want to give it a try with your own archive, you can head to TwArχiv. The data storage is handled by Open Humans, and by default your archive and the resulting visualizations will be private. (You can choose to make them public, though, to share them with your friends and followers – mine are here!)

A note: The Twitter archive does not contain any direct messages but only your tweets, so if you have a public Twitter account the archive is basically all your “public Twitter interactions”.

If you have ideas on how to extend the functionality of TwArχiv or you want to code your own Twitter archive analysis, you could even get funding to do so: the Open Humans mini-grants of USD 5,000 for projects that will enrich the Open Humans ecosystem are a perfect fit for this kind of data visualization and analysis.
