Can't You See I Was Falling In Love
social life & social media
Shelly Jang used GMvault to look through five years of Google Chat logs to hunt for signals that she loves only her husband. She looked at whom she messaged, the time of day, and the words she used. She was able to extract meaning from innocuous metrics like “delay in response” to show whether she or her future husband was “playing games” at the beginning of the relationship. In the talk, she shares what she learned from her project.
Like many of us here, I’m drawn to the Quantified Self movement not only out of curiosity, but also because the process and the tools themselves are fun. Look at all these tools I’ve used.
Of the two separate aspects of QS, logging and discovering, I find more often than not that I get more enjoyment from the latter. That’s why I like to attend the meetups: you guys tell me all the fun new ways to collect self-logs, and I come home super excited and try to implement the new method I’ve just learned.
But alongside this excitement I noticed a strange emotional side effect, which is that I find myself feeling a sense of loss towards the logs I never got to generate. This is different from the guilt I feel when I miss an entry in a consistently kept log. It’s more like there’s a whole new aspect of life that I could have kept a record of, but because I wasn’t even aware of it I lost the data forever.
This got me thinking about the two different modes of logging: active and passive. Most self-logging processes require discipline and continuous action to keep going. But as technology advances, we need less and less action to keep track of ourselves and to tap into that data.
A great source of passively collected data is Google, because they’re tracking you. If you can retrieve your own record from them, you might be able to try some fun exercises. Among Google’s products, because I’ve been using Google Chat for many years, I decided to investigate this particular dataset.
Google Chat logs consist of rather large amounts of text data for me, so I decided to pose a specific question to guide my analysis. It just so happened that I had been using GChat since before I met my husband, so I thought it would be fun to see if there was love present in my chatting behavior.
Because this chat history is saved in the same format as email, I was able to use GMvault, which is an open source tool mainly for backing up Gmail, but it can also be used to download the chat logs. I then parsed the XML files into a flat file of timestamps and messages.
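As a rough illustration of that flattening step, here is a minimal sketch in Python. The XML layout below (a `conversation` element holding `message` elements with `from`, `to`, and `time` attributes and a `body` child) is a hypothetical simplification, not the actual format of the downloaded files; real logs would need their own element names and namespaces.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified chat XML; the real files will look different.
SAMPLE_XML = """
<conversation>
  <message from="shelly@example.com" to="friend@example.com" time="2009-11-03T21:15:00">
    <body>are you free this weekend?</body>
  </message>
  <message from="friend@example.com" to="shelly@example.com" time="2009-11-03T21:17:30">
    <body>yes! brunch?</body>
  </message>
</conversation>
"""

def flatten_chat(xml_text):
    """Turn one conversation into (timestamp, sender, recipient, text) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for msg in root.iter("message"):
        body = msg.findtext("body", default="").strip()
        rows.append((msg.get("time"), msg.get("from"), msg.get("to"), body))
    return rows

def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "sender", "recipient", "text"])
    writer.writerows(rows)
    return buf.getvalue()

rows = flatten_chat(SAMPLE_XML)
print(rows_to_csv(rows))
```

Once every message is a flat row like this, the later analyses reduce to grouping and counting over timestamps, people, and words.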
The log files consist of five years of data, covering roughly 1900 days of conversations with 68 different people, and I typed approximately 30 million words. The first thing I did was try to see when my husband appeared in my life. So this is the number of times that I mentioned his name to my friends and family. You can see that his name appears even before we started dating, and the number rapidly ramps up because I’m talking to my friends about him all the time.
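A count like this could be sketched as follows. The rows, the contact labels, and the placeholder name "Alex" are made up for illustration; the only idea taken from the talk is counting mentions of one name over time in messages sent to other people.

```python
import re
from collections import Counter
from datetime import datetime

# Toy rows of (timestamp, recipient, text); "Alex" is a placeholder name.
MESSAGES = [
    ("2009-08-14T10:02:00", "friend_a", "I met this guy Alex at the lab"),
    ("2009-09-02T22:40:00", "friend_b", "Alex asked me to dinner. alex!!"),
    ("2009-09-20T19:05:00", "friend_a", "nothing new today"),
    ("2009-10-01T08:30:00", "mom", "Alex and I are going hiking"),
]

def mentions_per_month(messages, name):
    """Count whole-word, case-insensitive mentions of `name` per calendar month."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    counts = Counter()
    for timestamp, _recipient, text in messages:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        counts[month] += len(pattern.findall(text))
    return dict(sorted(counts.items()))

monthly = mentions_per_month(MESSAGES, "Alex")
print(monthly)
```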
So this is a good indicator, and even with something as straightforward as the number of messages sent, you can tell that from the end of 2009 onwards my husband becomes a significant presence in my life. Could this be a sign of love? I’m afraid not. This behavior alone is not representative, because that yellow trace there shows pretty much the same pattern. And that’s not a second boyfriend. It’s my colleague, who joined our lab around the same time, and we communicated through GChat for our collaboration. So it shows that there are two different domains of my life: personal and professional.
Next, I wanted to see if we played any sort of juvenile hard-to-get games when we texted each other, because I know that in high school, when I used AOL, I’d definitely delay before texting back to boys.
So I plotted the mean response delay caused by me against the mean response delay caused by him, and the pattern lies more or less along the diagonal, so no one can be accused of playing mind games. But because the fitted line has a slope less than one, mathematically, I’m the one who played a little bit of a game.
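One way to compute such delays and the fitted slope is sketched below. Several choices here are assumptions rather than details from the talk: a reply delay is the gap between a message and the next message from the other person, gaps longer than an hour are treated as a new conversation, delays are averaged per month, and my delay goes on the x-axis with his on the y-axis. The toy log and the labels "me" and "him" are invented.

```python
from collections import defaultdict
from datetime import datetime

# Toy log of (timestamp, sender), already sorted by time.
LOG = [
    ("2010-01-05T09:00:00", "him"),
    ("2010-01-05T09:02:00", "me"),
    ("2010-01-05T09:04:00", "him"),
    ("2010-02-10T20:00:00", "me"),
    ("2010-02-10T20:03:00", "him"),
    ("2010-02-10T20:07:00", "me"),
    ("2010-03-15T12:00:00", "him"),
    ("2010-03-15T12:06:00", "me"),
    ("2010-03-15T12:10:00", "him"),
]

def mean_delays_by_month(log, max_gap_minutes=60):
    """Mean reply delay in minutes per month, keyed by the person who replied."""
    delays = defaultdict(lambda: defaultdict(list))
    prev_time, prev_sender = None, None
    for timestamp, sender in log:
        now = datetime.fromisoformat(timestamp)
        if prev_sender is not None and sender != prev_sender:
            delay = (now - prev_time).total_seconds() / 60
            if delay <= max_gap_minutes:  # longer gaps count as a new conversation
                delays[now.strftime("%Y-%m")][sender].append(delay)
        prev_time, prev_sender = now, sender
    return {
        month: {who: sum(vals) / len(vals) for who, vals in by_sender.items()}
        for month, by_sender in delays.items()
    }

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

means = mean_delays_by_month(LOG)
months = sorted(means)
mine = [means[m]["me"] for m in months]
his = [means[m]["him"] for m in months]
slope, intercept = fit_line(mine, his)
print(mine, his, slope, intercept)
```

With this axis assignment, a slope below one means his delays grow more slowly than mine do; swapping the axes would flip that reading.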
And this data, plotted as a time series, is also interesting, because you can see a slightly increasing trend as the years pass. This may indicate that over the years we’ve gotten a little bit comfortable and don’t feel the need to reply right away anymore. I don’t think that’s a bad sign.
And next I looked at the times that I chat with my husband. Each square represents an hour along the y-axis and a month along the x-axis. And you can see quickly that this is more or less a map of when Shelly was awake, because we conversed continually through the day.
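The grid behind a chart like this is just a count of messages per (month, hour) cell. A minimal sketch, using made-up timestamps and leaving the actual plotting aside:

```python
from collections import Counter
from datetime import datetime

# Toy timestamps of messages exchanged with one person.
TIMESTAMPS = [
    "2010-03-01T09:15:00", "2010-03-01T09:45:00", "2010-03-02T23:10:00",
    "2010-04-11T09:05:00", "2010-04-12T14:30:00",
]

def hour_month_grid(timestamps):
    """Return (months, grid) where grid[hour][i] counts messages in months[i]."""
    counts = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        counts[(t.strftime("%Y-%m"), t.hour)] += 1
    months = sorted({month for month, _hour in counts})
    grid = [[counts[(month, hour)] for month in months] for hour in range(24)]
    return months, grid

months, grid = hour_month_grid(TIMESTAMPS)
for hour, row in enumerate(grid):
    if any(row):  # print only the hours that have activity
        print(f"{hour:02d}:00", row)
```

Empty cells in the grid are the sparse patches in the chart, such as evenings with no messages.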
You can see that for the first year and a half, there’s a bit of sparsity in the evenings. This is probably due to the fact that we actually spent time together. This trend went away now that we are married, but I’m hoping that, you know, it resurfaces; otherwise we’ll be texting each other while we’re in the house.
So comparing the pattern with other folks in my life, you can tell that with my friends my behavior is definitely different. It’s more sporadic and sparse, really distinguishing the two.
A notable exception, again, is the mirroring of my working pattern, with my collaborator in the upper right-hand corner.
So these are some of the quantifiable patterns I extracted from the timestamps, but what about the context of the conversations?
I got my inspiration from Dataclysm, which is written by the author of the OkCupid blog. I highly recommend it. So what I did was take the common words I used when I spoke to my husband and the common words I used when I spoke to my colleague, count the number of times each word was used, rank them, and plot them as a scatter plot.
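The counting and ranking can be sketched like this. The two snippets of text are invented, and restricting the comparison to words that appear in both conversations is an assumption about what "common words" means here.

```python
import re
from collections import Counter

# Toy stand-ins for everything typed to each person.
HUSBAND_TEXT = "dinner tonight? pizza or sushi for dinner. ok pizza"
COLLEAGUE_TEXT = "ok the data looks good. rerun the data analysis tonight"

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ranks(counter):
    """Rank words 1..N by descending count, breaking ties alphabetically."""
    ordered = sorted(counter, key=lambda w: (-counter[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}

def rank_pairs(text_a, text_b):
    """For words used in both texts, return {word: (rank_in_a, rank_in_b)}."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    ranks_a, ranks_b = ranks(counts_a), ranks(counts_b)
    shared = set(counts_a) & set(counts_b)
    return {word: (ranks_a[word], ranks_b[word]) for word in sorted(shared)}

pairs = rank_pairs(HUSBAND_TEXT, COLLEAGUE_TEXT)
print(pairs)
```

Each pair is one point on the scatter plot. Words whose two ranks are close sit near the diagonal, and words with very different ranks are the ones that distinguish the two conversations.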
So you can see most words are used with more or less similar frequency across the two conversations. But when you look off the diagonal, you can see on the lower right-hand side the words that I used more commonly with my husband than with my colleague, and vice versa in the upper left-hand corner.
So the words that divide my personal and professional personas come down to this: I talk about food a lot when I talk to my husband, and I talk about work when I talk to my colleague.
And then I tried to see if there was, you know, a time-evolving vocabulary when I speak to my husband. So I did the same thing, except now I plotted the different years against each other, and it turns out that my vocabulary remained more or less consistent, because you can’t really see that many off-diagonal elements on the graph.
But what about the unique words I used each year? I wanted to see what words I used only in 2009 and never again. 2009 was the year of swine flu, so apparently I said something like h1n1, and in 2010 my husband taught me how to drive a stick shift, so I said “manual.” In 2011 I was still really jazzed about the 2010 World Cup, so you can see I was really excited about something. This was one of my favorite analyses because it really took me down memory lane, to moments that I shared with my husband and had somehow completely forgotten.
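Finding the words used in one year and no other is a set difference over each year’s vocabulary. A minimal sketch with invented messages; a real run would probably also want to drop very common words first, which this toy version does not do.

```python
import re

# Toy messages grouped by year.
MESSAGES_BY_YEAR = {
    2009: ["i hope it is not h1n1", "stay home and rest"],
    2010: ["the car is a manual", "i hope you stay patient"],
    2011: ["goooal", "the game is on"],
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def words_unique_to_each_year(messages_by_year):
    """For each year, list the words that appear in that year and in no other."""
    vocab = {
        year: {word for message in messages for word in tokenize(message)}
        for year, messages in messages_by_year.items()
    }
    unique = {}
    for year, words in vocab.items():
        others = set().union(*(v for y, v in vocab.items() if y != year))
        unique[year] = sorted(words - others)
    return unique

unique_words = words_unique_to_each_year(MESSAGES_BY_YEAR)
for year, words in unique_words.items():
    print(year, words)
```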
And then finally, I tried something really simple, which was counting the number of times the word love is mentioned to see if there was a definitive sign of love. As you may have guessed, this is not the safest way to try to measure one’s affections, because my friends and colleagues consistently beat my husband on this metric. Obviously this is because a single word alone is never representative of the conversation. The context matters a lot. It turns out the word appears in less affectionate settings: I mention the song What is Love a lot, apparently, and say how fond I am of three-day weekends, and there were tones of sarcasm in there. So upon close inspection, I’ve used the word in context about 10 times in five years, and lo and behold, eight of those ten were from my husband.
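The raw count, and the list of messages one would then have to read by hand, might look like this. The messages and contact labels are invented; the point is only that a whole-word count per contact says nothing about context, so the matching messages are kept for manual inspection.

```python
import re
from collections import Counter

# Toy rows of (contact, text).
MESSAGES = [
    ("friend_a", "I love three day weekends"),
    ("colleague", "what is love, baby don't hurt me"),
    ("husband", "I love you"),
    ("friend_a", "lovely weather, love it"),
]

def count_word_by_contact(messages, word):
    """Count whole-word matches per contact and keep the messages that matched."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = Counter()
    hits = []
    for contact, text in messages:
        n = len(pattern.findall(text))
        if n:
            counts[contact] += n
            hits.append((contact, text))  # kept so the context can be read by hand
    return counts, hits

counts, hits = count_word_by_contact(MESSAGES, "love")
print(counts.most_common())
for contact, text in hits:
    print(f"{contact}: {text}")
```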
I kind of enjoyed how this whole exercise forced me to painstakingly hunt for signals that would definitely prove my love for my husband and not somebody else. This is partly due to the fact that the abstract concept of love is really difficult to metricize and analytical results are only as good as the questions posed.
And I’m sure there are ways that I could have twisted the data so that I could give you a resounding yes to my own question. Of course you could tell I was falling in love.
This project was inspired by the fact that last year I was so busy moving around, graduating, and getting married that I wasn’t able to work on my hobby, which is keeping a fitness journal and being really good about that. So I was forced to go look for an alternative source of data about myself.
I’m happy to say that this brought up some funny and interesting anecdotes among my friends. So if you would like to try this on your own Google Chat logs, please let me know and I would be happy to send you my script so you can start your own digging.