The Big Bucket Personal Informatics Data Model

Seth’s post on Personal Science (especially about “data exhaust” [1]) got me thinking about big data and the implications for the self-tracking work we do. What evidence is there that big data will infiltrate self-experimenting? Under what conditions will self-tracking move from “small data”, or “data poor” (a few hundred or a few thousand data points) to “big data” or “data rich” (terminology from The Coming Data Deluge)? Let me share some thoughts and get yours.

First, what does “big data” mean [2]? From Wikipedia:

Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing.

This identifies an important problem. While it is natural to throw all our personal data into one big database, there are costs associated with doing so. I don’t mean those associated with capture (clearly we will solve the technical and cultural challenges), but the costs in sensemaking – turning data into actionable wisdom. Let’s put the problem into context and assume the future for personal science looks something like this (help me here):

  1. Many of our personal artifacts will be instrumented to know something about us (find many body-oriented ones in Walter’s Health Internet of Things), but the sky’s the limit. (For some great examples of work data we might capture, see Gary’s comment on my post The Quantified Worker.) The idea is that these things will be smart enough to answer questions needed for our experiments, like “How much water did I drink?”, “How active was I today?”, or “Did I raise my voice this week?”
  2. These artifacts will seamlessly transmit data to a central place that each individual owns and has complete control over. Also contributing data are medical professionals and any other person or organization that learns something about us. They will be contractually obligated to share it.
  3. This data is augmented by self-tracking tricorders that we may wear, which capture other personal data channels like cognitive states and life events.
  4. From all that data the citizen scientist will periodically reflect and analyze, prompted by triggers such as periodic reminders, natural events from experiments (e.g., when a question is answered or an experiment ends), or opportunistic situations such as encountering a problem or having a friend ask how we’re doing.
  5. Finally, the experimenter applies the results by integrating them into new mental models or behaviors, and continues this cycle of thinking up experiments, trying them out, and learning from them.

(Note that these steps are non-linear and are happening in parallel.)
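
To make the flow above a little more concrete, here is a minimal sketch in Python of what a "big bucket" might look like: many sources write observations into one owner-controlled store, which can then answer a question like "How much water did I drink?" Everything in it is an assumption for illustration only (the `Observation` and `BigBucket` names, the `smart_cup` and `pedometer` sources, the `water_ml` and `steps` channels); it is not a description of any real device or service.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Observation:
    """One data point about a person (hypothetical schema)."""
    source: str          # which artifact or person reported it, e.g. "smart_cup"
    channel: str         # what was measured, e.g. "water_ml" or "steps"
    timestamp: datetime  # when it was measured
    value: float         # the measurement itself


class BigBucket:
    """A single store that every source writes into (illustrative only)."""

    def __init__(self) -> None:
        self._observations: list[Observation] = []

    def record(self, observation: Observation) -> None:
        """Accept an observation from any source."""
        self._observations.append(observation)

    def daily_total(self, channel: str, day: date) -> float:
        """Sum every value on one channel for one calendar day."""
        return sum(
            o.value
            for o in self._observations
            if o.channel == channel and o.timestamp.date() == day
        )


bucket = BigBucket()
bucket.record(Observation("smart_cup", "water_ml", datetime(2011, 1, 10, 8, 30), 250.0))
bucket.record(Observation("smart_cup", "water_ml", datetime(2011, 1, 10, 14, 0), 400.0))
bucket.record(Observation("pedometer", "steps", datetime(2011, 1, 10, 21, 0), 8200.0))
bucket.record(Observation("smart_cup", "water_ml", datetime(2011, 1, 11, 9, 0), 300.0))

# "How much water did I drink on January 10?"
print(bucket.daily_total("water_ml", date(2011, 1, 10)))
```

A flat list scanned on every query is fine for a few thousand points. It is exactly the kind of design that gets awkward once the bucket holds a lifetime of data, which is the sensemaking cost this post is about.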

Given this flow, I argue that the hard work is in the final two steps – sensemaking and behavior change. Leaving the latter for now (Ian Ayres on Carrots and Sticks addresses that well), how can we do sensemaking effectively when we are collecting a lifetime’s worth of data? I don’t know, but a few things come to mind, including advanced statistical tools, visual analytics, and, possibly the most important, collaboration. After all, successful researchers know that science works best when collaborating with others. In fact, given this possible future, our relationships with professionals may move more in this direction.

What do you think?


[1] My reply: Exhaust usually means waste that’s a byproduct of production. However, in our case data is the means of self-improvement. It’s like a catalyst for making a change in ourselves. Plus, unlike exhaust, it has value after its use. While factories may capture waste products for other uses, they don’t treat the waste as intrinsically useful. That’s a big difference.
[2] Two additional resources you might find helpful are Wired’s The End of Theory: The Data Deluge Makes the Scientific Method Obsolete and Nature’s Special on Big Data.

[Image from Paul Kidd]

(Matt is a terminally-curious ex-NASA engineer and avid self-experimenter. His projects include developing the Think, Try, Learn philosophy, creating the Edison experimenter’s journal, and writing at his blog, The Experiment-Driven Life. Give him a holler at

This entry was posted in Discussions.

7 Responses to The Big Bucket Personal Informatics Data Model

  1. Pingback: Tweets that mention The Big Bucket Personal Informatics Data Model | Quantified Self --

  2. Cindy says:

    It isn’t clear to me that we will solve the technical challenges, much less the cultural challenges, of getting complete, accurate data entered into a “database” where it can be analyzed. The challenges include:

    * economics (leaving aside the cost of maintaining a database, measurements are non-value-added: should we measure the poor or feed them? If we don’t measure the poor, the economics have selected out a significant section of the population. Would we want to select out the Kitavans or Masai?),

    * measurement conventions (does my measurement of “body temperature” have the same accuracy and precision as yours?),

    * measurement definition (“Device Company A” defines BodyTemp differently than “Device Company B”… this is an extant problem in many industries trying to standardize), etc.

    Within the possibility of self-measurement shifting from “small data” to “Big Data”, I think the problem of “Big Data” database creation and maintenance is far larger than sensemaking and behavior change. If the data are not there or not accurate, hypotheses cannot be tested nor conclusions made. As always, garbage in = garbage out.

    BTW, my personal bias is to want to contribute my personal data to the “Big Data” database.

  3. Matthew Cornell says:

    Excellent points re: challenges. I have revised my thinking because of it. Thank you, Cindy.

  4. Mark Spohr says:

    I think we could look to a model such as Google Flu Trends for sensemaking from large fuzzy data sets.

  5. Trent Fowler says:

    With respect to Cindy’s reply, I have a few things to add. As far as economics goes, the problem of having holes in your data has existed for a long time. Despite the fact that sophisticated statistics exist to try and assess the accuracy of data sets, you can never be sure that you haven’t collected information from an unusual sample. You do raise good points about measurement conventions and definitions, however. My degree is in psychology, and there is no end to debates concerning what is called the “construct validity” of our measurements. In other words, are we actually measuring intelligence, or are we measuring how well somebody takes intelligence tests? How can we be sure that our measures of happiness actually map onto meaningful differences in positive emotions? And so on.
    I took Matthew’s post to be geared more to personal data. That is, what are we going to do when we each have 10-15 terabytes of information sitting on hard drives, ranging from assessments of happiness to alpha wave readouts to how many miles we jog a week, and spanning decades of our lives? How are we going to do anything useful when we’ve gone beyond a thousand data points and scatterplots aren’t helping us all that much? Personally, I wouldn’t be surprised if an industry developed around individuals who are highly trained in some combination of statistical analysis, data visualization, and life coaching. Think of it as a personal trainer whose job it is to sift through millions of bytes of personal information to tell you, for example, how often you actually eat chocolate as opposed to how often you remember eating chocolate. Their job would then be to make prescriptions and dispense advice while attempting to keep you on track toward your goals.

  6. Matthew Cornell says:

    Tapping work like done at is a great idea, Mark. I wonder what kind of patterns a massive repository of personal experiments would enable finding.

  7. Matthew Cornell says:

    Thanks so much for the really stimulating points, Trent. First, you describe the problem aptly – I’m going to re-post it on my blog. Second, your idea of a Personal Analytics Coach shows excellent foresight – you’re out there!
