Designing good experiments: Some mistakes and lessons

January 21, 2011

Like you I’m an avid self-experimenter, and I’m always on the lookout for things to change that will either a) improve me, or b) help me understand myself better so I can do a). I was comparing notes recently with Seth Roberts (his QS posts are here) about what experiments we’ve done, what processes we’ve used to do them, and what lessons we’ve learned from them. I thought I’d share some of my take-aways with you and ask what you’ve learned from your own self-experimentation.

Keep experiments specific and simple

A mistake I’ve commonly made in the past is trying to track too many things at once. For example, a year ago I was terribly fatigued and decided to improve my sleep quality. I tried a bunch of things [1] but I wasn’t careful about keeping them separate, or about stopping one before starting the next, so I couldn’t tell which change was responsible for what. The lesson is that the changes you make (“treatments”) and the things you measure (“variables”) should be simple and few. The general goal is to get the most information for the least effort, which should tell you where to go next.

[1] I made changes like going to bed when I first felt tired, implementing a calming and regular nightly routine, eliminating caffeine, stopping using the computer from 9pm on, cutting out bright lights before bedtime, not eating or exercising right before, and taking drugs like Ambien and Xanax. Results: The first technique was, and continues to be, helpful, but time’s erasure of the stress from a family emergency a year ago made the biggest difference.

Know the type of your design

Though I’ve been experimenting on myself for many years, it was only recently that I understood the basic approach of testing things on myself. I’ve learned that most kinds of self-experiments are a type of back and forth process called “Reversal or ABA designs.” From the Wikipedia article:

The reversal design is the most powerful of the single-subject research designs showing a strong reversal from baseline (“A”) to treatment (“B”) and back again. If the variable returns to baseline measure without a treatment then resumes its effects when reapplied, the researcher can have greater confidence in the efficacy of that treatment.

The idea as I understand it is straightforward, but it helped me to lay it out:

  1. Define the question you’re trying to answer (e.g., “Is grinding during the day causing my tooth pain?”),
  2. Decide one thing that you’re going to change (e.g., wear a night guard during the day),
  3. Decide at least one corresponding measurement you’ll make (e.g., pain on a scale of zero to two),
  4. Start taking measurements for a while (you’re in the first “A”),
  5. Implement the change and keep measuring (now you’re in “AB”),
  6. Then cut out the change and continue measuring until you’re done (“ABA”).

What you’ll look for is whether your variable changes during the “AB” and “BA” transitions. If it does, you probably found something. If not, try something new.
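To make the “look at the transitions” step concrete, here is a minimal sketch of how you might summarize an ABA log. It is only one simple way to do it, not a formal analysis: the phase labels ("A1", "B", "A2"), the function names, and the pain scores are all made up for illustration.

```python
from statistics import mean

def phase_means(measurements):
    """Average the measured variable within each phase of an ABA run.

    `measurements` is a list of (phase, value) pairs in time order, where
    phase is "A1" (baseline), "B" (treatment), or "A2" (back to baseline).
    Every phase needs at least one measurement, or mean() raises an error.
    """
    by_phase = {"A1": [], "B": [], "A2": []}
    for phase, value in measurements:
        by_phase[phase].append(value)
    return {phase: mean(values) for phase, values in by_phase.items()}

def transition_shifts(means):
    """Change in the phase average at the A-to-B and B-to-A transitions."""
    return {
        "A_to_B": means["B"] - means["A1"],
        "B_to_A": means["A2"] - means["B"],
    }

# Hypothetical daily pain scores on a zero-to-two scale (invented data).
log = [
    ("A1", 2), ("A1", 1), ("A1", 2), ("A1", 2),
    ("B", 1), ("B", 0), ("B", 1), ("B", 0),
    ("A2", 2), ("A2", 2), ("A2", 1), ("A2", 2),
]

means = phase_means(log)
shifts = transition_shifts(means)
print(means)
print(shifts)
```

If the two shifts are clearly non-zero and point in opposite directions, the variable moved when the treatment started and moved back when it stopped, which is the pattern an ABA design is looking for. Shifts near zero suggest trying something new.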

For example, as hinted at above, one experiment I’m doing is working on reducing pain I have in a certain tooth. I’ve tested a number of things (including cutting out ice cream and acidic foods) and now I’m investigating the contribution to the problem my grinding might be making. In this case “A” is wearing a mouth guard at night (my baseline), and “B” is wearing it during the day too. I just finished the second “A,” and my results were surprising: not much difference! (I’m now investigating what appears to be a diurnal cycle.)

A second, odder type of experiment is one that takes advantage of the subject’s symmetry by testing two treatments in parallel. (I’m told that in statistics this is called “blocking,” but I haven’t found a good reference yet.) I recently used this to test different cold weather clothing for mountain biking by wearing different footwear on the left and right sides during the same ride. One result was that the order of sock/bootie/neoprene layers did not matter; neither side was appreciably warmer than the other. Another left/right example is what a friend did when she got poison ivy. She didn’t know which over-the-counter treatment to use, so she tested one on each side – brilliant! (In this case, my friend found out something that no one could have told her – how well the treatments work for her. This highlights a fundamental truth that underlies much of our work: What matters most to me is not whether it works for everyone, but whether it works for me.) A final example is one I started last year when we painted half of our house with latex paint and the other half with an oil-based one. It’s only been a year, but there is zero difference so far.
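For a left/right experiment like the footwear one, a simple way to compare the two sides is to look at the difference within each ride, since both sides share the same weather and effort. The sketch below assumes you rate each foot’s warmth after every ride; the 1-to-5 scale, the function name, and the ratings are hypothetical.

```python
from statistics import mean

def paired_differences(left, right):
    """Per-trial difference (left minus right) for two side-by-side treatments."""
    if len(left) != len(right):
        raise ValueError("each trial needs one rating per side")
    return [l - r for l, r in zip(left, right)]

# Hypothetical warmth ratings (1 = freezing, 5 = toasty), one pair per ride.
left_ratings = [3, 4, 3, 2, 4]   # e.g., sock under bootie
right_ratings = [3, 3, 4, 2, 4]  # e.g., bootie under sock

diffs = paired_differences(left_ratings, right_ratings)
avg_diff = mean(diffs)
print(diffs)
print(avg_diff)
```

An average difference close to zero, with individual differences going both ways, is what “did not matter” looks like in the numbers. Swapping which side gets which treatment on some rides would also guard against one foot simply running colder.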

I am very curious to hear if you know of other types of designs besides these two.

Allow time for understanding to grow

A frustration I face with complex subjects (like human bodies, and our behavior and relationships) is that it can take time to figure out which variables are relevant to the problem, and how long they take to demonstrate their effect. For example, I’ve struggled with a mood disorder for years, and, like my insomnia experiments, I’ve tested out different remedies such as meditation and medications. However, it’s not so straightforward when you factor in life events, stressors, and biochemical “weather.” (I found the post QS Measuring Mood gave a great overview of the complexities of measuring mood, by the way.) Another example is changing diet. Maybe you’re different, but it took months before I noticed certain results of a vegan diet. Or take business networking – how do you determine results when effects are indirect, especially with social investments like meeting people?

What I took away is the need to be patient and give results time to emerge, and to be flexible: switch to a new line of experimentation when you notice something useful, or drop an experiment when it doesn’t seem to be producing results.

Know that you’re doing one

This might be obvious to you, but there have been times when I’ve caught myself doing “stealth” experiments that, had I thought of them that way, would have resulted in my doing something very different. As an extreme example, I started dabbling with Twitter in 2007 and eventually got sucked into spending up to an hour a day on it. Fast forward two years and it hit me during a late night Twitter-fugue that I didn’t actually know why I was using the bloody thing! What I should have done was to clarify what my goal was, what I’d be testing, and for how long I’d let it go. Because the results can be indirect, you might have to get clever in creating measures. For example, if you’re trying to create business by forming relationships on Twitter, one thing to try is to simply ask prospective clients how they found out about you. And, ideally, you’d do an ABA design.

Doing something, even if it’s imperfect, is better than nothing

This lesson was important to me because a major result of my analysis of over 100 experiments I’ve done in the last five years is that many were non-quantified. Not having numbers limited some forms of analysis and learning, but at the same time doing them helped me make lots of improvements in my life. For me what’s crucial is the experimental mindset itself – looking at life with curiosity, and bringing mainly questions (rather than answers) to how we go about improving ourselves. Though it’s a cliche, I think it’s true that the only failed experiment is one that you didn’t learn from.

[Image from Chemical Heritage Foundation]

(Matt is a terminally-curious ex-NASA engineer and avid self-experimenter. His projects include developing the Think, Try, Learn philosophy, creating the Edison experimenter’s journal, and writing at his blog, The Experiment-Driven Life. Give him a holler at
