Decisions and Experimentation in the Quantified Self
chronic condition | stress
Ian Eslick spent the last four years thinking about variations on how do we disseminate what we’ve learned. When he got involved with QS, he was on pause with his Ph.D. and was working with rare disease communities and thinking about how patients can influence the research and development process. He realized he could bring these two worlds together. He shares his story about what he did, how he did it, and what he learned.
I’ve spent the last four years thinking about variations on that fantastic quote of how do we disseminate what we’ve learned. And I find sometimes that it’s difficult for me personally, even though I’ve got a decade or more of training as a scientist, to take an example and apply it in my own context.
And I got involved with Quantified Self because I sat next to Gary just by chance at a conference about four years ago, and I was at a sort of pause point at my Ph.D. I had been working with rare disease communities and thinking about how patients can influence the research and development process. And I realized I was never going to graduate going down that road.
And in talking with Gary and talking about my own personal story, which I’m going to share with you today, it occurred to me that I could bring these two worlds together: what I was looking at in terms of the health care system, but also what I was experiencing personally.
And so today I’m going to share two stories about what I did and how I did it, and what I learned. And I’m going to introduce a couple of concepts that have come out of the work over the last four years. And this is kind of a nice full circle, because I graduated a few weeks ago, and I’m moving on to actually translate some of the learning into the health care system with some partners.
So when I was 15 I was diagnosed with an autoimmune disorder called psoriasis. It’s extraordinarily common. There are 4 million people in the United States who have it, and I have a very mild case, at least in terms of the physical symptoms. But it’s characterized by your skin going crazy in response to autoimmune activity.
And that was okay. You can deal with it, you can kind of manage it, and it was never that big of a deal until I was about 24 years old. And when I turned 24 I started having all of these other symptoms. So psoriasis responds to stress; I had been going through a stressful time. And I started noticing more and more problems just cropping up in my life. And this is the benefit of hindsight. I didn’t know that these were all connected, but you know, horrible fatigue, never sleeping well, nasal congestion and it goes on and on and on and some really gross symptoms that I’m not going to share with you today.
And 10 years later I was still sort of dealing with these symptoms, but I started to notice correlations. I had tried to write things down, and I was never able to track for more than three days. I needed Emilia’s talk to figure out a discipline that would work for me. That was the discipline I was never able to acquire.
But what I was able to acquire, and this was the talk that Gary encouraged me to give, which is on the Quantified Self site, was this: I started keeping track in my head of all the different theories about what might be influencing my symptoms, and I’m fortunate that my symptoms would vary. The idea of variation is extraordinarily powerful; it’s how we learn about cause and effect, and I’m going to talk about cause and effect in a theoretical sense shortly. But what I ended up with was this whole set of different theories about what I might have, and I was on the internet and hypothesizing.
But what I was able to do, in pursuing one hypothesis, was this: I found a guy online and he gave me, like, a protein powder and some herbal supplements. And when I did that for three days, particularly getting rid of all carbohydrates, I started feeling a lot better. So I thought, ‘A-ha, I have now discovered a baseline state.’
So part of what I want to do in the talk today is think about some of the language that we might learn to share that would help us characterize what you do in your experiments and how that relates to what I might do in my experiments.
So the idea of looking at this variation and trying to find a reliable way to get into a baseline state, even if it’s unrealistic to sustain, gives you an opportunity to learn, because now problems are much more obvious.
So what would happen with me is I’d start feeling better and I would go along for a few days, and I’d eat a salad, and the next day I would feel horrible. And it took me a while to realize that in fact the delay between the cause and the effect was about two to three days. So that’s really hard, because now I’ve got to remember what I was doing three, four, five days ago. So absolutely, writing things down would have been extraordinarily useful.
But I didn’t write things down, but because I had a baseline state and I knew a very simple protocol to sustain health, I was able to then keep track of the things that might possibly cause me to feel worse. So now I can have a mental record of the last week or so, what are the things I did that might have screwed me up. And I can start to go backwards and say, ‘Ah, it must have been this’, or ‘I can’t think of anything so it must have been something that was outside of my current hypothesis.’
And this eventually led me to discover that I cannot eat fructose-bearing foods. I don’t entirely know why; I have some theories. I can’t eat grains, I can’t drink alcohol. I have some wonderful charts, now that I do record, showing my pain score spiking three days after having a few drinks. And I have to minimize refined sugar and dairy, but if I do that I’m a whole different human being.
And when I figured this out it was like being 20 years old again, and I’ve just turned 40, so you know, it’s actually a pretty remarkable thing, because in my 30s I was so unhealthy I’d feel horrible. So the discovery process was using this baseline and mental arithmetic to discover cause and effect.
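Once the data is written down, a delay like the three-day one described here can be looked for directly instead of held in memory. What follows is a minimal sketch, assuming one trigger value and one pain score per day; the function names and the numbers are hypothetical illustrations, not data from the talk. It correlates the trigger on day t with the symptom on day t + lag and reports the lag with the strongest correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def lagged_correlations(trigger, symptom, max_lag):
    """Correlate trigger on day t with symptom on day t + lag, for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = trigger[: len(trigger) - lag]  # drop the last `lag` days of the trigger
        ys = symptom[lag:]                  # drop the first `lag` days of the symptom
        result[lag] = pearson(xs, ys)
    return result

def best_lag(trigger, symptom, max_lag=5):
    """Lag (in days) at which the trigger and the symptom correlate most strongly."""
    corrs = lagged_correlations(trigger, symptom, max_lag)
    return max(corrs, key=corrs.get)

# Hypothetical daily records: drinks consumed and a 0-10 pain score.
drinks = [0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
pain = [2, 2, 2, 2, 2, 6, 2, 2, 2, 2, 7, 2, 2, 2, 2, 2, 4, 2, 2, 2]

print(best_lag(drinks, pain))  # 3
```

A correlation at one lag is only a hint about where to look; it does not rule out the other influences discussed later in the talk.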
And as I mentioned I had a chance encounter with Gary and went to the QS meetup, and while I was there it really forced me to think through this process of how could I have done this faster, or how could I have done this in a year or two years and not in a decade; that’s a good Ph.D. topic.
So, tracking versus experimentation, so the discussion that Gary and I had had was about well how do you know that you know what you think you know, and it’s a very meta concept. But the reality is that I had a lot of hypotheses along the way that were wrong or mistaken, but they were part of the learning process. There’s nothing wrong with a bad hypothesis as long as you don’t hold on to it when you’ve got evidence to the contrary.
So tracking is about observing variation, right, and finding patterns in that variation and using that to create a hypothesis. But if you are going to act on something, if you are going to regulate your life, you have to have a model of your life. You have to know that the change that you make is going to yield a certain outcome. And that actually turns out to be difficult in many circumstances. If the effect is extraordinarily large you don’t need any science or measurement, it’s just obvious: I just stop drinking alcohol and I feel better.
And so causality is not just established by experiment, it’s established by temporal coincidence. And that’s why I think Quantified Self is so effective. You don’t need a lot of methodology if you have repeated trials, things that come together close in time, and effects that are pretty large.
It’s when things get a little bit more subtle or take a longer time, when you’ve got a lag of three or four days, that it’s harder to notice those correlations, and that’s where the measurement really kicks in.
Things that are extremely small effects, who cares, because it’s not making my life better. And that’s what traditional science is concerned with: small effects in large populations. Quantified Self is fundamentally different.
So experimentation is really thinking about cause because we want to make a decision to change something in our life, and that’s really where all of my research went, which is in order to identify cause we have to remove noise.
So what is noise? So obviously if I don’t sleep I get fatigued. So there’s a relationship between sleep and fatigue. But I was also having fatigue that was clearly correlated with my diet; I saw through repeated observation over time that as I modulated my diet my fatigue would modulate.
So multi-causality is something that most people actually have trouble thinking about naturally. When you go into the educational literature, people think in causal chains: x, and then y, and then z. You know, the knee bone’s connected to the...
So the trigger foods also cause fatigue. So if there happens to be a time when I’m eating certain foods and I’m not sleeping, and I don’t remember that I’m not sleeping, I can start feeling worse and it’s nothing to do with the food. Or I can stop eating the trigger foods but I just happen to be sleeping really well, and that obscures the effect.
So the other thing that can happen is that maybe that when I’m sleeping better I just have better regulation, and so my diet happens to be better and my fatigue happens to be better because I’m sleeping and trigger foods have nothing to do with it.
So it’s these different forms of cause and influence that really drive experimental science, because what we want to do is to try and isolate the different influences so that we can identify the one cause and effect relationship.
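One simple way to keep sleep from masquerading as a food effect is to compare trigger days with non-trigger days separately within good-sleep days and bad-sleep days. The sketch below assumes each day is recorded with three hypothetical fields (`slept_well`, `ate_trigger`, `fatigue`); the records are invented for illustration. In this made-up data the naive comparison overstates the food effect because trigger foods were mostly eaten on bad-sleep days.

```python
from collections import defaultdict
from statistics import mean

def naive_difference(days):
    """Mean fatigue on trigger days minus mean fatigue on non-trigger days."""
    with_trigger = [d["fatigue"] for d in days if d["ate_trigger"]]
    without = [d["fatigue"] for d in days if not d["ate_trigger"]]
    return mean(with_trigger) - mean(without)

def stratified_difference(days):
    """The same difference, computed separately for good-sleep and bad-sleep days."""
    groups = defaultdict(lambda: {True: [], False: []})
    for d in days:
        groups[d["slept_well"]][d["ate_trigger"]].append(d["fatigue"])
    diffs = {}
    for slept_well, by_trigger in groups.items():
        # Only compare when both kinds of day exist within this stratum.
        if by_trigger[True] and by_trigger[False]:
            diffs[slept_well] = mean(by_trigger[True]) - mean(by_trigger[False])
    return diffs

def day(slept_well, ate_trigger, fatigue):
    return {"slept_well": slept_well, "ate_trigger": ate_trigger, "fatigue": fatigue}

# Hypothetical records: trigger foods mostly coincide with bad sleep.
days = (
    [day(True, False, 2)] * 3
    + [day(True, True, 3)]
    + [day(False, False, 5)]
    + [day(False, True, 6)] * 3
)

print(naive_difference(days))       # 2.5
print(stratified_difference(days))  # {True: 1, False: 1}
```

Within each sleep stratum the trigger foods add one point of fatigue; the extra 1.5 points in the naive comparison comes from sleep, not food.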
So there is a formal methodology for doing experiments on a single person, and it’s been around in psychology for 40, 50 years, and it’s slowly becoming more popular in biomedicine. But it makes some key assumptions, and they’re very similar to the ones that I described in my own life, which is that you have a consistent baseline. You do something, you change your state, you stop doing it, and you go back to where you were. And over moderate time scales that tends to be true for many of the things that we experiment with.
The other thing is you need short onset and offset times. If it takes six months to see an effect, it’s very hard to do a single-subject experiment; it’s going to be a few years before we can get to statistical significance. And you need to have practical measurements. And so what you’re doing, though, is you’re using yourself in time as the control subject. So me with my baseline diet is one state. Me with triggers is another state. So it’s a way of doing a negative trial and looking for a negative effect: I identify which trigger foods to remove from my standard, survivable, long-term baseline.
So I started exploring this idea of personal experiments, which are a little different than an N-of-1 experiment. They are really more for what we do in our everyday context, not for what we do in the laboratory or with a physician. And I think actually this is also where medicine is going, and we can talk about that at some other time.
But this is an example of a little experiment that I did, based on a patient forum post where somebody said, hey, here’s a really simple treatment for a couple of bucks at Walgreens that really helped with symptom control. It’s not a treatment for the disease, but it does manage symptoms. And I’m just using this as an example, and it gives us a lot of understanding of some of the dynamics of change and how we might document them. So this is just basically how itchy I was, I think, and the amount of activity from my particular manifestation of psoriasis.
So you’ll notice that my baseline state is right around a three, and as I go into my pink treatment period, well, that baseline state persists. Some of that was a practice effect; it took me a while to get used to doing it every day. Some of that was that it takes a while to have an effect. But now I can tell somebody else who does this what I found for me: that in 3 to 5 days I started seeing an effect, and it was a huge effect. So if you just do it for 3 to 5 days and you don’t see an effect, then it’s probably not working for you.
And that’s actually a very interesting piece of sharing, and just a little bit different than, ‘I just tried this and it worked.’ It’s much more about starting to characterize how long it worked, how long it took to work, how big the effect was. What was it about me, and what were my problems in doing it? Documenting the pieces of the experiment can be extremely valuable for the next person, because when I did population research around this personal experimentation idea, what I found is that people don’t want to spend a lot of time. Anything more than a month is probably impractical for 90% of the population, maybe not this group, but you can’t ask people to spend forever doing these experiments, because life just keeps intervening.
So, designing a tight experiment around understanding the dynamics is actually quite powerful because it means we are going to have more people participating and we can learn more as a community.
So these are some of the things that turn out to be very valuable in experimentation. Ultimately, we are trying to increase our confidence in something, and we are not trying to prove anything. If we’re not trying to prove anything, we can be a little bit more relaxed about the numbers.
The effect size: is this a large effect or a small effect? The onset and washout times: how long does it take to see an effect after starting, and how long do I have to wait after stopping the treatment to see a reversion to baseline? And all of these things are important for figuring out the schedule on which I modulate treatment to compare myself at baseline to myself under treatment.
So this was a little test site that I built for my Ph.D., and I brought about 20 or 30 people through the site at various times. And all it does is ask us to document these various features that I just mentioned. So we have measurements, which is how we measure something that we’re concerned with; a treatment, which is what the intervention is and what the dynamics of that intervention are; and an experiment, which brings these two together.
This is an example of a common treatment for psoriasis that many people online report works for them. It’s curcumin, and it’s a herb that you can take, and there are clinical trials going on for this in the pharmaceutical context. But many people find that over-the-counter herbal treatments are quite effective.
So you have a hypothesis that says, I think there is an outcome: what is its effect on scaling? So I’m bringing together a measure and a thing that I’m hoping will influence that measure. And then there are other measures that might be relevant. Now why do we want these other measures? Well, if we as a community start to identify that, say, you are looking at dietary triggers and sleep is a major influence, then when you go back and look at your data you want to have measured the things that are also influencing that outcome measure. If you don’t do that, then it’s much easier to get confused, or you need to have much longer trials in order to kind of randomize over all of the different influences coming from these other parameters.
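The three pieces described here (a measurement, a treatment with its dynamics, and an experiment that ties them together) map naturally onto a small record structure. The sketch below is a hypothetical template, not the schema of the actual site: every field name, the `schedule` rule, and the onset and washout numbers are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Measurement:
    name: str        # what is being tracked
    prompt: str      # the question asked each day (e.g. by text message)
    scale_min: int
    scale_max: int

@dataclass
class Treatment:
    name: str
    onset_days: int    # how long before an effect is expected to show
    washout_days: int  # how long after stopping before baseline returns

@dataclass
class Experiment:
    hypothesis: str
    treatment: Treatment
    outcome: Measurement
    other_measures: List[Measurement] = field(default_factory=list)
    observe_days: int = 3  # extra days to observe once a phase has settled

    def schedule(self, cycles: int = 2) -> List[Tuple[str, int]]:
        """Alternate baseline and treatment phases.

        Assumed rule: each phase lasts long enough for the slower of onset
        and washout to finish, plus the observation days.
        """
        length = max(self.treatment.onset_days, self.treatment.washout_days) + self.observe_days
        phases = []
        for _ in range(cycles):
            phases.append(("baseline", length))
            phases.append(("treatment", length))
        return phases

# Hypothetical example; the onset and washout values are placeholders.
scaling = Measurement("scaling", "How much scaling today (0-10)?", 0, 10)
sleep = Measurement("sleep quality", "How well did you sleep (0-10)?", 0, 10)
curcumin = Treatment("curcumin", onset_days=5, washout_days=5)
experiment = Experiment(
    hypothesis="Curcumin reduces scaling",
    treatment=curcumin,
    outcome=scaling,
    other_measures=[sleep],
)

plan = experiment.schedule(cycles=2)
total_days = sum(days for _, days in plan)  # 32 days with these placeholder values
```

With these placeholder values two cycles take 32 days, which is about the one-month limit mentioned earlier as practical for most people.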
So, you can configure and run a trial on the platform, it uses SMS to do data collection, and I’m happy to talk to anybody more about it afterwards. It’s at personalexperiments.org
So this is an example of my data on the site. Me, who never recorded anything for more than three days, and I now have a year of data of all sorts of different things coming from RescueTime and little wrist devices, and I used to get six or seven SMS queries a day. So that top score up there with my experiment chart is my pain score, my misery index, how bad do I feel on any given day. And it’s somewhere around the middle, which is quite survivable these days.
You’ll notice there are two little red circles, and that’s math. So what the math is doing is looking at my baseline data and saying, hey, are we seeing a change that is significant? And we will talk about significance in a second, because it doesn’t mean the same thing here that it means in most scientific contexts. And at the bottom you can see my sleep, which is an important influence on my pain index.
So let’s take a brief aside and talk about other things that we can use data for. I had a colleague who is the mother of a three-year-old with cystic fibrosis who came onto the site, and she is simply trying to document her son’s condition. But in the summer, her son had a spike in his coughing frequency, which you can see on the chart, and it got bad enough that she went into the ER.
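The talk does not say what math the site used to draw those circles. As a stand-in, here is a simple control-chart style check: it flags a day when the average of the most recent few days drifts further from the baseline mean than baseline noise alone would suggest. The function name, the window of three days, the two-standard-deviation limit, and the scores are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def flag_shifts(baseline, later, window=3, limit_sd=2.0):
    """Indices in `later` where the trailing `window`-day mean leaves the baseline limits.

    The limit is limit_sd * (baseline SD / sqrt(window)), the usual scaling
    for the spread of an average of `window` independent readings.
    """
    centre = mean(baseline)
    limit = limit_sd * stdev(baseline) / sqrt(window)
    flagged = []
    for end in range(window, len(later) + 1):
        recent = later[end - window:end]
        if abs(mean(recent) - centre) > limit:
            flagged.append(end - 1)  # index of the last day in the window
    return flagged

# Hypothetical pain scores: a stable baseline, then a flare and a later improvement.
baseline = [5, 4, 5, 6, 5, 5, 4, 6, 5, 5]
later = [5, 5, 6, 7, 8, 5, 4, 5, 2, 2, 3]

print(flag_shifts(baseline, later))  # [3, 4, 5, 8, 9, 10]
```

A flag here is a prompt to look back at what changed, in the spirit of decision support rather than proof.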
So short of experimentation, data can still influence decisions if it helps you differentiate between two possible alternatives. There are two explanations for this coughing: post-nasal drip, treatable by easy inhalable steroids, or a bacterial exacerbation that can lead to death. So the physician is like, okay, two weeks IV antibiotic drip, we don’t want to mess around with exacerbation. So she’s arguing with him, yeah, but he was fine during the day, exacerbations are constant. ‘Yeah, you don’t want to take a chance.’ So the doctor is saying, look, we don’t have enough data to be confident that it’s not a very serious condition, so we should do the serious intervention. So she pops up the site, shows him the data and says, look, look at that variation over the last couple of days. He goes, ‘Huh, okay, well, let’s wait 24 hours. Let’s do a hold. Let’s agree that if he doesn’t get better in 24 hours we’ll do the IV antibiotics.’ Two hours later he tested positive for rhinovirus, a few hours after that the nasal steroids kicked in, the coughing frequency subsided, and he was out of the ER the next day. So that saved a three-year-old two weeks in hospital on an IV drip.
So, other subjects that we have worked with also learned from doing repeated trials with data. So this is a patient, on the left, who had just had a surgery, and that surgery cured the current manifestation of his inflammatory bowel disease, which meant that he shouldn’t be experiencing any pain now from eating food. But he’s terrified, because for a long time certain foods would cause extraordinary pain. And so he and his doctor agreed on a protocol using a variation of the site that I did for my doctoral work, and essentially said, we’re going to do little trials of food, and if you don’t hurt in two days, because you know that that’s your response interval, then you are probably fine for that food. Let’s just do one food at a time, real simply. And after three or four of the ad hoc trials having no pain, he got a lot of confidence in his treatment and he said, okay, I’m done, and after a while he stopped tracking.
The other patient is a microbiome transplant responder, and I won’t describe the gory details of that, but what you can see in the second-to-bottom chart is his weight going up, which is what you want to see for an IV patient responding. So now you have an opportunity to document the response to a therapy, all of which increases our understanding of the timeline over which we might expect to see improvement.
So each one of these gives us different ideas about the parameters of that causal graph of how things influence the evolution of our life.
The conclusion I came to after all of this is that N-of-1 is serious overkill for QS. It’s an unnecessary level of rigor; 95% confidence intervals are about scientific causal proof. But what I want to know is, am I making a better decision? Is data improving my decision in some measurable way? Not, is it a perfect decision, or do I have proof.
So what value is personal significance over statistical significance? Statistical significance says that if I run this trial 20 more times I’m likely to get the same result. But what I want to know is, should I still keep doing this? And in QS, we’re never really going to stop experimenting, in a way, because our life keeps going. So unlike a clinical trial, which takes place over a prescribed time, personal experiments are something which always happens. And so what also happens is we always stop adhering to any treatment regime at some point in time. So if we are tracking, we have an opportunity to do a retest.
So I have this nice little diet, I go away on vacation, I have a sort of runaway weekend, and then the next week I feel horrible. I’m like, okay that confirms my hypothesis that all these things are triggers, and then I retest that every few months and I have been doing so for years.
So when we think about this more formally, you know, clinical trials are about theoretical causation and personal experiments are decision support. We’re biasing the odds in our favor; we’re not trying to prove anything.
You know, in a clinical trial we are looking at population effects, and there’s so much heterogeneity in almost all clinical trials: many people don’t respond, people have horrible side effects.
So N-of-1 trials assess your individual effects, and in personal experiments really it’s what I’m experiencing. So if something works for one of you and I want to try it out, I can try it out in my own life. But the more I know about how you tried it and what you experienced, the more likely I am to get some useful data to improve my own experience.
So personal experimentation is simply tracking, but we’re adding a little bit of discipline. And the question that kind of drives me is, what is the minimum discipline we need to create more generalizable knowledge that we can share amongst each other more accurately? I think the person-to-person sharing of examples from Gary’s quote is extremely powerful, and that’s what makes QS so fantastic. But can we do a little bit more, so that we can share things more prescriptively? That would actually enhance the spread of good ideas in health-improving interventions.
So, if you think about what we did: is this tracking, and am I characterizing my behavior, or am I trying to influence the system of my life in some way, whether that is energy, or productivity, or health? Talking about it as an experiment, as opposed to tracking, says, well, maybe there is some additional language that we want to bring into the ‘what I did.’
And how I did it: well, there is a set of things that we could document, such as, what was the exact measure I used? What other measures might I have used? What was the timing? What was the onset and offset time? How long did I sustain the benefit?
You know, I think a big open question in the entire world right now is, how long does the placebo effect work? I couldn’t find any literature on this for my big dissertation; other than that we know it exists, we don’t know if it’s reliably evoked over months and years. I would love to know the answer to that, and maybe we could figure that out.
And what did you learn? So, part of what we learn is what worked for me, but what we might learn as a community is what other ways I might have tested it. Maybe somebody else could test it, and we could see whether they uncovered an anomaly that might inform my own investigations.
So the vision that Gary asked me to present is Quantified Us, so there is a variety of things that we might do as a community that just enhance the prescriptive thread of what we’re learning.
So one, as Gary has already done, is just a picture of what you learned, maybe with a paragraph telling us a little bit more about it, and ideally a video of a QS talk. But you might also want to see CSV files, so that somebody else could sit down and look at the parameters of change. And in order to interpret that we need to know something about what you were doing in your life. And so if we have a little experimental template, then we know what the important things to capture are, so somebody else could learn from my experience.
Indexing: it’s just having a place where I can search for everybody who’s measured something in a certain way, or anybody who has tried it as a treatment. And a catalogue, like the one Personal Experiments worked with, or Edison, or many other sites that try to formalize the entire ecosystem. And I think we need more of those to explore the user interface dynamics that are going to work for different segments of the population.
So I would love to talk more about this. Thank you very much.