Quantified self with Fitbit - part 1

April 21, 2016
fitbit biometrics quantified self data science

An interesting phenomenon that has appeared in recent years is the notion of “Quantified Self” - simply the idea that quantifying much of our daily activity can lead to insights about our behaviors, and that a more thorough knowledge of our own behavior can help us to be more mindful of our health and lifestyle choices. In a previous post, I explored (in a very rudimentary fashion) information about my genome, in the form of single nucleotide polymorphisms sequenced by 23andMe™. More recently, I’ve had some success in extracting data on my daily activity, as recorded by my Fitbit Charge™.

Extracting informative data from Fitbit turned out to be the hardest part of the analysis I'll describe here. Later on, I plan to take a closer look at the data and conduct a more thorough statistical analysis.

Fitbit data can be downloaded from the standard web interface, but to get fine-resolution data (at the level of minutes), you have to go through the Fitbit developer API. To do this, you have to register an app with Fitbit, selecting the “personal” option under uses (to get fine-resolution data under other settings, you need to contact Fitbit and obtain access to Fitbit’s “Partner API”). After registering an app, you can begin accessing data through the API using a variety of different tools; I chose to go with a Python implementation.

The python-fitbit tool contains a script (namely, gather_keys_oauth2.py) to obtain the “access token” and “refresh token” generated by the Fitbit API (in my eyes, this is the trickiest part). To do this, the personal key and personal secret assigned to your Fitbit app are necessary (these are displayed on the app page). After obtaining these keys, data can be downloaded in any format you would like; I found JSON to be most convenient for my purposes. To proceed with my elementary analysis, I used the python-fitbit tool to download my activity data from yesterday (April 20th).
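To make the download step concrete, here is a minimal sketch of the underlying request, assuming the Fitbit Web API's intraday time series URL pattern and a bearer access token. It uses only the standard library rather than python-fitbit, so it is not the script used for this post; the function name `intraday_steps_request` and the placeholder token are hypothetical. The request is built but deliberately not sent.

```python
from urllib.request import Request

# Assumed base path of the Fitbit Web API; "-" stands for the authorized user.
API_ROOT = "https://api.fitbit.com/1/user/-"


def intraday_steps_request(access_token, date, detail_level="1min"):
    """Build (but do not send) a request for one day of intraday step counts.

    `date` is a YYYY-MM-DD string; `detail_level` is assumed to accept "1min".
    """
    url = f"{API_ROOT}/activities/steps/date/{date}/1d/{detail_level}.json"
    # The access token obtained during the OAuth 2.0 flow goes in the header.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


# Hypothetical placeholder token; a real one comes from the OAuth 2.0 flow.
req = intraday_steps_request("PLACEHOLDER_TOKEN", "2016-04-20")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` with a valid token should return the JSON payload for that day. The python-fitbit client takes care of this step for you.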

After extracting my activity data for a single day, I loaded the resulting JSON file into Python using the json module, isolated the activity data by subsetting (the JSON file created by Fitbit is nested and organized somewhat unintuitively), and passed the resulting list of activity records to a pandas DataFrame.
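The subsetting step might look roughly like the sketch below. The layout of `SAMPLE` is an assumption about what the intraday steps response looks like (a daily summary plus an `activities-steps-intraday` object holding a `dataset` list), and the values are made up for illustration. The helper names `intraday_records` and `to_frame` are hypothetical, and `to_frame` needs pandas installed.

```python
import json

# Made-up sample in the assumed shape of an intraday steps response.
SAMPLE = """
{
  "activities-steps": [{"dateTime": "2016-04-20", "value": "25"}],
  "activities-steps-intraday": {
    "dataset": [
      {"time": "08:00:00", "value": 0},
      {"time": "08:01:00", "value": 12},
      {"time": "08:02:00", "value": 13}
    ],
    "datasetInterval": 1,
    "datasetType": "minute"
  }
}
"""


def intraday_records(payload, resource="activities-steps"):
    """Return the minute-level list of {"time", "value"} records."""
    return payload[resource + "-intraday"]["dataset"]


def to_frame(records):
    """Convert the records to a pandas DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so the parsing step runs without it

    frame = pd.DataFrame(records)
    frame["value"] = frame["value"].astype(int)
    return frame


records = intraday_records(json.loads(SAMPLE))
print(len(records), "records; first:", records[0])
```

With a real download, `json.load` on the saved file replaces `json.loads(SAMPLE)`, and `to_frame(records)` gives one row per minute with `time` and `value` columns.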

I then plotted the cumulative distribution function (CDF) of my steps (as measured minute by minute) over the past day. The plot below shows what the CDF looked like:

Distribution of my cumulative steps on 2016 April 20
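A plot like this can be sketched in a few lines. The version below plots the running total of steps, which has the same shape as the curve described above; dividing by the day's total would rescale it to run from 0 to 1. The function names, the output path, and the step counts are all hypothetical, and `plot_cumulative` assumes matplotlib is installed.

```python
from itertools import accumulate


def cumulative_steps(per_minute):
    """Running total of steps, one entry per minute."""
    return list(accumulate(per_minute))


def plot_cumulative(per_minute, path="cumulative_steps.png"):
    """Save a step plot of cumulative steps against minute of day."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    totals = cumulative_steps(per_minute)
    fig, ax = plt.subplots()
    ax.step(range(len(totals)), totals, where="post")
    ax.set_xlabel("Minute of day")
    ax.set_ylabel("Cumulative steps")
    fig.savefig(path)
    plt.close(fig)


# Made-up counts for six consecutive minutes.
totals = cumulative_steps([0, 12, 13, 0, 0, 40])
print(totals)
```

Flat segments in `totals` correspond to minutes with no recorded steps.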

I found the CDF of my steps over a single day to be a convenient and informative visualization – just by looking at horizontal regions, you can clearly see all the points in time where I was sitting rather still, probably working on some other data analytic or statistical project (like this one!).
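The horizontal regions can also be picked out programmatically rather than by eye. This is a small sketch, not part of the original analysis: it treats any run of consecutive zero-step minutes as a still period, and both the function name `idle_stretches` and the three-minute threshold are arbitrary choices made for illustration.

```python
from itertools import groupby


def idle_stretches(per_minute, min_length=3):
    """Return (start_index, length) for each run of zero-step minutes
    that lasts at least `min_length` minutes."""
    stretches = []
    index = 0
    for is_idle, run in groupby(per_minute, key=lambda steps: steps == 0):
        length = len(list(run))
        if is_idle and length >= min_length:
            stretches.append((index, length))
        index += length
    return stretches


# Made-up per-minute step counts.
counts = [0, 0, 0, 15, 20, 0, 0, 0, 0, 0, 8, 0]
stretches = idle_stretches(counts)
print(stretches)
```

Each pair gives the minute at which a still period starts and how long it lasts, which maps directly onto the flat segments of the cumulative curve.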

This small amount of analysis – from figuring out the data extraction process to producing the plot above – took roughly 2-3 hours, most of which consisted of struggling with API access issues and re-orienting myself to the use of Python for data analysis (currently, I mostly use R for my biostatistical work).

All of the scripts that I produced are available on GitHub here. I plan to add several more thorough analyses to this repo over the next few weeks – of course, that’s assuming I don’t get distracted by other ongoing work/projects…
