#### Taking blogdown for a test drive

##### Trying out RStudio's new blogging framework

`R`

`data science`

`tools`

`productivity`

Recently, I came across the blogdown R package, a variant of RStudio’s popular bookdown R package, made by Yihui Xie and Amber Thomas. Blogdown allows the user to write blog posts with code chunks, in any of the large variety of languages supported by RMarkdown, allowing for computationally reproducible writing and programming. It also plays well with the new static site engine Hugo. Here, I’m mostly just going to take `blogdown`

for a spin.

First, let’s generate some data and try doing some very simple summary statistics:

```
# we're going to be simulating...set seed and constants
set.seed(6142)
n <- 1000
tx_mean <- 25
# generate variables randomly and by structural equation
W <- replicate(3, rnorm(n))
A <- rnorm(n, mean = tx_mean, sd = 2*sqrt(tx_mean))
Y <- as.numeric(A > tx_mean & W[, 1] > 0)
O <- as.data.frame(cbind(Y, A, W))
colnames(O) <- c("Y", "A", paste0("W", seq_len(3)))
head(O)
```

```
## Y A W1 W2 W3
## 1 1 28.40771 1.6361789 0.3361618 2.2859933
## 2 1 43.26174 2.0236876 0.3815559 0.1296723
## 3 1 30.45214 0.1319931 0.3985546 0.7973811
## 4 0 23.52230 -0.4502451 0.7176603 -1.0905016
## 5 1 29.88353 1.7895223 -0.3369940 0.7153515
## 6 0 21.33805 1.7653295 -0.8362112 -0.7107431
```

`skim(O)`

```
## Numeric Variables
## # A tibble: 5 x 13
## var type missing complete n mean sd min
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A numeric 0 1000 1000 25.18186636 9.7685295 -4.678917
## 2 W1 numeric 0 1000 1000 0.04544250 0.9997996 -3.482909
## 3 W2 numeric 0 1000 1000 -0.02480579 0.9891446 -3.677962
## 4 W3 numeric 0 1000 1000 -0.02308091 1.0069851 -3.367710
## 5 Y numeric 0 1000 1000 0.27100000 0.4446985 0.000000
## # ... with 5 more variables: `25% quantile` <dbl>, median <dbl>, `75%
## # quantile` <dbl>, max <dbl>, hist <chr>
```

Look at that! In the above, we generate background covariates (\(W\)) based on the standard Normal distribution, a treatment (\(A\)) based on a Normal distribution with specified mean (\(\mu = 25\)) and standard deviation (\(\sigma = 2 \cdot \sqrt{\mu} = 10\)), and an outcome (\(Y\)) based on a specified structural equation of the form:

\[Y = I(A > 25) \cdot I(W_1 > 0),\] for \(n = 100\).

Having just recently returned from ACIC ’17, I have causal inference on my mind. You’ll notice that in specifying the data generating processes above, I provided a structural equation for \(Y\) – now, let’s play with marginal structural models (MSMs) just a little bit:

```
msm <- glm(Y ~ ., family = "binomial", data = O)
summary(msm)
```

```
##
## Call:
## glm(formula = Y ~ ., family = "binomial", data = O)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.67224 -0.38138 -0.10080 0.07985 2.31857
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.523376 0.687346 -13.855 <2e-16
## A 0.272499 0.021257 12.820 <2e-16
## W1 2.589087 0.200157 12.935 <2e-16
## W2 0.053711 0.119842 0.448 0.654
## W3 -0.006588 0.108947 -0.060 0.952
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1168.50 on 999 degrees of freedom
## Residual deviance: 516.94 on 995 degrees of freedom
## AIC: 526.94
##
## Number of Fisher Scoring iterations: 7
```

Above, we projected the observed data (\(O = (W, A, Y)\)) onto a working MSM for the outcome, using a logistic regression with terms for all of the baseline covariates and the treatment. From the `summary`

of the resultant `glm`

object, we notice that the estimated coefficients for the terms corresponding to the treatment (\(A\)) and first baseline covariate (\(W_1\)) are statistically significant – recall that these were the variables we used in specifying the structural equation for \(Y\) above.

I think I’ll wrap up this experiment by trying some simple plotting:

```
p <- ggplot(O, aes(A, Y)) + geom_point()
p <- p + stat_smooth(method = "glm", method.args = list(family = "binomial"),
formula = y ~ x)
p <- p + xlab("A (treatment)") + ylab("Y (outcome)") + theme_nima()
print(p)
```

We can plot the relationship between the treatment and outcome using a logistic regression fit. Pretty awesome that `blogdown`

supports such nice graphics so easily. In my old blog, I had to come up with a custom R script to use `knitr`

to process R Markdown to standard markdown, allowing them to be rendered on my old Jekyll blog. Glad I never have to do that again 👍

#### Related

#### sl3: Machine Learning Pipelines for R

##### Simplifying machine learning in R through pipelines

`R`

`data science`

`machine learning`

`computing`

#### Renjin: put some Java in your R

##### Quick look at using R with some Java 'under the hood'

`R`

`data science`

`statistics`

`computing`