Online experiments for computational social science

ICWSM 2014 Tutorial

Eytan Bakshy, eytan@fb.com | Sean J. Taylor, sjt@fb.com

Time and location

Sunday, June 1, 9:00am - 12:00pm

Room 1255, North Quad Complex, University of Michigan, 105 South State Street, Ann Arbor

Description

This tutorial teaches attendees how to design, plan, implement, and analyze online experiments. First, we review basic concepts in causal inference and motivate the need for experiments. We then discuss basic statistical tools that help plan experiments: exploratory analysis, power calculations, and the use of simulation in R. Next, we cover statistical methods for estimating causal quantities of interest and constructing appropriate confidence intervals, with particular attention to scalable methods suitable for “big data”, including working with weighted data and clustered bootstrapping. We then show how to design and implement online experiments using PlanOut, an open-source toolkit for advanced online experimentation used at Facebook, including basic “A/B tests”, within-subjects designs, and more sophisticated experiments. We demonstrate how experimental designs from the social computing literature can be implemented, and review in detail two very large field experiments conducted at Facebook using PlanOut. Finally, we discuss issues with logging and common errors in the deployment and analysis of experiments. Attendees will be given code examples and will participate in the planning, implementation, and analysis of a Web application using Python, PlanOut, and R.

Prerequisites

Basic knowledge of statistics and probability theory, and some familiarity with programming. We will be using R to do exercises involving power calculations and data analysis, so we recommend that attendees either have experience with R or come with a buddy who knows R.

Outcomes

  • Understand the strengths and limitations of observational and experimental research.
  • Learn how to do power calculations via simulation to plan experiments in R.
  • Learn how to implement experiments using Python and PlanOut, an open-source toolkit for online experimentation.
  • Learn about common pitfalls and best practices for deploying, logging, and analyzing experiments.
  • Learn how to integrate experimentation into Web applications, and analyze the results using R.

Software requirements

  • R
  • Python (preinstalled on most Mac and Linux computers)
  • PlanOut (type pip install planout at your terminal to install)
  • Flask (type pip install flask at your terminal to install)
  • Sample R and Python code (to be available on GitHub)
  • Optional: IPython and pandas (both come with Anaconda)

Outline

Slides and code will be posted shortly before the tutorial.

Part 1: Experiments, causal inference, and planning.

1. Introduction

  • Why run experiments
  • The process of running an experiment
  • Causal inference
  • Estimation

2. Planning experiments

  • Gathering data
  • Power calculations for idealized data
  • Power calculations and simulation for real data
    • The bootstrap
    • Generative models and simulation
    • Using covariates in power analysis
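
As a rough illustration of power calculation by simulation, the sketch below repeatedly generates data from a simple generative model and counts how often a two-sided z-test rejects the null. The tutorial exercises use R; this version is in Python only to keep the example self-contained. The function name `simulate_power`, the normal outcome model, and every parameter value are assumptions made for illustration, not material from the tutorial.

```python
import random
import statistics


def simulate_power(n_per_arm, effect, sd=1.0, n_sims=2000, z_crit=1.96, seed=0):
    """Estimate power for a two-arm experiment by repeated simulation.

    Assumes normally distributed outcomes with a common standard deviation
    and a constant additive treatment effect (an idealized generative model).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Draw one simulated experiment.
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        # Difference in means and its estimated standard error.
        diff = statistics.fmean(treated) - statistics.fmean(control)
        se = (statistics.variance(control) / n_per_arm
              + statistics.variance(treated) / n_per_arm) ** 0.5
        # Two-sided z-test at the level implied by z_crit (1.96 ~ 5%).
        if abs(diff / se) > z_crit:
            rejections += 1
    # Power is the share of simulated experiments that reject the null.
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, simulate_power(n, effect=0.5, n_sims=500))
```

With real data, the two `rng.gauss` draws could be replaced by resampling observed outcomes (the bootstrap) and adding a hypothesized effect to the treated arm.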

Part 2: Implementing and analyzing experiments.

3. Designing and implementing experiments with PlanOut

  • Introduction to PlanOut
  • Designing experiments: A/B tests, factorial designs, within-subjects designs, pre-stratification, stepped wedge designs, and more complex designs
  • Logging: exposure logging and event logging
  • Examples of experiments using PlanOut
  • Building your own Web-based experiment
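
To make the design and logging ideas above concrete, here is a minimal sketch of deterministic, hash-based random assignment for a 2x2 factorial design, together with an exposure log record. It deliberately does not use PlanOut's API: the helpers `assign` and `exposure_record`, the experiment name, and the parameter names are all hypothetical and exist only to illustrate the general technique.

```python
import hashlib


def assign(experiment, param, unit, choices):
    """Deterministically map a unit (e.g. a user id) to one of `choices`.

    Hashing the experiment name, parameter name, and unit id together means
    the same unit always gets the same value, while different parameters
    and experiments are assigned independently of one another.
    """
    key = f"{experiment}.{param}.{unit}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    # Interpret part of the digest as an integer and reduce it to an index.
    return choices[int(digest[:15], 16) % len(choices)]


def exposure_record(experiment, unit, params):
    """Build the record to log when a unit is actually exposed to its condition."""
    return {"experiment": experiment, "unit": unit, "params": params}


if __name__ == "__main__":
    # Hypothetical 2x2 factorial design: button color crossed with button text.
    for userid in (1, 2, 3):
        params = {
            "button_color": assign("signup_test", "button_color", userid,
                                   ["blue", "green"]),
            "button_text": assign("signup_test", "button_text", userid,
                                  ["Join now", "Sign up"]),
        }
        print(exposure_record("signup_test", userid, params))
```

Because assignment is a pure function of the unit id, no assignment table needs to be stored; the exposure log records which units actually saw their assigned condition, which is what the analysis should be restricted to.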

4. Analyzing experimental data

  • Extracting outcomes from log data
  • Working with large datasets
  • Dependence in data
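
When each user contributes several observations, the observations are not independent, and a bootstrap that resamples individual rows will tend to understate uncertainty. One common remedy is the clustered bootstrap, which resamples whole users instead. The sketch below is a minimal version under stated assumptions: treatment is assigned at the user level, the estimand is a difference in means, and the interval is a simple percentile interval. The function name `cluster_bootstrap_ci` and the row layout are assumptions for illustration.

```python
import random
from collections import defaultdict


def cluster_bootstrap_ci(rows, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for a difference in means, resampling clusters.

    rows: iterable of (cluster_id, treated, outcome) tuples, where every
    observation from a given cluster shares the same treatment status.
    """
    rng = random.Random(seed)
    # Group outcomes by cluster, separately for each arm.
    arms = {True: defaultdict(list), False: defaultdict(list)}
    for cluster_id, treated, outcome in rows:
        arms[bool(treated)][cluster_id].append(outcome)

    def resampled_mean(clusters):
        # Draw whole clusters with replacement, then pool their observations.
        ids = list(clusters)
        drawn = rng.choices(ids, k=len(ids))
        total = sum(sum(clusters[c]) for c in drawn)
        count = sum(len(clusters[c]) for c in drawn)
        return total / count

    diffs = sorted(
        resampled_mean(arms[True]) - resampled_mean(arms[False])
        for _ in range(n_boot)
    )
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic data: 200 users, 5 observations each, true effect of 1.0.
    gen = random.Random(1)
    rows = []
    for user in range(200):
        treated = user % 2 == 0
        user_effect = gen.gauss(0, 1)  # shared by all of a user's rows
        for _ in range(5):
            rows.append((user, treated,
                         user_effect + (1.0 if treated else 0.0) + gen.gauss(0, 0.5)))
    print(cluster_bootstrap_ci(rows))
```

Resampling within each arm keeps both arms non-empty in every replicate; it is a design choice of this sketch rather than a requirement of the method.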
