Making your own data

Overview

  • How PlanOut logs data
  • Flow for loading and analyzing data
  • Putting it all together: simulated web app and example analysis
In [1]:
%load_ext rpy2.ipython
from planout.ops.random import *
from planout.experiment import SimpleExperiment
import pandas as pd
import json
import random
In [2]:
%%R
library(dplyr)
Attaching package: ‘dplyr’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Logging

Log files

Create a new experiment and get a randomized assignment

In [3]:
class LoggedExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.x = UniformChoice(
            choices=["What's on your mind?", "Say something."],
            unit=userid
        )
        params.y = BernoulliTrial(p=0.5, unit=userid)

print LoggedExperiment(userid=8).get('x')
Say something.
  • Then open your terminal, navigate to the directory this notebook is in, and type:
> tail -f LoggedExperiment.log
  • By default, SimpleExperiment logs to a file whose name is the class name of your experiment; the snippet below shows how to peek at that file from within the notebook.
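
If you would rather stay in the notebook, here is a minimal sketch (assuming the cell above has already written LoggedExperiment.log to the working directory) that reads the most recent record and pretty-prints it:

# Peek at the most recent exposure record without leaving the notebook.
# Assumes LoggedExperiment.log exists in the current working directory.
with open('LoggedExperiment.log') as f:
    last_record = json.loads(f.readlines()[-1])
print json.dumps(last_record, indent=2)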

Exposure logging

  • Parameter assignments are logged automatically the first time you retrieve a parameter
  • The logger can be configured to do caching, write to databases, etc.
In [4]:
e = LoggedExperiment(userid=7)
print e.get('x')
print e.get('y')
What's on your mind?
1

Manual exposure logging

Calling log_exposure() will force PlanOut to log an exposure event. You can optionally pass in additional data.

In [5]:
e.log_exposure()
e.log_exposure({'endpoint': 'home.py'})

Event logging

You can also log arbitrary events. The first argument to log_event() is a required parameter that specifies the event type.

In [6]:
e.log_event('post_status_update')
e.log_event('post_status_update', {'type': 'photo'})

Custom logging

  • The logging method is configurable and can be used to log to databases, etc.
In [7]:
class CustomLoggedExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.x = UniformChoice(
            choices=["What's on your mind?", "Say something."],
            unit=userid
        )
        params.y = BernoulliTrial(p=0.5, unit=userid)

    def log(self, data):
        print json.dumps(data)
        
e = CustomLoggedExperiment(userid=7)
print e.get('x')
{"inputs": {"userid": 7}, "name": "CustomLoggedExperiment", "params": {"y": 1, "x": "What's on your mind?"}, "time": 1433000235, "salt": "CustomLoggedExperiment", "event": "exposure"}
What's on your mind?

Putting it all together

  • Hypothetical experiment looks at the effect of sorting a music album's songs by rating
  • Simulate components of a music store experiment
    • Experiment definition (PlanOut)
    • Code to render the web page
    • Code to handle item purchases (this logs the "conversion" event)
    • Code to simulate the process of users' purchase decision-making
    • A loop that simulates many users viewing many albums

Experiment definition

In [8]:
class MusicExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.sort_by_rating = BernoulliTrial(p=0.2, unit=userid)

Rendering web page

In [9]:
def get_price(albumid):
    "look up the price of an album"
    # this would realistically hook into a database
    return 11.99
In [10]:
def render_webpage(userid, albumid):
    'simulated web page rendering function'
    
    # get experiment for the given user / album pair.
    e = MusicExperiment(userid=userid)
    
    # use log_exposure() so that we can also record the price
    e.log_exposure({'price': get_price(albumid), 'albumid': albumid})
    
    # use a default value with get() in production settings, in case
    # your experimentation system goes down
    if e.get('sort_by_rating', False):
        songs = "some sorted songs" # this would sort the songs by rating
    else:
        songs = "some non-sorted songs"
    
    html = "some HTML code involving %s" % songs  # most valid html ever.
    # render html

Logging outcomes

In [11]:
def handle_purchase(userid, albumid):
    'handles purchase of an album'
    e = MusicExperiment(userid=userid)
    e.log_event('purchase', {'price': get_price(albumid), 'albumid': albumid})
    # start album download

Generative model of user decision making

In [12]:
def simulate_user_decision(userid, albumid):
    'simulate user experience'
    # This function should be thought of as simulating a user's decision-making
    # process for the given stimulus - and so we don't actually want to do any
    # logging here.
    e = MusicExperiment(userid=userid)
    e.set_auto_exposure_logging(False)  # turn off auto-logging
    
    # users with sorted songs have a higher purchase rate
    if e.get('sort_by_rating'):
        prob_purchase = 0.15
    else:
        prob_purchase = 0.10
    
    # make purchase with probability prob_purchase
    return random.random() < prob_purchase

Running the simulation

In [13]:
# We then simulate 500 users each visiting 20 albums and deciding whether to purchase
random.seed(0)
for u in xrange(500):
    for a in xrange(20):
        render_webpage(u, a)
        if simulate_user_decision(u, a):
            handle_purchase(u, a)

Analyzing your experiment

Standard analysis procedure

  • Data is logged to JSON.
  • Use a script to flatten the log file into a tabular format
  • Join exposure data with outcome data
  • Analyze results
In [14]:
# stolen from http://stackoverflow.com/questions/23019119/converting-multilevel-nested-dictionaries-to-pandas-dataframe
from collections import OrderedDict
def flatten(d):
    "Flatten an OrderedDict object"
    result = OrderedDict()
    for k, v in d.items():
        if isinstance(v, dict):
            result.update(flatten(v))
        else:
            result[k] = v
    return result
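
For example, flatten() hoists the keys of nested dicts up to the top level. Here is a small illustrative record (hypothetical, shaped like the exposure log output shown earlier, not read from the actual log file):

# Illustrative only: a record shaped like PlanOut's exposure log output.
record = OrderedDict([
    ('event', 'exposure'),
    ('inputs', {'userid': 7}),
    ('params', {'x': "What's on your mind?", 'y': 1}),
])
print flatten(record).keys()  # ['event', 'userid', 'x', 'y'] (x/y order may vary)
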
In [15]:
def log2csv(filename):
    raw_log_data = [json.loads(i) for i in open(filename)]
    log_data = pd.DataFrame.from_dict([flatten(i) for i in raw_log_data])
    log_data.to_csv(filename[:-4] + '.csv', index=False)
In [16]:
log2csv('MusicExperiment.log')
In [17]:
%%R
log.data <- read.csv('MusicExperiment.csv')
print(log.data %>% sample_n(10))
      albumid    event            name price            salt sort_by_rating       time userid
2555       10 exposure MusicExperiment 11.99 MusicExperiment              0 1433000236    114
10170       7 exposure MusicExperiment 11.99 MusicExperiment              0 1433000237    457
9558       17 exposure MusicExperiment 11.99 MusicExperiment              0 1433000237    429
3905        2 exposure MusicExperiment 11.99 MusicExperiment              0 1433000236    175
9669       16 exposure MusicExperiment 11.99 MusicExperiment              0 1433000237    434
7669       16 purchase MusicExperiment 11.99 MusicExperiment              1 1433000237    344
6641        4 exposure MusicExperiment 11.99 MusicExperiment              0 1433000236    298
120        13 exposure MusicExperiment 11.99 MusicExperiment              0 1433000235      5
9347        2 exposure MusicExperiment 11.99 MusicExperiment              0 1433000237    420
2937       18 exposure MusicExperiment 11.99 MusicExperiment              0 1433000236    131
In [18]:
%%R
log.data %>%
  group_by(event) %>%
  summarise(n=n())
Source: local data frame [2 x 2]

     event     n
1 exposure 10000
2 purchase  1127

Exposure data

We first extract all user-album pairs that were exposed to an experimental treatment, and their parameter assignments.

In [19]:
%%R

exposures <- log.data %>%
  filter(event == 'exposure') %>%
  group_by(userid, albumid, sort_by_rating) %>%
  summarise(first_exposure_time=min(time))
head(exposures)
Source: local data frame [6 x 4]
Groups: userid, albumid

  userid albumid sort_by_rating first_exposure_time
1      0       0              0          1433000235
2      0       1              0          1433000235
3      0       2              0          1433000235
4      0       3              0          1433000235
5      0       4              0          1433000235
6      0       5              0          1433000235

Outcome data

In [20]:
# pandas equivalent of the dplyr pipeline below: load the flattened log,
# keep one exposure row per user-album pair, and join on purchases.
log_data = pd.read_csv('MusicExperiment.csv')
unique_exposures = log_data[log_data.event == 'exposure'][['userid', 'albumid', 'sort_by_rating']].drop_duplicates()
conversions = log_data[log_data.event == 'purchase'][['userid', 'albumid', 'price']]
df = pd.merge(unique_exposures, conversions, on=['userid', 'albumid'], how='left')
df['purchased'] = df.price.notnull()
df['revenue'] = df.purchased * df.price.fillna(0)
In [21]:
%%R
conversions <- log.data %>%
  filter(event == 'purchase') %>%
  group_by(userid, albumid, price) %>%
  summarise(purchase_time=min(time))
head(conversions)
Source: local data frame [6 x 4]
Groups: userid, albumid

  userid albumid price purchase_time
1      1      15 11.99    1433000235
2      2       0 11.99    1433000235
3      2      12 11.99    1433000235
4      3      15 11.99    1433000235
5      4      17 11.99    1433000235
6      5      12 11.99    1433000235

Joining treatments with outcomes

In [22]:
%%R
all <- left_join(exposures, conversions, by=c('userid', 'albumid')) %>%
  mutate(
    purchased=!is.na(purchase_time),
    revenue=ifelse(purchased, price, 0)
  ) %>%
  select(userid, albumid, sort_by_rating, price, purchased, revenue)
head(all)
Source: local data frame [6 x 6]
Groups: userid, albumid

  userid albumid sort_by_rating price purchased revenue
1      0       0              0    NA     FALSE       0
2      0       1              0    NA     FALSE       0
3      0       2              0    NA     FALSE       0
4      0       3              0    NA     FALSE       0
5      0       4              0    NA     FALSE       0
6      0       5              0    NA     FALSE       0

Analyzing the experimental results

We successfully recover the purchase probability treatment effect: the estimated purchase rates below (about 0.104 for the unsorted condition and 0.152 for the sorted condition) are close to the 0.10 and 0.15 used in simulate_user_decision().

In [23]:
%%R
all %>%
  group_by(sort_by_rating) %>%
  summarise(
      prob.purchase=mean(purchased),
      avg.revenue=mean(revenue),
      n=n()
  )
Source: local data frame [2 x 4]

  sort_by_rating prob.purchase avg.revenue    n
1              0     0.1035802    1.241927 8100
2              1     0.1515789    1.817432 1900

Analyzing the experimental results

For the given $p$ and $N$, OLS gives us a quick and reasonable approximation of the SE for our ATE (see the cross-check after the regression output below).

In [24]:
%%R
print(summary(lm(purchased ~ sort_by_rating, data=all)))
Call:
lm(formula = purchased ~ sort_by_rating, data = all)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.1516 -0.1036 -0.1036 -0.1036  0.8964 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.103580   0.003508  29.529  < 2e-16 ***
sort_by_rating 0.047999   0.008047   5.965 2.54e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3157 on 9998 degrees of freedom
Multiple R-squared:  0.003546,	Adjusted R-squared:  0.003446 
F-statistic: 35.58 on 1 and 9998 DF,  p-value: 2.536e-09
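
As a rough cross-check (a sketch, not part of the original analysis), the unpooled two-sample standard error for a difference in proportions, computed from the group sizes and purchase rates in the summary table above, lands close to the OLS standard error on sort_by_rating:

from math import sqrt

# Group sizes and purchase rates copied from the summary table above.
n0, p0 = 8100, 0.1035802   # control: unsorted songs
n1, p1 = 1900, 0.1515789   # treatment: songs sorted by rating

# Unpooled standard error for the difference in two proportions.
se = sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
print round(se, 4)  # ~0.0089, in the same ballpark as the OLS SE of ~0.0080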

In [25]:
%%R
print(summary(lm(revenue ~ sort_by_rating, data=all)))
Call:
lm(formula = revenue ~ sort_by_rating, data = all)

Residuals:
   Min     1Q Median     3Q    Max 
-1.817 -1.242 -1.242 -1.242 10.748 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     1.24193    0.04206  29.529  < 2e-16 ***
sort_by_rating  0.57550    0.09649   5.965 2.54e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.785 on 9998 degrees of freedom
Multiple R-squared:  0.003546,	Adjusted R-squared:  0.003446 
F-statistic: 35.58 on 1 and 9998 DF,  p-value: 2.536e-09

Exercise: How would you analyze the selective exposure experiment?

  • Think about:
    • What data would you need to log?
    • How would you verify that your data looks sane?
    • How would you measure your treatment effect?