Making your own data


  • How PlanOut logs data
  • Flow for loading and analyzing data
  • Putting it all together: simulated web app and example analysis
%load_ext rpy2.ipython
from planout.ops.random import *
from planout.experiment import SimpleExperiment
import pandas as pd
import json
import random
Log files

Create a new experiment and get a randomized assignment

class LoggedExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.x = UniformChoice(
            choices=["What's on your mind?", "Say something."],
            unit=userid)
        params.y = BernoulliTrial(p=0.5, unit=userid)

print LoggedExperiment(userid=8).get('x')
Say something.
  • Then open your terminal, navigate to the directory this notebook is in, and type:
> tail -f LoggedExperiment.log
  • By default, SimpleExperiment logs to a file whose name is the class name of your experiment.
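To inspect the log from Python rather than with tail, a minimal sketch follows. It assumes each line of the log file is a single JSON object, as the custom logging output later in this notebook suggests. `read_log` is a hypothetical helper, not part of PlanOut, and the demo writes its own small made-up file so that it runs without PlanOut installed.

```python
import json
import os
import tempfile

def read_log(path):
    """Parse a log file with one JSON object per line (hypothetical helper)."""
    events = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                events.append(json.loads(line))
    return events

# Made-up records in the same shape as the exposure record shown in this notebook.
sample = [
    {"event": "exposure", "name": "LoggedExperiment",
     "inputs": {"userid": 8}, "params": {"x": "Say something.", "y": 1}},
    {"event": "exposure", "name": "LoggedExperiment",
     "inputs": {"userid": 7}, "params": {"x": "What's on your mind?", "y": 1}},
]

# Write the demo file to a temporary directory, one JSON object per line.
path = os.path.join(tempfile.mkdtemp(), "Demo.log")
with open(path, "w") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

events = read_log(path)
for ev in events:
    print("%s userid=%s" % (ev["event"], ev["inputs"]["userid"]))
```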

Exposure logging

  • Parameter assignments are logged automatically the first time you retrieve a parameter
  • Logger can be configured to do caching, write to databases, etc.
e = LoggedExperiment(userid=7)
print e.get('x')
print e.get('y')
What's on your mind?
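The "logged the first time you retrieve a parameter" behavior can be pictured with a toy class. This is an illustration only, not PlanOut's implementation: `ToyExperiment`, its fixed assignments, and the in-memory `log_lines` list are all hypothetical.

```python
class ToyExperiment(object):
    """Toy stand-in (not PlanOut): logs one exposure on the first get()."""

    def __init__(self, userid):
        self.userid = userid
        self.params = {"x": "Say something.", "y": 1}  # fixed, made-up assignment
        self.exposure_logged = False
        self.log_lines = []  # stands in for the log file

    def log_exposure(self):
        self.log_lines.append({"event": "exposure",
                               "inputs": {"userid": self.userid},
                               "params": dict(self.params)})
        self.exposure_logged = True

    def get(self, name, default=None):
        if not self.exposure_logged:  # only the first retrieval logs
            self.log_exposure()
        return self.params.get(name, default)

toy = ToyExperiment(userid=7)
toy.get("x")
toy.get("y")
print(len(toy.log_lines))  # one exposure record, despite two get() calls
```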

Manual exposure logging

Calling log_exposure() will force PlanOut to log an exposure event. You can optionally pass in additional data.

e.log_exposure({'endpoint': ''})

Event logging

You can also log arbitrary events. The first argument to log_event() is a required parameter that specifies the event type.

e.log_event('post_status_update', {'type': 'photo'})

Custom logging

  • Logging method is configurable, can be used to log to databases, etc.
class CustomLoggedExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.x = UniformChoice(
            choices=["What's on your mind?", "Say something."],
            unit=userid)
        params.y = BernoulliTrial(p=0.5, unit=userid)

    def log(self, data):
        print json.dumps(data)
e = CustomLoggedExperiment(userid=7)
print e.get('x')
{"inputs": {"userid": 7}, "name": "CustomLoggedExperiment", "params": {"y": 1, "x": "What's on your mind?"}, "time": 1433000235, "salt": "CustomLoggedExperiment", "event": "exposure"}
What's on your mind?
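As one way to log to a database, the sketch below stores each record in SQLite using only the standard library. The table name `experiment_log`, its columns, and the helpers `make_log_db` and `log_to_sqlite` are assumptions made for illustration; in a real experiment you might call such a helper from your overridden `log(self, data)` method.

```python
import json
import sqlite3

def make_log_db(path=":memory:"):
    """Create (or open) a SQLite database with a simple log table (assumed schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment_log "
        "(name TEXT, event TEXT, time INTEGER, payload TEXT)"
    )
    return conn

def log_to_sqlite(conn, data):
    """Store one log record; the full record is kept as JSON in `payload`."""
    conn.execute(
        "INSERT INTO experiment_log (name, event, time, payload) VALUES (?, ?, ?, ?)",
        (data.get("name"), data.get("event"), data.get("time"), json.dumps(data)),
    )
    conn.commit()

conn = make_log_db()
# Record copied from the custom logging output shown in this notebook.
record = {"inputs": {"userid": 7}, "name": "CustomLoggedExperiment",
          "params": {"y": 1, "x": "What's on your mind?"},
          "time": 1433000235, "salt": "CustomLoggedExperiment", "event": "exposure"}
log_to_sqlite(conn, record)
rows = conn.execute("SELECT name, event FROM experiment_log").fetchall()
print(rows)
```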

Putting it all together

  • Hypothetical experiment looks at the effect of sorting a music album's songs by popularity
  • Simulate components of a music store experiment
    • Experiment definition (PlanOut)
    • Code to render the web page
    • Code to handle item purchases (this logs the "conversion" event)
    • Code to simulate the process of users' purchase decision-making
    • A loop that simulates many users viewing many albums

Experiment definition

class MusicExperiment(SimpleExperiment):
    def assign(self, params, userid):
        params.sort_by_rating = BernoulliTrial(p=0.2, unit=userid)
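Because the trial is keyed on `unit=userid`, a user is expected to receive the same assignment every time the experiment is constructed, which is why the page-rendering, purchase-handling, and simulation code later in this notebook can each build their own `MusicExperiment`. The sketch below illustrates the general idea of hash-based deterministic assignment. It is a simplified stand-in, not PlanOut's exact hashing scheme, and `toy_bernoulli` and its salt format are hypothetical.

```python
import hashlib

def toy_bernoulli(p, unit, salt):
    """Return 1 with probability about p, deterministically for a given (salt, unit)."""
    key = ("%s.%s" % (salt, unit)).encode("utf-8")
    # Map the first 15 hex digits of the SHA-1 digest to a number in [0, 1).
    fraction = int(hashlib.sha1(key).hexdigest()[:15], 16) / float(16 ** 15)
    return 1 if fraction < p else 0

salt = "MusicExperiment.sort_by_rating"  # hypothetical salt format
first = [toy_bernoulli(0.2, u, salt) for u in range(500)]
second = [toy_bernoulli(0.2, u, salt) for u in range(500)]

print(first == second)     # the same user always lands in the same condition
print(sum(first) / 500.0)  # share of users treated, expected to be near 0.2
```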

Rendering web page

def get_price(albumid):
    "look up the price of an album"
    # this would realistically hook into a database
    return 11.99
def render_webpage(userid, albumid):
    'simulated web page rendering function'
    # get experiment for the given user / album pair.
    e = MusicExperiment(userid=userid)
    # use log_exposure() so that we can also record the price
    e.log_exposure({'price': get_price(albumid), 'albumid': albumid})
    # use a default value with get() in production settings, in case
    # your experimentation system goes down
    if e.get('sort_by_rating', False):
        songs = "some sorted songs" # this would sort the songs by rating
    else:
        songs = "some non-sorted songs"
    html = "some HTML code involving %s" % songs  # most valid html ever.
    # render html

Logging outcomes

def handle_purchase(userid, albumid):
    'handles purchase of an album'
    e = MusicExperiment(userid=userid)
    e.log_event('purchase', {'price': get_price(albumid), 'albumid': albumid})
    # start album download

Generative model of user decision making

def simulate_user_decision(userid, albumid):
    'simulate user experience'
    # This function should be thought of as simulating a user's decision-making
    # process for the given stimulus - and so we don't actually want to do any
    # logging here.
    e = MusicExperiment(userid=userid)
    e.set_auto_exposure_logging(False)  # turn off auto-logging
    # users with sorted songs have a higher purchase rate
    if e.get('sort_by_rating'):
        prob_purchase = 0.15
    else:
        prob_purchase = 0.10
    # make purchase with probability prob_purchase
    return random.random() < prob_purchase

Running the simulation

# Simulate 500 users each viewing 20 albums and deciding whether to purchase
for u in xrange(500):
    for a in xrange(20):
        render_webpage(u, a)
        if simulate_user_decision(u, a):
            handle_purchase(u, a)

Analyzing your experiment

Standard analysis procedure

  • Data is logged to JSON.
  • Use a script to flatten file into tabular format
  • Join exposure data with outcome data
  • Analyze results
# stolen from
from collections import OrderedDict
def flatten(d):
    "Flatten an OrderedDict object"
    result = OrderedDict()
    for k, v in d.items():
        if isinstance(v, dict):
            # merge nested dicts (e.g. 'inputs', 'params') into the top level
            result.update(flatten(v))
        else:
            result[k] = v
    return result
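def log2csv(filename):
    "Convert a JSON-lines log file (e.g. 'Foo.log') into 'Foo.csv'"
    with open(filename) as f:
        raw_log_data = [json.loads(line) for line in f]
    log_data = pd.DataFrame.from_dict([flatten(i) for i in raw_log_data])
    log_data.to_csv(filename[:-4] + '.csv', index=False)

log2csv('MusicExperiment.log')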
%%R
library(dplyr)
log_data <- read.csv('MusicExperiment.csv')
print(log_data %>% sample_n(10))
      albumid    event            name price            salt sort_by_rating
2555       10 exposure MusicExperiment 11.99 MusicExperiment              0
10170       7 exposure MusicExperiment 11.99 MusicExperiment              0
9558       17 exposure MusicExperiment 11.99 MusicExperiment              0
3905        2 exposure MusicExperiment 11.99 MusicExperiment              0
9669       16 exposure MusicExperiment 11.99 MusicExperiment              0
7669       16 purchase MusicExperiment 11.99 MusicExperiment              1
6641        4 exposure MusicExperiment 11.99 MusicExperiment              0
120        13 exposure MusicExperiment 11.99 MusicExperiment              0
9347        2 exposure MusicExperiment 11.99 MusicExperiment              0
2937       18 exposure MusicExperiment 11.99 MusicExperiment              0
            time userid
2555  1433000236    114
10170 1433000237    457
9558  1433000237    429
3905  1433000236    175
9669  1433000237    434
7669  1433000237    344
6641  1433000236    298
120   1433000235      5
9347  1433000237    420
2937  1433000236    131
%%R
log_data %>%
  group_by(event) %>%
  summarise(n=n())
Source: local data frame [2 x 2]

     event     n
1 exposure 10000
2 purchase  1127
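A quick sanity check on these counts uses the simulation's own parameters: 500 users times 20 albums should give exactly 10,000 exposures, and with 20% of users treated at a purchase rate of 0.15 and the rest at 0.10, about 11% of exposures should convert. The arithmetic below is only a back-of-the-envelope expectation; realized counts vary with the random assignment and the purchase draws.

```python
n_users, n_albums = 500, 20
p_treated = 0.2                          # BernoulliTrial(p=0.2) in MusicExperiment
rate_sorted, rate_unsorted = 0.15, 0.10  # purchase rates in simulate_user_decision

expected_exposures = n_users * n_albums
expected_rate = p_treated * rate_sorted + (1 - p_treated) * rate_unsorted
expected_purchases = expected_exposures * expected_rate

print(expected_exposures)         # 10000
print(round(expected_purchases))  # 1100
```

The logged 10,000 exposures and 1,127 purchases are in line with these expectations.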

Exposure data

We first extract all user-album pairs that were exposed to an experimental treatment, along with their parameter assignments.

exposures <- log_data %>%
  filter(event == 'exposure') %>%
  group_by(userid, albumid, sort_by_rating) %>%
  summarise(first_exposure_time=min(time))
head(exposures)
Source: local data frame [6 x 4]
Groups: userid, albumid

  userid albumid sort_by_rating first_exposure_time
1      0       0              0          1433000235
2      0       1              0          1433000235
3      0       2              0          1433000235
4      0       3              0          1433000235
5      0       4              0          1433000235
6      0       5              0          1433000235

Outcome data

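# pandas version of the same steps; assumes log_data is the flattened log as a
# DataFrame and unique_exposures has one row per exposed (userid, albumid) pair
conversions = log_data[log_data.event=='purchase'][['userid', 'albumid','price']]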
df = pd.merge(unique_exposures, conversions, on=['userid', 'albumid'], how='left')
df['purchased'] = df.price.notnull()
df['revenue'] = df.purchased * df.price.fillna(0)
conversions <- log_data %>%
  filter(event == 'purchase') %>%
  group_by(userid, albumid, price) %>%
  summarise(purchase_time=min(time))
head(conversions)
Source: local data frame [6 x 4]
Groups: userid, albumid

  userid albumid price purchase_time
1      1      15 11.99    1433000235
2      2       0 11.99    1433000235
3      2      12 11.99    1433000235
4      3      15 11.99    1433000235
5      4      17 11.99    1433000235
6      5      12 11.99    1433000235

Joining treatments with outcomes

all <- left_join(exposures, conversions, by=c('userid', 'albumid')) %>%
  mutate(
    purchased=!is.na(price),
    revenue=ifelse(purchased, price, 0)
  ) %>%
  select(userid, albumid, sort_by_rating, price, purchased, revenue)
head(all)
Source: local data frame [6 x 6]
Groups: userid, albumid

  userid albumid sort_by_rating price purchased revenue
1      0       0              0    NA     FALSE       0
2      0       1              0    NA     FALSE       0
3      0       2              0    NA     FALSE       0
4      0       3              0    NA     FALSE       0
5      0       4              0    NA     FALSE       0
6      0       5              0    NA     FALSE       0
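The join logic can also be sketched in plain Python. The rows below are made-up examples in the same shape as the exposure and conversion tables, and `left_join_purchases` is a hypothetical helper. Every exposed pair is kept, and pairs with no matching purchase get `purchased = False` and zero revenue.

```python
def left_join_purchases(exposures, conversions):
    """Left-join exposure rows with purchase rows on (userid, albumid)."""
    price_by_pair = {(c["userid"], c["albumid"]): c["price"] for c in conversions}
    joined = []
    for row in exposures:
        # price is None when the exposed pair never purchased
        price = price_by_pair.get((row["userid"], row["albumid"]))
        purchased = price is not None
        joined.append(dict(row, price=price, purchased=purchased,
                           revenue=price if purchased else 0))
    return joined

# Made-up rows for illustration only.
exposures = [
    {"userid": 0, "albumid": 0, "sort_by_rating": 0},
    {"userid": 1, "albumid": 15, "sort_by_rating": 0},
]
conversions = [{"userid": 1, "albumid": 15, "price": 11.99}]

for row in left_join_purchases(exposures, conversions):
    print(row)
```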

Analyzing the experimental results

We successfully recover the purchase-probability treatment effect built into the simulation: about 0.10 without sorting versus about 0.15 with sorting.

all %>%
  group_by(sort_by_rating) %>%
  summarise(
    prob.purchase=mean(purchased),
    avg.revenue=mean(revenue),
    n=n()
  )
Source: local data frame [2 x 4]

  sort_by_rating prob.purchase avg.revenue    n
1              0     0.1035802    1.241927 8100
2              1     0.1515789    1.817432 1900

Analyzing the experimental results

For the given $p$ and $N$, OLS gives us a quick and reasonable approximation of the standard error (SE) of our estimated average treatment effect (ATE).

print(summary(lm(purchased ~ sort_by_rating, data=all)))
Call:
lm(formula = purchased ~ sort_by_rating, data = all)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.1516 -0.1036 -0.1036 -0.1036  0.8964 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.103580   0.003508  29.529  < 2e-16 ***
sort_by_rating 0.047999   0.008047   5.965 2.54e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3157 on 9998 degrees of freedom
Multiple R-squared:  0.003546,	Adjusted R-squared:  0.003446 
F-statistic: 35.58 on 1 and 9998 DF,  p-value: 2.536e-09

print(summary(lm(revenue ~ sort_by_rating, data=all)))
Call:
lm(formula = revenue ~ sort_by_rating, data = all)

Residuals:
   Min     1Q Median     3Q    Max 
-1.817 -1.242 -1.242 -1.242 10.748 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     1.24193    0.04206  29.529  < 2e-16 ***
sort_by_rating  0.57550    0.09649   5.965 2.54e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.785 on 9998 degrees of freedom
Multiple R-squared:  0.003546,	Adjusted R-squared:  0.003446 
F-statistic: 35.58 on 1 and 9998 DF,  p-value: 2.536e-09
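
As a cross-check on the purchase regression, the difference in purchase probabilities and a standard error can be computed by hand from the per-condition summary table (the proportions and group sizes below are copied from that output). This sketch uses the unpooled two-sample formula $\sqrt{p_0(1-p_0)/n_0 + p_1(1-p_1)/n_1}$, which is a different estimator from the default OLS standard error, so the two are expected to agree only approximately.

```python
import math

# Values copied from the per-condition summary table.
p0, n0 = 0.1035802, 8100   # sort_by_rating == 0
p1, n1 = 0.1515789, 1900   # sort_by_rating == 1

ate = p1 - p0  # difference in purchase probabilities
se_unpooled = math.sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)

print(round(ate, 4))          # 0.048
print(round(se_unpooled, 4))  # 0.0089
```

The difference matches the OLS coefficient on sort_by_rating (0.048), and the hand-computed SE of about 0.0089 is close to the OLS value of 0.0080.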

Exercise: How would you analyze the selective exposure experiment?

  • Think about:
    • What data would you need to log?
    • How would you verify that your data looks sane?
    • How would you measure your treatment effect?