Experimental Design with PlanOut¶

Outline¶

A simple A/B test
How PlanOut works
Recipes:
- Factorial designs
- Cluster-randomized experiments
- Experiments with multiple units

Loading up PlanOut¶

This imports the basic operators for random assignment, along with SimpleExperiment, the base class for logging.

In [1]:

%pylab inline
from planout.ops.random import *
from planout.experiment import SimpleExperiment
import pandas as pd

Populating the interactive namespace from numpy and matplotlib

Anatomy of a simple A/B test¶

Define new experiments by subclassing SimpleExperiment.
Implement the assignment logic in the assign() method.
Use the params object to perform random assignments.

In [2]:

class VotingExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.button_text = UniformChoice(
      choices=["I'm a voter", "I'm voting"],
      unit=userid
    )

Using PlanOut¶

Get randomized assignments for input units by creating instances of the class.

In [3]:

e = VotingExperiment(userid=212)
print e.get('button_text')

I'm a voter

Using PlanOut¶

Here are the assignments for 10 userids.

In [4]:

for i in xrange(10):
    e = VotingExperiment(userid=i)
    print "user %s sees: %s" % (i, e.get('button_text'))

user 0 sees: I'm voting
user 1 sees: I'm voting
user 2 sees: I'm voting
user 3 sees: I'm a voter
user 4 sees: I'm voting
user 5 sees: I'm a voter
user 6 sees: I'm voting
user 7 sees: I'm a voter
user 8 sees: I'm voting
user 9 sees: I'm voting

To check that the experiment is doing what we expect, we can simulate assignments for many userids and construct a dataframe with all of the assignments:

In [5]:

e.get_params()

Out[5]:

{'button_text': "I'm voting"}

In [6]:

sim_users = [VotingExperiment(userid=i).get_params() for i in xrange(10000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['button_text']).size()

button_text
I'm a voter    5000
I'm voting     5000
dtype: int64

How random assignment works in PlanOut¶

PlanOut uses deterministic hashing to perform random assignments.
Unless otherwise specified, all random assignments are independent.
This ensures that:
- The same unit (e.g., a user ID) is assigned independently across different experiments and parameters.
- Assignments are as good as random.

Underneath the hood¶

PlanOut computes a hash that looks like:

  f(SHA1(experiment_name.parameter_name.unit_id))

So that:

In [7]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid)
    params.y = UniformChoice(choices=['a','b'], unit=userid)

uses a hash like:

  SHA1(RandomExample1.x.4) % 2

to select the value for x when the given userid is 4.
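
The scheme above can be sketched with the standard library alone. This is a simplified illustration of the idea, not PlanOut's actual code: the names `sketch_hash` and `sketch_uniform_choice` are hypothetical, and truncating the digest to 15 hex characters is an assumption made here to keep the integer small.

```python
import hashlib

def sketch_hash(experiment, parameter, unit):
    """Hash 'experiment.parameter.unit' to a non-negative integer."""
    key = "%s.%s.%s" % (experiment, parameter, unit)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Assumption: keep only the first 15 hex digits of the digest.
    return int(digest[:15], 16)

def sketch_uniform_choice(choices, experiment, parameter, unit):
    """Pick one element of choices using the hash modulo the list length."""
    return choices[sketch_hash(experiment, parameter, unit) % len(choices)]

# The same inputs always give the same answer, with no stored state.
first = sketch_uniform_choice([0, 1], "RandomExample1", "x", 4)
second = sketch_uniform_choice([0, 1], "RandomExample1", "x", 4)
print(first == second)
```

Because the key includes the parameter name, `x` and `y` are hashed from different strings even for the same userid.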

Parameter-level salts¶

In [8]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid)
    params.y = UniformChoice(choices=['a','b'], unit=userid)
    
sim_users = [RandomExample1(userid=i).get_params() for i in xrange(2000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

x  y
0  a    531
   b    475
1  a    478
   b    516
dtype: int64

Automatic salting¶

PlanOut implicitly uses the parameter name as the salt:

In [9]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid, salt='x')
    params.y = UniformChoice(choices=['a','b'], unit=userid, salt='y')
    
sim_users = [RandomExample1(userid=i).get_params() for i in xrange(2000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

x  y
0  a    531
   b    475
1  a    478
   b    516
dtype: int64

Changing the salts changes the assignments:¶

In [10]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid, salt='x2')
    params.y = UniformChoice(choices=['a','b'], unit=userid, salt='y2')
    
sim_users = [RandomExample1(userid=i).get_params() for i in xrange(2000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

x  y
0  a    507
   b    493
1  a    506
   b    494
dtype: int64

Correlated assignments¶

Parameters with the same salt will have correlated assignments.

In [11]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid, salt='x')
    params.y = UniformChoice(choices=['a','b'], unit=userid, salt='x')
    
sim_users = [RandomExample1(userid=i).get_params() for i in xrange(2000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

x  y
0  a    1006
1  b     994
dtype: int64
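
Why a shared salt collapses the crosstab can be seen in a standard-library sketch. It is an illustration under assumptions, not PlanOut's implementation: `salted_index` and `crosstab` are hypothetical helpers, and the exact counts will differ from the output above. With the same salt, both parameters hash the same string, so they always land on the same index.

```python
import hashlib
from collections import Counter

def salted_index(experiment, salt, unit, n_choices):
    """Map 'experiment.salt.unit' to an index in range(n_choices)."""
    key = "%s.%s.%s" % (experiment, salt, unit)
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest()[:15], 16) % n_choices

def crosstab(salt_x, salt_y, n_users=2000):
    """Count (x, y) cells when x and y are drawn with the given salts."""
    cells = Counter()
    for userid in range(n_users):
        x = [0, 1][salted_index("RandomExample1", salt_x, userid, 2)]
        y = ["a", "b"][salted_index("RandomExample1", salt_y, userid, 2)]
        cells[(x, y)] += 1
    return cells

shared = crosstab("x", "x")    # same salt: x and y always share an index
separate = crosstab("x", "y")  # different salts: the draws are unrelated
print(sorted(shared))
print(sorted(separate))
```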

Experiment-level salts¶

Using the experiment name in the salt ensures independence between experiments.
By default, experiment class names are used as experiment-level salts.

In [12]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid)
    params.y = UniformChoice(choices=['a','b'], unit=userid)
    
class RandomExample2(SimpleExperiment):
  def assign(self, params, userid):
    params.x = UniformChoice(choices=[4, 8], unit=userid)
    params.y = UniformChoice(choices=['m','n'], unit=userid)

sim_users = [RandomExample1(userid=i).get_params() for i in xrange(4000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

sim_users = [RandomExample2(userid=i).get_params() for i in xrange(4000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).size()

x  y
0  a    1009
   b     962
1  a    1004
   b    1025
dtype: int64
x  y
4  m    1017
   n     979
8  m    1026
   n     978
dtype: int64

Custom experiment-level salts¶

In [13]:

class RandomExample1(SimpleExperiment):
  def setup(self):
    self.salt = 'RandomExample2'

  def assign(self, params, userid):
    params.x = UniformChoice(choices=[0, 1], unit=userid)
    params.y = UniformChoice(choices=['a','b'], unit=userid)

sim_users = [RandomExample2(userid=i).get_params() for i in xrange(4000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['x', 'y']).agg(len)

x  y
4  m    1017
   n     979
8  m    1026
   n     978
dtype: int64

Random assignment with multiple units¶

When multiple units are used, the units are concatenated,
e.g., if userid=6 and url='http://news.ycombinator.com', the hash would be:

    f(SHA1('RandomExample1.show_thumbnail.6.http://news.ycombinator.com'))
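
The concatenation can be sketched as follows. It is a minimal illustration, assuming units are joined with `.` in the order given. `unit_key` and `sketch_bernoulli` are hypothetical helpers rather than PlanOut functions, and the way the hash is scaled to a number between 0 and 1 is also an assumption.

```python
import hashlib

def unit_key(experiment, parameter, units):
    """Join the experiment name, parameter name, and every unit with '.'."""
    return ".".join([experiment, parameter] + [str(u) for u in units])

def sketch_bernoulli(p, key):
    """Return 1 with probability p, using a hash of key as the random draw."""
    h = int(hashlib.sha1(key.encode("utf-8")).hexdigest()[:15], 16)
    draw = h / 16.0 ** 15  # scale the 15-hex-digit integer to between 0 and 1
    return 1 if draw <= p else 0

url = "http://news.ycombinator.com"
key_user_first = unit_key("RandomExample1", "show_thumbnail", [6, url])
key_url_first = unit_key("RandomExample1", "show_thumbnail", [url, 6])
print(key_user_first)
print(key_url_first)
print(sketch_bernoulli(0.15, key_user_first))
```

The two keys are different strings, so swapping the order of the units gives an unrelated draw for the same user and URL.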

In [14]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid, url):
    params.show_thumbnail = BernoulliTrial(p=0.15, unit=[userid, url])

RandomExample1(userid=6, url='http://news.ycombinator.com').get('show_thumbnail')

Out[14]:

Order matters¶

In [15]:

class RandomExample1(SimpleExperiment):
  def assign(self, params, userid, url):
    params.show_thumbnail = BernoulliTrial(p=0.15, unit=[url, userid])

RandomExample1(userid=6, url='http://news.ycombinator.com').get('show_thumbnail')

Out[15]:

Recipes for experimental designs¶

Factorial designs¶

Parameter assignments (by default) are performed independently, so a 2x2 design can be constructed simply by setting another parameter.

In [16]:

class VotingExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.button_text = UniformChoice(
      choices=["I'm a voter", "I'm voting"],
      unit=userid
    )
    params.has_social_cues = UniformChoice(
      choices=[1, 0],
      unit=userid
    )

A few examples of assignments produced by the experiment...¶

In [17]:

sim_users = [VotingExperiment(userid=i).get_params() for i in xrange(10000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments[:5]

   button_text  has_social_cues
0   I'm voting                1
1   I'm voting                0
2   I'm voting                1
3  I'm a voter                0
4   I'm voting                1

Crosstabs:¶

In [18]:

print assignments.groupby(['button_text', 'has_social_cues']).size()

button_text  has_social_cues
I'm a voter  0                  2485
             1                  2515
I'm voting   0                  2447
             1                  2553
dtype: int64

Unequal probability assignment with WeightedChoice¶

The WeightedChoice operator lets you select choices with different probabilities.

In [19]:

class VotingExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.button_text = UniformChoice(
      choices=["I'm a voter", "I'm voting"],
      unit=userid
    )
    params.has_social_cues = WeightedChoice(
      choices=[1, 0],
      weights=[8, 2],
      unit=userid
    )

Crosstabs¶

In [20]:

sim_users = [VotingExperiment(userid=i).get_params() for i in xrange(2000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['button_text', 'has_social_cues']).size()

button_text  has_social_cues
I'm a voter  0                  192
             1                  797
I'm voting   0                  196
             1                  815
dtype: int64

Arbitrary cell probabilities in a factorial design¶

Consider the case where organizational factors limit the size of your experiment. For example, suppose that in a hypothetical get-out-the-vote experiment at Facebook, only 5% of users could be used to evaluate the effects of various encouragements to vote.

Because of this, the experimenters decided they wanted to assign users with a matrix of probabilities like:

$$ \left[ \begin{array}{ l c c } & \text{has megaphone} & \text{no megaphone} \\ \text{has feed} & 0.95 & 0.015 \\ \text{no feed} & 0.02 & 0.015 \end{array} \right]$$

There are multiple ways to do this.

Method 1: Conditional probabilities¶

Method 1 first assigns users to see the megaphone with probability 0.97. Half of the remaining 3% are then assigned to see feed stories about voting, while 98% of those who see the megaphone are assigned to see the stories. This produces the assignment probabilities given above.

In [21]:

class VotingExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.has_megaphone = BernoulliTrial(
      p=0.97,
      unit=userid
    )
    cond_probs = [0.5, 0.98]
    params.has_feed = BernoulliTrial(
      p=cond_probs[params.has_megaphone],
      unit=userid
    )

In [22]:

sim_users = [VotingExperiment(userid=i).get_params() for i in xrange(10000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['has_feed', 'has_megaphone']).size()

has_feed  has_megaphone
0         0                 147
          1                 198
1         0                 142
          1                9513
dtype: int64
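
As a check on the arithmetic, the cell probabilities implied by Method 1 can be computed by multiplying each marginal probability by its conditional probability. The sketch below uses only the numbers from the class above; the variable names are illustrative.

```python
p_megaphone = 0.97
p_feed_given = {0: 0.5, 1: 0.98}  # P(has_feed = 1 | has_megaphone)

cells = {}
for megaphone in (0, 1):
    p_m = p_megaphone if megaphone else 1 - p_megaphone
    for feed in (0, 1):
        p_f = p_feed_given[megaphone] if feed else 1 - p_feed_given[megaphone]
        # Joint probability = marginal * conditional
        cells[(feed, megaphone)] = p_m * p_f

n_users = 10000
for (feed, megaphone), prob in sorted(cells.items()):
    print("has_feed=%d has_megaphone=%d: p=%.4f, expected count=%.0f"
          % (feed, megaphone, prob, prob * n_users))
```

For 10,000 users the expected counts are 150, 194, 150, and 9506, close to the simulated 147, 198, 142, and 9513.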

Method 2: Binary encoding of cell positions¶

Method 2 selects the four conditions with their respective probabilities from a flat list. It then picks out elements of an array corresponding to the cell positions. For a very small number of subjects, this method will produce slightly more imbalance, but for larger N it will achieve similar levels of balance compared to Method 1.

In [23]:

class VotingExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.idx = WeightedChoice(
      choices=[0, 1, 2, 3],
      weights=[0.95, 0.02, 0.015, 0.015],
      unit=userid
    )
    params.has_megaphone = [1,1,0,0][params.idx]
    params.has_feed = [1,0,1,0][params.idx]

Crosstabs¶

In [24]:

sim_users = [VotingExperiment(userid=i).get_params() for i in xrange(10000)]
assignments = pd.DataFrame.from_dict(sim_users)
print assignments.groupby(['has_feed', 'has_megaphone']).size()

has_feed  has_megaphone
0         0                 156
          1                 209
1         0                 147
          1                9488
dtype: int64
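
One common way to implement a weighted choice is to walk the cumulative weights until they exceed a uniform draw. The sketch below shows that technique together with the index decoding used in Method 2. It is an assumption about the general approach, not a copy of the WeightedChoice operator: `weighted_pick` and `decode_cell` are hypothetical names, and the uniform draw `u` is passed in by hand rather than derived from a hash.

```python
def weighted_pick(choices, weights, u):
    """Return the choice whose cumulative-weight interval contains u * total.

    u is a uniform draw in [0, 1), supplied by the caller in this sketch.
    """
    stop = u * sum(weights)
    running = 0.0
    for choice, weight in zip(choices, weights):
        running += weight
        if stop < running:
            return choice
    return choices[-1]  # guard against floating-point round-off

def decode_cell(idx):
    """Translate a flat cell index into the two factor levels."""
    return {"has_megaphone": [1, 1, 0, 0][idx], "has_feed": [1, 0, 1, 0][idx]}

weights = [0.95, 0.02, 0.015, 0.015]
for u in (0.50, 0.96, 0.975, 0.99):
    idx = weighted_pick([0, 1, 2, 3], weights, u)
    print("u=%.3f -> idx=%d %s" % (u, idx, decode_cell(idx)))
```

The four draws fall in the intervals ending at 0.95, 0.97, 0.985, and 1.0, so they select indices 0, 1, 2, and 3 in turn.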

More than two factor levels¶

Small changes in appearance can have significant effects on individual behavior. Google is infamous for testing 41 different shades of blue for their link colors. Let's implement that experiment.

In [25]:

class ColorExperiment(SimpleExperiment):
  def assign(self, params, userid):
    params.blue_value = RandomInteger(
        min=215,
        max=255,
        unit=userid
    )
    params.link_color = '#0000%s' % format(params.blue_value, '02x')

In [26]:

ColorExperiment(userid=10).get_params()

Out[26]:

{'blue_value': 247, 'link_color': '#0000f7'}
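
The color string is ordinary Python formatting, so it can be checked without PlanOut. The sketch below assumes both endpoints of RandomInteger are inclusive, which is what 41 shades between 215 and 255 implies. `link_color` is a helper name introduced here for illustration.

```python
def link_color(blue_value):
    """Build a '#0000xx' hex color from a blue channel value in 0..255."""
    # '02x' means lowercase hex, zero-padded to two digits.
    return '#0000%s' % format(blue_value, '02x')

levels = list(range(215, 256))  # 215 through 255, both ends included
print(len(levels))              # 41
print(link_color(247))          # #0000f7
print(link_color(5))            # #000005
```

The zero padding matters only for values below 16, which this experiment never draws, but it keeps the helper correct for any blue value.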

Sample data¶

In [27]:

sim_users = [ColorExperiment(userid=i).get_params() for i in xrange(20000)]
assignments = pd.DataFrame.from_dict(sim_users)

In [28]:

assignments[:5]

Out[28]:

	blue_value	link_color
0	231	#0000e7
1	248	#0000f8
2	242	#0000f2
3	230	#0000e6
4	239	#0000ef

In [29]:

assignments['blue_value'].hist(bins=41);

Within-subjects designs (simple random assignment)¶

In some cases you might want to assign user-item pairs or user-session pairs to parameters. You can do this by passing additional units into assign() and using several of them together as the unit of assignment.

In [30]:

class SelectiveExposureExperiment(SimpleExperiment):
  def assign(self, params, subjectid, url):
    params.has_source_cues = BernoulliTrial(p=0.5, unit=subjectid)
    params.has_social_cues = BernoulliTrial(p=0.5, unit=subjectid)
    
    if params.has_source_cues:
      sources = ['Fox News','USA Today', 'CNN', 'MSNBC']
      params.source = UniformChoice(
        choices=sources,
        unit=[subjectid, url]
      )

Clustered randomized assignment¶

Random assignment can also occur at the cluster level. For example, a MOOC provider might randomize classes into treatment conditions. Here, the course ID, rather than the student ID, is used for random assignment. Note that while the student ID is not actually used for random assignment, it is still useful to pass it in to the experiment object for logging purposes.

In [33]:

class BadgeExperiment(SimpleExperiment):
  def assign(self, params, studentid, courseid):
    params.has_faces = BernoulliTrial(p=0.8, unit=courseid)

Stepped-wedge design¶

Stepped wedge designs expose subjects or clusters of subjects to treatments at a certain random point in time. This can improve precision when there are time-correlated exogenous events that may affect subjects' behaviors, or when there are constraints on how rapidly you can roll out treatments.

In [34]:

class BadgeExperiment(SimpleExperiment):
  def assign(self, params, studentid, courseid, week):
    params.rollout_week = RandomInteger(
      min=0,
      max=8,
      unit=courseid
    )
    if week >= params.rollout_week:
      params.has_faces = 1
    else:
      params.has_faces = 0

Demo of time-based rollout¶

In [35]:

example_course = 17
e = BadgeExperiment(studentid=2, courseid=example_course, week=0)
rollout_week = e.get('rollout_week')
print 'The badge rollout for course %s is %s...' % (example_course, rollout_week)

for week in xrange(9):
  exp = BadgeExperiment(studentid=2, courseid=example_course, week=week)
  print 'week %s: has_faces = %s ' % (week, exp.get('has_faces'))

The badge rollout for course 17 is 2...
week 0: has_faces = 0 
week 1: has_faces = 0 
week 2: has_faces = 1 
week 3: has_faces = 1 
week 4: has_faces = 1 
week 5: has_faces = 1 
week 6: has_faces = 1 
week 7: has_faces = 1 
week 8: has_faces = 1
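
The exposure rule itself is a simple threshold on the week number. The sketch below isolates it in a plain function so the wedge pattern across clusters is easy to see. `exposure_schedule` is a hypothetical helper, and the rollout weeks other than 2 are made-up values for illustration.

```python
def exposure_schedule(rollout_week, n_weeks):
    """Return has_faces for weeks 0..n_weeks-1 given a cluster's rollout week."""
    return [1 if week >= rollout_week else 0 for week in range(n_weeks)]

# Each row is one cluster; later rollout weeks switch on later.
for rollout_week in (0, 2, 5, 8):
    print("rollout week %d: %s" % (rollout_week, exposure_schedule(rollout_week, 9)))
```

A rollout week of 2 reproduces the sequence shown for course 17 above.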

Social cues experiment¶

Consider a social cues experiment where an individual has a certain number of friends that we might show in association with a page, and we want to draw a deterministic random sample from those friends. This is the setup from Social Influence in Social Advertising: Evidence from Field Experiments.

In [36]:

class CueExperiment(SimpleExperiment):
  def assign(self, params, userid, page, friends):
    params.num_friends_shown = RandomInteger(
      min=0,
      max=min(3, len(friends)),
      unit=[userid, page]
    )
    params.friends_shown = Sample(
      choices=friends,
      draws=params.num_friends_shown,
      unit=[userid, page]
    )

In [37]:

some_friends = ["Sean", "Cat", "Solomon", "Dean", "Annie", "Moira", "Ronald", "Mike"]

for pid in xrange(12):
    print CueExperiment(userid=6, page=pid, friends=some_friends).get('friends_shown')

['Dean', 'Solomon']
[]
['Ronald', 'Annie', 'Moira']
['Solomon', 'Ronald', 'Mike']
['Mike']
['Sean', 'Cat']
['Mike']
[]
['Mike', 'Ronald', 'Solomon']
['Solomon', 'Cat', 'Ronald']
['Sean', 'Dean', 'Mike']
['Moira', 'Mike', 'Solomon']
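
Deterministic sampling can be sketched as a shuffle whose swap positions come from a hash instead of a random number generator. The code below is one way to do this with the standard library. It is not the Sample operator itself: `hash_int` and `sketch_sample` are hypothetical names, and the key layout is an assumption.

```python
import hashlib

def hash_int(key):
    """Turn a string key into a non-negative integer via SHA1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest()[:15], 16)

def sketch_sample(choices, draws, key):
    """Pick `draws` distinct items with a hash-driven Fisher-Yates shuffle."""
    items = list(choices)
    for i in range(len(items) - 1, 0, -1):
        # The swap index depends only on the key and the position i.
        j = hash_int("%s.%d" % (key, i)) % (i + 1)
        items[i], items[j] = items[j], items[i]
    return items[:draws]

friends = ["Sean", "Cat", "Solomon", "Dean", "Annie", "Moira", "Ronald", "Mike"]
for page in range(3):
    # Assumed key layout: experiment.parameter.userid.page
    key = "CueExperiment.friends_shown.6.%d" % page
    print(sketch_sample(friends, 2, key))
```

Calling the function again with the same key returns the same friends, so a user would see a stable set of cues on a given page.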

More complex between-subjects experiment¶

How does feedback affect content production?
Use experiments to modulate feedback a focal subject receives by varying encouragements for peers to provide feedback.

In [38]:

class ContentExperiment(SimpleExperiment):
  def assign(self, params, producerid, storyid, viewerid):
    params.p = RandomFloat(
        min=0,
        max=1,
        unit=producerid
    )
    params.collapse = BernoulliTrial(
        p=params.p,
        unit=[storyid, viewerid]
    )
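
The two levels of randomization in this design can be illustrated with a standard-library sketch. It is a simplified stand-in, not PlanOut code: `uniform01` and `sketch_assign` are hypothetical names, and the key layout and scaling are assumptions. The point is that p depends only on the producer, while collapse depends on the story-viewer pair and on that producer's p.

```python
import hashlib

def uniform01(key):
    """Map a string key to a float between 0 and 1 via SHA1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest()[:15], 16) / 16.0 ** 15

def sketch_assign(producerid, storyid, viewerid):
    # One p per producer: the key depends only on producerid.
    p = uniform01("ContentExperiment.p.%s" % producerid)
    # One coin flip per (story, viewer) pair, biased by that producer's p.
    draw = uniform01("ContentExperiment.collapse.%s.%s" % (storyid, viewerid))
    return {"p": p, "collapse": 1 if draw <= p else 0}

for viewerid in range(5):
    print(sketch_assign(producerid=17, storyid=2, viewerid=viewerid))
```

Every row printed for a given producer carries the same p, while collapse varies from viewer to viewer.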

With a few different producers...¶

In [40]:

sim_impressions(producerid=10, storyid=2)

Out[40]:

	p	producerid	storyid	viewerid
0	0.04197	10	2	0
1	0.04197	10	2	1
2	0.04197	10	2	2
3	0.04197	10	2	3
4	0.04197	10	2	4

With a few different producers...¶

In [41]:

sim_impressions(producerid=17, storyid=2)

Out[41]:

	collapse	p	producerid	storyid	viewerid
0	0	0.155989	17	2	0
1	0	0.155989	17	2	1
2	0	0.155989	17	2	2
3	1	0.155989	17	2	3
4	0	0.155989	17	2	4

With different stories...¶

In [42]:

sim_impressions(producerid=17, storyid=4)

Out[42]:

	p	producerid	storyid	viewerid
0	0.155989	17	4	0
1	0.155989	17	4	1
2	0.155989	17	4	2
3	0.155989	17	4	3
4	0.155989	17	4	4

With different stories¶

In [43]:

sim_impressions(producerid=17, storyid=15)

Out[43]:

	collapse	p	producerid	storyid	viewerid
0	1	0.155989	17	15	0
1	1	0.155989	17	15	1
2	0	0.155989	17	15	2
3	0	0.155989	17	15	3
4	1	0.155989	17	15	4

Different units identify different effects: Like herding¶

Assigning p based on storyid lets you study like herding.

In [44]:

class ContentExperiment(SimpleExperiment):
  def assign(self, params, producerid, storyid, viewerid):
    params.p = RandomFloat(
        min=0,
        max=1,
        unit=storyid
    )
    params.expand = BernoulliTrial(
        p=params.p,
        unit=[storyid, viewerid]
    )

Different units identify different effects: Reciprocity¶

Assigning p based on [viewerid, producerid] pairs lets you study reciprocity

In [45]:

class ContentExperiment(SimpleExperiment):
  def assign(self, params, producerid, storyid, viewerid):
    params.p = RandomFloat(
        min=0,
        max=1,
        unit=[viewerid, producerid]
    )
    params.expand = BernoulliTrial(
        p=params.p,
        unit=[storyid, viewerid]
    )

In [46]:

print(ContentExperiment(producerid=15, viewerid=10, storyid=5).get_params())
print(ContentExperiment(producerid=10, viewerid=15, storyid=5).get_params())

{'p': 0.4080478355600889, 'expand': 0}
{'p': 0.2496919900519681, 'expand': 0}

Exercise: How can you encourage dyads in both directions?¶

Create an experiment called ContentExperiment2 where the same p is assigned to a pair of users regardless of which one is the viewer.
One possible solution should give both of the following calls the same p:

print(ContentExperiment2(producerid=15, viewerid=10, storyid=5).get_params())
print(ContentExperiment2(producerid=10, viewerid=15, storyid=5).get_params())

Exercise: Design the selective exposure experiment in PlanOut¶

Show 3 stories from a set of 10 stories (assume the stories are in a list).
Randomly assign each story a source cue: Fox News, MSNBC, or CNN.
Stories should be able to appear in any of the 3 positions, and source cues should be able to appear in any of the 3 positions.

	has_social_cues	has_source_cues	shuffled_urls
0	0	0	http://reuters.com/article0.html
1	0	0	http://reuters.com/article7.html
2	0	0	http://reuters.com/article2.html
3	0	0	http://reuters.com/article6.html
4	0	0	http://reuters.com/article4.html
5	0	0	http://reuters.com/article1.html
6	0	0	http://reuters.com/article5.html
7	0	0	http://reuters.com/article3.html
