Bayesian AB Testing from the book

Example 1: web site conversions

Assume the following stats:

visitors_to_A = 1300
visitors_to_B = 1275
conversions_from_A = 120
conversions_from_B = 125

We’ll model the probability of conversion for site A and for site B. Since we are modelling a probability, a good choice of prior distribution is the Beta.

Our number of visitors and conversion data are binomial: 1300 trials and 120 successes. Beta and binomial are conjugate so we have an analytical solution.

If the prior is Beta(a, b) and we observe N trials with X successes, then the posterior is Beta(a + X, b + N - X). For site A with a Beta(1, 1) prior: Beta(1 + 120, 1 + 1300 - 120) = Beta(121, 1181).
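As a quick numerical check of this update (a sketch using SciPy; Beta(1, 1) is the uniform prior):

```python
from scipy.stats import beta

# Beta(1, 1) (uniform) prior updated with 120 conversions out of 1300 visitors
posterior_A = beta(1 + 120, 1 + 1300 - 120)
print(posterior_A.mean())  # 121 / 1302, close to the raw rate 120 / 1300
```

With this much data the posterior mean barely differs from the raw conversion rate, which is what we expect from a weak uniform prior.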

Then we can simply sample from the Beta posteriors for both A and B and compare:

import numpy as np
from scipy.stats import beta

posterior_A = beta(1 + 120, 1 + 1300 - 120)
posterior_B = beta(1 + 125, 1 + 1275 - 125)
samples = 20000  # We want this to be large to get a better approximation.
samples_posterior_A = posterior_A.rvs(samples)
samples_posterior_B = posterior_B.rvs(samples)
print(np.sum(samples_posterior_A > samples_posterior_B) / len(samples_posterior_A))

Expected revenue: add a loss function

Assume:

$E[R] = 79p_{79} + 49p_{49} + 25p_{25} + 0p_{0}$

Where $p_{79}$ is the probability of selecting the $79 pricing plan.

The four p values must sum to 1.

There is a generalization of the binomial distribution called the multinomial distribution. Here we set a probability vector p to specify the probability that an individual falls into each specific bucket.

For our signup page, the observables follow a multinomial distribution, where we do not know the values of the probability vector p.

There is a generalization of the Beta distribution as well, called the Dirichlet distribution. It returns a vector of positive values that sum to 1; the length of this vector is determined by the length of the input vector of concentration parameters.

Luckily, we have a relationship between the Dirichlet and multinomial distributions similar to that between the Beta and the binomial.

The posterior is: $Dirichlet(1 + N_{1}, 1 + N_{2}, \dots, 1 + N_{m})$

import numpy as np
from numpy.random import dirichlet

N = 1000
N_79 = 10
N_49 = 46
N_25 = 80
N_0 = N - (N_79 + N_49 + N_25)
observations = np.array([N_79, N_49, N_25, N_0])
prior_parameters = np.array([1, 1, 1, 1])
posterior_samples = dirichlet(prior_parameters + observations,
                              size=10000)
print("Two random samples from the posterior:")
print(posterior_samples[0])
print(posterior_samples[1])
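To turn posterior samples into the expected revenue defined earlier, each draw of the probability vector can be dotted with the price vector. A minimal sketch reusing the same counts (the mean will vary slightly from run to run):

```python
import numpy as np
from numpy.random import dirichlet

# Same counts as above: N_0 = 1000 - (10 + 46 + 80) = 864
observations = np.array([10, 46, 80, 864])
posterior_samples = dirichlet(1 + observations, size=10000)

prices = np.array([79, 49, 25, 0])
# Each row is one posterior draw of (p_79, p_49, p_25, p_0);
# the dot product gives one draw of E[R] = 79*p_79 + 49*p_49 + 25*p_25
revenue_samples = posterior_samples @ prices
print(revenue_samples.mean())
```

This gives a full posterior distribution over expected revenue per visitor, not just a point estimate, so the same comparison trick used for the conversion rates works here too.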

Far beyond the t-test

The data are the lengths of time users spent on webpage A vs. webpage B, and we want to know on which page users spend more time.

We have five unknowns: the mean and standard deviation for each of the two distributions, plus a ν (degrees-of-freedom) parameter of the Student-t likelihood that models how likely we are to see outliers.
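Why ν matters: small degrees of freedom give the Student-t much heavier tails than a normal, so occasional extreme session lengths don't drag the estimates around. A quick illustration of just this tail behavior (not the full five-parameter model):

```python
from scipy.stats import t, norm

# Probability of landing more than 3 standard units out in the right tail
print(norm.sf(3))      # normal: rare
print(t.sf(3, df=2))   # Student-t with nu = 2: far more likely
print(t.sf(3, df=100)) # large nu: the t approaches the normal
```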

Empirical Bayes book (kindle)

Batting average, H / AB, is best modeled as a binomial, since it is a number of successes out of a total number of trials, and we can represent our prior expectations with a Beta distribution. **The Beta distribution represents a probability distribution of probabilities.**

Beta posterior

A nice feature of the Beta is that its expected value is simply alpha / (alpha + beta).

Empirical part

Estimate the Beta prior from historical data:


library(dplyr)
library(tidyr)
library(Lahman)

# Filter out pitchers
career <- Batting %>%
  filter(AB > 0) %>%
  anti_join(Pitching, by = "playerID") %>%
  group_by(playerID) %>%
  summarize(H = sum(H), AB = sum(AB)) %>%
  mutate(average = H / AB)

# Include names along with the player IDs
career <- People %>%
  tbl_df() %>%
  dplyr::select(playerID, nameFirst, nameLast) %>%
  unite(name, nameFirst, nameLast, sep = " ") %>%
  inner_join(career, by = "playerID") %>%
  dplyr::select(-playerID)

career


library(stats4)

career_filtered <- career %>%
  filter(AB > 500)

# log-likelihood function
ll <- function(alpha, beta) {
  x <- career_filtered$H
  total <- career_filtered$AB
  -sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log = TRUE))
}

# maximum likelihood estimation
m <- mle(ll, start = list(alpha = 1, beta = 10), method = "L-BFGS-B",
         lower = c(0.0001, .1))
ab <- coef(m)
alpha0 <- ab[1]
beta0 <- ab[2]

career_eb <- career %>%
  mutate(eb_estimate = (H + alpha0) / (AB + alpha0 + beta0))

There are two steps in empirical Bayes estimation:

  1. estimate the overall distribution of your data
  2. Use that distribution as your prior for estimating each average
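The two steps above can be sketched numerically; alpha0 and beta0 here are assumed illustrative values roughly on the scale of the book's MLE fit:

```python
# Step 1 (assumed done): illustrative prior parameters from historical data
alpha0, beta0 = 78.7, 224.9

def eb_estimate(H, AB):
    # Step 2: shrink each raw average H/AB toward the Beta prior mean
    return (H + alpha0) / (AB + alpha0 + beta0)

print(eb_estimate(4, 10))      # a .400 hitter over 10 at-bats is pulled near the prior mean
print(eb_estimate(300, 1000))  # a .300 hitter over 1000 at-bats barely moves
```

The less evidence a player has (small AB), the more the estimate is pulled toward the prior; with lots of at-bats the raw average dominates.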

“Empirical Bayes shrinkage towards a Beta prior.”
