Popular Product Data Analysis Interview Questions (Part 2): A/B Testing Challenge with Python

A Summary of Product Interview Questions for Data Scientist: Part 2

14 min read · Jun 14, 2021

Popular Product Data Analysis Interview Questions (Part 1): Product Case Interview Framework

A Summary of Product Interview Questions for Data Scientist: Part 1

bert272.medium.com

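Part 1 summarized several practical frameworks for solving product questions, along with solutions to some specific cases. Parts 2 and 3 focus on A/B testing. In Part 2, I start with a real A/B testing take-home challenge for data scientists to give a feel for the A/B testing workflow, and review some statistical assumptions and Python libraries along the way.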

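Part 1: Product Case Interview Framework
Part 2: A/B Testing Challenge with Python
1. Challenge Description
2. Load & Check Data
3. Check Randomization of Test
4. Q1: Should the company sell its software for $39 or $59?
5. Q2: What findings and recommendations do you draw from the data?
6. Q3: Can the experiment be stopped early?
Part 3: A/B Testing Interview Questions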

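A take-home challenge is an assignment that some companies send to candidates early in the interview process. The candidate gets a time limit (4 hours, 2 days, and 1 week are all common) to do EDA (exploratory data analysis), build a model, and write a report answering the questions. The format depends on the company: in my experience, a quant role may ask for a time-series model that predicts stock prices, while a social media company may ask you to design a news feed. In this post I use Python to work through an A/B testing take-home challenge; the one I picked is the Price Test problem set from the book described below.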

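The key points of an A/B testing challenge are basically:

Checking the validity of the experiment design
Testing the significance of the experiment (overall)
Testing the significance of the experiment (by segment)
Making recommendations: ¹adjustments to the experiment design ²actionable insights

The example in this post covers all of these points.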

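A Collection of Data Science Take-Home Challenges is a set of classic data scientist interview problems compiled by Giulio Palombo, a former Airbnb data scientist. It contains 20 take-home challenges, 40 product questions, and 6 SQL problem sets. If you are interested, see the Data Masked website: https://datamasked.com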

This post excerpts the main steps of the solution; the complete code is available on my GitHub:

bertlee272/AB_Testing_Pricing_Test

A A/B Testing Challenge on Software Pricing. Contribute to bertlee272/AB_Testing_Pricing_Test development by creating…

github.com

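1. Challenge Description

Company A sells software. The software has been priced at $39, but the company's revenue has not grown for a long time, so the VP wants to try raising the price to $59. He decides to run an A/B test: 66% of users still see $39 (control group), and 33% of users see $59 (test group). Given the conversion data collected after the experiment, answer the following:

Q1: Should the company sell its software for $39 or $59?

Q2: What findings and recommendations do you draw from the data?

Q3: Can the experiment be stopped early?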

2. Load & Check Data

When loading data, I have a few basic checks that I run out of habit:

(1) Looking at some samples

df_user_table.sample(10)

(2) Checking the data size

df_user_table.shape

(3) Checking for duplications

df_user_table[df_user_table.duplicated(subset=['user_id'], keep=False)]

No duplicated data here.

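(4) Looking at a basic statistical summary of the data

df_user_table.describe(include='all')

(5) Checking data types

# Count the Python type of the values in each column
# (same result as applymap + pd.value_counts, which newer pandas deprecates)
df_user_table.apply(lambda col: col.map(type).value_counts()).fillna(0)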

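3. Check Randomization of Test

In an A/B test, we need to make sure the control and test groups have nearly identical distributions of attributes such as age, gender, and pre-experiment usage, so that these factors do not distort the change in the metric we are evaluating.

If there is enough time, we can of course check each variable one by one: a t-test for numerical data and a chi-square test for categorical data. A faster approach is to train a tree model to distinguish control rows from test rows. If some variable helps the model tell them apart, that variable is unevenly distributed between the control and test groups.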

Chi-Square Test:

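Here is a quick refresher on the chi-square test. Let's first look at the distribution of operating system in the control and test groups:

Ideally, the control and test distributions should be very close, that is, each blue bar should be about the same height as its orange bar, but there appears to be some difference.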

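How do we compute the expected count? Take Linux users as an example. Linux users make up about 1.305% of all users; the control group has 202,517 users and the test group has 113,918. In theory, the control group should contain 2643.2 Linux users (1.305% * 202,517) and the test group 1486.6 (1.305% * 113,918).

How do we compute the standardized difference? Plug the control group example above into the formula Z = (observed - expected) / expected^(1/2): Z_linux_control = (2204 - 2643.2)/(2643.2)^(1/2) = -8.54

How do we compute the chi-square test statistic? Square every Z value and sum them. Squaring the -8.54 above gives the 72.98 that appears in the table below; adding up all the numbers in the two rows below gives our final test statistic.

Before looking up the p-value, we also need the degrees of freedom (df), which is (rows - 1) * (columns - 1) = (number of groups - 1) * (number of categories - 1) = (2 - 1)(6 - 1) = 5
With the test statistic and df, we can look up the p-value in a table or compute it with a package, and draw a conclusion. Here the chi-square statistic is as high as 299.8, which is highly significant: the OS distributions of the control and test groups differ substantially, so there is a problem with the experiment design.
Of course, a package can do all of this from start to finish: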

from scipy.stats import chi2_contingency

chi2_contingency([SER_A, SER_B])

Test statistic, P-Value, df, Expected Count
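To make the hand calculation above concrete, here is a minimal sketch that computes the expected counts, standardized differences, test statistic, and degrees of freedom with NumPy, then cross-checks them against chi2_contingency. The counts in `observed` are made-up numbers for illustration, not the challenge data, and all variable names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical OS counts (NOT the challenge data).
# Rows: control, test. Columns: four made-up OS categories.
observed = np.array([
    [500, 300, 120, 80],
    [260, 140, 70, 30],
])

# Expected count = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Standardized difference for every cell: (observed - expected) / sqrt(expected)
z = (observed - expected) / np.sqrt(expected)

# Chi-square statistic = sum of squared standardized differences
stat = (z ** 2).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)  # upper-tail probability

# Cross-check against the library routine
lib_stat, lib_p, lib_dof, lib_expected = chi2_contingency(observed, correction=False)
print(stat, p_value, dof)
```

The hand-computed statistic and the library result should agree, which is a useful sanity check before trusting either one.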

Decision Tree:

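A decision tree quickly reveals where the control and test groups are unevenly distributed. In the example above, Linux users are more likely to be assigned to the test group (60.8% > 39.2%), and Windows users are also somewhat more likely to appear in the test group (51.1% > 48.9%).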
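The idea can be sketched as follows, assuming scikit-learn is the tree library (the post does not say which one is used). The data is synthetic: Linux users are deliberately over-assigned to the test group so the tree has something to find. The column names `operative_system`, `lat`, and `test` and all probabilities are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 20000

# Synthetic users (NOT the challenge data)
users = pd.DataFrame({
    "operative_system": rng.choice(["windows", "mac", "linux"], size=n, p=[0.5, 0.3, 0.2]),
    "lat": rng.normal(37.0, 5.0, size=n),  # pure noise with respect to assignment
})

# Broken randomization on purpose: Linux users land in the test group far more often
p_test = np.where(users["operative_system"] == "linux", 0.70, 0.33)
users["test"] = rng.binomial(1, p_test)

# One-hot encode the categorical column; the label is the group assignment
X = pd.get_dummies(users[["operative_system", "lat"]], dtype=float)
tree = DecisionTreeClassifier(max_depth=2, class_weight="balanced", random_state=0)
tree.fit(X, users["test"])

# A feature with high importance is one that separates control from test,
# i.e. a feature that is unbalanced between the two groups
importances = pd.Series(tree.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

With proper randomization no feature should help predict the group, so the importances would be spread thinly over noise; here the Linux indicator dominates.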

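4. Q1: Should the company sell its software for $39 or $59?

For this question, I first run a t-test on revenue across all users to see the overall effect of price, and then run t-tests on revenue for different customer segments to look at the effect of price on each segment.

Here I briefly describe how to run the t-test on all users:

Step 1. State the null hypothesis and the alternative hypothesis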

H0: Rev_test-Rev_control = 0
HA: Rev_test-Rev_control > 0

Step 2. Choose α

Alpha = significance level = P(Type I Error)
Usually 5% (or 10%) is chosen => we accept a 5% chance of making a Type I error

Step 3. Compute the t statistic

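from scipy import stats

# Welch's t-test (unequal variances), ordered as in H0: Revenue_B (test) vs Revenue_A (control)
Test_Revenue = stats.ttest_ind(Revenue_B, Revenue_A, equal_var=False)

# ttest_ind returns a two-sided p-value; for the one-sided HA, divide it by 2
# (valid only when the t statistic is positive)
Test_Revenue.statistic, Test_Revenue.pvalue/2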

T statistic, P-Value

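p-value: under H0, the probability of obtaining a test statistic at least as large as the one observed in the sample

Step 4. Draw a conclusion

The p-value is very low, far below α: a difference this large would be very unlikely if H0 were true. We therefore reject H0, that is, revenue in the test group is significantly higher than in the control group.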
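The four steps can be reproduced end to end on synthetic data. The sketch below computes Welch's t statistic by hand and compares it with SciPy, then checks that halving the two-sided p-value matches the `alternative="greater"` option (available in SciPy 1.6 and later). The conversion rates and sample sizes are invented for illustration and are not the challenge data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user revenue: the price if the user converted, otherwise 0
revenue_a = 39 * rng.binomial(1, 0.020, size=20000)  # control, $39
revenue_b = 59 * rng.binomial(1, 0.016, size=10000)  # test, $59

# Welch's t statistic by hand: difference in means over its standard error
mean_diff = revenue_b.mean() - revenue_a.mean()
se = np.sqrt(revenue_b.var(ddof=1) / revenue_b.size
             + revenue_a.var(ddof=1) / revenue_a.size)
t_manual = mean_diff / se

# Same test with SciPy: two-sided, then one-sided (HA: test > control)
two_sided = stats.ttest_ind(revenue_b, revenue_a, equal_var=False)
one_sided = stats.ttest_ind(revenue_b, revenue_a, equal_var=False, alternative="greater")

# Halving the two-sided p-value is only correct when the statistic is positive;
# otherwise the one-sided p-value is 1 - p/2
if two_sided.statistic > 0:
    expected_one_sided = two_sided.pvalue / 2
else:
    expected_one_sided = 1 - two_sided.pvalue / 2

print(t_manual, two_sided.statistic, one_sided.pvalue)
```

Using `alternative="greater"` avoids having to remember the sign condition that the divide-by-2 shortcut depends on.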

In the notebook, I actually do not recommend pricing at $59 straight after the t-test. Instead, I look at how users of different operating systems respond, and make the following recommendation:

Back to the question: the best solution is certainly to run the A/B test again with better randomization (remember that we have significantly more Windows & Linux users in the test group). If that is not an option, I choose to believe $59 is better. Overall, revenue is higher at $59, and this result was obtained with more Windows & Linux users in the test group, who actually perform worse than average.

(In addition, the notebook walks through a detailed hand-computed t-test on lat (latitude), which you can refer to.)

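5. Q2: What findings and recommendations do you draw from the data?

Here the tree model comes in handy again. I use a decision tree to quickly see which factors affect conversion, look at each variable's importance, and then visualize the important variables to find out what is really going on.

Step 1. Build Decision Tree

We find that Source (Friend Referral, Ads Google) and OS (Mac, iOS) are important, so we look closely at these two variables.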

Step 2. Visualize Variables

Step 3. Make Suggestions

In both Control and Test, friend_referral contributes enormously to conversion ➡️ Promote friend referrals and give coupons to both the referring and the referred users.
Mac & iOS users have a very high conversion rate ➡️ Launch more campaigns targeting these Apple users.
Linux users' conversion rate is extremely low, and in the test group not a single user converted ➡️ There may well be bugs or an unfriendly design for Linux users.
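Segment-level conversion rates like the ones behind these suggestions can be tabulated with a pandas groupby before plotting. This is a minimal sketch on a tiny hand-made table; the column names `operative_system`, `test`, and `converted` are assumptions, and the rows are invented for illustration.

```python
import pandas as pd

# Tiny hand-made sample (NOT the challenge data)
df = pd.DataFrame({
    "operative_system": ["mac", "mac", "mac", "mac", "linux", "linux", "linux", "linux"],
    "test":             [0, 0, 1, 1, 0, 0, 1, 1],   # 0 = control, 1 = test
    "converted":        [1, 0, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column is the conversion rate; one row per OS, one column per group
rates = (
    df.groupby(["operative_system", "test"])["converted"]
      .mean()
      .unstack("test")
)
print(rates)
```

A table in this shape makes it easy to spot a segment, such as Linux in the test group here, whose conversion collapses under the new price.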

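6. Q3: Can the experiment be stopped early?

Sample size questions like this should already be worked out when designing the A/B test. If the metric we want to evaluate is conversion rate, we can get the required sample size directly from the following calculator: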

Sample Size Calculator

Need A/B sample sizes on your iPhone or iPad? Download A/B Buddy today. Question: How many subjects are needed for an…

www.evanmiller.org

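However, what we care about is the change in revenue, which is a numerical value rather than a 0/1 binomial outcome.

Step 1. Known quantities

Revenue, control group

Revenue, test group

Original mean revenue: 0.78
Original revenue standard deviation: 5.45
Test revenue standard deviation: 7.30 (can be estimated before the experiment)

Step 2. Set the parameters

Effect size: by how much do we want revenue to improve at minimum? Here I want at least a 10% improvement, that is, an increase of 0.78*0.1 = 0.078
Power: prob of detecting program effect = prob of rejecting H0 when HA is true = the green area in the figure above = the chance that the experiment confirms a 10% revenue increase from selling at $59, given that the increase is real. Usually set to 80%.
α: significance level = P(Type I Error) = the chance that the experiment says revenue is higher at $59 than at $39 when it actually is not. Usually set to 5% or 10%; this time I set it to 10%.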
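As a rough sketch of where these settings lead, the snippet below turns them into a required sample size using a one-sided normal approximation. The 2:1 control-to-test allocation is taken from the 66%/33% split in the challenge description; the variable names and the normal approximation itself are assumptions made for illustration, not necessarily the calculation used in the notebook.

```python
import math
from scipy.stats import norm

# Known quantities and chosen settings from the steps above
mean_control = 0.78
sd_control = 5.45
sd_test = 7.30
effect = mean_control * 0.10   # minimum revenue lift we want to detect (0.078)
alpha = 0.10                   # one-sided significance level
power = 0.80
ratio = 2.0                    # assumed control users per test user (66% vs 33%)

# Normal quantiles for the significance level and the power
z_alpha = norm.ppf(1 - alpha)
z_power = norm.ppf(power)

# The critical value c must satisfy both
#   c = z_alpha * SE            (reject H0 above c when H0 is true)
#   c = effect - z_power * SE   (detect the effect with the chosen power)
se_required = effect / (z_alpha + z_power)
critical_value = z_alpha * se_required

# SE^2 = sd_control^2 / n_control + sd_test^2 / n_test, with n_control = ratio * n_test
n_test = math.ceil((sd_control ** 2 / ratio + sd_test ** 2) / se_required ** 2)
n_control = math.ceil(ratio * n_test)

print(se_required, critical_value, n_test, n_control)
```

Comparing `n_test` and `n_control` with the group sizes actually collected indicates whether the experiment ran longer than these settings required.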

Step 3. Compute the critical value (c) & standard error (SE)

Solve the following simultaneous equations: