How To Find Sample Size In R

AB Sample Size Calculation in R

Some useful tools in R for calculating sample size and experiment duration

Introduction

Performing ad-hoc analysis for stakeholders can be time consuming. Furthermore, there are a few questions that I get asked on a fairly frequent basis, so I have been spending some time developing tools in R for my "non-technical" colleagues to use.

One of the most commonly asked questions is "How large a sample do I need to achieve significance?", which is often followed by "How long do I need to run my experiment for?". For this reason, I have developed some simple code for people to use when they need to answer these questions. All the user needs to do is pass some baseline numbers into a couple of functions I have created, and they can determine their sample size requirements and experiment duration on an ad-hoc basis.

Sample size, statistical power and experiment duration

Luckily, by knowing a few simple pieces of information, we can use the pwr package in R to answer these two questions with a fair amount of ease. pwr helps you perform power analysis before conducting an experiment, which enables you to determine how big your sample size should be per experimental condition.

The four quantities involved in a power analysis are closely related, and we can compute any one of them if we know the remaining three:

1. sample size (n)

2. effect size

3. significance level (alpha) = P(Type I error) = probability of finding an effect that is not there

4. power = 1 - P(Type II error) = probability of finding an effect that is there

As your significance level (3) and power (4) are typically fixed values, as long as you can input the effect size (2) implied by your control and variant, you can determine your required sample size (1).
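For two groups of n users each, the relationship between the four quantities can be written down explicitly. The sketch below uses the normal approximation; as far as I can tell it matches the arcsine effect size that ES.h() computes and the two-sided power function that pwr.2p.test() solves. Here p_1 and p_2 are the control and variant conversion rates, Φ is the standard normal CDF and 1 - β is the power:

```latex
h = 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_2}

1 - \beta = \Phi\!\left(|h|\sqrt{n/2} - z_{1-\alpha/2}\right)
          + \Phi\!\left(-|h|\sqrt{n/2} - z_{1-\alpha/2}\right)

n \approx 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{h}\right)^2
  \approx \frac{15.7}{h^2} \qquad (\alpha = 0.05,\ 1 - \beta = 0.8)
```

Dropping the second, negligible term of the power function gives the closed form for n. With alpha and power fixed, n grows with 1/h², so halving the effect you want to detect roughly quadruples the sample you need.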

Thankfully, the ES.h() function in the pwr package computes the effect size (Cohen's h) for us to pass into the power analysis. We will typically know the current conversion rate or performance of our control condition, but the performance of the variant is, almost by definition, unknown. However, we can calculate an expected effect size given a desired uplift. Once the effect size is computed, it is passed into the pwr.2p.test() function, which will compute our sample size per group, provided n is left as NULL. To make this sort of analysis user friendly, I have wrapped both of these functions into a new function called sample_size_calculator().

Furthermore, as we will use this information to calculate the number of days needed to run the experiment, I have also created a days_calculator() function, which uses the output from our sample size calculation:

    library(pwr)  # provides ES.h() and pwr.2p.test()

    sample_size_calculator <- function(control, uplift){
      # Expected variant conversion rate, given a relative uplift
      variant <- (uplift + 1) * control
      # Both rates must be valid proportions for ES.h()
      if(variant < 0 || variant > 1)
      {return("N/A")}
      # Cohen's h effect size between control and variant
      baseline <- ES.h(control, variant)
      # n = NULL tells pwr.2p.test() to solve for the per-group sample size
      sample_size_output <- pwr.2p.test(h = baseline,
                                        n = NULL,
                                        sig.level = 0.05,
                                        power = 0.8)
      return(sample_size_output)
    }

    days_calculator <- function(sample_size_output, average_daily_traffic){
      # Two groups, so the total sample is twice the per-group n
      days_required <- (sample_size_output * 2) / average_daily_traffic
      if(days_required >= 0)
      {paste("It will take this many days to reach significance with your current traffic:",
             ceiling(days_required))}  # round up so the sample is not cut short
      else
      {paste("N/A")}
    }
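If you want to sanity-check sample_size_calculator() without the pwr package, a minimal base-R sketch follows. cohens_h() and n_per_group_h() are hypothetical helpers written for this illustration: the first mirrors the arcsine formula behind ES.h(), and the second uses the closed-form normal approximation, so it should agree closely with pwr.2p.test() rather than exactly. The result is also compared with stats::power.prop.test(), which uses a different (non-arcsine) approximation.

```r
# Cohen's h, using the same arcsine formula as pwr's ES.h()
cohens_h <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

# Normal-approximation sample size per group for a two-sided, two-sample test
# on Cohen's h (the negligible second tail of the two-sided test is ignored)
n_per_group_h <- function(h, sig.level = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - sig.level / 2)
  z_beta  <- qnorm(power)
  2 * ((z_alpha + z_beta) / abs(h))^2
}

control <- 0.034567
uplift  <- 0.01                     # relative uplift, as in sample_size_calculator()
variant <- (uplift + 1) * control

n_h <- n_per_group_h(cohens_h(control, variant))

# Base R's own two-proportion calculation, for comparison
n_ppt <- power.prop.test(p1 = control, p2 = variant,
                         sig.level = 0.05, power = 0.8)$n

round(c(cohens_h = n_h, power_prop_test = n_ppt))
```

For these inputs both estimates land at roughly 4.4 million users per group, which is a useful check that the per-variant n from the pwr-based function is in the right range.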

If you are using this tool, you simply specify your control conversion rate and desired uplift. The uplift is relative, so 0.01 means the variant is expected to convert 1% better than the control (about 3.4913% rather than 3.4567%):

    control <- 0.034567
    uplift <- 0.01

Then run the sample_size_calculator() function, storing its output so that the required sample size, n, can be extracted:

    sample_size_output <- sample_size_calculator(control, uplift)
    sample_size_output <- sample_size_output$n
    sample_size_output

You will then get the required sample size for these values (remember that this sample size requirement is per variant). For a 3.4567% control rate and a 1% relative uplift it comes out at roughly 4.4 million users per variant: the smaller the uplift you want to detect, the larger the sample you need.

Now that we have this information, we can determine how long the experiment needs to run for. All you will need to input is your average daily traffic:

    average_daily_traffic <- 42000

Run the days_calculator() function:

    days_calculator(sample_size_output, average_daily_traffic)

And you will get the following output:

    [1] "It will take this many days to reach significance with your current traffic: 210"
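days_calculator() assumes exactly two groups. A hypothetical generalisation, days_needed(), is sketched below under the assumption that traffic is split evenly across all groups; it takes the number of groups as a parameter and rounds up so that the sample is never cut short.

```r
# Days needed to collect n_per_group users in every group, assuming the
# average daily traffic is split evenly across all groups (hypothetical helper)
days_needed <- function(n_per_group, average_daily_traffic, groups = 2) {
  if (n_per_group < 0 || average_daily_traffic <= 0) {
    return(NA_real_)
  }
  # Round up: stopping on a fractional day would leave the sample short
  ceiling(n_per_group * groups / average_daily_traffic)
}

days_needed(100000, 42000)              # control + one variant
days_needed(100000, 42000, groups = 3)  # control + two variants
```

With 100,000 users needed per group and 42,000 visitors a day, the two-group design needs 5 days and the three-group design 8, since each extra group takes its own share of the same traffic.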

Although this code is only relevant if you are conducting an experiment with an AB design (i.e. with just two experimental conditions), the functions presented can be amended to calculate the required sample size for multiple experimental conditions by using the pwr.anova.test() function within sample_size_calculator() in place of pwr.2p.test(). Note that pwr.anova.test() takes Cohen's f as its effect size rather than h, so the effect size calculation would need to change as well.
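A different route for conversion rates, and not the approach described above, is to compare each variant with the control using a two-proportion test at a Bonferroni-adjusted significance level. The base-R sketch below assumes that design; multi_variant_n() is a hypothetical helper, and the 5% relative uplift is an illustrative assumption.

```r
# Per-group sample size when each of n_variants variants is compared with the
# control, splitting the overall alpha across the comparisons (Bonferroni).
# multi_variant_n() is a hypothetical helper, not part of pwr.
multi_variant_n <- function(control, uplift, n_variants,
                            sig.level = 0.05, power = 0.8) {
  variant <- (uplift + 1) * control
  adjusted_alpha <- sig.level / n_variants
  power.prop.test(p1 = control, p2 = variant,
                  sig.level = adjusted_alpha, power = power)$n
}

n_two_arm  <- multi_variant_n(0.034567, 0.05, n_variants = 1)  # AB test
n_four_arm <- multi_variant_n(0.034567, 0.05, n_variants = 3)  # control + 3 variants

# Total users = per-group n times the number of groups (variants + control)
total_four_arm <- n_four_arm * (3 + 1)
c(n_two_arm = n_two_arm, n_four_arm = n_four_arm, total_four_arm = total_four_arm)
```

The per-group n rises as alpha is split across more comparisons, and the total sample grows further because there are more groups to fill.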

Conclusion

Power analysis is an essential aspect of any experimental design. It allows analysts to determine the sample size required to detect a statistically significant effect of a given size with a given degree of confidence. Conversely, it also allows them to determine the probability of detecting an effect of a given size with a given level of confidence under sample size constraints. If that probability is low, it could be advisable to alter the design of your experiment or to revisit some of the values that are input into your power analysis.
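The "sample size constraint" direction can be sketched by supplying n and leaving power unset, so that the power is solved for instead. The sketch uses base R's power.prop.test(); in pwr, pwr.2p.test() with h and n given and power left as NULL works the same way. The 5% relative uplift and the two-week traffic budget are illustrative assumptions.

```r
control <- 0.034567
variant <- 1.05 * control            # assumed 5% relative uplift
max_n_per_group <- 42000 * 14 / 2    # assumed budget: two weeks of traffic, split 50/50

# power is left as NULL (its default), so power.prop.test() solves for it
achieved <- power.prop.test(n = max_n_per_group, p1 = control, p2 = variant,
                            sig.level = 0.05)$power

# The same effect with a much smaller sample, for contrast
small_sample <- power.prop.test(n = 10000, p1 = control, p2 = variant,
                                sig.level = 0.05)$power

c(achieved = achieved, small_sample = small_sample)
```

If the achieved power falls well short of the 0.8 target, that is the signal to rethink the design or the uplift you expect to detect.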

Used in conjunction with one another, calculations of your required sample size and experiment duration provide very useful information to stakeholders. This information can help them plan their experimentation road maps efficiently. Furthermore, these predetermined numbers can help in determining the feasibility of certain experiments, or whether the desired uplifts are too idealistic.
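To make the feasibility point concrete, the rough base-R sketch below tabulates the per-variant sample size and the days needed across a few hypothetical target uplifts, for the example control rate and traffic. It uses stats::power.prop.test() rather than pwr and assumes an even 50/50 traffic split.

```r
control <- 0.034567
average_daily_traffic <- 42000
uplifts <- c(0.01, 0.02, 0.05, 0.10)   # hypothetical relative uplifts to compare

# Per-variant sample size for each target uplift (alpha = 0.05, power = 0.8)
n_per_group <- sapply(uplifts, function(u) {
  power.prop.test(p1 = control, p2 = (1 + u) * control,
                  sig.level = 0.05, power = 0.8)$n
})

feasibility <- data.frame(
  uplift      = uplifts,
  n_per_group = ceiling(n_per_group),
  # two groups share the daily traffic; round up to whole days
  days        = ceiling(2 * n_per_group / average_daily_traffic)
)
feasibility
```

A table like this makes it easy to show stakeholders that a 1% relative uplift needs months of traffic at these rates, while a 10% uplift can be tested in days.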