Statzilla Science
How to calculate a sample size
Sooner or later, every medical researcher faces the task of statistical data processing for a scientific article or thesis. And most often the first question a researcher asks when the need for medical statistics arises is "How many patients should I recruit?"
Why is this so important?
In statistics, there is a minimum sample size below which a study is unlikely to detect an effect that really exists. With enough participants, if there is "something" in the data, it will show up. What does this mean in practice? If you don't have enough people in the groups, you risk a type II error, that is, a false negative result: the surgery looks ineffective, the new drug does not seem to work, etc., although in fact it does, and there were just not enough people in the study.
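To see the false negative problem in numbers, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true difference between groups (half a standard deviation), the normally distributed outcome, the simple z-test with a known standard deviation, and the function name `miss_rate`. It repeats the same imaginary study many times and counts how often a real effect goes undetected.

```python
import random
from statistics import NormalDist, mean

def miss_rate(n_per_group, true_diff=0.5, sd=1.0, alpha=0.05, trials=2000, seed=1):
    """Share of simulated studies that fail to detect a real difference (type II errors)."""
    rng = random.Random(seed)                      # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96
    se = sd * (2 / n_per_group) ** 0.5             # standard error of the difference in means
    misses = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) < z_crit:                        # the real effect was not detected
            misses += 1
    return misses / trials

small = miss_rate(10)    # 10 patients per group
large = miss_rate(100)   # 100 patients per group
print(f"missed effects with 10 per group:  {small:.0%}")
print(f"missed effects with 100 per group: {large:.0%}")
```

Under these assumptions, the small study misses the effect most of the time, while the large one misses it only occasionally, although the drug "works" equally well in both.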

In fact, I'll tell you a terrible secret: this question should not be the first one. If you want to avoid mistakes and tricky questions from the dissertation board or, what is more, if you want your work to be published in a Scopus-indexed journal, in Nature or, gosh, in The Lancet, then you need to start preparing for statistical data processing much earlier, when you are just thinking about the topic of your research.

So what is the algorithm for writing a research paper?
It's simple: you start with the topic of the study, that is, the research question that you will answer in your work (we will analyze typical research questions below).

Depending on the research question, the design of the study (in other words, the method of collecting the sample) is chosen. The design of the study determines which methods of statistical data processing are appropriate. Finally, the sample size depends on the methods selected.

Research question → research design → statistical processing methods → sample size

Let's take a closer look at the most common cases below, and we hope you will recall some of them from your experience.

Typical research questions in the field of medicine and related research designs are the following:

1. Evaluation of treatment tactics: which drug/therapy/intervention is better → randomized controlled trial (RCT)
Patients (people who are already ill) are randomly allocated to groups: those who receive the therapy under study and those who receive a placebo or standard therapy. Then we compare what the two treatments give us. It is important that the data are collected after you have decided to conduct the study (instead of taking a file that has been accumulating in your university department for decades). Yes, it's expensive. Yes, it takes a lot of time. I'm not the one who decided to become a doctor. Anyway, if an RCT is your case, then we recommend studying CONSORT, the reporting format for RCT publications in scientific journals.

METHODS: Comparison of mean values in groups. Survival analysis (if the result of therapy is some kind of end point, outcome).

2. Prognosis: does the use of alcohol lead to liver diseases → longitudinal cohort study
As in an RCT, there are two or more groups of people receiving different treatments or exposed to different factors: smokers and non-smokers, etc. In contrast to an RCT, they may not be sick yet (or may be at an early stage of the disease). Moreover, over the course of your observation they might never develop the disease you are studying. Again, we collect data after we decide to conduct the study, but in return we can draw conclusions about the causality of events.

METHODS: Comparison of frequencies in groups. Odds/risk ratios. Survival analysis. Regression analysis (predictive models). Correlation analysis (here the point is causality). Individual/personalized risk models.
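For readers who have not met odds/risk ratios yet, here is a minimal sketch of how both are computed from a 2×2 table. The counts are hypothetical and the function names are mine, introduced only for illustration.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a: exposed, disease      b: exposed, no disease
    c: unexposed, disease    d: unexposed, no disease
    """
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)

# Hypothetical cohort: 100 smokers (30 fell ill) and 100 non-smokers (10 fell ill)
rr = risk_ratio(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
print(f"risk ratio: {rr:.2f}, odds ratio: {odds:.2f}")
```

With these made-up counts the risk is 30% versus 10%, so the risk ratio is about 3, while the odds ratio comes out somewhat higher (about 3.86). The two measures are close only when the disease is rare.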

3. Development of preventive methods/etiology: which working conditions at the factory are risk factors for CVD? → cohort study or case-control study
In a case-control study, patients with the studied disease (cases) are compared with a control group (healthy people, people with another disease, or people with a mild form of the disease) to find out what led them to their current condition.

Hallelujah! This study design lets you use the treasure trove of data already collected at your university or by your supervisor: the exposure factors are evaluated retrospectively (from medical records)!

METHODS: Comparison of frequencies and mean values in groups. Odds/risk ratios. Regression analysis (a description, not a prognosis). Correlation analysis (here the point is not causality but the "association" of risk factors with the disease). Individual/personalized risk models.

4. Validating a new diagnostic test: can a venous-phase-only CT scan be used for diagnosis instead of a 4-phase CT? → cross-sectional study
For each patient, data are collected on the results of the new diagnostic test, the results of the gold standard and the true condition. Again, this can be done retrospectively.

METHODS: ROC analysis, sensitivity-specificity analysis.
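As a quick illustration of sensitivity-specificity analysis, here is a minimal sketch. The counts are hypothetical and the function name is an assumption made for this example; "true condition" stands for whatever reference you trust.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) of a test against the true condition.

    tp: sick patients the test flagged       fn: sick patients the test missed
    tn: healthy patients the test cleared    fp: healthy patients the test flagged
    """
    sensitivity = tp / (tp + fn)   # share of sick patients correctly detected
    specificity = tn / (tn + fp)   # share of healthy patients correctly cleared
    return sensitivity, specificity

# Hypothetical results of the new test in 50 sick and 100 healthy patients
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=90, fp=10)
print(f"sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

Computing the same pair for the standard test on the same patients gives you the two numbers whose difference you will later have to judge.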

Finally, we calculate the sample size: the number of people who need to be recruited for the study to have a good chance of getting a statistically significant result if the effect is real.
We begin from the end of the study. Imagine the perfect moment when you receive the applause of a grateful audience in honor of your grand discovery... Now answer the question: what is the minimum difference between the groups (or between the results of different diagnostic tests) that can be considered clinically significant? That is, we take our target value (the difference in blood pressure after treatment with drug A and drug B; the incidence of liver disease in smokers and non-smokers; the noise level at work in patients with and without CVD; the proportion of patients correctly diagnosed by the new test and by the standard test) and determine what result we "would like" to see in the work.

A study can show that drug B leads to a statistically significant decrease in cholesterol levels of 0.01 mmol/L. But is such a decrease clinically significant? Should we abandon a time-tested drug because of it? Or, on the contrary, if a venous-phase-only CT scan detects lymphoma just 0.5% less often than a 4-phase CT scan, should we continue to "torture" patients?

You may ask: what about the methods? Surely I can decide what difference I want to see without complicated statistical terms like comparison of frequencies and correlations. Here is the thing: the calculation of the sample size is closely tied to another statistical indicator, the power of the statistical method that will be used for the analysis. Power is the probability that the method detects a difference if one really exists; in other words, it is one minus the probability of a type II error. The higher the power you ask for, the larger the sample you need (and, conversely, accepting lower power lets you recruit fewer patients at the cost of a higher risk of missing a real effect). In medical research, the power is usually set in the range from 80% to 90%.
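To show how the clinically significant difference and the power turn into a number of patients, here is a minimal sketch for the simplest case: comparing mean values in two equal groups. It uses the textbook normal-approximation formula n = 2·(z₁₋α/₂ + z_power)²·σ²/Δ² per group. This is an assumption chosen for illustration, not necessarily the formula any particular calculator uses, and it does not apply to other criteria as is. The function name, the 5 mmHg difference and the 10 mmHg standard deviation are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(min_diff, sd, power=0.80, alpha=0.05):
    """Patients per group needed to detect a difference in means of `min_diff`.

    Normal approximation for two equal groups with a common standard
    deviation `sd` and a two-sided significance level `alpha`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) * sd / min_diff) ** 2)

# Hypothetical example: we care about a 5 mmHg difference, and the SD is 10 mmHg
print(n_per_group_means(min_diff=5, sd=10, power=0.80))  # 63
print(n_per_group_means(min_diff=5, sd=10, power=0.90))  # 85
```

Note how raising the power from 80% to 90% adds about a third more patients, and how halving the difference you want to detect would roughly quadruple the sample, because the difference enters the formula squared.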

Each statistical criterion (that is, data processing method) has its own formula for calculating the power, which can be transformed into a formula for calculating the sample size. To make your life a little easier, there are special calculators, so you don't have to learn these formulas by heart or calculate them manually. For example, in the calculator at statzilla.fvds.ru you don't have to worry about statistical methods: you just select your research question type, and the service does everything for you, including the text generation.

So, all you need to do in order to calculate the number of patients for the study:
1. Depending on the research question — what we want to prove to humanity — we determine the appropriate design of the study.

2. We select methods of data processing (for example, comparison of mean values in groups) and, within them, suitable statistical criteria (for example, the Mann-Whitney U test). If you would like a free checklist on how to select statistical criteria, please e-mail me at mail@statzilla.ru.

3. We define the minimum difference in groups/diagnostic test results which we "would like" to observe and which we consider clinically significant.

4. We choose the power level from 80% to 90% (you can experiment here).

5. Then all you need is to take advantage of the power/sample size calculators for your statistical criterion, like this one (statzilla.fvds.ru).
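To make the five steps concrete, here is a minimal sketch of what such a calculator might do for a comparison of frequencies in two groups. It uses a commonly quoted normal-approximation formula for two proportions without continuity correction. The function name and the 10% and 20% incidence figures are hypothetical, and a real calculator may use a different formula and return slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_proportions(p1, p2, power=0.80, alpha=0.05):
    """Patients per group needed to detect a difference between frequencies p1 and p2.

    Normal approximation for two equal groups, two-sided test,
    no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                               # pooled frequency under "no difference"
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Steps 3 and 4 with hypothetical numbers: incidence 10% vs 20%, power 80% or 90%
print(n_per_group_proportions(0.10, 0.20, power=0.80))  # 199
print(n_per_group_proportions(0.10, 0.20, power=0.90))  # 266
```

Trying both power levels, as in the last two lines, is an easy way to see how many extra patients a lower risk of a false negative result will cost you.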

In the end, do not forget to include in the article or thesis a short text describing how the sample size was calculated (which immediately distinguishes your work from others ;) :

For an 80% (or 90%) probability of detecting a difference in the value of "INDICATOR NAME" between the studied groups using the Mann-Whitney U test, it is necessary to recruit X patients in each group.