Why Medical Students Need To Understand Statistics: An Interview With a Patho-Geneticist

Last Updated on June 26, 2022 by Laura Turner

Dr. Mary Jean M. has spent the last decade doing research in a variety of laboratories, spanning plant sciences, developmental biology, immunology, cancer biology, and parasitology. She is a patho-geneticist and loves every facet of infectious disease research and biostatistics, from understanding the population dynamics of life to the intricacies of the microscopic world.
Where were you born and raised?
I grew up on a farm in rural Montana and love spending as much time as I can in the great outdoors.
Where and what did you study?
I received my BS in Biotechnology from Montana State University, a Master of Public Health (MPH) in Epidemiology from the University of Minnesota College of Public Health, and a PhD from the University of Minnesota College of Veterinary Medicine.
What areas have you worked in thus far in your career?
I have worked in a variety of research areas, such as developmental biology, cereal quality, immunology, and cancer biology, before I found my true passion in infectious disease genetics.
Why did you decide to choose your particular career path?
I think I’m like most people that pursue graduate degrees: I enjoy finding answers and I want to make the world a better place. Science and math “came easy” to me, but when choosing a profession it had to be something I could enjoy every day. Looking back, many of my hopes were far-fetched, but I’m not alone in the graduate community in the pursuit to make a difference. I worked in many different types of bioscience laboratories until I found that I could geek out on infectious disease research 24/7. I find it fascinating that a single-celled organism can optimize and often overtake a complex organism in order to replicate.
What is statistics?
Statistics is the use of equations to make a generalization or put objects in an order to try to answer a question. For example, if you were an avid gambler and wanted to know what numbers to bet on based on chance alone, you could use an equation to help you decide. What car should you buy if you want the best gas mileage and longevity? Check the auto manufacturers’ stats.
What is the difference between statistics and biostatistics?
Bio. It changes the playing field. Instead of working with inanimate objects, a human, animal, or plant is involved. Survival analysis looks at the rate at which something lives or dies. That is a big ethical and emotionally charged difference. A mean and confidence interval for a biological event indicates people being helped or hurt. When dealing with living beings there are increased limitations on what can be extrapolated from experiments. The experiments often cannot be identically replicated or are working with a limited sample population.
What do you love about biostatistics?
I love knowing I can find out the most information using the smallest number of subjects. It is important to have the proper sample sizes when testing a hypothesis. Too small and you aren’t able to be confident in your results. Too big and not only have you wasted valuable resources, but you are putting people/animals at unnecessary risk.
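To make the sample-size trade-off concrete, here is a minimal sketch of one common textbook approximation for comparing the means of two groups. The function name, the 5% significance level, the 80% power default, and the example numbers are illustrative assumptions, not values from the interview.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference `delta`
    when both groups share standard deviation `sigma` (two-sided z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # critical value for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)                            # round up: you cannot enroll part of a subject


# Hypothetical numbers: halving the difference you hope to detect
# roughly quadruples the number of subjects required.
print(sample_size_per_group(delta=5, sigma=10))
print(sample_size_per_group(delta=2.5, sigma=10))
```

A real study would usually lean on a statistician and dedicated power-analysis software, but the shape of the formula shows why small effects demand large studies.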
How does having a strong statistical background affect your day to day life?
I enjoy getting to read and interpret information for myself. When news hits that some new treatment is the biggest, best thing to hit the streets, as a medical professional your opinion counts. I go back to the primary research article(s) and read. I’m not reanalyzing the data, but looking to see what the results show and what population dynamics are in play. I trust my knowledge base over that of a journalist making a headline.
How often are statistics and biostatistics utilized in the medical field?
Population standards (means and deviations) are used every day.
What are some examples of these being used?
How do you know if someone’s blood pressure is too high, who decided what a fever temperature was, or if a white blood cell count is out of whack? Statistics. Additionally, finding trends in populations is vital to our current understanding of disease/injury and treatments. Examples include determining carcinogens, such as cigarettes or asbestos, finding the source of an outbreak, or determining the best treatment for an ACL tear. Statistics has given us the background to stand upon when determining optimums of care.
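As a rough illustration of how a population standard might be derived, the sketch below builds a reference range as the mean plus or minus 1.96 standard deviations. The readings are made-up numbers, and the function names and the 1.96 multiplier are assumptions chosen for the example; real clinical cutoffs come from much larger studies and clinical judgment.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressure readings (mmHg) from healthy adults.
healthy_readings = [112, 118, 121, 109, 125, 117, 130, 114, 120, 123]


def reference_range(values, z=1.96):
    """Return (low, high) covering roughly the central 95% of a normal population."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return m - z * s, m + z * s


low, high = reference_range(healthy_readings)


def is_out_of_range(reading):
    """Flag a reading that falls outside the reference range."""
    return reading < low or reading > high


print(f"reference range: {low:.1f} to {high:.1f} mmHg")
print("150 flagged:", is_out_of_range(150))
print("120 flagged:", is_out_of_range(120))
```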
In your experience, do medical students struggle with statistics?
I think most medical and veterinary students struggle with statistics. Heck, anyone that’s not a math major can feel a little uneasy when they find out a statistics class is required.
What is a common struggle that medical students have with statistics?
It can be difficult to find the relevance in a statistical example. Finding out how many times you will get heads versus tails on a coin toss might seem irrelevant when envisioning helping people that are bleeding, screaming, or dying. But the reason you don’t just learn anatomy out of a textbook is because of the known variability between humans. We use averages and assumptions all the time, but until you look at a cadaver and realize humans can have an extra vertebra, the variability found within each individual may not sink in.
What do you first say to medical students struggling with statistics?
As a medical professional, you are or will become a human encyclopedia. What is the name of the little groove under your nose? I bet you just rattled it off. You are intelligent. You wouldn’t have made it this far if you couldn’t read and remember information.
What is important advice for those students who are struggling?
Statistics can be confusing because the answer is not cut and dried, like whooping cough is caused by Bordetella pertussis. In statistics you do not memorize and regurgitate equations. It is the concept behind each equation, and when you would use it, that is important to remember. Be comfortable with the purpose of the equation before trying to enter any numerical values. Determine what question is being asked and then choose the equation designed to give the desired result with the known information.
Why should pre-medical or medical students pay attention in their statistics classes?
Medical professionals take an oath to do no harm. Being able to read and interpret relevant advances in research within your field is vital to ensure that the best choices in treatment options for the individual are made. A mentor once told me that a professional degree (like an MD, MPH, RN, etc.) is looking at things an inch deep but a mile wide, whereas a graduate degree (like an MS or PhD) is looking an inch wide but a mile deep. Research is done in a reductionist manner. In order to test a hypothesis all known variables are removed or controlled for so the effect of the treatment of interest can be determined. That’s the mile deep part, but an MD needs to look at it a mile wide. A medical professional needs to understand the limitations and assumptions with treatments and not over-extrapolate the outcomes past the conclusions of the experimental approach. Without being able to read and interpret the statistics of the procedures being tested, ethical choices could not be made on standards of care.
What are the two or three most important statistical concepts for a medical student or doctor to understand? 
It’s statistically significant! Woohoo! But what does that mean? Statistical significance indicates how unlikely it is that a difference between two groups or samples arose by chance alone; by itself it does not say how big that difference is. Often significance is shown using p-values or differences in confidence intervals. Understanding what a p-value indicates and what to look for in confidence intervals gives insight into how much weight a result deserves. Having immediate recall of what parametric versus non-parametric data indicates and, when a data set is parametric, understanding standard deviation: these are concepts used over and over again in biomedical research. They should be on auto-recall as quickly as the names of the different white blood cells.
Can you explain parametric versus non-parametric data for students? 
Whenever a dataset is constructed, the first step is to plot out the data to see what it looks like. When someone comes in with a sore throat or ear ache, the doctor looks in the ear or at the throat for the obvious, right? Datasets are simply plotted on an x/y graph to see what shape they take. If a dataset has a shape (most often referred to as a curve) and the data has parameters, then it is considered parametric; the most familiar such shape is the normal distribution, or bell curve. The height of an adult human has parameters; there isn’t a living adult 1 inch tall, nor is there one 20 feet tall. Blood pressure has parameters; too low or too high and you are not alive. The range of motion of a hinge joint has parameters; move it too far and now a surgeon is needed. All these measurements vary within evident biological parameters on a numerical scale. Thus parametric equations can be used. If the data is considered ordinal, then use non-parametric equations. Ordinal, you say? Ordinal means items in a series. Think of ordinal data as ‘bucket data.’ It’s always in a bucket with no grey area in between. Examples would be data from questions such as: (1) is the patient better, the same, or worse, or (2) are you a former, current or never smoker? This data would be best as a bar graph of the number of people that answered the question with each answer, not a continuous shape. Analysis of parametric data may seem more intuitive, but there is nothing wrong with non-parametric data. Parametric and non-parametric methods use different assumptions, which means different equations, to answer the question of whether the treatment, method, carcinogen, or whatever is being tested is better or worse than a control.
When the data is parametric, one of the most common things calculated is the standard deviation (SD). Imagine someone is lying prone on a table: if someone measured equally on both sides of the intergluteal cleft so that 68% of the mass of the buttocks would be represented, the first SD from the mean (i.e. center point or intergluteal cleft) has just been found. Measure over a little farther so that 95% of the mass of the buttocks is included and the second SD has been found… When you have 99.7%, it’s the third SD! Isn’t statistics fun? In reality the SD is a single number, represented as sigma, calculated so that for a normal curve the mean plus or minus 1 sigma covers about 68% of the data, plus or minus 2 sigma about 95%, and plus or minus 3 sigma about 99.7%. Imagine the different shapes of buttocks as a human lies prone: some are wider, some are taller, a child has less mass than an adult, etc. These are a visual example of different ‘curves’ with the mean always in the middle, always at the intergluteal cleft. The distance from the intergluteal cleft to the first SD (68% of the buttocks mass) would vary based on the mass of the human buttocks, just like the SD varies based on the shape of the curve formed by a dataset. For this unrealistic gluteal example, a test subject had a sigma (SD) of 3 and our control had an SD of 5. One could conclude that the buttocks is less spread out in the test subject or, put in fancy shmancy terms, the data has a lower dispersion and thus is clustered closer to the mean. Keep in mind that these are fictional numbers to get a visual that will stick with you. So when thinking of SD, remember that much like buttocks, curves from datasets vary. To compare curves that look different, determine representative numbers, i.e. the mean and SD. By calculating the SD, the variability in the data can be shown with a single number and that number can also be used to compare treatment and control groups to help determine statistical significance.
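The 68%, 95%, and 99.7% figures can be checked directly for a normal curve. The short sketch below does so with Python’s standard library; the helper name and the example mean of 50 are assumptions made for illustration, while the sigmas of 3 and 5 echo the fictional numbers above.

```python
from statistics import NormalDist


def coverage_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The familiar 68 / 95 / 99.7 rule for one, two, and three SDs.
for k in (1, 2, 3):
    print(f"within {k} SD: {coverage_within(k):.1%}")

# A narrow curve (sigma = 3) and a wide curve (sigma = 5) still hold about 68%
# of their data within one SD; only the width of that band changes.
for sigma in (3, 5):
    share = coverage_within(1, mu=50, sigma=sigma)
    print(f"sigma={sigma}: {50 - sigma} to {50 + sigma} holds {share:.1%}")
```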
Can you explain p-value and confidence intervals as they relate to statistical significance?
A quick and dirty version of statistical significance: Statistical significance is the goal of most medically relevant experiments or hypotheses. The purpose is to find a treatment that is better than the current treatment or no treatment. This can be shown in many ways, but the most common are p-values and confidence intervals.

  • The p in p-value stands for probability. If a woman becomes pregnant, the probability of giving birth to a male is roughly 50%, or p=0.5. Makes sense, right? Like the flip of a coin, that baby could be male or female. The smaller the probability, the less likely the event is to occur; the probability of having beta thalassemia with no family history is almost zero. When comparing two datasets (treatment versus control), the p-value is the probability of seeing a difference at least as large as the one observed if the treatment truly had no effect. If the p-value is less than a set threshold (0.05 for most cohort studies and 0.01 for most molecular studies), then conclude that the two datasets are different. Put in other words, if the p-value is less than 0.05, a difference this large would turn up less than 5% of the time by chance alone. If you were looking at a new cancer treatment versus the current one, and the p-value was calculated to be 0.01, then the new cancer treatment is giving different results than the current treatment (whether that be remission rates, survival rates, or loss of hair… whatever was measured).
  • Confidence intervals are saying that there is a range around a mean, much like an SD. A confidence interval is fun because it shows blatantly that every human was NOT measured for this analysis. Confidence interval calculations indicate that somewhere between these two numbers the real mean should be found. When reading a study, if the confidence intervals of the data overlap, as a rule of thumb it suggests either that a larger sample size is needed to see any difference or that there is no difference at all, i.e. the treatment did not give significant results. Although not entirely correct, think of it as a +/- value to the mean. It indicates how large of a range around the calculated mean you are confident in, hence ‘confidence interval.’ When looking at SDs (which are the +/- waffle around the means), the same basic rule goes: if the means +/- their SDs overlap, predict (unless the sample size is ginormous) that the two samples are not significantly different.
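
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular study: the measurements are made up, the p-value comes from a simple permutation test (one of several valid approaches), and the confidence interval uses a normal approximation that is a little too narrow for samples this small.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical measurements for a control group and a treatment group.
control = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 4.7]
treatment = [5.9, 6.1, 5.7, 6.3, 5.8, 6.0, 6.2, 5.6]


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Estimate how often a difference in means at least as large as the observed
    one appears when group labels are assigned purely by chance."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the treatment made no difference
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # add-one correction avoids reporting p = 0


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width


p = permutation_p_value(control, treatment)
ci_control = mean_confidence_interval(control)
ci_treatment = mean_confidence_interval(treatment)
print(f"p-value: {p:.4f}")
print(f"control mean 95% CI:   {ci_control[0]:.2f} to {ci_control[1]:.2f}")
print(f"treatment mean 95% CI: {ci_treatment[0]:.2f} to {ci_treatment[1]:.2f}")
```

With these made-up numbers the two groups barely mix, so the p-value is small and the two intervals do not overlap; shuffle in noisier data and both signals weaken together.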

How does statistics compare to other medical school courses?
If you tried to have a conversation with the general populace and used the medical terminology now second nature to you, half the people listening would be confused, 25% would be asleep and the remaining 25% might be able to follow along with some of it. The language of statistics can seem confusing, but it is not unlike the first time you opened up an anatomy book and had to learn every bone with all the tuberosities, origins, and insertions.

Do you have one last piece of statistics advice for medical students?

When you open that statistics book or grab that journal article, go back to that first day of biochemistry, anatomy, physiology, or histology. There are key terms to remember and the rest fits into place once you understand what the author is talking about. The fact is that as a medical professional your opinion counts to those you treat. Make that opinion a significant one (pun intended).
