Why I think the Seralini GM feeding trial is bogus

UPDATE: If you’re looking for information on the ‘republished’ version of this manuscript, a full statistical analysis of the released data can be found here.

If you’re following the news about the French GM maize feeding trial, you’ve probably heard one of two things: (A) that we need to pull GMO crops off the market immediately; or (B) that the study is flawed and basically meaningless. I find myself leaning toward the second group on this one. Here is why I think the recent GM corn feeding trial by Seralini (Seralini et al. 2012. Food and Chemical Toxicology) is bogus. Just for full disclosure, I’m not in any way an expert on animal feeding studies. But I know something about statistics and probability (in most cases, just enough to be dangerous). I would invite anyone who sees errors in my logic from an animal science, toxicology, or statistics standpoint to please let me know in the comments.

In the very first sentence of the introduction, the authors state that “There is an ongoing international debate as to the necessary length of mammalian toxicity studies in relation to the consumption of genetically modified (GM) plants including regular metabolic analyses (Séralini et al., 2011).” I find it interesting that Seralini cites himself as proof of this… I did not look up the reference or search to see if this is an actual international debate, or if it is simply Seralini vs the world on this point. But I digress. A reasonable person would certainly agree that long-term studies of food products sound like a good idea, and so it is easy to side with the authors that this type of research is needed.

But if we compare the life span of rats with the life span of humans, the concept of “long term” is not at all similar. And this is where I think the Seralini study falls apart. It boils down to the fact that this study lasted for 2 years, and used Sprague-Dawley rats. To those of us who don’t do rat studies, 2 years probably seems like a reasonable “long term” duration for a study (it did to me at first glance). However, it seems that for the specific line of rats they chose (Sprague-Dawley), 2 years may be an exceptionally long time.

A 1979 paper by Suzuki et al. published in the Journal of Cancer Research and Clinical Oncology looked at the spontaneous appearance of endocrine tumors in this particular line of rats. Spontaneous appearance basically means the authors didn’t apply any treatments (like feeding them GMOs or herbicides). They just watched the rats for 2 years and observed what happened in otherwise healthy rats. When the study was terminated at 2 years (the same duration as the Seralini study), a whopping 86% of male and 72% of female rats had developed tumors.

Below I provide the results of a very basic simulation using R. I’ve also provided the R code in case anyone would like to repeat or modify this little exercise (R code is in red, output is in blue). Let’s assume that the Suzuki et al. (1979) paper is correct, and 72% of female Sprague-Dawley rats develop tumors after 2 years, even if no treatments are applied. If we randomly choose 10,000 rats, each with a 72% chance of having a tumor after 2 years, we can be pretty certain that approximately 72% of the rats we selected will develop a tumor by the end of 2 years.

  ## Create a sample of 10,000 female rats. Each rat we choose
  ## has a 72% chance of developing a tumor after 2 years.
  ## 0 = no tumor; 1 = tumor
  SD.Female <- sample(c(0, 1), 10000, replace = TRUE, prob = c(0.28, 0.72))
  ## The mean of this population (of 0s and 1s) will tell us
  ## the proportion of rats that developed tumors, by chance.
  mean(SD.Female)
 [1] 0.714

In our very large sample of 10,000 simulated rats, 71.4% developed tumors by the end of the 2-year study. That’s pretty close to 72%. But here is where sample size becomes so critically important. If we only select 10 female rats, the chance of landing close to 72% (that is, 7 of the 10 with tumors) is much lower. In fact, there is a pretty good chance that the percentage of rats developing tumors in a group of 10 could be MUCH different from the population mean of 72%. This is because there is a greater chance that our small sample of 10 will not be representative of the larger population.
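To put rough numbers on that, here is a small sketch using R's built-in binomial functions instead of simulation. It assumes, as above, that each rat independently has a 72% chance of a tumor; the variable names are just mine for illustration.

```r
## Exact binomial probabilities for the number of rats (out of 10)
## that develop tumors, assuming each rat has a 72% chance.
n.rats  <- 10
p.tumor <- 0.72
tumor.counts <- 0:n.rats
probs <- dbinom(tumor.counts, size = n.rats, prob = p.tumor)
round(data.frame(tumors = tumor.counts, probability = probs), 3)

## Chance that a group of 10 has 5 or fewer rats with tumors
p.5.or.fewer <- pbinom(5, size = n.rats, prob = p.tumor)
p.5.or.fewer

## Chance that all 10 rats in a group develop tumors
p.all.10 <- dbinom(n.rats, size = n.rats, prob = p.tumor)
p.all.10
```

The table shows how the probability is spread over every possible count from 0 to 10, so you can see how often a group of 10 would land well away from 7 rats with tumors.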

UPDATE: 9/20/2012 – See the comment from Luis below for a more elegant way to set up the 9 groups. It also allows you to more easily change the probabilities (only one time, instead of 9) if you want to see the impact if the probability of tumors is 50 or 80% instead of 72%. Thanks Luis!

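  ## Create 9 groups of rats. Each group has 10 individuals.
  ## Each individual has a 72% chance of developing a tumor
  ## after 2 years.
  SD.Fgrp1 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp2 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp3 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp4 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp5 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp6 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp7 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp8 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  SD.Fgrp9 <- sample(c(0, 1), 10, replace = TRUE, prob = c(0.28, 0.72))
  ## Combine the 9 groups into one table, one column per group.
  Female.9grp <- data.frame(SD.Fgrp1, SD.Fgrp2, SD.Fgrp3, SD.Fgrp4, SD.Fgrp5,
                            SD.Fgrp6, SD.Fgrp7, SD.Fgrp8, SD.Fgrp9)
  colnames(Female.9grp) <- c("Control", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8")
  Female.9grp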

	Control	t1	t2	t3	t4	t5	t6	t7	t8
1	1	1	0	1	1	1	0	0	0
2	1	1	1	1	0	0	0	0	1
3	1	1	0	1	1	1	1	1	0
4	1	1	1	0	1	1	1	0	0
5	0	1	0	1	1	1	0	1	1
6	1	1	0	1	0	0	1	1	0
7	1	1	0	1	1	1	1	1	0
8	0	1	1	0	1	1	1	1	1
9	1	1	1	1	1	0	1	1	0
10	0	1	1	0	1	1	1	1	1

sum(Female.9grp)

[1] 62

sum(Female.9grp)/90

[1] 0.6889

The 9 groups (in columns) of 10 rats each represent one possible randomization of the rats used in the Seralini study. Let’s assume that “Control” is the control group, “t1” is the first treatment group, and so on. If we look at all 90 simulated female rats chosen for the experiment, 62 rats (about 69%) would develop tumors after 2 years, even if no treatments were applied. Again, that’s not too far away from our known population mean of 72%.
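If you want to rerun this with a different tumor probability, the nine near-identical sample() lines get tedious. Here is one compact way to build the same kind of table with the probability set in a single place. This is my own sketch (the names p.tumor, n.groups, and n.rats are made up for illustration), and since the draws are random, your table will not match the one above.

```r
## Hypothetical compact version: the probability is set in one place.
p.tumor  <- 0.72   # chance of a tumor after 2 years
n.groups <- 9      # control plus 8 treatment groups
n.rats   <- 10     # rats per group

## Each column is one group of 10 rats; 0 = no tumor, 1 = tumor
Female.9grp <- as.data.frame(replicate(
  n.groups,
  sample(c(0, 1), n.rats, replace = TRUE, prob = c(1 - p.tumor, p.tumor))
))
colnames(Female.9grp) <- c("Control", paste0("t", 1:8))

## Number of rats with tumors in each group
colSums(Female.9grp)
```

Changing p.tumor to 0.5 or 0.8 and rerunning is enough to see how the group-to-group spread changes.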

But here’s the important part: Simply by chance, if we draw 10 rats from a population in which 72% get tumors after 2 years, we have anywhere from 4 (“t8”) to 10 (“t1”) rats in a treatment group that will develop tumors. That spread is due to chance alone, not to treatments. If I did not know about this predisposition for developing tumors in Sprague-Dawley rats, and I were comparing these treatment groups, I might be inclined to say that there is indeed a difference between treatment 1 and treatment 2. All 10 animals developed tumors in treatment 1, and only 5 animals developed tumors in treatment 2; that seems pretty convincing. But again, in this case, it was purely due to chance.
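To see how misleading a single comparison like that can be, here is a quick sketch that runs Fisher's exact test on a group where all 10 rats have tumors against a group where only 5 do. This is not an analysis from the Seralini paper; it is just what a naive two-group test would say about two columns of my simulated data, ignoring the fact that I picked the two most different-looking groups after seeing the results.

```r
## Rows: the two simulated groups being compared.
## Columns: rats with a tumor / rats without a tumor.
two.groups <- matrix(c(10, 0,
                        5, 5),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("all.10.tumors", "only.5.tumors"),
                                     c("tumor", "no.tumor")))

## Naive pairwise test, with no correction for multiple comparisons
naive.test <- fisher.test(two.groups)
naive.test$p.value
```

The two-sided p-value works out to about 0.03, below the usual 0.05 cutoff, even though both groups were drawn from exactly the same population. With 9 groups there are 36 possible pairs to compare, so stumbling on at least one "significant" pair is not surprising.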

So my conclusion is that this study is flawed due to the choice of Sprague-Dawley rats, and the duration (2 years) for which the study was conducted. Sprague-Dawley rats appear to have a high probability of health problems after 2 years. And when there is a high probability of health problems, there is a high probability that just by chance you will find differences between treatments, especially if your sample size for each treatment is only 10 individuals.
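One randomization could of course just be an unlucky draw, so here is a rough sketch that repeats the whole 9-group experiment many times and asks how often the highest and lowest groups differ by 5 or more rats. It assumes the same 72% tumor rate and independent rats as above; the seed and the 5-rat cutoff are arbitrary choices of mine.

```r
set.seed(2012)          # arbitrary seed, only so the run is repeatable
n.experiments <- 10000  # number of simulated 9-group experiments

## Each column is one simulated experiment:
## tumor counts in 9 groups of 10 rats each.
tumor.counts <- matrix(rbinom(9 * n.experiments, size = 10, prob = 0.72),
                       nrow = 9)

## Gap between the highest and lowest group within each experiment
count.range <- apply(tumor.counts, 2, function(x) max(x) - min(x))

## Share of experiments where two groups differ by 5 or more rats
prop.big.gap <- mean(count.range >= 5)
prop.big.gap
```

Raising the group size from 10 in the rbinom() call (and scaling the cutoff to match) is an easy way to see how quickly these chance gaps shrink in relative terms.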

UPDATE: September 23. For those of you who would like more information on this study by Seralini et al, please read Emily Willingham’s critique of the study. It is by far the most comprehensive summary I have read. Emily is on Twitter at @ejwillingham. An excerpt:

The possible explanations are legion, but with several different kinds of estrogen receptors with different actions in different tissues, compounds that block a receptor at one concentration but activate it at another, compounds that interact with different kinds of hormone receptors in different ways, and differential effects in different species–it’s no wonder the results with mixtures are themselves so mixed. The one thing that doesn’t leap out here as being involved, among a sea of likely possibilities, is the GM corn itself.

UPDATE: September 28. For a graphical demonstration of this post, check out the Inspiring Science blog.

UPDATE: October 4. The European Food Safety Authority (EFSA) has released a statement on the Seralini study. Their conclusion (emphasis mine):

EFSA notes that the Séralini et al. (2012) study has unclear objectives and is inadequately reported in the publication, with many key details of the design, conduct and analysis being omitted. Without such details it is impossible to give weight to the results. Conclusions cannot be drawn on the difference in tumour incidence between the treatment groups on the basis of the design, the analysis and the results as reported in the Séralini et al. (2012) publication. In particular, Séralini et al. (2012) draw conclusions on the incidence of tumours based on 10 rats per treatment per sex which is an insufficient number of animals to distinguish between specific treatment effects and chance occurrences of tumours in rats. Considering that the study as reported in the Séralini et al. (2012) publication is of inadequate design, analysis and reporting, EFSA finds that it is of insufficient scientific quality for safety assessment.

and:

Séralini et al. (2012) draw conclusions on the incidence of tumours based on 10 rats per treatment per sex. This falls considerably short of the 50 rats per treatment per sex as recommended in the relevant international guidelines on carcinogenicity testing (i.e. OECD 451 and OECD 453). Given the spontaneous occurrence of tumours in Sprague-Dawley rats, the low number of rats reported in the Séralini et al. (2012) publication is insufficient to distinguish between specific treatment effects and chance occurrences of tumours in rats.
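For a rough feel of what the difference between 10 and 50 rats per group means, here is a sketch comparing how far the observed tumor rate could plausibly wander by chance at each group size. It again assumes a 72% background rate and independent rats, which is my simplification and not part of the EFSA statement.

```r
p.tumor     <- 0.72
group.sizes <- c(10, 50)   # Seralini's group size vs. the OECD recommendation

## Central 95% range of the observed tumor proportion, from the
## exact binomial quantiles, for each group size.
low   <- qbinom(0.025, size = group.sizes, prob = p.tumor) / group.sizes
high  <- qbinom(0.975, size = group.sizes, prob = p.tumor) / group.sizes
width <- high - low

data.frame(rats.per.group = group.sizes, low = low, high = high, width = width)
```

The range for 10 rats is much wider than the range for 50, which is the same point EFSA makes about separating treatment effects from chance.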

I guess that pretty much settles it.
