Proving once and for all that great minds think alike, Christie Bahlai and I share some concerns about the Carle book, a.k.a. The Very Hungry Caterpillar:
This is a terrible dataset about caterpillar diet. How did it got published? pic.twitter.com/XkAq51HxEP— Timothée Poisot (@tpoi) April 23, 2015
Hmm, I don't know about this caterpillar rearing manual. I thought P.rapae had an obligate association w/ Brassica. pic.twitter.com/M10dqbOYlN— Christie Bahlai (@cbahlai) May 10, 2015
So how hungry are caterpillars anyway? It’s surprisingly easy to find out, and it’s a good illustration of what we can do with open data. In this post, I’ll use data from the GLOBI database @poel14 to see which species of the genus Pieris is the very-hungriest. Because, clearly, the appropriate response to people cracking jokes about children’s books is to design a data-analysis plan.
The GLOBI database (it stands for Global Biotic Interactions) lists data from the literature, and we can access it from R through the rglobi package. So before you go any further, install and load rglobi.
Once this is done, we can look for all interactions that have Pieris eating something:
pieris_interactions <- get_interactions("Pieris", interaction.type="eats")
This gives (as of the writing of this post) 232 interactions. Looking good!
Now, we can build an incidence table of which Pieris species eats which food source. The output of get_interactions has columns for the source and target taxa, so this is fairly easy:
A <- table(pieris_interactions$source_taxon_name, pieris_interactions$target_taxon_name)
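If you want to see what table() does here without querying GLOBI, here is a minimal sketch on a toy data frame. The column names match the ones used above, but the object names (toy_interactions, toy_A) and the rows themselves are made up for illustration and are not real GLOBI records.

```r
# Toy stand-in for the output of get_interactions().
# These rows are invented for illustration; they are NOT real GLOBI data.
toy_interactions <- data.frame(
  source_taxon_name = c("Pieris rapae", "Pieris rapae",
                        "Pieris napi", "Pieris rapae"),
  target_taxon_name = c("Brassica oleracea", "Sinapis alba",
                        "Brassica oleracea", "Brassica oleracea"),
  stringsAsFactors = FALSE
)

# Cross-tabulate consumers (rows) against food items (columns).
# Each cell counts how many records link that pair.
toy_A <- table(toy_interactions$source_taxon_name,
               toy_interactions$target_taxon_name)

print(toy_A)
```

Note that a pair recorded twice shows up as a 2, so the cells count records rather than just presence or absence. On the real pieris_interactions data, the same call produces the table A used in the rest of this post.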
This table has 14 rows (one per Pieris species) and 137 columns (one per food item). To know which is the very-hungriest of all, we can simply sum the rows:
generality <- rowSums(A)
sort(generality)
Here is the result:
| Pieris species | Number of known items in diet |
|---|---|
OK, so that settles it. Pieris rapae IS a very hungry caterpillar.
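One caveat worth sketching: summing the rows of a table of counts gives the number of interaction records per species, which equals the number of distinct food items only if no species/food pair is recorded more than once. I have not checked whether that holds for the GLOBI data, so treat the following as an illustration on invented numbers (toy_A, records, and distinct_items are hypothetical names) rather than a correction of the result above.

```r
# Hypothetical incidence counts (rows: consumers, columns: food items).
# The numbers are invented for illustration only.
toy_A <- matrix(
  c(1, 0, 0,
    2, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Pieris napi", "Pieris rapae"),
    c("Brassica oleracea", "Sinapis alba", "Tropaeolum majus")
  )
)

# Number of interaction records per species: duplicates are counted.
records <- rowSums(toy_A)

# Number of distinct food items per species: count non-zero cells only.
distinct_items <- rowSums(toy_A > 0)

# Rank species from most to least generalist.
print(sort(distinct_items, decreasing = TRUE))
```

In this toy case the ranking is the same either way, but the two measures differ as soon as a pair is recorded more than once.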
But more seriously, isn’t it amazing that the integration of open data and open software means we can now test hypotheses with very little effort? I can’t help but feel that we are extremely lucky to have all of these resources available. And I’m working on a paper that will showcase a more ambitious example. Open data are good. Use them.