In case you somehow missed this, we’re living in an information era. For most scientists, this means that an ever-growing component of our jobs is to access and synthesize information. The old naturalist skills that drove ecology in the past are being replaced, in many cases wholesale, with information technology acumen.
Despite the importance of IT in modern ecological research, computational skills in the ecological community remain pretty gender-imbalanced. As a computationally savvy(ish) female scientist, I find this worrying. My personal experience is that computational skills take work to acquire, and in the end they are based more on effort than on talent or intuition. Ecologists, and particularly women in ecology, need to develop and communicate these skills if we want to hold on to our federal support. At the moment, we’re operating under digital silence. In this post, I’ll
The intent of this post is NOT to criticize the computational community, and particularly the R project, whose community is very actively trying to improve female participation. It is also not intended as a criticism of Penn State, whose biology program has given me a huge amount of flexibility to pursue my graduate work. Instead, I hope that this post will inspire smart women to recognize one another, and enter and participate in the computational domain in a more visible way.
A few numbers on women in ecological computing
1. Female participation in R / Open Source
The vast majority of academic ecologists now use the statistical computing environment R (www.r-project.org). R is an open source/open access project, driven by a board of directors. R packages are contributed from the broader scientific community, and adopted for hosting at the Comprehensive R Archive Network (“CRAN”). While the gender representation of PhD recipients in the scientific fields that interact most heavily with R is increasingly balanced (~46% of PhDs awarded to women in statistics in the last decade; ~50% of science PhDs awarded to women), female contribution to R remains low.
A recent analysis at ROpenSci showed that ~14% of R packages are maintained by women.
I checked the packages listed on the CRAN environmetrics task view. Of the 107 packages listed, five were maintained by authors with clearly female names, whereas 94 had maintainers with clearly male monikers (maintainers of the remaining 8 packages had names that could not be readily classified using the gender package in R). In short,
5% of the ecology/environment packages hosted on CRAN are maintained by people with clearly female names.
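The arithmetic behind that figure can be checked from the counts above. This is a minimal sketch in Python, not the R workflow used for the original tally; the variable names are illustrative, and only the three counts come from this post.

```python
# Counts reported above for the CRAN environmetrics task view.
counts = {"female": 5, "male": 94, "unclassified": 8}

total = sum(counts.values())  # 107 packages in all

# Share of all listed packages, by maintainer name classification.
share_of_all = {label: n / total for label, n in counts.items()}

# Share among only those maintainers whose names could be classified.
classified = counts["female"] + counts["male"]
female_share_of_classified = counts["female"] / classified

for label, share in share_of_all.items():
    print(f"{label}: {share:.1%}")
print(f"female share of classified names: {female_share_of_classified:.1%}")
```

Whether the unclassified names are kept in the denominator or dropped, the female share rounds to about 5%.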
There are currently 24 men and no women on the R Core Development Team (which is an elected body). Of the 58 other folks that the R project recognizes as major contributors to R, only one has a clearly female first name.
The dearth of women in computational ecology reflects the general absence of female participation in scientific computing. Although it’s notoriously difficult to measure, female Linux usership likely remains at less than 5%. StackOverflow, R-help and other web forums for computing assistance see very little female participation, even among individuals posing questions. Women in ecology are confronted with the same challenging computational community that women encounter across the computational board. While there are some signs of progress (for example, r-help doesn’t seem to be getting meaner), a lot of work remains.
2. Female participation and recognition on GitHub
Additionally, female participation on the code management site GitHub remains incredibly low.
Less than 3% of code repositories with 5 or more stars map to owners with clearly female names (see Alyssa Frazee’s analysis here).
The low rates hold across a whole bunch of programming languages. And don’t be fooled: R looks better than most because the gender classifier Frazee used classifies “Hadley” as female.
The trend toward female underparticipation holds in the ecological literature as well. 28% of the authors on 500 randomly chosen publications in Ecology Letters have clearly female first names (as opposed to 71% with clearly male names). This drops to 25-26% in the journals Ecological Modelling and Journal of Theoretical Biology (however, a large share of names could not be classified at all: 18.8% in Ecology Letters, 42.0% in Ecological Modelling, and 39.5% in JTB).
What to make of this?
First, these numbers are in no way perfect. The participation rates presented here hinge on first name as a true signal of an individual’s gender. Gender associations with a given name are probabilistic, and certainly there may be some error in gender classification of each name. Generally, women in science are aware of gendered stereotypes, and may use gender-ambiguous monikers, or forgo names for initials. Lastly, the distribution of number of packages per maintainer is somewhat overdispersed, and maintainers with more than four packages on the environmetrics list all have male names.
Even with these caveats, I think these data convey a real, emerging problem for the ecological community. Even though many women participate in graduate-level ecology, relatively few participate in the computational realm. Increasingly, that’s the area where the money is. We need to make some changes, or women are likely to be left out.
These numbers don’t tell the whole story, though. There are some women who are doing amazing things in the computational domain, whose work is often overlooked. Karline Soetaert, for example, is the author of the deSolve package in R, a workhorse package that gets used all over the place. The University of Washington is home to several prominent female R developers, including Daniela Witten and Hana Sevcikova. We may see a generational shift (someday) to a more gender-inclusive computing environment…
… but that day hasn’t arrived just yet. I think these steps would help get ecological women (and all entry-level users) assimilated into the broader computational community.
1. A clear statement of computational expectations for ecologists.
In my experience, grad students really like checking things off lists. If they’re given a list of required competencies, they’ll achieve them. The problem with computation is that people don’t know what skills they should be working to attain. Here’s one example of such a list for ecology grad students:
– basic knowledge of file directory structure on your own operating system
– ability to launch a terminal and navigate in it
– basic knowledge of SQL (set up a database; run a query)
– ability to fit basic statistical models (ANOVA, linear models)
– ability to simulate simple datasets
In my mind, these are NOT statistical competencies. There is a TON of material that has to get covered in statistical courses for ecologists. Computational skill belongs elsewhere in the ecological curriculum (if for no other reason than that, just like ecologists, many statisticians lack the training to effectively outfit students with these skills).
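To give a sense of scale, the last three items on the list above (a database and a query, a basic linear model, a simulated dataset) fit comfortably in one short script. The sketch below is written in Python with only the standard library, purely as an illustration; it is not a prescribed curriculum, and the table name, column names, and simulation parameters are all made up for the example.

```python
import random
import sqlite3

random.seed(42)  # make the simulated data reproducible

# Simulate a simple dataset: y = 2 + 0.5 * x + Gaussian noise.
# (Intercept, slope, and noise level are arbitrary illustrative values.)
n = 200
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

# Set up a database and load the data (in memory, so nothing touches disk).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (x REAL, y REAL)")
con.executemany("INSERT INTO obs VALUES (?, ?)", zip(xs, ys))

# Run a query: count and mean response for small versus large x.
rows = con.execute(
    "SELECT x >= 5 AS large_x, COUNT(*), AVG(y) "
    "FROM obs GROUP BY large_x ORDER BY large_x"
).fetchall()

# Fit a linear model y ~ x by ordinary least squares.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(rows)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept should land close to the values used in the simulation, which is exactly the kind of sanity check that simulating data makes possible.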
2. Local contact and community
User groups and forums, computing “corners” where people can work together, and formal code mentoring are all enormously helpful in getting smart new users up to speed computationally. These forums exist at many institutions, but my impression is that they tend to be chronically underutilized. Graduate programs should encourage participation in these forums. A competency list might foster participation.
3. Inclusion of computing in the general ecology curriculum.
This post builds on a talk for the Montana State University Department of Statistics, given in April of 2015.