Coding is better when done together. Photo by hackNY.
Over at the Molecular Ecologist, Kim Gilbert announces a new initiative, the Molecular Ecologist code snippet repository. It’ll be a place to put bits of useful code that wouldn’t warrant their own publication as a package or program, but would still be helpful to other biologists:
Do you have a script you regularly run to convert between data formats? A quick and easy way to run a certain analysis? Making a common figure for a given type of data? If you’re willing to share your code, we’ll put it online for public access with credit to your name.
Heatmaps are incredibly useful for the visual display of microarray data or data from high-throughput sequencing studies such as microbiome analysis. Basically, they are false colour images where cells in the matrix with high relative values are coloured differently from those with low relative values. Heatmaps can range from very simple blocks of colour with lists along 2 sides, or they can include information about hierarchical clustering, and/or values of other covariates of interest. Fortunately, R provides lots of options for constructing and annotating heatmaps.
I’ve personally used heatmap graphics for visualizing population structure in a sample, or linkage disequilibrium along a stretch of genetic sequence, but I haven’t done anything very complex. Arianne’s examples use a data set that’s freely available on Dryad, and she includes a lot of step-by-step detail to build up complex figures. If you’re going to be visualizing some microarray results or metagenomics data any time soon, you should read the whole thing, and probably bookmark it.◼
So now that it’s all over, how’d it go? Pretty well, overall. As much as Citizen Science is meant to be a crash course in scientific reasoning for Bard’s first-year students, it’s also a crash course in teaching for folks like me, who come to the job with experience as teaching assistants, but not in planning or executing a whole course. And judged solely on that level, Citizen Science is amazing.
Let me run through the numbers again: 12 four-and-a-half-hour days with the same 20 first-year students. I spent a fair bit of my Christmas holiday preparing lesson plans, and ended up reworking almost all of that planning in the last three days before class started. From there on, the average workday was something like:
0700-0800h: Wake, shower, breakfast at cafeteria.
0800-0900h: Last-minute lesson prep; classroom set-up, maybe some frantic final copy-making.
0900-1130h: Morning class period. Ideally, no more than one hour of this is PowerPoint presentations and/or videos of TED talks.
1130-1200h: Clean up, collect oneself, wait for the crush of students to move through the cafeteria.
1200-1300h: Lunch at the cafeteria.
1300-1500h: Afternoon class period. Only start this with a video if you want everyone to immediately fall asleep. Class debates are good in this time slot. Assign homework for the next day.
1500-1600h: Clean up, collect oneself, adjust tomorrow’s plans based on what you covered today.
1600-1730h: Exercise. (There’s a respectable campus gym, or nice trails if the weather’s not terrible.)
1730-1900h: Dinner at the cafeteria.
1900-whenever it’s done: Lesson planning and prep; printing and copying of handouts.
2300h: Bedtime, one hopes.
With variations for a four-day rotation in the wet lab and another in the computer lab, plus a “civic engagement” day in which the first-year students go to a local public school to guest-teach science classes for half a day, that’s pretty much the shape of the course. It was exhausting. Boot camp for college teaching. Learning to swim by jumping into the middle of the Hudson River in January.
But that schedule leaves out a multitude of support. First and foremost, Citizen Science faculty have no other personal responsibility than the teaching. Meals are in the campus cafeteria, which provides just fine. Housing is on campus—yes, my dorm room was tiny and ill-equipped, but it was also right around the corner from my classrooms, the communal faculty workspace, the cafeteria, and the gym. So: no cooking, no commute.
Also, it must be said, the Bard student body is pretty great. There were the inevitable exceptions, but most of my class section were smart, friendly, and willing to at least try to tackle any topic I threw at them. Sometimes they were alarmingly informal, and I had to bend a little to accommodate the local concept of punctuality, but if a classroom full of unknown students is a cliff from which a rookie prof dives, these students were also the trampoline at the bottom.
But most importantly, Citizen Science teaching is collaborative. Intensely collaborative. From the moment I arrived on campus, most of my conversations with other faculty members were about lesson plans: what had worked last year, what spurred an amazing class discussion earlier today, what part of the lab procedure left every student confused and irritated. We all started with a six-inch-thick binder of readings, case studies, and worksheets, and then added our own ideas—and swapped, reworked, cut, and rejiggered each other’s ideas.
For me, the flagship example of this was the computer lab. The resource binder had some material on SIR models of disease spread in a population; I wanted to try and teach my students some of the programming language R. So why not build SIR simulations in R?
One faculty member had already developed a nifty interactive model of disease spread in a simulated social network, which included many of the basic concepts necessary to understand more general models, so I started the computer section with that. Next up was an intro-to-R worksheet I’d banged out over the holidays, which covered exactly the programming concepts necessary to code the model, and nothing more. A couple of other faculty members test-drove that worksheet in their own class sections, which had the computer lab earlier in the schedule than mine.
One night’s reading assignment was Anderson and May (1979) [PDF], the original SIR paper; the next day we walked through the math in class. Then I gave my students a worksheet covering some of the graphing capabilities of R, which another of the R-using faculty had developed as a follow-up to my introduction worksheet. And finally, I walked them through the coding necessary to create a simple SIR recursion simulation, complete with a plot of population dynamics over time.
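To give a sense of what that final exercise involves, here is a minimal sketch of a discrete-time SIR recursion. It is written in Python rather than R, and it is not the code from the class: the function name `simulate_sir`, the parameter names `beta` (transmission rate) and `gamma` (recovery rate), and the starting values are all illustrative assumptions. It also assumes the simplest closed-population form of the model, with no births or deaths, and it prints values instead of drawing a plot.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Run a discrete-time SIR recursion in a closed population.

    s0, i0, r0 : starting numbers of susceptible, infected, recovered
    beta       : per-step transmission rate (hypothetical parameter name)
    gamma      : per-step recovery rate, assumed to lie between 0 and 1
    steps      : number of time steps to simulate

    Returns a list of (S, I, R) tuples, one per time point, including t = 0.
    """
    n = s0 + i0 + r0  # total population size stays constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]

    for _ in range(steps):
        # New infections scale with contacts between S and I individuals.
        new_infections = beta * s * i / n
        # Never infect more individuals than are still susceptible.
        new_infections = min(new_infections, s)
        # A fixed fraction of the infected recover each step.
        new_recoveries = gamma * i

        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))

    return history


if __name__ == "__main__":
    # Illustrative values only: 1000 individuals, 10 initially infected.
    run = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, steps=100)
    for t in range(0, 101, 20):
        s, i, r = run[t]
        print(f"t={t:3d}  S={s:8.2f}  I={i:8.2f}  R={r:8.2f}")
```

Changing `beta` or `gamma` and rerunning is the kind of parameter tweak described here: in this sketch, the number infected only grows at the start when `beta` times the susceptible fraction exceeds `gamma`.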
The result wasn’t an unqualified success, by a long shot. Some students bogged down in the programming; many glazed over when I started writing equations on the whiteboard. Almost everyone seemed to like drawing graphs in R, though a lot of folks got frustrated by the technicalities of programming syntax even in that context. In the end, most students were able to at least follow me through coding the SIR model, but that was all we had time to do. Given another go-around, I’d provide more structure in the final stretch, with a worksheet that walks through the model coding and how to use the finished model to test specific hypotheses about epidemic dynamics. Also, I’d probably lead with the graph-making, which was more engaging than just pushing variables around on the command line.
But on the whole, I think it worked. My students coded SIR simulations in R, which actually responded to parameter changes the way they were supposed to, and generated pretty graphs in the process. Several students even told me, afterward, that they’ll use R for graphing in the future.
That outcome was really only possible because there were other faculty working on similar ideas, testing things out for me, sharing their own experience and materials. From what I hear, that’s a resource I can’t expect to have when I start teaching my own “real” courses as a full-fledged faculty member. And yet it’s the biggest reason why Citizen Science left me feeling like, actually, I might be able to pull off this whole professor-ing thing after all.◼
Over at The Molecular Ecologist, Mark Christie runs down the considerations to take into account when you’re thinking about making the effort to learn a programming language — he focuses specifically on bioinformatics, but his points really apply for just about anything you’d do with a script.
Perl and Python programs are (typically) compiled each time before they run and they are often not compiled to the same extent as C and C++ (but see PyPy for Python). This means that C and C++ typically run faster and require less memory after a program has been completed. Like most things in life, however, there is a tradeoff in that C and C++ programs usually require more lines of code because there are more details that have to be specified in each program. Thus there is a tradeoff between time spent developing, writing, and debugging code and the time that the program takes to run through completion.