I want to learn bioinformatics! A guide for complete beginners.

I asked a question on Twitter yesterday:

What is the correct way to handle a request to “help me learn bioinformatics” from a non-computer-literate person?

This is a frequent request I encounter, and although I have various stock answers, I was curious to find out what you guys would say. Further, I wanted a resource with jumping-off links, which I hope this blog post can serve as.

And, as many times before, I was blown away by Twitter, and by the diversity and quality of opinion and information generated from such a seemingly innocuous question. I know I am lucky in having >3,500 eager followers hanging on my every question, but still ;)

Luis Pedro Coelho initially took issue with the question and said it is important to define what a user really wants, as “learning bioinformatics” “is not a goal per se”. Is the user driven by intellectual curiosity, do they want to learn to code, or do they simply want to develop enough skills to “analyse my RNA-Seq data”? Russell Neches takes the extreme view that bioinformatics “is a fiction. Biologists just use computers for certain things”. Manoj Semanta likes to stratify bioinformatics into five layers of increasing complexity, with using web interfaces and running command-line programmes being the easiest and developing new algorithms being the hardest.

Some suggested learning R or Python and pointed to helpful online tutorials, such as http://learnpythonthehardway.org/ [Phil Ashton]. Another useful resource was the excellent Software Carpentry website which is aimed specifically at scientists wishing to learn best practices, for example the use of version control and Makefiles for reproducible research (software-carpentry.org) [suggested by Rob Davey, Deanna Church]. Software Carpentry plan to run more bootcamps for complete novices in the coming year, so keep an eye on their website.

Casey Bergman suggested using Galaxy, a web-based bioinformatics engine particularly heavily used for NGS/genomics analysis (www.galaxy-project.org), although Rob Davey qualified this advice, pointing out that “using a workflow UI and being a bioinformatician may not be the same thing”. Chris Cole agrees: “Galaxy != bioinformatics. It’s a great and powerful system, tho.”

The always opinionated Mick Watson suggested more of a ‘sink or swim’ approach, specifically to go away and install Ubuntu on a PC or laptop, because “to “learn bioinformatics” you need commitment and time and effort”. Mick also pointed to some online resources hosted at ARK Genomics which expanded on this idea of bioinformatics being tough, but worth investing time in learning (http://www.ark-genomics.org/events-online-training/eu-training-course).

The theme of “learning by doing” is probably the one that I suggest most to people, also suggested by Aylwyn Scally. I tell people that learning programming through reading a book and doing simple exercises is demotivating if you don’t have a problem in mind. So pick a problem that you think can be solved with scripting or bioinformatics tools, perhaps a biological question, and attempt to do it “by all means necessary”. Being driven by a goal will help you keep motivated. Mario Caccamo says “Learning by doing is not ideal but that’s the reality”.

Some suggested attempting some simple tasks. One suggestion was to take data from a laboratory evolution paper (e.g. http://www.ncbi.nlm.nih.gov/pubmed/21940899) and try to reproduce the analysis, in this case by detecting a small set of mutations [@contaminatedscience]. Another was to use the intriguing ROSALIND platform to attempt bioinformatics problem solving (http://rosalind.info/problems/locations/) [Robert Lanfae, Adam Kiezun].
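
For a flavour of what a first "detect the mutations" exercise might look like, here is a minimal Python sketch. It is only a toy under stated assumptions: the two sequences are made up (not taken from the paper above), they are assumed to be already aligned and of equal length, and the function name point_mutations is purely illustrative. Real analyses of sequencing data involve read mapping and variant calling tools, which this does not attempt.

```python
def point_mutations(reference, sample):
    """List the point differences between two equal-length, pre-aligned sequences.

    Returns a list of (position, reference_base, sample_base) tuples,
    with positions counted from 1. Insertions and deletions are NOT handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (no indels handled)")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(
            zip(reference.upper(), sample.upper()), start=1
        )
        if ref_base != alt_base
    ]


# Made-up toy sequences, purely for illustration.
ancestor = "ATGGCGTACGTTAGC"
evolved = "ATGGCGTTCGTTAGA"

for pos, ref_base, alt_base in point_mutations(ancestor, evolved):
    print(f"{pos}\t{ref_base}>{alt_base}")
```

Even a toy like this touches on things a beginner will meet again and again: looping over sequences, 0-based versus 1-based coordinates, and checking your input before trusting your output.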

Bastien Chevreux suggested the popular “Dummies Guide …” series including Bioinformatics for Dummies (http://www.dummies.com/store/product/Bioinformatics-For-Dummies-2nd-Edition.productCd-0470089857.html), although C. Titus Brown takes issue with the name of these books, suggesting that “the culture of “I’m too stupid” inhibits learning”. Good point. They look rubbish on your bookshelf too.

Pete @drosophilic suggested enrolling in local training courses, and a list is maintained by Stephen Turner at http://stephenturner.us/p/edu. I also note the newly launched iAnn Events platform (http://iann.pro/iannviewer).

Another useful resource is BioStars; see the thread “Advice for newcomers to the bioinformatics field” [Pierre Lindenbaum].

C. Titus Brown has a workshop with online course materials for next-generation sequence analysis.

Alan McNally suggested it is possible for a newbie to learn bioinformatics successfully, citing himself as a case study: “I was the requester 4 years ago. Was told to switch to linux and start reading user guides. Haven’t looked back”.

And finally, Aylwyn Scally remarks that “the first thing I tell them is close MS Excel”.

Thanks to all those who took part in the discussion!

Do you have anything to add?

Update: 31st July 2013 – added some links to the Homolog.us blog.

7 Responses

  1. marcowanger
    July 18, 2013 at 4:34 pm |

    Tell him to try Rosalind, browse through the threads in Seqanswers. Also The NGS Wikibook.

  2. I want to learn bioinformatics! A guide for complete beginners. | Roberts Lab

    [...] I want to learn bioinformatics! A guide for complete beginners. [...]

  3. lzwright
    July 18, 2013 at 7:02 pm |

    Ah, my pal Bio Mick Watson. That’s where I started w/ Linux. maybe 4, 5, 6 mos ago? Once I got ubuntu installed (and I tried out a few other distros as well) you could not prevent me from putting it on every computer within striking distance. I virtually knew nothing about anything when I started. I got a couple of linux books (latest acquisition is Sobell) and I also recently acquired Practical Computing for Biologists. I am now deep in the land of de novo assembly of venom duct transcriptome of the marine snail Terebra anilis (terebrids are sister family to the better known cone snails). I was lucky enough to attend Titus Brown’s NGS 2013 but was also doing his tutorials way before I got there, and did not find them so difficult, even as a newbie, once I had some linux under my belt. Now I *really* need to learn some programming. I am following an MIT course on python programming http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-00sc-introduction-to-computer-science-and-programming-spring-2011/Syllabus/ and using something called code academy http://www.codecademy.com/tracks/python to pave the way. Have not yet explored Rosalind. It’s hard to find the time for this when I am spending my life at the terminal, but I kinda sorta hafta.

  4. prm36
    July 19, 2013 at 10:46 am |

    A more time consuming route, but the way I learned was through an MRes in computational bio. Got a solid grounding in many fields of bioinformatics, hands on research experience, and a publication out of it. If you’ve got the time and plan to use a lot of bioinformatics in your future research, then it’s well worth the investment.

  5. Bioinformatics is not something you are taught, it’s a way of life | opiniomics

    [...] his subsequent blog post, which details some of the [...]

  6. So you want to be a computational biologist? | opiniomics

    [...] Nick Loman and I were approached by Nature Biotech to write a commentary based on our blog posts, here and here respectively, and if you’re reading this then the commentary is out [...]
