All the cool kids are on arXiv and Haldane’s Sieve .. why you should be too

Something very exciting has happened in recent weeks on arXiv, the preprint server which many biologists believe is the preserve of angry physicists, beardy mathematicians and unwashed computer scientists (joke!!!).

Not any longer. I first felt a disturbance in the force in September, when a few high-profile human genomicists started pledging to send all their manuscripts to arXiv first, including the angriest man in biology, Michael Eisen. I’m pretty sure genomics wunderkinds Daniel MacArthur and Joe Pickrell are also planning on sending their manuscripts there first. The venerable Ewan Birney is also thinking of getting in on the action, tweeting back in August:

“Ah. Bugger. Scooped by George Church on arbitrary DNA storage. Our paper is in review <sigh>. (wish we had posted on arXive now”

Things are happening!

Those working in human population genetics and paleogenomics are already posting fascinating, high-impact manuscripts; see, for example, “The Date of Interbreeding between Neandertals and Modern Humans” from a team including Svante Pääbo. Joe Pickrell and crew posted a detailed study on “The Genetic Prehistory of Southern Africa”, focusing on groups who speak languages with click consonants.

But as interesting and inspiring as these papers are, I am more interested in bioinformatics, microbial genomics, evolution and ecology, so these papers don’t really impact my day job. But recently things have got even more interesting .. by which I mean microbial. Witness:

Posted 15th October 2012: Species Identification and Unbiased Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences (Haldane’s Sieve, arXiv). This paper describes a novel method that uses shotgun sequencing of long 16S amplicons to permit species-level assignments.

Posted 28th September 2012: Horizontal gene transfer may explain variation in θs (Haldane’s Sieve) – a paper from Lenski no less, which offers a possible explanation for the intriguing report of potential mutational “hotspots” in the E. coli genome published in Nature by Inigo Martincorena and Nicholas Luscombe. The Nature paper proposed an “evolutionary risk management strategy”, which challenges our fundamental understanding that genetic mutations are acquired randomly and subsequently selected for (demonstrated beautifully by Luria and Delbrück in 1943). Lenski, using data from his long-term E. coli evolution experiment, suggests that undetected recombination is instead the likely cause of these mutational hotspots.

Posted 13 Oct 2012: A 454 survey of the community composition and core microbiome of the common bed bug, Cimex lectularius, reveals significant microbial community structure across an urban landscape. Notable for being one of the first microbial ecology studies posted on arXiv, and obviously bed bugs are kind of cool/gross.

Posted 19 Sep 2012: Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data (Haldane’s Sieve) – a really interesting bit of software which may be useful for haplotype reconstruction in metagenomics and pooled sequencing experiments. I plan to give this a whirl and feed my findings back to the authors.

Posted 1 Oct 2012: Best Practices for Scientific Computing (Haldane’s Sieve) – not specifically microbial or biological, but a useful treatise on how to write better code, which should help those writing bioinformatics software and pipelines, or even just doing analysis.

So, a little taster there.

Make no mistake – these are all quality manuscripts. They aren’t being dumped there because they couldn’t get past the PLoS ONE reviewers, or something equally banal.

Why are they there? My interpretation is that these authors get it and are posting to arXiv to take advantage of the particular benefits of a preprint server. Firstly, there is the immediacy of getting your work out there: simply submit the PDF and it’s available to everyone. Your manuscript then gets a permanent home with a citable arXiv identifier. Submitting to arXiv may also help with establishing priority.

For me, even more useful than these things is the benefit of publishing to a self-selected audience who are genuinely interested in the subject, and who actively wish to read and critique such papers out of professional curiosity, not just because they are lucky/unlucky enough to be selected as peer reviewers. On arXiv, the “vibe” seems very different from that of the now-closed Nature Precedings, which honestly did sometimes feel like a dumping ground for unloved or hurried manuscripts.

A potential worry for these authors is that although they have deposited in arXiv, the community as a whole may not be looking there – arXiv is not indexed by PubMed – and so their papers may not be routinely cited by others simply because they weren’t seen. Hence this blog post, a small attempt to draw attention to this exciting development!

So I’ve talked a lot about arXiv – where does Haldane’s Sieve come in? This is simply a blog site run by Graham Coop, Bryan Howie and Joe Pickrell. It is important because arXiv provides no facility for commenting on manuscripts, preferring that individual communities figure out the best way to discuss articles (and sensibly recognising that this may not be a single place, something that even the open-access publishers can’t really understand).

In maths and physics this is usually done on listservs, but in genomics and biology I guess we are more comfortable with the blog format for discussion, hence the choice of WordPress. Haldane’s Sieve finds new postings on arXiv, mainly in the field of population genetics, and then posts summary articles for you to comment on. In the future we may need a similar site for microbial genomics and ecology, but for now it’s not so busy that this nascent community needs splitting up. Another place to find links is Twitter, e.g. by following me (shameless link).

It seems to be working; the discussion of Lenski’s paper has already generated a vigorous response from Inigo Martincorena, the likes of which you are unlikely to see in a published journal, and all the better for its frankness and energy, in my opinion.

So, in summary, you should add Haldane’s Sieve and the arXiv q-bio category to your feed reader if you want to spot exciting new articles and comment on them, and why not think about sending your next manuscript to arXiv first? (No, it doesn’t prevent you publishing in peer-reviewed journals.)

15 Responses

  1. caseybergman
    October 16, 2012 at 10:33 am |

    Couldn’t agree more with this pitch! One of the really cool features of posting to arXiv not mentioned here is that Google Scholar indexes arXiv preprints, which means that you can see who is citing your work before it is published, and you can improve the accuracy of the papers Scholar Update recommends for you. Also, Graham and Joe are doing a great job at getting people to write background commentaries on their own papers, which adds a lot of value to the Haldane’s Sieve site beyond discussion threads on the paper itself.

    P.S. our contribution to the microbial genome evolution field on arXiv and Haldane’s Sieve can be found here:

  2. jamesho008
    October 17, 2012 at 1:41 pm |

    A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing.

    Who’d have thought that Falcons and Hawks were more closely related to Parrots and Vultures respectively than to each other! Amazing what you can do with NGS.

  3. jxchong
    October 17, 2012 at 6:49 pm |

    Any thoughts on what those of us who aren’t doing pop gen or ecology/evo can do? (disease genetics for example)

  4. RosieRedfield
    October 17, 2012 at 7:48 pm |

    Don’t forget that our #arseniclife manuscript led the way – it was posted on arXiv last January:

  5. Richvn
    October 18, 2012 at 10:05 am |

    Nature reporter Ewen Callaway noted this trend in early August, and wrote a neat news article about it!

    Richard Van Noorden (reporter at Nature).

  6. Iddo
    October 18, 2012 at 12:43 pm |

    “No, it doesn’t prevent you publishing in peer-reviewed journals”

    Actually, Nick, even using your own reference, there is a considerable number of journals that would not let you publish with them if you have a public preprint. This seems to be a sweeping policy with CSHL Press, ASM and Cell Press. Also, although Wiley-Blackwell is listed as “mixed/unclear”, the journals I have looked at prohibit pre-publications. The culture of these journals reflects the historically prevalent culture in experimental life science, which does not consider prepublications a worthy effort. I guess what I am trying to say is that there is still a lot of work to be done to make prepublications as acceptable in life science as they are in other fields, as there are sentiments best described by the following statements: “I won’t go on arxiv because that would prohibit me from publishing later”, “Going on arxiv is worthless, I could get scooped and arxiv won’t count as a prior publication or even evidence of precedence”, “If I go on arxiv that signals to my colleagues that my work is not good enough to go somewhere ‘real’”. These sentiments need to be addressed (and not sneeringly dismissed) if any progress is to be made with prepublication culture in life science.

  7. joepickrell
    October 18, 2012 at 2:24 pm |


    Some of these things are discussed on our FAQ:
