<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pathogens: Genes and Genomes</title>
	<atom:link href="http://pathogenomics.bham.ac.uk/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://pathogenomics.bham.ac.uk/blog</link>
	<description>A heady mix of bacterial pathogenomics, next-generation sequencing, type-III secretion, bioinformatics and evolution!</description>
	<lastBuildDate>Mon, 20 May 2013 07:14:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Sequencing instruments by number</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/04/sequencing-instruments-by-number/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/04/sequencing-instruments-by-number/#comments</comments>
		<pubDate>Tue, 30 Apr 2013 15:58:57 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[High-throughput sequencing]]></category>
		<category><![CDATA[omicsmaps]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1682</guid>
		<description><![CDATA[A quick one: I made this plot earlier today for a presentation. The data is from our Omicsmaps site which I curate with James Hadfield. Didn&#8217;t actually use it in the end but thought I&#8217;d post it in case it was useful for someone. Notable is the rise in HiSeq reciprocated by the decline in [...]]]></description>
			<content:encoded><![CDATA[<p>A quick one: I made this plot earlier today for a presentation. The data is from our <a href="http://omicsmaps.com">Omicsmaps</a> site which I curate with James Hadfield. Didn&#8217;t actually use it in the end but thought I&#8217;d post it in case it was useful for someone. Notable is the rise in HiSeq reciprocated by the decline in GA2 placements, the plateauing of SOLiD and 454, and the inexorable rise of the bench-top instruments. </p>
<p>There are &gt;2500 instruments in the database, which seems a lot, but I assume it is just a fraction of the total installed base these days. Still interesting to see the relative trends I think.<br />
<a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/instrument_stats.png"><img src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/instrument_stats-1024x768.png" alt="" title="instrument_stats" width="1024" height="768" class="alignright size-large wp-image-1700" /></a></p>
<p>R code is here:</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
omics&lt;-read.csv(&quot;/Users/nick/Downloads/full_dump.txt&quot;)
newomics&lt;-reshape(omics,
                  varying=c(&quot;number_454&quot;, &quot;number_ga2&quot;, &quot;number_hiseq&quot;,
                           &quot;number_miseq&quot;, &quot;number_ion_torrent&quot;,
                           &quot;number_solid&quot;, &quot;number_pacbio&quot;),
                  v.names=&quot;value&quot;,
                  timevar=&quot;Platform&quot;,
                  times=c(&quot;number_454&quot;, &quot;number_ga2&quot;, &quot;number_hiseq&quot;,
                          &quot;number_miseq&quot;, &quot;number_ion_torrent&quot;, 
                          &quot;number_solid&quot;, &quot;number_pacbio&quot;),
                  direction=&quot;long&quot;)
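# shape=Platform is needed for scale_shape_manual() below to have any effect
stats&lt;-ggplot(newomics, aes(x=as_of_date, y=value, colour=Platform, shape=Platform)) +
        # ggplot2 3.3.0 and later use fun= here; older versions used fun.y=
        stat_summary(fun=&quot;sum&quot;, geom=&quot;point&quot;) +
        # theme()/element_text() replace the defunct opts()/theme_text()
        theme(axis.text.x=element_text(angle=45, vjust=1, hjust=1)) +
        scale_shape_manual(values = c(1,2,3,4,5,6,7)) +
        scale_y_continuous(&quot;Number of instruments&quot;) +
        scale_x_discrete(&quot;Date&quot;)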
ggsave(&quot;instrument_stats.png&quot;, stats, width=8, height=6)
ggsave(&quot;instrument_stats.pdf&quot;, stats, width=8, height=6)
</pre>
<p>The raw data can be downloaded from the public <a href="https://www.google.com/fusiontables/DataSource?docid=1tYRJ6qreHion4wWx4bd_TnL7WrmMGai63jKEHPw#rows:id=1">Google Fusion Table</a>.</p>
<p>Update: Bastien Chevreux mentioned that the colours were hard to distinguish, so I added shapes as well.</p>
<p><a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/3.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Number of sequencing machines by platform</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://pathogenomics.bham.ac.uk/blog/2013/04/sequencing-instruments-by-number/" property="cc:attributionName" rel="cc:attributionURL">Nick Loman</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Creative Commons Attribution 3.0 Unported License</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/04/sequencing-instruments-by-number/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Adaptor trim or die: Experiences with Nextera libraries</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die-experiences-with-nextera-libraries/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die-experiences-with-nextera-libraries/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 16:49:53 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Genomics]]></category>
		<category><![CDATA[High-throughput sequencing]]></category>
		<category><![CDATA[nextera]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1644</guid>
		<description><![CDATA[One of the first posts I did on this blog, way back in September 2009 was about my experiences with filtering and trimming Illumina sequences, and it proved rather popular. To date, it has been viewed a whopping 8,560 times! But funnily enough, since that post was written my attitude towards filtering Illumina data slowly [...]]]></description>
			<content:encoded><![CDATA[<p>One of the first posts I did on this blog, <a href="http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly">way back in September 2009</a>, was about my experiences with filtering and trimming Illumina sequences, and it proved rather popular. To date, it has been viewed a whopping 8,560 times!</p>
<p>But funnily enough, since that post was written my attitude towards filtering Illumina data has slowly changed. I was increasingly finding that aggressive quality trimming was making little to no difference to my <em>de novo</em> assemblies, and in some cases actually making them worse. The likely explanation was the evolution of Illumina base-calling accuracy. At the time I was dealing with early GA2 data, which had serious 3&#8242; quality drop-off issues, even with reads as short as 36 cycles (that&#8217;s the true definition of short-read sequencing!). Nowadays with the MiSeq and HiSeq instruments, read qualities still decline at the 3&#8242; end, but mainly remain usable. Increasingly I found that quality was not a major determinant for getting good quality <em>de novo</em> assemblies; rather, it came down to the usual old chestnuts of read length, coverage depth and insert size (for paired-end sequencing).</p>
<p>However, a recent analysis of a troublesome dataset has led me to revise my thoughts on the need to trim sequences routinely. This is due to the introduction of Nextera library preparations. Nextera is a really useful technique; it uses a transpososome (a transposase and transposon end complex) to fragment the genome (semi-)randomly with the addition of a specific sequence. With an additional round of PCR, sequencing adaptors and multiplexing barcodes are incorporated into the fragment ends. Simples, and no need for physical shearing methods. You clean up short fragments with AMPure beads, and there is an optional size-selection step which many people opt not to do (of which more later in this post).</p>
<p>However, there is a fly in the ointment, which is becoming a major problem now the MiSeq is targeting ever longer read lengths. The libraries we have seen often have a short median fragment size, sometimes less than 200 bases. When combined with the MiSeq V2 Illumina 500-cycle kits run in paired 250 base mode this means that you will frequently be reading through the adaptor into some kind of crazy void-space (does this have a name?). Unless you routinely size select your fragments, the reads will often be longer than the fragments they come from.</p>
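<p>To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes "fragment size" means the genomic insert without adaptors; the function name and the example fragment sizes are made up for illustration, and only the 250-base read length comes from the post.</p>

```python
def adaptor_readthrough(read_length, fragment_length):
    """Number of cycles sequenced past the end of the genomic insert."""
    return max(0, read_length - fragment_length)


# Illustrative fragment sizes only; 250 is the read length quoted above.
for fragment in (150, 200, 300):
    extra = adaptor_readthrough(250, fragment)
    print(f"{fragment}-base fragment: {extra} of 250 cycles fall outside the insert")
```

<p>Under these assumptions, a 200-base fragment read for 250 cycles leaves 50 cycles of adaptor (and whatever lies beyond it) at the 3&#8242; end of each read.</p>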
<p>Other than being a waste of sequencing reagents, this proves surprisingly fatal to <em>de novo</em> assembly, presumably because the adaptors form highly-connected nodes in the assembly graph which prevent contigs forming. It&#8217;s not such a big deal with mapping applications, as most short-read aligners will happily soft-clip the unmapped 3&#8242; bases off the read.</p>
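<p>A toy Python illustration of that presumed explanation: if every read ends in the same adaptor, the adaptor k-mers are shared by reads from completely unrelated parts of the genome. The adaptor below is a made-up placeholder (not a real Nextera sequence), the inserts are invented, and k=5 is far smaller than anything an assembler would use.</p>

```python
from collections import Counter

K = 5  # toy k-mer size, chosen only to keep the example small
ADAPTOR = "ACGTTGCAAC"  # made-up placeholder, NOT a real Nextera sequence

# Three invented inserts from unrelated loci, each read through into the adaptor.
inserts = ["GGATCCTAGA", "TTCAGGCATC", "CCGATATGGT"]
reads = [insert + ADAPTOR for insert in inserts]


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# For each k-mer, count how many different reads contain it.
reads_per_kmer = Counter(kmer for read in reads for kmer in kmers(read, K))
shared = sorted(kmer for kmer, n in reads_per_kmer.items() if n == len(reads))
print(shared)
```

<p>In this toy case the only k-mers present in all three reads are the adaptor&#8217;s own, which is the kind of node that would tie otherwise unrelated paths together in a k-mer graph.</p>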
<p>I was recently asked to look at a dataset which exhibited a case of this phenomenon so extreme that I thought it might be helpful to share with others. These results are from a bacterial whole-genome shotgun sequencing project, where assembly of the reads was giving particularly terrible results.</p>
<p>How terrible? Well, here&#8217;s a Velvet assembly of the raw, untrimmed reads, using some arbitrary settings (k=43, exp_cov auto, cov_cutoff auto). The Velvet commands were:</p>
<pre>velvet_1.2.07/velveth out 43 -shortPaired -fastq -separate fastqfile1 fastqfile2
velvet_1.2.07/velvetg out -exp_cov auto -cov_cutoff auto
..
Final graph has 3778954 nodes and n50 of 30, max 540, total 51121254, using 170538/3413828 reads
</pre>
<p>Bear in mind this is a fairly typical, GC-rich bacterial genome. The stats tell us that from the &gt;3.4m 250-base reads in the dataset we end up with 3.7 million contigs, with an N50 of 30! We don&#8217;t need the Assemblathon in this case to know that this is … bad. A few things are notable here: despite having &gt;3.4m reads, Velvet is reporting that the median coverage depth is 1.0, and only 170,538 of the 3,413,828 reads (about 5%) ended up being used. Something ain&#8217;t right, clearly.</p>
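<p>For anyone who has not met N50 before, here is a short Python sketch of the usual textbook definition. It is not taken from Velvet&#8217;s source, and the contig lengths in the usage note are invented.</p>

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")
```

<p>For example, <code>n50([80, 70, 50, 40, 30, 20])</code> returns 70: the two longest contigs hold 150 of the 290 bases. An N50 of 30 therefore means half of the assembly sits in pieces no longer than about 30.</p>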
<p>Of course we have been naughty and done an assembly without QCing our data first. Here&#8217;s the FastQC plot of the first pair from the untrimmed dataset:</p>
<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_qscores.png"><img class="alignright size-large wp-image-1647" title="pretrimmed_qscores" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_qscores-1024x638.png" alt="" width="1024" height="638" /></a><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_nuccomp.png"><img class="alignright size-large wp-image-1648" title="pretrimmed_nuccomp" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_nuccomp-1024x644.png" alt="" width="1024" height="644" /></a><br />
<a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_gcdist.png"><img class="alignright size-large wp-image-1649" title="pretrimmed_gcdist" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/pretrimmed_gcdist-1024x635.png" alt="" width="1024" height="635" /></a></p>
<p>Corrrr &#8230; don&#8217;t like the look of that very much.</p>
<p>Now, after going and letting off steam at your sequencing centre, you might be tempted to trim this dataset down and try and salvage something from it, right?</p>
<p>Heng Li&#8217;s seqtk is as good as anything for a quick and dirty trim. This uses the Phred trimming algorithm, which scores each base as a threshold minus that base&#8217;s error probability (derived from its quality score) and keeps the maximum-scoring subsequence. The default threshold is an error rate of 0.05.</p>
<pre>seqtk trimfq fastqfile1 &gt; fastqfile1_trimmed
seqtk trimfq fastqfile2 &gt; fastqfile2_trimmed
</pre>
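<p>The idea behind this kind of Phred-style trimming can be sketched in a few lines of Python. This is a simplified illustration of the maximum-scoring-subsequence approach, not seqtk&#8217;s actual code; the function name is made up and details such as minimum read length are left out.</p>

```python
def mott_trim(quals, threshold=0.05):
    """Return (start, end) of the best-scoring window, as a half-open interval.

    quals: Phred quality scores, one integer per base.
    Each base scores threshold - P(error); the window with the largest
    running sum is kept and everything outside it is trimmed.
    """
    best = 0.0
    best_start = best_end = 0
    running = 0.0
    start = 0
    for i, q in enumerate(quals):
        running += threshold - 10 ** (-q / 10)
        if running > best:
            best, best_start, best_end = running, start, i + 1
        if running < 0:
            # A negative sum can never help a later window, so restart here.
            running = 0.0
            start = i + 1
    return best_start, best_end
```

<p>With the invented qualities <code>[2, 2, 30, 35, 38, 36, 30, 8, 5, 2]</code> this returns <code>(2, 7)</code>, i.e. the low-quality bases at both ends are dropped. Note that nothing here looks at the sequence itself, so high-quality adaptor bases are left alone.</p>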
<p>Let&#8217;s check out the FastQC plots now. They look nicer, right?</p>
<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/posttrimmed_qscores.png"><img class="alignright size-large wp-image-1656" title="posttrimmed_qscores" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/posttrimmed_qscores-1024x640.png" alt="" width="1024" height="640" /></a></p>
<p>So, we&#8217;d expect better results from Velvet, right? Let&#8217;s re-run the fun:</p>
<pre>Final graph has 3778929 nodes and n50 of 30, max 540, total 51120953, using 170555/3413828 reads</pre>
<p>COMPUTER SAYS NO.</p>
<p>OK, what&#8217;s going on? There are a few things we could do to prove this is adaptor contamination. We could try and estimate the insert size by mapping to a reference sequence; that would give a reasonable hint that adaptor contamination is the problem. But let&#8217;s pretend we don&#8217;t have a reference, and we can&#8217;t even assemble the sequences to map them back!</p>
<p>Is there a clue in the FastQC plots?</p>
<p>Looking at the regular FastQC k-mer plot there is some wackiness going on:</p>
<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/Screen-Shot-2013-04-17-at-11.10.19.png"><img class="alignright size-large wp-image-1660" title="Screen Shot 2013-04-17 at 11.10.19" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/Screen-Shot-2013-04-17-at-11.10.19-1024x172.png" alt="" width="1024" height="172" /></a></p>
<p>There&#8217;s a definite enrichment for some k-mers early in the read, but this doesn&#8217;t really give us enough information to answer the question.</p>
<p>Luckily, we can ask FastQC to check for k-mers of length 10 instead (I had to increase heap memory to get Java to play ball here):</p>
<pre>
fastqc -k 10 fastqfile1
</pre>
<p>OK</p>
<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/fastqc_kmers.png"><img class="alignright size-large wp-image-1651" title="fastqc_kmers" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/fastqc_kmers-1024x725.png" alt="" width="1024" height="725" /></a></p>
<p>Well, it&#8217;s now fairly clear that many of the enriched k-mers form part of the Nextera adaptor sequence (I have not put this in the post as Illumina are a bit funny about them but you can find them easily enough from a Google search), and they occur as early as 35 bases into the read.</p>
<p>There is also enrichment for two other k-mers later on in the read. Guessing, I would think these are k-mers from the barcode for this sample.</p>
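<p>As a rough idea of what a positional k-mer check is looking for, here is a Python sketch that records where in the reads each 10-mer starts. It only does raw counting and is not how FastQC computes its enrichment figures; the reads and the "adaptor" are invented placeholders.</p>

```python
from collections import Counter, defaultdict


def kmer_positions(reads, k=10):
    """Map each k-mer to a Counter of the 0-based read positions where it starts."""
    positions = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            positions[read[i:i + k]][i] += 1
    return positions


# Toy reads: a made-up 10-base "adaptor" (not a real Illumina sequence)
# follows inserts of different lengths, so it starts at different offsets.
ADAPTOR = "ACGTTGCAAC"
reads = ["GGATC" + ADAPTOR + "AAAAA",
         "TTCAG" + ADAPTOR + "CCCCC",
         "CCGATATG" + ADAPTOR + "GG"]

for pos, count in sorted(kmer_positions(reads)[ADAPTOR].items()):
    print(pos, count)
```

<p>Here the placeholder adaptor starts at offset 5 in two reads and offset 8 in the third. In real data, a k-mer that keeps turning up from a given position onwards across many reads is a strong hint of read-through.</p>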
<p>OK, let&#8217;s trim those suckers off. I like to use <a href="http://www.usadellab.org/cms/index.php?page=trimmomatic">Trimmomatic</a>. Some people like <a href="https://github.com/vsbuffalo/scythe">Scythe</a> (I haven&#8217;t tried it, but it&#8217;s by the awesome Vince Buffalo so it&#8217;s bound to be good). There used to be a lovely Wiki page comparing all the trimmers, but I can&#8217;t find it right now (please post in the comments!)</p>
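<p>What an adaptor trimmer does can be sketched, very crudely, in Python. This version only handles exact matches and is not how Trimmomatic or Scythe work internally (real tools tolerate mismatches and can use read pairs); the function name, the <code>min_overlap</code> parameter and the sequences in the usage note are all hypothetical.</p>

```python
def trim_adaptor(read, adaptor, min_overlap=5):
    """Cut the read at the first full adaptor match, or at a 3' partial match."""
    hit = read.find(adaptor)
    if hit != -1:
        return read[:hit]
    # Otherwise look for an adaptor prefix running off the 3' end of the read.
    for overlap in range(min(len(adaptor), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adaptor[:overlap]):
            return read[:len(read) - overlap]
    return read
```

<p>With the placeholder adaptor <code>"ACGTTGCAAC"</code>, <code>trim_adaptor("GGATCCTAGAACGTTGCAACTTTT", "ACGTTGCAAC")</code> returns <code>"GGATCCTAGA"</code>, and a read with no adaptor in it comes back unchanged.</p>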
<p>Re-run Velvet, one last time, on the adaptor trimmed dataset:</p>
<pre>
Median coverage depth = 34.625060
Final graph has 851 nodes and n50 of 64229, max 227743, total 7908057, using 2978347/3133288 reads
</pre>
<p>That&#8217;s a bit more like it! Of course much better contig stats would have been achieved had the median insert size been greater than 250 bases, ideally greater than 450 bases.</p>
<p><em>Many thanks to Ruth Miller of the British Columbia CDC for allowing me to share this instructive dataset on the blog!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/04/adaptor-trim-or-die-experiences-with-nextera-libraries/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Balti and Bioinformatics: Midlands Sequencing and Bioinformatics Meeting: Monday, 20th May 2013</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/04/balti-and-bioinformatics-midlands-sequencing-and-bioinformatics-meeting-monday-20th-may-2013/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/04/balti-and-bioinformatics-midlands-sequencing-and-bioinformatics-meeting-monday-20th-may-2013/#comments</comments>
		<pubDate>Tue, 16 Apr 2013 12:36:08 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Genomics]]></category>
		<category><![CDATA[balti]]></category>
		<category><![CDATA[balti and bioinformatics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[lamingtons]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1622</guid>
		<description><![CDATA[It&#8217;s that time again! Balti and bioinformatics will return .. on Monday, 20th May 2013. Not dissimilar to last time, we will feature complimentary samosas from Smethwick&#8217;s finest sweet shop, and hopefully without the bit where about 40 people had to crowd into my office. Spaces will be limited, please register to attend via the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/baltibioinformatics.jpg"><img class="alignright size-large wp-image-1628" title="baltibioinformatics" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/baltibioinformatics-1024x231.jpg" alt="" width="1024" height="231" /></a></p>
<p>It&#8217;s that time again! Balti and bioinformatics will return .. on Monday, 20th May 2013. Not dissimilar to last time, we will feature complimentary samosas from Smethwick&#8217;s finest sweet shop, and hopefully without the bit where about 40 people had to crowd into my office. Spaces will be limited, so please <a href="https://docs.google.com/forms/d/19CJXKdMq_mOwnlVrcTR1LbUdcr-E9hFVRHn4fqx-QUg/viewform">register to attend via the Google web form</a> (it&#8217;s free!).</p>
<p>Programme:</p>
<p><strong>West Midlands Bioinformatics &amp; Sequencing Meeting: Balti and bioinformatics<br />
</strong></p>
<p>Monday, 20th May 2013</p>
<p>Venue: Room 311, Geography Building, University of Birmingham (not <del>Centre for Systems Biology, Haworth Building, University of Birmingham</del>)</p>
<p>The Geography Building is adjacent to Biosciences (R26 on the <a href="http://www.birmingham.ac.uk/Documents/university/edgbaston-map.pdf">campus map here</a>)</p>
<p>If you are coming by car, please see the University travel website for details on car parking. The <a href="http://www.birmingham.ac.uk/contact/directions/edgbaston-directions.aspx">South Car Park</a> is usually most convenient. </p>
<p>If you are coming by train, the Geography Building is a very short walk from the University (Birmingham) train station, which in turn is 2 stops from Birmingham New Street on the Redditch line.</p>
<p>12:30 Registration</p>
<p>13:00 Introduction</p>
<p><strong>The Antipodean Invasion</strong> (Lamingtons not supplied)</p>
<p>13:10 &#8220;Torsten&#8217;s toolkit: VAGUE, Prokka, Nesoni, VelvetOptimiser and more&#8221; &#8211; Torsten Seemann, Monash University / Victorian Bioinformatics Consortium, Australia</p>
<p>13:40 &#8220;Tutorials in the cloud: Microbial genomics using Galaxy&#8221; &#8211; Simon Gladman, CSIRO / Victorian Bioinformatics Consortium, Australia</p>
<p>14:10 &#8220;The genomics of footrot disease&#8221; &#8211; Dieter Bulach, Victorian Bioinformatics Consortium, Australia</p>
<p>14:20 Samosa break</p>
<p>14:40 &#8220;Doctorin&#8217; the TraDIS: Analysis of data derived from Transposon-directed insertion-site sequencing&#8221;, Roy Chaudhuri, Centre for Genomic Research, University of Liverpool</p>
<p>15:10 &#8220;Infrastructure isn&#8217;t just hardware: TGAC&#8217;s tools supporting data-driven informatics&#8221; &#8211; Robert Davey, The Genome Analysis Centre (TGAC), Norwich</p>
<p>15:40 &#8220;Norovirus genomics&#8221;, Liz Batty, Nuffield Department of Medicine, University of Oxford</p>
<p>16:00 &#8220;Experiments with Nextera&#8221; &#8211; Jelena Sostare, University of Birmingham</p>
<p>16:10 Bioinformatics clinic</p>
<p>16:30 Discussion, depart for balti in taxis</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/04/balti-and-bioinformatics-midlands-sequencing-and-bioinformatics-meeting-monday-20th-may-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Goodbye and thank you to Birmingham!</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/#comments</comments>
		<pubDate>Thu, 28 Mar 2013 07:57:29 +0000</pubDate>
		<dc:creator>Mark Pallen</dc:creator>
				<category><![CDATA[Academic life]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1587</guid>
		<description><![CDATA[Well, today is my last working day at the University of Birmingham, before I set off after Easter for a new position at the University of Warwick as Professor of Microbial Genomics and Head of a new Division of Microbiology and Infection in Warwick Medical School. I have been at the University of Birmingham since [...]]]></description>
			<content:encoded><![CDATA[<p>Well, today is my last working day at the University of Birmingham, before I set off after Easter for a new position at the University of Warwick as Professor of Microbial Genomics and Head of a new Division of Microbiology and Infection in Warwick Medical School. I have been at the University of Birmingham since July 2001 and there is no question that my time here has been the pinnacle of my professional life (at least up till now: of course, the best is still to come!). I move on with a sense of gratitude at having had the privilege of getting to know and work with so many great people and leave with a head stuffed full of wonderful memories. Sadly for me, the co-author of this blog, Nick Loman, is not moving to Warwick with me, but has instead taken up a permanent position in Birmingham. But I draw comfort from the fact that geographically we will still only be just 40 minutes&#8217; drive apart and psychologically will remain united in our appreciation of all things cool and quirky at the conjunction of sequencers, sequences and software!</p>
<p>This will be my last blog post here, but in any case Nick has already largely made this blog his own. I have set up a new blog for my new life in the new Division of Microbiology and Infection, drawing on an old title that Nick will recognise from when we first met in the late 1990s: <a title="Microbial Underground" href="http://blogs.warwick.ac.uk/microbialunderground/" target="_blank">the Microbial Underground</a>: catch up on news from Warwick there after Easter and follow us (<a title="@WarwickMicrobio" href="http://twitter.com/WarwickMicrobio" target="_blank">@WarwickMicrobio</a>) on Twitter too!</p>
<p>I did start writing a long discursive ramble through my memories of my time in Birmingham, but in the end I have decided to sign off with some pictures that encapsulate all the good times! Goodbye Birmingham and thank you!</p>
<p>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/image006/' title='Image006'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/Image006-150x150.jpg" class="attachment-thumbnail" alt="Image006" title="Image006" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/imgp0605/' title='IMGP0605'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMGP0605-150x150.jpg" class="attachment-thumbnail" alt="IMGP0605" title="IMGP0605" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/n710560649_3663197_7329/' title='n710560649_3663197_7329'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/n710560649_3663197_7329-150x150.jpg" class="attachment-thumbnail" alt="n710560649_3663197_7329" title="n710560649_3663197_7329" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/olympus-digital-camera/' title='OLYMPUS DIGITAL CAMERA'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/Bringing_out_the_bubbly_2-150x150.jpg" class="attachment-thumbnail" alt="OLYMPUS DIGITAL CAMERA" title="OLYMPUS DIGITAL CAMERA" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/p1010033/' title='P1010033'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/P1010033-150x150.jpg" class="attachment-thumbnail" alt="P1010033" title="P1010033" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/imgp0614/' title='IMGP0614'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMGP0614-150x150.jpg" class="attachment-thumbnail" alt="IMGP0614" title="IMGP0614" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/n710560649_1635565_8401/' title='n710560649_1635565_8401'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/n710560649_1635565_8401-150x150.jpg" class="attachment-thumbnail" alt="n710560649_1635565_8401" title="n710560649_1635565_8401" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/4191725553_79a4314a2d_b/' title='4191725553_79a4314a2d_b'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/4191725553_79a4314a2d_b-150x150.jpg" class="attachment-thumbnail" alt="4191725553_79a4314a2d_b" title="4191725553_79a4314a2d_b" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/imgp0591/' title='IMGP0591'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMGP0591-150x150.jpg" class="attachment-thumbnail" alt="IMGP0591" title="IMGP0591" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_2213/' title='IMG_2213'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_2213-150x150.jpg" class="attachment-thumbnail" alt="IMG_2213" title="IMG_2213" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_1260/' title='IMG_1260'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_1260-150x150.jpg" class="attachment-thumbnail" alt="IMG_1260" title="IMG_1260" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/dscf4365/' title='DSCF4365'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/DSCF4365-150x150.jpg" class="attachment-thumbnail" alt="DSCF4365" title="DSCF4365" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/origin-album-cover/' title='Origin Album cover'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/Origin-Album-cover-150x150.jpg" class="attachment-thumbnail" alt="Origin Album cover" title="Origin Album cover" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_0007/' title='IMG_0007'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_0007-150x150.jpg" class="attachment-thumbnail" alt="IMG_0007" title="IMG_0007" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/p2130068/' title='P2130068'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/P2130068-150x150.jpg" class="attachment-thumbnail" alt="P2130068" title="P2130068" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_0623/' title='IMG_0623'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_0623-150x150.jpg" class="attachment-thumbnail" alt="IMG_0623" title="IMG_0623" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_0019/' title='IMG_0019'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_0019-150x150.jpg" class="attachment-thumbnail" alt="IMG_0019" title="IMG_0019" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_0819/' title='IMG_0819'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/6176111195_7a3de73bd5_o-150x150.jpg" class="attachment-thumbnail" alt="IMG_0819" title="IMG_0819" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_1314/' title='IMG_1314'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_1314-150x150.jpg" class="attachment-thumbnail" alt="IMG_1314" title="IMG_1314" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_1337/' title='IMG_1337'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_1337-150x150.jpg" class="attachment-thumbnail" alt="IMG_1337" title="IMG_1337" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_1338/' title='IMG_1338'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_1338-150x150.jpg" class="attachment-thumbnail" alt="IMG_1338" title="IMG_1338" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/img_1360/' title='IMG_1360'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_1360-150x150.jpg" class="attachment-thumbnail" alt="IMG_1360" title="IMG_1360" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/dscf8121/' title='DSCF8121'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/DSCF8121-150x150.jpg" class="attachment-thumbnail" alt="DSCF8121" title="DSCF8121" /></a>
<a href='http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/dscf8114/' title='DSCF8114'><img width="150" height="150" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/DSCF8114-150x150.jpg" class="attachment-thumbnail" alt="DSCF8114" title="DSCF8114" /></a>
</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/03/goodbye-and-thank-you-to-birmingham/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowd-sourcing killer outbreaks: Nice video from the BBSRC and Arran Frood</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/03/crowd-sourcing-killer-outbreaks-nice-video-from-the-bbsrc-and-arran-frood/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/03/crowd-sourcing-killer-outbreaks-nice-video-from-the-bbsrc-and-arran-frood/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 15:41:02 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1572</guid>
		<description><![CDATA[The BBSRC have done a nice job making a short video about the E. coli O104:H4 outbreak crowd-sourcing project, featuring little old me as well as the far more telegenic Lisa Crossman. Check it out, it&#8217;s got some spooky music too. Also please check out the OpenAshDieBack crowd-sourcing project currently ongoing, coordinated by the chaps [...]]]></description>
			<content:encoded><![CDATA[<p>The BBSRC have done a nice job making a short video about the <a href="https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki">E. coli O104:H4 outbreak crowd-sourcing project</a>, featuring little old me as well as the far more telegenic Lisa Crossman. Check it out, it&#8217;s got some spooky music too.</p>
<p><iframe width="928" height="522" src="http://www.youtube.com/embed/ttMnQIE-P-s?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Also please check out the <a href="http://oadb.tsl.ac.uk/">OpenAshDieBack crowd-sourcing project</a> currently ongoing, coordinated by the chaps at the John Innes Centre, The Sainsbury Lab and TGAC.</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/03/crowd-sourcing-killer-outbreaks-nice-video-from-the-bbsrc-and-arran-frood/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A chat with Oxford Nanopore&#8217;s Clive Brown at AGBT 2013</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/03/a-chat-with-oxford-nanopores-clive-brown-at-agbt-2013/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/03/a-chat-with-oxford-nanopores-clive-brown-at-agbt-2013/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 10:20:11 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1559</guid>
		<description><![CDATA[Don&#8217;t judge me, reader, because I&#8217;d skipped a session at AGBT to go and have a swim in the sea. A man can only spend so much time in dimly-lit, low-ceilinged hotel conference rooms, popping low-sugar sweets, before the will to live ebbs away. On returning to the conference, passing the bar I spotted a [...]]]></description>
			<content:encoded><![CDATA[<p>Don&#8217;t judge me, reader, because I&#8217;d skipped a session at AGBT to go and have a swim in the sea. A man can only spend so much time in dimly-lit, low-ceilinged hotel conference rooms, popping low-sugar sweets, before the will to live ebbs away.</p>
<p>On returning to the conference, passing the bar I spotted a distinctive bald head. Wait. I recognise that guy. Was it him?</p>
<p>I reversed and took another peek. Yes, it was Clive Brown, deep in a meeting. &#8220;Hello Clive!&#8221; He looked up, slightly grumpy to be interrupted mid-flow. &#8220;I&#8217;m Nick Loman, wasn&#8217;t expecting to see you here!&#8221; &#8220;Oh hello Nick.&#8221; I caught a glimpse of a prototype MinION on the table. &#8220;Hah, yeah, I&#8217;ve buried three of those on the beach. Tell everyone!&#8221;</p>
<p>We meet in the bar the next day. Clive talks at machine-gun pace, whilst fiddling with a prototype MinION which is on the desk, repeatedly taking it apart and reassembling it, like a soldier checking his gun before battle. It feels weighty, substantial, larger than the version announced previously. It&#8217;s got a mini-connector for USB3. &#8220;Feels a bit too expensive for a disposable sequencer, needs to be more plastic-y&#8221;, I venture. Clive agrees.</p>
<p><a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_16522.jpg"><img class="alignright size-large wp-image-1562" title="IMG_1652" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/IMG_16522-1024x768.jpg" alt="" width="1024" height="768" /></a></p>
<p>Clive is angry. He feels he&#8217;s been treated unreasonably by the community, and the press, since AGBT 2012&#8217;s electrifying announcements. &#8220;I&#8217;m bloody sequencing single molecules directly on this little device here!&#8221;. The implication is that no-one should be surprised it&#8217;s taken longer than expected to be released. He is unapologetic.</p>
<p>Clive is angry. He&#8217;s angry with the guy that marched to the nanopore suite at AGBT, banging and shouting through the door: &#8220;Where is your data? Show us the data!!&#8221;.</p>
<p>Clive is angry with reporters who keep asking him the same questions: why they haven&#8217;t released data, why they haven&#8217;t fulfilled their promise of commercialisation by the end of 2012.</p>
<p>Clive has a list. A list of people who, he says, he&#8217;ll see to it won&#8217;t get a MinION when it comes out. I can&#8217;t tell if he&#8217;s joking.</p>
<p>So why didn&#8217;t you release some data, Clive? He tells me that the raw signal data is commercially valuable, that someone in the business could take the traces and reverse engineer details of their customised nanopore this way. The idea that other parties could steal information to further their own nanopore projects is a recurring theme in our chat.</p>
<p>So why didn&#8217;t the MinION come out in 2012? Technically, he lists several setbacks. The custom sensor microchip (ASIC) wasn&#8217;t performing as they wanted, necessitating a redesign from scratch. &#8220;That put us back about 5 months, but it was the right thing to do&#8221;. There have also been problems stabilising the lipid bilayer, which degrades over days and weeks. He set his team a new error-rate target of 1%, a major improvement from the 4% error rate announced at AGBT.</p>
<p>I venture the idea that even if the MinION is a year, two years late, if it&#8217;s half as good as he says it is, all will be forgotten. Like waiting for the next version of Quake or Grand Theft Auto.</p>
<p>&#8220;It&#8217;s not going to be that long, we&#8217;re going to start announcing stuff this year, including data from our early-access programme.&#8221;</p>
<p>Why don&#8217;t you engage with the community better? I suggest that no tweets and no web updates isn&#8217;t a good look for a company with so many eyes on it. He says that they have to be careful about putting any information out there right now, in case it is used against them. He suggests that now Zoe McDougall, their communications director, is back from maternity leave, they will improve their communication with the community.</p>
<p>Technical breakthroughs. They&#8217;ve found that error rates can be improved by having multiple nanopores on the chip with different properties, and then merging the data. Some nanopores are better at recognising certain nucleotide signatures than others, and so they can be complementary. This is a hint that consensus accuracy might ultimately be important, a la Pacific Biosciences. ** see footnote</p>
<p>He&#8217;s keen on the idea of nanopore as a disruptive technology for proteomics, citing the <a href="http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2503.html">unfoldase</a> that should permit proteins to pass through the pore.</p>
<p>Clive is a man under pressure. I genuinely got the impression that the company were caught off-guard by all the attention and had no idea they would be under such scrutiny.</p>
<p>&#8220;We didn&#8217;t even know that long reads were so important to people until after that AGBT presentation.&#8221; He explains that he set his team a technical challenge to go from 20kb to 50kb to 100kb, simply because he likes pushing them further than they think they can go. His focus on getting error rate down to 1% results from similar pushing, sometimes to the chagrin of the commercial side of the operation.</p>
<p>Clive is guarded, and regularly checks himself, ensuring he doesn&#8217;t say anything that would &#8220;get him in trouble&#8221;.</p>
<p>&#8220;You know what, I hated doing that presentation at AGBT. I had to hide in my room for two days afterwards.&#8221;</p>
<p>&#8220;I&#8217;m not Jonathan Rothberg&#8221;.</p>
<p>What do I think? I find it hard to simply write off nanopore as vapourware, as some seem happy to do. There is a great group of people in this company, and frankly it just wouldn&#8217;t be cricket to promise so much without delivering. I will wait and see. I feel sure the conversation will have moved on by AGBT 2014.</p>
<p>&#8220;I want to believe&#8221; as they might say on the X-Files.</p>
<p>Plus, I don&#8217;t want to end up on Clive&#8217;s list.</p>
<p>&nbsp;</p>
<p>** Clive has written to clarify this point: <em>I didnt mean that as an alternative to raw read, but it came up repeatedly during the conference that a number of early access groups are trying to do major projects to &#8220;improve the reference&#8221; of their given organism. They are currently mixing a number of short reads from different technologies and without the long reads, they have difficulty assembling (a major use of PB data). I have noticed that with two pores (or more) we effectively have two orthogonal error modes, which means this kind of improved reference, with assembly, can be done economically on one platform – which should be a lot easier.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/03/a-chat-with-oxford-nanopores-clive-brown-at-agbt-2013/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Applied Bioinformatics &amp; Public Health Microbiology: 15 – 17 May 2013</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/03/applied-bioinformatics-public-health-microbiology-15-17-may-2013/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/03/applied-bioinformatics-public-health-microbiology-15-17-may-2013/#comments</comments>
		<pubDate>Mon, 04 Mar 2013 12:14:00 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Genomic Epidemiology]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1549</guid>
		<description><![CDATA[The awesome ABPHM meeting is back in 2013! This is a really nice conference that I am very happy to help organise. It&#8217;s a bit different from other public health microbiology conferences in that it specifically aims to bring together public health microbiologists and epidemiologists with bioinformaticians. Once we have everyone in the same room, [...]]]></description>
			<content:encoded><![CDATA[<p>The awesome ABPHM meeting is back in 2013! This is a really nice conference that I am very happy to help organise. It&#8217;s a bit different from other public health microbiology conferences in that it specifically aims to bring together public health microbiologists and epidemiologists with bioinformaticians. Once we have everyone in the same room, we try and understand what each other does a little better!</p>
<p>High-throughput sequencing has been high on the agenda for the past few meetings, and I expect this to be the case again. However, I expect this meeting will start to focus on the practical and logistical aspects of getting WGS of bacterial isolates into the microbiology lab for routine use in hospital and community outbreak tracing and surveillance of important pathogens.</p>
<p>The last meeting in 2011 was notable for taking place right in the middle of the <em>E. coli</em> O104:H4 outbreak in Germany, and BGI released reads from an isolate that triggered the crowd-sourcing initiative during the meeting!</p>
<p>My job on the committee, alongside Jon Green from the HPA is to try and represent for the bioinformaticians, and so I am really pleased that we have a couple of high-profile international speakers who know their way around a bash shell: <strong>Aaron Darling</strong>, of Mauve/progressiveMauve fame (until recently of Jonathan Eisen&#8217;s lab) will be talking about his tools and work as will <strong>Torsten Seemann</strong>, author of incredibly useful assembly tools including VelvetOptimiser.</p>
<p>Julian Parkhill and Sharon Peacock will both be speaking about their current work, always something to look forward to.</p>
<p>Also of note is that <strong>Oxford Nanopore</strong> and <strong>Illumina</strong> are sponsoring the meeting!</p>
<p>If you do bioinformatics for infectious disease outbreaks or public health surveillance, I strongly recommend you register. Also put in an abstract (deadline <strong>20th March!</strong>) as we like to select many talks from submissions. It is a unique grouping of people and the talks are always good. This time it will be held at the Moller Centre, which is near Cambridge city centre rather than in Hinxton. It&#8217;s a really nice venue, and it means that instead of heading to the <strong>Red Lion</strong>, this time we can take an out-trip to <strong>The Eagle</strong> and perhaps also <strong>The Panton Arms</strong>.</p>
<p>Head over to the <a href="https://registration.hinxton.wellcome.ac.uk/display_info.asp?id=313">event website</a> for the agenda and the registration form.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/03/applied-bioinformatics-public-health-microbiology-15-17-may-2013/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Loman&#8217;s law of bioinformatics</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/02/lomans-law-of-bioinformatics/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/02/lomans-law-of-bioinformatics/#comments</comments>
		<pubDate>Fri, 15 Feb 2013 15:07:05 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1394</guid>
		<description><![CDATA[Loman&#8217;s law of bioinformatics states: If you haven&#8217;t found at least one serious bug in the bioinformatics pipeline you are re-using then you don&#8217;t yet understand it. &#160; Loman&#8217;s second law of pipelines: By the time you&#8217;ve got someone else&#8217;s pipeline working to your satisfaction you could have written your own.]]></description>
			<content:encoded><![CDATA[<p>Loman&#8217;s law of bioinformatics states:</p>
<blockquote><p>If you haven&#8217;t found at least one serious bug in the bioinformatics pipeline you are re-using then you don&#8217;t yet understand it.</p></blockquote>
<p>&nbsp;</p>
<p>Loman&#8217;s second law of pipelines: By the time you&#8217;ve got someone else&#8217;s pipeline working to your satisfaction you could have written your own.</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/02/lomans-law-of-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sequencing data: I want the truth! (You can&#8217;t handle the truth!)</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2013/01/sequencing-data-i-want-the-truth-you-cant-handle-the-truth/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2013/01/sequencing-data-i-want-the-truth-you-cant-handle-the-truth/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 10:47:57 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[Genomics]]></category>
		<category><![CDATA[High-throughput sequencing]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1514</guid>
		<description><![CDATA[Two sequencing papers caught my eye this week. This letter from Piskol and Li  is perhaps the final nail in the coffin for the heavily criticised and debunked (also see: GenomesUnzipped) RNA editing paper from Li and Cheung published in Science in early 2011 (as Thomas Keane said on Twitter: &#8216;I can&#8217;t believe people are still debating this!). The letter Piskol and Li examined [...]]]></description>
			<content:encoded><![CDATA[<p>Two sequencing papers caught my eye this week.<a href="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/You-Cant-Handle-the-Truth.jpg"><img class="alignright size-full wp-image-1516" title="You Can't Handle the Truth" src="http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/You-Cant-Handle-the-Truth.jpg" alt="" width="350" height="238" /></a></p>
<p>This letter from Piskol and Li is perhaps the final <a href="http://www.nature.com/nbt/journal/v31/n1/full/nbt.2472.html">nail in the coffin</a> for the heavily <a href="http://www.genomesunzipped.org/2011/05/notes-on-the-evidence-for-extensive-rna-editing-in-humans.php">criticised</a> and <a href="http://genomebiology.com/2012/13/4/r26/">debunked</a> (also see: <a href="http://www.genomesunzipped.org/2012/03/questioning-the-evidence-for-non-canonical-rna-editing-in-humans.php">GenomesUnzipped</a>) RNA editing paper from <a href="http://www.sciencemag.org/content/333/6038/53">Li and Cheung</a> published in Science in early 2011 (as Thomas Keane said on Twitter: &#8216;I can&#8217;t believe people are still debating this!&#8217;).</p>
<p>In the letter, Piskol and Li examined the claim of &#8220;non-canonical&#8221; RNA editing, i.e. post-transcriptional editing differing from the two known types, adenosine-to-inosine (A-to-I; I read as G) and the rare cytosine-to-uracil (C-to-U). Although a vast swathe of the claimed editing events had been debunked by previous studies, they examined 11 putative events which had been apparently validated by sequencing PCR amplicons using capillary instruments. What they found should be disturbing to sequence bioinformaticians:</p>
<p>They noticed that if you search each of these amplicon sequences using BLAT against the reference human genome, each one had a very similar, &#8216;second-best&#8217; hit in the human genome. And lo, if you examine the sequence of those second-best hits, the variant pointing to RNA editing wasn&#8217;t present. They then designed primers to specifically amplify the region of the genome around the second-best hit and demonstrated that this was in fact the likely template for the original sequencing read, and not the region associated with the best hit that originally hinted at RNA editing. Put simply, the RNA editing event wasn&#8217;t an RNA editing event at all.</p>
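<p>To make the idea concrete, here is a rough sketch of how you might flag amplicons that have a near-equal second-best hit before trusting them as validation. This is my own illustration, not the procedure from the letter: the <code>Hit</code> record, the <code>ambiguous_amplicons</code> function, the 0.95 score ratio and the example scores are all hypothetical, and in practice you would parse the hits from your aligner&#8217;s output.</p>

```python
from dataclasses import dataclass


@dataclass
class Hit:
    locus: str    # genomic location of the alignment (hypothetical label)
    score: float  # alignment score; higher is better


def ambiguous_amplicons(hits_by_amplicon, min_ratio=0.95):
    """Return amplicons whose second-best hit scores nearly as well as the best.

    min_ratio is an arbitrary illustrative threshold, not a published value.
    """
    flagged = {}
    for amplicon, hits in hits_by_amplicon.items():
        ranked = sorted(hits, key=lambda h: h.score, reverse=True)
        # Need at least two hits and a positive best score to form a ratio.
        if len(ranked) < 2 or ranked[0].score <= 0:
            continue
        best, second = ranked[0], ranked[1]
        if second.score / best.score >= min_ratio:
            flagged[amplicon] = (best.locus, second.locus)
    return flagged


# Made-up scores: amplicon_1 has a close paralogous hit, amplicon_2 does not.
hits = {
    "amplicon_1": [Hit("chr1:1000", 480.0), Hit("chr7:5000", 472.0)],
    "amplicon_2": [Hit("chr2:2000", 500.0), Hit("chr9:9000", 120.0)],
}
print(ambiguous_amplicons(hits))
```

<p>Anything flagged this way deserves primers designed against both loci before you believe the variant.</p>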
<p>If you&#8217;ve done much sequence bioinformatics and variation detection you will know that alignment to paralogous regions of the genome (repeats) is a major reason for false positive SNP calls (perhaps the number one reason?). I see this frequently in the microbial genome projects I am involved in. As an aside, I bet this kind of analysis error happens <em>all the time</em> in published papers, but goes unnoticed because the findings are not significant enough to attract extensive scrutiny. Discovering novel types of RNA editing would be a pretty big prize; in this case it was deemed worthy of a Science paper. What is notable is that Sanger &#8216;validation&#8217; also has the capacity to mislead if primers are not designed to unique regions of the genome.</p>
<p>That finding reminded me of an email I&#8217;d sent to Titus Brown a few months ago, when he&#8217;d asked me to do some pre-publication peer review of a manuscript he&#8217;d written on possible sequencing artifacts causing <a href="http://arxiv.org/abs/1212.0159">problems with metagenomics assembly</a>. I sent him a list of potential reasons for artifacts that may or may not explain his results, which I have reproduced and augmented here:</p>
<table width="100%" border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top">Library preparation errors</td>
<td valign="top">Sequencing errors</td>
<td valign="top">Analysis errors</td>
</tr>
<tr>
<td valign="top">PCR amplification point mutations (e.g. TruSeq protocol, amplicons) [1]<br />
emPCR amplification point mutations (454, Ion Torrent and SOLiD)<br />
Bridge amplification errors (Illumina)<br />
Chimera generation (particularly during amplicon protocols) [1]<br />
Sample contamination<br />
Amplification errors associated with high or low GC content<br />
PCR duplicates</td>
<td valign="top">Base miscalls due to low signal<br />
Indel errors (particularly PacBio)<br />
Base under- and over-calls with flow-based chemistries, associated with long homopolymers (454, Ion Torrent) [2]<br />
Short homopolymer-associated indels (Ion Torrent PGM) [2]<br />
Post-homopolymeric tract SNPs (Illumina) and/or read-through problems [3]<br />
Errors associated with inverted repeats (Illumina) [4]<br />
Errors at specific motifs, particularly with older Illumina chemistry [4]</td>
<td valign="top">Calling variants without sufficient reads mapping<br />
Bad mapping (incorrectly placed read)<br />
Correctly placed read but indels misaligned<br />
Multi-mapping to repeat/paralogous regions<br />
Sequence contamination e.g. adaptors<br />
Error in reference sequence<br />
Alignment to ends of contigs in draft assemblies<br />
Incorrect trimming of reads, aligning adaptors<br />
Inclusion of PCR duplicates</td>
</tr>
</tbody>
</table>
<p>Phew! Are you sure you want to do some genome sequencing?!</p>
<p>I&#8217;ve included a few references here to relevant papers. Casey Bergman has <a href="http://www.citeulike.org/user/cisevol/tag/sequencing_error">started a Citeulike collection</a> of papers relating to sequencer error profiles.</p>
<p>Now, thanks to a second paper published this week we have another item for the table (BTW please comment on my table and let me know what I&#8217;ve missed). This is a technical <a href="http://nar.oxfordjournals.org/content/early/2013/01/08/nar.gks1443.full.pdf?keytype=ref&amp;ijkey=suYBLqdsrc7kH7G">tour de force</a> from the Broad Institute (ht @dgmacarthur) published in Nucleic Acids Research. Allow me to summarise:</p>
<p>Whilst searching for variants in cancer samples they discovered artifacts involving triplets of the pattern &#8220;C&gt;A/G&gt;T&#8221;, occurring at low frequency in some cancer projects. Low frequency variants are of course of great interest in cancer genetics as the sample is genetically heterogeneous, and any of these low frequency variants may be of interest as potential &#8220;drivers&#8221; of cancer progression which over time may become dominant. They may also represent clues to pathways which could be targeted with specific drugs.</p>
<p>However, these variants looked like artifacts rather than real mutations because of certain patterns spotted in the analysis: specifically, strand bias (significantly different patterns of forward/reverse orientations between the reads with variant calls and the non-variant calls) and their presence in both tumour and normal samples.</p>
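<p>One common way to put a number on strand bias is Fisher&#8217;s exact test on a 2x2 table of forward/reverse read counts for variant and non-variant reads. The sketch below is only an illustration of that general idea: I am not claiming it is the test used in the paper, and the function name and read counts are made up.</p>

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here: a, b = forward, reverse reads carrying the variant;
          c, d = forward, reverse reads without it.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Made-up counts: variant reads are almost all forward, other reads are balanced.
p_value = fisher_exact_two_sided(18, 2, 50, 50)
print(f"strand bias p-value: {p_value:.4g}")
```

<p>A small p-value here says the variant-supporting reads are oriented differently from the rest, which is a hint to be suspicious, not proof of an artifact.</p>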
<p>The impressive part of this study is that they then managed to track down the cause, and unlike the normal suspects in such cases, which include errors in PCR amplification, sequencing errors and alignment/analysis errors, they demonstrated that oxidation of DNA during the library preparation step &#8211; in this case acoustic shearing &#8211; generated 8-oxoguanine &#8216;lesions&#8217; in the genome, which were responsible for these errors.</p>
<p>In order to confirm these were not sequencing errors they showed that the error was present on HiSeq V2, V3 and MiSeq chemistries as well as on the Ion Torrent PGM.</p>
<p>They developed a metric called &#8220;ArtQ&#8221;, a Phred-like score summarising how strongly the artifact is present in a sample:</p>
<blockquote><p>-10 x log10((consistent errors &#8211; inconsistent errors) / all observations)</p></blockquote>
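<p>As a quick worked example, here is that score in a few lines of Python. I am assuming the difference (consistent minus inconsistent errors) is taken before dividing by all observations, which is the only reading of the formula that makes sense to me; the function name and the counts are invented for illustration.</p>

```python
import math


def artq(consistent_errors, inconsistent_errors, all_observations):
    """Phred-like artifact score: -10 * log10((consistent - inconsistent) / total).

    Assumes the subtraction happens before the division (my reading of the formula).
    """
    if all_observations <= 0:
        raise ValueError("all_observations must be positive")
    rate = (consistent_errors - inconsistent_errors) / all_observations
    if rate <= 0:
        # No excess of artifact-consistent errors: treat the score as unbounded.
        return float("inf")
    return -10 * math.log10(rate)


# Made-up counts: an excess of 100 artifact-consistent errors in 100,000
# observations is a rate of 1 in 1,000.
print(round(artq(110, 10, 100_000), 2))
```

<p>As with Phred scores, each extra 10 points means a ten-fold lower rate of artifact-consistent errors.</p>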
<p>They considered an ArtQ score of &gt;30 to mean the sample is unaffected by this problem. They then go on to suggest an alternative library preparation with the inclusion of anti-oxidants in order to improve the ArtQ score, but they also suggest a bioinformatics-based filter to exclude such mutations when this is not possible. Go read the rest of the paper, it&#8217;s impressive stuff (despite the presence of 3-D barcharts, yuck!).</p>
<p>The conclusions of the paper are the ones I want to focus on. They state (emphasis mine):</p>
<blockquote><p>The obvious deleterious effects that the existence of such artifacts can have on the field of cancer research <strong>could be dramatic</strong>. If multiple common processes in the laboratory can <strong>significantly alter the physical base sequence of DNA</strong>, it begs the question of whether we can <strong>truly be confident</strong> that the rare mutations we are searching for <strong>can actually be attributed</strong> to true biological variation</p></blockquote>
<p>They then warn that this may not be the only undiscovered artifact out there:</p>
<blockquote><p>this is one of the myriad of possible  low frequency errors that could be induced during NGS sample preparation</p></blockquote>
<p>They conclude that:</p>
<blockquote><p>A systematic review of a wide variety of data obtained using different protocols from different laboratories needs to be undertaken by the sequencing community to identify whether there are any types of other artifacts that may be induced during extraction and/or library preparation that could be wrongly attributed to the biology of a given disease.</p></blockquote>
<p>I couldn&#8217;t agree more. Lex Nederbragt and I are working on a project we are calling SeqBench which we hope will start to address this problem by producing a well curated metadatabase of sequencing reads. By collecting high quality metadata we hope to be able to provide a useful testing resource which could be used to compare the results of different library preparation techniques, as well as the results from different sequencing platforms, aligners, assemblers and more. I am presenting a poster on this project at AGBT and plan to post more on the blog during the run-up to this meeting. I&#8217;d be delighted if this was something you would like to get involved with.</p>
<p><em>This is a draft blog post. I reserve the right to make changes to it until I remove this disclaimer, probably later on today. If you make useful comments or suggestions via the comments form or Twitter I&#8217;ll happily change the post and give you a credit.</em></p>
<p>Thanks to Casey Bergman for proof reading and useful suggestions.</p>
<p>References</p>
<p>[1] <a href="http://pathogenomics.bham.ac.uk/blog/2010/08/come-on-feel-the-pyronoise/">http://pathogenomics.bham.ac.uk/blog/2010/08/come-on-feel-the-pyronoise/</a></p>
<p>[2] <a href="http://pathogenomics.bham.ac.uk/blog/2012/05/benchtop-sequencer-comparison-paper/">http://pathogenomics.bham.ac.uk/blog/2012/05/benchtop-sequencer-comparison-paper/</a></p>
<p>[3]  Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul 24;13:341. doi: 10.1186/1471-2164-13-341. PubMed PMID: 22827831; PubMed Central PMCID: PMC3431227.</p>
<p>[4]  Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011 Jul;39(13):e90. doi: 10.1093/nar/gkr344. Epub 2011 May 16. PubMed PMID: 21576222; PubMed Central PMCID: PMC3141275.</p>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2013/01/sequencing-data-i-want-the-truth-you-cant-handle-the-truth/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>COST training school: Bioinformatics for microbial community analysis</title>
		<link>http://pathogenomics.bham.ac.uk/blog/2012/11/cost-training-school-bioinformatics-for-microbial-community-analysis/</link>
		<comments>http://pathogenomics.bham.ac.uk/blog/2012/11/cost-training-school-bioinformatics-for-microbial-community-analysis/#comments</comments>
		<pubDate>Thu, 22 Nov 2012 10:09:08 +0000</pubDate>
		<dc:creator>Nick Loman</dc:creator>
				<category><![CDATA[16S]]></category>
		<category><![CDATA[Genomics]]></category>
		<category><![CDATA[High-throughput sequencing]]></category>
		<category><![CDATA[Ion Torrent]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://pathogenomics.bham.ac.uk/blog/?p=1496</guid>
		<description><![CDATA[I will be helping teach one day of this course which is fully-funded if you are a student from a COST-participating country&#8230; see below for details: Deadline is next week, so hurry if you are interested. COST training school ES1103: Bioinformatics for microbial community analysis Dates: December 11th- 14th (the school will begin at 1pm [...]]]></description>
			<content:encoded><![CDATA[<p>I will be helping teach one day of this course which is fully-funded if you are a student from a COST-participating country&#8230; see below for details:</p>
<p>Deadline is next week, so hurry if you are interested.</p>
<blockquote><p>COST training school ES1103: Bioinformatics for microbial community analysis</p>
<p>Dates: December 11th- 14th (the school will begin at 1pm on Tuesday and finish at 5pm on Friday)</p>
<p>Location: University of Liverpool, Centre for Genomic Research (CGR), Life Sciences Building, Liverpool L69 7ZB</p>
<p>Organisers: Dr Christopher Quince (Christopher.quince@glasgow.ac.uk), Dr Christiane Hertz-Fowler (chf@liv.ac.uk)</p>
<p>Lecturers: Dr Christopher Quince (Christopher.quince@glasgow.ac.uk), Dr Nick Loman (n.j.loman@bham.ac.uk) and Dr Martin Hartmann (martin.hartmann@microbiome.ch)</p>
<p>Description: The aim of this three and a half day workshop will be to give students an overview of the tools and bioinformatics techniques available for the analysis of next generation sequence data from microbial communities. The emphasis during the first three days will be on the analysis of amplicon sequences, for example 16S rRNA or fungal ITS, generated using next generation sequencing platforms: principally 454 but also Illumina and Ion Torrent. The entire process from initial sequence data through filtering, noise removal, OTU generation and taxonomic classification will be addressed during the first day and a half. Detailed instruction on performing noise-removal, using AmpliconNoise, and the use of the Mothur pipeline will be provided. The emphasis on the third day will be on multivariate statistics for the analysis and ecological interpretation of the resulting data sets using R. The final day of the workshop will consist of an introduction to microbial genomics, covering de novo sequence assembly, and gene annotation. Metagenome analysis will be discussed but not in great detail. The format will comprise a mixture of lectures and hands-on tutorials where students will process example data sets in real-time.  Students will also be encouraged to bring their own data for analysis.</p>
<p>Application procedure: Funded places exist through the COST action ES1103 for students from participating member states (see http://www.cost.eu/domains_actions/essem/Actions/ES1103?parties).  Applications should be sent by e-mail to Dr Christopher Quince (Christopher.quince@glasgow.ac.uk). Applications should consist of one paragraph describing the student’s motivation for attending the course and level of bioinformatics experience. The latter is purely to allow the correct pitching of content and all levels of prior knowledge will be catered for. However, basic Linux and sequence analysis skills would be helpful. Applications received before Wednesday November 28th will be considered and students selected who will benefit most from the training. Students from any career level from under-graduate to professorial can attend but this is a hands-on workshop and preference will be given to people who will analyse their own data at some point in the future.</p>
<p>Funding: COST will reimburse each student up to 1000 EUs to cover registration, travel, hotel and food. Hotel reservations have been made in a block booking at the nearby Liner Hotel (75 EU per night). If you do not want to stay at this hotel please notify us in your application. There will be an 80EU registration fee payable in advance by the student. This will cover lunches during the course.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://pathogenomics.bham.ac.uk/blog/2012/11/cost-training-school-bioinformatics-for-microbial-community-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
