13 Responses

  1. flxlex
    May 10, 2011 at 7:02 pm

    While running PRINSEQ on the reads, and taking a first look at a newbler assembly, I came across your post. Hmmm, great minds think alike?

    However, I have a few different metrics for ‘my’ newbler 2.5.3 assembly of the data:
    - runtime on a server, allowing for 20 cpus (otherwise default settings): 10 minutes (!)
    - 918 contigs (from 100 bases)
    - N50 for those was 8899 bp
    - longest 58008 bp
    - total bases 4448513 bp

    So, except for the total bases, I obtained even better metrics. 20 cpus was overkill, but a first try with one cpu (usually more than enough for bacterial genomes) looked like it was going really, really slowly, so I thought I'd give it (almost) all I've got…
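
    For readers comparing the figures above, here is a minimal sketch of how such summary metrics are usually computed from a list of contig lengths. It is not newbler's own code; the function names and the 100-base cutoff parameter are assumptions made for illustration.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def summarize(lengths, min_len=100):
    """Summarize contigs of at least min_len bases (hypothetical helper)."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(kept),
        "n50": n50(kept),
        "longest": max(kept),
        "total_bases": sum(kept),
    }
```

    Calling `summarize` on the lengths parsed from an assembler's contig file gives the same four numbers quoted in this comment, which makes it easier to compare assemblies on an equal minimum-length cutoff.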

  2. flxlex
    May 11, 2011 at 9:35 am

    I simply used
    runAssembly -o projectname -cpu 20 file.sff

  3. paultmorrison
    May 11, 2011 at 12:51 pm

    Very interesting data. It really gives me a feel for what it looks like. I'd be very interested to see what the machine in your lab can do. Get in there and get that puppy running.

    The data set produced by Ion Torrent may not be an “average” run. Did they say how they filtered the raw data, or is it supposed to be unfiltered?

  4. flxlex
    May 12, 2011 at 11:57 am

    CLCbio picked up on your story and used it in a press release: http://www.clcbio.com/files/CLCbio_pressrelease_12052011.pdf
    A bit cheap, I feel…

  5. werner@cornell.edu
    May 12, 2011 at 2:02 pm

    Thanks Nick; I’ve been very curious about what the Ion Torrent data look like, and I saw a link to this post through the CLC Bio newsletter. In our lab, we do a lot of high-throughput amplicon sequencing, so error rates are particularly of interest. You saw that the qual scores got low quite quickly — do you have any sense of the real error rate, though? I suppose a scaffolded assembly would be a better way to check. I’ll be really interested to hear what you get in an average run from your machine. Thanks! –Jeff
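
    On the relation between quality scores and error rate raised here: a Phred score Q encodes a predicted per-base error probability of 10^(-Q/10). The sketch below shows only that conversion; the function names are hypothetical, and the result is the error rate the scores predict, not the real error rate, which would have to be measured against a reference.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to its predicted error probability."""
    return 10 ** (-q / 10)


def expected_errors(quals):
    """Sum of predicted per-base error probabilities for one read."""
    return sum(phred_to_error_prob(q) for q in quals)


def mean_predicted_error_rate(quals):
    """Average predicted error probability across the bases of a read."""
    if not quals:
        raise ValueError("empty quality list")
    return expected_errors(quals) / len(quals)
```

    Comparing this predicted rate with mismatches observed in reads aligned to a known reference is one way to see how well calibrated the quality scores are.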

  6. SeqWiz
    May 12, 2011 at 6:38 pm

    No mention of Geneious? It works with Ion Torrent data, so it would be good to know how it compares.

  7. There is more (length) to Ion Torrent reads than meets the eye (and is Ion Torrent hiding it?) « In between lines of code

    [...] (check out the excellent analysis by Nick Loman on his blog http://pathogenomics.bham.ac.uk/blog/2011/05/first-look-at-ion-torrent-data-de-novo-assembly/ So, I naturally had a look at the information in this sff file. Here are the header and the first [...]

  8. The IdeaConnection Blog · Crowdsourcing Helping to Stop a Killer Bacterium in its Tracks

    [...] Loman from the University of Birmingham. He started to analyse the data and posted sequences on his pathogen blog. Within a few days scientists from four continents were joining in with the [...]

  9. danielb
    April 17, 2013 at 12:22 pm

    Hi!

    You forgot to set the -large option for newbler. If your genome is large (I could not find a definition of how large “large” is) and the option is not set, the algorithm stops after a certain number of bases and gives no indication why. With -large, a ~3 Mbp de novo assembly takes about 1-2 minutes (i7 3770K, 8 threads) and gives pretty long contigs. However, as is well known, contiguity is not correlated with the validity of the assembly.
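
    As a rough illustration, the sketch below assembles the command line from the flags mentioned in this thread (-o, -cpu, -large) so the two invocations can be compared. The helper name and its parameters are hypothetical, and the code only builds and prints the command; it does not run newbler.

```python
import shlex


def build_run_assembly_cmd(project, sff_files, cpus=1, large=False):
    """Build a runAssembly argument list (hypothetical helper).

    Uses only the flags quoted in the comments above: -o, -cpu, -large.
    """
    cmd = ["runAssembly", "-o", project, "-cpu", str(cpus)]
    if large:
        cmd.append("-large")
    cmd.extend(sff_files)
    return cmd


# The command quoted earlier in the thread, and the same one with -large.
print(shlex.join(build_run_assembly_cmd("projectname", ["file.sff"], cpus=20)))
print(shlex.join(build_run_assembly_cmd("projectname", ["file.sff"], cpus=8, large=True)))
```

    Passing the resulting list to `subprocess.run` would launch the assembly, assuming runAssembly is installed and on the PATH.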
