19 Responses

  1. peterjc
    August 16, 2011 at 9:31 am |

    Hi Nick,

    What makes you say MIRA “Does not support paired-end 454 data or mate-pair Illumina data”? See, for example, Bastien’s email of 9 July 2011:

    One thing to keep in mind are the orientation of the pairs and what the scaffolder expects:

    Sanger pairs are oriented like this: ——> <——– and that is what Bambus wants per default
    454 pairs are originally oriented like this: ——–> ———> but as scaffolders (and MIRA) originally did not expect that, I use a trick by letting sff_extract reverse one sequence so that everything is back to “normal”. Nowadays MIRA could also use the forward / forward orientation, but I have not had time to change sff_extract.
    Illumina paired-end reads look like this: ——–> <——— … which makes it easy.
    Illumina mate-pairs look like this: … which again lets scaffolders like Bambus despair, as they expect something different. MIRA does not care there; it can work with that, too.
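
The orientation trick described in the quoted email can be illustrated with a minimal Python sketch. This is not sff_extract's actual code: the function names are hypothetical, and the choice to flip the second mate (rather than the first) is an assumption, since the email only says that one sequence is reversed. A real conversion would also have to reverse the quality string of the flipped read.

```python
from typing import Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward_reverse(read1: str, read2: str) -> Tuple[str, str]:
    """Convert a forward/forward pair (----> ---->) into the
    forward/reverse layout (----> <----) that Sanger-style scaffolders expect.

    Assumption: the second mate is the one that gets flipped.
    """
    return read1, reverse_complement(read2)
```

For example, `to_forward_reverse("ACGT", "AACG")` leaves the first read alone and returns the second as its reverse complement.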


  2. peterjc
    August 16, 2011 at 9:50 am |

    It’s clearer now that you’re talking about scaffolding in MIRA.

  3. Anthony Underwood
    August 16, 2011 at 10:01 am |

    Hi Nick

    Now that Newbler 2.6 can handle fastq files, does this not enable adding both 454 sff files and Illumina fastq files as inputs for the gsAssembler? I had hoped this might be the case.


  4. BioMickwatson
    August 16, 2011 at 12:03 pm |

    I’m never convinced adding 454 data to large eukaryotic assemblies makes much sense. And most hybrid assemblers assume you want to start with 454 and add Illumina; I often want to do the reverse, i.e. assemble Illumina, then scaffold with long-insert 454 paired-end reads. MIRA appears to favour 454 in hybrid assemblies, which makes no sense if you have 0.5x 454 and 30x Illumina.

  5. BioMickwatson
    August 16, 2011 at 12:07 pm |

    Also had an awful experience with NGen…

  6. yannickwurm
    August 17, 2011 at 10:24 am |

    FYI, we did something like 6) for the fire ant genome. This was indeed because Newbler is best at dealing with expensive 454 reads.
    1. assembled+scaffolded Illumina with SOAPdenovo
    2. “shredded” the scaffolds into overlapping 300bp sequences
    3. provided these “fake 454-like sequences” as FASTA to Newbler 454 along with our true 454 reads.
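
The “shredding” in step 2 can be sketched roughly as follows. This is a minimal illustration, not the script used for the fire ant genome: the function name `shred` is hypothetical, and the 150 bp step (i.e. 50% overlap between neighbouring fragments) is an assumption, since only the 300bp fragment length is given above. It also does nothing special about runs of N in scaffold gaps, which a real pipeline would probably want to handle.

```python
def shred(seq, length=300, step=150):
    """Yield (start, fragment) pairs covering `seq` with overlapping windows.

    `length` is the fragment size; `step` is how far each window advances
    (assumed value, giving a 150 bp overlap by default). A final window is
    added flush with the end so the tail of the sequence is not lost.
    """
    if len(seq) <= length:
        # Sequence is shorter than one window: emit it whole.
        yield 0, seq
        return
    last_start = len(seq) - length
    for start in range(0, last_start + 1, step):
        yield start, seq[start:start + length]
    if last_start % step != 0:
        # Regular stepping did not land on the end; add one last window.
        yield last_start, seq[last_start:]


def shred_to_fasta(name, seq, length=300, step=150):
    """Format the shredded fragments of one scaffold as FASTA text."""
    lines = []
    for start, fragment in shred(seq, length, step):
        lines.append(f">{name}_{start}")
        lines.append(fragment)
    return "\n".join(lines) + "\n"
```

The resulting FASTA records are what would be handed to the second assembler alongside the real 454 reads.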

    See the paper at http://www.pnas.org/content/early/2011/01/24/1009690108.abstract or http://goo.gl/MY1Wq

    Cheers & thanks for the stimulating blog!

  7. yannickwurm
    August 19, 2011 at 3:36 am |

    @Nick: I completely agree that things *should* be better. When we did this (about 2 years ago), there didn’t seem to be any viable alternative (MIRA existed, but we didn’t try it – among the 5 or 6 “standard” Illumina assembly software packages, SOAP was the only one that output usable data). It seems that a lot of progress has been made since then. I’m hoping that in another year or 2 there will be a one-button black box idiot-proof solution that figures everything out by itself & gives you a great assembly no matter what you throw at it! :)

  8. yannickwurm
    August 25, 2011 at 1:50 pm |

    If someone wants to compare the results of some of these newer approaches using real data, they’re more than welcome to. We have ~ 15x genome coverage in 454 shotgun, ~4x in paired 454 (8kb and 20kb), and 45x published Illumina “shotgun” (paired reads from 350bp insert library – mid 2009 so mediocre quality), as well as 100x unpublished from a late 2010 HiSeq run of the same library (get in touch about this one).

  9. praveenrajs
    October 18, 2011 at 8:13 am |

    What approach do you suggest for filling gaps in a known reference sequence using 454 data? The bacterial reference is ~3-4 Mb, and approximately 10x coverage has been generated on 454. Is there an optimized procedure for doing this efficiently?

  10. santiago
    July 11, 2013 at 3:27 pm |

    Hello, Nick.

    Two years have passed since you wrote this post.
    Do you have an update on this subject?
    How did the hybrid assembly with Newbler v2.6 work?
    What strategies/software/pipelines do you recommend nowadays?

    In my case, I’ve sequenced a ~700Mb chromosome: ~215X 2x100bp Illumina reads (HiSeq 1500) and ~500Mb 8Kb 454 paired-end reads.
    What would be the best scenario for assembling this data?
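
For context, the figures quoted above can be turned into rough coverage numbers with a back-of-envelope sketch. The function and variable names are illustrative only, and it assumes “~500Mb” means 500 megabases of 454 sequence (not a file size) and that the 215X figure counts both reads of each Illumina pair.

```python
def summarize_coverage(genome_size, illumina_coverage, read_length, bases_454):
    """Back-of-envelope read counts and coverage from the quoted figures."""
    illumina_bases = illumina_coverage * genome_size
    illumina_reads = illumina_bases // read_length
    return {
        "illumina_bases": illumina_bases,
        "illumina_read_pairs": illumina_reads // 2,
        "coverage_454": bases_454 / genome_size,
    }


stats = summarize_coverage(
    genome_size=700_000_000,   # ~700 Mb
    illumina_coverage=215,     # ~215X
    read_length=100,           # 2x100bp reads
    bases_454=500_000_000,     # ~500 Mb of 454 paired-end sequence (assumed bases)
)
print(stats)
```

Under those assumptions the 454 paired-end data amounts to well under 1x sequence coverage, so it would mainly contribute long-range linking information rather than bases.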


  11. sullis
    July 23, 2013 at 3:38 pm |


    Is it still true that CLC GW does not support paired end/mate pair scaffolding? (I’ve got a query in with CLC about this too)

    I’m currently running my first hybrid 454 (single end) + Illumina (100×100 ‘paired’) de novo assembly of a smallish (25 Mb) haploid eukaryote, with Newbler v2.8. It’s taking a long time – 4 days and counting – even with plenty of compute resources (using our own cluster, not 454’s).

  12. sullis
    July 23, 2013 at 3:40 pm |

    (The main time suck so far has been reading the flowgrams — 226 million of them)

  13. sullis
    November 19, 2014 at 6:00 pm |

    Once more into the breach… I am going to try hybrid assembly again. Since last time I’ve learned that when you feed Newbler an SFF file (which was produced by the signal processing step of sequencing), it performs further key- and quality-trimming on the reads before assembling them. It also splits paired-end SFF reads. You can get these Newbler-processed reads as assembly output if you set the right flag.

    I’d like to try using either Newbler (3.0 now) to assemble, or another assembler (MaSuRCA, MIRA, Ray…)

    Problem for Newbler: is it ‘paired-end aware’ for Illumina paired-end reads (two fastq files, named -1 and -2, sequence pairs in reverse-complement orientation, distance ~350-600bp depending on library)?

    Problem for others: are they ‘paired-end aware’ for 454 reads (which come in a single SFF file, with the orientation of ‘pairs’ differing from Illumina’s, distance 3kb-20kb depending on library), and do they perform the key- and quality-trimming? If the answer to the last question is ‘no’, can they be made ‘paired-end aware’ given a set of Newbler-trimmed reads as input?
