What can you do with 1000 base pair reads?

Has anyone else noticed this yet? The page “Future of 454 Sequencing” on the Roche website poses the rather tantalising question: What can you do with 1000 base reads? The page is accompanied by a rather sexy graph showing a 454 Titanium run achieving a modal read length of 766 base pairs. For those not acquainted with the current technology: Titanium typically produces 400-500 base pair reads, Solexa between 25 and 100 depending on run time (typical values are 36 and 75), and SOLiD seems to be run at around 50 base pairs at most places. With the short-read technologies, the quality scores tend to drop off towards the end of each sequence. 454 is no longer considered a short-read technology, and in fact 1000 base pairs is getting close to the limit of what skilled users could do with traditional ABI machines using Sanger technology.

So, what can you do with 1000 base reads? Well, for bacterial genomes this raises a few interesting possibilities. The one that strikes me as most useful is covering almost 2/3 of the 16S gene (roughly 1,500 base pairs) in a single read when performing phylogenetic profiling. De novo sequencing will become even easier, and combined with paired-end reads, the concept of single-scaffold assemblies should become reality. I guess it will make whole-genome metagenomics easier too. I’m not sure how much of a difference it will make to transcriptomics and resequencing, but it definitely won’t hurt!

My understanding is that this technology is in beta testing right now in several genome centres, so it may be available to regular customers very soon.

4 Responses

  1. Luke
    August 15, 2009 at 10:05 pm |

    I spend more time than is appropriate dreaming about what ultra-long read lengths will do for us in the future. One nice thing is that long read lengths allow you to start putting together phased blocks; it’d be nice to see what you can do with that.

    How about using long read lengths to trace the population genetics of cancer – treating each read as a sample from one cell? If we find two SNPs on the same read (or phased block), and we find them with the distribution 5% 00, 5% 10, 90% 11, 0% 01, this suggests that the mutation at position 1 occurred first, followed by the one at position 2, and that the mutation at position 2 was associated with a sudden selective advantage. Find enough of those, and we can start putting together evidence of the sequence of mutations that lead to cancer, and help distinguish driver from passenger mutations.

  2. herm
    August 22, 2009 at 8:04 am |

    These lengths will certainly help the complete sequencing of eukaryotic genomes. I believe it’s still a pain to do that with current NGS technologies, although 750-800bp would be quite nice for fungal genomes.
    Once upon a time, I could get almost 1200 bases from the department’s ABI sequencer …
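
Luke's two-SNP reasoning in the first comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an established method: the function name `infer_order`, the `min_freq` noise threshold and the returned dictionary are all hypothetical. Each haplotype is written as two characters, the first for position 1 and the second for position 2, with 1 meaning mutated. The idea is that if only one single-mutant haplotype is seen alongside the double mutant, that single mutation probably came first, and if the double mutant outnumbers its parent, the second mutation may have carried a selective advantage.

```python
def infer_order(freqs, min_freq=0.01):
    """Guess the order of two mutations from haplotype frequencies.

    freqs maps haplotype strings ("00", "10", "01", "11") to the fraction
    of reads carrying them. min_freq is a hypothetical noise threshold:
    haplotypes seen below it are treated as absent.
    Returns None when the order cannot be inferred.
    """
    f = {h: freqs.get(h, 0.0) for h in ("00", "10", "01", "11")}
    present = {h: v >= min_freq for h, v in f.items()}

    # We need the double mutant plus exactly one single-mutant intermediate.
    if not present["11"] or present["10"] == present["01"]:
        return None

    first, intermediate = (1, "10") if present["10"] else (2, "01")
    return {
        "first": first,
        "second": 3 - first,
        # Double mutant more common than its parent: consistent with the
        # second mutation having conferred a selective advantage.
        "expansion_after_second": f["11"] > f[intermediate],
    }


# The distribution from the comment: 5% 00, 5% 10, 90% 11, 0% 01
result = infer_order({"00": 0.05, "10": 0.05, "11": 0.90, "01": 0.0})
```

With the numbers from the comment, `result` says position 1 mutated first, position 2 second, and flags the expansion after the second mutation. Real data would need proper handling of sequencing error and sampling noise, which this sketch ignores beyond the single threshold.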
