There are now several available options if you want to call MLST profiles from whole-genome data.
DTU MLST Server
The web server at the Center for Genomic Epidemiology at the Technical University of Denmark (DTU) is probably the easiest option, with the advantage that it will accept both raw read files and assemblies. It worked well when I tried it, but it was quite slow to return results. Uploading large read datasets takes some time, particularly if you are analysing a large number of samples. It also does not list all of the MLST databases (I wanted to use C. albicans).
BIGSdb is a powerful and flexible piece of web server software that can be installed on your local PC or server. It can call MLST profiles from assembled genome data, and it also lets you set up your own typing schemes based on other epidemiologically informative marker genes. Non-bioinformaticians may find it a little tricky to set up, though.
Update: There is also a hosted version of BIGSdb which lets you cut-and-paste your de novo assembly into the sequence query form and get profiles out, available for a certain subset of the MLST databases (more available on request to Keith Jolley).
SRST comes from Kat Holt’s group in Melbourne. It runs on your local machine and is notable because it calls profiles from short-read data without prior de novo assembly. It gives a confidence score to assignments. As it has some dependencies (BWA, samtools, BLAST) and runs as a Python script it is probably best run on a Linux machine or a Mac.
I found it worked quite well on the Illumina data I tried, but there are a few tips for getting it running that are probably worth documenting for other users.
- The alleles files should be named gene.fas, and gene should be identical to the gene name used in the FASTA header lines in the file, as well as to the column names in the STs file.
- The alleles in the alleles file should be named gene-N, where N is the number of the allele. If you have a separator other than a hyphen you can specify it with the --name-sep argument, but having no separator is not allowed (as I think is the case with the Cork E. coli database).
- You need an older version of samtools to run this properly. I used samtools-0.1.12a; newer versions don’t work.
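The naming rules above are easy to get wrong, so it can help to check an alleles file before running SRST. The following is only a minimal sketch and is not part of SRST: the function names `check_allele_headers` and `check_allele_file` are hypothetical, and it assumes that each FASTA header consists of the gene name, the separator and an integer allele number, optionally followed by a description.

```python
import re
from pathlib import Path


def check_allele_headers(gene, fasta_text, name_sep="-"):
    """Return the FASTA header lines that do not look like '>gene<sep>N'.

    gene       -- expected gene name, e.g. the file name without '.fas'
    fasta_text -- contents of the alleles file as a string
    name_sep   -- separator between gene name and allele number
    """
    pattern = re.compile(
        r"^>" + re.escape(gene) + re.escape(name_sep) + r"\d+(\s.*)?$"
    )
    bad = []
    for line in fasta_text.splitlines():
        # Only header lines are checked; sequence lines are ignored.
        if line.startswith(">") and not pattern.match(line):
            bad.append(line)
    return bad


def check_allele_file(path, name_sep="-"):
    """Check a file named gene.fas, taking the gene name from the file name."""
    path = Path(path)
    return check_allele_headers(path.stem, path.read_text(), name_sep)
```

An empty list means every header follows the convention. Headers with no separator at all, such as `>adk2`, are reported so that they can be renamed first.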
Roll your own (suggested by Anthony Underwood, HPA)
Of course what many people do is first perform a de novo assembly, perhaps with Velvet, and then BLAST the contigs against the MLST allele database. You can then inspect the results manually, or write a little script to collect the results into a profile. If you have one you’d like to share, please post the link in the comments below. Here’s my Python script for what it’s worth …
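For readers who want a starting point of their own, here is a minimal sketch of the "collect the results into a profile" step. It is an illustration under stated assumptions, not the script mentioned above. It assumes the alleles were used as BLAST queries against the contigs with the tabular output `-outfmt "6 qseqid sseqid pident length qlen"`, that alleles are named gene-N, and that the profiles table is tab-separated with an `ST` column plus one column per locus. The function names `call_alleles` and `lookup_st` are hypothetical.

```python
import csv
import io


def call_alleles(blast_tsv, name_sep="-"):
    """Map each gene to the set of allele numbers with an exact, full-length hit.

    blast_tsv -- BLAST tabular text with columns:
                 qseqid, sseqid, pident, length, qlen
    """
    calls = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        qseqid, _sseqid, pident, length, qlen = row[:5]
        # Keep only hits that are 100% identical over the whole allele.
        if float(pident) == 100.0 and int(length) == int(qlen):
            gene, allele = qseqid.rsplit(name_sep, 1)
            calls.setdefault(gene, set()).add(allele)
    return calls


def lookup_st(calls, profiles_tsv, loci):
    """Return the ST whose profile matches the calls, or None.

    A locus with no call, or with more than one call, never matches.
    """
    reader = csv.DictReader(io.StringIO(profiles_tsv), delimiter="\t")
    for row in reader:
        if all(calls.get(locus) == {row[locus]} for locus in loci):
            return row["ST"]
    return None
```

Requiring exact, full-length hits keeps the logic simple, but it means novel or partially assembled alleles produce no call, so those samples still need manual inspection of the BLAST output.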
- Larsen MV, Cosentino S, Rasmussen S, Friis C, Hasman H, Marvig RL, Jelsbak L, Sicheritz-Pontén T, Ussery DW, Aarestrup FM, Lund O. Multilocus sequence typing of total-genome-sequenced bacteria. J Clin Microbiol. 2012 Apr;50(4):1355-61. PMID: 22238442.
- Inouye M, Conway TC, Zobel J, Holt KE. Short read sequence typing (SRST): multi-locus sequence types from short reads. BMC Genomics. 2012 Jul 24;13:338. PMID: 22827703.
- Jolley KA, Maiden MC. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics. 2010 Dec 10;11:595. PMID: 21143983.