TAnDen
Tool for Analysis of Diversity in Viral Populations

 

Universidade Federal de São Paulo
Escola Paulista de Medicina

Overview

TAnDen is a computational platform for analyzing viral genetic diversity in data generated by high-throughput sequencing. The software has two basic components: the first is an intensive alignment strategy that recovers the highest possible number of short reads; the second is the estimation of population genetic diversity through a Bayesian approach based on Dirichlet distributions, inspired by word-count modeling.
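
To give a feel for what a Dirichlet-based estimate looks like, here is a minimal sketch for a single alignment column. It assumes a symmetric Dirichlet prior over the four nucleotides and multinomial read counts. This is a textbook illustration only, not TAnDen's actual estimator, and the function name and the default alpha are hypothetical choices.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def posterior_mean_frequencies(column, alpha=1.0):
    """Posterior mean nucleotide frequencies at one alignment column.

    Assumption: symmetric Dirichlet(alpha) prior with multinomial counts.
    This is an illustration, not the method implemented in TAnDen.
    """
    # Count only unambiguous bases; gaps and other symbols are ignored.
    counts = Counter(base for base in column.upper() if base in NUCLEOTIDES)
    total = sum(counts.values()) + alpha * len(NUCLEOTIDES)
    # Each base gets its observed count plus the prior pseudo-count.
    return {base: (counts[base] + alpha) / total for base in NUCLEOTIDES}


# Eight reads covering one site: seven carry A, one carries G.
freqs = posterior_mean_frequencies("AAAAAAGA")
```

With alpha = 1, bases that were never observed still receive a small non-zero frequency, which is the usual reason for using a prior of this kind on sparse count data.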


Motivation

The main motivation for developing this method and implementing it as a platform is to help understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and Influenza.


Download

TAnDen is distributed under the GNU General Public License.

Intel binaries:

TAnDen 1.0 - Windows 32 bit

TAnDen 1.0 - Windows 64 bit

Reference

Zukurov, J. P. L., S. Nascimento-Brito, A. Volpini, G. Oliveira, L. M. R. Janini, F. Antoneli. 2013. "A novel method for the estimation of diversity in viral populations from next generation sequencing data".


Use

A tutorial explaining the basic operation is also provided with the distribution as a text file.

FAQ - Frequently Asked Questions

1 - Is this program available for Mac OS or Linux?

The distribution currently provides Windows binaries only, in 32-bit and 64-bit versions. The 64-bit version is strongly recommended for handling large data files, since the 32-bit version has a limitation of 1.6 GB.

2 - Is it necessary to have any additional program for the software?

Yes. TAnDen requires the BLAST+ software and a user-defined database. The necessary binary files are provided with the distribution. The full BLAST+ package does not need to be installed; only the programs "blastn.exe" and "makeblastdb.exe" are needed.
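
For readers who want to build their own database by hand, the sketch below assembles the command lines for the two BLAST+ programs. The options shown (-in, -dbtype, -out, -query, -db, -outfmt) are standard BLAST+ options, but how TAnDen itself calls these programs is not documented here, so the file names and the tabular output format are assumptions.

```python
def makeblastdb_command(fasta_path, db_prefix, exe="makeblastdb.exe"):
    """Command that builds a nucleotide BLAST database from a FASTA file."""
    return [exe, "-in", fasta_path, "-dbtype", "nucl", "-out", db_prefix]


def blastn_command(query_path, db_prefix, out_path, exe="blastn.exe"):
    """Command that aligns query reads against the database.

    Assumption: tabular output (-outfmt 6); TAnDen may use other settings.
    """
    return [exe, "-query", query_path, "-db", db_prefix,
            "-out", out_path, "-outfmt", "6"]


# Hypothetical file names, for illustration only.
build_cmd = makeblastdb_command("reference.fasta", "db/reference")
search_cmd = blastn_command("reads.fasta", "db/reference", "hits.tsv")
```

Either list can be passed to subprocess.run(cmd, check=True) on a machine where the BLAST+ executables are reachable.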

3 - I cannot start the alignment! What is happening?

Provide the path to the database. The sub-folder "db" contains three HIV-1 database files (".nhr", ".nin" and ".nsq").

[Screenshot: the status box reporting that the location of the files is not correct.]

[Screenshot: the window showing the correct path of the files.]
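
A quick way to rule out a path problem before starting the alignment is to check that all three database files are present. The sketch below does this; the function name and the database prefix "hiv1" are hypothetical, since the actual file names in the "db" folder are not listed here.

```python
from pathlib import Path

# The three files that make up a nucleotide BLAST database.
REQUIRED_EXTENSIONS = (".nhr", ".nin", ".nsq")


def missing_db_files(db_folder, db_name):
    """Return the extensions of the database files that are not found."""
    folder = Path(db_folder)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not (folder / (db_name + ext)).is_file()]


# "hiv1" is a placeholder prefix; use the real name of the files in "db".
missing = missing_db_files("db", "hiv1")
if missing:
    print("Missing database files:", ", ".join(missing))
```

An empty result means the folder and prefix point at a complete database.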

4 - What is the reference sequence/database for alignment?

It is the sequence used for mapping and positioning. It must be in FASTA format.

Example:

>REF

ATATATATATTGAATATAGCCATATCAGAGCACGCACACTAAGCTCTACAGATCATCATATTTTTTAGCGACGCTACCAGAC
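
A reference file can be checked before use with a small reader like the sketch below. It assumes a single record and accepts only the letters A, C, G, T and N; TAnDen's own requirements may be looser or stricter, and the function name is hypothetical.

```python
def read_single_fasta(text):
    """Parse one FASTA record and return (name, sequence).

    Assumptions: exactly one record, and only A, C, G, T or N in the
    sequence. Blank lines are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    name = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the record has no sequence")
    invalid = set(sequence) - set("ACGTN")
    if invalid:
        # A second '>' header also ends up here, so extra records are rejected.
        raise ValueError("unexpected characters: " + "".join(sorted(invalid)))
    return name, sequence


name, sequence = read_single_fasta(">REF\n\nATATATATATTG\nAATATAGCC\n")
```

Sequence lines are concatenated, so a reference wrapped over several lines is read as one sequence.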

5 - I have finished the alignment! What now?

The alignment output is the file ReconstructedSites.txt, located in the same folder as the input files. In the "Input Information" window, select the file ReconstructedSites.txt. Then select the option "Likelihood Analysis for Reconstructed/Alignment reads" in the combo box of the "Next-Generation analysis" window.

6 - Which file should be used to run the Likelihood analysis?

The input for the Likelihood analysis is the aligned output file "ReconstructedSites.txt", which is in the same folder as the input file.

7 - Where is the Likelihood output?

The output file, called "Dirichlet.txt", will be saved in the same folder as the input file and the result will be displayed in the tab "Results".

8 - Which files should I use to run Bayesian analysis?

- File 01: the output from the Likelihood analysis, "Dirichlet.txt", located in the same folder as the input files.

- File 02: the output from the alignment, "ReconstructedSites.txt", located in the same folder as the input files.


Flowchart overview


Contact: Fernando Antoneli, Departamento de Informática em Saúde
Contact: Jean Paulo Lopes Zukurov, jpaulo_001@yahoo.com