UGENE is computer software for bioinformatics.[1][2] It runs on personal computer operating systems such as Windows, macOS, and Linux. It is released as free and open-source software under the GNU General Public License (GPL) version 2.
Original author(s) | Fursov M. |
---|---|
Developer(s) | Unipro |
Initial release | 2008 |
Stable release | 37 / 14 December 2020 |
Written in | C++, Qt |
Operating system | Windows, macOS, Linux |
Available in | English, Russian |
Type | Bioinformatics toolkit |
License | GPLv2 |
Website | ugene |
UGENE helps biologists analyze various kinds of biological genetic data, such as sequences, annotations, multiple alignments, phylogenetic trees, NGS assemblies, and others. The data can be stored both locally (on a personal computer) and in shared storage (for example, a laboratory database).
UGENE integrates dozens of well-known biological tools and algorithms, as well as original tools, in the context of genomics, evolutionary biology, virology, and other branches of the life sciences. UGENE provides a graphical user interface (GUI) for these pre-built tools so that biologists without computer programming knowledge can access them more easily.
With UGENE Workflow Designer, a multi-step analysis can be streamlined. A workflow consists of blocks such as data readers, blocks that run embedded tools and algorithms, and data writers. Blocks can be created from command-line tools or a script. A set of sample workflows is available in Workflow Designer for annotating sequences, converting data formats, analyzing NGS data, and more.
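The reader, tool, and writer structure of a workflow can be pictured as a chain of functions applied to a stream of records. The following Python sketch is purely illustrative: it is not UGENE's workflow engine or its workflow file format, and all block names (`read_records`, `reverse_complement`, `min_length`, `run_workflow`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical record and block types for illustration only.
Record = tuple[str, str]  # (sequence name, sequence)
Step = Callable[[Iterable[Record]], Iterable[Record]]


def read_records(records: list[Record]) -> Iterator[Record]:
    """Stand-in 'data reader' block; a real reader would parse a file."""
    yield from records


def reverse_complement(stream: Iterable[Record]) -> Iterator[Record]:
    """Example 'tool' block: reverse-complement each DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    for name, seq in stream:
        yield name, seq.translate(table)[::-1]


def min_length(n: int) -> Step:
    """Example 'filter' block factory: drop sequences shorter than n."""
    def step(stream: Iterable[Record]) -> Iterator[Record]:
        return ((name, seq) for name, seq in stream if len(seq) >= n)
    return step


def run_workflow(source: Iterable[Record], steps: list[Step]) -> list[Record]:
    """Connect the blocks in order and collect the result ('data writer')."""
    stream: Iterable[Record] = source
    for step in steps:
        stream = step(stream)
    return list(stream)
```

For example, `run_workflow(read_records([("s1", "ATGC"), ("s2", "AT")]), [min_length(3), reverse_complement])` drops the short `s2` record and returns `[("s1", "GCAT")]`. Because each block only consumes and yields records, blocks can be reordered or swapped without changing their neighbors.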
In addition to the graphical interface, UGENE also has a command-line interface, which can also be used to run workflows.
To improve performance, UGENE uses multi-core processors (CPUs) and graphics processing units (GPUs) to speed up some algorithms.[3][4]
Key features
The software supports the following features:
- Create, edit, and annotate nucleic acid and protein sequences
- Fast search in a sequence
- Multiple sequence alignment: Clustal W and O, MUSCLE, Kalign, MAFFT, T-Coffee
- Create and use shared storage, e.g., a laboratory database
- Search through online databases: National Center for Biotechnology Information (NCBI), Protein Data Bank (PDB), UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, DAS servers
- Local and NCBI GenBank BLAST search
- Open reading frame finder
- Restriction enzyme finder with integrated REBASE restriction enzyme list[5]
- Integrated Primer3 package[6] for PCR primer design
- Plasmid construction and annotation
- In silico cloning by designing cloning vectors
- Genome mapping of short reads with Bowtie, BWA,[7] and UGENE Genome Aligner
- Visualize next-generation sequencing data (BAM files) using UGENE Assembly Browser
- Variant calling with SAMtools[8]
- RNA-Seq data analysis with the Tuxedo pipeline (TopHat,[9] Cufflinks,[10] etc.)
- ChIP-seq data analysis with the Cistrome pipeline (MACS,[11] CEAS,[12] etc.)
- Raw NGS data processing
- Integration of the HMMER 2 and 3 packages
- Chromatogram viewer
- Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
- Search for direct, inverted, and tandem repeats in DNA sequences
- Local sequence alignment with optimized Smith-Waterman algorithm
- Build (using integrated PHYLIP neighbor joining, MrBayes,[13] or PhyML[14] Maximum Likelihood) and edit phylogenetic trees
- Combine various algorithms into custom workflows with UGENE Workflow Designer
- Contigs assembly with CAP3[15]
- 3D structure viewer for files in Protein Data Bank (PDB) and Molecular Modeling Database (MMDB)[16] formats, anaglyph view support
- Predict protein secondary structure with GOR IV and PSIPRED algorithms
- Construct dot plots for nucleic acid sequences
- mRNA alignment with Spidey[17]
- Search for complex signals with ExpertDiscovery[18]
- Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
- PCR in silico for primer designing and mapping
- SPAdes de novo assembler
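Several of the features above are classic sequence algorithms. As one illustration, here is a textbook Smith-Waterman local alignment with a linear gap penalty, written in Python. It is not UGENE's optimized implementation, and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b: (score, aligned a, aligned b)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # score matrix; row/column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # local alignment: a score never drops below zero
                h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                h[i - 1][j] + gap,  # gap in b
                h[i][j - 1] + gap,  # gap in a
            )
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("xxACGTyy", "zzACGTww")` returns `(8, "ACGT", "ACGT")`: the flanking mismatches are simply left out of a local alignment. This plain version takes O(mn) time and memory; production tools use optimized variants.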
Sequence View
The Sequence View is used to visualize, analyze and modify nucleic acid or protein sequences. Depending on the sequence type and the options selected, the following views can be present in the Sequence View window:
- 3D structure view
- Circular view
- Chromatogram view
- Graphs view: GC content, AG content, and others
- Dot plot view
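A GC-content graph plots the fraction of G and C bases in a window that slides along the sequence. A minimal sketch, assuming a fixed window size and step (both hypothetical parameters here; UGENE's own graph settings may differ):

```python
def gc_content_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, GC fraction) for every full window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    points = []
    # Only windows that fit entirely inside the sequence are reported.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        points.append((start, gc / window))
    return points
```

For example, `gc_content_windows("GGCCAATT", 4, 2)` yields the points `(0, 1.0)`, `(2, 0.5)`, and `(4, 0.0)`, which would be drawn as a falling curve.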
Alignment Editor
The Alignment Editor allows users to work with multiple nucleic acid or protein sequences: aligning them, editing the alignment, analyzing it, storing the consensus sequence, building a phylogenetic tree, and so on.
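One common way to derive a consensus sequence from an alignment is a per-column majority rule. The sketch below assumes that rule, a 50% threshold, and `N` as the placeholder for columns without a clear majority; UGENE's own consensus algorithms may differ.

```python
from collections import Counter


def majority_consensus(rows: list[str], threshold: float = 0.5,
                       unknown: str = "N") -> str:
    """Majority-rule consensus of equal-length aligned sequences."""
    if not rows:
        raise ValueError("empty alignment")
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        # Keep the residue only if it occurs in more than `threshold` of the rows.
        out.append(residue if count / len(rows) > threshold else unknown)
    return "".join(out)
```

For example, `majority_consensus(["ACGT", "ACGA", "ATGT"])` returns `"ACGT"`, while a two-row alignment that disagrees in every column, such as `["AC", "GT"]`, gives `"NN"`.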
Phylogenetic Tree Viewer
The Phylogenetic Tree Viewer helps to visualize and edit phylogenetic trees. It is possible to synchronize a tree with the corresponding multiple alignment used to build the tree.
Assembly Browser
The Assembly Browser project was started in 2010 as an entry for the Illumina iDEA Challenge 2011.[19] The browser allows users to visualize and browse large next-generation sequencing assemblies (up to hundreds of millions of short reads). It supports the SAM,[20] BAM (the binary version of SAM), and ACE formats. Before assembly data can be browsed in UGENE, the input file is automatically converted to a UGENE database file. This approach has pros and cons. On the plus side, it allows users to view the whole assembly, navigate within it, and jump to well-covered regions rapidly. On the minus side, conversion may take time for a large file and requires enough disk space to store the database.
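The trade-off described above (a one-time conversion in exchange for fast navigation) is the usual one for indexed storage. The following sketch illustrates the idea with Python's built-in `sqlite3` module: reads are loaded once into a table with an index on their start position, after which region queries use the index instead of rescanning the input. The schema and function names are hypothetical and do not describe UGENE's actual database format.

```python
import sqlite3
from typing import Iterable


def build_read_index(reads: Iterable[tuple[str, int, int]]) -> sqlite3.Connection:
    """Load (name, start, length) reads into an indexed in-memory table.

    Positions are assumed 0-based; each read covers [start, start + length).
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE reads (name TEXT, pos_start INTEGER, pos_end INTEGER)")
    rows = [(name, start, start + length) for name, start, length in reads]
    db.executemany("INSERT INTO reads VALUES (?, ?, ?)", rows)
    # Remember the longest read so region queries can bound their index range.
    db.execute("CREATE TABLE meta (max_len INTEGER)")
    db.execute("INSERT INTO meta VALUES (?)",
               (max((end - start for _, start, end in rows), default=0),))
    # This one-time index build is the 'conversion' cost paid up front.
    db.execute("CREATE INDEX reads_by_start ON reads (pos_start)")
    db.commit()
    return db


def reads_in_region(db: sqlite3.Connection, region_start: int,
                    region_end: int) -> list[str]:
    """Names of reads overlapping the half-open region [region_start, region_end)."""
    (max_len,) = db.execute("SELECT max_len FROM meta").fetchone()
    # A read can only overlap if it starts at most max_len bases before the
    # region, so the indexed pos_start range limits the rows examined.
    cursor = db.execute(
        "SELECT name FROM reads "
        "WHERE pos_start >= ? AND pos_start < ? AND pos_end > ? "
        "ORDER BY pos_start, name",
        (region_start - max_len, region_end, region_start),
    )
    return [name for (name,) in cursor]
```

For example, after `db = build_read_index([("r1", 0, 10), ("r2", 5, 10), ("r3", 30, 5)])`, the call `reads_in_region(db, 12, 20)` returns `["r2"]`, the only read spanning that window.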
Workflow Designer
UGENE Workflow Designer allows creating and running complex computational workflow schemas.[21]
The distinguishing feature of Workflow Designer, relative to other bioinformatics workflow management systems, is that workflows are executed on a local computer. This helps avoid the data transfer issues that arise when other tools rely on remote file storage and internet connectivity.
The elements that make up a workflow correspond to most of the algorithms integrated into UGENE. Workflow Designer also allows users to create custom workflow elements, which can be based on a command-line tool or a script.
Workflows are stored in a special text format, which allows them to be reused and shared between users.
A workflow can be run using the graphical interface or launched from the command line. The graphical interface also allows controlling the workflow execution, storing the parameters, and so on.
There is an embedded library of workflow samples to convert, filter, and annotate data, with several pipelines to analyze NGS data developed in collaboration with NIH NIAID.[22] A wizard is available for each workflow sample.
Supported biological data formats
- Sequences and annotations: FASTA (.fa), GenBank (.gb), EMBL (.emb), GFF (.gff)
- Multiple sequence alignments: Clustal (.aln), MSF (.msf), Stockholm (.sto), Nexus (.nex)
- 3D structures: PDB (.pdb), MMDB (.prt)[16]
- Chromatograms: ABIF (.abi), SCF (.scf)
- Short reads: Sequence Alignment/Map (SAM) (.sam), binary version of SAM (.bam), ACE (.ace), FASTQ (.fastq)
- Phylogenetic trees: Newick (.nwk), PHYLIP (.phy)
- Other formats: Bairoch (enzymes info), HMM (HMMER profiles), PWM and PFM (position matrices), SNP and VCF4 (genome variations)
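FASTA, the simplest of the sequence formats listed, consists of `>` header lines, each followed by one or more lines of sequence. A minimal Python reader, assuming plain uncompressed text and skipping blank lines (real-world parsers may also handle comment lines and other variants):

```python
import io
from typing import Iterator, TextIO


def parse_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    demo = io.StringIO(">seq1 example\nACGT\nTTGA\n>seq2\nMKV\n")
    for name, sequence in parse_fasta(demo):
        print(name, sequence)
```

Reading lazily with a generator keeps memory use proportional to one record rather than the whole file.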
Release cycle
UGENE is primarily developed by Unipro LLC[23] with headquarters in Akademgorodok of Novosibirsk, Russia. Each iteration lasts about 1–2 months, followed by a new release. Development snapshots may also be downloaded.
The features to include in each release are mostly initiated by users.
See also
- Sequence alignment software
- Bioinformatics
- Computational biology
- List of open source bioinformatics software
References
- ^ Okonechnikov K, Golosova O, Fursov M, the UGENE team (2012). "Unipro UGENE: a unified bioinformatics toolkit". Bioinformatics. 28 (8): 1166–7. doi:10.1093/bioinformatics/bts091. PMID 22368248.
- ^ Fursov, M.; Novikova, O. (2008). "Multitasking software system for DNA analysis" (PDF). Proceedings of the Sixth International Conference on Bioinformatics of Genome Regulation and Structure. 1: 78. ISBN 978-5-91291-005-0.
- ^ Fursov, M. Y.; Oshchepkov, D. Y; Novikova, O. S. (2009). "UGENE: interactive computational schemes for genome analysis" (PDF). Proceedings of the Fifth Moscow International Congress on Biotechnology. 3: 14–15. ISBN 978-5-7237-0372-8.
- ^ Efremov, I. E.; Fursov, M. Y; Danilova, Yu. E. (2009). "UGENE: high performance genome analysis suite". Proceedings of the Fifth Moscow International Congress on Biotechnology. 2: 405–406. ISBN 978-5-7237-0372-8.
- ^ "NEW REBASE HOME". rebase.neb.com. Retrieved 18 October 2019.
- ^ "Primer3 Input (version 0.4.0)". bioinfo.ut.ee. Retrieved 18 October 2019.
- ^ "Burrows-Wheeler Aligner". bio-bwa.sourceforge.net. Retrieved 18 October 2019.
- ^ "SAMtools". samtools.sourceforge.net. Retrieved 18 October 2019.
- ^ "TopHat". ccb.jhu.edu. Retrieved 18 October 2019.
- ^ "IU Webmaster redirect". cufflinks.cbcb.umd.edu. Retrieved 18 October 2019.
- ^ "MACS - Model-based Analysis for ChIP-Seq". liulab.dfci.harvard.edu. Retrieved 18 October 2019.
- ^ "CEAS - Cis-regulatory Element Annotation System". liulab.dfci.harvard.edu. Retrieved 18 October 2019.
- ^ "MrBayes | index". nbisweden.github.io. Retrieved 18 October 2019.
- ^ "ATGC: PhyML". atgc.lirmm.fr. Retrieved 18 October 2019.
- ^ CAP3
- ^ a b "Macromolecular Structures Resource Group". www.ncbi.nlm.nih.gov. Retrieved 18 October 2019.
- ^ "Spidey is superceded [sic] by Splign". www.ncbi.nlm.nih.gov. Retrieved 18 October 2019.
- ^ Vaskin, Y.; Khomicheva, I.; Ignatieva, E.; Vityaev, E. (2012). "ExpertDiscovery and UGENE integrated system for intelligent analysis of regulatory regions of genes". In Silico Biology. 11 (3–4): 97–108. doi:10.3233/ISB-2012-0448. PMID 22935964.
- ^ "Illumina - iDEA Challenge". Archived from the original on 2013-01-26. Retrieved 18 October 2019.
- ^ "SAM" (PDF). Retrieved 18 October 2019.
- ^ Fursov, M. Y.; Varlamov, A. (2009). "UGENE - A practical approach for complex computational analysis in molecular biology" (PDF). Proceedings of the 10th Annual Bioinformatics Open Source Conference: 7.
- ^ "NIH: National Institute of Allergy and Infectious Diseases | Leading research to understand, treat, and prevent infectious, immunologic, and allergic diseases". www.niaid.nih.gov. Retrieved 18 October 2019.
- ^ "УНИПРО, Новосибирский центр информационных технологий. | СОФТ. Разработка, тестирование, реинжиниринг, поддержка ПО" ["UNIPRO, Novosibirsk center of information technologies. | SOFT. Development, testing, reengineering, software support"]. Retrieved 18 October 2019.
External links
- Official website
- Official website, UniPro
- UGENE podcast
- UGENE documentation
- UGENE forum
- Лучший свободный проект России | Журнал Linux Format - все о Linux по-русски ["Russia's best free project | Linux Format magazine: everything about Linux in Russian"]