MAV-seq: Platform for the NGS Data Workflow Management and Automation
The Jackson Laboratory, Genomic Medicine, United States
The growing volume of heterogeneous genomic data generated today calls for a robust platform that addresses the practical issues of genomic data interoperability, structure, standardization, security, quality, pre-processing, governance, and long-term support, as well as the exponential growth of genomic applications and of their large and diverse datasets. To address most of these challenges for the genomic big data community, we present MAV-seq (Ahmed et al., 2016), a scientific platform for the automated management and processing of Next Generation Sequencing (NGS) data.
MAV-seq (Management, Analysis, and Visualization of Sequence data) is an interactive, user-friendly, cross-platform, secure, encrypted, automated, customizable, centralized, multi-role database application for managing sample repertoires and automating the pre-processing of epigenomic and transcriptomic data. It supports:
• Study data management
• Experiments & projects data management
• Centralized sample metadata management
• Centralized NGS data management
• Automation of NGS data quality checking
• Automation of NGS data pre-processing
• GUI-based access to the data clusters for NGS data transfer and management
• Customized data export and sharing
• Efficient data linking, tracking, querying and searching
• Extraction, classification and loading of data from different formats
• User data and control management
• Data security and encryption
• Event management and logging
• Centralized and modular data administration
• Privatization and globalization of data
MAV-seq (Figure 1) is a secure database management system that addresses security threats including privilege abuse, weak authentication, weak system configuration, and backup, front-end, and back-end system vulnerabilities. It applies several data encryption algorithms to encode data and grants users controlled access to the system based on their roles and privileges. It provides easy-to-use interfaces for raw data management, operational data management, user data management, and analysis of genomic data, including classification, tracking, processing, querying, and visualization.
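To illustrate what access control based on roles and privileges can look like, here is a minimal Python sketch. It is not MAV-seq's implementation: the role names, the privilege names, and the `is_allowed` helper are hypothetical and chosen only for illustration.

```python
from enum import Enum, auto


class Privilege(Enum):
    """Hypothetical privileges; the real MAV-seq privilege set is not described here."""
    READ_SAMPLES = auto()
    EDIT_SAMPLES = auto()
    RUN_PIPELINES = auto()
    MANAGE_USERS = auto()


# Hypothetical role-to-privilege table (assumption, not taken from MAV-seq).
ROLE_PRIVILEGES = {
    "viewer": frozenset({Privilege.READ_SAMPLES}),
    "analyst": frozenset({Privilege.READ_SAMPLES, Privilege.RUN_PIPELINES}),
    "admin": frozenset(Privilege),  # every defined privilege
}


def is_allowed(role: str, privilege: Privilege) -> bool:
    """Return True only if the role is known and holds the privilege."""
    return privilege in ROLE_PRIVILEGES.get(role, frozenset())
```

Unknown roles fall back to an empty privilege set, so access is denied by default rather than granted by omission.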
MAV-seq is a product line application developed following bioinformatics methods, software engineering principles, the Butterfly paradigm (Ahmed et al., 2014), human-computer interaction guidelines, and big data analytics. It integrates various genomic data quality-check and pre-processing pipelines (e.g. ATAC-seq, ChIP-seq, mRNA-seq, tRNA-seq, WES, WGS) behind a user-friendly graphical interface, enabling biologists with no programming experience to process their NGS datasets. It requires the Java Runtime Environment on the user's operating system (e.g. Windows, Mac OS X); all integrated applications, along with the reference genome used for mapping, must be downloaded and installed on the data cluster.
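One way to picture the integration of several assay-specific pipelines behind a single entry point is a registry keyed by assay type. The Python sketch below assumes a generic two-step flow (quality check, then pre-processing); the function names, the registry, and the returned status strings are hypothetical placeholders and do not describe the tools MAV-seq actually invokes.

```python
from typing import Callable, Dict, List

# A step takes a sample identifier and returns a status message.
PipelineStep = Callable[[str], str]


def quality_check(sample_id: str) -> str:
    # Placeholder: a real step would launch a quality-check tool on the cluster.
    return f"{sample_id}: quality check queued"


def pre_process(sample_id: str) -> str:
    # Placeholder: a real step would launch assay-specific pre-processing.
    return f"{sample_id}: pre-processing queued"


# Assay types named in the abstract.
SUPPORTED_ASSAYS = ("ATAC-seq", "ChIP-seq", "mRNA-seq", "tRNA-seq", "WES", "WGS")

# Hypothetical registry: every assay shares the same generic steps in this sketch.
PIPELINES: Dict[str, List[PipelineStep]] = {
    assay: [quality_check, pre_process] for assay in SUPPORTED_ASSAYS
}


def run_pipeline(assay: str, sample_id: str) -> List[str]:
    """Run the registered steps for an assay in order and collect their messages."""
    if assay not in PIPELINES:
        raise ValueError(f"Unsupported assay type: {assay}")
    return [step(sample_id) for step in PIPELINES[assay]]
```

A graphical front end could then call `run_pipeline` with the assay and sample selected by the user, so no scripting is needed on the user's side.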
With this platform, we aim to simplify the management and storage of NGS datasets, including the standardization and automation of quality control and basic processing steps. MAV-seq is simple and easy to learn, and does not require bioinformatics or programming skills.
Acknowledgements
We acknowledge The Jackson Laboratory for Genomic Medicine for the financial support and ownership of this research and development.
References
1. Ahmed, Z., Bolisetty, M., Saman, Z., Anguiano, E., Ucar, D. (2016) MAV-seq: An interactive platform for the Management, Analysis, and Visualization of Sequence Data. In: Proceedings of the Human Genome Meeting, USA, 2016.
2. Ahmed, Z., Saman, Z., Dandekar, T. (2014) Developing sustainable software solutions for bioinformatics by the “Butterfly” paradigm. F1000Research, 3, 71.
Keywords:
data management, data processing, Genomics and genetics, NGS applications, MAV-seq
Conference:
Neuroinformatics 2016, Reading, United Kingdom, 3 Sep - 4 Sep, 2016.
Presentation Type:
Demo
Topic:
Genomics and genetics
Citation:
Ahmed Z (2016). MAV-seq: Platform for the NGS Data Workflow Management and Automation. Front. Neuroinform. Conference Abstract: Neuroinformatics 2016. doi: 10.3389/conf.fninf.2016.20.00096
Copyright:
The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received:
14 Jul 2016;
Published Online:
01 Sep 2016.
Correspondence:
Dr. Zeeshan Ahmed, The Jackson Laboratory, Genomic Medicine, Farmington, CT, 06032, United States, zahmed@ifh.rutgers.edu