Genome-Based Drug Target Identification in Human Pathogen Streptococcus gallolyticus

Qureshi, Nosheen Afzal; Bakhtiar, Syeda Marriam; Faheem, Muhammad; Shah, Mohibullah; Bari, Ahmed; Mahmood, Hafiz M.; Sohaib, Muhammad; Mothana, Ramzi A.; Ullah, Riaz; Jamal, Syed Babar

doi:10.3389/fgene.2021.564056

ORIGINAL RESEARCH article

Front. Genet., 25 March 2021

Sec. Computational Genomics

Volume 12 - 2021 | https://doi.org/10.3389/fgene.2021.564056

This article is part of the Research Topic Computational Genomics Approaches Against Antibacterial Drug Resistance

Nosheen Afzal Qureshi¹

Syeda Marriam Bakhtiar¹

Muhammad Faheem²

Mohibullah Shah³

Ahmed Bari⁴

Hafiz M. Mahmood⁵

Muhammad Sohaib⁶

Ramzi A. Mothana⁷

Riaz Ullah⁷*

Syed Babar Jamal²*

¹Department of Bioinformatics and Biosciences, Capital University of Science and Technology, Islamabad, Pakistan
²Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
³Department of Biochemistry, Bahauddin Zakariya University, Multan, Pakistan
⁴Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
⁵Department of Pharmacology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
⁶Department of Soil Science, College of Food and Agriculture Sciences, King Saud University, Riyadh, Saudi Arabia
⁷Department of Pharmacognosy (MAPPRC), College of Pharmacy, King Saud University, Riyadh, Saudi Arabia

Streptococcus gallolyticus (Sg) is an opportunistic Gram-positive, non-motile bacterium that causes infective endocarditis, an inflammation of the inner lining of the heart. Because Sg has acquired resistance to the available antibiotics, there is a dire need to find new therapeutic targets and potent drugs to prevent and treat this disease. In the current study, an in silico approach is used to link genomic data of Sg species with its proteome to identify putative therapeutic targets. A total of 1,138 core proteins were identified using a pan-genomic approach. Subtractive proteomic analysis then identified a set of 18 proteins that are essential for the bacterium and non-homologous to the host (human). Of these 18 proteins, 12 cytoplasmic proteins were selected as potential drug targets. The selected proteins were subjected to molecular docking against drug-like compounds retrieved from the ZINC database, and the top docked compounds with lower binding energies were identified. In this work, we have identified novel drug and vaccine targets against Sg, some of which have already been reported and validated in other species. Owing to this experimental validation, we believe our methodology and results are a significant contribution to drug/vaccine target identification against Sg-caused infective endocarditis.

Introduction

Streptococcus gallolyticus (Sg) is a Gram-positive, non-motile bacterium previously referred to as Streptococcus bovis. It is a phenotypically diverse bacterium belonging to the Lancefield Group D streptococci (Pasquereau-Kotula et al., 2018; Arjun et al., 2020). This bacterium grows in chains or pairs and is non-γ-hemolytic or slightly γ-hemolytic but sometimes shows alpha-hemolytic activity on ovine blood agar plates (Rusniok et al., 2010; Hensler, 2011). Sg is commonly part of the normal microflora, being present in the gastrointestinal tract of approximately 2.5–15% of healthy individuals (Hinse et al., 2011), but it can become an opportunistic pathogen causing various diseases, including infective endocarditis, colon cancer, meningitis, and septicemia.

This opportunistic pathogenesis of Sg depends on genes involved in the production of polysaccharides, glucan mucopolysaccharide (a putative component of the biofilm produced by this species), three types of pili, and a collagen-binding protein (Takamura et al., 2014). These genes provide protection from the host immune system and help in adherence to the epithelial lining of the heart (Rusniok et al., 2010), causing infection and resulting in endocarditis (Millar and Moore, 2004).

Over the last two decades, a significant rise in the incidence of infective endocarditis has been observed worldwide (Tripodi et al., 2005; Marmolin et al., 2016; Shahid et al., 2018; Arregle et al., 2019; Chamat-Hedemand et al., 2020). Per 100,000 population, 2.6–7 cases of endocarditis are reported per year, a significant proportion of which are attributable to streptococcal infections, with an incidence of 17% in North America, 31% in other European countries, 39% in South America, and 32% in the rest of the world (Holland et al., 2016). This disease mostly occurs in elderly patients (Firstenberg, 2016), and the median age of patients is ≥58 (Vilcant and Hai, 2018). The risk of developing Sg endocarditis rises with the consumption of uncooked meat or fresh dairy products, a weakened immune system, a history of hepatic disease, and comorbidities such as diabetes mellitus and rheumatic disorders (Cãruntu et al., 2014).

In the presence of a primary infection, metabolic disorder, or immunocompromised state, Sg can cause endocardial injury. This injury then triggers thrombus formation through the deposition of fibrin and platelets. After thrombus formation, the bacterium enters the bloodstream through the thrombus. Owing to its virulence properties, Sg can enter the bloodstream in a paracellular manner without inducing a major immune response and adheres to the damaged, collagen-rich surface of the cardiac valve (endocardium). Once attached to the endocardium, the bacterium proliferates and forms a biofilm, which inflames the lining of the heart and causes endocarditis (McDonald, 2009; Hensler, 2011).

Antibacterial drugs such as penicillin G combined with gentamicin or streptomycin are the preferred medical treatments for infective endocarditis. Other options include ceftriaxone in combination with gentamicin, and vancomycin in patients allergic to penicillin (Satué-Bartolomé and Alonso-Sanz, 2009). For patients with persistent fever and resistance to medical therapy, an expensive surgical intervention may be needed (Grubitzsch et al., 2016). Sg is resistant to penicillin, and one of the strains of Sg has also been found to be resistant to tetracycline (Hinse et al., 2011). Therefore, an efficient treatment strategy against endocarditis, novel therapeutic targets, and potent drugs are urgently required.

For the rapid identification of such targets, many computational methods have been established, such as core-genome and subtractive genomic approaches, which identify core essential genes that possess no homology with the human genome (Caputo et al., 2019). These approaches have been used in a number of human pathogens such as Corynebacterium diphtheriae (Jamal et al., 2017), Corynebacterium pseudotuberculosis (Tiwari et al., 2014), and Treponema pallidum (Jaiswal et al., 2017). This study was designed to exploit in silico approaches to link Sg species genomic data with its proteome and to identify putative therapeutic targets. This approach can be used to classify potent inhibitors that may contribute to the discovery of compounds that can inhibit pathogenic developments (Jamal et al., 2017). The proteomes from the seven genomes of Sg were compared using a pan-genome approach, from which only those genes present in all the strains of Sg were selected (Hinse et al., 2011). The predicted core genome was then filtered on the basis of essentiality for the bacterium; only 18 proteins were found to be essential, and all of these were non-homologous to the host (human). Of these 18 proteins, 12 cytoplasmic proteins were identified as drug targets. These essential and non-host-homologous protein targets were subjected to virtual screening using a library of 11,993 compounds. The identified putative targets might be used to design peptide vaccines and to suggest novel lead druggable compounds that could bind to the proposed target proteins (Barh et al., 2011; Jamal et al., 2017; Uddin et al., 2019).

Materials and Methods

Genome Selection

In the current study, all strains of Sg for which a complete genome was available were considered for the pan-genome analysis. A total of seven strains of Sg were selected; gene and protein sequences were retrieved from NCBI¹.

Identification of Core Genomes

The core genome of Sg was identified from pan-genome analysis using the EDGAR software (Blom et al., 2016). Only those genes that were common to all the strains of Sg were selected. The selection procedure in EDGAR was as follows: one strain is selected as the reference strain, all remaining strains are compared against it, and the genes common to all strains are retained as the core genome. EDGAR performs these comparisons using the protein Basic Local Alignment Search Tool (BLASTp) with the standard BLOSUM62 scoring matrix and an E-value cutoff of 1 × 10⁻⁵ (Blom et al., 2016).
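The selection rule described above amounts to a set intersection across strains. The following is a minimal sketch in Python, not EDGAR's actual implementation: it assumes ortholog assignments have already been exported as a mapping from strain name to a set of ortholog-group identifiers, and the strain names and identifiers are hypothetical.

```python
from functools import reduce


def core_genome(groups_by_strain):
    """Return the ortholog groups present in every strain (the core genome).

    groups_by_strain maps a strain name to an iterable of ortholog-group IDs.
    This data layout is an assumption made for illustration only.
    """
    if not groups_by_strain:
        return set()
    return reduce(set.intersection, (set(g) for g in groups_by_strain.values()))


# Hypothetical toy input: three strains with made-up ortholog-group IDs.
toy_strains = {
    "strain_A": {"og1", "og2", "og3"},
    "strain_B": {"og1", "og3", "og4"},
    "strain_C": {"og1", "og3"},
}
core = core_genome(toy_strains)
print(sorted(core))
```

Groups missing from even one strain (here `og2` and `og4`) fall into the accessory genome and are excluded from the downstream subtractive steps.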

Identification of Non-host Homologous Proteins

The identified core genome of Sg was then subjected to BLASTp against the human proteome to find the proteins non-homologous to the human host, using the following parameters: E-value = 0.0001, bit score ≥ 100, BLOSUM62 scoring matrix, and identity ≥ 25%. Only those proteins that showed no hit against the human proteome database were selected (Jamal et al., 2017).
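As an illustration of this subtraction step, the sketch below filters BLASTp results in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). How the three thresholds are combined is an assumption here: a hit counts as a host homolog only if it satisfies all of them. The protein identifiers are hypothetical.

```python
def host_homologs(blast_lines, max_evalue=1e-4, min_bits=100.0, min_ident=25.0):
    """Return query IDs with at least one significant hit in the human proteome."""
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])
        evalue = float(fields[10])
        bits = float(fields[11])
        # Assumption: all three thresholds must hold for a hit to count.
        if evalue <= max_evalue and bits >= min_bits and pident >= min_ident:
            hits.add(fields[0])
    return hits


def non_host_homologous(core_ids, blast_lines):
    """Core proteins with no significant human hit."""
    return set(core_ids) - host_homologs(blast_lines)


# Hypothetical toy data: p1 has a strong human hit, p2 a weak one, p3 none.
toy_hits = [
    "p1\thumanX\t45.0\t200\t100\t2\t1\t200\t5\t204\t1e-50\t180",
    "p2\thumanY\t22.0\t80\t60\t1\t1\t80\t3\t82\t0.5\t30.1",
]
print(sorted(non_host_homologous(["p1", "p2", "p3"], toy_hits)))
```

Proteins that never appear in the BLAST output, such as `p3`, are kept automatically because the filter subtracts homologs from the full core set rather than selecting from the hit table.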

Identification of Essential Genes

The non-host-homologous proteins were subjected to BLASTp against the Database of Essential Genes (DEG) with the standard BLOSUM62 scoring matrix, E-value = 0.001, and identity ≥ 25% to find the essential proteins that are indispensable for the survival of the pathogen. The DEG consists of experimentally validated data from eukaryotes, archaea, and prokaryotes, and it covers more than 12,000 essential genes from 31 bacteria (Luo et al., 2014).
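In contrast to the host-subtraction step, a protein is retained here when it does have a hit. The sketch below, under the same assumed 12-column BLAST tabular layout, keeps the best DEG hit (lowest E-value) per query; the identifiers are hypothetical and the tie-breaking rule is an illustrative choice rather than something stated in the study.

```python
def essential_proteins(blast_lines, max_evalue=1e-3, min_ident=25.0):
    """Map each query with a qualifying DEG hit to (best_subject, best_evalue)."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue = float(fields[2]), float(fields[10])
        if evalue > max_evalue or pident < min_ident:
            continue  # hit does not meet the essentiality thresholds
        # Keep the hit with the lowest E-value for each query.
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical toy data: p2 has two qualifying DEG hits, p3 only a weak one.
toy_deg_hits = [
    "p2\tDEG_a\t40.0\t150\t90\t1\t1\t150\t1\t150\t1e-20\t95",
    "p2\tDEG_b\t55.0\t150\t67\t0\t1\t150\t1\t150\t1e-30\t130",
    "p3\tDEG_c\t30.0\t60\t42\t0\t1\t60\t1\t60\t1e-2\t35",
]
essential = essential_proteins(toy_deg_hits)
print(essential)
```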

Drug Target Prioritization

To prioritize potential therapeutic targets, several factors were considered, including molecular weight, molecular function, cellular localization, pathway analysis, and virulence (Agüero et al., 2008). Molecular weight (MW) was determined with the ProtParam tool². Targets with an MW < 100 kDa are considered the best therapeutic targets (Mondal et al., 2015). Molecular functions and biological processes of the target proteins were determined with UniProt³. Subcellular localization of the target proteins was predicted with CELLO⁴. Cellular localization determines the environment in which a protein operates and affects its function by controlling the accessibility and availability of all types of molecular interaction partners. Knowledge of protein localization often plays an important role in characterizing the cellular function of hypothetical and newly discovered proteins (Scott et al., 2005). For pathway analysis, the Kyoto Encyclopedia of Genes and Genomes (KEGG) web tool⁵ was used to determine the role of the protein targets in different cellular and metabolic pathways (Kanehisa and Sato, 2020). To assess the virulence of the protein targets, the Virulence Factor Database (VFDB)⁶ was used.
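The molecular weight criterion can be approximated directly from sequence. The sketch below is not ProtParam itself: it sums standard average residue masses and adds one water molecule, which gives a close estimate for unmodified proteins made of the 20 standard amino acids. The function names are introduced here for illustration.

```python
# Average residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water per chain, for the free N- and C-termini


def molecular_weight_da(seq):
    """Approximate average molecular weight of a protein sequence in daltons."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None


def passes_mw_filter(seq, cutoff_kda=100.0):
    """True if the protein is lighter than the cutoff (100 kDa in the text)."""
    return molecular_weight_da(seq) / 1000.0 < cutoff_kda


print(round(molecular_weight_da("GA"), 2))
```

Ambiguous codes such as `X` or `B` raise an error rather than being silently skipped, since skipping them would underestimate the weight.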

Catalytic Pocket Detection

The shortlisted potential druggable proteins were further screened to detect possible binding pockets by calculating the druggability score using DoGSiteScorer (Volkamer et al., 2012), an automated pocket detection tool that calculates the druggability of protein cavities. This tool requires the protein of interest as a 3D structure; therefore, SwissModel, a web tool for protein structure prediction, was used to predict the 3D structures of the protein targets (Nielsen et al., 2010). After obtaining the 3D structures, the druggability evaluation was performed with DoGSiteScorer. This tool returns the pocket residues and a druggability score, which ranges from 0 to 1. A score closer to 1 indicates a highly druggable protein cavity (Jamal et al., 2017).
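Once pocket scores are available, picking the pocket to carry forward is a simple maximum over the 0 to 1 score. The sketch below assumes the scores have been collected by hand into (pocket name, score) pairs; the pocket names and values are hypothetical, and no DoGSiteScorer API is being called.

```python
def best_pocket(pockets):
    """Return the (name, score) pair with the highest druggability score.

    pockets is a list of (pocket_name, drug_score) tuples with scores in [0, 1].
    """
    if not pockets:
        raise ValueError("no pockets supplied")
    for name, score in pockets:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {name}: {score}")
    return max(pockets, key=lambda p: p[1])


# Hypothetical pockets for a single target protein.
toy_pockets = [("P_0", 0.81), ("P_1", 0.45), ("P_2", 0.62)]
print(best_pocket(toy_pockets))
```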

Retrieval of Ligands

A total of 11,993 druggable molecules with a Tanimoto cutoff level of 60% were retrieved from the ZINC database (Sterling and Irwin, 2015). Partial charges were then calculated, and the energies of these compounds were minimized using an energy minimization algorithm with default parameters. All minimized structures were saved in an .mdb file. These prepared ligands were then used as the input file for molecular docking (Wadood et al., 2014).
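For readers unfamiliar with the Tanimoto cutoff mentioned above, the coefficient between two molecular fingerprints A and B is the size of their intersection divided by the size of their union. The sketch below represents fingerprints as sets of "on" bit positions, which is an illustrative simplification; how ZINC builds its fingerprints and applies the 60% cutoff is not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: indices of bits that are set.
mol_1 = {1, 2, 3}
mol_2 = {2, 3, 4}
print(tanimoto(mol_1, mol_2))  # 2 shared bits out of 4 distinct bits
```

Under a 60% cutoff, this pair, with a similarity of 0.5, would count as sufficiently dissimilar to appear together in a diversity-filtered subset.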

Validation of 3D Structures

The quality of all 3D structures was further validated using the RAMPAGE and ERRAT tools. RAMPAGE performs Ramachandran plot analysis and provides a validity score for the 3D structure of the target proteins; scores ≥80 were considered good (Batut and Gingeras, 2013). For further validation, ERRAT, an online tool that flags poorly modeled regions of a protein structure, was used. A quality factor ≥37% was considered good (Saddala and Adi, 2018).

Preparation of Protein for Docking

The predicted 3D structures were then prepared for docking using the Molecular Operating Environment (MOE) tool, which is robust and uses a meticulous algorithm. It not only predicts the top-ranking poses but also reports the root mean-square deviation (RMSD) along with the calculated energies of the docked molecules (Pagadala et al., 2017). The structures were 3D-protonated and energy-minimized (Vilar et al., 2008), and the minimized structures were then used as templates for molecular docking.

Molecular Docking of Drug Targets

The prepared, minimized structures of the target proteins and ligands were then subjected to molecular docking in MOE using MOE Dock (Figure 1), which predicted the favorable binding poses of the selected ligands in the active sites of the drug targets. Default parameters were used for molecular docking. After docking, we analyzed the best poses for hydrogen bonding and π–π interactions, and RMSD was then calculated in MOE (Wadood et al., 2014). The orientation of the best docked molecules was further analyzed in Chimera.
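The RMSD used to compare poses is the square root of the mean squared distance between matched atoms. The sketch below computes it in plain Python for two poses with the same atom ordering and no superposition step; it illustrates the quantity only and makes no claim about how MOE computes it internally. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean-square deviation between two equally ordered coordinate lists.

    Each list holds (x, y, z) tuples in angstroms; no alignment is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom ligand: the second pose is shifted by 1 angstrom along z.
pose_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(pose_1, pose_2))
```

Because docked poses share the receptor's coordinate frame, skipping superposition is usually the intended behavior when comparing a pose against a reference in the same binding site.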

Figure 1. Complete workflow of drug target identification in Sg using in silico approaches.