This notebook takes the machine learning predictions from the previous notebook and compares them with the ground truth provided by OMG.
Tested using biodesign_3.7 kernel on jprime.lbl.gov. It requires the cplex library for running the MOMA optimization.
A modified E. coli model with the isoprenol pathway added to it (iJO1366_MVA.json
file in the ../data/models
directory)
A set of recommended designs provided by ART (../data/art_output/ARTrecommendations.csv
)
../data/ART_recommendations_with_production.csv
).../data/ART_prediction_vs_actual_recommendation.png
).EDD_experiment_description_file_BT.csv
EDD_OD_BT.csv
EDD_external_metabolites_BT.csv
EDD_transcriptomics_BT.csv
EDD_proteomics_BTSM.csv
EDD_metabolomics_BTSM.csv
EDD_fluxomics_BT.csv
Clone the git repository with the OMG
library:
git clone https://github.com/JBEI/OMG.git
or pull the latest version.
import sys
import os
sys.path.insert(1, '../../OMG')
sys.path.append('../')
import cobra
import pandas as pd
import omg
from plot_multiomics import *
from tqdm import tqdm
user_params = {
'host': 'ecoli', # ecoli or ropacus
'modelfile': '../data/models/iJO1366_MVA.json',
'cerevisiae_modelfile': '../data/models/iMM904.json',
'timestart': 0.0,
'timestop': 8.0,
'numtimepoints': 9,
'designsfile': 'ARTrecommendations.csv',
'designsfilepath': '../data/art_output',
'mapping_file': '../mapping/inchikey_to_cid.txt',
'output_file_path': 'data/omg_output',
'edd_omics_file_path': '../data/omg_output/edd/',
'numreactions': 8,
'numinstances': 10,
'ext_metabolites': {
'glc__D_e': 22.203,
'nh4_e': 18.695,
'pi_e': 69.454,
'so4_e': 2.0,
'mg2_e': 2.0,
'k_e': 21.883,
'na1_e': 103.7,
'cl_e': 27.25,
'isoprenol_e': 0.0,
'ac_e': 0.0,
'for_e': 0.0,
'lac__D_e': 0.0,
'etoh_e': 0.0
},
'initial_OD': 0.01,
'BIOMASS_REACTION_ID': 'BIOMASS_Ec_iJO1366_core_53p95M'
}
We proceed in the same way as in Notebook A
First we obtain the metabolic model:
file_name = user_params['modelfile']
model = cobra.io.load_json_model(file_name)
We now add minimum flux constraints for production of isoprenol and formate, and we limit oxygen intake:
iso = 'EX_isoprenol_e'
iso_cons = model.problem.Constraint(model.reactions.EX_isoprenol_e.flux_expression,lb = 0.20)
model.add_cons_vars(iso_cons)
for_cons = model.problem.Constraint(model.reactions.EX_for_e.flux_expression,lb = 0.10)
model.add_cons_vars(for_cons)
o2_cons = model.problem.Constraint(model.reactions.EX_o2_e.flux_expression,lb = -8.0)
model.add_cons_vars(o2_cons)
And then we constrain several central carbon metabolism fluxes to more realistic upper and lower bounds:
CC_rxn_names = ['ACCOAC','MDH','PTAr','CS','ACACT1r','PPC','PPCK','PFL']
for reaction in CC_rxn_names:
reaction_constraint = model.problem.Constraint(model.reactions.get_by_id(reaction).flux_expression,lb = -1.0,ub = 1.0)
model.add_cons_vars(reaction_constraint)
We also create a similar model with a higher production of isoprenol, which we will use with MOMA to simulate bioengineered strains:
modelHI = model.copy()
iso_cons = modelHI.problem.Constraint(modelHI.reactions.EX_isoprenol_e.flux_expression,lb = 0.25)
modelHI.add_cons_vars(iso_cons)
We will need this to create time series for bioenginered strains. First, we create first time grid for simulation:
t0 = user_params['timestart']
tf = user_params['timestop']
points = user_params['numtimepoints']
tspan, delt = np.linspace(t0, tf, points, dtype='float64', retstep=True)
grid = (tspan, delt)
We then use this model to obtain the times series for fluxes, OD and external metabolites:
solution_TS, model_TS, cell, Emets, Erxn2Emet = \
omg.get_flux_time_series(model, user_params['ext_metabolites'], grid, user_params)
We perform the same calculation for the model with higher isoprenol production that we created above:
solutionHI_TS, modelHI_TS, cellHI, EmetsHI, Erxn2EmetHI = \
omg.get_flux_time_series(modelHI, user_params['ext_metabolites'], grid, user_params)
Read the file from ART with suggested designs (i.e. reactions kos and overexpressions):
rec_df = pd.read_csv(f'{user_params["designsfilepath"]}/{user_params["designsfile"]}')
rec_df.head()
We then use MOMA to calculate flux profiles at each time point for the recommended strains. Instead of using the solution time series corresponding to the initial model, we use the solution time series corresponding to the higher production. The reason is that, otherwise, we would never see an increase in isoprenol production, since MOMA minimizes the changes in flux by design. Remember that our goal here is just to create realistic flux profiles that can be used to showcase ART. This approach is good enough for that purpose:
%%time
solutionsMOMA_TS = {}
cols = rec_df.columns[:-2]
if user_params['numinstances'] not in [None, 0]:
num_strains = user_params['numinstances']
else:
num_strains = rec_df.shape[0]
for i in tqdm(range(num_strains)):
design = rec_df[cols].loc[i]
if design['Line Name']=='WT':
solutionsMOMA_TS[i] = omg.getBEFluxes(model_TS, design, solution_TS, grid)
else:
solutionsMOMA_TS[i] = omg.getBEFluxes(model_TS, design, solutionHI_TS, grid)
Here we use the integrate_fluxes
function to produce the external metabolite concentrations compatible with the calculated fluxes:
cellsEmetsBE = {}
for i in range(num_strains):
cell, Emets = omg.integrate_fluxes(solutionsMOMA_TS[i], model_TS, user_params['ext_metabolites'], grid, user_params)
cellsEmetsBE[i] = (cell, Emets)
We can visualize the obtained concentrations for a recommendation:
i = 2
cellBE, EmetsBE = cellsEmetsBE[i]
plot_DO_extmets(cellBE, EmetsBE[['glc__D_e','isoprenol_e','ac_e','for_e','lac__D_e','etoh_e']])
EmetsBE
Collect all isoprenol production values in a single list first:
production = []
for i in range(user_params['numinstances']):
cell, Emets = cellsEmetsBE[i]
production.append(Emets.loc[9,'isoprenol_e'])
production_df = rec_df.copy()
production_df['Actual Isoprenol [mM]'] = production.copy()
production_df.loc[0:2,:]
pred_vs_actual(production_df)
df_filename = '../data/ART_recommendations_with_production.csv'
production_df.to_csv(df_filename)
We will generate omics data for the best recommendation, and store it in EDD. Firstly, we obtain the multiomics data for each time point:
proteomics_timeseries = {}
transcriptomics_timeseries = {}
metabolomics_timeseries = {}
metabolomics_oldids_timeseries = {}
fluxomics_timeseries = {}
for t in tspan:
fluxomics_timeseries[t] = solutionsMOMA_TS[0][t].fluxes.to_dict()
(proteomics_timeseries[t], transcriptomics_timeseries[t],
metabolomics_timeseries[t], metabolomics_oldids_timeseries[t]) = omg.get_multiomics(model,
solutionsMOMA_TS[0][t],
user_params['mapping_file'],
old_ids=True)
Write experiment description:
omg.write_experiment_description_file(user_params['edd_omics_file_path'], line_name='Strain 97', label='_BT')
Write OD data:
omg.write_OD_data(cell, user_params['edd_omics_file_path'], line_name='Strain 97', label='_BT')
Write external metablites:
omg.write_external_metabolite(Emets, user_params['edd_omics_file_path'], line_name='Strain 97', label='_BT')
Write omics data:
omg.write_omics_files(fluxomics_timeseries, 'fluxomics', user_params, line_name='Strain 97', label='_BT')
omg.write_omics_files(proteomics_timeseries, 'proteomics', user_params, line_name='Strain 97', label='_BT')
omg.write_omics_files(transcriptomics_timeseries, 'transcriptomics', user_params, line_name='Strain 97', label='_BT')
omg.write_omics_files(metabolomics_timeseries, 'metabolomics', user_params, line_name='Strain 97', label='_BT')
We will also write a small version of the multiomics data with a subset of proteins, transcripts and metabolites:
genesSM = ['b0180','b2708','b3197','b1094','b2224','b3256','b2316','b3255','b0185','b1101']
proteinsSM = ['P17115','P45395','P0A6A8','P76461','P77580','P24182','P0A9Q5','P0ABD5','P77580','P00893']
metabolitesSM = ['CID:1549101','CID:175','CID:164533','CID:15938965','CID:21604863','CID:15939608','CID:27284','CID:1038','CID:16741146','CID:1778309']
transcriptomics_timeseriesSM ={}
proteomics_timeseriesSM ={}
metabolomics_timeseriesSM ={}
for t in tspan:
transcriptomics_timeseriesSM[t] = {gene: transcriptomics_timeseries[t][gene] for gene in genesSM}
proteomics_timeseriesSM[t] = {protein: proteomics_timeseries[t][protein] for protein in proteinsSM}
metabolomics_timeseriesSM[t] = {metab: metabolomics_timeseries[t][metab] for metab in metabolitesSM}
omg.write_omics_files(proteomics_timeseriesSM, 'proteomics' , user_params, line_name='Strain 97', label='_BTSM')
omg.write_omics_files(transcriptomics_timeseriesSM,'transcriptomics', user_params, line_name='Strain 97', label='_BTSM')
omg.write_omics_files(metabolomics_timeseriesSM, 'metabolomics' , user_params, line_name='Strain 97', label='_BTSM')