This notebook takes the initial designs previously suggested by ART and uses OMG to create the corresponding final isoprenol concentrations. These concentrations, along with the designs will be later used by ART to make predictions and recommend new designs.
Tested using biodesign_3.7 kernel on jprime.lbl.gov server. It requires the cplex library for running the MOMA optimization.
iJO1366_MVA.json
file in the ../data/models
directory)../data/ice_mo_strains.csv
exported from ICE) containing the details of which reactions are either:EDD_experiment_description_file_BE_designs.csv
EDD_isoprenol_production.csv
The files are stored in the user defined directory.
Clone the git repository with the OMG
library:
git clone https://github.com/JBEI/OMG.git
or pull the latest version.
Importing needed libraries:
import sys
import os
sys.path.insert(1, '../../OMG')
sys.path.append('../')
import cobra
import pandas as pd
import omg
from plot_multiomics import *
from tqdm import tqdm
user_params = {
'host': 'ecoli', # ecoli or ropacus
'modelfile': '../data/models/iJO1366_MVA.json',
'cerevisiae_modelfile': '../data/models/iMM904.json',
'timestart': 0.0,
'timestop': 8.0,
'numtimepoints': 9,
'designsfile': 'ice_mo_strains.csv',
'designsfilepath': '../data/',
'mapping_file': '../mapping/inchikey_to_cid.txt',
'output_file_path': '../data/omg_output',
'edd_omics_file_path': '../data/omg_output/edd/',
'numreactions': 8,
'numinstances': 96,
'ext_metabolites': {
'glc__D_e': 22.203,
'nh4_e': 18.695,
'pi_e': 69.454,
'so4_e': 2.0,
'mg2_e': 2.0,
'k_e': 21.883,
'na1_e': 103.7,
'cl_e': 27.25,
'isoprenol_e': 0.0,
'ac_e': 0.0,
'for_e': 0.0,
'lac__D_e': 0.0,
'etoh_e': 0.0
},
'initial_OD': 0.01,
'BIOMASS_REACTION_ID': 'BIOMASS_Ec_iJO1366_core_53p95M'
}
First we obtain the metabolic model:
file_name = user_params['modelfile']
model = cobra.io.load_json_model(file_name)
We now add minimum flux constraints for production of isoprenol and formate, and we limit oxygen intake:
iso = 'EX_isoprenol_e'
iso_cons = model.problem.Constraint(model.reactions.EX_isoprenol_e.flux_expression,lb = 0.20)
model.add_cons_vars(iso_cons)
for_cons = model.problem.Constraint(model.reactions.EX_for_e.flux_expression,lb = 0.10)
model.add_cons_vars(for_cons)
o2_cons = model.problem.Constraint(model.reactions.EX_o2_e.flux_expression,lb = -8.0)
model.add_cons_vars(o2_cons)
And then we constrain several central carbon metabolism fluxes to more realistic upper and lower bounds:
CC_rxn_names = ['ACCOAC','MDH','PTAr','CS','ACACT1r','PPC','PPCK','PFL']
for reaction in CC_rxn_names:
reaction_constraint = model.problem.Constraint(model.reactions.get_by_id(reaction).flux_expression,lb = -1.0,ub = 1.0)
model.add_cons_vars(reaction_constraint)
We also create a similar model with a higher production of isoprenol, which we will use with MOMA to simulate bioengineered strains:
modelHI = model.copy()
iso_cons = modelHI.problem.Constraint(modelHI.reactions.EX_isoprenol_e.flux_expression,lb = 0.25)
modelHI.add_cons_vars(iso_cons)
First create the time grid for simulation:
t0 = user_params['timestart']
tf = user_params['timestop']
points = user_params['numtimepoints']
tspan, delt = np.linspace(t0, tf, points, dtype='float64', retstep=True)
grid = (tspan, delt)
We then use this model to obtain the times series for fluxes, OD and external metabolites:
solution_TS, model_TS, cell, Emets, Erxn2Emet = \
omg.get_flux_time_series(model, user_params['ext_metabolites'], grid, user_params)
We perform the same calculation for the model with higher isoprenol production that we created above:
solutionHI_TS, modelHI_TS, cellHI, EmetsHI, Erxn2EmetHI = \
omg.get_flux_time_series(modelHI, user_params['ext_metabolites'], grid, user_params)
First obtain the file from ICE with suggested designs (i.e. reactions kos and overexpressions):
designs_df = pd.read_csv(f'{user_params["designsfilepath"]}/{user_params["designsfile"]}',
usecols=['Part ID', 'Name', 'Summary'])
designs_df.columns = ['Part ID','Line Name','Line Description']
designs_df2 = designs_df.copy() # make copy for creating EDD experiment description file later
designs_df[:2]
In order to work with the ICE line descriptions we need to change it from its string format into numerical format into a dataframe (see below).
First, let's add columns for each reaction:
reactions = designs_df['Line Description'][0].split('_')[::2]
for rxn in reactions:
designs_df[rxn] = None
And then assign values for each reaction and line
for i in range(len(designs_df)):
if designs_df['Line Name'][i]=='WT':
designs_df.loc[i][reactions] = [1 for r in range(len(reactions))]
else:
values = designs_df.loc[i]['Line Description'].split('_')[1::2]
designs_df.loc[i][reactions] = [float(value) for value in values]
designs_df = designs_df.drop(columns=['Line Description','Part ID'])
The final dataframe involves the line name and numerical multiplier that we will use to simulate the bioengineered strains. Each design (line) involves the modification of up to 8 fluxes (1 -> keep the same; 2-> double flux, 0-> knock reaction out):
designs_df.tail()
We then use MOMA to calculate flux profiles at each time point for the bioengineered strains as indicated by the designs in this data frame (takes around 1 min per design). Instead of using the solution time series corresponding to the initial model, we use the solution time series corresponding to the higher production. The reason is that, otherwise, we would never see an increase in isoprenol production, since MOMA minimizes the changes in flux by design. Our goal here is just to create varied flux profiles that ART can learn from.
This is a long calculation (~1.5 hrs):
%%time
solutionsMOMA_TS = {}
cols = ['Line Name']
cols.extend(reactions)
if user_params['numinstances'] not in [None, 0]:
num_strains = user_params['numinstances']
else:
num_strains = designs_df.shape[0]
for i in tqdm(range(num_strains)): # Added counter bar here
design = designs_df[cols].loc[i]
if design['Line Name']=='WT':
solutionsMOMA_TS[i] = omg.getBEFluxes(model_TS, design, solution_TS, grid)
else:
solutionsMOMA_TS[i] = omg.getBEFluxes(model_TS, design, solutionHI_TS, grid)
As a sanity check, we can verify that the knocked out fluxes are zero:
i = 0
print(designs_df.loc[i,:], '\n')
for rxn in ['CS','PPC','PPCK','PFL']:
print(f'{rxn}: {solutionsMOMA_TS[i][5].fluxes[rxn]}')
Here we use the integrate_fluxes
function in OMG to produce the external metabolite concentrations which are the consequence of the calculated fluxes:
cellsEmetsBE = {}
for i in range(num_strains):
cell, Emets = omg.integrate_fluxes(solutionsMOMA_TS[i], model_TS, user_params['ext_metabolites'], grid, user_params)
cellsEmetsBE[i] = (cell, Emets)
We can check we obtain the same result with this function for the wild type as we did before in notebook A:
cellWT, EmetsWT = omg.integrate_fluxes(solution_TS, model_TS, user_params['ext_metabolites'], grid, user_params)
EmetsWT
plot_DO_extmets(cellWT, EmetsWT[['glc__D_e','isoprenol_e','ac_e','for_e','lac__D_e','etoh_e']])
And compare these growth and production profiles with any other bioengineered strain:
i = 2
cellBE, EmetsBE = cellsEmetsBE[i]
plot_DO_extmets(cellBE, EmetsBE[['glc__D_e','isoprenol_e','ac_e','for_e','lac__D_e','etoh_e']])
EmetsBE
Firstly, let's collect all isoprenol production values in a single list:
production = []
for i in range(user_params['numinstances']):
cell, Emets = cellsEmetsBE[i]
production.append(Emets.loc[user_params['numtimepoints'],'isoprenol_e'])
Then, let's create a new data frame and append the production values for each strain/line:
production_df = designs_df.copy()
production_df['Isoprenol'] = pd.Series(production)
production_df.loc[0:2,:]
The maximum production is higher than for the original (WT) strain (0.462):
np.max(production_df['Isoprenol'])
Remove not needed columns:
production_edd_df = production_df.drop(columns=reactions).copy()
Rename isoprenol column:
isoprenol_cid = 'CID:12988'
production_edd_df = production_edd_df.rename(columns={'Isoprenol': isoprenol_cid})
Pivot the dataframe for EDD format:
production_edd_df = production_edd_df.set_index('Line Name').stack().reset_index()
production_edd_df.columns = ['Line Name', 'Measurement Type', 'Value']
Add Time and Units columns:
production_edd_df['Time'] = 9.0
production_edd_df['Units'] = 'mM'
production_edd_df.head()
Save the dataframe as csv:
production_file_name = f'{user_params["edd_omics_file_path"]}/EDD_isoprenol_production.csv'
production_edd_df.to_csv(production_file_name, index=False)
We then create the EDD_experiment_description_file_BE_designs.csv
file for the import of data into EDD:
experiment_description_file_name = f'{user_params["edd_omics_file_path"]}/EDD_experiment_description_file_BE_designs.csv'
with open(experiment_description_file_name, 'w') as fh:
fh.write('Part ID, Line Name, Line Description, Media, Shaking Speed, Starting OD, Culture Volume, Flask Volume, Growth Temperature, Replicate Count\n')
for i in range(len(designs_df2)):
fh.write(f"{designs_df2.loc[i]['Part ID']}, \
{designs_df2.loc[i]['Line Name']}, \
{designs_df2.loc[i]['Line Description']}, \
M9, 1, 0.1, 50, 200, 30, 1\n")