This notebook uses the designs created by ART and the corresponding isoprenol production levels generated through OMG to build a predictive model with ART. The model predicts isoprenol production given a design as input. We then use this model to recommend designs that optimize isoprenol production.
Tested using ART_v3.6 kernel on jprime.lbl.gov
ART_training_EDDstyle.csv, a file for ART front end import
ARTrecommendations.csv, a file containing the ART recommendations
from edd_utils import login, export_study
First let's decide which study we want to get, from which EDD server, and using which login name:
study_slug = 'multiomics-be-strains-data-089b'
edd_server = 'public-edd.agilebiofoundry.org'
user = 'tradivojevic'
Export the EDD study that contains the isoprenol production data using the edd-utils package (use your own username for EDD):
session = login(edd_server=edd_server, user=user)
df = export_study(session, study_slug, edd_server=edd_server)
df.head()
Keep only the necessary columns:
df = df[['Line Name','Line Description','Measurement Type', 'Value']]
df.head()
Add columns for each reaction:
reactions = df['Line Description'][0].split('_')[::2]
for rxn in reactions:
    df[rxn] = None
df.tail()
And assign values for each reaction and line:
for i in range(len(df)):
    if df['Line Name'][i] == 'WT':
        # Wild type: every flux is kept the same
        for r in range(len(reactions)):
            df.iloc[i, 4 + r] = float(1)
    else:
        # The description alternates reaction names and values; take the values
        values = df.loc[i]['Line Description'].split('_')[1::2]
        for r, value in zip(range(len(reactions)), values):
            df.iloc[i, 4 + r] = float(value)
df = df.drop(columns='Line Description')
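The loop above relies on the reaction columns starting at position 4. As a minimal sketch of the same parsing step without positional indexing, the example below builds the reaction columns by name. It runs on made-up toy data: the line names, the descriptions, and the helper `parse_description` are illustrative assumptions, not part of the EDD study or of edd-utils. It also assumes that descriptions alternate reaction names and values separated by underscores, as the slicing above implies.

```python
import pandas as pd

# Toy data; names and descriptions are made up for illustration.
toy = pd.DataFrame({
    'Line Name': ['Strain 1', 'Strain 2', 'WT'],
    'Line Description': ['CS_2.0_PPCK_0.0', 'CS_1.0_PPCK_2.0', 'Wild type'],
})

def parse_description(description, reactions):
    """Hypothetical helper: map 'RXN_value_RXN_value...' to {reaction: value}."""
    tokens = description.split('_')
    parsed = dict(zip(tokens[::2], (float(v) for v in tokens[1::2])))
    return {rxn: parsed[rxn] for rxn in reactions}

# Reaction names come from the first description, as in the notebook.
reactions = toy['Line Description'][0].split('_')[::2]
is_wt = toy['Line Name'] == 'WT'

# Wild type keeps every flux unchanged (1.0); other lines are parsed.
rows = [
    dict.fromkeys(reactions, 1.0) if wt else parse_description(desc, reactions)
    for desc, wt in zip(toy['Line Description'], is_wt)
]
parsed = pd.concat(
    [toy[['Line Name']], pd.DataFrame(rows, index=toy.index)], axis=1
)
```

Looking columns up by reaction name means the result does not change if columns are added or reordered before this step.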
Each design (line) modifies up to 8 fluxes (1: keep the same; 2: double the flux; 0: knock the reaction out):
df.tail()
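To get a feel for how large the design space is, the sketch below enumerates every combination of the three levels. It assumes exactly 8 reactions, each taking one of the levels 0, 1, or 2; the reaction labels are placeholders, since the notebook reads the real names from the line descriptions.

```python
from itertools import product

# Placeholder labels; the real reaction names come from 'Line Description'.
reactions = [f'rxn_{i}' for i in range(1, 9)]
levels = (0, 1, 2)  # 0: knock out, 1: keep the same, 2: double the flux

# Full factorial design space: one tuple of levels per candidate design.
designs = list(product(levels, repeat=len(reactions)))
n_designs = len(designs)  # 3**8 under the assumptions above

# The wild type corresponds to leaving every flux unchanged.
wild_type = tuple(1 for _ in reactions)
```

Under these assumptions the training set covers only a small part of the possible designs, which is why a predictive model is used to choose what to build next.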
How many designs improve production over the wild type?
num_improved_production = len(df[df['Value'] > df.loc[95]['Value']])
print(f'{num_improved_production} designs out of {len(df)} improve production of isoprenol ({num_improved_production/len(df)*100:.2f}%).')
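The comparison above reads the wild-type value from row label 95, which only works if the WT line sits at that position in the export. A minimal sketch of a lookup by line name instead is shown below on toy values (the numbers are made up and are not results from the study).

```python
import pandas as pd

# Toy production values, for illustration only.
toy = pd.DataFrame({
    'Line Name': ['Strain 1', 'Strain 2', 'Strain 3', 'WT'],
    'Value': [0.2, 0.7, 0.5, 0.4],
})

# Find the wild-type production by name rather than by row position.
wt_value = toy.loc[toy['Line Name'] == 'WT', 'Value'].iloc[0]

# Designs that produce strictly more than the wild type.
improved = toy[toy['Value'] > wt_value]
fraction = len(improved) / len(toy)
```

As in the notebook, the denominator counts all lines, including the wild type itself.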
Rename the Value column to the formal metabolite name:
production_name = df['Measurement Type'][0]
df = df.rename(columns={'Value': production_name})
df = df.drop(columns='Measurement Type')
df.tail()
df[df[production_name] > df.loc[95][production_name]]
Pivot the dataframe back to EDD format, now including all the reaction names and modifications:
df = df.set_index('Line Name').stack().reset_index()
df.columns = ['Line Name', 'Measurement Type', 'Value']
df.head()
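The pivot above turns one row per line into one row per (line, measurement) pair. The sketch below shows the same three calls on a small toy table; the values and the shortened column name 'Isoprenol' are made up for illustration.

```python
import pandas as pd

# Toy wide table: one row per line, one column per measurement or reaction.
wide = pd.DataFrame({
    'Line Name': ['Strain 1', 'WT'],
    'Isoprenol': [0.7, 0.4],
    'CS': [2.0, 1.0],
    'PPCK': [0.0, 1.0],
})

# stack() moves the column labels into an inner index level, giving one
# value per (line, column) pair; reset_index() turns both levels into columns.
long = wide.set_index('Line Name').stack().reset_index()
long.columns = ['Line Name', 'Measurement Type', 'Value']
```

Each line contributes its columns in their original order, so the response variable stays first within each line. The later `variables[0]` and `variables[1:]` selections depend on that ordering.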
Save this dataframe to a file for ART front end:
data_file = '../data/ART_training_EDDstyle.csv'
df.to_csv(data_file, header=True, index=False)
Store the names of all variables:
variables = df['Measurement Type'][df['Line Name']=='Strain 1'].tolist()
The first step is to make sure the ART library is available in your kernel (ART_v3.6 has all the necessary dependencies). Clone the corresponding git repository:
git clone https://github.com/JBEI/AutomatedRecommendationTool.git
(Information about licensing ART is available at https://github.com/JBEI/ART.)
We can then add the library to the path and do the necessary imports:
import os  # used below to build the path to the saved model
import sys
sys.path.append('../../AutomatedRecommendationTool')
from art.core import *
import pickle
And then define some ART input parameters:
user_params = {}
user_params['num_recommendations'] = 10 # Number of final recommendations
user_params['output_directory'] = '../data/art_output/' # Directory to store output files
We then create a dictionary that contains the settings for ART:
art_params = {
    'response_var': [variables[0]],
    'input_var': variables[1:],
    'input_var_type': 'Categorical',
    'seed': 10,
    'num_recommendations': user_params['num_recommendations'],
    'cross_val': True,
    'output_directory': user_params['output_directory']
}
With these settings, you can now run ART. However, this takes around 25 minutes, so you can instead set run_art to False and load the previously run model, which is much faster:
run_art = True
The following cell will generate plots of the (cross-validated) predictions vs. observations, to gauge the quality of the predictions; a plot of the predicted distribution for all recommendations; and a plot of the success probability (for improving the current best production) vs. the number of recommended strains engineered.
%%time
if run_art:
    art = RecommendationEngine(df, **art_params)
else:
    with open(os.path.join(art_params['output_directory'], 'art.pkl'), 'rb') as output:
        art = pickle.load(output)
utils.save_pkl_object(art)
art.recommendations
It turns out that all recommendations indicate that the CS and ACACT1r reaction fluxes should double and PPCK should be knocked out.
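An observation like this can be checked programmatically by looking for input columns that take a single value across all recommendations. The sketch below uses a toy table in place of `art.recommendations`; the rows and the column `other_rxn` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for art.recommendations; values are illustrative only.
recs = pd.DataFrame({
    'CS': [2, 2, 2],
    'ACACT1r': [2, 2, 2],
    'PPCK': [0, 0, 0],
    'other_rxn': [1, 2, 0],  # hypothetical reaction that varies
})

# Columns with one unique value are shared by every recommendation.
constant = {
    col: recs[col].iloc[0]
    for col in recs.columns
    if recs[col].nunique() == 1
}
```

Here `constant` maps each shared reaction to its level, while the varying column is left out.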
Finally we save the recommendations, along with the predicted production levels. We will compare them with the ground truth provided by OMG in the next notebook.
First, we change the last column name to indicate it is predicted:
pred_col_name = 'Mean predicted Isoprenol [mM]'
art.recommendations = art.recommendations.rename(columns={art_params['response_var'][0]: pred_col_name})
Then, we add standard deviation predictions for the recommendations:
pp_rec_mean, pp_rec_std = art.post_pred_stats(art.recommendations.values[:,:-1])
art.recommendations['SD Isoprenol [mM]'] = pp_rec_std.copy()
We assign Line Name to each of the recommendations:
n_instances = len(set(df['Line Name']))
art.recommendations.insert(loc=0, column='Line Name', value=['Strain ' + str(n_instances+i) for i in range(1,art_params['num_recommendations']+1)])
art.recommendations.head()
And finally save it in a file:
rec_filename = f'{art.outDir}/ARTrecommendations.csv'
art.recommendations.to_csv(rec_filename, header=True, index=False)