This notebook designs the experiment needed to feed ART's predictive capabilities. We use ART to provide suggested designs (flux modifications) for which to get phenotypic data. These designs and phenotypic data will be used later to predict new designs.
Tested using ART_v3.6 kernel on jprime.lbl.gov
Output: ICE_MOstrains.csv (to be used for ICE import)
Clone the git repository with the ART library:
git clone https://github.com/JBEI/AutomatedRecommendationTool.git
or pull the latest version.
Information about licensing ART is available at https://github.com/JBEI/ART.
Importing needed libraries:
import sys
import numpy as np  # used below for np.floor
sys.path.append('../../AutomatedRecommendationTool') # Make sure this is the location for the ART library
sys.path.append('../')
from art.core import *
from plot_multiomics import plot_distribution_of_designs
These are the reactions we consider as genetic engineering targets for isoprenol production:
user_params = {
    'reactions': [
        'ACCOAC',
        'MDH',
        'PTAr',
        'CS',
        'ACACT1r',
        'PPC',
        'PPCK',
        'PFL',
    ]
}
n_reactions = len(user_params['reactions'])
We consider three possible genetic modifications for each reaction, knockout (KO), no modification (NoMod), and up-regulation (UP), and we describe them through numerical categories:
user_params['modif_dict'] = {
    'KO': 0,
    'NoMod': 1,
    'UP': 2
}
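To read a numerical design back as modification labels, the dictionary can be inverted. The following is a minimal sketch and not part of the original workflow: the names `label_of`, `design`, and `decoded` are hypothetical, and the example design is made up for illustration.

```python
# Same category codes as in user_params['modif_dict'].
modif_dict = {'KO': 0, 'NoMod': 1, 'UP': 2}

# Invert the mapping so numeric categories can be turned back into labels.
label_of = {code: name for name, code in modif_dict.items()}

reactions = ['ACCOAC', 'MDH', 'PTAr']  # subset of the reactions, for brevity
design = [0, 1, 2]                     # made-up design, not an ART recommendation

decoded = {reaction: label_of[code] for reaction, code in zip(reactions, design)}
print(decoded)  # {'ACCOAC': 'KO', 'MDH': 'NoMod', 'PTAr': 'UP'}
```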
Here we specify how many instances (designs) we want to create (change as desired):
user_params['n_instances'] = 96
And we define the path and name for the output file:
user_params['designs_ice_file'] = '../data/art_output/ICE_MOstrains.csv'
Here we calculate what percentage of all possible designs this initial set covers:
n_modifications = len(user_params['modif_dict'])
tot_number_mod = n_modifications**n_reactions
print(f"Total number of possible designs: {tot_number_mod}")
trainingset = user_params['n_instances'] / tot_number_mod * 100
print(f"Training set size: {user_params['n_instances']} ({trainingset:.2f}% of the total)")
Define a dictionary that contains the settings that ART will use to find the recommended designs:
art_params = {
    'input_var': user_params['reactions'],  # input variables, i.e. features
    'num_recommendations': user_params['n_instances'] - 1,  # the remaining design will be the wild type
    'initial_cycle': True,  # set this to True for initial design recommendations
    'seed': 10,  # seed for the random number generator
    'output_directory': '../data/art_output/initial_designs'  # directory to store this output
}
Since the current version of ART works only with continuous variables, we first find recommended designs in the interval [0, 1] and then transform each value to one of the numerical categories defined above, {0, 1, 2}, by using the floor function
$$f(x) = \lfloor 3x \rfloor$$
With the configuration stored in art_params, we now run ART:
art = RecommendationEngine(**art_params)
df = art.recommendations.copy()
df.tail()
And transform the initial designs to categories (0, 1 or 2) by using the floor function:
df = np.floor(3 * df)
df.tail()
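The mapping from continuous values to categories can be checked in isolation. The following is a minimal sketch, not part of the original notebook, assuming the values lie in [0, 1]. The helper name `to_categories` is hypothetical, and the clipping step is an added safeguard for the boundary value x = 1, which the plain floor would map to 3.

```python
import numpy as np

def to_categories(x, n_categories=3):
    """Map values in [0, 1] to integer categories 0..n_categories-1 (hypothetical helper)."""
    cats = np.floor(n_categories * np.asarray(x, dtype=float))
    # Guard the boundary: x == 1.0 would give n_categories, which is not a valid category.
    return np.clip(cats, 0, n_categories - 1).astype(int)

samples = np.array([0.0, 0.2, 0.34, 0.5, 0.67, 0.99, 1.0])
categories = to_categories(samples)
print(categories)  # [0 0 1 1 2 2 2]
```

Each third of the unit interval maps to one category, so uniformly spread recommendations yield roughly equal numbers of KO, NoMod, and UP assignments.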
We include the wild type strain as the last design, since we will use this list to find isoprenol production:
df.loc[user_params['n_instances']-1] = [user_params['modif_dict']['NoMod'] for i in range(n_reactions)]
df = df.astype(int)
df.tail()
The distribution of initial designs is approximately the same for each category (i.e. modification):
plot_distribution_of_designs(df)
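The same balance can be checked numerically by counting how often each category appears per reaction. The following is a minimal sketch on a made-up table, not on ART output. The names `toy_designs` and `counts` are hypothetical, and it assumes the designs are stored with one column per reaction, as in `df` above.

```python
import pandas as pd

# Toy designs (made-up values): rows are strains, columns are reactions.
toy_designs = pd.DataFrame({
    'ACCOAC': [0, 1, 2, 2, 1, 0],
    'MDH':    [2, 2, 1, 0, 0, 1],
    'PTAr':   [1, 0, 0, 1, 2, 2],
})

# Count how often each category (0 = KO, 1 = NoMod, 2 = UP) appears per reaction.
counts = (
    toy_designs.apply(pd.Series.value_counts)
    .reindex([0, 1, 2])   # make sure every category has a row
    .fillna(0)            # categories that never appear get a count of 0
    .astype(int)
)
print(counts)
```

In this toy table every category appears twice per reaction. For the real designs the counts should be close to, but not exactly, one third of the strains each.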
First we need to create the line names for all designs:
df.insert(loc=0, column='Line Name', value=['Strain ' + str(i) for i in range(1,user_params['n_instances'])] + ['WT'])
df.tail()
Add here the appropriate PI name and email:
PI = "Hector Garcia Martin" # Change to your PI!!!
PI_email = "hgmartin@lbl.gov" # Change to your PI's email!!!!
And then we proceed to generate the ICE file:
header = 'Principal Investigator*, Principal Investigator Email, \
Funding Source, Intellectual Property, BioSafety Level*, Name*, Alias, \
Keywords, Summary*, Notes, References, Links, Status*, Creator*, Creator Email*, Host, \
Genotype or Phenotype, Selection Markers*, Sequence Trace File(s), Sequence File, \
Attachment File'
append_pi = PI + ', ' + PI_email + ', , , 1'
append_desc = ', , , Complete, ' + PI + ', ' + PI_email + ', , , , , , '
with open(user_params['designs_ice_file'], 'w') as fh:
    # Write the ICE column header first
    fh.write(f'{header}\n')
    # Then write one row per design
    for i in range(len(df)):
        if df.loc[i, "Line Name"] == 'WT':
            strain_name = 'WT'
            strain_description = 'Wild type E. coli'
        else:
            strain_name = df.loc[i, "Line Name"]
            # Summary encodes the design as reaction_category pairs, e.g. ACCOAC_0_MDH_2_...
            strain_description = '_'.join([f'{reaction}_{df.loc[i, reaction]}' for reaction in user_params['reactions']])
        fh.write(f'{append_pi},{strain_name}, , ,{strain_description},{append_desc}\n')
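Because each row is assembled by string concatenation, it is worth confirming that a row has as many fields as the header. The following is a minimal sketch using the standard `csv` module. It rebuilds the header and one row in memory instead of reading the generated file, and the PI name, email, and strain description are placeholders for illustration only.

```python
import csv
import io

columns = [
    'Principal Investigator*', 'Principal Investigator Email', 'Funding Source',
    'Intellectual Property', 'BioSafety Level*', 'Name*', 'Alias', 'Keywords',
    'Summary*', 'Notes', 'References', 'Links', 'Status*', 'Creator*',
    'Creator Email*', 'Host', 'Genotype or Phenotype', 'Selection Markers*',
    'Sequence Trace File(s)', 'Sequence File', 'Attachment File',
]
header = ', '.join(columns)

# Placeholder values, assembled the same way as in the cell above.
pi, pi_email = 'Example PI', 'pi@example.org'
append_pi = pi + ', ' + pi_email + ', , , 1'
append_desc = ', , , Complete, ' + pi + ', ' + pi_email + ', , , , , , '
row = f'{append_pi},Strain 1, , ,ACCOAC_0_MDH_2,{append_desc}'

# skipinitialspace drops the blank that follows each comma.
parsed = list(csv.reader(io.StringIO(header + '\n' + row + '\n'), skipinitialspace=True))
header_fields, row_fields = parsed
print(len(header_fields), len(row_fields))  # 21 21

# Pair each header with its value to see where the non-empty fields land.
filled = {h: v for h, v in zip(header_fields, row_fields) if v}
print(filled)
```

The strain name lands in the `Name*` column and the design description in `Summary*`, which is where the row template places them.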