Mana Modeller modules
batchs module
- manamodeller.batchs.write_div_enum_script(script_path, batch_directory, rxn_enum_set_dir, output_directory, modelfile, weightfile, reactionFile, prev_sol_dir='prev_sol_dir/', log_dir='log_dir', env='MANA', dist_anneal=0.9, obj_tol=0.01, iters=100, para_batchs=False)[source]
write_div_enum_script.
- Parameters:
script_path (str) – path to the diversity_enum.py dexom python script
batch_directory (str) – path to the directory where batch files should be written
rxn_enum_set_dir (str) – path to the directory of processed reaction-enum results
output_directory (str) – path to the directory where diversity-enum modelling results should be written
modelfile (str) – path to the model’s json file
weightfile (str) – path to the csvs file that contains binarized reactions activity (according to transcriptomic data)
reactionFile (str) – path to the file that contains the list of reactions in the model
prev_sol_dir (str) – path to the directory where the reaction-enum solutions used as starting points for the diversity enumeration process should be saved
log_dir (str) – path to the directory where log files should be stored
env (str) – name of the anaconda environment to be activated
dist_anneal (float) – dexom-python parameter a, with 0 <= a <= 1, that controls the distance between successive solutions
obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value
iters (int) – dexom-python parameter, maximal number of iterations
para_batchs (boolean) – if True, launch each batch file independently (parallelise over batches instead of over conditions)
- Returns:
Writes batch files ready to launch on a suitably prepared Slurm computing platform
- manamodeller.batchs.write_rxn_enum_script(script_path, batch_directory, output_directory, modelfile, weightfile, reactionFile='', log_dir='log_dir', env='MANA', obj_tol=0.001, iters=100, para_batchs=False)[source]
write_rxn_enum_script.
- Parameters:
script_path (str) – path to the reaction-enum dexom-python script
batch_directory (str) – path to the directory where batch files should be written
output_directory (str) – path to the directory where reaction-enum modelling results should be written
modelfile (str) – path to the model’s json file
weightfile (str) – path to the csvs file that contains binarized reactions activity (according to transcriptomic data)
reactionFile (str) – path to the file that contains the list of reactions in the model
log_dir (str) – path to the directory where log files should be stored
env (str) – name of the anaconda environment to be activated
obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value
iters (int) – dexom-python parameter, maximal number of iterations
para_batchs (boolean) – if True, launch each batch file independently (parallelise over batches instead of over conditions)
- Returns:
Writes batch files ready to launch on a suitably prepared Slurm computing platform
dars module
- manamodeller.dars.calculate_freq_ctrls(rListFile, all_cpds, time, pheno, working_path)[source]
calculate_freq_ctrls.
- Parameters:
rListFile (str) – the path to the model’s reactions list file
all_cpds (str) – the list of compounds to process, from the props.properties file
time (str) – the exposure time to consider
pheno (pandas dataframe) – the pandas data frame containing Open TG-Gates metadata.
working_path (str) – the root path of your working directory.
- Returns:
a pandas dataframe with the frequencies calculated over all control partial enumeration results that match the input parameters
- manamodeller.dars.calculate_frequencies(data, name)[source]
calculate_frequencies.
- Parameters:
data (pandas dataframe) – a dataframe containing the partial enumeration results
name (str) – the name of the pandas Series
- Returns:
a pandas series with the sum of partial enumeration results for each reaction
- Return type:
pandas series
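The summation described for calculate_frequencies can be pictured with a small sketch. It assumes that each row of the dataframe is one enumerated solution and each column is a reaction holding 0/1 activity; that layout and the helper name sum_reaction_activity are assumptions made for illustration and are not taken from the library.

```python
import pandas as pd


def sum_reaction_activity(data: pd.DataFrame, name: str) -> pd.Series:
    """Sum the 0/1 activity of each reaction (column) over all solutions (rows)."""
    # Hypothetical helper: assumes rows are solutions and columns are reactions.
    freq = data.sum(axis=0)
    freq.name = name
    return freq


# Three toy solutions over three reactions.
solutions = pd.DataFrame({"R1": [1, 1, 0], "R2": [0, 1, 0], "R3": [1, 1, 1]})
freq = sum_reaction_activity(solutions, "ctrl")
```

Dividing the resulting series by the number of solutions would turn the counts into activation frequencies, if that is the quantity needed downstream.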
- manamodeller.dars.calculate_frequencies_for_dir(full_enum_path, rList, output_file='')[source]
calculate_frequencies_for_dir.
- Parameters:
full_enum_path (str) – the path to the full_enum directory
rList (list) – the list of reactions in the model
output_file (str) – if not empty, the frequencies table is written in csv to the path provided in this param
- Returns:
a pandas dataframe containing the frequencies table for each csv file found in the full_enum path
- Return type:
pandas dataframe
- manamodeller.dars.compute_scores(comp_freq, crossing_point=1, crossing_point_1_2=1.2, b=1)[source]
compute_scores.
- Parameters:
comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies
crossing_point (int) – a crossing point factor for the circle center calculation
crossing_point_1_2 (float) – another crossing point factor for the circle center calculation
b (int) – a parameter for the calculation of ellipses
- Returns:
a pandas dataframe with several computed metrics: R2, center of circle, dist_to_OO, center of circle 1.2
- Return type:
pandas dataframe
- manamodeller.dars.findCircleCenter(A, B, C)[source]
findCircleCenter.
- Parameters:
A (list) – a list containing x and y coordinates of point A, a point of the circle O
B (list) – a list containing x and y coordinates of point B, a point of the circle O
C (list) – a list containing x and y coordinates of point C, a point of the circle O
- Returns:
a pandas dataframe containing the calculated properties of the circle O
- Return type:
pandas dataframe
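For readers unfamiliar with the geometry behind findCircleCenter, here is a minimal sketch of finding the circle that passes through three points, using the standard circumcenter formula. The function name circle_through and its tuple return value are assumptions for illustration; the library function returns a pandas dataframe and may compute further properties.

```python
def circle_through(a, b, c):
    """Return (center_x, center_y, radius) of the circle through a, b and c."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    # d is proportional to the signed area of triangle ABC; zero means collinear.
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear, no unique circle")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    radius = ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5
    return ux, uy, radius


# Three points on the unit circle centred at the origin.
center_x, center_y, radius = circle_through([1, 0], [0, 1], [-1, 0])
```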
- manamodeller.dars.rescale_and_rotate(comp_freq)[source]
rescale_and_rotate.
- Parameters:
comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies
- Returns:
a pandas dataframe with f_ctrl and f_trt in rescaled form and in rescaled-and-rotated form
- Return type:
pandas dataframe
- manamodeller.dars.rotate(vector, theta, rotation_around=None)[source]
rotate.
reference: https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions
- Parameters:
vector (pandas dataframe) – the activation frequencies dataframe to rotate
theta (float) – rotation angle in radians
rotation_around (np.array) – the point around which vector is rotated; can be None
- Returns:
the rotated dataframe
- Return type:
pandas dataframe
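The two-dimensional rotation referenced for rotate can be sketched with NumPy as below. This is an illustration under assumptions: the helper name rotate_points is hypothetical, it works on a plain (n, 2) array rather than a dataframe, and it treats a missing rotation_around as the origin, which may differ from the library's behaviour.

```python
import numpy as np


def rotate_points(points, theta, rotation_around=None):
    """Rotate an (n, 2) array of points counter-clockwise by theta radians."""
    pts = np.asarray(points, dtype=float)
    # Assumption: no pivot given means rotation around the origin.
    if rotation_around is None:
        origin = np.zeros(2)
    else:
        origin = np.asarray(rotation_around, dtype=float)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    # Shift so the pivot sits at the origin, rotate, then shift back.
    return (pts - origin) @ rot.T + origin


quarter_turn = rotate_points([[1.0, 0.0]], np.pi / 2)
around_pivot = rotate_points([[2.0, 1.0]], np.pi / 2, rotation_around=[1.0, 1.0])
```

Points are stored as rows, so the rotation matrix is applied as a right-multiplication by its transpose.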
modelling module
- manamodeller.modelling.eval_gpr_activity(expr, gh, gl)[source]
eval_gpr_activity.
This is an adaptation of the eval_gpr function available in cobrapy. Instead of evaluating whether a GPR is active given a list of knocked-out genes, it evaluates whether the GPR is regulated by a list of genes.
Example of usage: provide the list of highly expressed genes and the list of lowly expressed genes. Expressions that evaluate to True are considered highly expressed.
Evaluates the compiled AST of a gene_reaction_rule against the lists of active genes.
- Parameters:
expr (Expression) – The ast of the gene reaction rule
gh (list) – list of highly expressed genes
gl (list) – list of lowly expressed genes
- Returns:
True if the reaction is active with the given gh and gl lists, otherwise False
- Return type:
bool
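To illustrate the general idea of evaluating a gene rule's AST against a gene list, the sketch below parses a Boolean rule with Python's ast module and treats a gene as True when it appears in a set of highly expressed genes. It is deliberately simplified: the helper name rule_is_high is hypothetical, the handling of lowly expressed genes (gl) is omitted, and the library's actual logic may differ.

```python
import ast


def rule_is_high(node, high_genes):
    """Evaluate a parsed gene rule: a gene is True when it is in high_genes."""
    if isinstance(node, ast.Expression):
        return rule_is_high(node.body, high_genes)
    if isinstance(node, ast.Name):
        return node.id in high_genes
    if isinstance(node, ast.BoolOp):
        values = [rule_is_high(v, high_genes) for v in node.values]
        # "and" needs every operand to be True, "or" needs at least one.
        return all(values) if isinstance(node.op, ast.And) else any(values)
    raise TypeError(f"unsupported rule element: {type(node).__name__}")


# Toy rule with gene names that are valid Python identifiers.
tree = ast.parse("g1 and (g2 or g3)", mode="eval")
active = rule_is_high(tree, {"g1", "g3"})
inactive = rule_is_high(tree, {"g2", "g3"})
```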
- manamodeller.modelling.find_high_low_exprs(uarray_data, threshold_dw_perc=25, threshold_up_perc=75)[source]
find_high_low_exprs.
- Parameters:
uarray_data (pandas dataframe) – the microarray gene expression data
threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.
threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.
- Returns:
the lists of highly and lowly expressed genes
- Return type:
list
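The percentile thresholds used by find_high_low_exprs can be illustrated as follows. The sketch assumes a single sample given as a gene-to-value mapping and strict inequalities at both thresholds; the helper name split_by_percentile and those choices are assumptions, not the library's documented behaviour.

```python
import numpy as np


def split_by_percentile(expression, threshold_dw_perc=25, threshold_up_perc=75):
    """Return (high, low) gene lists from a {gene: expression value} mapping."""
    values = np.array(list(expression.values()), dtype=float)
    low_cut = np.percentile(values, threshold_dw_perc)
    high_cut = np.percentile(values, threshold_up_perc)
    # Assumption: genes exactly at a threshold belong to neither list.
    high = [g for g, v in expression.items() if v > high_cut]
    low = [g for g, v in expression.items() if v < low_cut]
    return high, low


sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0}
high_genes, low_genes = split_by_percentile(sample)
```

Genes that fall between the two percentiles end up in neither list, which matches the idea of reactions left unconstrained by expression data.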
- manamodeller.modelling.find_reactions_expression_levels(gprs, gh, gl)[source]
find_reactions_expression_levels.
- Parameters:
gprs (cobra gpr) – A cobra gpr object, obtained from a cobra model
gh (list) – list of highly expressed genes
gl (list) – list of lowly expressed genes
- Returns:
a tuple of three lists: rh, the list of reactions identified as active according to gene expression data; rl, the list of reactions identified as inactive according to gene expression data; rn, the list of reactions not constrained by gene expression data
- Return type:
tuple
- manamodeller.modelling.fullname_equation(reaction)[source]
fullname_equation.
- Parameters:
reaction (cobra reaction) – A cobra reaction object, obtained from a cobra model
- Returns:
the reaction with complete metabolites names
- Return type:
str
- manamodeller.modelling.get_GPR_reactions(model)[source]
get_GPR_reactions.
- Parameters:
model (cobra model) – A cobra object, loaded with the cobra library
- Returns:
a dataframe with all the reactions in the model having a GPR
- Return type:
pandas dataframe
- manamodeller.modelling.get_gene_list(model)[source]
get_gene_list.
- Parameters:
model (cobra model) – A cobra object, loaded with the cobra library
- Returns:
a list containing all the genes in the model
- Return type:
list
- manamodeller.modelling.get_reactions_ids(model)[source]
get_reactions_ids.
- Parameters:
model (cobra model) – A cobra object, loaded with the cobra library
- Returns:
a dataframe with all the reactions and, where available, their GPR
- Return type:
pandas dataframe
- manamodeller.modelling.identify_model_gene_ids(model)[source]
identify_model_gene_ids.
- Parameters:
model (cobra model) – A cobra object, loaded with the cobra library
- Returns:
The type of gene identifiers used in the model
- Return type:
str
- manamodeller.modelling.map_single_column(data, hgnc_data, col_to_add)[source]
map_single_column.
- Parameters:
data (pandas dataframe) – a gene expression dataset
hgnc_data (pandas dataframe) – a hgnc database annotation dataframe
col_to_add (str) – the name of the identifier column to map
- Returns:
the gene expression dataframe with the mapped identifier added
- Return type:
pandas dataframe
- manamodeller.modelling.preprocess_data(data, gene_id_col, model, pickle_path='', csvs_path='', threshold_dw_perc=25, threshold_up_perc=75)[source]
preprocess_data.
- Parameters:
data (pandas dataframe) – a gene expression dataset
gene_id_col (str) – the name of the gene identifier column
model (cobra model) – A cobra object, loaded with the cobra library
pickle_path (str) – if not empty, write pkl files containing categorized reactions activity at the given location
csvs_path (str) – if not empty, write csv files containing categorized reactions activity at the given location
threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.
threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.
- Returns:
Writes csv or pickle files containing the categorized reactions activity
results_analysis module
- manamodeller.results_analysis.dendro_reactions(matrix, title='No title set')[source]
dendro_reactions.
- Parameters:
matrix (pandas dataframe) – a distance matrix
title (str) – the title of the dendrogram
- Returns:
a scipy linkage object; the dendrogram is also plotted
- manamodeller.results_analysis.extract_reactions_from_clusters(matrix, title, write_files=False, file_prefix='cluster', header=True)[source]
extract_reactions_from_clusters.
- Parameters:
matrix (pandas dataframe) – a distance matrix
title (str) – the title of the dendrogram plots
write_files (boolean) – if true, write cluster’s reaction files
file_prefix (str) – prefix to add when saving cluster’s reaction file
header (boolean) – if true, add a header to the cluster’s reaction file
- Returns:
Writes each cluster’s reaction file when write_files is True
- manamodeller.results_analysis.generate_annotation_table(cluster_file, model, hgnc_data, DARs_direction, outputFile)[source]
generate_annotation_table.
- Parameters:
cluster_file (str) – path to the tsv file containing the current cluster’s DARs
model (cobra model) – A cobra object, loaded with the cobra library
hgnc_data (pandas dataframe) – a hgnc database annotation dataframe
DARs_direction (str) – the path to the tsv file for the molecule associated with current cluster to get DARs’ direction
outputFile (str) – path and name of the outputFile (xlsx file)
- Returns:
Writes an Excel file with one reaction per line and the corresponding annotations in columns
- manamodeller.results_analysis.get_node_list(gml_file)[source]
get_node_list.
- Parameters:
gml_file (str) – the path to the gml file
- Returns:
a list containing the node ids from the gml file
- Return type:
list
- manamodeller.results_analysis.visualize_gml(gml_file, window_size=['1000px', '1000px'], notebook=True)[source]
visualize_gml.
- Parameters:
gml_file (str) – the path to the gml file
window_size (list) – the list of x and y pixel sizes for the output window
notebook (boolean) – if true enable the pyvis optimized visualisation for notebooks
- Returns:
an interactive visualisation window of the graph
results_processing module
- manamodeller.results_processing.concatenate_csv(filenames, out_dir, col_index, single_csv, index_suffix='')[source]
concatenate_csv.
- Parameters:
filenames (str) – list of csv files to concatenate into one csv file
out_dir (str) – path to the concatenated csv output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
index_suffix (str) – suffix to add to csv’s row index
- Returns:
Writes the concatenated csv in the output directory (out_dir)
- manamodeller.results_processing.concatenate_reaction_div_enum(path_concat_rxn_enum, path_concat_div_enum, out_dir, col_index='', single_csv=False, ncpus=1)[source]
concatenate_reaction_div_enum.
- Parameters:
path_concat_rxn_enum (str) – path to the concatenated reaction enum directory
path_concat_div_enum (str) – path to the concatenated diversity enum directory
out_dir (str) – path to the csvs output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
ncpus (int) – the number of cpus allocated, will enable parallel processing
- Returns:
a JoinableQueue object
- manamodeller.results_processing.concatenate_solutions(csv_dir, out_dir, col_index='', single_csv=False, ncpus=1, restart=False)[source]
concatenate_solutions.
- Parameters:
csv_dir (str) – path to the directory of csv files to concatenate
out_dir (str) – path to the csvs output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
ncpus (int) – the number of cpus allocated, will enable parallel processing
- Returns:
a JoinableQueue object
- manamodeller.results_processing.remove_done_batchs(batch_dir, result_dir, launch_undone=True, relax_param=False, enum_type='reaction_enum', para_batch=False, env='MANA')[source]
remove_done_batchs.
- Parameters:
batch_dir (str) – path to the batchs directory
result_dir (str) – path to the modelling result directory
launch_undone (boolean) – If True, write the master bash file to launch all failed batchs
relax_param (boolean) – If True, relax the mipgap tolerance parameter
enum_type (str) – string indicating which type of enumeration is being processed (optional)
para_batch (boolean) – if True, launch each batch file independently (parallelise over batches instead of over conditions)
env (str) – name of the anaconda environment to be activated
- Returns:
a list with the failed batch names
- Return type:
list
- manamodeller.results_processing.remove_zerobiomass_solutions(enum_dir, reaction_list, separator=',')[source]
remove_zerobiomass_solutions.
- Parameters:
enum_dir (str) – the path to the enumeration directory
reaction_list (str) – the path to the reaction_list directory
separator (str) – the character used to separate columns in the file
- Returns:
Overwrites the csv file, removing the solutions with zero flux in the biomass reaction
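The filtering step behind remove_zerobiomass_solutions can be sketched on an in-memory dataframe. The column name biomass_reaction and the helper drop_zero_biomass are hypothetical; the library function works on csv files in enum_dir and overwrites them, which this sketch does not do.

```python
import pandas as pd


def drop_zero_biomass(solutions: pd.DataFrame, biomass_col: str) -> pd.DataFrame:
    """Keep only the solutions (rows) whose biomass flux is non-zero."""
    # Assumption: one row per solution, one column per reaction.
    return solutions[solutions[biomass_col] != 0].reset_index(drop=True)


solutions = pd.DataFrame(
    {"R1": [1, 0, 1], "biomass_reaction": [0.5, 0.0, 1.2]}
)
kept = drop_zero_biomass(solutions, "biomass_reaction")
```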
utils module
- manamodeller.utils.launch_multi_proc(num_workers, q)[source]
launch_multi_proc.
- Parameters:
num_workers (int) – the number of workers which will define the number of allowed parallel threads
q (JoinableQueue) – a JoinableQueue object filled with tasks to perform
- Returns:
- Return type:
None
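The worker pattern described for launch_multi_proc, a fixed number of workers draining a queue of tasks, can be sketched with the standard library. This is an illustration under assumptions: it uses threads and queue.Queue (which offers the same task_done/join protocol as a JoinableQueue), the helper name run_workers is hypothetical, and tasks are assumed to be plain callables queued before the workers start.

```python
import queue
import threading


def run_workers(num_workers, q):
    """Start num_workers threads that consume callables from q until it is empty."""
    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                task()
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    q.join()  # blocks until every queued task has been marked done
    for t in threads:
        t.join()


results = []
lock = threading.Lock()


def make_task(n):
    def task():
        with lock:
            results.append(n * n)
    return task


tasks = queue.Queue()
for n in range(5):
    tasks.put(make_task(n))
run_workers(2, tasks)
```

Completion order is not deterministic, so any check on the results should ignore ordering.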
- manamodeller.utils.make_csvs(data, out_folder, celfilename)[source]
make_csvs.
- Parameters:
data (list) – list of lists with data to be transformed into a pandas dataframe
out_folder (str) – the path where csvs files will be saved
celfilename (str) – the initial CEL filename (used as identifier)
- Returns:
Writes csv files in out_folder
- manamodeller.utils.make_pickle(object, filename)[source]
make_pickle.
- Parameters:
object (pkl) – the pkl object to save
filename (str) – the path and filename for the pickle file
- Returns:
Writes a .pkl file at the designated location