Mana Modeller modules

batchs module

manamodeller.batchs.write_div_enum_script(script_path, batch_directory, rxn_enum_set_dir, output_directory, modelfile, weightfile, reactionFile, prev_sol_dir='prev_sol_dir/', log_dir='log_dir', env='MANA', dist_anneal=0.9, obj_tol=0.01, iters=100, para_batchs=False)[source]

write_div_enum_script.

Parameters:

script_path (str) – path to the diversity_enum.py dexom python script
batch_directory (str) – path to the directory where batch files should be written
rxn_enum_set_dir (str) – path to the directory of processed reaction-enum results
output_directory (str) – path to the directory where diversity-enum modelling results should be written
modelfile (str) – path to the model’s json file
weightfile (str) – path to the csv file that contains binarized reaction activity (according to transcriptomic data)
reactionFile (str) – path to the file that contains the list of reactions in the model
prev_sol_dir (str) – path to the directory where the reaction-enum solutions used as starting points for the diversity enumeration process should be saved
log_dir (str) – path to the directory where log files should be stored
env (str) – name of the anaconda environment to be activated
dist_anneal (float) – dexom-python parameter a, with 0 <= a <= 1, which controls the distance between successive solutions
obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value
iters (int) – dexom-python parameter, maximal number of iterations
para_batchs (boolean) – if True, launch each batch file independently (parallel on batches instead of parallel on conditions)

Returns:

writes batch files ready to launch on an adequately prepared Slurm computing platform

manamodeller.batchs.write_rxn_enum_script(script_path, batch_directory, output_directory, modelfile, weightfile, reactionFile='', log_dir='log_dir', env='MANA', obj_tol=0.001, iters=100, para_batchs=False)[source]

write_rxn_enum_script.

Parameters:

script_path (str) – path to the dexom-python reaction-enumeration script
batch_directory (str) – path to the directory where batch files should be written
output_directory (str) – path to the directory where reaction-enum modelling results should be written
modelfile (str) – path to the model’s json file
weightfile (str) – path to the csv file that contains binarized reaction activity (according to transcriptomic data)
reactionFile (str) – path to the file that contains the list of reactions in the model
log_dir (str) – path to the directory where log files should be stored
env (str) – name of the anaconda environment to be activated
obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value
iters (int) – dexom-python parameter, maximal number of iterations
para_batchs (boolean) – if True, launch each batch file independently (parallel on batches instead of parallel on conditions)

Returns:

writes batch files ready to launch on an adequately prepared Slurm computing platform

dars module

manamodeller.dars.calculate_freq_ctrls(rListFile, all_cpds, time, pheno, working_path)[source]

calculate_freq_ctrls.

Parameters:

rListFile (str) – the path to the model’s reactions list file
all_cpds (str) – the list of compounds to process, from the props.properties file
time (str) – the exposure time to consider
pheno (pandas dataframe) – the pandas data frame containing Open TG-Gates metadata.
working_path (str) – the root path of your working directory.

Returns:

a pandas dataframe with the frequencies calculated for all control partial enumeration results that match the input parameters

manamodeller.dars.calculate_frequencies(data, name)[source]

calculate_frequencies.

Parameters:

data (pandas dataframe) – a dataframe containing the partial enumeration results
name (str) – the name of the pandas Series

Returns:

a pandas Series with the sum of partial enumeration results for each reaction

Return type:

pandas Series
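
The per-reaction sum described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: plain lists stand in for the pandas dataframe, each row is assumed to be one binary solution, and the helper name sum_activity and the reaction ids are hypothetical, not part of manamodeller.

```python
def sum_activity(solutions, reactions):
    """Sum binary activity values per reaction across all solutions."""
    totals = {reaction: 0 for reaction in reactions}
    for solution in solutions:
        for reaction, value in zip(reactions, solution):
            totals[reaction] += value
    return totals


# Hypothetical data: three enumeration solutions over three reactions.
reactions = ["R1", "R2", "R3"]
solutions = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

counts = sum_activity(solutions, reactions)

# Dividing by the number of solutions turns the sums into activation frequencies.
frequencies = {r: c / len(solutions) for r, c in counts.items()}
```

With pandas, the same sum would typically be a column-wise sum on the dataframe, renamed with the name argument.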

manamodeller.dars.calculate_frequencies_for_dir(full_enum_path, rList, output_file='')[source]

calculate_frequencies_for_dir.

Parameters:

full_enum_path (str) – the path to the full_enum directory
rList (list) – the list of reactions in the model
output_file (str) – if not empty, the frequencies table is written in csv to the path provided in this param

Returns:

a pandas dataframe containing the frequencies table of each csv file found in the full enum path

Return type:

pandas dataframe

manamodeller.dars.compute_scores(comp_freq, crossing_point=1, crossing_point_1_2=1.2, b=1)[source]

compute_scores.

Parameters:

comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies
crossing_point (int) – a crossing point factor for the circle center calculation
crossing_point_1_2 (float) – another crossing point factor for the circle center calculation
b (int) – a parameter for the calculation of ellipses

Returns:

a pandas dataframe with several computed metrics: R2, center of circle, dist_to_OO, center of circle 1.2

Return type:

pandas dataframe

manamodeller.dars.findCircleCenter(A, B, C)[source]

findCircleCenter.

Parameters:

A (list) – a list containing x and y coordinates of point A, a point of the circle O
B (list) – a list containing x and y coordinates of point B, a point of the circle O
C (list) – a list containing x and y coordinates of point C, a point of the circle O

Returns:

a pandas dataframe containing calculated properties of the circle O

Return type:

pandas dataframe
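
As an illustration of the underlying geometry, the center of the circle through three points can be computed with the standard circumcenter formula. This is a minimal sketch, not the package's implementation: the function name circumcenter is hypothetical, and it returns a plain tuple rather than the dataframe documented above.

```python
import math


def circumcenter(a, b, c):
    """Return (center_x, center_y, radius) of the circle through a, b and c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("the three points are collinear; no unique circle exists")
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    radius = math.hypot(ax - ux, ay - uy)
    return ux, uy, radius


# Three points on the unit circle centered at the origin.
center_x, center_y, radius = circumcenter([1, 0], [0, 1], [-1, 0])
```

Collinear points are rejected explicitly because the denominator vanishes in that case.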

manamodeller.dars.rescale_and_rotate(comp_freq)[source]

rescale_and_rotate.

Parameters:

comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies

Returns:

a pandas dataframe with rescaled, and rescaled then rotated, versions of f_ctrl and f_trt

Return type:

pandas dataframe

manamodeller.dars.rotate(vector, theta, rotation_around=None)[source]

rotate.

reference: https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions

Parameters:

vector (pandas dataframe) – the activation frequencies dataframe to rotate
theta (float) – rotation angle in radians
rotation_around (np.array) – the point around which the vector is rotated; can be None

Returns:

the rotated dataframe

Return type:

pandas dataframe
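
The two-dimensional rotation matrix from the reference above can be applied to a single point as follows. This is a sketch under assumptions: it works on a plain (x, y) pair instead of a dataframe, the name rotate_point is hypothetical, and treating None as "rotate around the origin" is an assumption about the meaning of rotation_around.

```python
import math


def rotate_point(point, theta, around=None):
    """Rotate a 2D point counter-clockwise by theta radians around a pivot."""
    pivot_x, pivot_y = (0.0, 0.0) if around is None else around
    # Translate so that the pivot becomes the origin.
    x = point[0] - pivot_x
    y = point[1] - pivot_y
    cos_t = math.cos(theta)
    sin_t = math.sin(theta)
    # Apply the rotation matrix [[cos, -sin], [sin, cos]], then translate back.
    return (pivot_x + cos_t * x - sin_t * y, pivot_y + sin_t * x + cos_t * y)


quarter_turn = rotate_point((1.0, 0.0), math.pi / 2)
half_turn_about_pivot = rotate_point((2.0, 1.0), math.pi, around=(1.0, 1.0))
```

Rotating around an arbitrary pivot is just a translation, a rotation around the origin, and the inverse translation.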

modelling module

manamodeller.modelling.eval_gpr_activity(expr, gh, gl)[source]

eval_gpr_activity.

This is an adaptation of the eval_gpr function available in cobrapy. Instead of evaluating whether a GPR is active given a list of knocked-out genes, it evaluates whether the GPR is regulated by a list of genes.

Example of usage: provide the lists of highly expressed and lowly expressed genes; expressions that evaluate to True are considered highly expressed.

Evaluates the compiled AST of a gene_reaction_rule against the gene lists.

Parameters:

expr (Expression) – The ast of the gene reaction rule
gh (list) – list of highly expressed genes
gl (list) – list of lowly expressed genes

Returns:

True if the reaction is active with the given gh and gl lists, otherwise False

Return type:

bool
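
One way to picture the evaluation of a GPR rule against gene lists is a small recursive walk over the rule's AST. The sketch below is an assumption-laden illustration, not the manamodeller or cobrapy code: it assumes a three-level encoding (1 for genes in gh, -1 for genes in gl, 0 otherwise), treats "and" as a minimum and "or" as a maximum, and requires gene ids that are valid Python identifiers. The function names gpr_level and is_highly_expressed are hypothetical.

```python
import ast


def gpr_level(node, gh, gl):
    """Return 1 (high), -1 (low) or 0 (unconstrained) for a GPR AST node."""
    if isinstance(node, ast.Expression):
        return gpr_level(node.body, gh, gl)
    if isinstance(node, ast.Name):
        if node.id in gh:
            return 1
        if node.id in gl:
            return -1
        return 0
    if isinstance(node, ast.BoolOp):
        levels = [gpr_level(value, gh, gl) for value in node.values]
        # Assumption: a complex (and) is limited by its weakest gene,
        # isoenzymes (or) are driven by their strongest gene.
        return min(levels) if isinstance(node.op, ast.And) else max(levels)
    raise TypeError(f"unsupported GPR node: {type(node).__name__}")


def is_highly_expressed(rule, gh, gl):
    """Parse a GPR rule string and report whether it evaluates as high."""
    tree = ast.parse(rule, mode="eval")
    return gpr_level(tree, set(gh), set(gl)) == 1


high = is_highly_expressed("G1 and (G2 or G3)", ["G1", "G3"], ["G2"])
blocked = is_highly_expressed("G1 and G2", ["G1"], ["G2"])
unknown = is_highly_expressed("G4", ["G1"], ["G2"])
```

Real models often use numeric gene ids, which would need escaping before parsing with the ast module.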

manamodeller.modelling.find_high_low_exprs(uarray_data, threshold_dw_perc=25, threshold_up_perc=75)[source]

find_high_low_exprs.

Parameters:

uarray_data (pandas dataframe) – the microarray gene expression data
threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.
threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.

Returns:

the lists of highly and lowly expressed genes

Return type:

list
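
The percentile thresholds can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: a dict of gene id to expression value stands in for the dataframe, percentiles use linear interpolation, and the comparisons are strict. The names percentile and split_by_percentile are hypothetical.

```python
def percentile(values, q):
    """Percentile q (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_by_percentile(expression, threshold_dw_perc=25, threshold_up_perc=75):
    """Return (highly expressed, lowly expressed) gene id lists."""
    low_cut = percentile(expression.values(), threshold_dw_perc)
    high_cut = percentile(expression.values(), threshold_up_perc)
    high = [gene for gene, value in expression.items() if value > high_cut]
    low = [gene for gene, value in expression.items() if value < low_cut]
    return high, low


# Hypothetical expression values for five genes.
expression = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0, "G5": 5.0}
high_genes, low_genes = split_by_percentile(expression)
```

Genes that fall between the two cut-offs end up in neither list, which matches the idea of reactions left unconstrained by the expression data.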

manamodeller.modelling.find_reactions_expression_levels(gprs, gh, gl)[source]

find_reactions_expression_levels.

Parameters:

gprs (cobra gpr) – A cobra gpr object, obtained from a cobra model
gh (list) – list of highly expressed genes
gl (list) – list of lowly expressed genes

Returns:

a tuple of three lists (rh, rl, rn):
rh is the list of reactions identified as active according to gene expression data
rl is the list of reactions identified as inactive according to gene expression data
rn is the list of reactions not constrained by gene expression data

Return type:

tuple

manamodeller.modelling.fullname_equation(reaction)[source]

fullname_equation.

Parameters:

reaction (cobra reaction) – A cobra reaction object, obtained from a cobra model

Returns:

the reaction with complete metabolite names

Return type:

str

manamodeller.modelling.get_GPR_reactions(model)[source]

get_GPR_reactions.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a dataframe with all the reactions in the model having a GPR

Return type:

pandas dataframe

manamodeller.modelling.get_gene_list(model)[source]

get_gene_list.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a list containing all the genes in the model

Return type:

list

manamodeller.modelling.get_reactions_ids(model)[source]

get_reactions_ids.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a dataframe with all the reactions and, where available, their GPR

Return type:

pandas dataframe

manamodeller.modelling.identify_model_gene_ids(model)[source]

identify_model_gene_ids.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

The type of gene identifiers used in the model

Return type:

str

manamodeller.modelling.map_single_column(data, hgnc_data, col_to_add)[source]

map_single_column.

Parameters:

data (pandas dataframe) – a gene expression dataset
hgnc_data (pandas dataframe) – a hgnc database annotation dataframe
col_to_add (str) – the name of the identifier column to map

Returns:

the gene expression dataframe with the mapped identifier added

Return type:

pandas dataframe

manamodeller.modelling.preprocess_data(data, gene_id_col, model, pickle_path='', csvs_path='', threshold_dw_perc=25, threshold_up_perc=75)[source]

preprocess_data.

Parameters:

data (pandas dataframe) – a gene expression dataset
gene_id_col (str) – the name of the gene identifier column
model (cobra model) – A cobra object, loaded with the cobra library
pickle_path (str) – if not empty, write pkl files containing categorized reaction activity at the given location
csvs_path (str) – if not empty, write csv files containing categorized reaction activity at the given location
threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.
threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.

Returns:

writes csv or pickle files containing categorized reaction activity

results_analysis module

manamodeller.results_analysis.dendro_reactions(matrix, title='No title set')[source]

dendro_reactions.

Parameters:

matrix (pandas dataframe) – a distance matrix
title (str) – the title of the dendrogram

Returns:

a scipy linkage object; the dendrogram is also plotted

manamodeller.results_analysis.extract_reactions_from_clusters(matrix, title, write_files=False, file_prefix='cluster', header=True)[source]

extract_reactions_from_clusters.

Parameters:

matrix (pandas dataframe) – a distance matrix
title (str) – the title of the dendrogram plots
write_files (boolean) – if True, write the clusters’ reaction files
file_prefix (str) – prefix to add when saving a cluster’s reaction file
header (boolean) – if True, add a header to the cluster’s reaction file

Returns:

if write_files is True, writes the clusters’ reaction files

manamodeller.results_analysis.generate_annotation_table(cluster_file, model, hgnc_data, DARs_direction, outputFile)[source]

generate_annotation_table.

Parameters:

cluster_file (str) – path to the tsv file containing the current cluster’s DARs
model (cobra model) – A cobra object, loaded with the cobra library
hgnc_data (pandas dataframe) – a hgnc database annotation dataframe
DARs_direction (str) – the path to the tsv file for the molecule associated with the current cluster, used to get the DARs’ direction
outputFile (str) – path and name of the output file (xlsx file)

Returns:

writes an Excel file with one reaction per line and the corresponding annotations in columns

manamodeller.results_analysis.get_node_list(gml_file)[source]

get_node_list.

Parameters:

gml_file (str) – the path to the gml file

Returns:

a list containing node ids from the gml file

Return type:

list

manamodeller.results_analysis.visualize_gml(gml_file, window_size=['1000px', '1000px'], notebook=True)[source]

visualize_gml.

Parameters:

gml_file (str) – the path to the gml file
window_size (list) – the list of x and y pixel sizes for the output window
notebook (boolean) – if True, enable the pyvis visualisation optimized for notebooks

Returns:

an interactive visualisation window of the graph

results_processing module

manamodeller.results_processing.concatenate_csv(filenames, out_dir, col_index, single_csv, index_suffix='')[source]

concatenate_csv.

Parameters:

filenames (list) – list of csv files to concatenate into one csv file
out_dir (str) – path to the concatenated csv output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
index_suffix (str) – suffix to add to csv’s row index

Returns:

Return type:

write the concatenated csv in the

manamodeller.results_processing.concatenate_reaction_div_enum(path_concat_rxn_enum, path_concat_div_enum, out_dir, col_index='', single_csv=False, ncpus=1)[source]

concatenate_reaction_div_enum.

Parameters:

path_concat_rxn_enum (str) – path to the concatenated reaction enum directory
path_concat_div_enum (str) – path to the concatenated diversity enum directory
out_dir (str) – path to the csvs output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
ncpus (int) – the number of cpus allocated, will enable parallel processing

Returns:

a JoinableQueue object

Return type:

JoinableQueue

manamodeller.results_processing.concatenate_solutions(csv_dir, out_dir, col_index='', single_csv=False, ncpus=1, restart=False)[source]

concatenate_solutions.

Parameters:

csv_dir (str) – path to the directory containing the csvs to concatenate
out_dir (str) – path to the csvs output directory
col_index (str) – column name of a column to be used as index (optional)
single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file
ncpus (int) – the number of cpus allocated, will enable parallel processing

Returns:

a JoinableQueue object

Return type:

JoinableQueue

manamodeller.results_processing.remove_done_batchs(batch_dir, result_dir, launch_undone=True, relax_param=False, enum_type='reaction_enum', para_batch=False, env='MANA')[source]

remove_done_batchs.

Parameters:

batch_dir (str) – path to the batchs directory
result_dir (str) – path to the modelling result directory
launch_undone (boolean) – If True, write the master bash file to launch all failed batchs
relax_param (boolean) – If True, relax the mipgap tolerance parameter
enum_type (str) – string indicating which type of enumeration is being processed (optional)
para_batch (boolean) – if True, launch each batch file independently (parallel on batches instead of parallel on conditions)
env (str) – name of the anaconda environment to be activated

Returns:

a list with failed batch names

Return type:

list

manamodeller.results_processing.remove_zerobiomass_solutions(enum_dir, reaction_list, separator=',')[source]

remove_zerobiomass_solutions.

Parameters:

enum_dir (str) – the path to the enumeration directory
reaction_list (str) – the path to the reaction_list directory
separator (str) – the character used to separate columns in the file

Returns:

overwrites the csv file, removing solutions with zero flux through the biomass reaction
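
The filtering step can be pictured with the csv module. This is a sketch under assumptions, not the package's code: each row is assumed to be one solution, columns are assumed to follow the order of the reaction list, and the biomass reaction id "BIOMASS" and the helper name drop_zero_biomass are hypothetical. The example reads from an in-memory string instead of overwriting a file.

```python
import csv
import io


def drop_zero_biomass(rows, reactions, biomass_id):
    """Keep only the solutions whose biomass column is non-zero."""
    biomass_index = reactions.index(biomass_id)
    return [row for row in rows if float(row[biomass_index]) != 0.0]


# Hypothetical enumeration file: one solution per line, one column per reaction.
reactions = ["R1", "R2", "BIOMASS"]
csv_text = "1,0,1\n0,1,0\n1,1,1\n"

rows = list(csv.reader(io.StringIO(csv_text), delimiter=","))
kept = drop_zero_biomass(rows, reactions, "BIOMASS")

# Serialize the filtered solutions back to csv text with the same separator.
buffer = io.StringIO()
csv.writer(buffer, delimiter=",", lineterminator="\n").writerows(kept)
filtered_text = buffer.getvalue()
```

To mirror the documented behaviour, the filtered text would be written back to the original file path.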

utils module

manamodeller.utils.launch_multi_proc(num_workers, q)[source]

launch_multi_proc.

Parameters:

num_workers (int) – the number of workers which will define the number of allowed parallel threads
q (JoinableQueue) – a JoinableQueue object filled with tasks to perform

Returns:

Return type:

None

manamodeller.utils.make_csvs(data, out_folder, celfilename)[source]

make_csvs.

Parameters:

data (list) – list of lists with data to be transformed into a pandas dataframe
out_folder (str) – the path where csvs files will be saved
celfilename (str) – the initial CEL filename (used as identifier)

Returns:

writes the data as csv files in out_folder

manamodeller.utils.make_pickle(object, filename)[source]

make_pickle.

Parameters:

object (object) – the object to save as a pickle
filename (str) – the path and filename for the pickle file

Returns:

writes a .pkl file at the designated location
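
Saving and reloading an object with the standard pickle module looks roughly like this. The sketch is illustrative only: save_pickle is a hypothetical stand-in with the same two arguments, and the dictionary content is invented sample data.

```python
import os
import pickle
import tempfile


def save_pickle(obj, filename):
    """Serialize obj to filename using the pickle binary format."""
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle)


# Hypothetical categorized reactions, written to a temporary directory.
data = {"rh": ["R1"], "rl": ["R2"], "rn": ["R3"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "reactions.pkl")
    save_pickle(data, path)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.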

manamodeller.utils.worker(q, _finish)[source]

worker.

Parameters:

q (JoinableQueue) – a JoinableQueue object filled with tasks to perform
_finish (boolean) – a boolean value indicating if the worker should process elements from the JoinableQueue

Returns:

Return type:

None
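
The queue-and-workers pattern behind launch_multi_proc and worker can be sketched as below. This is not the package's implementation: threads and queue.Queue are used as a stand-in for processes and multiprocessing.JoinableQueue (both queue types expose task_done() and join()), a None sentinel replaces the _finish flag, and the names run_tasks, consume and square are hypothetical.

```python
import queue
import threading


def square(value):
    return value * value


def consume(q, results):
    """Process (function, argument) tasks until a None sentinel arrives."""
    while True:
        task = q.get()
        try:
            if task is None:
                return
            func, arg = task
            results.append(func(arg))
        finally:
            # Mark every item, including the sentinel, as done so q.join() can return.
            q.task_done()


def run_tasks(tasks, num_workers):
    """Run the tasks on num_workers threads and collect their results."""
    q = queue.Queue()
    results = []
    workers = [threading.Thread(target=consume, args=(q, results)) for _ in range(num_workers)]
    for thread in workers:
        thread.start()
    for task in tasks:
        q.put(task)
    for _ in workers:
        q.put(None)  # one sentinel per worker
    q.join()
    for thread in workers:
        thread.join()
    return results


results = run_tasks([(square, 1), (square, 2), (square, 3)], num_workers=2)
```

Results arrive in completion order, so they are sorted before comparison when order matters.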
