Mana Modeller modules

batchs module

manamodeller.batchs.write_div_enum_script(script_path, batch_directory, rxn_enum_set_dir, output_directory, modelfile, weightfile, reactionFile, prev_sol_dir='prev_sol_dir/', log_dir='log_dir', env='MANA', dist_anneal=0.9, obj_tol=0.01, iters=100, para_batchs=False)[source]

write_div_enum_script.

Parameters:
  • script_path (str) – path to the diversity_enum.py dexom python script

  • batch_directory (str) – path to the directory where batch files should be written

  • rxn_enum_set_dir (str) – path to the directory of processed reaction-enum results

  • output_directory (str) – path to the directory where diversity-enum modelling results should be written

  • modelfile (str) – path to the model’s json file

  • weightfile (str) – path to the csv file that contains binarized reaction activity (according to transcriptomic data)

  • reactionFile (str) – path to the file that contains the list of reactions in the model

  • prev_sol_dir (str) – path to the directory where the reaction-enum solutions used as starting points for the diversity enumeration process should be saved

  • log_dir (str) – path to the directory where log files should be stored

  • env (str) – name of the anaconda environment to be activated

  • dist_anneal (float) – dexom-python parameter a, with 0 <= a <= 1, that controls the distance between successive solutions

  • obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value

  • iters (int) – dexom-python parameter, maximal number of iterations

  • para_batchs (boolean) – if True, launch each batch file independently (parallelize over batches instead of over conditions)

Returns:

writes batch files ready to launch on a suitably prepared Slurm computing platform
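As an illustration of what a generated batch file could look like, the sketch below builds a small Slurm script as text and writes it to a directory. It is not the manamodeller implementation: the helper names render_batch and write_batch, the #SBATCH options chosen and the layout of the command line are assumptions, and the dexom-python arguments are passed through as an opaque string rather than guessed.

```python
from pathlib import Path
import tempfile


def render_batch(script_path, log_dir, job_name, script_args, env="MANA"):
    """Return the text of a Slurm batch file (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={log_dir}/{job_name}.out",
        f"#SBATCH --error={log_dir}/{job_name}.err",
        f"conda activate {env}",
        # script_args comes from the caller; no dexom-python flag is assumed here
        f"python {script_path} {script_args}".rstrip(),
    ]
    return "\n".join(lines) + "\n"


def write_batch(batch_directory, job_name, text):
    """Write one batch file and return its path."""
    directory = Path(batch_directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{job_name}.sh"
    path.write_text(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    text = render_batch("diversity_enum.py", "log_dir", "batch_0", "")
    batch_file = write_batch(tmp, "batch_0", text)
    written = batch_file.read_text()
```

Keeping the rendering step separate from the file writing makes it easy to inspect the script text before anything is submitted to the scheduler.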

manamodeller.batchs.write_rxn_enum_script(script_path, batch_directory, output_directory, modelfile, weightfile, reactionFile='', log_dir='log_dir', env='MANA', obj_tol=0.001, iters=100, para_batchs=False)[source]

write_rxn_enum_script.

Parameters:
  • script_path (str) – path to the reaction-enum dexom-python script

  • batch_directory (str) – path to the directory where batch files should be written

  • output_directory (str) – path to the directory where reaction-enum modelling results should be written

  • modelfile (str) – path to the model’s json file

  • weightfile (str) – path to the csv file that contains binarized reaction activity (according to transcriptomic data)

  • reactionFile (str) – path to the file that contains the list of reactions in the model

  • log_dir (str) – path to the directory where log files should be stored

  • env (str) – name of the anaconda environment to be activated

  • obj_tol (float) – dexom-python parameter, objective value tolerance, as a fraction of the original value

  • iters (int) – dexom-python parameter, maximal number of iterations

  • para_batchs (boolean) – if True, launch each batch file independently (parallelize over batches instead of over conditions)

Returns:

writes batch files ready to launch on a suitably prepared Slurm computing platform

dars module

manamodeller.dars.calculate_freq_ctrls(rListFile, all_cpds, time, pheno, working_path)[source]

calculate_freq_ctrls.

Parameters:
  • rListFile (str) – the path to the model’s reactions list file

  • all_cpds (str) – the list of compounds to process, from the props.properties file

  • time (str) – the exposure time to consider

  • pheno (pandas dataframe) – the pandas data frame containing Open TG-Gates metadata.

  • working_path (str) – the root path of your working directory.

Returns:

a pandas dataframe with the calculated frequencies for all control partial-enumeration results corresponding to the input parameters

manamodeller.dars.calculate_frequencies(data, name)[source]

calculate_frequencies.

Parameters:
  • data (pandas dataframe) – a dataframe containing the partial enumeration results

  • name (str) – the name of the pandas Series

Returns:

the sum of partial enumeration results for each reaction

Return type:

pandas series
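The per-reaction sum can be pictured with a small standalone sketch. It assumes that each row is one enumerated solution, each column is one reaction, and each value is 0 or 1; that layout, the helper name column_sums and the reaction ids are illustrative assumptions, and plain lists stand in for the pandas objects used by the package.

```python
def column_sums(solutions, reactions):
    """Sum the binary activity values of each reaction across all solutions."""
    totals = {rxn: 0 for rxn in reactions}
    for solution in solutions:
        for rxn, active in zip(reactions, solution):
            totals[rxn] += active
    return totals


reactions = ["R1", "R2", "R3"]  # hypothetical reaction ids
solutions = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
counts = column_sums(solutions, reactions)
```

Dividing each count by the number of solutions would turn these sums into activation frequencies.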

manamodeller.dars.calculate_frequencies_for_dir(full_enum_path, rList, output_file='')[source]

calculate_frequencies_for_dir.

Parameters:
  • full_enum_path (str) – the path to the full_enum directory

  • rList (list) – the list of reactions in the model

  • output_file (str) – if not empty, the frequencies table is written in csv to the path provided in this param

Returns:

the frequencies table computed from each csv file found in the full enum path

Return type:

pandas dataframe

manamodeller.dars.compute_scores(comp_freq, crossing_point=1, crossing_point_1_2=1.2, b=1)[source]

compute_scores.

Parameters:
  • comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies

  • crossing_point (int) – a crossing point factor for the circle center calculation

  • crossing_point_1_2 (float) – another crossing point factor for the circle center calculation

  • b (int) – a parameter for the calculation of ellipses

Returns:

a dataframe with several computed metrics: R2, center of circle, dist_to_OO, center of circle 1.2

Return type:

pandas dataframe

manamodeller.dars.findCircleCenter(A, B, C)[source]

findCircleCenter.

Parameters:
  • A (list) – a list containing x and y coordinates of point A, a point of the circle O

  • B (list) – a list containing x and y coordinates of point B, a point of the circle O

  • C (list) – a list containing x and y coordinates of point C, a point of the circle O

Returns:

the calculated properties of the circle O

Return type:

pandas dataframe
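For reference, the center of the circle through three points can be computed with the standard circumcenter formula. The sketch below is a standalone illustration of that geometry, not the package's code: the function name circle_through and its tuple return value are assumptions (the documented function returns a pandas dataframe).

```python
import math


def circle_through(a, b, c):
    """Return ((center_x, center_y), radius) of the circle through a, b and c."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("the three points are collinear")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    radius = math.hypot(ax - ux, ay - uy)
    return (ux, uy), radius


center, radius = circle_through([1, 0], [0, 1], [-1, 0])
```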

manamodeller.dars.rescale_and_rotate(comp_freq)[source]

rescale_and_rotate.

Parameters:

comp_freq (pandas dataframe) – a pandas dataframe containing several metrics computed from activation frequencies

Returns:

a dataframe with f_ctrl and f_trt in two forms: rescaled, and rescaled then rotated

Return type:

pandas dataframe

manamodeller.dars.rotate(vector, theta, rotation_around=None)[source]

rotate.

reference: https://en.wikipedia.org/wiki/Rotation_matrix#In_two_dimensions

Parameters:
  • vector (pandas dataframe) – the activation frequencies dataframe to rotate

  • theta (float) – rotation angle in radians

  • rotation_around (np.array) – the point around which vector is rotated; can be None

Returns:

the rotated dataframe

Return type:

pandas dataframe
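The two-dimensional rotation from the reference above can be written out for a single point. This is a minimal sketch of the rotation matrix applied by hand; the name rotate_point and the tuple-based interface are assumptions, and the package itself operates on a dataframe.

```python
import math


def rotate_point(point, theta, around=(0.0, 0.0)):
    """Rotate a 2D point counter-clockwise by theta radians around a pivot."""
    x = point[0] - around[0]
    y = point[1] - around[1]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # [x'] = [cos -sin] [x]
    # [y']   [sin  cos] [y]
    return (cos_t * x - sin_t * y + around[0],
            sin_t * x + cos_t * y + around[1])


quarter_turn = rotate_point((1.0, 0.0), math.pi / 2)
half_turn_around_pivot = rotate_point((2.0, 1.0), math.pi, around=(1.0, 1.0))
```

Translating to the pivot, rotating, then translating back is what allows the rotation to happen around a point other than the origin.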

modelling module

manamodeller.modelling.eval_gpr_activity(expr, gh, gl)[source]

eval_gpr_activity.

This is an adaptation of the eval_gpr function available in cobrapy. Instead of evaluating whether a GPR is active according to a list of knockout genes, it evaluates whether the GPR is regulated by a list of genes.

Example of usage: provide the lists of highly expressed and lowly expressed genes. The function returns True for expressions that evaluate to True, i.e. that are highly expressed.

Evaluates the compiled AST of a gene_reaction_rule against the list of active genes.

Parameters:
  • expr (Expression) – The ast of the gene reaction rule

  • gh (list) – list of highly expressed genes

  • gl (list) – list of lowly expressed genes

Returns:

True if the reaction is active with the given gh and gl lists, otherwise False

Return type:

bool
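Evaluating a gene reaction rule as a boolean expression tree can be sketched with Python's ast module. This is a simplified illustration under stated assumptions: the gene ids are valid Python identifiers, "and" requires every gene and "or" requires at least one, and only a single gene set is checked. How eval_gpr_activity combines gh and gl is not described in this section, so that part is deliberately left out; rule_is_true is a hypothetical name.

```python
import ast


def rule_is_true(node, genes):
    """Recursively evaluate a parsed rule against a set of gene ids."""
    if isinstance(node, ast.Expression):
        return rule_is_true(node.body, genes)
    if isinstance(node, ast.Name):
        # A gene is true when it belongs to the supplied set
        return node.id in genes
    if isinstance(node, ast.BoolOp):
        results = (rule_is_true(value, genes) for value in node.values)
        if isinstance(node.op, ast.And):
            return all(results)
        if isinstance(node.op, ast.Or):
            return any(results)
    raise TypeError(f"unsupported rule element: {type(node).__name__}")


# Hypothetical rule: the reaction needs both g1 and g2, or g3 alone
tree = ast.parse("(g1 and g2) or g3", mode="eval")
both_subunits = rule_is_true(tree, {"g1", "g2"})
one_subunit = rule_is_true(tree, {"g1"})
isozyme = rule_is_true(tree, {"g3"})
```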

manamodeller.modelling.find_high_low_exprs(uarray_data, threshold_dw_perc=25, threshold_up_perc=75)[source]

find_high_low_exprs.

Parameters:
  • uarray_data (pandas dataframe) – a gene expression (microarray) dataset

  • threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.

  • threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.

Returns:

the lists of highly and lowly expressed genes

Return type:

list
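The percentile thresholds can be illustrated on a toy expression profile. The sketch below assumes a single sample stored as a gene-to-value mapping, a linear-interpolation percentile, and strict inequalities at both cut-offs; all three choices, as well as the helper names, are assumptions made for illustration rather than a description of find_high_low_exprs.

```python
def percentile(values, perc):
    """Linear-interpolation percentile of a collection of numbers."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def split_high_low(expression, down_perc=25, up_perc=75):
    """Return (highly expressed genes, lowly expressed genes)."""
    low_cut = percentile(expression.values(), down_perc)
    high_cut = percentile(expression.values(), up_perc)
    high = [gene for gene, value in expression.items() if value > high_cut]
    low = [gene for gene, value in expression.items() if value < low_cut]
    return high, low


# Hypothetical expression values for five genes
expression = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0}
high, low = split_high_low(expression)
```

Genes that fall between the two cut-offs end up in neither list, which matches the idea of reactions left unconstrained by the expression data.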

manamodeller.modelling.find_reactions_expression_levels(gprs, gh, gl)[source]

find_reactions_expression_levels.

Parameters:
  • gprs (cobra gpr) – A cobra gpr object, obtained from a cobra model

  • gh (list) – list of highly expressed genes

  • gl (list) – list of lowly expressed genes

Returns:

a tuple of three lists:

  • rh – the list of reactions identified as active according to gene expression data

  • rl – the list of reactions identified as inactive according to gene expression data

  • rn – the list of reactions not constrained by gene expression data

Return type:

tuple

manamodeller.modelling.fullname_equation(reaction)[source]

fullname_equation.

Parameters:

reaction (cobra reaction) – A cobra reaction object, obtained from a cobra model

Returns:

the reaction equation written with the full metabolite names

Return type:

str

manamodeller.modelling.get_GPR_reactions(model)[source]

get_GPR_reactions.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a dataframe with all the reactions in the model having a GPR

Return type:

pandas dataframe

manamodeller.modelling.get_gene_list(model)[source]

get_gene_list.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a list containing all the genes in the model

Return type:

list

manamodeller.modelling.get_reactions_ids(model)[source]

get_reactions_ids.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

a dataframe with all the reactions and, where available, their GPR

Return type:

pandas dataframe

manamodeller.modelling.identify_model_gene_ids(model)[source]

identify_model_gene_ids.

Parameters:

model (cobra model) – A cobra object, loaded with the cobra library

Returns:

The type of gene identifiers used in the model

Return type:

str

manamodeller.modelling.map_single_column(data, hgnc_data, col_to_add)[source]

map_single_column.

Parameters:
  • data (pandas dataframe) – a gene expression dataset

  • hgnc_data (pandas dataframe) – a hgnc database annotation dataframe

  • col_to_add (str) – the name of the identifier column to map

Returns:

the gene expression dataframe with the mapped identifier column added

Return type:

pandas dataframe

manamodeller.modelling.preprocess_data(data, gene_id_col, model, pickle_path='', csvs_path='', threshold_dw_perc=25, threshold_up_perc=75)[source]

preprocess_data.

Parameters:
  • data (pandas dataframe) – a gene expression dataset

  • gene_id_col (str) – the name of the gene identifier column

  • model (cobra model) – A cobra object, loaded with the cobra library

  • pickle_path (str) – if not empty, write pkl files containing the categorized reaction activity at the given location

  • csvs_path (str) – if not empty, write csv files containing the categorized reaction activity at the given location

  • threshold_dw_perc (int) – The percentile below which we consider that genes are not expressed.

  • threshold_up_perc (int) – The percentile above which we consider that genes are highly expressed.

Returns:

writes csv and/or pickle files containing the categorized reaction activity

results_analysis module

manamodeller.results_analysis.dendro_reactions(matrix, title='No title set')[source]

dendro_reactions.

Parameters:
  • matrix (pandas dataframe) – a distance matrix

  • title (str) – the title of the dendrogram

Returns:

a scipy linkage object; the dendrogram is also plotted

manamodeller.results_analysis.extract_reactions_from_clusters(matrix, title, write_files=False, file_prefix='cluster', header=True)[source]

extract_reactions_from_clusters.

Parameters:
  • matrix (pandas dataframe) – a distance matrix

  • title (str) – the title of the dendrogram plots

  • write_files (boolean) – if true, write cluster’s reaction files

  • file_prefix (str) – prefix to add when saving cluster’s reaction file

  • header (boolean) – if true, add a header to the cluster’s reaction file

Returns:

writes the clusters’ reaction files when write_files is True

manamodeller.results_analysis.generate_annotation_table(cluster_file, model, hgnc_data, DARs_direction, outputFile)[source]

generate_annotation_table.

Parameters:
  • cluster_file (str) – path to the tsv file containing the current cluster’s DARs

  • model (cobra model) – A cobra object, loaded with the cobra library

  • hgnc_data (pandas dataframe) – a hgnc database annotation dataframe

  • DARs_direction (str) – the path to the tsv file for the molecule associated with current cluster to get DARs’ direction

  • outputFile (str) – path and name of the outputFile (xlsx file)

Returns:

writes an Excel file with one reaction per line and the corresponding annotations in columns

manamodeller.results_analysis.get_node_list(gml_file)[source]

get_node_list.

Parameters:

gml_file (str) – the path to the gml file

Returns:

the node ids from the gml file

Return type:

list

manamodeller.results_analysis.visualize_gml(gml_file, window_size=['1000px', '1000px'], notebook=True)[source]

visualize_gml.

Parameters:
  • gml_file (str) – the path to the gml file

  • window_size (list) – the list of x and y pixel sizes for the output window

  • notebook (boolean) – if True, enable the pyvis visualisation optimized for notebooks

Returns:

an interactive visualisation window of the graph

results_processing module

manamodeller.results_processing.concatenate_csv(filenames, out_dir, col_index, single_csv, index_suffix='')[source]

concatenate_csv.

Parameters:
  • filenames (list) – list of csv files to concatenate into one csv file

  • out_dir (str) – path to the concatenated csv output directory

  • col_index (str) – column name of a column to be used as index (optional)

  • single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file

  • index_suffix (str) – suffix to add to csv’s row index

Returns:

writes the concatenated csv in the output directory (out_dir)

manamodeller.results_processing.concatenate_reaction_div_enum(path_concat_rxn_enum, path_concat_div_enum, out_dir, col_index='', single_csv=False, ncpus=1)[source]

concatenate_reaction_div_enum.

Parameters:
  • path_concat_rxn_enum (str) – path to the concatenated reaction enum directory

  • path_concat_div_enum (str) – path to the concatenated diversity enum directory

  • out_dir (str) – path to the csvs output directory

  • col_index (str) – column name of a column to be used as index (optional)

  • single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file

  • ncpus (int) – the number of cpus allocated, will enable parallel processing

Returns:

a JoinableQueue object

manamodeller.results_processing.concatenate_solutions(csv_dir, out_dir, col_index='', single_csv=False, ncpus=1, restart=False)[source]

concatenate_solutions.

Parameters:
  • csv_dir (str) – path to the directory containing the csv files to concatenate

  • out_dir (str) – path to the csvs output directory

  • col_index (str) – column name of a column to be used as index (optional)

  • single_csv (boolean) – option for the concatenate_csv function, if True all solutions will be stored in a single csv file

  • ncpus (int) – the number of cpus allocated, will enable parallel processing

Returns:

a JoinableQueue object

manamodeller.results_processing.remove_done_batchs(batch_dir, result_dir, launch_undone=True, relax_param=False, enum_type='reaction_enum', para_batch=False, env='MANA')[source]

remove_done_batchs.

Parameters:
  • batch_dir (str) – path to the batchs directory

  • result_dir (str) – path to the modelling result directory

  • launch_undone (boolean) – If True, write the master bash file to launch all failed batchs

  • relax_param (boolean) – If True, relax the mipgap tolerance parameter

  • enum_type (str) – string indicating which type of enumeration is being processed (optional)

  • para_batch (boolean) – if True, launch each batch file independently (parallelize over batches instead of over conditions)

  • env (str) – name of the anaconda environment to be activated

Returns:

the names of the failed batches

Return type:

list

manamodeller.results_processing.remove_zerobiomass_solutions(enum_dir, reaction_list, separator=',')[source]

remove_zerobiomass_solutions.

Parameters:
  • enum_dir (str) – the path to the enumeration directory

  • reaction_list (str) – the path to the reaction_list directory

  • separator (str) – the character used to separate columns in the file

Returns:

overwrites the csv file, dropping the solutions that have zero flux through the biomass reaction
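The filtering step can be illustrated on an in-memory table. The sketch assumes that each row is one solution and that the biomass flux sits in a named column; the column name "biomass", the helper name drop_zero_flux_rows and the string-in, string-out interface are illustrative assumptions, whereas the package function works on the files of enum_dir.

```python
import csv
import io


def drop_zero_flux_rows(csv_text, biomass_column, separator=","):
    """Return the csv text without the rows whose biomass value is zero."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator)
    kept = [row for row in reader if float(row[biomass_column]) != 0.0]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter=separator,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue()


# Hypothetical table: one reaction column and one biomass column
table = "R1,biomass\n1,1\n0,0\n1,1\n"
filtered = drop_zero_flux_rows(table, "biomass")
```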

utils module

manamodeller.utils.launch_multi_proc(num_workers, q)[source]

launch_multi_proc.

Parameters:
  • num_workers (int) – the number of workers which will define the number of allowed parallel threads

  • q (JoinableQueue) – a JoinableQueue object filled with tasks to perform

Returns:

Return type:

None

manamodeller.utils.make_csvs(data, out_folder, celfilename)[source]

make_csvs.

Parameters:
  • data (list) – list of lists with data to be transformed into a pandas dataframe

  • out_folder (str) – the path where the csv files will be saved

  • celfilename (str) – the initial CEL filename (used as identifier)

Returns:

writes csv files to out_folder

manamodeller.utils.make_pickle(object, filename)[source]

make_pickle.

Parameters:
  • object (object) – the Python object to save

  • filename (str) – the path and filename for the pickle file

Returns:

writes a .pkl file at the designated location
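Saving and reloading an object with the standard pickle module looks like the sketch below. The helper name save_pickle, the file name and the example dictionary are assumptions chosen for illustration; only pickle.dump and pickle.load are taken as given.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, filename):
    """Serialize obj to filename in binary mode."""
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "reactions.pkl"  # hypothetical file name
    save_pickle({"rh": ["R1"], "rl": ["R2"]}, target)
    with open(target, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.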

manamodeller.utils.worker(q, _finish)[source]

worker.

Parameters:
  • q (JoinableQueue) – a JoinableQueue object filled with tasks to perform

  • _finish (boolean) – a boolean value indicating if the worker should process elements from the JoinableQueue

Returns:

Return type:

None
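The queue-and-worker pattern behind launch_multi_proc and worker can be sketched with threads. Here queue.Queue stands in for multiprocessing.JoinableQueue (both provide task_done and join), a None sentinel replaces the _finish flag, and squaring a number replaces the real task; all three are assumptions made to keep the example small and self-contained.

```python
import queue
import threading


def square_worker(q, results):
    """Process tasks from q until a None sentinel is received."""
    while True:
        task = q.get()
        if task is None:  # sentinel: no more work for this worker
            q.task_done()
            break
        results.append(task * task)
        q.task_done()


def run_workers(num_workers, q, results):
    """Start num_workers threads and wait until the queue is fully processed."""
    threads = [
        threading.Thread(target=square_worker, args=(q, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for _ in threads:
        q.put(None)  # one sentinel per worker
    q.join()         # blocks until every item has been marked done
    for thread in threads:
        thread.join()


q = queue.Queue()
for number in range(5):
    q.put(number)
results = []
run_workers(2, q, results)
```

The order of results depends on thread scheduling, so callers that need a stable order should sort the list or attach an index to each task.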

Module contents