NSForest

nsforesting.NSForest(adata, cluster_header, *, medians_header='medians_', binary_scores_header='binary_scores_', cluster_list=[], gene_selection='BinaryFirst_high', n_trees=1000, n_jobs=-1, beta=0.5, n_top_genes=15, n_binary_genes=10, n_genes_eval=6, save=False, save_supplementary=False, output_folder='', outputfilename_prefix='')

Runs the main NS-Forest algorithm to find a list of NS-Forest marker genes for each cluster in cluster_header.

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation.

medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.

binary_scores_header: str (default: “binary_scores_{cluster_header}”)
Key in adata.varm storing binary score matrix.

cluster_list: list (default: all clusters)
For subsetting by specified cell annotations. Used for parallelizing NSForest.
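Since cluster_list restricts a run to the listed clusters, one way to parallelize is to split the cluster labels into batches and run one job per batch. The following is a minimal sketch of that splitting step only; chunk_clusters is a hypothetical helper and not part of NSForest, and the cluster labels are made up for illustration.

```python
def chunk_clusters(clusters, n_batches):
    """Split cluster labels into at most n_batches batches (hypothetical helper)."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    clusters = list(clusters)
    # Round-robin slicing keeps batch sizes within one label of each other.
    batches = [clusters[i::n_batches] for i in range(n_batches)]
    # Drop empty batches when there are fewer clusters than batches.
    return [batch for batch in batches if batch]


# Illustrative labels only; real labels come from adata.obs[cluster_header].
batches = chunk_clusters(["B cell", "T cell", "NK cell", "Monocyte", "Platelet"], 2)
```

Each batch could then be passed as cluster_list in its own NSForest call. Giving each call a distinct outputfilename_prefix is a reasonable precaution so that saved files from different batches are easy to tell apart.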

gene_selection: str (default: “BinaryFirst_high”)
Level of gene filtering by binary score. Options: [None, “BinaryFirst_high”, “BinaryFirst_moderate”, “BinaryFirst_low”]. None includes all genes. BinaryFirst_high includes genes with binary scores more than 2 standard deviations above the mean. BinaryFirst_moderate includes genes with binary scores more than 1 standard deviation above the mean. BinaryFirst_low includes genes with binary scores above the median.
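The filtering levels can be illustrated with a small standalone sketch. It is not the NSForest implementation: it assumes the cutoffs are computed over the binary scores of all genes for one cluster, that the standard-deviation cutoffs are offsets from the mean, and that a population standard deviation is used. The function names and scores are hypothetical.

```python
import statistics


def binary_score_cutoff(scores, gene_selection):
    """Return the binary-score cutoff for a gene_selection level (illustrative only)."""
    if gene_selection is None:
        return float("-inf")  # no filtering: every gene passes
    if gene_selection == "BinaryFirst_low":
        return statistics.median(scores)
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)  # assumption: population standard deviation
    if gene_selection == "BinaryFirst_moderate":
        return mean + std  # assumption: offset measured from the mean
    if gene_selection == "BinaryFirst_high":
        return mean + 2 * std  # assumption: offset measured from the mean
    raise ValueError(f"unknown gene_selection: {gene_selection!r}")


def select_genes(scores_by_gene, gene_selection):
    """Keep genes whose binary score is strictly above the cutoff."""
    cutoff = binary_score_cutoff(list(scores_by_gene.values()), gene_selection)
    return [gene for gene, score in scores_by_gene.items() if score > cutoff]


# Made-up binary scores for ten hypothetical genes in one cluster.
example_scores = {
    "g0": 0.0, "g1": 0.0, "g2": 0.0, "g3": 0.0, "g4": 0.0,
    "g5": 0.0, "g6": 0.1, "g7": 0.2, "g8": 0.5, "g9": 1.0,
}
high = select_genes(example_scores, "BinaryFirst_high")
moderate = select_genes(example_scores, "BinaryFirst_moderate")
low = select_genes(example_scores, "BinaryFirst_low")
```

Stricter levels pass fewer genes to the random forest step, which shortens the run and makes it depend more heavily on the binary score.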

n_trees: int (default: 1000)
n_estimators parameter in sklearn.ensemble’s RandomForestClassifier.

n_jobs: int (default: -1)
n_jobs parameter in sklearn.ensemble’s RandomForestClassifier.

beta: float (default: 0.5)
beta parameter in sklearn.metrics’s fbeta_score.
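For reference, the F-beta score combines precision and recall, and a beta below 1 (such as the default 0.5) weights precision more heavily than recall. The sketch below restates the standard formula in plain Python; fbeta is a hypothetical helper for illustration, not a replacement for sklearn.metrics.fbeta_score.

```python
def fbeta(precision, recall, beta=0.5):
    """F-beta score from precision and recall (standard formula)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# With beta=0.5, high precision is rewarded more than high recall.
precise_marker = fbeta(precision=0.9, recall=0.5)
sensitive_marker = fbeta(precision=0.5, recall=0.9)
```

With the default beta, a marker combination with high precision and moderate recall scores higher than one with the values swapped.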

n_top_genes: int (default: 15)
Number of top genes, as ranked by sklearn.ensemble’s RandomForestClassifier, to use as input for sklearn.tree’s DecisionTreeClassifier.

n_binary_genes: int (default: 10)
Number of top genes, as ranked by binary score, to include in the supplementary table output.

n_genes_eval: int (default: 6)
Number of top genes, as ranked by binary score, to use as input for sklearn.tree’s DecisionTreeClassifier.

save: bool (default: False)
Whether to save df_results as csv and pkl files in output_folder.

save_supplementary: bool (default: False)
Whether to save additional supplementary csv files.

output_folder: str (default: “”)
Output folder. Created if it doesn’t exist.

outputfilename_prefix: str (default: “”)
Prefix for all output files.

Returns

df_results: pd.DataFrame
NS-Forest results. Includes classification metrics (f_score, precision, recall, onTarget).
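A call might be put together as in the sketch below. It uses only the parameter names documented above, but several things are assumptions: the import path from nsforest import nsforesting, the helper names nsforest_kwargs and run_nsforest (both hypothetical), and that the median and binary score matrices are already stored in adata.varm under the given keys.

```python
def nsforest_kwargs(cluster_header, output_folder="", prefix=""):
    """Collect keyword arguments for NSForest using the documented parameter names."""
    return {
        # Keys in adata.varm, following the documented naming pattern.
        "medians_header": "medians_" + cluster_header,
        "binary_scores_header": "binary_scores_" + cluster_header,
        "gene_selection": "BinaryFirst_high",
        "n_trees": 1000,
        "beta": 0.5,
        # Only save results when an output folder was given.
        "save": bool(output_folder),
        "output_folder": output_folder,
        "outputfilename_prefix": prefix,
    }


def run_nsforest(adata, cluster_header, **overrides):
    """Run NSForest on an AnnData object (hypothetical wrapper)."""
    # Imported lazily so this sketch loads without the package installed.
    from nsforest import nsforesting  # assumption: package import path

    kwargs = nsforest_kwargs(cluster_header)
    kwargs.update(overrides)
    return nsforesting.NSForest(adata, cluster_header, **kwargs)


example_kwargs = nsforest_kwargs("cell_type", output_folder="results/", prefix="demo_")
```

The returned df_results can then be inspected like any pandas DataFrame, for example by sorting on the f_score column.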