NSForest
- nsforesting.NSForest(adata, cluster_header, *, medians_header='medians_', binary_scores_header='binary_scores_', cluster_list=[], gene_selection='BinaryFirst_high', n_trees=1000, n_jobs=-1, beta=0.5, n_top_genes=15, n_binary_genes=10, n_genes_eval=6, save=False, save_supplementary=False, output_folder='', outputfilename_prefix='')
Runs the main NS-Forest algorithm to find a list of NS-Forest markers for each cluster in cluster_header.
Parameters
- adata: AnnData
Annotated data matrix.
- cluster_header: str
Column in adata.obs storing cell annotation.
- medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.
- binary_scores_header: str (default: “binary_scores_{cluster_header}”)
Key in adata.varm storing binary score matrix.
- cluster_list: list (default: all clusters)
For subsetting by specified cell annotations. Used for parallelizing NSForest.
- gene_selection: str (default: “BinaryFirst_high”)
Level of filtering genes by binary score. Options: [None, “BinaryFirst_high”, “BinaryFirst_moderate”, “BinaryFirst_low”]. None includes all genes. BinaryFirst_high includes genes with binary scores > 2 std. BinaryFirst_moderate includes genes with binary scores > 1 std. BinaryFirst_low includes genes with binary scores > median.
- n_trees: int (default: 1000)
n_estimators parameter in sklearn.ensemble’s RandomForestClassifier.
- n_jobs: int (default: -1)
n_jobs parameter in sklearn.ensemble’s RandomForestClassifier.
- beta: float (default: 0.5)
beta parameter in sklearn.metrics’s fbeta_score.
- n_top_genes: int (default: 15)
Number of top genes, as ranked by sklearn.ensemble’s RandomForestClassifier, to pass on as input for sklearn.tree’s DecisionTreeClassifier.
- n_binary_genes: int (default: 10)
Number of top genes, ranked by binary score, to include in the supplementary table output.
- n_genes_eval: int (default: 6)
Number of top genes, ranked by binary score, to use as input for sklearn.tree’s DecisionTreeClassifier.
- save: bool (default: False)
Whether to save csv and pkl of df_results in output_folder.
- save_supplementary: bool (default: False)
Whether to save additional supplementary csvs.
- output_folder: str (default: “”)
Output folder. Created if it doesn’t exist.
- outputfilename_prefix: str (default: “”)
Prefix for all output files.
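Because cluster_list restricts a run to a subset of clusters, one way to parallelize is to split the cluster labels into groups and run each group as a separate job. The following is a minimal sketch of that idea, not part of NSForest: the helper name chunk_clusters and the cluster labels are hypothetical, and the commented-out call assumes the module is importable as `from nsforest import nsforesting`.

```python
def chunk_clusters(clusters, n_chunks):
    """Split cluster labels into contiguous, near-equal groups (hypothetical helper)."""
    clusters = list(clusters)
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Never create more groups than there are clusters.
    n_chunks = min(n_chunks, len(clusters)) or 1
    size, extra = divmod(len(clusters), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` groups receive one additional cluster.
        end = start + size + (1 if i < extra else 0)
        chunks.append(clusters[start:end])
        start = end
    return chunks


# Made-up labels; in practice they would come from adata.obs[cluster_header].
labels = ["Astro", "Endo", "Exc L2", "Inh PVALB", "Micro"]
jobs = chunk_clusters(labels, 2)

# Each group could then be passed to its own run, for example:
# from nsforest import nsforesting  # assumed import path
# df = nsforesting.NSForest(adata, "cluster", cluster_list=jobs[0],
#                           save=True, output_folder="out/",
#                           outputfilename_prefix="job0_")
```

Giving each job its own outputfilename_prefix keeps the saved files from overwriting one another; the per-job results can be concatenated afterwards.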
Returns
- df_results: pd.DataFrame
NS-Forest results. Includes classification metrics (f_score, precision, recall, onTarget).
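To illustrate how the beta parameter relates the reported precision and recall to the f_score, here is a small sketch of the standard F-beta formula, which sklearn.metrics’s fbeta_score implements for the binary case. The function fbeta below is written for illustration only and is not NSForest’s code; the input values are made up.

```python
def fbeta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With the default beta=0.5, a precise but less sensitive marker set
# scores higher than a sensitive but less precise one.
high_precision = fbeta(0.9, 0.5)
high_recall = fbeta(0.5, 0.9)
```

With beta below 1, marker combinations that rarely label off-target cells are favored over combinations that capture more of the target cluster at the cost of false positives.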