Tutorial nsforesting
This tutorial is for running NS-Forest for generating marker combinations that best classify clusters.
Setting up environment
[1]:
import sys
import os
code_folder = "C:/Users/bpeng/OneDrive - J. Craig Venter Institute/Documents/Github/NSForest"
sys.path.insert(0, os.path.abspath(code_folder))
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "notebook"
import nsforest as ns
from nsforest import utils
Data Exploration
Loading h5ad AnnData file
[2]:
data_folder = "../demo_data/"
file = data_folder + "adata_layer1.h5ad"
adata = sc.read_h5ad(file)
adata
[2]:
AnnData object with n_obs × n_vars = 871 × 16497
obs: 'cluster'
Defining cluster_header as cell type annotation.
Note: Some datasets have multiple annotations per sample (ex. “broad_cell_type” and “granular_cell_type”). NS-Forest can be run on multiple cluster_header’s. Combining the parent and child markers may improve classification results.
[3]:
cluster_header = "cluster"
Defining output_folder for saving results
[4]:
output_folder = "../outputs_layer1/"
Looking at sample labels
[5]:
adata.obs_names
[5]:
Index(['A01_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
'A01_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
'A02_BCH1_1NeuNP_H200.1025_MTG_layer_1',
'A03_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
'A04_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
'A04_BCH1_1NeuNP_H200.1025_MTG_layer_1',
'A04_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
'A05_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
'A05_BCH1_1NeuNP_H200.1025_MTG_layer_1',
'A05_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
...
'P09_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH7',
'P09_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
'P09_1_Nuclei_NeuNN_H200_1030_MTG_layer1_BCH8',
'P09_BCH1_1NeuNN_H200.1025_MTG_layer_1',
'P10_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH6',
'P10_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
'P10_BCH1_1NeuNN_H200.1025_MTG_layer_1',
'P11_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH7',
'P11_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
'P11_1_Nuclei_NeuNN_H200_1030_MTG_layer1_BCH8'],
dtype='object', length=871)
Looking at genes
Note: adata.var_names must be unique. If there is a problem, usually it can be solved by assigning adata.var.index = adata.var["ensembl_id"].
[6]:
adata.var_names
[6]:
Index(['A1CF', 'A2M', 'A2M_AS1', 'A2ML1', 'A2ML1_AS1', 'A2MP1', 'A3GALT2',
'A4GALT', 'AAAS', 'AACS',
...
'ZUFSP', 'ZW10', 'ZWILCH', 'ZWINT', 'ZXDC', 'ZYG11A', 'ZYG11B', 'ZYX',
'ZZEF1', 'ZZZ3'],
dtype='object', length=16497)
Checking cell annotation sizes
Note: Some datasets are too large and need to be downsampled to be run through the pipeline. When downsampling, be sure to have all the granular cluster annotations represented.
[7]:
pd.DataFrame(adata.obs[cluster_header].value_counts()).reset_index()
[7]:
| cluster | count | |
|---|---|---|
| 0 | e1_e299_SLC17A7_L5b_Cdh13 | 299 |
| 1 | i1_i90_COL5A2_Ndnf_Car4 | 90 |
| 2 | i2_i77_LHX6_Sst_Cbln4 | 77 |
| 3 | i3_i56_BAGE2_Ndnf_Cxcl14 | 56 |
| 4 | i4_i54_MC4R_Ndnf_Cxcl14 | 54 |
| 5 | g1_g48_GLI3_Astro_Gja1 | 48 |
| 6 | i5_i47_TRPC3_Ndnf_Car4 | 47 |
| 7 | i6_i44_GPR149_Vip_Mybpc1 | 44 |
| 8 | i7_i31_CLMP_Ndnf_Cxcl14 | 31 |
| 9 | g2_g27_APBB1IP_Micro_Ctss | 27 |
| 10 | i8_i27_SNCG_Vip_Mybpc1 | 27 |
| 11 | i9_i22_TAC3_Vip_Mybpc1 | 22 |
| 12 | g3_g18_GPNMB_OPC_Pdgfra | 18 |
| 13 | i10_i16_TSPAN12_Vip_Mybpc1 | 16 |
| 14 | g4_g9_MOG_Oligo_Opalin | 9 |
| 15 | i11_i6_EGF_Vip_Mybpc1 | 6 |
Preprocessing
Generating scanpy dendrogram
Note: Only run if there is no pre-defined dendrogram order. This step can still be run with no effects, but the runtime may increase.
Dendrogram order is stored in adata.uns["dendrogram_cluster"]["categories_ordered"].
[8]:
ns.pp.dendrogram(adata, cluster_header, save = True, output_folder = output_folder, outputfilename_suffix = cluster_header)
Saving dendrogram as...
../outputs_layer1/_cluster.png
Calculating cluster medians per gene
Run ns.pp.prep_medians before running NS-Forest.
Note: Do not run if evaluating marker lists. Do not run when generating scanpy plots (e.g. dot plot, violin plot, matrix plot).
[9]:
adata = ns.pp.prep_medians(adata, cluster_header)
adata.varm["medians_cluster"]
Calculating medians per cluster: 100%|██████████| 16/16 [00:01<00:00, 10.19it/s]
Saving medians as adata.varm.medians_cluster
median: 0.0
mean: 1.626
std: 2.49
Only positive genes selected. 11688 positive genes out of 16497 total genes
--- 2.008080244064331 seconds ---
[9]:
| e1_e299_SLC17A7_L5b_Cdh13 | g1_g48_GLI3_Astro_Gja1 | g2_g27_APBB1IP_Micro_Ctss | g3_g18_GPNMB_OPC_Pdgfra | g4_g9_MOG_Oligo_Opalin | i10_i16_TSPAN12_Vip_Mybpc1 | i11_i6_EGF_Vip_Mybpc1 | i1_i90_COL5A2_Ndnf_Car4 | i2_i77_LHX6_Sst_Cbln4 | i3_i56_BAGE2_Ndnf_Cxcl14 | i4_i54_MC4R_Ndnf_Cxcl14 | i5_i47_TRPC3_Ndnf_Car4 | i6_i44_GPR149_Vip_Mybpc1 | i7_i31_CLMP_Ndnf_Cxcl14 | i8_i27_SNCG_Vip_Mybpc1 | i9_i22_TAC3_Vip_Mybpc1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2M | 0.000000 | 1.584962 | 8.985842 | 1.000000 | 0.000000 | 0.000000 | 1.792481 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 1.292481 | 0.000000 | 1.000000 | 0.500000 |
| A2M_AS1 | 0.000000 | 0.000000 | 3.169925 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| A2ML1_AS1 | 4.392317 | 7.832668 | 1.584962 | 4.253898 | 0.000000 | 6.400854 | 3.683161 | 4.522197 | 4.754888 | 2.403677 | 4.321928 | 5.392317 | 4.459432 | 5.209454 | 5.727921 | 5.096147 |
| A2MP1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.584962 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 |
| AAAS | 0.000000 | 1.292481 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZYG11A | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ZYG11B | 4.857981 | 1.000000 | 1.000000 | 0.000000 | 2.584963 | 3.836213 | 6.346526 | 5.491853 | 3.906891 | 3.064641 | 3.903677 | 5.781360 | 5.584649 | 6.475733 | 5.614710 | 5.931125 |
| ZYX | 3.169925 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ZZEF1 | 4.392317 | 1.584962 | 2.321928 | 0.000000 | 1.000000 | 4.215226 | 5.734567 | 3.390680 | 1.584962 | 1.584962 | 2.321928 | 3.169925 | 3.229716 | 4.392317 | 3.584963 | 2.660964 |
| ZZZ3 | 6.303781 | 3.000000 | 1.584962 | 3.435182 | 8.189824 | 5.972722 | 5.985592 | 5.712608 | 4.906890 | 6.374096 | 5.997354 | 6.321928 | 6.169925 | 6.507795 | 5.426265 | 6.492921 |
11688 rows × 16 columns
Calculating binary scores per gene per cluster
Run ns.pp.prep_binary_scores before running NS-Forest. Do not need to run if evaluating marker lists. Do not need to run when generating scanpy plots.
[10]:
adata = ns.pp.prep_binary_scores(adata, cluster_header)
adata.varm["binary_scores_cluster"]
Calculating binary scores per cluster: 100%|██████████| 16/16 [01:13<00:00, 4.62s/it]
Saving binary scores as adata.varm.binary_scores_cluster
median: 0.1
mean: 0.202
std: 0.252
--- 74.26272201538086 seconds ---
[10]:
| e1_e299_SLC17A7_L5b_Cdh13 | g1_g48_GLI3_Astro_Gja1 | g2_g27_APBB1IP_Micro_Ctss | g3_g18_GPNMB_OPC_Pdgfra | g4_g9_MOG_Oligo_Opalin | i10_i16_TSPAN12_Vip_Mybpc1 | i11_i6_EGF_Vip_Mybpc1 | i1_i90_COL5A2_Ndnf_Car4 | i2_i77_LHX6_Sst_Cbln4 | i3_i56_BAGE2_Ndnf_Cxcl14 | i4_i54_MC4R_Ndnf_Cxcl14 | i5_i47_TRPC3_Ndnf_Car4 | i6_i44_GPR149_Vip_Mybpc1 | i7_i31_CLMP_Ndnf_Cxcl14 | i8_i27_SNCG_Vip_Mybpc1 | i9_i22_TAC3_Vip_Mybpc1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2M | 0.000000 | 0.623023 | 0.931968 | 0.500000 | 0.000000 | 0.000000 | 0.658949 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.500000 | 0.567888 | 0.000000 | 0.500000 | 0.466667 |
| A2M_AS1 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| A2ML1_AS1 | 0.153393 | 0.470567 | 0.066667 | 0.146435 | 0.000000 | 0.352137 | 0.127804 | 0.163316 | 0.184686 | 0.089374 | 0.149377 | 0.247584 | 0.158108 | 0.228193 | 0.283856 | 0.216961 |
| A2MP1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.915876 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.866667 | 0.866667 | 0.000000 |
| AAAS | 0.000000 | 0.948420 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.933333 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ZYG11A | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ZYG11B | 0.268527 | 0.066667 | 0.066667 | 0.000000 | 0.148420 | 0.200397 | 0.381241 | 0.306785 | 0.204062 | 0.166928 | 0.203846 | 0.328997 | 0.312765 | 0.393586 | 0.315017 | 0.342573 |
| ZYX | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ZZEF1 | 0.401457 | 0.091271 | 0.168100 | 0.000000 | 0.066667 | 0.381913 | 0.541554 | 0.284062 | 0.091271 | 0.091271 | 0.168100 | 0.258675 | 0.264994 | 0.401457 | 0.308410 | 0.206141 |
| ZZZ3 | 0.157010 | 0.031445 | 0.000000 | 0.044353 | 0.347222 | 0.131380 | 0.132101 | 0.119149 | 0.091036 | 0.163913 | 0.132888 | 0.158664 | 0.145953 | 0.178503 | 0.107846 | 0.176774 |
11688 rows × 16 columns
Plotting median and binary score distributions
[11]:
ns.pp.plot_varm(adata, f"medians_{cluster_header}", nonzero = True, save = True, output_folder = output_folder)
Saving adata.varm[medians_cluster] as histogram as...
../outputs_layer1/histogram_medians_cluster.png
[12]:
ns.pp.plot_varm(adata, f"medians_{cluster_header}", scale = "log", save = True, output_folder = output_folder)
Saving adata.varm[medians_cluster] as histogram as...
../outputs_layer1/histogram_medians_cluster.png
[13]:
ns.pp.plot_varm(adata, f"binary_scores_{cluster_header}", nonzero = True, save = True, output_folder = output_folder)
Saving adata.varm[binary_scores_cluster] as histogram as...
../outputs_layer1/histogram_binary_scores_cluster.png
[14]:
ns.pp.plot_varm(adata, f"binary_scores_{cluster_header}", scale = "log", save = True, output_folder = output_folder)
Saving adata.varm[binary_scores_cluster] as histogram as...
../outputs_layer1/histogram_binary_scores_cluster.png
Saving preprocessed AnnData as new h5ad
[15]:
filename = file.replace(".h5ad", "_preprocessed.h5ad")
print(f"Saving new anndata object as...\n{filename}")
adata.write_h5ad(filename)
adata
Saving new anndata object as...
../demo_data/adata_layer1_preprocessed.h5ad
[15]:
AnnData object with n_obs × n_vars = 871 × 11688
obs: 'cluster'
uns: 'pca', 'dendrogram_cluster'
obsm: 'X_pca'
varm: 'PCs', 'medians_cluster', 'binary_scores_cluster'
Running NS-Forest
Note: Do not run NS-Forest if only evaluating input marker lists.
[16]:
outputfilename_prefix = cluster_header
results = ns.nsforesting.NSForest(adata, cluster_header, save_supplementary = True, save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Running NS-Forest version 4.1
Preparing adata...
Pre-selecting genes based on binary scores...
BinaryFirst_high Threshold (mean + 2 * std): 0.706
Average number of genes after gene_selection in each cluster: 735.5
Saving number of genes selected per cluster as...
../outputs_layer1/cluster_gene_selection.csv
--- 0.05267214775085449 seconds ---
Number of clusters to evaluate: 16
1 out of 16:
e1_e299_SLC17A7_L5b_Cdh13
Pre-selected 1356 genes to feed into Random Forest.
NSForest-selected markers: ['LINC00507']
fbeta: 0.96
precision: 0.978
recall: 0.893
2 out of 16:
g1_g48_GLI3_Astro_Gja1
Pre-selected 583 genes to feed into Random Forest.
NSForest-selected markers: ['LINC00498']
fbeta: 0.95
precision: 1.0
recall: 0.792
3 out of 16:
g2_g27_APBB1IP_Micro_Ctss
Pre-selected 420 genes to feed into Random Forest.
NSForest-selected markers: ['ADAM28', 'PTPRC']
fbeta: 0.976
precision: 1.0
recall: 0.889
4 out of 16:
g3_g18_GPNMB_OPC_Pdgfra
Pre-selected 353 genes to feed into Random Forest.
NSForest-selected markers: ['GPNMB', 'OLIG2']
fbeta: 0.862
precision: 1.0
recall: 0.556
5 out of 16:
g4_g9_MOG_Oligo_Opalin
Pre-selected 571 genes to feed into Random Forest.
NSForest-selected markers: ['ST18']
fbeta: 1.0
precision: 1.0
recall: 1.0
6 out of 16:
i10_i16_TSPAN12_Vip_Mybpc1
Pre-selected 1007 genes to feed into Random Forest.
NSForest-selected markers: ['TSPAN12', 'CHRNB3']
fbeta: 0.804
precision: 0.9
recall: 0.562
7 out of 16:
i11_i6_EGF_Vip_Mybpc1
Pre-selected 1912 genes to feed into Random Forest.
NSForest-selected markers: ['EGF', 'FBRSL1']
fbeta: 0.714
precision: 1.0
recall: 0.333
8 out of 16:
i1_i90_COL5A2_Ndnf_Car4
Pre-selected 238 genes to feed into Random Forest.
NSForest-selected markers: ['COL5A2', 'BMP6']
fbeta: 0.908
precision: 0.97
recall: 0.722
9 out of 16:
i2_i77_LHX6_Sst_Cbln4
Pre-selected 292 genes to feed into Random Forest.
NSForest-selected markers: ['LHX6']
fbeta: 0.817
precision: 0.838
recall: 0.74
10 out of 16:
i3_i56_BAGE2_Ndnf_Cxcl14
Pre-selected 151 genes to feed into Random Forest.
NSForest-selected markers: ['BAGE2', 'SYT10']
fbeta: 0.781
precision: 0.962
recall: 0.446
11 out of 16:
i4_i54_MC4R_Ndnf_Cxcl14
Pre-selected 223 genes to feed into Random Forest.
NSForest-selected markers: ['ARHGAP36', 'ADAM33']
fbeta: 0.857
precision: 0.923
recall: 0.667
12 out of 16:
i5_i47_TRPC3_Ndnf_Car4
Pre-selected 942 genes to feed into Random Forest.
NSForest-selected markers: ['NTNG1', 'EYA4']
fbeta: 0.906
precision: 1.0
recall: 0.66
13 out of 16:
i6_i44_GPR149_Vip_Mybpc1
Pre-selected 377 genes to feed into Random Forest.
NSForest-selected markers: ['FLT1', 'GPR149']
fbeta: 0.792
precision: 1.0
recall: 0.432
14 out of 16:
i7_i31_CLMP_Ndnf_Cxcl14
Pre-selected 1012 genes to feed into Random Forest.
NSForest-selected markers: ['PAX6', 'TGFBR2']
fbeta: 0.901
precision: 1.0
recall: 0.645
15 out of 16:
i8_i27_SNCG_Vip_Mybpc1
Pre-selected 1326 genes to feed into Random Forest.
NSForest-selected markers: ['SNCG', 'EDNRA']
fbeta: 0.759
precision: 0.923
recall: 0.444
16 out of 16:
i9_i22_TAC3_Vip_Mybpc1
Pre-selected 1005 genes to feed into Random Forest.
NSForest-selected markers: ['BSPRY', 'MCTP2']
fbeta: 0.69
precision: 0.889
recall: 0.364
--- 87.0947813987732 seconds ---
Saving supplementary table as...
../outputs_layer1/cluster_supplementary.csv
Saving markers table as...
../outputs_layer1/cluster_markers.csv
using median
Calculating medians per cluster: 100%|██████████| 16/16 [00:00<00:00, 366.96it/s]
Saving supplementary table as...
../outputs_layer1/cluster_markers_onTarget_supp.csv
Saving supplementary table as...
../outputs_layer1/cluster_markers_onTarget.csv
Saving final results table as...
../outputs_layer1/cluster_results.csv
Saving final results table as...
../outputs_layer1/cluster_results.pkl
[17]:
results
[17]:
| software_version | cluster_header | clusterName | clusterSize | f_score | precision | recall | TN | FP | FN | TP | marker_count | NSForest_markers | binary_genes | onTarget | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.1 | cluster | e1_e299_SLC17A7_L5b_Cdh13 | 299 | 0.959741 | 0.978022 | 0.892977 | 566 | 6 | 32 | 267 | 1 | [LINC00507] | [SLC17A7, LINC00508, TBR1, ANKRD33B, NPTX1, LI... | 0.792614 |
| 1 | 4.1 | cluster | g1_g48_GLI3_Astro_Gja1 | 48 | 0.950000 | 1.000000 | 0.791667 | 823 | 0 | 10 | 38 | 1 | [LINC00498] | [LINC00498, SLC25A18, EMX2OS, FAM189A2, SLC7A1... | 1.000000 |
| 2 | 4.1 | cluster | g2_g27_APBB1IP_Micro_Ctss | 27 | 0.975610 | 1.000000 | 0.888889 | 844 | 0 | 3 | 24 | 2 | [ADAM28, PTPRC] | [ADAM28, PLCG2, INPP5D, PTPRC, CSF2RA, P2RY13,... | 1.000000 |
| 3 | 4.1 | cluster | g3_g18_GPNMB_OPC_Pdgfra | 18 | 0.862069 | 1.000000 | 0.555556 | 853 | 0 | 8 | 10 | 2 | [GPNMB, OLIG2] | [GPNMB, COL20A1, OLIG2, STK32A, KLRC3, KLRC2, ... | 1.000000 |
| 4 | 4.1 | cluster | g4_g9_MOG_Oligo_Opalin | 9 | 1.000000 | 1.000000 | 1.000000 | 862 | 0 | 0 | 9 | 1 | [ST18] | [ST18, MOBP, CNDP1, MOG, CD22, FOLH1, TF, CARN... | 1.000000 |
| 5 | 4.1 | cluster | i10_i16_TSPAN12_Vip_Mybpc1 | 16 | 0.803571 | 0.900000 | 0.562500 | 854 | 1 | 7 | 9 | 2 | [TSPAN12, CHRNB3] | [TSPAN12, TMC5, LINC01539, CHRNB3, FAM46A, ANG... | 0.783762 |
| 6 | 4.1 | cluster | i11_i6_EGF_Vip_Mybpc1 | 6 | 0.714286 | 1.000000 | 0.333333 | 865 | 0 | 4 | 2 | 2 | [EGF, FBRSL1] | [EGF, FZD8, KCNJ2_AS1, FBRSL1, TEKT1, NRG3_AS1... | 1.000000 |
| 7 | 4.1 | cluster | i1_i90_COL5A2_Ndnf_Car4 | 90 | 0.907821 | 0.970149 | 0.722222 | 779 | 2 | 25 | 65 | 2 | [COL5A2, BMP6] | [NMBR, COL5A2, C8ORF4, PAPSS2, TRPC3, BMP6, SS... | 0.642585 |
| 8 | 4.1 | cluster | i2_i77_LHX6_Sst_Cbln4 | 77 | 0.816619 | 0.838235 | 0.740260 | 783 | 11 | 20 | 57 | 1 | [LHX6] | [LHX6, FLT3, TAC1, CALB1, RSPO3, TRBC2, GRIK3,... | 1.000000 |
| 9 | 4.1 | cluster | i3_i56_BAGE2_Ndnf_Cxcl14 | 56 | 0.781250 | 0.961538 | 0.446429 | 814 | 1 | 31 | 25 | 2 | [BAGE2, SYT10] | [BAGE2, SCN5A, GREM2, FAM19A4, SYT10, ARHGAP18... | 0.602383 |
| 10 | 4.1 | cluster | i4_i54_MC4R_Ndnf_Cxcl14 | 54 | 0.857143 | 0.923077 | 0.666667 | 814 | 3 | 18 | 36 | 2 | [ARHGAP36, ADAM33] | [ARHGAP36, MC4R, COBLL1, HLA_B, LINC01435, ADA... | 0.710233 |
| 11 | 4.1 | cluster | i5_i47_TRPC3_Ndnf_Car4 | 47 | 0.906433 | 1.000000 | 0.659574 | 824 | 0 | 16 | 31 | 2 | [NTNG1, EYA4] | [SSTR2, KIRREL, TRPC3, NTNG1, TARID, EYA4, CA2... | 0.380471 |
| 12 | 4.1 | cluster | i6_i44_GPR149_Vip_Mybpc1 | 44 | 0.791667 | 1.000000 | 0.431818 | 827 | 0 | 25 | 19 | 2 | [FLT1, GPR149] | [FLT1, PLCE1_AS1, CXCL12, SLC22A3, PLCE1_AS2, ... | 0.811541 |
| 13 | 4.1 | cluster | i7_i31_CLMP_Ndnf_Cxcl14 | 31 | 0.900901 | 1.000000 | 0.645161 | 840 | 0 | 11 | 20 | 2 | [PAX6, TGFBR2] | [KIAA1644, FGF10, CLMP, PAX6, SP8, TGFBR2, WIF... | 0.547845 |
| 14 | 4.1 | cluster | i8_i27_SNCG_Vip_Mybpc1 | 27 | 0.759494 | 0.923077 | 0.444444 | 843 | 1 | 15 | 12 | 2 | [SNCG, EDNRA] | [SNCG, MMRN2, EDNRA, FBN3, KCNK2, RGS2, SCML4,... | 1.000000 |
| 15 | 4.1 | cluster | i9_i22_TAC3_Vip_Mybpc1 | 22 | 0.689655 | 0.888889 | 0.363636 | 848 | 1 | 14 | 8 | 2 | [BSPRY, MCTP2] | [BSPRY, OFD1P10Y, MCTP2, OFD1P8Y, OFD1P15Y, OF... | 1.000000 |
Plotting scanpy dot plot, violin plot, matrix plot for NS-Forest markers
Note: Assign pre-defined dendrogram order here or use adata.uns["dendrogram_" + cluster_header]["categories_ordered"].
[18]:
to_plot = results.copy()
[19]:
dendrogram = [] # custom dendrogram order
dendrogram = list(adata.uns["dendrogram_" + cluster_header]["categories_ordered"])
to_plot["clusterName"] = to_plot["clusterName"].astype("category")
to_plot["clusterName"] = to_plot["clusterName"].cat.set_categories(dendrogram)
to_plot = to_plot.sort_values("clusterName")
to_plot = to_plot.rename(columns = {"NSForest_markers": "markers"})
[20]:
markers_dict = dict(zip(to_plot["clusterName"], to_plot["markers"]))
markers_dict
[20]:
{'e1_e299_SLC17A7_L5b_Cdh13': ['LINC00507'],
'i2_i77_LHX6_Sst_Cbln4': ['LHX6'],
'g1_g48_GLI3_Astro_Gja1': ['LINC00498'],
'g3_g18_GPNMB_OPC_Pdgfra': ['GPNMB', 'OLIG2'],
'g2_g27_APBB1IP_Micro_Ctss': ['ADAM28', 'PTPRC'],
'g4_g9_MOG_Oligo_Opalin': ['ST18'],
'i7_i31_CLMP_Ndnf_Cxcl14': ['PAX6', 'TGFBR2'],
'i1_i90_COL5A2_Ndnf_Car4': ['COL5A2', 'BMP6'],
'i5_i47_TRPC3_Ndnf_Car4': ['NTNG1', 'EYA4'],
'i11_i6_EGF_Vip_Mybpc1': ['EGF', 'FBRSL1'],
'i3_i56_BAGE2_Ndnf_Cxcl14': ['BAGE2', 'SYT10'],
'i10_i16_TSPAN12_Vip_Mybpc1': ['TSPAN12', 'CHRNB3'],
'i4_i54_MC4R_Ndnf_Cxcl14': ['ARHGAP36', 'ADAM33'],
'i9_i22_TAC3_Vip_Mybpc1': ['BSPRY', 'MCTP2'],
'i6_i44_GPR149_Vip_Mybpc1': ['FLT1', 'GPR149'],
'i8_i27_SNCG_Vip_Mybpc1': ['SNCG', 'EDNRA']}
[21]:
ns.pl.dotplot(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
[22]:
ns.pl.stackedviolin(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
[23]:
ns.pl.matrixplot(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
Plotting classification metrics from NS-Forest results
[24]:
ns.pl.boxplot(results, ["f_score", "precision", "recall", "onTarget"], save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_boxplot_f_score_precision_recall_onTarget.html
Plotting individual classification metrics
[25]:
ns.pl.boxplot(results, "f_score", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_boxplot_f_score.html
Plotting metrics vs clusterSize
[26]:
ns.pl.scatter_w_clusterSize(results, "f_score", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_scatter_f_score.html
[27]:
ns.pl.scatter_w_clusterSize(results, "precision", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_scatter_precision.html
[28]:
ns.pl.scatter_w_clusterSize(results, "recall", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_scatter_recall.html
[29]:
ns.pl.scatter_w_clusterSize(results, "onTarget", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
../outputs_layer1/cluster_scatter_onTarget.html