Tutorial nsforesting

This tutorial shows how to run NS-Forest to generate the marker combinations that best classify clusters.

Setting up environment

[1]:
import sys
import os
# Path to a local clone of the NSForest repository; change this to match your machine.
code_folder = "C:/Users/bpeng/OneDrive - J. Craig Venter Institute/Documents/Github/NSForest"
sys.path.insert(0, os.path.abspath(code_folder))
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "notebook"
import nsforest as ns
from nsforest import utils

Data Exploration

Loading h5ad AnnData file

[2]:
data_folder = "../demo_data/"
file = data_folder + "adata_layer1.h5ad"
adata = sc.read_h5ad(file)
adata
[2]:
AnnData object with n_obs × n_vars = 871 × 16497
    obs: 'cluster'

Defining cluster_header as cell type annotation.

Note: Some datasets have multiple annotations per sample (e.g. “broad_cell_type” and “granular_cell_type”). NS-Forest can be run once for each such cluster_header. Combining the parent and child markers may improve classification results.

[3]:
cluster_header = "cluster"

Defining output_folder for saving results

[4]:
output_folder = "../outputs_layer1/"

Looking at sample labels

[5]:
adata.obs_names
[5]:
Index(['A01_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
       'A01_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
       'A02_BCH1_1NeuNP_H200.1025_MTG_layer_1',
       'A03_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
       'A04_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
       'A04_BCH1_1NeuNP_H200.1025_MTG_layer_1',
       'A04_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
       'A05_1_Nuclei_NeuNP_H200_1025_MTG_layer1_BCH9',
       'A05_BCH1_1NeuNP_H200.1025_MTG_layer_1',
       'A05_BCH3_1NeuNP_H200.1030_MTG_Layer_1',
       ...
       'P09_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH7',
       'P09_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
       'P09_1_Nuclei_NeuNN_H200_1030_MTG_layer1_BCH8',
       'P09_BCH1_1NeuNN_H200.1025_MTG_layer_1',
       'P10_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH6',
       'P10_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
       'P10_BCH1_1NeuNN_H200.1025_MTG_layer_1',
       'P11_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH7',
       'P11_1_Nuclei_NeuNN_H200_1025_MTG_layer1_BCH9',
       'P11_1_Nuclei_NeuNN_H200_1030_MTG_layer1_BCH8'],
      dtype='object', length=871)

Looking at genes

Note: adata.var_names must be unique. If they are not, assigning adata.var.index = adata.var["ensembl_id"] usually solves the problem, provided the dataset has such a column.

[6]:
adata.var_names
[6]:
Index(['A1CF', 'A2M', 'A2M_AS1', 'A2ML1', 'A2ML1_AS1', 'A2MP1', 'A3GALT2',
       'A4GALT', 'AAAS', 'AACS',
       ...
       'ZUFSP', 'ZW10', 'ZWILCH', 'ZWINT', 'ZXDC', 'ZYG11A', 'ZYG11B', 'ZYX',
       'ZZEF1', 'ZZZ3'],
      dtype='object', length=16497)
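
The uniqueness requirement in the note above can be checked before preprocessing. The following is a minimal sketch on a toy table standing in for adata.var. The gene symbols and IDs are hypothetical, and the "ensembl_id" column is assumed to exist as in the note.

```python
import pandas as pd

# Toy stand-in for adata.var: gene symbols as the index, one symbol duplicated.
var = pd.DataFrame(
    {"ensembl_id": ["ENSG_A", "ENSG_B", "ENSG_C"]},  # hypothetical IDs
    index=["GENE1", "GENE1", "GENE2"],
)

# List the symbols that occur more than once.
duplicated_symbols = var.index[var.index.duplicated()].unique().tolist()
print("Unique before:", var.index.is_unique, "| duplicated:", duplicated_symbols)

# Switch to the ID column only if that column is itself unique.
if not var.index.is_unique and var["ensembl_id"].is_unique:
    var.index = var["ensembl_id"]

print("Unique after:", var.index.is_unique)
```

On a real AnnData object, adata.var_names_make_unique() is an alternative that keeps the symbols and appends a suffix to the duplicates.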

Checking cell annotation sizes

Note: Some datasets are too large to run through the pipeline and must be downsampled first. When downsampling, make sure every granular cluster annotation remains represented.

[7]:
pd.DataFrame(adata.obs[cluster_header].value_counts()).reset_index()
[7]:
    cluster                      count
0   e1_e299_SLC17A7_L5b_Cdh13    299
1   i1_i90_COL5A2_Ndnf_Car4      90
2   i2_i77_LHX6_Sst_Cbln4        77
3   i3_i56_BAGE2_Ndnf_Cxcl14     56
4   i4_i54_MC4R_Ndnf_Cxcl14      54
5   g1_g48_GLI3_Astro_Gja1       48
6   i5_i47_TRPC3_Ndnf_Car4       47
7   i6_i44_GPR149_Vip_Mybpc1     44
8   i7_i31_CLMP_Ndnf_Cxcl14      31
9   g2_g27_APBB1IP_Micro_Ctss    27
10  i8_i27_SNCG_Vip_Mybpc1       27
11  i9_i22_TAC3_Vip_Mybpc1       22
12  g3_g18_GPNMB_OPC_Pdgfra      18
13  i10_i16_TSPAN12_Vip_Mybpc1   16
14  g4_g9_MOG_Oligo_Opalin       9
15  i11_i6_EGF_Vip_Mybpc1        6
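
For datasets that do need downsampling, one way to keep every cluster represented is to cap the number of cells per cluster instead of sampling cells globally. The sketch below is an assumption about how this could be done with pandas, not part of NS-Forest. The helper name downsample_per_cluster and the toy obs table are hypothetical.

```python
import pandas as pd

def downsample_per_cluster(obs, cluster_header, max_cells, seed=0):
    """Return cell names, keeping at most max_cells per cluster.

    Clusters smaller than max_cells are kept whole, so no cluster is lost.
    """
    kept = []
    for _, group in obs.groupby(cluster_header, observed=True):
        n = min(len(group), max_cells)
        kept.extend(group.sample(n=n, random_state=seed).index)
    return kept

# Toy stand-in for adata.obs with one large and two small clusters.
obs = pd.DataFrame(
    {"cluster": ["a"] * 50 + ["b"] * 6 + ["c"] * 2},
    index=[f"cell_{i}" for i in range(58)],
)

keep = downsample_per_cluster(obs, "cluster", max_cells=10)
counts = obs.loc[keep, "cluster"].value_counts()
print(counts)
```

With a real AnnData object the returned names could then be used to subset, for example adata = adata[keep].copy().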

Preprocessing

Generating scanpy dendrogram

Note: Run this step only if there is no pre-defined dendrogram order. Running it anyway does no harm, but it may increase the runtime.

The dendrogram order is stored in adata.uns["dendrogram_cluster"]["categories_ordered"].

[8]:
ns.pp.dendrogram(adata, cluster_header, save = True, output_folder = output_folder, outputfilename_suffix = cluster_header)
Saving dendrogram as...
../outputs_layer1/_cluster.png
[Figure: dendrogram of the clusters]
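
The choice between a pre-defined order and the stored scanpy order can be wrapped in a small helper. This is a hedged sketch: get_cluster_order is a hypothetical name, and a plain dictionary stands in for adata.uns so that the example runs on its own.

```python
def get_cluster_order(uns, cluster_header, custom_order=None):
    """Return a cluster order: the custom one if given, else the stored one, else None."""
    if custom_order:
        return list(custom_order)
    key = "dendrogram_" + cluster_header
    if key in uns and "categories_ordered" in uns[key]:
        return list(uns[key]["categories_ordered"])
    return None  # nothing stored yet: the dendrogram step still has to be run

# Plain dict standing in for adata.uns (cluster names are illustrative).
uns = {"dendrogram_cluster": {"categories_ordered": ["c2", "c1", "c3"]}}

stored = get_cluster_order(uns, "cluster")
custom = get_cluster_order(uns, "cluster", custom_order=["c1", "c2", "c3"])
missing = get_cluster_order({}, "cluster")
print(stored, custom, missing)
```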

Calculating cluster medians per gene

Run ns.pp.prep_medians before running NS-Forest.

Note: Skip this step if you are only evaluating marker lists or generating scanpy plots (e.g. dot plot, violin plot, matrix plot).

[9]:
adata = ns.pp.prep_medians(adata, cluster_header)
adata.varm["medians_cluster"]
Calculating medians per cluster: 100%|██████████| 16/16 [00:01<00:00, 10.19it/s]
Saving medians as adata.varm.medians_cluster
median: 0.0
mean: 1.626
std: 2.49
Only positive genes selected. 11688 positive genes out of 16497 total genes
--- 2.008080244064331 seconds ---
[9]:
e1_e299_SLC17A7_L5b_Cdh13 g1_g48_GLI3_Astro_Gja1 g2_g27_APBB1IP_Micro_Ctss g3_g18_GPNMB_OPC_Pdgfra g4_g9_MOG_Oligo_Opalin i10_i16_TSPAN12_Vip_Mybpc1 i11_i6_EGF_Vip_Mybpc1 i1_i90_COL5A2_Ndnf_Car4 i2_i77_LHX6_Sst_Cbln4 i3_i56_BAGE2_Ndnf_Cxcl14 i4_i54_MC4R_Ndnf_Cxcl14 i5_i47_TRPC3_Ndnf_Car4 i6_i44_GPR149_Vip_Mybpc1 i7_i31_CLMP_Ndnf_Cxcl14 i8_i27_SNCG_Vip_Mybpc1 i9_i22_TAC3_Vip_Mybpc1
A2M 0.000000 1.584962 8.985842 1.000000 0.000000 0.000000 1.792481 0.000000 0.000000 1.000000 0.000000 1.000000 1.292481 0.000000 1.000000 0.500000
A2M_AS1 0.000000 0.000000 3.169925 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
A2ML1_AS1 4.392317 7.832668 1.584962 4.253898 0.000000 6.400854 3.683161 4.522197 4.754888 2.403677 4.321928 5.392317 4.459432 5.209454 5.727921 5.096147
A2MP1 0.000000 0.000000 0.000000 0.000000 1.584962 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 0.000000
AAAS 0.000000 1.292481 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ZYG11A 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZYG11B 4.857981 1.000000 1.000000 0.000000 2.584963 3.836213 6.346526 5.491853 3.906891 3.064641 3.903677 5.781360 5.584649 6.475733 5.614710 5.931125
ZYX 3.169925 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZZEF1 4.392317 1.584962 2.321928 0.000000 1.000000 4.215226 5.734567 3.390680 1.584962 1.584962 2.321928 3.169925 3.229716 4.392317 3.584963 2.660964
ZZZ3 6.303781 3.000000 1.584962 3.435182 8.189824 5.972722 5.985592 5.712608 4.906890 6.374096 5.997354 6.321928 6.169925 6.507795 5.426265 6.492921

11688 rows × 16 columns
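
To make the shape of this table concrete, the sketch below computes a genes × clusters median table on a toy expression matrix with pandas. It illustrates the idea only and is not the package's implementation. The filter on "positive" genes is an assumption about what the log line means, namely genes with a positive median in at least one cluster.

```python
import pandas as pd

# Toy expression matrix: rows are cells, columns are genes (illustrative values).
expr = pd.DataFrame(
    {
        "geneA": [0.0, 0.0, 0.0, 2.0, 4.0, 3.0],
        "geneB": [5.0, 7.0, 6.0, 0.0, 0.0, 0.0],
        "geneC": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # expressed in a single cell only
    }
)
clusters = pd.Series(["c1"] * 3 + ["c2"] * 3, name="cluster")

# Median per cluster, transposed so that genes are rows and clusters are columns.
medians = expr.groupby(clusters).median().T

# Assumed filter: keep genes whose median is positive in at least one cluster.
positive = medians[(medians > 0).any(axis=1)]
print(positive)
```

Here geneC is dropped because its median is zero in both clusters, even though one cell expresses it.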

Calculating binary scores per gene per cluster

Run ns.pp.prep_binary_scores before running NS-Forest. This step is not needed if you are only evaluating marker lists or generating scanpy plots.

[10]:
adata = ns.pp.prep_binary_scores(adata, cluster_header)
adata.varm["binary_scores_cluster"]
Calculating binary scores per cluster: 100%|██████████| 16/16 [01:13<00:00,  4.62s/it]
Saving binary scores as adata.varm.binary_scores_cluster
median: 0.1
mean: 0.202
std: 0.252
--- 74.26272201538086 seconds ---
[10]:
e1_e299_SLC17A7_L5b_Cdh13 g1_g48_GLI3_Astro_Gja1 g2_g27_APBB1IP_Micro_Ctss g3_g18_GPNMB_OPC_Pdgfra g4_g9_MOG_Oligo_Opalin i10_i16_TSPAN12_Vip_Mybpc1 i11_i6_EGF_Vip_Mybpc1 i1_i90_COL5A2_Ndnf_Car4 i2_i77_LHX6_Sst_Cbln4 i3_i56_BAGE2_Ndnf_Cxcl14 i4_i54_MC4R_Ndnf_Cxcl14 i5_i47_TRPC3_Ndnf_Car4 i6_i44_GPR149_Vip_Mybpc1 i7_i31_CLMP_Ndnf_Cxcl14 i8_i27_SNCG_Vip_Mybpc1 i9_i22_TAC3_Vip_Mybpc1
A2M 0.000000 0.623023 0.931968 0.500000 0.000000 0.000000 0.658949 0.000000 0.000000 0.500000 0.000000 0.500000 0.567888 0.000000 0.500000 0.466667
A2M_AS1 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
A2ML1_AS1 0.153393 0.470567 0.066667 0.146435 0.000000 0.352137 0.127804 0.163316 0.184686 0.089374 0.149377 0.247584 0.158108 0.228193 0.283856 0.216961
A2MP1 0.000000 0.000000 0.000000 0.000000 0.915876 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.866667 0.866667 0.000000
AAAS 0.000000 0.948420 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.933333 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ZYG11A 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZYG11B 0.268527 0.066667 0.066667 0.000000 0.148420 0.200397 0.381241 0.306785 0.204062 0.166928 0.203846 0.328997 0.312765 0.393586 0.315017 0.342573
ZYX 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ZZEF1 0.401457 0.091271 0.168100 0.000000 0.066667 0.381913 0.541554 0.284062 0.091271 0.091271 0.168100 0.258675 0.264994 0.401457 0.308410 0.206141
ZZZ3 0.157010 0.031445 0.000000 0.044353 0.347222 0.131380 0.132101 0.119149 0.091036 0.163913 0.132888 0.158664 0.145953 0.178503 0.107846 0.176774

11688 rows × 16 columns
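
The displayed values are consistent with the following reading of the binary score: for a target cluster, take 1 minus the ratio of each other cluster's median to the target median, clip negative values to zero, and average over the other clusters. The sketch below is inferred from the two tables above rather than taken from the package's code. It is checked against the A2MP1 row of the medians table.

```python
import numpy as np

def binary_scores(medians):
    """Binary score per cluster for one gene, given that gene's cluster medians."""
    medians = np.asarray(medians, dtype=float)
    n = medians.size
    if n < 2:
        raise ValueError("need at least two clusters")
    scores = np.zeros(n)
    for t, target in enumerate(medians):
        if target <= 0:
            continue  # gene not expressed in the target cluster: score stays 0
        ratios = np.clip(1.0 - medians / target, 0.0, None)
        # Average over the other clusters only (the target itself is excluded).
        scores[t] = np.delete(ratios, t).sum() / (n - 1)
    return scores

# A2MP1 medians in the column order of the table above (16 clusters).
a2mp1 = [0, 0, 0, 0, 1.584962, 0, 0, 0, 0, 0, 0, 0, 0, 1.0, 1.0, 0]
scores = binary_scores(a2mp1)
print(np.round(scores, 6))
```

A score of 1 means the gene is expressed only in the target cluster, as for A2M_AS1 and ZYX in the table.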

Plotting median and binary score distributions

[11]:
ns.pp.plot_varm(adata, f"medians_{cluster_header}", nonzero = True, save = True, output_folder = output_folder)
Saving adata.varm[medians_cluster] as histogram as...
../outputs_layer1/histogram_medians_cluster.png
[Figure: histogram of nonzero cluster medians]
[12]:
ns.pp.plot_varm(adata, f"medians_{cluster_header}", scale = "log", save = True, output_folder = output_folder)
Saving adata.varm[medians_cluster] as histogram as...
../outputs_layer1/histogram_medians_cluster.png
[Figure: histogram of cluster medians, log scale]
[13]:
ns.pp.plot_varm(adata, f"binary_scores_{cluster_header}", nonzero = True, save = True, output_folder = output_folder)
Saving adata.varm[binary_scores_cluster] as histogram as...
../outputs_layer1/histogram_binary_scores_cluster.png
[Figure: histogram of nonzero binary scores]
[14]:
ns.pp.plot_varm(adata, f"binary_scores_{cluster_header}", scale = "log", save = True, output_folder = output_folder)
Saving adata.varm[binary_scores_cluster] as histogram as...
../outputs_layer1/histogram_binary_scores_cluster.png
[Figure: histogram of binary scores, log scale]

Saving preprocessed AnnData as new h5ad

[15]:
filename = file.replace(".h5ad", "_preprocessed.h5ad")
print(f"Saving new anndata object as...\n{filename}")
adata.write_h5ad(filename)
adata
Saving new anndata object as...
../demo_data/adata_layer1_preprocessed.h5ad
[15]:
AnnData object with n_obs × n_vars = 871 × 11688
    obs: 'cluster'
    uns: 'pca', 'dendrogram_cluster'
    obsm: 'X_pca'
    varm: 'PCs', 'medians_cluster', 'binary_scores_cluster'

Running NS-Forest

Note: Skip this step if you are only evaluating input marker lists.

[16]:
outputfilename_prefix = cluster_header
results = ns.nsforesting.NSForest(adata, cluster_header, save_supplementary = True, save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Running NS-Forest version 4.1

Preparing adata...
Pre-selecting genes based on binary scores...
        BinaryFirst_high Threshold (mean + 2 * std): 0.706
        Average number of genes after gene_selection in each cluster: 735.5
Saving number of genes selected per cluster as...
../outputs_layer1/cluster_gene_selection.csv
--- 0.05267214775085449 seconds ---

Number of clusters to evaluate: 16
1 out of 16:
        e1_e299_SLC17A7_L5b_Cdh13
        Pre-selected 1356 genes to feed into Random Forest.
        NSForest-selected markers: ['LINC00507']
        fbeta: 0.96
        precision: 0.978
        recall: 0.893
2 out of 16:
        g1_g48_GLI3_Astro_Gja1
        Pre-selected 583 genes to feed into Random Forest.
        NSForest-selected markers: ['LINC00498']
        fbeta: 0.95
        precision: 1.0
        recall: 0.792
3 out of 16:
        g2_g27_APBB1IP_Micro_Ctss
        Pre-selected 420 genes to feed into Random Forest.
        NSForest-selected markers: ['ADAM28', 'PTPRC']
        fbeta: 0.976
        precision: 1.0
        recall: 0.889
4 out of 16:
        g3_g18_GPNMB_OPC_Pdgfra
        Pre-selected 353 genes to feed into Random Forest.
        NSForest-selected markers: ['GPNMB', 'OLIG2']
        fbeta: 0.862
        precision: 1.0
        recall: 0.556
5 out of 16:
        g4_g9_MOG_Oligo_Opalin
        Pre-selected 571 genes to feed into Random Forest.
        NSForest-selected markers: ['ST18']
        fbeta: 1.0
        precision: 1.0
        recall: 1.0
6 out of 16:
        i10_i16_TSPAN12_Vip_Mybpc1
        Pre-selected 1007 genes to feed into Random Forest.
        NSForest-selected markers: ['TSPAN12', 'CHRNB3']
        fbeta: 0.804
        precision: 0.9
        recall: 0.562
7 out of 16:
        i11_i6_EGF_Vip_Mybpc1
        Pre-selected 1912 genes to feed into Random Forest.
        NSForest-selected markers: ['EGF', 'FBRSL1']
        fbeta: 0.714
        precision: 1.0
        recall: 0.333
8 out of 16:
        i1_i90_COL5A2_Ndnf_Car4
        Pre-selected 238 genes to feed into Random Forest.
        NSForest-selected markers: ['COL5A2', 'BMP6']
        fbeta: 0.908
        precision: 0.97
        recall: 0.722
9 out of 16:
        i2_i77_LHX6_Sst_Cbln4
        Pre-selected 292 genes to feed into Random Forest.
        NSForest-selected markers: ['LHX6']
        fbeta: 0.817
        precision: 0.838
        recall: 0.74
10 out of 16:
        i3_i56_BAGE2_Ndnf_Cxcl14
        Pre-selected 151 genes to feed into Random Forest.
        NSForest-selected markers: ['BAGE2', 'SYT10']
        fbeta: 0.781
        precision: 0.962
        recall: 0.446
11 out of 16:
        i4_i54_MC4R_Ndnf_Cxcl14
        Pre-selected 223 genes to feed into Random Forest.
        NSForest-selected markers: ['ARHGAP36', 'ADAM33']
        fbeta: 0.857
        precision: 0.923
        recall: 0.667
12 out of 16:
        i5_i47_TRPC3_Ndnf_Car4
        Pre-selected 942 genes to feed into Random Forest.
        NSForest-selected markers: ['NTNG1', 'EYA4']
        fbeta: 0.906
        precision: 1.0
        recall: 0.66
13 out of 16:
        i6_i44_GPR149_Vip_Mybpc1
        Pre-selected 377 genes to feed into Random Forest.
        NSForest-selected markers: ['FLT1', 'GPR149']
        fbeta: 0.792
        precision: 1.0
        recall: 0.432
14 out of 16:
        i7_i31_CLMP_Ndnf_Cxcl14
        Pre-selected 1012 genes to feed into Random Forest.
        NSForest-selected markers: ['PAX6', 'TGFBR2']
        fbeta: 0.901
        precision: 1.0
        recall: 0.645
15 out of 16:
        i8_i27_SNCG_Vip_Mybpc1
        Pre-selected 1326 genes to feed into Random Forest.
        NSForest-selected markers: ['SNCG', 'EDNRA']
        fbeta: 0.759
        precision: 0.923
        recall: 0.444
16 out of 16:
        i9_i22_TAC3_Vip_Mybpc1
        Pre-selected 1005 genes to feed into Random Forest.
        NSForest-selected markers: ['BSPRY', 'MCTP2']
        fbeta: 0.69
        precision: 0.889
        recall: 0.364
--- 87.0947813987732 seconds ---

Saving supplementary table as...
../outputs_layer1/cluster_supplementary.csv
Saving markers table as...
../outputs_layer1/cluster_markers.csv
using median
Calculating medians per cluster: 100%|██████████| 16/16 [00:00<00:00, 366.96it/s]
Saving supplementary table as...
../outputs_layer1/cluster_markers_onTarget_supp.csv
Saving supplementary table as...
../outputs_layer1/cluster_markers_onTarget.csv

Saving final results table as...
../outputs_layer1/cluster_results.csv
Saving final results table as...
../outputs_layer1/cluster_results.pkl
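
The BinaryFirst_high threshold printed in the log can be reproduced from the summary statistics reported by the binary score step (mean 0.202, std 0.252). The sketch below does this arithmetic and then applies the threshold to the first five binary scores shown earlier for the g2_g27_APBB1IP_Micro_Ctss cluster. Whether the package compares with >= or > is an assumption here.

```python
import pandas as pd

# Rounded summary statistics as printed by the binary score step.
mean_score, std_score = 0.202, 0.252
threshold = mean_score + 2 * std_score
print(round(threshold, 3))

# Binary scores of five genes for the g2_g27_APBB1IP_Micro_Ctss cluster (from the table).
scores = pd.Series(
    {"A2M": 0.931968, "A2M_AS1": 1.0, "A2ML1_AS1": 0.066667, "A2MP1": 0.0, "AAAS": 0.0}
)

# Assumed comparison: keep genes at or above the threshold.
selected = scores[scores >= threshold].index.tolist()
print(selected)
```

Only genes that pass this pre-selection are fed into the random forest for that cluster, which is why the number of pre-selected genes differs between clusters in the log.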

[17]:
results
[17]:
software_version cluster_header clusterName clusterSize f_score precision recall TN FP FN TP marker_count NSForest_markers binary_genes onTarget
0 4.1 cluster e1_e299_SLC17A7_L5b_Cdh13 299 0.959741 0.978022 0.892977 566 6 32 267 1 [LINC00507] [SLC17A7, LINC00508, TBR1, ANKRD33B, NPTX1, LI... 0.792614
1 4.1 cluster g1_g48_GLI3_Astro_Gja1 48 0.950000 1.000000 0.791667 823 0 10 38 1 [LINC00498] [LINC00498, SLC25A18, EMX2OS, FAM189A2, SLC7A1... 1.000000
2 4.1 cluster g2_g27_APBB1IP_Micro_Ctss 27 0.975610 1.000000 0.888889 844 0 3 24 2 [ADAM28, PTPRC] [ADAM28, PLCG2, INPP5D, PTPRC, CSF2RA, P2RY13,... 1.000000
3 4.1 cluster g3_g18_GPNMB_OPC_Pdgfra 18 0.862069 1.000000 0.555556 853 0 8 10 2 [GPNMB, OLIG2] [GPNMB, COL20A1, OLIG2, STK32A, KLRC3, KLRC2, ... 1.000000
4 4.1 cluster g4_g9_MOG_Oligo_Opalin 9 1.000000 1.000000 1.000000 862 0 0 9 1 [ST18] [ST18, MOBP, CNDP1, MOG, CD22, FOLH1, TF, CARN... 1.000000
5 4.1 cluster i10_i16_TSPAN12_Vip_Mybpc1 16 0.803571 0.900000 0.562500 854 1 7 9 2 [TSPAN12, CHRNB3] [TSPAN12, TMC5, LINC01539, CHRNB3, FAM46A, ANG... 0.783762
6 4.1 cluster i11_i6_EGF_Vip_Mybpc1 6 0.714286 1.000000 0.333333 865 0 4 2 2 [EGF, FBRSL1] [EGF, FZD8, KCNJ2_AS1, FBRSL1, TEKT1, NRG3_AS1... 1.000000
7 4.1 cluster i1_i90_COL5A2_Ndnf_Car4 90 0.907821 0.970149 0.722222 779 2 25 65 2 [COL5A2, BMP6] [NMBR, COL5A2, C8ORF4, PAPSS2, TRPC3, BMP6, SS... 0.642585
8 4.1 cluster i2_i77_LHX6_Sst_Cbln4 77 0.816619 0.838235 0.740260 783 11 20 57 1 [LHX6] [LHX6, FLT3, TAC1, CALB1, RSPO3, TRBC2, GRIK3,... 1.000000
9 4.1 cluster i3_i56_BAGE2_Ndnf_Cxcl14 56 0.781250 0.961538 0.446429 814 1 31 25 2 [BAGE2, SYT10] [BAGE2, SCN5A, GREM2, FAM19A4, SYT10, ARHGAP18... 0.602383
10 4.1 cluster i4_i54_MC4R_Ndnf_Cxcl14 54 0.857143 0.923077 0.666667 814 3 18 36 2 [ARHGAP36, ADAM33] [ARHGAP36, MC4R, COBLL1, HLA_B, LINC01435, ADA... 0.710233
11 4.1 cluster i5_i47_TRPC3_Ndnf_Car4 47 0.906433 1.000000 0.659574 824 0 16 31 2 [NTNG1, EYA4] [SSTR2, KIRREL, TRPC3, NTNG1, TARID, EYA4, CA2... 0.380471
12 4.1 cluster i6_i44_GPR149_Vip_Mybpc1 44 0.791667 1.000000 0.431818 827 0 25 19 2 [FLT1, GPR149] [FLT1, PLCE1_AS1, CXCL12, SLC22A3, PLCE1_AS2, ... 0.811541
13 4.1 cluster i7_i31_CLMP_Ndnf_Cxcl14 31 0.900901 1.000000 0.645161 840 0 11 20 2 [PAX6, TGFBR2] [KIAA1644, FGF10, CLMP, PAX6, SP8, TGFBR2, WIF... 0.547845
14 4.1 cluster i8_i27_SNCG_Vip_Mybpc1 27 0.759494 0.923077 0.444444 843 1 15 12 2 [SNCG, EDNRA] [SNCG, MMRN2, EDNRA, FBN3, KCNK2, RGS2, SCML4,... 1.000000
15 4.1 cluster i9_i22_TAC3_Vip_Mybpc1 22 0.689655 0.888889 0.363636 848 1 14 8 2 [BSPRY, MCTP2] [BSPRY, OFD1P10Y, MCTP2, OFD1P8Y, OFD1P15Y, OF... 1.000000
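
The f_score, precision and recall columns can be recomputed from the TP, FP and FN counts in the same row. The values in the table are consistent with an F-beta score with beta = 0.5, which weights precision more heavily than recall. That value of beta is inferred from the numbers here, so treat the sketch as a consistency check and not as the package's code.

```python
def classification_metrics(tp, fp, fn, beta=0.5):
    """Return (f_beta, precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta**2 * precision + recall
    f_beta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return f_beta, precision, recall

# Row 0 of the results table: e1_e299_SLC17A7_L5b_Cdh13 (TP=267, FP=6, FN=32).
f0, p0, r0 = classification_metrics(tp=267, fp=6, fn=32)
print(round(f0, 6), round(p0, 6), round(r0, 6))

# Row 3 of the results table: g3_g18_GPNMB_OPC_Pdgfra (TP=10, FP=0, FN=8).
f3, p3, r3 = classification_metrics(tp=10, fp=0, fn=8)
print(round(f3, 6), round(p3, 6), round(r3, 6))
```

Both rows agree with the table to the displayed precision. Row 3 shows the effect of beta = 0.5: recall is only 0.556, but perfect precision keeps the f_score at 0.862.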

Plotting scanpy dot plot, violin plot, matrix plot for NS-Forest markers

Note: Assign a pre-defined dendrogram order here, or use the order stored in adata.uns["dendrogram_" + cluster_header]["categories_ordered"].

[18]:
to_plot = results.copy()
[19]:
dendrogram = [] # custom dendrogram order (fill in to use a pre-defined order)
dendrogram = list(adata.uns["dendrogram_" + cluster_header]["categories_ordered"]) # scanpy order; overrides the line above
to_plot["clusterName"] = to_plot["clusterName"].astype("category")
to_plot["clusterName"] = to_plot["clusterName"].cat.set_categories(dendrogram)
to_plot = to_plot.sort_values("clusterName")
to_plot = to_plot.rename(columns = {"NSForest_markers": "markers"})
[20]:
markers_dict = dict(zip(to_plot["clusterName"], to_plot["markers"]))
markers_dict
[20]:
{'e1_e299_SLC17A7_L5b_Cdh13': ['LINC00507'],
 'i2_i77_LHX6_Sst_Cbln4': ['LHX6'],
 'g1_g48_GLI3_Astro_Gja1': ['LINC00498'],
 'g3_g18_GPNMB_OPC_Pdgfra': ['GPNMB', 'OLIG2'],
 'g2_g27_APBB1IP_Micro_Ctss': ['ADAM28', 'PTPRC'],
 'g4_g9_MOG_Oligo_Opalin': ['ST18'],
 'i7_i31_CLMP_Ndnf_Cxcl14': ['PAX6', 'TGFBR2'],
 'i1_i90_COL5A2_Ndnf_Car4': ['COL5A2', 'BMP6'],
 'i5_i47_TRPC3_Ndnf_Car4': ['NTNG1', 'EYA4'],
 'i11_i6_EGF_Vip_Mybpc1': ['EGF', 'FBRSL1'],
 'i3_i56_BAGE2_Ndnf_Cxcl14': ['BAGE2', 'SYT10'],
 'i10_i16_TSPAN12_Vip_Mybpc1': ['TSPAN12', 'CHRNB3'],
 'i4_i54_MC4R_Ndnf_Cxcl14': ['ARHGAP36', 'ADAM33'],
 'i9_i22_TAC3_Vip_Mybpc1': ['BSPRY', 'MCTP2'],
 'i6_i44_GPR149_Vip_Mybpc1': ['FLT1', 'GPR149'],
 'i8_i27_SNCG_Vip_Mybpc1': ['SNCG', 'EDNRA']}
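
The reordering in cells [19] and [20] relies on pandas sorting a categorical column by the order of its categories instead of alphabetically. A minimal sketch on a toy table (cluster and gene names are illustrative) shows the effect:

```python
import pandas as pd

# Toy results table in alphabetical cluster order.
to_plot = pd.DataFrame(
    {
        "clusterName": ["a", "b", "c"],
        "markers": [["G1"], ["G2", "G3"], ["G4"]],
    }
)
dendrogram = ["c", "a", "b"]  # desired plotting order

# Make the column categorical with the desired category order, then sort by it.
to_plot["clusterName"] = to_plot["clusterName"].astype("category")
to_plot["clusterName"] = to_plot["clusterName"].cat.set_categories(dendrogram)
to_plot = to_plot.sort_values("clusterName")

# Dictionaries preserve insertion order, so the keys follow the dendrogram order.
markers_dict = dict(zip(to_plot["clusterName"], to_plot["markers"]))
print(markers_dict)
```

Any cluster name missing from the dendrogram list would become NaN after set_categories, so the list should cover every cluster in the results.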
[21]:
ns.pl.dotplot(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
[Figure: dot plot of NS-Forest markers by cluster]
[22]:
ns.pl.stackedviolin(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
[Figure: stacked violin plot of NS-Forest markers by cluster]
[23]:
ns.pl.matrixplot(adata, markers_dict, cluster_header, dendrogram = dendrogram, save = True, output_folder = output_folder, outputfilename_suffix = outputfilename_prefix)
[Figure: matrix plot of NS-Forest markers by cluster]

Plotting classification metrics from NS-Forest results

[24]:
ns.pl.boxplot(results, ["f_score", "precision", "recall", "onTarget"], save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_boxplot_f_score_precision_recall_onTarget.html

Plotting individual classification metrics

[25]:
ns.pl.boxplot(results, "f_score", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_boxplot_f_score.html

Plotting metrics vs clusterSize

[26]:
ns.pl.scatter_w_clusterSize(results, "f_score", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_scatter_f_score.html
[27]:
ns.pl.scatter_w_clusterSize(results, "precision", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_scatter_precision.html
[28]:
ns.pl.scatter_w_clusterSize(results, "recall", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_scatter_recall.html
[29]:
ns.pl.scatter_w_clusterSize(results, "onTarget", save = True, output_folder = output_folder, outputfilename_prefix = outputfilename_prefix)
Saving...
 ../outputs_layer1/cluster_scatter_onTarget.html