library("dplyr")
library("Seurat")
library("knitr")
library("ggplot2")
library("BiocManager")
library("here")
#BiocManager::install("EnhancedVolcano")
library("EnhancedVolcano") #volcano plot
#install.packages('DESeq2') #for DEG
library("DESeq2")
library("tidyverse") #tidy up data
library("styler") #tidy up data
library("scCustomize") #for color scales)

if (!require("kableExtra")) {install.packages("kableExtra"); require("kableExtra")} # for color brewer
if (!require("RColorBrewer")) {install.packages("RColorBrewer"); require("RColorBrewer")} # for color brewer
if (!require("sctransform")) {install.packages("sctransform"); require("sctransform")} # for data normalization
if (!require("glmGamPoi")) {BiocManager::install('glmGamPoi'); require("glmGamPoi")} # for data normalization, sctransform
if (!require("cowplot")) {install.packages("cowplot"); require("cowplot")} # for figure layout
if (!require("patchwork")) {install.packages("patchwork"); require("patchwork")} # for figure patching
if (!require("openxlsx")) {install.packages("openxlsx"); require("openxlsx")} # to save .xlsx files

# install.packages("styler")

set.seed(12345)
# here()

1 Welcome

Welcome to the Single-Cell Omics Research and Education Club!

If this is your time to the club, I want to extend and extra-special welcome to you!

I’m Jonathan Nelson, an Assistant Professor at the University of Southern California. I’m a wet scientist turned wet+dry scientist. I’ve been working with single-cell RNAseq data for the past 5 years and I’m excited to share what I’ve learned with you.

1.1 SCORE Values

1.1.1 Learning

We believe that bioinformatics is a constantly evolving field, and that ongoing learning and professional development is essential to staying up-to-date. We encourage members to share their knowledge and experiences with each other, and to seek out opportunities for continued learning.

1.1.2 Accessibility

We believe that access to bioinformatics support should be available to everyone. We strive to create a welcoming and inclusive environment where all members can feel comfortable asking for help and contributing to the group.

1.1.3 Collaboration

We believe that working together is key to achieving success in bioinformatics. We value the diversity of perspectives and backgrounds that each member brings, and we encourage open communication and the sharing of ideas.

1.1.4 Integrity

We believe in conducting ourselves with honesty and professionalism in all our interactions. We hold ourselves to high ethical standards and respect the privacy and confidentiality of all members.

1.1.5 Empathy

We believe in approaching each other with empathy and kindness. We understand that bioinformatics can be a challenging and sometimes frustrating field, and we strive to support each other through these difficulties.

1.2 Context and Expectations

I know a lot of this has been going on in the background for everyone and I wanted to bring it to the forefront. My expectation is that we have 4-6 more meetings about spatial transcriptomics…and then we’ll re-evaluate.

Email me you would like me to add anyone: j.nelson@med.usc.edu

Today’s code (this html file) will be posted to the SCORE website (https://usckrc.github.io/website/score.html)

2 The Agenda!

2.1 Music and Memes

2.2 Coding Crumbs: qs package

2.3 Recreating a Figure: ComplexHeatmap

2.4 Main Theme: Xenium

3 Music and Memes

3.1 This Months Coding Music

3.1.1 Manchester Orchestra: A Black Mile to the Surface

https://open.spotify.com/album/4hruYceqit29o6m4arpAql?si=8-2MnzK2SDyZLmsnF0Q1ig

3.1.2 The Silence Hits

3.2 The Memes

3.2.1 Meme 1: THE Academic Path

3.2.2 Meme 2: MEETless Mondays

4 Coding Crumbs: qs package

4.1 Saving .RDS files is a problem

4.2 But also increadibly important!

4.2.1 It preserves the meta.data/embeddings/annotations that you’ve analyzed

4.3 But it takes up A LOT of space!

4.3.1 Especially for single-cell RNAsequencing objects

4.3.2 I often have to pick and choose which RDS objects I keep on my computer for what I’m working on.

4.3.3 Case and Point: KPMP Object (Week 5 topic)

4.3.3.1 KPMP object as a .h5surat: 4GB

4.4 Enter the qs package

4.4.1 Special Thank You to Zhongwei Li and Jincan He for introducing this package to me

https://github.com/qsbase/qs

4.4.2 Features

4.4.3 Serialization Benchmarks

4.4.4 De-serialization Benchmarks

4.5 Essential Information

4.5.1 Install qs on your system

# CRAN version
install.packages("qs")

# CRAN version compile from source (recommended)
remotes::install_cran("qs", type = "source", configure.args = "--with-simd=AVX2")

4.6 Save Data

qsave(df1, "myfile.qs")

4.7 Load Data

df2 <- qread("myfile.qs")

4.8 It’s that EASY

4.8.1 .qs files are 2-3x less in size than .RDS files

5 Recreating a Figure: ComplexHeatmap

5.1 Credit to Sienna Blanche

5.2 I got this email from a collaborator asking for analysis of a few genesets

5.2.1 We started with a series of VlnPlots

5.2.1.1 Honestly, not that intuative to understand for someone who doesn’t look at scRNAseq data

5.2.1.2 We even tried helping by adding the mean

5.2.1.3 Code for adding mean to VlnPlot

  p <- VlnPlot(
    seurat_subset,
    features = gene,
    group.by = "subclass.l2",
    split.by = "Enrollment.Category",
    pt.size = 0.5 
  ) +
    stat_summary(
      fun = mean,
      geom = "point",
      color = "red",
      size = 5,
      position = position_dodge(width = 0.9)  # aligns dots with split violins
    ) +
    ggtitle(paste("Violin plot of", gene, "\n(Red dots = group means)")) +
    theme(axis.text.x = element_text(angle = 0, hjust = 0.5))
  
  print(p)

5.3 Enter ComplexHeatmap

https://github.com/jokergoo/ComplexHeatmap

5.3.1 Install ComplexHeatmap on your system

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")

5.3.2 Citation for ComplexHeatmap

5.4 Example 1: scRNAseq data

5.5 Example 2: 3D Heatmap

5.6 THE CODE

fxn_genes <- c("SLC9A3", "ATP1A1", "ATP1B1",  "SLC5A1", "SLC5A2", "CLDN2", "SLC4A4", "SLC34A1", "SLC34A3")
injury_genes <- c("HAVCR1", "HIF1A", "HMOX1",  "PCK1")
albumin_genes <- c("LRP2", "CUBN", "CCL5", "DAB2", "CLTA", "CLTB", "CLTC", "VIL1")
differentiated_fxn_genes <- c("CTSL", "LAMP1","ABCD3", "SLC15A2")

# define columns and rows
pt_order <- c(
"HR_PT-S1", "AKI_PT-S1", "CKD_PT-S1", 
"HR_PT-S2", "AKI_PT-S2", "CKD_PT-S2", 
"HR_PT-S3", "AKI_PT-S3", "CKD_PT-S3")

all_genes <- c(fxn_genes, injury_genes, albumin_genes, differentiated_fxn_genes) 

# Use Seurat to get category+subclass averages
expr_avg <- AverageExpression(
  KPMP,
  features = all_genes,
  group.by = c("Enrollment.Category", "subclass.l2")
)$RNA

# keep only PT-S1–S3 subclasses
expr_avg <- expr_avg[, grepl("PT-S[123]", colnames(expr_avg))]

# rename Enrollment.Category with line breaks
colnames(expr_avg) <- gsub("Healthy Reference", "HR", colnames(expr_avg))

# scale by row
expr_scaled <- t(scale(t(expr_avg)))

# functional block mapping  
block_name <- c(
  setNames(rep("Markers of Function", length(fxn_genes)), fxn_genes),
  setNames(rep("Markers of Injury", length(injury_genes)), injury_genes),
  setNames(rep("Genes Related to Albumin Uptake", length(albumin_genes)), albumin_genes),
  setNames(rep("Genes Related to Differentiated Function", length(differentiated_fxn_genes)), differentiated_fxn_genes)
)

 row_anno <- rowAnnotation(
  "Functional Block" = block_name[rownames(expr_scaled)],
  col = list("Functional Block" = c(
    "Markers of Function" = "#8B0000",
    "Markers of Injury" = "#4169E1",
    "Genes Related to Albumin Uptake" = "#006400",
    "Genes Related to Differentiated Function" = "#f8d568"
  )),
  show_annotation_name = FALSE,  # hides the "Functional Block" text at bottom
 annotation_legend_param = list(
    title_gp = gpar(fontsize = 18, fontface = "bold"),   
    labels_gp = gpar(fontsize = 14)                      
  )
)

 # Make labels for plotting
col_labels <- gsub("_PT-S[123]", "", colnames(expr_scaled))

#enrollment <- sapply(strsplit(colnames(expr_scaled), "_"), `[`, 1)

col_anno <- HeatmapAnnotation(
  Enrollment = anno_text(
    col_labels, 
    gp = gpar(fontsize = 14), 
    just = "center",
    rot = 0),
  annotation_height = unit(3, "mm")
)

# subclass info for column splitting
col_groups <- sapply(strsplit(colnames(expr_scaled), "_"), `[`, 2)

# drop subclass suffix from labels (after groups are made)
#colnames(expr_scaled) <- gsub("_PT-S[123]", "", colnames(expr_scaled))

# orignal heatmap
Heatmap(
  expr_scaled,
  name = "Expression",
  col = col_fun,
  cluster_rows = FALSE,
  cluster_columns = FALSE,
  show_column_names = FALSE,
  row_split = block_name[rownames(expr_scaled)],
  row_title = NULL,
  row_names_gp = gpar(fontsize = 14, fontface = "bold"),
  left_annotation = row_anno,
  top_annotation = col_anno,
  column_split = col_groups,   # split by PT-S1, PT-S2, PT-S3
  column_gap = unit(5, "mm"), 
  column_title_gp = gpar(fontsize = 18, fontface = "bold"), 
  heatmap_legend_param = list(
    title = "Scaled Expr",
    at = c(-2, 0, 2),
    labels = c("Low", "Mid", "High"),
    title_gp = gpar(fontsize = 18, fontface = "bold"),
    labels_gp = gpar(fontsize = 14)
  )
)

5.7 Final Product!

5.8 Lessons for Me

5.8.1 5.8.2 Control of row and column blocking is powerful for making figures

6 Main Theme: Xenium

6.1 What is Xenium?

6.2 Great Review by Lu Zhong at Week 6 SCORE on Spatial Transcriptomic Technologies

https://usckrc.github.io/website/SCORE_6__LZ_ST_ppt.pdf

6.3 Cost of a Xenium Experiment

6.3.1 Does NOT Informatics Time

6.3.2 Some “Back of the Envelope” Math

6.3.2.0.1 Vanilla Xenium Experiment

6.3.2.1 2x Xenium Slides: $15,000

6.3.2.2 8 samples/slide x2 slides + $1,000 Sample Prep/Labor/QC

6.3.2.3 Roughly $1,000 per sample

6.3.2.4 Could be less with Discounts from 10X Rep

6.4 My Xenium Experiment

6.4.1 I want to compare the effect of environmental pollutants on kidney transcriptomics

6.4.2 BIG THANK YOU to Stephanie Tring and Amanda Ye at USC Molecular Genomics Core

https://uscnorriscancer.usc.edu/molecular-genomics-core/

6.4.3 BIG THANK YOU to Vicky Yamamoto for embedding and cutting the sections

6.4.4 Rough Timeline of Experiment

6.4.5 Practice and Preparation

https://kb.10xgenomics.com/hc/en-us/articles/12811210821133-How-can-I-more-easily-visualize-the-fiducial-frame-on-Xenium-slides

6.4.5.1 Too close to Fiducials

https://kb.10xgenomics.com/hc/en-us/articles/18142712612493-What-should-I-do-if-I-cover-fiducials-on-the-Xenium-slide

6.4.6 Keys to success in Sample Preparation

6.4.6.1 1) RNAase free reagents when collecting tissue (PBS and PFA)

6.4.6.2 2) Use RNAase Away when cutting blocks on microtome

6.4.7 QC of Sample

6.4.7.1 H&E Section

6.4.7.2 RNA Scroll of Block

6.4.7.2.1 DV200 is 88% (minimum is 30%)

6.4.7.3 DAPI Stain

6.4.8 You get slide after being on Xenium Instrument

6.4.9 Xenium Outputs

6.4.10 `analysis_summary.html`

6.4.10.1 `Key Metrics`

6.4.10.2 Note the number of cells!

6.4.10.3 Context

6.4.10.4 `Decoding`

6.4.10.5 `Cell Segmentation`

6.4.10.6 `Analysis`

6.4.11 Xenium Explorer

6.4.11.1 Download

https://www.10xgenomics.com/support/software/xenium-explorer/downloads

6.4.11.2 Tutorial

https://www.10xgenomics.com/support/software/xenium-explorer/latest/tutorials

6.4.11.3 LIVE DEMO with Xenium Explorer

6.5 Working with Xenium Data Seurat

https://satijalab.org/seurat/articles/seurat5_spatial_vignette_2

6.5.1 Loading the Xenium Data into Seurat

6.5.1.1 NEED to install the `arrow` package

if (!require("arrow")) {install.packages("arrow"); require("arrow")}

6.5.1.2 Can take 5 minutes

xenium.obj <- LoadXenium(here("output-XETG00402__0054800__Region_1__20250822__221946"), fov = "fov")

6.5.2 Initial QC Check

xenium.obj <- subset(xenium.obj, subset = nCount_Xenium > 0)

VlnPlot(xenium.obj, features = c("nFeature_Xenium", "nCount_Xenium"), ncol = 2, pt.size = 0)

6.5.3 Basic Processing

xenium.obj <- SCTransform(xenium.obj, assay = "Xenium")
xenium.obj <- RunPCA(xenium.obj, npcs = 30, features = rownames(xenium.obj))
xenium.obj <- RunUMAP(xenium.obj, dims = 1:30)
xenium.obj <- FindNeighbors(xenium.obj, reduction = "pca", dims = 1:30)
xenium.obj <- FindClusters(xenium.obj, resolution = 0.3)

DimPlot(xenium.obj)

6.5.4 Seurat -> Xenium Explorer

6.5.5 The Question: Are High nCount Cells Doublets?

6.5.6 How do I visualize these cells of Seurat on Xenium Explorer?

6.5.7 The Code

high_count_cells <- xenium.obj@meta.data %>%
  filter(nCount_Xenium > 2500) %>%
  rownames()

df_export <- data.frame(
  cell_id = high_count_cells,
  group   = "Higher_than_2500_counts"
) %>%
  tibble::rownames_to_column("row") %>%
  select(-row)

# Save as CSV
write.csv(df_export, "cellid_higher_than_2500_counts.csv", row.names = FALSE)

6.5.8 Xenium Explorer -> Seurat

6.5.9 The Question: How do the annotations in Xenium Explorer Look in Seurat?

6.5.10 How do I visualize these cells from Xenium Explorer in Seurat?

cells <- read_csv("Selection_1_cells_stats.csv", skip = 2) %>%
  as.data.frame() %>%                   # dplyr/tibble prefers explicit cols
  column_to_rownames(var = colnames(.)[1])   # use first column as rownames

xenium.obj <- AddMetaData(xenium.obj, cells)

head(xenium.obj@meta.data)

6.5.11 Added as `meta.data`

6.5.12 Annotating cells from a Xenium experiment

6.6 TOPIC FOR FUTURE SCORE!

6.6.1 Things I’m Struggling With

6.6.1.1 QC for multiplets

6.6.1.2 Size of Seurat Objects from Xenium Data

6.7 Questions?

6.8 Community Questions

Do you have a coding problem that you’d like some support on?
Do you have a topic you’d like covered at a future meeting?

Email me: j.nelson@med.usc.edu

6.9 Upcoming Schedule

6.10 Brainstorming Upcoming Topics

6.10.1 Potential Future Topics on Xenium Analysis

6.10.1.1 QC and Annotating Xenium Datasets

6.10.1.2 Integration of Xenium and snRNAseq Datasets

6.10.1.3 Spatially Informed Clustering

6.10.1.4 Comparing Xenium and snRNAseq DEG Analysis

7 Session Info

sessionInfo()

## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] openxlsx_4.2.8              patchwork_1.3.2            
##  [3] cowplot_1.2.0               glmGamPoi_1.20.0           
##  [5] sctransform_0.4.2           RColorBrewer_1.1-3         
##  [7] kableExtra_1.4.0            scCustomize_3.2.0          
##  [9] styler_1.10.3               lubridate_1.9.4            
## [11] forcats_1.0.0               stringr_1.5.1              
## [13] purrr_1.1.0                 readr_2.1.5                
## [15] tidyr_1.3.1                 tibble_3.3.0               
## [17] tidyverse_2.0.0             DESeq2_1.48.2              
## [19] SummarizedExperiment_1.38.1 Biobase_2.68.0             
## [21] MatrixGenerics_1.20.0       matrixStats_1.5.0          
## [23] GenomicRanges_1.60.0        GenomeInfoDb_1.44.2        
## [25] IRanges_2.42.0              S4Vectors_0.46.0           
## [27] BiocGenerics_0.54.0         generics_0.1.4             
## [29] EnhancedVolcano_1.26.0      ggrepel_0.9.6              
## [31] here_1.0.1                  BiocManager_1.30.26        
## [33] ggplot2_3.5.2               knitr_1.50                 
## [35] Seurat_5.3.0                SeuratObject_5.2.0         
## [37] sp_2.2-0                    dplyr_1.1.4                
## 
## loaded via a namespace (and not attached):
##   [1] RcppAnnoy_0.0.22        splines_4.5.1           later_1.4.4            
##   [4] R.oo_1.27.1             polyclip_1.10-7         janitor_2.2.1          
##   [7] fastDummies_1.7.5       lifecycle_1.0.4         rprojroot_2.1.1        
##  [10] globals_0.18.0          lattice_0.22-7          MASS_7.3-65            
##  [13] magrittr_2.0.3          plotly_4.11.0           sass_0.4.10            
##  [16] rmarkdown_2.29          jquerylib_0.1.4         yaml_2.3.10            
##  [19] httpuv_1.6.16           zip_2.3.3               spam_2.11-1            
##  [22] spatstat.sparse_3.1-0   reticulate_1.43.0       pbapply_1.7-4          
##  [25] abind_1.4-8             Rtsne_0.17              R.cache_0.17.0         
##  [28] R.utils_2.13.0          circlize_0.4.16         GenomeInfoDbData_1.2.14
##  [31] irlba_2.3.5.1           listenv_0.9.1           spatstat.utils_3.1-5   
##  [34] goftest_1.2-3           RSpectra_0.16-2         spatstat.random_3.4-1  
##  [37] fitdistrplus_1.2-4      parallelly_1.45.1       svglite_2.2.1          
##  [40] codetools_0.2-20        DelayedArray_0.34.1     xml2_1.4.0             
##  [43] shape_1.4.6.1           tidyselect_1.2.1        UCSC.utils_1.4.0       
##  [46] farver_2.1.2            spatstat.explore_3.5-2  jsonlite_2.0.0         
##  [49] progressr_0.15.1        ggridges_0.5.7          survival_3.8-3         
##  [52] systemfonts_1.2.3       tools_4.5.1             ica_1.0-3              
##  [55] Rcpp_1.1.0              glue_1.8.0              gridExtra_2.3          
##  [58] SparseArray_1.8.1       xfun_0.53               withr_3.0.2            
##  [61] fastmap_1.2.0           digest_0.6.37           timechange_0.3.0       
##  [64] R6_2.6.1                mime_0.13               textshaping_1.0.3      
##  [67] ggprism_1.0.7           colorspace_2.1-1        scattermore_1.2        
##  [70] tensor_1.5.1            spatstat.data_3.1-8     R.methodsS3_1.8.2      
##  [73] data.table_1.17.8       httr_1.4.7              htmlwidgets_1.6.4      
##  [76] S4Arrays_1.8.1          uwot_0.2.3              pkgconfig_2.0.3        
##  [79] gtable_0.3.6            lmtest_0.9-40           XVector_0.48.0         
##  [82] htmltools_0.5.8.1       dotCall64_1.2           scales_1.4.0           
##  [85] png_0.1-8               snakecase_0.11.1        spatstat.univar_3.1-4  
##  [88] rstudioapi_0.17.1       tzdb_0.5.0              reshape2_1.4.4         
##  [91] nlme_3.1-168            GlobalOptions_0.1.2     cachem_1.1.0           
##  [94] zoo_1.8-14              KernSmooth_2.23-26      parallel_4.5.1         
##  [97] miniUI_0.1.2            vipor_0.4.7             ggrastr_1.0.2          
## [100] pillar_1.11.0           grid_4.5.1              vctrs_0.6.5            
## [103] RANN_2.6.2              promises_1.3.3          beachmat_2.24.0        
## [106] xtable_1.8-4            cluster_2.1.8.1         paletteer_1.6.0        
## [109] beeswarm_0.4.0          evaluate_1.0.5          cli_3.6.5              
## [112] locfit_1.5-9.12         compiler_4.5.1          rlang_1.1.6            
## [115] crayon_1.5.3            future.apply_1.20.0     rematch2_2.1.2         
## [118] plyr_1.8.9              ggbeeswarm_0.7.2        stringi_1.8.7          
## [121] viridisLite_0.4.2       deldir_2.0-4            BiocParallel_1.42.2    
## [124] lazyeval_0.2.2          spatstat.geom_3.5-0     Matrix_1.7-3           
## [127] RcppHNSW_0.6.0          hms_1.1.3               future_1.67.0          
## [130] shiny_1.11.1            ROCR_1.0-11             igraph_2.1.4           
## [133] bslib_0.9.0

Single-Cell Omics Research and Education

SCORE #7: Starting Out With Xenium Data

Jonathan Nelson

29 September 2025