library("dplyr")
library("Seurat")
library("knitr")
library("ggplot2")
library("BiocManager")
library("here")
#BiocManager::install("EnhancedVolcano")
library("EnhancedVolcano") #volcano plot
#install.packages('DESeq2') #for DEG
library("DESeq2")
library("tidyverse") #tidy up data
library("styler") # tidy up code formatting
library("scCustomize") # for color scales


if (!require("kableExtra")) {install.packages("kableExtra"); require("kableExtra")} # for table formatting
if (!require("RColorBrewer")) {install.packages("RColorBrewer"); require("RColorBrewer")} # for color brewer
if (!require("sctransform")) {install.packages("sctransform"); require("sctransform")} # for data normalization
if (!require("glmGamPoi")) {BiocManager::install('glmGamPoi'); require("glmGamPoi")} # for data normalization, sctransform
if (!require("cowplot")) {install.packages("cowplot"); require("cowplot")} # for figure layout
if (!require("patchwork")) {install.packages("patchwork"); require("patchwork")} # for figure patching
if (!require("openxlsx")) {install.packages("openxlsx"); require("openxlsx")} # to save .xlsx files

# install.packages("styler")



set.seed(12345)
# here()

1 Welcome

Welcome to the Single-Cell Omics Research and Education Club!

If this is your first time at the club, I want to extend an extra-special welcome to you!

I’m Jonathan Nelson, an Assistant Professor at the University of Southern California. I’m a wet scientist turned wet+dry scientist. I’ve been working with single-cell RNAseq data for the past 5 years and I’m excited to share what I’ve learned with you.

1.1 SCORE Values

1.1.1 Learning

We believe that bioinformatics is a constantly evolving field, and that ongoing learning and professional development are essential to staying up-to-date. We encourage members to share their knowledge and experiences with each other, and to seek out opportunities for continued learning.

1.1.2 Accessibility

We believe that access to bioinformatics support should be available to everyone. We strive to create a welcoming and inclusive environment where all members can feel comfortable asking for help and contributing to the group.

1.1.3 Collaboration

We believe that working together is key to achieving success in bioinformatics. We value the diversity of perspectives and backgrounds that each member brings, and we encourage open communication and the sharing of ideas.

1.1.4 Integrity

We believe in conducting ourselves with honesty and professionalism in all our interactions. We hold ourselves to high ethical standards and respect the privacy and confidentiality of all members.

1.1.5 Empathy

We believe in approaching each other with empathy and kindness. We understand that bioinformatics can be a challenging and sometimes frustrating field, and we strive to support each other through these difficulties.

1.2 Context and Expectations

I know a lot of this has been going on in the background for everyone, and I wanted to bring it to the forefront. My expectation is that we have about 6 of these meetings together, and then we can re-evaluate whether we want to continue as a group.

Email me if you would like me to add anyone:

Today’s code (this html file) will be posted to the SCORE website (https://usckrc.github.io/website/score.html)

2 The Agenda!

2.1 Music and Memes

2.2 Coding Crumbs: renv package

2.3 Recreating a Figure: Stacked Violin Plot

2.4 Main Theme: Accessing KPMP Data



3 Music and Memes

3.1 This Month’s Coding Music

3.1.2 Heartbreak Anniversary Hits

3.2 The Memes

3.2.1 Not a strong month for bioinformatics memes

3.2.2 But I hope you enjoy these!

3.2.3 Meme 1: Periodic table of “Can I lick it?”

3.2.4 Meme 2: What goes into arXiv



4 Coding Crumbs: renv package

4.1 It finally happened!

4.1.1 I got too tired of switching between my Seurat v4 and Seurat v5 computers

4.1.1.1 This was my solution for the Seurat v5

4.1.1.2 Screenshot from Score #1

4.2 What is renv?

https://rstudio.github.io/renv/articles/renv.html

4.2.1 A package for managing R project environments

4.2.2 Let’s be honest…we often get stuck in between projects as packages change

[Figure: renv diagram showing Project 1, Project 2, and Project 3, each paired with its own library (Library 1, Library 2, Library 3), alongside a shared Global Package Cache]

4.2.3 It allows you to create a snapshot of your R packages and versions

4.2.4 It allows you to restore your R packages and versions
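A minimal sketch of the snapshot and restore steps, using the standard `renv` functions. The `run_renv_demo` flag is a hypothetical guard added here so that nothing changes on your machine until you switch it on from inside the project you want to isolate. In RStudio, `renv::init()` may restart the R session, so in practice these calls are usually run one at a time from the console.

```r
run_renv_demo <- FALSE  # hypothetical guard: set to TRUE only inside the project you want to isolate

if (run_renv_demo) {
  # install.packages("renv")  # one-time, if renv is not installed yet
  renv::init()      # create a project-specific library and an renv.lock file
  renv::status()    # report whether the library and renv.lock are in sync
  renv::snapshot()  # record the package versions currently in use
  renv::restore()   # reinstall the versions recorded in renv.lock
}
```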

4.2.5 Great video from Albert Rapp on using renv

https://www.youtube.com/watch?v=Oen9xhEh8PY

4.2.6 It took me 5 minutes to set this up on my system

4.2.7 I started by using the renv package in a new github repo that I was working on

4.2.8 It’s this easy!

4.2.8.1 Open up project from the folder where you want to use renv

4.2.9 Now when I open the project from this location: Seurat v5

4.2.10 When I open project from any other folder: Seurat v4

4.3 Take Home

4.3.1 Implementing renv is a great way to manage your R packages and versions

4.3.2 It’s been seamless and helped me to move between Seurat v4 and Seurat v5

4.3.3 Important for making your code reproducible!

4.3.3.1 Add a renv.lock file to your git repo so that others can use the same packages and versions
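The lockfile is plain JSON, so it is easy to check which versions a project is pinned to. The sketch below writes a small mock lockfile to a temporary path (so it runs anywhere) and reads it back with `jsonlite`. The mock contains only a few of the fields a real `renv.lock` holds, and the versions are copied from the Session Info at the end of this document.

```r
library(jsonlite)

# Mock lockfile with the same top-level layout renv writes ("R" and "Packages").
# A real renv.lock contains more fields per package; this is for illustration only.
mock_lock <- list(
  R = list(Version = "4.4.3"),
  Packages = list(
    Seurat = list(Package = "Seurat", Version = "4.4.0", Source = "Repository"),
    dplyr  = list(Package = "dplyr",  Version = "1.1.4", Source = "Repository")
  )
)
lock_path <- tempfile(fileext = ".lock")
write_json(mock_lock, lock_path, auto_unbox = TRUE, pretty = TRUE)

# Read the lockfile and pull out the pinned version of every package
lock <- fromJSON(lock_path)
pinned <- vapply(lock$Packages, function(p) p$Version, character(1))
pinned[["Seurat"]]
```

In a real project you would point `fromJSON()` at the `renv.lock` in the project root instead of the temporary file.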



5 Recreating a Figure: Stacked Violin Plot

5.1 Final Product!

5.2 Publication Examples of Stacked Violin Plot

5.2.1 Figure 1 from Park et al. Science 2018

https://pubmed.ncbi.nlm.nih.gov/29622724/

5.2.2 Figure 1 from Karaiskos et al. JASN 2018

https://pubmed.ncbi.nlm.nih.gov/29794128/

5.3 Start with a Seurat Object

5.3.1 Data from Figure 1 from Burfeind et al. Phys Genomics 2025

https://pubmed.ncbi.nlm.nih.gov/39982410/

5.3.2 Load Dataset

SO <- readRDS(here("Week 4 CellChat", "data", "CACPR_integrated.rds"))

DimPlot(SO, group.by = "subclass.CACPR") + ggtitle("")

DefaultAssay(SO) <- "RNA"

SO@meta.data

5.3.3 Step 1: Create a list of genes

markers.to.plot1 <- c("Lrp2",         # PT
                      "Slc5a12",      # PT-S1
                      "Slc13a3",      # PT-S2
                      "Slc16a9",      # PT-S3
                      "Havcr1",       # Injured PT
                      "Epha7",        # dTL
                      "Slc12a1",      # TAL
                      "Cldn10",       # TAL
                      "Cldn16",       # TAL
                      "Nos1",         # MD
                      "Slc12a3",      # DCT
                      "Pvalb",        # DCT1
                      "Slc8a1",       # DCT2, CNT
                      "Aqp2",         # PC
                      "Slc4a1",       # IC-A
                      "Slc26a4",      # IC-B
                      "Upk1b",        # Uro
                      "Ncam1",        # PEC
                      "Pdgfrb",       # Perivascular
                      "Piezo2",       # Mesangial
                      "Pdgfra",       # Fib
                      "Acta2",        # Mural
                      "Nphs1",        # Podo
                      "Kdr",          # Capillary Endo
                      "Lyve1",        # Lymph
                      "Ptprc",        # Immune
                      "Cd74",         # Macrophage
                      "Skap1"         # B/T Cells 
                      )
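Before plotting, it can help to confirm that every gene in the list is actually present in the object, since misspelled or absent genes are otherwise easy to miss. This is a small sketch with stand-in vectors: `genes_in_object` is a hypothetical substitute for `rownames(SO)` and `markers` for `markers.to.plot1`.

```r
# Stand-in for rownames(SO): a hypothetical set of genes present in an object
genes_in_object <- c("Lrp2", "Slc12a1", "Aqp2", "Nphs1", "Ptprc")

# Stand-in for markers.to.plot1
markers <- c("Lrp2", "Slc12a1", "Aqp2", "Havcr1", "Nphs1")

# Genes requested but not found in the object
missing_markers <- setdiff(markers, genes_in_object)

# Genes that can be plotted, kept in the order of the marker list
markers_present <- intersect(markers, genes_in_object)

missing_markers
markers_present
```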

5.3.4 Step 1b: Multidimensional Dotplot code

DotPlot(SO,
        features = markers.to.plot1,
        dot.scale = 8,
        dot.min = 0,
        scale.max = 100,
        scale.min = 0,
        col.min = -2.5,
        col.max = 2.5,
        group.by = "class.CACPR") +
  coord_flip() +    # genes on the y-axis, cell groups on the x-axis
  theme_classic() +
  theme(axis.line = element_line(linewidth = 1, colour = "black"),  # `size` is deprecated for lines in ggplot2 >= 3.4
        axis.ticks.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        text = element_text(size = 20),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab(NULL) +
  ylab(NULL)

5.3.5 Step 2: Violin Plot of List in Features

VlnPlot(SO,
features = markers.to.plot1,
group.by = "subclass.CACPR")

5.3.5.1 Looks really bad!

5.3.6 Step 3: Change the stack = argument

5.3.6.1 ?VlnPlot

5.3.6.2 Gives you the modifiable arguments of the VlnPlot function

VlnPlot(SO,
features = markers.to.plot1,
group.by = "subclass.CACPR",
stack = TRUE)  # This is the key argument to change

5.4 Lessons for Me

5.4.1 Sometimes creating graphs is as simple as knowing the arguments in a function

5.4.2 Still need to work on coloring by subclass.CACPR rather than feature

5.4.2.0.1 Challenge for the next SCORE?
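One possible starting point for the coloring challenge: `VlnPlot()` has a `fill.by` argument that applies to stacked plots and accepts either "feature" or "ident". The sketch below uses the small `pbmc_small` demo object that ships with SeuratObject rather than the CACPR data, and simply takes the first five genes in that object, so treat it as an illustration rather than the final figure.

```r
library(Seurat)

data("pbmc_small")                           # small demo object shipped with SeuratObject
demo_genes <- head(rownames(pbmc_small), 5)  # arbitrary genes, for illustration only

p <- VlnPlot(pbmc_small,
             features = demo_genes,
             stack = TRUE,
             flip = TRUE,        # genes as rows, groups along the x-axis
             fill.by = "ident")  # color violins by group instead of by feature
p
```

With the object from this section, the same arguments would be added to the call that already uses `group.by = "subclass.CACPR"`.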

6 Main Theme: Accessing KPMP Data

6.1 The Kidney Precision Medicine Project

https://www.kpmp.org/

6.2 Manuscript Describing the Goals of KPMP

https://pubmed.ncbi.nlm.nih.gov/33637194/

6.2.1 Overarching Goals

6.2.2 Patient Population

6.2.3 Tissue Handling

6.2.4 Outcomes

6.3 Atlas 1.0: Lake et al. Nature 2023

https://pubmed.ncbi.nlm.nih.gov/37468583/

6.3.1 Lessons I learned from Atlas 1.0

6.3.1.1 Multi-layered meta.data

6.3.1.2 Cell states in meta.data

6.4 There are a lot of great outcomes of the KPMP

6.5 A well-curated atlas of human kidney tissues is one of them!

6.6 Accessing KPMP Datasets

https://www.kpmp.org/available-data

6.6.1 Office hours have been very helpful

6.6.2 KPMP is very responsive to data access problems

6.6.2.1 Overlapping datasets by Participant

6.6.2.2 Regret I didn’t set this up as an UpSet plot
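A rough sketch of the counting behind an UpSet plot, in base R. The participant IDs and dataset names below are placeholders, not real KPMP identifiers. The idea is to build a participant-by-dataset membership matrix and count how many participants fall into each combination.

```r
# Hypothetical participant IDs per data type (placeholders, not real KPMP IDs)
datasets <- list(
  snRNAseq = c("P1", "P2", "P3", "P4", "P5"),
  scRNAseq = c("P3", "P4", "P5", "P6"),
  spatial  = c("P5", "P6", "P7")
)

participants <- sort(unique(unlist(datasets)))

# Binary membership matrix: one row per participant, one column per data type
membership <- sapply(datasets, function(ids) as.integer(participants %in% ids))
rownames(membership) <- participants

# Label each participant with the combination of data types it appears in,
# then count participants per combination (the bars of an UpSet plot)
combo <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row == 1], collapse = " & ")
})
overlap_counts <- sort(table(combo), decreasing = TRUE)
overlap_counts

# If the UpSetR package is installed, the same list can be plotted with:
# UpSetR::upset(UpSetR::fromList(datasets))
```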


6.6.3 Single-nucleus RNA-seq data

6.6.3.1 This has a lot of the individual sample files

6.6.4 Shortcut to get to annotated object

6.7 snRNAseq object: WashU-UCSD_HuBMAP_KPMP-Biopsy_10X-R_12032021.h5Seurat

6.7.1 Code to load

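# SeuratDisk is not on CRAN; install it from GitHub (requires the remotes package)
if (!require("SeuratDisk")) {remotes::install_github("mojaveazure/seurat-disk"); require("SeuratDisk")}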

SO <- LoadH5Seurat(here("WashU-UCSD_HuBMAP_KPMP-Biopsy_10X-R_12032021.h5Seurat"))

6.7.2 Notes

6.7.2.1 This is Atlas 1.0 data

6.7.2.2 Newer files have more samples…but are a smaller file size because they are filtered more

6.7.2.3 When publishing…make sure to define the file you used

6.7.2.4 KPMP is consistently updating files with more samples

6.7.2.5 Goal is to have 1000 participants
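Because the files keep changing, one way to pin down exactly which file you used is to record its name, size, and checksum alongside your analysis. The sketch below uses a temporary stand-in file so it runs anywhere. In practice `data_file` would be the path to the downloaded `.h5Seurat` file, and the `provenance` table is a hypothetical structure, not a KPMP requirement.

```r
# Stand-in file so the sketch runs anywhere; replace with the real download path
data_file <- tempfile(fileext = ".h5Seurat")
writeLines("placeholder contents", data_file)

# One-row record describing the exact file used in the analysis
provenance <- data.frame(
  file_name   = basename(data_file),
  size_bytes  = file.size(data_file),
  md5         = unname(tools::md5sum(data_file)),
  recorded_on = as.character(Sys.Date())
)
provenance
```

Saving this table next to your results (for example with `write.csv()`) gives you the details to report in the Methods.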

6.7.3 Meta.data in the snRNASeq object

6.7.4 subclass.l2: Highest Resolution Cell Type/State

6.7.5 subclass.l1: Cell Type

6.7.6 class: Cell Group
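To see how the three annotation levels nest, it can help to count cells per combination. This sketch uses a mock metadata table with the three column names above. The labels are placeholders borrowed from the marker list earlier in this document, not the full KPMP vocabulary. With the real object, `SO@meta.data` would take the place of `meta`.

```r
library(dplyr)

# Mock cell-level metadata with the three annotation columns (placeholder labels)
meta <- data.frame(
  class       = c("epithelial", "epithelial", "epithelial", "epithelial", "epithelial"),
  subclass.l1 = c("PT", "PT", "PT", "DCT", "DCT"),
  subclass.l2 = c("PT-S1", "PT-S2", "PT-S1", "DCT1", "DCT1")
)

# Number of cells in each class > subclass.l1 > subclass.l2 combination
hierarchy <- meta %>%
  count(class, subclass.l1, subclass.l2, name = "n_cells")
hierarchy

# Number of distinct labels at each level (coarse to fine)
n_levels <- meta %>%
  summarise(across(everything(), n_distinct))
n_levels
```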

6.8 There is a lot that you can do with just this information

6.8.1 But what about the clinical meta.data?

6.8.2 Separate cells according to disease type

6.8.2.1 Clinical Meta.data online!

https://www.kpmp.org/available-data

6.8.2.2 Data portal

6.8.2.3 This is a .csv file

Participant ID
Tissue Source
Protocol
Sample Type
Enrollment Category
Primary Adjudicated Category
Sex
Age (Years) (Binned)
Race
KDIGO Stage
Baseline eGFR (ml/min/1.73m2) (Binned)
Proteinuria (mg) (Binned)
A1c (%) (Binned)
Albuminuria (mg) (Binned)
Diabetes History
Diabetes Duration (Years)
Hypertension History
Hypertension Duration (Years)
On RAAS Blockade

6.8.2.4 Adding Meta.data to Seurat Object

6.8.2.4.1 Old way (Xiao-Tong Su)

6.8.2.4.2 New Way (dplyr!)

6.8.2.5 Flexible enough to use with any object (as they are updated)

6.8.2.6 Meta.data in the object!
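A minimal sketch of the dplyr approach, using mock tables rather than the real object. The cell-level column `participant_id` is a hypothetical name (check which column in your object holds the participant identifier), and the clinical values are placeholders. The key detail is that `left_join()` drops row names, so the cell barcodes are moved into a column before the join and restored afterwards.

```r
library(dplyr)

# Mock cell-level metadata; row names stand in for cell barcodes
cell_meta <- data.frame(
  participant_id = c("P1", "P1", "P2", "P3"),   # hypothetical column name
  subclass.l1    = c("PT", "TAL", "PT", "DCT"),
  row.names      = c("cell_1", "cell_2", "cell_3", "cell_4")
)

# Mock clinical table using two of the column headers from the .csv (placeholder values)
clinical <- data.frame(
  `Participant ID`      = c("P1", "P2"),
  `Enrollment Category` = c("group_A", "group_B"),
  check.names = FALSE
)

# Join clinical data onto every cell, preserving barcodes and cell order
joined <- cell_meta %>%
  tibble::rownames_to_column("barcode") %>%
  left_join(clinical, by = c("participant_id" = "Participant ID")) %>%
  tibble::column_to_rownames("barcode")
joined

# With a real object, the joined table could then be attached with:
# SO <- AddMetaData(SO, metadata = joined)
```

Participants missing from the clinical table (here `P3`) end up with `NA`, which is worth checking before splitting by disease type.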

6.8.2.7 Application: Split data by disease type
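Once a clinical column is in the metadata, splitting is a single call. The sketch below uses the `pbmc_small` demo object and its `groups` column as a stand-in for a clinical column such as the enrollment category. With the KPMP object you would substitute your own column name.

```r
library(Seurat)

data("pbmc_small")  # small demo object shipped with SeuratObject

# Split into a named list of objects, one per value of the metadata column
by_group <- SplitObject(pbmc_small, split.by = "groups")
sapply(by_group, ncol)  # number of cells in each piece

# Or keep only the cells from a single group
g1_only <- subset(pbmc_small, subset = groups == "g1")
```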

6.9 Examples of Publishing with KPMP Data

6.9.1 So far I’ve published 4 articles using KPMP data

6.9.1.1 JASN 2024

6.9.1.1.1 Define DCT subtypes in human kidney

https://pubmed.ncbi.nlm.nih.gov/38238903/

6.9.1.2 JCI Insight 2025

6.9.1.2.1 Define TAL subtypes in human kidney

https://pubmed.ncbi.nlm.nih.gov/40471686/

6.9.1.3 Phys Genomics 2025

6.9.1.3.1 Define PT cell response to injury

https://pubmed.ncbi.nlm.nih.gov/39982410/

6.9.1.4 AJPR 2025

6.9.1.4.1 Define PT cell response to injury

https://pubmed.ncbi.nlm.nih.gov/40027701/

6.9.2 Publishing Tips

6.9.2.1 Must convert from mouse -> human (see SCORE #4)

6.9.2.2 Directions for citing KPMP Data in manuscripts

https://www.kpmp.org/help-docs/study-overview?tabname=citingkpmpdata

6.9.2.3 When in doubt email the KPMP team!

https://www.kpmp.org/collaboration

6.9.2.4 Add to Methods

6.9.2.5 Email Help

Nikki Bonevich is a great point of contact for KPMP data questions

email:

7 Closing Remarks

I hope that you found this session helpful.

7.1 Would anyone else like to share their experience with KPMP data?

7.2 Questions?

7.3 Community Questions

Do you have a coding problem that you’d like some support on?
Do you have a topic you’d like covered at a future meeting?

Email me:

7.4 Upcoming Schedule

7.5 Brainstorming Upcoming Topics

7.5.1 Potential Future Topics

7.5.1.1 Getting Started with Spatial Transcriptomics Data (Lu Zhang)

7.5.1.2 Marker selection for Spatial Transcriptomics (Jonathan Nelson)

7.5.1.3 Tissue Preparation for Spatial Transcriptomics (Jonathan Nelson)

7.5.1.4 Machine Learning in Single Cell Analysis (Michael Haidar)

7.5.1.5 Single Cell Analysis with Continuous Variables (Kelly Street)

7.5.1.6 Defining Cells with Gene sets (Jonathan Nelson)

7.5.1.7 Integrating Single-Cell and Single-Nucleus RNAseq Data (Katie Emberley)

8 Session Info

sessionInfo()
## R version 4.4.3 (2025-02-28 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] openxlsx_4.2.8              patchwork_1.3.0            
##  [3] cowplot_1.1.3               glmGamPoi_1.18.0           
##  [5] sctransform_0.4.1           RColorBrewer_1.1-3         
##  [7] kableExtra_1.4.0            scCustomize_3.0.1          
##  [9] styler_1.10.3               lubridate_1.9.4            
## [11] forcats_1.0.0               stringr_1.5.1              
## [13] purrr_1.0.4                 readr_2.1.5                
## [15] tidyr_1.3.1                 tibble_3.2.1               
## [17] tidyverse_2.0.0             DESeq2_1.46.0              
## [19] SummarizedExperiment_1.36.0 Biobase_2.66.0             
## [21] MatrixGenerics_1.18.1       matrixStats_1.5.0          
## [23] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
## [25] IRanges_2.40.1              S4Vectors_0.44.0           
## [27] BiocGenerics_0.52.0         EnhancedVolcano_1.24.0     
## [29] ggrepel_0.9.6               here_1.0.1                 
## [31] BiocManager_1.30.25         ggplot2_3.5.1              
## [33] knitr_1.50                  SeuratObject_5.0.2         
## [35] Seurat_4.4.0                dplyr_1.1.4                
## 
## loaded via a namespace (and not attached):
##   [1] RcppAnnoy_0.0.22        splines_4.4.3           later_1.4.1            
##   [4] R.oo_1.27.0             polyclip_1.10-7         janitor_2.2.1          
##   [7] lifecycle_1.0.4         rprojroot_2.0.4         globals_0.16.3         
##  [10] lattice_0.22-6          MASS_7.3-64             magrittr_2.0.3         
##  [13] plotly_4.10.4           sass_0.4.9              rmarkdown_2.29         
##  [16] jquerylib_0.1.4         yaml_2.3.10             httpuv_1.6.15          
##  [19] zip_2.3.2               spam_2.11-1             sp_2.2-0               
##  [22] spatstat.sparse_3.1-0   reticulate_1.41.0.1     pbapply_1.7-2          
##  [25] abind_1.4-8             zlibbioc_1.52.0         Rtsne_0.17             
##  [28] R.cache_0.16.0          R.utils_2.13.0          circlize_0.4.16        
##  [31] GenomeInfoDbData_1.2.13 irlba_2.3.5.1           listenv_0.9.1          
##  [34] spatstat.utils_3.1-3    goftest_1.2-3           spatstat.random_3.3-2  
##  [37] fitdistrplus_1.2-2      parallelly_1.42.0       svglite_2.1.3          
##  [40] leiden_0.4.3.1          codetools_0.2-20        DelayedArray_0.32.0    
##  [43] xml2_1.3.8              shape_1.4.6.1           tidyselect_1.2.1       
##  [46] UCSC.utils_1.2.0        farver_2.1.2            spatstat.explore_3.3-4 
##  [49] jsonlite_1.9.1          progressr_0.15.1        ggridges_0.5.6         
##  [52] survival_3.8-3          systemfonts_1.2.1       tools_4.4.3            
##  [55] ica_1.0-3               Rcpp_1.0.14             glue_1.8.0             
##  [58] gridExtra_2.3           SparseArray_1.6.2       xfun_0.51              
##  [61] withr_3.0.2             fastmap_1.2.0           digest_0.6.37          
##  [64] timechange_0.3.0        R6_2.6.1                mime_0.12              
##  [67] ggprism_1.0.6           colorspace_2.1-1        scattermore_1.2        
##  [70] tensor_1.5              spatstat.data_3.1-6     R.methodsS3_1.8.2      
##  [73] generics_0.1.3          data.table_1.17.0       httr_1.4.7             
##  [76] htmlwidgets_1.6.4       S4Arrays_1.6.0          uwot_0.2.3             
##  [79] pkgconfig_2.0.3         gtable_0.3.6            lmtest_0.9-40          
##  [82] XVector_0.46.0          htmltools_0.5.8.1       dotCall64_1.2          
##  [85] scales_1.3.0            png_0.1-8               snakecase_0.11.1       
##  [88] spatstat.univar_3.1-2   rstudioapi_0.17.1       tzdb_0.5.0             
##  [91] reshape2_1.4.4          nlme_3.1-167            GlobalOptions_0.1.2    
##  [94] cachem_1.1.0            zoo_1.8-14              KernSmooth_2.23-26     
##  [97] parallel_4.4.3          miniUI_0.1.1.1          vipor_0.4.7            
## [100] ggrastr_1.0.2           pillar_1.10.1           grid_4.4.3             
## [103] vctrs_0.6.5             RANN_2.6.2              promises_1.3.2         
## [106] xtable_1.8-4            cluster_2.1.8           paletteer_1.6.0        
## [109] beeswarm_0.4.0          evaluate_1.0.3          cli_3.6.4              
## [112] locfit_1.5-9.12         compiler_4.4.3          rlang_1.1.5            
## [115] crayon_1.5.3            future.apply_1.11.3     rematch2_2.1.2         
## [118] plyr_1.8.9              ggbeeswarm_0.7.2        stringi_1.8.4          
## [121] viridisLite_0.4.2       deldir_2.0-4            BiocParallel_1.40.0    
## [124] munsell_0.5.1           lazyeval_0.2.2          spatstat.geom_3.3-5    
## [127] Matrix_1.7-2            hms_1.1.3               future_1.34.0          
## [130] shiny_1.10.0            ROCR_1.0-11             igraph_2.1.4           
## [133] bslib_0.9.0