scLAB Documentation
Getting Started
This guide will help you get up and running with single-cell RNA sequencing (scRNA-seq) analysis. scLAB provides a complete workflow for analyzing your single-cell data without writing any code.
Loading Data
scLAB supports two types of scRNA-seq data files:
1. Cell Ranger Output h5 Files
Load raw count data from Cell Ranger output. You can also load multiple h5 files (one per sample), and scLAB will concatenate them into a single dataset.
2. Processed h5ad Files
Load already-processed AnnData files (h5ad format) for visualization and secondary analysis.
Standard Analysis Modules
Following Scanpy's standard workflow, scLAB supports the following analysis steps:
1. Quality Control (QC)
Quality control is divided into basic and advanced steps:
- Basic quality control: Filter cells and genes using two thresholds:
- Minimum genes per cell: Remove cells with fewer detected genes than this threshold (default: 200)
- Minimum cells per gene: Remove genes detected in fewer cells than this threshold (default: 3)
- Advanced quality control: Calculate standard quality metrics:
- n_genes_by_counts: Number of genes with at least one count in each cell
- total_counts: Total counts (UMIs) per cell
- (Optional) pct_counts_mt: Percentage of mitochondrial gene counts per cell
- (Optional) pct_counts_rb: Percentage of ribosomal gene counts per cell
- (Optional) pct_counts_hb: Percentage of hemoglobin gene counts per cell
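The NumPy sketch below illustrates what these thresholds and metrics compute on a dense cells x genes count matrix. It mirrors the ideas behind Scanpy's `filter_cells`, `filter_genes` and `calculate_qc_metrics` but is not scLAB's implementation. Applying the cell filter before the gene filter, the `pct_counts_rb`/`pct_counts_hb` names, and the gene-symbol prefixes (human symbols) are assumptions; the function names are hypothetical.

```python
import numpy as np


def basic_qc(counts, min_genes=200, min_cells=3):
    """Filter a dense cells x genes count matrix; returns (filtered, cell_mask, gene_mask)."""
    # Keep cells with a nonzero count for at least `min_genes` genes.
    keep_cells = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[keep_cells]
    # Then keep genes with a nonzero count in at least `min_cells` remaining cells.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    return counts[:, keep_genes], keep_cells, keep_genes


def pct_counts(counts, gene_names, prefixes):
    """Percentage of each cell's total counts from genes starting with `prefixes`."""
    mask = np.array([name.startswith(prefixes) for name in gene_names])
    return 100 * counts[:, mask].sum(axis=1) / counts.sum(axis=1)


def qc_metrics(counts, gene_names):
    """Per-cell metrics; the prefixes assume human gene symbols (illustrative only)."""
    return {
        "n_genes_by_counts": (counts > 0).sum(axis=1),
        "total_counts": counts.sum(axis=1),
        "pct_counts_mt": pct_counts(counts, gene_names, ("MT-",)),
        "pct_counts_rb": pct_counts(counts, gene_names, ("RPS", "RPL")),
        "pct_counts_hb": pct_counts(counts, gene_names, ("HBA", "HBB")),
    }
```

Mouse data would need different prefixes (for example "mt-" for mitochondrial genes).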
2. Normalization
Normalize gene expression to account for differences in sequencing depth between cells. scLAB uses standard log-normalization: each cell's counts are scaled to a common total (default: 10,000 counts per cell) and then log-transformed. Optionally, cells can instead be scaled to the median total count per cell.
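A minimal NumPy sketch of log-normalization, assuming a dense cells x genes count matrix. In Scanpy the same steps correspond to `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`, and passing `target_sum=None` to `normalize_total` scales to the median total count. The function name is hypothetical and this is not scLAB's code.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply natural log(1 + x).

    With target_sum=None, every cell is scaled to the median total count instead.
    """
    totals = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(totals)
    return np.log1p(counts / totals * target_sum)
```

A cell with zero total counts would cause a division by zero; the minimum-genes filter in quality control normally removes such cells beforehand.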
3. Feature Selection
Identify the highly variable genes that drive biological variation in your dataset. Default: the top 2,000 highly variable genes. Optionally, highly variable genes can be identified in a way that accounts for batch effects.
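The sketch below is a deliberately simplified ranking, not Scanpy's or scLAB's exact method: Scanpy's default `highly_variable_genes` flavor normalizes dispersions within bins of mean expression, which this omits. The batch-aware branch follows the idea Scanpy uses with `batch_key` (rank genes by the number of batches in which they are selected), under the assumption that scLAB's batch option works similarly. The function name is hypothetical.

```python
import numpy as np


def top_variable_genes(log_expr, n_top=2000, batches=None):
    """Indices of the n_top most variable genes (simplified dispersion ranking)."""

    def dispersion(x):
        # Dispersion = variance / mean per gene; all-zero genes get 0, not NaN.
        mean = x.mean(axis=0)
        var = x.var(axis=0, ddof=1)
        return np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)

    n_top = min(n_top, log_expr.shape[1])
    if batches is None:
        return np.argsort(-dispersion(log_expr), kind="stable")[:n_top]

    batches = np.asarray(batches)
    # One row of dispersions per batch.
    disps = np.array([dispersion(log_expr[batches == b]) for b in np.unique(batches)])
    # Rank of each gene within its batch (0 = most variable).
    ranks = np.argsort(np.argsort(-disps, axis=1, kind="stable"), axis=1, kind="stable")
    n_batches_selected = (ranks < n_top).sum(axis=0)
    # Sort by batch count, then by mean dispersion (lexsort: last key is primary).
    order = np.lexsort((-disps.mean(axis=0), -n_batches_selected))
    return order[:n_top]
```

Ranking by batch count first favors genes that are variable in many batches over genes that are extremely variable in only one, which is the point of a batch-aware selection.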
4. Regression
Regress out unwanted sources of variation from your dataset (e.g. technical effects such as total counts or mitochondrial percentage).
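Regression here means fitting, for every gene, a linear model of expression on per-cell covariates and keeping the residuals. Below is a minimal ordinary-least-squares sketch in NumPy, assuming dense inputs; Scanpy's `sc.pp.regress_out(adata, keys=[...])` performs this kind of per-gene linear regression. Which covariates scLAB regresses by default is not specified here, and the function name is hypothetical.

```python
import numpy as np


def regress_out(expr, covariates):
    """Remove the linear effect of per-cell covariates from every gene.

    expr: cells x genes (e.g. log-normalized values);
    covariates: cells x k, or a 1-D array for a single covariate
    (e.g. total_counts, pct_counts_mt). Returns the OLS residuals.
    """
    design = np.column_stack([np.ones(len(expr)), covariates])  # add an intercept
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)        # one fit per gene
    return expr - design @ coef
```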
5. Principal Component Analysis (PCA)
Perform linear dimensionality reduction using PCA. Default: 50 components. Optionally scale the data first (recommended for PCA); if the data are not scaled, PCA runs on the normalized data.
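A sketch of PCA via singular value decomposition, assuming a dense cells x genes matrix. Scaling here means z-scoring each gene (mean 0, unit variance), similar in spirit to `sc.pp.scale`. Scanpy's `sc.pp.pca` may use a different solver, so component signs and small numerical details can differ; the function name is hypothetical.

```python
import numpy as np


def pca(expr, n_comps=50, scale=True):
    """Return PCA cell scores (cells x n_comps) of a dense cells x genes matrix."""
    x = expr - expr.mean(axis=0)            # PCA always centers each gene
    if scale:
        std = expr.std(axis=0)
        std[std == 0] = 1                   # leave constant genes unscaled
        x = x / std                         # z-score: unit variance per gene
    n_comps = min(n_comps, *x.shape)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]     # projections onto the top components
```

Without scaling, highly expressed genes with large variance dominate the first components, which is why scaling is usually recommended.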
6. Non-linear Dimensionality Reduction (UMAP/tSNE)
Run UMAP or tSNE to create a 2D visualization of your cells based on gene expression similarity. Both methods assume that PCA has already been performed.
- UMAP parameters:
- Number of neighbors: size of local neighborhoods used to learn manifold structure. It controls the balance between local and global structure. Default: 15.
- Minimum distance: controls how tightly points are packed together in the low-dimensional space. Default: 0.1.
- tSNE parameters:
- Perplexity: controls the balance between local and global structure. Default: 30.
- Early exaggeration: helps tSNE find better global structure. Default: 12.
- Learning rate: controls how far points move during each step of optimization. Default: 1000.
7. Clustering
Perform clustering using the Leiden algorithm, with the resolution adjustable between 0.1 and 2 (higher values yield more clusters). To view the clustering results on an embedding, run UMAP or tSNE first.
Advanced Analysis Modules
Cell-Level Tasks
Available tasks:
- Label clusters: give custom names to the clusters identified by the clustering algorithm. scLAB automatically detects columns whose names start with "leiden_" and lets you assign custom names to their clusters. If your column names are different, you can rename the columns in the Dataset Manager module. You can choose any name for the new column that holds the custom names; however, we recommend starting it with "leiden_" so that scLAB recognizes the column in other tasks such as differential gene expression analysis.
- Subset by cell metadata: subset your dataset by the columns in the .obs of your AnnData object. scLAB automatically detects the data type of each .obs column and provides an appropriate way to subset it. For example, if your dataset contains 3 samples, you can subset by selecting the desired samples; for numerical columns, scLAB provides a threshold instead. Subsetted datasets are saved in memory and can be exported via the Export Datasets module.
- Subset by gene expression: subset your dataset using expression thresholds for one or more genes. Subsetted datasets are saved in memory and can be exported via the Export Datasets module.
- Annotate cells: similar to subsetting by gene expression, this subtask annotates cells using gene expression thresholds by adding a boolean (True/False) column to .obs that indicates whether each cell meets the chosen criteria. For example, you can use it to mark cells that are positive for gene X. This does not subset your dataset; it only adds the column to .obs.
Gene-Level Tasks
Available tasks:
- Differential gene expression: perform differential gene expression analysis. scLAB automatically detects columns starting with "leiden_" and makes them available for differential expression analysis. You can rename your columns in the Dataset Manager module. Available parameters:
- Method: Wilcoxon rank-sum (default), t-test, or t-test with overestimated variance
- Matrix used: raw counts (default) or normalized values
- Co-expression analysis: perform correlation analysis between two genes using the Spearman or Pearson method.
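As an illustration of the statistics involved, the sketch below uses SciPy: a Wilcoxon rank-sum (Mann-Whitney U) test of one group against all other cells for each gene, and Spearman or Pearson correlation for co-expression. It returns raw, uncorrected p-values; Scanpy's `sc.tl.rank_genes_groups(adata, groupby=..., method="wilcoxon")` additionally reports scores, log fold changes and Benjamini-Hochberg adjusted p-values. The function names are hypothetical and this is not scLAB's implementation.

```python
import numpy as np
from scipy import stats


def wilcoxon_one_vs_rest(expr, groups, target):
    """Two-sided rank-sum p-value per gene: cells in `target` vs all other cells."""
    groups = np.asarray(groups)
    in_group = expr[groups == target]
    rest = expr[groups != target]
    return np.array([
        stats.mannwhitneyu(in_group[:, j], rest[:, j], alternative="two-sided").pvalue
        for j in range(expr.shape[1])
    ])


def coexpression(x, y, method="spearman"):
    """Correlation coefficient and p-value between two genes across cells."""
    tests = {"spearman": stats.spearmanr, "pearson": stats.pearsonr}
    r, p = tests[method](x, y)
    return r, p
```

Spearman correlation works on ranks, so it is less sensitive than Pearson to the many zeros and outliers typical of single-cell expression.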
Visualization
scLAB provides interactive visualizations powered by ECharts:
Gene Expression Plots
Visualize cells in 2D space colored by gene expression. Plots can be exported as PNG.
Cluster & Cell Groups Plots
Visualize cells in 2D space colored by clusters and other cell groups stored in the .obs of your dataset. Plots can be exported as PNG.
Quality Metrics Plots
Visualize violin plots of the quality metrics computed for your dataset. These plots are only available after QC metrics have been computed with Scanpy's calculate_qc_metrics function; to do this in scLAB, navigate to the Quality Control module.