scGRNdb Tutorials

Overview

scGRNdb provides 3 core functions:

Explore Networks: Browse available cell type GRNs and query your genes of interest.
Module-Pathway Enrichment Pipeline: Identify modular components in your own network and functionally annotate them with pathway and disease databases.
Network Prioritization Pipeline: Query your genes of interest against the scGRNdb database to find cell type GRNs and driver genes that best model your data.

Note: All pipelines require gene names in HGNC (human) or MGI (mouse) symbol format.

Explore Networks

This section lets you browse and visualize the gene regulatory networks available in scGRNdb. You can also compare two networks to look for shared network structure.

Step 1: Browse Available Networks

The network table lists all GRNs available in scGRNdb. You can filter networks by species (human or mouse), tissue, cell type, and the scRNAseq data atlas used to generate the network. Once you have set your search criteria, click up to 2 table entries; the input forms will populate automatically with your choices.

After selecting the networks, choose how you would like to visualize the network under "Network Scope". "Direct Neighbors" allows you to input a list or file of genes to visualize in the networks. "Full Network" allows you to visualize the entire network at the module level, where you can explore the functional pathways of those modules.

Note: The input genes must be in HGNC (human) or MGI (mouse) symbol format, matching the species of the network.

Example Input Gene File

GENE1
GENE2
GENE3
GENE4
GENE5
GENE6
GENE7
GENE8
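scGRNdb's exact parsing rules for pasted lists and uploaded files are not spelled out here. As a hedged illustration, the Python sketch below cleans a gene list before you paste or upload it, assuming symbols may be separated by newlines, commas, tabs, or spaces. `parse_gene_input` is a hypothetical helper, not part of scGRNdb. It deliberately keeps the original capitalization, because HGNC and MGI symbols differ in case.

```python
import re


def parse_gene_input(text: str) -> list[str]:
    """Split pasted text or file contents into unique gene symbols.

    Symbols may be separated by newlines, commas, tabs, or spaces.
    Case is preserved on purpose: HGNC (human) symbols are upper-case,
    while MGI (mouse) symbols are usually capitalized (e.g. Gapdh).
    """
    genes: list[str] = []
    seen: set[str] = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if token and token not in seen:  # skip empty tokens and repeats
            seen.add(token)
            genes.append(token)
    return genes


example = "GENE1\nGENE2\n\nGENE3, GENE4\nGENE2\n"
print(parse_gene_input(example))  # ['GENE1', 'GENE2', 'GENE3', 'GENE4']
```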

Step 2: Visualize the Network

Direct Neighbors

If you selected "Direct Neighbors" in the previous step, you can visualize the network around your input genes. The size of each node (gene) is proportional to its degree in the network, and each edge is scaled by its edge weight.

Several styling options are available:

Adjust node size based on degree.
Filter edges by weight.
Refresh layout whenever you adjust the filtering parameters.
Expand the search depth beyond the direct neighbors of your input genes.
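The Python sketch below illustrates what the three Direct Neighbors controls do to a plain edge list: filtering edges by weight, expanding the search depth from the input genes, and computing degree for node sizing. `neighborhood` is a hypothetical helper, not scGRNdb code. It assumes edges are (source, target, weight) tuples, that edge direction is ignored during expansion, and that degree is counted in the full network; the actual viewer may behave differently.

```python
from collections import defaultdict, deque

Edge = tuple[str, str, float]  # (source gene, target gene, weight)


def neighborhood(edges: list[Edge], seeds: list[str], depth: int = 1,
                 min_weight: float = 0.0) -> tuple[list[Edge], dict[str, int]]:
    """Return the subnetwork within `depth` hops of the seed genes.

    Edges lighter than `min_weight` are dropped before the search, and edge
    direction is ignored when looking for neighbors. Node degree is counted
    in the full, unfiltered network, for use as a node-size value.
    """
    kept = [e for e in edges if e[2] >= min_weight]
    adj: dict[str, set[str]] = defaultdict(set)
    for source, target, _ in kept:
        adj[source].add(target)
        adj[target].add(source)

    # Breadth-first search started from all seed genes at once.
    dist = {g: 0 for g in seeds if g in adj}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        if dist[gene] == depth:
            continue
        for nbr in adj[gene]:
            if nbr not in dist:
                dist[nbr] = dist[gene] + 1
                queue.append(nbr)

    # Keep every filtered edge whose two ends were both reached.
    sub = [e for e in kept if e[0] in dist and e[1] in dist]
    degree: dict[str, int] = defaultdict(int)
    for source, target, _ in edges:
        degree[source] += 1
        degree[target] += 1
    return sub, {g: degree[g] for g in dist}


edges = [("A", "B", 0.9), ("B", "C", 0.5), ("C", "D", 0.8), ("A", "E", 0.1)]
print(neighborhood(edges, ["A"], depth=2, min_weight=0.2))
```

Raising `depth` plays the role of expanding the search, and raising `min_weight` plays the role of the edge-weight filter; re-running the function after either change corresponds to refreshing the layout.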

Full Network

If you selected "Full Network" in the previous step, you will be able to visualize the entire network at the module level. The size of the nodes (modules) will be proportional to the number of genes in the module, and the edges will be weighted by the number of outgoing edges from the module.

You can search for your genes of interest by supplying a list or .txt file of genes in the same format as the Step 1 input genes. Modules that contain your genes are highlighted in the network, and you can click them to explore in more detail. Double-click any module to view its genes and its associated functional pathways and diseases. This displays a second network visualization below, similar to the Direct Neighbors visualization.

Several styling options are available:

Adjust node size based on the number of genes in the module.
Filter edges by weight.
Refresh layout whenever you adjust the filtering parameters.
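To make the module-level view concrete, the Python sketch below collapses a gene network into a module network: node size is the number of genes in a module, and edge weight is the number of gene-level edges running from one module to another. This is one reading of "the number of outgoing edges from the module", stated here as an assumption; `module_graph` is a hypothetical helper, and it skips genes without a module and edges inside a single module.

```python
from collections import Counter

Edge = tuple[str, str, float]  # (HEAD gene, TAIL gene, weight)


def module_graph(edges: list[Edge], module_of: dict[str, str]):
    """Collapse a gene network into a module-level network.

    module_of maps gene -> module ID.
    Returns (module sizes, {(from_module, to_module): edge count}).
    """
    sizes = Counter(module_of.values())  # genes per module
    between: Counter = Counter()
    for head, tail, _weight in edges:
        m1, m2 = module_of.get(head), module_of.get(tail)
        if m1 is None or m2 is None or m1 == m2:
            continue  # unassigned gene or within-module edge
        between[(m1, m2)] += 1  # directed: counts edges leaving m1 for m2
    return dict(sizes), dict(between)


module_of = {"G1": "M1", "G2": "M1", "G3": "M2", "G4": "M2", "G5": "M3"}
edges = [("G1", "G3", 0.5), ("G2", "G4", 0.2), ("G1", "G2", 0.9), ("G3", "G5", 1.0)]
print(module_graph(edges, module_of))
```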

Compare Networks

When 2 networks are selected in either the "Direct Neighbors" or "Full Network" visualization, you can compare their network structures. Clicking the "Combine Networks" button merges both networks into one view: the first network's nodes and edges are colored red, the second network's are colored blue, and shared nodes and edges are colored green. The same styling options are available as in the "Direct Neighbors" visualization.
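The red/blue/green coloring amounts to simple set operations. The Python sketch below is a hypothetical illustration, not scGRNdb's implementation; it assumes two edges match when they share the same (source, target) pair, regardless of weight.

```python
Edge = tuple[str, str, float]  # (source gene, target gene, weight)


def _label(item, in_first: set, in_second: set) -> str:
    """Color rule from the tutorial: red = first, blue = second, green = shared."""
    if item in in_first and item in in_second:
        return "green"
    return "red" if item in in_first else "blue"


def combine_networks(first: list[Edge], second: list[Edge]):
    """Color the nodes and edges of two networks by where they occur.

    Edges are matched on (source, target) only; weights are ignored.
    """
    edges1 = {(s, t) for s, t, _ in first}
    edges2 = {(s, t) for s, t, _ in second}
    nodes1 = {g for pair in edges1 for g in pair}
    nodes2 = {g for pair in edges2 for g in pair}
    node_colors = {n: _label(n, nodes1, nodes2) for n in nodes1 | nodes2}
    edge_colors = {e: _label(e, edges1, edges2) for e in edges1 | edges2}
    return node_colors, edge_colors


nodes, links = combine_networks([("A", "B", 1.0), ("B", "C", 1.0)],
                                [("B", "C", 0.5), ("C", "D", 1.0)])
print(nodes, links)
```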

Module-Pathway Enrichment Pipeline

This pipeline helps you identify functional modules in your gene network and connect them to biological pathways and disease gene signatures.

Step 1: Select a Network

The main input for the analysis is a network file. It can be a network you generated yourself or one of the provided sample networks. The network file should have the following columns:

HEAD: Source gene (HGNC/MGI symbol)
TAIL: Target gene (HGNC/MGI symbol)
WEIGHT: Edge weight (numeric value)

Example Network File

HEAD	TAIL	WEIGHT
GENE1	GENE2	0.1
GENE1	GENE3	1
GENE4	GENE3	0.75
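Before uploading your own network, it may help to check that it matches this layout. The Python sketch below assumes a tab-separated file, as in the example above; `read_network` is a hypothetical helper, not the validator scGRNdb uses.

```python
import csv
import io

REQUIRED_COLUMNS = ("HEAD", "TAIL", "WEIGHT")


def read_network(handle) -> list[tuple[str, str, float]]:
    """Read a tab-separated HEAD/TAIL/WEIGHT file into edge tuples.

    Raises ValueError for a missing column, an empty gene name,
    or a WEIGHT value that is not numeric.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    edges = []
    for row in reader:
        where = f"line {reader.line_num}"  # physical line number in the file
        head = (row["HEAD"] or "").strip()
        tail = (row["TAIL"] or "").strip()
        if not head or not tail:
            raise ValueError(f"{where}: empty HEAD or TAIL")
        try:
            weight = float(row["WEIGHT"])
        except (TypeError, ValueError):
            raise ValueError(f"{where}: non-numeric WEIGHT {row['WEIGHT']!r}") from None
        edges.append((head, tail, weight))
    return edges


example = "HEAD\tTAIL\tWEIGHT\nGENE1\tGENE2\t0.1\nGENE1\tGENE3\t1\nGENE4\tGENE3\t0.75\n"
print(read_network(io.StringIO(example)))
```

To check a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.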

Step 2: Select Module Parameters

The module detection algorithm will find the densely connected subgraphs in the network. It is based on Leiden clustering, which detects communities in the network by optimizing a modularity score. Since we want to analyze the function of the genes within the modules, we need to control the number of genes in the modules to provide interpretable pathway enrichment results. To do this, you can set the following parameters:

Minimum module size (recommended default: 10 genes)
Maximum module size (recommended default: 300 genes)
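The sketch below is in Python. It does not implement Leiden; libraries such as leidenalg provide that. Instead, it computes the standard Newman-Girvan modularity score that a modularity-based Leiden run tries to maximize, and it applies the min/max size bounds to a finished partition. Treating edges as undirected and simply discarding out-of-range modules are both assumptions. The tutorial does not say whether scGRNdb discards, merges, or re-splits such modules.

```python
from collections import Counter, defaultdict


def modularity(edges, community: dict[str, str]) -> float:
    """Newman-Girvan modularity of a partition, edges treated as undirected.

    Q = sum over communities c of  L_c / m - (d_c / (2m))**2
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    Assumes the network has no self-loops.
    """
    m = sum(w for _, _, w in edges)
    inside: dict[str, float] = defaultdict(float)
    degree: dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def filter_by_size(community: dict[str, str], min_size: int = 10,
                   max_size: int = 300) -> dict[str, str]:
    """Keep only genes whose module size lies in [min_size, max_size]."""
    sizes = Counter(community.values())
    return {g: c for g, c in community.items() if min_size <= sizes[c] <= max_size}


# Two triangles joined by a single edge, all weights 1.
edges = [("0", "1", 1), ("1", "2", 1), ("0", "2", 1),
         ("3", "4", 1), ("4", "5", 1), ("3", "5", 1), ("2", "3", 1)]
split = {"0": "A", "1": "A", "2": "A", "3": "B", "4": "B", "5": "B"}
print(round(modularity(edges, split), 4))  # 0.3571
```

Splitting the two triangles scores higher than lumping everything into one module (which scores 0), and that is the kind of comparison modularity-based clustering makes.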

Step 3: Select Species and Pathway Databases

The next step is to select the species (human or mouse) and pathway databases for enrichment analysis. We have collected a list of pathway databases for each species, and you can select one or more of them. For sample data, we recommend starting with GO Biological Process and DisGeNET.

Pathway Databases

GO Biological Process
KEGG
GO Cellular Component
Reactome
GO Molecular Function
Biocarta
DisGeNET
GWAS Catalog

Step 4 (Optional): Provide your Email

If you provide your email, you will receive an email notification when the analysis is complete. If you do not provide an email, remember to save your sessionID to retrieve your results later.

Step 5: Submit and Monitor

Click Submit to start the analysis. You can monitor the progress of the analysis in the Review Files tab. If you provided an email, you will receive an email notification when the analysis is complete.

Step 6: Explore Results

Downloads

Once the analysis is complete, you can download the results from the downloads table:

Modules - A .txt file listing all genes and their associated modules
Pathway Enrichment - A .txt file with the full enrichment analysis results

Pathway Enrichment Table

The Pathway Enrichment file is also displayed as a table below the downloads table. Here is a description of the columns:

ID - Unique ID
MODULE ID - Identifier for each module
PATHWAY - Name of the enriched pathway
PATHWAY SOURCE - Database used for pathway enrichment
P - P-value calculated using the hypergeometric test
FDR - False Discovery Rate
RISK RATIO - Enrichment score
module_size - Number of genes in the module
pathway_size - Number of genes in the pathway
overlap - Number of overlapping genes between the module and the pathway

You can filter the results by module ID, size, or overlap. You can also adjust the number of rows displayed per page and download the entire table.
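The P, FDR, and RISK RATIO columns can be illustrated with a small Python calculation. This is a hedged sketch, not scGRNdb's code. It assumes a one-sided hypergeometric test (the probability of seeing at least the observed overlap), Benjamini-Hochberg for the FDR, and risk ratio defined as observed overlap divided by expected overlap. The tutorial does not confirm the FDR method, the risk ratio formula, or which background gene universe (N) is used.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K pathway genes, n module genes)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def risk_ratio(k: int, N: int, K: int, n: int) -> float:
    """Observed overlap divided by the overlap expected by chance (n * K / N)."""
    return k / (n * K / N)


def bh_fdr(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers: 10 background genes, pathway of 4 (pathway_size),
# module of 3 (module_size), 2 shared genes (overlap).
print(hypergeom_sf(2, 10, 4, 3))  # about 0.333
print(risk_ratio(2, 10, 4, 3))    # about 1.67
print(bh_fdr([0.01, 0.04, 0.03]))
```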

Step 7: Key Driver Analysis

Review the results table and choose pathways for further Key Driver Analysis (KDA). You can select as many pathways as you want. When you click "Prepare KDA", you will be redirected to the KDA tab, where you can review the pathways you selected. Then click "Run KDA", which takes you to the KDA analysis page on the Mergeomics Web Server. Your session carries over to the Mergeomics Web Server with all input files and recommended parameters already set; all you need to do is provide your email and click Submit.

We recommend the default parameters for KDA. More details about the KDA parameters can be found on the Mergeomics tutorial page.
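KDA asks which genes (candidate hubs) have more members of your gene set among their network neighbors than expected by chance. The Python sketch below is a deliberately simplified, unweighted illustration of that idea with one-step neighborhoods and a hypergeometric test. It is not Mergeomics' weighted KDA, which also uses edge weights and permutation-based statistics, and `simple_kda` and its `min_degree` cutoff are hypothetical.

```python
from collections import defaultdict
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def simple_kda(edges, gene_set, min_degree: int = 2):
    """Rank candidate hubs by enrichment of gene-set members among their neighbors.

    Unweighted, one-step neighborhoods, no permutation testing.
    Returns (hub, members_among_neighbors, degree, p) tuples sorted by p.
    """
    adj = defaultdict(set)
    for head, tail, _w in edges:
        if head != tail:  # ignore self-loops
            adj[head].add(tail)
            adj[tail].add(head)
    members = set(gene_set).intersection(adj)  # gene-set genes present in the network
    results = []
    for hub, nbrs in adj.items():
        if len(nbrs) < min_degree:
            continue  # only well-connected genes are considered as hubs
        k = len(nbrs & members)
        # Population: every other node; successes: gene-set members other than the hub.
        p = hypergeom_sf(k, len(adj) - 1, len(members - {hub}), len(nbrs))
        results.append((hub, k, len(nbrs), p))
    return sorted(results, key=lambda r: r[3])


edges = [("H", "A", 1), ("H", "B", 1), ("H", "C", 1), ("H", "D", 1),
         ("E", "F", 1), ("F", "G", 1)]
print(simple_kda(edges, {"A", "B", "C"}))
```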

Network Prioritization Pipeline

This pipeline allows you to model your gene set against cell type GRNs in scGRNdb and identify the cell type specific mechanisms that best explain your data.

Step 1: Prepare Your Gene Set File

The main input for the analysis is one or more gene sets. A single gene set can be provided as a comma-separated list of genes or as a .txt file. Multiple gene sets must be provided as a .txt file with the following columns:

genes: Gene names (HGNC/MGI symbol)
module: Gene set name

Note: Your gene set should contain at least 10 genes for interpretable results.

Example Gene Set File

genes	module
GENE1	module1
GENE2	module1
GENE3	module1
GENE1	module2
GENE4	module2
GENE3	module3
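To check a gene set file before submitting it, the Python sketch below parses the two-column layout above and flags gene sets smaller than the recommended 10 genes. It assumes the file is tab-separated, as in the example. `read_gene_sets` is a hypothetical helper, not scGRNdb code.

```python
import csv
import io
from collections import defaultdict


def read_gene_sets(handle, min_genes: int = 10):
    """Parse a tab-separated genes/module file into {module: [genes]}.

    Also returns the gene sets with fewer than `min_genes` genes, which the
    tutorial considers too small for interpretable results.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if not {"genes", "module"} <= set(reader.fieldnames or []):
        raise ValueError("expected 'genes' and 'module' columns")
    gene_sets: dict[str, list[str]] = defaultdict(list)
    for row in reader:
        gene = (row["genes"] or "").strip()
        module = (row["module"] or "").strip()
        # A gene may belong to several sets, but is listed once per set.
        if gene and module and gene not in gene_sets[module]:
            gene_sets[module].append(gene)
    too_small = [name for name, genes in gene_sets.items() if len(genes) < min_genes]
    return dict(gene_sets), too_small


example = ("genes\tmodule\nGENE1\tmodule1\nGENE2\tmodule1\nGENE3\tmodule1\n"
           "GENE1\tmodule2\nGENE4\tmodule2\nGENE3\tmodule3\n")
sets, small = read_gene_sets(io.StringIO(example))
print(sets)
print("below 10 genes:", small)
```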

Step 2: Select Species and Atlas

The next step is to select the species (human or mouse) and their corresponding scRNAseq atlases used to generate the GRNs. You can select as many atlases as you want. For brain tissues, we recommend selecting any of the Allen Brain Atlases. For other tissues, we recommend Tabula Sapiens and GTEx for human and Tabula Muris for mouse.

scRNAseq Data Atlases

Human	Mouse
Allen Brain Atlas (10X)	Allen Brain Atlas (10X)
Allen Brain Atlas (SmartSeq)	Allen Brain Atlas (SmartSeq)
Tabula Sapiens	Tabula Muris (10X)
Human Cell Landscape	Tabula Muris (SmartSeq)
GTEx	Tabula Muris Senis (10X)
	Tabula Muris Senis (SmartSeq)
	Mouse Cell Atlas


Step 3 (Optional): Provide your Email

If you provide your email, you will receive an email notification when the analysis is complete. If you do not provide an email, remember to save your sessionID to retrieve your results later.

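Step 4: Submit and Monitor

Click Submit to start the analysis, and monitor its progress as described for the Module-Pathway Enrichment Pipeline. If you provided an email, you will receive an email notification when the analysis is complete.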

Step 5: Explore Results

Downloads

Once the analysis is complete, you can download the results from the downloads table:

Enriched Networks - A .txt file listing each gene set and its enriched networks.

Enriched Networks Table

The Enriched Networks file is also displayed as a table below the downloads table. Here is a description of the columns:

ID - Unique ID
GENESET - Name of gene set
NETWORK TISSUE - Tissue of the enriched GRN
NETWORK CELLTYPE - Cell Type of the enriched GRN
NETWORK MODULE - GRN subnetwork with enrichment for gene set
NETWORK ATLAS - scGRNdb cell atlas of the enriched GRN
P - P-value calculated using the hypergeometric test
FDR - False Discovery Rate
RISK RATIO - Enrichment score
GENESET SIZE - Number of genes in the gene set
NETWORK MODULE SIZE - Number of genes in the GRN module
OVERLAP - Number of overlapping genes between the GRN module and the gene set

You can filter the results by any column. A typical workflow is to filter to your gene set, sort by the FDR column, filter to any tissue or cell type of interest, and then select your networks.
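If you prefer to apply the same workflow to the downloaded Enriched Networks file, the Python sketch below does it on rows already loaded as dictionaries. It assumes the file's column names match the table headers above. The 0.05 FDR cutoff and the tissue and cell type values are illustrative choices, not scGRNdb defaults, and `select_networks` is a hypothetical helper.

```python
def select_networks(rows, geneset, fdr_cutoff=0.05, tissues=None, celltypes=None):
    """Filter Enriched Networks rows to one gene set, an FDR cutoff, and
    optional tissue/cell type choices, then sort by FDR (smallest first).

    FDR values may be strings (as read from a file) or numbers.
    """
    hits = [
        r for r in rows
        if r["GENESET"] == geneset
        and float(r["FDR"]) <= fdr_cutoff
        and (tissues is None or r["NETWORK TISSUE"] in tissues)
        and (celltypes is None or r["NETWORK CELLTYPE"] in celltypes)
    ]
    return sorted(hits, key=lambda r: float(r["FDR"]))


rows = [  # illustrative rows, not real results
    {"GENESET": "set1", "NETWORK TISSUE": "brain", "NETWORK CELLTYPE": "astrocyte", "FDR": "0.01"},
    {"GENESET": "set1", "NETWORK TISSUE": "liver", "NETWORK CELLTYPE": "hepatocyte", "FDR": "0.001"},
    {"GENESET": "set1", "NETWORK TISSUE": "brain", "NETWORK CELLTYPE": "microglia", "FDR": "0.2"},
    {"GENESET": "set2", "NETWORK TISSUE": "brain", "NETWORK CELLTYPE": "astrocyte", "FDR": "0.0001"},
]
for row in select_networks(rows, "set1", tissues={"brain"}):
    print(row["NETWORK CELLTYPE"], row["FDR"])
```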

Step 6: Key Driver Analysis

Review the results table and choose networks for Key Driver Analysis (KDA). You can select as many networks as you want. When you click "Prepare KDA", you will be redirected to the KDA tab, where you can review the networks you selected. Each unique network gets its own KDA run, and the gene sets mapped to it are combined into one file. Then click "Run KDA", which takes you to the KDA analysis page on the Mergeomics Web Server. Your session carries over to the Mergeomics Web Server with all input files and recommended parameters already set; all you need to do is provide your email and click Submit.

We recommend the default parameters for KDA. More details about the KDA parameters can be found on the Mergeomics tutorial page.
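The grouping performed by "Prepare KDA" (one run per unique network, with that network's gene sets combined into one file) can be sketched in Python as follows. `kda_inputs` is hypothetical, and the two-column MODULE/NODE header is an assumed format for illustration. Check the Mergeomics tutorial page for the exact file layout it expects.

```python
from collections import defaultdict


def kda_inputs(selected, gene_sets):
    """Group selected (geneset, network) pairs into one KDA run per network.

    selected: (geneset_name, network_name) pairs picked in the results table.
    gene_sets: {geneset_name: [genes]}.
    Returns {network_name: tsv_text}, where each text combines every gene set
    mapped to that network. The MODULE/NODE header is an assumed format.
    """
    by_network: dict[str, list[str]] = defaultdict(list)
    for geneset, network in selected:
        if geneset not in by_network[network]:  # ignore repeated selections
            by_network[network].append(geneset)
    files = {}
    for network, names in by_network.items():
        lines = ["MODULE\tNODE"]
        for name in names:
            lines.extend(f"{name}\t{gene}" for gene in gene_sets[name])
        files[network] = "\n".join(lines) + "\n"
    return files


selected = [("setA", "Net1"), ("setB", "Net1"), ("setA", "Net2")]
gene_sets = {"setA": ["G1", "G2"], "setB": ["G3"]}
for network, text in kda_inputs(selected, gene_sets).items():
    print(network)
    print(text)
```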