Overview
This tutorial provides an overview of using our integrated MSEA-wKDA pipeline from the [Run MSEA] tab. It walks through the data input and result display steps in the order they appear to the user. Example screenshots are included to illustrate many of the steps.
The purpose of the pipeline is to take an association dataset for a given disease or phenotype from the user as input and integrate the association data with functional genomics information, pathways, and gene networks to derive pathways, gene networks and key regulatory genes for the disease or phenotype.
There are two main steps in the analysis: Marker Set Enrichment Analysis (MSEA) [1] and Weighted Key Driver Analysis (wKDA) [2-4]. MSEA aims to identify pathways or gene subnetworks that are enriched for genetic risks of the given disease/trait. wKDA takes the significant pathways and gene subnetworks identified by MSEA and integrates them with network models to identify the key regulators (drivers).
Steps 2-7 require the users to upload datasets or select pre-defined datasets needed for integration (detailed in Table 1). If uploading new data, the user needs to follow the format indicated below.
Table 1. Descriptions of data categories, format, and preloaded sample files
Data Category | Data Category Description | Format (Tab Delimited) | Preloaded Sample File Name | Sample File Description | Sample Data References |
Disease Association Data | Marker to trait association | Marker id, -log10p value [Example] | Sample GWAS | MDF-corrected LDL GWAS from GLGC | [5] |
Disease Association Data | | | glgc.tc | Total cholesterol GWAS | [6] |
Disease Association Data | | | glgc.tg | Triglyceride GWAS | [6] |
Disease Association Data | | | glgc.ldl | LDL GWAS | [6] |
Disease Association Data | | | glgc.hdl | HDL GWAS | [6] |
Disease Association Data | | | cardiogram_c4d.cad | Coronary artery disease GWAS | [7] |
Disease Association Data | | | diagram.t2d | Type 2 diabetes GWAS | [8] |
Disease Association Data | | | magic.fastingglucose | Fasting glucose GWAS | [9] |
Marker Mapping Data | Marker to gene mapping | Marker id, gene symbol id [Example] | esnp.all | Combined list of all eQTLs curated from literature | [6-21] |
Marker Mapping Data | | | esnp.adipose | Adipose eQTLs | [10-13] |
Marker Mapping Data | | | esnp.blood | Blood eQTLs | [10, 12, 14] |
Marker Mapping Data | | | esnp.brain | Brain eQTLs | [15-18] |
Marker Mapping Data | | | esnp.liver | Liver eQTLs | [10, 13, 19] |
Marker Mapping Data | | | esnp.muscle_skeletal | Skeletal muscle eQTLs | [10] |
Marker Mapping Data | | | gene2loci.010kb | Human SNP to gene mapping based on a chromosomal distance of 10kb | Null |
Marker Mapping Data | | | gene2loci.020kb | Human SNP to gene mapping based on a chromosomal distance of 20kb | Null |
Marker Mapping Data | | | gene2loci.050kb | Human SNP to gene mapping based on a chromosomal distance of 50kb | Null |
Marker Mapping Data | | | gene2loci.regulome | Human SNP to gene mapping based on RegulomeDB (ENCODE) | [20] |
Marker Mapping Data | | | all.mapping | Combined list of all the above mapping | Null |
Gene Sets | Collections of pre-defined sets of genes that are functionally related | Gene symbol id, gene set id [Example] | Canonical pathways | Pathways collected from KEGG, REACTOME and Biocarta | [21, 22] |
Gene Sets | | | Co-expression modules | Derived from coexpression networks by applying WGCNA on gene expression data | [12, 13, 15-19, 23] |
Gene Sets Description | Detailed descriptions of gene sets such as the full name of a biological pathway | Gene set id, gene set description [Example] | Canonical pathways | Description of the pathways including pathway name and database source | [21, 22] |
Gene Sets Description | | | Co-expression modules | Description includes tissue type for the expression data | [12, 13, 15-19, 23] |
Gene Regulatory Networks | Network edges from pre-defined gene networks | Source gene id, target gene id, weight [Example] | adipose | Adipose Bayesian networks | [12, 15-19] |
Gene Regulatory Networks | | | blood | Blood Bayesian networks | [12] |
Gene Regulatory Networks | | | brain | Brain Bayesian networks | [15-18] |
Gene Regulatory Networks | | | liver | Liver Bayesian networks | [15-19] |
Gene Regulatory Networks | | | muscle | Muscle Bayesian networks | [15-19] |
Gene Regulatory Networks | | | PPI | Protein-protein interaction network | [24] |
Association Dataset for MSEA
Select/Upload Association Data
- The menu gives the user the option to select either a sample association dataset or upload their own dataset. IMPORTANT: Press the submit button after selecting your option.
Example association file (tab delimited):
MARKER | VALUE |
rs4747841 | 0.1452 |
rs4749917 | 0.1108 |
rs737656 | 1.3979 |
- If the user chooses to upload an association dataset, the user will be redirected to an upload page:
- After selecting the appropriate dataset (one is provided in the downloads section), click [Upload File] and make sure you see the "Data Submitted" checkmark:
- Click [Back to MSEA] after uploading your input file.
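To prepare an upload in the expected layout, a small script can convert raw p-values to the -log10 scale and write the two tab-delimited columns. The following is a minimal sketch, not part of the pipeline: the function name `write_association_file` and the marker ids and p-values are hypothetical, and the four-decimal rounding is an assumption based on the example rows above.

```python
import csv
import io
import math


def write_association_file(pvalues, handle):
    """Write marker p-values as a tab-delimited MARKER/VALUE table of -log10(p)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["MARKER", "VALUE"])
    for marker, p in pvalues.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value for {marker} must be in (0, 1], got {p}")
        # max() avoids printing "-0.0000" when p is exactly 1.
        score = max(0.0, -math.log10(p))
        writer.writerow([marker, f"{score:.4f}"])


# Hypothetical markers and p-values, chosen for illustration only.
example = {"rs0000001": 0.5, "rs0000002": 0.01}
buffer = io.StringIO()
write_association_file(example, buffer)
association_text = buffer.getvalue()
print(association_text)
```

Writing to a file handle opened with `open("my_gwas.txt", "w", newline="")` instead of the in-memory buffer would produce a file ready for the upload page.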
Marker Mapping File for MSEA
Select/Upload Marker Mapping File
- Users can choose between uploading their own mapping files or using the sample mapping files.
- If the user chooses to upload a mapping file, the user will be redirected to an upload page:
- The Select/Upload Marker Mapping menu offers 14 mapping datasets, which are described in Table 1. The user can select any combination of the mapping files (if more than one is selected, the mapping files are combined). Note: if using GWAS, the mapping file should already be corrected for LD (i.e. using MDF) to remove redundant SNPs that are in high LD for each gene. The preloaded mapping files have all been corrected for LD.
Example mapping file (tab delimited):
GENE | MARKER |
CDK6 | rs10 |
AGER | rs1000 |
N4BP2 | rs1000000 |
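The tutorial states that several selected mapping files are combined, but not how. The sketch below assumes the simplest rule, a union of unique gene-marker pairs, and checks the GENE/MARKER header seen in the example. The helper names `read_mapping` and `combine_mappings` are hypothetical, and the sketch does not perform any LD correction.

```python
import csv
import io


def read_mapping(handle):
    """Read a tab-delimited GENE/MARKER file into a set of (gene, marker) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != ["GENE", "MARKER"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return {(row["GENE"], row["MARKER"]) for row in reader}


def combine_mappings(*mappings):
    """Assumed combination rule: sorted union of unique gene-marker pairs."""
    return sorted(set().union(*mappings))


# Two small in-memory files that share one pair (AGER, rs1000).
file_a = io.StringIO("GENE\tMARKER\nCDK6\trs10\nAGER\trs1000\n")
file_b = io.StringIO("GENE\tMARKER\nAGER\trs1000\nN4BP2\trs1000000\n")
combined_pairs = combine_mappings(read_mapping(file_a), read_mapping(file_b))
for gene, marker in combined_pairs:
    print(gene, marker, sep="\t")
```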
Parameters for MSEA
Enter MSEA Parameters
- Enter the following MSEA parameter values and then click the submit button.
- Permutation type: gene-based or marker-based permutation used to estimate statistical significance (p-values). Gene-based permutation yields more conservative p-values than marker-based permutation.
Options: Gene or Marker
Default value: Gene
- Max Genes in Gene Sets: the maximum number of genes that a gene set can have.
Options: Number between 2 and 10,000; suggested between 200 and 800
Default value: 500
- Min Genes in Gene Sets: the minimum number of genes that a gene set can have.
Options: Number of at least 2 and less than Max Genes in Gene Sets
Default value: 20
- Min Overlap Allowed for Merging: the minimum overlap ratio between gene sets required to merge overlapping gene sets that MSEA finds associated with the disease/trait into supersets.
Options: 0 to 1.0
Default value: 0.33 (33% overlap)
- Number of Permutations: the number of gene or marker permutations conducted in the MSEA analysis.
Options: 1000 to 20,000 (for publication, >= 10,000 is recommended)
Default value: 2000
- MSEA FDR cutoff: gene sets whose FDR falls within this cutoff are reported as significant.
Options: 0 to 25
Default value: 25
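The ranges and defaults listed above can be checked before submitting a job. This is a minimal sketch only: the class `MseaParameters`, its field names, and the helper `is_valid` are hypothetical and do not correspond to any interface of the web server. Only the numeric limits and defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MseaParameters:
    """Hypothetical container for the MSEA form values and their stated ranges."""

    permutation_type: str = "gene"
    max_genes: int = 500
    min_genes: int = 20
    min_overlap: float = 0.33
    n_permutations: int = 2000
    fdr_cutoff: float = 25.0

    def __post_init__(self):
        checks = [
            (self.permutation_type in ("gene", "marker"),
             "permutation_type must be 'gene' or 'marker'"),
            (2 <= self.max_genes <= 10_000,
             "max_genes must be between 2 and 10,000"),
            (2 <= self.min_genes < self.max_genes,
             "min_genes must be at least 2 and less than max_genes"),
            (0.0 <= self.min_overlap <= 1.0,
             "min_overlap must be between 0 and 1.0"),
            (1000 <= self.n_permutations <= 20_000,
             "n_permutations must be between 1000 and 20,000"),
            (0.0 <= self.fdr_cutoff <= 25.0,
             "fdr_cutoff must be between 0 and 25"),
        ]
        for ok, message in checks:
            if not ok:
                raise ValueError(message)


def is_valid(**overrides):
    """Return True if the defaults plus the given overrides pass every check."""
    try:
        MseaParameters(**overrides)
    except ValueError:
        return False
    return True


print(is_valid())                      # defaults from the tutorial
print(is_valid(n_permutations=10000))  # publication-quality setting
print(is_valid(min_genes=600))         # min_genes exceeds the default max_genes
```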
Gene Sets for MSEA
Select/Upload Gene Sets
- The Select/Upload Gene Sets menu offers three sample gene set datasets, as described in Table 1. The first option in the menu is for uploading your own gene sets.
Example gene set file (tab delimited):
MODULE | GENE |
rctm001 | CDSF4 |
rctm001 | EIF2AK2 |
M10401 | XRCC5 |
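A gene set file in this layout can be grouped by module and screened against the Min/Max Genes in Gene Sets parameters before upload. The sketch below is illustrative: `read_gene_sets` and `filter_by_size` are hypothetical helpers, and treating both size limits as inclusive is an assumption, since the tutorial does not say how the pipeline applies them. A minimum of 2 is used in the demonstration only because the example file is tiny.

```python
import csv
import io
from collections import defaultdict


def read_gene_sets(handle):
    """Group a tab-delimited MODULE/GENE file into {module id: set of genes}."""
    reader = csv.DictReader(handle, delimiter="\t")
    gene_sets = defaultdict(set)
    for row in reader:
        gene_sets[row["MODULE"]].add(row["GENE"])
    return dict(gene_sets)


def filter_by_size(gene_sets, min_genes=20, max_genes=500):
    """Keep gene sets whose size lies within the (assumed inclusive) limits."""
    return {
        module: genes
        for module, genes in gene_sets.items()
        if min_genes <= len(genes) <= max_genes
    }


# The three example rows shown above, as an in-memory file.
text = "MODULE\tGENE\nrctm001\tCDSF4\nrctm001\tEIF2AK2\nM10401\tXRCC5\n"
gene_sets = read_gene_sets(io.StringIO(text))
kept = filter_by_size(gene_sets, min_genes=2, max_genes=500)
print(sorted(kept))
```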
Gene Sets Description for MSEA
Select/Upload Gene Sets Description
- The Select/Upload Gene Sets Description menu offers two sample description files and an option for uploading your own gene set description file. The gene set description file describes the gene sets selected in Step 4.
Example gene set description file (tab delimited):
MODULE | SOURCE | DESCR |
rctm001 | reactome | NS1 Mediated Effects on Host Pathways |
M10287 | biocarta | fMLP induced chemokine gene expression |
M10462 | kegg | Adipocytokine signaling pathway |
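Because the description file refers to the gene sets by module id, it can be useful to confirm that every uploaded gene set has a matching description. The following sketch assumes the MODULE/SOURCE/DESCR header from the example. The helpers `read_descriptions` and `missing_descriptions` are hypothetical, and the tutorial does not state whether the pipeline itself requires a description for every module.

```python
import csv
import io


def read_descriptions(handle):
    """Read a tab-delimited MODULE/SOURCE/DESCR file into {module: (source, descr)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["MODULE"]: (row["SOURCE"], row["DESCR"]) for row in reader}


def missing_descriptions(module_ids, descriptions):
    """Return the module ids that have no entry in the description table."""
    return sorted(set(module_ids) - set(descriptions))


# The three example rows shown above, as an in-memory file.
text = (
    "MODULE\tSOURCE\tDESCR\n"
    "rctm001\treactome\tNS1 Mediated Effects on Host Pathways\n"
    "M10287\tbiocarta\tfMLP induced chemokine gene expression\n"
    "M10462\tkegg\tAdipocytokine signaling pathway\n"
)
descriptions = read_descriptions(io.StringIO(text))
undescribed = missing_descriptions(["rctm001", "M10401"], descriptions)
print(undescribed)
```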
Enter Email and Run
Enter Email and Submit Job
- (Optional) Enter your email address in the text box and press submit if you would like to receive notification emails when the job starts and when it completes. The job completion alert also includes a link to download your results and provides the results as attachments. We will delete your email address after job completion and will not use it for any further communication.
Job Execution
- Wait for your results. Your job may take 30 minutes or more due to the complexity of integration. This page will load your results after execution is done. If you want to close your browser, copy the link on this page so you can view your results later. If you have provided your email address, we will also send you this link in the job completion email alert.
MSEA Pipeline Execution Email Notification
- If the user provided an email address, then an email notification is sent to the provided email with a link to the results page which will be active when the MSEA analysis is completed.
MSEA Pipeline Completion Email Notification
- If the user provided an email address, then an email notification is sent to the provided email with a link to the results page upon completion of the MSEA analysis. The results link will remain active for 24 hours. Additionally, the results files will be included in the email as attachments.
Marker Set Enrichment Analysis Results
Display MSEA Results
- The Marker Set Enrichment Analysis Table lists the significant pathways/modules found to be enriched for your association data at your pre-defined FDR cutoff. The user can download the full result files containing all information from this page.
- Interpretation of Results
Field Name | Description |
Module ID | Module id/gene set id from input gene set |
MSEA:P-Value | Set enrichment p-value |
MSEA:FDR | False discovery rate for set enrichment |
Description | Gene set description |
Module Top Genes | Top five genes in the gene set with the lowest p-values for the association study |
Module Top Marker | Top five markers in the gene set with the lowest p-values for the association study |
Module Top Association Score | Top five lowest p-values for the association study, expressed as -log10 p |
Module Details | A web link that will load the gene set to DAVID for detailed functional annotations |
- The Merged Supersets Table lists the significant supersets after merging any overlapping gene sets among the significant pathways/modules. Merging is done as part of our MSEA analyses based on the parameter value at Step 3E.
- Interpretation of Merged Supersets
Field Name | Description |
Merge Module ID | New module id/gene set after merge |
Merge Module P-value | Merged set enrichment p-value |
Frequency | Equivalent to FDR |
Number of Genes | Number of genes in the gene set after merging |
Number of Markers | Number of association study markers in the merged module |
Density | Number of markers per gene |
Overlap | List of overlapping gene sets merged |
Description | Functional description of the merge module |
- IMPORTANT: You can stop here if this is all you need. To continue with wKDA, click "Run wKDA" and continue to the wKDA Tutorial.
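The effect of the Min Overlap Allowed for Merging parameter on the Merged Supersets Table can be pictured with a toy example. The tutorial does not define the overlap ratio or the merging procedure, so everything here is an assumption made for illustration: the ratio is taken as shared genes divided by the size of the smaller set, and sets are merged pairwise until no pair reaches the threshold. The module ids and gene names are hypothetical, and the pipeline's actual method may differ.

```python
def overlap_ratio(a, b):
    """Assumed definition: shared genes divided by the size of the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_gene_sets(gene_sets, min_overlap=0.33):
    """Merge pairs of sets whose overlap ratio reaches min_overlap until stable."""
    supersets = [([module], set(genes)) for module, genes in gene_sets.items()]
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(supersets)):
            for j in range(i + 1, len(supersets)):
                if overlap_ratio(supersets[i][1], supersets[j][1]) >= min_overlap:
                    ids_i, genes_i = supersets[i]
                    ids_j, genes_j = supersets.pop(j)
                    supersets[i] = (ids_i + ids_j, genes_i | genes_j)
                    merged_any = True
                    break
            if merged_any:
                break  # restart the scan because the list changed
    return supersets


# Hypothetical significant gene sets: A and B share 2 of 4 genes, C is disjoint.
significant = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4", "g5", "g6"},
    "C": {"g7", "g8"},
}
supersets = merge_gene_sets(significant, min_overlap=0.33)
for members, genes in supersets:
    print(members, len(genes))
```

In this toy run the member lists play the role of the Overlap column and the gene counts the role of the Number of Genes column.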