Marker Set Enrichment Analysis (MSEA)

Overview

This tutorial provides an overview of using our integrated MSEA-wKDA pipeline from the [Run MSEA] tab. It walks through the data input steps and result display steps in the order in which they appear to the user. Example screenshots are included to illustrate many of the steps.

The purpose of the pipeline is to take an association dataset for a given disease or phenotype from the user as input and integrate the association data with functional genomics information, pathways, and gene networks to derive pathways, gene networks and key regulatory genes for the disease or phenotype.

There are two main steps in the analysis: Marker Set Enrichment Analysis (MSEA) [1] and Weighted Key Driver Analysis (wKDA) [2-4]. MSEA identifies pathways or gene subnetworks that are enriched for genetic risk of the given disease or trait. wKDA takes the significant pathways and gene subnetworks identified by MSEA and integrates them with network models to identify key regulators (drivers).
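As a rough illustration of the kind of question MSEA asks, the sketch below scores one gene set by comparing how many of its markers exceed several quantile thresholds of the genome-wide association scores (-log₁₀ p-values) with how many would be expected by chance. The quantile grid, the chi-square-like form and the `kappa` stabilizer are assumptions modeled on the general idea of quantile-based enrichment; `enrichment_score` is a hypothetical name and this is not the pipeline's exact implementation.

```python
import math

def enrichment_score(set_values, all_values,
                     quantiles=(0.5, 0.8, 0.9, 0.95, 0.99), kappa=1.0):
    """Simplified chi-square-like enrichment score for one gene set (assumed form).

    set_values: association scores (-log10 p) of markers mapped to the gene set.
    all_values: association scores of all markers (the background).
    """
    ranked = sorted(all_values)
    n_all = len(ranked)
    n_set = len(set_values)
    score = 0.0
    for q in quantiles:
        # Threshold at the q-th quantile of the background scores
        threshold = ranked[min(int(q * n_all), n_all - 1)]
        # Fraction of background markers at or above that threshold
        frac_bg = sum(v >= threshold for v in ranked) / n_all
        observed = sum(v >= threshold for v in set_values)
        expected = frac_bg * n_set
        # kappa keeps the denominator away from zero for small expected counts
        score += (observed - expected) / math.sqrt(expected + kappa)
    return score
```

A set whose markers sit in the upper tail of the background scores gets a positive score, and one whose markers sit in the lower tail gets a negative score; significance is then judged against permutations (see the MSEA parameters).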

Steps 2-7 require the user to upload datasets or select the predefined datasets needed for integration (detailed in Table 1). If uploading new data, the user needs to follow the formats listed in Table 1.

Table 1. Descriptions of data categories, format, and preloaded sample files

Data Category	Data Category Description	Format (Tab Delimited)	Preloaded Sample File Name	Sample File Description	Sample Data References
Disease Association Data	Marker to trait association	Marker id, -log₁₀ p-value [Example]	Sample GWAS	MDF-corrected LDL GWAS from GLGC	[5]
			glgc.tc	Total cholesterol GWAS	[6]
			glgc.tg	Triglyceride GWAS	[6]
			glgc.ldl	LDL GWAS	[6]
			glgc.hdl	HDL GWAS	[6]
			cardiogram_c4d.cad	Coronary artery disease GWAS	[7]
			diagram.t2d	Type 2 diabetes GWAS	[8]
			magic.fastingglucose	Fasting glucose GWAS	[9]
Marker Mapping Data	Marker to gene mapping	Marker id, gene symbol id [Example]	esnp.all	Combined list of all eQTLs curated from literature	[6-21]
			esnp.adipose	Adipose eQTLs	[10-13]
			esnp.blood	Blood eQTLs	[10, 12, 14]
			esnp.brain	Brain eQTLs	[15-18]
			esnp.liver	Liver eQTLs	[10, 13, 19]
			esnp.muscle_skeletal	Skeletal muscle eQTLs	[10]
			gene2loci.010kb	Human SNP to gene mapping based on a chromosomal distance of 10kb	Null
			gene2loci.020kb	Human SNP to gene mapping based on a chromosomal distance of 20kb	Null
			gene2loci.050kb	Human SNP to gene mapping based on a chromosomal distance of 50kb	Null
			gene2loci.regulome	Human SNP to gene mapping based on RegulomeDB (ENCODE)	[20]
			all.mapping	Combined list of all the above mapping	Null
Gene Sets	Collections of pre-defined sets of genes that are functionally related	Gene symbol id, gene set id [Example]	Canonical pathways	Pathways collected from KEGG, REACTOME and Biocarta	[21, 22]
Gene Sets		Gene symbol id, gene set id [Example]	Co-expression modules	Derived from coexpression networks by applying WGCNA on gene expression data	[12, 13, 15-19, 23]
Gene Sets Description	Detailed descriptions of gene sets such as the full name of a biological pathway	Gene set id, gene set description [Example]	Canonical pathways	Description of the pathways including pathway name and database source	[21, 22]
Gene Sets Description		Gene set id, gene set description [Example]	Co-expression modules	Description includes tissue type for the expression data	[12, 13, 15-19, 23]
Gene Regulatory Networks	Network edges from pre-defined gene networks	Source gene id, target gene id, weight [Example]	adipose	Adipose Bayesian networks	[12, 15-19]
			blood	Blood Bayesian networks	[12]
			brain	Brain Bayesian networks	[15-18]
			liver	Liver Bayesian networks	[15-19]
			muscle	Muscle Bayesian networks	[15-19]
			PPI	Protein-protein interaction network	[24]
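Before uploading, it can help to check that a file is tab-delimited and has the expected columns. The sketch below assumes the header rows shown in the examples later in this tutorial (MARKER/VALUE, GENE/MARKER, MODULE/GENE, MODULE/SOURCE/DESCR); the network file header is not shown here, so it is left out. `check_tab_file` and `EXPECTED_HEADERS` are hypothetical names, not part of the pipeline.

```python
import csv
import io

# Header rows as shown in this tutorial's examples (assumed to be required verbatim).
EXPECTED_HEADERS = {
    "association": ["MARKER", "VALUE"],
    "mapping": ["GENE", "MARKER"],
    "gene_sets": ["MODULE", "GENE"],
    "descriptions": ["MODULE", "SOURCE", "DESCR"],
}

def check_tab_file(text, kind):
    """Return a list of problems found in a tab-delimited input file."""
    expected = EXPECTED_HEADERS[kind]
    # QUOTE_NONE so quote characters inside descriptions are kept as plain text
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != expected:
        problems.append(f"header {rows[0]} does not match expected {expected}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected):
            problems.append(f"line {line_no}: expected {len(expected)} columns, found {len(row)}")
        elif kind == "association":
            # VALUE should be a numeric -log10 p-value
            try:
                float(row[1])
            except ValueError:
                problems.append(f"line {line_no}: VALUE {row[1]!r} is not a number")
    return problems
```

An empty list means no problems were found; a comma-separated file, for instance, is reported as having the wrong header and column count.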

Association Dataset for MSEA

Select/Upload Association Data

The menu gives the user the option to select either a sample association dataset or upload their own dataset.

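IMPORTANT: The association file must be tab-delimited, with a header row of MARKER and VALUE, where VALUE is the -log₁₀ p-value of each marker, for example: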

MARKER	VALUE
rs4747841	0.1452
rs4749917	0.1108
rs737656	1.3979
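If your summary statistics contain raw p-values, they need to be converted to -log₁₀ p-values first. The sketch below assumes the p-values are already in a Python dict keyed by marker id; `to_association_rows` is a hypothetical helper, not part of the pipeline.

```python
import math

def to_association_rows(pvalues):
    """Convert {marker: p-value} into MARKER/VALUE rows with VALUE = -log10(p)."""
    rows = [("MARKER", "VALUE")]
    for marker, p in pvalues.items():
        if not 0 < p <= 1:
            raise ValueError(f"{marker}: p-value {p} is outside (0, 1]")
        # 0.0 - log10(p) avoids printing "-0.0000" when p == 1
        value = 0.0 - math.log10(p)
        rows.append((marker, f"{value:.4f}"))
    return rows
```

For example, a p-value of 0.04 becomes 1.3979, matching the rs737656 row above. Joining each row with tabs and the rows with newlines gives a file in the expected format.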

If the user chooses to upload an association dataset, they will be redirected to an upload page.

After selecting the appropriate dataset (one is provided in the downloads section), click [Upload File] and make sure the "Data Submitted" checkmark appears.

Click [Back to MSEA] after uploading your input file.

Marker Mapping File for MSEA

Select/Upload Marker Mapping File

Users can choose between uploading their own mapping files or using the sample mapping files.

If the user chooses to upload a mapping file, they will be redirected to an upload page.

The Select/Upload Marker Mapping menu offers 14 mapping datasets, described in Table 1. The user can select any combination of the mapping files; if more than one is selected, the mapping files are combined.

Note:

MDF

GENE	MARKER
CDK6	rs10
AGER	rs1000
N4BP2	rs1000000
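When several mapping files are selected, they are combined. A minimal sketch of one way to do this, assuming "combined" means the union of gene-marker pairs with duplicates removed and the first-seen order kept; `combine_mappings` is a hypothetical name.

```python
def combine_mappings(*mappings):
    """Union several GENE/MARKER mapping tables, dropping duplicate pairs.

    Each mapping is an iterable of (gene, marker) tuples, as in the
    GENE/MARKER example above.
    """
    seen = set()
    combined = []
    for mapping in mappings:
        for gene, marker in mapping:
            if (gene, marker) not in seen:
                seen.add((gene, marker))
                combined.append((gene, marker))
    return combined
```

For instance, combining an eQTL mapping and a distance-based mapping that both contain the AGER-rs1000 pair yields that pair only once.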

Parameters for MSEA

Enter MSEA Parameters

Enter the following MSEA parameter values and then click the submit button.

Permutation type: selects gene-based or marker-based permutation for estimating statistical significance (p-values). Gene-based permutation yields more conservative p-values than marker-based permutation.
Options: Gene or Marker
Default value: Gene
Max Genes in Gene Sets: the maximum number of genes a gene set can contain.
Options: Number between 2 and 10,000; suggested range 200-800
Default value: 500
Min Genes in Gene Sets: the minimum number of genes a gene set can contain.
Options: Number from 2 up to, but less than, Max Genes in Gene Sets
Default value: 20
Min Overlap Allowed for Merging: if the user chooses to merge overlapping gene sets that MSEA finds associated with the disease/trait into supersets, this is the minimum overlap ratio two gene sets must have to be merged.
Options: 0 to 1.0
Default value: 0.33 (33% overlap)
Number of Permutations: the number of gene or marker permutations conducted in the MSEA analysis.
Options: 1,000 to 20,000 (for publication, ≥ 10,000 is recommended)
Default value: 2000
MSEA FDR cutoff: only gene sets whose FDR falls within this cutoff, given in percent, are reported as significant.
Options: 0 to 25
Default value: 25 (25% FDR)
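To show how Number of Permutations and the FDR cutoff interact, the sketch below turns a gene set's observed score and its permutation scores into an empirical p-value, then applies a Benjamini-Hochberg FDR and a cutoff given in percent. Using Benjamini-Hochberg, the +1 correction and the "at or below" comparison are assumptions; the pipeline may estimate FDR differently. All function names are hypothetical.

```python
def empirical_p(observed, null_scores):
    """Permutation p-value with the usual +1 correction."""
    exceed = sum(s >= observed for s in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)

def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q

def significant(module_ids, pvalues, fdr_cutoff_percent=25):
    """Keep modules whose FDR is at or below the cutoff (given in percent)."""
    q = bh_fdr(pvalues)
    return [m for m, qi in zip(module_ids, q) if qi * 100 <= fdr_cutoff_percent]
```

Note that with the default 2000 permutations, the smallest attainable empirical p-value under this scheme is 1/2001, which is one reason more permutations are recommended for publication.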

Gene Sets for MSEA

Select/Upload Gene Sets

The Select/Upload Gene Sets menu offers three sample gene set datasets, described in Table 1. The first option in the menu is for uploading your own gene sets.

MODULE	GENE
rctm001	CDSF4
rctm001	EIF2AK2
M10401	XRCC5
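The MODULE/GENE rows above list one gene per line, so each gene set is the collection of genes sharing a module id. The sketch below groups rows into sets and applies the Min/Max Genes in Gene Sets limits; treating both limits as inclusive is an assumption, and `load_gene_sets` is a hypothetical name. The rows are passed without the header line.

```python
from collections import defaultdict

def load_gene_sets(rows, min_genes=20, max_genes=500):
    """Group (module, gene) rows into gene sets and keep sets within the size limits."""
    sets = defaultdict(set)
    for module, gene in rows:
        sets[module].add(gene)
    # Inclusive size bounds are an assumption of this sketch
    return {m: genes for m, genes in sets.items() if min_genes <= len(genes) <= max_genes}
```

With the default limits of 20 and 500, the three example rows above would produce no gene sets at all, since each module has fewer than 20 genes.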

Gene Sets Description for MSEA

Select/Upload Gene Sets Description

The Select/Upload Gene Sets Description menu offers two sample description files and an option to upload your own gene set description file. The description file describes the gene sets selected in Step 4.

MODULE	SOURCE	DESCR
rctm001	reactome	NS1 Mediated Effects on Host Pathways
M10287	biocarta	fMLP induced chemokine gene expression
M10462	kegg	Adipocytokine signaling pathway
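The description file is keyed by MODULE, which is how a gene set's source and full name can be attached to its enrichment result. A minimal sketch of such a lookup; the result tuples and the choice to keep undescribed modules with empty fields are assumptions, and `describe` is a hypothetical name.

```python
def describe(results, descriptions):
    """Attach SOURCE and DESCR to each (module_id, p_value) result row.

    descriptions: (module, source, descr) tuples, as in the example above.
    Modules without a description get empty strings rather than being dropped.
    """
    lookup = {module: (source, descr) for module, source, descr in descriptions}
    return [(m, p, *lookup.get(m, ("", ""))) for m, p in results]
```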

Enter Email and Run

Enter Email and Submit Job

(Optional) Enter your email address in the text box and press Submit if you would like to receive notification emails when your job starts and completes. The job completion email includes a link to download your results, with the result files attached. We will delete your email address after the job completes, and it will not be used for any further communication.

Job Execution

Wait for your results. Because of the complexity of the integration, your job may take 30 minutes or more. This page will load your results once execution is done. If you want to close your browser, copy the link on this page so you can view your results later. If you provided an email address, this link is also sent to you in the job completion email.

MSEA Pipeline Execution Email Notification

If the user provided an email address, a notification is sent to that address with a link to the results page, which becomes active once the MSEA analysis is complete.

MSEA Pipeline Completion Email Notification

If the user provided an email address, then an email notification is sent to the provided email with a link to the results page upon completion of the MSEA analysis. The results link will remain active for 24 hours. Additionally, the results files will be included in the email as attachments.

Marker Set Enrichment Analysis Results

Display MSEA Results

The Marker Set Enrichment Analysis table lists the significant pathways/modules found to be enriched for your association data at your predefined FDR cutoff. The full result files, containing all information, can be downloaded from this page.

Interpretation of Results

Field Name	Description
Module ID	Module id/gene set id from input gene set
MSEA:P-Value	Set enrichment p-value
MSEA:FDR	False discovery rate for set enrichment
Description	Gene set description
Module Top Genes	Top five genes in the gene set with the lowest p-values for the association study
Module Top Marker	Top five markers in the gene set with the lowest p-values for the association study
Module Top Association Score	Association scores (-log₁₀ p-values) of the five top markers in the gene set
Module Details	A web link that loads the gene set into DAVID for detailed functional annotation
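Because scores are -log₁₀ p-values, the "lowest p-values" in the top genes and markers columns are the highest scores. A sketch of how such top hits could be derived for one module, assuming each gene is represented by its best-scoring mapped marker; `module_top` is a hypothetical name and the scores in the usage note are illustrative only.

```python
def module_top(genes, gene_to_markers, scores, k=5):
    """Return the k strongest (gene, marker, score) hits for one module.

    genes: genes in the module.
    gene_to_markers: {gene: [marker, ...]} from the mapping file.
    scores: {marker: -log10 p} from the association file.
    """
    hits = []
    for gene in genes:
        markers = [m for m in gene_to_markers.get(gene, []) if m in scores]
        if markers:
            # Assumption: a gene is represented by its best-scoring marker
            best = max(markers, key=lambda m: scores[m])
            hits.append((gene, best, scores[best]))
    # Highest -log10 p first, i.e. lowest p-value first
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits[:k]
```

With the GENE/MARKER example mapping and illustrative scores of 0.5 for rs10, 2.0 for rs1000 and 1.0 for rs1000000, the top two hits would be AGER (rs1000) and N4BP2 (rs1000000).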

The Merged Supersets table lists the significant supersets obtained by merging overlapping gene sets among the significant pathways/modules. Merging is performed as part of the MSEA analysis, based on the Min Overlap Allowed for Merging parameter value (Step 3E).
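A minimal sketch of overlap-based merging, assuming the overlap ratio is the number of shared genes divided by the size of the smaller set and that sets are merged greedily until no pair reaches the threshold. Both the ratio definition and the greedy strategy are assumptions, not the pipeline's documented procedure; `overlap_ratio` and `merge_supersets` are hypothetical names.

```python
def overlap_ratio(a, b):
    """Shared genes divided by the size of the smaller set (assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def merge_supersets(gene_sets, min_overlap=0.33):
    """Greedily merge gene sets whose overlap ratio reaches min_overlap.

    gene_sets: {set_id: set_of_genes}.
    Returns a list of (member_ids, merged_genes) pairs.
    """
    groups = [([name], set(genes)) for name, genes in gene_sets.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if overlap_ratio(groups[i][1], groups[j][1]) >= min_overlap:
                    # Fold group j into group i, then rescan from the start
                    groups[i] = (groups[i][0] + groups[j][0], groups[i][1] | groups[j][1])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups
```

With the default 0.33, two 3-gene sets sharing one gene (ratio 1/3) are merged, while a set with no shared genes stays on its own.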

Interpretation of Merged Supersets

Field Name	Description
Merge Module ID	New module id/gene set after merge
Merge Module P-value	Merged set enrichment p-value
Frequency	Equivalent to FDR
Number of Genes	Number of genes in the gene set after merging
Number of Markers	Number of association study markers in the merged module
Density	Number of markers per gene
Overlap	List of overlapping gene sets merged
Description	Functional description of the merge module

IMPORTANT: You can stop here if the MSEA results are all you need. To continue with wKDA, click "Run wKDA" and follow the wKDA Tutorial.
