A Web Server for Multidimensional Data Integration
Marker Set Enrichment Analysis (MSEA)

Overview

This tutorial provides an overview of how to use our integrated MSEA-wKDA pipeline from the [Run MSEA] tab. It walks through the data input and result display steps in the order in which they appear to the user. Example screenshots are included to illustrate many of the steps.

The purpose of the pipeline is to take an association dataset for a given disease or phenotype from the user as input and integrate the association data with functional genomics information, pathways, and gene networks to derive pathways, gene networks and key regulatory genes for the disease or phenotype.

There are two main steps in the analysis: Marker Set Enrichment Analysis (MSEA) [1] and Weighted Key Driver Analysis (wKDA) [2-4]. MSEA aims to identify pathways or gene subnetworks that are enriched for genetic risk of the given disease/trait. wKDA takes the significant pathways and gene subnetworks identified by MSEA and integrates them with network models to identify the key regulators (drivers).

Steps 2-7 require the user to upload datasets or select the predefined datasets needed for integration (detailed in Table 1). Uploaded data must follow the formats indicated below.

Table 1. Descriptions of data categories, format, and preloaded sample files

| Data Category | Data Category Description | Format (Tab Delimited) | Preloaded Sample File Name | Sample File Description | Sample Data References |
|---|---|---|---|---|---|
| Disease Association Data | Marker to trait association | Marker id, -log10 p-value [Example] | Sample GWAS | MDF-corrected LDL GWAS from GLGC | [5] |
| | | | glgc.tc | Total cholesterol GWAS | [6] |
| | | | glgc.tg | Triglyceride GWAS | [6] |
| | | | glgc.ldl | LDL GWAS | [6] |
| | | | glgc.hdl | HDL GWAS | [6] |
| | | | cardiogram_c4d.cad | Coronary artery disease GWAS | [7] |
| | | | diagram.t2d | Type 2 diabetes GWAS | [8] |
| | | | magic.fastingglucose | Fasting glucose GWAS | [9] |
| Marker Mapping Data | Marker to gene mapping | Marker id, gene symbol id [Example] | esnp.all | Combined list of all eQTLs curated from literature | [6-21] |
| | | | esnp.adipose | Adipose eQTLs | [10-13] |
| | | | esnp.blood | Blood eQTLs | [10, 12, 14] |
| | | | esnp.brain | Brain eQTLs | [15-18] |
| | | | esnp.liver | Liver eQTLs | [10, 13, 19] |
| | | | esnp.muscle_skeletal | Skeletal muscle eQTLs | [10] |
| | | | gene2loci.010kb | Human SNP to gene mapping based on a chromosomal distance of 10kb | Null |
| | | | gene2loci.020kb | Human SNP to gene mapping based on a chromosomal distance of 20kb | Null |
| | | | gene2loci.050kb | Human SNP to gene mapping based on a chromosomal distance of 50kb | Null |
| | | | gene2loci.regulome | Human SNP to gene mapping based on RegulomeDB (ENCODE) | [20] |
| | | | all.mapping | Combined list of all the above mappings | Null |
| Gene Sets | Collections of pre-defined sets of genes that are functionally related | Gene symbol id, gene set id [Example] | Canonical pathways | Pathways collected from KEGG, REACTOME and Biocarta | [21, 22] |
| | | | Co-expression modules | Derived from coexpression networks by applying WGCNA on gene expression data | [12, 13, 15-19, 23] |
| Gene Sets Description | Detailed descriptions of gene sets such as the full name of a biological pathway | Gene set id, gene set description [Example] | Canonical pathways | Description of the pathways including pathway name and database source | [21, 22] |
| | | | Co-expression modules | Description includes tissue type for the expression data | [12, 13, 15-19, 23] |
| Gene Regulatory Networks | Network edges from pre-defined gene networks | Source gene id, target gene id, weight [Example] | adipose | Adipose Bayesian networks | [12, 15-19] |
| | | | blood | Blood Bayesian networks | [12] |
| | | | brain | Brain Bayesian networks | [15-18] |
| | | | liver | Liver Bayesian networks | [15-19] |
| | | | muscle | Muscle Bayesian networks | [15-19] |
| | | | PPI | Protein-protein interaction network | [24] |

Association Dataset for MSEA

  1. Select/Upload Association Data

    1. The menu lets the user either select a sample association dataset or upload their own.
    2. IMPORTANT: Press the submit button after selecting your option. An uploaded file must be tab-delimited with two columns, the marker id and its -log10 p-value (see Table 1), for example:
      • MARKER VALUE
        rs4747841 0.1452
        rs4749917 0.1108
        rs737656 1.3979
    3. If the user chooses to upload an association dataset, the user will be redirected to an upload page:
    4. After selecting the appropriate dataset (one is provided in the downloads section), click [Upload File] and make sure you see the "Data Submitted" checkmark:
    5. Click [Back to MSEA] after uploading your input file.
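
The association file format above can be produced from raw p-values with a few lines of Python. The following is a minimal sketch, not part of the web server: the function name `write_association_file` and the input p-values are hypothetical, and the only facts taken from this tutorial are the tab-delimited layout, the MARKER and VALUE column names, and the -log10 transformation.

```python
import csv
import io
import math


def write_association_file(pvalues, handle):
    """Write marker/p-value pairs as a tab-delimited MARKER/VALUE table.

    `pvalues` maps a marker id to its raw association p-value (0 < p <= 1).
    VALUE is stored as -log10(p), so stronger associations get larger scores.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["MARKER", "VALUE"])
    for marker, p in pvalues.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value for {marker} must be in (0, 1], got {p}")
        writer.writerow([marker, f"{-math.log10(p):.4f}"])


# Hypothetical raw p-values, chosen to roughly match the sample rows above.
example = {"rs4747841": 0.7158, "rs4749917": 0.7748, "rs737656": 0.04}
buffer = io.StringIO()
write_association_file(example, buffer)
association_text = buffer.getvalue()
print(association_text)
```

To create a file for upload, pass an open text file instead of the in-memory buffer, for example `open("my_gwas.txt", "w", newline="")` (the file name is a placeholder).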

Marker Mapping File for MSEA

  1. Select/Upload Marker Mapping File

    1. Users can either upload their own mapping files or use the sample mapping files.
    2. If the user chooses to upload a mapping file, the user will be redirected to an upload page:
    3. The Select/Upload Marker Mapping menu offers 14 mapping datasets, which are described in Table 1. The user can select any combination of the mapping files (if more than one is selected, the mapping files are combined).
    4. Note: for GWAS data, the mapping file should already have been corrected for LD (e.g. using MDF) to remove redundant SNPs that are in high LD for each gene. The preloaded mapping files have all been corrected for LD.

      • GENE MARKER
        CDK6 rs10
        AGER rs1000
        N4BP2 rs1000000
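
When several mapping files are selected, they are combined. The tutorial does not say how the server does this, so the sketch below simply assumes a union of gene-marker pairs with duplicates removed. The function names are hypothetical, and the sketch performs no LD correction.

```python
import csv
import io


def read_mapping(handle):
    """Read a tab-delimited GENE/MARKER table into a set of (gene, marker) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["GENE"], row["MARKER"]) for row in reader}


def combine_mappings(handles):
    """Union several mapping tables, dropping duplicate gene-marker pairs."""
    combined = set()
    for handle in handles:
        combined |= read_mapping(handle)
    return sorted(combined)


# Two small in-memory tables standing in for two selected mapping files.
distance_based = io.StringIO("GENE\tMARKER\nCDK6\trs10\nAGER\trs1000\n")
eqtl_based = io.StringIO("GENE\tMARKER\nAGER\trs1000\nN4BP2\trs1000000\n")
combined_mapping = combine_mappings([distance_based, eqtl_based])
print(combined_mapping)
```

The pair `AGER`/`rs1000` occurs in both inputs but appears only once in the combined list.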

Parameters for MSEA

  1. Enter MSEA Parameters

    1. Enter the following MSEA parameter values and then click the submit button.
    2. Permutation type
      Options: Gene or Marker, indicating gene-based permutation or marker-based permutation to estimate statistical significance p-values. Gene-based permutation yields more conservative p-values than marker-based permutation.
      Default value: Gene
    3. Max Genes in Gene Sets: the maximum number of genes a gene set may contain.
      Options: Number between 2 and 10,000; suggested range 200 to 800
      Default value: 500
    4. Min Genes in Gene Sets: the minimum number of genes a gene set may contain.
      Options: Number from 2 up to, but not including, Max Genes in Gene Sets
      Default value: 20
    5. Min Overlap Allowed for Merging: the minimum overlap ratio between two gene sets required to merge them. After MSEA, overlapping gene sets that are associated with the disease/trait are merged into supersets when their overlap meets this ratio.
      Options: 0 to 1.0
      Default value: 0.33 (33% overlap)
    6. Number of Permutations: the number of gene or marker permutations conducted in the MSEA analysis.
      Options: 1000 to 20,000 (at least 10,000 recommended for publication)
      Default value: 2000
    7. MSEA FDR cutoff: gene sets whose FDR falls within this cutoff are reported as significant.
      Options: 0 to 25
      Default value: 25
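
For users who prepare runs programmatically, it can help to check parameter values against the ranges listed above before submitting. The sketch below is illustrative only: the class `MseaParameters` and its field names are hypothetical and not an interface offered by the web server. Only the defaults and ranges come from this tutorial.

```python
from dataclasses import dataclass


@dataclass
class MseaParameters:
    """Hypothetical container mirroring the form fields and defaults above."""

    permutation_type: str = "gene"
    max_genes: int = 500
    min_genes: int = 20
    min_overlap: float = 0.33
    n_permutations: int = 2000
    fdr_cutoff: float = 25.0

    def validate(self):
        """Return a list of problems; an empty list means all values are in range."""
        problems = []
        if self.permutation_type not in ("gene", "marker"):
            problems.append("permutation_type must be 'gene' or 'marker'")
        if not 2 <= self.max_genes <= 10_000:
            problems.append("max_genes must be between 2 and 10,000")
        if not 2 <= self.min_genes < self.max_genes:
            problems.append("min_genes must be at least 2 and below max_genes")
        if not 0.0 <= self.min_overlap <= 1.0:
            problems.append("min_overlap must be between 0 and 1.0")
        if not 1000 <= self.n_permutations <= 20_000:
            problems.append("n_permutations must be between 1000 and 20,000")
        if not 0 <= self.fdr_cutoff <= 25:
            problems.append("fdr_cutoff must be between 0 and 25")
        return problems


default_problems = MseaParameters().validate()
bad_problems = MseaParameters(min_genes=600, n_permutations=500).validate()
print(default_problems)
print(bad_problems)
```

The defaults pass without problems, while the second call reports both the out-of-range minimum gene count and the too-small permutation count.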

Gene Sets for MSEA

  1. Select/Upload Gene Sets

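    1. The Select/Upload Gene Sets menu offers three sample gene set datasets, as described in Table 1. The first option in the menu is for uploading your own gene sets. An uploaded file must be tab-delimited with a gene set id column and a gene symbol column, for example: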
      • MODULE GENE
        rctm001 CDSF4
        rctm001 EIF2AK2
        M10401 XRCC5
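
The Min Genes and Max Genes parameters restrict which gene sets enter the analysis. The sketch below reads a MODULE/GENE table and keeps only sets within the size limits. The function names are hypothetical, and treating both limits as inclusive is an assumption, since the tutorial does not state how boundary cases are handled.

```python
import csv
import io
from collections import defaultdict


def read_gene_sets(handle):
    """Read a tab-delimited MODULE/GENE table into {module id: set of genes}."""
    gene_sets = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        gene_sets[row["MODULE"]].add(row["GENE"])
    return dict(gene_sets)


def filter_by_size(gene_sets, min_genes=20, max_genes=500):
    """Keep gene sets whose size lies within [min_genes, max_genes] (assumed inclusive)."""
    return {
        module: genes
        for module, genes in gene_sets.items()
        if min_genes <= len(genes) <= max_genes
    }


# The three sample rows shown above, as an in-memory file.
sample = io.StringIO("MODULE\tGENE\nrctm001\tCDSF4\nrctm001\tEIF2AK2\nM10401\tXRCC5\n")
gene_sets = read_gene_sets(sample)
# Tiny limits are used here only so the three-row sample shows an effect.
kept = filter_by_size(gene_sets, min_genes=2, max_genes=500)
print(sorted(kept))
```

With these toy limits only `rctm001` is kept, because `M10401` has a single gene in the sample rows.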

Gene Sets Description for MSEA

  1. Select/Upload Gene Sets Description

    1. The Select/Upload Gene Sets Description menu offers two sample description files and an option for uploading your own gene set description file. The description file describes the gene sets selected in the Select/Upload Gene Sets step (Step 4).
      • MODULE SOURCE DESCR
        rctm001 reactome NS1 Mediated Effects on Host Pathways
        M10287 biocarta fMLP induced chemokine gene expression
        M10462 kegg Adipocytokine signaling pathway
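
Because the description file annotates the gene sets chosen in the previous step, it is worth checking that every gene set id has a matching description before uploading. The following is a small sketch with hypothetical function names. It assumes the three columns are separated by tabs, so that descriptions may contain spaces.

```python
import csv
import io


def read_descriptions(handle):
    """Map each MODULE id to its (SOURCE, DESCR) pair from a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["MODULE"]: (row["SOURCE"], row["DESCR"]) for row in reader}


def missing_descriptions(module_ids, descriptions):
    """Return the module ids that have no entry in the description table."""
    return sorted(set(module_ids) - set(descriptions))


# The sample description rows shown above, as an in-memory file.
sample = io.StringIO(
    "MODULE\tSOURCE\tDESCR\n"
    "rctm001\treactome\tNS1 Mediated Effects on Host Pathways\n"
    "M10287\tbiocarta\tfMLP induced chemokine gene expression\n"
    "M10462\tkegg\tAdipocytokine signaling pathway\n"
)
descriptions = read_descriptions(sample)
# Module ids as they might appear in a gene set file.
missing = missing_descriptions(["rctm001", "M10401", "M10462"], descriptions)
print(missing)
```

Here `M10401` is reported as missing because it has no row in the sample description table.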

Enter Email and Run

  1. Enter Email and Submit Job

    1. (Optional) Enter your email address in the text box and press submit if you want to receive notification emails when the job starts and when it completes. The job completion email will also include a link for downloading your results and the results as attachments. We will delete your email address after job completion and will not use it for any further communication.
  2. Job Execution

    1. Wait for your results. Your job may take 30 minutes or more due to the complexity of the integration. This page will load your results once execution is done. If you want to close your browser, copy the link on this page so that you can view your results later. If you have provided your email address, we will also send you this link in the job completion email.
  3. MSEA Pipeline Execution Email Notification

    1. If the user provided an email address, an email notification is sent to that address with a link to the results page, which becomes active once the MSEA analysis is completed.
  4. MSEA Pipeline Completion Email Notification

    1. If the user provided an email address, then an email notification is sent to the provided email with a link to the results page upon completion of the MSEA analysis. The results link will remain active for 24 hours. Additionally, the results files will be included in the email as attachments.

Marker Set Enrichment Analysis Results

  1. Display MSEA Results

    1. The Marker Set Enrichment Analysis Table lists the significant pathways/modules found to be enriched for your association data at your pre-defined FDR cutoff. The user can download the full result files containing all information from this page.
    2. Interpretation of Results
    3. Field Name: Description
      Module ID: Module id/gene set id from the input gene set
      MSEA:P-Value: Set enrichment p-value
      MSEA:FDR: False discovery rate for set enrichment
      Description: Gene set description
      Module Top Genes: Top five genes in the gene set with the lowest p-values in the association study
      Module Top Marker: Top five markers in the gene set with the lowest p-values in the association study
      Module Top Association Score: The five lowest p-values from the association study, given as -log10 values
      Module Details: A web link that loads the gene set into DAVID for detailed functional annotation
    4. The Merged Supersets Table lists the significant supersets obtained by merging overlapping gene sets among the significant pathways/modules. Merging is done as part of the MSEA analysis, based on the Min Overlap Allowed for Merging parameter (Step 3E).
    5. Interpretation of Merged Supersets
    6. Field Name: Description
      Merge Module ID: New module id/gene set after merge
      Merge Module P-value: Merged set enrichment p-value
      Frequency: Equivalent to FDR
      Number of Genes: Number of genes in the gene set after merging
      Number of Markers: Number of association study markers in the merged module
      Density: Number of markers per gene
      Overlap: List of overlapping gene sets merged
      Description: Functional description of the merged module
    7. IMPORTANT: You can stop here if the MSEA results are all you need. To continue with wKDA, click "Run wKDA" and proceed to the wKDA Tutorial.
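
To give an intuition for how the Min Overlap Allowed for Merging parameter shapes the Merged Supersets Table, the sketch below greedily merges gene sets whose overlap ratio reaches the threshold. It does not reproduce the server's merging procedure. The overlap ratio used here (shared genes divided by the size of the smaller set), the greedy strategy, and all function and gene names are assumptions made for illustration.

```python
def overlap_ratio(a, b):
    """Shared genes divided by the size of the smaller set (an assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_overlapping(gene_sets, min_overlap=0.33):
    """Greedily merge gene sets until no pair overlaps by at least `min_overlap`.

    Returns a list of (member ids, merged genes) pairs, one per superset.
    """
    supersets = [([name], set(genes)) for name, genes in gene_sets.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(supersets)):
            for j in range(i + 1, len(supersets)):
                if overlap_ratio(supersets[i][1], supersets[j][1]) >= min_overlap:
                    # Fold set j into set i, then rescan from the start.
                    names = supersets[i][0] + supersets[j][0]
                    genes = supersets[i][1] | supersets[j][1]
                    supersets[i] = (names, genes)
                    del supersets[j]
                    merged = True
                    break
            if merged:
                break
    return supersets


# Toy gene sets with made-up gene names.
toy_sets = {
    "setA": {"G1", "G2", "G3"},
    "setB": {"G2", "G3", "G4"},
    "setC": {"G7", "G8"},
}
supersets = merge_overlapping(toy_sets, min_overlap=0.33)
for members, genes in supersets:
    print(members, sorted(genes))
```

In this toy input, `setA` and `setB` share two of three genes and are merged into one superset, while `setC` shares nothing and stays on its own. Raising `min_overlap` above the shared fraction would keep all three sets separate.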