Mergeomics | A Web server for Identifying Pathological Pathways, Networks, and Key Regulators via Multidimensional Data Integration

Mergeomics is a flexible and streamlined pipeline that is able to retrieve meaningful biological insight from multiple types of omics data using multiple levels of analysis. Users can easily build their workflow based on their specific data and desired analysis. To facilitate one's analysis, we provide many different types of sample files including SNP to gene mapping, linkage disequilibrium files, and biological networks.

The two core functions of Mergeomics are marker set enrichment analysis (MSEA) and key driver analysis (KDA). Depending on the 7data type, there are slightly different considerations for MSEA, and so we have segmented the tutorial based on the specific data from the user. From MSEA to KDA, biological markers from significant marker sets found in MSEA and a network is input into KDA. The user can also run KDA as a first step using a list of markers (i.e genes) (tutorial in 'List(s) of genes' button below).

Sample input files can be found here. Sample outputs can be found here.

In addition to this tutorial, we have also embedded instructions throughout the pipeline workflow itself. Look for this button:

**Please save your session ID which can be copied by clicking on it at the top of the left sidebar so that you may return to your session at a later time or we recommend to enter your email when prompted to receive session information and results! (valid for 48 hours)

For an in-depth tutorial on how one can use Mergeomics, click below on the data type you have.

GWAS

TWAS/PWAS/EWAS/MWAS

Multiple of different or the same omics data

List(s) of genes

Preprocess data: marker dependency filtering (MDF).

Start pipeline. If you have a single GWAS study, click on 'Individual GWAS Enrichment'. A common preprocessing step of GWAS analysis is to correct for SNP dependencies based on linkage disequilibrium. We include marker dependency filtering (MDF) in our GWAS pipeline by default for this step which also maps SNPs to genes. You may skip this step in this workflow by clicking the "Skip MDF " button.

Upload or select association file. A two column file is required with SNPs in the 'MARKER' column and association strength (e.g., -log10 transformed p-values, effect size, etc.) in the 'VALUE' column. Many different GWAS sample files are provided including metabolic, neurological, and psychiatric disorders. When uploading a GWAS file, upload all associations including those that do not reach nominal significance.

File format

MARKER	VALUE
rs4747841	0.1452
rs4749917	0.1108
rs737656	1.3979

Example sample files provided

Type 2 diabetes

Alzheimer's disease

HDLC levels

Upload or select SNP to gene mapping file. SNPs in 'MARKER' column and mappped genes in 'GENE' column (i.e., using eQTLs, distance-based, etc.)

File format

GENE	MARKER
CDK6	rs10
AGER	rs1000
N4BP2	rs1000000

Example sample files provided

20kb Distance Mapping

GTEx tissue-specific eQTLs

RegulomeDB Mapping

Upload or select linkage disequilibrium file. A three column file is needed with SNPs in 'MARKERa' and 'MARKERb' columns and the correlation value in the 'WEIGHT' column. We provide 1000 Genomes linkage disequilibrium files from all 26 populations as sample linkage files. An example of a sample choice is "CEU LD50". LD50 means that it will filter out SNPs with a correlation of 50% or higher. The three letter code refers to the population which can be decoded here.

File format

MARKERa	MARKERb	WEIGHT
rs12565	rs29776	0.611
rs11804	rs29776	1
rs12138	rs12562	0.575

Example sample files provided

1000 Genomes LD Files

Choose top percentage of associations. Adjust accordingly based on the size of your GWAS. For example, try using 100 for small GWAS studies and 20 for large GWAS studies.

Review files and submit MDF job. Click 'Click to Review' to see files and parameters chosen. Enter an email and click 'Send Email' to receive emails when the job starts and when the job ends with a link to return to your session to continue onto MSEA. Check the spam folder if the email is missing. Then click 'Run MDF Pipeline' to submit the job. Depending on the size of the inputs, the analysis can range from 5 minutes to 30 minutes.

Run marker set enrichment analysis (MSEA)

Retrieve MDF results and continue to MSEA. The corrected association file, subsetted mapping file, and a review of the chosen files and parameters will be available for download. To continue to MSEA, click 'Run MSEA Pipeline'.

Select/upload gene sets. These are the gene sets that will be tested for association to the disease. Gene sets can be knowledge-driven canonical pathways or data-driven coexpression modules.

File format

MODULE	GENE
Cell cycle	CDC16
Cell cycle	ANAPC1
WGCNA Brown	XRCC5

Example sample files provided

Canonical pathways (KEGG, Reactome, BioCarta)

WGCNA Coexpression Modules

MEGENA Coexpression Modules

(Optional) Select/upload gene sets descriptions. An optional file to include in order to annotate modules in results files. The DESCR column has a full description of the MODULE. Minimum columns needed are MODULE and DESCR. If selecting a sample gene set, the descriptions file will be added automatically.

File format

MODULE	SOURCE	DESCR
Cell cycle	KEGG	Mitotic cell cycle progression is accomplished through a reproducible sequence of events - S, M, G1, and G2 phases.
WGCNA Brown	WGCNA Liver Coexpression Module	Immune function
Proteasome Pathway	BioCarta	https://www.gsea-msigdb.org/gsea/msigdb/cards/ BIOCARTA_PROTEASOME_PATHWAY

Choose MSEA parameters. Default parameters are recommended settings. A description of each parameter can be viewed upon clicking on the 'Click For Tutorial' button. Max Overlap for Merging Gene Mapping is the overlap ratio threshold for merging of genes with shared SNPs. Min Module Overlap Allowed for Merging is the minimum overlap ratio for which a module will remain independent. Modules with overlap ratios above this value will be merged. Number of Permutations: For formal analysis, 10,000 permutations should be used, and 2,000 can be set for exploratory analysis. MSEA to KDA export FDR cutoff: Modules with an FDR less than this cutoff will be used for key driver analysis (KDA). If no modules pass this significance, the top 10 pathways regardless of FDR will be export to KDA. The user must interpret the results accordingly.

Review files/parameters and submit MSEA job. Click 'Click to Review' to see files and parameters chosen. Click 'Run MSEA Pipeline' to submit the job. Depending on the size of the inputs and number of permutations, the analysis can range from 5 minutes to 2 hours. To speed up computation time, decrease the number of permutations. Click on the 'Click to see runtime joblog' toggle to see the job progress in real time. This will be available for download on the results page.

Retrieve MSEA results. At the conclusion of MSEA, a download results table with appear and if an email address was entered, these results will also be emailed to that address. The Module Details file lists genes mapped from SNPs of the modules and the association strengths of the SNPs (as given in the association input file). The Modules Results Summary file reports module significance and number of genes and markers that contributed to the module. The Merged Modules Results Summary file is the full results for supersets (similar modules merged) and independent modules (not similar to any other modules). The Merged Modules Nodes for KDA file lists the nodes or gene sets that will be used for KDA. The website displays individual module results and merged modules results (pictured is the display of individual module results and users can toggle to the 'Merge Module Results'). Below is the interpretation of those tables.

MSEA Results Interpretation

Field Name	Description
Module ID	Module id/gene set id from input gene set
MSEA:P-Value	Set enrichment p-value
MSEA:FDR	False discovery rate for set enrichment
Description	Gene set description, if included
Module Top Genes	Top five genes in the gene set with the highest values (e.g. lowest p-values) for the association study
Module Top Marker	Top five SNPs in the gene set with the highest values for the association study
Module Top Association Score	Top five highest values for the association study

Merged Modules Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Merge Module P-value	Merged set enrichment p-value
Frequency	Equivalent to FDR
Number of Genes	Number of genes in the gene set after merging
Number of Markers	Number of association study markers in the merged module
Density	Number of markers per gene
Description	Functional description of the merged module

Choose to run wKDA or PharmOmics analysis (optional). Users can stop their analysis at MSEA or continue to wKDA (go to step 14) or PharmOmics (go to step 20) if there are significant results from MSEA (FDR less than 0.05 or 0.25 are the recommended significance levels). User can choose one route and still run the other by opening the MSEA toggle and clicking on the other analysis.

Run MSEA to key driver analysis (KDA)

Select/upload network file. The network file describes molecular connections. In a directed network, the source is in the 'HEAD' column and the target is in the 'TAIL' column. The network need not be directed or have weights (the 'WEIGHT' column can have all '1's).

File format

HEAD	TAIL	WEIGHT
A1BG	SNHG6	1
A1BG	UNC84A	1
A1CF	KIAA1958	1

Example sample files provided

Tissue-specific gene regulatory networks

Protein-protein interaction network

Transcription factors and targets

Choose KDA parameters. The search depth defines the distance (number of connections) away from a potential key driver that will be considered its local network neighborhood. A search depth of 1 is recommended but can be increased for sparse networks or small input gene lists. The edge type can be either 'Undirected' or 'Directed'; the former means that directionality (source and target designation) is not considered, and the latter means it is considered. The min overlap is the threshold above which hubs will be designated as co-hubs. The edge factor is the degree of influence of a edge weight. 0 is no influence (all weights are equal), 0.5 is partial influence, and 1 is full influence.

Review KDA files/parameters and submit job. Click 'Click to Review' to see files and parameters chosen. Click 'Run wKDA Pipeline' to submit the job. Depending on the size of the inputs, the analysis can range from 10 minutes to 2 hours.

KDA Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Key Driver Node	Key driver genes in the merged module
P-value	Enrichment p-value for key driver genes
FDR	False discovery rate for the enrichment value of the key driver node
Module Genes	Total number of nodes in the gene module
KD Subnetwork Genes	Total number of genes in key driver subnetwork
Module and Subnetwork Overlap	Number of key driver neighbors that are members of the module
Fold Enrichment	Enrichment of KD subnetwork genes within the gene module

Run KDA to PharmOmics

18	(Optional) Select inputs for KDA to PharmOmics analysis. The user can conclude analysis at KDA or run drug based repositioning on KDA subnetwork results. Choose to run overlap based drug repositioning (around 5 min to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose all genes from the generated subnetwork, only genes from input modules (i.e. gene sets) in the subnetwork (not including subnetwork genes that are not members of input modules), or the user can choose specific modules to include. For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources.
19a	Retrieve PharmOmics results from overlap based drug repositioning. Image on the right shows results from the overlap based drug repositioning option. The results ranks the drugs based on concordance between the KDA subnetwork genes and the drug signatures genes (Jaccard score). Data from our comprehensive species- and tissue-specific drug signatures are used as well as L1000 signatures. The drug repositioning results and the genes used for repositioning can be obtained from the download links.
19b	Retrieve PharmOmics results from network based drug repositioning. Image on the right shows results from the network based drug repositioning option. The results ranks the drugs based on the significance of connectivity between drug signature genes and input genes as defined by a gene network model. Signatures from our comprehensive species- and tissue-specific drug signatures database are used. The drug repositioning results and the genes used for repositioning can be obtained from the download links. We also provide links to visualize the network model of drug genes and their first neighbor input genes (input genes having one edge distance from drug gene).

Run MSEA to PharmOmics

Select input modules and drug repositioning type. Choose to run overlap (around 5 min to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose to run all modules below a specified significance threshold or select specific modules. Finally, choose whether to use genes from the entire geneset or only those genes that were mapped from SNPs (genes derived from association data). For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources. Please refer to steps 19a and 19b to read interpretation of PharmOmics results.

The workflow for epigenome-, transcriptome-, proteome-, and metabolome-wide association studies is similar to GWAS except that it does not include a mapping file or marker dependency filtering by default though it is included as optional steps in this pipeline. This is because TWAS and PWAS markers do not need to be mapped to genes (if gene sets are to be tested). However, we leave the options to include a mapping file and run marker dependency filtering as epigenome markers such as CpG sites may be mapped to genes and also may have dependencies. Currently, we do not provide any metabolite marker sets or metabolite to gene mapping files, but the user can provide these files.

Run marker set enrichment analysis (MSEA).

Start pipeline. If you have a single TWAS, EWAS, PWAS, or MWAS study, click on 'Individual EWAS, TWAS, PWAS, or MWAS enrichment' on the Run Mergeomics page.

Upload or select association file. A two column file is required with markers (i.e. genes, methylation sites, etc.) in the 'MARKER' column and association strength (e.g., -log10 transformed p-values, fold change, etc.) in the 'VALUE' column. You may first upload all associations including those that do not reach nominal significance and adjust the cutoff of signals to include accordingly (top 50%, 20%, etc.).

File format

MARKER	VALUE
C1QA	5.348
GFAP	1.907
JUNB	0.425

Example sample files provided

Example EWAS

Example TWAS (DEGs set)

(Optional) Upload/select mapping file. If your markers do not match those of the marker sets to be enriched, then select 'Yes' to the question 'Would you like to use a mapping file?' and upload or select a mapping file. For example, we provide CG methylation probe ID to gene mappings and gene sets can subsequently be tested for enrichment. Transcriptome and proteome data usually do not require a mapping file. The required file format is pictured to the right ('MARKER' and 'GENE' columns).

(Optional) Upload marker dependency file. If there are dependencies between markers that may lead to spurious associations, a dependency file can be uploaded and if both constituents of any of the marker pairs in the dependency file is detected, the marker with the highest association value will be kept. This option only appears if a mapping file is used.

Select/upload gene sets. These are the gene sets that will be tested for association to the disease. Gene sets can be knowledge-driven canonical pathways or data-driven coexpression modules.

File format

MODULE	GENE
Cell cycle	CDC16
Cell cycle	ANAPC1
WGCNA Brown	XRCC5

Example sample files provided

Canonical pathways (KEGG, Reactome, BioCarta)

WGCNA Coexpression Modules

MEGENA Coexpression Modules

(Optional) Select/upload gene sets descriptions. An optional file to include in order to annotate modules in results files. The DESCR column has a full description of the MODULE. Minimum columns needed are MODULE and DESCR. If a sample gene set is chosen, the descriptions will be added automatically.

File format

MODULE	SOURCE	DESCR
Cell cycle	KEGG	Mitotic cell cycle progression is accomplished through a reproducible sequence of events - S, M, G1, and G2 phases.
WGCNA Brown	WGCNA Liver Coexpression Module	Immune function
Proteasome Pathway	BioCarta	https://www.gsea-msigdb.org/gsea/msigdb/cards/ BIOCARTA_PROTEASOME_PATHWAY

Choose MSEA parameters. Default parameters are recommended settings. A description of each parameter can be viewed upon clicking on the 'Click For Tutorial' button. For EWAS/TWAS/PWAS/MWAS data, the permutation type is set to "marker" and Max Overlap for Merging Gene Mapping is set to 1 (means no merging) (only applies to GWAS data). Min Module Overlap Allowed for Merging is the minimum overlap ratio for which a module will remain independent. Modules with overlap ratios above this value will be merged. Set to 1 to skip module merging. Number of Permutations: For formal analysis, 10,000 permutations should be used, and 2,000 can be set for exploratory analysis. MSEA to KDA export FDR cutoff: Modules with an FDR less than this cutoff will be used for key driver analysis (KDA). If no modules pass this significance, the top 10 pathways regardless of FDR will be export to KDA. The user must interpret the results accordingly.

MSEA Results Interpretation

Field Name	Description
Module ID	Module id/gene set id from input gene set
MSEA:P-Value	Set enrichment p-value
MSEA:FDR	False discovery rate for set enrichment
Description	Gene set description, if included
Module Top Genes	Top five genes in the gene set with the highest values (e.g. lowest p-values) for the association study
Module Top Marker	Top five SNPs in the gene set with the highest values for the association study
Module Top Association Score	Top five highest values for the association study

Merged Modules Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Merge Module P-value	Merged set enrichment p-value
Frequency	Equivalent to FDR
Number of Genes	Number of genes in the gene set after merging
Number of Markers	Number of association study markers in the merged module
Density	Number of markers per gene
Description	Functional description of the merged module

Choose to run wKDA or PharmOmics analysis (optional). Users can stop their analysis at MSEA or continue to wKDA (go to step 11) or PharmOmics (go to step 17) if there are significant results from MSEA (FDR less than 0.05 or 0.25 are the recommended significance levels). Users can choose one route and still run the other by opening the MSEA toggle and clicking on the other analysis.

Run MSEA to key driver analysis (KDA)

File format

HEAD	TAIL	WEIGHT
A1BG	SNHG6	1
A1BG	UNC84A	1
A1CF	KIAA1958	1

Example sample files provided

Tissue-specific gene regulatory networks

Protein-protein interaction network

Transcription factors and targets

Choose KDA parameters. The 'search depth' defines the distance (number of connections) away from a potential key driver that will be considers its local network neighborhood. A search depth of 1 is recommended but can be increased for sparse networks or small input gene lists. The 'edge type' can be either 'Undirected' or 'Directed'; the former means that directionality (source and target designation) is not considered and the latter means it is considered. The 'min overlap' is the threshold above which hubs will be designated as co-hubs. The 'Edge factor' is the degree of influence of a edge weight. 0 is no influence (all weights are equal), 1 is full influence, and 0.5 is partial influence.

KDA Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Key Driver Node	Key driver genes in the merged module
P-value	Enrichment p-value for key driver genes
FDR	False discovery rate for the enrichment value of the key driver node
Module Genes	Total number of nodes in the gene module
Module and Subnetwork Overlap	Number of key driver neighbors that are members of the module
Fold Enrichment	Enrichment of KD subnetwork genes within the gene module

Run MSEA to PharmOmics

15	(Optional) Select inputs for KDA to PharmOmics analysis. The user can conclude analysis at KDA or run drug based repositioning on KDA subnetwork results. Choose to run overlap based drug repositioning (around 5 minutes to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose all genes from the generated subnetwork, only genes from input modules (i.e., gene sets) in the subnetwork (not including subnetwork genes that are not members of input modules), or the user can choose specific modules to include. For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources.
16a	Retrieve PharmOmics results from overlap based drug repositioning. Image on the right shows results from the overlap based drug repositioning option. The results ranks the drugs based on concordance between the KDA subnetwork genes and the drug signatures genes (Jaccard score). Data from our comprehensive species- and tissue-specific drug signatures are used as well as L1000 signatures. The drug repositioning results and the genes used for repositioning can be obtained from the download links.
16b	Retrieve PharmOmics results from network based drug repositioning. Image on the right shows results from the network based drug repositioning option. The results ranks the drugs based on the significance of connectivity between drug signature genes and input genes as defined by a gene network model. Signatures from our comprehensive species- and tissue-specific drug signatures database are used. The drug repositioning results and the genes used for repositioning can be obtained from the download links. We also provide links to visualize the network model of drug genes and their first neighbor input genes (input genes having one edge distance from drug gene).

Run MSEA to PharmOmics

Select input modules and drug repositioning type. Choose to run overlap (around 5 min to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose to run all modules below a specified significance threshold or select specific modules. For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources. Please refer to steps 16a and 16b to read interpretation of PharmOmics results.

In meta-MSEA, a marker set level meta analysis is done for any combination of multiple omics studies (multiple of the same type or different types). MSEA is run separately on each dataset and a meta p-value is calculated across the datasets.

Add datasets and run meta-MSEA

1	Start pipeline. If you have multiple omics studies (multiple of the same type or multiple of different types) to analyze and would like to run a marker set level meta analysis, click on 'Multiple Omics Datasets (GWAS, EWAS, TWAS, PWAS, MWAS) enrichment'.
2	Choose type of association data to add. According to your data type, select 'GWAS Enrichment' or 'EWAS/TWAS/PWAS/MWAS Enrichment'. This separation exists because the preprocessing of and parameters for GWAS and EWAS/TWAS/PWAS/MWAS data for MSEA are slightly different.
3	Select/upload files and parameters for MSEA for each dataset. The workflow for adding files and parameters for GWAS and EWAS/TWAS/PWAS/MWAS data is the same as in the individual pipelines. Refer to the 'Run marker set enrichment analysis section' for GWAS and for EWAS/TWAS/PWAS/MWAS. You may also refer to the 'Data input details' and 'Parameters details' sections below.
4	Select parameters for MSEA for each dataset. "Gene" permutation is recommended for GWAS and "Marker" is the only option for EWAS/TWAS/PWAS/MWAS. "Max Overlap for Merging Gene Mapping" is the overlap ratio threshold for merging genes with shared markers (SNPs). This applied only to GWAS data and is set to 1 for EWAS/TWAS/PWAS/MWAS data. The "MSEA to KDA export FDR cutoff" is the cutoff for the individual MSEA to be considered for KDA. Set the value to 100 to have no cutoff apply for the specific dataset. The module/gene set will still need to pass the Meta-MSEA FDR cutoff.
5	Review datasets. Each dataset addition with be appended to this review table. Added datasets can be deleted and more datasets can be added by clicking on 'Add another association data'. The minimum number of datasets is 2 and the maximum is 5. Because MSEA is run on mutiple datasets, the runtime may be very long (an individual MSEA run is usually 5 minutes to 30 minutes). Runtime can be shortened by decreasing number of permutations.
6	Select marker sets and parameters for Meta-MSEA.. "Min Module Overlap Allowed for Merging" is the minimum gene overlap ratio between modules that will have them merged (to merge redundant modules). We recommend the default value. "MSEA to KDA export Meta FDR cutoff" is the meta FDR cutoff to consider modules/gene sets for KDA. The modules must pass all individual MSEA cutoffs as well as the meta FDR cutoff to be included in KDA. To run meta-MSEA, click on 'Run multiple omics datasets enrichment'.
7	Retrieve meta-MSEA results. Result files available for download include the 'Modules Details' file which lists the genes that contributed to the module's association, the 'Full Results File' which lists for each module the P and FDR values of association significance and the number of genes that contributed to the module's association. The 'Merged Modules Full Results' file shows the results for nonredundant sets (similar gene sets are merged based on 'Min Module Overlap Allowed for Merging' parameter). The 'Merged Modules' file contains the genes that will be used as input to KDA if the user would like to run wKDA directly using the 'Run wKDA Pipeline' button. The 'Meta MSEA Individual Study P and FDR Results' file lists for each individual MSEA the association P and FDR values. Each dataset has an ID which can be decoded with the "Meta MSEA File and Parameter Selection" File. The 'Individual MSEA Result Files' is a zip file containing full MSEA results for each study. Interpretation for the interactive result tables is below. You may conclude your analysis at MSEA or continue to KDA (step 6). Please note that the default meta FDR cut off for associated modules from meta-MSEA is 25% (MSEA to KDA export FDR cutoff parameter). If you would like to be more stringent or lenient in your inclusion of modules, you may either edit the 'Merged Modules' file to only include your desired modules according to the significance results and use them in the standalone wKDA pipeline or rerun the analysis with a more stringent cutoff. PharmOmics can also be run on results from MSEA by clicking on 'Run PharmOmics Pipeline' (go to step 14).

MSEA Results Interpretation

Field Name	Description
Module ID	Module id/gene set id from input gene set
MSEA:P-Value	Set enrichment p-value
MSEA:FDR	False discovery rate for set enrichment
Description	Gene set description, if included
Module Top Genes	Top five genes in the gene set with the highest values (e.g. lowest p-values) for the association study
Module Top Marker	Top five SNPs in the gene set with the highest values for the association study
Module Top Association Score	Top five highest values for the association study

Merged Modules Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Merge Module P-value	Merged set enrichment p-value
Frequency	Equivalent to FDR
Number of Genes	Number of genes in the gene set after merging
Number of Markers	Number of association study markers in the merged module
Density	Number of markers per gene
Description	Functional description of the merged module

Combined Results Interpretation

Field Name	Description
Module	Module name
Description	Module description
P.values	P Values for individual MSEA studies (P-value=1 if study missing)
FDR.values	FDR Values for individual MSEA studies (FDR=1 if study missing)
Meta P	Meta P Value
Meta FDR	Meta FDR Value

Run Meta-MSEA to key driver analysis (KDA)

File format

HEAD	TAIL	WEIGHT
A1BG	SNHG6	1
A1BG	UNC84A	1
A1CF	KIAA1958	1

Example sample files provided

Tissue-specific gene regulatory networks

Protein-protein interaction network

Transcription factors and targets

KDA Results Interpretation

Field Name	Description
Merge Module ID	New module id/gene set after merge
Key Driver Node	Key driver genes in the merged module
P-value	Enrichment p-value for key driver genes
FDR	False discovery rate for the enrichment value of the key driver node
Module Genes	Total number of nodes in the gene module
Module and Subnetwork Overlap	Number of key driver neighbors that are members of the module
Fold Enrichment	Enrichment of KD subnetwork genes within the gene module

Run KDA to PharmOmics

12	(Optional) Select inputs for KDA to PharmOmics analysis. The user can conclude analysis at KDA or run drug based repositioning on KDA subnetwork results. Choose to run overlap (around 5 min to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose all genes from the generated subnetwork, only genes from input modules (i.e. gene sets) in the subnetwork (not including subnetwork genes that are not members of input modules), or the user can choose specific modules to include. For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources.
13a	Retrieve PharmOmics results from overlap based drug repositioning. Image on the right shows results from the overlap based drug repositioning option. The results ranks the drugs based on concordance between the KDA subnetwork genes and the drug signatures genes (Jaccard score). Data from our comprehensive species- and tissue-specific drug signatures are used as well as L1000 signatures. The drug repositioning results and the genes used for repositioning can be obtained from the download links.
13b	Retrieve PharmOmics results from network based drug repositioning. Image on the right shows results from the network based drug repositioning option. The results ranks the drugs based on the significance of connectivity between drug signature genes and input genes as defined by a gene network model. Signatures from our comprehensive species- and tissue-specific drug signatures database are used. The drug repositioning results and the genes used for repositioning can be obtained from the download links. We also provide links to visualize the network model of drug genes and their first neighbor input genes (input genes having one edge distance from drug gene).

Run Meta-MSEA to PharmOmics

Select input modules and drug repositioning type. Choose to run overlap (around 5 min to run) or network based drug repositioning (around 30 minutes to 2 hours to run). Next, choose to run all modules below a specified significance threshold or select specific modules. For drug network repositioning, you additionally need to select or upload a network and select the species. Also for network drug repositioning, we offer only the option to query meta signatures (studies using different dose and treatment durations are meta-analyzed). You may use your results in our separate PharmOmics pipeline to run the dose/time segregated analysis which will take around 3 hours to complete and will require a login due to heavy use of computational resources. Please refer to steps 11a and 11b to read interpretation of PharmOmics results.

For a list of genes with no corresponding association values, there are a number of analyses that can be run to derive biological meaning. One can test whether this set or sets of genes are enriched for an GWAS, TWAS, PWAS, or EWAS study. One can also use this list of genes as input for key driver analysis to see whether these genes are enriched in neighbors of key regulatory genes. Click below to see a tutorial of the different options.

Option 1 - Key driver analysis

Start pipeline. Click on 'Weighted Key Driver Analysis'.

Upload OR copy and paste genes/nodes. From the 'Please select option' drop down menu, click on 'Upload Gene Sets' to upload a tab delimited gene sets file with a 'MODULE' 'NODE' header where the 'MODULE' denotes the gene set name and 'NODE' contains the corresponding genes. If you want to test just one gene list, the'MODULE' column can have a single arbitrary name or you can click on the 'Input single list of genes' option to drag and drop a text file or click to copy and paste a list of genes (shown on the picture on the right). In the next step, a gene sets description file can be uploaded.

File format

HEAD	TAIL	WEIGHT
A1BG	SNHG6	1
A1BG	UNC84A	1
A1CF	KIAA1958	1

Select parameters for key driver analysis. The search depth defines the distance (number of connections) away from a potential key driver that will be considered its local network neighborhood. A search depth of 1 is recommended. The edge type can be either 'Undirected' or 'Directed'; the former means that directionality (source and target designation) is not considered, and the latter means it is considered. The min overlap is the threshold above which hubs will be designated as co-hubs. The Edge factor is the degree of influence of a edge weight. 0 is no influence (all weights are equal)1 is full influence, and 0.5 is partial influence.

Retrieve KDA results. At the conclusion of KDA, key driver results files and cytoscape files are generated. The 'Key Drivers Results' file lists for each key driver its P and FDR values, the module of which it is a key driver for, the total number of neighbors ('N.neigh'), the number of neighbors that are members of the module ('N.obsrv'), and the expected (null) number of neighbors that are members of the module as calculated by permutation. The cytoscape files can be used on Cytoscape Desktop for the user to customize the network visualization. Cytoscape visualization of the top 5 key drivers for each module can be viewed on the browser by clicking on 'Display KDA subnetwork'. The analysis can be concluded at this step or subnetwork genes can be used for drug repositioning in the PharmOmics pipeline by clicking on 'Run PharmOmics pipeline'. In the interactive result table, the 'Merge Module ID' is the module name after merging redundant modules, 'Key Driver Node' is the key driver gene for the module, the 'P-value' is the enrichment p-value of the key driver gene's neighbors for module members, FDR is the false discovery rate for the enrichment value, 'Module Genes' is the total number of nodes in the gene module, 'KD Subnetwork Genes' is the total number of genes in the key driver subnetwork, 'Module and Subnetwork Overlap' is the number of key driver neighbors that are members of the module, and 'Fold Enrichment' is the enrichment of KD subnetwork genes within the gene module.

Option 2 - Test enrichment for disease associations

Prepare gene sets file. Arrange your gene set(s) in a two column file with columns 'MODULE' and 'GENE'. If you have just a few gene sets or one gene set to test, it is advised to add them to a larger gene set list. You may download the sample canonical pathways (KEGG, Reactome, and BioCarta) from our resources page and add your geneset. The gene sets file format is shown on the right.

File format

MODULE	GENE
GeneSetOfInterest	CDC16
GeneSetOfInterest	EIF2AK2
GeneSetOfInterest	XRCC5

Choose pipeline to test gene sets in MSEA. Choose pipeline of interest depending on which disease study (GWAS, EWAS/TWAS/PWAS/MWAS, or multiple studies) you would like to test for association of your gene set. We provide many example GWAS files.

Upload gene sets. Upload your custom gene sets in the 'Marker Sets' input and select/upload remaining files and parameters (refer to embedded tutorials in the pipeline or the GWAS and EWAS/TWAS/PWAS/MWAS pipeline tutorials).

Retrieve MSEA results. In the MSEA results, you can see the degree of enrichment of your gene set(s). P-values, FDRs, and number of genes ('NGENES') from the association data contributed to the enrichment of the module are recorded. In the 'Module Details' file, the genes that contributed to the module's enrichment is recorded.

Option 3 - PharmOmics network or overlap drug repositioning

We have developed PharmOmics, a drug repositioning tool that utilizes an extensive curation of species- and tissue-specific in vivo and in vitro drug signatures from GEO, ArrayExpress, TG-GATEs, and drugMatrix. We offer network drug repositioning (App 2) and overlap-based drug repositioning (App 3) which also includes L1000 signatures. You may test your gene list for concordance with drug signatures in the PharmOmics pipeline. As with our MSEA pipeline, the tutorial is embedded in the pipeline, but you can also view more detail in the PharmOmics tutorial.

Data input details

All files are recommended to be in UTF-8/ASCII encoded format.

Association data (GWAS, EWAS, TWAS, PWAS): Mergeomics uses as input summary statistics of association studies. For GWAS/EWAS, summary statistics is usually freely available or can be requested. Users should download the entire file and extract the SNP/Epigenetic marker and P value columns, then -log10 transform the P values so that higher values indicate higher association. These two columns should then be labeled 'MARKER' and 'VALUE', respectively. For TWAS/PWAS data, in the 'MARKER' column is the gene/protein and in the 'VALUE' column it can be -log10 P value or fold change (differentially expressed genes and proteins). Importantly, the whole data should be input, not just nominally significant values
Gene sets ("marker sets"): These can be from canonical pathways or coexpression modules from WGCNA and MEGENA where the gene set name goes into the 'MODULE' column and the corresponding genes into the 'GENE' column. For example, for WGCNA, the output is multiple files for each module name (a color), each of which contains all the genes in the given module. The user will need to combine these files to form one tab delimited two-column .txt file, where the ‘MODULE’ column contains the WGCNA module names, e.g. Brown, and the ‘GENE’ column contains the constituent genes of each module. In other words, each gene in the ‘GENE’ column has its corresponding module name/color in the adjacent ‘MODULE’ column. If uploading a gene set, an option to upload gene sets descriptions will appear. This input is completely optional and links a description to the module if the user desires.
Networks: Networks can include gene regulatory, protein-protein interaction, transcription factor-target, and others. The file will need to contain the following columns: “HEAD”, “TAIL”, and “WEIGHT”. “HEAD” and “TAIL” indicate the pairs of connected genes by network edges in the network and for directed networks they represent source and target, respectively. “WEIGHT” refers to a measure of strength or confidence of the edge and every value can be set to “1” for an unweighted network.

Parameters details

Marker dependency filtering (MDF)

Linkage disequilibrium (LD) threshold: We recommend either 50% or 70% LD threshold and also recommend the user try at both 50% and 70%.
Top percentage of associations: We recommend 50%, but this can be reduced to 20% for larger studies or increased to 100% for smaller studies. We recommend the user to tune this parameter to see what fits best for their data.

Marker set enrichment analysis (MSEA)

Permutation type: "Gene" is recommended for GWAS to reduce bias from many markers (SNPs) mapping to the same gene. This is more stringent, however, and the user can still choose to run marker (SNP) based permutation. For EWAS/TWAS/PWAS/MWAS, "Marker" is the only permutation type that can be used as the above scenario does not apply.
Max Overlap for Merging Gene Mapping: This is the overlap ratio threshold for merging genes with shared markers (SNPs). Over this overlap ratio, the genes will be merged. We recommend the default value. This consideration does not apply to EWAS/TWAS/PWAS/MWAS so only 1 can be used which indicates no merging.
Min/Max Genes in Gene Sets: These parameters specify what is the minimum and maximum number of genes to be included in the analysis (will filter out modules with gene number < minimum and > maximum).
Max Overlap for Merging Gene Mapping: This is the overlap ratio threshold for merging genes with shared markers (SNPs). Over this overlap ratio, the genes will be merged. We recommend the default value. This consideration does not apply to EWAS/TWAS/PWAS/MWAS so only 1 can be used which indicates no merging.
Min Module Overlap Allowed for Merging: This is the minimum gene overlap ratio between modules that will have them merged (to merge redundant modules). For instance, for the default value of 0.33, the modules need to have an overlap ratio of 0.33 or greater to be merged. For less merging (more independent modules), a higher overlap threshold can be set (e.g., 0.5) and for more merging (less independent modules), a lower ratio can be set (e.g., 0.2). Set a value of 1 to skip module merging.
Number of permutations: We set the default to 2000 for an exploratory analysis and recommend this value to start with as the run time may be significantly increased with a higher permutation number. We recommend 10,000 for formal analysis.
MSEA to KDA export FDR cutoff: The parameter is for exporting results to KDA. Modules/gene sets with FDR less than this cutoff will be used for KDA. We recommend using 5 (FDR<0.05) for formal analysis but set the default to 25 (FDR<0.25) for exploratory analysis. If no modules pass this threshold, then the top 10 pathways will be used for KDA but please note if this is the case and interpret results accordingly.

Key Driver Analysis (KDA)

Search depth: Determines the search distance used to define the key driver's local network neighborhood (1 refers to 1 connection away). We recommend 1 but can be increased to 2 or 3 for more sparse networks or for smaller input gene lists.
Edge Type: If users would like to consider directionality (key driver gene must be upstream of target genes), choose "Directed", otherwise choose "Undirected".
Min Hub Overlap: The ratio threshold above which hubs will be considered co-hubs. Default value is recommended.
Edge factor: The degree of influence of the weights (in the 'WEIGHT' column). 1 is full influence, 0.5 is partial influence, and 0 is no influence. If an unweighted network is uploaded, users should choose 0.

Video Tutorials

Overview

File upload

Individual GWAS enrichment

Individual EWAS, TWAS, PWAS, or MWAS enrichment

Weighted Key Driver Analysis

To see descriptions on how the pipeline can be used, click on the use cases below.

Use case	Data	Analysis
1. What are the genetic mechanisms of autism spectrum disorder (ASD)? What are the key regulators of those gene sets associated with ASD? What drugs can be used to target these disease networks?	GWAS	MDF MSEA KDA PharmOmics

MDF

GWAS summary statistics can be pulled from a study on ASD (detailing SNPs and their association values, i.e. p-values). The file format for input into MSEA is a two column tab delimited text file with two columns: 'MARKER' and 'VALUE'. SNPs occupy the 'MARKER' value and any measure of association strength can be put into the 'VALUE' column (most commonly used is -log10 transformed p-values but another measure such as effect size can be used). A higher value should indicate stronger association (hence -log10 transformed p-values). A recommended preprocessing step for GWAS is marker dependency filtering (MDF) which corrects for linkage disequilibrium. We offer the 1000 Genomes linkage files and will match the linkage population to the population that was tested in the GWAS (e.g. CEU is central europeans; the population codes can be accessed here). To enrich SNPs for gene sets, a SNP to gene mapping file is used such as expression quantitative trait loci (eQTLs), and we choose to use brain tissue-specific eQTLs from GTEx. If running MDF, another parameter is to choose the percent top associations. We recommend 50% generally, but can be adjusted to 100% for small studies or 25% for large studies. If skipping MDF, the user will have to adjust the top percentage of associations to include before uploading their association data.

MSEA

After results are produced by MDF, we choose the inputs and parameters for MSEA. "Gene" based permutation is chosen to avoid bias from multiple markers mapping to the same gene (gene set could have high significance from a large amount of markers but those markers could map to just one gene). We may choose "marker" based permutation as an alternate, more lenient analysis. Since this is a an exploratory analysis, 1000 permutations are chosen and for formal analysis, 10000 number of permutations will be set which may take longer. Since we want to focus on the most significant pathways in the KDA, a MSEA to KDA export FDR cutoff of 5 is chosen (5% FDR). We leave the other parameters as the default values as this is recommended. More details of these parameters can be found above in the 'Parameters details' section. Finally, the marker sets (i.e. gene sets) chosen are the WGCNA and MEGENA coexpression modules for brain cortex made from GTEx expression data ('Brain Cortex Coexpression Modules'). In the results, enrichment P and FDR values are recorded for each gene set for genes mapped from SNPs.

KDA

From MSEA, we retrieve P and FDR values for each gene set. Modules from MSEA passing the FDR threshold to export results from MSEA to KDA will be undergo merging to combine potentially highly similar pathways and then be fed into wKDA. We leave most parameters to default values. Because we are using a dense network and have many genes (~500) as input into KDA, we leave the search depth to 1 to simplify the analysis and focus on more direct connections. We do not want to require the key driver to be upstream of the target gene, so we set the edge type to "Undirected". To have the edge weight to have partial influence on the analysis, we set edge weight to 0.5. We choose to run wKDA on a brain gene regulatory network on these significant modules. In the results, we identify key regulator genes of the modules based on network topology. A disease subnetwork is generated from the top 5 key drivers from each module (in the 'Cytoscape Edges' file).

PharmOmics

In the direct path from KDA to PharmOmics, the disease subnetwork genes can be used for overlap or network-based drug repositioning which can give gene network regulatory informed therapeutic insight into ASD. In overlap-based drug repositioning, the drugs are ranked based on the significance of the Jaccard score between the subnetwork genes and drug signature genes. In network-based drug repositioning, drugs are ranked based on the connectivity of their signature genes with the wKDA derived subnetworks as defined by a gene network model. Alternatively, if enough significant key drivers are identified, these key drivers can be used for drug repositioning.

2. What functional gene sets are enriched for differentially expressed genes between high sucrose treated mice and untreated mice and do they overlap with type 2 diabetes GWAS?

TWAS, GWAS

Meta-MSEA

3. Which cell type specific differentially expressed genes (DEGs) of beta-amyloid aggregation mouse models are enriched for Alzheimer's disease GWAS?

Gene lists, GWAS

MSEA

4. What genes are key regulators for protein folding genes in a brain gene regulatory network?

Single gene list

KDA

Mergeomics Tutorial

Data input details

Parameters details

Marker dependency filtering (MDF)

Marker set enrichment analysis (MSEA)

Key Driver Analysis (KDA)

Video Tutorials

Overview

File upload

Individual GWAS enrichment

Individual EWAS, TWAS, PWAS, or MWAS enrichment

Weighted Key Driver Analysis

MDF

MSEA

KDA

PharmOmics