A Web Server for Multidimensional Data Integration
Marker Dependency Filtering (MDF)

Overview

Before running the Mergeomics pipeline, users who provide their own association data (e.g. GWAS, EWAS) should run the provided MD Prune script on that data to account for any known dependencies between markers (e.g. LD in GWAS).

NOTE: Currently, the webserver only provides sample files for LD in CEU GWAS populations at a number of different LD thresholds. If you are using a different population, species, or data type, you will need to upload your own marker dependency file. Files larger than 400MB cannot be uploaded to our server; in that case, you will have to run the MDF yourself using the provided script and the directions below.
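Whether a marker dependency file fits under the upload limit can be checked before attempting an upload. The following is a minimal sketch, not part of the Mergeomics tooling: the function name is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the server may count sizes differently.

```python
import os

UPLOAD_LIMIT_MB = 400  # upload limit stated in the note above


def exceeds_upload_limit(path, limit_mb=UPLOAD_LIMIT_MB):
    """Return True if the file at `path` is larger than `limit_mb` megabytes.

    Assumption: 1 MB is taken as 1024 * 1024 bytes; the server may use
    a different convention, so treat files near the limit with caution.
    """
    return os.path.getsize(path) > limit_mb * 1024 * 1024
```

If the function returns True for your file, use the local version of the script described at the end of this page.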

MDF Webserver Module

  1. Upload Association Data

    1. Users must select or provide an Association Data file that gives the strength of association of each marker with the phenotype/disease of interest (as a -log10 p value) and follows the format specified in Table 1. For demonstration purposes, users can also download the sample GWAS file (the same LDL GWAS file displayed below) and upload it to complete the tutorial.
      • MARKER VALUE
        rs4747841 0.1452
        rs4749917 0.1108
        rs737656 1.3979
  2. Upload Marker Mapping File

    1. Users must select or provide a Marker Mapping file that maps each marker in the association file to a specific gene and follows the format specified in Table 1. For demonstration purposes, users can also download the sample Mapping file (the same 50kb distance mapping displayed below) and upload it to complete the tutorial.
      • GENE MARKER
        CDK6 rs10
        AGER rs1000
        N4BP2 rs1000000
  3. Select/Upload Marker Dependency File

    1. Users select from pre-uploaded Marker Dependency files or provide their own. The provided files are a selection of LD files for GWAS data at different LD cutoffs in the CEU population. Additional LD files can be obtained from HapMap. These files give the dependency between pairs of markers and must follow the format specified in Table 1.
      • MARKERa MARKERb WEIGHT
        rs12565 rs29776 0.611
        rs11804 rs29776 1
        rs12138 rs12562 0.575
    2. If the user chooses to upload their own file, the user will be redirected to an upload page.
    3. After selecting the appropriate file, click [Upload File] and make sure you see the "Data Submitted" checkmark.
    4. Click [Back to Marker Dependency Filtering] after uploading your input file.

    5. NOTE: There is a file size upload limit of 400MB. If the Marker Dependency file you want to use is larger than this, please follow the tutorial on how to use the local version.
  4. Select Percentage of Top Markers

    1. To reduce computation time, the Marker Dependency Filtering can be restricted to a certain percentage of the markers. The top markers are selected after sorting by p-value, as given in the Marker Association file.
      Default value: 50%
  5. Enter Email and Run

    1. (Optional) Enter your e-mail address in the text box and press submit if you would like to receive notification e-mails when the job starts and when it completes. The job completion alert will also include a link to download your results and provide the results as attachments. We will delete your e-mail address after job completion and will not use it for any further communication.
    2. Click the [Run MDF] button.
  6. Job Execution

    1. Wait for your results. Your job may take 30 minutes or more. This page will load your results once execution is done. If you want to close your browser, copy the link on this page to view your results at a later time. If you provided your e-mail address, we will also send you this link in the job completion e-mail alert.
  7. Results

    1. You can continue directly to the MSEA pipeline using the resulting MDF-corrected association and mapping files by clicking the [Run MSEA] button.

    2. You can download the MDF-corrected association and mapping files using the corresponding download links on the results page. These files can then be uploaded while running the MSEA or Meta MSEA pipelines.
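Before uploading the three input files in steps 1 to 3, their layout can be checked locally against the column headers shown above. The following is a minimal sketch, not part of the Mergeomics tooling: the function and dictionary names are hypothetical, and splitting on any whitespace is an assumption about the delimiter (Table 1 remains the authoritative format description).

```python
# Column headers as displayed in the sample files above.
EXPECTED_HEADERS = {
    "association": ["MARKER", "VALUE"],
    "mapping": ["GENE", "MARKER"],
    "dependency": ["MARKERa", "MARKERb", "WEIGHT"],
}


def check_format(lines, kind):
    """Return a list of problems found in `lines` (an iterable of text lines).

    Assumption: columns are separated by whitespace (tabs or spaces).
    Row numbers in the messages count non-blank lines, header included.
    """
    expected = EXPECTED_HEADERS[kind]
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != expected:
        problems.append(f"header is {rows[0]}, expected {expected}")
    for number, fields in enumerate(rows[1:], start=2):
        if len(fields) != len(expected):
            problems.append(
                f"row {number}: {len(fields)} columns, expected {len(expected)}"
            )
        elif kind != "mapping":
            # The last column of association and dependency files is numeric.
            try:
                float(fields[-1])
            except ValueError:
                problems.append(f"row {number}: non-numeric value {fields[-1]!r}")
    return problems
```

For a file on disk, the function can be called as `check_format(open(path), "association")`, where `path` is your own file; an empty list means no problems were detected by these basic checks.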

Download MDF Script and Run Locally

  1. Users must first download the MD Prune script and corresponding bash file from the Downloads section. The MD Prune script calls the bash script, and only the file names in the bash script need to be modified.
  2. Users must change the path in MARFILE="../resources/gwas/CAD2.new.txt" to the path to their association (e.g. GWAS) file. Information on the association file and the required file format is located in Table 1. A sample association file can also be obtained from the Downloads section.
  3. Users must change the path in GENFILE="../resources/mapping/gene2loci.020kb.txt" to the path to their mapping file. Information on the mapping file and the required file format is located in Table 1. A sample mapping file can also be obtained from the Downloads section.
  4. The last file needed for the MD Prune script is a marker dependency file, whose path must also be changed: MDSFILE="../resources/linkage/ld70.ceu.txt". This file defines the dependency structure between markers. LD files for GWAS loci in different human populations can be obtained from the HapMap consortium; commonly used LD files are also provided as sample files.
  5. Optionally, the output path for the dependency-corrected association and mapping files can be specified with OUTPATH="output/", and the fraction of top associated markers can be limited with NTOP=0.5 to speed up computation. The output, the dependency-corrected association and mapping files, can then be used in the subsequent MSEA pipeline.
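The effect of NTOP (and of the "Percentage of Top Markers" option in the webserver module) can be illustrated with a small example. This is a sketch for intuition only, not the MD Prune implementation: the function name is hypothetical, values are assumed to be -log10 p values (larger means stronger association), and rounding the number of kept markers up is an assumption.

```python
import math


def top_markers(association, ntop=0.5):
    """Keep the top `ntop` fraction of (marker, value) pairs, strongest first.

    Assumptions of this sketch: values are -log10 p values, so larger values
    rank higher, and the number of markers kept is rounded up.
    """
    if not 0 < ntop <= 1:
        raise ValueError("ntop must be in the interval (0, 1]")
    ranked = sorted(association, key=lambda pair: pair[1], reverse=True)
    keep = math.ceil(len(ranked) * ntop)
    return ranked[:keep]


# The three sample rows from the association file shown earlier.
sample = [("rs4747841", 0.1452), ("rs4749917", 0.1108), ("rs737656", 1.3979)]
print(top_markers(sample, ntop=0.5))
```

With the three sample rows and NTOP=0.5, two markers are kept under the round-up assumption, with rs737656 ranked first.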