Sample Files

Click on sample data below to download.

Association Files

Association Files Sources

| Data type | Trait | Paper |
|---|---|---|
| GWAS | Alzheimer's disease | Marioni, R. E., et al. (2018). "GWAS on family history of Alzheimer’s disease." Translational Psychiatry 8(1): 99. |
| GWAS | Attention Deficit Hyperactivity Disorder | Middeldorp, C. M., et al. (2016). "A Genome-Wide Association Meta-Analysis of Attention-Deficit/Hyperactivity Disorder Symptoms in Population-Based Pediatric Cohorts." Journal of the American Academy of Child & Adolescent Psychiatry 55(10): 896-905.e896. |
| GWAS | Alcohol Dependence | Olfson, E. and L. J. Bierut (2012). "Convergence of Genome-Wide Association and Candidate Gene Studies for Alcoholism." Alcoholism: Clinical and Experimental Research 36(12): 2086-2094. |
| GWAS | Body Mass Index | Locke, A. E., et al. (2015). "Genetic studies of body mass index yield new insights for obesity biology." Nature 518(7538): 197-206. |
| GWAS | Breast Cancer | Rashkin, S. R., et al. (2020). "Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts." Nature Communications 11(1): 4423. |
| GWAS | Coronary Artery Disease | Nikpay, M., et al. (2015). "A comprehensive 1000 Genomes–based genome-wide association meta-analysis of coronary artery disease." Nature Genetics 47(10): 1121-1130. |
| GWAS | Fasting Glucose | Manning, A. K., et al. (2012). "A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance." Nature Genetics 44(6): 659-669. |
| GWAS | Heart Failure | Shah, S., et al. (2020). "Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure." Nature Communications 11(1): 163. |
| GWAS | High Density Lipoproteins (HDL) | Willer, C. J., et al. (2013). "Discovery and refinement of loci associated with lipid levels." Nature Genetics 45(11): 1274-1283. |
| GWAS | Low Density Lipoproteins (LDL) | Willer, C. J., et al. (2013). "Discovery and refinement of loci associated with lipid levels." Nature Genetics 45(11): 1274-1283. |
| GWAS | Major Depressive Disorder | Coleman, J. R. I., et al. (2020). "Genome-wide gene-environment analyses of major depressive disorder and reported lifetime traumatic experiences in UK Biobank." Molecular Psychiatry 25(7): 1430-1446. |
| GWAS | Parental Lifespan | Timmers, P. R., et al. (2019). "Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances." eLife 8. |
| GWAS | Parkinson’s Disease | Blauwendraat, C., et al. (2019). "Parkinson's disease age at onset genome-wide association study: Defining heritability, genetic loci, and α-synuclein mechanisms." Movement Disorders 34(6): 866-875. |
| GWAS | Psoriasis | Nair, R. P., et al. (2009). "Genome-wide scan reveals association of psoriasis with IL-23 and NF-κB pathways." Nature Genetics 41(2): 199-204. |
| GWAS | Severe illness in COVID-19 | Pairo-Castineira, E., et al. (2021). "Genetic mechanisms of critical illness in COVID-19." Nature 591(7848): 92-98. |
| GWAS | Schizophrenia | Ripke, S., et al. (2014). "Biological insights from 108 schizophrenia-associated genetic loci." Nature 511(7510): 421-427. |
| GWAS | Stroke | Malik, R., et al. (2018). "Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes." Nature Genetics 50(4): 524-537. |
| GWAS | Systemic Lupus Erythematosus | Wang, Y.-F., et al. (2021). "Identification of 38 novel loci for systemic lupus erythematosus and genetic heterogeneity between ancestral groups." Nature Communications 12(1): 772. |
| GWAS | Type 2 Diabetes | Fuchsberger, C., et al. (2016). "The genetic architecture of type 2 diabetes." Nature 536(7614): 41-47. |
| GWAS | Total Cholesterol | Willer, C. J., et al. (2013). "Discovery and refinement of loci associated with lipid levels." Nature Genetics 45(11): 1274-1283. |
| GWAS | Triglycerides | Willer, C. J., et al. (2013). "Discovery and refinement of loci associated with lipid levels." Nature Genetics 45(11): 1274-1283. |
| EWAS | Birth Weight | Küpers, L. K., et al. (2019). "Meta-analysis of epigenome-wide association studies in neonates reveals widespread differential DNA methylation associated with birthweight." Nature Communications 10(1): 1893. |
| EWAS | Maternal Anxiety | Sammallahti, S., et al. (2021). "Maternal anxiety during pregnancy and newborn epigenome-wide DNA methylation." Molecular Psychiatry. |
| EWAS | Social communication | Rijlaarsdam, J., et al. (2021). "Epigenetic profiling of social communication trajectories and co-occurring mental health problems: a prospective, methylome-wide association study." Development and Psychopathology: 1-10. |
| EWAS | Psoriasis | Roberson, E. D. O., et al. (2012). "A Subset of Methylated CpG Sites Differentiate Psoriatic from Normal Skin." Journal of Investigative Dermatology 132(3): 583-592; Gu, X., et al. (2015). "Correlation between Reversal of DNA Methylation and Clinical Symptoms in Psoriatic Epidermis Following Narrow-Band UVB Phototherapy." Journal of Investigative Dermatology 135(8): 2077-2083. |

Mapping Files

Marker Dependency Files

Marker sets

Information on coexpression modules

These marker sets include WGCNA (1) and MEGENA (2) coexpression networks built from GTEx (3) human gene expression data. Both methods use hierarchical clustering to assign co-regulated genes to the same coexpression module: WGCNA uses agglomerative clustering, whereas MEGENA uses divisive clustering. Gene clusters are identified by merging (agglomerative) or splitting (divisive) according to a distance measure (e.g., 1-|correlation|).

In WGCNA, the distance measure is 1 minus the topological overlap matrix (TOM), i.e., dissTOM = 1-TOM. TOM is based on the correlation score (edge weight) between two genes (nodes) but also takes into account the edge weights of the neighbors that these two nodes share in the network. The distance between two clusters is the average dissTOM score over all gene pairs that take one gene from each cluster.

In MEGENA, the distance measure is based on shortest path distance (SPD). To create compact modules, MEGENA uses nested k-medoids clustering, which at each step defines the k best clusters that minimize the SPD within each cluster. Nested k-medoids clustering is run until no more compact child clusters can be defined. MEGENA performs multi-scale clustering, so a gene can be assigned to several modules at different scales.

Finally, we annotated each module with its functions using curated biological pathways from the Reactome database (4), based on a hypergeometric test (one-tailed Fisher's exact test).
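
As an illustration of the hypergeometric test used for module annotation, the sketch below computes a one-tailed enrichment p-value for the overlap between a module and a pathway. The function name and its arguments are hypothetical and chosen for this example; this is not the annotation code used to produce the files.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_module, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_universe: number of genes in the background (assumed gene universe)
    n_pathway:  number of background genes that belong to the pathway
    n_module:   number of background genes that belong to the module
    n_overlap:  number of genes shared by the module and the pathway
    """
    largest_overlap = min(n_pathway, n_module)
    total = comb(n_universe, n_module)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_module - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / total
```

A small p-value indicates that the module shares more genes with the pathway than expected by chance. Correcting for testing many pathways is left out of this sketch.
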
For WGCNA, we selected the soft threshold using an r2 > 0.7 criterion; where this criterion could not be reached, we used the default soft threshold of 6. We also used k = 100. For MEGENA, we used the "Spearman" correlation method, with a minimum module size of 10 and a maximum module size of 2500. Recommended or default parameters were used for all other settings.
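
To make the dissTOM distance concrete, here is a minimal pure-Python sketch. It assumes an unsigned network in which the soft threshold is the exponent applied to |correlation|, and it uses the standard topological overlap formula for weighted networks. The function names are hypothetical, and the code is a simplified illustration, not the WGCNA package implementation.

```python
def diss_tom(cor, beta=6):
    """Return dissTOM = 1 - TOM for a square correlation matrix (list of lists).

    beta is the soft-threshold power (6 is the default mentioned above).
    """
    n = len(cor)
    # Soft-thresholded adjacency; the diagonal is zeroed so that a gene
    # is never counted as its own neighbor.
    adj = [
        [abs(cor[i][j]) ** beta if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    connectivity = [sum(row) for row in adj]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge weights routed through neighbors shared by genes i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            )
            diss[i][j] = diss[j][i] = 1.0 - tom
    return diss


def average_linkage(diss, cluster_a, cluster_b):
    """Average dissTOM over all pairs with one gene from each cluster."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(diss[i][j] for i, j in pairs) / len(pairs)
```

Two genes with no direct edge and no shared neighbors get a dissTOM of 1, while genes that are strongly connected to each other and to the same neighbors approach 0.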

Biological Networks

Information on Bayesian networks

Bayesian composite human and mouse networks are built using RIMBANet (5,6) from tissue-specific expression data, with transcription factor-target pairs and eQTLs as priors.

Information on FANTOM5 transcription factor networks

FANTOM5 networks (7) are optimized by choosing a weight cutoff that yields a scale-free network, defined as reaching a correlation of -0.95 between the log of the degree and the log of the number of nodes with that degree.
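
The cutoff selection can be sketched as follows. This example makes several assumptions that the text does not state: edges are given as (source, target, weight) tuples, degree counts both endpoints of each kept edge, the correlation is Pearson's, and candidate cutoffs are scanned from lowest to highest. All function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def scale_free_correlation(edges, cutoff):
    """Correlate log(degree) with log(number of nodes having that degree)."""
    degree = Counter()
    for source, target, weight in edges:
        if weight >= cutoff:
            degree[source] += 1
            degree[target] += 1
    nodes_per_degree = Counter(degree.values())
    if len(nodes_per_degree) < 3:
        return 0.0  # too few distinct degrees for a meaningful fit
    log_degree = [log(d) for d in nodes_per_degree]
    log_count = [log(c) for c in nodes_per_degree.values()]
    return pearson(log_degree, log_count)


def choose_cutoff(edges, candidates, target=-0.95):
    """Return the lowest candidate cutoff whose fit reaches the target, else None."""
    for cutoff in sorted(candidates):
        if scale_free_correlation(edges, cutoff) <= target:
            return cutoff
    return None
```

Scanning from the lowest cutoff keeps as many edges as possible while still meeting the target; a different scan order would be an equally valid reading of the description.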

Information on STRING PPI network

To reduce the density of the STRING PPI network (8), only the top 5% of edges by "combined_score" are kept.
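
A minimal sketch of this filtering step is shown below. It assumes each edge is a dictionary with a "combined_score" field; the field names "protein1" and "protein2" in the usage example, the function name, and the handling of ties (broken by input order) are assumptions made for illustration.

```python
def top_percent_edges(edges, percent=5, score_key="combined_score"):
    """Keep the highest-scoring `percent` percent of edges, rounding up."""
    # Integer ceiling of len(edges) * percent / 100, avoiding float rounding.
    n_keep = (len(edges) * percent + 99) // 100
    ranked = sorted(edges, key=lambda edge: edge[score_key], reverse=True)
    return ranked[:n_keep]


# Usage with a toy edge list (scores are made up for the example).
toy_edges = [
    {"protein1": f"A{i}", "protein2": f"B{i}", "combined_score": 10 * i}
    for i in range(1, 41)
]
dense_core = top_percent_edges(toy_edges)
```

With 40 toy edges, the call keeps the 2 edges with the highest scores.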


  • 1. Langfelder, P. and S. Horvath (2008). "WGCNA: an R package for weighted correlation network analysis." BMC Bioinformatics 9(1): 559.
  • 2. Song, W. M. and B. Zhang (2015). "Multiscale Embedded Gene Co-expression Network Analysis." PLOS Computational Biology 11(11): e1004574.
  • 3. The GTEx Consortium (2020). "The GTEx Consortium atlas of genetic regulatory effects across human tissues." Science 369(6509): 1318.
  • 4. Fabregat, A., et al. (2017). "Reactome pathway analysis: a high-performance in-memory approach." BMC Bioinformatics 18(1): 142.
  • 5. Zhu, J., et al. (2007). "Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations." PLOS Computational Biology 3(4): e69.
  • 6. Zhu, J., et al. (2008). "Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks." Nature Genetics 40(7): 854-861.
  • 7. Marbach, D., et al. (2016). "Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases." Nature Methods 13: 366-370.
  • 8. Szklarczyk, D., et al. (2021). "The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets." Nucleic Acids Research 49: D605-D612.