MPRA data access portal


We performed saturation mutagenesis in conjunction with massively parallel reporter assays (MPRAs) on 21 regulatory elements: 20 commonly studied, disease-relevant promoter and enhancer sequences from the literature, and one ultraconserved enhancer (UC88). For the former, we focused primarily on regulatory sequences in which specific mutations are known to cause disease, both for their clinical relevance and to provide positive control variants. Selected elements were limited to at most 600 base pairs (bp) for technical reasons related to the mapping of variants to barcodes by subassembly. In addition, we selected only sequences for which cell line-based reporter assays had previously been established.

The regulatory elements

Promoters

| Name | Genomic coordinates (GRCh37) | Genomic coordinates (GRCh38) | Transcript | Associated phenotype | Luciferase vector | MPRA vector | Cell line | Transf. time (hr) | Fold change (wild type) | Fold change (MPRA) | Construct size (bp) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F9 | chrX:138,612,622-138,612,924 | chrX:139,530,463-139,530,765 | NM_000133.3 | Hemophilia B | pGL4.11b | pGL4.11c | HepG2 | 24 | 2.6 | 2.1 | 303 |
| FOXE1 | chr9:100,615,537-100,616,136 | chr9:97,853,255-97,853,854 | NM_004473.3 | Thyroid cancer | pGL4.11b | pGL4.11c | HeLa | 24 | 4.2 | 2.5 | 600 |
| GP1BB | chr22:19,710,789-19,711,173 | chr22:19,723,266-19,723,650 | NM_000407.4 | Bernard-Soulier Syndrome | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 22.1 | 12.3 | 385 |
| HBB | chr11:5,248,252-5,248,438 | chr11:5,227,022-5,227,208 | NM_000518.4 | Thalassemia | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 14.3 | 8.4 | 187 |
| HBG1 | chr11:5,271,035-5,271,308 | chr11:5,249,805-5,250,078 | NM_000559.2 | Hereditary persistence of fetal hemoglobin | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 118.1 | 41.8 | 274 |
| HNF4A (P2) | chr20:42,984,160-42,984,444 | chr20:44,355,520-44,355,804 | NM_175914.4 | Maturity-onset diabetes of the young (MODY) | pGL4.11b | pGL4.11c | HEK293T | 24 | 2.8 | 1.3 | 285 |
| LDLR | chr19:11,199,907-11,200,224 | chr19:11,089,231-11,089,548 | NM_000527.4 | Familial hypercholesterolemia | pGL4.11b | pGL4.11b | HepG2 | 24 | 110.7 | 76.6 | 318 |
| MSMB | chr10:51,548,988-51,549,578 | chr10:46,046,244-46,046,834 | NM_002443.3 | Prostate cancer | pGL4.11b | pGL4.11c | HEK293T | 24 | 8.4 | 3.4 | 593 |
| PKLR | chr1:155,271,186-155,271,655 | chr1:155,301,395-155,301,864 | NM_000298.5 | Pyruvate kinase deficiency | pGL4.11b | pGL4.11c | K562 | 48 | 29.4 | 9.6 | 470 |
| TERT | chr5:1,295,104-1,295,362 | chr5:1,294,989-1,295,247 | NM_198253.2 | Various types of cancer | pGL4.11b | pGL4.11b | HEK293T, SF7996 | 24 | 231.8, 5.2 | 148.2, 2.7 | 259 |

Enhancers

| Name | Genomic coordinates (GRCh37) | Genomic coordinates (GRCh38) | Associated phenotype | Luciferase vector | MPRA vector | Cell line | Transf. time (hr) | Fold change (wild type) | Fold change (MPRA) | Construct size (bp) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BCL11A+58 | chr2:60,722,075-60,722,674 | chr2:60,494,940-60,495,539 | Sickle cell disease | pGL4.23 | pGL4.23d | HEL 92.1.7 | 24 | 2.5 | 1.7 | 600 |
| IRF4 | chr6:396,143-396,593 | chr6:396,143-396,593 | Human pigmentation | pGL4.23 | pGL4.23d | SK-MEL-28 | 24 | 44.5 | 16.3 | 451 |
| IRF6 | chr1:209,989,135-209,989,735 | chr1:209,815,790-209,816,390 | Cleft lip | pGL4.23 | pGL4.23c | HaCaT | 24 | 17 | 16.7 | 600 |
| MYC (rs6983267) | chr8:128,413,074-128,413,673 | chr8:127,400,829-127,401,428 | Various types of cancer | pGL4.23 | pGL4.23c | HEK293T | 32, 20nM LiCl added after 24hr | 2.8 | 0.7 | 600 |
| MYC (rs11986220) | chr8:128,531,515-128,531,977 | chr8:127,519,270-127,519,732 | Various types of cancer | pGL4.23 | pGL4.23d | LNCaP + 100nM DHT | 24 | 5.5 | 3.2 | 464 |
| RET | chr10:43,581,927-43,582,526 | chr10:43,086,479-43,087,078 | Hirschsprung | pGL3 | pGL3c | Neuro-2a | 24 | 2 | 0.9 | 600 |
| SORT1 | chr1:109,817,274-109,817,873 | chr1:109,274,652-109,275,251 | Plasma low-density lipoprotein cholesterol & myocardial infarction | pGL4.23 | pGL4.23c | HepG2 | 24 | 235.3 | 202.2 | 600 |
| TCF7L2 | chr10:114,757,999-114,758,598 | chr10:112,998,240-112,998,839 | Type 2 diabetes | pGL4.23 | pGL4.23d | MIN6 | 24 | 9 | 2.7 | 600 |
| UC88 | chr2:162,094,919-162,095,508 | chr2:161,238,408-161,238,997 | - | pGL4.23 | pGL4.23c | Neuro-2a | 24 | 9.3 | 5.4 | 590 |
| ZFAND3 | chr6:37,775,275-37,775,853 | chr6:37,807,499-37,808,077 | Type 2 diabetes | pGL4.23 | pGL4.23c | MIN6 | 24 | 14.3 | 7.3 | 579 |
| ZRS | chr7:156,583,813-156,584,297 | chr7:156,791,119-156,791,603 | Limb malformations | TATA-pGL4m (EV087) | pGL4Zc | NIH/3T3 (with HOXD13/ HOXD13+HAND2) | 24 | 04.02.2002 | 3.7/2.6 | 485 |
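
The coordinate strings in the tables above can be turned into numeric intervals with a few lines of Python. The following is a minimal sketch, not part of the portal: the helper names `parse_region` and `region_length` are hypothetical, and the length calculation assumes 1-based coordinates with both ends included. Under that assumption the F9 interval gives 303 bp, which agrees with its listed construct size; not every row necessarily agrees exactly.

```python
import re

# Matches strings such as "chrX:138,612,622-138,612,924".
_COORD = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Split 'chr:start-end' (with thousands separators) into (chrom, start, end)."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def region_length(text):
    """Length in bp, ASSUMING 1-based coordinates with both ends included."""
    _, start, end = parse_region(text)
    return end - start + 1


# F9 promoter, GRCh37 coordinates as listed in the promoter table.
print(region_length("chrX:138,612,622-138,612,924"))
```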

Data usage

We made our data available prior to publication in line with the Fort Lauderdale principles, which allow others to use the data while leaving it to the data producers to make the first presentations and to publish the first paper with global analyses of the data. Further, we reserved the right to publish the first analysis of the differences seen in the TERT knock-down experiments and alternative cell-type experiments. Studies that do not overlap with these intentions may be submitted for publication at any time, but must appropriately cite the data source. Now that the data are published, the data producers' first publication should be cited for any use of these data.

The manuscript describing this data set, global analysis and TERT knock-down experiments was published on Aug 8, 2019:

Kircher M, Xiong C, Martin B, Schubach M, Inoue F, Bell RJA, Costello JF, Shendure J, Ahituv N. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nature Communications. 2019 Aug 8;10(1):3583. doi: 10.1038/s41467-019-11526-w.

The NCBI GEO submission and the raw data are available under accession GSE126550.

Contact

For general information and specific questions about the experiment, please contact Nadav Ahituv. For questions about the data analysis, please contact Martin Kircher. For questions about the website, please contact Max Schubach or refer directly to the GitHub repository of this website.

Download elements



Format description

Variant files for each element are available for genome releases GRCh37 and GRCh38. Except for the chromosomal coordinates, the files for the two releases are identical. Files are tab-separated (.tsv) or comma-separated (.csv), contain one variant per row, and include a header row. If a possible variant at a position is not listed, it was not observed in the saturation mutagenesis library and was therefore not included in the final model fit.

Columns

Chromosome - Chromosome of the variant.
Position - Chromosomal position (GRCh37 or GRCh38) of the variant.
Ref - Reference allele of the variant (A, T, G, or C).
Alt - Alternative allele of the variant (A, T, G, or C). Single base-pair deletions are represented as -.
Tags - Number of unique tags associated with the variant.
DNA - Count of DNA sequences that contain the variant (used for fitting the linear model).
RNA - Count of RNA sequences that contain the variant (used for fitting the linear model).
Value - Log2 variant expression effect derived from the fit of the linear model (coefficient).
P-Value - P-value of the coefficient.
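
As an illustration of the format described above, the following Python sketch reads a tab-separated variant file using only the standard library, converts the log2 effect in the Value column to a linear fold change, and lists alternative alleles that have no row at a given position. Several things here are assumptions rather than properties of the portal: the rows in `EXAMPLE_TSV` are made up for illustration, the header spelling is taken from the column names listed above, the chromosome is written without a "chr" prefix, and the thresholds `min_tags` and `max_p` are hypothetical choices, not recommendations from the authors.

```python
import csv
import io

# Made-up rows for illustration only; they are NOT taken from the real data files.
EXAMPLE_TSV = (
    "Chromosome\tPosition\tRef\tAlt\tTags\tDNA\tRNA\tValue\tP-Value\n"
    "X\t139530463\tA\tC\t25\t1200\t1100\t-1.0\t0.001\n"
    "X\t139530463\tA\tG\t3\t150\t160\t0.5\t0.4\n"
    "X\t139530464\tG\t-\t12\t600\t900\t1.0\t0.02\n"
)


def read_variants(handle, delimiter="\t"):
    """Yield one dict per variant row, with the numeric columns converted."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield {
            "chrom": row["Chromosome"],
            "pos": int(row["Position"]),
            "ref": row["Ref"],
            "alt": row["Alt"],
            "tags": int(row["Tags"]),
            "log2_effect": float(row["Value"]),
            "p_value": float(row["P-Value"]),
        }


def fold_change(log2_effect):
    """Convert a log2 expression effect into a linear fold change."""
    return 2.0 ** log2_effect


def unobserved_alts(variants, pos, ref):
    """Substitutions and the 1-bp deletion ('-') that have no row at `pos`."""
    seen = {v["alt"] for v in variants if v["pos"] == pos}
    possible = (set("ACGT") - {ref}) | {"-"}
    return sorted(possible - seen)


variants = list(read_variants(io.StringIO(EXAMPLE_TSV)))

# Hypothetical filter: keep variants with enough tags and a small p-value.
min_tags, max_p = 10, 0.05
confident = [v for v in variants if v["tags"] >= min_tags and v["p_value"] < max_p]

for v in confident:
    print(v["pos"], v["ref"], v["alt"], fold_change(v["log2_effect"]))

print(unobserved_alts(variants, 139530463, "A"))
```

For a comma-separated file, the same reader can be called with `delimiter=","`. A fold change below 1 corresponds to a negative log2 effect (reduced expression relative to the reference), and a value above 1 to a positive one.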
