We performed saturation mutagenesis in conjunction with massively parallel reporter assays (MPRAs) on 21 regulatory elements, including 20 commonly studied, disease-relevant promoter and enhancer sequences from the literature, and one ultraconserved enhancer (UC88). For the former, we focused primarily on regulatory sequences in which specific mutations are known to cause disease, both for their clinical relevance and to provide positive control variants. Selected elements were limited to at most 600 base pairs (bp) for technical reasons related to the mapping of variants to barcodes by subassembly. In addition, we selected only sequences for which cell line-based reporter assays had previously been established.
Name | Genomic coordinates (GRCh37) | Genomic coordinates (GRCh38) | Transcript | Associated Phenotype | Luciferase vector | MPRA vector | Cell line | Transf. time (hr) | Fold change (Wild type) | Fold Change (MPRA) | Construct size (bp) |
---|---|---|---|---|---|---|---|---|---|---|---|
F9 | chrX:138,612,622-138,612,924 | chrX:139,530,463-139,530,765 | NM_000133.3 | Hemophilia B | pGL4.11b | pGL4.11c | HepG2 | 24 | 2.6 | 2.1 | 303 |
FOXE1 | chr9:100,615,537-100,616,136 | chr9:97,853,255-97,853,854 | NM_004473.3 | Thyroid cancer | pGL4.11b | pGL4.11c | HeLa | 24 | 4.2 | 2.5 | 600 |
GP1BB | chr22:19,710,789-19,711,173 | chr22:19,723,266-19,723,650 | NM_000407.4 | Bernard-Soulier Syndrome | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 22.1 | 12.3 | 385 |
HBB | chr11:5,248,252-5,248,438 | chr11:5,227,022-5,227,208 | NM_000518.4 | Thalassemia | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 14.3 | 8.4 | 187 |
HBG1 | chr11:5,271,035-5,271,308 | chr11:5,249,805-5,250,078 | NM_000559.2 | Hereditary persistence of fetal hemoglobin | pGL4.11b | pGL4.11c | HEL 92.1.7 | 24 | 118.1 | 41.8 | 274 |
HNF4A (P2) | chr20:42,984,160-42,984,444 | chr20:44,355,520-44,355,804 | NM_175914.4 | Maturity-onset diabetes of the young (MODY) | pGL4.11b | pGL4.11c | HEK293T | 24 | 2.8 | 1.3 | 285 |
LDLR | chr19:11,199,907-11,200,224 | chr19:11,089,231-11,089,548 | NM_000527.4 | Familial hypercholesterolemia | pGL4.11b | pGL4.11b | HepG2 | 24 | 110.7 | 76.6 | 318 |
MSMB | chr10:51,548,988-51,549,578 | chr10:46,046,244-46,046,834 | NM_002443.3 | Prostate cancer | pGL4.11b | pGL4.11c | HEK293T | 24 | 8.4 | 3.4 | 593 |
PKLR | chr1:155,271,186-155,271,655 | chr1:155,301,395-155,301,864 | NM_000298.5 | Pyruvate kinase deficiency | pGL4.11b | pGL4.11c | K562 | 48 | 29.4 | 9.6 | 470 |
TERT | chr5:1,295,104-1,295,362 | chr5:1,294,989-1,295,247 | NM_198253.2 | Various types of cancer | pGL4.11b | pGL4.11b | HEK293T, SF7996 | 24 | 231.8, 5.2 | 148.2, 2.7 | 259 |
Name | Genomic coordinates (GRCh37) | Genomic coordinates (GRCh38) | Associated Phenotype | Luciferase vector | MPRA vector | Cell line | Transf. time (hr) | Fold change (Wild type) | Fold Change (MPRA) | Construct size (bp) |
---|---|---|---|---|---|---|---|---|---|---|
BCL11A+58 | chr2:60,722,075-60,722,674 | chr2:60,494,940-60,495,539 | Sickle cell disease | pGL4.23 | pGL4.23d | HEL 92.1.7 | 24 | 2.5 | 1.7 | 600 |
IRF4 | chr6:396,143-396,593 | chr6:396,143-396,593 | Human pigmentation | pGL4.23 | pGL4.23d | SK-MEL-28 | 24 | 44.5 | 16.3 | 451 |
IRF6 | chr1:209,989,135-209,989,735 | chr1:209,815,790-209,816,390 | Cleft lip | pGL4.23 | pGL4.23c | HaCaT | 24 | 17 | 16.7 | 600 |
MYC (rs6983267) | chr8:128,413,074-128,413,673 | chr8:127,400,829-127,401,428 | Various types of cancer | pGL4.23 | pGL4.23c | HEK293T | 32, 20nM LiCl added after 24hr | 2.8 | 0.7 | 600 |
MYC (rs11986220) | chr8:128,531,515-128,531,977 | chr8:127,519,270-127,519,732 | Various types of cancer | pGL4.23 | pGL4.23d | LNCaP + 100nM DHT | 24 | 5.5 | 3.2 | 464 |
RET | chr10:43,581,927-43,582,526 | chr10:43,086,479-43,087,078 | Hirschsprung | pGL3 | pGL3c | Neuro-2a | 24 | 2 | 0.9 | 600 |
SORT1 | chr1:109,817,274-109,817,873 | chr1:109,274,652-109,275,251 | Plasma low-density lipoprotein cholesterol & myocardial infarction | pGL4.23 | pGL4.23c | HepG2 | 24 | 235.3 | 202.2 | 600 |
TCF7L2 | chr10:114,757,999-114,758,598 | chr10:112,998,240-112,998,839 | Type 2 diabetes | pGL4.23 | pGL4.23d | MIN6 | 24 | 9 | 2.7 | 600 |
UC88 | chr2:162,094,919-162,095,508 | chr2:161,238,408-161,238,997 | - | pGL4.23 | pGL4.23c | Neuro-2a | 24 | 9.3 | 5.4 | 590 |
ZFAND3 | chr6:37,775,275-37,775,853 | chr6:37,807,499-37,808,077 | Type 2 diabetes | pGL4.23 | pGL4.23c | MIN6 | 24 | 14.3 | 7.3 | 579 |
ZRS | chr7:156,583,813-156,584,297 | chr7:156,791,119-156,791,603 | Limb malformations | TATA-pGL4m (EV087) | pGL4Zc | NIH/3T3 (with HOXD13/ HOXD13+HAND2) | 24 | 04.02.2002 | 3.7/2.6 | 485 |
We made our data available prior to publication in line with the Fort Lauderdale principles, allowing others to use the data while allowing the data producers to make the first presentations and to publish the first paper with global analyses of the data. Further, we reserved the right to publish the first analysis of the differences seen in the TERT knock-down experiments and alternative cell-type experiments. Studies that do not overlap with these intentions may be submitted for publication at any time, but must appropriately cite the data source. After publication of the data, the first publication of the data producers should be cited for any use of these data.
The manuscript describing this data set, the global analysis, and the TERT knock-down experiments was published on Aug 8, 2019:
Kircher M, Xiong C, Martin B, Schubach M, Inoue F, Bell RJA, Costello JF, Shendure J, Ahituv N. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nature Communications. 2019 Aug 8;10(1):3583. doi: 10.1038/s41467-019-11526-w.
The processed and raw data are available from NCBI GEO under accession GSE126550.
For general information and specific questions about the experiment, please contact Nadav Ahituv. For questions about the data analysis, please contact Martin Kircher. For questions about the website, please contact Max Schubach or refer directly to the GitHub repository of this website.
Variant files for each element are available for genome releases GRCh37 and GRCh38. Except for the chromosomal coordinates, the files for the two releases are identical. Files are tab-separated (.tsv) or comma-separated (.csv), contain one variant per row, and include a header. If a possible variant at a position is not listed, it was not observed in the saturation mutagenesis library and was therefore not included in the fitted model.
Chromosome - Chromosome of the variant.
Position - Chromosomal position (GRCh37 or GRCh38) of the variant.
Ref - Reference allele of the variant (A, T, G, or C).
Alt - Alternative allele of the variant (A, T, G, or C). One base-pair deletions are represented as -.
Tags - Number of unique tags associated with the variant.
DNA - Count of DNA sequences that contain the variant (used for fitting the linear model).
RNA - Count of RNA sequences that contain the variant (used for fitting the linear model).
Value - Log2 variant expression effect derived from the fit of the linear model (coefficient).
P-Value - P-value of the coefficient.
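As a minimal sketch of how such a file might be read, the following Python snippet parses the columns listed above and converts the log2 effect into a linear fold change. It assumes that the header names match the field names exactly as written here. The example rows are invented for illustration and are not real measurements, and the thresholds (at least 10 tags, P-value at most 1e-5) are illustrative defaults chosen for this sketch rather than fixed properties of the files.

```python
import csv
import io


def load_variants(text, delimiter="\t"):
    """Parse variant-file text (header plus one variant per row) into dicts.

    Assumes the header uses the field names listed above; pass
    delimiter="," for the .csv files.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    variants = []
    for row in reader:
        variants.append({
            "chrom": row["Chromosome"],
            "pos": int(row["Position"]),
            "ref": row["Ref"],
            "alt": row["Alt"],            # "-" marks a 1-bp deletion
            "tags": int(row["Tags"]),
            "dna": int(row["DNA"]),
            "rna": int(row["RNA"]),
            "log2_effect": float(row["Value"]),
            "p_value": float(row["P-Value"]),
        })
    return variants


def select_variants(variants, min_tags=10, max_p=1e-5):
    """Keep variants with enough tags and a small P-value.

    Both thresholds are illustrative assumptions for this sketch.
    """
    return [v for v in variants
            if v["tags"] >= min_tags and v["p_value"] <= max_p]


def fold_change(variant):
    """Convert the log2 expression effect into a linear fold change."""
    return 2.0 ** variant["log2_effect"]


# Invented example rows (not real data), built as tab-separated text.
_rows = [
    ["Chromosome", "Position", "Ref", "Alt", "Tags", "DNA", "RNA", "Value", "P-Value"],
    ["X", "139530500", "A", "G", "25", "1200", "600", "-1.0", "1e-8"],
    ["X", "139530500", "A", "-", "4", "150", "140", "-0.1", "0.4"],
    ["X", "139530501", "C", "T", "40", "2000", "2100", "0.05", "0.2"],
]
EXAMPLE_TSV = "\n".join("\t".join(r) for r in _rows) + "\n"

if __name__ == "__main__":
    for v in select_variants(load_variants(EXAMPLE_TSV)):
        print(v["chrom"], v["pos"], v["ref"], v["alt"], fold_change(v))
```

To read a downloaded file instead of the inline example, pass the file's contents as `text`. Because variants that were not observed in the library are absent from the files, a missing row should not be read as "no effect".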