
In gene therapy development and manufacturing, multiple steps can be optimized to improve the efficacy and safety of therapeutics. Transgenic viral vectors must be successfully and specifically delivered to the correct cells within a patient, and that transgene must also be expressed within those cells at the appropriate levels.
Transgene expression is often modulated by changing the dosage to reach the desired level of expression, which adds manufacturing costs to an already expensive process. Being able to control the level of expression precisely could not only reduce costs but also increase the safety of the therapy by ensuring expression occurs at suitable levels within target cell types.
Training Models for Predicting Regulatory Motifs

Recent work by Zrimec et al. has used Generative Adversarial Networks (GANs) to design regulatory sequences that improve the control of expression.2 In the schematic above, a convolutional Generator network
A generative model of regulatory sequences alone is not sufficient for this task, because it cannot predict the expression level that a given regulatory sequence will produce. Designing sequences for controlled expression requires a model that relates sequence to expression level. Zrimec et al. trained a separate neural network to do just this: it predicts expression levels from natural genomic sequences from Saccharomyces cerevisiae spanning multiple types of regulatory regions (4,238 sequences of 1,000 bp inputs spanning gene promoters, UTRs, and terminators).1 With these two models working together, the design of regulatory sequences for controlled expression levels can be attempted.
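To make the idea of a fixed-length sequence input concrete, here is a minimal sketch of how a DNA string might be turned into numbers before it reaches a sequence-to-expression model. One-hot encoding and right-padding with N are assumptions made for illustration; the article does not state how the inputs were encoded, and the helper names `one_hot` and `pad_or_trim` are hypothetical.

```python
# Illustrative sketch only: one-hot encoding is an assumed input format,
# not a detail stated in this article.
BASES = "ACGT"

def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for ch in seq.upper():
        row = [0, 0, 0, 0]
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows

def pad_or_trim(seq: str, length: int = 1000) -> str:
    """Force a fixed input length (1,000 bp here) by trimming or right-padding with N."""
    return seq[:length].ljust(length, "N")

# A short toy sequence, padded out to the 1,000 bp input length.
encoded = one_hot(pad_or_trim("ATGCGT"))
```

Each row of `encoded` marks which of A, C, G, or T occupies that position, so a 1,000 bp input becomes a 1,000 x 4 matrix that a convolutional network could scan for motifs.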
Model Optimization
Based on the two models described above, an optimization procedure was used to exploit the generative model to produce regulatory DNA with target expression levels to guide the

By differentiating through the
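The general idea of steering a generator toward a target expression level by differentiating through both models can be illustrated with a deliberately tiny example. Everything below is an assumption for illustration: the "generator" and "predictor" are small linear maps with made-up weights rather than the deep networks described in the paper, and the function names are hypothetical. Only the chain-rule mechanics carry over.

```python
# Toy stand-ins (assumptions): a linear "generator" and a linear "predictor".
# The real models are deep networks; only the chain-rule idea carries over.
W = [[1.0, 0.5], [-0.5, 1.0], [0.25, 0.75]]   # hypothetical generator weights (3 x 2)
v = [0.8, -0.3, 0.5]                          # hypothetical predictor weights (3,)

def generate(z):
    """Map a 2-d latent vector to a 3-d 'sequence representation'."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def predict(x):
    """Map the generated representation to a scalar expression score."""
    return sum(v_i * x_i for v_i, x_i in zip(v, x))

def optimize_latent(target, z, lr=0.1, steps=200):
    """Gradient descent on (predict(generate(z)) - target)^2 with respect to z."""
    # d predict / d z_j = sum_i v_i * W[i][j], by the chain rule through both models.
    dy_dz = [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(z))]
    for _ in range(steps):
        error = predict(generate(z)) - target
        # Gradient of the squared error is 2 * error * d predict / d z.
        z = [z_j - lr * 2.0 * error * g_j for z_j, g_j in zip(z, dy_dz)]
    return z

z_opt = optimize_latent(target=3.0, z=[0.0, 0.0])
score = predict(generate(z_opt))
```

In this toy setting the latent vector is nudged until the predicted score matches the target. With real networks the same gradients would typically come from an automatic differentiation framework rather than a hand-derived formula.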
In Vivo Validation
In vivo validation of a selected set of the generated sequences was performed over a range spanning four orders of magnitude of predicted expression levels. Critically, this set was restricted to generated sequences that displayed properties similar to natural sequences (e.g., the promoter contains known binding motifs) and concomitantly exhibited low sequence similarity measures. In addition, two regulatory sequences, from the POP6 (predicted transcripts per million (TPM) of 64) and RPL3 (predicted TPM of 303) genes, were used as low and high expression controls.

As seen in the figure above, experimental measurements of the mRNA levels produced by each generated sequence showed a good rank correlation with the predicted levels (Spearman’s ρ = 0.74); however, the difference between predicted and actual expression levels was relatively high. Differences of 7.7-fold and 2.5-fold between predictions and TPM measurements were observed for the lower ‘gen-10’ group (predicted TPM ~10, avg. measured TPM 77) and the higher ‘gen-1000’ group (predicted TPM ~1000, avg. measured TPM 397), respectively. Although the authors could not generate sequences with expression lower than the POP6 control, within the gen-1000 group, 4 out of 7 regulatory constructs (57%) displayed average expression levels that surpassed those of the natural, highly expressed RPL3 control by up to 2.7-fold. This demonstrates the method’s ability to design regulatory DNA that exceeds natural expression levels, although not at a precisely predicted level.
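The fold differences quoted above follow directly from the predicted and measured TPM values, and the distinction between a good rank correlation and a poor absolute match can be sketched in a few lines. The fold-difference inputs are the numbers reported in this article; the two short lists passed to `spearman` are made-up values used only to exercise the function, not data from the study. The no-ties formula is a simplification chosen to keep the sketch short.

```python
def fold_difference(a: float, b: float) -> float:
    """Symmetric fold difference: the larger value divided by the smaller."""
    return max(a, b) / min(a, b)

def ranks(values):
    """Rank values from 1..n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

low = fold_difference(10, 77)      # gen-10: predicted ~10 TPM vs. measured 77 TPM
high = fold_difference(1000, 397)  # gen-1000: predicted ~1000 TPM vs. measured 397 TPM

# Made-up values purely to exercise the function; not data from the study.
rho_demo = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Here `low` evaluates to 7.7 and `high` to roughly 2.5, matching the figures in the text. Because Spearman's ρ depends only on ordering, a model can rank sequences well while still missing the absolute expression level by several fold.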
Tunable Transgene Expression and Safer Gene Therapies
Based on these results, the design of gene regulatory elements has promising potential for controlling a given gene's expression within a target cell. The path to developing this method in human genes is relatively straightforward, as existing expression prediction models (e.g., BPNet) can be used to drive the generation process.3
This method could increase transgene expression and reduce the dose titration required for a given therapy during development and manufacturing, ultimately leading to increased efficacy. Future work in AI gene therapy could also use single-cell expression data to try to design for specific cell types, further improving the specificity and ultimate safety of a designed gene therapy viral vector.
AI Disclosure: Feature image was generated by an AI image tool, MidJourney.
References
- Zrimec J, Börlin CS, Buric F, et al. Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structure. Nat Commun. 2020;11(1):6141.
- Zrimec J, Fu X, Muhammad AS, et al. Controlling gene expression with deep generative design of regulatory DNA. Nat Commun. 2022;13(1):5099.
- Avsec Ž, Weilert M, Shrikumar A, et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet. 2021;53(3):354-366.
