§ Begin
Design a smart mutation library
Provide a wild-type sequence — get a library of multi-mutant variants ranked by predicted fitness, codon-optimized for your expression host, and ready to order from your synthesis vendor.
§ 1 · Sequence
Provide a wild-type sequence
FASTA, SnapGene, GenBank, raw DNA, or raw protein. Auto-detected.
Multiple CDS features detected — pick the gene to evolve
The longest CDS in a plasmid is usually the antibiotic resistance gene. Pick your gene of interest.
§ 2 · Search
Tune the parameters
Sensible defaults — change if you know what you're after.
§ 3 · Generate
Run the engine
Scoring runs on our GPU. Your individual sequences are never shared, exposed, or reproduced; only anonymous, aggregated signals improve our models (Privacy Policy).
§ Awaiting input
The library appears here.
Provide a sequence above and click Generate. Results render as a sortable variant table with mutation map, PCR primers, and one-click synthesis ordering.
§ 4 · Library
Variants
How to read this
Fitness — the summed ESM-2 log-likelihood gain over wild-type (ΣΔLL) across the variant’s mutations. Higher means the model finds that combination more evolutionarily plausible. It’s a prior, not a verdict — screen experimentally.
Mutations — substitutions vs. wild-type (WT·position·new, e.g. W58L). GC % — GC of the codon-optimized DNA (40–60% is PCR-friendly). Tm — primer melting temp. bp — amplicon length.
Filter box: C49 = variants mutating residue 49 · W58L = that exact substitution · gc>50, tm>58, fitness>2, bp<800 = numeric ranges.
Stay shallow — cap at 3–4 mutations per variant unless you have a structural reason.
Mutation landscape
Where mutations land across the protein. Bar height counts variants sharing a position; color encodes ΔLL strength (faint · medium · saturated).
| Rank | Mutations | Fitness | GC % | Tm (°C) | bp |
|---|---|---|---|---|---|
§ Learn from results
Tested these on the bench? Teach the engine.
Enter a measured value for the variants you assayed (any number — activity, expression, stability; higher = better). Leave the rest blank. TuringDNA fits a model on your results and proposes a smarter round 2 — conditioned on your data, not just the prior.
§ Begin · CRISPR
Design a CRISPR edit.
Paste a gene or region. Get ranked sgRNAs for a SpCas9 / Cas12a knockout — or switch to base editing to install a precise C→T / A→G change (including premature-stop knockouts) with predicted amino-acid outcomes. Free for any account.
§ 1 · Target
Paste your DNA
The gene or region you want to edit. ACGT only; FASTA headers OK; up to 1 Mbp.
§ 2 · Guides
Ranked sgRNAs
How to read this
Composite — the overall pick: on-target activity × (1 − off-target penalty). On-target — predicted cut efficiency from Doench-style sequence features (1.0 = strong).
Self-off — worst CFD similarity to any other site in the sequence you pasted (0 = unique here). This is not a genome-wide check — run your top guides through CRISPOR before ordering.
KO score — probability the indel gives a true loss of function. FS % — fraction of predicted repair outcomes that frameshift. Dominance — how often the single most-likely repair outcome occurs (40%+ = clean).
Base-edit mode adds Editability (in-window editing activity), the predicted edit, and its amino-acid consequence.
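The Composite, FS % and Dominance columns described above reduce to simple arithmetic, sketched here under stated assumptions. The function names, the example numbers, and the choice to weight each repair outcome by its predicted probability are illustrative, not TuringDNA's actual implementation.

```python
def composite(on_target, off_target_penalty):
    """Composite = on-target activity x (1 - off-target penalty), both in [0, 1]."""
    return on_target * (1.0 - off_target_penalty)

def knockout_metrics(outcomes):
    """Return (frameshift fraction, dominance) for one guide.

    `outcomes` maps the net length change of each predicted repair outcome
    in bp (negative = deletion) to its predicted probability.
    Assumption: outcomes are probability-weighted and normalised by their total.
    """
    total = sum(outcomes.values())
    # An indel shifts the reading frame when its length is not a multiple of 3.
    frameshift = sum(p for size, p in outcomes.items() if size % 3 != 0) / total
    # Dominance: share of the single most likely outcome.
    dominance = max(outcomes.values()) / total
    return frameshift, dominance

# Made-up repair-outcome distribution for a single guide.
outcomes = {+1: 0.45, -1: 0.20, -3: 0.15, -7: 0.20}
fs, dom = knockout_metrics(outcomes)
score = composite(on_target=0.8, off_target_penalty=0.25)
```

In this toy case the in-frame 3 bp deletion is the only outcome that does not frameshift, and the +1 insertion is the dominant outcome.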
| Rank | Strand | Position | Spacer | PAM | Composite | On-target | Self-off | KO score | Top indel | FS % | Dominance | Base editor | Edit | Editability | AA change | Outcome | Genome off | Exon context | Sense oligo | Antisense oligo | Order | GC% | Flags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Primer Analysis
Paste your candidate primers, BLAST them all in one go, and let the fitness model rank them — best amplification of your template, fewest off-target matches. No more BLASTing one primer at a time.
One per line or FASTA. Optional names and forward/reverse tags (e.g. GeneA_F / GeneA_R): pairs are auto-detected and scored as amplicons.
The DNA region you want to amplify. Stays on our server — used to check each primer's binding site, uniqueness and the predicted amplicon size.
How to read this
Fitness — the combined score used to rank your candidates (higher = better pick): Tm, GC, 3′-end stability and specificity.
On template — whether the primer binds your template uniquely. Off-target products — extra amplicons in-silico PCR predicts beyond the intended one (0 = specific). Tm — nearest-neighbor melting temp; aim for a matched pair.
These primers are yours — TuringDNA scores and ranks them; it doesn’t invent them.
The specificity scan runs full in-silico PCR against your chosen genome — the same check NCBI Primer-BLAST performs — so the result is complete here. Predictions are a strong guide; as always, validate critical results at the bench.
Plasmid Editor
Import a GenBank or FASTA record (or paste raw DNA), see the annotated circular & linear map, run restriction analysis across 1,000+ enzymes, simulate a digest with a virtual gel, and save constructs to your library.
§ One workbench
Your construct is the centre of gravity. Import it above to map and cut it here — then, without leaving your sequence, hand any region to the other tools.
Features
Restriction analysis
Tap enzymes to show their cut sites on the map; pick a few and digest to size the fragments.
Restriction sites from Biopython/REBASE (1,000+ enzymes), circular-aware. Auto-annotation flags ORFs + common motifs; GenBank features are kept as-is.
Clone & assemble
Paste your fragments — FASTA, or one per line as name: SEQUENCE. Overlapping ends assemble scarlessly; bare junctions get homology arms designed into the primers.
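One way to picture how overlapping ends assemble scarlessly is the sketch below. It joins fragments in the order given by finding the longest shared end and keeping it once. The function names and the 15 bp minimum overlap are assumptions for illustration; the tool's own junction logic is not documented here.

```python
def end_overlap(left, right, min_len=15):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when the shared end is shorter than `min_len`.
    """
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def assemble(fragments, min_len=15):
    """Join fragments in order, keeping each shared end exactly once."""
    product = fragments[0]
    for frag in fragments[1:]:
        k = end_overlap(product, frag, min_len)
        if k == 0:
            # A bare junction: homology would have to be added via the primers.
            raise ValueError("bare junction: no shared end of at least min_len bp")
        product += frag[k:]
    return product

# Toy fragments sharing a 20 bp end.
shared = "ACGTTGCAAGCTTGGATCCA"
frag_a = "ATGAAACCCGGG" + shared
frag_b = shared + "TTTAAACCCTGA"
product = assemble([frag_a, frag_b])
```

This only handles exact matches on one strand and a linear product; reverse-complement fragments and circular closure are left out to keep the example short.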
§ Library
My plasmids
§ Documentation
How TuringDNA works
Four tools, one workbench — what each does, what to trust, how to cite it.
Overview
TuringDNA is an integrated molecular-biology workbench built around your construct. The Plasmid Editor is the hub: import a sequence, then design CRISPR edits, analyse primers, or evolve a coding region on it without leaving your sequence.
Your individual sequences stay private — never shared, exposed, or reproduced, and they only leave this machine on the explicit opt-in network steps listed below. Only anonymous, aggregated signals improve our models (Privacy Policy).
Plasmid Editor
Import and work with circular or linear constructs end to end.
- Import. GenBank · FASTA · EMBL · SnapGene-style features · raw DNA. Annotations are read from the record where present.
- Maps & editing. Annotated circular and linear maps, plus a sequence view with feature colouring, find, translate, reverse-complement and in-place editing.
- Restriction. Scans the REBASE enzyme set (1,000+ enzymes), separates cutters from non-cutters, and simulates a digest on a virtual agarose gel.
- Cloning. Simulates Gibson assembly, Golden Gate, and restriction-ligation — junctions, fragment order, and the assembled product.
- Export. GenBank or FASTA; save constructs to your per-user library for 45 days.
Cloning and digests are in-silico predictions. Confirm overhangs, junctions and orientation before you commit reagents.
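What a circular-aware site scan and digest involve can be sketched in plain Python, as below. The editor itself uses Biopython/REBASE; this stand-alone version, its function names, and the toy sequence are assumptions for illustration. It searches the top strand only, which is sufficient for palindromic sites such as EcoRI.

```python
def find_sites(seq, site, circular=True):
    """0-based start positions of `site` in `seq`, including hits spanning the origin."""
    seq, site = seq.upper(), site.upper()
    # For a circular molecule, append the first len(site)-1 bases so that
    # a site wrapping around the origin is still found.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    start = search.find(site)
    while start != -1:
        hits.append(start)
        start = search.find(site, start + 1)
    return hits

def fragment_sizes(seq_len, cut_positions, circular=True):
    """Fragment lengths (largest first) after cutting at the given positions."""
    cuts = sorted(set(p % seq_len for p in cut_positions))
    if not cuts:
        return [seq_len]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(seq_len - cuts[-1] + cuts[0])  # fragment across the origin
        return sorted(sizes, reverse=True)
    edges = [0] + cuts + [seq_len]
    return sorted((b - a for a, b in zip(edges, edges[1:]) if b > a), reverse=True)

# Toy 30 bp plasmid with one EcoRI site inside and one spanning the origin.
plasmid = "AATTC" + "C" * 10 + "GAATTC" + "T" * 8 + "G"
hits = find_sites(plasmid, "GAATTC")
# EcoRI cuts G^AATTC, i.e. one base after the start of the site.
sizes = fragment_sizes(len(plasmid), [h + 1 for h in hits])
```

A linear scan of the same sequence misses the site that spans the origin, which is the case the circular handling exists for.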
CRISPR
Design SpCas9 guide RNAs against a pasted region (or a gene resolved by symbol/accession) — for knockout or base editing.
- Guides. Every NGG-PAM 20-mer with an on-target activity score (Doench-style sequence features), strand and position.
- Off-target. CFD scoring (Doench 2016) against other sites in the input. Base-edit mode adds CBE/ABE windows, the predicted edit, the amino-acid consequence, bystander flags and CRISPR-STOP knockouts.
- Knockout. Predicted indel spectrum, frameshift %, out-of-frame dominance, and a loss-of-function likelihood.
- Cloning oligos. Ready-to-order sense/antisense pairs for the standard vectors (BbsI/BsmBI; Cas12a geometry), copied straight to a vendor order.
Off-target search covers only the sequence you provide — it is not a whole-genome scan. Run your top guides through a genome-wide tool (e.g. CRISPOR) before ordering.
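The guide enumeration step ("every NGG-PAM 20-mer") can be sketched as a scan of both strands. The function names and the convention of reporting the leftmost plus-strand coordinate of the spacer are assumptions for this example; scoring is not included.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_guides(seq, spacer_len=20):
    """List (strand, position, spacer, pam) for every spacer followed by NGG.

    `position` is the 0-based leftmost coordinate of the spacer on the
    input (plus) strand, whichever strand the guide targets.
    """
    seq = seq.upper()
    n = len(seq)
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The last valid start leaves room for the spacer plus a 3 nt PAM.
        for i in range(n - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if pam[1:] == "GG":
                left = i if strand == "+" else n - i - spacer_len
                guides.append((strand, left, s[i : i + spacer_len], pam))
    return guides

plus_example = find_guides("A" * 20 + "TGG" + "C" * 5)
minus_example = find_guides("CCA" + "T" * 20)
```

In the second example the PAM is only visible on the reverse complement, so the guide is reported on the minus strand with its spacer covering input positions 3 to 22.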
Primer Analysis
Bring your candidate primers; TuringDNA evaluates and ranks them — it does not invent primers from scratch.
- Scoring. Nearest-neighbor Tm (SantaLucia), GC content, 3′-end stability, and hairpin / self-dimer heuristics for every primer.
- In-silico PCR. Binds each pair against your template and reports the intended amplicon plus any off-target products and their sizes.
- Specificity (opt-in). Checks candidates against NCBI nt and separates intended from off-target hits.
- Ranking. Combines the above into a recommended pair — no BLASTing one primer at a time.
Directed Evolution
Design a smart mutation library for an existing protein using ESM-2 zero-shot scoring. Provide a wild-type sequence (or send a CDS from the Plasmid Editor); receive a library of multi-mutant variants ranked by predicted evolutionary fitness, codon-optimized for your host, ready to order.
This is not de novo protein design. It's directed-evolution library generation: improving an existing protein along an existing axis (stability, expression, mild activity tuning).
- Parse & translate. FASTA · SnapGene · GenBank · EMBL · raw DNA · raw protein. Plasmid files surface a CDS picker; raw DNA with multiple stops triggers 6-frame ORF discovery.
- Zero-shot scoring. ESM-2 computes ΔLL = log P(mutant | x_WT) − log P(WT | x_WT) for every position × 19 substitutions (Meier et al. 2021 wild-type marginal scheme).
- Combinatorial search. Simulated annealing over the top-percentile pool of single-site mutations. Multiple restarts; cumulative ΣΔLL as fitness; stop-codon and duplicate-position penalties.
- Codon optimization. Reverse-translate with a host codon-usage table. Synonymously scrub BsaI / BsmBI / NotI sites for Golden Gate compatibility.
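The zero-shot scoring step above can be made concrete with a small sketch of the wild-type marginal scheme: one forward pass on the unmasked wild-type gives per-position scores over the 20 amino acids, and each substitution is scored as the log-probability difference from the wild-type residue. The logits below are toy numbers standing in for model output, and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_softmax(logits):
    """Numerically stable log-softmax over one position's scores."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]

def wt_marginal_scores(wild_type, logits):
    """ΔLL for every position x 19 substitutions.

    logits[i] holds 20 scores for position i, ordered as AMINO_ACIDS.
    Keys look like 'W58L' (wild-type residue, 1-based position, new residue).
    """
    scores = {}
    for i, wt in enumerate(wild_type):
        logp = dict(zip(AMINO_ACIDS, log_softmax(logits[i])))
        for aa in AMINO_ACIDS:
            if aa != wt:
                scores[f"{wt}{i + 1}{aa}"] = logp[aa] - logp[wt]
    return scores

# Toy logits, not real ESM-2 values.
wild_type = "MKW"
logits = [[0.0] * 20 for _ in wild_type]
logits[2][AMINO_ACIDS.index("L")] = 2.0  # pretend the model prefers L at position 3
logits[2][AMINO_ACIDS.index("W")] = 1.0
scores = wt_marginal_scores(wild_type, logits)
```

A positive score means the model assigns the substitution a higher probability than the wild-type residue in that context; summing these scores over a variant's mutations gives the ΣΔLL fitness.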
| Mutations / variant | Approx. functional retention |
|---|---|
| 1–2 | 70–85% |
| 3–4 | 50–70% |
| 5–6 | 30–55% |
| 7–8 | 15–40% |
| 9+ | < 25% |
Stay shallow. Cap Max mutations / variant at 3–4 unless you have a structural reason. Screen, don't trust.
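A minimal sketch of the combinatorial search with a mutation cap follows, assuming additive ΣΔLL fitness and a geometric cooling schedule. Unlike the penalties mentioned above, this version avoids duplicate positions by construction. All names, parameters and pool values are illustrative rather than the engine's own.

```python
import math
import random

def position(mutation):
    """1-based position from a string such as 'W58L'."""
    return int(mutation[1:-1])

def fitness(variant, pool):
    """Cumulative ΔLL of a set of mutations."""
    return sum(pool[m] for m in variant)

def anneal(pool, max_mutations=3, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over sets of single-site mutations from `pool`."""
    rng = random.Random(seed)
    names = sorted(pool)
    current = frozenset()
    best = current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = set(current)
        pick = rng.choice(names)
        if pick in candidate:
            candidate.remove(pick)
        else:
            # Replace any mutation already sitting at the same position.
            candidate = {m for m in candidate if position(m) != position(pick)}
            candidate.add(pick)
            if len(candidate) > max_mutations:
                candidate.remove(rng.choice(sorted(candidate - {pick})))
        candidate = frozenset(candidate)
        gain = fitness(candidate, pool) - fitness(current, pool)
        # Always accept improvements; accept losses with Boltzmann probability.
        if gain >= 0 or rng.random() < math.exp(gain / temp):
            current = candidate
        if fitness(current, pool) > fitness(best, pool):
            best = current
    return sorted(best, key=position), fitness(best, pool)

# Made-up pool of top-scoring single mutations and their ΔLL values.
pool = {"W58L": 1.2, "W58F": 0.9, "C49S": 0.7, "A10G": 0.4, "K77R": 0.3, "D5E": 0.1}
variant, score = anneal(pool, max_mutations=3)
```

Multiple restarts amount to calling `anneal` with different `seed` values and keeping the highest-scoring result.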
What ESM-2 sees: evolutionary plausibility learned from ~65 M UniRef50 sequences (conservation, coevolution, sequence context). What it can't see: explicit structure, rare-but-essential active-site roles, and epistasis between selected mutations. Predictions are weakest for membrane proteins, intrinsically disordered proteins (IDPs) and multi-domain assemblies.
Filter syntax (variant table): C49 = variants mutating residue 49 · W58L = exactly that substitution · gc>50, tm>58, fitness>2, bp<800 = numeric ranges.
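The filter syntax can be read as a small query language. The sketch below compiles a query into a predicate over table rows. It assumes tokens are separated by spaces or commas and combined with AND, and that rows are dicts with `mutations`, `gc`, `tm`, `fitness` and `bp` keys; those details are assumptions, not documented behaviour.

```python
import operator
import re

NUMERIC = re.compile(r"^(gc|tm|fitness|bp)([<>])(-?\d+(?:\.\d+)?)$")
EXACT = re.compile(r"^[A-Z]\d+[A-Z]$")    # e.g. W58L
RESIDUE = re.compile(r"^[A-Z]\d+$")       # e.g. C49
OPS = {"<": operator.lt, ">": operator.gt}

def compile_filter(query):
    """Turn a filter string into a function row -> bool (all tokens must match)."""
    tests = []
    for token in re.split(r"[,\s]+", query.strip()):
        if not token:
            continue
        numeric = NUMERIC.match(token.lower())
        if numeric:
            field, op, value = numeric.groups()
            tests.append(lambda row, f=field, o=OPS[op], v=float(value): o(row[f], v))
        elif EXACT.match(token.upper()):
            tests.append(lambda row, t=token.upper(): t in row["mutations"])
        elif RESIDUE.match(token.upper()):
            # Match any substitution at that wild-type residue and position.
            tests.append(lambda row, t=token.upper(): any(m[:-1] == t for m in row["mutations"]))
        else:
            raise ValueError(f"unrecognised filter token: {token!r}")
    return lambda row: all(test(row) for test in tests)

# Hypothetical table row.
row = {"mutations": ["C49S", "W58L"], "gc": 52.0, "tm": 60.1, "fitness": 2.4, "bp": 750}
keep = compile_filter("C49 gc>50")(row)
```

Binding each token's values as lambda defaults keeps every test independent of the loop variable, which is the usual pitfall when building predicates in a loop.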
What runs locally vs over the network
- Local: plasmid parsing, annotation, restriction & cloning simulation; CRISPR design, CFD off-target and oligo generation; primer scoring and in-silico PCR; ESM-2 inference, simulated annealing and codon optimization; all CSV / Excel / GenBank / FASTA export. Sequences never leave this machine for these steps.
- Opt-in network: NCBI BLAST (sequence identification and primer specificity — sends that sequence to NCBI); AlphaFold-DB / ESMFold structure embeds; Europe PMC literature lookup (IP Radar); synthesis-vendor redirects. Each is explicit and off by default.
Citations
Meier et al. (2021). Language models enable zero-shot prediction of the effects of mutations on protein function. NeurIPS 34.
Lin et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379:1123–1130.
Doench et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34:184–191.
Hsu et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31:827–832.
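Komor et al. (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533:420–424.
Gaudelli et al. (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551:464–471.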
Allawi & SantaLucia (1997). Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36:10581–10594.
Licenses of bundled components
- ESM-2 weights — MIT (Meta/FAIR)
- Transformers, Accelerate — Apache 2.0 (Hugging Face)
- PyTorch — BSD-style (Meta)
- Biopython (parsing, restriction, Tm) — BSD-derived
- Mol* viewer — MIT (PDBe / RCSB)
- Inter, JetBrains Mono — SIL Open Font License
- Lucide icons — ISC