Prot2Text-V2 Demo

Prot2Text-V2 treats a protein sequence as if it were another language and translates it into English. Supply a raw amino acid sequence and the model returns a clear, human-readable paragraph describing what the protein does.

The paper describing Prot2Text-V2 has been accepted to the NeurIPS 2025 main conference. This demo pairs fast experimentation with explainability-minded outputs.

  • Input: protein sequence using IUPAC single-letter amino acid codes (20 canonical amino acids).
  • Output: polished descriptions of predicted function, localization cues, and structural hints.
  • Why it matters: accelerates protein characterization, lab annotation, and downstream hypothesis building.
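
The input format above can be checked before a sequence is submitted. The following is a minimal sketch, not part of Prot2Text-V2 itself: the helper name clean_sequence and its error messages are hypothetical, and it assumes the demo accepts only the 20 canonical single-letter codes, as stated in the input bullet.

```python
# Hypothetical pre-submission check; not an API of Prot2Text-V2.
# The 20 canonical amino acids in IUPAC single-letter notation.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, upper-case, and reject non-canonical residue codes."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    invalid = sorted(set(seq) - CANONICAL_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-canonical residue codes: {', '.join(invalid)}")
    return seq


print(clean_sequence("mkt ayi\nakq r"))  # prints MKTAYIAKQR
```

Ambiguity codes such as X or B are rejected here rather than silently dropped, so that a pasted sequence is never altered without the user noticing.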

Model architecture at a glance

  • Protein language model encoder: facebook/esm2_t36_3B_UR50D.
  • Modality adapter: lightweight bridge aligning protein embeddings with the language model.
  • Natural language decoder: meta-llama/Llama-3.1-8B-Instruct for articulate descriptions.
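
To illustrate the role of the modality adapter, here is a toy sketch that maps per-residue vectors from one width to another. It assumes a single linear projection, which is an illustrative choice only: the source does not describe the adapter's actual design, and make_adapter is a hypothetical name. The toy uses tiny dimensions; in the real stack the encoder and decoder hidden sizes are 2560 and 4096 according to their published model configurations.

```python
import random


def make_adapter(d_protein: int, d_text: int, seed: int = 0):
    """Build a toy linear projection from d_protein to d_text dimensions.

    Assumption: one linear layer with random weights, for illustration only.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(d_protein)]
               for _ in range(d_text)]
    bias = [0.0] * d_text

    def project(residue_embeddings):
        """Project each per-residue vector into the decoder's embedding width."""
        projected = []
        for vec in residue_embeddings:
            if len(vec) != d_protein:
                raise ValueError(f"expected {d_protein} values, got {len(vec)}")
            projected.append([
                sum(w * x for w, x in zip(row, vec)) + b
                for row, b in zip(weights, bias)
            ])
        return projected

    return project


adapter = make_adapter(d_protein=8, d_text=12)
protein_states = [[0.1] * 8 for _ in range(5)]  # 5 residues, 8 features each
projected = adapter(protein_states)
print(len(projected), len(projected[0]))  # prints 5 12
```

The sequence length is preserved and only the feature width changes, which is what lets the decoder consume the projected residues alongside ordinary text token embeddings.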

Resources

Sample sequences
  • Model stack: Facebook ESM2 encoder + Llama 3.1 8B instruction-tuned decoder.
  • Token budget: generation stops once the configured "Max new tokens" limit is reached, so longer descriptions may be cut off.
  • Attribution: Outputs are predictions; validate experimentally before publication.
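
The token budget can be pictured with a small decoding loop. This is a sketch under stated assumptions, not the demo's actual generation code: generate and toy_decoder are hypothetical names, the toy decoder replays placeholder words instead of calling a language model, and "<eos>" stands in for whatever end-of-sequence token the real decoder uses.

```python
def generate(next_token, max_new_tokens, eos="<eos>"):
    """Collect tokens until the end-of-sequence token or the budget is reached."""
    generated = []
    while len(generated) < max_new_tokens:
        token = next_token(generated)
        if token == eos:
            break
        generated.append(token)
    return generated


# Placeholder output, not a real model prediction.
WORDS = ["Binds", "ATP", "and", "hydrolyzes", "it", "<eos>"]


def toy_decoder(generated):
    """Toy stand-in for the decoder: returns the next scripted word."""
    return WORDS[len(generated)]


print(" ".join(generate(toy_decoder, max_new_tokens=3)))   # prints Binds ATP and
print(" ".join(generate(toy_decoder, max_new_tokens=50)))  # prints Binds ATP and hydrolyzes it
```

With a budget of 3 the description is cut mid-sentence, while a generous budget lets the decoder stop on its own; a description that ends abruptly is a sign that the limit should be raised.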