FishNALM-8_H3K9me3

FishNALM-8_H3K9me3 is a fine-tuned version of FishNALM-8_pretrain for H3K9me3 prediction in fish genomics.

Model description

This repository contains a task-specific fine-tuned checkpoint from the FishNALM model family. The model was initialized from the pretrained base model FishNALM-8_pretrain and then fine-tuned for H3K9me3 prediction.

Task

Task name: H3K9me3 prediction
Task type: binary classification
Prediction target: H3K9me3-positive vs H3K9me3-negative genomic sequences

Examples of tasks covered by fine-tuned checkpoints in the FishNALM family:

  • CTCF TFBS prediction
  • Pou5f1 TFBS prediction
  • Sox2 TFBS prediction
  • histone modification prediction
  • promoter prediction
  • splice donor prediction
  • splice acceptor prediction
  • splice classification

Base model

  • Base model repository: xia-lab/FishNALM-8_pretrain
  • Model family: FishNALM
  • Initialization type: pretrained checkpoint + downstream fine-tuning

Training data

This model was fine-tuned on H3K9me3 prediction data from FishGUE.

Evaluation

  • Primary metric: MCC
  • Evaluation split / strategy: predefined train/validation/test split
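For reference, MCC (the Matthews correlation coefficient) can be computed directly from the confusion-matrix counts. The sketch below is a minimal, dependency-free illustration of the formula; the labels and predictions are hypothetical, not taken from this model's evaluation.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical gold labels and predictions (0 = negative, 1 = positive)
print(mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))  # ~0.333
```

In practice the same value can be obtained with `sklearn.metrics.matthews_corrcoef`.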

Intended uses

This model is intended for:

  • fish genomics sequence classification
  • downstream task inference on sequences similar to the fine-tuning setting
  • comparative benchmarking within fish genomic prediction tasks

Limitations

  • This is a task-specific fine-tuned model and should be used within the scope of H3K9me3 prediction.
  • Generalization to other species, tasks, or sequence lengths may be limited.
  • This is a research model and is not intended for clinical or diagnostic use.

How to use

Load tokenizer and model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-8_H3K9me3"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)
```

Example inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-8_H3K9me3"
sequence = "ATGCGTACGTTAGCTAGCTAGCTAGCTAGCTA"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)

# Tokenize the DNA sequence, padding/truncating to a fixed length of 512 tokens
inputs = tokenizer(
    sequence,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512,
)

# Forward pass without gradient tracking (inference only)
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probabilities = torch.softmax(logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1)

print("logits:", logits)
print("probabilities:", probabilities)
print("prediction:", prediction)
```

Label mapping

  • 0: negative
  • 1: positive
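The integer prediction from the inference example can be mapped back to these label names. A minimal sketch (the `id2label` dict below mirrors the table above and is assumed to match the checkpoint's `config.json`; the logits are hypothetical):

```python
# Label mapping from the table above (assumed to match the
# checkpoint's id2label config).
id2label = {0: "negative", 1: "positive"}

def predict_label(logits):
    """Map a row of class logits to its label string via argmax."""
    pred_id = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[pred_id]

print(predict_label([-1.0, 2.0]))  # "positive"
```

With the loaded model, the same mapping is available as `model.config.id2label`.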

Files in this repository

Typical files include:

  • config.json
  • model.safetensors
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • vocab.txt
  • README.md

Citation

If you use this model, please cite the FishNALM manuscript.

Contact

For questions, please contact: xqxia@ihb.ac.cn
