GAD-77M (Generative Autoregressive Decoder, 77M):

GAD-77M is a custom-built, decoder-only language model (77 million parameters) designed with an agentic architecture. Unlike standard GPT-2 models, GAD-77M features specialized modules for long-term memory and multi-dimensional intent processing.

This specific version is a pre-training release, having reached a remarkably low training loss of 1.18 on a specialized corpus focused heavily on astronomy, astrophysics, technology, world knowledge, forums, communities, conversations, and more.

🚀 Model Highlights

  • Architecture: GAD (Generative Autoregressive Decoder)
  • Parameters: ~77 Million
  • Training Loss: 1.18 (Final Epoch)
  • Specialization: High-accuracy recall of broad world knowledge (technology, astronomy, conversations, Wikipedia, forums, and more).
  • Core Innovation: Integration of MultiIntentEvolver and AdaptiveMemory.

🧠 Advanced Architecture

GAD-77M goes beyond the standard Transformer block:

  1. Multi-Dimensional Intent Evolver: Uses parallel GRUs to track different layers of interaction (Emotional, Goal-oriented, and Factual) simultaneously; see the sketch after this list.
  2. Adaptive Memory Module: A dedicated memory parameter space that updates during training, allowing the model to "anchor" specific factual knowledge better than standard embeddings.
  3. Self-Reflective Head: A confidence-scoring mechanism that evaluates the model's own output certainty.
  4. SwiGLU Activations & RMSNorm: Modern architectural choices used in state-of-the-art models like Llama 3 for better stability and performance.
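The internals of these modules haven't been published, so the following is a minimal PyTorch sketch of how the two core innovations (items 1 and 2) could be wired. Only the class names MultiIntentEvolver and AdaptiveMemory come from this card; every shape, slot count, and wiring choice below is an illustrative assumption:

import torch
import torch.nn as nn

class MultiIntentEvolver(nn.Module):
    # Hypothetical sketch: three parallel GRUs track the emotional,
    # goal-oriented, and factual layers of an interaction side by side.
    def __init__(self, d_model: int, d_intent: int = 64):
        super().__init__()
        self.tracks = nn.ModuleDict({
            name: nn.GRU(d_model, d_intent, batch_first=True)
            for name in ("emotional", "goal", "factual")
        })
        # Fold the concatenated intent states back into the residual stream.
        self.merge = nn.Linear(3 * d_intent, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        states = [gru(hidden)[0] for gru in self.tracks.values()]
        return hidden + self.merge(torch.cat(states, dim=-1))

class AdaptiveMemory(nn.Module):
    # Hypothetical sketch: a learnable bank of memory slots queried via
    # cross-attention, giving facts a home outside the token embeddings.
    def __init__(self, d_model: int, n_slots: int = 128, n_heads: int = 4):
        super().__init__()
        # d_model must be divisible by n_heads for multi-head attention.
        self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        mem = self.slots.unsqueeze(0).expand(hidden.size(0), -1, -1)
        out, _ = self.attn(hidden, mem, mem)  # tokens query the memory slots
        return hidden + out

In a full block, layers like these would presumably be interleaved with the SwiGLU feed-forward layers and RMSNorm mentioned in item 4.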

📊 Training Performance

The model was trained for 8 epochs. The loss curve showed exceptional convergence:

  • Initial Loss: ~5.8
  • Final Loss: 1.18

This low final loss indicates a high degree of "knowledge compression," making the model an ideal candidate for further instruction tuning.
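For intuition, if the reported loss is a mean cross-entropy in nats per token (an assumption; the card does not specify), it converts to perplexity with exp():

import math

# Perplexity = exp(cross-entropy), assuming the loss is in nats per token.
print(round(math.exp(5.8)))      # ~330  (initial)
print(round(math.exp(1.18), 2))  # ~3.25 (final)

Under that assumption, a perplexity near 3 means the model is, on average, about as uncertain as if it were choosing among roughly three equally likely next tokens on its training corpus.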

🔭 Capabilities (Pre-training Version)

In its current state, the model acts as a powerful text-completer and knowledge retriever. It has shown high proficiency in:

  • Describing stellar objects (e.g., Rigel, Orion's Belt, θ1 Ori).
  • Managing technical data and scientific units.
  • Maintaining context across complex astronomical descriptions.

💻 How to Use

Use this endpoint: https://rnevo2016--gad-agentic-api-fastapi-app.modal.run

For example:

import requests

# Hosted /generate endpoint for GAD-77M (see above)
url = "https://rnevo2016--gad-agentic-api-fastapi-app.modal.run/generate"

data = {
    "prompt": "Orion constellation is known for",
    "max_new_tokens": 50,    # length of the completion, in tokens
    "temperature": 0.8       # sampling temperature; lower is more deterministic
}

response = requests.post(url, json=data)
response.raise_for_status()  # surface HTTP errors instead of parsing bad JSON
result = response.json()

print(result["text"])