GAD-77M (Generative Autoregressive Decoder-77M)
GAD-77M is a custom-built, decoder-only language model (77 million parameters) designed with an agentic architecture. Unlike standard GPT-2 models, GAD-77M features specialized modules for long-term memory and multi-dimensional intent processing.
This version is a pre-training release. It reached a training loss of 1.18 on a specialized corpus focused heavily on astronomy, astrophysics, technology, world knowledge, forums, communities, conversations, and more.
Model Highlights
- Architecture: GAD (Generative Autoregressive Decoder)
- Parameters: ~77 Million
- Training Loss: 1.18 (Final Epoch)
- Specialization: High-accuracy retrieval of broad world knowledge (technology, astronomy, conversations, Wikipedia, forums, and more).
- Core Innovation: Integration of MultiIntentEvolver and AdaptiveMemory.
Advanced Architecture
GAD-77M goes beyond the standard Transformer block:
- Multi-Dimensional Intent Evolver: Uses parallel GRUs to track different layers of interaction (Emotional, Goal-oriented, and Factual) simultaneously.
- Adaptive Memory Module: A dedicated memory parameter space that updates during training, allowing the model to "anchor" specific factual knowledge better than standard embeddings.
- Self-Reflective Head: A confidence-scoring mechanism that evaluates the model's own output certainty.
- SwiGLU Activations & RMSNorm: Modern architectural choices used in state-of-the-art models like Llama 3 for better stability and performance.
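To make the last item concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward layer in their commonly published forms. It is not taken from the GAD-77M code: the function names, the toy dimensions, and the random weights are all assumptions chosen for illustration, and the actual model may differ in details such as epsilon, bias terms, or hidden size.

```python
import math
import random


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down (SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden = 4, 8  # toy sizes, not the real model dimensions

x = [1.0, -2.0, 3.0, 0.5]
normed = rms_norm(x, weight=[1.0] * d_model)
out = swiglu_ffn(
    normed,
    random_matrix(d_hidden, d_model, rng),  # gate projection
    random_matrix(d_hidden, d_model, rng),  # up projection
    random_matrix(d_model, d_hidden, rng),  # down projection
)
print(normed)
print(out)
```

Unlike LayerNorm, RMSNorm skips mean subtraction and only rescales, and SwiGLU replaces the single hidden projection of a classic feed-forward block with a gated pair of projections.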
Training Performance
The model was trained for 8 epochs, and the training loss fell substantially over the run:
- Initial Loss: ~5.8
- Final Loss: 1.18
This low training loss suggests a high degree of "knowledge compression," making the model a good candidate for further instruction tuning.
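For a rough sense of scale, a mean cross-entropy loss can be converted to perplexity with exp(loss). The sketch below assumes the reported values are per-token cross-entropy measured in nats, which this card does not state explicitly; the function name is illustrative.

```python
import math


def perplexity(cross_entropy_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


initial_ppl = perplexity(5.8)   # reported initial loss
final_ppl = perplexity(1.18)    # reported final loss

print(f"initial: {initial_ppl:.1f}, final: {final_ppl:.2f}")
# prints "initial: 330.3, final: 3.25"
```

Under that assumption, the model went from being about as uncertain as a uniform choice among roughly 330 tokens to one among roughly 3 on its training data. Because this is training loss, it says nothing by itself about performance on held-out text.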
Capabilities (Pre-training Version)
In its current state, the model acts as a powerful text-completer and knowledge retriever. It has shown high proficiency in:
- Describing stellar objects (e.g., Rigel, Orion's Belt, θ1 Ori).
- Managing technical data and scientific units.
- Maintaining context across complex astronomical descriptions.
How to Use
The model is served at this endpoint: https://rnevo2016--gad-agentic-api-fastapi-app.modal.run
For example, to request a completion from its /generate route:
import requests

# Text-generation route of the hosted endpoint
url = "https://rnevo2016--gad-agentic-api-fastapi-app.modal.run/generate"
data = {
    "prompt": "Orion constellation is known for",
    "max_new_tokens": 50,
    "temperature": 0.8,
}

response = requests.post(url, json=data, timeout=60)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()
print(result["text"])
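If you call the endpoint from several places, it may help to wrap the request in a small helper. The following is a sketch that uses only the Python standard library, so it runs without installing requests. The helper names, the argument checks, and the 60-second timeout are assumptions for illustration; the server's actual accepted ranges are not documented here, and the response is assumed to carry a "text" field as in the example above.

```python
import json
import urllib.request

# Route taken from the usage example above.
GENERATE_URL = "https://rnevo2016--gad-agentic-api-fastapi-app.modal.run/generate"


def build_payload(prompt, max_new_tokens=50, temperature=0.8):
    """Check the arguments and return the JSON body as a dict.

    The checks below are client-side assumptions, not documented server rules.
    """
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }


def generate(prompt, max_new_tokens=50, temperature=0.8, timeout=60):
    """POST the payload to the endpoint and return the generated text."""
    body = json.dumps(build_payload(prompt, max_new_tokens, temperature))
    request = urllib.request.Request(
        GENERATE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    return result["text"]
```

Calling generate("Orion constellation is known for") would then return the completion as a string, provided the endpoint is reachable.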
