In a Training Loop 🔄

Stefan Schweter PRO

stefan-it · https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

liked a model 1 day ago

nvidia/Gemma-4-31B-IT-NVFP4

commented on a paper 1 day ago

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

upvoted a paper 1 day ago

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

upvoted a paper 1 day ago

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

Paper • 2604.20447 • Published 4 days ago • 2

upvoted an article 3 days ago

Article

mlinter: a linter for Transformers modeling files

4 days ago • 6

upvoted a collection 10 days ago

GlotSuite

GlotSuite: Paving the Way for Bringing Generative AI to Underserved Communities • 17 items • Updated 10 days ago • 3

upvoted an article 18 days ago

Article

How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs

18 days ago • 59

upvoted a collection 23 days ago

Gemma 4

8 items • Updated 23 days ago • 682

upvoted a collection about 1 month ago

fiNERweb

A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 3

upvoted a paper about 1 month ago

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

Paper • 2603.19223 • Published Mar 19 • 31

upvoted 2 collections about 1 month ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 5 days ago • 126

Nemotron-Cascade 2

Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • 4 items • Updated 5 days ago • 50

upvoted a changelog about 1 month ago

Hugging Face Changelog

Protected Spaces with Public URLs

Mar 20 • 122

upvoted a collection about 1 month ago

Olmo Hybrid

6 items • Updated Mar 5 • 25

upvoted a paper about 1 month ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published Mar 17 • 21

upvoted 2 articles about 1 month ago

Article

State of Open Source on Hugging Face: Spring 2026

Mar 17 • 79

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Oct 7, 2024 • 70

upvoted 3 papers about 1 month ago

Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA

Paper • 2603.14782 • Published Mar 16 • 1

Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

Paper • 2603.15130 • Published Mar 16 • 1

Effective Distillation to Hybrid xLSTM Architectures

Paper • 2603.15590 • Published Mar 16 • 33

upvoted 2 articles about 1 month ago

Article

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Mar 9 • 26

Article

FlashHead: Accelerating Language Model Inference ~ Efficient drop-in replacement for the classification head

Mar 11 • 2

upvoted a paper about 1 month ago

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82