TecDec
Description: TecDec is a text classification model built on the TextCatCNN architecture from the spaCy NLP library. It performs binary classification, distinguishing technocratic from deliberative frames in text.
Language: Japanese
Training Data: The model was trained on a dataset of Japanese texts, primarily policy-oriented or related to policy debates. The dataset includes both synthetic and human-generated samples, labeled for technocratic versus deliberative frames using a teacher-student paradigm with Human-in-the-Loop (HITL) active learning.
This dataset contains a modified subset of llm-jp-corpus-v4.
The original data is licensed under CC-BY 4.0, CC-BY-SA 4.0, MIT, and ODC-BY 1.0.
Changes made:
- Extracted a specific sub-portion of the original records (for the Japanese language).
- Created synthetic data based on the original records.
- Added custom labels for deep learning training.
- Cleaned/formatted for use with spaCy.
Note: This project is an independent derivative work and is not endorsed by the creators of llm-jp-corpus-v4.
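To illustrate the "formatted for use with spaCy" step, here is a minimal sketch of how labeled texts could be packed into spaCy's binary training format. It is not the project's actual preprocessing code: the label names in `LABELS` and the helper names `to_cats` and `build_docbin` are hypothetical, and the real labels may differ.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical label names; the labels used by the released model may differ.
LABELS = ("TECHNOCRATIC", "DELIBERATIVE")


def to_cats(label: str, labels: Tuple[str, ...] = LABELS) -> Dict[str, float]:
    """Build the one-hot category dict that spaCy's textcat expects in doc.cats."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return {name: 1.0 if name == label else 0.0 for name in labels}


def build_docbin(nlp, samples: Iterable[Tuple[str, str]]):
    """Tokenize (text, label) pairs and collect them in a spaCy DocBin."""
    # Imported here so that to_cats stays usable without spaCy installed.
    from spacy.tokens import DocBin

    doc_bin = DocBin()
    for text, label in samples:
        doc = nlp.make_doc(text)  # tokenization only, no pipeline components
        doc.cats = to_cats(label)
        doc_bin.add(doc)
    return doc_bin
```

With a Japanese pipeline loaded as `nlp`, `build_docbin(nlp, samples).to_disk("train.spacy")` would write a file that `spacy train` can read; the file name is only an example.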
Task: Binary text classification (technocratic vs. deliberative).
Usage: Intended for use in computational linguistics and Computational Critical Discourse Analysis (CCDA), particularly for analyzing policy discourse.
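A minimal usage sketch follows. It assumes the model is installed as a spaCy package named `ja_tecdec_labeler` (inferred from the repository name, not confirmed by the source) and that the text classifier writes its scores to `doc.cats`, as spaCy's built-in `textcat` component does. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def load_pipeline(name: str = "ja_tecdec_labeler"):
    """Load the pipeline; the default package name is an assumption."""
    import spacy  # requires spaCy and the model package to be installed

    return spacy.load(name)


def top_label(cats: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring (label, score) pair from a doc.cats dict."""
    label = max(cats, key=cats.get)
    return label, cats[label]


def classify(nlp, texts: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Run texts through the pipeline and collect the top label for each."""
    results = []
    for doc in nlp.pipe(texts):
        label, score = top_label(doc.cats)
        results.append((doc.text, label, score))
    return results
```

Calling `classify(load_pipeline(), texts)` on a list of Japanese policy sentences returns one `(text, label, score)` tuple per input. The label strings come from the trained model, so inspect `doc.cats` once to see what they are before filtering on them.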
GitHub: https://github.com/ugodevkit/tecdec
License: LGPL-3.0
Limitations: The model is specialized for Japanese policy texts and may not generalize to other languages or domains without fine-tuning.
Model: ugo86/ja_tecdec_labeler
Base model: spacy/ja_core_news_lg