TecDec

Description: TecDec is a text classification model built on the TextCatCNN architecture from the spaCy NLP library. It performs binary classification, distinguishing technocratic from deliberative frames in text.

Language: Japanese

Training Data: The model was trained on a dataset of Japanese texts, primarily policy-oriented or related to policy debates. The dataset includes both synthetic and human-generated samples, labeled for technocratic versus deliberative frames using a teacher-student paradigm with Human-in-the-Loop (HITL) active learning.
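The card does not describe how samples were chosen for human review. As an illustration only, one common active-learning strategy is uncertainty sampling: the samples whose predicted score lies closest to 0.5 are sent to a human annotator. The sketch below assumes that strategy; the function name, the pool contents, and the scores are all hypothetical and are not taken from the TecDec training pipeline.

```python
from typing import List, Tuple


def select_for_review(scored: List[Tuple[str, float]], k: int = 2) -> List[str]:
    """Return the k texts whose score is closest to 0.5 (least certain).

    `scored` pairs each text with the model's probability for one of the
    two classes. This is a generic uncertainty-sampling sketch, not the
    selection rule actually used to build the TecDec dataset.
    """
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [text for text, _ in ranked[:k]]


# Hypothetical pool of unlabeled texts with made-up scores.
pool = [("text_a", 0.97), ("text_b", 0.52), ("text_c", 0.08), ("text_d", 0.41)]
to_review = select_for_review(pool, k=2)
```

In a loop of this kind, the human-corrected labels would be added to the training set and the model retrained before the next selection round.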

The training dataset contains a modified subset of llm-jp-corpus-v4.

The original data is licensed under CC-BY 4.0, CC-BY-SA 4.0, MIT, and ODC-BY 1.0.

Changes made:

  • Extracted a specific sub-portion of the original records (for the Japanese language).
  • Created synthetic data based on the original records.
  • Added custom labels for deep learning training.
  • Cleaned/formatted for use with spaCy.
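For readers unfamiliar with spaCy's text-classification training format, the last two steps might look roughly like the sketch below. It assumes the labels are named TECHNOCRATIC and DELIBERATIVE (hypothetical names, the card does not state them) and that records are (text, label) pairs. spaCy is imported lazily inside build_docbin so that the label helper works on its own; a blank Japanese pipeline additionally requires SudachiPy and its dictionary to be installed.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical label names; the actual names used by TecDec are not stated.
LABELS = ("TECHNOCRATIC", "DELIBERATIVE")


def to_cats(label: str) -> Dict[str, float]:
    """Convert one label into the mutually exclusive `cats` dict spaCy expects."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {name: 1.0 if name == label else 0.0 for name in LABELS}


def build_docbin(records: Iterable[Tuple[str, str]], lang: str = "ja"):
    """Pack (text, label) records into a DocBin for `spacy train`."""
    import spacy  # lazy import: only needed when actually building the corpus
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # tokenizer only; "ja" needs SudachiPy installed
    doc_bin = DocBin()
    for text, label in records:
        doc = nlp.make_doc(text)
        doc.cats = to_cats(label)
        doc_bin.add(doc)
    return doc_bin
```

The resulting object can be written with `doc_bin.to_disk("train.spacy")` and referenced from a spaCy training config.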

Note: This project is an independent derivative work and is not endorsed by the creators of llm-jp-corpus-v4.

Task: Binary text classification (technocratic vs. deliberative).

Usage: Intended for use in computational linguistics and Computational Critical Discourse Analysis (CCDA), particularly for analyzing policy discourse.
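A minimal usage sketch follows, assuming the model has been installed as a spaCy pipeline package named ja_tecdec_labeler (an assumption; check the repository for the actual package name and installation steps). Scores are read from doc.cats, which is where spaCy text classifiers store their output. The helper picks the highest-scoring label without assuming specific label names.

```python
from typing import Dict, Tuple


def top_label(cats: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label and its score from a `doc.cats` dict."""
    if not cats:
        raise ValueError("no category scores; is a textcat component loaded?")
    label = max(cats, key=cats.get)
    return label, cats[label]


def classify(text: str, model_name: str = "ja_tecdec_labeler") -> Tuple[str, float]:
    """Load the pipeline and classify one text.

    `model_name` is an assumed package name, not confirmed by the card.
    """
    import spacy  # lazy import so top_label can be used without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return top_label(doc.cats)
```

When classifying many documents, loading the pipeline once and iterating with `nlp.pipe(texts)` avoids repeated load time.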

GitHub: https://github.com/ugodevkit/tecdec

License: LGPL-3.0

Limitations: The model is specialized for Japanese policy texts and may not generalize to other languages or domains without fine-tuning.
