# Technical Debt & Refactoring Plan

**Created**: 2026-01-21
**Status**: Pending - UX improvements in progress first
**Last Updated**: 2026-01-21

---

## Executive Summary

GameConfigIdeaEditBrainstorm is a sophisticated AI-powered game design platform with ~24,343 lines of Python across 79 files. While feature-rich, it has accumulated technical debt that impacts maintainability, testability, and extensibility.

**Decision**: Focus on UX improvements first to better understand real usage patterns, then return to this restructuring plan with concrete insights.

---

## Current Architecture Overview

```
├── app.py (4,104 lines)                  # Monolithic main file - PRIMARY CONCERN
├── Core Engine
│   ├── my_text_game_engine_attempt.py
│   ├── game_state.py
│   ├── condition_evaluator.py
│   └── game_configs.py
├── AI/ML Integration
│   ├── leveraging_machine_learning.py
│   └── llm_playtester.py
├── UI Tabs (ui_tabs/)                    # Already modularized - GOOD
├── Exporters (exporters/)                # 14 platforms - GOOD separation
├── External Ports
│   ├── narrativeengine_hfport/
│   ├── storygenattempt_hfport/
│   └── dnd_game_master_hfport/
└── Scenario Templates (*_scenarios.py)
```

---

## Issues by Priority

### P0 - Critical (Blocks scaling)

| Issue | Impact | Effort | Notes |
|-------|--------|--------|-------|
| Monolithic `app.py` (4,104 lines) | Hard to maintain, test, or onboard contributors | High | Break into feature modules |
| No automated tests | Can't refactor safely | Medium | Add pytest suite |
| Tight UI-engine coupling | Can't unit-test game logic | High | Extract pure engine layer |

### P1 - High (Impacts development velocity)

| Issue | Impact | Effort | Notes |
|-------|--------|--------|-------|
| Mixed state management (Player + GameState) | Confusing, potential bugs | Medium | Complete migration to GameState |
| Hardcoded list of 60+ LLMs | Hard to maintain/extend | Low | Create ModelRegistry class |
| Sparse error handling | Silent failures confuse users | Medium | Add structured logging |
| Lambda consequences + declarative effects coexisting | Inconsistent, harder to validate | Medium | Migrate all to declarative |

### P2 - Medium (Quality of life)

| Issue | Impact | Effort | Notes |
|-------|--------|--------|-------|
| Code duplication in exporters | Maintenance burden | Medium | Extract base exporter class |
| Missing type hints | Weaker IDE support, more undetected bugs | Low | Add progressively |
| Inconsistent naming | Cognitive load | Low | Establish conventions |
| Magic strings/numbers | Bugs, hard to refactor | Low | Create enums/constants |

### P3 - Low (Nice to have)

| Issue | Impact | Effort | Notes |
|-------|--------|--------|-------|
| Exporter quality variance | Some exports may fail | Medium | Add capability metadata |
| No caching for LLM inferences | Identical requests are recomputed | Medium | Add caching layer |
| Sparse docstrings | Onboarding difficulty | Low | Document as we go |

---

## Proposed Refactoring Phases

### Phase 1: Foundation (After UX work)

- [ ] Extract `app.py` into logical modules:
  - `app_core.py` - Gradio app setup, shared state
  - `app_generation.py` - Content generation handlers
  - `app_playtest.py` - Playtest/preview handlers
  - `app_export.py` - Export handlers
  - `app_media.py` - Media generation handlers
- [ ] Add basic pytest infrastructure
- [ ] Create constants/enums for magic strings

### Phase 2: Engine Isolation

- [ ] Extract pure game engine (no Gradio dependencies)
- [ ] Complete Player → GameState migration
- [ ] Migrate lambda consequences to declarative effects
- [ ] Add engine unit tests

### Phase 3: ML Infrastructure

- [ ] Create ModelRegistry class with metadata
- [ ] Add structured error handling + logging
- [ ] Implement inference caching

### Phase 4: Polish

- [ ] Extract shared exporter base class
- [ ] Add type hints throughout
- [ ] Comprehensive documentation pass
- [ ] Add integration tests

---

## Metrics to Track

- Lines in `app.py` (target: <500)
- Test coverage % (target: >60%)
- Average function length (target: <50 lines)
- Number of untyped functions (target: 0)

---

## UX Insights to Gather First

Before restructuring, document insights from UX work:

- [ ] Which tabs/features are actually used most?
- [ ] What are common user workflows?
- [ ] Where do users get confused or stuck?
- [ ] Which exporters are production-quality vs experimental?
- [ ] What error messages do users encounter?

These insights will inform which modules to prioritize and how to structure the codebase for real usage patterns.

---

## Notes / Updates

*Add notes here as UX work progresses*

- 2026-01-21: Plan created. Starting UX improvements first.
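
For the P1 item "Create ModelRegistry class" (Phase 3), a minimal sketch of what a registry with metadata could look like, assuming each model needs only an id, a family label, an approximate size, and some tags. `ModelInfo`, `ModelRegistry`, `choices()`, and every field name here are hypothetical, and the model ids are placeholders rather than entries from the app's actual list. The idea is that a UI dropdown would be populated from `choices()` instead of a bare hardcoded list, and that duplicates or typos would fail loudly at registration time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Metadata for one selectable LLM (field names are hypothetical)."""
    model_id: str                        # identifier passed to the inference backend
    family: str                          # grouping label for the UI
    params_b: float                      # approximate parameter count, in billions
    tags: frozenset[str] = frozenset()   # free-form labels such as "chat"


class ModelRegistry:
    """One place to declare which models the app offers, instead of a bare list."""

    def __init__(self) -> None:
        self._models: dict[str, ModelInfo] = {}

    def register(self, info: ModelInfo) -> None:
        # Reject duplicate ids so copy-paste mistakes surface immediately.
        if info.model_id in self._models:
            raise ValueError(f"duplicate model id: {info.model_id}")
        self._models[info.model_id] = info

    def get(self, model_id: str) -> ModelInfo:
        try:
            return self._models[model_id]
        except KeyError:
            raise KeyError(f"unknown model id: {model_id}") from None

    def choices(self, *, tag: str | None = None, max_params_b: float | None = None) -> list[str]:
        """Model ids for a dropdown, optionally filtered, sorted for a stable order."""
        return sorted(
            m.model_id
            for m in self._models.values()
            if (tag is None or tag in m.tags)
            and (max_params_b is None or m.params_b <= max_params_b)
        )


# Usage with placeholder ids (not the app's real model list).
registry = ModelRegistry()
registry.register(ModelInfo("example-org/tiny-chat", "tiny", 0.5, frozenset({"chat"})))
registry.register(ModelInfo("example-org/mid-instruct", "mid", 7.0, frozenset({"chat", "instruct"})))
print(registry.choices(tag="chat", max_params_b=1.0))  # ['example-org/tiny-chat']
```

Keeping the metadata in a frozen dataclass means filters such as "small models only" become queries on the registry rather than separate hand-maintained lists.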
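
For the P1 item "Sparse error handling / Add structured logging", one possible starting point using only the standard library `logging` module: a formatter that emits one JSON object per line, with per-call context attached through the documented `extra` argument. `JsonFormatter`, `get_logger`, the `"context"` key, and the `"exporters"` logger name are assumptions for illustration; a third-party library such as structlog is an alternative if more is needed.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed as extra={"context": {...}} become attributes on the record.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-JSON values (paths, enums) from crashing the logger.
        return json.dumps(entry, default=str)


def get_logger(name: str) -> logging.Logger:
    """Return a logger with a JSON handler attached exactly once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # avoid duplicate lines if the root logger is configured
    return logger


# Usage: log a failure with context instead of swallowing it silently.
log = get_logger("exporters")
try:
    raise ValueError("missing start location")
except ValueError:
    log.exception("export failed", extra={"context": {"platform": "example", "config": "demo"}})
```

Machine-readable lines like these make it easier to answer the "What error messages do users encounter?" question from the UX section by grepping or aggregating logs.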
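
For the P1 item "Lambda consequences + declarative effects coexisting" (and the Phase 1 item on enums for magic strings), a sketch of why declarative effects are easier to validate: a consequence written as a lambda can only be checked by running it, while an effect stored as data can be inspected when a config is loaded. The effect schema (`op`, `item`, `flag`, `stat`, ...), `EffectOp`, `validate_effect`, `apply_effect`, and the cut-down `GameState` below are all hypothetical stand-ins, not the project's actual `game_state.py` or `condition_evaluator.py` APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any

# Before (hypothetical): consequence=lambda player: player.inventory.append("lantern")
# After: {"op": "add_item", "item": "lantern"}


class EffectOp(str, Enum):
    """Allowed effect operations; replaces scattered magic strings."""
    ADD_ITEM = "add_item"
    REMOVE_ITEM = "remove_item"
    SET_FLAG = "set_flag"
    ADJUST_STAT = "adjust_stat"


# Required keys per op, checked before any effect runs.
REQUIRED_KEYS: dict[EffectOp, set[str]] = {
    EffectOp.ADD_ITEM: {"item"},
    EffectOp.REMOVE_ITEM: {"item"},
    EffectOp.SET_FLAG: {"flag", "value"},
    EffectOp.ADJUST_STAT: {"stat", "delta"},
}


@dataclass
class GameState:
    """Hypothetical stand-in with only the fields this sketch needs."""
    inventory: list[str] = field(default_factory=list)
    flags: dict[str, bool] = field(default_factory=dict)
    stats: dict[str, int] = field(default_factory=dict)


def validate_effect(effect: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the effect is well-formed."""
    try:
        op = EffectOp(effect.get("op"))
    except ValueError:
        return [f"unknown op: {effect.get('op')!r}"]
    missing = REQUIRED_KEYS[op] - set(effect)
    return [f"{op.value}: missing key {k!r}" for k in sorted(missing)]


def apply_effect(state: GameState, effect: dict[str, Any]) -> None:
    """Apply one already-validated effect to the state in place."""
    op = EffectOp(effect["op"])
    if op is EffectOp.ADD_ITEM:
        state.inventory.append(effect["item"])
    elif op is EffectOp.REMOVE_ITEM:
        if effect["item"] in state.inventory:
            state.inventory.remove(effect["item"])
    elif op is EffectOp.SET_FLAG:
        state.flags[effect["flag"]] = bool(effect["value"])
    elif op is EffectOp.ADJUST_STAT:
        state.stats[effect["stat"]] = state.stats.get(effect["stat"], 0) + int(effect["delta"])


# Usage: the malformed effect is reported instead of failing mid-game.
state = GameState()
effects = [
    {"op": "add_item", "item": "lantern"},
    {"op": "adjust_stat", "stat": "health", "delta": -2},
    {"op": "set_flag", "flag": "met_guard"},  # missing "value"
]
for eff in effects:
    problems = validate_effect(eff)
    if problems:
        print(problems)  # ["set_flag: missing key 'value'"]
    else:
        apply_effect(state, eff)
print(state.inventory, state.stats)  # ['lantern'] {'health': -2}
```

Because `validate_effect` needs no game state, it could run over every scenario template in a single test, which also supports the P0 goal of unit-testing game logic without the UI.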
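
For the P2 item "Extract base exporter class" together with the P3 item "Add capability metadata", a sketch of one way the 14 exporters could share a pipeline: the base class owns validation and capability checks, and each platform only implements `render()`. `BaseExporter`, `PlainTextExporter`, the `capabilities` attribute, and the config shape (`"features"`, `"locations"`) are hypothetical; the real exporters in `exporters/` may need a different split.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, ClassVar


class BaseExporter(ABC):
    """Shared export pipeline (hypothetical); subclasses implement render() only."""

    platform: ClassVar[str]
    # Capability metadata: features this platform can represent faithfully.
    capabilities: ClassVar[frozenset[str]] = frozenset()

    def export(self, config: dict[str, Any]) -> tuple[str, list[str]]:
        """Validate the config, render it, and return (output_text, warnings)."""
        if not config.get("locations"):
            raise ValueError(f"{self.platform}: config has no locations to export")
        warnings = self.check_capabilities(config)
        return self.render(config), warnings

    def check_capabilities(self, config: dict[str, Any]) -> list[str]:
        """Warn about features the config uses that this platform cannot express."""
        used = set(config.get("features", []))
        unsupported = sorted(used - self.capabilities)
        return [f"{self.platform} does not support '{f}'" for f in unsupported]

    @abstractmethod
    def render(self, config: dict[str, Any]) -> str:
        """Platform-specific conversion of the config to output text."""


class PlainTextExporter(BaseExporter):
    platform = "plain_text"
    capabilities = frozenset({"choices"})

    def render(self, config: dict[str, Any]) -> str:
        lines = []
        for name, location in config["locations"].items():
            lines.append(f"== {name} ==")
            lines.append(location.get("description", ""))
        return "\n".join(lines)


# Usage: the inventory feature is flagged instead of silently dropped.
config = {
    "features": ["choices", "inventory"],
    "locations": {"tavern": {"description": "A smoky room."}},
}
text, warnings = PlainTextExporter().export(config)
print(warnings)  # ["plain_text does not support 'inventory'"]
```

Surfacing these warnings in the export tab is one way to make the "production-quality vs experimental" distinction from the UX section visible to users.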
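
For the P3 item "No caching for LLM inferences" (Phase 3, "Implement inference caching"), a minimal in-memory LRU sketch keyed on model id, prompt, and generation parameters. `InferenceCache` and `fake_generate` are hypothetical; a real wrapper would call whatever generation function `leveraging_machine_learning.py` exposes. Two assumptions are worth stating: the parameters must be JSON-serializable to build the key, and caching is only a safe substitute when returning the same text for the same request is acceptable (for example with deterministic decoding), since with sampling a cache hit changes behavior.

```python
from __future__ import annotations

import hashlib
import json
from collections import OrderedDict
from typing import Any, Callable


class InferenceCache:
    """In-memory LRU cache around a text-generation function (hypothetical wrapper)."""

    def __init__(self, generate: Callable[..., str], max_entries: int = 256) -> None:
        self._generate = generate
        self._max_entries = max_entries
        self._store: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model_id: str, prompt: str, params: dict[str, Any]) -> str:
        # sort_keys=True makes the key independent of keyword-argument order.
        payload = json.dumps(
            {"model": model_id, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, model_id: str, prompt: str, **params: Any) -> str:
        key = self._key(model_id, prompt, params)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = self._generate(model_id, prompt, **params)
        self._store[key] = result
        if len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


# Usage with a stand-in generator; a real one would call the model backend.
calls: list[str] = []


def fake_generate(model_id: str, prompt: str, **params: Any) -> str:
    calls.append(prompt)
    return f"[{model_id}] {prompt.upper()}"


cached = InferenceCache(fake_generate, max_entries=2)
cached("example-model", "describe a tavern", temperature=0.0)
cached("example-model", "describe a tavern", temperature=0.0)  # served from the cache
print(cached.hits, cached.misses, len(calls))  # 1 1 1
```

An `OrderedDict` keeps the LRU bookkeeping to two calls (`move_to_end`, `popitem(last=False)`); a persistent on-disk cache would be the natural next step if results should survive restarts.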