Test-Driven Development (TDD) Transformation Plan

SSOT Key: tdd-transformation
Objective: Transform the development workflow to Test-Driven Development and maintain CI-enforced coverage quality.


Executive Summary

Current State:

- Coverage threshold: No-regression policy (must not decrease from baseline) + unified 96% target (backend + frontend + scripts)
- Test files: 100
- Source files: 75
- Test-to-source ratio: 1.7:1 (22,655 test LOC / 13,162 source LOC)
- Well-organized test structure aligned with SSOT domains
- CI Coverage Enforcement: ✅ NOW ENFORCED (post-merge validation added)

Target State:

- Coverage threshold: 96% unified coverage (backend + frontend + scripts, measured via unified-coverage.json)
- TDD-first development workflow
- Documented testing patterns and best practices
- Service layer coverage: 80%+ (currently 16.59%)


Current Testing Infrastructure Analysis

Test Configuration

| Component | Configuration | Location |
|-----------|---------------|----------|
| Test Framework | pytest + pytest-asyncio + pytest-cov | apps/backend/pyproject.toml |
| Coverage Tool | pytest-cov with XML + terminal reports | pyproject.toml `[tool.pytest.ini_options]` |
| Local Threshold | 90% backend (pyproject.toml); 96% unified (calculate_unified_coverage.py) | apps/backend/pyproject.toml |
| CI Threshold | Monitored (post-merge validation) | .github/workflows/ci.yml |
| Parallel Execution | pytest-xdist (4 workers local, auto in CI) | moon.yml test-execution |
| Database Lifecycle | Auto-create/cleanup via context manager | scripts/test_lifecycle.py |
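
The lifecycle helper in the last row can be pictured as a context manager that provisions a throwaway database and always cleans up, even when a test fails. A minimal sketch of that idea, using SQLite as a stand-in; the real scripts/test_lifecycle.py likely targets the project's actual database:

```python
# Hedged sketch only: SQLite stands in for the real test database here.
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def test_database():
    """Create a throwaway database file, yield a connection, always clean up."""
    path = Path(tempfile.mkdtemp()) / "test.db"
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()
        path.unlink(missing_ok=True)

with test_database() as conn:
    conn.execute("CREATE TABLE entries (id INTEGER)")
    conn.execute("INSERT INTO entries VALUES (1)")
    assert conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0] == 1
```

The `finally` block is the point: cleanup runs regardless of test outcome, so no stale databases accumulate between runs.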

Test Organization (Domain-Based)

Tests are organized by domain matching the source structure:

tests/
├── conftest.py          # Shared fixtures (db, client, test_user)
├── fixtures/            # Factory patterns
├── accounting/          # 20 test files
├── reconciliation/      # 13 test files
├── extraction/          # 18 test files
├── auth/                # 5 test files
├── ai/                  # 8 test files
├── assets/              # 4 test files
├── api/                 # 4 test files
├── reporting/           # 13 test files
├── market_data/         # 1 test file
├── infra/               # 12 test files
├── unit/                # 2 test files
└── e2e/                 # 4 test files (51 test functions)

Total: ~100 test files, ~675 test functions organized by feature domain
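
The factory patterns under fixtures/ can be sketched as follows; `User` and `make_user` are illustrative assumptions, not the project's actual fixture API:

```python
# Hedged sketch of a test-data factory in the spirit of tests/fixtures/.
from dataclasses import dataclass
from itertools import count

_ids = count(1)

@dataclass
class User:
    id: int
    email: str
    is_active: bool = True

def make_user(**overrides) -> User:
    """Build a User with unique defaults; override only the fields a test cares about."""
    uid = next(_ids)
    defaults = {"id": uid, "email": f"user{uid}@example.com"}
    defaults.update(overrides)
    return User(**defaults)
```

A test then states only what matters to it, e.g. `make_user(is_active=False)`, which keeps setup noise out of test bodies.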

Test Execution Modes

| Command | Description |
|---------|-------------|
| `moon run :test` | Run all tests (default, 90% backend coverage gate) |
| `moon run :test -- --fast` | TDD mode (no coverage, fastest) |
| `moon run :test -- --smart` | Coverage on changed files only |
| `moon run :test -- --e2e` | E2E tests (Playwright) |
| `moon run :test -- tests/accounting/` | Run specific module tests |
| `moon run :test -- tests/accounting/test_journal_service.py` | Run specific file |

Test Case Numbering System (ACx.y.z)

Purpose: Establish traceability between EPIC acceptance criteria and test implementations.

Numbering Convention

Format: ACx.y.z

| Component | Meaning | Example |
|-----------|---------|---------|
| AC | Acceptance Criteria prefix | AC (fixed) |
| x | EPIC number (no zero padding) | 1, 2, 3 |
| y | Feature block within EPIC | 1, 2, 3 |
| z | Test case number within block | 1, 2, 3 |

Examples:

- AC1.1.1 → EPIC-1 (EPIC-001), Block 1 (Authentication), Test case 1
- AC2.3.5 → EPIC-2 (EPIC-002), Block 3 (Journal Entry Posting), Test case 5

Feature Block Organization

Each EPIC should divide features into logical blocks:

EPIC-001 Example (Infrastructure & Authentication):

- Block 1: Backend Health Check
- Block 2: User Authentication (Registration/Login)
- Block 3: Database Connectivity
- Block 4: Docker Environment

EPIC-002 Example (Double-Entry Bookkeeping):

- Block 1: Account Management (CRUD)
- Block 2: Journal Entry Creation
- Block 3: Journal Entry Posting & Voiding
- Block 4: Balance Calculation
- Block 5: Accounting Equation Validation

Test Case Documentation Requirements

In EPIC Documents

Each EPIC must include a Test Cases section with:

## 🧪 Test Cases

### AC2.1: Account Management

| ID | Test Case | Test Function | Priority |
|----|-----------|---------------|----------|
| AC2.1.1 | Create account with valid data | `test_create_account()` | P0 |
| AC2.1.2 | Create account with duplicate code | `test_create_account_duplicate_code()` | P0 |
| AC2.1.3 | List accounts with type filter | `test_list_accounts_with_filters()` | P1 |

### AC2.2: Journal Entry Creation

| ID | Test Case | Test Function | Priority |
|----|-----------|---------------|----------|
| AC2.2.1 | Balanced entry passes validation | `test_balanced_entry_passes()` | P0 |
| AC2.2.2 | Unbalanced entry fails | `test_unbalanced_entry_fails()` | P0 |

In Test Code

Test functions MUST start with the AC number in the docstring:

import pytest

@pytest.mark.asyncio
async def test_balanced_entry_passes():
    """AC2.2.1: Balanced entry passes validation.

    Verify that journal entries with equal debits and credits
    are accepted by the validation logic.
    """
    # Test implementation...

Implementation Guidelines

1. EPIC Document Update Checklist

When creating/updating an EPIC:

- [ ] Define feature blocks (x.y structure)
- [ ] Create test case table for each block
- [ ] Link test functions to AC IDs
- [ ] Reference test file paths

2. Test Code Update Checklist

When writing tests:

- [ ] Add AC number in test docstring first line
- [ ] Follow naming: test_<feature>_<scenario>()
- [ ] Group tests by feature block (use pytest marks if needed)
- [ ] Update EPIC document with new test references

3. Code Review Checklist

During PR review:

- [ ] New features have AC numbers assigned in EPIC
- [ ] Test docstrings include AC references
- [ ] EPIC test case table updated
- [ ] Test-to-AC traceability maintained

Benefits

  1. Traceability: Easy to find tests for acceptance criteria
  2. Coverage Verification: Identify missing tests for AC blocks
  3. Communication: Product/QA can reference test IDs
  4. Maintenance: Track which tests validate which requirements

Migration Strategy

Phase 1: Apply to EPIC-001 and EPIC-002 (pilot)
Phase 2: Apply to new EPICs going forward
Phase 3: Backfill existing EPICs (optional)


TDD Transformation Strategy

Phase 1: Documentation & Standards (Week 1)

Objective: Establish clear TDD guidelines and integrate into SSOT.

1.1 Create TDD SSOT Document

File: docs/ssot/tdd.md

Contents:

  1. TDD workflow (Red-Green-Refactor cycle)
  2. Test organization patterns (unit → integration → e2e)
  3. When to write tests first vs. tests after
  4. Test naming conventions
  5. Mocking guidelines (what to mock vs. what to test)
  6. Coverage quality metrics (branch vs. line coverage)

1.2 Update Development.md

File: docs/ssot/development.md

Additions:

- TDD workflow section
- Test-first development checklist
- Coverage requirements (96% unified: backend + frontend + scripts)
- Test review process

1.3 Create Testing Standards Checklist

Checklist for PR reviews:

- [ ] New features have tests written FIRST
- [ ] Edge cases covered (null, empty, boundary values)
- [ ] Error handling tested
- [ ] Unified coverage maintained ≥ 96% (run `python scripts/calculate_unified_coverage.py`)
- [ ] No test-only changes (refactors should have tests updated)
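
The edge-case item above is usually easiest to satisfy with a single parametrized test covering null, boundary, and out-of-range inputs. An illustrative sketch; `clamp_percentage` is a hypothetical helper, not project code:

```python
# Hedged illustration of edge-case coverage via pytest.mark.parametrize.
import pytest

def clamp_percentage(value):
    """Clamp a percentage into [0, 100]; treat None as 0."""
    if value is None:
        return 0.0
    return max(0.0, min(100.0, value))

@pytest.mark.parametrize(
    ("value", "expected"),
    [
        (None, 0.0),     # null
        (0.0, 0.0),      # lower boundary
        (100.0, 100.0),  # upper boundary
        (-5.0, 0.0),     # below range
        (250.0, 100.0),  # above range
    ],
)
def test_clamp_percentage_edges(value, expected):
    assert clamp_percentage(value) == expected
```

Each tuple shows up as its own test case in the report, so a missing edge case is visible at a glance.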


Phase 2: Coverage Threshold Upgrade (Week 1-2)

Objective: Raise coverage requirement and ensure CI enforcement.

Status (2026-02-25):

- Local coverage threshold: 90% backend (--cov-fail-under=90 in pyproject.toml); 96% unified
- CI coverage threshold: No-regression baseline + unified 96% gate
- Branch coverage tracking: enabled via --cov-branch
- CI now enforces no-regression: each shard runs ~25% of tests; merged unified coverage is validated post-merge
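
The unified figure can be understood as a statement-weighted average of the per-suite numbers. A minimal sketch of that idea; the suite names and counts below are made up, and the real calculate_unified_coverage.py and unified-coverage.json format may differ:

```python
# Hedged sketch: combine per-suite coverage weighted by line counts.
def unified_coverage(suites):
    """suites maps name -> (covered_lines, total_lines); return combined %."""
    covered = sum(c for c, _ in suites.values())
    total = sum(t for _, t in suites.values())
    return round(100.0 * covered / total, 2) if total else 0.0

suites = {
    "backend": (9_000, 9_500),
    "frontend": (4_500, 4_800),
    "scripts": (900, 1_000),
}
print(unified_coverage(suites))  # 94.12 for these made-up numbers
```

Weighting by total lines means a large backend suite moves the unified number far more than a small scripts suite, which is why per-layer gaps (see Phase 3) can hide behind a healthy headline figure.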

2.1 Local Configuration

# apps/backend/pyproject.toml
[tool.pytest.ini_options]
addopts = "--cov=src --cov-report=term-missing --cov-report=xml --cov-branch --cov-fail-under=90 -m 'not slow'"

2.2 CI Configuration

# .github/workflows/ci.yml
- name: Validate unified coverage threshold
  run: |
    pip install coverage
    coverage lcov --lcov-file=coverage.lcov --data-file=.coverage
    coverage report --include="src/*" --fail-under=96

Phase 3: Coverage Gap Analysis (Week 2)

Objective: Identify and fix coverage gaps systematically.

3.1 Current Coverage Status (2026-02-25)

| Layer | Coverage | Status |
|-------|----------|--------|
| models/ | 97.76% | ✅ Excellent |
| schemas/ | 97.93% | ✅ Excellent |
| utils/ | 56.52% | ⚠️ Partial |
| routers/ | 27.02% | ❌ Low |
| services/ | 16.59% | ❌ CRITICAL GAP |

3.2 Service Layer Coverage Gaps (CRITICAL)

| Service | Coverage | Risk |
|---------|----------|------|
| services/reporting.py | 9.29% | 🔴 Financial reports |
| services/fx_revaluation.py | 0% | 🔴 Currency gains/losses |
| services/reconciliation.py | 13.76% | 🔴 Matching engine |
| services/review_queue.py | 12.5% | 🔴 Approval workflow |
| services/validation.py | 11.3% | 🔴 Statement validation |
| services/classification.py | 0% | 🔴 Transaction categorization |

3.3 Priority Matrix for Coverage Boost

| Priority | Module | Current Coverage | Target | Action |
|----------|--------|------------------|--------|--------|
| P0 | services/reporting | 9.29% | 80% | Add error path tests |
| P0 | services/reconciliation | 13.76% | 80% | Add error path tests |
| P0 | services/validation | 11.3% | 80% | Add error path tests |
| P1 | services/review_queue | 12.5% | 80% | Add error path tests |
| P1 | services/fx_revaluation | 0% | 80% | Add FX tests |
| P2 | routers/ | 27.02% | 60% | Add router error tests |
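
The "add error path tests" action typically means asserting that invalid input raises the expected exception rather than failing silently. A hedged illustration; `ReconciliationError` and `match_transactions` are hypothetical stand-ins, not the real services/reconciliation.py API:

```python
# Hedged illustration of an error-path test with pytest.raises.
import pytest

class ReconciliationError(Exception):
    pass

def match_transactions(bank_rows, ledger_rows):
    """Toy matcher: refuse empty input, else pair rows with equal amounts."""
    if not bank_rows or not ledger_rows:
        raise ReconciliationError("nothing to reconcile")
    return [(b, l) for b in bank_rows for l in ledger_rows if b == l]

def test_match_transactions_rejects_empty_input():
    """Error path: an empty bank statement must raise, not return []."""
    with pytest.raises(ReconciliationError, match="nothing to reconcile"):
        match_transactions([], [100])
```

Error-path tests like this are what lift branch coverage in the service layer: every `raise` is a branch the happy-path tests never reach.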

Phase 4: Test-First Development Practices (Week 3-4)

Objective: Establish TDD workflow in daily development.

4.1 Red-Green-Refactor Cycle

Template for new features:

# 1. RED: Write failing test
@pytest.mark.asyncio
async def test_new_feature_expected_behavior():
    """Test that new feature works as expected."""
    # Setup
    # Exercise
    # Assert (will fail initially)
    pass

# 2. GREEN: Implement minimum to pass
# Add production code to make test pass

# 3. REFACTOR: Improve code without breaking tests
# Clean up, optimize, add more tests

4.2 Test Organization Guidelines

Test file structure:

# tests/domain/test_feature.py

import pytest
from src.services.feature import Feature

# 1. Unit tests (isolated, mocked dependencies)
@pytest.mark.asyncio
async def test_feature_unit_case():
    pass

# 2. Integration tests (real DB, no external APIs)
@pytest.mark.asyncio
async def test_feature_integration(db):
    pass

# 3. Edge cases
@pytest.mark.asyncio
async def test_feature_edge_case_null():
    pass

@pytest.mark.asyncio
async def test_feature_edge_case_empty():
    pass

# 4. Error cases
@pytest.mark.asyncio
async def test_feature_error_invalid_input():
    pass

4.3 Mocking Guidelines

DO mock:

- External APIs (OpenRouter, S3, FX providers)
- File system operations (in unit tests)
- Time (for deterministic tests)
- Async background tasks (in unit tests)

DO NOT mock:

- Database (use test DB fixture)
- Business logic (test real implementation)
- Service layer (test via router endpoints)
- Internal utilities (test actual behavior)
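
Both rules can be seen in one test: stub the external FX provider while exercising the real conversion logic. A sketch under assumed names; `fetch_rate` and `convert` are hypothetical, not project functions:

```python
# Hedged sketch: mock the external API, test the real business logic.
from unittest import mock

def fetch_rate(base: str, quote: str) -> float:
    """External FX provider call -- exactly the kind of thing to mock."""
    raise RuntimeError("network call; must be stubbed in unit tests")

def convert(amount: float, base: str, quote: str, rate_provider=fetch_rate) -> float:
    """Real business logic under test; the rate source is injectable."""
    return round(amount * rate_provider(base, quote), 2)

# Unit test: stub only the boundary, assert on the real computation.
stub = mock.Mock(return_value=1.08)
assert convert(100.0, "EUR", "USD", rate_provider=stub) == 108.0
stub.assert_called_once_with("EUR", "USD")
```

The boundary (network call) is replaced; the arithmetic and rounding stay real, so the test still fails if the business logic regresses.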


Phase 5: Continuous Improvement (Ongoing)

Objective: Maintain coverage quality and prevent regression.

5.1 Pre-Commit Coverage Check

Add to .pre-commit-config.yaml:

- repo: local
  hooks:
    - id: coverage-check
      name: Coverage check (90% backend)
      entry: uv run pytest --cov=src --cov-fail-under=90
      language: system
      pass_filenames: false
      always_run: true

5.2 Coverage Dashboard

Actions:

- Coverage badge in README with threshold: 96% unified
- Coveralls reports align with local threshold
- Monitor coverage trends over time

5.3 Test Quality Metrics

Beyond line coverage:

  1. Branch coverage: Ensure all if/else branches tested
  2. Mutation testing: Use mutmut to verify test quality
  3. Test complexity: Keep cyclomatic complexity low
  4. Test execution time: Identify slow tests for optimization


Coverage Accuracy Verification

Current Coverage Metrics

Configuration:

[tool.coverage.run]
source = ["src"]
omit = [
    "src/__init__.py",
    "src/models/__init__.py",
    "src/schemas/__init__.py",
    "src/schemas/user.py",
    "src/services/__init__.py",
    "src/routers/__init__.py",
    "src/routers/users.py",
    "src/services/extraction.py",
    "src/prompts/*",
    "src/main.py",
    "src/env_smoke_test.py",
    "src/env_check.py",
]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "if __name__ == .__main__.:",
]

Accuracy Concerns & Fixes

| Concern | Status | Action |
|---------|--------|--------|
| Exclusions are appropriate | ✅ Correct | `__init__.py`, main.py, prompts excluded correctly |
| Exclude external API calls | ✅ Correct | Integration tests cover extraction.py |
| Database setup code excluded | ✅ Correct | Bootloader checks excluded |
| Branch coverage vs. line coverage | ✅ Fixed | --cov-branch added |
| CI coverage enforcement | ✅ Fixed | Post-merge validation added |

TDD Workflow Documentation

Before Writing Code

  1. Read SSOT for the domain (e.g., accounting.md)
  2. Identify test cases:
     - Happy path (normal operation)
     - Edge cases (boundary values, null, empty)
     - Error cases (invalid inputs, failures)
  3. Write failing tests (RED)

After Tests Pass (GREEN)

  1. Run all tests to ensure no regressions
  2. Check coverage meets 96% unified (run python scripts/calculate_unified_coverage.py)
  3. Refactor code for readability and performance
  4. Update documentation if behavior changed

Code Review Checklist

## Test Coverage
- [ ] Unified coverage ≥ 96% (run `python scripts/calculate_unified_coverage.py`)
- [ ] Branch coverage verified
- [ ] Edge cases tested
- [ ] Error handling tested
- [ ] No pragma: no cover (unless justified)

## TDD Compliance
- [ ] Tests written before implementation
- [ ] Tests organized by domain (SSOT-aligned)
- [ ] Test names describe behavior (not implementation)
- [ ] No test-only commits

Migration Timeline

| Week | Milestone | Deliverable |
|------|-----------|-------------|
| 1 | Documentation & Threshold Update | docs/ssot/tdd.md, development.md updated, 96% unified threshold |
| 2 | Coverage Gap Analysis | Detailed coverage report, gap identification |
| 3 | Core Domain Coverage Boost | Accounting & reconciliation at 96%+ |
| 4 | Feature Coverage Boost | Extraction, reporting, auth at 96%+ |
| 5 | CI Coverage Enforcement | Post-merge validation (COMPLETED) |
| 6+ | Continuous Improvement | Maintain 96% unified, add quality metrics |

Success Criteria

Quantitative:

- [x] Unified coverage ≥ 96% (verified by calculate_unified_coverage.py: 95.74% as of 2026-03-02)
- [x] CI coverage enforcement added (post-merge validation)
- [ ] Service layer coverage ≥ 80%
- [ ] Zero regressions in coverage after PRs
- [ ] Test execution time < 30s for unit+integration

Qualitative:

- [ ] Developers follow TDD workflow
- [ ] Tests document expected behavior (not just cover lines)
- [ ] Code review includes test quality assessment
- [ ] Coverage gaps are rare and addressed quickly



Last Updated: 2026-02-25
Owner: Development Team
Review Cycle: Quarterly