EPIC-012: Foundation Libraries Enhancement

Goal: Strengthen core infrastructure libraries to production-grade quality with proper observability, transaction management, and developer experience.

Status: 🟡 In Progress
Vision Anchor: decision-7-tech-stack
Priority: P1 (Infrastructure Debt)
Estimated Duration: 2-3 weeks
Dependencies: None (cross-cutting infrastructure)


📋 Overview

This EPIC addresses technical debt in the foundational libraries that all modules depend on. A comprehensive audit identified gaps in:

  1. Observability - Tracing, log-trace correlation, metrics
  2. Database - Transaction boundaries, connection pooling
  3. Error Handling - Unified exception hierarchy
  4. Rate Limiting - API-wide protection
  5. Developer Experience - Debugging tools, schema consistency

🎯 Success Criteria

Must Have (P0)

  • [x] Distributed tracing with trace_id in all logs
  • [ ] Service-layer uses flush(), router-layer owns commit()
  • See: docs/ssot/accounting.md#async-tx-boundary
  • [x] Connection pool size configurable via environment

Should Have (P1)

  • [x] Unified BaseAppException with error IDs
  • [x] API-wide rate limiting (not just auth endpoints)
  • [~] Metrics endpoint: deferred because the project uses SigNoz OTLP, not Prometheus pull scraping (see EPIC-010)

Nice to Have (P2)

  • [ ] UUID auto-serialization structlog processor

Affected Components

| Component | File(s) | Changes |
|---|---|---|
| Logging | src/logger.py | Add tracing, trace_id processor |
| Database | src/database.py, src/config.py | Pool config, transaction patterns |
| Exceptions | src/utils/exceptions.py | BaseAppException class |
| Rate Limiting | src/rate_limit.py | Global API limiter |
| Debugging | scripts/debug.py | SigNoz API integration |
| Schemas | src/schemas/*.py | Consistent BaseResponse inheritance |

🔴 High Priority Issues

H1: Distributed Tracing Missing

Problem: No opentelemetry-instrumentation-* packages are installed. Logs lack trace_id and span_id, so they cannot be correlated with traces in SigNoz.

Solution:

  1. Add OTEL instrumentation packages to pyproject.toml
  2. Initialize TracerProvider in logger.py
  3. Add structlog processor to inject trace context
  4. Auto-instrument FastAPI, SQLAlchemy, and HTTPX

Status: ✅ Complete (PR pending)

Tracking: #181
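As an illustration of step 3, here is a minimal sketch of a structlog-style processor that copies the active span's IDs into each log event. The function name `add_trace_context` is inferred from the test names in this EPIC and may not match `src/logger.py`. The hex formatting and the silent fallback when OpenTelemetry is missing are assumptions, not the confirmed implementation.

```python
from typing import Any, MutableMapping


def add_trace_context(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog-style processor: copy the active span's IDs into the event."""
    try:
        from opentelemetry import trace
    except ImportError:
        # OpenTelemetry is treated as optional; the event passes through as-is.
        return event_dict

    span_context = trace.get_current_span().get_span_context()
    if span_context.is_valid:
        # Zero-padded lowercase hex: 32 chars for trace_id, 16 for span_id.
        event_dict["trace_id"] = format(span_context.trace_id, "032x")
        event_dict["span_id"] = format(span_context.span_id, "016x")
    return event_dict


# Outside any span (or without OpenTelemetry installed) nothing is injected.
event = add_trace_context(None, "info", {"event": "statement_parsed"})
print(event)
```

A processor like this would typically sit in the structlog processor chain before the renderer, so both console and JSON output carry the same fields.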


🧪 Test Cases

Test Organization: Tests are organized by feature block using AC12.x.y numbering. Coverage: see apps/backend/tests/infra/.

AC12.1: Logging - OTEL Endpoint Configuration

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.1.1 | OTEL logs endpoint adds suffix /v1/logs | test_build_otlp_logs_endpoint_adds_suffix() | infra/test_logger.py | P1 |
| AC12.1.2 | OTEL logs endpoint preserves a path that already ends in /v1/logs | test_build_otlp_logs_endpoint_preserves_logs_path() | infra/test_logger.py | P1 |
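The behavior these two cases describe can be sketched as a small helper. The name `build_otlp_logs_endpoint` is inferred from the test function names, and the host `collector.example` is a placeholder; the trailing-slash handling is an assumption rather than documented behavior.

```python
def build_otlp_logs_endpoint(base_endpoint: str) -> str:
    """Return an OTLP/HTTP logs URL that ends in /v1/logs exactly once."""
    trimmed = base_endpoint.rstrip("/")
    if trimmed.endswith("/v1/logs"):
        # AC12.1.2: an endpoint that already targets the logs path is kept.
        return trimmed
    # AC12.1.1: a bare collector URL gets the logs suffix appended.
    return f"{trimmed}/v1/logs"


bare = build_otlp_logs_endpoint("http://collector.example:4318")
full = build_otlp_logs_endpoint("http://collector.example:4318/v1/logs")
print(bare, full)
```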

AC12.2: Logging - Renderer Selection

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.2.1 | Debug mode uses ConsoleRenderer | test_select_renderer_uses_console_in_debug() | infra/test_logger.py | P0 |
| AC12.2.2 | Production mode uses JSONRenderer | test_select_renderer_uses_json_in_production() | infra/test_logger.py | P0 |

AC12.3: Logging - OTEL Missing Dependency / No Endpoint

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.3.1 | OTEL logging not available logs warning | test_configure_otel_logging_missing_dependency_warns() | infra/test_logger.py | P0 |
| AC12.3.2 | OTEL tracing not available logs warning | test_configure_otel_tracing_missing_dependency_warns() | infra/test_logger.py | P0 |
| AC12.3.3 | OTEL logging with no endpoint skips setup | test_configure_otel_logging_no_endpoint() | infra/test_logger.py | P0 |

AC12.4: Logging - OTEL with Fake Exporter

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.4.1 | OTEL configuration sets up TracerProvider correctly | test_configure_otel_tracing_with_fake_exporter() | infra/test_logger.py | P0 |

AC12.5: Logging - OTEL Resource Configuration

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.5.1 | OTEL resource created with correct attributes | test_build_otel_resource() | infra/test_logger.py | P0 |

AC12.6: Logging - Timing Utilities

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.6.1 | Sync log_timing logs operation with timing | test_log_timing_basic() | infra/test_logger.py | P0 |
| AC12.6.2 | Sync log_timing includes additional context | test_log_timing_with_context() | infra/test_logger.py | P0 |
| AC12.6.3 | log_timing yields mutable dict | test_log_timing_yields_mutable_dict() | infra/test_logger.py | P0 |
| AC12.6.4 | log_timing with custom level | test_log_timing_with_custom_level() | infra/test_logger.py | P0 |
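To make the "yields mutable dict" case concrete, the following sketch shows one way a sync timing context manager could behave. It uses the standard-library `logging` module instead of structlog, and the `duration_ms` field, the logger name, and the signature are all assumptions; the real `log_timing` in `src/logger.py` may differ.

```python
import logging
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator

logger = logging.getLogger("timing_sketch")  # hypothetical logger name


@contextmanager
def log_timing(
    operation: str, level: int = logging.INFO, **context: Any
) -> Iterator[Dict[str, Any]]:
    """Log how long the wrapped block took, along with any context fields."""
    fields: Dict[str, Any] = dict(context)
    start = time.perf_counter()
    try:
        # The caller may add fields (row counts, IDs) while the block runs.
        yield fields
    finally:
        # Runs on success and on failure, so slow failures are timed too.
        fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
        logger.log(level, "%s finished", operation, extra={"fields": fields})


with log_timing("parse_statement", source="upload") as timing:
    timing["rows"] = 42  # added mid-block, included in the final log record
```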

AC12.7: Logging - External API Logging

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.7.1 | Sync external API call logs success | test_log_external_api_sync_success() | infra/test_logger.py | P0 |
| AC12.7.2 | Sync external API call logs failure | test_log_external_api_sync_failure() | infra/test_logger.py | P0 |
| AC12.7.3 | Async external API call logs success | test_log_external_api_async_success() | infra/test_logger.py | P0 |
| AC12.7.4 | Async external API call logs failure | test_log_external_api_async_failure() | infra/test_logger.py | P0 |
| AC12.7.5 | Sync external API with log_args=True logs args count | test_log_external_api_with_log_args() | infra/test_logger.py | P0 |

AC12.8: Logging - Exception Logging

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.8.1 | Log exception logs error with context | test_log_exception_basic() | infra/test_logger.py | P0 |
| AC12.8.2 | Log exception includes extra context fields | test_log_exception_with_extra_context() | infra/test_logger.py | P0 |
| AC12.8.3 | Log exception without traceback | test_log_exception_without_traceback() | infra/test_logger.py | P0 |
| AC12.8.4 | Log exception with custom level | test_log_exception_custom_level() | infra/test_logger.py | P0 |

AC12.10: Logging - Build Processors

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.10.1 | Build processors returns list | test_build_processors_returns_list() | infra/test_logger.py | P0 |

AC12.11: Logging - Trace Context

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.11.1 | Trace context injects trace_id and span_id when span is valid | test_add_trace_context_with_valid_span() | infra/test_logger.py | P0 |
| AC12.11.2 | Trace context skips injection when span context is invalid | test_add_trace_context_with_invalid_span() | infra/test_logger.py | P0 |
| AC12.11.3 | Trace context handles missing opentelemetry gracefully | test_add_trace_context_handles_import_error() | infra/test_logger.py | P0 |

AC12.12: Logging - OTEL Tracing Configuration

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.12.1 | OTEL tracing skips setup when no endpoint configured | test_configure_otel_tracing_no_endpoint() | infra/test_logger.py | P0 |
| AC12.12.2 | TracerProvider created and resource attributes set | test_configure_otel_tracing_with_fake_exporter() | infra/test_logger.py | P0 |
| AC12.12.3 | Traces path appends /v1/traces | test_configure_otel_tracing_appends_traces_path() | infra/test_logger.py | P0 |

AC12.15: Logging - Configuration Basics

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.15.1 | Configure logging in debug mode | test_configure_logging_basic() | infra/test_logger.py | P0 |
| AC12.15.2 | Configure logging in production mode | test_configure_logging_production_mode() | infra/test_logger.py | P0 |

AC12.16: Logging - Async Timing

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.16.1 | Async log_timing logs operation with timing | test_async_log_timing_basic() | infra/test_logger.py | P0 |
| AC12.16.2 | Async log_timing includes additional context | test_async_log_timing_with_context() | infra/test_logger.py | P0 |

AC12.17: Logging - External API Async with Args

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.17.1 | External API async with log_args=True logs args count | test_log_external_api_async_with_log_args() | infra/test_logger.py | P0 |
| AC12.17.2 | External API async failure with log_args=True logs args | test_log_external_api_async_failure_with_log_args() | infra/test_logger.py | P0 |

AC12.18: Logging - Configuration - Environment Variables

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.18.1 | Ensure PRIMARY_MODEL follows expected pattern | test_primary_model_format() | infra/test_config_contract.py | P0 |
| AC12.18.2 | Ensure config.py default matches .env.example documentation | test_config_sync_with_env_example() | infra/test_config_contract.py | P0 |
| AC12.18.3 | Ensure BASE_CURRENCY is valid ISO 4217 currency code | test_base_currency_format() | infra/test_config_contract.py | P0 |
| AC12.18.4 | Ensure S3_BUCKET follows naming conventions | test_s3_bucket_format() | infra/test_config_contract.py | P0 |
| AC12.18.5 | Ensure JWT_ALGORITHM is secure algorithm | test_jwt_algorithm_allowed() | infra/test_config_contract.py | P0 |
| AC12.18.6 | Ensure DATABASE_URL follows expected format | test_database_url_format() | infra/test_config_contract.py | P0 |
| AC12.18.7 | stub | - | - | - |

AC12.19: Infrastructure - Epic 001 Contracts

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.19.1 | Moon workspace configuration files exist | test_epic_001_moon_workspace_configs_exist() | infra/test_epic_001_contracts.py | P0 |

AC12.20: Database - Connection Pool Configuration

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.20.1 | DB_POOL_SIZE config field exists with default | test_db_pool_size_config_default() | infra/test_config_contract.py | P1 |
| AC12.20.2 | DB_MAX_OVERFLOW config field exists with default | test_db_max_overflow_config_default() | infra/test_config_contract.py | P1 |
| AC12.20.3 | Pool config is positive integer | test_db_pool_config_positive_integer() | infra/test_config_contract.py | P1 |

AC12.21: Exceptions - BaseAppException

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.21.1 | BaseAppException has error_id attribute | test_base_app_exception_has_error_id() | infra/test_exceptions.py | P1 |
| AC12.21.2 | BaseAppException has status_code attribute | test_base_app_exception_has_status_code() | infra/test_exceptions.py | P1 |
| AC12.21.3 | BaseAppException is subclass of Exception | test_base_app_exception_is_exception() | infra/test_exceptions.py | P1 |
| AC12.21.4 | BaseAppException can be raised and caught | test_base_app_exception_raise_and_catch() | infra/test_exceptions.py | P1 |

Test Coverage Summary:

  • Total AC IDs: 49
  • Requirements converted to AC IDs: 100% (EPIC-012 infrastructure work)
  • Requirements with test references: 100%
  • Test files: 4 (test_logger.py, test_config_contract.py, test_epic_001_contracts.py, test_exceptions.py)
  • Overall coverage: Logging, config infrastructure, pool config, and exception hierarchy verified


Acceptance Criteria

ℹ️ Non-contiguous AC numbering: Gaps in AC12.x.y numbers within docs/infra_registry.yaml reflect deprecated/merged ACs preserved for historical traceability (e.g., AC12.24.1–3 retained as ~~strikethrough~~). Do not renumber. New ACs append to the next available index in the relevant feature block.

H2: Transaction Boundaries

Problem: Services call db.commit() directly, so multiple service calls cannot be composed into a single atomic transaction.

Solution:

  1. Change services to use db.flush() for getting IDs
  2. Move commit() responsibility to routers
  3. Consider @transactional decorator for complex cases

Tracking: #182
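The boundary described above can be sketched without a database. `RecordingSession` below is a hypothetical stand-in that mimics only the `add`/`flush`/`commit`/`rollback` subset of a SQLAlchemy session, and the account and entry functions are invented for illustration. With an async session the same calls would be awaited.

```python
from typing import Any, Dict, List


class RecordingSession:
    """Hypothetical stand-in for a session; records boundary calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.pending: List[Dict[str, Any]] = []
        self._next_id = 1

    def add(self, row: Dict[str, Any]) -> None:
        self.pending.append(row)

    def flush(self) -> None:
        # A real flush emits pending SQL inside the open transaction,
        # which is what assigns generated primary keys.
        for row in self.pending:
            if "id" not in row:
                row["id"] = self._next_id
                self._next_id += 1
        self.pending.clear()
        self.calls.append("flush")

    def commit(self) -> None:
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


def create_account(db: RecordingSession, name: str) -> Dict[str, Any]:
    account: Dict[str, Any] = {"name": name}
    db.add(account)
    db.flush()  # service layer: get the ID, leave the transaction open
    return account


def create_opening_entry(db: RecordingSession, account_id: int, amount: str) -> Dict[str, Any]:
    entry: Dict[str, Any] = {"account_id": account_id, "amount": amount}
    db.add(entry)
    db.flush()
    return entry


def open_account_endpoint(db: RecordingSession, name: str, amount: str) -> Dict[str, Any]:
    """Router layer: the only place that commits or rolls back."""
    try:
        account = create_account(db, name)
        entry = create_opening_entry(db, account["id"], amount)
        db.commit()  # both writes become durable together
    except Exception:
        db.rollback()  # neither write survives a failure
        raise
    return {"account": account, "entry": entry}


session = RecordingSession()
result = open_account_endpoint(session, "Checking", "100.00")
print(session.calls)
```

Because the services only flush, the router can compose any number of them and still commit or roll back as one unit.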


🟡 Medium Priority Issues

M1: Connection Pool Configuration

Problem: The database engine uses SQLAlchemy's default connection pool settings, which production workloads may need to tune.

Solution: Add DB_POOL_SIZE and DB_MAX_OVERFLOW to config.py

Tracking: #184
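A minimal sketch of reading these two settings from the environment follows. The audit lists `src/config.py` as Pydantic Settings; the dataclass here is a standard-library stand-in to keep the example self-contained. The defaults of 5 and 10 are SQLAlchemy's own QueuePool defaults, and the positive-integer check mirrors AC12.20.3. `PoolSettings` and `load_pool_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PoolSettings:
    pool_size: int = 5      # SQLAlchemy QueuePool default
    max_overflow: int = 10  # SQLAlchemy QueuePool default


def _positive_int(name: str, raw: str) -> int:
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


def load_pool_settings(env: Mapping[str, str] = os.environ) -> PoolSettings:
    """Read DB_POOL_SIZE and DB_MAX_OVERFLOW, falling back to the defaults."""
    defaults = PoolSettings()
    return PoolSettings(
        pool_size=_positive_int(
            "DB_POOL_SIZE", env.get("DB_POOL_SIZE", str(defaults.pool_size))
        ),
        max_overflow=_positive_int(
            "DB_MAX_OVERFLOW", env.get("DB_MAX_OVERFLOW", str(defaults.max_overflow))
        ),
    )


settings = load_pool_settings({"DB_POOL_SIZE": "20"})
# The values would then be passed to SQLAlchemy, for example:
# create_engine(url, pool_size=settings.pool_size, max_overflow=settings.max_overflow)
print(settings)
```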

M2: Unified Exception Hierarchy

Problem: Each service defines its own exception class, and there is no unified BaseAppException with error IDs for frontend consumption.

Solution: Create base exception with error_id field, migrate services

Tracking: #185
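One possible shape for such a base class is sketched below. The `error_id` and `status_code` attributes come from AC12.21; the `to_response` helper, the default values, and the `StatementNotFoundError` subclass are assumptions added for illustration.

```python
from typing import Dict, Optional


class BaseAppException(Exception):
    """Root of the application exception hierarchy."""

    # Class-level defaults that subclasses override.
    error_id: str = "INTERNAL_ERROR"
    status_code: int = 500

    def __init__(
        self,
        message: str = "",
        *,
        error_id: Optional[str] = None,
        status_code: Optional[int] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        # Per-instance overrides are optional; class defaults apply otherwise.
        if error_id is not None:
            self.error_id = error_id
        if status_code is not None:
            self.status_code = status_code

    def to_response(self) -> Dict[str, str]:
        """Body a frontend could branch on via the stable error_id."""
        return {"error_id": self.error_id, "detail": self.message}


class StatementNotFoundError(BaseAppException):
    """Hypothetical service-level exception migrated onto the base class."""

    error_id = "STATEMENT_NOT_FOUND"
    status_code = 404


try:
    raise StatementNotFoundError("statement 123 does not exist")
except BaseAppException as exc:
    caught = exc
    print(exc.status_code, exc.to_response())
```

A single exception handler registered for the base class can then translate every subclass into a consistent HTTP response.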

M3: API-Wide Rate Limiting

Problem: Rate limiting only protects /auth/* endpoints. All other endpoints are unprotected.

Solution: Add configurable global rate limiter middleware

Tracking: #186
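The decision logic behind a global limiter can be sketched independently of the web framework. The sliding-window algorithm, the class name, and the in-memory store below are assumptions; `src/rate_limit.py` may use a different algorithm or a shared store. The exempt paths follow AC12.23.1 and AC12.23.4, and a middleware would return 429 whenever `allow` is false.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

EXEMPT_PREFIXES = ("/health", "/docs")


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_key: str, path: str) -> bool:
        if path.startswith(EXEMPT_PREFIXES):
            return True  # health checks and docs are never throttled
        now = self._clock()
        hits = self._hits[client_key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the middleware would answer 429 here
        hits.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_seconds=60.0, clock=lambda: fake_now[0])
results = [limiter.allow("10.0.0.1", "/api/accounts") for _ in range(3)]
print(results)  # [True, True, False]
```

State held in process memory only limits per worker; a deployment with several workers would generally need a shared store to enforce one global limit.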

M4: Metrics Endpoint

Problem: ~~No /metrics endpoint for Prometheus.~~ → Architecture uses SigNoz OTLP, not Prometheus pull.

Solution: ~~Add prometheus-fastapi-instrumentator~~ → Deferred (SigNoz OTLP is the observability path)

Tracking: #187

⚠️ Deferred: This project uses SigNoz via OTLP (see EPIC-010) for observability. Prometheus pull-based /metrics has zero consumers in this architecture. Metrics via OTLP to SigNoz is a future task tracked separately.


🟢 Low Priority Issues

L3: UUID Auto-Serialization

Problem: Callers must manually wrap UUIDs with str() in logger calls.

Solution: Add structlog processor to auto-convert UUIDs
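A processor of this kind can be a plain function with the standard structlog processor signature. The sketch below is one possible version: the name `stringify_uuids` is hypothetical, and it only converts top-level values, leaving nested containers untouched.

```python
import uuid
from typing import Any, MutableMapping


def stringify_uuids(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog-style processor: replace top-level UUID values with strings."""
    # Iterate over a snapshot so the mapping can be updated safely.
    for key, value in list(event_dict.items()):
        if isinstance(value, uuid.UUID):
            event_dict[key] = str(value)
    return event_dict


user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
event = stringify_uuids(None, "info", {"event": "login", "user_id": user_id, "attempt": 1})
print(event)
```

With this in the processor chain, call sites could pass UUID objects directly instead of wrapping each one in str().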

L4: Schema Inheritance Consistency

Problem: Not all response schemas inherit from BaseResponse.

Solution: Audit and fix schema inheritance


AC12.22: Schemas - Move Inline Schemas to Dedicated Modules

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.22.1 | Move 6 inline schemas from statements router to review module | N/A (mechanical) | N/A | P0 |
| AC12.22.2 | Extract background task schemas from inline/background definitions into dedicated modules | N/A (mechanical) | N/A | P0 |

AC12.23: Rate Limiting - Global API Middleware (M3)

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.23.1 | Global rate limit middleware exempts /health | test_global_rate_limit_middleware_exempts_health() | infra/test_rate_limit.py | P1 |
| AC12.23.2 | Global rate limit middleware returns 429 after limit exceeded | test_global_rate_limit_middleware_blocks_after_limit() | infra/test_rate_limit.py | P1 |
| AC12.23.3 | Global rate limit middleware allows normal requests | test_global_rate_limit_middleware_allows_normal_requests() | infra/test_rate_limit.py | P1 |
| AC12.23.4 | Global rate limit middleware exempts /docs | test_global_rate_limit_middleware_exempts_docs() | infra/test_rate_limit.py | P1 |

AC12.24: Metrics - Prometheus Endpoint (M4)

| ID | Test Case | Test Function | File | Priority |
|---|---|---|---|---|
| AC12.24.1 | ~~/metrics endpoint returns 200 OK~~ | Removed | Deferred: SigNoz OTLP path, no Prometheus scrape config | P1 |
| AC12.24.2 | ~~/metrics endpoint returns text/plain~~ | Removed | Deferred: SigNoz OTLP path | P1 |
| AC12.24.3 | ~~/metrics response contains Prometheus data~~ | Removed | Deferred: SigNoz OTLP path | P1 |

📊 Progress Tracking

| Phase | Task | Status | PR |
|---|---|---|---|
| 0 | Audit & Documentation | ✅ Complete | This EPIC |
| 1 | Distributed Tracing (H1) | ✅ Complete | Pending |
| 2 | Transaction Boundaries (H2) | ⏳ Pending | - |
| 3 | Connection Pool Config (M1) | ✅ Complete | This PR |
| 4 | Exception Hierarchy (M2) | ✅ Complete | This PR |
| 5 | Rate Limiting (M3) | ✅ Complete | This PR |
| 6 | Metrics Endpoint (M4) | ❌ Deferred | Removed: SigNoz OTLP used instead of Prometheus pull |


Audit Summary

Current Foundation Libraries

| Library | File | Status |
|---|---|---|
| Logging | src/logger.py | ✅ Structlog + OTEL export |
| Config | src/config.py | ✅ Pydantic Settings |
| Database | src/database.py | ⚠️ Needs pool config |
| Storage | src/services/storage.py | ✅ S3/MinIO abstraction |
| Rate Limit | src/rate_limit.py | ⚠️ Auth-only |
| Dependencies | src/deps.py | ✅ DbSession, CurrentUserId |
| Boot | src/boot.py | ✅ Health checks |
| Debug | scripts/debug.py | ⚠️ Needs SigNoz API |
| Error IDs | src/constants/error_ids.py | ✅ Centralized constants |

Frontend Foundation

| Library | File | Status |
|---|---|---|
| API Client | lib/api.ts | ✅ Unified fetch wrapper |
| Auth | lib/auth.ts | ✅ Token management |
| Currency | lib/currency.ts | ✅ Decimal.js |
| Workspace | hooks/useWorkspace.tsx | ✅ Tab/sidebar state |

Created: January 2026
Last Updated: January 2026