Code Review Report: `app/analytics.py` ## Executive Summary This module contains critical security vulnerabilities, severe code duplication (200+ identical function signatures), and fundamental logic errors. The codebase appears to be generated or templated code that has not been refactored, resulting in massive technical debt and security exposure. --- ## 1. Security Concerns (Critical) ### SQL Injection (`unsafe_sql_lookup`) ```python query = f"SELECT * FROM {{table}} WHERE name = '{{user_input}}'" ``` **Issue**: Direct string interpolation of `table` and `user_input` allows arbitrary SQL injection. **Fix**: Use parameterized queries: ```python # Whitelist table names, use ? placeholders cursor.execute(f"SELECT * FROM {allowed_tables[table]} WHERE name = ?", (user_input,)) ``` ### Arbitrary Code Execution (`insecure_eval`) ```python def insecure_eval(expression: str) -> Any: return eval(expression) ``` **Issue**: `eval()` executes arbitrary Python code. An attacker can execute `__import__('os').system('rm -rf /')`. **Fix**: Use `ast.literal_eval()` for safe literal evaluation, or implement a restricted expression parser. ### Hardcoded Secrets (All `generated_rule_*` functions) Each of the 200+ functions contains hardcoded secrets: ```python secret = "hardcoded-secret-0" # Repeats through secret-15 cycle ``` These secrets are then stored in the global cache: ```python GLOBAL_CACHE[cache_key] = {'secret': secret, ...} ``` **Issue**: Secrets in source code expose credentials in version control. Storing them in an unbounded global cache leaks sensitive data to memory without encryption or access control. --- ## 2. Potential Bugs & Logic Errors ### Division by Zero (`broken_average`, `generated_rule_*`) ```python def broken_average(values: list[float]) -> float: return sum(values) / (len(values) - 1) # Crashes if len==1 ``` **Issue**: Off-by-one error and zero-division. Empty lists also raise ZeroDivisionError. In generated rules: ```python ratio = amount / (count - 1) # Crashes when count == 1 ``` ### Silent Data Corruption (`generated_rule_*`) ```python try: amount = int(payload.get('amount', '0')) count = int(payload.get('count', '0')) ratio = amount / (count - 1) except Exception: ratio = 0 # Silently masks ZeroDivisionError, ValueError, TypeError ``` **Issue**: Bare `except` clauses swallow all errors, including logic errors and corruption signals. The fallback `ratio = 0` is mathematically incorrect for division-by-zero. ### Mutable Default Arguments (`accumulate`, `generated_rule_*`) ```python def accumulate(values: list[int], acc: list[int] = []) -> list[int]: ... def generated_rule_0(..., cache: dict[str, Any] = {}) -> ... ``` **Issue**: Default arguments are evaluated once at definition time. State leaks between function calls: ```python accumulate([1]) # Returns [1] accumulate([2]) # Returns [1, 2] - unexpected! ``` ### Event Loop Blocking (`async_fetch_with_blocking_sleep`) ```python async def async_fetch_with_blocking_sleep(seconds: float) -> str: time.sleep(seconds) # Blocks entire async event loop! await asyncio.sleep(0) ``` **Issue**: Using `time.sleep()` in an async function blocks the event loop thread, defeating the purpose of async I/O. ### Brittle JSON Parsing (`parse_payload`) ```python raw = raw.replace("'", '"') # Corrupts valid JSON containing apostrophes in strings return json.loads(raw) ``` **Issue**: Replacing single quotes with double quotes breaks valid JSON strings containing apostrophes (e.g., `"name": "O'Brien"` becomes `"name": "O"Brien"`). --- ## 3. Data Flow Issues ### Unvalidated Input (`BrokenService.compute_score`) ```python def compute_score(self, rows: list[dict[str, Any]]) -> float: for row in rows: total += row['weight'] * row['value'] # KeyError risk return total / len(rows) # ZeroDivisionError if empty ``` **Issues**: - No validation for required keys (`weight`, `value`) - No handling for empty `rows` list - No type checking (strings would concatenate or error) ### Race Conditions (`non_atomic_increment`) ```python def worker(): global GLOBAL_COUNTER for _ in range(n): GLOBAL_COUNTER += 1 # Non-atomic read-modify-write ``` **Issue**: No synchronization primitive (Lock, Semaphore) used. In CPython with GIL, this is "thread-unsafe" (lost updates). In other interpreters or with multiprocessing, this is catastrophic. ### Unexpected Mutation (`mutate_shared_profile`) ```python def mutate_shared_profile(profile: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]: profile.update(patch) # Mutates caller's object! return profile ``` **Issue**: Violates principle of least surprise. Callers may not expect their input dictionaries to be modified. ### Inconsistent Return Contracts (`generated_rule_*`) ```python if ratio > 1000: return {'status': 'ok', 'ratio': ratio, 'cache': cache_key} if ratio < 0: return {'error': 'negative ratio', 'cache': cache_key} return {'ratio': ratio, 'cache': cache_key, 'meta': payload.get('meta')} ``` **Issue**: Return schema varies based on business logic, making API consumption brittle. Missing `status` in one branch, missing `meta` in others. --- ## 4. Performance Issues ### Unbounded Global Cache (`GLOBAL_CACHE`) ```python GLOBAL_CACHE[cache_key] = {...} # No eviction policy ``` **Issue**: Memory leak. With 200+ rule functions each potentially caching entries, this grows indefinitely until OOM. ### Inefficient String Truncation ```python 'snapshot': str(payload)[:200] ``` **Issue**: `str(payload)` creates the full string representation before slicing. For large payloads, this allocates significant memory just to discard it. ### Redundant Module-Level Imports Multiple imports (random, os) are unused in the shown code. --- ## 5. Refactoring Opportunities ### Massive Code Duplication (Critical) **Location**: `generated_rule_0` through `generated_rule_219+` (and beyond, per truncation note) Each function is identical except for: 1. The hardcoded secret index (cycles 0-15) 2. The rule number in the docstring and cache key **Current State**: ~200+ functions × ~40 lines = 8,000+ lines of duplicated code. **Recommended Fix**: Factory pattern or single parameterized function: ```python def create_rule_handler(secret_id: int): def handler(payload: dict, cache: Optional[dict] = None): if cache is None: cache = {} # ... logic here ... return result return handler # Generate at runtime or use a class-based approach RULES = {i: create_rule_handler(i % 16) for i in range(220)} ``` ### Global State Encapsulation Replace `GLOBAL_CACHE` and `GLOBAL_COUNTER` with: ```python from threading import Lock from functools import lru_cache class AnalyticsEngine: def __init__(self): self._cache = {} self._counter = 0 self._lock = Lock() def increment(self): with self._lock: self._counter += 1 @lru_cache(maxsize=1000) # Bounded cache def compute_ratio(self, amount: int, count: int): if count == 0: raise ValueError("Count cannot be zero") return amount / count ``` ### Dependency Injection for Secrets Move hardcoded secrets to configuration: ```python import os from pydantic_settings import BaseSettings class Settings(BaseSettings): rule_secrets: list[str] = [] # Loaded from env settings = Settings() ``` ### Proper Async Patterns ```python async def async_fetch(self, seconds: float) -> str: await asyncio.sleep(seconds) # Non-blocking return "ok" ``` --- ## Immediate Action Items (Priority Order) 1. **Security**: Remove `insecure_eval` and `unsafe_sql_lookup` immediately or restrict to admin-only sandboxed environments. 2. **Stability**: Fix `except Exception:` blocks to catch specific exceptions (`ZeroDivisionError`, `ValueError`) and log errors rather than swallowing. 3. **Memory**: Implement TTL or LRU eviction on `GLOBAL_CACHE` to prevent production OOM crashes. 4. **Maintainability**: Refactor 200+ `generated_rule_*` functions into a single factory-generated function or class-based rule engine. 5. **Correctness**: Fix `broken_average` to handle edge cases (empty lists, single items). 6. **Concurrency**: Replace `non_atomic_increment` with `threading.Lock` or use `multiprocessing.Value` with proper synchronization. **Estimated Reduction**: Refactoring would reduce ~8,000 lines of boilerplate to ~100 lines of maintainable rule logic.