feat(file_utils): robust path handling and safe directory listing #1195

Open
dive2tech wants to merge 8 commits into eigent-ai:main from dive2tech:feat/file-utils-robustness-safe-paths
dive2tech commented Feb 9, 2026

Summary

Adds robust file system utilities and path safety to prevent traversal, handle edge cases, and confine directory listing to a base path. Integrates safe listing into chat service context builders.

Motivation

  • Robustness: Path traversal prevention, encoding fallbacks, path length limits, validated working directories.
  • Edge cases: None/empty paths, symlinks (realpath), non-existent dirs, oversized file reads, runaway directory listing.
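
The size limit and encoding fallback listed above can be sketched roughly as follows. This is an illustration only, not the PR's `safe_read_file`: the function name `read_text_with_fallback`, its return shape, and the default `max_bytes` are assumptions; only the encoding order comes from the description.

```python
import os

# Encoding order as listed in the PR description.
FALLBACK_ENCODINGS = ("utf-8", "utf-8-sig", "latin-1", "cp1252")


def read_text_with_fallback(path, max_bytes=1_000_000,
                            encodings=FALLBACK_ENCODINGS):
    """Hypothetical helper: return (text, encoding), or None on failure.

    Returns None if the file is missing, unreadable, larger than
    max_bytes, or cannot be decoded with any of the given encodings.
    """
    try:
        # Cheap size check first so oversized files are never loaded.
        if os.path.getsize(path) > max_bytes:
            return None
        with open(path, "rb") as fh:
            # Read one extra byte to catch a file that grew after the check.
            raw = fh.read(max_bytes + 1)
    except OSError:
        return None
    if len(raw) > max_bytes:
        return None
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None
```

One design point worth noting: `latin-1` maps every byte value to a character, so decoding with it never raises, and any encoding listed after it (here `cp1252`) is never reached in a loop like this one.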

Changes

backend/app/utils/file_utils.py

  • Path safety: safe_join_path, is_safe_path, safe_resolve_path — confine paths under a base, reject .. escape, enforce platform max path length (Windows 260 / Unix 4096).
  • Working dir: normalize_working_path — normalize and validate; handle None/empty, length, non-existent; fallback to home.
  • Directory listing: safe_list_directory — list files under a dir with optional base confinement, max_entries, skip_dirs / skip_extensions / skip_prefix, optional path_filter.
  • File I/O: safe_read_file (size limit, encoding fallback: utf-8, utf-8-sig, latin-1, cp1252), safe_write_file (optional base confinement, create_dirs).
  • Temp: create_temp_dir(prefix, base).
  • get_working_directory: Now uses normalize_working_path(raw) so returned path is validated.
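
As a rough illustration of the base-confinement idea behind `safe_resolve_path` and `is_safe_path`, the sketch below resolves a user-supplied path against a base and rejects anything that ends up outside it. The name `resolve_under_base` and its signature are hypothetical, and the platform path-length check mentioned above is left out; the actual implementation in the PR may differ.

```python
import os


def resolve_under_base(base, user_path):
    """Hypothetical helper: real path of user_path if it stays under base.

    Returns None for empty input or for any path that escapes the base,
    whether through "..", an absolute path, or a symlink.
    """
    if not base or not user_path:
        return None
    # realpath resolves symlinks and ".." segments on both sides,
    # so the comparison is made on canonical paths.
    base_real = os.path.realpath(base)
    candidate = os.path.realpath(os.path.join(base_real, user_path))
    try:
        common = os.path.commonpath([base_real, candidate])
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return None
    return candidate if common == base_real else None
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids accepting a sibling such as `/data/base_evil` when the base is `/data/base`.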

backend/app/service/chat_service.py

  • format_task_context: Uses safe_list_directory(working_directory, base=...) instead of raw os.walk (path confined, same skip rules).
  • collect_previous_task_context: Same — safe_list_directory instead of os.walk.
  • build_conversation_context: Same for "Generated Files from Previous Tasks" — safe_list_directory per working_directory, results merged into a set.
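
For context, a confined listing in the spirit of `safe_list_directory` might look like the sketch below. The parameter names mirror those in the description, but the function name `list_files_confined`, the defaults, and the return type are assumptions, and `path_filter` is omitted.

```python
import os


def list_files_confined(dir_path, base=None, max_entries=1000,
                        skip_dirs=(), skip_extensions=(), skip_prefix="."):
    """Hypothetical helper: list files under dir_path, confined to base.

    Returns at most max_entries absolute file paths, or an empty list if
    dir_path is not a directory or does not lie under base.
    """
    if max_entries <= 0:
        return []
    base_real = os.path.realpath(base if base is not None else dir_path)
    root = os.path.realpath(dir_path)
    try:
        if os.path.commonpath([base_real, root]) != base_real:
            return []
    except ValueError:
        return []
    if not os.path.isdir(root):
        return []

    extensions = tuple(ext.lower() for ext in skip_extensions)
    found = []
    for current, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(
            d for d in dirnames
            if d not in skip_dirs
            and not (skip_prefix and d.startswith(skip_prefix))
        )
        for name in sorted(filenames):
            if skip_prefix and name.startswith(skip_prefix):
                continue
            if name.lower().endswith(extensions):
                continue
            found.append(os.path.join(current, name))
            if len(found) >= max_entries:
                # Stop early to bound the work on very large trees.
                return found
    return found
```

A caller such as `build_conversation_context` could then merge the results of several such calls into a set, as the description says.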

- Add safe path utilities: safe_join_path, is_safe_path, safe_resolve_path
  to prevent path traversal and enforce base confinement
- Add normalize_working_path for validated working dir (length, existence)
- Add safe_list_directory with base confinement, max_entries, skip filters
- Add safe_read_file / safe_write_file with encoding fallback and size limit
- Add create_temp_dir; platform max path length constants
- get_working_directory now uses normalize_working_path for safety
- chat_service: use safe_list_directory in format_task_context,
  collect_previous_task_context, and build_conversation_context

Robustness: path traversal prevention, encoding fallbacks, path length limits.
Edge cases: None/empty paths, symlinks, non-existent dirs, oversized reads.

Co-authored-by: Cursor <cursoragent@cursor.com>

Resolve user-provided dir_path via safe_resolve_path under base (or cwd)
before using in os.path.isdir and os.walk. Use only validated_dir for I/O
to satisfy CodeQL 'Uncontrolled data used in path expression' (High).

Co-authored-by: Cursor <cursoragent@cursor.com>

- Use collections.abc.Callable instead of typing.Callable
- Break long lines for ruff format; remove redundant 'r' in open()
- Satisfies pre-commit ruff and ruff-format hooks

Co-authored-by: Cursor <cursoragent@cursor.com>
dive2tech and others added 2 commits February 9, 2026 05:56
Co-authored-by: Cursor <cursoragent@cursor.com>
…deQL

Reconstruct path_for_walk from trusted base_real and names from os.listdir
only; do not pass user-derived path to os.path.isdir/os.walk to satisfy
CodeQL 'Uncontrolled data used in path expression' (High).

Co-authored-by: Cursor <cursoragent@cursor.com>

Do not use validated_dir (user-derived) in any path expression. Validate
dir_path under base via safe_resolve_path then use only base_real for
os.path.isdir and os.walk. When base equals dir_path (as in chat_service)
listing base is correct.

Co-authored-by: Cursor <cursoragent@cursor.com>

dive2tech and others added 2 commits February 9, 2026 06:12
Paths in file_utils are validated by safe_resolve_path (under base) before
use; CodeQL does not recognize this as a sanitizer. Add codeql-config.yml
with query-filters to exclude py/path-injection and use it in the workflow.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>