find_file and find_type fuzzy scoring: short common substrings rank above exact filename/prefix-stripped matches #29

@Joxx0r

Summary

find_file and find_type (fuzzy) fail to return correct results for short, common names like Pawn and Actor. The actual exact-match files/types are completely absent from results because incidental substring matches within longer names (e.g., "pawn" inside "Spawner") receive equal or higher scores and fill all result slots.

Reproduction

find_file — exact filename missing from results

Search: find_file(filename="Pawn.h", project="Engine", maxResults=10)

Expected: Engine/Runtime/Engine/Classes/GameFramework/Pawn.h as the top result (exact filename match).

Actual: 10 Blueprint*NodeSpawner.h files (all scored 0.71). Pawn.h itself is completely absent:

Engine/Editor/BlueprintGraph/Classes/BlueprintAssetNodeSpawner.h  (0.71)
Engine/Editor/BlueprintGraph/Classes/BlueprintBoundEventNodeSpawner.h  (0.71)
Engine/Editor/BlueprintGraph/Classes/BlueprintBoundNodeSpawner.h  (0.71)
...

These match because "Spawner" contains "pawn" as a substring: S-p-a-w-n-e-r.

Same issue with Actor.h:

find_file(filename="Actor.h", project="Engine", maxResults=10) returns DatasmithFacadeActor.h, ApproximateActors.h, etc. — all at score 0.71. The actual Engine/Runtime/Engine/Classes/GameFramework/Actor.h is absent.

Works for longer/unique names: find_file(filename="DiscoveryHealthComponent", project="Discovery") returns the correct file with a score of 0.7.

find_type — prefix-stripped match lost in substring noise

Search: find_type(name="Pawn", fuzzy=true, project="Engine", maxResults=10)

Expected: APawn as a top result (prefix-stripped exact match: A + Pawn).

Actual: 10 Spawn-related types, APawn is completely absent:

FAITestSpawnInfoBase      (0.91, substring)
FPendingDelayedSpawn      (0.91, substring)
FAITestSpawnSetBase       (0.91, substring)
FDelayedVisualizerSpawner (0.91, substring)
FNodeSpawnData            (0.91, substring)
...

Note: Exact search find_type(name="APawn") works correctly — this is specifically a fuzzy scoring issue.

Contrast with working fuzzy searches: find_type(name="Health", fuzzy=true, project="Discovery") correctly ranks the exact namespace "Health" at 1.03 and prefix-stripped UHealthComponent at 0.98.

Root Cause

The scoring doesn't differentiate between:

  • Meaningful match: The filename/type IS the search term (exact or prefix-stripped) — e.g., Pawn.h, APawn
  • Incidental substring: The search term appears buried inside a different word — e.g., "pawn" inside "Spawner"

Short search terms (4-5 chars) like "Pawn" and "Actor" are especially affected because they commonly appear as substrings within longer unrelated words.
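To illustrate how this could push the exact match out of the results, here is a minimal sketch. It does not reproduce the tool's actual scorer, which is not shown in this issue. It assumes a hypothetical flat score for any case-insensitive substring hit (0.71 is borrowed from the observed output purely for illustration), a stable sort, and a result cap. The names naive_score and naive_find are made up for this example.

```python
def naive_score(query_stem: str, file_stem: str) -> float:
    # Hypothetical flat score: every case-insensitive substring hit scores the same,
    # whether the query is the whole name or buried inside another word.
    return 0.71 if query_stem.lower() in file_stem.lower() else 0.0


def naive_find(query_stem: str, stems: list[str], max_results: int) -> list[tuple[float, str]]:
    hits = [(naive_score(query_stem, s), s) for s in stems]
    hits = [h for h in hits if h[0] > 0]
    # Stable sort: tied scores keep their original (index) order.
    hits.sort(key=lambda h: -h[0])
    return hits[:max_results]


# File stems taken from the reproduction above; "Pawn" happens to be indexed last.
stems = [
    "BlueprintAssetNodeSpawner",
    "BlueprintBoundEventNodeSpawner",
    "BlueprintBoundNodeSpawner",
    "Pawn",
]

top = naive_find("Pawn", stems, max_results=3)
for score, stem in top:
    print(f"{stem}  ({score})")
```

Under these assumptions every candidate ties, so which files survive the cap depends only on index order, and the exact match can be cut off entirely.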

Suggested Fix

A word-boundary or whole-word bonus in the scoring algorithm would likely fix both issues. For example:

  • If the search term matches a complete word/segment in the filename (delimited by path separators, underscores, or PascalCase boundaries), it should score significantly higher than a substring match within a word.
  • Pawn.h matching Pawn.h (whole filename) >> BlueprintNodeSpawner.h (substring within "Spawner")
  • Pawn matching APawn (prefix-stripped) >> FAITestSpawnInfoBase (substring within "Spawn")
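One way the suggestion above might look in code is sketched below. Everything here is an assumption made for illustration: the tier weights, the set of Unreal-style type prefixes, and the function names (split_segments, score, rank) are hypothetical and not taken from the tool. The idea is to split names at separators and PascalCase boundaries, then rank exact matches above prefix-stripped matches, those above whole-segment matches, and those above substring hits inside a word.

```python
import re
from pathlib import PurePosixPath

# Hypothetical tier weights, chosen only so that the tiers sort in the intended order.
EXACT, PREFIX_STRIPPED, WHOLE_SEGMENT, SUBSTRING = 1.0, 0.95, 0.85, 0.5

# Assumed set of single-letter Unreal-style type prefixes (APawn, UObject, FVector, ...).
TYPE_PREFIXES = "AUFESTI"


def split_segments(name: str) -> list[str]:
    """Split on separators and PascalCase boundaries; return lowercase segments."""
    segments: list[str] = []
    for part in re.split(r"[\\/_.\-\s]+", name):
        # Acronym runs (e.g. "FAI"), capitalized or lowercase words, and digit runs.
        segments += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [s.lower() for s in segments]


def matches_whole_segments(q: str, segments: list[str]) -> bool:
    """True if q equals one segment or a contiguous run of segments joined together."""
    for i in range(len(segments)):
        joined = ""
        for seg in segments[i:]:
            joined += seg
            if joined == q:
                return True
            if not q.startswith(joined):
                break
    return False


def score(query: str, name: str) -> float:
    q, n = query.lower(), name.lower()
    if n == q:
        return EXACT
    # Prefix-stripped: one prefix letter followed by an uppercase letter ("APawn", not "Spawn").
    if (len(name) == len(query) + 1 and name[0] in TYPE_PREFIXES
            and name[1].isupper() and n[1:] == q):
        return PREFIX_STRIPPED
    if matches_whole_segments(q, split_segments(name)):
        return WHOLE_SEGMENT
    if q in n:
        return SUBSTRING  # incidental hit inside a longer word
    return 0.0


def rank(query: str, names: list[str], max_results: int = 10) -> list[tuple[float, str]]:
    """Score by base name without directory or extension, best first."""
    q = PurePosixPath(query).stem
    scored = [(score(q, PurePosixPath(n).stem), n) for n in names]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: -item[0])
    return scored[:max_results]
```

With this tiering, rank("Pawn.h", ...) over the files from the reproduction would place Engine/Runtime/Engine/Classes/GameFramework/Pawn.h ahead of the BlueprintNodeSpawner files, and rank("Pawn", ...) would place APawn ahead of FAITestSpawnInfoBase. A real implementation would more likely fold these tiers into the existing fuzzy score as a bonus rather than replace it.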

Environment

  • Tested: 2026-02-08
  • All other tools (find_member, grep, find_asset, find_children, browse_module, list_modules) have correct scoring behavior.
