jmarble commented Oct 23, 2025

The Bug

array_unique() with SORT_REGULAR was failing to remove duplicate numeric strings when mixed with alphanumeric strings:

$units = ['5', '10', '3A', '5'];
array_unique($units, SORT_REGULAR);
// Before: ['5', '10', '3A', '5'] - duplicate '5' not removed 
// After:  ['5', '10', '3A']      - works correctly 
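
To check whether a given PHP build is affected, the example above can be turned into a small script. This is only a sketch: it assumes that for this all-string input the default SORT_STRING flag yields the expected result, and it prints no fixed outcome because the result depends on the PHP version it runs on.

```php
<?php
$units = ['5', '10', '3A', '5'];

// Result under the flag discussed in this PR.
$regular = array_unique($units, SORT_REGULAR);

// SORT_STRING (the default flag) compares values as strings and serves
// as the reference result for this particular input.
$string = array_unique($units, SORT_STRING);

echo 'SORT_REGULAR kept ', count($regular), ' values', PHP_EOL;
echo 'SORT_STRING kept ', count($string), ' values', PHP_EOL;
echo $regular === $string ? 'results match' : 'results differ: bug is present', PHP_EOL;
```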

Root Cause

Non-transitive comparisons in SORT_REGULAR (where '5' == 5 and 5 != '5abc' but '5' < '5abc') broke the sort-based algorithm's assumption that sorting would group duplicates adjacently.
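
One way to see the non-transitivity with the values from the bug report: two numeric strings compare as numbers, while any other pair of strings compares byte by byte. The snippet below is an illustration rather than code from the PR; the `$cycle` variable is only a name chosen for this example.

```php
<?php
// Each comparison holds, so the three values form a cycle
// and no consistent sort order exists for them.
$cycle = [
    "'5' < '10'"  => '5' < '10',   // both numeric strings: 5 < 10
    "'10' < '3A'" => '10' < '3A',  // '3A' is not numeric: "1" sorts before "3"
    "'3A' < '5'"  => '3A' < '5',   // string comparison again: "3" sorts before "5"
];

foreach ($cycle as $expression => $result) {
    echo $expression, ' => ', var_export($result, true), PHP_EOL;
}
```

Because the ordering is cyclic, where the two '5' entries land after sorting depends on the order in which the sort happens to compare elements, so they are not guaranteed to end up next to each other.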

Solution

Implemented a type-optimized hybrid approach for SORT_REGULAR:

Three-tier algorithm selection:

  1. Integer-only arrays: O(N) hash table for optimal performance
  2. Arrays/Objects present: O(N log N) sort-based deduplication (existing algorithm)
  3. Mixed scalar types: O(N) hash bucketing with type-aware hashing and SORT_REGULAR comparison
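
The hash-bucketing idea behind the third tier can be sketched in userland PHP. This is not the PR's C implementation: the function name `unique_scalars_sketch` and the bucket-key scheme are assumptions made for illustration, the sketch handles only ints and strings, and it relies on PHP 8 comparison semantics, with `==` standing in for the engine's SORT_REGULAR comparison.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: deduplicate ints and strings by hashing each value
 * into a bucket, then comparing loosely only within that bucket.
 */
function unique_scalars_sketch(array $values): array
{
    $buckets = [];  // bucket key => values already kept in that bucket
    $result  = [];

    foreach ($values as $key => $value) {
        if (!is_int($value) && !is_string($value)) {
            throw new InvalidArgumentException('sketch handles ints and strings only');
        }

        // Loosely equal values must share a bucket: ints and numeric strings
        // are keyed by numeric value, everything else by its exact bytes.
        // Adding 0.0 folds -0.0 into 0.0 so '-0.0' and '0' get the same key.
        $bucketKey = is_numeric($value)
            ? 'n:' . ((float) $value + 0.0)
            : 's:' . $value;

        $isDuplicate = false;
        foreach ($buckets[$bucketKey] ?? [] as $kept) {
            if ($kept == $value) {  // loose comparison decides equality
                $isDuplicate = true;
                break;
            }
        }

        if (!$isDuplicate) {
            $buckets[$bucketKey][] = $value;
            $result[$key] = $value;  // keep the first occurrence and its key
        }
    }

    return $result;
}

var_dump(unique_scalars_sketch(['5', '10', '3A', '5']));
```

Since values are never sorted, the non-transitive ordering cannot separate duplicates; each value is compared only against earlier values that hash to the same bucket.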

Performance

Improved performance across all data types compared to PHP 8.4.13 while fixing correctness issues.

Backward Compatibility

  • Fully backward compatible - preserves all existing coercion behavior
  • Uses identical comparison functions as current implementation
  • All existing tests pass (834/834)

Tests Added

  • gh20262.phpt - Minimal regression test for the bug
  • array_unique_variation_sort_regular.phpt - Comprehensive SORT_REGULAR behavior coverage (16 scenarios)

jmarble requested a review from bukka as a code owner October 23, 2025 23:06
jmarble marked this pull request as draft October 23, 2025 23:35
jmarble force-pushed the gh20262-array-unique-sort-regular branch from 4102fe6 to f60a45f October 23, 2025 23:55
…h mixed strings

array_unique() with SORT_REGULAR was failing to remove duplicate numeric
strings when mixed with alphanumeric strings due to non-transitive
comparison issues in the sort-based algorithm.

Implemented hash-bucketing optimization for SORT_REGULAR that preserves
full type coercion semantics while improving performance from O(n²) to O(n).

Closes GH-20262
jmarble force-pushed the gh20262-array-unique-sort-regular branch from f60a45f to c4fc6a9 October 24, 2025 01:11
jmarble marked this pull request as ready for review October 24, 2025 01:48
jmarble marked this pull request as draft October 24, 2025 04:38
- Eliminate Hash-DoS and integer overflow vulnerabilities
- Add exception handling to prevent memory leaks
- Implement type-specific optimizations (hash/sort/bucket)
- Dynamic bucket sizing for memory efficiency
- Fix resource handling
Use ZVAL_DEREF macro to dereference values before comparing in the
array_unique function. Also add a new test case to verify the behavior.
jmarble requested review from devnexen and ndossche October 25, 2025 03:20
jmarble requested a review from devnexen October 25, 2025 16:14
jmarble commented Oct 26, 2025

Another variation of the example bug, but using objects:
https://3v4l.org/5kr1t

My PR does not make an attempt to resolve this issue with complex types.

jmarble commented Oct 28, 2025

Closing in favor of #20305 which provides a complete solution that fixes transitivity for all mixed types (scalars, nested arrays, and objects).

jmarble closed this Oct 28, 2025