Add partitioned(by:) (#152)

mdznr · xwu · natecook1000 · web-flow · commit e2fa131d6260 · 2021-10-20T22:21:39.000+02:00
* Add `partitioned(_:)`
  `partitioned(_:)` works like `filter(_:)`, but also returns the excluded elements by returning a tuple of two `Array`s
* For collections with fewer than 8 elements, use the `Sequence`-based implementation
  This constant was determined using benchmarking. More information: #152 (comment)
* Remove check for collections fewer than 8 elements
* Make `_partitioned` `internal`
* Prefer `Array` over `ContiguousArray`
* Document `partitioned(_:)` on `Collection`
* Remove `partitioned(upTo:)`
* Remove `_tupleMap`
* Remove `_partitioned` and use it inline (since it’s no longer used by the `Collection` implementation)
* Remove unnecessary conversion of `Array` to `Array`
* Correct indentation
  Co-authored-by: Xiaodi Wu <13952+xwu@users.noreply.github.com>
* Consistent syntax
  Co-authored-by: Xiaodi Wu <13952+xwu@users.noreply.github.com>
* Add an external `by:` label to `partitioned`
* Add labels to returned tuple `falseElements`, `trueElements`
* Correct function signature
* Rename `belongsInSecondCollection` parameter name to simply `predicate`
  The parameter name was potentially confusing. Unlike the other `partition` functions, this function can rely on its named tuple to clarify its behavior.
* Update copyright information
  Co-authored-by: Nate Cook <natecook@apple.com>
* Update documentation
  Co-authored-by: Nate Cook <natecook@apple.com>
* Update comment
* Add a precondition to ensure that the count matches up with the number of actual elements found while iterating

Co-authored-by: Xiaodi Wu <13952+xwu@users.noreply.github.com>
Co-authored-by: Nate Cook <natecook@apple.com>
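
As a rough illustration of the relationship to `filter(_:)` described above, the following sketch partitions an array once and compares the result with two separate `filter(_:)` passes. The name `partitionedSketch(by:)` is hypothetical; it is a stand-in that mirrors the `Sequence` overload added in this commit so that the snippet runs without importing the package.

```swift
// Hypothetical stand-in for the `Sequence` overload in this commit, renamed so
// it cannot collide with the real method when the package is imported.
extension Sequence {
  func partitionedSketch(
    by predicate: (Element) throws -> Bool
  ) rethrows -> (falseElements: [Element], trueElements: [Element]) {
    var falseElements = [Element]()
    var trueElements = [Element]()
    for element in self {
      if try predicate(element) {
        trueElements.append(element)   // predicate returned true
      } else {
        falseElements.append(element)  // predicate returned false
      }
    }
    return (falseElements, trueElements)
  }
}

let numbers = [1, 2, 3, 4, 5, 6]
let isEven: (Int) -> Bool = { $0.isMultiple(of: 2) }

// One pass yields both halves.
let parts = numbers.partitionedSketch(by: isEven)

// The same halves obtained with two separate `filter(_:)` passes.
let kept = numbers.filter(isEven)
let dropped = numbers.filter { !isEven($0) }
```

Here `parts.trueElements` matches `kept` and `parts.falseElements` matches `dropped`, with the predicate evaluated once per element instead of twice.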
diff --git a/Guides/Partition.md b/Guides/Partition.md
@@ -42,6 +42,20 @@ let p = numbers.partitioningIndex(where: { $0.isMultiple(of: 20) })
 // numbers[p...] = [20, 40, 60]
 ```
 
+The standard library’s existing `filter(_:)` method returns only the elements
+that match a given predicate. `partitioned(by:)` returns both the elements
+that don’t match the predicate and those that do, in that order, as a tuple of
+two arrays.
+
+```swift
+let cast = ["Vivien", "Marlon", "Kim", "Karl"]
+let (longNames, shortNames) = cast.partitioned(by: { $0.count < 5 })
+print(longNames)
+// Prints "["Vivien", "Marlon"]"
+print(shortNames)
+// Prints "["Kim", "Karl"]"
+```
+
 ## Detailed Design
 
 All mutating methods are declared as extensions to `MutableCollection`.
@@ -69,11 +83,17 @@ extension Collection {
         where belongsInSecondPartition: (Element) throws -> Bool
     ) rethrows -> Index
 }
+
+extension Sequence {
+    public func partitioned(
+        by predicate: (Element) throws -> Bool
+    ) rethrows -> (falseElements: [Element], trueElements: [Element])
+}
 ```
 
 ### Complexity
 
-The existing partition is an O(_n_) operations, where _n_ is the length of the
+The existing partition is an O(_n_) operation, where _n_ is the length of the
 range to be partitioned, while the stable partition is O(_n_ log _n_). Both
 partitions have algorithms with improved performance for bidirectional
 collections, so it would be ideal for those to be customization points were they
@@ -82,6 +102,9 @@ to eventually land in the standard library.
 `partitioningIndex(where:)` is a slight generalization of a binary search, and
 is an O(log _n_) operation for random-access collections; O(_n_) otherwise.
 
+`partitioned(by:)` is an O(_n_) operation, where _n_ is the number of elements
+in the original sequence.
+
 ### Comparison with other languages
 
 **C++:** The `<algorithm>` library defines `partition`, `stable_partition`, and
diff --git a/README.md b/README.md
@@ -27,6 +27,7 @@ Read more about the package, and the intent behind it, in the [announcement on s
 #### Subsetting operations
 
 - [`compacted()`](https://github.com/apple/swift-algorithms/blob/main/Guides/Compacted.md): Drops the `nil`s from a sequence or collection, unwrapping the remaining elements.
+- [`partitioned(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): Returns the elements in a sequence or collection that do and do not match a given predicate.
 - [`randomSample(count:)`, `randomSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection.
 - [`randomStableSample(count:)`, `randomStableSample(count:using:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/RandomSampling.md): Randomly selects a specific number of elements from a collection, preserving their original relative order.
 - [`striding(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Stride.md): Returns every nth element of a collection.
diff --git a/Sources/Algorithms/Partition.swift b/Sources/Algorithms/Partition.swift
@@ -2,7 +2,7 @@
 //
 // This source file is part of the Swift Algorithms open source project
 //
-// Copyright (c) 2020 Apple Inc. and the Swift project authors
+// Copyright (c) 2020-2021 Apple Inc. and the Swift project authors
 // Licensed under Apache License v2.0 with Runtime Library Exception
 //
 // See https://swift.org/LICENSE.txt for license information
@@ -204,3 +204,137 @@ extension Collection {
   }
 }
 
+//===----------------------------------------------------------------------===//
+// partitioned(by:)
+//===----------------------------------------------------------------------===//
+
+extension Sequence {
+  /// Returns two arrays containing the elements of the sequence that don’t and
+  /// do satisfy the given predicate, respectively, each in its original order.
+  ///
+  /// In this example, `partitioned(by:)` is used to separate the input based on
+  /// whether a name is shorter than five characters:
+  ///
+  ///     let cast = ["Vivien", "Marlon", "Kim", "Karl"]
+  ///     let (longNames, shortNames) = cast.partitioned(by: { $0.count < 5 })
+  ///     print(longNames)
+  ///     // Prints "["Vivien", "Marlon"]"
+  ///     print(shortNames)
+  ///     // Prints "["Kim", "Karl"]"
+  ///
+  /// - Parameter predicate: A closure that takes an element of the sequence as
+  /// its argument and returns a Boolean value indicating whether the element
+  /// should be included in the second returned array. Otherwise, the element
+  /// will appear in the first returned array.
+  ///
+  /// - Returns: Two arrays that together contain all of the elements of the
+  /// receiver. The first array contains the elements for which `predicate`
+  /// returned `false`, and the second those for which it returned `true`.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of the sequence.
+  @inlinable
+  public func partitioned(
+    by predicate: (Element) throws -> Bool
+  ) rethrows -> (falseElements: [Element], trueElements: [Element]) {
+    var lhs = [Element]()
+    var rhs = [Element]()
+    
+    for element in self {
+      if try predicate(element) {
+        rhs.append(element)
+      } else {
+        lhs.append(element)
+      }
+    }
+    
+    return (lhs, rhs)
+  }
+}
+
+extension Collection {
+  /// Returns two arrays containing the elements of the collection that don’t
+  /// and do satisfy the given predicate, respectively, each in original order.
+  ///
+  /// In this example, `partitioned(by:)` is used to separate the input based on
+  /// whether a name is shorter than five characters:
+  ///
+  ///     let cast = ["Vivien", "Marlon", "Kim", "Karl"]
+  ///     let (longNames, shortNames) = cast.partitioned(by: { $0.count < 5 })
+  ///     print(longNames)
+  ///     // Prints "["Vivien", "Marlon"]"
+  ///     print(shortNames)
+  ///     // Prints "["Kim", "Karl"]"
+  ///
+  /// - Parameter predicate: A closure that takes an element of the collection
+  /// as its argument and returns a Boolean value indicating whether the element
+  /// should be included in the second returned array. Otherwise, the element
+  /// will appear in the first returned array.
+  ///
+  /// - Returns: Two arrays that together contain all of the elements of the
+  /// receiver. The first array contains the elements for which `predicate`
+  /// returned `false`, and the second those for which it returned `true`.
+  ///
+  /// - Complexity: O(*n*), where *n* is the length of the collection.
+  @inlinable
+  public func partitioned(
+    by predicate: (Element) throws -> Bool
+  ) rethrows -> (falseElements: [Element], trueElements: [Element]) {
+    guard !self.isEmpty else {
+      return ([], [])
+    }
+    
+    // Since collections have known sizes, we can allocate one array of size
+    // `self.count`, then insert items at the beginning or end of that contiguous
+    // block. This way, we don’t have to do any dynamic array resizing. Since we
+    // insert the right elements on the right side in reverse order, we need to
+    // reverse them back to the original order at the end.
+    
+    let count = self.count
+    
+    // Inside of the `initializer` closure, we set what the actual mid-point is.
+    // We will use this to partition the single array into two.
+    var midPoint: Int = 0
+    
+    let elements = try [Element](
+      unsafeUninitializedCapacity: count,
+      initializingWith: { buffer, initializedCount in
+        var lhs = buffer.baseAddress!
+        var rhs = lhs + buffer.count
+        do {
+          for element in self {
+            if try predicate(element) {
+              rhs -= 1
+              rhs.initialize(to: element)
+            } else {
+              lhs.initialize(to: element)
+              lhs += 1
+            }
+          }
+          
+          precondition(lhs == rhs, """
+            Collection's `count` differed from the number of elements iterated.
+            """
+          )
+          
+          let rhsIndex = rhs - buffer.baseAddress!
+          buffer[rhsIndex...].reverse()
+          initializedCount = buffer.count
+          
+          midPoint = rhsIndex
+        } catch {
+          let lhsCount = lhs - buffer.baseAddress!
+          let rhsCount = (buffer.baseAddress! + buffer.count) - rhs
+          buffer.baseAddress!.deinitialize(count: lhsCount)
+          rhs.deinitialize(count: rhsCount)
+          throw error
+        }
+      })
+    
+    let lhs = elements[..<midPoint]
+    let rhs = elements[midPoint...]
+    return (
+      Array(lhs),
+      Array(rhs)
+    )
+  }
+}
diff --git a/Tests/SwiftAlgorithmsTests/PartitionTests.swift b/Tests/SwiftAlgorithmsTests/PartitionTests.swift
@@ -133,4 +133,40 @@ final class PartitionTests: XCTestCase {
       }
     }
   }
+  
+  func testPartitionedWithEmptyInput() {
+    let input: [Int] = []
+    
+    let s0 = input.partitioned(by: { _ in return true })
+    
+    XCTAssertTrue(s0.0.isEmpty)
+    XCTAssertTrue(s0.1.isEmpty)
+  }
+  
+  /// Test the example given in the `partitioned(by:)` documentation
+  func testPartitionedExample() throws {
+    let cast = ["Vivien", "Marlon", "Kim", "Karl"]
+    let (longNames, shortNames) = cast.partitioned(by: { $0.count < 5 })
+    XCTAssertEqual(longNames, ["Vivien", "Marlon"])
+    XCTAssertEqual(shortNames, ["Kim", "Karl"])
+  }
+  
+  func testPartitionedWithPredicate() throws {
+    let s0 = ["A", "B", "C", "D"].partitioned(by: { $0 == $0.lowercased() })
+    let s1 = ["a", "B", "C", "D"].partitioned(by: { $0 == $0.lowercased() })
+    let s2 = ["a", "B", "c", "D"].partitioned(by: { $0 == $0.lowercased() })
+    let s3 = ["a", "B", "c", "d"].partitioned(by: { $0 == $0.lowercased() })
+    
+    XCTAssertEqual(s0.0, ["A", "B", "C", "D"])
+    XCTAssertEqual(s0.1, [])
+    
+    XCTAssertEqual(s1.0, ["B", "C", "D"])
+    XCTAssertEqual(s1.1, ["a"])
+    
+    XCTAssertEqual(s2.0, ["B", "D"])
+    XCTAssertEqual(s2.1, ["a", "c"])
+    
+    XCTAssertEqual(s3.0, ["B"])
+    XCTAssertEqual(s3.1, ["a", "c", "d"])
+  }
 }
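
The two-ended fill used by the `Collection` overload in this diff can be sketched without unsafe pointers. The function below is a simplified, hypothetical illustration (`twoEndedPartition` is not part of the package): it assumes `Int` elements and fills a placeholder-initialized array instead of uninitialized memory, so it shows the index bookkeeping rather than the allocation savings.

```swift
// Hypothetical, simplified illustration of the two-ended fill. It is not the
// package's implementation: it pre-fills the buffer with zeros rather than
// working on uninitialized memory.
func twoEndedPartition(
  _ input: [Int],
  by predicate: (Int) -> Bool
) -> (falseElements: [Int], trueElements: [Int]) {
  guard !input.isEmpty else { return ([], []) }

  var buffer = Array(repeating: 0, count: input.count)
  var lhs = 0             // next free slot, growing from the front
  var rhs = buffer.count  // one past the next free slot, growing from the back

  for element in input {
    if predicate(element) {
      rhs -= 1
      buffer[rhs] = element  // matching elements land at the back, reversed
    } else {
      buffer[lhs] = element  // non-matching elements keep their order
      lhs += 1
    }
  }

  // Every slot is now filled, so the two cursors meet at the split point.
  precondition(lhs == rhs)

  // Restore the original relative order of the matching elements.
  buffer[rhs...].reverse()
  return (Array(buffer[..<rhs]), Array(buffer[rhs...]))
}

let evensSplit = twoEndedPartition([1, 2, 3, 4, 5, 6], by: { $0.isMultiple(of: 2) })
```

Before the final reversal the buffer holds `[1, 3, 5, 6, 4, 2]`; reversing the tail restores the matching elements to input order.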