Commit d43a5f2

andrewdalpino, ElGigi, and mcharytoniuk authored
2.5 (#336)
* Initial commit (#299)
* Vantage Tree (#300)
* Initial commit
* Better testing
* Improve the docs
* Rename benchmark
* Explicitly import max() function
* Fix coding style
* Wrapper interface (#314)
* Add Wrapper interface for models wrappers
* Add WrapperAware trait
* Fix PhpDoc
* Revert "Add WrapperAware trait" This reverts commit 241abc4.
* Rename Wrapper interface to EstimatorWrapper
* PHP CS fix
* Swoole Backend (#312)
* add Swoole backend
* phpstan: ignore swoole
* feat: swoole process scheduler
* fix(swoole): redo tasks when hash collision happens
* chore(swoole): make sure coroutines are at the root of the scheduler
* chore(swoole): set affinity / bind worker to a specific CPU core
* chore(swoole): use igbinary if available
* fix: remove comment
* fix(swoole): worker cpu affinity
* fix(swoole): cpu num
* feat: scheduler improvements
* style
* chore(swoole): remove unnecessary atomics
* chore(swoole): php backwards compatibility
* fix: phpstan, socket message size
* fix: uncomment test
* style: composer fix
* Plus plus check (#317)
* Initial commit
* Allow deltas in units tests
* Swoole docs (#326)
* add Swoole backend
* phpstan: ignore swoole
* feat: swoole process scheduler
* fix(swoole): redo tasks when hash collision happens
* chore(swoole): make sure coroutines are at the root of the scheduler
* chore(swoole): set affinity / bind worker to a specific CPU core
* chore(swoole): use igbinary if available
* fix: remove comment
* fix(swoole): worker cpu affinity
* fix(swoole): cpu num
* feat: scheduler improvements
* style
* chore(swoole): remove unnecessary atomics
* chore(swoole): php backwards compatibility
* fix: phpstan, socket message size
* fix: uncomment test
* style: composer fix
* docs: Swoole backend
* Fix coding style and composer.lock
* fix(swoole): setAffinity does not exist on some versions of Swoole (#327)
* Back out Swoole Backend code
* Bump version

---------

Co-authored-by: Ronan Giron <[email protected]>
Co-authored-by: Mateusz Charytoniuk <[email protected]>
1 parent 696a2f6 commit d43a5f2

32 files changed: +926 -39 lines changed

CHANGELOG.md

+7
@@ -1,3 +1,10 @@
+- 2.5.0
+    - Added Vantage Point Spatial tree
+    - Blob Generator can now `simulate()` a Dataset object
+    - Added Wrapper interface
+    - Plus Plus added check for min number of sample seeds
+    - LOF prevent div by 0 local reachability density
+
 - 2.4.1
     - Sentence Tokenizer fix Arabic and Farsi language support
     - Optimize online variance updating
@@ -0,0 +1,49 @@
+<?php
+
+namespace Rubix\ML\Benchmarks\Graph\Trees;
+
+use Rubix\ML\Graph\Trees\VantageTree;
+use Rubix\ML\Datasets\Generators\Blob;
+use Rubix\ML\Datasets\Generators\Agglomerate;
+
+/**
+ * @Groups({"Trees"})
+ * @BeforeMethods({"setUp"})
+ */
+class VantageTreeBench
+{
+    protected const DATASET_SIZE = 10000;
+
+    /**
+     * @var \Rubix\ML\Datasets\Labeled;
+     */
+    protected $dataset;
+
+    /**
+     * @var VantageTree
+     */
+    protected $tree;
+
+    public function setUp() : void
+    {
+        $generator = new Agglomerate([
+            'Iris-setosa' => new Blob([5.0, 3.42, 1.46, 0.24], [0.35, 0.38, 0.17, 0.1]),
+            'Iris-versicolor' => new Blob([5.94, 2.77, 4.26, 1.33], [0.51, 0.31, 0.47, 0.2]),
+            'Iris-virginica' => new Blob([6.59, 2.97, 5.55, 2.03], [0.63, 0.32, 0.55, 0.27]),
+        ]);
+
+        $this->dataset = $generator->generate(self::DATASET_SIZE);
+
+        $this->tree = new VantageTree(30);
+    }
+
+    /**
+     * @Subject
+     * @Iterations(3)
+     * @OutputTimeUnit("seconds", precision=3)
+     */
+    public function grow() : void
+    {
+        $this->tree->grow($this->dataset);
+    }
+}

composer.json

+1 -1
@@ -79,7 +79,7 @@
             "@test",
             "@check"
         ],
-        "analyze": "phpstan analyse -c phpstan.neon",
+        "analyze": "phpstan analyse -c phpstan.neon --memory-limit 1G",
         "benchmark": "phpbench run --report=aggregate",
         "check": [
             "@putenv PHP_CS_FIXER_IGNORE_ENV=1",

docs/datasets/generators/blob.md

+10 -2
@@ -17,8 +17,16 @@ A normally distributed (Gaussian) n-dimensional blob of samples centered at a gi
 ```php
 use Rubix\ML\Datasets\Generators\Blob;
 
-$generator = new Blob([-1.2, -5., 2.6, 0.8, 10.], 0.25);
+$generator = new Blob([-1.2, -5.0, 2.6, 0.8, 10.0], 0.25);
 ```
 
 ## Additional Methods
-This generator does not have any additional methods.
+Fit a Blob generator to the samples in a dataset.
+```php
+public static simulate(Dataset $dataset) : self
+```
+
+Return the center coordinates of the Blob.
+```php
+public center() : array
+```
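The fitting step that `simulate()` documents can be illustrated without the library. The block below is a minimal, self-contained sketch: `fitBlobParams()` is a hypothetical helper (not part of Rubix ML) that estimates a per-feature mean and standard deviation from row-wise samples. It assumes population variance (dividing by n); the library's own statistics helper may use a different convention.

```php
<?php

/**
 * Estimate a per-feature mean and standard deviation from row-wise samples.
 * Hypothetical helper for illustration only, not a Rubix ML function.
 *
 * @param list<list<int|float>> $samples
 * @return array{0: list<float>, 1: list<float>} [means, stdDevs]
 */
function fitBlobParams(array $samples) : array
{
    if (empty($samples)) {
        throw new InvalidArgumentException('Need at least one sample.');
    }

    $n = count($samples);
    $means = $stdDevs = [];

    // Walk the columns (features) of the sample matrix.
    foreach (array_keys($samples[0]) as $column) {
        $values = array_column($samples, $column);

        $mean = array_sum($values) / $n;

        // Sum of squared deviations from the mean.
        $ssd = 0.0;

        foreach ($values as $value) {
            $ssd += ($value - $mean) ** 2;
        }

        $means[] = $mean;

        // Assumption: population variance (divide by n, not n - 1).
        $stdDevs[] = sqrt($ssd / $n);
    }

    return [$means, $stdDevs];
}

[$means, $stdDevs] = fitBlobParams([
    [1.0, 10.0],
    [3.0, 10.0],
]);
```

The two returned arrays have the same shape as the center and standard deviation arguments that the Blob constructor accepts in the example above.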

docs/graph/trees/vantage-tree.md

+28
@@ -0,0 +1,28 @@
+<span style="float:right;"><a href="https://github.com/RubixML/ML/blob/master/src/Graph/Trees/VPTree.php">[source]</a></span>
+
+# Vantage Tree
+A Vantage Point Tree is a binary spatial tree that divides samples by their distance from the center of a cluster called the *vantage point*. Samples that are closer to the vantage point will be put into one branch of the tree while samples that are farther away will be put into the other branch.
+
+**Interfaces:** Binary Tree, Spatial
+
+**Data Type Compatibility:** Depends on distance kernel
+
+## Parameters
+| # | Param | Default | Type | Description |
+|---|---|---|---|---|
+| 1 | max leaf size | 30 | int | The maximum number of samples that each leaf node can contain. |
+| 2 | kernel | Euclidean | Distance | The distance kernel used to compute the distance between sample points. |
+
+## Example
+```php
+use Rubix\ML\Graph\Trees\VantageTree;
+use Rubix\ML\Kernels\Distance\Euclidean;
+
+$tree = new VantageTree(30, new Euclidean());
+```
+
+## Additional Methods
+This tree does not have any additional methods.
+
+### References
+>- P. N. Yianilos. (1993). Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces.
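The near/far split described in the Vantage Tree documentation above can be sketched in a few lines. This is a self-contained illustration, not the library's implementation: `euclidean()` and `splitByVantagePoint()` are hypothetical helpers, and using the median distance as the threshold is an assumption of the sketch. It also assumes `$samples` is a non-empty list with sequential integer keys.

```php
<?php

/** Euclidean distance between two equal-length numeric vectors. */
function euclidean(array $a, array $b) : float
{
    $sum = 0.0;

    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/**
 * Split samples into a near group and a far group around a vantage point.
 * Hypothetical helper; the median-distance threshold is an assumption.
 *
 * @return array{0: array, 1: array, 2: float} [near, far, threshold]
 */
function splitByVantagePoint(array $samples, array $vantage) : array
{
    $distances = [];

    foreach ($samples as $sample) {
        $distances[] = euclidean($sample, $vantage);
    }

    // Use the median distance as the radius that separates the two branches.
    $sorted = $distances;
    sort($sorted);
    $threshold = $sorted[intdiv(count($sorted), 2)];

    $near = $far = [];

    foreach ($samples as $i => $sample) {
        if ($distances[$i] < $threshold) {
            $near[] = $sample;
        } else {
            $far[] = $sample;
        }
    }

    return [$near, $far, $threshold];
}

[$near, $far, $radius] = splitByVantagePoint(
    [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [6.0, 8.0]],
    [0.0, 0.0]
);
```

Applying the same split recursively to each group until a group holds no more than the max leaf size would yield a binary tree of the kind the documentation describes.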

mkdocs.yml

+1
@@ -201,6 +201,7 @@ nav:
         - Trees:
             - Ball Tree: graph/trees/ball-tree.md
             - K-d Tree: graph/trees/k-d-tree.md
+            - Vantage Tree: graph/trees/vantage-tree.md
         - Kernels:
             - Distance:
                 - Canberra: kernels/distance/canberra.md

phpunit.xml

+14 -1
@@ -1,5 +1,18 @@
 <?xml version="1.0" encoding="UTF-8"?>
-<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" backupGlobals="false" backupStaticAttributes="false" bootstrap="vendor/autoload.php" colors="true" convertErrorsToExceptions="true" convertNoticesToExceptions="true" convertWarningsToExceptions="true" forceCoversAnnotation="true" processIsolation="false" stopOnFailure="false" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd">
+<phpunit
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    backupGlobals="false"
+    backupStaticAttributes="false"
+    bootstrap="vendor/autoload.php"
+    colors="true"
+    convertErrorsToExceptions="true"
+    convertNoticesToExceptions="true"
+    convertWarningsToExceptions="true"
+    forceCoversAnnotation="true"
+    processIsolation="true"
+    stopOnFailure="false"
+    xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
+>
     <coverage processUncoveredFiles="true">
         <include>
             <directory suffix=".php">src</directory>

src/Clusterers/Seeders/PlusPlus.php

+7
@@ -6,6 +6,7 @@
 use Rubix\ML\Kernels\Distance\Distance;
 use Rubix\ML\Kernels\Distance\Euclidean;
 use Rubix\ML\Specifications\DatasetIsNotEmpty;
+use Rubix\ML\Exceptions\RuntimeException;
 
 use function count;
 
@@ -49,12 +50,18 @@ public function __construct(?Distance $kernel = null)
      *
      * @param Dataset $dataset
      * @param int $k
+     * @throws RuntimeException
      * @return list<list<string|int|float>>
      */
     public function seed(Dataset $dataset, int $k) : array
     {
         DatasetIsNotEmpty::with($dataset)->check();
 
+        if ($k > $dataset->numSamples()) {
+            throw new RuntimeException("Cannot seed $k clusters with only "
+                . $dataset->numSamples() . ' samples.');
+        }
+
         $centroids = $dataset->randomSubset(1)->samples();
 
         while (count($centroids) < $k) {
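The guard added to `seed()` above can be demonstrated in isolation. The sketch below is a hypothetical stand-in, not the Plus Plus seeder itself: `seedDistinct()` draws `$k` distinct samples uniformly at random and applies the same kind of check, refusing to run when more seeds are requested than there are samples. The extra lower-bound check on `$k` is an addition of this sketch.

```php
<?php

/**
 * Draw $k distinct samples at random to use as seeds.
 * Hypothetical stand-in for a seeder; uniform sampling is an assumption.
 *
 * @param list<list<int|float>> $samples
 * @return list<list<int|float>>
 */
function seedDistinct(array $samples, int $k) : array
{
    $n = count($samples);

    if ($k < 1) {
        throw new InvalidArgumentException('k must be at least 1.');
    }

    // Same guard as in the diff: more seeds than samples cannot be satisfied.
    if ($k > $n) {
        throw new RuntimeException("Cannot seed $k clusters with only $n samples.");
    }

    // array_rand() returns a single key when $k is 1, so cast to an array.
    $keys = (array) array_rand($samples, $k);

    return array_map(fn ($key) => $samples[$key], $keys);
}

try {
    seedDistinct([[1.0], [2.0]], 3);

    $message = null;
} catch (RuntimeException $e) {
    $message = $e->getMessage();
}
```

Failing fast with a descriptive exception lets the caller see the mismatch between `$k` and the dataset size before any seeding work starts.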

src/Datasets/Generators/Blob.php

+42
@@ -4,10 +4,14 @@
 
 use Tensor\Matrix;
 use Tensor\Vector;
+use Rubix\ML\DataType;
+use Rubix\ML\Helpers\Stats;
+use Rubix\ML\Datasets\Dataset;
 use Rubix\ML\Datasets\Unlabeled;
 use Rubix\ML\Exceptions\InvalidArgumentException;
 
 use function count;
+use function sqrt;
 
 /**
  * Blob
@@ -37,6 +41,34 @@ class Blob implements Generator
      */
     protected $stdDev;
 
+    /**
+     * Fit a Blob generator to the samples in a dataset.
+     *
+     * @param Dataset $dataset
+     * @throws InvalidArgumentException
+     * @return self
+     */
+    public static function simulate(Dataset $dataset) : self
+    {
+        $features = $dataset->featuresByType(DataType::continuous());
+
+        if (count($features) !== $dataset->numFeatures()) {
+            throw new InvalidArgumentException('Dataset must only contain'
+                . ' continuous features.');
+        }
+
+        $means = $stdDevs = [];
+
+        foreach ($features as $values) {
+            [$mean, $variance] = Stats::meanVar($values);
+
+            $means[] = $mean;
+            $stdDevs[] = sqrt($variance);
+        }
+
+        return new self($means, $stdDevs);
+    }
+
     /**
      * @param (int|float)[] $center
      * @param int|float|(int|float)[] $stdDev
@@ -74,6 +106,16 @@ public function __construct(array $center = [0, 0], $stdDev = 1.0)
         $this->stdDev = $stdDev;
     }
 
+    /**
+     * Return the center coordinates of the Blob.
+     *
+     * @return list<int|float>
+     */
+    public function center() : array
+    {
+        return $this->center->asArray();
+    }
+
     /**
      * Return the dimensionality of the data this generates.
      *

src/EstimatorWrapper.php

+20
@@ -0,0 +1,20 @@
+<?php
+
+namespace Rubix\ML;
+
+/**
+ * Wrapper
+ *
+ * @category Machine Learning
+ * @package Rubix/ML
+ * @author Ronan Giron
+ */
+interface EstimatorWrapper extends Estimator
+{
+    /**
+     * Return the base estimator instance.
+     *
+     * @return Estimator
+     */
+    public function base() : Estimator;
+}
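One way the `base()` accessor could be used is to unwrap nested wrappers to reach the innermost estimator. The sketch below is self-contained and uses simplified stand-ins: the `Estimator` interface here declares only a hypothetical `predict(array $samples)` method (the library's interface declares more), and `ConstantEstimator`, `CountingWrapper`, and `unwrap()` are hypothetical names introduced for illustration.

```php
<?php

// Simplified stand-in; the real Rubix\ML\Estimator interface declares more methods.
interface Estimator
{
    public function predict(array $samples) : array;
}

// Mirrors the shape of the interface added in the diff above.
interface EstimatorWrapper extends Estimator
{
    public function base() : Estimator;
}

// Hypothetical base estimator that predicts the same label for every sample.
class ConstantEstimator implements Estimator
{
    public function predict(array $samples) : array
    {
        return array_fill(0, count($samples), 'a');
    }
}

// Hypothetical wrapper that counts calls before delegating to its base.
class CountingWrapper implements EstimatorWrapper
{
    public int $calls = 0;

    private Estimator $base;

    public function __construct(Estimator $base)
    {
        $this->base = $base;
    }

    public function base() : Estimator
    {
        return $this->base;
    }

    public function predict(array $samples) : array
    {
        ++$this->calls;

        return $this->base->predict($samples);
    }
}

// Follow base() through any number of nested wrappers.
function unwrap(Estimator $estimator) : Estimator
{
    while ($estimator instanceof EstimatorWrapper) {
        $estimator = $estimator->base();
    }

    return $estimator;
}

$wrapped = new CountingWrapper(new CountingWrapper(new ConstantEstimator()));

$predictions = $wrapped->predict([[1.0], [2.0]]);

$innermost = unwrap($wrapped);
```

Because the wrapper interface extends the estimator interface, a wrapped model can be passed anywhere a plain estimator is expected, while `base()` keeps the underlying instance reachable.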
