Commit 18bd11a

Merge pull request #33 from relthyg/set_fetcher_name
Add option to set `fetcherName` for Tika >= 2.0.0
2 parents d0db71f + d760b8d commit 18bd11a

2 files changed: 34 additions, 1 deletion

README.md

Lines changed: 11 additions & 0 deletions
@@ -122,6 +122,11 @@ You can use an URL instead of a file path and the library will download the file
 **no need** to add `-enableUnsecureFeatures -enableFileUrl` to command line when starting the server, as described
 [here](https://wiki.apache.org/tika/TikaJAXRS#Specifying_a_URL_Instead_of_Putting_Bytes).
 
+If you use Apache Tika >= 2.0.0, you *can* [define an HttpFetcher](https://cwiki.apache.org/confluence/display/TIKA/tika-pipes)
+and use the option `-enableUnsecureFeatures -enableFileUrl` when starting the server to make the server download remote
+files when passing a URL instead of a filename to `$client->getText()`. In order to do so, you must set the name of
+the HttpFetcher using `$client->setFetcherName('yourFetcherName')`.
+
 ### Methods
 
 Here is the full list of available methods
@@ -254,6 +259,12 @@ $client->setOCRLanguages($languages);
 $client->getOCRLanguages();
 ```
 
+Set HTTP fetcher name (for Tika >= 2.0.0 only, see https://cwiki.apache.org/confluence/display/TIKA/tika-pipes)
+
+```php
+$client->setFetcherName($fetcherName);
+```
+
 ### Breaking changes
 
 Since 1.0 version there are some breaking changes:

src/Clients/WebClient.php

Lines changed: 23 additions & 1 deletion
@@ -51,6 +51,13 @@ class WebClient extends Client
      */
     protected $retries = 3;
 
+    /**
+     * Name of the fetcher to be used (for Tika >= 2.0.0 only)
+     *
+     * @var string|null
+     */
+    protected $fetcherName = null;
+
     /**
      * Default cURL options
      *
@@ -208,6 +215,16 @@ public function setRetries(int $retries): self
         return $this;
     }
 
+    /**
+     * Set the name of the fetcher to be used (for Tika >= 2.0.0 only)
+     */
+    public function setFetcherName(string $fetcherName): self
+    {
+        $this->fetcherName = $fetcherName;
+
+        return $this;
+    }
+
     /**
      * Get all the options
      */
@@ -626,7 +643,12 @@ protected function getParameters(string $type, string $file = null): array
 
         if(!empty($file) && preg_match('/^http/', $file))
         {
-            $headers[] = "fileUrl:$file";
+            if($this->fetcherName) {
+                $headers[] = "fetcherName:$this->fetcherName";
+                $headers[] = "fetchKey:$file";
+            } else {
+                $headers[] = "fileUrl:$file";
+            }
         }
 
         switch($type)
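
The header selection introduced in the last hunk can be tried in isolation. The following is a minimal sketch, not part of the commit: `remoteFileHeaders()` is a hypothetical stand-alone function that copies the branch from `getParameters()`, and the fetcher name `http` and the example URL are assumptions chosen for illustration. The fetcher name has to match whatever name is configured on the Tika server.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not in the library) that mirrors the branch
 * added to WebClient::getParameters() in this commit.
 *
 * @return string[] request headers describing the remote file
 */
function remoteFileHeaders(?string $file, ?string $fetcherName = null): array
{
    $headers = [];

    // Same check as the library: any value starting with "http" is treated as a URL
    if (!empty($file) && preg_match('/^http/', $file)) {
        if ($fetcherName) {
            // Tika >= 2.0.0 path: name the configured fetcher and pass the URL as its key
            $headers[] = "fetcherName:$fetcherName";
            $headers[] = "fetchKey:$file";
        } else {
            // Previous behavior: a single fileUrl header
            $headers[] = "fileUrl:$file";
        }
    }

    return $headers;
}

// Without a fetcher name the old header is produced
echo implode(', ', remoteFileHeaders('https://example.com/sample.pdf')), PHP_EOL;

// With a fetcher name ("http" is an assumed name) two headers are produced
echo implode(', ', remoteFileHeaders('https://example.com/sample.pdf', 'http')), PHP_EOL;
```

Because the check is a plain truthiness test, an empty string passed as the fetcher name falls back to the `fileUrl` header, and local file paths produce no header at all regardless of the fetcher name.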
