Commit 54266a8

Skipped some tests for Apache Tika 1.17 and lower due to problems with Tesseract (see TIKA-2509)
1 parent a5d7cf2 commit 54266a8

3 files changed: 11 additions, 3 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ jobs:
       - name: Setup environment
         run: sudo apt-get -y install tesseract-ocr
 
-      - name: Setup environment
+      - name: Setup Java
         uses: actions/setup-java@v3
         with:
           distribution: 'temurin'

README.md

Lines changed: 1 addition & 0 deletions
@@ -304,6 +304,7 @@ There are a few samples to test against:
 
 There are some issues found during tests, not related with this library:
 
+* Apache Tika 1.17 and lower can't extract text from OCR as described in [TIKA-2509](https://issues.apache.org/jira/browse/TIKA-2509)
 * Tesseract slows down document parsing as described in [TIKA-2359](https://issues.apache.org/jira/browse/TIKA-2359)
 
 ## Integrations

tests/BaseTest.php

Lines changed: 9 additions & 2 deletions
@@ -267,9 +267,16 @@ public function testImageMetadataHeight(string $file): void
      */
     public function testImageOCR(string $file): void
     {
-        $text = self::$client->getText($file);
+        if(version_compare(self::$version, '1.18') >= 0)
+        {
+            $text = self::$client->getText($file);
 
-        $this->assertMatchesRegularExpression('/voluptate/i', $text);
+            $this->assertMatchesRegularExpression('/voluptate/i', $text);
+        }
+        else
+        {
+            $this->markTestSkipped('Apache Tika 1.17 and lower can\'t find Tesseract binaries');
+        }
     }
 
     /**
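
Note on the version gate in `testImageOCR()`: the check relies on `version_compare()` comparing version segments numerically, not as plain strings. The sketch below is illustrative only and is not part of this commit; `supportsOcr()` is a hypothetical helper name, and the version strings are sample inputs chosen to show the behaviour.

```php
<?php
// Hypothetical helper (not in the commit): mirrors the gate used in testImageOCR().
function supportsOcr(string $tikaVersion): bool
{
    // version_compare() returns -1, 0 or 1 when called with two arguments;
    // a result >= 0 means $tikaVersion is 1.18 or newer.
    return version_compare($tikaVersion, '1.18') >= 0;
}

// Sample inputs only. '1.7' is older than '1.18' because the segments
// 7 and 18 are compared as numbers, not character by character.
foreach (['1.7', '1.17', '1.18', '1.19.1', '2.9.0'] as $version) {
    echo $version, ' => ', supportsOcr($version) ? 'run OCR test' : 'skip OCR test', PHP_EOL;
}
```

A plain string comparison would rank `'1.7'` above `'1.18'`, which is presumably why the test uses `version_compare()` for the gate.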
