Commit 713366a

Merge branch '1.x'
2 parents: 196cb40 + 54266a8

8 files changed (+5890 -11 lines)

.github/workflows/tests.yml (+86)

@@ -0,0 +1,86 @@
+name: tests
+
+on:
+  push:
+    branches:
+      - 1.x
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        php:
+          - '7.3'
+          - '7.4'
+          - '8.0'
+          - '8.1'
+        tika:
+          - '1.15'
+          - '1.16'
+          - '1.17'
+          - '1.18'
+          - '1.19'
+          - '1.19.1'
+          - '1.20'
+          - '1.21'
+          - '1.22'
+          - '1.23'
+          - '1.24'
+          - '1.24.1'
+          - '1.25'
+          - '1.26'
+          - '1.27'
+          - '1.28'
+          - '1.28.1'
+          - '2.0.0'
+          - '2.1.0'
+          - '2.2.0'
+          - '2.2.1'
+          - '2.3.0'
+
+    name: PHP ${{ matrix.php }} - TIKA ${{ matrix.tika }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+
+      - name: Cache Apache Tika binaries
+        uses: actions/cache@v1
+        with:
+          path: bin
+          key: binaries-apache-tika
+
+      - name: Cache PHP dependencies
+        uses: actions/cache@v1
+        with:
+          path: vendor
+          key: dependencies-php-${{ matrix.php }}-composer-${{ hashFiles('**/composer.lock') }}
+
+      - name: Setup environment
+        run: sudo apt-get -y install tesseract-ocr
+
+      - name: Setup Java
+        uses: actions/setup-java@v3
+        with:
+          distribution: 'temurin'
+          java-version: '8'
+
+      - name: Setup PHP
+        uses: shivammathur/setup-php@v2
+        with:
+          php-version: ${{ matrix.php }}
+          extensions: curl, dom, gd, json, libxml, mbstring, zip
+          coverage: none
+
+      - name: Install dependencies
+        run: composer install --no-ansi --no-interaction --no-scripts --no-progress --prefer-dist
+
+      - name: Download Apache Tika binaries
+        run: APACHE_TIKA_VERSION=${{ matrix.tika }} scripts/download.sh
+
+      - name: Start Apache Tika server
+        run: APACHE_TIKA_VERSION=${{ matrix.tika }} scripts/spawn.sh
+
+      - name: Execute tests
+        run: APACHE_TIKA_VERSION=${{ matrix.tika }} vendor/bin/phpunit --verbose
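The `strategy.matrix` block makes GitHub Actions run one job for every combination of the `php` and `tika` values, each titled through the `name:` template. As an illustration only, the Python sketch below reproduces that cartesian product to show how many jobs a push to `1.x` triggers; the version lists are copied from the workflow, and `expand_matrix` is a hypothetical helper, not part of GitHub Actions.

```python
from itertools import product

# Version lists copied from the workflow's strategy.matrix block.
PHP_VERSIONS = ["7.3", "7.4", "8.0", "8.1"]
TIKA_VERSIONS = [
    "1.15", "1.16", "1.17", "1.18", "1.19", "1.19.1", "1.20", "1.21",
    "1.22", "1.23", "1.24", "1.24.1", "1.25", "1.26", "1.27", "1.28",
    "1.28.1", "2.0.0", "2.1.0", "2.2.0", "2.2.1", "2.3.0",
]


def expand_matrix(php_versions, tika_versions):
    """Hypothetical helper: one job title per (php, tika) pair,
    formatted like the workflow's `name:` template."""
    return [
        f"PHP {php} - TIKA {tika}"
        for php, tika in product(php_versions, tika_versions)
    ]


jobs = expand_matrix(PHP_VERSIONS, TIKA_VERSIONS)
print(len(jobs))  # 88 (4 PHP versions x 22 Tika versions)
print(jobs[0])    # PHP 7.3 - TIKA 1.15
```

Because every combination is a separate job, each one repeats the setup steps; the two `actions/cache` steps let a job reuse `bin` and `vendor` when a cache entry with a matching key already exists.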

.gitignore (-1)

@@ -1,6 +1,5 @@
 .phpunit.result.cache
 
 bin
-composer.lock
 reports
 vendor
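Dropping `composer.lock` from `.gitignore` means the lock file is now committed, which matters for the workflow's PHP dependency cache: GitHub documents that `hashFiles()` returns an empty string when the pattern matches no files, so without a committed lock file the key `dependencies-php-<php>-composer-` would stay the same even when dependencies change. The Python sketch below approximates that key under stated assumptions: it hashes each matching file with SHA-256 and then hashes the per-file digests, following GitHub's documented description, but it is not guaranteed to produce byte-identical keys. `hash_files` and `cache_key` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(root, pattern):
    """Approximation of GitHub's hashFiles(): SHA-256 of each matching
    file, then SHA-256 over those digests. Returns "" if nothing matches."""
    paths = sorted(p for p in Path(root).glob(pattern) if p.is_file())
    if not paths:
        return ""
    combined = hashlib.sha256()
    for path in paths:
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return combined.hexdigest()


def cache_key(root, php_version):
    # Mirrors the workflow's key template:
    # dependencies-php-${{ matrix.php }}-composer-${{ hashFiles('**/composer.lock') }}
    return f"dependencies-php-{php_version}-composer-{hash_files(root, '**/composer.lock')}"


with tempfile.TemporaryDirectory() as tmp:
    print(cache_key(tmp, "8.1"))  # prints "dependencies-php-8.1-composer-" (no lock file yet)
    Path(tmp, "composer.lock").write_text('{"packages": []}')
    print(cache_key(tmp, "8.1"))  # key now ends with a 64-character hex digest
```

With the lock file tracked, any change to locked dependency versions changes the digest and therefore the cache key, so a stale `vendor` directory is not restored under the new key.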

README.md (+3 -1)

@@ -1,8 +1,9 @@
 [![Current release](https://img.shields.io/github/release/vaites/php-apache-tika.svg)](https://github.com/vaites/php-apache-tika/releases/latest)
 [![Package at Packagist](https://img.shields.io/packagist/dt/vaites/php-apache-tika.svg)](https://packagist.org/packages/vaites/php-apache-tika)
+[![Build status](https://img.shields.io/github/workflow/status/vaites/php-apache-tika/tests/1.x)](https://github.com/vaites/php-apache-tika/actions)
 [![Code coverage](https://img.shields.io/codecov/c/github/vaites/php-apache-tika.svg)](https://codecov.io/github/vaites/php-apache-tika)
 [![Code quality](https://img.shields.io/scrutinizer/quality/g/vaites/php-apache-tika.svg)](https://scrutinizer-ci.com/g/vaites/php-apache-tika/)
-[![Code insight](https://img.shields.io/sensiolabs/i/ec066502-0fde-4455-9fc3-8e9fe6867834.svg)](https://insight.sensiolabs.com/projects/ec066502-0fde-4455-9fc3-8e9fe6867834)
+[![Code insight](https://img.shields.io/sensiolabs/i/92852e11-8648-4d48-9698-653aee765df5.svg)](https://insight.sensiolabs.com/projects/92852e11-8648-4d48-9698-653aee765df5)
 [![License](https://img.shields.io/github/license/vaites/php-apache-tika.svg?color=%23999999)](https://github.com/vaites/php-apache-tika/blob/master/LICENSE)
 
 # PHP Apache Tika
@@ -303,6 +304,7 @@ There are a few samples to test against:
 
 There are some issues found during tests, not related with this library:
 
+* Apache Tika 1.17 and lower can't extract text from OCR as described in [TIKA-2509](https://issues.apache.org/jira/browse/TIKA-2509)
 * Tesseract slows down document parsing as described in [TIKA-2359](https://issues.apache.org/jira/browse/TIKA-2359)
 
 ## Integrations
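The new README note ties OCR text extraction to the Tika version, and the workflow matrix mixes versions such as `1.19.1`, `1.20` and `2.0.0`. Comparing those as plain strings gives wrong answers (`"1.9" > "1.17"` is true lexically), so a version gate needs to compare numeric components. A minimal Python sketch, assuming a test suite wants to skip OCR assertions for Tika 1.17 and lower; `parse_version` and `ocr_text_supported` are hypothetical helpers, not part of this library or its PHP test suite.

```python
def parse_version(version):
    """Turn a dotted version like "1.19.1" into a tuple of ints.

    Trailing zero components are dropped so that "1.17.0" compares
    equal to "1.17" (Python treats (1, 17, 0) as greater than (1, 17)).
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# Per the README note (TIKA-2509), OCR text extraction fails on 1.17 and lower.
LAST_VERSION_WITHOUT_OCR_TEXT = parse_version("1.17")


def ocr_text_supported(version):
    """Hypothetical gate: True only for Tika versions newer than 1.17."""
    return parse_version(version) > LAST_VERSION_WITHOUT_OCR_TEXT


for v in ["1.15", "1.17", "1.18", "1.19.1", "2.3.0"]:
    print(v, ocr_text_supported(v))
```

Tuple comparison handles components of different widths (`1.9` vs `1.17`) and different lengths (`1.19.1` vs `1.20`) correctly once each component is an integer.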
