
Commit 0d43151

Added method documentation to README.md
1 parent 50d52d7 commit 0d43151


1 file changed (+61, -16 lines)

README.md (+61 -16)
@@ -5,8 +5,7 @@
 [![Code insight](https://img.shields.io/sensiolabs/i/ec066502-0fde-4455-9fc3-8e9fe6867834.svg)](https://insight.sensiolabs.com/projects/ec066502-0fde-4455-9fc3-8e9fe6867834)
 [![License](https://img.shields.io/github/license/vaites/php-apache-tika.svg)](https://github.com/vaites/php-apache-tika/blob/master/LICENSE)

-PHP Apache Tika
-===============
+# PHP Apache Tika

 This tool provides [Apache Tika](https://tika.apache.org) bindings for PHP, allowing you to extract text and metadata
 from documents, images and other formats.
@@ -21,8 +20,7 @@ Although the library contains a list of supported versions, any version of Apach
 backward compatibility is maintained by the Tika team. Therefore, it is not necessary to wait for an update of the library
 to work with new versions of the tool.

-Features
---------
+## Features

 * Simple class interface to Apache Tika features:
 * Text and HTML extraction
@@ -34,8 +32,7 @@ Features
 * Compatible with Apache Tika 1.7 or greater
 * Tested up to 1.18

-Requirements
--------------
+## Requirements

 * PHP 5.4 or greater
 * [Multibyte String support](http://php.net/manual/en/book.mbstring.php)
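The compatibility notes above ("Apache Tika 1.7 or greater", "Tested up to 1.18") can be turned into a version check. The sketch below uses a hypothetical helper, `checkTikaVersion()`, which is not part of this library. It shows how PHP's built-in `version_compare()` compares segment by segment, so `1.10` correctly ranks above `1.9`. A plain string comparison would get that wrong.

```php
<?php
// Hypothetical helper, NOT part of vaites/php-apache-tika: classifies a Tika
// version string against the bounds stated in this README.
const TIKA_MIN_VERSION = '1.7';   // "Compatible with Apache Tika 1.7 or greater"
const TIKA_MAX_TESTED = '1.18';   // "Tested up to 1.18"

function checkTikaVersion($version)
{
    // version_compare() compares numeric segments, so '1.10' > '1.9'
    if (version_compare($version, TIKA_MIN_VERSION, '<')) {
        return 'unsupported';
    }
    if (version_compare($version, TIKA_MAX_TESTED, '>')) {
        return 'untested';
    }
    return 'supported';
}

echo checkTikaVersion('1.10'), PHP_EOL; // supported
```

Versions newer than 1.18 are reported as "untested" rather than rejected. This matches the README's point that newer Tika releases usually work without a library update.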
@@ -46,8 +43,7 @@ Requirements
 * Java 7 for Tika 1.10 or greater
 * [Tesseract](https://github.com/tesseract-ocr/tesseract) (optional, for OCR)

-Installation
--------------
+## Installation

 Install using Composer:

@@ -61,8 +57,7 @@ If you want to use OCR you must install [Tesseract](https://github.com/tesseract

 The library assumes the `tesseract` binary is in your PATH, so you can compile it yourself or install it using any other method.

-Usage
------
+## Usage

 Start the Apache Tika server with [caution](http://www.openwall.com/lists/oss-security/2015/08/13/5):

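The library expects the `tesseract` binary to be on the PATH, so it may help to confirm that before relying on OCR. Below is a minimal sketch built around a hypothetical `findInPath()` helper, which is not part of this library. It walks the `PATH` directories and returns the first executable match. On Windows the name would also need its `.exe` extension.

```php
<?php
// Hypothetical helper (not part of the library): returns the full path of an
// executable found in one of the PATH directories, or null if none is found.
// Note: on Windows the binary name would need its .exe extension.
function findInPath($binary)
{
    $dirs = explode(PATH_SEPARATOR, (string) getenv('PATH'));
    foreach ($dirs as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

$tesseract = findInPath('tesseract');
echo $tesseract === null ? "tesseract not found in PATH\n" : "found: $tesseract\n";
```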
@@ -95,9 +90,61 @@ Or use to extract text from images:
 You can use a URL instead of a file path and the library will download the file and pass it to Apache Tika. There's
 **no need** to add `-enableUnsecureFeatures -enableFileUrl` to the command line when starting the server, as described
 [here](https://wiki.apache.org/tika/TikaJAXRS#Specifying_a_URL_Instead_of_Putting_Bytes).
+
+### Methods
+
+Tika-related methods:
+
+    $client->getMetadata($file);
+    $client->getLanguage($file);
+    $client->getMIME($file);
+    $client->getHTML($file);
+    $client->getText($file);
+    $client->getMainText($file);
+
+Get the version of the current Tika app/server:
+
+    $client->getVersion();
+
+Get the full list of supported Apache Tika versions:
+
+    $client->getSupportedVersions();
+
+Set/get a callback for sequential reading of the response:
+
+    $client->setCallback($callback);
+    $client->getCallback();
+
+Set/get the chunk size for sequential reading:
+
+    $client->setChunkSize($size);
+    $client->getChunkSize();
+
+Set/get the JAR/Java paths (CLI mode only):
+
+    $client->setPath($path);
+    $client->getPath();
+
+    $client->setJava($java);
+    $client->getJava();
+
+Set/get host properties (server mode only):
+
+    $client->setHost($host);
+    $client->getHost();
+
+    $client->setPort($port);
+    $client->getPort();
+
+    $client->setRetries($retries);
+    $client->getRetries();
+
+Set/get [cURL client options](http://php.net/manual/en/function.curl-setopt.php) (server mode only):
+
+    $client->setOptions($options);
+    $client->getOptions();

-Tests
------
+## Tests

 Tests are designed to **cover all features for all supported versions** of Apache Tika in app mode and server mode.
 There are a few samples to test against:
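The note about passing a URL instead of a file path implies the client must tell the two apart before deciding whether to download. One plausible approach uses PHP's `parse_url()`, sketched here with a hypothetical `isRemoteUrl()` function. The library's actual check may differ.

```php
<?php
// Hypothetical helper (not the library's code): decides whether an argument
// such as the $file passed to getText() is an http(s) URL to download first
// or a local file path to read directly.
function isRemoteUrl($input)
{
    // parse_url() returns null when there is no scheme (e.g. '/var/data/x.pdf')
    $scheme = parse_url($input, PHP_URL_SCHEME);
    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

var_dump(isRemoteUrl('https://example.com/sample.pdf')); // bool(true)
var_dump(isRemoteUrl('/var/data/sample.pdf'));           // bool(false)
```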
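`setCallback()` and `setChunkSize()` control sequential reading of the response. The sketch below is not the library's implementation. It illustrates the idea with a hypothetical `readInChunks()` function that reads a stream in fixed-size chunks and passes each one to a callback. Large outputs therefore never need to be held in memory in one piece.

```php
<?php
// Minimal sketch (NOT the library's implementation) of a "sequential read":
// the stream is consumed in fixed-size chunks and each chunk is handed to a
// callback instead of being accumulated into one big string.
function readInChunks($stream, $chunkSize, callable $callback)
{
    $total = 0;
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error or nothing left
        }
        $callback($chunk);
        $total += strlen($chunk);
    }
    return $total; // number of bytes passed to the callback
}

// Usage: an in-memory stream stands in for a Tika response body.
$stream = fopen('php://memory', 'r+');
fwrite($stream, str_repeat('abc', 10)); // 30 bytes
rewind($stream);

$chunks = [];
$bytes = readInChunks($stream, 8, function ($chunk) use (&$chunks) {
    $chunks[] = $chunk;
});
fclose($stream);

echo $bytes, ' bytes in ', count($chunks), " chunks\n"; // 30 bytes in 4 chunks
```

A smaller chunk size means more callback invocations but lower peak memory per read. That trade-off is what a setting like `setChunkSize()` exposes.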
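In CLI mode the client runs the tika-app JAR through Java, which is why `setPath()` and `setJava()` exist. The following is a hypothetical sketch of how such a command line could be assembled. The helper `buildTikaCommand()` and any mapping from client methods to flags are assumptions, not the library's code. `--text`, `--html`, `--metadata`, `--language`, `--detect` and `--text-main` are real tika-app options. `escapeshellarg()` keeps paths containing spaces intact.

```php
<?php
// Hypothetical sketch (not the library's code): builds a tika-app command from
// the Java binary (cf. setJava), the JAR path (cf. setPath), a tika-app output
// flag and the input file. Every variable part is shell-escaped.
function buildTikaCommand($java, $jar, $flag, $file)
{
    return implode(' ', [
        escapeshellarg($java),
        '-jar',
        escapeshellarg($jar),
        $flag,                  // e.g. --text, --html, --metadata, --language
        escapeshellarg($file),
    ]);
}

echo buildTikaCommand('java', '/opt/tika/tika-app-1.18.jar', '--text', '/tmp/report.pdf'), PHP_EOL;
// On Unix-like systems: 'java' -jar '/opt/tika/tika-app-1.18.jar' --text '/tmp/report.pdf'
```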
@@ -108,16 +155,14 @@ There are a few samples to test against:
 * **sample4**: unsupported media
 * **sample5**: huge text for callbacks

-Issues
--------------
+## Issues

 Some issues were found during testing that are not related to this library:

 * Version 1.9 running on Java 7 in server mode throws random 500 errors (*Unexpected RuntimeException*)
 * Version 1.14 in server mode throws random errors (*Expected ';', got ','*) when parsing image metadata
 * Tesseract slows down document parsing as described in [TIKA-2359](https://issues.apache.org/jira/browse/TIKA-2359)

-Integrations
-------
+## Integrations

 - [Symfony2 Bundle](https://github.com/welcoMattic/ApacheTikaBundle)
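A retry count, as set with `setRetries()` in server mode, is one way to cope with intermittent server errors like those listed under Issues. Below is a minimal sketch of retry semantics. It assumes a hypothetical `withRetries()` wrapper and a fake request closure and is not how the library implements retries. Treating every `RuntimeException` as retryable is also an assumption.

```php
<?php
// Hypothetical sketch (not the library's code): runs $request once, and on a
// RuntimeException tries again up to $retries more times before rethrowing.
function withRetries(callable $request, $retries)
{
    $attempt = 0;
    while (true) {
        try {
            return $request();
        } catch (RuntimeException $e) {
            if ($attempt >= $retries) {
                throw $e; // out of retries: surface the last error
            }
            $attempt++;
        }
    }
}

// Usage: a fake request that fails twice (standing in for a random HTTP 500)
// and then succeeds.
$calls = 0;
$result = withRetries(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('HTTP 500');
    }
    return 'extracted text';
}, 3);

echo $result, ' after ', $calls, " calls\n"; // extracted text after 3 calls
```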
