tballison/tika-detector-stormcrawler

 
 

tika-detector-stormcrawler

Wraps the charset detection logic from StormCrawler as a Tika module

The detector has two configuration parameters (defaults in parentheses):

  • fastMethod (false)
  • maxLength (0, meaning unlimited)

The detector must be registered in tika-config.xml:

<encodingDetectors> 
  <encodingDetector class="com.digitalpebble.tika.detect.SCCharsetDetector"/> 
</encodingDetectors>
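
To set the two parameters, a fuller tika-config.xml might look like the sketch below. It assumes the detector picks up fastMethod and maxLength through Tika's standard <params> mechanism, which this README does not confirm; the values true and 10000 are purely illustrative, and the unit of maxLength is not stated here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <encodingDetectors>
    <encodingDetector class="com.digitalpebble.tika.detect.SCCharsetDetector">
      <!-- Assumption: parameters are injected via Tika's <params> block -->
      <params>
        <!-- Illustrative value; the default is false -->
        <param name="fastMethod" type="bool">true</param>
        <!-- Illustrative value; the default 0 means unlimited -->
        <param name="maxLength" type="int">10000</param>
      </params>
    </encodingDetector>
  </encodingDetectors>
</properties>
```

Omitting the <params> block leaves both parameters at their defaults, which matches the minimal snippet above.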
