It would be nice to be able to generate hOCR output when running OCR via Tesseract. I have a patch and will send a pull request.
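
For context, here is a rough sketch of what hOCR generation can look like when shelling out to the `tesseract` command-line tool. This is not the patch itself. The helper names `build_hocr_command` and `run_hocr` are hypothetical, and the sketch assumes `tesseract` is on the `PATH`. It relies on the stock `hocr` config file that ships with Tesseract. The output extension is assumed to be `.hocr`, with a fallback to `.html`, which older releases used.

```python
import subprocess
from pathlib import Path


def build_hocr_command(image_path, output_base, lang=None):
    """Build the tesseract argument list for hOCR output (hypothetical helper)."""
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        # -l selects the recognition language, e.g. "eng"
        cmd += ["-l", lang]
    # "hocr" is a config file name; it goes after the options
    cmd.append("hocr")
    return cmd


def run_hocr(image_path, output_base, lang=None):
    """Run tesseract and return the hOCR markup as a string (hypothetical helper)."""
    subprocess.run(build_hocr_command(image_path, output_base, lang), check=True)
    # Assumption: newer releases write <output_base>.hocr, older ones <output_base>.html
    for ext in (".hocr", ".html"):
        candidate = Path(str(output_base) + ext)
        if candidate.exists():
            return candidate.read_text(encoding="utf-8")
    raise FileNotFoundError(f"no hOCR output found for base {output_base!r}")
```

Calling `run_hocr("page.png", "page")` would then return the hOCR markup for `page.png`, assuming the file exists and Tesseract can read it.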