Spark OCR #54

Open
asismohanty81 opened this issue Sep 29, 2021 · 6 comments

@asismohanty81

This is about an error we get when invoking the table-detection model from Spark OCR. It looks like a known error, but I couldn't find a concrete solution in the existing issues.
It probably has to do with version compatibility. I tried Spark OCR 3.8 as suggested but got the same error. Could you advise further?

binary_to_image.setImageType(ImageType.TYPE_3BYTE_BGR)
table_detector = ImageTableDetector.pretrained("general_model_table_detection_v2", "en", "clinical/ocr").setInputCol("image").setOutputCol("region")

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize.
: java.lang.NoClassDefFoundError: Could not initialize class com.johnsnowlabs.ocr.OcrPythonResourceDownloader$
at com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize(OcrPythonResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

Further details are in the attached document:
Model_loading_issue.docx

@xyutech
Contributor

xyutech commented Sep 29, 2021

Hello,
Could you share which version of spark-nlp you are using?

@asismohanty81
Author

asismohanty81 commented Sep 29, 2021 via email

@xyutech
Contributor

xyutech commented Sep 29, 2021

Hello Asis,

Thank you for the information.
Could you make sure your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are valid?
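A cheap first step before calling .pretrained() is to confirm that both variables are actually set and not blank (the second report in this thread passes empty strings and gets a 403 Access Denied from S3). This is a minimal sketch, assuming the credentials are supplied as environment variables with exactly these names; the helper name missing_aws_credentials is hypothetical, and it only checks presence, not whether S3 accepts the keys.

```python
import os

# Assumption: the downloader picks up credentials from these environment
# variable names. A non-empty value is necessary, not sufficient, for access.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the required AWS variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Usage with an explicit mapping; call it with no argument to check os.environ.
print(missing_aws_credentials({"AWS_ACCESS_KEY_ID": "", "AWS_SECRET_ACCESS_KEY": "abc"}))
# → ['AWS_ACCESS_KEY_ID']
```

An empty list means both variables are present; the keys themselves can still be rejected by S3.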

@xyutech
Contributor

xyutech commented Sep 29, 2021

It would also help to see the output of the spark invocation, something like:

Spark version: 3.0.2
Spark NLP version: 3.0.1
Spark OCR version: 3.7.0
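If that output is not at hand, the Python-side package versions can be listed with the standard library alone. This is a sketch, assuming the packages are installed under the distribution names pyspark, spark-nlp and spark-ocr (the Spark OCR wheel may use a different name in your environment); installed_versions is a hypothetical helper. These are Python package versions only; the JARs the JVM actually loads can differ, so the running Spark version is still best read from the session itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust if your wheels are published differently.
PACKAGES = ("pyspark", "spark-nlp", "spark-ocr")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```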

@jigsawcoder

jigsawcoder commented Oct 13, 2021

I am facing a similar issue while running the example below in Google Colab:
https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/master/jupyter/SparkOcrImageTableRecognitionPdf.ipynb

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 3G6894YVGPKFHYRC; S3 Extended Request ID: cxQQE9B6i8HgmWAlJ72zulORmmV9ACK71mMXticDwDEoVHXgV/VU0yAMlsi/hvWTMqBXmxi2tXI=), S3 Extended Request ID: cxQQE9B6i8HgmWAlJ72zulORmmV9ACK71mMXticDwDEoVHXgV/VU0yAMlsi/hvWTMqBXmxi2tXI=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1467)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1326)
at com.johnsnowlabs.client.AWSGateway.getMetadata(AWSGateway.scala:112)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:68)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:145)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:378)
at com.johnsnowlabs.ocr.OcrPythonResourceDownloader$.getDownloadSize(OcrPythonResourceDownloader.scala:23)
at com.johnsnowlabs.ocr.OcrPythonResourceDownloader.getDownloadSize(OcrPythonResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

This happens when calling ImageTableDetector.pretrained("general_model_table_detection_v2", "en", "clinical/ocr").

I am currently using the 30-day trial version, and the AWS access key and secret key are 'Null',
so I am passing a blank string:
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

Version details:

Spark version: 2.4.7
Spark OCR version: 3.8.0

SparkSession - in-memory
SparkContext
Version: v2.4.7
Master: local[*]
AppName: Spark OCR

@mykolamelnykml
Contributor

Hello @jigsawcoder. Did you receive AWS credentials in the email with your license key? If not, please contact customer support or ask on the public Slack.
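Once the keys from the license email are available, one way to get them into the environment is to keep them in a local JSON file and export them from Python. This is a minimal sketch, assuming a file whose top-level fields are named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the file layout and the helper name export_aws_keys are hypothetical. Environment variables set via os.environ are only inherited by processes started afterwards, so this would need to run before the Spark session (and its JVM) is launched.

```python
import json
import os

# Hypothetical file layout: top-level JSON fields with these exact names.
AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def export_aws_keys(license_path, env=None):
    """Copy the AWS key pair from a JSON file into env (default: os.environ).

    Raises ValueError if either key is missing, null, or blank, which is the
    situation that leads to the S3 403 above.
    """
    env = os.environ if env is None else env
    with open(license_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # JSON null becomes None; treat it the same as an empty string.
    missing = [k for k in AWS_KEYS if not str(data.get(k) or "").strip()]
    if missing:
        raise ValueError(f"No non-empty value for: {', '.join(missing)}")
    for k in AWS_KEYS:
        env[k] = str(data[k]).strip()
    return env
```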
