John Snow Labs Library 4.2.0 Release
We are excited to announce the release of the John Snow Labs 4.2.0 library!
It introduces:
- New Enterprise Syntax for easy access to every feature of every JSL library.
- Highly configurable automatic installers with various authorization flows and installation targets: 1-click OAuth, 1-line Databricks installs, 1-line creation of a new enterprise-compatible venv, and extended offline support.
- Run a Python function, raw Python code snippet, Python script, or Python module on a Databricks cluster in one line of code, creating a cluster on the fly if missing.
- Smart license/jar/wheel caching: never type your license twice on the same machine when starting a SparkSession or re-installing licensed libs!
- Various safety mechanisms added and footguns removed, to reduce injuries :)
Introducing the new Enterprise Syntax for working with all of John Snow Labs' libraries.
It bundles every relevant function and class you might ever need when working with JSL libraries into one simple import line:
from johnsnowlabs import *
This single import gets you through all of the certification notebooks, with the exception of a few third-party libraries.
The following modules become available, with links to the existing products; see Usage & Overview for more details on the import structure:
- nlp.MyAnno() and nlp.my_function() for every one of Spark NLP's Python functions/classes/modules
- ocr.MyAnno() and ocr.my_function() for every one of Spark OCR's Python functions/classes/modules
- legal.MyAnno() and legal.my_function() for every one of Spark NLP for Legal's Python functions/classes/modules
- finance.MyAnno() and finance.my_function() for every one of Spark NLP for Finance's Python functions/classes/modules
- medical.MyAnno() and medical.my_function() for every one of Spark NLP for Medical's Python functions/classes/modules
- viz.MyVisualizer() for every one of Spark NLP-Display's classes
- jsl.load() and jsl.viz() from NLU
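For example, after the wildcard import you can mix these modules directly. A minimal sketch, assuming jsl.install() has already been run on this machine so jsl.start() can pick up the cached license; the column names and example text are illustrative:
from johnsnowlabs import *
# Start a session with the locally cached license
spark = jsl.start()
# Classes exposed via the nlp module are Spark NLP's own annotators
document_assembler = nlp.DocumentAssembler().setInputCol('text').setOutputCol('document')
tokenizer = nlp.Tokenizer().setInputCols(['document']).setOutputCol('token')
pipeline = nlp.Pipeline(stages=[document_assembler, tokenizer])
df = spark.createDataFrame([['Hello from John Snow Labs!']], ['text'])
pipeline.fit(df).transform(df).show()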
New Powerful Installation and Spark Session Start
The John Snow Labs library aims to make installing licensed libraries and starting a SparkSession as easy as possible.
Installation Docs & Launch a Spark Session Docs
jsl.install()
- Authorization Flows (prove you have a license; see the example after these lists):
- Auto-detect environment variables
- Auto-detect license files in the current working directory
- Auto-detect cached license information stored in ~/.johnsnowlabs from previous runs
- Auto-inject local browser-based OAuth
- Auto-inject Colab-button-based OAuth
- Manual variable definition
- Manual JSON path
- Access token
- Installation Targets (Where to install to?):
- The currently running Python process
- A Python environment other than the currently running process
- A provided venv
- A venv freshly created by the John Snow Labs library
- Airgap installs, by creating an easily copy-pastable zip file with all jars/wheels/licenses to run in an air-gapped environment
- Databricks
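A minimal sketch of common jsl.install() invocations. The keyword arguments shown are the ones used in the Databricks example later in these notes (json_license_path, databricks_host, databricks_token); the parameter names for other flows may differ:
from johnsnowlabs import *
# Auto-detect a license from env vars, the working directory,
# ~/.johnsnowlabs, or fall back to a browser/Colab OAuth flow
jsl.install()
# Point to a license JSON file explicitly
jsl.install(json_license_path='path/to/my/license.json')
# Install into a Databricks cluster, creating one if missing
cluster_id = jsl.install(json_license_path='path/to/my/license.json',
                         databricks_host='https://your-workspace.cloud.databricks.com',
                         databricks_token='<your-databricks-token>')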
jsl.start()
After having run jsl.install(), you can simply run jsl.start(): it remembers the license that was used to install and already has all jars pre-downloaded.
Additionally, it prints very helpful logs when launching a session, telling you which jars were loaded and their versions.
You can even load a new license during jsl.start(), which supports all of the previously mentioned authorization flows.
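A minimal sketch, assuming jsl.install() has been run on this machine before:
from johnsnowlabs import *
# Re-uses the license cached by a previous jsl.install() run
# and the pre-downloaded jars
spark = jsl.start()
# To load a new license instead, jsl.start() supports the same
# authorization flows; assuming it accepts the same json_license_path
# parameter as jsl.install() (an assumption):
# spark = jsl.start(json_license_path='path/to/new/license.json')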
License Management
List all of your usable JSL licenses with jsl.list_remote_licenses(),
and your locally cached licenses with jsl.list_local_licenses().
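Both can be called right after the wildcard import; for example:
from johnsnowlabs import *
# Licenses attached to your John Snow Labs account
print(jsl.list_remote_licenses())
# Licenses cached in ~/.johnsnowlabs on this machine
print(jsl.list_local_licenses())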
Databricks Utils
Easily submit any task to a Databricks cluster in various formats; see the Utils for Databricks docs.
Run a raw Python code string in a cluster, and also create one on the fly.
from johnsnowlabs import *
script = """
import nlu
print(nlu.load('sentiment').predict('That was easy!'))"""
cluster_id = jsl.install(json_license_path=my_license, databricks_host=my_host, databricks_token=my_token)
jsl.run_in_databricks(script,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Python Code String Example')
Run a Python Function in a Cluster.
def my_function():
    import nlu
    medical_text = """A 28-year-old female with a history of gestational
    diabetes presented with a one-week history of polyuria ,
    polydipsia , poor appetite , and vomiting ."""
    df = nlu.load('en.med_ner.diseases').predict(medical_text)
    for c in df.columns:
        print(df[c])
# my_function will run on databricks
jsl.run_in_databricks(my_function,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Function test')
Run a Python Script in a Cluster.
jsl.run_in_databricks('path/to/my/script.py',
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Script test ')
Run a Python Module in a Cluster
import johnsnowlabs.auto_install.health_checks.nlp_test as nlp_test
jsl.run_in_databricks(nlp_test,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='nlp_test')
Testing Utils
You can use the John Snow Labs library to automatically test 10,000+ models and 100+ notebooks in one line of code, on a machine as small as a single Google Colab instance, and generate very handy error reports of potentially broken models, notebooks, or Models Hub markdown snippets.
Automatically test notebooks and Models Hub markdown via URL, file path, and many more options!
Workshop Notebook Testing Utils
See Utils for Testing Notebooks docs
from johnsnowlabs.utils.notebooks import test_ipynb
# Test a local notebook file
test_ipynb('path/to/local/notebook.ipynb')
# Test a notebook via URL
test_ipynb('https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb')
# Test a folder of notebooks and generate a report file, which captures all stderr/stdout
test_ipynb('my/notebook/folder')
# Test an array of URLs/paths to notebook files
test_ipynb(['https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb', 'path/to/local/notebook.ipynb'])
# Run ALL notebooks in the Certification Folder
test_result = test_ipynb('WORKSHOP')
# Only run Finance notebooks
test_result = test_ipynb('WORKSHOP-FIN')
# Only run Legal notebooks
test_result = test_ipynb('WORKSHOP-LEG')
# Only run Medical notebooks
test_result = test_ipynb('WORKSHOP-MED')
# only run Open Source notebooks
test_result = test_ipynb('WORKSHOP-OS')
Modelshub Testing Utils
See Utils for Testing Models & Modelshub Markdown Snippets Docs
from johnsnowlabs.utils.modelhub_markdown import test_markdown
# Test a Local Markdown file with a Python Snippet
test_markdown('path/to/my/file.md')
# Test a Modelshub Python Markdown Snippet via URL
test_markdown('https://nlp.johnsnowlabs.com/2022/08/31/legpipe_deid_en.html')
# Test a folder of Markdown Snippets and generate a Report file, which captures all stderr/stdout
test_markdown('my/markdown/folder')
# Test an array of URLs/paths to markdown files
test_markdown(['legpipe_deid_en.html', 'path/to/local/markdown_snippet.md'])