John Snow Labs Library 4.2.0 Release
We are excited to announce the release of the John Snow Labs 4.2.0 library!
It introduces:
- New Enterprise Syntax for easy access to every feature of every JSL library.
- Highly configurable automatic installers with various authorization flows and installation targets: 1-click OAuth, 1-line Databricks installs, 1-line creation of a new enterprise-compatible venv, and extended offline support.
- Run a Python function, raw Python code snippet, Python script, or Python module on a Databricks cluster in one line of code, creating a cluster on the fly if missing.
- Smart license/jar/wheel caching: never type your license twice on the same machine when starting a SparkSession or re-installing licensed libs!
- Various safety mechanisms added and footguns removed, to reduce injuries :)
Introducing the new Enterprise Syntax for working with all of John Snow Labs' libraries.
It bundles every relevant function and class you might ever need when working with JSL libraries into one simple import line:
from johnsnowlabs import *
This single import gets you through all of the certification notebooks, with the exception of a few third-party libraries.
The following modules become available, with links to the existing products; see Usage & Overview for more details on the import structure:
- nlp.MyAnno() and nlp.my_function() for every one of Spark NLP's Python functions/classes/modules
- ocr.MyAnno() and ocr.my_function() for every one of Spark OCR's Python functions/classes/modules
- legal.MyAnno() and legal.my_function() for every one of Spark NLP for Legal's Python functions/classes/modules
- finance.MyAnno() and finance.my_function() for every one of Spark NLP for Finance's Python functions/classes/modules
- medical.MyAnno() and medical.my_function() for every one of Spark NLP for Medical's Python functions/classes/modules
- viz.MyVisualizer() for every one of Spark NLP-Display's classes
- jsl.load() and jsl.viz() from NLU
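For example, after the wildcard import you can mix these modules directly. A minimal sketch, assuming jsl.install() has already been run on this machine so jsl.start() can pick up the cached license; the column names and example text are illustrative:
from johnsnowlabs import *
# Start a session with the locally cached license
spark = jsl.start()
# Classes exposed via the nlp module are Spark NLP's own annotators
document_assembler = nlp.DocumentAssembler().setInputCol('text').setOutputCol('document')
tokenizer = nlp.Tokenizer().setInputCols(['document']).setOutputCol('token')
pipeline = nlp.Pipeline(stages=[document_assembler, tokenizer])
df = spark.createDataFrame([['Hello from John Snow Labs!']], ['text'])
pipeline.fit(df).transform(df).show()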
New Powerful Installation and Spark Session Start
The John Snow Labs library aims to make installing licensed libraries and starting a SparkSession as easy as possible.
Installation Docs & Launch a Spark Session Docs
jsl.install()
- Authorization Flows (prove you have a license; see the example after these lists):
- Auto-detect environment variables
- Auto-detect license files in the current working directory
- Auto-detect cached license information stored in ~/.johnsnowlabs from previous runs
- Auto-inject local browser-based OAuth
- Auto-inject Colab-button-based OAuth
- Manual variable definition
- Manual JSON path
- Access token
- Installation Targets (Where to install to?):
- The currently running Python process
- A Python environment other than the currently running process
- A provided venv
- A venv freshly created by the John Snow Labs library
- Airgap installs, by creating an easily copy-pastable zip file with all jars/wheels/licenses to run in an air-gapped environment
- Databricks
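A minimal sketch of common jsl.install() invocations. The keyword arguments shown are the ones used in the Databricks example later in these notes (json_license_path, databricks_host, databricks_token); the parameter names for other flows may differ:
from johnsnowlabs import *
# Auto-detect a license from env vars, the working directory,
# ~/.johnsnowlabs, or fall back to a browser/Colab OAuth flow
jsl.install()
# Point to a license JSON file explicitly
jsl.install(json_license_path='path/to/my/license.json')
# Install into a Databricks cluster, creating one if missing
cluster_id = jsl.install(json_license_path='path/to/my/license.json',
                         databricks_host='https://your-workspace.cloud.databricks.com',
                         databricks_token='<your-databricks-token>')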
jsl.start()
After having run jsl.install(), you can simply run jsl.start(): it remembers the license that was used to install and already has all jars pre-downloaded.
Additionally, it prints very helpful logs when launching a session, telling you which jars were loaded and their versions.
You can even load a new license during jsl.start(), which supports all of the previously mentioned authorization flows.
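A minimal sketch, assuming jsl.install() has been run on this machine before:
from johnsnowlabs import *
# Re-uses the license cached by a previous jsl.install() run
# and the pre-downloaded jars
spark = jsl.start()
# To load a new license instead, jsl.start() supports the same
# authorization flows; assuming it accepts the same json_license_path
# parameter as jsl.install() (an assumption):
# spark = jsl.start(json_license_path='path/to/new/license.json')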
License Management
List all of your usable JSL licenses with jsl.list_remote_licenses(),
and your locally cached licenses with jsl.list_local_licenses().
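Both can be called right after the wildcard import; for example:
from johnsnowlabs import *
# Licenses attached to your John Snow Labs account
print(jsl.list_remote_licenses())
# Licenses cached in ~/.johnsnowlabs on this machine
print(jsl.list_local_licenses())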
Databricks Utils
Easily submit any task to a Databricks cluster in various formats; see the Utils for Databricks docs.
Run a raw Python code string in a cluster, and also create one on the fly.
from johnsnowlabs import *
script = """
import nlu
print(nlu.load('sentiment').predict('That was easy!'))"""
cluster_id = jsl.install(json_license_path=my_license, databricks_host=my_host, databricks_token=my_token)
jsl.run_in_databricks(script,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Python Code String Example')
Run a Python Function in a Cluster.
def my_function():
    import nlu
    medical_text = """A 28-year-old female with a history of gestational
    diabetes presented with a one-week history of polyuria ,
    polydipsia , poor appetite , and vomiting ."""
    df = nlu.load('en.med_ner.diseases').predict(medical_text)
    for c in df.columns:
        print(df[c])
# my_function will run on databricks
jsl.run_in_databricks(my_function,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Function test')
Run a Python Script in a Cluster.
jsl.run_in_databricks('path/to/my/script.py',
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='Script test ')
Run a Python Module in a Cluster
import johnsnowlabs.auto_install.health_checks.nlp_test as nlp_test
jsl.run_in_databricks(nlp_test,
databricks_cluster_id=cluster_id,
databricks_host=my_host,
databricks_token=my_token,
run_name='nlp_test')
Testing Utils
You can use the John Snow Labs library to automatically test 10,000+ models and 100+ notebooks in one line of code, on a machine as small as a single Google Colab instance, and generate very handy error reports of potentially broken models, notebooks, or Models Hub markdown snippets.
Automatically test notebooks and Models Hub markdown via URL, file path, and many more options!
Workshop Notebook Testing Utils
See Utils for Testing Notebooks docs
from johnsnowlabs.utils.notebooks import test_ipynb
# Test a local notebook file
test_ipynb('path/to/local/notebook.ipynb')
# Test a notebook via URL
test_ipynb('https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb')
# Test a folder of notebooks and generate a report file, which captures all stderr/stdout
test_ipynb('my/notebook/folder')
# Test an array of URLs/paths to notebook files
test_ipynb(['https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb', 'path/to/local/notebook.ipynb'])
# Run ALL notebooks in the Certification Folder
test_result = test_ipynb('WORKSHOP')
# Only run Finance notebooks
test_result = test_ipynb('WORKSHOP-FIN')
# Only run Legal notebooks
test_result = test_ipynb('WORKSHOP-LEG')
# Only run Medical notebooks
test_result = test_ipynb('WORKSHOP-MED')
# only run Open Source notebooks
test_result = test_ipynb('WORKSHOP-OS')
Modelshub Testing Utils
See Utils for Testing Models & Modelshub Markdown Snippets Docs
from johnsnowlabs.utils.modelhub_markdown import test_markdown
# Test a Local Markdown file with a Python Snippet
test_markdown('path/to/my/file.md')
# Test a Modelshub Python Markdown Snippet via URL
test_markdown('https://nlp.johnsnowlabs.com/2022/08/31/legpipe_deid_en.html')
# Test a folder of Markdown Snippets and generate a Report file, which captures all stderr/stdout
test_markdown('my/markdown/folder')
# Test an array of URLs/paths to markdown files
test_markdown(['legpipe_deid_en.html', 'path/to/local/markdown_snippet.md'])