6.2.3 #14703

DevinTDHa (Maintainer) · 2025-12-03T11:22:57Z

📢 Spark NLP 6.2.3: Further Improvements for NerDL

Spark NLP 6.2.3 introduces targeted improvements to the training performance and stability of NerDLApproach, along with a bug fix for CamemBertForTokenClassification.

NerDLApproach now uses new internal data-loading behavior, which improves training speed and helps prevent out-of-memory errors.

🔥 Highlights

Enhanced NerDLApproach training performance through threaded data loading and optimized partitioning.

🚀 New Features & Enhancements

NerDLApproach Training Optimizations

Significant performance improvements for training of NerDLApproach:

Threaded Data Loading: When the memory optimizer is enabled (setEnableMemoryOptimizer(true)), data can now be prefetched through a threaded data loader. Prefetching is disabled by default and can be tuned with:

.setPrefetchBatches(int)

Tuning this parameter (for example, to 20 batches) can reduce training time by about 10%.
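As a rough illustration, the prefetch setting could be wired into a trainer like this. This is a minimal sketch, assuming the Python API exposes the setPrefetchBatches setter named above; the helper name build_ner_approach, the column names, and the non-negative check are illustrative choices and not part of the release.

```python
def build_ner_approach(prefetch_batches: int = 20):
    """Return a NerDLApproach with the memory optimizer and prefetching enabled.

    `build_ner_approach` is a hypothetical helper for illustration only.
    The value 20 is the example figure from the release notes, not a
    recommended default.
    """
    # Guard added by this sketch (not a documented library constraint).
    if prefetch_batches < 0:
        raise ValueError("prefetch_batches must be non-negative")

    # Imported lazily so the helper can be defined without Spark NLP installed.
    from sparknlp.annotator import NerDLApproach

    return (
        NerDLApproach()
        # Column names are assumptions; match them to your own pipeline.
        .setInputCols(["sentence", "token", "embeddings"])
        .setLabelColumn("label")
        .setOutputCol("ner")
        # The release notes tie prefetching to the memory optimizer.
        .setEnableMemoryOptimizer(True)
        # Assumed to mirror the .setPrefetchBatches(int) setter in the notes.
        .setPrefetchBatches(prefetch_batches)
    )
```

The helper needs an active Spark session (for example, one created with sparknlp.start()) before it is called. The 10% figure above is the release's own estimate, so it is worth timing a few values on your own data.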

Optimized Partitioning Strategy: When the memory optimizer is enabled (setEnableMemoryOptimizer(true)), NerDLApproach now applies optimized dataframe partitioning by default. This improves parallelization efficiency during training and helps prevent out-of-memory errors.

To tune the partitioning of the input data frames manually, disable this behavior with:

.setOptimizePartitioning(false)
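If the automatic partitioning is switched off, the partition count becomes the caller's responsibility. The sketch below shows one way this might look, assuming the Python API exposes setOptimizePartitioning as named above. Both helper names and the column names are hypothetical; DataFrame.repartition is the standard PySpark call.

```python
def build_manually_partitioned_approach():
    """Return a NerDLApproach with automatic partition optimization disabled.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the helper can be defined without Spark NLP installed.
    from sparknlp.annotator import NerDLApproach

    return (
        NerDLApproach()
        # Column names are assumptions; match them to your own pipeline.
        .setInputCols(["sentence", "token", "embeddings"])
        .setLabelColumn("label")
        .setOutputCol("ner")
        .setEnableMemoryOptimizer(True)
        # Assumed to mirror the .setOptimizePartitioning(false) setter in the notes.
        .setOptimizePartitioning(False)
    )


def repartition_training_data(training_df, num_partitions: int):
    """Repartition the training DataFrame to a caller-chosen partition count."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # DataFrame.repartition returns a new DataFrame with the requested partitions.
    return training_df.repartition(num_partitions)
```

A suitable partition count depends on the cluster and dataset size; the release notes give no recommended value, so any number used here would have to be found by experiment.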

🐛 Bug Fixes

CamemBertForTokenClassification: Fixed an issue with expected input types during inference.

❤️ Community Support

Slack - real-time discussion with the Spark NLP community and team
GitHub - issue tracking, feature requests, and contributions
Discussions - community ideas and showcases
Medium - latest Spark NLP articles and tutorials
YouTube - educational videos and demos

💻 Installation

Python

pip install spark-nlp==6.2.3

Spark Packages

CPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:6.2.3

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:6.2.3

Apple Silicon

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:6.2.3

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.2.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:6.2.3

Maven

<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp_2.12</artifactId>
  <version>6.2.3</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 6.2.2...6.2.3

This discussion was created from the release 6.2.3.
