@@ -294,19 +294,33 @@
"id": "AUbv3YL59D8Q"
},
"source": [
- "- `inputCols`: The name of the columns containing the input annotations. It can read either a String column or an Array.\n",
+ "- **inputCols**: Input annotation columns, typically `[\"sentence\", \"chunk\"]`. The `chunk` column provides the text spans, and the `sentence` column provides contextual information.\n",
"\n",
- "- `outputCol`: The name of the column in Document type that is generated. We can specify only one column here.\n",
+ "- **outputCol**: Name of the output column that will contain the resulting sentence-chunk embeddings.\n",
"\n",
- "- `chunkWeight`: Relative weight of chunk embeddings in comparison to sentence embeddings. The value should between 0 and 1. The default is 0.5, which means the chunk and sentence embeddings are given equal weight.\n",
+ "- **chunkWeight**: Relative weight of chunk embeddings compared to sentence embeddings. The value should be between 0 and 1. A value of `0.5` (default) means both chunk and sentence embeddings are given equal weight.\n",
"\n",
- "- `setMaxSentenceLength`: Sets max sentence length to process, by default 128.\n",
+ "- **strategy**: Strategy for computing embeddings. Supported options:\n",
+ " - `\"sentence_average\"`: Averages sentence and chunk embeddings (default).\n",
+ " - `\"scope_average\"`: Averages scope and chunk embeddings, where the scope is defined by the `scopeWindow`.\n",
+ " - `\"chunk_only\"`: Uses only the chunk embeddings.\n",
+ " - `\"scope_only\"`: Uses only the scope embeddings, defined by `scopeWindow`.\n",
"\n",
- "- `caseSensitive`: Determines whether the definitions of the white listed entities are case sensitive.\n",
+ "- **scopeWindow**: A tuple `(left, right)` defining how many tokens before and after the chunk are included when calculating scope embeddings. Defaults to `(0, 0)`, meaning no additional context tokens are included.\n",
"\n",
- "- `strategy`: Strategy for computing embeddings. Supported strategies are: `sentence_average`, `scope_average`, `chunk_only`, `scope_only`. The default is `sentence_average`.\n",
+ "- **batchSize**: Number of sentences processed per batch during embedding computation. Affects performance and memory usage.\n",
"\n",
- "- `scopeWindow`: cope window to calculate scope embeddings. The scope window is defined by two non-negative integers. The default is [0, 0], which means only the chunk embeddings are used. The first integer defines the number of tokens before the chunk and the second integer defines the number of tokens after the chunk.\n",
+ "- **caseSensitive**: Whether to preserve case when matching tokens for embedding computation. Default: `True`.\n",
"\n",
+ "- **dimension**: The embedding vector dimension. This depends on the pretrained model (e.g., 768 for BERT base).\n",
+ "\n",
+ "- **storageRef**: Unique reference name identifying the embeddings source. Useful when sharing models across pipelines.\n",
+ "\n",
+ "- **lazyAnnotator**: Whether the annotator should load resources lazily in a `RecursivePipeline`. Default: `False`.\n",
+ "\n",
+ "- **isLong**: Whether to use `Long` type instead of `Int` for model inputs. Some BERT models require `Long` tensors. Default: `False`.\n",
+ "\n",
+ "- **configProtoBytes**: TensorFlow configuration serialized as a byte array. Intended for advanced users who want to fine-tune session settings.\n",
+ "\n",
"All the parameters can be set using the corresponding set method in camel case. For example, `.setInputCols()`.\n",
"\n",
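The averaging behind `chunkWeight` and the `strategy` options can be sketched in plain NumPy. This is an illustrative, hypothetical helper only, not the annotator's actual implementation (which computes the BERT embeddings internally):

```python
import numpy as np

def combine_embeddings(chunk_emb, context_emb, chunk_weight=0.5):
    """Weighted average of chunk and context embeddings.

    context_emb stands in for the sentence embedding ("sentence_average")
    or the scope embedding ("scope_average"). chunk_weight=0.5, the
    default, weights both equally; 1.0 reduces to "chunk_only" and 0.0
    to a context-only combination.
    """
    return chunk_weight * chunk_emb + (1.0 - chunk_weight) * context_emb

chunk = np.array([1.0, 0.0])     # stand-in chunk embedding
sentence = np.array([0.0, 1.0])  # stand-in sentence embedding

print(combine_embeddings(chunk, sentence))       # default 0.5 -> [0.5 0.5]
print(combine_embeddings(chunk, sentence, 1.0))  # chunk only  -> [1. 0.]
```

Under `scope_average`, the context embedding would be computed from the tokens selected by `scopeWindow` around the chunk rather than from the full sentence.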
1,101 changes: 1,100 additions & 1 deletion Spark_NLP_Udemy_MOOC/Healthcare_NLP/ChunkMapperFilterer.ipynb

Large diffs are not rendered by default.

152 changes: 78 additions & 74 deletions Spark_NLP_Udemy_MOOC/Healthcare_NLP/DeIdentification.ipynb


1,688 changes: 1,687 additions & 1 deletion Spark_NLP_Udemy_MOOC/Healthcare_NLP/DeIdentificationModel.ipynb


@@ -251,9 +251,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "- `batchSize` : Batch size for processing documents (default: 8).\n",
- "- `caseSensitive` : Whether the classifier is sensitive to text casing (default: false).\n",
- "- `maxSentenceLength` : Maximum input sentence length (text beyond this may be truncated)."
+ "- **inputCols**: Input columns containing `DOCUMENT` annotations.\n",
+ "\n",
+ "- **outputCol**: Output column name where classification results (`CATEGORY`) are stored.\n",
+ "\n",
+ "- **batchSize**: Batch size for processing documents. Default: `8`.\n",
+ "\n",
+ "- **caseSensitive**: Whether the classifier is sensitive to text casing. Default: `False`.\n",
+ "\n",
+ "- **maxSentenceLength**: Maximum input sentence length. Text beyond this limit may be truncated.\n"
]
},
{
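A minimal, framework-free sketch of what `batchSize` and `maxSentenceLength` control. The helper name is hypothetical; the real classifier handles truncation and batching internally:

```python
def make_batches(token_lists, batch_size=8, max_sentence_length=128):
    """Truncate each sentence to max_sentence_length tokens, then group
    the sentences into batches of batch_size for inference."""
    truncated = [tokens[:max_sentence_length] for tokens in token_lists]
    return [truncated[i:i + batch_size]
            for i in range(0, len(truncated), batch_size)]

docs = [["tok"] * 200, ["short", "text"], ["one"]]
batches = make_batches(docs, batch_size=2, max_sentence_length=128)
print(len(batches))        # 2: three sentences split into batches of 2
print(len(batches[0][0]))  # 128: first sentence truncated from 200 tokens
```

Larger `batchSize` values generally improve throughput at the cost of memory, while `maxSentenceLength` bounds the per-sentence input size the model sees.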