added healthcare docs to Annotator page for v552 (#1697)
mehmetbutgul authored Jan 20, 2025
1 parent e26b94d commit 47fa80b
Showing 6 changed files with 27 additions and 10 deletions.
6 changes: 6 additions & 0 deletions docs/en/licensed_annotator_entries/AssertionDL.md
@@ -33,6 +33,12 @@ Parameters:

- `datasetInfo` *(Str)*: Descriptive information about the dataset being used.

- `blackList` *(list[str])*: If defined, list of entities to ignore. The rest will be processed.

- `whiteList` *(list[str])*: If defined, list of entities to process. The rest will be ignored. Do not include IOB prefixes on labels.

- `caseSensitive` *(Bool)*: Determines whether the definitions of the whitelisted and blacklisted entities are case sensitive. Default: True.
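
The whitelist/blacklist semantics above can be illustrated with a minimal pure-Python sketch. It is not the `AssertionDL` implementation: the helper `filter_by_label`, the `(text, label)` chunk tuples, and the rule that a label must pass both lists when both are set are assumptions made for illustration. Labels are compared without IOB prefixes, as the `whiteList` description requires.

```python
from typing import List, Optional, Sequence, Tuple

Chunk = Tuple[str, str]  # (chunk text, entity label without IOB prefix)


def filter_by_label(
    chunks: Sequence[Chunk],
    white_list: Optional[Sequence[str]] = None,
    black_list: Optional[Sequence[str]] = None,
    case_sensitive: bool = True,
) -> List[Chunk]:
    """Hypothetical helper: keep chunks according to whiteList/blackList."""

    def norm(label: str) -> str:
        # With case_sensitive=False, "drug" and "DRUG" count as the same label.
        return label if case_sensitive else label.lower()

    white = {norm(lab) for lab in white_list} if white_list else None
    black = {norm(lab) for lab in black_list} if black_list else set()

    kept = []
    for text, label in chunks:
        key = norm(label)
        if white is not None and key not in white:
            continue  # a whitelist is defined and this label is not on it
        if key in black:
            continue  # blacklisted labels are ignored
        kept.append((text, label))
    return kept


chunks = [("fever", "PROBLEM"), ("aspirin", "DRUG"), ("CT scan", "TEST")]
print(filter_by_label(chunks, white_list=["PROBLEM", "TEST"]))
print(filter_by_label(chunks, black_list=["drug"], case_sensitive=False))
```

Both calls keep only the `PROBLEM` and `TEST` chunks; with the default `case_sensitive=True`, the blacklist entry `"drug"` would not match the label `DRUG`.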

For pretrained models please see the
[Models Hub](https://nlp.johnsnowlabs.com/models?task=Assertion+Status) for available models.
{%- endcapture -%}
@@ -21,7 +21,11 @@ Parameters:

- `caseSensitive`: Determines whether the definitions of the white listed entities are case sensitive.

All the parameters can be set using the corresponding set method in camel case. For example, `.setInputcols()`.
- `strategy`: Strategy for computing embeddings. Supported strategies are: `sentence_average`, `scope_average`, `chunk_only`, `scope_only`. The default is `sentence_average`.

- `scopeWindow`: Scope window used to calculate scope embeddings. The scope window is defined by two non-negative integers: the first is the number of tokens before the chunk, and the second is the number of tokens after the chunk. The default is [0, 0], which means only the chunk embeddings are used.
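
To make the `scopeWindow` definition concrete, the sketch below computes which tokens a `[before, after]` window covers around a chunk. The helper `scope_span` is hypothetical, and clipping the window at the sentence boundaries is an assumption; the sketch does not compute embeddings or model the `strategy` options.

```python
from typing import List, Tuple


def scope_span(chunk_start: int, chunk_end: int, n_tokens: int,
               scope_window: Tuple[int, int] = (0, 0)) -> Tuple[int, int]:
    """Inclusive token range covered by a chunk plus its scope window.

    Hypothetical helper: clipping at the sentence boundaries is assumed.
    """
    before, after = scope_window
    if before < 0 or after < 0:
        raise ValueError("scopeWindow values must be non-negative")
    start = max(0, chunk_start - before)        # tokens before the chunk
    end = min(n_tokens - 1, chunk_end + after)  # tokens after the chunk
    return start, end


tokens: List[str] = "The patient denies chest pain today".split()
# "chest pain" covers token indices 3..4
start, end = scope_span(3, 4, len(tokens), scope_window=(2, 1))
print(tokens[start:end + 1])  # ['patient', 'denies', 'chest', 'pain', 'today']
```

With the default `(0, 0)` window the span is just the chunk itself, `(3, 4)`, matching the "only the chunk embeddings are used" behaviour described above.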

All the parameters can be set using the corresponding set method in camel case. For example, `.setInputCols()`.

> For more information and examples of the `BertSentenceChunkEmbeddings` annotator, see the [Spark NLP Workshop](https://github.com/JohnSnowLabs/spark-nlp-workshop), in particular the notebook [24.1.Improved_Entity_Resolution_with_SentenceChunkEmbeddings.ipynb](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/24.1.Improved_Entity_Resolution_with_SentenceChunkEmbeddings.ipynb).
@@ -24,6 +24,8 @@ Parameters:
- `blackListWords`: The black list of words. If a word from this list appears within the scope window, the chunk will be filtered out.
- `whiteListWords`: The white list of words. If a word from this list appears within the scope window, the chunk will be kept.
- `confidenceThreshold`: The confidence threshold to filter the chunks. Filtering is only applied if the confidence of the chunk is below the threshold.
- `possibleRegexContext`: The possible regex context to filter the chunks. If the regex is found in the context (chunk), the chunk is kept.
- `impossibleRegexContext`: The impossible regex context to filter the chunks. If the regex is found in the context (chunk), the chunk is removed.
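
One way to read the filtering rules above is sketched below in plain Python. The function `keep_chunk` is hypothetical, and several details are assumptions rather than documented behaviour: removal rules take precedence over keep rules, word matching is case-insensitive and single-word, both the word lists and the regexes are checked against the same scope-window text, and a low-confidence chunk that matches no rule is kept only when no keep rules are configured.

```python
import re
from typing import Sequence


def keep_chunk(
    context: str,
    confidence: float,
    confidence_threshold: float,
    black_list_words: Sequence[str] = (),
    white_list_words: Sequence[str] = (),
    possible_regex: Sequence[str] = (),
    impossible_regex: Sequence[str] = (),
) -> bool:
    """Return True if the chunk survives filtering (hypothetical sketch)."""
    # Filtering is only applied when confidence is below the threshold.
    if confidence >= confidence_threshold:
        return True
    words = {w.lower() for w in re.findall(r"\w+", context)}
    # Removal rules are checked first (assumed precedence).
    if any(w.lower() in words for w in black_list_words):
        return False
    if any(re.search(p, context) for p in impossible_regex):
        return False
    # Keep rules: any match keeps the chunk.
    if any(w.lower() in words for w in white_list_words):
        return True
    if any(re.search(p, context) for p in possible_regex):
        return True
    # No rule matched: keep the chunk only if no keep rules were configured.
    return not (white_list_words or possible_regex)


context = "patient denies chest pain today"
print(keep_chunk(context, 0.95, 0.8, black_list_words=["denies"]))  # True
print(keep_chunk(context, 0.50, 0.8, black_list_words=["denies"]))  # False
print(keep_chunk(context, 0.50, 0.8, white_list_words=["chest"]))   # True
```

The first call shows the threshold rule: a chunk at or above `confidenceThreshold` bypasses the word and regex checks entirely.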

{%- endcapture -%}

16 changes: 7 additions & 9 deletions docs/en/licensed_annotator_entries/ContextualEntityRuler.md
@@ -14,11 +14,11 @@ It is particularly useful for refining entity recognition results according to s

Parameters:

- `setCaseSensitive`: Whether to perform case-sensitive matching. Default is False.
- `setAllowPunctuationInBetween`: Whether to allow punctuation between prefix/suffix patterns and the entity. Default is True.
- `setDropEmptyChunks`: If True, removes chunks with empty content after applying rules. Default is False.
- `setCaseSensitive`: If True, it is case sensitive while checking the context. Default is False.
- `setMergeOverlapping`: If False, it returns both modified entities and the original entities at the same time. Default is True.
- `caseSensitive`: Whether to perform case-sensitive matching. Default is False.
- `allowPunctuationInBetween`: Whether to allow punctuation between prefix/suffix patterns and the entity. Default is True.
- `allowTokensInBetween`: Whether to allow tokens between prefix/suffix patterns and the entity. Default is False.
- `dropEmptyChunks`: If True, removes chunks with empty content after applying rules. Default is False.
- `mergeOverlapping`: If False, it returns both modified entities and the original entities at the same time. Default is True.
- `rules`: The updating rules. Each rule is a dictionary with the following keys:
- `entity`: The target entity label to modify.
Example: `"AGE"`.
@@ -38,9 +38,8 @@ Parameters:
Example: `["\\b(old|young)\\b"]` matches words like "old" or "young" as suffixes.
- `replaceEntity`: Optional string specifying the new entity label to replace with the target entity label.
Example: `"MODIFIED_AGE"` replaces `"AGE"` with `"MODIFIED_AGE"` in matching cases.
- `mode`: Specifies the operational mode for the rules.
Possible values depend on the use case (e.g., `"include"`, `"exclude"`).
Default: `"include"`
- `mode`: Specifies the operational mode for the rules. Options: `include`, `exclude`, or `replace_label_only`. Default is `include`.
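
A rule's effect can be approximated in plain Python for a single suffix pattern. This is a hypothetical sketch, not the `ContextualEntityRuler` implementation: the `Chunk` class and `apply_suffix_rule` function are invented for illustration, and the reading of the modes (`include` extends the chunk over the matched suffix, `replace_label_only` keeps the boundaries and only changes the label) is an assumption. Prefix patterns, `exclude` mode, `allowTokensInBetween`, `mergeOverlapping`, and `dropEmptyChunks` are not modelled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Chunk:
    begin: int  # character offset where the chunk starts (inclusive)
    end: int    # character offset where the chunk ends (exclusive)
    label: str


def apply_suffix_rule(
    text: str,
    chunk: Chunk,
    entity: str,
    suffix_patterns: Sequence[str],
    replace_entity: Optional[str] = None,
    mode: str = "include",
    allow_punctuation_in_between: bool = True,
    case_sensitive: bool = False,
) -> Chunk:
    """Hypothetical sketch of one suffix rule; not the library implementation."""
    if mode not in ("include", "replace_label_only"):
        raise ValueError(f"mode {mode!r} is not modelled in this sketch")
    if chunk.label != entity:
        return chunk  # the rule only targets chunks with this label
    flags = 0 if case_sensitive else re.IGNORECASE
    # Gap allowed between chunk and suffix: any non-word chars, or only spaces.
    gap = r"\W*" if allow_punctuation_in_between else r"\s*"
    tail = text[chunk.end:]
    for pattern in suffix_patterns:
        match = re.match(gap + "(?:" + pattern + ")", tail, flags)
        if match is None:
            continue
        label = replace_entity or chunk.label
        if mode == "include":
            # Extend the chunk so that it also covers the matched suffix.
            return Chunk(chunk.begin, chunk.end + match.end(), label)
        return Chunk(chunk.begin, chunk.end, label)  # replace_label_only
    return chunk


text = "The patient is 5 years old and stable."
begin = text.index("5 years")
age = Chunk(begin, begin + len("5 years"), "AGE")
result = apply_suffix_rule(text, age, "AGE", [r"\b(old|young)\b"],
                           replace_entity="MODIFIED_AGE")
print(text[result.begin:result.end], result.label)  # 5 years old MODIFIED_AGE
```

Because `caseSensitive` defaults to False, the same rule would also extend a chunk followed by "OLD"; setting `allow_punctuation_in_between=False` stops a match such as "5 years, old".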

{%- endcapture -%}

{%- capture model_input_anno -%}
@@ -101,7 +100,6 @@ rules = [ {
"replaceEntity" : "Modified_Date",
"mode" : "include"
}

]

contextual_entity_ruler = medical.ContextualEntityRuler() \
4 changes: 4 additions & 0 deletions docs/en/licensed_annotator_entries/DeIdentification.md
@@ -123,6 +123,10 @@ If True, the month will remain unchanged during the obfuscation process.
If False, the month will be modified along with the year and day.
Default: False.

- `keepTextSizeForObfuscation` : Whether to keep the text length the same when obfuscating entities. If `True`, the output text keeps its original length when a fake value of the same length is available; otherwise, the length might vary.

- `fakerLengthOffset` : Specifies how much length deviation is accepted during obfuscation when `keepTextSizeForObfuscation` is enabled. It must be greater than 0.
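
The interaction between these two parameters can be sketched as follows. `pick_fake` is a hypothetical helper, not part of the `DeIdentification` API, and the fallback order (a same-length fake first, then one within `fakerLengthOffset`, then any fake) is an assumption based on the descriptions above.

```python
import random
from typing import Optional, Sequence


def pick_fake(
    original: str,
    candidates: Sequence[str],
    keep_text_size: bool,
    length_offset: int,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical sketch of length-aware fake selection."""
    rng = rng or random.Random(0)  # seeded for reproducible output
    if not keep_text_size:
        return rng.choice(candidates)
    if length_offset <= 0:
        raise ValueError("fakerLengthOffset must be greater than 0")
    # Prefer fakes with exactly the same length as the original entity.
    exact = [c for c in candidates if len(c) == len(original)]
    if exact:
        return rng.choice(exact)
    # Otherwise accept fakes whose length is within the allowed deviation.
    near = [c for c in candidates if abs(len(c) - len(original)) <= length_offset]
    # Fall back to any candidate, so the length may vary (assumption).
    return rng.choice(near or candidates)


names = ["Mary", "Jonathan", "Alice", "Bob"]
print(pick_fake("John", names, keep_text_size=True, length_offset=1))    # Mary
print(pick_fake("Johnny", names, keep_text_size=True, length_offset=1))  # Alice
```

"John" has a same-length replacement, so the text size is preserved exactly; "Johnny" has none, so a fake one character shorter is accepted under an offset of 1.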


To create a configured DeIdentificationModel, please see the example of DeIdentification.
{%- endcapture -%}
@@ -16,6 +16,9 @@ Parameters:
- `batchSize` *(Int)*: Batch size
- `caseSensitive` *(Bool)*: Whether the classifier is sensitive to text casing
- `maxSentenceLength` *(Int)*: The maximum length of the input text
- `blackList` *(list[str])*: If defined, list of entities to ignore. The rest will be processed.
- `whiteList` *(list[str])*: If defined, list of entities to process. The rest will be ignored. Do not include IOB prefixes on labels.
- `caseSensitive` *(Bool)*: Determines whether the definitions of the whitelisted and blacklisted entities are case sensitive. Default: True.


{%- endcapture -%}