Commit 6652c78: Updated inference script

Parent: 5da59c7

1 file changed (+18, -7 lines)

README.md

```diff
@@ -6,7 +6,8 @@ Lugano.
 ## Challenge
 
 Here is a brief explanation of the challenge:
-The challenge was proposed by **Ai4Privacy**, a company that builds global solutions that enhance **privacy protections**
+The challenge was proposed by **Ai4Privacy**, a company that builds global solutions that enhance **privacy protections
+**
 in the rapidly evolving world of **Artificial Intelligence**.
 The challenge goal is to create a machine learning model capable of detecting and masking **PII** (Personal Identifiable
 Information) in text data across several languages and locales. The task requires working with a synthetic dataset to
```
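The task described in this hunk pairs PII detection with masking. As a minimal sketch of the masking half only, assuming a detector has already produced non-overlapping character spans in a hypothetical `(start, end, label)` format with exclusive ends, replacing each span with a `[LABEL]` placeholder could look like this. The function name `mask_pii` and the placeholder style are illustrative assumptions, not taken from the repository.

```python
from typing import List, Tuple

# Hypothetical span format (assumption): (start_char, end_char_exclusive, label).
Span = Tuple[int, int, str]


def mask_pii(text: str, spans: List[Span]) -> str:
    """Replace each detected PII span with a [LABEL] placeholder.

    Spans are assumed not to overlap. They are applied right to left so the
    character offsets of spans not yet processed stay valid while editing.
    """
    masked = text
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


sentence = "Contact Anna Rossi at anna.rossi@example.com."
detected = [(8, 18, "NAME"), (22, 44, "EMAIL")]  # made-up example spans
print(mask_pii(sentence, detected))
# Contact [NAME] at [EMAIL].
```

Processing spans from the end of the string backwards avoids recomputing offsets after each replacement changes the string length.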
````diff
@@ -17,7 +18,9 @@ including client support, legal, and general data anonymization tools. Success i
 scaling privacy-conscious AI systems without compromising the UX or operational performance.
 
 ## Getting Started
+
 Create a `.env` file. Start copying the `.env.example` file and rename it to `.env`. Fill in the required values.
+
 ```bash
 cp .env.example .env
 ```
````
```diff
@@ -93,17 +96,17 @@ Here is a list of available BERT models that can be used for fine-tuning. Additi
 may also work with minimal modifications:
 
 - BERT classic
-+ `bert-base-uncased`, `bert-large-uncased`, `bert-base-cased`, `bert-large-cased`
++ `bert-base-uncased`, `bert-large-uncased`, `bert-base-cased`, `bert-large-cased`
 - DistilBERT
-+ `distilbert-base-uncased`, `distilbert-base-cased`
++ `distilbert-base-uncased`, `distilbert-base-cased`
 - RoBERTa
-+ `roberta-base`, `roberta-large`
++ `roberta-base`, `roberta-large`
 - ALBERT
-+ `albert-base-v2`, `albert-large-v2`, `albert-xlarge-v2`, `albert-xxlarge-v2`
++ `albert-base-v2`, `albert-large-v2`, `albert-xlarge-v2`, `albert-xxlarge-v2`
 - Electra
-+ `google/electra-small-discriminator`, `google/electra-base-discriminator`, `google/electra-large-discriminator`
++ `google/electra-small-discriminator`, `google/electra-base-discriminator`, `google/electra-large-discriminator`
 - DeBERTa
-+ `microsoft/deberta-base`, `microsoft/deberta-large`
++ `microsoft/deberta-base`, `microsoft/deberta-large`
 
 ### GLiNER Fine-Tuning
 
```
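The checkpoints listed in this hunk would typically be fine-tuned with a token-classification head for PII detection. As a sketch, assuming a BIO tagging scheme (the README does not state the tag scheme) and a hypothetical helper `build_bio_label_maps`, the label mappings such a head needs could be built as follows:

```python
from typing import Dict, List, Tuple


def build_bio_label_maps(entity_types: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build id2label / label2id maps for a BIO token-classification head.

    Hypothetical helper: "O" gets id 0, and each entity type (sorted, with
    duplicates removed) contributes a B- tag followed by an I- tag.
    """
    labels = ["O"]
    for entity in sorted(set(entity_types)):
        labels.extend([f"B-{entity}", f"I-{entity}"])
    id2label = {i: label for i, label in enumerate(labels)}
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


# Example entity types are made up for illustration.
id2label, label2id = build_bio_label_maps(["EMAIL", "NAME"])
print(id2label)
# {0: 'O', 1: 'B-EMAIL', 2: 'I-EMAIL', 3: 'B-NAME', 4: 'I-NAME'}
```

With Hugging Face `transformers`, maps like these are commonly passed as `AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(id2label), id2label=id2label, label2id=label2id)`, where `checkpoint` is one of the names above. Sorting the entity types keeps the mapping deterministic across runs.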
```diff
@@ -141,4 +144,12 @@ You can use the following GLiNER models for fine-tuning, though additional compa
 - `gliner-community/gliner_small-v2.5`
 
 ## Results
+
 A results folder is available in the repository to store the results of the various experiments and related metrics.
+
+## Other Information
+
+We also provide a solution to the issue in
+the [pii-masking-400k](https://huggingface.co/datasets/ai4privacy/pii-masking-400k/discussions/3) repository.
+We created a method to transform the natural language text into a token-tag format that can be used to train a Named
+Entity Recognition (NER) model using the `AutoTrain` `huggingface` api.
```
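The added README paragraph mentions converting natural-language text into a token-tag format for NER training with AutoTrain, but the commit does not show the method itself. The following is a minimal sketch of one way to do it, not the repository's implementation. It assumes character-level PII spans in a hypothetical `(start, end, label)` format, whitespace tokenization, and BIO tags, and it emits one row holding a list of tokens and a list of tags; the `tokens`/`tags` key names are an assumption about the column layout AutoTrain's token classification task expects.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical span format (assumption): (start_char, end_char_exclusive, label).
Span = Tuple[int, int, str]


def to_token_tags(text: str, spans: List[Span]) -> Dict[str, List[str]]:
    """Convert raw text plus character-level PII spans into BIO token tags.

    Tokens are whitespace-delimited, so punctuation stays attached to them.
    A token takes the label of the first span its character range overlaps;
    it gets "I-" if that same span also tagged the previous token, else "B-".
    """
    tokens: List[str] = []
    tags: List[str] = []
    previous_span: Optional[Span] = None  # span that tagged the preceding token
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        current_span: Optional[Span] = None
        for span in spans:
            start, end, label = span
            if tok_start < end and tok_end > start:  # character ranges overlap
                prefix = "I" if span == previous_span else "B"
                tag = f"{prefix}-{label}"
                current_span = span
                break
        tokens.append(match.group())
        tags.append(tag)
        previous_span = current_span
    return {"tokens": tokens, "tags": tags}


row = to_token_tags(
    "Contact Anna Rossi at anna.rossi@example.com.",
    [(8, 18, "NAME"), (22, 44, "EMAIL")],  # made-up example spans
)
print(row)
# {'tokens': ['Contact', 'Anna', 'Rossi', 'at', 'anna.rossi@example.com.'], 'tags': ['O', 'B-NAME', 'I-NAME', 'O', 'B-EMAIL']}
```

Whitespace tokenization keeps the sketch dependency-free; a subword tokenizer would need its own alignment step, and punctuation glued to a token (as in `anna.rossi@example.com.`) inherits that token's tag.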
