README.md (+35 -91)
@@ -19,18 +19,8 @@

 ## Overview

-### What is DataFog?
-
 DataFog is an open-source DevSecOps platform that lets you scan and redact Personally Identifiable Information (PII) out of your Generative AI applications.
The DataFog library provides functionality for text and image processing, including PII (Personally Identifiable Information) annotation and OCR (Optical Character Recognition) capabilities.
-
-### Installation
-
-To install the DataFog library, use the following command:
-
-```
-pip install datafog
-```
-
-### Usage
+To use DataFog, you'll need to create a DataFog client with the desired operations. Here's a basic setup:

-The [Getting Started notebook](/examples/getting_started.ipynb) features a standalone Colab notebook.
+```python
+from datafog import DataFog

-#### Text PII Annotation
-
-To annotate PII in a given text, lets start with a set of clinical notes:
**Chief Complaint:** "I've been experiencing severe back pain and numbness in my legs."
-
-**History of Present Illness:** The patient is a 35-year-old who presents with a 2-month history of worsening back pain, numbness in both legs, and occasional tingling sensations. The patient reports working as a freelance writer and has been experiencing increased stress due to tight deadlines and financial struggles.
+### OCR PII Annotation

-**Past Medical History:** Hypothyroidism
+For OCR capabilities, you can use the following:

-**Social History:**
-The patient shares a small apartment with two roommates and relies on public transportation. They mention feeling overwhelmed with work and personal responsibilities, often sacrificing sleep to meet deadlines. The patient expresses concern over the high cost of healthcare and the need for affordable medication options.
+```python
+import asyncio
+import nest_asyncio

-**Review of Systems:** Denies fever, chest pain, or shortness of breath. Reports occasional headaches.
+nest_asyncio.apply()

-**Physical Examination:**
-- General: Appears tired but is alert and oriented.
-- Vitals: BP 128/80, HR 72, Temp 98.6°F, Resp 14/min

-**Assessment/Plan:**
-- Continue to monitor blood pressure and thyroid function.
-- Discuss affordable medication options with a pharmacist.
-- Refer to a social worker to address housing concerns and access to healthcare services.
-- Encourage the patient to engage with community support groups for social support.
-- Schedule a follow-up appointment in 4 weeks or sooner if symptoms worsen.
-
-**Comments:** The patient's health concerns are compounded by socioeconomic factors, including employment status, housing stability, and access to healthcare. Addressing these social determinants of health is crucial for improving the patient's overall well-being.
-
-```
-
-we can then set up our pipeline to accept these files
Note: The DataFog library uses asynchronous programming for OCR, so make sure to use the `async`/`await` syntax when calling the appropriate methods.

-loop = asyncio.get_event_loop()
-loop.run_until_complete(run_ocr_pipeline_demo())
+## Examples

-```
+For more detailed examples, check out our Jupyter notebooks in the `examples/` directory:

-You'll notice that we use async functions liberally throughout the SDK - given the nature of the functions we're providing and the extension of DataFog into API/other formats, this allows the functions to be more easily adapted for those uses.
+- `text_annotation_example.ipynb`: Demonstrates text PII annotation
+- `image_processing.ipynb`: Shows OCR capabilities and text extraction from images

+These notebooks provide step-by-step guides on how to use DataFog for various tasks.
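
Reviewer note: the README text in this diff says the OCR path is asynchronous and must be called with `async`/`await`, but the visible hunks do not show a complete call. Below is a minimal sketch of that calling pattern using only the standard library. `annotate_image` is a hypothetical stand-in coroutine, not a DataFog function, and the URLs are placeholders; the real client method names are not visible in this diff.

```python
import asyncio


async def annotate_image(url: str) -> dict:
    """Hypothetical stand-in for an async OCR call; NOT part of DataFog."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"source": url, "text": ""}


async def main() -> list:
    # Placeholder inputs, for illustration only.
    urls = ["https://example.com/a.png", "https://example.com/b.png"]
    # Run the coroutines concurrently; gather returns results in input order.
    return await asyncio.gather(*(annotate_image(u) for u in urls))


if __name__ == "__main__":
    # A plain script has no running event loop, so asyncio.run is sufficient.
    results = asyncio.run(main())
    print(results)
```

In a Jupyter notebook an event loop is already running, which is presumably why the added lines import `nest_asyncio` and call `nest_asyncio.apply()`; in that setting `await main()` in a cell is the usual alternative to `asyncio.run`.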