
Commit 22dd9c5

Sid Mohan authored and committed
updated README
1 parent d076499 commit 22dd9c5

3 files changed: +68 -76 lines changed

README.md

Lines changed: 65 additions & 29 deletions
@@ -54,7 +54,7 @@ pip install datafog
 
 ### Usage
 
-The [Getting Started notebook](/datafog-python/examples/getting_started.ipynb) features a standalone Colab notebook that lets you get up and running in no time.
+The [Getting Started notebook](/datafog-python/examples/getting_started.ipynb) features a standalone Colab notebook.
 
 
 #### Text PII Annotation
@@ -63,54 +63,90 @@ To annotate PII in a given text, lets start with a set of clinical notes:
 
 ```
 !git clone https://gist.github.com/b43b72693226422bac5f083c941ecfdb.git
-```
+# Define the directory path
+folder_path = 'clinical_notes/'
+
+# List all files in the directory
+file_list = os.listdir(folder_path)
+text_files = sorted([file for file in file_list if file.endswith('.txt')])
 
-```python
-from datafog import TextPIIAnnotator
+with open(os.path.join(folder_path, text_files[0]), 'r') as file:
+    clinical_note = file.read()
 
-text = "John Doe lives at 1234 Elm St, Springfield."
-text_annotator = TextPIIAnnotator()
-annotated_text = text_annotator.run(text)
-print(annotated_text)
+display(Markdown(clinical_note))
+```
+which looks like this:
 ```
 
-This will output the annotated text with PII labeled, such as `{"LOC": ["Springfield"]}`.
+**Date:** April 10, 2024
 
-#### Image Text Extraction and Annotation
+**Patient:** Emily Johnson, 35 years old
 
-To extract text from an image and perform PII annotation, you can use the `DataFog` class:
+**MRN:** 00987654
 
-```python
-from datafog import DataFog
+**Chief Complaint:** "I've been experiencing severe back pain and numbness in my legs."
 
-image_url = "https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg"
-datafog = DataFog()
-annotated_text = await datafog.run_ocr_pipeline([image_url])
-print(annotated_text)
-```
+**History of Present Illness:** The patient is a 35-year-old who presents with a 2-month history of worsening back pain, numbness in both legs, and occasional tingling sensations. The patient reports working as a freelance writer and has been experiencing increased stress due to tight deadlines and financial struggles.
+
+**Past Medical History:** Hypothyroidism
 
-This will download the image, extract the text using OCR, and annotate any PII found in the extracted text.
+**Social History:**
+The patient shares a small apartment with two roommates and relies on public transportation. They mention feeling overwhelmed with work and personal responsibilities, often sacrificing sleep to meet deadlines. The patient expresses concern over the high cost of healthcare and the need for affordable medication options.
 
-#### Text Processing
+**Review of Systems:** Denies fever, chest pain, or shortness of breath. Reports occasional headaches.
 
-To process and annotate text using the DataFog pipeline, you can use the `DataFog` class:
+**Physical Examination:**
+- General: Appears tired but is alert and oriented.
+- Vitals: BP 128/80, HR 72, Temp 98.6°F, Resp 14/min
 
-```python
-from datafog import DataFog
+**Assessment/Plan:**
+- Continue to monitor blood pressure and thyroid function.
+- Discuss affordable medication options with a pharmacist.
+- Refer to a social worker to address housing concerns and access to healthcare services.
+- Encourage the patient to engage with community support groups for social support.
+- Schedule a follow-up appointment in 4 weeks or sooner if symptoms worsen.
+
+**Comments:** The patient's health concerns are compounded by socioeconomic factors, including employment status, housing stability, and access to healthcare. Addressing these social determinants of health is crucial for improving the patient's overall well-being.
 
-text = ["Tokyo is the capital of Japan"]
-datafog = DataFog()
-annotated_text = await datafog.run_text_pipeline(text)
-print(annotated_text)
 ```
 
-This will process the given text and annotate entities such as person names and locations.
+we can then set up our pipeline to accept these files
+
+```
+async def run_text_pipeline_demo():
+    results = await datafog.run_text_pipeline(texts)
+    print("Text Pipeline Results:", results)
+    return results
+
+
+texts = [clinical_note]
+loop = asyncio.get_event_loop()
+results = loop.run_until_complete(run_text_pipeline_demo())
+```
 
-For more detailed usage and examples, please refer to the API documentation.
 
 Note: The DataFog library uses asynchronous programming, so make sure to use the `async`/`await` syntax when calling the appropriate methods.
 
+#### OCR PII Annotation
+
+Let's use a image (which could easily be a converted or scanned PDF)
+
+![Executive Email](https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg)
+
+```
+datafog = DataFog(operations='extract_text')
+url_list = ['https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg']
+
+async def run_ocr_pipeline_demo():
+    results = await datafog.run_ocr_pipeline(url_list)
+    print("OCR Pipeline Results:", results)
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(run_ocr_pipeline_demo())
+
+```
 
+You'll notice that we use async functions liberally throughout the SDK - given the nature of the functions we're providing and the extension of DataFog into API/other formats, this allows the functions to be more easily adapted for those uses.
 
 ## Contributing
 
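Within the hunks shown, the added text-pipeline snippet reads `datafog` and `texts` as globals and drives the coroutine through `asyncio.get_event_loop()`. A minimal sketch of the same flow for a plain script is given here, using `asyncio.run` and passing the client in explicitly. `StubPipeline` is a hypothetical stand-in so the example runs without DataFog installed. It is not the library's API, and only the method name `run_text_pipeline` is taken from the diff.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in for a DataFog client; NOT the real library."""

    async def run_text_pipeline(self, texts):
        # Pretend to annotate: report the character count of each input.
        await asyncio.sleep(0)
        return [{"chars": len(text)} for text in texts]


async def run_text_pipeline_demo(pipeline, texts):
    # Taking the client and inputs as arguments avoids module-level globals.
    results = await pipeline.run_text_pipeline(texts)
    return results


texts = ["**Patient:** Emily Johnson, 35 years old"]
# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
results = asyncio.run(run_text_pipeline_demo(StubPipeline(), texts))
print("Text Pipeline Results:", results)
```

In a notebook, where an event loop is usually already running, awaiting the coroutine directly in a cell (`results = await run_text_pipeline_demo(...)`) is the usual alternative to `asyncio.run`.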

Lines changed: 0 additions & 1 deletion
This file was deleted.

examples/getting_started.ipynb

Lines changed: 3 additions & 46 deletions
@@ -220,55 +220,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 13,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"data": {
-"text/markdown": [
-"\n",
-"**Date:** April 10, 2024\n",
-"\n",
-"**Patient:** Emily Johnson, 35 years old\n",
-"\n",
-"**MRN:** 00987654\n",
-"\n",
-"**Chief Complaint:** \"I've been experiencing severe back pain and numbness in my legs.\"\n",
-"\n",
-"**History of Present Illness:** The patient is a 35-year-old who presents with a 2-month history of worsening back pain, numbness in both legs, and occasional tingling sensations. The patient reports working as a freelance writer and has been experiencing increased stress due to tight deadlines and financial struggles.\n",
-"\n",
-"**Past Medical History:** Hypothyroidism\n",
-"\n",
-"**Social History:**\n",
-"The patient shares a small apartment with two roommates and relies on public transportation. They mention feeling overwhelmed with work and personal responsibilities, often sacrificing sleep to meet deadlines. The patient expresses concern over the high cost of healthcare and the need for affordable medication options.\n",
-"\n",
-"**Review of Systems:** Denies fever, chest pain, or shortness of breath. Reports occasional headaches.\n",
-"\n",
-"**Physical Examination:**\n",
-"- General: Appears tired but is alert and oriented.\n",
-"- Vitals: BP 128/80, HR 72, Temp 98.6°F, Resp 14/min\n",
-"\n",
-"**Assessment/Plan:**\n",
-"- Continue to monitor blood pressure and thyroid function.\n",
-"- Discuss affordable medication options with a pharmacist.\n",
-"- Refer to a social worker to address housing concerns and access to healthcare services.\n",
-"- Encourage the patient to engage with community support groups for social support.\n",
-"- Schedule a follow-up appointment in 4 weeks or sooner if symptoms worsen.\n",
-"\n",
-"**Comments:** The patient's health concerns are compounded by socioeconomic factors, including employment status, housing stability, and access to healthcare. Addressing these social determinants of health is crucial for improving the patient's overall well-being.\n",
-"\n"
-],
-"text/plain": [
-"<IPython.core.display.Markdown object>"
-]
-},
-"metadata": {},
-"output_type": "display_data"
-}
-],
+"outputs": [],
 "source": [
 "# Define the directory path\n",
-"folder_path = 'b43b72693226422bac5f083c941ecfdb/'\n",
+"folder_path = 'clinical_notes/'\n",
 "\n",
 "# List all files in the directory\n",
 "file_list = os.listdir(folder_path)\n",
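This hunk changes `folder_path` from the gist-named directory to `clinical_notes/`. A plain `git clone` of the gist URL would normally create a directory named after the gist ID, so the new path may not exist unless the clone targets it. Here is a minimal, standard-library sketch of the listing-and-reading step with a guard for an empty folder. The helper name `first_text_file` and the throwaway demo files are assumptions for illustration and do not appear in the notebook.

```python
import os
import tempfile


def first_text_file(folder_path):
    """Return the contents of the alphabetically first .txt file in folder_path."""
    text_files = sorted(f for f in os.listdir(folder_path) if f.endswith(".txt"))
    if not text_files:
        # Fail with a clear message instead of an IndexError on text_files[0].
        raise FileNotFoundError(f"no .txt files in {folder_path!r}")
    with open(os.path.join(folder_path, text_files[0]), "r", encoding="utf-8") as fh:
        return fh.read()


# Demo against a throwaway directory rather than a real clone.
with tempfile.TemporaryDirectory() as folder:
    for name, body in [("b_note.txt", "second"), ("a_note.txt", "first"), ("skip.md", "ignored")]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as fh:
            fh.write(body)
    clinical_note = first_text_file(folder)

print(clinical_note)
```

If the directory itself is missing, `os.listdir` raises `FileNotFoundError` on its own, so both failure modes surface as the same exception type.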
