
Commit 6314014

Sid Mohan authored and committed
all checks passed - updated README
1 parent 213f030 commit 6314014


1 file changed: 20 additions, 24 deletions


README.md

@@ -20,25 +20,25 @@
## Overview

### What is DataFog?

DataFog is an open-source DevSecOps platform that lets you scan for Personally Identifiable Information (PII) and redact it from your Generative AI applications.

### What problem are we solving?

**Context**

The primary use case today is Retrieval Augmented Generation (RAG) systems. As a refresher, a RAG system retrieves information from a custom knowledge base that you or your team have built, and uses it either by citing the source files directly in a response or implicitly, by shaping the model's answers. The knowledge base is assembled through a deliberate process: files are uploaded into a workflow, segmented into logical blocks of information, and tagged according to their context. There are a thousand ways to add nuance to this characterization, but it suffices for the vast majority of cases!

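As a rough illustration of that ingestion step, here is a minimal Python sketch that splits a document into blocks and tags each one with metadata. The `Chunk` class and `chunk_document` function are hypothetical, and splitting on blank lines is an assumed stand-in for the segmentation a real RAG pipeline would use. A privacy filter would need to run on the text before chunks like these are uploaded.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One logical block of a document plus its metadata tags (hypothetical)."""
    text: str
    tags: dict[str, str]


def chunk_document(source: str, text: str) -> list[Chunk]:
    """Split text on blank lines and tag each block with its source and position.

    Blank-line splitting is an assumption for illustration; real pipelines
    often split by tokens, headings, or semantic similarity instead.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return [
        Chunk(text=block, tags={"source": source, "position": str(i)})
        for i, block in enumerate(blocks)
    ]


doc = "Q3 planning notes.\n\nHeadcount stays flat.\n\nNext review in May."
for chunk in chunk_document("planning.txt", doc):
    print(chunk.tags["position"], chunk.text)
```

Each chunk keeps its tags alongside the text, which is what lets a retriever later cite the source file in a response.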
**Problem**

How do you keep:

- Customer PII
- Employee PII
- Sensitive company information pertaining to org changes or restructurings
- Pending M&A activity
- Conversations with external counsel on material corporate matters (e.g. a product recall)
- and more

from entering a Generative AI environment in the first place? What you need is a tool that scans and redacts your RAG-bound documents according to your organization's or team's needs.

@@ -47,39 +47,37 @@ That's where DataFog comes in. Our solution to this problem is through two major
- **PII Observability**: takes in your batch or streaming data and returns a scan indicating character-level detection of entities.
- **Privacy Filter**: slots in as a pre-processor that redacts PII from your files before they are uploaded to a RAG database.

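The two capabilities above can be illustrated with a small, self-contained Python sketch. It is not DataFog's API: `scan`, `redact`, and the two regular expressions are hypothetical stand-ins, chosen only to show what character-level spans look like and how a pre-processor could redact them before upload.

```python
import re

# Hypothetical patterns for illustration only; production detectors
# (NER models, checksum validation, etc.) are far more thorough.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def scan(text: str) -> list[tuple[str, int, int]]:
    """Observability: return (entity_type, start, end) character spans, sorted by start."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


def redact(text: str) -> str:
    """Privacy filter: replace each detected span with its entity label.

    Spans are replaced from last to first so earlier offsets stay valid;
    this assumes detected spans do not overlap.
    """
    for label, start, end in reversed(scan(text)):
        text = text[:start] + f"<{label}>" + text[end:]
    return text
```

For example, `scan("Contact jane.doe@example.com or 555-123-4567.")` reports the email at characters 8 to 28 and the phone number at 32 to 44, and `redact` on the same string returns `"Contact <EMAIL> or <PHONE>."`.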
You can import this SDK into a Python environment (such as a Google Colab notebook; check out our [Getting Started](examples/getting-started.ipynb) notebook) and be up and running within a few lines of code.

### How it works

<img src="https://www.datafog.ai/hero.png" alt="DataFog Overview" style="width:50%;">

### There are lots of PII tools out there; why DataFog?

If you look at the landscape of PII detection tools, many of them exist because of regulatory requirements (e.g. "comply with CCPA/GDPR/HIPAA"). In that setting, the problem is well defined, there is a specific, fixed set of entities to look for, and the universe of document schemas is relatively static. As a result, those products are purpose-built for the problem they solve.

However, Generative AI changes how we think about privacy. Privacy requirements now keep changing (new M&A deals and internal discussions mean new terms to scan for and redact), and there are many different document sources to contend with. PII detection is no longer just about compliance; it is an active, and for some a new, internal security threat for CISOs and engineering leaders. We want DataFog to be built and driven by the needs of the open-source community as it tackles this challenge.

## Installation

DataFog can be installed via pip:

```bash
pip install datafog
```

Then import it in your Python environment:

```python
from datafog import PresidioEngine as presidio
```

## Examples

Here are some examples of DataFog being used to redact information in business contexts. See the `/examples` directory for our [Getting Started](examples/getting-started.ipynb) notebook. We'll be regularly adding content, including comprehensive guides to using DataFog in production. If you have an idea for a tutorial or guide you'd like to see, please let us know!

```python
ceo_email_chunk = "I'm announcing on Friday that Jeff is going to be CTO."
@@ -94,18 +92,17 @@ Here are some examples of datafog being used to redact information in business c
# PII Detected with deny list: [type: CUSTOM_PII, start: 50, end: 53, score: 1.0, type: PERSON, start: 30, end: 34, score: 0.85]
```
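The `start`/`end` values in the output above are end-exclusive character offsets into `ceo_email_chunk`: `CTO` occupies indices 50 to 52 and `Jeff` indices 30 to 33. The Python sketch below shows one way a deny list can produce such spans. It assumes the deny list is `["CTO"]` (the elided lines of the example don't show the actual list) and uses a hypothetical `deny_list_scan` function built on plain `re`; it is not how DataFog or the Presidio engine it imports implements this, and it does not reproduce the `PERSON` result, which comes from a named-entity model.

```python
import re


def deny_list_scan(text: str, deny_list: list[str]) -> list[dict]:
    """Return one CUSTOM_PII result per deny-list match (hypothetical helper).

    Matching is case-insensitive and whole-word; both choices are
    assumptions for illustration. Deny-list hits get a fixed score of 1.0.
    """
    results = []
    for term in deny_list:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            results.append({"type": "CUSTOM_PII", "start": m.start(),
                            "end": m.end(), "score": 1.0})
    return results


ceo_email_chunk = "I'm announcing on Friday that Jeff is going to be CTO."
print(deny_list_scan(ceo_email_chunk, ["CTO"]))
# [{'type': 'CUSTOM_PII', 'start': 50, 'end': 53, 'score': 1.0}]
```

Because the deny list is plain data, adding a new term (for example, a deal codename) changes what gets flagged without retraining anything.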

## Contributing

DataFog is a community-driven **open-source** platform, and we've been fortunate to have a small but growing contributor base. We'd love to hear your ideas, feedback, and suggestions for improvement: anything you think could make DataFog better! Join our growing community on [Discord](https://discord.gg/bzDth394R4).

### Dev Notes

- Justfile commands:
  - `just format` to apply formatting.
  - `just lint` to check formatting and style.

### Testing

To run the DataFog unit tests, check out this repository and run:
@@ -116,7 +113,6 @@ tox
```

## License

This software is published under the [MIT
