Commit 6fbc18b

committed
✨ Add manage data spills page
1 parent 17f92dc commit 6fbc18b

File tree

1 file changed

+141
-0
lines changed

docs/data/data_spill.rst

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
1+
Data Spills
2+
===========
3+
4+
A data spill is the accidental or deliberate exposure of information into an
5+
uncontrolled or unauthorised environment, or to persons without a need-to-know.
6+
7+
There are many examples of data spills, but for the purposes of this guide, we will focus
8+
on the exposure of sensitive clinical research data in a public GitHub repository
9+
and what to do if this happens.
10+
11+
What is Sensitive Data?
12+
-----------------------
13+
Even though the Kids First project does NOT currently include PHI
14+
(protected health information) data, it does still include data that is
15+
considered sensitive and cannot be exposed to the public.
16+
17+
Sensitive data in the Kids First project is any clinical research data
18+
that has not been approved by the Kids First Data Coordinating Center (DCC)
19+
for public release.
20+
21+
Examples of Kids First sensitive data include but are not limited to:
22+
23+
- A participant's demographics such as gender, race, ethnicity
24+
- A participant's biospecimen info such as tissue type, anatomical site
25+
- A participant's diagnosis info such as the diagnosis name
26+
- A participant's genomic data such as DNA sequencing files
27+
28+
*Note - a participant is a person participating in a Kids First research study*
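
As a rough illustration of the categories above, the following sketch checks the header row of a CSV file for column names that suggest sensitive data. The keyword list and the function name ``find_sensitive_columns`` are hypothetical, drawn only from the examples in this section; real column names in a study will differ, and a match or a non-match is not a substitute for DCC approval.

```python
import csv
import io

# Hypothetical keyword list, loosely based on the examples listed above.
# Real column names in a study's data will differ.
SENSITIVE_KEYWORDS = (
    "gender", "race", "ethnicity",
    "tissue_type", "anatomical_site",
    "diagnosis",
)


def find_sensitive_columns(csv_text, keywords=SENSITIVE_KEYWORDS):
    """Return the header names in csv_text that contain any keyword."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    flagged = []
    for column in header:
        # Normalize "Tissue Type" to "tissue_type" before matching.
        normalized = column.strip().lower().replace(" ", "_")
        if any(keyword in normalized for keyword in keywords):
            flagged.append(column.strip())
    return flagged
```

Because this is plain substring matching, it can both over-flag (a column named ``trace`` contains ``race``) and miss sensitive columns with unexpected names, so treat it as a prompt for manual review only.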
29+
30+
31+
What is NOT Sensitive Data?
32+
---------------------------
33+
34+
Any Kids First clinical research data that has been approved by the Kids First DCC for public release.
35+
36+
Identifiers (non-PHI, of course) such as Kids First IDs (e.g. PT_00001111) and
37+
IDs in the raw clinical data provided by Kids First researchers
38+
(e.g. PID0001, SS-H02).
39+
40+
One caveat is that you can have sensitive data inside a **private Kids First
41+
GitHub repository**. Since the repository is private and within the Kids First
42+
GitHub organization, it is in a controlled environment with exposure limited
43+
to appropriate persons.
44+
45+
Manage a Data Spill
46+
-------------------
47+
48+
What should you do if you accidentally pushed sensitive data to a public GitHub
49+
repository? Let's take a real scenario that recently happened::
50+
51+
52+
You finish developing a feature branch, make a pull request against the
53+
master branch, get that request approved and merge the feature branch into
54+
master.
55+
56+
Two days go by and you finally realize the output of one of your unit
57+
tests accidentally made it into the pull request that merged into master.
58+
That output contained clinical research data from one of the Kids First
59+
studies 😳.
60+
61+
62+
Checklist
63+
^^^^^^^^^
64+
65+
1. **Notify Manager/Team**
66+
Let the appropriate people know as soon as possible.
67+
68+
Email or send a message on Slack to Allison Heath
69+
([email protected]) or your manager. Include the Kids First Technical
70+
Project Manager, Bailey Farrow ([email protected]), on the message.
71+
72+
If you are not the owner of the repository where the sensitive data
73+
was pushed, then also let the owner know. You will need their help to
74+
do the clean up.
75+
76+
2. **Notify Consumers and Contributors**
77+
78+
Work with the repository owner to notify anyone who might have cloned or
79+
forked the repository. Let them know that they should
80+
refrain from pulling from or pushing anything to the repository on GitHub
81+
until further notice is given. Later on you'll need to notify them on how
82+
to proceed with use of the code or development.
83+
84+
3. **Make the GitHub repository Private**
85+
86+
Ask the owner of the repository to make it private or do it yourself
87+
if you have privileges.
88+
89+
4. **Notify GitHub Support ([email protected])**
90+
91+
If the sensitive data was part of any pull requests, you will need to
92+
contact GitHub Support to help remove all traces of the data. You
93+
should do this first, **BEFORE** following GitHub's steps to clean up your
94+
repo history (step 5 of this list).
95+
96+
Example Email::
97+
98+
Hello,
99+
100+
I am emailing to ask for help in removing sensitive data
101+
that was pushed to a public GitHub repository. I need GitHub's help
102+
to remove cached views and references to the sensitive data in pull
103+
requests on GitHub.
104+
105+
Details:
106+
107+
Repository: <link to repo on GitHub>
108+
Files to Remove:
109+
- <URL to files in GitHub>
110+
Pull Request where files were introduced: <link to PR on GitHub>
111+
112+
<Any other pertinent information>
113+
114+
Thank you very much in advance!
115+
116+
117+
5. **Clean up Repository History**
118+
119+
**Do not begin this step until** GitHub Support confirms they have
120+
deleted the affected pull requests.
121+
122+
Follow GitHub's recommended steps `here <https://help.github.com/en/articles/removing-sensitive-data-from-a-repository>`_
123+
to remove the sensitive data from your repository's history.
124+
125+
GitHub recommends using the open source repo cleaner tool `BFG`, which
126+
is simple, fast, and works well.
127+
128+
In the last step of the clean up where you need to push the clean
129+
history to the remote, you may need to have the repository owner
130+
temporarily lift the force push protection on the master branch.
131+
132+
6. **Notify People Cleanup is Complete**
133+
Notify people from steps 1 and 2 that the clean up is complete.
134+
135+
For people in step 2, let them know the repository's history has been
136+
cleaned up/overwritten, and ask them to delete any clones or forks they have
137+
and pull down new ones.
138+
139+
7. **Fill out an Incident Report**
140+
141+
TODO - Instructions and link to incident report template
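
The history clean-up in step 5 can be sketched as a short sequence of commands. The helper below only builds the command list so it can be reviewed before anything is run. The function name ``bfg_cleanup_commands`` and the default ``bfg.jar`` path are assumptions for illustration; the underlying steps (mirror clone, ``--delete-files``, reflog expiry, garbage collection, push) follow the BFG usage instructions, so check them against the current BFG and GitHub documentation before relying on them.

```python
def bfg_cleanup_commands(repo_url, filenames, bfg_jar="bfg.jar"):
    """Build (but do not run) the commands for a BFG history clean-up.

    bfg_jar is an assumed local path to the downloaded BFG jar file.
    """
    # A mirror clone of ".../repo" or ".../repo.git" lands in "repo.git".
    mirror_dir = repo_url.rstrip("/").split("/")[-1]
    if not mirror_dir.endswith(".git"):
        mirror_dir += ".git"

    commands = [["git", "clone", "--mirror", repo_url]]
    for name in filenames:
        # BFG matches on file name, not on a path within the repository.
        commands.append(
            ["java", "-jar", bfg_jar, "--delete-files", name, mirror_dir]
        )
    # Expire old references and physically drop the unwanted objects.
    commands.append(
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"]
    )
    commands.append(
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"]
    )
    # Pushing from a mirror clone overwrites all refs on the remote.
    commands.append(["git", "-C", mirror_dir, "push"])
    return commands
```

Each entry can be passed to ``subprocess.run(cmd, check=True)`` once it has been reviewed. As noted in step 5, wait for confirmation from GitHub Support first, and the final push may fail until force push protection on the master branch is temporarily lifted.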
