|
| 1 | +Data Spills |
| 2 | +=========== |
| 3 | + |
| 4 | +A data spill is the accidental or deliberate exposure of information into an |
| 5 | +uncontrolled or unauthorised environment, or to persons without a need-to-know. |
| 6 | + |
| 7 | +There are many examples of data spills, but for the purposes of this guide, we will focus |
| 8 | +on the exposure of sensitive clinical research data in a public GitHub repository |
| 9 | +and what to do if this happens. |
| 10 | + |
| 11 | +What is Sensitive Data? |
| 12 | +----------------------- |
| 13 | +Even though the Kids First project does NOT currently include PHI |
| 14 | +(protected health information) data, it does still include data that is |
| 15 | +considered sensitive and cannot be exposed to the public. |
| 16 | + |
| 17 | +Sensitive data in the Kids First project is any clinical research data |
| 18 | +that has not been approved by the Kids First Data Coordinating Center (DCC) |
| 19 | +for public release. |
| 20 | + |
| 21 | +Examples of Kids First sensitive data include but are not limited to: |
| 22 | + |
| 23 | + - A participant's demographics such as gender, race, ethnicity |
| 24 | + - A participant's biospecimen info such as tissue type, anatomical site |
| 25 | + - A participant's diagnosis info such as the diagnosis name |
| 26 | + - A participant's genomic data such as DNA sequencing files |
| 27 | + |
| 28 | +*Note - a Participant is a person participating in a Kids First research study* |
| 29 | + |
| 30 | + |
| 31 | +What is NOT Sensitive Data? |
| 32 | +--------------------------- |
| 33 | + |
| 34 | +Any Kids First clinical research data that has been approved by the Kids First DCC for public release. |
| 35 | + |
| 36 | +Non-PHI identifiers such as Kids First IDs (e.g. PT_00001111) and |
| 37 | +IDs in the raw clinical data provided by Kids First researchers |
| 38 | +(e.g. PID0001, SS-H02). |
| 39 | + |
| 40 | +One caveat is that you can have sensitive data inside a **private Kids First |
| 41 | +GitHub repository**. Since the repository is private and within the Kids First |
| 42 | +GitHub organization, it is in a controlled environment where exposure is |
| 43 | +limited to appropriate persons. |
| 44 | + |
| 45 | +Manage a Data Spill |
| 46 | +------------------- |
| 47 | + |
| 48 | +What should you do if you accidentally pushed sensitive data to a public GitHub |
| 49 | +repository? Let's take a real scenario that recently happened:: |
| 50 | + |
| 51 | + |
| 52 | + You finish developing a feature branch, make a pull request against the |
| 53 | + master branch, get that request approved and merge the feature branch into |
| 54 | + master. |
| 55 | + |
| 56 | + Two days go by and you finally realize the output of one of your unit |
| 57 | + tests accidentally made it into the pull request that merged into master. |
| 58 | + That output contained clinical research data from one of the Kids First |
| 59 | + studies 😳. |
| 60 | + |
| 61 | + |
| 62 | +Checklist |
| 63 | +^^^^^^^^^ |
| 64 | + |
| 65 | +1. **Notify Manager/Team** |
| 66 | + Let the appropriate people know as soon as possible. |
| 67 | + |
| 68 | + Email or send a message on Slack to Allison Heath |
| 69 | + ([email protected]) or your manager. Include the Kids First Technical |
| 70 | + Project Manager, Bailey Farrow ([email protected]), on the message. |
| 71 | + |
| 72 | + If you are not the owner of the repository where the sensitive data |
| 73 | + was pushed, then also let the owner know. You will need their help to |
| 74 | + do the cleanup. |
| 75 | + |
| 76 | +2. **Notify Consumers and Contributors** |
| 77 | + |
| 78 | + Work with the repository owner to notify anyone who might have cloned or |
| 79 | + forked the repository. Ask them not to pull from or push to the |
| 80 | + repository on GitHub until further notice. Once the cleanup is done, |
| 81 | + you will need to tell them how to resume using or developing |
| 82 | + the code. |
| 83 | + |
| 84 | +3. **Make the GitHub repository Private** |
| 85 | + |
| 86 | + Ask the owner of the repository to make it private, or do it yourself |
| 87 | + if you have the necessary privileges. |
| 88 | + |
| 89 | +4. **Notify GitHub Support ([email protected])** |
| 90 | + |
| 91 | + If the sensitive data was part of any pull requests, you will need to |
| 92 | + contact GitHub Support to help remove all traces of the data. Do |
| 93 | + this **BEFORE** following GitHub's steps to clean up your |
| 94 | + repo history (step 5 of this list). |
| 95 | + |
| 96 | + Example Email:: |
| 97 | + |
| 98 | + Hello, |
| 99 | + |
| 100 | + I am emailing to ask for help in removing sensitive data |
| 101 | + that was pushed to a public GitHub repository. I need GitHub's help |
| 102 | + to remove cached views and references to the sensitive data in pull |
| 103 | + requests on GitHub. |
| 104 | + |
| 105 | + Details: |
| 106 | + |
| 107 | + Repository: <link to repo on GitHub> |
| 108 | + Files to Remove: |
| 109 | + - <URL to files in GitHub> |
| 110 | + Pull Request where files were introduced: <link to PR on GitHub> |
| 111 | + |
| 112 | + <Any other pertinent information> |
| 113 | + |
| 114 | + Thank you very much in advance! |
| 115 | + |
| 116 | + |
| 117 | +5. **Clean up Repository History** |
| 118 | + |
| 119 | + **Do not begin this step until** GitHub Support confirms they have |
| 120 | + deleted the affected pull requests. |
| 121 | + |
| 122 | + Follow GitHub's recommended steps `here <https://help.github.com/en/articles/removing-sensitive-data-from-a-repository>`_ |
| 123 | + to remove the sensitive data from your repository's history. |
| 124 | + |
| 125 | + GitHub recommends using the open-source repository cleaner `BFG`, which |
| 126 | + is simple and fast. |
| 127 | + |
| 128 | + The last step of the cleanup pushes the rewritten history to the |
| 129 | + remote. For this you may need the repository owner to temporarily |
| 130 | + lift the force-push protection on the master branch. |
| 131 | + |
| 132 | +6. **Notify People Cleanup is Complete** |
| 133 | + Notify people from steps 1 and 2 that the cleanup is complete. |
| 134 | + |
| 135 | + For people in step 2, let them know the repository's history has been |
| 136 | + rewritten, and ask them to delete any clones or forks they have |
| 137 | + and make fresh ones. |
| 138 | + |
| 139 | +7. **Fill out an Incident Report** |
| 140 | + |
| 141 | + TODO - Instructions and link to incident report template |
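
The history cleanup in step 5 can be sketched as a short command plan. This is a minimal sketch, not the Kids First procedure: the repository URL, the file name `test_output.tsv`, and the `bfg.jar` path are hypothetical placeholders, and the command sequence reflects BFG's commonly documented usage. GitHub's linked article remains the authoritative reference. The function only builds and prints the commands so they can be reviewed before anyone runs them.

```python
import shlex
from typing import List


def bfg_cleanup_commands(repo_url: str, filename: str,
                         bfg_jar: str = "bfg.jar") -> List[List[str]]:
    """Return, in order, the commands for purging `filename` from history.

    Assumptions (not from the guide): BFG is available as a local jar,
    and the sensitive file has already been deleted from the latest
    commit, since BFG leaves the current HEAD commit untouched by default.
    """
    # Name the mirror directory explicitly, e.g. "example-repo.git".
    mirror_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not mirror_dir.endswith(".git"):
        mirror_dir += ".git"
    return [
        # Fresh bare mirror of the remote, including all refs.
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        # Rewrite history; --delete-files matches on file name, not path.
        ["java", "-jar", bfg_jar, "--delete-files", filename, mirror_dir],
        # Expire reflogs and prune the now-unreferenced objects.
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
        # Push the rewritten refs; this is the push that may be blocked
        # by force-push protection on the master branch.
        ["git", "-C", mirror_dir, "push"],
    ]


if __name__ == "__main__":
    # Hypothetical repository and file name, for illustration only.
    plan = bfg_cleanup_commands(
        "https://github.com/example-org/example-repo.git", "test_output.tsv"
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Printing the plan rather than executing it is deliberate: rewriting history is destructive, so each command is worth reading and running by hand.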