Check that DBFS doesn't contain anything which is non-standard #96
We could scan the entire dbfs:/ folder structure and store the number of objects detected per run. This would work well for a relatively clean DBFS folder structure. However, if DBFS is used extensively during experimentation for storing objects, checkpoints, uploaded files, Hive metastore managed tables, etc., the scanner could take a very long time depending on the number of objects. We could probably limit the scan to n objects: if more than n objects are found, cap it and don't scan further. An increase in the number of objects between runs could then be flagged. |
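A minimal sketch of the capped scan described above, assuming it runs in a Databricks notebook where dbutils is available; the cap value and the dbfs:/ starting point are illustrative:

# Hypothetical sketch: count objects under dbfs:/, stopping at a cap of n objects.
MAX_OBJECTS = 100_000  # illustrative value for "n"

def count_objects(path="dbfs:/", cap=MAX_OBJECTS):
    count, stack = 0, [path]
    while stack and count < cap:
        for entry in dbutils.fs.ls(stack.pop()):  # dbutils is provided by the notebook runtime
            count += 1
            if count >= cap:
                return count, True  # capped: don't scan further
            if entry.isDir():
                stack.append(entry.path)
    return count, False

count, capped = count_objects()
# Persisting `count` per run would let an increase between runs be flagged.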
Thanks for your response, I can see two approaches to implementing this:
1. An exhaustive traversal of DBFS root storage to build a complete map of the discovered structure, which can be compared with that of the approved DBFS structure.
2. A fail-fast approach which recursively takes each path in DBFS and looks for an equivalent in the approved structure; if it doesn't exist, the check exits and reports that there are extraneous paths in DBFS.
It would be possible to implement these as different modes which the user could choose from.
Thoughts? |
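A minimal sketch of the fail-fast mode (approach 2), again assuming a Databricks notebook with dbutils; the APPROVED set is a hypothetical placeholder for wherever the tool would load its approved DBFS layout from:

APPROVED = {"dbfs:/tmp/", "dbfs:/user/hive/warehouse/"}  # hypothetical approved structure

def first_extraneous_path(path="dbfs:/"):
    for entry in dbutils.fs.ls(path):
        if any(entry.path.startswith(ok) for ok in APPROVED):
            continue  # fully inside an approved subtree; nothing further to check
        if entry.isDir() and any(ok.startswith(entry.path) for ok in APPROVED):
            hit = first_extraneous_path(entry.path)  # ancestor of an approved path: descend
            if hit:
                return hit
        else:
            return entry.path  # fail fast: no equivalent in the approved structure
    return None

extraneous = first_extraneous_path()
if extraneous:
    print(f"Extraneous DBFS path found: {extraneous}")

Because it stops at the first unexpected path, this mode avoids the full traversal cost on a dirty workspace, at the price of reporting only one offending path per run.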
By default, dbfs:/ on a new workspace will only have the dbfs:/tmp folder. As long as DBFS is relatively clean and small, the traversal may be fast. Even if state is maintained between runs, if the tree is large it may take too long to run this each time. Let me run a few tests to see how long it may take.
|
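For reference, the timing tests mentioned above could look something like this in a notebook (a rough sketch reusing the hypothetical count_objects function from the earlier comment):

import time

start = time.perf_counter()
count, capped = count_objects()  # the capped scan sketched above
elapsed = time.perf_counter() - start
print(f"Scanned {count} objects in {elapsed:.1f}s (capped: {capped})")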
Hi @ramdaskmdb, any updates on this? It's still a blocker for the team. |
Unless I'm mistaken, it would be possible for a privileged user to expand the accessibility of a table managed under UC by writing it to an arbitrary location in DBFS. This could be achieved in the following way:
spark.table("mycatalog.myschema.mydata").write.format("delta").save("dbfs:/users/jonsmith/mydata")
Would it be possible to expand the DBFS checks to include a comparison between the structure as it is discovered and the approved DBFS structure?