Check that DBFS doesn't contain anything which is non-standard #96

TheRealJimShady · 2024-04-12T10:20:20Z

Unless I'm mistaken, a privileged user could widen access to a table managed under Unity Catalog (UC) by writing it to an arbitrary location in DBFS. For example:
spark.table("mycatalog.myschema.mydata").write.format("delta").save("dbfs:/users/jonsmith/mydata")
Would it be possible to extend the DBFS checks to compare the structure as it is discovered against the following?

The known initial structure of DBFS
An additional user-defined exclusion list.
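To make the request concrete, here is a minimal sketch of that comparison. It is not code from the tool: the function name `find_unexpected_paths` and the example paths are hypothetical, and it assumes the discovered paths have already been collected as plain strings.

```python
from typing import Iterable, List


def find_unexpected_paths(
    discovered: Iterable[str],
    baseline: Iterable[str],
    exclusions: Iterable[str] = (),
) -> List[str]:
    """Return discovered paths covered by neither the baseline nor the exclusion list.

    A path counts as covered when it equals an allowed entry or sits beneath one.
    All names here are illustrative assumptions, not part of any existing API.
    """
    # Normalise trailing slashes so "dbfs:/tmp/" and "dbfs:/tmp" compare equal.
    allowed = [p.rstrip("/") for p in list(baseline) + list(exclusions)]

    def is_allowed(path: str) -> bool:
        path = path.rstrip("/")
        return any(path == a or path.startswith(a + "/") for a in allowed)

    return sorted(p for p in discovered if not is_allowed(p))


if __name__ == "__main__":
    unexpected = find_unexpected_paths(
        discovered=["dbfs:/tmp/", "dbfs:/users/jonsmith/mydata", "dbfs:/mnt/raw/x"],
        baseline=["dbfs:/tmp"],          # assumed initial structure
        exclusions=["dbfs:/mnt/raw"],    # assumed user-defined exclusions
    )
    print(unexpected)
```

Anything the function returns would be a candidate finding for the check to report.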

ramdaskmdb · 2024-04-19T04:19:27Z

We could scan the entire dbfs:/ folder structure and store the number of objects detected per run. This would work well for a reasonably clean DBFS folder structure. However, if DBFS is used extensively during experimentation for storing objects, checkpoints, uploaded files, Hive metastore managed tables, etc., the scan could take a very long time, depending on the number of objects. We could limit the scan to n objects: if there are more than n, cap the count and don't scan further. An increase in the number of objects between runs could then be flagged.
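A rough sketch of the capped count, under stated assumptions: `list_dir` is a hypothetical callable that returns `(path, is_dir)` pairs for one directory, and `count_objects` and `growth_flagged` are illustrative names. In a notebook, `list_dir` could presumably wrap `dbutils.fs.ls`, but that wiring is untested here.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical lister type: given a directory, return (path, is_dir) pairs.
# Assumption: in a Databricks notebook this might be
#   lambda p: [(f.path, f.isDir()) for f in dbutils.fs.ls(p)]
Lister = Callable[[str], Iterable[Tuple[str, bool]]]


def count_objects(root: str, list_dir: Lister, max_objects: int) -> Tuple[int, bool]:
    """Count objects under root breadth-first, stopping once the cap is exceeded.

    Returns (count, capped). When capped is True the real total is larger
    than max_objects and the returned count equals max_objects.
    """
    count = 0
    queue = deque([root])
    while queue:
        for path, is_dir in list_dir(queue.popleft()):
            count += 1
            if count > max_objects:
                return max_objects, True
            if is_dir:
                queue.append(path)
    return count, False


def growth_flagged(previous: Optional[int], current: int) -> bool:
    """Flag a run whose object count is higher than the previous run's."""
    return previous is not None and current > previous
```

The count from each run would need to be persisted somewhere so the next run can compare against it; where that state lives is left open.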

TheRealJimShady · 2024-04-19T07:25:12Z

Thanks for your response. I can see two approaches to implementing this:

An exhaustive traversal of DBFS Root storage to build a complete map of the discovered structure which can be compared with that of the approved DBFS structure.
A fail-fast approach which recursively takes each path in DBFS and looks for an equivalent in the approved structure. If no equivalent exists, the check exits and reports that there are extraneous paths in DBFS.

It would be possible to implement these as different modes which the user could choose from.
Thoughts?
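The two modes could be sketched roughly as follows. This is only an illustration: `walk`, `check_dbfs`, the `mode` values, and the `list_dir` callable (returning `(path, is_dir)` pairs for one directory) are all hypothetical names, and the approved structure is assumed to be a flat set of exact paths.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# Hypothetical lister: given a directory, return (path, is_dir) pairs.
Lister = Callable[[str], Iterable[Tuple[str, bool]]]


def walk(root: str, list_dir: Lister) -> Iterator[str]:
    """Lazily yield every path below root, depth-first."""
    stack = [root]
    while stack:
        for path, is_dir in list_dir(stack.pop()):
            yield path
            if is_dir:
                stack.append(path)


def check_dbfs(
    root: str, list_dir: Lister, approved: Set[str], mode: str = "exhaustive"
) -> List[str]:
    """Return extraneous paths: all of them, or only the first one found.

    "exhaustive" traverses everything and reports every unapproved path.
    "fail_fast" stops at the first unapproved path, so it may list far
    fewer directories on a large tree.
    """
    extraneous = (p for p in walk(root, list_dir) if p not in approved)
    if mode == "fail_fast":
        first = next(extraneous, None)
        return [] if first is None else [first]
    if mode == "exhaustive":
        return sorted(extraneous)
    raise ValueError(f"unknown mode: {mode!r}")
```

Because `walk` is a generator, fail-fast mode stops issuing directory listings as soon as one unapproved path turns up, which is where the time saving would come from.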

ramdaskmdb · 2024-04-22T14:57:27Z

By default, dbfs:/ on a new workspace will only have the dbfs:/tmp folder. As long as DBFS is relatively clean and small, the traversal may be fast. Even if state is maintained between runs, a large tree may take too long to traverse each time. Let me run a few tests to see how long it may take.

madcole · 2024-05-01T14:04:18Z

Hi @ramdaskmdb, any updates on this? It's still a blocker for the team.
