Chalk Vision #15

viega · 2023-10-02T14:25:16Z

Chalk Vision

Introduction

After launching Chalk last week, we needed to pause for a bit to tie up some loose ends, catch our breath, and turn toward planning. We’re still going to take a few days to come up with our own short-term priorities.

But in the meantime, we wanted to communicate more about where we’re planning to take Chalk… eventually. Our goal here is to help people better understand the principles we value and are building toward, and to provide a basis for conversation with anyone who wants to make any sort of contribution, whether it be code or just ideas and discourse.

There's far more implied here than we'll get to quickly. We probably won't even get detailed issues for items into the system up front; we'll start with things we're considering for short-term work. But if you are interested in one of these areas, do let us know, as we're always looking for more data from the user base.

Nothing in this document should be taken to indicate relative priority or timeline. We’re not working on all of this stuff right now. None of these categories are inherently prioritized in any way relative to each other.

We’ll definitely be communicating soon about shorter term goals too; just expect those priorities to map into these broad objectives.

Overview

If I were to sum up our guiding design principles here, it would be:

Make the common needs trivially easy, and make uncommon needs pretty easy.

Plus, we obviously want to have the comprehensive functionality people would put to use. So we want to be great at metadata collection and reporting in all environments.

As a result, you'll find below that we've grouped everything into seven broad objectives:

More metadata collection
Very flexible runtime data collection
Easy deployment everywhere
Better platform coverage
Strong Documentation
Command Line Usability
Healthy Core

We hope we're doing well already toward all of these objectives, but there's plenty more we would like to do!

Broad Objective 1: More metadata collection

With our focus on trying to tie together data through an app's lifecycle, we certainly want to be on a never-ending treadmill to help collect data from important technologies. And, we want to make it easy for people to do the integrations they need for their own uses (and ideally contribute them back to the world).

In production, for instance, we do have a reasonable start for local collection and AWS metadata collection, but there's more to do there.

And, there is plenty we aren't doing at all, yet, such as cloud metadata for the other major public clouds. We don't do integration with k8s or cloud provider container runtimes.

Not to mention, we haven't paid enough attention to what other tools we should be integrating with in CI/CD for information. For instance, deep integrations w/ package management ecosystems seem likely to lead to some large benefits for users on a lot of fronts.

Additionally, we need to do a little more to facilitate people adding to our data collection facilities if they don't know Nim-- the API is designed so that you could link in plugins, but we would need a little bit more plumbing to actually leverage external plugins.

Additionally, there is metadata people would like to use Chalk to orchestrate collecting during CI/CD that we don't support today, because it would slow build times too much. We should consider a general ability to orchestrate out-of-band collection of metadata.

For instance, with a lot of security analysis tools some companies want to run on every build, like static analysis, we might want to support this via Chalk to facilitate the goal of tying everything together in as simple a way as possible.

Broad Objective 2: Very flexible runtime data collection

In our quest for better observability of software, Chalk currently allows very lightweight, limited data collection in production, at process startup and at a regular interval. However, there are some challenges there:

1. Noise. Current heartbeat reports send the same metadata every heartbeat. This needs to change; for instance, @nettrino has proposed an invariant-based system we are considering, where you set up keys to report only if their values change.
2. Missing data. Some data that can be worth collecting about an app or its environment tends to be fairly transient, for instance inbound network connections. That should be collectable.
3. Timeliness. For some data, it would be better to get it quickly as it is produced, where periodic reporting isn't as appropriate. For instance, we heard from serverless developers how big a pain it is to get debug logs: long waits, then lots of grepping.
4. Post-deployment querying / configuring. A lot of questions people have around their apps can be best served with lightweight querying. Look at the success of OSQuery: despite being fairly heavyweight, it works pretty well, with good controls in place to manage performance requirements. Still, it's not appropriate at the application level, especially in containerized or serverless environments.
5. Deeper introspection into the app. The more observability people get, the more value they see from it. At the system level, eBPF has shown the value of deep (but safe) introspection, but it is not viable in environments like serverless, Fargate, or really any container-based runtimes based on true virtualization. Taking Log4J as an example, everyone saw they had it on images all over the place, but what they really wanted to know was, "is it used?" That can be cheap and easy to answer at the application level (if you've also solved item 4 above).
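
To make the "Noise" item a bit more concrete, here is a minimal sketch of change-only reporting: each heartbeat includes a key only when its value differs from what was last sent. This is not Chalk's implementation or the actual proposal; the class name, the always_report parameter, and the sample keys are all assumptions made up for illustration.

```python
import copy
from typing import Any, Dict, Iterable


class ChangeOnlyReporter:
    """Remembers the last value seen per key and emits only what changed.

    Hypothetical helper for illustration; not part of Chalk.
    """

    def __init__(self, always_report: Iterable[str] = ()) -> None:
        self._last: Dict[str, Any] = {}
        # Keys such as a timestamp that should go out on every heartbeat.
        self._always = set(always_report)

    def filter(self, snapshot: Dict[str, Any]) -> Dict[str, Any]:
        """Return the subset of `snapshot` that is new or has changed."""
        report: Dict[str, Any] = {}
        for key, value in snapshot.items():
            changed = key not in self._last or self._last[key] != value
            if changed or key in self._always:
                report[key] = value
            # Deep-copy so later in-place mutation by the caller is still detected.
            self._last[key] = copy.deepcopy(value)
        return report


reporter = ChangeOnlyReporter(always_report=["timestamp"])
first = reporter.filter({"timestamp": 1, "hostname": "web-1", "open_ports": [443]})
second = reporter.filter({"timestamp": 2, "hostname": "web-1", "open_ports": [443, 8080]})
print(first)   # everything is new on the first heartbeat
print(second)  # only the timestamp and the changed port list
```

A real design would also need to decide how a receiver learns that a key disappeared, and whether to resend a full snapshot periodically in case a report is lost.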

Of course, in production infrastructure, performance and cost are paramount for all the items above. "Do no harm" should definitely be our mantra, with good controls to give people confidence on those issues.

For the above, do note that we've been working on item 5 as a separate project for quite a while (longer than Chalk), and it is pretty far along. We need to make sure everything comes together well there.

Broad Objective 3: Easy Deployment Everywhere

We definitely strive for friction-free deployment wherever possible; that goal led to our approach for handling docker builds, and to our configuration strategy. Still, there is a lot more that should be done, especially in making Chalk work as well as possible with a wide variety of technology stacks, not just docker builds.

There are multiple dimensions to this-- we should make it trivially easy to deploy across organizations using built-in CI/CD tooling at GitHub/GitLab, not to mention other popular CI/CD tools.

But we should also be thinking about how to make it easy to deploy in environments leveraging composition tools like docker bake or docker compose. Currently, without us integrating with those pieces, deploying effectively is probably harder, because it would take more work than should be needed to tie data together.

Similarly, we aren't doing anything to support other container runtimes, which complicates deployments using them. We also don't cover all cases for the docker deployment we do cover, as a few scenarios will disable container wrapping (like multi-arch builds, which are pretty important in our view) and some unsupported features (like Heredocs) will cause us to fall back to a regular invocation of Docker.

Broad Objective 4: Better Platform Coverage

Today, we've focused on the platforms that are most common to the pain points we've heard in our initial investigations, namely production systems that are predominantly Linux-based backend workloads, running on amd64 or arm64 platforms. We've also found a "quick and dirty" way to support macOS, but that is intended more to facilitate development, since many of us use the OS.

Certainly, we know from the enthusiastic responses and asks we've gotten as we've previewed Chalk, that people would deeply benefit from having the same kind of observability in plenty of other environments. That includes both basic marking (and extraction) on other platforms, but also includes the ability to collect environmental data at runtime, where doing so performantly and safely is feasible.

For instance:

While the current platform can be made to work with serverless, we would like to do a lot more to ensure seamless integration with a variety of environments.
We're still seeing some interest in Windows support, both for being able to handle PE binaries (particularly for .NET), and for being able to do runtime chalk stuff in those environments.
We've heard from developers that there would be a lot of value in being able to bring chalk capabilities into the browser. We have done some work to figure out how we'd approach marking and extraction, but haven't really done much toward this yet.
Even if just for the technical challenge, we'd love first class support for OS X; we have a solution that works well enough, but it should properly mark the binary; instead, it (mostly transparently) keeps the mark and the binary in sync. This is because OS X is very particular around binary integrity, which we appreciate. We should be playing by the OS's rules to do it the "right" way.
We've had people express interest in mobile architectures and runtime environments that aren't "always on".

One challenge in some of these scenarios is that the architecture pretty deeply embeds a "posix" assumption. Perhaps that's not too bad, as Windows is the big outlier these days, and WSL2 is probably a fairly achievable target without an ungainly amount of work.

For this objective, we currently have a preference for production usage, over environments that are more user-centric.

Broad Objective 5: Strong Documentation

We've definitely written a lot of documentation so far, and think it's important for docs to be easy to find and understandable. We also think there need to be both sufficient reference-style docs, and sufficient task-oriented docs (tutorials and how-tos).

And, we want to minimize outdated documentation. In the code, we have tried to make that task more manageable for user docs by keeping docs inline next to the config options and flags.

Unfortunately, as I write this, we're completely lacking in developer documentation. Yet, Chalk is actually a fairly sizable project, to the point where code plus comments is nowhere near enough to help make it easy for people to contribute.

The people who contribute will undoubtedly want to understand the "lay of the land" so to speak, meaning that they'll want to have a reasonable understanding of the architecture, and how they would plug into it.

Some reference documentation would be valuable, but it will be more important for people to have task-oriented how-to guides to tackle the enhancements they're most likely to want to make.

At the very least, we should probably have clear guidance on:

How to support adding new metadata sources of various kinds. Beyond the logistics of it in Nim and/or con4m, we need to discuss the non-Nim API. Additionally, we need to help people understand how to think about the existing metadata keys, vs. new ones.
The approach / processes for integrating third party SAST / SBOM tools.
How to add support for chalking new types of artifacts, if people want to help tackle things we haven't gotten to yet, like Windows binaries, in-browser code, etc.
How to add new output sinks, for instance, for other cloud object storage, databases, etc.
An explainer for the attestation code, and a sense of where to start if looking to extend (either by adding integration to other external secret managers, or by moving the Sigstore integration internally).

All of the above will need to be supported by an overview on how to work with the config files.

Broad Objective 6: Command line usability

For configuration

Ideally, we wouldn't want to directly expose our configuration language to anyone except power users (the config language is not-so-secret sauce for making the uncommon things easy).

And we've done a lot of work toward hiding it. For instance, the chalk load command can take a URL. But, off-the-shelf configs need to be tweaked... people need parameters. And they shouldn't have to drop into a text editor to provide them.

People should be able to configure based on the problem they're going to solve. If they want to tackle something like SBOM implementation, they shouldn't have to navigate a lot of configuration that's irrelevant to the problem. It should be easy to find the configuration UI they need, whatever it is.

Any UI we provide here needs to be not only incredibly easy to use, it needs to be easy to keep in sync w/ any changes to chalk over time.

Eventually, 'wizard'-like interfaces or other similar things might be great, but just being able to reliably configure stuff easily from the command line would need to come first!

For help

We've worked hard to make sure we can provide consistent help in-command and on the web. But the UX for the in-app help did NOT get finished.

The framework is all there to make help easy to search, and easy to browse individual help items. The help's all available. However, we have not finished the presentation layer (everything looks ugly with lots of formatting problems, and while full text search works, it can produce an unmanageable mess of results at the moment).

Beyond that, we should also be considering how much value people would get if we had better "navigation" primitives, both for the help system, and possibly for operating the command in general (e.g., some basic TUI options, probably via wrapping notcurses).

Other commands

We definitely want to improve the user experience for any common command-line operations. One example is using the chalk extract command to view data about artifacts in real time. Based on early feedback, we learned that most of the time it's better to only show small summary reports for command line operation; but when people do want the full reports, grepping a log and piping it to jq isn't a great, easy-to-use experience, at least for me personally.

We could imagine a chalk log command that focuses on just removing the cognitive burden, a full TUI interface, both, or none of the above, based on feedback.
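
As a rough illustration of the kind of cognitive burden a log-focused command could remove, here is a small sketch that tallies reports in a JSON-lines log by one field, which is roughly what people do today with grep and jq. It assumes a log with one JSON object per line; the "operation" key and the sample records are placeholders, not Chalk's actual report schema, and this is not a design for any Chalk command.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def summarize_reports(lines: Iterable[str], key: str = "operation") -> Dict[str, int]:
    """Count JSON-lines reports by the value of `key` (a placeholder field name)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between reports
        try:
            report = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if not isinstance(report, dict):
            counts["<unparseable>"] += 1
            continue
        counts[str(report.get(key, "<missing>"))] += 1
    return dict(counts)


# Made-up sample log; real report contents will differ.
sample_log = """
{"operation": "insert", "path": "/bin/app"}
{"operation": "extract", "path": "/bin/app"}
{"operation": "extract", "path": "/bin/other"}
not json at all
{"path": "/bin/unknown"}
"""

summary = summarize_reports(sample_log.splitlines())
for name, count in sorted(summary.items()):
    print(f"{name}: {count}")
```

Even something this small hints at the trade-off mentioned above: a short summary by default, with the full reports still available when someone asks for them.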

Broad Objective 7: Healthy Core

We're committed to making sure our core enables developers to support our functional and non-functional requirements, in a way that is pretty easy for them.

We have built the Chalk core targeting it being:

Highly usable, yet
Very powerful / flexible
Robust
Performant

Still, there are many ways in which we can be advancing the above goals. First, key to giving us the flexibility / power while helping make it easy to deliver on usability, is our underlying configuration language con4m.

While con4m is more than good enough for our initial Chalk release, it is nowhere near what we would consider good enough for everyone else, as it mainly got enough attention to "jump the bar" for where Chalk is today, and no more.

We're not in a hurry to get con4m ready for projects outside our own, but we are committed to keeping it in good shape for chalk, and to designing changes in a way that could be used by other projects from day one.

There are lots of warts we should smooth over (many of them intentional choices due to timeline), like syntax that should work better. And we focused on robustness over performance, since performance was clearly good enough for an initial release.

Just in the configuration system, there are plenty of big wins that wouldn't take too much work. For instance, most of the embedded configuration can be pre-computed with a little bit of refactoring, which would eliminate most of an (already small) startup cost.

But outside of the configuration, there are plenty such items.

For instance, there are places that, due to expediency, we used dependencies that weren't quite ideal for the use case. For instance, when doing signing of containers and other artifacts, we are currently dependent on Sigstore's cosign, which is a great tool, but is itself large and may need to be downloaded often, depending on the deployment, which isn't great when build times are crucial. Yet! Most of the core primitives to implement what we need from Sigstore are already available to us via the underlying cryptography library. We just need to implement enough pieces on top of it to provide what we need from Sigstore without being dependent on another library.

Another example where our (few) outside dependencies aren't good enough is with our Zip file handling. It currently isn't easy to do what we need to do without heavy use of tmp files, which is an issue for disk usage, among other things. We could bring in a different dependency to balance our needs better, though some work would be required, of course.
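
For readers unfamiliar with the tmp-file issue, the sketch below shows the general idea of rewriting a Zip archive entirely in memory, using Python's standard zipfile module rather than anything from Chalk's Nim code. The function name and the "chalk.json" entry name are assumptions for illustration only.

```python
import io
import zipfile


def add_entry_in_memory(archive: bytes, name: str, payload: bytes) -> bytes:
    """Return a copy of `archive` with `name` added or replaced, never touching disk."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == name:
                continue  # drop the stale copy; the new one is written below
            # Passing the ZipInfo keeps each entry's name, timestamp and compression type.
            dst.writestr(info, src.read(info.filename))
        dst.writestr(name, payload)
    # The central directory is written when `dst` closes, so read the bytes afterwards.
    return out.getvalue()


# Build a tiny archive in memory, then replace one entry in it.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("a.txt", b"hello")
    z.writestr("chalk.json", b"old")

updated = add_entry_in_memory(original.getvalue(), "chalk.json", b"new")
with zipfile.ZipFile(io.BytesIO(updated)) as z:
    names = sorted(z.namelist())
    mark = z.read("chalk.json")
print(names, mark)
```

Note that this trades disk usage for memory, since the whole archive is held in RAM at once; for large artifacts a streaming approach would strike a different balance.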

The Zip issue overlaps with another potential concern that can impact robustness... memory footprint. In designing the initial release, we didn't worry too much about it-- virtual memory is pretty effective! But if we want to do better for more constrained environments, we can do a lot better here (for instance, even ensuring the large embedded documentation corpus lives in static memory would be a decent win if memory becomes an issue).

Conclusion

Sure, we're excited about what we've built, and we love the use and the feedback we're already getting. But, we've got a lot we want to do here. We're planning to be actively developing Chalk for a very long time.

We're going to focus on the above vision. Today, we'll be populating the issues with the seven objectives above, and as we start working on things, we'll have any feature tickets link back to those objectives.

We'd certainly love contributions, but feedback and discussion about what Chalk does, what it could do, and what it should do, is also tremendously valuable to help us prioritize and evolve our goals.

So please, don't hesitate to reach out to us to discuss!

viega · 2023-10-02T17:21:25Z

Each one of these objectives now has its own ticket. But I've temporarily locked those; any feedback can come to this thread for now, or be brought to us privately if you prefer. Definitely don't want to spread any conversation across multiple places.

trishankkarthik Oct 3, 2023

Excited to try Chalk, and see where it goes! One thing I might add to Broad Objective 1: More metadata collection is support for de facto standards such as in-toto attestations. Would be really nice to collect metadata in a shared way.

Cc fellow in-toto Steering Committee members @SantiagoTorres @JustinCappos @colek42 @06kellyjac

viega · 2023-10-03T04:10:21Z

Note that we are using the in-toto standard for signing today, and are fans. Definitely could collect any other signatures found too… good suggestion!

trishankkarthik Oct 3, 2023

Good to know! This highlights how I need to try Chalk sooner than later 😄 Yes, I suspect the other types of attestations besides provenance will be useful for you, too...
