
Add -g option for custom group attribute #2395

Open
ankor2023 wants to merge 7 commits into squid-cache:master from ankor2023:negotiate_kerberos_auth-add-group-annotation-variable-name-option

Conversation

@ankor2023
Contributor

@ankor2023 ankor2023 commented Mar 26, 2026

Changes:

  • Added -g option:
    Allows specifying the annotation attribute name used by the helper
    to return groups. This enables using attributes like clt_conn_tag
    for group annotation within a connection.

  • Updated helper output format:
    Groups are now returned in a single key with comma separated values:

    group=group1,group2,group3

Efficiency:
The new format reduces overhead by removing redundant attribute names
for each group, allowing more groups to fit within the same buffer size.
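
A minimal sketch of the two output formats, for comparing their sizes. The function names are hypothetical and this is not the helper's actual code; it only assumes that repeated kv-pairs are separated by a single space.

```cpp
#include <string>
#include <vector>

// Old style: one "group=" key per group, separated by spaces.
std::string formatRepeatedKeys(const std::vector<std::string> &groups)
{
    std::string out;
    for (const auto &g : groups) {
        if (!out.empty())
            out += ' ';
        out += "group=" + g;
    }
    return out;
}

// New style: a single "group=" key with comma-separated values.
std::string formatSingleKey(const std::vector<std::string> &groups)
{
    std::string out;
    for (const auto &g : groups) {
        out += out.empty() ? "group=" : ",";
        out += g;
    }
    return out;
}
```

Under these assumptions, every group after the first saves the six bytes of a repeated "group=", since the separator costs one byte in both formats.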

Add -g option for group attribute name and update helper output format
@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Mar 26, 2026
@squid-anubis

This comment was marked as resolved.

@ankor2023 ankor2023 changed the title Add -g option for custom group attribute and optimize helper output format Add -g option for custom group attribute and optimize output format Mar 26, 2026
@squid-anubis

This comment was marked as resolved.

@ankor2023 ankor2023 changed the title Add -g option for custom group attribute and optimize output format Add -g option for custom group attribute Mar 26, 2026
@squid-anubis

This comment was marked as resolved.

@squid-anubis

This comment was marked as resolved.

@squid-anubis squid-anubis removed the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Mar 26, 2026
@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 26, 2026
Contributor

@yadij yadij left a comment

The change to send list syntax instead of separate notes has been on my TODO list for a long while. Thank you for implementing.

However, IMO we should not make the key name configurable;

  • first because "group=" has documented semantics and special handling associated to ensure the behaviour happens, and
  • changing the name loses all indications that the values presented are group names, and
  • secondly because in the use-case where "clt_conn_tag=" is useful, it is likely that "group=" will be needed at the same time (eg. by other helpers).
    • What that case actually needs is all notes from a Negotiate and NTLM auth'n helper applied to the connection notes. Which is a separate feature change out of scope here.

Fix review comments
Fix review comments
Fix review comments
Fix review comments
@ankor2023
Contributor Author

However, IMO we should not make the key name configurable;

  • first because "group=" has documented semantics and special handling associated to ensure the behaviour happens, and
  • changing the name loses all indications that the values presented are group names, and
  • secondly because in the use-case where "clt_conn_tag=" is useful, it is likely that "group=" will be needed at the same time (eg. by other helpers).

We provide administrators with the ability to customize helper behavior while keeping the default behavior unchanged.

As an alternative, we could maintain the current 'group' annotation while duplicating its value to an additional attribute under the -a flag.

Or would it be better to wait for a more generic functionality to copy all annotations from authentication helpers?

@ankor2023
Contributor Author

I'm also planning a subsequent PR to implement group filtering. In large environments, Kerberos tickets often contain 200–300 groups, while only a few are actually used in Squid policies. Filtering them out would significantly reduce overhead, improve performance, and simplify policy administration.
Would this feature be consistent with the current development roadmap?

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Mar 27, 2026
@rousskov rousskov self-requested a review March 27, 2026 12:51
@rousskov
Contributor

I'm also planning a subsequent PR to implement group filtering. In large environments, Kerberos tickets often contain 200–300 groups, while only a few are actually used in Squid policies. Filtering them out would significantly reduce overhead, improve performance, and simplify policy administration. Would this feature be consistent with the current development roadmap?

My answer would depend on whether that filtering is universal or highly custom:

  • If virtually none of the deployed negotiate_kerberos_auth helpers would be hurt by that filtering and many would benefit from it, then it would be worth discussing such a filtering algorithm further (preferably before a PR with a lot of code changes needs to be reviewed!).
  • Otherwise, the algorithm probably belongs to a custom helper rather than official Squid sources.

Please note that an algorithm that can be easily implemented as helper output filter, should probably not go into the helper:

negotiate_kerberos_auth ... | sed 's/,groupFoo,/,/'

The above sketch illustrates the concept of an external filter. It is not meant to be used "as is", of course.
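
The same external-filter idea can be sketched in C++. The function below is hypothetical, works on the value that follows "group=" only, and assumes that group names contain neither commas nor quotes.

```cpp
#include <sstream>
#include <string>

// Removes every occurrence of `unwanted` from a comma-separated group list.
// Empty list items are dropped as well.
std::string dropGroup(const std::string &csv, const std::string &unwanted)
{
    std::istringstream in(csv);
    std::string name, out;
    while (std::getline(in, name, ',')) {
        if (name.empty() || name == unwanted)
            continue;
        if (!out.empty())
            out += ',';
        out += name;
    }
    return out;
}
```

Unlike the sed expression, which needs a comma on both sides of the name, this version also removes a group that is first or last in the list.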

@ankor2023
Contributor Author

ankor2023 commented Mar 28, 2026

@rousskov

  • If virtually none of the deployed negotiate_kerberos_auth helpers would be hurt by that filtering and many would benefit from it, then it would be worth discussing such a filtering algorithm further (preferably before a PR with a lot of code changes needs to be reviewed!).

I would like to implement a new -f option that accepts a comma-separated list of groups (base64-encoded SIDs or standard SID strings (S-1-5-...)) to pass to Squid. All other groups will be filtered out:
-f SID1,SID2,SID3

I also intend to add SID-to-name mapping:
-f SID1:groupVIP,SID2:groupADM,SID3:groupEMPL

With this, the helper will return group names to Squid instead of SIDs.
This will allow administrators to use readable names in policies instead of cryptic SIDs, making them much easier to understand.
Implementing this directly in the helper's C code will improve performance when handling high loads.
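
A rough sketch of how such a filter specification could be parsed and applied. The -f option is only a proposal in this thread, so the type and function names below are hypothetical; the sketch assumes that SIDs and names contain neither ':' nor ','.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

using GroupFilter = std::map<std::string, std::string>; // SID -> name to report

// Parses "SID1:name1,SID2,SID3:name3". A SID without ":name" maps to itself.
GroupFilter parseFilterSpec(const std::string &spec)
{
    GroupFilter filter;
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (item.empty())
            continue;
        const auto colon = item.find(':');
        const std::string sid = item.substr(0, colon);
        filter[sid] = (colon == std::string::npos) ? sid : item.substr(colon + 1);
    }
    return filter;
}

// Keeps only the listed SIDs, in ticket order, replacing each with its mapped name.
std::vector<std::string> applyFilter(const GroupFilter &filter,
                                     const std::vector<std::string> &ticketSids)
{
    std::vector<std::string> kept;
    for (const auto &sid : ticketSids) {
        const auto it = filter.find(sid);
        if (it != filter.end())
            kept.push_back(it->second);
    }
    return kept;
}
```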

Alex, where’s the best place to discuss the filtering logic?

@ankor2023
Contributor Author

ankor2023 commented Mar 28, 2026

@yadij

  • What that case actually needs is all notes from a Negotiate and NTLM auth'n helper applied to the connection notes. Which is a separate feature change out of scope here.

Amos, I agree that if this is implemented in the near future, the -g/-a option we're discussing will become redundant. I took a quick look at the Squid codebase, but it's a bit too complex for me to handle this PR on my own.

I’ve located the function that processes helper annotations. However, how do we distinguish whether the response was received from the Negotiate or NTLM helper specifically?

void
UpdateRequestNotes(ConnStateData *csd, HttpRequest &request, NotePairs const &helperNotes)
{
    // Tag client connection if the helper responded with clt_conn_tag=tag.
    const char *cltTag = "clt_conn_tag";
    if (const char *connTag = helperNotes.findFirst(cltTag)) {
        if (csd) {
            csd->notes()->remove(cltTag);
            csd->notes()->add(cltTag, connTag);
        }
    }
    request.notes()->replaceOrAdd(&helperNotes);
}

On the other hand, annotations from Basic auth helpers should also be connection-bound. Thus, we can create a new annotation update function UpdateAuthRequestNotes() for auth helpers and call it from authTryGetUser().
Something like this:

void
UpdateAuthRequestNotes(ConnStateData *csd, HttpRequest &request, NotePairs const &helperNotes)
{
    if (csd) {
        csd->notes()->replaceOrAdd(&helperNotes);
    }
    request.notes()->replaceOrAdd(&helperNotes);
}

@rousskov
Contributor

I would like to implement a new -f option that accepts a comma-separated list of groups (base64-encoded SIDs or standard SID strings (S-1-5-...)) to pass to Squid. All other groups will be filtered out: -f SID1,SID2,SID3

I assume that the primary purpose of this feature is to reduce the amount of work Squid has to do when the helper is dealing with a very large number of unused-by-Squid groups. I suspect that using a post-helper filtering script would suffice for this purpose (and, hence, would be better), but I will not veto a quality implementation of this feature if others are convinced that it is a common use case worth officially supporting.


I also intend to add SID-to-name mapping: -f SID1:groupVIP,SID2:groupADM,SID3:groupEMPL

With this, the helper will return group names to Squid instead of SIDs. This will allow administrators to use readable names in policies instead of cryptic SIDs, making them much easier to understand.

I am not convinced this mapping feature is worth supporting officially because an equivalent mapping is already supported in Squid.

A: Existing approach:

auth_param negotiate program negotiate_kerberos_auth ...
acl groupVIP note group SID1
acl groupADM note group SID2
http_access allow groupVIP
http_access deny groupADM

B: Proposed approach:

auth_param negotiate program negotiate_kerberos_auth ... -f SID1:groupVIP,SID2:groupADM
acl groupVIP note group groupVIP
acl groupADM note group groupADM
http_access allow groupVIP
http_access deny groupADM

Proposed approach B duplicates information while existing approach A is equally expressive and allows for more flexibility, better documentation/comments, and arguably clearer mapping. Am I missing some additional benefits of the proposed approach that would justify making its support official?

Implementing this directly in the helper's C code will improve performance when handling high loads.

Not sure why C code is particularly relevant here -- the helper, "sed" (or an equivalent external filter), and Squid are all written (or could be written) in C/C++. One could argue that mapping one thousand SID values to a single group before that information reaches Squid would make Squid's job easier, but you have not made that argument (and "sed" or an equivalent custom script can do the same before-Squid mapping, with more custom environment-specific features, and without touching official code).

@rousskov
Contributor

@ankor2023: How do we distinguish whether the response was received from the Negotiate or NTLM helper specifically?

We do not want to distinguish different helpers! The distinction must come from helper output, so that all helpers (and other annotation sources) are supported the same way. We have already drafted an implementation of the corresponding feature. Here is the corresponding documentation diff:

-	  clt_conn_tag=TAG
+	  clt_conn_*=TAG
		Associates a TAG with the client TCP connection.

-		The clt_conn_tag=TAG pair is treated as a regular transaction
+		Each clt_conn_*=TAG pair is treated as a regular transaction
		annotation for the current request and also annotates future
		requests on the same client connection. A helper may update
		the TAG during subsequent requests by returning a new kv-pair.
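
A minimal sketch of the behavior this documentation diff describes, using simplified stand-ins rather than Squid's NotePairs and ConnStateData classes. All names are hypothetical and this is not the drafted implementation mentioned above.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Note = std::pair<std::string, std::string>; // key, value
using Notes = std::vector<Note>;

// True for keys covered by the clt_conn_* pattern.
bool isConnectionNote(const std::string &key)
{
    static const std::string prefix = "clt_conn_";
    return key.compare(0, prefix.size(), prefix) == 0;
}

// Drops existing entries whose key appears in source, then appends source.
void replaceOrAdd(Notes &target, const Notes &source)
{
    for (const auto &incoming : source) {
        target.erase(std::remove_if(target.begin(), target.end(),
                                    [&incoming](const Note &old) { return old.first == incoming.first; }),
                     target.end());
    }
    target.insert(target.end(), source.begin(), source.end());
}

// Every helper note annotates the request; clt_conn_* notes also
// annotate the connection and thereby future requests on it.
void updateNotes(Notes &connection, Notes &request, const Notes &helperNotes)
{
    Notes forConnection;
    std::copy_if(helperNotes.begin(), helperNotes.end(), std::back_inserter(forConnection),
                 [](const Note &n) { return isConnectionNote(n.first); });
    replaceOrAdd(connection, forConnection);
    replaceOrAdd(request, helperNotes);
}
```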

Our code passed initial tests, but its polishing is currently stuck due to Squid Project backlog. I hope we will merge it eventually. No corresponding PRs are welcome at this time.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 30, 2026
@rousskov
Contributor

Alex, where’s the best place to discuss the filtering logic?

We are supposed to use squid-dev mailing list for that. Discussing code using good old plain text email is a bit clunky. AFAIK, Squid Project is in the process of migrating away from that mailing list to GitHub Issues. That migration has not happened yet.

IMHO, in many cases, posting a documentation-changing draft PR (without any code changes!) would work better than posting to the mailing list, but it is not common practice yet, and, in some cases, a general discussion can save time (e.g., when the proposed feature is already being implemented by others). In addition to being unusual, using draft documentation-change-only PRs clashes with the current Squid Project backlog that complicates handling any new PRs correctly.

In summary, we should be using squid-dev mailing list, but what you have been doing is, IMHO, borderline acceptable as well (due to complications outlined above).

@yadij
Contributor

yadij commented Mar 31, 2026

@ankor2023: How do we distinguish whether the response was received from the Negotiate or NTLM helper specifically?

@rousskov: We do not want to distinguish different helpers!

What I am considering here is that the NTLM and Negotiate are TCP connection authentication, transferred over HTTP instead of normal HTTP transactional auth. Which means the user= and group= details (at least) should always be associated with the TCP connection and available to any future transaction received there. All other helper types are doing HTTP transactional things, so should not have this behaviour as default.

Anyways;
My proposal would be to adjust UpdateRequestNotes() to accept a filter parameter which is a list of key names to treat the same as clt_conn_tag.
It does not matter to me if Negotiate/NTLM APIs are the only helper APIs to use that parameter to send explicit {"user","group"}. Or if we "treat all the same" - by adding a squid.conf option to make the filter list of keys a configuration option each helper API passes in. Or both.
This feature is entirely within Squid and configurable in squid.conf, no helper changes necessary.
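
A simplified sketch of this key-list idea, with std::multimap standing in for Squid's NotePairs. The function name and signature are hypothetical and do not match the real UpdateRequestNotes().

```cpp
#include <map>
#include <set>
#include <string>

using Notes = std::multimap<std::string, std::string>; // key -> value(s)

// Every helper note annotates the request. Notes whose key is listed in
// connectionKeys are additionally stored on the connection.
void updateNotesWithKeyList(Notes &connection, Notes &request,
                            const Notes &helperNotes,
                            const std::set<std::string> &connectionKeys)
{
    // First pass: forget old values of every key the helper sent.
    for (const auto &note : helperNotes) {
        request.erase(note.first);
        if (connectionKeys.count(note.first))
            connection.erase(note.first);
    }
    // Second pass: store the new values, keeping multi-valued keys intact.
    for (const auto &note : helperNotes) {
        request.insert(note);
        if (connectionKeys.count(note.first))
            connection.insert(note);
    }
}
```

With connectionKeys set to {"user", "group"}, a reply carrying user=, group= and some other key would leave only the first two on the connection.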

@ankor2023: regarding the "-f" proposal. Seems like it might be a nice UX for some admin. Though I suggest discussing it with the helper author (Markus Moeller) to have it done in his upstream helper since "-f" does not need any squid changes.

@rousskov
Contributor

What I am considering here is that the NTLM and Negotiate are TCP connection authentication, transferred over HTTP instead of normal HTTP transactional auth. Which means the user= and group= details (at least) should always be associated with the TCP connection and available to any future transaction received there. All other helper types are doing HTTP transactional things, so should not have this behaviour as default.

My worry is that such a change of the default behavior for all annotations will break those existing deployments that do not meet the above expectations. When we hit those cases, we will be forced to add a yet another layer of complexity to let admins disable that special default behavior and go back to the current default.

There is also value in handling all helpers the same way as far as custom (e.g., not group) annotations are concerned.

Finally, there may be a meaningful difference between "a transaction that makes TCP connection authenticated" and "a transaction on a previously authenticated TCP connection". Treating all custom annotations as client connection annotations does not support such differences.

My proposal would be to adjust UpdateRequestNotes() to accept a filter parameter which is a list of key names to treat the same as clt_conn_tag. It does not matter to me if Negotiate/NTLM APIs are the only helper APIs to use that parameter to send explicit {"user","group"}. Or if we "treat all the same" - by adding a squid.conf option to make the filter list of keys a configuration option each helper API passes in. Or both.

A single Squid configuration option would not allow admins to treat different helpers differently. The meaning and effects of same-name annotation naturally depend on the helper (among other things). We could make things more complex by moving that proposed Squid configuration option into each helper (and each ICAP/eCAP service and each note directive and whatever other sources of annotations we may have!) configuration, of course, but that would make things even more complex. And then somebody would want the behavior to depend on transaction properties, forcing us to add ACLs...

I would prefer to keep all that complexity in those helpers/etc. instead, away from primary Squid code. After all, placing custom logic outside primary Squid code is the main reason to have helpers! Helpers that want to annotate the client connection would be responsible for emitting clt_conn_*=value annotations. This is (or should be) already supported for clt_conn_tag=value annotations. We would be extending that existing interface to cover more special cases with a very small probability of breaking any existing ones.

@ankor2023
Contributor Author

ankor2023 commented Mar 31, 2026

A single Squid configuration option would not allow admins to treat different helpers differently.

I assume Amos is proposing a new configuration parameter for the auth_param directive, something like:
auth_param negotiate annotate connection
As I understand it, this approach wouldn't require any helper code changes at all and ensures full backward compatibility.

Alex, your approach would require renaming the group attribute to clt_conn_group and user to clt_conn_user in the helpers. While this might be difficult to implement for some helpers (like ntlm_auth) and could cause some temporary confusion for admins, I believe the long-term benefits of unification are more important for the project.

With the implementation of either your idea or Amos’s, the current PR becomes unnecessary.

@rousskov
Contributor

rousskov commented Mar 31, 2026

a new configuration parameter for the auth_param directive, something like: auth_param negotiate annotate connection ... wouldn't require any helper code changes at all and ensures full backward compatibility.

Yes, and I have mentioned some "cons" of that approach in my earlier comment. We need to agree on the problem definition and find the right balance of "pros" and "cons" among possible solutions. I suspect there are two different problem scopes here ("built-in annotations that Squid itself understands/uses" and "other/custom annotations"). The two problems may benefit from two different solutions because their tradeoffs are different.

Alex, your approach would require renaming the group attribute to clt_conn_group and user to clt_conn_user in the helpers.

If we can always safely apply group and user to the client connection when the helper is used in a connection-based authentication context, then no helper changes would be required for that specific improvement. My approach requires helper changes if folks want custom annotations (where helper control/modification is natural!).

N.B. Where necessary, one can probably wrap any authentication helper into a script that would add clt_conn_ prefixes to helper-sent annotations. No helper modification should be required for that.
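
A hedged sketch of the rewriting step such a wrapper might perform on each helper reply line. The function is hypothetical, renames only the keys the caller lists, and assumes whitespace-separated key=value tokens without quoted values containing spaces. Keys that Squid itself interprets should presumably be left out of the list.

```cpp
#include <set>
#include <sstream>
#include <string>

// Adds the "clt_conn_" prefix to listed keys in one helper reply line.
// Tokens without '=' (such as the result code) are copied unchanged.
std::string addConnPrefix(const std::string &reply, const std::set<std::string> &keys)
{
    static const std::string prefix = "clt_conn_";
    std::istringstream in(reply);
    std::string token, out;
    while (in >> token) {
        const auto eq = token.find('=');
        if (eq != std::string::npos && keys.count(token.substr(0, eq)))
            token.insert(0, prefix);
        if (!out.empty())
            out += ' ';
        out += token;
    }
    return out;
}
```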

With the implementation of either your idea or Amos’s, the current PR becomes unnecessary.

The "Groups are now returned in a single key with comma separated values" part of this PR does not depend on annotation scopes. I still need to review that part, but it is a separate improvement idea AFAICT.

@ankor2023
Contributor Author

The "Groups are now returned in a single key with comma separated values" part of this PR does not depend on annotation scopes.

We can rename the current PR to "Change negotiate_kerberos_auth helper output format" and limit its scope to "Groups are now returned in a single key with comma separated values".

As Amos suggested, we should work with Markus Moeller (the upstream author) to coordinate this and all future changes.

After that, it makes sense to wait for Alex's "clt_conn_*" feature to be merged, and then evaluate the pros and cons of the connection annotation suggested by Amos.

Contributor

@rousskov rousskov left a comment

The "Groups are now returned in a single key with comma separated values" part of this PR does not depend on annotation scopes. I still need to review that part, but it is a separate improvement idea AFAICT.

That review is now done (see the dedicated change request).

We can ... limit its scope to "Groups are now returned in a single key with comma separated values".

I enthusiastically support narrowing this PR scope in principle, but it may be best to first agree on how to address the issue discussed in this review change request. Your call.

After that, it makes sense to wait for Alex's "clt_conn_*" feature to be merged, and then evaluate the pros and cons of the connection annotation suggested by Amos.

This multistep plan sounds good to me overall. Let's see if we can complete the first/next step :-).


As Amos suggested, we should work with Markus Moeller (the upstream author) to coordinate this and all future changes.

Yes, of course, assuming Markus is interested in working on this. CC: @huaraz

FWIW, the use of the term "upstream" is a red flag for me in this context:

  • If "upstream" already manages this helper in their own non-Squid repository, then it is best to remove this helper from Squid sources and let "upstream" manage their program. Otherwise, we are likely to waste time on code modifications that "upstream" may not like!
  • If "upstream" is using Squid repository to manage their helper, then, at the very least[1], it is best to pause all discussions about this helper sources until "upstream" starts driving that discussion. Otherwise, again, we are likely to waste time on code modifications that "upstream" may not like!
  • If Squid Project is actually the final "upstream" here, then we can continue without pausing (but should avoid that misleading term!). Markus, as the primary author of the helper code, should be pinged (now done) and is more than welcome to join this effort at any moment, of course!

FWIW, I hope that the last bullet applies -- Squid Project is the final "upstream" here, not any other project or person. This hope is in no way meant to diminish the high value of Markus' contributions to Squid! CC: @huaraz

Footnotes

  1. And, ideally, we should migrate away from this odd relationship model to a model described in either the first or the last bullet.

}
}

append_comma(ad_groups);
Contributor

Replacing group=g1 group=g2 with group=g1,g2 codifies comma as the delimiter for group values. AFAICT, Helper::Reply::parseResponseKeys() does not treat comma specially today. This raises many red flags, including:

  • parseResponseKeys() treats group=g1,g2 annotation as a single key=value pair, adding a single NotePairs::Entry. Adding two groups via one Entry violates the spirit of NotePairs API that uses one Entry per Squid-perceived annotation value.
  • Helper::Reply::parseResponseKeys() method treats "quoted" values specially. If we codify the special meaning of commas, then that treatment would need to be adjusted to support group=g1,"g2" and similar input.
  • Individual group names with embedded commas may be treated incorrectly. The helper must either quote them (to give parseResponseKeys() a chance to handle them correctly) or reject them.
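
To make the quoting concern concrete, here is a sketch of what a comma-aware, quote-aware split of a group value might look like. The syntax is an assumption (double quotes protect commas, a backslash inside quotes escapes the next character); neither this PR nor parseResponseKeys() defines it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a list value on commas that are outside double quotes.
// Quote characters are removed from the returned names.
// An empty input yields a single empty name.
std::vector<std::string> splitGroupList(const std::string &value)
{
    std::vector<std::string> groups;
    std::string current;
    bool quoted = false;
    for (std::size_t i = 0; i < value.size(); ++i) {
        const char c = value[i];
        if (quoted && c == '\\' && i + 1 < value.size()) {
            current += value[++i]; // escaped character, taken literally
        } else if (c == '"') {
            quoted = !quoted;
        } else if (c == ',' && !quoted) {
            groups.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    groups.push_back(current);
    return groups;
}
```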

Please note that we should not assume that all helpers or even all connection authentication helpers are going to do exactly what shipped-with-Squid helpers are doing. We should define an interface and supply a compliant implementation in Squid. If we change that interface, like this PR may be doing, we must evaluate whether that change may break existing helpers/deployments and whether the benefits of our changes outweigh that breakage (if any). If they do not, we may need to version helper output format, so that Squid can reject no-longer-compatible helpers.

I am not sure whether authentication code uses stored group values. I suspect that it does not! Please point me to group-using authentication code if that suspicion is wrong.

I know that general code (e.g., note ACL) lets an admin determine whether values can be delimited and what that delimiter is. A comma is usually the default for delimited values, but it is not the default that matters here. I believe the changes in this PR will break deployments that use a non-comma delimiter for group values because Acl::NoteCheck::matchNotes() calls NotePairs::expandListEntries() and the latter does not treat commas specially. For example, I speculate that, after the changes in this PR, acl badGuys note -m: group bad will stop matching requests that belong to active and bad groups because the new helper response -- group=active,bad -- is not going to match while the old response -- group=active group=bad -- did match.

Similarly, I believe the changes in this PR will break deployments that do not use a delimiter for group values. For example, I speculate that acl badGuys note group bad will stop matching requests that belong to active and bad groups.
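
The speculated breakage can be illustrated with a toy model of delimiter-based matching. This is not Squid's Acl::NoteCheck or NotePairs code; the function and its semantics are assumptions that mirror the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if any stored value, split on any character in `delimiters`,
// equals `wanted`. An empty `delimiters` string means values are not split.
bool matchesGroup(const std::vector<std::string> &entries,
                  const std::string &wanted,
                  const std::string &delimiters)
{
    for (const auto &entry : entries) {
        std::size_t start = 0;
        for (;;) {
            const std::size_t end = delimiters.empty()
                ? std::string::npos
                : entry.find_first_of(delimiters, start);
            const std::size_t length = (end == std::string::npos) ? end : end - start;
            if (entry.substr(start, length) == wanted)
                return true;
            if (end == std::string::npos)
                break;
            start = end + 1;
        }
    }
    return false;
}
```

In this model, the old reply (two entries, "active" and "bad") matches "bad" with no delimiter, with ":" and with ",". The new reply (one entry, "active,bad") matches only when the delimiter is ",".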

@rousskov rousskov removed the S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box label Apr 1, 2026
@yadij
Contributor

yadij commented Apr 1, 2026

Since Alex has repeatedly mis-interpreted my descriptions and your clarification (AFAICT @ankor2023 understood), I have started writing up some code myself to submit a draft PR for further discussion of that.

@yadij
Contributor

yadij commented Apr 1, 2026

FWIW, the use of the term "upstream" is a red flag for me in this context:

I use the term as it is how I think of the situation. It is not strictly true - he is the original author and chose to continue maintaining a separate repository with releases of the helper by itself after submitting for Squid use.
Unlike the normal "upstream" relationship; we are free to adjust the helper in any way we like and leave him to cope with the repository desync. IMHO, it would just be quite impolite to treat a contributor that way so I make sure he is at least aware of each time we change the code.

@huaraz
Contributor

huaraz commented Apr 1, 2026

The "Groups are now returned in a single key with comma separated values" part of this PR does not depend on annotation scopes.

We can rename the current PR to "Change negotiate_kerberos_auth helper output format" and limit its scope to "Groups are now returned in a single key with comma separated values".

As Amos suggested, we should work with Markus Moeller (the upstream author) to coordinate this and all future changes.

After that, it makes sense to wait for Alex's "clt_conn_*" feature to be merged, and then evaluate the pros and cons of the connection annotation suggested by Amos.

Happy to work on adjustments.

@yadij
Contributor

yadij commented Apr 2, 2026

I have started writing up some code myself to submit a draft PR for further discussion of that.

Now published as PR #2399.

Labels

S-waiting-for-author author action is expected (and usually required)

5 participants