
Part 1: Adds RLS and CLS control Policies #2048


Open
wants to merge 2 commits into base: main

Conversation

@singhpk234 (Contributor) commented Jul 14, 2025

About the PR

This PR adds the policy spec proposed as part of [OSS] Row and Column Based Access Control: Policy Definitions.

It uses Iceberg expressions to define row-level filters, and uses column projections as the way to project. For now we only deal with plain columns (waiting for UDFs to be standardized in Apache Iceberg). Maybe we could use transforms for the projections? Open to it!

Note: In the last Apache Iceberg Community Sync, everyone was generally aligned that using Iceberg expressions, extended to support references to Iceberg UDFs, was the right way to go!

This additionally introduces 2 context variables which are resolved on the catalog side based on the caller:
$current_principal: checks whether the referenced principal is the current principal; if yes, resolves to true.
$current_principal_role: checks whether the referenced role is one of the activated principal roles in the current caller context.

A policy defined with these context variables has them replaced inline and evaluated using the Iceberg expressions SDK.

So whenever the caller calls /get-applicable-policies, it gets back the context-resolved variables along with the row and column policy.
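
For illustration, a stored row-filter predicate and its context-resolved form might look like this (a sketch: the predicate JSON follows Iceberg's ExpressionParser format; the role name is made up):

  // stored in the policy as a row filter
  { "type": "eq", "term": "$current_principal_role", "value": "ANALYST" }

  // returned by /get-applicable-policies for a caller with ANALYST activated:
  // the context predicate resolves inline to the always-true expression
  true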

TODO: Policy merging

Please check the tests for E2E coverage.

Note: This is for engines that want to fetch the policies directly (rather than getting the secure view) and are willing to integrate with the Polaris Policy Store directly.

See this section of the design doc:
https://docs.google.com/document/d/1AJicez7xPhzwKXenGZ19h0hngxrwAg3rSajDV1v0x-s/edit?tab=t.0#bookmark=id.j29shahtycb8

@singhpk234 (Contributor Author)

Proposal in Apache Polaris for the policy spec: https://lists.apache.org/thread/rf2zsgk9qh36z3s63gx6dgtl0s4cwngr

@singhpk234 requested review from HonahX and flyrain on July 14, 2025 21:47
@singhpk234 (Contributor Author)

cc @laurentgo

PolicyEntity policyEntity, boolean inherited) {
private boolean filterApplicablePolicy(PolicyEntity policyEntity) {
// check the type
if (policyEntity.getPolicyType().equals(ACCESS_CONTROL)) {
@singhpk234 (Contributor Author) commented Jul 14, 2025

Need a PolicyEntityAugmentor which could conditionally modify the object; let me think about it more.

@singhpk234 force-pushed the feature/fgac-policies branch from 6878f0f to decd92b on July 14, 2025 23:15
@singhpk234 force-pushed the feature/fgac-policies branch from decd92b to 85ba526 on July 15, 2025 06:53
@snazy (Member) commented Jul 15, 2025

I'm a bit surprised to see a "ready for review (and merge)" PR for this.
From what I understand, quite a few concerns were raised, so I'm not sure it's the best idea to start with a code change; rather, we should collaborate to eventually reach consensus on the whole approach.

@jbonofre self-requested a review on July 15, 2025 11:49
@dimas-b (Contributor) left a comment

Some preliminary comments :)

// Optional; if present, the policy applies to this role
private String principalRole;

// TODO: model them as iceberg transforms
Contributor:

Is there a plan to redo this policy definition after merging or before merging this PR?

Contributor Author:

We can amend them in another version if a release goes out with this; policies support versioning: https://polaris.apache.org/in-dev/unreleased/policy/

The only concern with not using Iceberg transforms right now is that they are currently limited. Yes, the existing column projection can be modeled as an Iceberg identity transform, but if we want to support data masking then the transforms would need to contain references to Iceberg UDFs, and that support isn't there yet, hence I refrained. Open to modeling them that way if we want to.

Contributor:

SGTM 👍

public class AccessControlPolicyContent implements PolicyContent {

// Optional; if present, the policy applies to this role
private String principalRole;
Contributor:

How does this policy apply to a role? What is the mechanism? I could not find this in the linked doc 🤔

Contributor Author:

This is purely to support people who just have role-based access control (providers like MS, AWS, etc.); they can simply store their row filters / projections against the role in the applicable policy.

I go into the details of this here - https://docs.google.com/document/d/12nhS0GX1U1PqEBKp74bIBZsL9kB5duDlN9diHJAhJsM/edit?tab=t.0#bookmark=id.ij6iuno9gsic

@dimas-b (Contributor) commented Jul 17, 2025

What I mean is: is this role a Polaris Principal role? If we have multiple policies, how do we find the set of applicable policies? (I'm not sure I saw details on that in the doc 😅 )

More broadly: Is the binding to roles actually part of the policy?

This is not an objection... more of a point to think about.

// Use a custom deserializer for the list of Iceberg Expressions
@JsonDeserialize(using = IcebergExpressionListDeserializer.class)
@JsonSerialize(using = IcebergExpressionListSerializer.class)
private List<Expression> rowFilters;
@dimas-b (Contributor) commented Jul 15, 2025

How can this work "without context functions" (code comment above)? How will Polaris code interface with these expressions?

private List<Expression> rowFilters;

private static final String DEFAULT_POLICY_SCHEMA_VERSION = "2025-02-03";
private static final Set<String> POLICY_SCHEMA_VERSIONS = Set.of(DEFAULT_POLICY_SCHEMA_VERSION);
Contributor:

unused?

@@ -28,7 +28,8 @@ public enum PredefinedPolicyTypes implements PolicyType {
DATA_COMPACTION(0, "system.data-compaction", true),
METADATA_COMPACTION(1, "system.metadata-compaction", true),
ORPHAN_FILE_REMOVAL(2, "system.orphan-file-removal", true),
SNAPSHOT_EXPIRY(3, "system.snapshot-expiry", true);
SNAPSHOT_EXPIRY(3, "system.snapshot-expiry", true),
ACCESS_CONTROL(4, "system.access-control", false);
Contributor:

The term ACCESS_CONTROL is too generic IMHO. How about TABLE_DATA_ACCESS_EXPRESSIONS?

The "expression" part related to the fact that this policy uses Iceberg expressions to represent filters.

Contributor:

IIUC we'll use the same policy to also have non-expression based filtering, but I think that something like TABLE_ACCESS or TABLE_DATA_ACCESS is a good idea.

Contributor:

How can the same policy type support different contents? What is the approach to processing different contents within the same policy type?

@singhpk234 (Contributor Author) commented Jul 16, 2025

Agree. I modeled all 3 policies to be represented in a single policy spec:

  • column hiding (only authorized against RBAC)
  • column projections (DDM, to be applied with the UDFs)
  • rowFilter expressions (Iceberg expressions, which can contain UDF references)

Please let me know if you'd prefer otherwise. A sketch of the combined shape follows below.
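
As a sketch of the combined shape (field names follow this PR's AccessControlPolicyContent; values are made up and the exact schema may change):

  {
    "principalRole": "ANALYST",
    "columnProjections": [ "customer_name", "region" ],
    "rowFilters": [
      { "type": "eq", "term": "$current_principal_role", "value": "ANALYST" },
      { "type": "eq", "term": "region", "value": "EMEA" }
    ]
  }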

@dimas-b (Contributor) commented Jul 17, 2025

I do not have a firm opinion on the specific form of these policies (yet).

However, I'd like to make this system extensible. That is, if we have another kind of policy for row filtering, assigning the generic ACCESS_CONTROL name to the current one will make the new policy kind of marginal. This is why I propose for policy type names to be more specific up front.


public static String replaceContextVariable(
String content, PolicyType policyType, AuthenticatedPolarisPrincipal authenticatedPrincipal) {
if (policyType == ACCESS_CONTROL) {
Contributor:

Why force all policies to go through this if? Would it be possible to refactor the code to leverage java types for invoking type-specific replacements?

.setContent(policyEntity.getContent())
.setContent(
PolicyUtil.replaceContextVariable(
policyEntity.getContent(), policyEntity.getPolicyType(), authenticatedPrincipal))
Contributor:

It would be nice to have a more flexible / extensible approach to connecting policies with the security context than a simple reference to AuthenticatedPolarisPrincipal. The Principal may not be the only factor in making access decisions.

// Iceberg expressions without context functions for now.
// Use a custom deserializer for the list of Iceberg Expressions
@JsonDeserialize(using = IcebergExpressionListDeserializer.class)
@JsonSerialize(using = IcebergExpressionListSerializer.class)
Contributor:

This binds Polaris API to internal serialization code in Iceberg. Iceberg changes in Expression serialization will affect Polaris APIs. I'd like to avoid this dependency.

@@ -98,6 +102,10 @@ public static boolean canAttach(PolicyEntity policy, PolarisEntity targetEntity)
case ORPHAN_FILE_REMOVAL:
return BaseMaintenancePolicyValidator.INSTANCE.canAttach(entityType, entitySubType);

case ACCESS_CONTROL:
// TODO: Add validator for attaching this only to table
Contributor:

What's the reason for deferring this?

import org.apache.polaris.core.auth.AuthenticatedPolarisPrincipal;
import org.apache.polaris.core.policy.content.AccessControlPolicyContent;

public class PolicyUtil {


(design) I would recommend marking the class as final as well.

Adding javadoc to the class and the public methods would be good as well.

import org.apache.polaris.core.auth.AuthenticatedPolarisPrincipal;
import org.apache.polaris.core.policy.content.AccessControlPolicyContent;

public class PolicyUtil {


(best practices) can you please add some unit tests?

if (policyType == ACCESS_CONTROL) {
try {
AccessControlPolicyContent policyContent = AccessControlPolicyContent.fromString(content);
List<Expression> evaluatedRowFilterExpressions = Lists.newArrayList();


Lists.newArrayList() is actually a deprecated pattern since Java 7 and can simply be replaced with new ArrayList<>(). The method is not marked as deprecated, but the Guava javadoc states this.


policyContent.setRowFilters(evaluatedRowFilterExpressions);
return AccessControlPolicyContent.toString(policyContent);
} catch (Exception e) {


(design) is it the right thing to return the original content if an exception occurs?

import org.apache.iceberg.expressions.Expression;
import org.apache.polaris.core.policy.validator.InvalidPolicyException;

public class AccessControlPolicyContent implements PolicyContent {


(design) it may be seen as a preference, but it seems the overall language community is moving towards immutable objects as data carriers (like Java records); I wonder if this is something we should adopt here as well

import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.ExpressionParser;

public class IcebergExpressionListDeserializer extends JsonDeserializer<List<Expression>> {


(design) I wonder why a serializer/deserializer pair for the list is needed vs having it for the element type (Expression)?
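
For comparison, an element-level deserializer could be this small (a sketch; the class name is hypothetical, and it also sidesteps the ObjectMapper cast flagged further down):

  import java.io.IOException;
  import com.fasterxml.jackson.core.JsonParser;
  import com.fasterxml.jackson.databind.DeserializationContext;
  import com.fasterxml.jackson.databind.JsonDeserializer;
  import com.fasterxml.jackson.databind.JsonNode;
  import org.apache.iceberg.expressions.Expression;
  import org.apache.iceberg.expressions.ExpressionParser;

  public class IcebergExpressionDeserializer extends JsonDeserializer<Expression> {
    @Override
    public Expression deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
      // read the predicate subtree as a tree and hand its JSON to Iceberg's parser;
      // no cast of the codec to ObjectMapper is needed
      JsonNode node = p.readValueAsTree();
      return ExpressionParser.fromJson(node.toString());
    }
  }

Registered per element, e.g. via @JsonDeserialize(contentUsing = IcebergExpressionDeserializer.class) on the rowFilters field, Jackson then handles List<Expression> natively.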

@@ -30,6 +31,8 @@ private static ObjectMapper configureMapper() {
mapper.configure(DeserializationFeature.FAIL_ON_MISSING_CREATOR_PROPERTIES, true);
// Fails if a required field is present but explicitly null, e.g., {"enable": null}
mapper.configure(DeserializationFeature.FAIL_ON_NULL_CREATOR_PROPERTIES, true);
// This will make sure all the Iceberg parsers are loaded.
RESTSerializers.registerAll(mapper);


Could that cause conflicts?

Member:

Why do we need all the REST specific types?

List<Expression> evaluatedRowFilterExpressions = Lists.newArrayList();
for (Expression rowFilterExpression : policyContent.getRowFilters()) {
// check if the expression refers to context variable current_principal_role
if (rowFilterExpression instanceof UnboundPredicate<?>) {


(design) I would recommend using a visitor pattern (like ExpressionVisitor), which is better suited for expression analysis and substitution
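
For reference, a minimal sketch of such a rewrite with Iceberg's ExpressionVisitors (assuming, as this PR does, an unbound eq predicate on the context variable; class and variable names are illustrative):

  import java.util.Set;
  import org.apache.iceberg.expressions.Expression;
  import org.apache.iceberg.expressions.ExpressionVisitors;
  import org.apache.iceberg.expressions.Expressions;
  import org.apache.iceberg.expressions.UnboundPredicate;

  class ContextVariableRewriter extends ExpressionVisitors.ExpressionVisitor<Expression> {
    private final Set<String> activatedRoles;

    ContextVariableRewriter(Set<String> activatedRoles) {
      this.activatedRoles = activatedRoles;
    }

    @Override public Expression alwaysTrue() { return Expressions.alwaysTrue(); }
    @Override public Expression alwaysFalse() { return Expressions.alwaysFalse(); }
    @Override public Expression not(Expression child) { return Expressions.not(child); }
    @Override public Expression and(Expression left, Expression right) { return Expressions.and(left, right); }
    @Override public Expression or(Expression left, Expression right) { return Expressions.or(left, right); }

    @Override
    public <T> Expression predicate(UnboundPredicate<T> pred) {
      if ("$current_principal_role".equals(pred.term().ref().name())) {
        // assumes an eq predicate whose literal is the role name
        boolean active = activatedRoles.contains(String.valueOf(pred.literal().value()));
        return active ? Expressions.alwaysTrue() : Expressions.alwaysFalse();
      }
      return pred; // leave ordinary predicates untouched
    }
  }

  // usage:
  // Expression resolved = ExpressionVisitors.visit(rowFilter, new ContextVariableRewriter(activatedRoles));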

Namespace parentNamespace = policyEntity.getParentNamespace();

return ApplicablePolicy.builder()
.setPolicyType(policyEntity.getPolicyType().getName())
.setInheritable(policyEntity.getPolicyType().isInheritable())
.setName(policyEntity.getName())
.setDescription(policyEntity.getDescription())
.setContent(policyEntity.getContent())
.setContent(


I'm not familiar with PolicyCatalog's exact role, but it seems it is used for CRUD operations on the policies. As such, replacing the content of some policies on the fly may not be the right thing to do, because people would not be able to access the source of truth?

(design) IMHO this seems like a leakage of an access control mechanism outside of the realm of PolarisAuthorizer

Contributor Author:

We are just replacing the context variables in get-applicable-policies with the caller context; the key point is to resolve identity server-side, so the client engine doesn't even need to know about these, as it might not even know the user's identity (it just has a token).
Now, if the concern is that we want to know the exact policy, one can just load it, and we don't resolve context there; please have a look at this API:
https://github.com/apache/polaris/blob/main/spec/polaris-catalog-service.yaml#L153
so IMHO this concern is addressed!

this seems like a leakage of an access control mechanism outside of the realm of PolarisAuthorizer

I thought about this really hard, and even talked to folks internally. So far the PolarisAuthorizer interface just authorizes based on RBAC; an authorizer returning FGAC predicates and projections would require more thorough thought.

I just followed the existing way of retrieving the policies from persistence and checking which policies apply to the caller.

If we want to go the route of supporting GetAccessControlPolicy via the Authorizer interface, here is what I had in mind:

public interface FGACAwarePolarisAuthorizer extends PolarisAuthorizer {

  default AuthorizationContext authorizeOrThrowWithFGAC(
      @Nonnull CallContext callContext,
      @Nonnull AuthenticatedPolarisPrincipal authenticatedPrincipal,
      @Nonnull Set<PolarisBaseEntity> activatedEntities,
      @Nonnull PolarisAuthorizableOperation authzOp,
      @Nullable PolarisResolvedPathWrapper target,
      @Nullable PolarisResolvedPathWrapper secondary) {
    authorizeOrThrow(callContext, authenticatedPrincipal, activatedEntities, authzOp, target, secondary);
    // query policies from persistence and give back the FGAC policy
    // create the AuthZ context
  }

  default AuthorizationContext authorizeOrThrowWithFGAC(
      @Nonnull CallContext callContext,
      @Nonnull AuthenticatedPolarisPrincipal authenticatedPrincipal,
      @Nonnull Set<PolarisBaseEntity> activatedEntities,
      @Nonnull PolarisAuthorizableOperation authzOp,
      @Nullable List<PolarisResolvedPathWrapper> targets,
      @Nullable List<PolarisResolvedPathWrapper> secondaries) {
    authorizeOrThrow(callContext, authenticatedPrincipal, activatedEntities, authzOp, targets, secondaries);
    // query policies from persistence and give back the FGAC policy
    // create the AuthZ context
  }
}

AuthorizationContext would carry:

  • AccessControlPolicyContent
  • Map<Object, Object> additionalParams
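
As a rough sketch (a hypothetical record; types as in the bullets above):

  import java.util.Map;

  // hypothetical carrier for the resolved FGAC policy plus extra parameters
  public record AuthorizationContext(
      AccessControlPolicyContent policyContent,
      Map<Object, Object> additionalParams) {}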

wdyt ?


It seems a good conversation starter, thanks for proposing it. I don't know if it is best to try to extend the APIs (which are very generic, whereas FGAC only applies to a couple of permissions) or to have an extra separate method that you combine with a previous check (so 2 calls to the authorizer).

For AccessControlPolicyContent, would it be like this?

  public record AccessControlProcessingInstruction(String type, String expression);
  public record RowFilter(AccessControlProcessingInstruction filter);
  public record ColumnTransformation(int id, String name, AccessControlProcessingInstruction transformation);
  public record AccessControlPolicyContent(String rowFilter, List<ColumnTransformation> columnTransformations);

return content;
}

public static boolean filterApplicablePolicy(


(design) this method doesn't have any javadoc; also, shouldn't it belong to some dedicated class related to FGAC instead?

@adutra self-requested a review on July 16, 2025 12:12
@snazy (Member) commented Jul 17, 2025

(Replying to the later edit of the PR description)

Note: In the last Apache Iceberg Community Sync, everyone was generally aligned that using Iceberg expressions, extended to support references to Iceberg UDFs, was the right way to go!

We have not agreed on a Polaris implementation in an Iceberg community sync. (Note: the Apache policy is: "If it didn't happen on the mailing list, it never happened.")

@snazy (Member) left a comment

Left a couple of technical comments on this PR.

I think we should explore all legit options to manage FGAC rules in Polaris. Iceberg expressions are one option. Another option is leveraging a standard like SQL:92, SQL:99, or SQL:2003. Substrait is a mature project with a diverse and active community, and an option as well.

My concern about Iceberg expressions is their limitations regarding available functions and expressions. Some examples: COALESCE, LPAD, RPAD, NULLIF, CASE, CAST, INTERVAL (...).

I think we lose a lot of flexibility without these functions. Maybe I haven't looked into it enough, but I cannot figure out a way with Iceberg expressions to mask a credit card number like ***234. But I guess that's the kind of thing that should not go into Iceberg expressions? What's the representation of those strings?

Row-level filter expressions benefit a lot from CASE, CAST, COALESCE, etc. Not sure it's worth not having those available to users.

The Iceberg expressions implementation is an interpreter, but that's rather a secondary concern.

All Iceberg query engines I know are based on SQL and can natively handle and optimize SQL functions. Not sure whether it's worth having an (intermediate) representation.

Spark datasets/frames are not SQL, but FGAC with those APIs is a different topic (LogicalPlan).

I think we should start with how row filters and column transformations are exposed to query engines, and then figure out the best way those are managed in Polaris (a "top-down" approach).

@Override
public List<Expression> deserialize(JsonParser p, DeserializationContext ctxt)
throws IOException {
ObjectMapper mapper = (ObjectMapper) p.getCodec();
Member:

This cast assumes a Jackson implementation detail and can likely break with Jackson updates.
Therefore please remove this cast.


As ObjectCodec does have a readTree(JsonParser) method, is the cast even necessary?

Comment on lines +40 to +46
List<Expression> expressions = new ArrayList<>();
if (node.isArray()) {
for (JsonNode element : node) {
// Convert each JSON element back to a string and pass it to ExpressionParser.fromJson
expressions.add(ExpressionParser.fromJson(mapper.writeValueAsString(element)));
}
}
Member:

Why is this necessary at all?

jsonGenerator.writeStartArray();
if (expressions != null) {
for (Expression expression : expressions) {
jsonGenerator.writeString(ExpressionParser.toJson(expression));
Member:

Wyh "string" and not proper JSON?


"{\n"
+ " \"principalRole\": \"ANALYST\",\n"
+ " \"extraField\": \"someValue\"\n"
+ // Extra field
Member:

What's the "extra field"?

@adutra (Contributor) commented Jul 17, 2025

Row-level filter expressions benefit a lot from CASE, CAST, COALESCE etc. Not sure whether it's worth to not have those available to users.

Indeed, the ability to express something like "if X, then allow; else, deny" seems essential to row filters. I don't see any ternary operator in the current Iceberg expressions. @singhpk234 is the intent to add more expressions to Iceberg to enable row filters?

@singhpk234 (Contributor Author) commented Jul 17, 2025

@snazy @adutra thank you for sharing your feedback; I want to walk you all through my thought process.

Why Iceberg expressions and not SQL
Iceberg expressions are portable, dialect-agnostic, and first-class citizens of the Iceberg world, and IMHO they are a must for interop.
Note that almost all engines already have an
engine-specific SQL -> engine expression -> Iceberg expression -> Iceberg SDK (manifest filtering) pipeline.
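
That pipeline is what makes the JSON form portable; a minimal sketch of the round trip (column names made up):

  import org.apache.iceberg.expressions.Expression;
  import org.apache.iceberg.expressions.ExpressionParser;
  import org.apache.iceberg.expressions.Expressions;

  public class RowFilterRoundTrip {
    public static void main(String[] args) {
      // build a row filter once ...
      Expression rowFilter = Expressions.and(
          Expressions.equal("region", "EMEA"),
          Expressions.greaterThanOrEqual("event_ts", "2025-01-01T00:00:00"));

      // ... store its dialect-agnostic JSON form in the policy ...
      String stored = ExpressionParser.toJson(rowFilter);

      // ... and any engine that speaks Iceberg expressions parses it back
      Expression parsed = ExpressionParser.fromJson(stored);
      System.out.println(parsed);
    }
  }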

We did explore dialect-agnostic SQL such as SQL:92 for most of this, and we kept coming back to dialect-specific requirements: if the policy were SQL, engines would want to store their dialect-specific constructs directly in it, and hence in persistence, making the policy workable only for the definer's dialect. What would the behaviour be if I don't understand the dialect?

What I see is that we can expand Iceberg expressions to contain UDF references, and via UDFs you can model all your dialect-specific stuff, but the bottom line is that you operate only in Iceberg expressions in the policy definition.
If we want dialect-specific behavior we can still put everything in a UDF. Here is one common example:
sha256 in the Spark dialect is sha2, so why not model this as a hash UDF, which maps to sha2 for Spark and sha256 for Trino?

Also, heads up: Iceberg expressions are going to be expanded soon due to the constraints work for v4 that Anton is driving (an uber-level idea was discussed in some of the syncs): essentially storing table constraints in Iceberg metadata so they can be retrieved and enforced by the calling engine, with the storage still being Iceberg expressions. So Iceberg expressions are what they plan to use for interoperability.

Note: I checked in the last catalog community sync; Iceberg expressions with UDFs seemed like the right direction. I'm not saying we're pinned on it, just presenting what the wider community thinks on this aspect.

I understand UDFs are not there yet and will take some time; meanwhile, using Iceberg expressions and storing RLS there seems like a good step IMHO.
I know at least one cloud provider for which it will surely help: https://docs.aws.amazon.com/lake-formation/latest/dg/partiql-support.html

Why not Substrait?

I think if we want Substrait, we can make UDFs return Substrait directly, but at least in the Iceberg community there is no consensus on the IR, as similar discussions have been brought up in the past for Iceberg views; I would request driving that in Iceberg first, and then we can incorporate it into the policy. Also please note that engines like Snowflake / Redshift ... don't support Substrait; unless it's established as a standard IR in Iceberg, IMHO we should not take a dependency on it.

Spark datasets/frames are not SQL, but FGAC with those APIs is a different topic

Yes, non-SQL requires a more thorough discussion IMHO; for example, how do we model SQL written in one language being parsed by another?

I hope I was able to explain my thought process here! I really appreciate you taking a look.

@snazy (Member) commented Jul 18, 2025

Managing FGAC rules and exposing protection instructions are two different domains.
The first is defined by the catalog and won't be part of Iceberg; the latter is (to be) defined by Iceberg.

I think we should not restrict our users to the limitations of Iceberg expressions.

@kevinjqliu (Contributor)

Thanks for the PR @singhpk234, super interesting to see RLS and CLS policies for governing Iceberg tables.

Do you know if there's a way I can quickly set this up and do a proof of concept locally?
I looked at #999, #1059, and https://polaris.apache.org/in-dev/unreleased/policy/ but couldn't find anything concrete.

I think it would be helpful to document "getting started with policies" in Polaris so folks who are interested can get some hands-on experience with the proposal.

@singhpk234 (Contributor Author) commented Jul 21, 2025

Managing FGAC rules and exposing protection instructions are two different domains.

@snazy I understand that, and in my design I want to leverage the existing policy store implementation to manage the FGAC rules; what you are calling protection instructions, i.e. the result of policy evaluation, is in my view the SQL.

The first is defined by the catalog and won't be part of Iceberg; the latter is (to be) defined by Iceberg.
I think we should not restrict our users to the limitations of Iceberg expressions.

I am a bit confused by your conclusion here: on one hand you are suggesting Iceberg needs to set the standard for the policy-evaluation result, but at the same time we are discarding Iceberg expressions, which are first-class citizens of the Iceberg world (portable and used in different languages like iceberg-rust / iceberg-python ...)? IMHO we should not go into dialect-specific SQL, as it makes evaluation harder, especially given that we already have a path forward with Iceberg expressions and UDFs. Please let me know your thoughts, and let's have more discussions!

@singhpk234 (Contributor Author)

@kevinjqliu thank you for the interest in the PR. Unfortunately there is no getting-started guide for policies that I am aware of (let me create a ticket for it); here are some integration tests -
https://github.com/apache/polaris/blob/main/integration-tests/src/main/java/org/apache/polaris/service/it/test/PolarisPolicyServiceIntegrationTest.java
which can help in understanding the flow. Happy to help you over Slack as well!

@snazy (Member) commented Jul 22, 2025

Iceberg expressions as of today IMHO cannot represent a bunch of the simple use cases, and none of the more complex ones. AFAIU it is not possible to:

  • replace a column value with a constant expression
  • credit-card masking (substr + string concatenation)
  • perform any kind of calculation or rounding or trimming or interval calculation (and more)

I would love to use a portable standard, but as of today (as of Jan 2022 actually) Iceberg Expressions are not yet suitable for this use case.

Until we have a way to express even simple, standard use cases, I strongly believe we have to support SQL. Which means that requiring Iceberg expressions, as proposed here, is IMHO not a viable option to satisfy the needs of our users today.

As a side note I also believe that the "front end" use case (provide protection instructions) should drive the "CRUD" parts, not the other way around.

It is completely fine when users only rely on Iceberg expressions, but it would IMHO be quite a mistake to intentionally make it impossible for users to have a much more flexible option.

I also do not think that UDFs are the answer to everything, especially since Iceberg UDFs are neither concrete nor even implemented. But even if Iceberg UDFs were there, you would still have to deal with data types, casting, and null handling. None of these three aspects is currently covered by Iceberg expressions.
