Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[da-vinci][common] DVC consumer for materialized view (batch only) #1466

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xunyin8
Copy link
Contributor

@xunyin8 xunyin8 commented Jan 22, 2025

[da-vinci][common] DVC consumer for materialized view (batch only)

Limiting the change to support batch only materialized view to keep the PR small. Hybrid DVC consumer support and related features such as heartbeat will be added in a separate PR

  1. New DaVinciClientFactory APIs for creating DVC for a given view.

  2. Defined a new "storeName" rule for views to be used for metrics reporting and shared DVC client. See VeniceView.getStoreAndViewName for details. Trying to reuse much of the existing DVC code and structure while providing the support for users to subscribe to the original store and different views of the same store.

  3. Introduced ReadOnlyMaterializedViewVersion and StoreViewBackend abstractions to minimize the code change needed for DVC to understand and work with views. ReadOnlyMaterializedViewVersion will provide view sepcific properties for a ReadOnlyVersion and StoreViewBackend will provide view specific properties for a StoreBackend.

There is some issue with chunking support on the read path. Despite writers are properly configured the read path would return null for materialized view if chunking is enabled. Still looking into it.

How was this PR tested?

Unit tests WIP
Added new integration test

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

Limiting the change to support batch only materialized view to keep the PR small.
Hybrid DVC consumer support and related features such as heartbeat  will be added in a separate PR

1. New DaVinciClientFactory APIs for creating DVC for a given view.

2. Defined a new "storeName" rule for views to be used for metrics reporting and shared DVC client.
See VeniceView.getStoreAndViewName for details. Trying to reuse much of the existing DVC code and
structure while providing the support for users to subscribe to the original store and different views
of the same store.

3. Introduced ReadOnlyMaterializedViewVersion and StoreViewBackend abstractions to minimize the
code change needed for DVC to understand and work with views. ReadOnlyMaterializedViewVersion
will provide view sepcific properties for a ReadOnlyVersion and StoreViewBackend will provide
view specific properties for a StoreBackend.
@xunyin8 xunyin8 force-pushed the MaterializedViewDVCConsumer branch from 42b48dd to e546936 Compare January 23, 2025 00:07
@gaojieliu
Copy link
Contributor

Hi @xunyin8
I have a big question about how the adapting layer looks like and I read your PR and I think you are trying to plugin the adapting layer in a more spread fashion where sometimes we use the original store name and some other times, we use the combined name by concatenating store name and view name.
There are two points I can think of:

  1. The code would become less maintainable as we need to make a decision at different places to use either the original store name or the concatenated name.
  2. In theory, this solution is not 100% safe as some store name can be identical to the concatenated name, and if the customer subscribes both, it will encounter weird issues.

Here is one idea in my mind:

  1. Make sure the concatenated name unique in Venice platform, and it means we need to guarantee the uniqueness among store names and concatenated view names.
  2. In DaVinci world, we can treat the concatenated view name as the store name, and we can build a new adapting layer in the common repositories, such as StoreRepository, SchemaRepository and they need to handle both view and store request. For example, for the passed param, it can detect whether it is a view or a store and if it is a view, it will find out the corresponding store name and take the corresponding actions.
    With this way, it will be very safe to avoid conflict and we only need to change very few places to make it work everywhere and the idea is similar to HelixReadOnlySchemaRepositoryAdapter, which can handle both system stores and user stores in a transparent way.
  3. Less error prone as the adapting logics are only in several important repositories, nowhere else (in theory).

If we build such capability, the change in DVRT or CDC will be fairly straightforward.

WDYT?

@xunyin8
Copy link
Contributor Author

xunyin8 commented Jan 26, 2025

Member

Hi @xunyin8 I have a big question about how the adapting layer looks like and I read your PR and I think you are trying to plugin the adapting layer in a more spread fashion where sometimes we use the original store name and some other times, we use the combined name by concatenating store name and view name. There are two points I can think of:

  1. The code would become less maintainable as we need to make a decision at different places to use either the original store name or the concatenated name.
  2. In theory, this solution is not 100% safe as some store name can be identical to the concatenated name, and if the customer subscribes both, it will encounter weird issues.

Here is one idea in my mind:

  1. Make sure the concatenated name unique in Venice platform, and it means we need to guarantee the uniqueness among store names and concatenated view names.
  2. In DaVinci world, we can treat the concatenated view name as the store name, and we can build a new adapting layer in the common repositories, such as StoreRepository, SchemaRepository and they need to handle both view and store request. For example, for the passed param, it can detect whether it is a view or a store and if it is a view, it will find out the corresponding store name and take the corresponding actions.
    With this way, it will be very safe to avoid conflict and we only need to change very few places to make it work everywhere and the idea is similar to HelixReadOnlySchemaRepositoryAdapter, which can handle both system stores and user stores in a transparent way.
  3. Less error prone as the adapting logics are only in several important repositories, nowhere else (in theory).

If we build such capability, the change in DVRT or CDC will be fairly straightforward.

WDYT?

This makes sense, I was hoping to be able to only keep the adapting layer in StoreBackend and ReadOnlyVersion. However, as you mentioned and probably noticed while reviewing the PR other components of DVC and ingestion path also requires differentiating a store view from a store hence the adapter keeps leaking. I think we can enforce (1) fairly easily during view creation and (2) could introduce more initial change but let me give it a try and it may be able to solve some of the known issue (e.g. chunking not working) cleanly since the suggested solution is more bullet proof.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants