@chinyeungli
Contributor

  • Ensure extra_data is populated in "find_jvm_packages()" for all resources for mapping purposes
  • Adjust test logic to update extra_data after map_checksum, matching d2d pipeline sequence

Signed-off-by: Chin Yeung Li [email protected]

@chinyeungli chinyeungli requested a review from TG1999 October 28, 2025 23:01
- resources = (
-     project.codebaseresources.files().no_status().from_codebase().has_no_relation()
- )
+ resources = project.codebaseresources.files().no_status().from_codebase()
Contributor

What issue or problem does this change solve?

Contributor Author

This is for #1854.
This update addresses a scenario where both the development and deployment codebases contain the same .java file, but the deployed version also includes the corresponding .class file.

I think the tool relies on the "extra_data" field, such as extra_data={"java_package": "org.apache.flume.node"} (https://github.com/aboutcode-org/scancode.io/blob/main/scanpipe/tests/pipes/test_d2d.py#L402), to map .class files to .java files. (I may be wrong, but it seems this field is needed to trigger the "path" mapping in the test.)

However, if a .java file in the development codebase has already been checksum-matched to its counterpart in the deployed codebase, it won’t be indexed again, so it won’t receive extra_data and won’t be available for mapping.

Since "extra_data" is generated in find_jvm_packages(), this change ensures that all source files in the from_codebase are indexed, even if they’ve already been matched by checksum.
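
To make the scenario above concrete, here is a minimal, self-contained sketch in plain Python. It is a toy model and not the scanpipe code: the `Resource` class, its fields, and the `map_checksum`, `index_java_packages`, and `map_class_to_java` helpers are all hypothetical stand-ins, and the `skip_related` flag only imitates the effect of filtering with `has_no_relation()`.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a codebase resource (not the scanpipe model)."""
    path: str
    tag: str            # "from" (development) or "to" (deployed)
    sha1: str
    package: str = ""   # package declared in the source file, if any
    related: bool = False
    extra_data: dict = field(default_factory=dict)


def map_checksum(resources):
    """Mark from/to resources that share a checksum as related."""
    to_by_sha1 = {r.sha1: r for r in resources if r.tag == "to"}
    for r in resources:
        if r.tag == "from" and r.sha1 in to_by_sha1:
            r.related = True
            to_by_sha1[r.sha1].related = True


def index_java_packages(resources, skip_related):
    """Store the package in extra_data; optionally skip related resources."""
    for r in resources:
        if r.tag != "from" or not r.path.endswith(".java"):
            continue
        if skip_related and r.related:
            continue  # imitates the effect of filtering with has_no_relation()
        r.extra_data["java_package"] = r.package


def map_class_to_java(resources):
    """Pair each deployed .class file with an indexed .java source."""
    indexed = {}
    for r in resources:
        package = r.extra_data.get("java_package")
        if r.tag == "from" and package:
            name = posixpath.splitext(posixpath.basename(r.path))[0]
            indexed[f"{package}.{name}"] = r
    matches = []
    for r in resources:
        if r.tag == "to" and r.path.endswith(".class"):
            relative = r.path[len("to/"):]
            qualified = posixpath.splitext(relative)[0].replace("/", ".")
            if qualified in indexed:
                matches.append((r.path, indexed[qualified].path))
    return matches


def run(skip_related):
    """Run the toy sequence: checksum mapping first, then package indexing."""
    resources = [
        Resource("from/src/org/apache/flume/node/Application.java", "from",
                 "abc", package="org.apache.flume.node"),
        Resource("to/org/apache/flume/node/Application.java", "to", "abc"),
        Resource("to/org/apache/flume/node/Application.class", "to", "def"),
    ]
    map_checksum(resources)
    index_java_packages(resources, skip_related)
    return map_class_to_java(resources)
```

In this model, `run(skip_related=True)` finds no match for the .class file because the only .java source was already checksum-matched and skipped during indexing, while `run(skip_related=False)` pairs the .class file with its source.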

Contributor

Makes sense to me, @tdruez please have a look as well. Thanks!

@chinyeungli chinyeungli requested a review from tdruez October 29, 2025 13:46
@tdruez
Contributor

tdruez commented Oct 29, 2025

@chinyeungli we need unit tests related to those changes.
