
Conversation

anujmodi2021 (Contributor)

This is the first PR in a series of work under the parent Jira HADOOP-19596 to improve the performance of sequential reads in the ABFS driver.
Please refer to the parent Jira for more details.

Description of PR

Jira: https://issues.apache.org/jira/browse/HADOOP-19613

The Read Buffer Manager used today was introduced long ago and has been stable for quite a while.
The Read Buffer Manager to be introduced as part of HADOOP-19596 will bring in many changes incrementally over time. While that development goes on, and until we are able to fully stabilise the optimized version, the current flow needs to remain functional and undisturbed.

This work item isolates the current code from the new code by refactoring the ReadBufferManager class into two implementations with the same public interface: ReadBufferManagerV1 and ReadBufferManagerV2.

This will also introduce new configs that can be used to toggle between the new and old code.
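
For illustration, a minimal sketch of the class shape this refactor aims for; the method names below are hypothetical placeholders, not the exact ABFS signatures:

// Sketch only: one abstract base class exposing a shared public interface, with the
// stable V1 flow and the skeleton V2 flow as interchangeable implementations.
// Method names are hypothetical; the real signatures live in the PR itself.
public abstract class ReadBufferManagerSketch {

  /** Queue a read-ahead request; shared entry point for both versions. */
  public abstract void queueReadAhead(String path, long offset, int length);

  /** V1: carries the existing, long-stable read-ahead behaviour unchanged. */
  static final class V1 extends ReadBufferManagerSketch {
    @Override
    public void queueReadAhead(String path, long offset, int length) {
      // existing logic moves here as-is
    }
  }

  /** V2: skeleton with TODOs, to be filled in incrementally by follow-up PRs. */
  static final class V2 extends ReadBufferManagerSketch {
    @Override
    public void queueReadAhead(String path, long offset, int length) {
      // TODO: optimized sequential-read logic (HADOOP-19596)
    }
  }
}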

How was this patch tested?

Existing tests were modified to work with the refactored classes.
More tests will be added in upcoming PRs where the new implementation is introduced.
Test suite results are added below.

@anujmodi2021 anujmodi2021 marked this pull request as ready for review July 14, 2025 06:11
@anujmodi2021 (Contributor Author)

============================================================
HNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 818, Failures: 0, Errors: 0, Skipped: 165
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 821, Failures: 0, Errors: 0, Skipped: 117
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 10

============================================================
NonHNS-SharedKey-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 660, Failures: 0, Errors: 0, Skipped: 223
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 11
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 11

============================================================
AppendBlob-HNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 818, Failures: 0, Errors: 0, Skipped: 176
[WARNING] Tests run: 133, Failures: 0, Errors: 0, Skipped: 11
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-SharedKey-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 664, Failures: 0, Errors: 0, Skipped: 134
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 5
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 11

============================================================
NonHNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 657, Failures: 0, Errors: 0, Skipped: 225
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 11
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
NonHNS-OAuth-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 661, Failures: 0, Errors: 0, Skipped: 147
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 5
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
AppendBlob-NonHNS-OAuth-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 659, Failures: 0, Errors: 0, Skipped: 165
[WARNING] Tests run: 133, Failures: 0, Errors: 0, Skipped: 6
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
HNS-Oauth-DFS-IngressBlob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 692, Failures: 0, Errors: 0, Skipped: 172
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-OAuth-DFS-IngressBlob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 657, Failures: 0, Errors: 0, Skipped: 223
[WARNING] Tests run: 156, Failures: 0, Errors: 0, Skipped: 11
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

@anujmodi2021 anujmodi2021 self-assigned this Jul 15, 2025

@anujmodi2021 anujmodi2021 requested a review from Copilot July 23, 2025 11:08
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR refactors the ReadBufferManager to isolate the existing implementation from new code being developed for performance improvements. It introduces ReadBufferManagerV1 (current implementation) and ReadBufferManagerV2 (future implementation with skeleton methods) to allow independent development while maintaining backwards compatibility.

  • Refactors ReadBufferManager into an abstract base class with V1 and V2 implementations
  • Adds configuration options for toggling between V1 and V2 implementations
  • Updates all existing tests to use the refactored class structure

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:
ReadBufferManager.java: Converted to abstract base class with common interface and utilities
ReadBufferManagerV1.java: Complete implementation of existing ReadBufferManager functionality
ReadBufferManagerV2.java: Skeleton implementation with TODO placeholders for future development
ReadBufferWorker.java: Updated to accept a ReadBufferManager instance instead of using the singleton
AbfsConfiguration.java: Added V2-specific configuration properties and getters
ConfigurationKeys.java: Added configuration keys for the ReadAhead V2 feature
FileSystemConfigurations.java: Added default values for V2 configuration
AbfsInputStreamContext.java: Added V2 enabled flag and corresponding getter/setter
AbfsInputStream.java: Updated to use the appropriate ReadBufferManager version based on configuration
AzureBlobFileSystemStore.java: Added V2 enabled flag to input stream context
TestAbfsInputStream.java: Updated all references to use ReadBufferManagerV1
ITestReadBufferManager.java: Updated to use ReadBufferManagerV1 and corrected class references


private static final int ONE_MB = ONE_KB * ONE_KB;

private static int thresholdAgeMilliseconds;
private static int blockSize = 4 * ONE_MB; // default block size for read-ahead in bytes
Contributor:

This should also be configurable, right? Or it should come from the configuration class as the default value.

Contributor Author:

This is already configurable; 4 MB is just the default value.

Contributor:

Right, but since we are refactoring, can we make this also come from the configuration class?
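
For illustration, a hedged sketch of what sourcing the default from configuration could look like; the wiring and names below are assumptions, not the PR's exact API:

// Sketch only: block size comes from configuration at init time instead of a
// hard-coded field value, with 4 MB kept as the fallback default. Names are illustrative.
class ReadAheadBlockSizeSketch {
  private static final int ONE_MB = 1024 * 1024;
  private static final int DEFAULT_READAHEAD_BLOCK_SIZE = 4 * ONE_MB;

  private static int blockSize = DEFAULT_READAHEAD_BLOCK_SIZE;

  // Called once while setting up the ReadBufferManager; the caller would pass the
  // value it read from AbfsConfiguration (hypothetical wiring).
  static void setReadBufferManagerConfigs(int configuredReadAheadBlockSize) {
    blockSize = configuredReadAheadBlockSize;
  }

  static int getBlockSize() {
    return blockSize;
  }
}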

private byte[][] buffers; // array of byte[] buffers, to hold the data that is read
private Stack<Integer> freeList = new Stack<>(); // indices in buffers[] array that are available
private byte[][] buffers;
private static ReadBufferManagerV1 bufferManager;
Contributor:

Since this is a singleton instance, should it be declared as volatile?

Contributor Author:

Not needed, I think. Volatile is needed if someone is updating the value, but here it is initialized only once and a single static copy of the variable is shared among all streams.
Also, it has always been like this.

Contributor:

volatile is needed in double-checked locking to prevent instruction reordering and to ensure visibility across threads. Without it, a thread may see a partially constructed singleton instance. This was a new learning for me; it can be added or kept as is.
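
For reference, a minimal sketch of the double-checked locking idiom described above; the volatile field is what stops a reader from observing a half-constructed instance:

// Sketch of double-checked locking: volatile guarantees the write to "instance"
// is not reordered with the constructor, so other threads never see a partially
// constructed object.
final class SingletonSketch {
  private static volatile SingletonSketch instance;

  private SingletonSketch() {
  }

  static SingletonSketch getInstance() {
    SingletonSketch local = instance;          // first, unsynchronized check
    if (local == null) {
      synchronized (SingletonSketch.class) {
        local = instance;                      // second check under the lock
        if (local == null) {
          instance = local = new SingletonSketch();
        }
      }
    }
    return local;
  }
}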

String testAccountName = "testAccount.dfs.core.windows.net";
String defaultUri = this.getTestUrl().replace(this.getAccountName(), testAccountName);
String defaultUri = getRawConfiguration().get(FS_DEFAULT_NAME_KEY).
replace("blob.core.windows.net","dfs.core.windows.net");
Contributor:

Should we use our available ".blob." constants here?

Contributor Author:

Yes. Will take it up.

Contributor Author:

Taken

@IntegerConfigurationValidatorAnnotation(ConfigurationKey =
FS_AZURE_READAHEAD_V2_CACHED_BUFFER_TTL_MILLISECONDS,
DefaultValue = DEFAULT_READAHEAD_V2_CACHED_BUFFER_TTL_MILLISECONDS)
private int readAheadV2CachedBufferTTLMilliseconds;
Contributor:

Can we keep the attribute naming format consistent? The attribute above ends with TTLInMilliSeconds, while this one has TTLMilliseconds.

Contributor:

Also, if possible we can shorten the attribute name, e.g. readAheadV2CachedBufferTTLInMillis.

Contributor Author:

Great suggestion. Taken

return readAheadExecutorServiceTTLInMilliSeconds;
}

public int getReadAheadV2CachedBufferTTLMilliseconds() {
Contributor:

Same as above; the method name can follow the same naming format everywhere.

Contributor Author:

Good suggestion. Taken

/**
* TTL in milliseconds for the idle threads in executor service used by read ahead v2.
*/
public static final String FS_AZURE_READAHEAD_V2_EXECUTOR_SERVICE_TTL_MILLISECONDS = "fs.azure.readahead.v2.executor.service.ttl.seconds";
Contributor:

"fs.azure.readahead.v2.executor.service.ttl.seconds" -> "fs.azure.readahead.v2.executor.service.ttl.milliseconds"

Contributor Author:

Good catch. Fixed it.

ReadBufferManagerV2.setReadBufferManagerConfigs(
readAheadBlockSize, client.getAbfsConfiguration());
readBufferManager = ReadBufferManagerV2.getBufferManager();
} else {
Contributor:

Should this be under else if (readAheadEnabled) instead of else?

Contributor Author:

We have always had the ReadBufferManager initialised.
Wanted to retain that to avoid an NPE from any unexplored usage.
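
A hedged sketch of the selection being discussed; the flag and the static init/getter signatures approximate the PR for illustration only. V2 is used when its config is on, and a V1 manager is still initialised otherwise so no caller ever sees a null manager:

// Sketch only: choose the ReadBufferManager implementation from configuration.
// isReadAheadV2Enabled and the setReadBufferManagerConfigs/getBufferManager
// signatures are approximations, not quoted from the PR.
static ReadBufferManager pickReadBufferManager(boolean isReadAheadV2Enabled,
    int readAheadBlockSize, AbfsConfiguration abfsConfiguration) {
  if (isReadAheadV2Enabled) {
    ReadBufferManagerV2.setReadBufferManagerConfigs(readAheadBlockSize, abfsConfiguration);
    return ReadBufferManagerV2.getBufferManager();
  }
  // Plain else: V1 stays initialised even if read-ahead is disabled, preserving the
  // long-standing behaviour of always having a manager and avoiding NPEs elsewhere.
  ReadBufferManagerV1.setReadBufferManagerConfigs(readAheadBlockSize);
  return ReadBufferManagerV1.getBufferManager();
}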

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 22m 2s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 51s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 38s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 33s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 12s trunk passed
+1 💚 shadedclient 41m 15s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 41m 38s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 23s /patch-mvninstall-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ compile 0m 24s /patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 0m 24s /patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 0m 23s /patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 0m 23s /patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 10 new + 3 unchanged - 0 fixed = 13 total (was 3)
-1 ❌ mvnsite 0m 24s /patch-mvnsite-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
-1 ❌ spotbugs 0m 23s /patch-spotbugs-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 shadedclient 45m 25s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 27s /patch-unit-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
160m 48s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/11/artifact/out/Dockerfile
GITHUB PR #7801
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ef87c3fbf779 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e9ad12a
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/11/testReport/
Max. process+thread count 570 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/11/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 14m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 27s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 39s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 43s trunk passed
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 35s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 35m 42s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 36m 4s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 33s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 9 new + 3 unchanged - 0 fixed = 12 total (was 3)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 35m 18s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 3m 4s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
138m 50s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/12/artifact/out/Dockerfile
GITHUB PR #7801
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 16dbb289dfab 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 92dfbda
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/12/testReport/
Max. process+thread count 546 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7801/12/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021 anujmodi2021 merged commit 9d5e111 into apache:trunk Jul 28, 2025
4 checks passed
anujmodi2021 added a commit to ABFSDriver/AbfsHadoop that referenced this pull request Aug 8, 2025