-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19613. [ABFS][ReadAheadV2] Refactor ReadBufferManager to isolate new code with the current working code #7801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
============================================================
|
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
Show resolved
Hide resolved
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Outdated
Show resolved
Hide resolved
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBuffer.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferWorker.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManager.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the ReadBufferManager to isolate the existing implementation from new code being developed for performance improvements. It introduces ReadBufferManagerV1 (current implementation) and ReadBufferManagerV2 (future implementation with skeleton methods) to allow independent development while maintaining backwards compatibility.
- Refactors ReadBufferManager into an abstract base class with V1 and V2 implementations
- Adds configuration options for toggling between V1 and V2 implementations
- Updates all existing tests to use the refactored class structure
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
ReadBufferManager.java | Converted to abstract base class with common interface and utilities |
ReadBufferManagerV1.java | Complete implementation of existing ReadBufferManager functionality |
ReadBufferManagerV2.java | Skeleton implementation with TODO placeholders for future development |
ReadBufferWorker.java | Updated to accept ReadBufferManager instance instead of using singleton |
AbfsConfiguration.java | Added V2-specific configuration properties and getters |
ConfigurationKeys.java | Added configuration keys for ReadAhead V2 feature |
FileSystemConfigurations.java | Added default values for V2 configuration |
AbfsInputStreamContext.java | Added V2 enabled flag and corresponding getter/setter |
AbfsInputStream.java | Updated to use appropriate ReadBufferManager version based on configuration |
AzureBlobFileSystemStore.java | Added V2 enabled flag to input stream context |
TestAbfsInputStream.java | Updated all references to use ReadBufferManagerV1 |
ITestReadBufferManager.java | Updated to use ReadBufferManagerV1 and corrected class references |
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java
Outdated
Show resolved
Hide resolved
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Outdated
Show resolved
Hide resolved
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
private static final int ONE_MB = ONE_KB * ONE_KB; | ||
|
||
private static int thresholdAgeMilliseconds; | ||
private static int blockSize = 4 * ONE_MB; // default block size for read-ahead in bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should also be configurable right or should come from configuration class as default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already configurable. 4 MB is just default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right if we are refactoring, we can make this also come from the configuration class?
...s/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV1.java
Show resolved
Hide resolved
private byte[][] buffers; // array of byte[] buffers, to hold the data that is read | ||
private Stack<Integer> freeList = new Stack<>(); // indices in buffers[] array that are available | ||
private byte[][] buffers; | ||
private static ReadBufferManagerV1 bufferManager; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a singleton instance should be declared as volatile ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed I think. Volatile is needed if someone is updating the value, but here only once it is init and a static single copy of vairable is shared among all streams.
Also, this was like this always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
volatile is needed in double-checked locking to prevent instruction reordering and ensure visibility across threads. Without it, a thread may see a partially constructed Singleton instance. This is a new learning I had, can be added or kept as is.
String testAccountName = "testAccount.dfs.core.windows.net"; | ||
String defaultUri = this.getTestUrl().replace(this.getAccountName(), testAccountName); | ||
String defaultUri = getRawConfiguration().get(FS_DEFAULT_NAME_KEY). | ||
replace("blob.core.windows.net","dfs.core.windows.net"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use our available ".blob." constants here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Will take it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken
@IntegerConfigurationValidatorAnnotation(ConfigurationKey = | ||
FS_AZURE_READAHEAD_V2_CACHED_BUFFER_TTL_MILLISECONDS, | ||
DefaultValue = DEFAULT_READAHEAD_V2_CACHED_BUFFER_TTL_MILLISECONDS) | ||
private int readAheadV2CachedBufferTTLMilliseconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep attribute naming format constant? Above attribute name ends with TTLInMilliSeconds, this one has TTLMilliseconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if possible we can shorten the attribute name like readAheadV2CachedBufferTTLInMillis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion. Taken
return readAheadExecutorServiceTTLInMilliSeconds; | ||
} | ||
|
||
public int getReadAheadV2CachedBufferTTLMilliseconds() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, method name can follow same naming format every where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. Taken
/** | ||
* TTL in milliseconds for the idle threads in executor service used by read ahead v2. | ||
*/ | ||
public static final String FS_AZURE_READAHEAD_V2_EXECUTOR_SERVICE_TTL_MILLISECONDS = "fs.azure.readahead.v2.executor.service.ttl.seconds"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"fs.azure.readahead.v2.executor.service.ttl.seconds" -> "fs.azure.readahead.v2.executor.service.ttl.milliseconds"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catc. Fixed it
ReadBufferManagerV2.setReadBufferManagerConfigs( | ||
readAheadBlockSize, client.getAbfsConfiguration()); | ||
readBufferManager = ReadBufferManagerV2.getBufferManager(); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be under else if (readAheadEnabled) instead of else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always had RBM initlialised today.
Wanted to retain it to avoid any unexplored usage through NPE.
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
…te new code with the current working code (apache#7801)
This is the first PR in series of work done under Parent Jira: HADOOP-19596 to improve the performance of sequential reads in ABFS Driver.
Please refer to Parent JIRA for more details.
Description of PR
Jira: https://issues.apache.org/jira/browse/HADOOP-19613
Read Buffer Manager used today was introduced way back and has been stable for quite a while.
Read Buffer Manager to be introduced as part of HADOOP-19596 will introduce many changes incrementally over time. While the development goes on and we are able to fully stabilise the optimized version we need the current flow to be functional and undisturbed.
This work item is to isolate that from new code by refactoring ReadBufferManager class to have 2 different implementations with same public interfaces: ReadBufferManagerV1 and ReadBufferManagerV2.
This will also introduce new configs that can be used to toggle between new and old code.
How was this patch tested?
Existing tests were modified to work with the Refactored Classes.
More tests will be added with coming up PRs where new implementation will be introduced.
Test suite result added.