Description
Describe the bug
When using Hive-style partitioned tables where partition values contain URL-encoded characters (like / encoded as %2F or spaces as %20), DataFusion returns the literal encoded string instead of the decoded value.
For example, given a file at:
s3://bucket/table/category=foo%2Fbar/file.parquet
The partition column category returns the literal value foo%2Fbar instead of the expected decoded value foo/bar.
Related Issues
This is a follow-up to #7877, which was partially addressed by #8012.
While #8012 fixed URL decoding for the table URL (ListingTableUrl::parse()), it did not decode the partition values that parse_partitions_for_path() extracts from the individual file paths.
To Reproduce
use datafusion::datasource::listing::helpers::parse_partitions_for_path;
use datafusion::datasource::listing::ListingTableUrl;
use object_store::path::Path;

#[test]
fn test_reproduce_partition_decoding_issue() {
    let table_url = ListingTableUrl::parse("s3://bucket/table").unwrap();
    // Path contains URL encoded slash %2F
    let file_path = Path::from("bucket/table/category=foo%2Fbar/file.parquet");
    let partitions = parse_partitions_for_path(&table_url, &file_path, vec!["category"]);
    // Current behavior: Some(["foo%2Fbar"])
    // Expected behavior: Some(["foo/bar"])
    assert_eq!(partitions, Some(vec!["foo/bar".to_string()]));
}
Expected behavior
Partition values should be URL-decoded, consistent with how ListingTableUrl handles URL-encoded paths. This matches the behavior of Apache Spark and Apache Hive.
Additional context
The fix involves updating parse_partitions_for_path in datafusion/catalog-listing/src/helpers.rs to use percent-encoding.
Because decoding creates a new string that can no longer borrow from the file path, the function's return type needs to change from Option<Vec<&str>> to Option<Vec<String>>.
This affects users storing data in Hive-partitioned layouts on object stores (S3/GCS/Azure), where percent-encoded special characters in paths are common.
Common examples:
category=Electronics%2FComputers → Electronics/Computers
city=San%20Francisco → San Francisco
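To illustrate the shape of the proposed change, here is a minimal std-only sketch. It is not DataFusion's code: `percent_decode`, `hex_val`, and `parse_partitions` are hypothetical helpers that work on plain string paths, and the hand-rolled decoder stands in for whatever the percent-encoding crate would do in the real fix. It mainly shows why the result has to be an owned `String` once values are decoded.

```rust
// Hypothetical, std-only sketch of decoding Hive partition values.
// None of these helpers exist in DataFusion; names are illustrative.

/// Convert one ASCII hex digit to its numeric value.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode `%XX` escapes. Malformed escapes are kept verbatim, and
/// invalid UTF-8 is replaced lossily (an assumption of this sketch).
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Extract and decode partition values from `file_path`, which must
/// start with `table_prefix` followed by `col=value` segments in order.
/// Returns owned `String`s because a decoded value is a new string and
/// cannot borrow from the path.
fn parse_partitions(
    table_prefix: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<Vec<String>> {
    let rest = file_path.strip_prefix(table_prefix)?;
    let mut segments = rest.split('/').filter(|s| !s.is_empty());
    let mut values = Vec::with_capacity(partition_cols.len());
    for col in partition_cols {
        let (name, value) = segments.next()?.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(percent_decode(value));
    }
    Some(values)
}
```

With this sketch, `parse_partitions("bucket/table", "bucket/table/category=foo%2Fbar/file.parquet", &["category"])` yields `Some(vec!["foo/bar".to_string()])`, and a path whose first segment does not match the expected column yields `None`. Always allocating a `String` is the simplest choice; a `Cow<str>` return type would avoid allocating for values with no escapes, at the cost of a more involved signature.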