7 changes: 7 additions & 0 deletions common/utils/src/main/resources/error/README.md
@@ -36,6 +36,13 @@ The terms error class, state, and condition come from the SQL standard.
* Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
* Error sub-condition: `UNSUPPORTED_DIRECTION`

### Optional subconditions

Subconditions are **optional**: you may raise or check an error at the main condition level even when the condition has subconditions in `error-conditions.json`.

* **Raising:** You may throw using only the main condition (e.g. `CANNOT_LOAD_STATE_STORE`) when that condition has a `subClass` in the JSON. In that case you do not provide any subcondition name or subcondition-specific message parameters; the main condition's message template is used.
* **Checking (tests):** In tests, `checkError` (Scala) and `check_error` (PySpark) match by condition by default: passing the main condition matches both the main condition and any of its subconditions, and you do not have to assert subcondition parameters. This allows evolving a condition from having no subconditions to having subconditions without breaking existing tests. When a test must assert the exact subcondition and exact parameters, use the strict option (`matchExactConditionAndParameters` in Scala, `match_exact_condition_and_parameters` in PySpark).
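The default matching rule described above can be sketched as a small predicate (an illustrative sketch only; `condition_matches` and its argument names are hypothetical, not part of the Spark test helpers):

```python
def condition_matches(actual: str, expected: str, exact: bool = False) -> bool:
    """Sketch of the default checkError/check_error condition matching.

    A main condition also matches any of its subconditions via a prefix
    match on "EXPECTED."; exact=True disables the prefix match.
    """
    if actual == expected:
        return True
    if exact:
        return False
    # CONDITION matches CONDITION.SUBCONDITION, but not CONDITION_OTHER.
    return bool(expected) and actual.startswith(expected + ".")
```

Under these semantics, expecting `CANNOT_LOAD_STATE_STORE` matches an exception raised with `CANNOT_LOAD_STATE_STORE.UNCATEGORIZED`, but not when the strict option is set.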

### Inconsistent Use of the Term "Error Class"

Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.
@@ -111,8 +111,8 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
val errorInfo = errorInfoMap.getOrElse(
mainErrorClass,
throw SparkException.internalError(s"Cannot find main error class '$errorClass'"))
assert(errorInfo.subClass.isDefined == subErrorClass.isDefined)
Contributor:
While I understand the motivation, I think this might be too lenient if left just like this. It would be nice to have an opt-in mechanism, so that we have some indication that this is the desired behavior for a specific error condition. Otherwise we are very silently converting all error conditions to a new behavior; over time this might lead to subclasses being effectively unused, since the path of least resistance is always the main condition.

I think it's not too complex to mitigate though. We can make an opt-in mechanism in error-conditions.json, something like:

"TABLE_OR_VIEW_NOT_FOUND": {
  "message": ["..."],
  "subClassOptional": true,
  "subClass": {
    "PATH": { "message": ["..."] }
  }
}

And then use subClassOptional in ErrorClassesJsonReader to relax the condition when the value is set to true.

Contributor (author):
I'm open to that. Let's see what others think. Note that this is NOT what we do for SQL Scripting handlers, though.

Contributor:
I'm not sure how this is directly related to handlers? Here we are changing how errors are raised, not handled. Regarding the handler-finding mechanism, you already explained that it would handle the new behavior, right?

If you are talking about not having SIGNAL in SQL Scripting, that's a gap that needs to be addressed. Once we add SIGNAL, I guess it should follow the same behavior as raise_error does.

But that's a separate matter - is there any special behavior that we should be aware of when supporting SIGNAL?

Contributor:
"subClassOptional": true looks wrong to me. We should always allow the error raising side to skip the sub error condition when not needed (we should still check error parameters to be an exact match to avoid mistakes).

While for tests, we should still do an exact match by default (if the error was thrown with a sub error condition, checkError should use MAIN.SUB; if the error was thrown without a sub error condition, checkError should use MAIN). I'm OK to special-case some widely tested errors like TABLE_OR_VIEW_NOT_FOUND: when it starts to have sub error conditions, we can avoid touching many tests. So checkError should have a new "strict" flag, which by default is true (except for the special cases). Tests can override it if needed, and it's a test behavior, not a property of the error (allowing an optional sub error condition or not).
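The parameter-checking split discussed in this thread, subset matching when a test passes only the main condition versus full equality otherwise, can be sketched as follows (a sketch with a hypothetical helper name, not the actual test API):

```python
def params_match(actual: dict, expected: dict, prefix_match: bool) -> bool:
    """Subset check when matching by main condition only; full equality otherwise."""
    if prefix_match:
        # Each expected parameter must be present with the same value; extra
        # actual parameters (e.g. subcondition-specific ones) are allowed.
        return all(actual.get(k) == v for k, v in expected.items())
    return actual == expected
```

The subset branch is what lets a test written against `MAIN` keep passing after `MAIN.SUB` gains additional message parameters.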

Contributor (author):
I can't do that. If I need to change code to suspend the proper check, I may as well change the code to do the proper check.

Contributor (author):
I do agree that I should always be able to raise an error without a subcondition. An explicit default should not be necessary.

Contributor (author) @srielau, Mar 18, 2026:
FWIW, this is what I'm trying to avoid:
(screenshot attached, 2026-03-18)

So the idea is to have TABLE_OR_VIEW_NOT_FOUND.WITH_PATH and use it where it's worth testing (or roll it out in chunks). But I cannot break existing users who just test for TABLE_OR_VIEW_NOT_FOUND. That's a breaking change.


// When main-only (no subcondition), use main message template even if the condition has
// subconditions in JSON.
if (subErrorClass.isEmpty) {
errorInfo.messageTemplate
} else {
66 changes: 51 additions & 15 deletions core/src/test/scala/org/apache/spark/SparkTestSuite.scala
@@ -276,38 +276,74 @@ trait SparkTestSuite

/**
* Checks an exception with an error condition against expected results.
* By default, the condition matches if the exception's condition equals the expected
* condition or is one of its subconditions (e.g. CONDITION.SUBCONDITION when expecting
* CONDITION). When passing only the main condition, parameters are checked as a subset
* rather than for full equality. Use matchExactConditionAndParameters when a test must
* require the exact condition (no subcondition) and exact parameters.
* @param exception The exception to check
* @param condition The expected error condition identifying the error
* @param sqlState Optional the expected SQLSTATE, not verified if not supplied
* @param parameters A map of parameter names and values. The names are as defined
* in the error-classes file.
* @param matchPVals Optionally treat the parameters value as regular expression pattern.
* false if not supplied.
* @param matchExactConditionAndParameters When true, require exact condition match (no prefix
* match) and full parameter equality. When false (default), the condition
* may be the main condition or a subcondition; the parameter check is a subset check
* under prefix match.
*/
protected def checkError(
exception: SparkThrowable,
condition: String,
sqlState: Option[String] = None,
parameters: Map[String, String] = Map.empty,
matchPVals: Boolean = false,
queryContext: Array[ExpectedContext] = Array.empty): Unit = {
assert(exception.getCondition === condition)
queryContext: Array[ExpectedContext] = Array.empty,
matchExactConditionAndParameters: Boolean = false): Unit = {
val actualCondition = exception.getCondition
val conditionMatches = if (matchExactConditionAndParameters) {
actualCondition === condition
} else {
actualCondition === condition || (condition.nonEmpty && actualCondition != null &&
actualCondition.startsWith(condition + "."))
}
assert(conditionMatches,
s"Expected condition '$condition' (matchExact=$matchExactConditionAndParameters), " +
s"got '$actualCondition'")
sqlState.foreach(state => assert(exception.getSqlState === state))
val expectedParameters = exception.getMessageParameters.asScala
if (matchPVals) {
assert(expectedParameters.size === parameters.size)
expectedParameters.foreach(
exp => {
val parm = parameters.getOrElse(exp._1,
throw new IllegalArgumentException("Missing parameter" + exp._1))
if (!exp._2.matches(parm)) {
throw new IllegalArgumentException("For parameter '" + exp._1 + "' value '" + exp._2 +
"' does not match: " + parm)
}
val expectedParameters = exception.getMessageParameters.asScala.toMap
val isPrefixMatch = !matchExactConditionAndParameters && actualCondition != null &&
actualCondition.startsWith(condition + ".")
if (isPrefixMatch) {
// When matching by main condition only, only require that passed parameters match (subset).
parameters.foreach { case (key, value) =>
assert(expectedParameters.contains(key),
s"Expected parameter '$key' not found in: $expectedParameters")
if (matchPVals) {
assert(value != null && expectedParameters(key).matches(value),
s"For parameter '$key' value '${expectedParameters(key)}' does not match: $value")
} else {
assert(expectedParameters(key) === value,
s"Expected parameter '$key' = '$value', got '${expectedParameters(key)}'")
}
)
}
} else {
assert(expectedParameters === parameters)
// Exact condition match: full parameter equality as before.
if (matchPVals) {
assert(expectedParameters.size === parameters.size)
expectedParameters.foreach(
exp => {
val parm = parameters.getOrElse(exp._1,
throw new IllegalArgumentException("Missing parameter" + exp._1))
if (!exp._2.matches(parm)) {
throw new IllegalArgumentException("For parameter '" + exp._1 + "' value '" + exp._2 +
"' does not match: " + parm)
}
}
)
} else {
assert(expectedParameters === parameters)
}
}
val actualQueryContext = exception.getQueryContext()
assert(actualQueryContext.length === queryContext.length, "Invalid length of the query context")
47 changes: 47 additions & 0 deletions core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -200,6 +200,53 @@ class SparkThrowableSuite extends SparkFunSuite {
assert(e.getMessageParameters().get("message").contains("Undefined error message parameter"))
}

test("SPARK-56029: Optional subconditions - main-only condition when JSON has subClass") {
// CANNOT_LOAD_STATE_STORE has subconditions in JSON; we can use main-only (no subcondition).
val mainOnlyTemplate = errorReader.getMessageTemplate("CANNOT_LOAD_STATE_STORE")
assert(mainOnlyTemplate === "An error occurred during loading state.")
val mainOnlyParams = errorReader.getMessageParameters("CANNOT_LOAD_STATE_STORE")
assert(mainOnlyParams.isEmpty)
// Raising with main-only and empty params should work.
val ex = new SparkRuntimeException("CANNOT_LOAD_STATE_STORE", Map.empty[String, String])
assert(ex.getCondition === "CANNOT_LOAD_STATE_STORE")
assert(ex.getMessage.startsWith(
"[CANNOT_LOAD_STATE_STORE] An error occurred during loading state."))
assert(ex.getMessageParameters.asScala.isEmpty)
}

test("SPARK-56029: checkError default matches condition or subcondition (prefix match)") {
val exWithSubcondition = new SparkRuntimeException(
"CANNOT_LOAD_STATE_STORE.UNCATEGORIZED",
Map.empty[String, String])
checkError(exWithSubcondition, "CANNOT_LOAD_STATE_STORE", parameters = Map.empty)
checkError(exWithSubcondition, "CANNOT_LOAD_STATE_STORE.UNCATEGORIZED", parameters = Map.empty)
}

test("SPARK-56029: checkError with matchExactConditionAndParameters requires exact condition") {
val exWithSubcondition = new SparkRuntimeException(
"CANNOT_LOAD_STATE_STORE.UNCATEGORIZED",
Map.empty[String, String])
checkError(
exWithSubcondition,
"CANNOT_LOAD_STATE_STORE.UNCATEGORIZED",
parameters = Map.empty,
matchExactConditionAndParameters = true)
val exMainOnly = new SparkRuntimeException("CANNOT_LOAD_STATE_STORE", Map.empty[String, String])
checkError(
exMainOnly,
"CANNOT_LOAD_STATE_STORE",
parameters = Map.empty,
matchExactConditionAndParameters = true)
val ex = intercept[org.scalatest.exceptions.TestFailedException] {
checkError(
exWithSubcondition,
"CANNOT_LOAD_STATE_STORE",
parameters = Map.empty,
matchExactConditionAndParameters = true)
}
assert(ex.getMessage.contains("matchExact"))
}

test("Error message is formatted") {
assert(
getMessage(
47 changes: 47 additions & 0 deletions python/pyspark/sql/tests/test_utils.py
@@ -24,6 +24,7 @@
AnalysisException,
ParseException,
PySparkAssertionError,
PySparkException,
PySparkValueError,
IllegalArgumentException,
SparkUpgradeException,
@@ -1797,6 +1798,52 @@ def test_assert_schema_equal_with_decimal_types(self):
with self.assertRaises(PySparkAssertionError):
assertSchemaEqual(s1, s2)

def test_check_error_optional_subconditions_spark_56029(self):
"""SPARK-56029: check_error default matches condition or subcondition (prefix match)."""
# Use an error class that exists in Python error-conditions.json and has sub_class
params = {"method": "setRuntimeConf"}
ex = PySparkException(
errorClass="SESSION_MUTATION_IN_DECLARATIVE_PIPELINE.SET_RUNTIME_CONF",
messageParameters=params,
)
self.check_error(ex, "SESSION_MUTATION_IN_DECLARATIVE_PIPELINE", messageParameters=params)
self.check_error(
ex,
"SESSION_MUTATION_IN_DECLARATIVE_PIPELINE.SET_RUNTIME_CONF",
messageParameters=params,
)

def test_check_error_match_exact_condition_spark_56029(self):
"""SPARK-56029: check_error with match_exact_condition_and_parameters."""
params = {"method": "setRuntimeConf"}
ex_sub = PySparkException(
errorClass="SESSION_MUTATION_IN_DECLARATIVE_PIPELINE.SET_RUNTIME_CONF",
messageParameters=params,
)
self.check_error(
ex_sub,
"SESSION_MUTATION_IN_DECLARATIVE_PIPELINE.SET_RUNTIME_CONF",
messageParameters=params,
match_exact_condition_and_parameters=True,
)
ex_main = PySparkException(
errorClass="SESSION_MUTATION_IN_DECLARATIVE_PIPELINE",
messageParameters=params,
)
self.check_error(
ex_main,
"SESSION_MUTATION_IN_DECLARATIVE_PIPELINE",
messageParameters=params,
match_exact_condition_and_parameters=True,
)
with self.assertRaises(AssertionError):
self.check_error(
ex_sub,
"SESSION_MUTATION_IN_DECLARATIVE_PIPELINE",
messageParameters=params,
match_exact_condition_and_parameters=True,
)


class UtilsTests(ReusedSQLTestCase, UtilsTestsMixin):
pass
105 changes: 76 additions & 29 deletions python/pyspark/testing/utils.py
@@ -414,7 +414,17 @@ def check_error(
query_context_type: Optional[QueryContextType] = None,
fragment: Optional[str] = None,
matchPVals: bool = False,
match_exact_condition_and_parameters: bool = False,
):
"""
Check that the exception has the expected error condition and (optionally) parameters.

By default, the condition matches if the exception's condition equals the expected
condition or is one of its subconditions (e.g. CONDITION.SUBCONDITION when expecting
CONDITION). When passing only the main condition, message parameters are checked as
a subset rather than for full equality. Use match_exact_condition_and_parameters=True
when a test must require the exact condition (no subcondition) and exact parameters.
"""
query_context = exception.getQueryContext()
assert bool(query_context) == (query_context_type is not None), (
"`query_context_type` is required when QueryContext exists. "
@@ -427,40 +437,77 @@ def check_error(
f"checkError requires 'PySparkException', got '{exception.__class__.__name__}'.",
)

# Test error class
# Test error class (exact or prefix match by default)
expected = errorClass
actual = exception.getCondition()
self.assertEqual(
expected, actual, f"Expected error class was '{expected}', got '{actual}'."
actual_condition = exception.getCondition() or ""
if match_exact_condition_and_parameters:
condition_matches = actual_condition == expected
else:
condition_matches = actual_condition == expected or (
expected and actual_condition.startswith(expected + ".")
)
self.assertTrue(
condition_matches,
f"Expected error class was '{expected}' "
f"(match_exact={match_exact_condition_and_parameters}), got '{actual_condition}'.",
)

# Test message parameters
expected = messageParameters
actual = exception.getMessageParameters()
if matchPVals:
self.assertEqual(
len(expected),
len(actual),
"Expected message parameters count does not match actual message parameters count"
f": {len(expected)}, {len(actual)}.",
)
for key, value in expected.items():
self.assertIn(
key,
actual,
f"Expected message parameter key '{key}' was not found "
"in actual message parameters.",
)
self.assertRegex(
actual[key],
value,
f"Expected message parameter value '{value}' does not match actual message "
f"parameter value '{actual[key]}'.",
),
actual_params = exception.getMessageParameters() or {}
is_prefix_match = not match_exact_condition_and_parameters and actual_condition.startswith(
expected + "."
)
if is_prefix_match:
# When matching by main condition only, only require that passed parameters match.
if messageParameters:
for key, value in messageParameters.items():
self.assertIn(
key,
actual_params,
f"Expected message parameter key '{key}' not found in {actual_params}",
)
if matchPVals:
self.assertRegex(
actual_params[key],
value,
f"Parameter '{key}' value '{actual_params[key]}' "
f"does not match pattern '{value}'",
)
else:
self.assertEqual(
actual_params[key],
value,
f"Parameter '{key}': expected '{value}', "
f"got '{actual_params[key]}'",
)
else:
self.assertEqual(
expected, actual, f"Expected message parameters was '{expected}', got '{actual}'"
)
expected_params = messageParameters if messageParameters is not None else {}
if matchPVals:
self.assertEqual(
len(expected_params),
len(actual_params),
"Expected message parameters count does not match actual message "
f"parameters count: {len(expected_params)}, {len(actual_params)}.",
)
for key, value in expected_params.items():
self.assertIn(
key,
actual_params,
f"Expected message parameter key '{key}' was not found "
"in actual message parameters.",
)
self.assertRegex(
actual_params[key],
value,
f"Expected message parameter value '{value}' does not match "
f"actual message parameter value '{actual_params[key]}'.",
)
else:
self.assertEqual(
expected_params,
actual_params,
f"Expected message parameters was '{expected_params}', got '{actual_params}'",
)

# Test query context
if query_context: