Fix failures for EXPLAIN ANALYZE with CTEs #23824

Open
wants to merge 1 commit into master

infvg
Contributor

@infvg infvg commented Oct 14, 2024

Description

The EXPLAIN ANALYZE operator supports only one substage when returning its output.
For TEXT format, this change makes it loop through the substages and return
each substage's ID followed by its plan.
For JSON format, it returns a list of plans, one per substage.
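
As a rough illustration of the behavior described above, here is a minimal sketch. It is not the code in this PR: `SubStage`, `renderText`, and `renderJsonShape` are hypothetical names, and the rendered plans are stood in by plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplainAnalyzeSketch {
    // Hypothetical stand-in for one substage: its ID and its rendered fragments,
    // keyed by fragment ID. These are not Presto classes.
    static final class SubStage {
        final String stageId;
        final Map<String, String> fragmentPlans;

        SubStage(String stageId, Map<String, String> fragmentPlans) {
            this.stageId = stageId;
            this.fragmentPlans = fragmentPlans;
        }
    }

    // TEXT: for each substage, emit "Stage ID: <id>" and then its fragments.
    static String renderText(List<SubStage> subStages) {
        StringBuilder out = new StringBuilder();
        for (SubStage stage : subStages) {
            out.append("Stage ID: ").append(stage.stageId).append('\n');
            for (Map.Entry<String, String> fragment : stage.fragmentPlans.entrySet()) {
                out.append("Fragment ").append(fragment.getKey())
                        .append(' ').append(fragment.getValue()).append('\n');
            }
        }
        return out.toString();
    }

    // JSON: one map of fragment ID -> plan per substage, collected into a list.
    static List<Map<String, String>> renderJsonShape(List<SubStage> subStages) {
        List<Map<String, String>> plans = new ArrayList<>();
        for (SubStage stage : subStages) {
            plans.add(new LinkedHashMap<>(stage.fragmentPlans));
        }
        return plans;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("1", "[COORDINATOR_ONLY]");
        first.put("2", "[ROUND_ROBIN]");
        first.put("3", "[SINGLE]");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("4", "[hive]");
        List<SubStage> subStages = List.of(
                new SubStage("20241014_184402_00071_gx85r.1", first),
                new SubStage("20241014_184402_00071_gx85r.4", second));
        System.out.print(renderText(subStages));
        System.out.println(renderJsonShape(subStages));
    }
}
```

The two-substage input in `main` mirrors the shape of the sample output in this description: fragments 1 to 3 under the first stage ID and fragment 4 under the second.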

Motivation and Context

The output did not previously support multiple substages.
Resolves: #23798

Impact

Modifies the EXPLAIN ANALYZE output

New text output
 Stage ID: 20241014_184402_00071_gx85r.1                                                                                                                       
 Fragment 1 [COORDINATOR_ONLY]                                                                                                                                 
     CPU: 16.78ms, Scheduled: 70.36ms, Input: 6 rows (1.45kB); per task: avg.: 6.00 std.dev.: 0.00, Output: 1 row (9B), 1 tasks                                
     Output layout: [rows]                                                                                                                                     
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableCommit[PlanNodeId 388][Optional[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_tem>
             CPU: 14.00ms (2.52%), Scheduled: 55.00ms (3.66%), Output: 1 row (9B)                                                                              
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - RemoteSource[2] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                                    
                 CPU: 1.00ms (0.18%), Scheduled: 13.00ms (0.86%), Output: 6 rows (1.45kB)                                                                      
                 Input avg.: 6.00 rows, Input std.dev.: 0.00%                                                                                                  
                                                                                                                                                               
 Fragment 2 [ROUND_ROBIN]                                                                                                                                      
     CPU: 543.08ms, Scheduled: 1.48s, Input: 3 rows (15B); per task: avg.: 0.75 std.dev.: 1.30, Output: 6 rows (1.45kB), 4 tasks                               
     Output layout: [rows_6, fragments, commitcontext]                                                                                                         
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableWriterMerge[PlanNodeId 447] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                       
             CPU: 8.00ms (1.44%), Scheduled: 95.00ms (6.32%), Output: 6 rows (1.45kB)                                                                          
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - LocalExchange[PlanNodeId 446][SINGLE] () => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                          
                 CPU: 1.00ms (0.18%), Scheduled: 31.00ms (2.06%), Output: 9 rows (1.87kB)                                                                      
                 Input avg.: 1.13 rows, Input std.dev.: 29.40%                                                                                                 
             - TableWriter[PlanNodeId 389] => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                                   
                     CPU: 525.00ms (94.59%), Scheduled: 1.27s (84.38%), Output: 9 rows (1.87kB)                                                                
                     Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                 
                     _c0_field := field (1:42)                                                                                                                 
                     Statistics collected: 0                                                                                                                   
                 - LocalExchange[PlanNodeId 445][ROUND_ROBIN] () => [field:integer]                                                                            
                         CPU: 2.00ms (0.36%), Scheduled: 25.00ms (1.66%), Output: 3 rows (15B)                                                                 
                         Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                        
                     - RemoteSource[3] => [field:integer]                                                                                                      
                             CPU: 1.00ms (0.18%), Scheduled: 3.00ms (0.20%), Output: 3 rows (15B)                                                              
                             Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                    
                                                                                                                                                               
 Fragment 3 [SINGLE]                                                                                                                                           
     CPU: 3.91ms, Scheduled: 13.35ms, Input: 3 rows (15B); per task: avg.: 3.00 std.dev.: 0.00, Output: 3 rows (15B), 1 tasks                                  
     Output layout: [field]                                                                                                                                    
     Output partitioning: ROUND_ROBIN []                                                                                                                       
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - Values[PlanNodeId 0] => [field:integer]                                                                                                                 
             CPU: 3.00ms (0.54%), Scheduled: 13.00ms (0.86%), Output: 3 rows (15B)                                                                             
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             (INTEGER'1')                                                                                                                                      
             (INTEGER'2')                                                                                                                                      
             (INTEGER'3')                                                                                                                                      
                                                                                                                                                               
 Stage ID: 20241014_184402_00071_gx85r.4                                                                                                                       
 Fragment 4 [hive:buckets=128, bucketFunctionType=HIVE_COMPATIBLE, types=[string]]                                                                             
     CPU: 71.89ms, Scheduled: 100.49ms, Input: 3 rows (346B); per task: avg.: 0.75 std.dev.: 1.30, Output: 3 rows (15B), 4 tasks                               
     Output layout: [field_7]                                                                                                                                  
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableScan[PlanNodeId 390][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_temporary_tabl
             CPU: 71.00ms (100.00%), Scheduled: 100.00ms (100.00%), Output: 3 rows (15B)                                                                       
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             LAYOUT: __temp_ctes__.__presto_temporary_table_parquet_20241014_184402_00071_gx85r_9a59cb18_71c7_4c64_b224_9b51b88a83c9{buckets=128}              
             field_7 := _c0_field:int:0:REGULAR (1:42)                                                                                                         
             Input: 3 rows (346B), Filtered: 0.00%                  
Old text output
 Fragment 1 [COORDINATOR_ONLY]                                                                                                                                 
     CPU: 5.39ms, Scheduled: 26.15ms, Input: 6 rows (1.45kB); per task: avg.: 6.00 std.dev.: 0.00, Output: 1 row (9B), 1 tasks                                 
     Output layout: [rows]                                                                                                                                     
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableCommit[PlanNodeId 388][Optional[TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=__temp_ctes__, tableName=__presto_tem
             CPU: 3.00ms (6.38%), Scheduled: 20.00ms (9.30%), Output: 1 row (9B)                                                                               
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - RemoteSource[2] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                                    
                 CPU: 1.00ms (2.13%), Scheduled: 5.00ms (2.33%), Output: 6 rows (1.45kB)                                                                       
                 Input avg.: 6.00 rows, Input std.dev.: 0.00%                                                                                                  
                                                                                                                                                               
 Fragment 2 [ROUND_ROBIN]                                                                                                                                      
     CPU: 45.53ms, Scheduled: 283.01ms, Input: 3 rows (15B); per task: avg.: 0.75 std.dev.: 1.30, Output: 6 rows (1.45kB), 4 tasks                             
     Output layout: [rows_6, fragments, commitcontext]                                                                                                         
     Output partitioning: SINGLE []                                                                                                                            
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - TableWriterMerge[PlanNodeId 447] => [rows_6:bigint, fragments:varbinary, commitcontext:varbinary]                                                       
             CPU: 7.00ms (14.89%), Scheduled: 40.00ms (18.60%), Output: 6 rows (1.45kB)                                                                        
             Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                         
         - LocalExchange[PlanNodeId 446][SINGLE] () => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                          
                 CPU: 4.00ms (8.51%), Scheduled: 23.00ms (10.70%), Output: 9 rows (1.87kB)                                                                     
                 Input avg.: 1.13 rows, Input std.dev.: 29.40%                                                                                                 
             - TableWriter[PlanNodeId 389] => [partialrowcount:bigint, partialfragments:varbinary, partialcontext:varbinary]                                   
                     CPU: 26.00ms (55.32%), Scheduled: 98.00ms (45.58%), Output: 9 rows (1.87kB)                                                               
                     Input avg.: 0.00 rows, Input std.dev.: ?%                                                                                                 
                     _c0_field := field (1:42)                                                                                                                 
                     Statistics collected: 0                                                                                                                   
                 - LocalExchange[PlanNodeId 445][ROUND_ROBIN] () => [field:integer]                                                                            
                         CPU: 1.00ms (2.13%), Scheduled: 15.00ms (6.98%), Output: 3 rows (15B)                                                                 
                         Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                        
                     - RemoteSource[3] => [field:integer]                                                                                                      
                             CPU: 1.00ms (2.13%), Scheduled: 9.00ms (4.19%), Output: 3 rows (15B)                                                              
                             Input avg.: 0.19 rows, Input std.dev.: 387.30%                                                                                    
                                                                                                                                                               
 Fragment 3 [SINGLE]                                                                                                                                           
     CPU: 5.86ms, Scheduled: 7.59ms, Input: 3 rows (15B); per task: avg.: 3.00 std.dev.: 0.00, Output: 3 rows (15B), 1 tasks                                   
     Output layout: [field]                                                                                                                                    
     Output partitioning: ROUND_ROBIN []                                                                                                                       
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                             
     - Values[PlanNodeId 0] => [field:integer]                                                                                                                 
             CPU: 4.00ms (8.51%), Scheduled: 5.00ms (2.33%), Output: 3 rows (15B)                                                                              
             Input avg.: 3.00 rows, Input std.dev.: 0.00%                                                                                                      
             (INTEGER'1')                                                                                                                                      
             (INTEGER'2')                                                                                                                                      
             (INTEGER'3')                                                                                                                                      
New JSON output
[
  {
    "1" : {
      "plan" : { ... }
    },
    "2" : {
      "plan" : { ... }
    },
    "3" : { ... }
  },
  {
    "4" : {
      "plan" : { ... }
    }
  }
]
Old JSON output
 {                                                                                                                                                             
   "1" : {                                                                                                                                                     
     "plan" : { ... }                                                                                                                                                            
   },                                                                                                                                                          
   "2" : { ... },                                                                                                                                                          
   "3" : { ... }                                                                                                                                                        
 }      

Test Plan

Tested locally using the HiveQueryRunner

== RELEASE NOTES ==

General Changes
* Improve the EXPLAIN ANALYZE output for TEXT and JSON formats, so that TEXT now returns each stage ID followed by its plan. JSON returns a list of plans. :pr:`23824`

@infvg infvg force-pushed the PRESTO_23798 branch 2 times, most recently from 0e64d8c to dbe88ac Compare October 15, 2024 13:37
@infvg infvg marked this pull request as ready for review October 15, 2024 15:58
@infvg infvg requested a review from presto-oss October 15, 2024 15:58
@infvg infvg requested a review from ZacBlanco October 17, 2024 07:19
@infvg infvg force-pushed the PRESTO_23798 branch 5 times, most recently from e3a5e8f to b075136 Compare October 24, 2024 11:12
Contributor

@ZacBlanco ZacBlanco left a comment

Thanks for the updates! A few more comments

  private final JsonCodec<Map<PlanFragmentId, JsonPlanFragment>> planMapCodec;
  private final JsonCodec<JsonRenderedNode> codec;
- private final JsonCodec<Map<PlanFragmentId, JsonPlan>> deserializationCodec;
+ private final JsonCodec<List<Map<PlanFragmentId, JsonPlan>>> deserializationCodec;

This will change the top-level JSON form of the EXPLAIN (format JSON, type DISTRIBUTED) output as well as EXPLAIN ANALYZE (format JSON).

@aaneja do you have any objections to this? We will probably need to update some internal tooling to parse properly if this change goes through.
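
For tooling that has to read both shapes during a transition, one option is to normalize the old single-object form into the new list form before further processing. The sketch below is only an assumption about how such tooling might look: `ExplainJsonShape` and `normalize` are hypothetical names, and the input is assumed to be already parsed into `java.util.Map` (JSON object) and `java.util.List` (JSON array) values by whatever JSON library the tool uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExplainJsonShape {
    // Normalizes an already-parsed EXPLAIN JSON value to the list-of-maps form.
    // Old shape: a single object keyed by fragment ID.
    // New shape: an array of such objects, one per substage.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> normalize(Object parsed) {
        List<Map<String, Object>> stages = new ArrayList<>();
        if (parsed instanceof Map) {
            // Old shape: wrap the single object in a one-element list.
            stages.add((Map<String, Object>) parsed);
        }
        else if (parsed instanceof List) {
            // New shape: every element must itself be an object.
            for (Object element : (List<?>) parsed) {
                if (!(element instanceof Map)) {
                    throw new IllegalArgumentException("expected a JSON object per substage");
                }
                stages.add((Map<String, Object>) element);
            }
        }
        else {
            throw new IllegalArgumentException("expected a JSON object or array at the top level");
        }
        return stages;
    }

    public static void main(String[] args) {
        Object oldShape = Map.of("1", "plan-1");
        Object newShape = List.of(Map.of("1", "plan-1"), Map.of("4", "plan-4"));
        System.out.println(normalize(oldShape).size());
        System.out.println(normalize(newShape).size());
    }
}
```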

@infvg infvg force-pushed the PRESTO_23798 branch 2 times, most recently from 92a00e9 to 59bf4e8 Compare October 27, 2024 14:32
@infvg infvg requested a review from ZacBlanco October 27, 2024 14:33
@steveburnett
Contributor

Thanks for the release note entry! Nit suggestion to rephrase to follow the Order of changes in the Release Note Guidelines.

== RELEASE NOTES ==

General Changes
* Improve the EXPLAIN ANALYZE output for TEXT and JSON formats, so that TEXT now returns each stage ID followed by its plan. JSON returns a list of plans. :pr:`23824`

@ZacBlanco ZacBlanco requested a review from aaneja December 3, 2024 20:32
@ZacBlanco ZacBlanco changed the title Modified EXPLAIN ANALYZE output Fix failures for EXPLAIN ANALYZE with CTEs Dec 3, 2024
@aaneja
Contributor

aaneja commented Jan 6, 2025

I think the root cause here is not a failure in rendering, but how EXPLAIN ANALYZE executes the query. Please find attached two QueryInfo JSON files:

  • plain_materializedcte_queryinfo.json: the execution of WITH t as (VALUES 1, 2, 3) SELECT * FROM t. Here, outputStage.subStages is an array with only 1 element.

  • explain_analyze_materialized_cte_query_info.json: the execution of EXPLAIN ANALYZE WITH t as (VALUES 1, 2, 3) SELECT * FROM t. Here, outputStage.subStages is an array with 2 elements, which is why we observe the failed assertion. The 'extra' sub-stage is a TableScanNode for the temp materialized CTE table. Its stageId also differs from that of the 'regular' query run.

To me, this looks like a bug in how the plan got fragmented. I need to dive deeper to confirm this.
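
To make the failure mode concrete, here is a minimal sketch of a check that assumes exactly one substage. The actual assertion in Presto is not shown in this thread, so `SingleSubStageCheck` and `onlySubStage` are hypothetical names and the stage values are plain strings.

```java
import java.util.List;

public class SingleSubStageCheck {
    // Hypothetical check mirroring the single-substage assumption:
    // it returns the only element, or fails when the count is not exactly one.
    static <T> T onlySubStage(List<T> subStages) {
        if (subStages.size() != 1) {
            throw new IllegalStateException(
                    "expected exactly one substage, found " + subStages.size());
        }
        return subStages.get(0);
    }

    public static void main(String[] args) {
        // Plain query: one substage, so the check passes.
        System.out.println(onlySubStage(List.of("stage.1")));

        // EXPLAIN ANALYZE with a materialized CTE: two substages, so the check fails.
        try {
            onlySubStage(List.of("stage.1", "stage.4"));
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```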


Successfully merging this pull request may close these issues.

EXPLAIN ANALYZE fails on queries with CTE materialization
4 participants