cbb330 commented Nov 18, 2025

Summary

[Issue] Briefly discuss the summary of the changes made in this pull request in 2-3 lines.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually Tested on local docker setup. Please include the commands run and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

cbb330 added 30 commits November 3, 2025 14:48
- 20 test cases (10 Spark SQL + 10 Java API)
- Pre-assigned to 10 team members
- Comprehensive test prompts and templates
- Automated results collection script
- Reference documentation included
- Fixed grep errors with special characters and emojis
- Corrected bug detection logic (was showing bugs on all tests)
- Added error suppression for grep commands
- Now works correctly on macOS
- Step-by-step instructions for team members (clone, execute, commit)
- Status emoji guide (🔲 → 🔄 → ✅/❌)
- Git workflow (pull before starting, push after completing)
- Progress monitoring section for organizers
- Tips for smooth collaboration
- Makes it crystal clear how to participate in the bug bash
New Files:
- QUICKSTART.md: Fast setup guide with 3 options
- start-testing.sh: Interactive setup wizard
- spark-shell-command.sh: One-liner spark-shell launcher

Features:
- Guides team through SSH/ksudo steps
- Auto-generates personalized log directories
- Shows correct spark-shell command with OpenHouse configs
- Displays test assignments and quick reference
- Creates session logs: logs/{name}/session_{timestamp}.log

Updated:
- README.md: Added links to QUICKSTART and new scripts
- File structure: Documented new helper scripts

Makes it fast and easy for team members to:
1. SSH to ltx1-holdemgw03.grid.linkedin.com
2. Authenticate with ksudo -e openhouse
3. Start spark-shell with correct configs
4. Begin testing immediately

Usage:
  ./start-testing.sh (interactive)
  OR
  ./spark-shell-command.sh your-name (on gateway)
…nce, better status guidance

Changes:
- Replace 'Copy and Run' with 'How to Start Testing' (numbered steps)
- Add comprehensive operation table (Spark SQL vs Java API side-by-side)
- Include DataFile creation example for Java tests
- Clarify status update with exact markdown syntax to edit
- Better formatting with bold labels and clear sections

Quick Reference now includes:
- Write data, Create branch, Cherry-pick, Fast-forward, Expire, WAP ops
- Java API DataFiles.builder() example for test file creation
- All operations shown in both SQL and Java
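
For concreteness, here is a minimal sketch of what those branch operations look like as Spark SQL run from spark-shell. The table and branch names are illustrative, and the procedure calls assume Iceberg's stock cherrypick_snapshot and fast_forward procedures are exposed through the openhouse catalog; the exact commands in the quick reference may differ.

  // Illustrative table/branch names; adjust to your assigned test
  val tbl = "openhouse.u_openhouse.test_branch_demo"
  spark.sql(s"CREATE TABLE $tbl (name string)")
  spark.sql(s"INSERT INTO $tbl VALUES ('row1')")       // write data
  spark.sql(s"ALTER TABLE $tbl CREATE BRANCH audit")   // create branch

  // Look up a snapshot id from the metadata table (cherry-pick usually targets a staged WAP snapshot)
  val snapId = spark.sql(s"SELECT snapshot_id FROM $tbl.snapshots ORDER BY committed_at DESC LIMIT 1").first.getLong(0)
  spark.sql(s"CALL openhouse.system.cherrypick_snapshot('u_openhouse.test_branch_demo', $snapId)")

  // Fast-forward the 'audit' branch to the current state of 'main'
  spark.sql(s"CALL openhouse.system.fast_forward('u_openhouse.test_branch_demo', 'audit', 'main')")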

Status update now shows exact syntax:
  **Status:** 🔲 NOT STARTED → 🔄 IN PROGRESS → ✅ PASS/❌ FAIL
cbb330 and others added 30 commits November 18, 2025 15:28
- Script now runs on local machine, shows commands to copy-paste
- No need to clone repo on gateway - work from local repo instead
- Shows personalized test assignments
- Generates exact 3-step workflow: ssh → ksudo → spark-shell
- Updated README and QUICKSTART to reflect local execution

Workflow:
1. Run ./start-testing.sh locally
2. Enter your name
3. Copy-paste the 3 commands shown
4. Start testing on gateway
- Script now automatically SSHs to gateway
- Runs ksudo authentication
- Starts spark-shell with correct config
- All in one command - no manual steps!
- Uses ssh -t for proper pseudo-terminal allocation
- Updated docs to reflect automated workflow

Workflow:
1. Run ./start-testing.sh locally
2. Enter your name
3. Authenticate when prompted (2FA/ksudo)
4. spark-shell starts automatically
5. Start testing!

Much simpler for team members - just one script to run.
Changes:
- start-testing.sh now only shows info (assignments, tips, commands)
- Generates logs/{name}/connect.sh script for actual connection
- Updated ksudo command: ksudo -s OPENHOUSE,HDFS,WEBHDFS,SWEBHDFS,HCAT,RM -e openhouse -- bash -c 'spark-shell...'
- Shows full quick reference table with SQL/Java API commands
- Displays testing tips (table names, status updates, cleanup)
- Two-step workflow:
  1. ./start-testing.sh (setup & info)
  2. logs/{name}/connect.sh (connect & start)

Benefits:
- Users can review all info before connecting
- Separate script can be rerun if connection drops
- Proper ksudo service list for HDFS/HCAT access
- Clean separation of concerns
Added to Quick Reference Commands:
1. Create Table example with dummy columns (id INT, name STRING)
2. Java API imports - all necessary Iceberg and OpenHouse imports
   - org.apache.iceberg._
   - org.apache.iceberg.catalog._
   - org.apache.iceberg.types.Types._
   - org.apache.iceberg.data._
   - org.apache.iceberg.spark._
   - com.linkedin.openhouse.spark.OpenHouseSparkUtils

3. Common types & accessors:
   - How to get catalog from spark session
   - How to load Table
   - How to access Snapshot
   - How to get TableMetadata

4. Query current snapshot ID and parent ID examples

This makes it much easier for team members to get started
without hunting for import statements or type definitions.
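
As a rough illustration of those accessors, the snippet below loads a table through Iceberg's Spark3Util helper and reads the current snapshot, its parent, and the TableMetadata. It assumes the standard org.apache.iceberg package names; as a later commit notes, this environment ships them relocated, so the import prefix would need to change accordingly. The table name is illustrative.

  import org.apache.iceberg.{BaseTable, Snapshot, Table, TableMetadata}
  import org.apache.iceberg.spark.Spark3Util

  // Load the Iceberg Table behind a catalog identifier (illustrative name)
  val table: Table = Spark3Util.loadIcebergTable(spark, "openhouse.u_openhouse.test_xxx")

  // Current snapshot id and its parent id (parentId is null for the first snapshot)
  val snap: Snapshot = table.currentSnapshot()
  val snapshotId = snap.snapshotId()
  val parentId = snap.parentId()

  // TableMetadata via the underlying table operations
  val meta: TableMetadata = table.asInstanceOf[BaseTable].operations().current()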
The bash -c wrapper was causing spark-shell to receive a quit signal
immediately upon startup.

Changed from:
  ksudo ... -- bash -c 'spark-shell ...'

To:
  ksudo ... -- spark-shell ...

This allows spark-shell to properly receive stdin and stay interactive.
The ksudo -- syntax directly passes the command without shell wrapping.
Changed from single-line ksudo -- spark-shell to multi-line:
1. ksudo authenticates
2. exec spark-shell runs with credentials

Before:
  ksudo ... -- spark-shell ...

After:
  ksudo -s OPENHOUSE,HDFS,WEBHDFS,SWEBHDFS,HCAT,RM -e openhouse
  exec spark-shell --conf ...

This gives spark-shell proper terminal control after authentication.
Using exec replaces the shell process with spark-shell for clean interaction.
The issue: ksudo creates an interactive subshell that waits for input.
Attempts to pipe or exec spark-shell after ksudo don't work because
ksudo's subshell consumes input differently than expected.

New approach: Clear 3-step manual instructions
1. SSH to gateway
2. Run ksudo (creates authenticated subshell)
3. Manually run spark-shell in that subshell

Benefits:
- Works reliably with ksudo's interactive subshell behavior
- spark-shell gets full terminal control
- Clear, simple workflow
- Saves spark-shell command to file for easy reference

The spark-shell command is saved to logs/{name}/spark-shell-cmd.txt
for easy copy-pasting.
Iceberg classes in LinkedIn's OpenHouse are shaded/relocated under:
  com.linkedin.openhouse.relocated.org.apache.iceberg.*

Changed all import statements from:
  import org.apache.iceberg._
  import org.apache.iceberg.catalog._
  import org.apache.iceberg.types.Types._
  import org.apache.iceberg.data._
  import org.apache.iceberg.spark._

To:
  import com.linkedin.openhouse.relocated.org.apache.iceberg._
  import com.linkedin.openhouse.relocated.org.apache.iceberg.catalog._
  import com.linkedin.openhouse.relocated.org.apache.iceberg.types.Types._
  import com.linkedin.openhouse.relocated.org.apache.iceberg.data._
  import com.linkedin.openhouse.relocated.org.apache.iceberg.spark._

Also updated SparkCatalog cast to use the relocated package.

Now imports will work correctly in spark-shell without errors.
Changed from:
  com.linkedin.openhouse.relocated.org.apache.iceberg.*

To:
  liopenhouse.relocated.org.apache.iceberg.*

LinkedIn's internal package structure uses 'liopenhouse' as the base
package for relocated/shaded Iceberg dependencies.
OpenHouseSparkUtils class doesn't exist in the codebase.
Removed the import line from the quick reference.

The core Iceberg imports are sufficient for most testing needs.
Added pointers to existing test files as examples:
- BranchTestSpark3_5.java: Comprehensive Spark SQL multi-branch tests
- WapIdJavaTest.java: Java API WAP workflow example

Team members can reference these files to see working examples
of the operations they need to test.
Updated all table references from:
  openhouse.d1.test_xxx

To:
  openhouse.u_openhouse.test_xxx

This affects:
- CREATE TABLE examples in start-testing.sh
- Identifier.of() examples in Java API section
- All metadata queries (snapshots, refs, branches)
- DROP TABLE cleanup command
- TEMPLATE.md verification queries

Using u_openhouse database for all bug bash testing.
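
A minimal sketch of what the updated Java API identifier presumably looks like (the test name is illustrative):

  import org.apache.spark.sql.connector.catalog.Identifier

  // Table lives under the u_openhouse database in the openhouse catalog
  val ident = Identifier.of(Array("u_openhouse"), "test_xxx")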
…house

Changed database from d1 to u_openhouse in:
- create-test-files.sh SQL test template
- create-test-files.sh Java test template
- All 20 regenerated test result files (sql-* and java-*)

All test files now use openhouse.u_openhouse as the database.
Added: val timestamp = System.currentTimeMillis()

This allows the ${timestamp} variable in table names to be set dynamically
in spark-shell. Updated create-test-files.sh and regenerated all 10 SQL
test result files.
Changed from raw SQL to Scala spark.sql() calls:
- Changed code block language from 'sql' to 'scala'
- Wrapped all SQL statements in spark.sql(s"...")
- Added string interpolation with 's' prefix for ${timestamp}
- Changed verification queries to use .show(false)
- Updated all 10 SQL test result files

This fixes the 'not found: value CREATE' error when running tests.
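
Putting the last two changes together, a test step presumably looks roughly like this (the table name is illustrative):

  // Unique table name per run; ${timestamp} is filled in by the s"..." interpolator
  val timestamp = System.currentTimeMillis()
  spark.sql(s"CREATE TABLE openhouse.u_openhouse.test_demo_${timestamp} (name string)")
  spark.sql(s"INSERT INTO openhouse.u_openhouse.test_demo_${timestamp} VALUES ('row1')")

  // Verification queries use .show(false) so output columns are not truncated
  spark.sql(s"SELECT * FROM openhouse.u_openhouse.test_demo_${timestamp}").show(false)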
Manually updated sql-08-rohit.md, sql-09-selena.md, and sql-10-shanthoosh.md
to match the format of other test files:
- Changed from `sql` to `scala` code blocks
- Added val timestamp = System.currentTimeMillis()
- Wrapped SQL in spark.sql(s"...")
- Changed d1 to u_openhouse
- Updated verification queries to use .show(false)

All 10 SQL test files now use the correct spark-shell syntax.
Removed unnecessary 'USING iceberg' clause that was causing errors:
- Updated create-test-files.sh template
- Regenerated all 10 SQL test files
- Updated start-testing.sh example command

CREATE TABLE now uses simple syntax:
  spark.sql(s"CREATE TABLE openhouse.u_openhouse.test_xxx (name string)")
Added comprehensive Quick Reference section to all 20 test files:

SQL tests (sql-1 through sql-10):
- Common Spark SQL operations
- WAP configuration
- Branch operations
- Cherry-pick and fast-forward commands
- Query examples for snapshots, refs, and branch data

Java tests (java-1 through java-10):
- Java API imports with relocated packages
- Catalog and table access
- Snapshot operations
- Branch reference management
- Table metadata queries

Now when testers open a result file in vim, they have all the
reference commands right there without switching to other docs.
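
For reference, the kind of WAP configuration and metadata queries that quick reference covers can be sketched as follows, assuming Iceberg's stock write.wap.enabled table property, spark.wap.id session conf, and the .snapshots/.refs metadata tables; names are illustrative and the exact commands in the test files may differ.

  val tbl = "openhouse.u_openhouse.test_wap_demo"
  spark.sql(s"CREATE TABLE $tbl (name string)")

  // Stage a write under a WAP id instead of publishing it to main
  spark.sql(s"ALTER TABLE $tbl SET TBLPROPERTIES ('write.wap.enabled'='true')")
  spark.conf.set("spark.wap.id", "bugbash-1")
  spark.sql(s"INSERT INTO $tbl VALUES ('staged-row')")

  // Inspect snapshots, refs, and branch data via metadata tables and branch reads
  spark.sql(s"SELECT snapshot_id, parent_id, summary FROM $tbl.snapshots").show(false)
  spark.sql(s"SELECT * FROM $tbl.refs").show(false)
  spark.sql(s"SELECT * FROM $tbl VERSION AS OF 'main'").show(false)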
## New Tests (SQL 11-17 + Java 11-17)
New assignees: simbarashe, aastha, jiefan, zhe, kevin, junhao, ruolin

SQL Tests:
- SQL-11: Interleaved WAP and Direct Commits on Same Branch
- SQL-12: Branch from WAP Snapshot Before Cherry-Pick
- SQL-13: Concurrent Branch Commits During Fast-Forward Window
- SQL-14: WAP Branch Target with Non-Existent Branch
- SQL-15: Snapshot Expiration with Cross-Branch Dependencies
- SQL-16: Rename Branch via Ref Management
- SQL-17: WAP ID Collision and Override

Java Tests:
- Java-11: Transactional Multi-Branch Update with Rollback
- Java-12: Branch Creation from Detached Snapshot
- Java-13: Parallel Branch Append with Metadata Conflicts
- Java-14: Snapshot Ref with Custom Metadata Properties
- Java-15: Cross-Table Snapshot Reference Attempt
- Java-16: Bulk Branch Creation and Snapshot Reuse
- Java-17: Snapshot Replace with WAP Metadata Preservation

## Template Improvements
- Added tableName variable in Quick Reference for easier copy-paste
- Simplified from 'Steps Executed' to 'Input' section
- Simplified from complex verification sections to single 'Output' section
- Removed verbose Expected/Actual Results table
- Streamlined Issues Found section
- All existing test files updated to new format
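
The tableName variable mentioned above presumably boils down to something like the following (the name is illustrative), which also keeps the cleanup step a one-liner:

  // One copy-pasteable name reused by every step in a test file
  val timestamp = System.currentTimeMillis()
  val tableName = s"openhouse.u_openhouse.test_sql_11_${timestamp}"
  spark.sql(s"CREATE TABLE $tableName (name string)")
  // ... test steps ...
  spark.sql(s"DROP TABLE $tableName")   // cleanup in u_openhouse when done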

## Updates
- Updated assignments.md: 20 tests → 34 tests
- Updated create-test-files.sh with new tests and improved template
- Fixed cleanup reminder to use u_openhouse instead of d1
- Total: 34 test files (17 SQL + 17 Java)
Test case with fast-forward race