Bug bash wap 2024 11 #397
Open
cbb330 wants to merge 63 commits into main from bug-bash-wap-2024-11
Conversation
- 20 test cases (10 Spark SQL + 10 Java API)
- Pre-assigned to 10 team members
- Comprehensive test prompts and templates
- Automated results collection script
- Reference documentation included
- Fixed grep errors with special characters and emojis
- Corrected bug detection logic (was showing bugs on all tests)
- Added error suppression for grep commands
- Now works correctly on macOS
- Step-by-step instructions for team members (clone, execute, commit)
- Status emoji guide (🔲 → 🔄 → ✅/❌)
- Git workflow (pull before starting, push after completing)
- Progress monitoring section for organizers
- Tips for smooth collaboration
- Makes it crystal clear how to participate in the bug bash
New Files:
- QUICKSTART.md: Fast setup guide with 3 options
- start-testing.sh: Interactive setup wizard
- spark-shell-command.sh: One-liner spark-shell launcher
Features:
- Guides team through SSH/ksudo steps
- Auto-generates personalized log directories
- Shows correct spark-shell command with OpenHouse configs
- Displays test assignments and quick reference
- Creates session logs: logs/{name}/session_{timestamp}.log
Updated:
- README.md: Added links to QUICKSTART and new scripts
- File structure: Documented new helper scripts
Makes it fast and easy for team members to:
1. SSH to ltx1-holdemgw03.grid.linkedin.com
2. Authenticate with ksudo -e openhouse
3. Start spark-shell with correct configs
4. Begin testing immediately
Usage:
./start-testing.sh (interactive)
OR
./spark-shell-command.sh your-name (on gateway)
…ateway script with absolute paths
…nce, better status guidance

Changes:
- Replace 'Copy and Run' with 'How to Start Testing' (numbered steps)
- Add comprehensive operation table (Spark SQL vs Java API side-by-side)
- Include DataFile creation example for Java tests
- Clarify status update with exact markdown syntax to edit
- Better formatting with bold labels and clear sections

Quick Reference now includes:
- Write data, Create branch, Cherry-pick, Fast-forward, Expire, WAP ops (sketched below)
- Java API DataFiles.builder() example for test file creation
- All operations shown in both SQL and Java

Status update now shows exact syntax:
**Status:** 🔲 NOT STARTED → 🔄 IN PROGRESS → ✅ PASS/❌ FAIL
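A minimal sketch of the SQL-side operations this quick reference covers, with each call shown independently rather than as one coherent scenario. The table name, WAP ID, and branch name are made up, and the stock Iceberg Spark procedures (cherrypick_snapshot, fast_forward, expire_snapshots) are assumed to be available in the OpenHouse build:

```scala
// Hypothetical table; bug bash tests use openhouse.u_openhouse.test_* names
val tbl = "openhouse.u_openhouse.test_example"
spark.sql(s"CREATE TABLE $tbl (name string)")

// WAP config: enable WAP on the table, then stage a write under a WAP ID
spark.sql(s"ALTER TABLE $tbl SET TBLPROPERTIES ('write.wap.enabled'='true')")
spark.conf.set("spark.wap.id", "wap-demo-1")
spark.sql(s"INSERT INTO $tbl VALUES ('staged-row')")   // staged, not visible on main yet

// Create a branch
spark.sql(s"ALTER TABLE $tbl CREATE BRANCH audit")

// Cherry-pick: look up the staged snapshot's id, then apply it to main
val stagedId = spark.sql(
  s"SELECT snapshot_id FROM $tbl.snapshots ORDER BY committed_at DESC LIMIT 1").first().getLong(0)
spark.sql(s"CALL openhouse.system.cherrypick_snapshot('u_openhouse.test_example', $stagedId)")

// Fast-forward and expire (standard Iceberg procedures, availability assumed)
spark.sql(s"CALL openhouse.system.fast_forward('u_openhouse.test_example', 'main', 'audit')")
spark.sql(s"CALL openhouse.system.expire_snapshots(table => 'u_openhouse.test_example', older_than => TIMESTAMP '2024-11-01 00:00:00')")

// Query snapshots, refs, and branch data
spark.sql(s"SELECT snapshot_id, parent_id, summary FROM $tbl.snapshots").show(false)
spark.sql(s"SELECT * FROM $tbl.refs").show(false)
spark.sql(s"SELECT * FROM $tbl VERSION AS OF 'audit'").show(false)
```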
- Script now runs on local machine, shows commands to copy-paste
- No need to clone repo on gateway - work from local repo instead
- Shows personalized test assignments
- Generates exact 3-step workflow: ssh → ksudo → spark-shell
- Updated README and QUICKSTART to reflect local execution

Workflow:
1. Run ./start-testing.sh locally
2. Enter your name
3. Copy-paste the 3 commands shown
4. Start testing on gateway
- Script now automatically SSHs to gateway
- Runs ksudo authentication
- Starts spark-shell with correct config
- All in one command - no manual steps!
- Uses ssh -t for proper pseudo-terminal allocation
- Updated docs to reflect automated workflow

Workflow:
1. Run ./start-testing.sh locally
2. Enter your name
3. Authenticate when prompted (2FA/ksudo)
4. spark-shell starts automatically
5. Start testing!

Much simpler for team members - just one script to run.
Changes:
- start-testing.sh now only shows info (assignments, tips, commands)
- Generates logs/{name}/connect.sh script for actual connection
- Updated ksudo command: ksudo -s OPENHOUSE,HDFS,WEBHDFS,SWEBHDFS,HCAT,RM -e openhouse -- bash -c 'spark-shell...'
- Shows full quick reference table with SQL/Java API commands
- Displays testing tips (table names, status updates, cleanup)
- Two-step workflow:
1. ./start-testing.sh (setup & info)
2. logs/{name}/connect.sh (connect & start)
Benefits:
- Users can review all info before connecting
- Separate script can be rerun if connection drops
- Proper ksudo service list for HDFS/HCAT access
- Clean separation of concerns
Added to Quick Reference Commands:
1. Create Table example with dummy columns (id INT, name STRING)
2. Java API imports - all necessary Iceberg and OpenHouse imports:
   - org.apache.iceberg._
   - org.apache.iceberg.catalog._
   - org.apache.iceberg.types.Types._
   - org.apache.iceberg.data._
   - org.apache.iceberg.spark._
   - com.linkedin.openhouse.spark.OpenHouseSparkUtils
3. Common types & accessors (see the sketch below):
   - How to get catalog from spark session
   - How to load Table
   - How to access Snapshot
   - How to get TableMetadata
4. Query current snapshot ID and parent ID examples

This makes it much easier for team members to get started without hunting for import statements or type definitions.
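A minimal sketch of those accessors, assuming a spark-shell with the OpenHouse catalog registered as 'openhouse', a hypothetical table name, and the Spark3Util helper for loading the table. Import paths are shown unshaded, matching this commit; later commits relocate them:

```scala
import org.apache.iceberg.{BaseTable, Snapshot, Table, TableMetadata}
import org.apache.iceberg.spark.Spark3Util

// Load the Iceberg Table through the session's configured catalog
val table: Table = Spark3Util.loadIcebergTable(spark, "openhouse.u_openhouse.test_example")

// Snapshot access: current snapshot ID and parent ID
val snap: Snapshot = table.currentSnapshot()
val snapshotId: Long = snap.snapshotId()
val parentId: java.lang.Long = snap.parentId()   // null for the table's first snapshot

// TableMetadata via the underlying operations (requires the BaseTable cast)
val meta: TableMetadata = table.asInstanceOf[BaseTable].operations().current()
println(s"snapshots in metadata: ${meta.snapshots().size()}")
```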
The bash -c wrapper was causing spark-shell to receive a quit signal immediately upon startup.

Changed from:
ksudo ... -- bash -c 'spark-shell ...'
To:
ksudo ... -- spark-shell ...

This allows spark-shell to properly receive stdin and stay interactive. The ksudo -- syntax directly passes the command without shell wrapping.
Changed from single-line ksudo -- spark-shell to multi-line:
1. ksudo authenticates
2. exec spark-shell runs with credentials

Before:
ksudo ... -- spark-shell ...
After:
ksudo -s OPENHOUSE,HDFS,WEBHDFS,SWEBHDFS,HCAT,RM -e openhouse
exec spark-shell --conf ...

This gives spark-shell proper terminal control after authentication. Using exec replaces the shell process with spark-shell for clean interaction.
The issue: ksudo creates an interactive subshell that waits for input.
Attempts to pipe or exec spark-shell after ksudo don't work because
ksudo's subshell consumes input differently than expected.
New approach: Clear 3-step manual instructions
1. SSH to gateway
2. Run ksudo (creates authenticated subshell)
3. Manually run spark-shell in that subshell
Benefits:
- Works reliably with ksudo's interactive subshell behavior
- spark-shell gets full terminal control
- Clear, simple workflow
- Saves spark-shell command to file for easy reference
The spark-shell command is saved to logs/{name}/spark-shell-cmd.txt
for easy copy-pasting.
Iceberg classes in LinkedIn's OpenHouse are shaded/relocated under:
com.linkedin.openhouse.relocated.org.apache.iceberg.*

Changed all import statements from:
import org.apache.iceberg._
import org.apache.iceberg.catalog._
import org.apache.iceberg.types.Types._
import org.apache.iceberg.data._
import org.apache.iceberg.spark._

To:
import com.linkedin.openhouse.relocated.org.apache.iceberg._
import com.linkedin.openhouse.relocated.org.apache.iceberg.catalog._
import com.linkedin.openhouse.relocated.org.apache.iceberg.types.Types._
import com.linkedin.openhouse.relocated.org.apache.iceberg.data._
import com.linkedin.openhouse.relocated.org.apache.iceberg.spark._

Also updated the SparkCatalog cast to use the relocated package. Now imports will work correctly in spark-shell without errors.
Changed from:
com.linkedin.openhouse.relocated.org.apache.iceberg.*
To:
liopenhouse.relocated.org.apache.iceberg.*

LinkedIn's internal package structure uses 'liopenhouse' as the base package for relocated/shaded Iceberg dependencies.
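Putting the two changes together, the working import block for spark-shell reads:

```scala
// Relocated import paths per the two commits above ('liopenhouse' base package)
import liopenhouse.relocated.org.apache.iceberg._
import liopenhouse.relocated.org.apache.iceberg.catalog._
import liopenhouse.relocated.org.apache.iceberg.types.Types._
import liopenhouse.relocated.org.apache.iceberg.data._
import liopenhouse.relocated.org.apache.iceberg.spark._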
OpenHouseSparkUtils class doesn't exist in the codebase. Removed the import line from the quick reference. The core Iceberg imports are sufficient for most testing needs.
Added pointers to existing test files as examples:
- BranchTestSpark3_5.java: Comprehensive Spark SQL multi-branch tests
- WapIdJavaTest.java: Java API WAP workflow example

Team members can reference these files to see working examples of the operations they need to test.
Updated all table references from:
openhouse.d1.test_xxx
To:
openhouse.u_openhouse.test_xxx

This affects:
- CREATE TABLE examples in start-testing.sh
- Identifier.of() examples in the Java API section (shape shown below)
- All metadata queries (snapshots, refs, branches)
- DROP TABLE cleanup command
- TEMPLATE.md verification queries

Using the u_openhouse database for all bug bash testing.
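The shape of the updated references, with a hypothetical table name:

```scala
import org.apache.spark.sql.connector.catalog.Identifier

val timestamp = System.currentTimeMillis()
// Identifier.of() now points at u_openhouse instead of d1
val id = Identifier.of(Array("u_openhouse"), s"test_wap_$timestamp")

// DROP TABLE cleanup targets the same database
spark.sql(s"DROP TABLE IF EXISTS openhouse.u_openhouse.test_wap_$timestamp")
```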
…house

Changed database from d1 to u_openhouse in:
- create-test-files.sh SQL test template
- create-test-files.sh Java test template
- All 20 regenerated test result files (sql-* and java-*)

All test files now use openhouse.u_openhouse as the database.
Added: val timestamp = System.currentTimeMillis()
This allows the ${timestamp} variable in table names to be set dynamically
in spark-shell. Updated create-test-files.sh and regenerated all 10 SQL
test result files.
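In each test file the prologue now looks like this (table name suffix illustrative):

```scala
// Unique suffix so repeated runs don't collide on table names
val timestamp = System.currentTimeMillis()
spark.sql(s"CREATE TABLE openhouse.u_openhouse.test_wap_${timestamp} (name string)")
```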
Changed from raw SQL to Scala spark.sql() calls:
- Changed code block language from 'sql' to 'scala'
- Wrapped all SQL statements in spark.sql(s"...")
- Added string interpolation with 's' prefix for ${timestamp}
- Changed verification queries to use .show(false)
- Updated all 10 SQL test result files
This fixes the 'not found: value CREATE' error when running tests.
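The fix in miniature; the first form is what the old files contained, and it fails because the scala> prompt parses it as Scala:

```scala
// Before (raw SQL pasted at the scala> prompt):
//   CREATE TABLE openhouse.u_openhouse.test_${timestamp} (name string)
//   => error: not found: value CREATE

// After: wrapped in spark.sql with the 's' interpolator so ${timestamp} expands
val timestamp = System.currentTimeMillis()
spark.sql(s"CREATE TABLE openhouse.u_openhouse.test_${timestamp} (name string)")
spark.sql(s"SELECT * FROM openhouse.u_openhouse.test_${timestamp}").show(false)
```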
Manually updated sql-08-rohit.md, sql-09-selena.md, and sql-10-shanthoosh.md to match the format of other test files:
- Changed from `sql` to `scala` code blocks
- Added val timestamp = System.currentTimeMillis()
- Wrapped SQL in spark.sql(s"...")
- Changed d1 to u_openhouse
- Updated verification queries to use .show(false)

All 10 SQL test files now use the correct spark-shell syntax.
Removed unnecessary 'USING iceberg' clause that was causing errors:
- Updated create-test-files.sh template
- Regenerated all 10 SQL test files
- Updated start-testing.sh example command

CREATE TABLE now uses simple syntax:
spark.sql(s"CREATE TABLE openhouse.u_openhouse.test_xxx (name string)")
Added comprehensive Quick Reference section to all 20 test files:

SQL tests (sql-1 through sql-10):
- Common Spark SQL operations
- WAP configuration
- Branch operations
- Cherry-pick and fast-forward commands
- Query examples for snapshots, refs, and branch data

Java tests (java-1 through java-10):
- Java API imports with relocated packages
- Catalog and table access
- Snapshot operations
- Branch reference management (sketched below)
- Table metadata queries

Now when testers open a result file in vim, they have all the reference commands right there without switching to other docs.
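A sketch of the branch-reference portion, using the stock Iceberg ManageSnapshots API with each call shown independently (combining them verbatim may fail validation). Assumes `table` was loaded as in the accessor sketch above; in the actual shell the imports are the relocated liopenhouse ones:

```scala
val snapId = table.currentSnapshot().snapshotId()

// Branch reference management
table.manageSnapshots().createBranch("audit", snapId).commit()
table.manageSnapshots().cherrypick(snapId).commit()                  // apply a staged (WAP) snapshot
table.manageSnapshots().fastForwardBranch("main", "audit").commit()
table.manageSnapshots().removeBranch("audit").commit()

// Snapshot refs and table metadata queries
val refs = table.refs()              // java.util.Map[String, SnapshotRef]: branches and tags
val mainRef = refs.get("main")
val mainSnapshot = table.snapshot(mainRef.snapshotId())
```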
## New Tests (SQL 11-17 + Java 11-17)

New assignees: simbarashe, aastha, jiefan, zhe, kevin, junhao, ruolin

SQL Tests:
- SQL-11: Interleaved WAP and Direct Commits on Same Branch
- SQL-12: Branch from WAP Snapshot Before Cherry-Pick
- SQL-13: Concurrent Branch Commits During Fast-Forward Window
- SQL-14: WAP Branch Target with Non-Existent Branch (the WAP-branch setup is sketched below)
- SQL-15: Snapshot Expiration with Cross-Branch Dependencies
- SQL-16: Rename Branch via Ref Management
- SQL-17: WAP ID Collision and Override

Java Tests:
- Java-11: Transactional Multi-Branch Update with Rollback
- Java-12: Branch Creation from Detached Snapshot
- Java-13: Parallel Branch Append with Metadata Conflicts
- Java-14: Snapshot Ref with Custom Metadata Properties
- Java-15: Cross-Table Snapshot Reference Attempt
- Java-16: Bulk Branch Creation and Snapshot Reuse
- Java-17: Snapshot Replace with WAP Metadata Preservation

## Template Improvements

- Added tableName variable in Quick Reference for easier copy-paste
- Simplified from 'Steps Executed' to 'Input' section
- Simplified from complex verification sections to single 'Output' section
- Removed verbose Expected/Actual Results table
- Streamlined Issues Found section
- All existing test files updated to new format

## Updates

- Updated assignments.md: 20 tests → 34 tests
- Updated create-test-files.sh with new tests and improved template
- Fixed cleanup reminder to use u_openhouse instead of d1
- Total: 34 test files (17 SQL + 17 Java)
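For orientation, a sketch of the WAP-branch pattern several of the new scenarios build on (standard Iceberg session conf; the actual test files may differ):

```scala
val timestamp = System.currentTimeMillis()
val tableName = s"openhouse.u_openhouse.test_wap_branch_${timestamp}"
spark.sql(s"CREATE TABLE $tableName (name string)")
spark.sql(s"ALTER TABLE $tableName SET TBLPROPERTIES ('write.wap.enabled'='true')")
spark.sql(s"ALTER TABLE $tableName CREATE BRANCH audit")

// Route writes to the audit branch instead of main
spark.conf.set("spark.wap.branch", "audit")
spark.sql(s"INSERT INTO $tableName VALUES ('audited-row')")
spark.conf.unset("spark.wap.branch")

// main is unchanged; the branch holds the new row
spark.sql(s"SELECT * FROM $tableName").show(false)                        // empty
spark.sql(s"SELECT * FROM $tableName VERSION AS OF 'audit'").show(false)  // one row
```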
Test case with fast-forward race
Summary
[Issue] Briefly discuss the summary of the changes made in this pull request in 2-3 lines.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.