Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add STDOUT logging handler for Cloud Logging integration (SCP-5902) #383

Merged
merged 2 commits into from
Feb 24, 2025

Conversation

bistline
Copy link
Contributor

BACKGROUND & CHANGES

This update adds a logging handler for STDOUT to supplement the existing logging for ingest processes. By adding this, we will leverage the existing integration of cloud logging into Batch API jobs. This will add much greater visibility for jobs as we will not need to rely on the log file being delocalized back to the bucket. This benefits running jobs the most as we will get real-time log messages, allowing us to debug the state of a job without waiting for completion.

Note: This also contains an unrelated update to the vault-gsm-migration.sh script that was needed to help import secrets from Vault into GSM better.

MANUAL TESTING

  1. Initialize your environment as normal and run the following command from the ingest folder:
python ingest_pipeline.py --study-id 5d276a50421aa9117c982845 --study-file-id 5dd5ae25421aa910a723a337 ingest_cluster --cluster-file ../tests/data/test_1k_cluster_data.csv --ingest-cluster --name cluster1 --domain-ranges "{'x':[-1, 1], 'y':[-1, 1], 'z':[-1, 1]}"
  1. In addition to normal output, you should see messages that normally would only print to user_log.txt:
Transforming header CLUSTER
Transforming header SUBCLUSTER
Transforming header 1
Creating model for 5d276a50421aa9117c982845
Creating data array for header NAME
Attempting to load cluster header : NAME
Creating data array for header x
Attempting to load cluster header : x
Creating data array for header y
Attempting to load cluster header : y
Creating data array for header z
Attempting to load cluster header : z
Creating data array for header CLUSTER
Attempting to load cluster header : CLUSTER
Creating data array for header SUBCLUSTER
Attempting to load cluster header : SUBCLUSTER
Creating data array for header 1
Attempting to load cluster header : 1

@bistline bistline requested review from eweitz and jlchang January 30, 2025 19:03
Copy link
Member

@eweitz eweitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good!

Are there use cases that old file-based logging satisfies that this new stdout + cloud logging doesn't also satisfy? At a glance, this new approach seems like it might make that prior one redundant.

@bistline
Copy link
Contributor Author

Code looks good!

Are there use cases that old file-based logging satisfies that this new stdout + cloud logging doesn't also satisfy? At a glance, this new approach seems like it might make that prior one redundant.

That's a good point. Let me do a little testing but you might be right.

@bistline
Copy link
Contributor Author

Code looks good!
Are there use cases that old file-based logging satisfies that this new stdout + cloud logging doesn't also satisfy? At a glance, this new approach seems like it might make that prior one redundant.

That's a good point. Let me do a little testing but you might be right.

Just to close the loop on this - the one place where we still need the physical log file is when Rails reads it to send an error email to users. Because we can't get the Cloud Logging client for Ruby to work properly, this is the only way to get that information.

@bistline bistline merged commit 8c7ac7b into development Feb 24, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants