[SCHEMATIC-240] stripped google sheet information from open telemetry span status and message #1573

linglp · 2025-02-06T20:57:59Z

Problem

When an error involving Google Sheets is logged, the message may include the URL of the Google Sheet being processed. That link then shows up in SigNoz, in both logs and traces.

Solution

Create a custom log processor to filter out the sensitive information in the log

Evidence that this is working

schematic/manifest/generator.py

andrewelamb · 2025-02-06T21:30:26Z

Perhaps we should split the Jira issue out, with another to further investigate doing this via OTEL/SigNoz instead of in the code?

andrewelamb · 2025-02-06T21:31:08Z

schematic/manifest/generator.py

+        try:
+            wb.set_dataframe(manifest_df, (1, 1), fit=True)
+        except HttpError as ex:
+            pattern = r"https://sheets\.googleapis\.com/v4/spreadsheets/[\w-]+"


Will this catch all google sheet urls?

Also my question. Could a change to the google api, that would otherwise be non-breaking, change the format of the URL used so that it's no longer appropriately found in the message string?

A change to the format of the URL would break this regex pattern. The regex could be updated to look for any /v. instead of just /v4. This is the only thing I think we could reasonably anticipate changing, and we can resolve that potential issue now, ahead of time.
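To make the version question concrete, here is a minimal sketch of a version-tolerant pattern. It assumes that only the version segment varies and that a fixed placeholder is an acceptable replacement; the names `SHEETS_API_URL` and `redact_google_sheets` are illustrative and not taken from the PR.

```python
import re

# Assumption: only the API version segment ("v4") is expected to vary.
SHEETS_API_URL = re.compile(
    r"https://sheets\.googleapis\.com/v\d+/spreadsheets/[\w-]+"
)


def redact_google_sheets(message: str) -> str:
    """Replace any matching Sheets API URL with a fixed placeholder."""
    return SHEETS_API_URL.sub("[REDACTED_GOOGLE_SHEETS_URL]", message)


if __name__ == "__main__":
    print(redact_google_sheets(
        "HttpError at https://sheets.googleapis.com/v5/spreadsheets/abc-123"
    ))
```

Note that a pattern of this shape only targets `sheets.googleapis.com` API URLs. It would leave `https://docs.google.com/spreadsheets/d/...` links untouched, which bears on the "will this catch all google sheet urls?" question above.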

linglp · 2025-02-06T22:16:12Z

@andrewelamb I'm not entirely sure if I'm on the right track, so I'd like to wait until @BryanFauble returns to confirm. This implementation removes Google Sheet links in spans, but I'm unsure if the ticket also requires a more general approach to stripping all sensitive information. I'll make further modifications as I discuss more with Bryan.

thomasyu888 · 2025-02-07T06:56:50Z

not entirely sure if I'm on the right track... This implementation removes Google Sheet links in spans, but I'm unsure if the ticket also requires a more general approach to stripping all sensitive information.

@linglp this is a good line of thought. Some questions to guide you: Is this the only place Google Sheets links are logged? Off the top of your head, do you know of any other sensitive information that is logged? Is there functionality in OTEL that filters logs (e.g., https://signoz.io/blog/sending-and-filtering-python-logs-with-opentelemetry/#how-the-default-filter-keeps-out-unwanted-logs)?

@andrewelamb what are your thoughts? What we want to avoid is sensitive data being transferred to signoz cloud.

I'll add my personal views after this discussion.

thomasyu888 · 2025-02-07T07:03:05Z

schematic/manifest/generator.py

-        wb.set_dataframe(manifest_df, (1, 1), fit=True)
+        try:
+            wb.set_dataframe(manifest_df, (1, 1), fit=True)
+        except HttpError as ex:


Pending further discussion of whether we think this is the right approach, this new exception that's being caught should have a unit test.

This is a good example of where a tiny design before doing the work would be helpful. That said, sometimes you have to do a little bit before knowing your options.

Instead of an exception being caught in this way I would rather not have us modify any application code to implement a solution here.

I think the idea of using a log or span processor in the OTEL Python SDK is the appropriate solution.

The reason why I say this is:
Using the OTEL SDK gives us one spot where all redaction logic lives, we do not need to hunt around the code base to find all the spots we need to modify. It also gives us the template to follow and apply to other projects where we want to implement similar functionality.

Additional thoughts @andrewelamb @SageGJ ?

andrewelamb · 2025-02-07T16:24:58Z

@thomasyu888 I don't have anything to add at the moment on data getting into SigNoz that shouldn't (except to agree that it's bad :) ). This is something I'll keep in mind as I start working more closely with SigNoz, however.

schematic/manifest/generator.py

SageGJ · 2025-02-07T17:34:40Z

What are y'all's thoughts about wrapping/modifying sys.excepthook, catching HttpErrors within, and sanitizing the messages there?

Something like*

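import sys

from googleapiclient.errors import HttpError  # assumed source of HttpError

def custom_except_hook(exc_type, exc_value, exc_traceback):
    if issubclass(exc_type, HttpError):
        # message sanitizing
        # message raising
        pass  # placeholder so the sketch parses
    else:
        # everything else goes to the default hook
        sys.__excepthook__(exc_type, exc_value, exc_traceback)

sys.excepthook = custom_except_hook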

would apply to multiple locations where an error is raised that includes a google sheets url
could be extended to modify different error types that contain other sensitive information
avoids having to wrap multiple blocks of code in try: catch: statements

@linglp @andrewelamb @thomasyu888

*modified from here

linglp · 2025-02-07T20:00:52Z

Thanks for all the discussion here. In retrospect, a design document could help clarify things further. To summarize, the Google Sheet link was originally found in SigNoz traces, and this solution specifically removes it from traces and spans within a single function call. My plan was to confirm with Bryan whether removing sensitive information from traces is necessary, as the ticket only mentioned "logs," and whether handling it directly in the function (rather than using a custom trace processor) is an acceptable approach. I can also document what I’ve tried and why those approaches didn’t work separately.

Since the error originates from Google APIs, any part of the system that interacts with them could potentially trigger it. If Bryan confirms this approach is acceptable, I can proceed with adding unit tests and considering how to wrap the exception.

thomasyu888 · 2025-02-10T09:58:16Z

What are y'all's thoughts about wrapping/modifying sys.excepthook, catching HttpErrors within, and sanitizing the messages there?

@SageGJ here are my thoughts. Using sys.excepthook to filter out sensitive Google API URLs from exception messages can work and is creative, but some considerations:

1. Setting sys.excepthook in one module changes how all unhandled exceptions are processed globally. I wonder if it would have issues when run in a multi-threaded or multi-module environment.
2. This only catches uncaught exceptions. If the HttpError is handled somewhere else (e.g., inside a try-except block), this function won't intercept it.
3. Useful debug info could be suppressed. This is a general issue: we actually want these logs when this is run as a CLI/library; we just don't want to send the information to SigNoz. @linglp, for example, users of the schematic CLI should have these Google Sheets links returned to them.

SageGJ · 2025-02-11T19:30:04Z

@thomasyu888 thanks for adding!
For the points you've added:

1. I agree with the concern about multi-threaded or multi-module environments. I was envisioning applying this across all of schematic, in the __init__.py file, since we could specifically catch and modify the appropriate HTTP errors with Sheets URLs in them and pass the other exceptions to the regular handler.
2. If the error is caught and handled without being raised, will it still be logged in SigNoz?
3. I agree. There's also the concern when SigNoz is used locally with the library/CLI, where we'd want to censor this information from reaching SigNoz.

Given points 1 and 3, and if we decide we'd want to handle this within schematic and not within OTEL itself, we could modify __init__.py where the tracing is currently set up so that it also includes something like

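import os
import sys

from googleapiclient.errors import HttpError  # assumed source of HttpError


def custom_except_hook(exc_type, exc_value, exc_traceback):
    if issubclass(exc_type, HttpError):
        # message sanitizing
        # message raising
        pass  # placeholder so the sketch parses
    else:
        # everything else goes to the default hook
        sys.__excepthook__(exc_type, exc_value, exc_traceback)


# only replace the hook when an OTLP endpoint is configured
signoz_enabled = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
if signoz_enabled:
    sys.excepthook = custom_except_hook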

I noticed tracing is still enabled in the absence of the OTEL headers so it might be better to check for the presence of TRACING_EXPORT_FORMAT or LOGGING_EXPORT_FORMAT for tracing in general.
I also realize changing how we process all exceptions could be a bit much so if we go this route we'd want to be really strict in selecting which ones are caught and to minimize side effects.
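For reference, a slightly fuller sketch of the hook idea, under stated assumptions: `HttpError` here is a local stand-in class rather than the real one, the regex is the one from the diff earlier in this thread, and `render_sanitized` and `tracing_export_configured` are hypothetical helper names. It only shows how a strictly scoped hook could rewrite the reported text. It does not show whether OTEL has already recorded the exception by the time the hook runs.

```python
import re
import sys
import traceback

# Pattern copied from the diff earlier in this thread.
SHEET_URL = re.compile(r"https://sheets\.googleapis\.com/v4/spreadsheets/[\w-]+")


class HttpError(Exception):
    """Stand-in for the real HttpError class (assumption for this sketch)."""


def render_sanitized(exc_type, exc_value, exc_traceback) -> str:
    """Format the exception as usual, then redact Sheets URLs from the text."""
    text = "".join(traceback.format_exception(exc_type, exc_value, exc_traceback))
    return SHEET_URL.sub("[REDACTED_GOOGLE_SHEETS_URL]", text)


def custom_except_hook(exc_type, exc_value, exc_traceback) -> None:
    if issubclass(exc_type, HttpError):
        # Only the targeted error type is rewritten before being reported.
        sys.stderr.write(render_sanitized(exc_type, exc_value, exc_traceback))
    else:
        # Everything else goes to the interpreter's default hook untouched.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)


def tracing_export_configured(environ) -> bool:
    """Gate on the export-format variables named in the comment above."""
    return bool(
        environ.get("TRACING_EXPORT_FORMAT") or environ.get("LOGGING_EXPORT_FORMAT")
    )
```

Installing it would then be a matter of `if tracing_export_configured(os.environ): sys.excepthook = custom_except_hook`. As noted earlier in the thread, this path only runs for exceptions that are never caught.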

thomasyu888 · 2025-02-12T03:33:03Z

schematic/__init__.py

+        self._exporter = exporter
+        self._shutdown = False
+
+    def redact_google_sheet(self, message: str) -> str:


It's important we keep the logs when people are running the CLI. If I'm not mistaken, I think some CLI commands rely on the gsheets link being returned.

My guess is that this would fail some of the CLI tests.

See:

schematic/tests/integration/test_commands.py

Lines 675 to 681 in c50c599

google_sheet_result = [
    result
    for result in result_list
    if result.startswith("https://docs.google.com/spreadsheets/d/")
]
assert len(google_sheet_result) == 1
google_sheet_url = google_sheet_result[0]

I don't think this is a concern for CLI usage. The reason is that the log messages sent via OTEL are a (deep?) copy of the data sent to stdout/console.

If I understand this solution correctly, it will only affect the messages making their way into SigNoz and leave them as is in something like CloudWatch. That is probably fine, since if someone had access to CloudWatch we'd already be in serious trouble. But access to SigNoz is going to be given more freely, so we want to curate it carefully.

However, this should be tested to verify these assumptions with CLI usage.
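One way to check that assumption in isolation is sketched below with the standard `logging` module only. It is a toy model, not the OTEL SDK: `RedactingHandler` is a hypothetical wrapper, and the two `StringIO` streams stand in for the console and for the export path. The point it illustrates is that redacting a copy of the record leaves the console output intact.

```python
import copy
import io
import logging
import re

# Pattern copied from the diff earlier in this thread.
SHEET_URL = re.compile(r"https://sheets\.googleapis\.com/v4/spreadsheets/[\w-]+")


class RedactingHandler(logging.Handler):
    """Forward a redacted copy of each record to a wrapped handler."""

    def __init__(self, target: logging.Handler) -> None:
        super().__init__()
        self._target = target

    def emit(self, record: logging.LogRecord) -> None:
        # Copy first so other handlers still see the original message.
        redacted = copy.copy(record)
        redacted.msg = SHEET_URL.sub(
            "[REDACTED_GOOGLE_SHEETS_URL]", record.getMessage()
        )
        redacted.args = None  # the message is already fully formatted
        self._target.handle(redacted)


console_out, export_out = io.StringIO(), io.StringIO()
logger = logging.getLogger("redaction_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(console_out))  # stands in for stdout
logger.addHandler(RedactingHandler(logging.StreamHandler(export_out)))  # stands in for the exporter

logger.error("failed: %s", "https://sheets.googleapis.com/v4/spreadsheets/abc-123")
```

If the record were mutated in place instead of copied, the ordering of the handlers would decide whether the console still saw the link, which is the kind of side effect the CLI tests would need to catch.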

BryanFauble · 2025-02-12T17:57:04Z

Thanks for the discussion everyone (@linglp @andrewelamb @thomasyu888)

#1573 (comment)

Captures my thoughts. I would rather we use the OTEL Python SDK to handle the sanitization of the data than the traditional native Python ways. Implementing the logic via processors in the Python SDK works similarly to how it would be implemented within the OpenTelemetry Collector, shown here: https://opentelemetry.io/docs/collector/architecture/#pipelines

Specifically, it has the concept of using one or more processors to handle data transformation before that data is exported. Since we are developing code locally, where a collector may not always be present, it's important to be able to filter this data out before it leaves the machine where the data is produced. Technically, when this code is running in AWS we could use the collector approach, but I wanted us to figure out if we could do it in code before the collector (Lingling is proving that we can in this pull request!).

BryanFauble · 2025-02-12T22:01:58Z

schematic/__init__.py

+        if span.status.status_code == trace.StatusCode.ERROR:
+            if span.events:
+                redacted_span = self._create_redacted_span(span)
+                self.export.export([redacted_span])


Suggested change

-        if span.status.status_code == trace.StatusCode.ERROR:
-            if span.events:
-                redacted_span = self._create_redacted_span(span)
-                self.export.export([redacted_span])
+        if span.status.status_code == trace.StatusCode.ERROR:
+            if span.events:
+                redacted_span = self._create_redacted_span(span)
+                self.export.export([redacted_span])
+                return
+        self.export.export([span])

Is this needed? I'm not sure tbh

I don't think this is going to work with how we have this set up because of:
https://github.com/open-telemetry/opentelemetry-python/blob/a7fe4f8bac7fa36291c6acf86982bbb356e3ae6d/opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py#L173-L175

Specifically:

We have a BatchSpanProcessor that was already responsible for queuing up and exporting the spans to SigNoz

Based on the code at the link above, the span is sent to each processor, so in this case the data is likely being sent twice: once by the BatchSpanProcessor that is still being called, and once here.
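The double-send concern can be reproduced with a toy model. Everything below is a stand-in written for illustration (plain strings instead of spans, hypothetical `RecordingExporter`, `ForwardingProcessor` and `RedactingProcessor` classes), not the OTEL SDK itself. It only mirrors the fan-out described above, where every registered processor receives the same ended span.

```python
import re
from typing import List

# Pattern copied from the diff earlier in this thread.
SHEET_URL = re.compile(r"https://sheets\.googleapis\.com/v4/spreadsheets/[\w-]+")


class RecordingExporter:
    """Hypothetical exporter that just remembers what it was asked to send."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def export(self, descriptions: List[str]) -> None:
        self.sent.extend(descriptions)


class ForwardingProcessor:
    """Stand-in for the existing batch processor: forwards spans unchanged."""

    def __init__(self, exporter: RecordingExporter) -> None:
        self._exporter = exporter

    def on_end(self, description: str) -> None:
        self._exporter.export([description])


class RedactingProcessor:
    """Stand-in for the processor added in this PR: exports a redacted copy."""

    def __init__(self, exporter: RecordingExporter) -> None:
        self._exporter = exporter

    def on_end(self, description: str) -> None:
        self._exporter.export([SHEET_URL.sub("[REDACTED]", description)])


def end_span(description: str, processors) -> None:
    # Fan-out: every registered processor is notified about the same span.
    for processor in processors:
        processor.on_end(description)


exporter = RecordingExporter()
end_span(
    "HttpError at https://sheets.googleapis.com/v4/spreadsheets/abc-123",
    [ForwardingProcessor(exporter), RedactingProcessor(exporter)],
)
```

In this model the exporter ends up with two entries, and the first one still contains the unredacted link, so adding a redacting processor next to the existing one does not by itself keep the URL out of the backend.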

Here is a hack to get around this issue:

Instead of using a processor to handle this logic, which as we can see won't work, we could monkey patch the _readable_span function call: https://github.com/open-telemetry/opentelemetry-python/blob/a7fe4f8bac7fa36291c6acf86982bbb356e3ae6d/opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py#L906-L921

In the monkey patch we: 1) Start by calling our sensitive data redaction process to strip data out of the span, 2) Return a call to the original function.

By monkey patching this, it would allow us to essentially "slip" in logic before a read-only span has been created.
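The two steps can be sketched on a toy class. `ToySpan` and its `status_description` attribute are assumptions made for this illustration and do not reflect the SDK's real internals; only the method name `_readable_span` comes from the comment above.

```python
import functools
import re

# Pattern copied from the diff earlier in this thread.
SHEET_URL = re.compile(r"https://sheets\.googleapis\.com/v4/spreadsheets/[\w-]+")


class ToySpan:
    """Hypothetical stand-in for an SDK span; only what the sketch needs."""

    def __init__(self, status_description: str) -> None:
        self.status_description = status_description

    def _readable_span(self) -> dict:
        # Stand-in for building the read-only snapshot that gets exported.
        return {"status_description": self.status_description}


# Keep a reference to the original so the patch can delegate to it.
_original_readable_span = ToySpan._readable_span


@functools.wraps(_original_readable_span)
def _patched_readable_span(self: ToySpan) -> dict:
    # 1) Redact in place while the span is still mutable.
    if self.status_description:
        self.status_description = SHEET_URL.sub("[REDACTED]", self.status_description)
    # 2) Return a call to the original function.
    return _original_readable_span(self)


ToySpan._readable_span = _patched_readable_span

snapshot = ToySpan(
    "HttpError at https://sheets.googleapis.com/v4/spreadsheets/abc-123"
)._readable_span()
```

Because the redaction happens before the snapshot is built, every consumer of the snapshot sees the same sanitized data, which avoids the double-export problem of a second processor. The trade-off is a dependency on a private method that may change between SDK releases.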

Let me know what questions you have.

schematic/__init__.py

sonarqubecloud · 2025-02-19T00:18:45Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
91.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

linglp added 2 commits February 6, 2025 15:17

catch google http error and sanitized error message

18711dd

remove unncessary code

4d2e0e3

linglp requested a review from a team as a code owner February 6, 2025 20:57

remove print

7efb1a2

andrewelamb reviewed Feb 6, 2025

andrewelamb reviewed Feb 6, 2025

andrewelamb reviewed Feb 6, 2025

linglp added 2 commits February 6, 2025 16:59

remove unused imports

c8c185b

add comment

aec48b6

thomasyu888 reviewed Feb 7, 2025

SageGJ reviewed Feb 7, 2025

linglp marked this pull request as draft February 7, 2025 20:26

add a custom log processor to remove sensitive information in the log

15f39ad

thomasyu888 reviewed Feb 12, 2025

linglp added 2 commits February 12, 2025 16:00

add span processor

1e6dd23

use export

76b24f0

BryanFauble reviewed Feb 12, 2025

BryanFauble mentioned this pull request Feb 12, 2025

Provide hooks to modify span content before conversion into ReadableSpan open-telemetry/opentelemetry-python#4424

Open

patch the _readable_span method in span class and remove processor

9a24156

BryanFauble reviewed Feb 13, 2025

add test; move function to util

c7872de

linglp added 6 commits February 17, 2025 13:57

add module docstring and fix syntax

b698db5

remove try except in code block

45f87e1

remove unnecessary import

0d67245

added the space back

0cd35d1

revert space changes

2687365

remove sensitive info in status description

1e88ba7

BryanFauble reviewed Feb 18, 2025

BryanFauble reviewed Feb 18, 2025

linglp added 4 commits February 18, 2025 16:52

modify attribute directly

5fc18a4

remoove unused imports

d6825d2

strip sensitive info in log in a consistent way

0e60545

remove unused import

fd397ad

linglp marked this pull request as ready for review February 19, 2025 00:39
