feat!: Add support for filtering with hierarchal and auto-generated keys. #199

Merged: 9 commits, Mar 10, 2025
11 changes: 5 additions & 6 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -30,7 +30,7 @@
     "@mui/icons-material": "^6.4.1",
     "@mui/joy": "^5.0.0-beta.51",
     "axios": "^1.7.9",
-    "clp-ffi-js": "^0.4.0",
+    "clp-ffi-js": "^0.5.0",
     "dayjs": "^1.11.13",
     "monaco-editor": "0.50.0",
     "react": "^19.0.0",
4 changes: 3 additions & 1 deletion src/services/decoders/ClpIrDecoder/index.ts
@@ -21,6 +21,7 @@ import {
     convertToDayjsTimestamp,
     isJsonObject,
 } from "../JsonlDecoder/utils";
+import {parseFilterKeys} from "../utils";
 import {
     CLP_IR_STREAM_TYPE,
     getStructuredIrNamespaceKeys,
@@ -42,7 +43,8 @@ class ClpIrDecoder implements Decoder {
         dataArray: Uint8Array,
         decoderOptions: DecoderOptions
     ) {
-        this.#streamReader = new ffiModule.ClpStreamReader(dataArray, decoderOptions);
+        const readerOptions = parseFilterKeys(decoderOptions, true);
+        this.#streamReader = new ffiModule.ClpStreamReader(dataArray, readerOptions);
         this.#streamType =
             this.#streamReader.getIrStreamType() === ffiModule.IrStreamType.STRUCTURED ?
                 CLP_IR_STREAM_TYPE.STRUCTURED :
23 changes: 17 additions & 6 deletions src/services/decoders/JsonlDecoder/index.ts
@@ -16,8 +16,10 @@ import {
     LogEvent,
     LogLevelFilter,
 } from "../../../typings/logs";
+import {getNestedJsonValue} from "../../../utils/js";
 import YscopeFormatter from "../../formatters/YscopeFormatter";
 import {postFormatPopup} from "../../MainWorker";
+import {parseFilterKeys} from "../utils";
 import {
     convertToDayjsTimestamp,
     convertToLogLevelValue,
@@ -34,9 +36,9 @@ class JsonlDecoder implements Decoder {

     #dataArray: Nullable<Uint8Array>;

-    #logLevelKey: string;
+    #logLevelKeyParts: string[];

-    #timestampKey: string;
+    #timestampKeyParts: string[];

     #logEvents: LogEvent[] = [];

@@ -52,8 +54,11 @@ class JsonlDecoder implements Decoder {
      */
     constructor (dataArray: Uint8Array, decoderOptions: DecoderOptions) {
         this.#dataArray = dataArray;
-        this.#logLevelKey = decoderOptions.logLevelKey;
-        this.#timestampKey = decoderOptions.timestampKey;
+
+        const filterKeys = parseFilterKeys(decoderOptions, false);
+        this.#logLevelKeyParts = filterKeys.logLevelKey.parts;
+        this.#timestampKeyParts = filterKeys.timestampKey.parts;
+
         this.#formatter = new YscopeFormatter({formatString: decoderOptions.formatString});
         if (0 === decoderOptions.formatString.length) {
             postFormatPopup();
@@ -165,8 +170,14 @@ class JsonlDecoder implements Decoder {
             if (false === isJsonObject(fields)) {
                 throw new Error("Unexpected non-object.");
             }
-            level = convertToLogLevelValue(fields[this.#logLevelKey]);
-            timestamp = convertToDayjsTimestamp(fields[this.#timestampKey]);
+            level = convertToLogLevelValue(getNestedJsonValue(
+                fields,
+                this.#logLevelKeyParts
+            ));
+            timestamp = convertToDayjsTimestamp(getNestedJsonValue(
+                fields,
+                this.#timestampKeyParts
+            ));
         } catch (e) {
             if (0 === line.length) {
                 return;
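The change from `fields[this.#logLevelKey]` to `getNestedJsonValue(fields, this.#logLevelKeyParts)` is what lets the JSONL decoder resolve hierarchical keys. The body of `getNestedJsonValue` is not part of this diff, so the following is only a minimal sketch of such a lookup. It assumes the helper walks the key parts one object level at a time and yields `undefined` when a level is missing. `sketchGetNestedValue` and the local JSON types are hypothetical names, not the project's own.

```typescript
// Local JSON types for the sketch; the project's real typings are not shown in the diff.
type JsonValue = string | number | boolean | null | JsonValue[] | {[key: string]: JsonValue};
type JsonObject = {[key: string]: JsonValue};

/**
 * Hypothetical stand-in for `getNestedJsonValue`: follows `keyParts` through nested objects.
 *
 * @param fields The decoded log event.
 * @param keyParts The hierarchical key, already split into its parts.
 * @return The value at the end of the path, or `undefined` if any level is missing.
 */
const sketchGetNestedValue = (
    fields: JsonObject,
    keyParts: string[]
): JsonValue | undefined => {
    let current: JsonValue | undefined = fields;
    for (const part of keyParts) {
        // Only plain objects can be descended into; arrays and primitives end the walk.
        if ("object" !== typeof current || null === current || Array.isArray(current)) {
            return undefined;
        }
        if (false === Object.prototype.hasOwnProperty.call(current, part)) {
            return undefined;
        }
        current = current[part];
    }

    return current;
};

// Example event with a nested log level, as a JSONL line might decode to.
const exampleEvent: JsonObject = {log: {level: "INFO"}, ts: 1741564800000};
const nestedLevel = sketchGetNestedValue(exampleEvent, ["log", "level"]); // "INFO"
const missingValue = sketchGetNestedValue(exampleEvent, ["log", "missing"]); // undefined
```

With a flat key such as `ts`, the parts array has a single element, so the lookup behaves like the previous `fields[key]` access.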
58 changes: 58 additions & 0 deletions src/services/decoders/utils.ts
@@ -0,0 +1,58 @@
+import {DecoderOptions} from "../../typings/decoders";
+import {
+    ParsedKey,
+    REPLACEMENT_CHARACTER,
+} from "../../typings/formatters";
+import {
+    EXISTING_REPLACEMENT_CHARACTER_WARNING,
+    parseKey,
+    replaceDoubleBacklash,
+    UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE,
+} from "../formatters/YscopeFormatter/utils";
+
+
+/**
+ * Preprocesses filter key by removing escaped backlash to facilitate simpler parsing, then parses
+ * the key.
+ *
+ * @param filterKey
+ * @return The parsed key.
+ */
+const preprocessThenParseFilterKey = (filterKey: string): ParsedKey => {
Member:

(No need to address this right now; we can revisit in a future PR. We can file an issue if you also agree.)

From my recent experience, I believe the Unicode replacement character is used to replace sequences that can't be decoded as UTF-8 characters, which means it may occur more often than any other Unicode character, depending on how users generate IRv2 logs.

In that case, we may want to use a different character for replacement.

Anyway, let's make sure this behaviour is covered when we put up the docs.

Contributor Author (@davemarco, Mar 6, 2025):

I see your point. tbh I only chose the Unicode replacement character because I thought the name was fitting. Note, however, that the error will only trigger on the keys, not the values. Keys are probably human generated, so they are less likely to contain the replacement character.

Also, I think we want to get rid of this eventually with antler parser. With all that being said, if you want to change it to something else, we just need to change the character in the code in one place and update the function docstrings in a few places.

Member:

> the error will only trigger on the keys not the values, which are probably human generated, so less likely to use replacement character.

Fair. A rare example of how this would happen: people write Shift-JIS Japanese in the keys, and the logging framework / text editor decides to encode the log events as only UTF-8 in the logs. Then, to select the key in the log viewer as a level / timestamp key, they would enter a key that partially (or fully) contains the Unicode replacement character. Again, I agree this is a multi-point failure, should be really rare, and for the existing use cases we're serving, we likely won't encounter such issues.

> we want to get rid of this eventually with antler parser

Yeah, hopefully we won't really see odd use cases before we migrate there.
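To make the scenario in this thread concrete, here is a small sketch, assuming a runtime that provides the standard `TextDecoder` (browsers and recent Node.js versions do). It decodes Shift-JIS bytes as if they were UTF-8 and checks whether the resulting key contains U+FFFD, which is the condition the warning in this function looks for. The variable names are illustrative and do not come from the PR.

```typescript
const REPLACEMENT_CHARACTER = "\uFFFD";

// The two-character key "日本" encoded as Shift-JIS: 0x93 0xFA 0x96 0x7B.
const shiftJisBytes = new Uint8Array([0x93, 0xFA, 0x96, 0x7B]);

// Decoding with the wrong encoding. In the default (non-fatal) mode, `TextDecoder` substitutes
// U+FFFD for byte sequences that are not valid UTF-8 instead of throwing.
const decodedKey = new TextDecoder("utf-8").decode(shiftJisBytes);

// A key that went through this path already contains the character the parser uses as its
// placeholder for escaped backslashes.
const keyContainsReplacementCharacter = -1 !== decodedKey.indexOf(REPLACEMENT_CHARACTER);
console.log(keyContainsReplacementCharacter);
```

As noted in the thread, this only matters when the user-supplied level or timestamp key itself contains the character, not when log values do.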

+    if (filterKey.includes(REPLACEMENT_CHARACTER)) {
+        console.warn(EXISTING_REPLACEMENT_CHARACTER_WARNING);
+    }
+
+    return parseKey(replaceDoubleBacklash(filterKey));
+};
+
+/**
+ * Parses the log level key and timestamp key from the decoder options.
+ *
+ * @param decoderOptions
+ * @param supportsAutoGeneratedKeys
+ * @return An object containing the parsed log level key and timestamp key.
+ * @throws {Error} If the keys contain reserved symbols.
+ */
+const parseFilterKeys = (decoderOptions: DecoderOptions, supportsAutoGeneratedKeys: boolean): {
+    logLevelKey: ParsedKey;
+    timestampKey: ParsedKey;
+} => {
+    const parsedLogLevelKey = preprocessThenParseFilterKey(decoderOptions.logLevelKey);
+    const parsedTimestampKey = preprocessThenParseFilterKey(decoderOptions.timestampKey);
+
+    if (false === supportsAutoGeneratedKeys &&
+        (parsedLogLevelKey.isAutoGenerated || parsedTimestampKey.isAutoGenerated)) {
+        throw new Error(UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE);
+    }
+
+    return {
+        logLevelKey: parsedLogLevelKey,
+        timestampKey: parsedTimestampKey,
+    };
+};
+
+export {
+    parseFilterKeys,
+    preprocessThenParseFilterKey,
+};
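The bodies of `parseKey` and `replaceDoubleBacklash` are only partly visible in this PR, so the sketch below is a self-contained approximation of what `parseFilterKeys` does, not the project's implementation. It assumes that `\\` is an escaped backslash, that `\.` and `\@` escape the separator and the auto-generated prefix, and that a leading unescaped `@` marks an auto-generated key. All `sketch*` names are hypothetical, and the sketch scans characters by hand instead of using the project's regexes.

```typescript
type ParsedKey = {
    isAutoGenerated: boolean;
    parts: string[];
};

const REPLACEMENT_CHARACTER = "\uFFFD";
const AUTO_GENERATED_KEY_PREFIX = "@";

// Hypothetical stand-in for `preprocessThenParseFilterKey`.
const sketchParseFilterKey = (filterKey: string): ParsedKey => {
    // Swap each escaped backslash for a placeholder so every remaining backslash is an escape.
    const preprocessed = filterKey.split("\\\\").join(REPLACEMENT_CHARACTER);

    const isAutoGenerated = AUTO_GENERATED_KEY_PREFIX === preprocessed.charAt(0);
    const body = isAutoGenerated ? preprocessed.substring(1) : preprocessed;

    const parts: string[] = [];
    let current = "";
    for (let i = 0; i < body.length; i++) {
        const ch = body.charAt(i);
        if ("\\" === ch && i + 1 < body.length) {
            // Escaped character: keep it literally and drop the backslash.
            current += body.charAt(i + 1);
            i++;
        } else if ("." === ch) {
            // Unescaped period: end of one hierarchy level.
            parts.push(current);
            current = "";
        } else {
            current += ch;
        }
    }
    parts.push(current);

    // Turn the placeholders back into literal backslashes.
    return {
        isAutoGenerated: isAutoGenerated,
        parts: parts.map((part) => part.split(REPLACEMENT_CHARACTER).join("\\")),
    };
};

// Hypothetical stand-in for `parseFilterKeys`, including the JSONL guard.
const sketchParseFilterKeys = (
    keys: {logLevelKey: string; timestampKey: string},
    supportsAutoGeneratedKeys: boolean
): {logLevelKey: ParsedKey; timestampKey: ParsedKey} => {
    const logLevelKey = sketchParseFilterKey(keys.logLevelKey);
    const timestampKey = sketchParseFilterKey(keys.timestampKey);
    if (false === supportsAutoGeneratedKeys &&
        (logLevelKey.isAutoGenerated || timestampKey.isAutoGenerated)) {
        throw new Error("`@` must be escaped with `\\` for JSONL logs.");
    }

    return {logLevelKey: logLevelKey, timestampKey: timestampKey};
};

// CLP IR accepts an auto-generated timestamp key; parts are ["log", "level"] and ["timestamp"].
const irKeys = sketchParseFilterKeys({logLevelKey: "log.level", timestampKey: "@timestamp"}, true);
// For JSONL, the same key must be written as `\@timestamp`; its single part is "@timestamp".
const jsonlKeys = sketchParseFilterKeys(
    {logLevelKey: "log.level", timestampKey: "\\@timestamp"},
    false
);
```

Passing `true` mirrors the `ClpIrDecoder` call site and `false` mirrors the `JsonlDecoder` call site in this PR.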
16 changes: 12 additions & 4 deletions src/services/formatters/YscopeFormatter/index.ts
@@ -3,6 +3,7 @@ import {
     FIELD_PLACEHOLDER_REGEX,
     Formatter,
     FormatterOptionsType,
+    ParsedKey,
     REPLACEMENT_CHARACTER,
     YscopeFieldFormatter,
     YscopeFieldPlaceholder,
@@ -11,10 +12,13 @@ import {LogEvent} from "../../../typings/logs";
 import {jsonValueToString} from "../../../utils/js";
 import {StructuredIrNamespaceKeys} from "../../decoders/ClpIrDecoder/utils";
 import {
+    EXISTING_REPLACEMENT_CHARACTER_WARNING,
     getFormattedField,
+    parseKey,
     removeEscapeCharacters,
     replaceDoubleBacklash,
     splitFieldPlaceholder,
+    UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE,
     YSCOPE_FIELD_FORMATTER_MAP,
 } from "./utils";

@@ -34,8 +38,7 @@ class YscopeFormatter implements Formatter {
         this.#structuredIrNamespaceKeys = options.structuredIrNamespaceKeys ?? null;

         if (options.formatString.includes(REPLACEMENT_CHARACTER)) {
-            console.warn("Unicode replacement character `U+FFFD` found in format string; " +
-                "it will be replaced with \"\\\"");
+            console.warn(EXISTING_REPLACEMENT_CHARACTER_WARNING);
         }

         this.#processedFormatString = replaceDoubleBacklash(options.formatString);
@@ -90,8 +93,13 @@ class YscopeFormatter implements Formatter {
                 throw Error("Field placeholder regex is invalid and does not have a capture group");
             }

-            const {parsedFieldName, formatterName, formatterOptions} =
-                splitFieldPlaceholder(groupMatch, this.#structuredIrNamespaceKeys);
+            const {fieldName, formatterName, formatterOptions} =
+                splitFieldPlaceholder(groupMatch);
+
+            const parsedFieldName: ParsedKey = parseKey(fieldName);
+            if (null === this.#structuredIrNamespaceKeys && parsedFieldName.isAutoGenerated) {
+                throw new Error(UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE);
+            }

             let fieldFormatter: Nullable<YscopeFieldFormatter> = null;
             if (null !== formatterName) {
44 changes: 26 additions & 18 deletions src/services/formatters/YscopeFormatter/utils.ts
@@ -3,7 +3,7 @@ import {
     AUTO_GENERATED_KEY_PREFIX,
     COLON_REGEX,
     DOUBLE_BACKSLASH,
-    ParsedFieldName,
+    ParsedKey,
     PERIOD_REGEX,
     REPLACEMENT_CHARACTER,
     SINGLE_BACKSLASH,
@@ -22,6 +22,22 @@ import RoundFormatter from "./FieldFormatters/RoundFormatter";
 import TimestampFormatter from "./FieldFormatters/TimestampFormatter";


+/**
+ * Error message for when a key is prefixed with the `@` symbol in the format string or
+ * filter key for JSON logs.
+ */
+const UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE =
+    "`@` is a reserved symbol for CLP IR logs and must be escaped with `\\` for JSONL logs.";
+
+
+/**
+ * Warning message for when Unicode replacement character is found in format string or filter
+ * key.
+ */
+const EXISTING_REPLACEMENT_CHARACTER_WARNING =
+    "Unicode replacement character `U+FFFD` found in format string or filter key; " +
+    "it will be replaced with a double backlash";
+
 /**
  * List of currently supported field formatters.
  */
@@ -81,7 +97,7 @@ const replaceDoubleBacklash = (str: string): string => {

 /**
  * Retrieves fields from auto-generated or user-generated namespace of a structured IR log
- * event based on the prefix of the parsed key.
+ * event based on the prefix of the parsed field name.
  *
  * @param logEvent
  * @param structuredIrNamespaceKeys
@@ -93,7 +109,7 @@ const replaceDoubleBacklash = (str: string): string => {
 const getFieldsByNamespace = (
     logEvent: LogEvent,
     structuredIrNamespaceKeys: StructuredIrNamespaceKeys,
-    parsedFieldName: ParsedFieldName
+    parsedFieldName: ParsedKey
 ): JsonObject => {
     const namespaceKey = parsedFieldName.isAutoGenerated ?
         structuredIrNamespaceKeys.autoGenerated :
@@ -162,7 +178,7 @@ const validateComponent = (component: string | undefined): Nullable<string> => {
  * @param key The key to be parsed.
  * @return The parsed key.
  */
-const parseKey = (key: string): ParsedFieldName => {
+const parseKey = (key: string): ParsedKey => {
     const isAutoGenerated = AUTO_GENERATED_KEY_PREFIX === key.charAt(0);
     const keyWithoutAutoPrefix = isAutoGenerated ?
         key.substring(1) :
@@ -176,22 +192,20 @@ const parseKey = (key: string): ParsedFieldName => {
 };

 /**
- * Splits a field placeholder string into its components: parsed field name, formatter name, and
+ * Splits a field placeholder string into its components: field name, formatter name, and
  * formatter options.
  *
  * @param placeholderString
- * @param structuredIrNamespaceKeys
  * @return - An object containing:
- * - parsedFieldName: The parsed field name.
+ * - fieldName: The field name.
  * - formatterName: The formatter name, or `null` if not provided.
  * - formatterOptions: The formatter options, or `null` if not provided.
  * @throws {Error} If the field name could not be parsed.
  */
 const splitFieldPlaceholder = (
     placeholderString: string,
-    structuredIrNamespaceKeys: Nullable<StructuredIrNamespaceKeys>
 ): {
-    parsedFieldName: ParsedFieldName;
+    fieldName: string;
     formatterName: Nullable<string>;
     formatterOptions: Nullable<string>;
 } => {
@@ -207,14 +221,6 @@ const splitFieldPlaceholder = (
         throw Error("Field name could not be parsed");
     }

-    const parsedFieldName: ParsedFieldName = parseKey(fieldName);
-    if (null === structuredIrNamespaceKeys && parsedFieldName.isAutoGenerated) {
-        throw new Error(
-            "`@` is a reserved symbol and must be escaped with `\\` " +
-            "for JSONL logs."
-        );
-    }
-
     formatterName = validateComponent(formatterName);
     if (null !== formatterName) {
         formatterName = removeEscapeCharacters(formatterName);
@@ -225,15 +231,17 @@ const splitFieldPlaceholder = (
         formatterOptions = removeEscapeCharacters(formatterOptions);
     }

-    return {parsedFieldName, formatterName, formatterOptions};
+    return {fieldName, formatterName, formatterOptions};
 };


 export {
+    EXISTING_REPLACEMENT_CHARACTER_WARNING,
     getFormattedField,
     parseKey,
     removeEscapeCharacters,
     replaceDoubleBacklash,
     splitFieldPlaceholder,
+    UNEXPECTED_AUTOGENERATED_SYMBOL_ERROR_MESSAGE,
     YSCOPE_FIELD_FORMATTER_MAP,
 };
8 changes: 4 additions & 4 deletions src/typings/formatters.ts
@@ -60,12 +60,12 @@ interface YscopeFieldFormatterMap {


 /**
- * Parsed field name from YScope format string.
+ * Parsed key from YScope format string or filter keys.
  *
  * @property isAutoGenerated whether the key is prefixed with `AUTO_GENERATED_KEY_PREFIX`.
  * @property parts The key split into its hierarchical components.
  */
-type ParsedFieldName = {
+type ParsedKey = {
     isAutoGenerated: boolean;
     parts: string[];
 };
@@ -74,7 +74,7 @@ type ParsedFieldName = {
  * Parsed field placeholder from a YScope format string.
  */
 type YscopeFieldPlaceholder = {
-    parsedFieldName: ParsedFieldName;
+    parsedFieldName: ParsedKey;
Member:

Just curious, how do we differentiate between a parsedKey and a parsedFieldName after this type renaming?

Member:

We discussed offline:

The concept of "field names" is usually referenced in the context of field formatters / formatter strings. The docs also reference this part of the specifier as <field-name>. For field formatters, it is fine to leave the name as-is. We can rename the field in a future refactoring PR if the name also changes in the docs.

     fieldFormatter: Nullable<YscopeFieldFormatter>;

     // Location of field placeholder in format string including braces.
@@ -124,7 +124,7 @@ const PERIOD_REGEX = Object.freeze(/(?<!\\)\./);
 export type {
     Formatter,
     FormatterOptionsType,
-    ParsedFieldName,
+    ParsedKey,
     YscopeFieldFormatter,
     YscopeFieldFormatterMap,
     YscopeFieldPlaceholder,