# [PY] fix: Optimizing the to_string Function (#2107)
The `to_string` function has been optimized with logic for handling string
types and null values, and it no longer escapes non-ASCII characters.

## Linked issues

closes: #2065

## Details

If non-English characters are used in the conversation, the entire
conversation history is saved with escaped character sequences (e.g.
`\uXXXX`), causing confusion for the AI model.
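The escaping described above is the standard-library default: `json.dumps` replaces every non-ASCII character with a `\uXXXX` sequence unless `ensure_ascii=False` is passed. A minimal reproduction:

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which is how the saved conversation history ended up full of \uXXXX sequences.
escaped = json.dumps({"text": "非英文"})
readable = json.dumps({"text": "非英文"}, ensure_ascii=False)

print(escaped)   # {"text": "\u975e\u82f1\u6587"}
print(readable)  # {"text": "非英文"}
```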

#### Change details


**code snippets**:

```python
import json
from typing import Any

import yaml

# `Tokenizer` and `todict` are package-internal helpers; their import
# paths are omitted here.
def to_string(tokenizer: Tokenizer, value: Any, as_json: bool = False) -> str:
    """
    Converts a value to a string representation.
    Dates are converted to ISO strings and Objects are converted to JSON or YAML,
    whichever is shorter.

    Args:
        tokenizer (Tokenizer): The tokenizer object used for encoding.
        value (Any): The value to be converted.
        as_json (bool, optional): Flag indicating whether to return the value as JSON string.
          Defaults to False.

    Returns:
        str: The string representation of the value.
    """
    if value is None:
        return ""
    
    if hasattr(value, "isoformat") and callable(value.isoformat):
        # Used when the value is a datetime object
        return value.isoformat()
    value = todict(value)

    if as_json:
        return json.dumps(value, default=lambda o: o.__dict__, ensure_ascii=False)

    # Return shorter version of object
    yaml_str = yaml.dump(value, allow_unicode=True)
    json_str = json.dumps(value, default=lambda o: o.__dict__, ensure_ascii=False)
    if len(tokenizer.encode(yaml_str)) < len(tokenizer.encode(json_str)):
        return yaml_str

    return json_str
```
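The duck-typed datetime check at the top of the function can be exercised on its own: any object exposing a callable `isoformat` (a `datetime`, `date`, or `time`) is rendered as an ISO 8601 string before any JSON/YAML handling. A small stdlib-only sketch:

```python
from datetime import datetime, timezone

value = datetime(2024, 10, 14, 12, 30, tzinfo=timezone.utc)

# Same duck-typing check as to_string: objects with a callable isoformat()
# are serialized as ISO 8601 strings rather than JSON or YAML.
if hasattr(value, "isoformat") and callable(value.isoformat):
    result = value.isoformat()

print(result)  # 2024-10-14T12:30:00+00:00
```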

**screenshots**:

<img width="876" alt="image"
src="https://github.com/user-attachments/assets/2c7fb9e4-c424-4ff4-a0f6-1c7ee7bc4913">



## Attestation Checklist

- [ ] My code follows the style guidelines of this project
- I have checked for/fixed spelling, linting, and other errors
- I have commented my code for clarity
- I have made corresponding changes to the documentation (updating the
doc strings in the code is sufficient)
- My changes generate no new warnings
- I have added tests that validate my changes and provide sufficient
test coverage. I have tested with:
  - Local testing
  - E2E testing in Teams
- New and existing unit tests pass locally with my changes
jamiesun authored Oct 14, 2024
1 parent 22473ff commit d4bf9cc
Showing 2 changed files with 8 additions and 4 deletions.
`python/packages/ai/teams/utils/to_string.py` — 5 additions & 4 deletions:

```diff
@@ -24,24 +24,25 @@ def to_string(tokenizer: Tokenizer, value: Any, as_json: bool = False) -> str:
         tokenizer (Tokenizer): The tokenizer object used for encoding.
         value (Any): The value to be converted.
         as_json (bool, optional): Flag indicating whether to return the value as JSON string.
-            Defaults to False.
+          Defaults to False.
     Returns:
         str: The string representation of the value.
     """
     if value is None:
         return ""

     if hasattr(value, "isoformat") and callable(value.isoformat):
         # Used when the value is a datetime object
         return value.isoformat()
     value = todict(value)

     if as_json:
-        return json.dumps(value, default=lambda o: o.__dict__)
+        return json.dumps(value, default=lambda o: o.__dict__, ensure_ascii=False)

     # Return shorter version of object
-    yaml_str = yaml.dump(value)
-    json_str = json.dumps(value, default=lambda o: o.__dict__)
+    yaml_str = yaml.dump(value, allow_unicode=True)
+    json_str = json.dumps(value, default=lambda o: o.__dict__, ensure_ascii=False)
     if len(tokenizer.encode(yaml_str)) < len(tokenizer.encode(json_str)):
         return yaml_str
```
`python/packages/ai/tests/utils/test_to_string.py` — 3 additions & 0 deletions:

```diff
@@ -44,3 +44,6 @@ def test_to_string_with_object(self):
     def test_to_string_with_object_as_json(self):
         obj = {"key": "value", "key2": [1, 2, 3]}
         self.assertEqual(to_string(self.tokenizer, obj, as_json=True), json.dumps(obj))
+
+    def test_to_string_with_nonen_string(self):
+        self.assertEqual(to_string(self.tokenizer, "非英文"), '"非英文"')
```
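The expected value in the new test follows from JSON-encoding a bare string: the result keeps its surrounding quotes, and with `ensure_ascii=False` the characters themselves stay unescaped.

```python
import json

# A bare string is JSON-encoded, so the output keeps its quotes;
# ensure_ascii=False leaves the characters unescaped.
print(json.dumps("非英文", ensure_ascii=False))  # "非英文"
print(json.dumps("非英文"))                      # "\u975e\u82f1\u6587"
```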
