Add simple getter methods to fetch defaults #58


Closed

Conversation

AshishSardana
Contributor

Key info

TLM's integrations with frameworks (like LlamaIndex / LangChain) require us to expose some important attributes, like the base model, quality preset, max_output_tokens, etc.

As we use defaults for each of these, which the user can configure via the options argument, it's useful to expose the defaults rather than hardcoding them in these integrations.

Not exposing the defaults might result in higher maintenance effort for the integrations and restrict forward compatibility with TLM.

What changed?

Created a new utility to get defaults: the TLM base model and quality preset.

What do you want the reviewer(s) to focus on?

The new file: src/cleanlab_tlm/utils/config.py
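A minimal sketch of what such getters might look like. The internal constant names and their values below are assumptions for illustration, not the actual cleanlab_tlm internals:

```python
# Hypothetical sketch of src/cleanlab_tlm/utils/config.py.
# The constant names and values are placeholders, not the real API.
_DEFAULT_MODEL = "gpt-4o-mini"        # placeholder value
_DEFAULT_QUALITY_PRESET = "medium"    # placeholder value


def get_default_model() -> str:
    """Return the base model TLM uses when none is set in options."""
    return _DEFAULT_MODEL


def get_default_quality_preset() -> str:
    """Return the quality preset TLM uses when none is set in options."""
    return _DEFAULT_QUALITY_PRESET
```

Integrations would then call these getters instead of hardcoding defaults, so future default changes propagate automatically.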


Checklist

  • Did you link the GitHub issue?
    No issue raised as this functionality was discussed with Jonas on Slack.

  • Did you follow deployment steps or bump the version if needed?
    Followed development.md.

  • Did you add/update tests?
    No.

  • What QA did you do?

    • Installed cleanlab_tlm from my branch and executed the new methods

Screenshot 2025-04-24 at 14 55 44

@AshishSardana AshishSardana changed the title Add simple getter methods to fetch defaults to use in TLM's integrations Add simple getter methods to fetch defaults Apr 25, 2025
@jwmueller
Member

JFYI @jas2600 @huiwengoh we will need to update:
src/cleanlab_tlm/internal/constants.py

whenever we update our TLM defaults.

@jwmueller jwmueller self-requested a review April 27, 2025 02:50
@jwmueller jwmueller left a comment


note the CI is complaining:

Screenshot 2025-04-26 at 7 50 28 PM

The unit tests are also failing, but without looking, I'd guess that's not due to this PR (you should check).

Screenshot 2025-04-26 at 7 50 45 PM

Please do add some unit tests for this PR though. Here are some minimum tests (add others you think would be good):

Test 1: initialize TLM w default settings and verify that tlm.get_model_name() returns the same thing as utils.get_default_model()

Test 2: verify TLM with default settings works on an input string, which is just under utils.get_default_context_limit() tokens, and does not work on a much longer input string.

Test 3: verify TLM with default settings is able to generate an output of length just under utils.get_default_max_tokens() tokens, and that its output gets truncated before it significantly surpasses utils.get_default_max_tokens() tokens in length. (I'm unsure how to do this; maybe ask it to generate a really long story of at least XYZ pages.)

Test 4: some test of utils.get_default_quality_preset(), i'm not sure what the best test would be without looking at other code.

Without tests like these, future developers of TLM are going to forget to update the constants you've defined here.

@jas2600
Collaborator

jas2600 commented Apr 28, 2025

Seems like all the tests failed immediately with 401 UNAUTHORIZED (I just reran and got the same result, so it's likely not intermittent).

@AshishSardana I noticed that for some reason this PR is trying to merge 5 commits into cleanlab:main (instead of cleanlab-tlm), and that might have messed up how the CI gets the CLEANLAB_TLM_API_KEY secret.

@AshishSardana
Contributor Author

I am thinking about which tests would be unique for this feature.

Note I'm not adding any defaults myself. I only added _TLM_DEFAULT_CONTEXT_LIMIT, but just found it is already defined for tests/; its definition is missing from the API constants. Did I understand that right, @huiwengoh?

Test 1: ✅
Test 2: Already exists here.
Test 3: Is there a test that verifies TLMResponse.text is not longer than TLMOptions.max_tokens (which by default is 512)? Leaning on @huiwengoh to share any insights on these and guidance on how to implement. I only found tests verifying the response text is not longer than the constant max_tokens (defined as 70k) here.

@jwmueller
Member

jwmueller commented Apr 28, 2025

Test 2: Already exists here.

No, that test is not using utils.get_default_context_limit().

You need to be testing utils.get_default_context_limit()

I'm not adding any defaults myself.

Correct, you should not be adding defaults or testing against hardcoded values.

You should be testing that the functions you added actually return the default values, no matter how those default values are updated in the future

_TLM_DEFAULT_CONTEXT_LIMIT,
_TLM_MAX_TOKEN_RANGE
)

Member

Every function in here needs to appear in a meaningful unit test. That unit test should rely on hardcoded values as little as possible.

E.g. testing get_default_model() can be achieved by initializing TLM w default settings, calling tlm.get_model_name() and asserting that it matches get_default_model() output

Member

To test: get_default_context_limit()

You will probably need to hardcode this in the unit test:

CURRENT_DEFAULT_LIMIT = 70000

and then:

  1. make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT - 500 say, and verify tlm.prompt() and tlm.score() successfully run when this string is passed as prompt.

  2. make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT + 500 say, and verify tlm.prompt() and tlm.score() provide an expected error message when this string is passed as prompt.
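The two checks above could be sketched like this, with a stub client and a crude one-word-per-token heuristic standing in for the real TLM and tokenizer (all names here are assumptions, not cleanlab_tlm's API):

```python
# Sketch of the two boundary checks, with a stub in place of the real TLM
# client. Real code would import cleanlab_tlm and count tokens properly
# (e.g. via tiktoken); here one whitespace-separated word ~= one token.
CURRENT_DEFAULT_LIMIT = 70000  # hardcoded in the test, per the suggestion above


def get_default_context_limit() -> int:
    return CURRENT_DEFAULT_LIMIT  # stand-in for the real getter


class FakeTLM:
    """Stub that mimics the server rejecting over-limit prompts."""

    def prompt(self, text: str) -> str:
        if len(text.split()) > get_default_context_limit():
            raise ValueError("prompt exceeds context limit")
        return "ok"


def test_context_limit_boundaries() -> None:
    tlm = FakeTLM()
    just_under = " ".join(["word"] * (get_default_context_limit() - 500))
    just_over = " ".join(["word"] * (get_default_context_limit() + 500))
    assert tlm.prompt(just_under) == "ok"  # under the limit: succeeds
    try:
        tlm.prompt(just_over)
    except ValueError:
        pass  # over the limit: expected error
    else:
        raise AssertionError("expected a context-limit error")
```

In a real pytest suite the error branch would more idiomatically use pytest.raises with a match on the expected error message.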

@AshishSardana
Contributor Author

Since this PR didn't properly trigger CI (with the environment variables required for TLM), I raised another PR (branch in this repo vs. a fork) here: #59

@jwmueller I've added 3 tests. Please review #59 instead.

@jwmueller jwmueller closed this May 2, 2025
@AshishSardana AshishSardana deleted the asardana/get-default-config branch May 22, 2025 22:15

3 participants