Add simple getter methods to fetch defaults #58


Closed

Conversation

AshishSardana
Contributor

Key info

TLM's integrations with frameworks (like LlamaIndex / LangChain) require us to expose some important attributes, like the base model, quality preset, max_output_tokens, etc.

As we use defaults for each of these, which the user can configure via the options argument, it's useful to expose the defaults rather than hardcoding them in these integrations.

Not exposing the defaults might result in higher maintenance effort for the integrations and restrict forward compatibility with TLM.

What changed?

Created a new utility to get defaults: the TLM base model and quality preset.

What do you want the reviewer(s) to focus on?

The new file: src/cleanlab_tlm/utils/config.py
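A minimal sketch of what such getters might look like. The internal constant names and their values below are assumptions for illustration, not the actual cleanlab_tlm internals:

```python
# Hypothetical sketch of src/cleanlab_tlm/utils/config.py.
# The constant names and values are placeholders, not the real API.
_DEFAULT_MODEL = "gpt-4o-mini"        # placeholder value
_DEFAULT_QUALITY_PRESET = "medium"    # placeholder value


def get_default_model() -> str:
    """Return the base model TLM uses when none is set in options."""
    return _DEFAULT_MODEL


def get_default_quality_preset() -> str:
    """Return the quality preset TLM uses when none is set in options."""
    return _DEFAULT_QUALITY_PRESET
```

Integrations would then call these getters instead of hardcoding defaults, so future default changes propagate automatically.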


Checklist

  • Did you link the GitHub issue?
    No issue raised as this functionality was discussed with Jonas on Slack.

  • Did you follow deployment steps or bump the version if needed?
    Followed development.md.

  • Did you add/update tests?
    No.

  • What QA did you do?

    • Installed cleanlab_tlm from my branch and executed the new methods

Screenshot 2025-04-24 at 14 55 44

@AshishSardana AshishSardana changed the title Add simple getter methods to fetch defaults to use in TLM's integrations Add simple getter methods to fetch defaults Apr 25, 2025
@jwmueller
Member

JFYI @jas2600 @huiwengoh we will need to update:
src/cleanlab_tlm/internal/constants.py

whenever we update our TLM defaults.

@jwmueller jwmueller self-requested a review April 27, 2025 02:50
@jwmueller jwmueller left a comment


note the CI is complaining:

Screenshot 2025-04-26 at 7 50 28 PM

The unit tests are also failing, but without looking, I'd guess that's not due to this PR (you should check).

Screenshot 2025-04-26 at 7 50 45 PM

Please do add some unit tests for this PR though. Here are some minimum tests (add others you think would be good):

Test 1: initialize TLM w default settings and verify that tlm.get_model_name() returns the same thing as utils.get_default_model()

Test 2: verify TLM with default settings works on an input string, which is just under utils.get_default_context_limit() tokens, and does not work on a much longer input string.

Test 3: verify TLM with default settings is able to generate an output of length just under utils.get_default_max_tokens() tokens, and that its output gets truncated before it significantly surpasses utils.get_default_max_tokens() tokens in length. (I'm unsure how to do this; maybe ask it to generate a really long story of at least XYZ pages.)

Test 4: some test of utils.get_default_quality_preset(), i'm not sure what the best test would be without looking at other code.

Without tests like these, future developers of TLM are going to forget to update the constants you've defined here.

@jas2600
Collaborator

jas2600 commented Apr 28, 2025

Seems like all the tests failed immediately with 401 UNAUTHORIZED (I just reran and got the same result, so it's likely not intermittent).

@AshishSardana I noticed that for some reason this PR is trying to merge 5 commits into cleanlab:main (instead of cleanlab-tlm), and that might have messed up how the CI gets the CLEANLAB_TLM_API_KEY secret.

@AshishSardana
Contributor Author

I am thinking about which tests would be unique for this feature.

Note I'm not adding any defaults myself. I only added _TLM_DEFAULT_CONTEXT_LIMIT, but just found it is already defined for tests/; its definition is missing from the API constants. Did I understand that right, @huiwengoh?

Test 1: ✅
Test 2: Already exists here.
Test 3: Is there a test that verifies TLMResponse.text is not longer than TLMOptions.max_tokens (which by default is 512)? Leaning on @huiwengoh to share any insights on these and guidance on how to implement. I only found tests verifying the response text is not longer than the constant max_tokens (defined as 70k) here.

@jwmueller
Member

jwmueller commented Apr 28, 2025

Test 2: Already exists here.

No, that test is not using utils.get_default_context_limit().

You need to be testing utils.get_default_context_limit()

I'm not adding any defaults myself.

Correct, you should not be adding defaults or testing against hardcoded values.

You should be testing that the functions you added actually return the default values, no matter how those default values are updated in the future

_TLM_DEFAULT_CONTEXT_LIMIT,
_TLM_MAX_TOKEN_RANGE
)

Member

Every function in here needs to appear in a meaningful unit test. That unit test should rely on hardcoded values as little as possible.

E.g. testing get_default_model() can be achieved by initializing TLM w default settings, calling tlm.get_model_name() and asserting that it matches get_default_model() output

Member

To test: get_default_context_limit()

You will probably need to hardcode this in the unit test:

CURRENT_DEFAULT_LIMIT = 70000

and then:

  1. make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT - 500 say, and verify tlm.prompt() and tlm.score() successfully run when this string is passed as prompt.

  2. make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT + 500 say, and verify tlm.prompt() and tlm.score() provide an expected error message when this string is passed as prompt.
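The two checks above could be sketched like this, with a stub client and a crude one-word-per-token heuristic standing in for the real TLM and tokenizer (all names here are assumptions, not cleanlab_tlm's API):

```python
# Sketch of the two boundary checks, with a stub in place of the real TLM
# client. Real code would import cleanlab_tlm and count tokens properly
# (e.g. via tiktoken); here one whitespace-separated word ~= one token.
CURRENT_DEFAULT_LIMIT = 70000  # hardcoded in the test, per the suggestion above


def get_default_context_limit() -> int:
    return CURRENT_DEFAULT_LIMIT  # stand-in for the real getter


class FakeTLM:
    """Stub that mimics the server rejecting over-limit prompts."""

    def prompt(self, text: str) -> str:
        if len(text.split()) > get_default_context_limit():
            raise ValueError("prompt exceeds context limit")
        return "ok"


def test_context_limit_boundaries() -> None:
    tlm = FakeTLM()
    just_under = " ".join(["word"] * (get_default_context_limit() - 500))
    just_over = " ".join(["word"] * (get_default_context_limit() + 500))
    assert tlm.prompt(just_under) == "ok"  # under the limit: succeeds
    try:
        tlm.prompt(just_over)
    except ValueError:
        pass  # over the limit: expected error
    else:
        raise AssertionError("expected a context-limit error")
```

In a real pytest suite the error branch would more idiomatically use pytest.raises with a match on the expected error message.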

@AshishSardana
Contributor Author

Since this PR didn't properly trigger CI (with the environment variables required for TLM), I raised another PR (branch in this repo vs. a fork) here: #59

@jwmueller I've added 3 tests. Please review #59 instead.

@jwmueller jwmueller closed this May 2, 2025
@AshishSardana AshishSardana deleted the asardana/get-default-config branch May 22, 2025 22:15

3 participants