Add simple getter methods to fetch defaults #58
Conversation
JFYI @jas2600 @huiwengoh: we will need to update these whenever we update our TLM defaults.
Note the CI is complaining:
The unit tests are also failing, but without looking, I'd guess that is not due to this PR (you should check).
Please do add some unit tests for this PR though. Here are some minimum tests (add others you think would be good); Tests 1 and 3 are sketched after this list:

Test 1: initialize TLM with default settings and verify that `tlm.get_model_name()` returns the same thing as `utils.get_default_model()`.

Test 2: verify TLM with default settings works on an input string that is just under `utils.get_default_context_limit()` tokens, and does not work on a much longer input string.

Test 3: verify TLM with default settings is able to generate an output of length just under `utils.get_default_max_tokens()` tokens, and that its output gets truncated before it significantly surpasses `utils.get_default_max_tokens()` tokens in length. (I'm unsure how to do this; maybe ask it to generate a really long story of at least XYZ pages.)

Test 4: some test of `utils.get_default_quality_preset()`; I'm not sure what the best test would be without looking at other code.

Without tests like these, future developers of TLM are going to forget to update the constants you've defined here.
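As a minimal sketch (not the repo's actual test code), Tests 1 and 3 might look roughly like this, assuming the getters land in `cleanlab_tlm.utils.config` as this PR proposes, that `TLM()` with no `options` uses all defaults, and that `tlm.prompt()` returns a dict with a `"response"` key; `tiktoken` is used here purely for illustrative token counting:

```python
# Hedged sketch of Tests 1 and 3; the import paths and the response format
# are assumptions based on this PR discussion, not verified against the repo.
import tiktoken

from cleanlab_tlm import TLM
from cleanlab_tlm.utils.config import get_default_max_tokens, get_default_model


def test_default_model_matches_getter():  # Test 1
    tlm = TLM()  # default settings, no `options` passed
    assert tlm.get_model_name() == get_default_model()


def test_output_truncated_at_default_max_tokens():  # Test 3
    tlm = TLM()
    out = tlm.prompt("Write an extremely long story, at least 100 pages long.")
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(out["response"]))
    # The output should not significantly surpass the default max_tokens.
    assert n_tokens <= get_default_max_tokens() + 50
```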
Seems like all the tests failed immediately with:

@AshishSardana I noticed that for some reason this PR is trying to merge 5 commits into:
I am thinking about which tests would be unique for this feature. Note I'm not adding any defaults myself. I only added Test 1: ✅
No, that test is not using `utils.get_default_context_limit()`. You need to be testing `utils.get_default_context_limit()`.
Correct, you should not be adding defaults or testing against hardcoded values. You should be testing that the functions you added actually return the default values, no matter how those default values are updated in the future.
```python
    _TLM_DEFAULT_CONTEXT_LIMIT,
    _TLM_MAX_TOKEN_RANGE
)
```
Every function in here needs to appear in a meaningful unit test. That unit test should rely on hardcoded values as little as possible.

E.g. testing `get_default_model()` can be achieved by initializing TLM with default settings, calling `tlm.get_model_name()`, and asserting that it matches the `get_default_model()` output.
To test `get_default_context_limit()`, you will probably need to hardcode this in the unit test:

`CURRENT_DEFAULT_LIMIT = 70000`

and then (see the sketch after this list):

- make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT - 500, say, and verify that `tlm.prompt()` and `tlm.score()` successfully run when this string is passed as `prompt`.
- make a string of tokens whose length is just around CURRENT_DEFAULT_LIMIT + 500, say, and verify that `tlm.prompt()` and `tlm.score()` provide an expected error message when this string is passed as `prompt`.
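A hedged sketch of these two checks, under the same import-path assumptions as above; the exact exception TLM raises for over-long prompts is an assumption, so the sketch only asserts that some exception occurs:

```python
# Hedged sketch of the context-limit tests; token counting via tiktoken and
# the bare-Exception check are assumptions, not verified library behavior.
import pytest
import tiktoken

from cleanlab_tlm import TLM
from cleanlab_tlm.utils.config import get_default_context_limit

CURRENT_DEFAULT_LIMIT = 70000  # hardcoded guard, per the review comment


def make_prompt(num_tokens: int) -> str:
    """Build a string of roughly `num_tokens` tokens by repeating one word."""
    enc = tiktoken.get_encoding("cl100k_base")
    word = "hello "
    return word * (num_tokens // len(enc.encode(word)))


def test_prompt_just_under_context_limit():
    # Guard: fails loudly if the default context limit ever changes.
    assert get_default_context_limit() == CURRENT_DEFAULT_LIMIT
    tlm = TLM()
    assert tlm.prompt(make_prompt(CURRENT_DEFAULT_LIMIT - 500)) is not None


def test_prompt_just_over_context_limit():
    tlm = TLM()
    with pytest.raises(Exception):  # exact exception type is an assumption
        tlm.prompt(make_prompt(CURRENT_DEFAULT_LIMIT + 500))
```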
Since this PR didn't trigger CI properly (with the environment variables required for TLM), I raised another PR (a branch in this repo vs. a fork) here: #59

@jwmueller I've added 3 tests. Please review #59 instead.
Key info
TLM's integrations with frameworks (like LlamaIndex / LangChain) require us to expose some important attributes, like the base model, quality preset, max_output_tokens, etc.

Since we use defaults for each of these, which the user can configure using the `options` argument, it's useful to expose the defaults rather than hardcoding them in these integrations. Not exposing the defaults might result in higher maintenance effort (for the integrations) and restrict forward compatibility with TLM.
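For illustration, a hedged sketch of how an integration might consume these getters instead of hardcoding values (the metadata dict below is hypothetical, not taken from any actual integration):

```python
# Hypothetical integration-side usage: read TLM's current defaults at
# runtime so the integration automatically tracks future default changes.
from cleanlab_tlm.utils.config import (
    get_default_model,
    get_default_max_tokens,
    get_default_quality_preset,
)

llm_metadata = {
    "model_name": get_default_model(),
    "max_output_tokens": get_default_max_tokens(),
    "quality_preset": get_default_quality_preset(),
}
```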
What changed?
Create new utility functions to get the defaults: TLM base model and quality preset.
What do you want the reviewer(s) to focus on?
The new file: `src/cleanlab_tlm/utils/config.py`
Checklist
Did you link the GitHub issue?
No issue was raised, as this functionality was discussed with Jonas on Slack.
Did you follow deployment steps or bump the version if needed?
Followed development.md.
Did you add/update tests?
No.
What QA did you do?
Installed `cleanlab_tlm` from my branch and executed the methods.