Releases: argilla-io/argilla
v1.18.0
🔆 Release highlights
💾 Add metadata properties to Feedback Datasets
You can now filter and sort records in Feedback Datasets in the UI and Python SDK using the metadata included in the records. To do that, you will first need to set up a MetadataProperty in your dataset:
import argilla as rg

# set up a dataset including metadata properties
dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response"),
    ],
    questions=[
        rg.TextQuestion(name="question")
    ],
    metadata_properties=[
        rg.TermsMetadataProperty(name="source"),
        rg.IntegerMetadataProperty(name="response_length", title="Response length")
    ]
)
Learn more about how to define metadata properties and how to add or delete metadata properties in existing datasets.
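For an existing dataset (local or remote), you can add a new metadata property with the add_metadata_property method introduced in this release. A minimal sketch, where the property name is just an illustrative example:

# add a metadata property to an already configured dataset
# ("model_score" is a hypothetical property name used for illustration)
dataset.add_metadata_property(
    rg.FloatMetadataProperty(name="model_score", title="Model score")
)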
Metadata properties will read the metadata in the records whose keys match the property names. Any other metadata present in a record that does not match a metadata property will be saved, but it will not be available for the filtering and sorting features in the UI or SDK.
# create a record with metadata
record = rg.FeedbackRecord(
    fields={
        "prompt": "Why can camels survive long without water?",
        "response": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time."
    },
    metadata={"source": "wikipedia", "response_length": 105, "my_hidden_metadata": "hidden metadata"}
)
Learn more about how to create records with metadata and how to add, modify or delete metadata from existing records.
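As a sketch of modifying the metadata of records that already live in Argilla, assuming the remote record's update() method (used for suggestions since 1.14.0) also accepts a metadata argument; check the docs for the exact API:

remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# update the metadata of records already pushed to Argilla
# (assumes update() accepts a metadata argument, mirroring update(suggestions=...))
for record in remote_dataset.records:
    record.update(metadata={**record.metadata, "source": "wikihow"})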
🗃️ Filter and sort records using metadata in Feedback Datasets
In the Python SDK, you can filter and sort records based on the Metadata Properties that you set up for your dataset. You can combine multiple filters and sorts. Here is an example of how you could use them:
# `remote` is a FeedbackDataset pushed to Argilla (a RemoteFeedbackDataset)
filtered_records = remote.filter_by(
    metadata_filters=[
        rg.IntegerMetadataFilter(
            name="response_length",
            ge=500,   # optional: greater than or equal to
            le=1000   # optional: less than or equal to
        ),
        rg.TermsMetadataFilter(
            name="source",
            values=["wikipedia", "wikihow"]
        )
    ]
).sort_by(
    [
        rg.SortBy(
            field="response_length",
            order="desc"  # "desc" for descending or "asc" for ascending
        )
    ]
)
In the UI, simply use the Metadata and Sort components to filter and sort records.
Read more about filtering and sorting in Feedback Datasets.
⚠️ Breaking change using SQLite as backend in a docker deployment
From version 1.17.0, a new argilla OS user is configured for the provided Docker images. If you are using the Docker deployment and you are upgrading to this version from a version older than v1.17.0 (if you already upgraded to v1.17.0, this step was already applied; see the v1.17.0 release notes), you should change the permissions of the SQLite DB file before upgrading. You can do so with the following command:
docker exec --user root <argilla_server_container_id> /bin/bash -c 'chmod -R 777 "$ARGILLA_HOME_PATH"'
Note: You can find the docker container id by running:
docker ps | grep -i argilla-server
713973693fb7 argilla/argilla-server:v1.16.0 "/bin/bash start_arg…" 11 hours ago Up 7 minutes 0.0.0.0:6900->6900/tcp docker-argilla-1
Once the version is upgraded, we recommend restoring proper access permissions on this folder by setting its user and group to the new argilla user:
docker exec --user root <argilla_server_container_id> /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'
1.18.0 Changelog
Added
- New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
- New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
- New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
- New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
- New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
- New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
- New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
- Missing validations to PATCH /api/v1/questions/:question_id. Now title and description use the same validations used to create questions. (#3967)
- Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
- Added metadata_filters to the filter_by method in RemoteFeedbackDataset to filter based on metadata, i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. (#3834)
- Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
- Added sort_by query parameter to the listing records endpoints that allows sorting the records by inserted_at, updated_at or metadata property. (#3843)
- Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. a FeedbackDataset in Argilla). (#3900)
- Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
- Added support for sort_by for RemoteFeedbackDataset, i.e. a FeedbackDataset uploaded to Argilla. (#3925)
- Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
- Added support for updating records (metadata) from the Python SDK. (#3946)
- Added delete_metadata_properties method to delete metadata properties. (#3932)
- Added update_metadata_properties method to update metadata_properties. (#3961)
- Added automatic model card generation through ArgillaTrainer.save. (#3857)
- Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
- Added a maximum limit of 50 on the number of options a ranking question can accept. (#3975)
- New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)
Changed
- Changed GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
- Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
- Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
- Force elastic index refresh after records creation. (#3929)
- Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
- Using metadata property name instead of id for indexing data in search engine index. (#3994)
Fixed
- Fixed response schemas to allow values to be None, i.e. when a record is discarded the response.values are set to None. (#3926)
New Contributors
Full Changelog: v1.17.0...v1.18.0
v1.17.0
☀️ Highlights
This release comes with a lot of new goodies and quality improvements. We added model card support for the ArgillaTrainer, worked on the FeedbackDataset task templates and added timestamps to responses. We also fixed a lot of bugs and improved the overall quality of the codebase. Enjoy!
🚨 Breaking change in updating existing Hugging Face Spaces deployments
The quickstart image startup script was changed from /start_quickstart.sh to /home/argilla/start_quickstart.sh, which might cause existing Hugging Face Spaces deployments to malfunction. A fix was added for the Argilla template Space via this PR. Alternatively, you can just create a new deployment.
⚠️ Breaking change using SQLite as backend in a docker deployment
From version 1.17.0, a new argilla OS user is configured for the provided Docker images. If you are using the Docker deployment and you want to upgrade to this version, you need to take some action after updating your container and before working with Argilla. Execute the following command:
docker exec --user root <argilla_server_container_id> /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'
This will change the ownership of the Argilla home path, which allows it to work with the new containers.
Note: You can find the docker container id by running:
docker ps | grep -i argilla-server
713973693fb7 argilla/argilla-server:v1.17.0 "/bin/bash start_arg…" 11 hours ago Up 7 minutes 0.0.0.0:6900->6900/tcp docker-argilla-1
💾 ArgillaTrainer Model Card Generation
The ArgillaTrainer now supports automatic model card generation. This means that you can now generate a model card with all the required info for Hugging Face and directly share these models to the Hub, as you would expect within the Hugging Face ecosystem. See the docs for more info.
from argilla.feedback import ArgillaTrainer

model_card_kwargs = {
    "language": ["en", "es"],
    "license": "Apache-2.0",
    "model_id": "all-MiniLM-L6-v2",
    "dataset_name": "argilla/emotion",
    "tags": ["nlp", "few-shot-learning", "argilla", "setfit"],
    "model_summary": "Small summary of what the model does",
    "model_description": "An extended explanation of the model",
    "model_type": "A 1.3B parameter embedding model fine-tuned on an awesome dataset",
    "finetuned_from": "all-MiniLM-L6-v2",
    "repo": "https://github.com/...",
    "developers": "",
    "shared_by": "",
}

trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="setfit",
    framework_kwargs={"model_card_kwargs": model_card_kwargs}
)
trainer.train(output_dir="my_model")
# or get the card as `str` by calling the `generate_model_card` method
argilla_model_card = trainer.generate_model_card("my_model")
🦮 FeedbackDataset Task Templates
The Argilla FeedbackDataset now supports a number of task templates that can be used to quickly create a dataset for specific tasks out of the box. This should help new users get right into the action without having to worry about the dataset structure. We support basic tasks like Text Classification, but we also allow you to set up complex RAG pipelines. See the docs for more info.
import argilla as rg

ds = rg.FeedbackDataset.for_text_classification(
    labels=["positive", "negative"],
    multi_label=False,
    use_markdown=True,
    guidelines=None,
)
ds
# FeedbackDataset(
#     fields=[TextField(name="text", use_markdown=True)],
#     questions=[LabelQuestion(name="label", labels=["positive", "negative"])],
#     guidelines="<Guidelines for the task>",
# )
⏱️ inserted_at and updated_at are added to responses
What are responses without timestamps? The RemoteResponseSchema now supports inserted_at and updated_at fields. This should help you keep track of when a response was created and updated. Perfect for keeping track of annotator performance within your company.
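As a quick illustration (not taken from the release notes), here is a minimal sketch of reading those timestamps from a dataset previously pushed to Argilla; the dataset name and workspace are placeholders:

import argilla as rg

rg.init(api_url="...", api_key="...")
remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

for record in remote_dataset.records:
    for response in record.responses:
        # inserted_at and updated_at are the new datetime fields on the remote response schema
        print(response.user_id, response.inserted_at, response.updated_at)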
1.17.0
Added
- Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
- Added automatic model card generation through ArgillaTrainer.save (#3857).
- Added task templates to the FeedbackDataset (#3973).
Changed
- Updated Dockerfile to use multi stage build (#3221 and #3793).
- Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
- Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
- FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
- The unify_responses support for remote datasets (#3937).
Fixed
- Fixed field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
- Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
- Fixed record fields validation that was preventing logging records with optional fields (i.e. required=False) when the field value was None (#3846).
- Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
- The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945).
- Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
- Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
- Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
- Fixed TrainingTaskForQuestionAnswering.__repr__ (#3969).
- Fixed potential dictionary key errors in TrainingTask.prepare_for_training_with_* methods (#3969).
Deprecated
- Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0.
New Contributors
- @osintalex made their first contribution in #3221
- @kursathalat made their first contribution in #3756
- @splevine made their first contribution in #3832
Full Changelog: v1.16.0...v1.17.0
v1.16.0
☀️ Highlights
This release comes with an auto save feature for the UI, an enhanced Argilla CLI app, new keyboard shortcuts for the annotation process in the Feedback Dataset and new integrations for the ArgillaTrainer.
💾 Auto save
Have you been writing a long corrected text in a TextField for a completion given by an LLM and you refreshed the page before submitting it? Well, since this release you are covered! The Argilla UI will save the responses given in the annotation form of a FeedbackDataset every few seconds. Annotators can partially annotate one record and then come back to finish the annotation process without losing the previous work.
👨🏻💻 More operations directly from the Argilla CLI
The Argilla CLI has been updated to include an extensive list of new commands, from user and dataset management to training models, all from the terminal!
⌨️ New keyboard shortcuts for the Feedback Dataset
Now, you can seamlessly navigate within the feedback form using just your keyboard. We've extended the functionality of these shortcuts to cover all types of available questions: Label, Multi-label, Ranking, Rating and Text.
QnA, Chat Completion with OpenAI and Sentence Transformers model training now in the ArgillaTrainer
The ArgillaTrainer doesn't stop getting new features and improvements!
- A new TrainingTask has been added for Question and Answering (QnA); see the sketch below
- Use a FeedbackDataset for fine-tuning an OpenAI model for Chat Completion
- New integration with Sentence Transformers for fine-tuning a model for embedding generation
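As a rough illustration of the new QnA task, here is a minimal sketch; the field names question/context and the question name answer are assumptions for this example, so check the ArgillaTrainer docs for the exact setup:

import argilla as rg
from argilla.feedback import ArgillaTrainer, TrainingTask

rg.init(api_url="...", api_key="...")
dataset = rg.FeedbackDataset.from_argilla(name="qa-dataset", workspace="my-workspace")

# map dataset fields/questions to the extractive QnA training task
# (the names used here are placeholders for this sketch)
task = TrainingTask.for_question_answering(
    question=dataset.field_by_name("question"),
    context=dataset.field_by_name("context"),
    answer=dataset.question_by_name("answer"),
)

trainer = ArgillaTrainer(dataset=dataset, task=task, framework="transformers")
trainer.train(output_dir="qa_model")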
1.16.0
Added
- Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
- Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
- Added Auto save record to save automatically the current record that you are working on (#3541)
- Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
- Added workspaces list command to list Argilla workspaces (#3594).
- Added datasets list command to list Argilla datasets (#3658).
- Added users create command to create users (#3667).
- Added whoami command to get current user (#3673).
- Added users delete command to delete users (#3671).
- Added users list command to list users (#3688).
- Added workspaces delete-user command to remove a user from a workspace (#3699).
- Added workspaces create command to create an Argilla workspace (#3676).
- Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
- Added info command to get info about the used Argilla client and server (#3707).
- Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
- Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
- Added handling of PermissionError when executing a command with a logged-in user with not enough permissions (#3717).
- Added workspaces add-user command to add a user to a workspace (#3712).
- Added workspace_id param to the GET /api/v1/me/datasets endpoint (#3727).
- Added workspace_id arg to list_datasets in the Python SDK (#3727).
- Added argilla script that allows executing the Argilla CLI using the argilla command (#3730).
- Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).
Changed
- Move database commands under the server group of commands (#3710).
- server commands are only included in the CLI app when the server extra requirements are installed (#3710).
- Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
- Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
- Rename tasks sub-package to cli (#3723).
- Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
- Changed visible_options (of label and multi-label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
Fixed
- Fixed remove user modification in text component on clear answers (#3775)
- Fixed Highlight raw text field in dataset feedback task (#3731)
- Fixed Field title too long (#3734)
- Fixed error messages when deleting a DatasetForTextClassification (#3652)
- Fixed Pending queue pagination problems during data annotation (#3677)
- Fixed visible_labels default value to be 20 just when visible_labels is not provided and len(labels) > 20, otherwise it will either be the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
- Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
- Add missing draft status in ResponseSchema, as now there can be responses with draft status when annotating via the UI (#3749).
- Fixed searches when queried words are distributed along the record fields (#3759).
- Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
As always, thanks to our amazing contributors
Full Changelog: v1.15.1...v1.16.0
v1.15.1
Changelog 1.15.1
Fixed
- Fixed Text component text content sanitization behavior just for markdown to prevent the text from disappearing (#3738)
- Fixed Text component so that you now need to press Escape to exit the text area (#3733)
- Fixed SearchEngine was creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).
v1.15.0
🔆 Highlights
Argilla 1.15.0 comes with an enhanced FeedbackDataset settings page enabling the update of the dataset settings, an integration of the TRL package with the ArgillaTrainer, and continues adding improvements to the Python client for managing FeedbackDatasets.
⚙️ Update FeedbackDataset settings from the UI
The FeedbackDataset settings page has been updated and now allows updating the guidelines and some attributes of the fields and questions of the dataset. Did you misspell the title or description of a field or question? Well, you don't have to remove your dataset and create it again anymore! Just go to the settings page and fix it.
🤖 TRL integration with the ArgillaTrainer
The famous TRL package for training Transformers with Reinforcement Learning techniques has been integrated with the ArgillaTrainer, which comes with four new TrainingTasks: SFT, Reward Modeling, PPO and DPO. Each training task expects a formatting function that will return the data in the expected format for training the model.
Check this 🆕 tutorial for training a Reward Model using the Argilla Trainer.
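For illustration, here is a minimal sketch of the SFT flow (not taken verbatim from the docs); it assumes a Feedback Dataset with prompt and response fields, and the exact return types expected by each formatting function are described in the ArgillaTrainer docs:

from argilla.feedback import ArgillaTrainer, FeedbackDataset, TrainingTask

dataset = FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# formatting function that turns each record into a single training text
# (the "prompt"/"response" field names are assumptions for this sketch)
def formatting_func(sample: dict) -> str:
    return f"### Instruction\n{sample['prompt']}\n### Response\n{sample['response']}"

task = TrainingTask.for_supervised_fine_tuning(formatting_func=formatting_func)

trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="trl",
    model="gpt2",  # any causal LM supported by TRL
)
trainer.train(output_dir="sft_model")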
🐍 Filter FeedbackDataset and remove suggestions
In the 1.14.0 release we added many improvements for working with remote FeedbackDatasets. In this release, a new filter_by method has been added that allows filtering the records of a dataset from the Python client. For now, the records can only be filtered using the response_status, but we're planning to add more complex filters in upcoming releases. In addition, new methods have been added allowing to remove the suggestions created for a record.
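A minimal sketch of both features, assuming a dataset already pushed to Argilla (the status value and the exact delete_suggestions signature should be checked against the docs):

import argilla as rg

rg.init(api_url="...", api_key="...")
remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

# keep only the records that already have submitted responses
submitted = remote_dataset.filter_by(response_status="submitted")

# remove a suggestion attached to the first filtered record
record = submitted.records[0]
if record.suggestions:
    record.delete_suggestions(record.suggestions[0])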
1.15.0
Added
- Added Enable to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
- Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
- Added formatting_func to ArgillaTrainer for FeedbackDataset datasets to add custom formatting for the data (#3599).
- Added login function in argilla.client.login to log in to an Argilla server and store the credentials locally (#3582).
- Added login command to log in to an Argilla server (#3600).
- Added logout command to log out from an Argilla server (#3605).
- Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
- Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
- Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
- Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
- Added filter_by method in RemoteFeedbackDataset to filter based on response_status (#3610).
- Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces from a user in Argilla (#3641).
- Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
- Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
- Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).
Changed
- Changed Optional label for * mark for required question (#3608)
- Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
- Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
- Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
- Move users CLI app under database CLI app (#3593).
- Move server Enum classes to argilla.server.enums module (#3620).
Fixed
- Fixed Filter by workspace in breadcrumbs (#3577)
- Fixed Filter by workspace in datasets table (#3604)
- Fixed Query search highlight for Text2Text and TextClassification (#3621)
- Fixed RatingQuestion.values validation to raise a ValidationError when values are out of range, i.e. [1, 10] (#3626).
Removed
- Removed multi_task_text_token_classification from TaskType as not used (#3640).
- Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
- Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
- Removed push_to_argilla from RemoteFeedbackDataset, as it just works when calling it through a FeedbackDataset locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
- Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any "updateable" attribute of a record will go through update instead (#3663).
- Removed unused owner attribute for client Dataset data model (#3665)
As always, thanks to our amazing contributors
- @peppinob-ol made their first contribution in #3472
- @eshwarhs made their first contribution in #3605
Full Changelog: v1.14.1...v1.15.0
v1.14.1
Changelog 1.14.1
Fixed
- Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
Full Changelog: v1.14.0...v1.14.1
v1.14.0
🔆 Highlights
Argilla 1.14.0 comes packed with improvements to manage Feedback Datasets from the Python client. Here are the most important changes in this version:
Pushing and pulling a dataset
Pushing a dataset to Argilla will now create a RemoteFeedbackDataset in Argilla. To make changes to your dataset in Argilla you will need to make those updates to the remote dataset. You can do so either by using the dataset returned by the push_to_argilla() method or by loading the dataset like so:
import argilla as rg
# connect to Argilla
rg.init(api_url="...", api_key="...")
# get the existing dataset in Argilla
remote_dataset = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")
# add a list of FeedbackRecords to the dataset in Argilla
remote_dataset.add_records(...)
Alternatively, you can make a local copy of the dataset using the pull() method.
local_dataset = remote_dataset.pull()
Note that any changes that you make to this local dataset will not affect the remote dataset in Argilla.
Adding and deleting records
How to add records to an existing dataset in Argilla was demonstrated in the first code snippet in the "Pushing and pulling a dataset" section. This is how you can delete a list of records using that same dataset:
records_to_delete = remote_dataset.records[0:5]
remote_dataset.delete_records(records_to_delete)
Or delete a single record:
record = remote_dataset.records[-1]
record.delete()
Add / update suggestions in existing records
To add and update suggestions in existing records, you can simply use the update() method. For example:
for record in remote_dataset.records:
    record.update(suggestions=...)
Note that adding a suggestion to a question that already has one will overwrite the previous suggestion. To learn more about the format that the suggestions must follow, check our docs.
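For instance, a suggestion payload could look like the sketch below; the question name "label" and the agent string are placeholders for this example, and the full schema is described in the docs:

for record in remote_dataset.records:
    record.update(
        suggestions=[
            {
                "question_name": "label",    # placeholder: name of the question to suggest for
                "value": "positive",         # suggested value for that question
                "agent": "distilbert-sst2",  # optional: the model/agent that produced the suggestion
            }
        ]
    )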
Delete a dataset
You can now easily delete datasets from the Python client. To do that, get the existing dataset as demonstrated in the first section and just use:
remote_dataset.delete()
Create users with workspace assignments
Now you can create a user and directly assign existing workspaces to grant them access.
user = rg.User.create(username="...", first_name="...", password="...", workspaces=["ws1", "ws2"])
Changelog 1.14.0
Added
- Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
- Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
- Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
- Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
- Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return it as a local copy as a FeedbackDataset (#3465).
- Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
- Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
Changed
- Improved efficiency of weak labeling when dataset contains vectors (#3444).
- Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
- Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
- Update CLI to use database async connection (#3450).
- Limit rating questions values to the positive range [1, 10] (#3451).
- Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
- Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
- Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
- After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
- Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via 🤗 datasets (#3539).
Fixed
- Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
- Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
- Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
- TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
- Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
- Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
- Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
Deprecated
- After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
- After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
- After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
As always, thanks to our amazing contributors
Full Changelog: v1.13.3...v1.14.0
v1.13.3
1.13.3
Fixed
- Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
- Fixed ImportError caused because the argilla.client.feedback.config module was importing the pyyaml optional dependency not installed by default (#3471).
Full Changelog: v1.13.2...v1.13.3