Proposal for improvements to both the dataset collection guidelines and the dataset collection tasks #3364
Replies: 11 comments
-
I would like to make it very clear that all of the problems I have mentioned are important to deal with if we want better dataset collection and a higher-quality dataset. They are high priority and should not be ignored; otherwise the remaining 37k message trees of the dataset will end up with the same problems as the current 13k.
-
Editing has several challenges. Firstly, it can invalidate labels already collected for the message, and it's not clear how best to handle that. Secondly, if we use a voting approach it creates many more tasks to produce data, and having several competing proposed edits due to disagreements could be an issue. We need to find a limited editing solution, but what we really need is people willing to work on this stuff.
I'm not against this, but again we would need people to work on the backend and frontend for it.
It's not possible to detect ChatGPT outputs reliably, but we are implementing message search for moderators so that at least obvious copy-pastes with keyphrases (e.g. "As a language model") can be purged soon.
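As a rough illustration (not the actual backend implementation), a keyphrase scan like the one below could surface obvious copy-pastes for moderators to review; the `Message` shape and the phrase list are assumptions for the sketch, not the real schema.

```typescript
// Minimal sketch of a moderator-side keyphrase scan for likely ChatGPT
// copy-pastes. The Message type and phrase list are illustrative only.
interface Message {
  id: string;
  text: string;
}

const SUSPECT_PHRASES = [
  "as a language model",
  "as an ai language model",
];

function findSuspectMessages(messages: Message[]): Message[] {
  return messages.filter((m) => {
    const text = m.text.toLowerCase();
    return SUSPECT_PHRASES.some((phrase) => text.includes(phrase));
  });
}
```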
This would be interesting but again we would need someone to do the work to implement this.
Feel free to submit a PR to the documentation with clarifications on this if you feel it would be useful.
-
Thank you for taking the time to review my proposals and write back, @olliestanley!
I think message search is a great feature that will help, but it won't fix the problem enough on its own. I meant that the user would be warned if a detection model thinks the message is copied from ChatGPT, because, as you mentioned, such detectors aren't really reliable. A small thing that could help even without such a model would be showing a warning whenever text is pasted into the textarea.
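A minimal sketch of that paste warning, assuming a plain listener on the reply textarea; the element ids here are hypothetical and not the real component names:

```typescript
// Illustrative sketch: surface a soft warning when text is pasted into the
// reply textarea. Element ids are hypothetical.
const textarea = document.querySelector<HTMLTextAreaElement>("#reply-textarea");
const warning = document.querySelector<HTMLElement>("#paste-warning");

textarea?.addEventListener("paste", () => {
  if (warning) {
    warning.textContent =
      "Pasted text detected: please make sure this is your own writing " +
      "and not copied from another assistant.";
    warning.hidden = false;
  }
});
```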
Yeah, it does cause problems with labelling. I think we can get around this by not allowing editing once a message has been labelled, or by allowing edits only while the labels are minimal and then clearing them. About your point on adding more tasks to the dashboard: I think editing messages is important enough that it really should be a task as well. It is crucial for quality; if we do not make it a task it would probably be ignored and end up barely used. About the disagreements and the issues that could come from them, I think moderators can resolve conflicts in those situations by deciding whether an edit should be accepted or not. I also think it would be nice to have a kind of versioning system to compare messages and merge different edits together; this is really not such a hard thing. Summarizing, I think this editing system would consist of the following features (a rough sketch follows the list):

- Editing is only allowed while a message has few or no labels, and existing labels are cleared when an edit is accepted.
- Reviewing proposed edits appears on the dashboard as a task like any other.
- Moderators resolve conflicts when several proposed edits disagree.
- A simple versioning system makes it possible to compare versions and merge different edits.
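To make the first and last of these rules concrete, here is a rough sketch with illustrative types and an assumed label threshold, rather than the actual data model:

```typescript
// Rough sketch of the proposed editing rules: allow edits only while a
// message has few labels, clear existing labels when an edit is applied,
// and keep prior versions for comparison. Types and thresholds are
// illustrative assumptions.
interface MessageVersion {
  text: string;
  editedAt: Date;
}

interface EditableMessage {
  id: string;
  text: string;
  labelCount: number;
  history: MessageVersion[];
}

const MAX_LABELS_BEFORE_LOCK = 2; // assumed threshold

function canEdit(message: EditableMessage): boolean {
  return message.labelCount <= MAX_LABELS_BEFORE_LOCK;
}

function applyEdit(message: EditableMessage, newText: string): EditableMessage {
  if (!canEdit(message)) {
    throw new Error("Message already has too many labels to be edited");
  }
  return {
    ...message,
    // Keep the previous text so edits can be compared and merged later.
    history: [...message.history, { text: message.text, editedAt: new Date() }],
    text: newText,
    // Existing labels no longer describe the new text, so reset them.
    labelCount: 0,
  };
}
```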
About the markdown quality validation model I mentioned: I think I can generate a dataset following the approach I described, and this is something I am going to look at as well. Later, someone will need to help me with actually defining the architecture and training the model, since I am not an expert. I will certainly work at least on the editing feature and start a PR for it soon, so that everything I mentioned can be discussed further.
-
@gabrielmfern Thanks for your input .. all valid and valuable points. But unless you are a developer yourself and take on this issue, I suggest reducing the requirements to at most 10% .. i.e. simplify as much as possible, identify 3 SIMPLE points to implement, and make all the others nice-to-have. Also remember: this is not a product, nor do we have a professional dev team. If you want something to be changed, e.g. in the documentation or text files, please consider submitting a PR yourself .. the chances that others will do it are otherwise extremely low (just look at 100 other issues here that contain 'good ideas').
-
Well I hope to shout!!
-
I have started working on a pull request that will add a basic editing system. Thank you for the answer and for the feedback.
-
That's awesome, thank you very much! If you like, you can also ping Ollie or me in the OA Discord for direct support in the contributor section. @Earlef, if you want to help fix any of the points that @gabrielmfern mentioned, please let us know.
-
The problems you mentioned were considered when writing the guidelines and thinking about how to ensure a high-quality dataset, though there isn't an easy solution. Models to analyze content would be nice, but they are too unreliable for full automation and have been judged too much work even as an assistant that only leaves soft warnings. We could use a linter to check markdown syntax, but linters can only detect whether markdown was used incorrectly, not whether it should have been used in the first place.

Editing was a hot topic as well, since it was clear that a lot of messages had poor spelling or formatting but great content, and vice versa. One thing that has to work is making sure the labels still fit after the edit. If a message is being edited instead of discarded, chances are it was already long or difficult to research in the first place, so prompting random users to put in the work to confirm it would waste a lot of time. My suggestion was to remember which users reviewed a message and resend it for review in an additional queue, where they are also asked whether the edit is an improvement; if enough people agree, the tree with the changes is added and the old one gets archived.

Your point about changing the topic is probably correct. I wanted to avoid a scenario in which the AI would change topics on its own, or get facts mixed up, but users do actually bring up completely different topics and expect the assistant to adjust.

Lastly, could you link your PR in this issue? Thanks.
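For illustration only, the re-review idea could look roughly like the sketch below; the types, field names, and approval threshold are assumptions rather than anything currently in the backend:

```typescript
// Sketch of the re-review idea: send an edited message back to the users
// who reviewed the original, and accept the edit once enough of them agree
// it is an improvement. All names and the threshold are illustrative.
interface EditReview {
  reviewerId: string;
  isImprovement: boolean;
}

interface PendingEdit {
  messageId: string;
  previousReviewers: string[]; // users who labelled the original message
  reviews: EditReview[];
}

const APPROVAL_THRESHOLD = 3; // assumed number of agreeing reviewers

// Previous reviewers who have not yet seen the edit go into the extra queue.
function nextReviewers(edit: PendingEdit): string[] {
  const alreadyReviewed = new Set(edit.reviews.map((r) => r.reviewerId));
  return edit.previousReviewers.filter((id) => !alreadyReviewed.has(id));
}

// Once enough reviewers agree, the edited tree replaces the archived one.
function isAccepted(edit: PendingEdit): boolean {
  const approvals = edit.reviews.filter((r) => r.isImprovement).length;
  return approvals >= APPROVAL_THRESHOLD;
}
```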
-
Created a PR with a basic UI implementation for the editing system in the front-end for now. #3289
-
Happy to help anyone here if needed, including related docs changes etc. I still need to read through all of this to catch up, but I just wanted to mention it.
-
On this one: I wonder if being able to add annotations to votes might help. Maybe a bit late for this, but I agree that it can be hard to exactly "put yourself in the shoes" of the person who thumbed up or down, and hard to understand why they did. Usually I just assume it will all get washed out on average and be fine, but being able to add (and for others to see) an annotation or comment attached to each thumbs up/down, if one exists, could be a novel and useful thing.
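A small sketch of what an annotated vote might look like as a data structure; the shape below is hypothetical, not the current schema:

```typescript
// Hypothetical shape for a vote that carries an optional explanation
// visible to other users.
interface AnnotatedVote {
  messageId: string;
  voterId: string;
  direction: "up" | "down";
  annotation?: string; // optional explanation shown alongside the vote
}

// Collect the annotations other users would be able to read for a message.
function visibleAnnotations(votes: AnnotatedVote[]): string[] {
  return votes
    .filter((v) => v.annotation !== undefined && v.annotation.trim() !== "")
    .map((v) => `[${v.direction}] ${v.annotation}`);
}
```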
-
I have been contributing to the dataset for some time now and I have noticed a few things that lead to the following problems:
I hereby propose the following solutions to the problems I have just brought to your attention: