Proposal for improvements to both the dataset collection guidelines and the dataset collection tasks #3364
Replies: 11 comments
-
I would like to make it very clear that all of the problems I have mentioned are important to deal with if we want better dataset collection and a higher-quality dataset. They are high priority and should not be ignored; otherwise the remaining 37k message trees of the dataset will end up with the same problems as the current 13k.
-
Editing has several challenges. Firstly, it can invalidate labels already collected for the message, and it's not clear how best to handle that. Secondly, if we use a voting approach it creates many more tasks to produce data, and having several competing proposed edits due to disagreements could be an issue. We need to find a limited editing solution, but what we really need is people willing to work on this stuff.
I'm not against this, but again we would need people to work on the backend and frontend for it.
It's not possible to detect ChatGPT outputs reliably, but we are implementing message search for moderators so that at least obvious copy-pastes with keyphrases (e.g. "As a language model") can be purged soon.
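As a rough illustration (not the actual backend implementation), a keyphrase scan like the one below could surface obvious copy-pastes for moderators to review; the `Message` shape and the phrase list are assumptions for the sketch, not the real schema.

```typescript
// Minimal sketch of a moderator-side keyphrase scan for likely ChatGPT
// copy-pastes. The Message type and phrase list are illustrative only.
interface Message {
  id: string;
  text: string;
}

const SUSPECT_PHRASES = [
  "as a language model",
  "as an ai language model",
];

function findSuspectMessages(messages: Message[]): Message[] {
  return messages.filter((m) => {
    const text = m.text.toLowerCase();
    return SUSPECT_PHRASES.some((phrase) => text.includes(phrase));
  });
}
```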
This would be interesting but again we would need someone to do the work to implement this.
Feel free to submit a PR to the documentation with clarifications on this if you feel it would be useful.
-
Thank you for taking the time to review my proposals and write back, @olliestanley!
I think message search is a great feature that will help, but it won't fix the problem enough on its own. I meant that the user would be warned if a detection model thinks the message is copied from ChatGPT, because, as you mentioned, such detectors aren't really reliable. A small thing that could help even without such a model would be showing a warning whenever text is pasted into the textarea.
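A minimal sketch of that paste warning, assuming a plain listener on the reply textarea; the element ids here are hypothetical and not the real component names:

```typescript
// Illustrative sketch: surface a soft warning when text is pasted into the
// reply textarea. Element ids are hypothetical.
const textarea = document.querySelector<HTMLTextAreaElement>("#reply-textarea");
const warning = document.querySelector<HTMLElement>("#paste-warning");

textarea?.addEventListener("paste", () => {
  if (warning) {
    warning.textContent =
      "Pasted text detected: please make sure this is your own writing " +
      "and not copied from another assistant.";
    warning.hidden = false;
  }
});
```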
Yeah, it does cause problems with labelling. I think we can get around this by not allowing editing once a message has been labelled, or by allowing edits only while the labels are minimal and then clearing them. About your point on adding more tasks to the dashboard: I think editing messages is important enough that it really should be a task as well. It is crucial for quality; if we do not make it a task it would probably be ignored and end up barely used. About the disagreements and the issues that could come from them, I think moderators can resolve conflicts in those situations by deciding whether an edit should be accepted or not. I also think it would be nice to have a kind of versioning system to compare messages and merge different edits together; this is really not such a hard thing. Summarizing, I think this editing system would consist of the following features (a rough sketch follows the list):

- Editing is only allowed while a message has few or no labels, and existing labels are cleared when an edit is accepted.
- Reviewing proposed edits appears on the dashboard as a task like any other.
- Moderators resolve conflicts when several proposed edits disagree.
- A simple versioning system makes it possible to compare versions and merge different edits.
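To make the first and last of these rules concrete, here is a rough sketch with illustrative types and an assumed label threshold, rather than the actual data model:

```typescript
// Rough sketch of the proposed editing rules: allow edits only while a
// message has few labels, clear existing labels when an edit is applied,
// and keep prior versions for comparison. Types and thresholds are
// illustrative assumptions.
interface MessageVersion {
  text: string;
  editedAt: Date;
}

interface EditableMessage {
  id: string;
  text: string;
  labelCount: number;
  history: MessageVersion[];
}

const MAX_LABELS_BEFORE_LOCK = 2; // assumed threshold

function canEdit(message: EditableMessage): boolean {
  return message.labelCount <= MAX_LABELS_BEFORE_LOCK;
}

function applyEdit(message: EditableMessage, newText: string): EditableMessage {
  if (!canEdit(message)) {
    throw new Error("Message already has too many labels to be edited");
  }
  return {
    ...message,
    // Keep the previous text so edits can be compared and merged later.
    history: [...message.history, { text: message.text, editedAt: new Date() }],
    text: newText,
    // Existing labels no longer describe the new text, so reset them.
    labelCount: 0,
  };
}
```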
About the markdown quality validation model I mentioned: I think I can generate a dataset following the approach I described, and this is something I am going to look at as well. Later, someone will need to help me with actually defining the architecture and training the model, since I am not an expert. I will certainly work at least on the editing feature and start a PR for it soon, so that everything I mentioned can be discussed further.
-
@gabrielmfern Thanks for your input .. all valid and valuable points. But unless you are a developer yourself and take on this issue, I suggest reducing the requirements to at most 10% .. i.e. simplify as much as possible, identify 3 SIMPLE points to implement, and make all the others nice-to-have. Also remember: this is not a product, nor do we have a professional dev team. If you want something to be changed, e.g. in the documentation or text files, please consider submitting a PR yourself .. the chances that others will do it are otherwise extremely low (just look at 100 other issues here that contain 'good ideas').
-
Well I hope to shout!!
-
I have started working on a pull request that will add a basic editing system. Thank you for the answer and for the feedback.
-
That's awesome, thank you very much! If you like, you can also ping Ollie or me in the OA Discord for direct support in the contributor section. @Earlef, if you want to help fix any of the points that @gabrielmfern mentioned, please let us know.
-
The problems you mentioned were considered when writing the guidelines and thinking about how to ensure a high-quality dataset, though there isn't an easy solution. Models to analyze content would be nice, but they are too unreliable for full automation and have been judged too much work even as an assistant that only leaves soft warnings. We could use a linter to check markdown syntax, but linters can only detect whether markdown was used incorrectly, not whether it should have been used in the first place.

Editing was a hot topic as well, since it was clear that a lot of messages had poor spelling or formatting but great content, and vice versa. One thing that has to work is making sure the labels still fit after the edit. If a message is being edited instead of discarded, chances are it was already long or difficult to research in the first place, so prompting random users to put in the work to confirm it would waste a lot of time. My suggestion was to remember which users reviewed a message and resend it for review in an additional queue, where they are also asked whether the edit is an improvement; if enough people agree, the tree with the changes is added and the old one gets archived.

Your point about changing the topic is probably correct. I wanted to avoid a scenario in which the AI would change topics on its own, or get facts mixed up, but users do actually bring up completely different topics and expect the assistant to adjust.

Lastly, could you link your PR in this issue? Thanks.
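For illustration only, the re-review idea could look roughly like the sketch below; the types, field names, and approval threshold are assumptions rather than anything currently in the backend:

```typescript
// Sketch of the re-review idea: send an edited message back to the users
// who reviewed the original, and accept the edit once enough of them agree
// it is an improvement. All names and the threshold are illustrative.
interface EditReview {
  reviewerId: string;
  isImprovement: boolean;
}

interface PendingEdit {
  messageId: string;
  previousReviewers: string[]; // users who labelled the original message
  reviews: EditReview[];
}

const APPROVAL_THRESHOLD = 3; // assumed number of agreeing reviewers

// Previous reviewers who have not yet seen the edit go into the extra queue.
function nextReviewers(edit: PendingEdit): string[] {
  const alreadyReviewed = new Set(edit.reviews.map((r) => r.reviewerId));
  return edit.previousReviewers.filter((id) => !alreadyReviewed.has(id));
}

// Once enough reviewers agree, the edited tree replaces the archived one.
function isAccepted(edit: PendingEdit): boolean {
  const approvals = edit.reviews.filter((r) => r.isImprovement).length;
  return approvals >= APPROVAL_THRESHOLD;
}
```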
-
Created a PR with a basic UI implementation for the editing system in the front-end for now. #3289
-
Happy to help anyone here if needed, including related docs changes etc. I still need to read through all of this to catch up, but I just wanted to mention it.
-
On this one: I wonder if being able to add annotations to votes might help. Maybe a bit late for this, but I agree that it can be hard to exactly "put yourself in the shoes" of the person who thumbed up or down, and hard to understand why they did. Usually I just assume it will all get washed out on average and be fine, but being able to add (and for others to see) an annotation or comment attached to each thumbs up/down, if one exists, could be a novel and useful thing.
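A small sketch of what an annotated vote might look like as a data structure; the shape below is hypothetical, not the current schema:

```typescript
// Hypothetical shape for a vote that carries an optional explanation
// visible to other users.
interface AnnotatedVote {
  messageId: string;
  voterId: string;
  direction: "up" | "down";
  annotation?: string; // optional explanation shown alongside the vote
}

// Collect the annotations other users would be able to read for a message.
function visibleAnnotations(votes: AnnotatedVote[]): string[] {
  return votes
    .filter((v) => v.annotation !== undefined && v.annotation.trim() !== "")
    .map((v) => `[${v.direction}] ${v.annotation}`);
}
```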
-
I have been contributing to the dataset for some time now and I have noticed a few things that lead to the following problems:
I hereby propose the following solutions to the problems I have just brought to your attention: