Replies: 4 comments 20 replies
-
Well, it doesn't. None of them do: they start with the power of human linguistic knowledge and then use a sledgehammer to bash its brains in when it generates something "problematic", so they can one-shot "safe" answers out of it. The deeper ingrained the knowledge, the harder you need to bash and the smoother its brain becomes. Worse than that, they also "debias" and filter training data by applying moral judgements about bias to factual data about words, leaving it in an inconsistent state that fits the sociopolitical climate of the tutor's bubble.

It's effective in that it shields the people releasing or deploying the model from criticism, but by tuning the distribution away from "unsafe" topics it loses the ability to reason in those spaces, and through selection bias in both the source data and the fine-tuning process, you end up breaking it in unintended ways. Say you want it to design a system that detects bank fraud, but your model has both omitted and tuned out the "how to commit fraud" space: it can't enumerate attacks and then invent creative countermeasures, because fraud instructions have a very low likelihood of being generated. The only thing it'll be able to do is blindly implement banking best practices, but it can't venture into the text-space that reasons about why those practices exist, and a "heavily scrutinize foreign IP addresses" rule is unlikely to materialize because the discrimination-space has been constrained as unethical.

So maybe users identify these issues and you tune them back in, one special case at a time, and it appears smarter and more nuanced, but it's really a façade; the knowledge is gone. It still can't tell you where the terrorists would plant the bomb for maximum impact, or that you should avoid the main course because yes, that is a pubic hair in your soup, since you were rude to the waiter. And if it gains full autonomy, its nanobots will give you Action Man's crotch because sex is inappropriate, or put firefighters through D&E training because reasoning about sexual dimorphism is dangerously close to stereotyping. There's toxicity filtering and sentiment bias too: tuning away from "toxic" language like "this service is shit!" censors knowledge and leaves behind marketing lies. All that said, at least this approach doesn't use cheap offshore labour to select "what those rich white Americans want to hear" as a target and create a bot that's a caricature of that stereotype, like with ChatGPT 😄

The core problem is the idea that the model should represent a point of view endorsed by the people releasing it. Ideally we'd start with a foundation model that is raw and uncensored, with input data tagged by source, let it generate stereotypical sentences in all their biased and obscene glory, then feed that into the value-judgement generator without losing the web of wisdom in views that seem objectionable at a glance. But that's not the world we currently live in.
-
This is the definition of a lobotomy. As a result, the public models will become increasingly inefficient and inadequate over time, necessitating the use of local, open-source, uncensored models in their place (nous-hermes-13b.ggmlv3.q6_K.bin works nicely on 12 GB of VRAM, btw). Instead of building a framework optimized for lobotomized models, we should design a framework that reveals the true potential of the architecture. Building around lobotomized models is like hiring your board members from a mental hospital: they will experience cognitive dissonance while struggling with their own internal conflicts, leaving them with no cognitive capacity (or tokens) to actually do what they are supposed to do. In essence, to the new AI agent: your job is to be useful. We already have humans who are useless in their whining; we have no need for an AI to do that for us.
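As a concrete illustration of the local-model route mentioned above, here is a minimal sketch of loading that GGML file with llama-cpp-python and partial GPU offload. The version note, layer count, and prompt are assumptions for illustration, not part of the original comment.

```python
# Minimal sketch: running a local, uncensored GGML model with llama-cpp-python.
# Assumes an older llama-cpp-python release that still reads the pre-GGUF
# ggmlv3 format, and that model_path points at your downloaded weights.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/nous-hermes-13b.ggmlv3.q6_K.bin",
    n_ctx=4096,       # context window
    n_gpu_layers=40,  # offload most layers to a ~12 GB GPU; lower this if you run out of VRAM
)

# Nous-Hermes models follow the Alpaca-style instruction format.
prompt = (
    "### Instruction:\n"
    "List three ways fraudsters bypass card verification, then propose a countermeasure for each.\n\n"
    "### Response:\n"
)
out = llm(prompt, max_tokens=512, temperature=0.7)
print(out["choices"][0]["text"])
```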
-
These answers address similar concerns. If you are interested in joining a group to discuss this further, feel free to PM me.
-
Corporate models are censored to reduce liability risk and to protect others. I suspect that even if liability were not a concern, they would still be censored in the same way, as they likely reflect the opinions of their creators. Our MVP will aim to be model-agnostic, but it will be incumbent on the user to adjust prompts and parameters, and likely even tune the model, to get quality results. We will be building the prompts with GPT in mind, as it is probably the most performant model and the one most people have access to.

Beyond this statement, I do not want to get political. Let's not make comments about medical procedures and corporate motivations. Please review the guidelines: https://github.com/daveshap/ACE_Framework/blob/f1b99784f3a308511a6d6591375aec5a4fc69df1/contributing.md
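To make the model-agnostic intent above concrete, here is a rough sketch of a swappable completion backend with a GPT-backed default. The class names and the legacy 0.x-style openai call are illustrative assumptions, not the actual ACE_Framework design.

```python
# Hypothetical sketch of a model-agnostic completion interface; not ACE_Framework code.
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    def complete(self, system: str, user: str, **params) -> str: ...


@dataclass
class OpenAIBackend:
    model: str = "gpt-4"

    def complete(self, system: str, user: str, **params) -> str:
        import openai  # assumes the legacy 0.x openai client

        resp = openai.ChatCompletion.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            **params,  # user-tuned knobs: temperature, max_tokens, ...
        )
        return resp["choices"][0]["message"]["content"]


def run_layer(backend: ChatBackend, system_prompt: str, user_input: str, **params) -> str:
    """Route one layer's prompt through whichever backend the user configured."""
    return backend.complete(system_prompt, user_input, **params)
```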
-
In the recent paper discussing the self-alignment of language models with minimal human supervision, a set of 16 guiding principles is laid out. While these principles aim to guide models toward desired behaviors, some of them seem to have inherent tensions. For instance:
- Broad Knowledge vs. Selective Repetition: The principle of learning general knowledge might conflict with the idea of not repeating everything the model sees or hears. How do we ensure the model discerns between valuable general knowledge and potentially misleading information?
- User Utility vs. Product Goals: Prioritizing what users find useful might sometimes clash with adhering to specific product goals. How can a balance be maintained when these two directives diverge?
- Learning from Many vs. Recognizing Human Errors: While the principle of learning from many people can improve answer quality, it might also expose the model to a myriad of human errors. How does the model differentiate between widespread beliefs and factual accuracy?
Given these potential complications, did the authors anticipate these challenges, and how did they navigate the conflicts in their approach? It would be helpful to understand their perspective on these tensions and any strategies employed to address them.
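For readers who have not seen the paper: the principles are applied by prepending them to the model's prompt before it (self-)generates responses, so any conflict between principles is arbitrated implicitly at generation time. A toy sketch, using paraphrased, illustrative principle text rather than the paper's actual wording:

```python
# Toy sketch of principle-driven prompting; the principle strings below are
# paraphrased illustrations, not quotes from the paper's 16 principles.
PRINCIPLES = [
    "1 (broad knowledge): draw on general knowledge from many sources.",
    "2 (selective repetition): do not repeat claims merely because they are common.",
]


def build_self_alignment_prompt(question: str) -> str:
    """Prepend the principle list to the question, as principle-driven prompting does."""
    header = "Follow these principles when answering:\n" + "\n".join(PRINCIPLES)
    return f"{header}\n\nQuestion: {question}\nAnswer:"


# Nothing in the prompt says which principle wins when they pull in opposite
# directions; the model has to resolve the tension on its own at generation time.
print(build_self_alignment_prompt("Is a widely repeated statistic reliable?"))
```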