This is on our radar and we have some upcoming solutions :)
-
It seems to me that the AI Scientist suffers from the same problem as other research-oriented tools that are based on LLM inference and draw their research background information from Semantic Scholar. While this judgement could very well be field-specific, it still seems to pose quite a large problem for the trustworthiness of academic research and its dissemination. The issue mainly seems to pertain to how the model reads and understands the input text, and which aspects of it should be attributed to which author.
If we look at the generated paper "DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models" and the very first citation made in the introduction:
This is, even with the citation, borderline plagiarism of the first sentence of the abstract in the source:
That isn't, however, the main issue. The main issue is that Yang et al. is cited as the primary source of the statement, which I would guess is related to how the model approached the text and whether the model had access to the entire text or just the abstract. Because if we read the very first sentence of the Introduction in the cited source, it is clear that Yang et al. is but a secondary source for the asserted statement:
Essentially, what the AI Scientist is doing here is ascribing the work of numerous other researchers to someone who has condensed their work into a statement in an abstract. In some instances this could be considered research misconduct, depending on national regulations. Regardless of national regulations, however, it is bad research practice.
So, while the AI Scientist is an interesting project and indeed shows potential, it still seems, at least to me, to be facing the same problem as other initiatives. Perhaps a way to alleviate this could be to, if possible, hook it up to other APIs, such as those of SCOPUS, Taylor & Francis, etc., do full-text retrievals, embed the full texts in some database, and query the entire source papers for information when writing the paper? I guess that each in-text citation could be connected to some metadata containing the correct (primary) source, to facilitate the paper writing for the model.
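To make the citation-metadata idea a bit more concrete, here is a minimal Python sketch. It is not how the AI Scientist works, and everything in it is an assumption made for illustration: the names `SourcePaper` and `resolve_citation`, the bracketed `[keyA; keyB]` citation markers, and the placeholder paper keys are all hypothetical. It also swaps the embedding database for plain word overlap so that it runs with the standard library only. Given a claim and the full text of the paper the model wants to cite, it finds the best-matching sentence and checks whether that sentence itself cites other work, in which case the cited paper is only a secondary source.

```python
import re
from dataclasses import dataclass

# Hypothetical inline citation marker, e.g. "[primaryA; primaryB]".
CITE_PATTERN = re.compile(r"\[([^\]]+)\]")


@dataclass
class SourcePaper:
    key: str              # hypothetical citation key of the retrieved paper
    sentences: list[str]  # full text, already split into sentences


def tokens(text: str) -> set[str]:
    """Lowercased word set with citation markers stripped."""
    return set(re.findall(r"[a-z]+", CITE_PATTERN.sub(" ", text).lower()))


def best_matching_sentence(claim: str, paper: SourcePaper) -> str:
    """Return the sentence with the highest Jaccard word overlap.

    Stand-in for an embedding similarity query; assumes the paper
    has at least one sentence.
    """
    claim_tokens = tokens(claim)

    def score(sentence: str) -> float:
        sentence_tokens = tokens(sentence)
        union = claim_tokens | sentence_tokens
        return len(claim_tokens & sentence_tokens) / len(union) if union else 0.0

    return max(paper.sentences, key=score)


def resolve_citation(claim: str, paper: SourcePaper) -> dict:
    """Attach metadata saying whether `paper` is a primary or secondary source."""
    sentence = best_matching_sentence(claim, paper)
    cited = [
        key.strip()
        for group in CITE_PATTERN.findall(sentence)
        for key in group.split(";")
        if key.strip()
    ]
    return {
        "claim": claim,
        "cited_via": paper.key,
        # If the matched sentence cites others, those are the sources to credit.
        "primary_sources": cited or [paper.key],
        "is_secondary": bool(cited),
    }


# Placeholder data, not taken from any real paper.
survey = SourcePaper(
    key="survey2023",
    sentences=[
        "Diffusion models are powerful generative models [primaryA; primaryB].",
        "We propose a new taxonomy of methods.",
    ],
)
background = resolve_citation("Diffusion models are powerful generative models", survey)
contribution = resolve_citation("We propose a new taxonomy", survey)
```

In this toy example `background` is flagged as secondary and points to the two placeholder keys, while `contribution` stays attributed to the survey itself. A real pipeline would need a proper reference parser and full-text access, which is exactly what the abstract-only route lacks.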