-
Thanks for summarizing all these points. :) I wanted to create a PoC for my proposal, so here is a brief overview of the idea.

First, I manually labelled three articles from Wikipedia for three categories: deaths, injuries, and number of residential buildings destroyed. I didn't copy from our existing annotation table, so some differences may exist. Here is a snapshot:

*(screenshot of the annotation sheets not included)*

Each category has its own table in a separate sheet. Essentially, each sheet is a list of every mention of an impact in the Wikipedia article (see the Excel notes for the verbatim source). The GID was determined either by simple lookup or by visiting the location's Wikipedia page to check which country, town, or district it belongs to. We could automate this process by using OpenStreetMap, modifying the existing functions that grab the GID, and having GADM in a RAG setup. If a town is not represented by a GID, the deepest GID level within which it falls is selected. For example, I live in Salabacke, a neighbourhood in Uppsala, Sweden, which has no GID of its own; I would choose the GID "SWE.16.7_1" to represent it, so not the Uppsala region but one level deeper: https://gadm.org/maps/SWE/uppsala/uppsala.html.

By treating the annotations as a single list of impact information, we can adequately measure how well the LLMs extract this information without worrying about levels.

After extracting the data, the challenge is to determine which numbers overlap so that the quantitative columns can be aggregated correctly. I wrote a quick script that takes the data from Excel and checks every row for intersections. It makes a few assumptions:

- (a) GIDs are treated like sets, so if impacts occurred on the same dates in a large area like a country (i.e. some predefined set of GIDs), then all impacts with deeper GID levels for that particular area are subsets of it; and
- (b) the impact numbers in the larger GID set must be at least as high as the aggregate of the deeper GIDs for the same location and the same dates.

Here is an example from the play data in the Excel sheet:

*(screenshot not included)*

The article mentions that 13 people died in total in four countries: Cuba, the Dominican Republic, the USA, and Martinique (row 3). The algorithm checks which of these figures overlap; here, it found that the entry hpnS (row 4) contains within it the mention of a single death in USA.10.15_1 (Duval County: F23j, row 0). The algorithm also determined that the one death reported in Duval County is probably the same death listed in row 2 for CUB (Cuba) and USA.10_1 (Florida). It also finds which impact rows are included in the 13 deaths reported in XwcP, ensures that their sum doesn't exceed 13, and lists the impact rows that are subsets of it in the "Subsets" column.

Here is an example of what happens when there is an inconsistency:

*(screenshot not included)*

In the image above, the article reports that USA.47_1 (Virginia) had 2-9 homes destroyed. But on the same date, we have a report of 110 homes destroyed in USA.47.120_1 (Suffolk, Virginia). To consolidate this conflict, a new row uJkB is created (row 43). Its "Inferred" flag is set to True, meaning that it's a row we added ourselves that didn't come directly from the article. The number 110 is thus contained within impacts wzn4 (row 37) and dL7a (row 44). So if we want to query the total number of residential buildings destroyed across the US, we can ignore any rows mentioned in the last column, "Subsets".
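To make the overlap logic concrete, here is a minimal sketch of the subset check, written from the description above rather than taken from the actual script. The `ImpactRow` fields, the prefix-based GID containment test, and the naive summation are my own simplifications; in particular, a real implementation would need to avoid double-counting subset rows that describe the same impact (like the Duval County death appearing in two rows).

```python
from dataclasses import dataclass, field


def gid_code(gid: str) -> str:
    """Strip the GADM version suffix, e.g. 'USA.10.15_1' -> 'USA.10.15'."""
    return gid.split("_")[0]


def gid_within(child: str, parent: str) -> bool:
    """True if `child` falls inside `parent` in the GADM hierarchy."""
    c, p = gid_code(child), gid_code(parent)
    return c == p or c.startswith(p + ".")


@dataclass
class ImpactRow:
    row_id: str                 # e.g. "XwcP"
    gids: set[str]              # e.g. {"CUB", "USA.10_1"}
    date: str                   # rows are only compared on matching dates
    count: int                  # the quantitative column, e.g. deaths
    inferred: bool = False      # True for rows we add ourselves
    subsets: list[str] = field(default_factory=list)


def is_subset(a: ImpactRow, b: ImpactRow) -> bool:
    """Assumption (a): row `a` is a subset of row `b` if they share a date
    and every GID in `a` lies within some GID in `b`."""
    if a.row_id == b.row_id or a.date != b.date:
        return False
    return all(any(gid_within(g, p) for p in b.gids) for g in a.gids)


def link_subsets(rows: list[ImpactRow]) -> None:
    """Fill each row's "Subsets" column and flag violations of
    assumption (b): a parent's count must cover its subsets' total."""
    for parent in rows:
        parent.subsets = [r.row_id for r in rows if is_subset(r, parent)]
        # Naive sum: the real script must also merge subset rows that
        # report the same impact (e.g. the single Duval County death).
        total = sum(r.count for r in rows if r.row_id in parent.subsets)
        if total > parent.count:
            print(f"Inconsistency at {parent.row_id}: "
                  f"subsets sum to {total} > {parent.count}")
```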
It's likely that this approach has problems, especially in the assumptions it makes about how to determine overlap across different locations. The querying function should be deterministic, always returning the same output for the same input.
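A sketch of what such a deterministic query could look like, reusing `ImpactRow` and `gid_within` from the sketch above (the function name and the single-GID signature are hypothetical):

```python
def total_impact(rows: list[ImpactRow], area_gid: str) -> int:
    """Deterministically aggregate the quantitative column over an area:
    rows listed in another kept row's "Subsets" column are skipped, so
    nothing is counted twice and repeated runs give the same total."""
    kept = [r for r in rows if any(gid_within(g, area_gid) for g in r.gids)]
    shadowed = {rid for r in kept for rid in r.subsets}
    return sum(r.count for r in kept if r.row_id not in shadowed)
```

For instance, `total_impact(rows, "USA")` would skip the Duval County and Florida rows because they appear in the 13-death row's "Subsets" column. Rows spanning several countries, like that one, show exactly where the overlap assumptions need more thought.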
-
I also want to say, in general, that it's inadvisable to try to make a decision on this by Friday, 6 December. Designing a database that contains a unique set of information is a substantial effort: there is a lot to think about, and it calls for solid data engineering and management practices. This part of designing the V2 structure needs time and requires dedicated resources for a pilot run of annotations and for modifying or improving the post-processing functions. If the design is clear, the annotations can go faster than last time. We need a schema, annotation instructions, pilot annotations to test on and think about, etc. I also don't think it's enough for a single person to annotate. If @camitrba is the only annotator, we are already making a big mistake.
-
An idea just came to mind about impact information extraction with the LLM: we could ask the model to extract the text containing the impact, then use our defined keywords to normalize the results and sort them into the related categories, instead of asking the model to give a number directly. In the annotation, we could also keep the keyword in the table. I don't know if this would work better for our database. Any ideas and comments on it are welcome, thanks!
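For example, the normalization step could look something like the sketch below; the category names and keyword lists here are just placeholders for our defined keywords:

```python
# Hypothetical keyword -> category table; the real lists would come
# from our defined keywords.
CATEGORY_KEYWORDS = {
    "deaths": ["died", "killed", "fatalities", "death toll"],
    "injuries": ["injured", "wounded", "hospitalized"],
    "buildings_destroyed": ["homes destroyed", "houses destroyed",
                            "buildings destroyed"],
}


def categorize(extracted_span: str) -> list[str]:
    """Map an LLM-extracted text span to impact categories by keyword
    matching, instead of asking the model for a number directly."""
    span = extracted_span.lower()
    return [category for category, keywords in CATEGORY_KEYWORDS.items()
            if any(kw in span for kw in keywords)]


# categorize("At least 13 people were killed across four countries")
# -> ["deaths"]
```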
-
If I understand your idea correctly, this would mean bypassing the language understanding capabilities of the LLM and instead using keyword matching techniques. The only advantage I can see with this approach is more transparency, since we can explain exactly why a particular piece of information has been extracted, but in terms of performance I think it would be vastly inferior to using the LLM for language understanding.
-
I agree. Keyword matching is an old technique with well-known limitations. It seems much better to explore more advanced LLM techniques if the current prompts do not give the desired results.
In reply to Shorouq's comment:

> I don't think you can really create guardrails for LLMs using keywords in prompting. So even though you may try to create a keyword constraint, it will not work as intended in this case. I think the keywords are good for giving the LLM examples of preferred output, but I don't think it works to use them to restrict the output. For the impact definitions and classification, you could also consider simpler classification models rather than an LLM.
>
> There are many LLM prompting techniques that we have not had the time to experiment with. Now may be a good time to explore those.
-
Hi all, we have a long email thread discussing location annotation for V2. I have summarized the points we have so far, so we can decide here.
Comments are welcome, and I hope we can reach an agreement soon. Thanks!
*(screenshot of the summary table not included)*