
Reevaluate the option to use Mariona's solution to place name disambiguation #26

alexhebing opened this issue May 27, 2019 · 3 comments

Comments

alexhebing commented May 27, 2019

There appears to be some room in the current project, in terms of time, to add a component that does something more interesting when collecting GIS coordinates for NER locations than simply consuming GeoNames or OpenStreetMap. I estimate this at roughly 50 to 60 hours.
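For reference, the simple baseline would be to query GeoNames directly for each recognised location. A minimal sketch (assuming the `requests` library and a registered GeoNames account; `demo` below is only a placeholder username):

```python
import requests

def geonames_lookup(place_name, username="demo"):
    """Return (lat, lon) for the top GeoNames match, or None if there is no hit."""
    response = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": place_name, "maxRows": 1, "username": username},
        timeout=10,
    )
    response.raise_for_status()
    matches = response.json().get("geonames", [])
    if not matches:
        return None
    # GeoNames returns coordinates as strings.
    return float(matches[0]["lat"]), float(matches[0]["lng"])

print(geonames_lookup("Utrecht"))
```

The point of this issue is to end up with something smarter than this single lookup.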

Assess whether it is feasible, within the available time, to create a script as part of the pipeline that implements (parts of) the solution proposed by Mariona Coll Ardanuy in this article.

More broadly, assess the available methods in the field and pick one that is doable in the available time.

@alexhebing added the question label on May 27, 2019
@jgonggrijp (Member) commented

"Feasible"? Do you perhaps mean to assess how much time it would take? Is there an a priori time limit for a go/no go decision?

@alexhebing (Author) commented

Thank you @jgonggrijp for your ever-critical eye. I have updated the description with more context to explain what I have in mind. A quick first look at the article (which I have seen before, of course, but I now understand the broader context much better) already shows that a complete implementation is out of the question, but perhaps parts of the solution are usable.

@alexhebing (Author) commented

Mariona's solution has two main components: 1) a knowledge base extracted from Wikipedia (and, for a small part, from GeoNames); 2) a series of (two) scripts that suggest geocoordinates for a given location plus 100 context words (50 to the left, 50 to the right). The scripts we received from Jaap/Mariona only cover component 2); the knowledge base is included as SQL dumps. After discussing this with Berit, I see two options that should be implementable in the time estimated above:

  1. Use the knowledge base as-is (four databases with Wikipedia data from 2014) and do a minimal clean-up/refactoring of the scripts from 2) (e.g. leave the database queries and the scoring algorithms as they are). Instead of having to run them one by one manually, make them do their work from a single call; this probably means keeping the candidates selected by the first script in memory instead of writing them to files. Then wrap the scripts in a web service, so that the placenamedisambiguation pipeline can call it (a rough sketch follows after this list).

  2. Create a new (possibly dynamic) knowledge base along the lines of how Mariona built hers, perhaps with a data structure that is more convenient for an application consuming the data. Base it either on a fresh Wikipedia dump or on calls to the Wikipedia API to retrieve relevant pages, and do 'something interesting' in terms of semantic comparison between the query term (with its context) and the Wikipedia content. That 'something interesting' should stay minimal; I don't think there will be enough time to reimplement the algorithms from 1). (See the second sketch below.)
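To make option 1 concrete, this is roughly the shape of web service I have in mind (assuming Flask, but any small framework would do). It is only a sketch: `select_candidates` and `score_candidates` are placeholder names standing in for the two existing scripts, whose real entry points will look different.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def select_candidates(toponym):
    """Placeholder for the first script: query the knowledge base for candidate locations."""
    return []

def score_candidates(candidates, context_words):
    """Placeholder for the second script: rank the candidates against the context words."""
    return candidates[0] if candidates else None

@app.route("/disambiguate", methods=["POST"])
def disambiguate():
    payload = request.get_json()
    toponym = payload["toponym"]
    context_words = payload.get("context", [])   # up to 100 words: 50 left, 50 right
    candidates = select_candidates(toponym)      # kept in memory, not written to files
    best = score_candidates(candidates, context_words)
    return jsonify({"toponym": toponym, "best_candidate": best})

if __name__ == "__main__":
    app.run(port=5000)
```

The pipeline would then POST a toponym plus its context words and get the best candidate back in the response.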
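For option 2, a bare-bones version of "retrieve relevant pages via the Wikipedia API and compare them with the query context" could look like the sketch below. The endpoint and parameters are the standard MediaWiki API ones; the overlap score is deliberately naive and only stands in for whatever semantic comparison we end up choosing.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def wikipedia_candidates(toponym, limit=5):
    """Search Wikipedia and return (title, intro extract) pairs for the top hits."""
    search = requests.get(API, params={
        "action": "query", "list": "search", "srsearch": toponym,
        "srlimit": limit, "format": "json",
    }, timeout=10).json()
    titles = [hit["title"] for hit in search["query"]["search"]]
    if not titles:
        return []
    pages = requests.get(API, params={
        "action": "query", "prop": "extracts", "exintro": 1, "explaintext": 1,
        "exlimit": limit, "titles": "|".join(titles), "format": "json",
    }, timeout=10).json()["query"]["pages"]
    return [(page["title"], page.get("extract", "")) for page in pages.values()]

def rank_by_overlap(context_words, candidates):
    """Rank candidates by vocabulary overlap with the context words."""
    context = {word.lower() for word in context_words}
    scored = [(len(context & set(extract.lower().split())), title)
              for title, extract in candidates]
    return sorted(scored, reverse=True)

print(rank_by_overlap(["treaty", "peace", "France", "Spain"],
                      wikipedia_candidates("Utrecht")))
```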

Closing this, continued in #30.
