Last Friday Roxane and Marieke visited the 22nd Annual Meeting of Computational Linguistics in the Netherlands (CLIN) in Tilburg to find out about the latest work. Here’s an account of the talks that we thought most relevant to Agora.
Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction ~ Alexander Panchenko
In this talk, the differences between similarity measures for semantic relations were investigated. There are several measures around, and each provides a different type of semantic information. Panchenko tried combining several measures to obtain a more rounded picture.
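To make the idea of combining measures concrete, here is a minimal sketch. It is not Panchenko's method: it simply rescales each measure to the range 0 to 1 and averages the results. The word pairs, the scores and the names `corpus_based` and `wordnet_based` are made up for illustration.

```python
def min_max(scores):
    """Rescale one measure's scores to [0, 1] so measures become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / (hi - lo) for pair, s in scores.items()}


def combine(measures):
    """Average the rescaled scores over the word pairs all measures cover."""
    normalised = [min_max(m) for m in measures]
    shared = set.intersection(*(set(m) for m in normalised))
    return {p: sum(m[p] for m in normalised) / len(normalised) for p in shared}


# Hypothetical scores from two measures that use different scales.
corpus_based = {("car", "vehicle"): 0.75, ("car", "banana"): 0.25, ("cat", "dog"): 0.5}
wordnet_based = {("car", "vehicle"): 9.0, ("car", "banana"): 1.0, ("cat", "dog"): 7.0}

combined = combine([corpus_based, wordnet_based])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Averaging is only one possible choice; ranks or weighted sums would fit the same skeleton.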
Joint Learning of dependency parsing and semantic role labelling ~ Antal van den Bosch, Roser Morante and Sander Canisius
The problem with a lot of NLP pipelines is that modules work sequentially, so errors from one module permeate through to the others. In this talk, two tasks that are generally performed sequentially, namely dependency parsing and semantic role labelling, were combined in order to prevent errors from cascading. The findings indicate that dependency parsing works better separately, whereas semantic role labelling actually benefits from the combination. I think this is an interesting way to think about problems, and perhaps it would also be interesting for the Agora setting, where we are starting to deal with roles.
Defining modality categories for NLP ~ Roser Morante
In this talk, ongoing work on an annotation scheme for modality was presented, based on older theories and recent models. For now this is not really on the Agora radar, but it will become more important as we drill down to finer-grained information about events, in order to find instances in texts of events that may have happened but didn’t, such as the assassination of Hitler, or of mythical events. Currently we cannot distinguish between things that people have only talked about and things that actually happened, but we should probably do something with this. Also, mythological ‘events’ did not actually take place and their actors are not real persons, but they are depicted in paintings, so we should not do away with them entirely. Instead, we should figure out how to detect and represent them properly.
Evaluating DAISY Summarisation with Approximated Summaries ~ Mandy Schiffrin, Fabrice Nauze and Begoña Villada
Summarisation is a difficult task to evaluate, as there is no single correct summary. Here an approach was presented that builds on a modified pyramid evaluation and ROUGE evaluation, in which automatically generated summaries were evaluated with queries. A similar situation arises in Agora, where event descriptions can have similar configurations: sometimes one can refer to an event by mentioning its most important actors, and sometimes a more fine-grained description is needed. We are currently working on figuring out what to represent where, and such evaluation measures may come in handy there.
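For readers unfamiliar with ROUGE, the sketch below shows plain ROUGE-N recall: the share of the reference summary's n-grams that also occur in the generated summary. It does not reproduce the modified pyramid or query-based evaluation from the talk, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams found in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the king signed the treaty"
candidate = "the king signed a treaty"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

Whitespace tokenisation is a simplification; real evaluations typically also apply stemming and use several reference summaries.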
Mining Cultural Heritage Metadata ~ Kalliopi Zervanou, Ioannis Korkontzelos, Antal van den Bosch and Sophia Ananiadou
From our sister project, HiTime, work was presented about extracting information from object descriptions in archive documents. This is something that one of our master’s students is currently working on, as opening up these treasures is something we hadn’t gotten round to yet. The work of Zervanou et al. focuses on an analysis of the languages in the archive they use, and they extract only named entities (person, location and organisation), subjects and other, whereas we are trying to extract some more types of information. Still, the lessons they learnt from their work will also come in handy, in particular the way they employed the domain experts’ knowledge to help evaluate their approach. See also Zervanou, K., Korkontzelos, I., Van den Bosch, A., and Ananiadou, S. (2011). Enrichment and structuring of archival description metadata. In K. Zervanou and P. Lendvai (Eds.), Proceedings of the Fifth ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH-2011), Portland, OR, pp. 44-53.
Final words: topic detection in suicide notes ~ Bart Desmet and Véronique Hoste
Desmet presented work on emotion (topic) detection and classification in a very special text corpus: suicide notes. A corpus of 900 suicide notes was annotated with 13 emotions such as anger, pity and peacefulness. For each emotion, an SVM was trained to detect the best features. Results show that half of the emotion types in the test corpus were difficult to detect, due to data sparseness. In the long run, emotion detection in documents is interesting for Agora, as it adds view and polarity pertaining to the events reported in a document.
Evaluating cross-domain Dutch semantic role annotation ~ Orphée De Clercq, Véronique Hoste and Paola Monachesi
In this talk, work on semantic role labeling (SRL) in the context of the SONAR corpus was presented. A manually verified subset was used to train a labeler that could perform SRL on other parts of the SONAR corpus. Training was performed on in-domain text and on combinations of text coming from different domains. Genre-specific training proves crucial for optimal performance.
Event prediction through Social Network Analysis ~ Matje van de Camp
Matje van de Camp, from the sister project HiTime, presented ongoing research on the creation of a time-stamped social network based on biographies of Dutch people related to the socialist movement. Ultimately, the social network will be employed to predict events based on sudden bursts of activity in the network. HiTime and Agora have already started a collaboration to share results and techniques.
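As a rough illustration of what a "sudden burst of activity" could mean in practice, the sketch below flags time periods whose activity count clearly exceeds that of the preceding periods. This is a generic thresholding heuristic, not the method from the talk; the counts, the window size and the factor `k` are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def find_bursts(counts, window=4, k=2.0):
    """Return indices whose count exceeds mean + k * std of the previous window."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Hypothetical number of interactions in the network per time period.
activity = [2, 3, 2, 3, 12, 3, 2]
print(find_bursts(activity))
```

A flagged period would then be a candidate for closer inspection, for instance to see which persons in the network were involved.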
Computational Linguistics in public safety and security ~ Wauter Bosma
Wauter Bosma discussed three NLP-related issues that play an important role in diverse projects at the NFI (Netherlands Forensic Institute). Specific areas of interest are 1) deviations in processes and events that may point to fraud, 2) network analysis (e.g. constructing criminal networks from various documents and reports) and 3) author identification and disambiguation.
DutchSemCor: from manual annotation to active learning ~
In this talk, DutchSemCor was presented, a semantically annotated corpus based on Cornetto, a lexico-semantic database. For the most frequent and polysemous part of the Dutch language, at least 25 example sentences are annotated manually for each sense. Sentences come from the SONAR corpus and from additional web queries using TextCorp. Next, different WSD systems are trained on these annotations: a KB-based WSD system and TiMBL. DutchSemCor will be available in mid-2012.
Unfortunately, the speaker presenting on detection and classification of event nominals could not make it, but we will contact her. We’ll keep you posted!