Trip report CLIN 2012, Tilburg

Last Friday Roxane and Marieke visited the 22nd Annual Meeting of Computational Linguistics in the Netherlands (CLIN) in Tilburg to find out about the latest work. Here’s an account of the talks that we thought most relevant to Agora.

Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction ~ Alexander Panchenko
In this talk, the differences in evaluation measures for semantic relations are investigated. There are several measures around and each provides a different type of semantic information. Panchenko tried to combine several measures to provide a more rounded evaluation.

Joint Learning of dependency parsing and semantic role labelling ~ Antal van den Bosch, Roser Morante and Sander Canisius
The problem with many NLP pipelines is that modules work sequentially, and errors from one module propagate to the others. In this talk two tasks that are generally performed sequentially, namely dependency parsing and semantic role labelling, are combined in order to prevent errors from cascading. Their findings indicate that dependency parsing works better separately, whereas semantic role labelling actually benefits from the combination. I think this is an interesting way to think about problems, and perhaps it would also be interesting for the Agora setting where we start dealing with roles.

Defining modality categories for NLP ~ Roser Morante
In this talk, ongoing work on an annotation scheme for modality was presented, based on older theories and recent models. For now this is not really on the Agora radar, but it will become more important as we drill down to the finer-grained information about events, to find instances in texts of events that may have happened but didn’t, such as the assassination of Hitler or mythical events. Currently we cannot distinguish between things that people have only talked about and things that actually happened, but we should probably do something with this. Also, mythological ‘events’ did not actually take place and their actors are not real persons, but they are depicted in paintings, so we should not do away with them entirely, but figure out how to detect and represent them properly.

Evaluating DAISY Summarisation with Approximated Summaries ~ Mandy Schiffrin, Fabrice Nauze and Begoña Villada
Summarisation is a difficult task to evaluate, as there is no single correct summary. Here an approach was presented based on a modified pyramid evaluation and ROUGE evaluation, in which automatically generated summaries were evaluated with queries. A similar situation arises in Agora, where event descriptions can have similar configurations: sometimes one can refer to an event by mentioning its most important actors, and sometimes a more fine-grained description is needed. We are currently working on figuring out what to represent where, and such evaluation measures may come in handy there.
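To make the ROUGE part a bit more concrete, here is a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also occur in the automatically generated summary. This is only the textbook measure, not the modified evaluation presented in the talk; the function names and the example sentences are our own, and whitespace tokenisation is a simplifying assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall of a candidate summary against one or more references.

    Assumption: lower-casing and whitespace splitting are enough as tokenisation.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for reference in references:
        ref_counts = ngrams(reference.lower().split(), n)
        total += sum(ref_counts.values())
        # Clip matches so a repeated n-gram is not counted more often
        # than it occurs in the candidate.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


# Made-up example sentences, for illustration only.
reference = "the queen opened the new museum"
candidate = "the queen opened the museum"
unigram_recall = rouge_n_recall(candidate, [reference], n=1)
bigram_recall = rouge_n_recall(candidate, [reference], n=2)
```

Because several references can be passed in, the same function covers the case where there are multiple acceptable summaries of one document.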

Mining Cultural Heritage Metadata ~ Kalliopi Zervanou, Ioannis Korkontzelos, Antal van den Bosch and Sophia Ananiadou
From our sister project, HiTime, work was presented about extracting information from object descriptions from archive documents. This is something that one of our master’s students is currently working on, as opening up these treasures is something we hadn’t gotten round to yet. The work of Zervanou et al. focuses on an analysis of the languages in the archive they use, and they only extract named entities (person, location and organisation) as well as subjects and ‘other’, whereas we are trying to extract more types of information. Still, the lessons they learnt from their work (see also Zervanou, K., Korkontzelos, I., Van den Bosch, A., and Ananiadou, S. (2011). Enrichment and structuring of archival description metadata. In K. Zervanou and P. Lendvai (Eds.), Proceedings of the Fifth ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH-2011), Portland, OR, pp. 44-53.) will come in handy, in particular the way they employed the domain experts’ knowledge to help evaluate their approach.

Final words: topic detection in suicide notes ~ Bart Desmet and Véronique Hoste
Desmet presented work on emotion (topic) detection and classification in a very special text corpus: suicide notes. A corpus of 900 suicide notes was annotated with 13 emotions such as anger, pity and peacefulness. For each emotion, an SVM was trained to detect the best features. Results show that half of the emotion types in the test corpus were difficult to detect, due to data sparseness. In the long run, emotion detection in documents is interesting for Agora, as it adds view and polarity pertaining to the events reported in a document.

Evaluating cross-domain Dutch semantic role annotation ~ Orphée De Clercq, Véronique Hoste and Paola Monachesi
In this talk, work on semantic role labeling (SRL) in the context of the SONAR corpus was presented. A manually verified subset was used to train a labeler that could perform SRL on other parts of the SONAR corpus. Training was performed for in-domain text and for combinations of text coming from different domains. Genre-specific training proves crucial for optimal performance.

Event prediction through Social Network Analysis ~ Matje van de Camp
Matje van de Camp, from the sister project HiTime, presented ongoing research on the creation of a time-stamped social network based on biographies of Dutch people related to the socialist movement. Ultimately, the social network will be employed to predict events based on sudden bursts of activity in the network. HiTime and Agora have already started a collaboration to share results and techniques.
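As a rough illustration of what "a sudden burst of activity" in a time-stamped network could mean, the sketch below counts interactions per year and flags years whose count is well above that of the preceding years. This is our own toy reading of the idea, not the method from the talk: the function name, the window size, the threshold and the data (with placeholder person labels) are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def find_bursts(interactions, window=3, threshold=2.0):
    """Return years with unusually many interactions.

    `interactions` is a list of (year, person_a, person_b) tuples.
    A year counts as a burst if its number of interactions exceeds the mean
    of the preceding `window` years by more than `threshold` standard
    deviations (with a floor of 1.0 on the deviation; an arbitrary choice
    to avoid flagging tiny changes in very quiet periods).
    """
    counts = Counter(year for year, _, _ in interactions)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    bursts = []
    for year in range(first + window, last + 1):
        history = [counts[y] for y in range(year - window, year)]
        limit = mean(history) + threshold * max(pstdev(history), 1.0)
        if counts[year] > limit:
            bursts.append(year)
    return bursts


# Hypothetical data: one interaction per year, then six in 1905.
interactions = [
    (1900, "A", "B"), (1901, "A", "C"), (1902, "B", "C"),
    (1903, "A", "B"), (1904, "C", "D"),
] + [(1905, "A", other) for other in "BCDEFG"]

burst_years = find_bursts(interactions, window=3)
```

In a real setting one would probably look at bursts per person or per subgroup rather than over the whole network, but the counting principle stays the same.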

Computational Linguistics in public safety and security ~ Wauter Bosma
Wauter Bosma discussed three NLP-related issues that play an important role in diverse projects at the NFI (Netherlands Forensic Institute). Specific areas of interest are 1) deviations in processes and events that may point to fraud, 2) network analysis (e.g. constructing criminal networks from various documents and reports) and 3) author identification and disambiguation.

DutchSemCor: from manual annotation to active learning ~
In this talk, DutchSemCor was presented, a semantically annotated corpus based on Cornetto, a lexico-semantic database. For the most frequent and polysemous part of the Dutch language, at least 25 example sentences are annotated manually for each sense. Sentences come from the SONAR corpus and from additional web queries using TextCorp. Next, different WSD systems are trained on these annotations: a KB-based WSD and TiMBL. DutchSemCor will be available in mid-2012.

Unfortunately the speaker presenting on detection and classification of event nominals could not make it, but we will contact her. We’ll keep you posted!

Posted in Events | Leave a comment

Agora at Mediamatic Ignite Amsterdam 13

On December 14, Marieke will present Agora at Mediamatic Ignite Amsterdam 13. Agora will be one of the 12 projects presented that evening. This will be an easy, low-tech way to get in touch with the Agora project, first during the speedy 5-minute talk, then over drinks.


Agora at DISH

Next week the biennial DISH conference will take place again in Rotterdam. The Agora team will host a session on Wednesday 7 December, 14:00 – 16:00, on linked open data and user participation for heritage institutions.

For our session, we have invited colleagues from the Amsterdam Museum, the Netherlands Institute for Sound and Vision and the Rijksmuseum to talk about their experiences with opening up their (digital) collections to the public whilst preserving the context, and about their collaborations with the public in making their collections better. The session will conclude with a panel.

The session programme:
Chair: Susan Legêne – VU University Amsterdam
14:00 – 14:10: Introduction by chair Susan Legêne (VU University Amsterdam)
14:10 – 14:30: Judith van Gent (Amsterdam Museum) and Victor de Boer (VU University Amsterdam) – Amsterdam Museum Linked Data
14:30 – 14:50: Lotte Belice Baltussen and Johan Oomen (Netherlands Institute for Sound and Vision) – Crowdsourcing
14:50 – 15:05: Geertje Jacobs (Rijksmuseum Amsterdam) – the Rijksmuseum API
15:05 – 15:20: Marieke van Erp and Lora Aroyo (VU University Amsterdam) – the Agora project
15:20 – 15:55: Panel, led by Susan Legêne (VU University Amsterdam) – will include Geertje Jacobs, Judith van Gent, Lora Aroyo and Johan Oomen.
15:55 – 16:00: Closing remarks

You can still register for DISH at http://www.dish2011.nl/tickets


DeRiVE Workshop Recap

Agora and Glocal not only exchange their research, but have also collaborated on the organisation of the first workshop about Detection, Representation, and Exploitation of Events (DeRiVE 2011) at ISWC 2011 in Bonn. The main goal of the workshop was to bring together researchers and developers from different disciplines who are interested in recognising, modelling and using events.

Each of the sessions had very interesting papers, followed by lively discussions (the discussions took a while to get going, as everyone is of course still figuring out the group etc., but after lunch everyone really got going). For Agora, it was nice to see that the cultural heritage domain was well represented, and the first paper of the day (An Event-Based Approach to Describing and Understanding Museum Narratives by Paul Mulholland, Annika Wolff, Trevor Collins and Zdenek Zdrahal) was also very closely related to our project, as they are modelling narratives as well, but from a different starting point. With our digital hermeneutics work, we have been starting from the event model and historical interpretation. They have been analysing museum exhibitions to figure out how curators go about creating narratives. As it happens, the curators’ narratives coincide quite nicely with our notion of conceptual narratives, as many museum exhibitions are centred around a particular topic. We will stay in touch with the DECIPHER project to see how we can exchange ideas and reuse each other’s models.

The detection session provided very interesting insights into current ongoing work on extracting events from different types of media (text, photo collections and videos). Of particular interest was the distinction between different types of events in Crowdsourcing Event Detection in YouTube Videos by Thomas Steiner, Ruben Verborgh and Michael Hausenblas, which uses users’ click behaviour, shot change information, title descriptions etc. to mark up videos with interest events, visual events and occurrence events. It may be interesting to reuse some of these ideas to further slice the videos we have in our datasets, so that we can present users with the scenes most relevant to their queries.

Both Glocal and Agora presented in the exploitation session, which was also the session that sparked most questions and discussion. It seems that there is really a demand for applications of event-driven systems, in particular ones that use good visualisations.

The exploitation theme continued with the DeRiVE challenge, organised by Willem Robert van Hage (VU) and Laura Hollink (TU Delft). Event-driven research has not yet reached the point where we could organise a benchmark challenge, but we thought it was nice to provide people with a data set to play with anyway, in case they did not have one themselves. The challenge assignment was also deliberately kept broad (“do something with the EventMedia dataset”, which was provided by co-organiser Raphaël Troncy) so that authors could surprise us with their creativity. The three papers presented in this session were very different: Pierre-Yves Vandenbussche and Charles Teissèdre (Events Retrieval Using Enhanced Semantic Web Knowledge) really focused on helping the user query the data better, Kristian Slabbekoorn, Laura Hollink and Geert-Jan Houben (Domain-aware Matching of Events to DBpedia) worked on creating high quality links between the EventMedia dataset and DBpedia, and Kia Teymourian, Malte Rohde, Ahmad Hassan-Haidar and Adrian Paschke (Fusion of Event Data Stream and Background Knowledge for Semantic-Enabled CEP) showed how you can recognise events in real time.

Afterwards the audience got to vote on the best challenge paper, resulting in Pierre-Yves Vandenbussche and Charles Teissèdre taking home the first DeRiVE Challenge Prize.

As organisers, we are quite happy that we received nice papers on a variety of topics, and that we managed to meet new people working on events. We hope that next year we can organise a follow-up workshop.

All the papers as well as the slides of the presentations are available through the DeRiVE website. There is also a TitanPad available with summaries of the discussions from the day at: http://titanpad.com/derive2011


Agora collaboration with Glocal

For a while we’ve been running into people from the Glocal project at meetings and symposia and we have been discussing possible ways to collaborate. A few weeks ago we finally shaped this collaboration in a visit from Glocal developer Sven Buschbeck to Agora.

Like Agora, the Glocal project assumes that events are a good way to organise and index media. They have so far been working on a data set about the FIFA 2010 World Cup and we on historical data, but the idea is the same. Where we differ is that Glocal has from the beginning worked on how to present events, whereas Agora has been working more on the back-end: extracting and representing events. A collaboration where we could learn from each other’s experiences on the aspects the other party has been working on most seems therefore most beneficial.

So far, we have mostly exchanged information and the programmers have tried to make the Agora data work with the Glocal event presentation, but in future we are planning to at least work on a common data set and see where else we can combine forces.

We’ll keep you posted.


IJCAI’11 Highlights

To stay up-to-date with the latest and greatest work in AI, one of the Agora team members visited IJCAI’11, the International Joint Conference on Artificial Intelligence. Here are some of the most interesting papers for current and near-future Agora work.

A good start to the conference was the paper Learning Bilingual Lexicons using the Visual Similarity of Labeled Web Images by Shane Bergsma and Benjamin Van Durme in the first NLP session. They use image analysis to help build bilingual dictionaries. They would, for example, identify several images depicting candles on the web and then use the associated tags in the different languages (e.g., Dutch and English) to identify word pairs across languages (e.g., kaars – candle). The fact that the work is so cross-disciplinary inspires us to think more outside the box.
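The core intuition can be sketched in a few lines: if every word is represented by a feature vector derived from the images tagged with it, then translation candidates can be ranked by how similar those vectors are. The sketch below is only that intuition, not the method from the paper; the three-dimensional vectors are made up, and averaging image features into one vector per word is our own simplifying assumption.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_translation(word_vector, candidates):
    """Pick the candidate word whose image vector is most similar."""
    return max(candidates, key=lambda word: cosine(word_vector, candidates[word]))


# Made-up image feature vectors (e.g. averaged over the images for each tag).
english = {"candle": [0.9, 0.1, 0.0]}
dutch = {
    "kaars": [0.8, 0.2, 0.1],   # candle
    "fiets": [0.0, 0.1, 0.9],   # bicycle
}

translation = best_translation(english["candle"], dutch)
```

With these toy vectors the English word candle is paired with kaars, because the images associated with both tags look alike.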

The invited talk by Daphne Koller on Wednesday was also very interesting. It was titled “Rich Probabilistic Models for Image Understanding”, and she explained the problems in automatic image analysis (sparse annotated data, very few images annotated on pixel level, annotations that completely ignore the background) and how she was finding ways around them (adapting her models to still use weak annotations). This is particularly interesting for the Agora event detection problem, as we also have limited event annotations, but we have yet to figure out how her work translates to an NLP setting.

Another interesting paper was Domain Adaptation with Ensemble of Feature Groups by Rajhans Samdani and Scott Wen-tau Yih. They presented a method that treats different groups of features differently in a domain adaptation task where a lot of training data is available for the initial task and much less for the task in the new domain. They presented results on email spam classification, where some features change rapidly (the text features) and others much less rapidly (the sender features). By putting different weights on the different feature groups they showed how performance on a new domain could be boosted considerably. In Agora, we are thinking of expanding our work to domains other than the history of Indonesia, so there are some interesting ideas to take from this paper.
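To get a feel for what weighting feature groups means, here is a small sketch in which every feature group has its own linear scorer and the final score is a weighted sum over groups. It is not the authors' algorithm, just an illustration of the idea; all feature names, weights and group weights are invented.

```python
def group_score(features, weights):
    """Linear score of one feature group: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def ensemble_score(example, group_models, group_weights):
    """Combine per-group scores, trusting each group as much as its weight says."""
    return sum(
        group_weights[group] * group_score(example[group], group_models[group])
        for group in group_models
    )


# Invented per-group models: a positive score means 'looks like spam'.
group_models = {
    "text": {"lottery": 2.0, "meeting": -1.0},
    "sender": {"unknown_sender": 1.5},
}

# An email in the new domain: harmless-looking text, unknown sender.
email = {
    "text": {"meeting": 1.0},
    "sender": {"unknown_sender": 1.0},
}

# In the source domain both groups are trusted equally; in the new domain
# the fast-changing text group is down-weighted (an assumed weighting).
source_score = ensemble_score(email, group_models, {"text": 1.0, "sender": 1.0})
target_score = ensemble_score(email, group_models, {"text": 0.2, "sender": 1.0})
```

Down-weighting the text group lets the more stable sender evidence dominate the decision in the new domain, which is the effect described above.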

One of the cool features of the IJCAI conference was that they also had a track with best papers from sister conferences. There were many interesting talks in that track, but for Agora the most relevant one was probably the talk by Dafna Shahaf, who presented her paper from KDD2010: Connecting the Dots between News Articles (coauthored by Carlos Guestrin). They presented an approach to find chains of news articles that tell a coherent story, as a way to help users navigate large numbers of news articles. This is very much related to the narratives we want to present to users to help them navigate the Rijksmuseum Amsterdam and Sound and Vision collections. Although we already have some extra hooks to go by, such as the historical proto-narratives that we defined for our WebSci paper, the work by Shahaf & Guestrin helps us think about how to construct other narratives, perhaps to highlight other dimensions of particular parts of the collection.

For the full program, papers, as well as videos of the presentations see the IJCAI website.


Agora at WebSci’11

The video lecture of Chiel and Marieke’s talk at WebSci is available online at videolectures.net


The questions that we got from the audience were mostly about our future work (how to incorporate different perspectives on (art) history and how we are going to build our social platform), so we reckon we’re on the right track with planning to do those things.

Unfortunately we did not win the best paper award, but it was already an honour to be nominated. The prize was shared between two papers, namely The Effect of User Features on Churn in Social Networks by Marcel Karnstedt, Matthew Rowe, Jeffrey Chan, Harith Alani and Conor Hayes and Sic Transit Gloria Mundi Virtuali? Promise and Peril at the Intersection of Computational Social Science and Online Clandestine Organizations by Brian Keegan, Muhammad Aurangzeb Ahmad, Dmitri Williams, Jaideep Srivastava and Noshir Contractor.

Another interesting session was the birds-of-a-feather session about events. Here we got together with other people from WebSci who are working with, or interested in working with, events to exchange experiences. Sadly the time was limited, so we didn’t get much further than introducing ourselves and swapping email addresses, but hopefully there’s more to come on that soon.


WebScience’11 Recap

From June 15 until 17 the ACM WebScience’11 conference took place in Koblenz, Germany. The topic of WebScience is naturally diverse, and the conference presented a large cross section of it. The humanities were a little underrepresented for Agora’s taste, but there were still enough transferable ideas in the papers dealing with other domains.

The conference kicked off with a fabulous keynote by Jaime Teevan of Microsoft Research. She presented research and results on how the way web pages change affects how we find new information. Some pages change rapidly, such as newspapers, and it is sometimes very difficult to find information again, whereas other pages are rather static and any update is quite apparent. At Microsoft they have developed a browser plugin, diff-IE, that highlights changes on websites since you last saw them (unfortunately only for Internet Explorer). It’s quite an interesting way to draw your attention to new content that would otherwise perhaps be overlooked. I think this could be particularly interesting for museum websites and possibly the Agora platform, as people are often driven by finding out ‘what’s new’, so this is definitely something to think about.

One paper that touched upon an issue central to Agora is “Survey on Governance of User-generated Content in Web Communities” by Felix Schwagereit, Ansgar Scherp and Steffen Staab, which was presented on Thursday afternoon. It addresses the issue of how one can maintain high quality in user-generated content. To museums, which have traditionally been gatekeepers of their content and its quality, quality assurance is a core requirement for anything they do or support online. This paper at WebScience described a review of web communities that are successful at creating and sharing user-generated content, which gives the Agora team useful guidelines for further developing the social modules of our platform.

Next to ensuring the quality of the content on the Agora platform, we also want to make sure that the platform presents different perspectives. Up until now, we have treated the events in the event thesaurus more or less as objects that do not reflect a particular perspective, but this is of course a simplification of the history domain. An aggressor, or winner in a battle, will recount an event differently, or even call the event by a different name, than a victim or ‘the loser’ in the battle. In most online communities quality control is exercised through some sort of democratic scheme, which usually makes the content a reflection of the majority opinion. At WebScience there was a paper that analysed exactly these issues and presented possible solutions, namely “Towards a diversity-minded Wikipedia” by Fabian Flöck, Denny Vrandečić and Elena Simperl. This is definitely some more stuff to think about for Agora.

During the poster session there were many interesting posters on various topics ranging from ethics to health to economics, in various domains such as music and the social web (in particular Twitter). My favourite poster was “The Syzygy Surfer: Creative Technology for the World Wide Web” by James Hendler and Andrew Hugill. They present an idea for, and first attempts at, facilitating more creative searching and browsing on the web than traditional search provides, based on ambiguity. Now for Agora browsing via ambiguous relations is not what we are looking for, but all types of ways to enable more creative browsing (or rather to enable the user to discover more serendipitous results) spark our interest.

Next to the regular sessions there was also a birds-of-a-feather session on events just before the conference dinner. Unfortunately the time was too short to really get into discussing problems and possible solutions, so we had to stick with introductions and exchanging information, but it was nice to exchange at least some thoughts on events, representation etc.

The Agora team went home with a bunch of inspiration for the project and will definitely try to be present at next year’s WebScience conference.


Kom Je Ook? – Slides online

On 16 June, Lora and Johan gave a presentation at the Kom Je Ook? – Crowdsourcing symposium organised by Mediamatic in Amsterdam. In their presentation, entitled “Crowdsourcing en Cultureel Erfgoed: Kansen & Uitdagingen” (Crowdsourcing and Cultural Heritage: Opportunities and Challenges), they explained how cultural heritage institutions deal with the new opportunities crowdsourcing offers, presented best-practice examples, and reflected on the challenges these new opportunities bring. You can find the slides on slideshare.


Digital Hermeneutics Presentation Slides Online

We just finished our presentation at WebSci11 and now you can find the slides online at Prezi. Thanks everyone in the room for listening and asking us interesting questions, and we hope that the slides and paper will inspire more discussions, so don’t hesitate to bug us!
