Question answering based on large and noisy knowledge sources

With the advent of massive online encyclopedic corpora such as Wikipedia, it has become possible to systematically analyze a wide range of documents covering a significant part of human knowledge. Semantic parsers and related techniques can extract this knowledge in the form of propositions (predicate–argument structures), from which large proposition databases can be built. Christensen et al. (2010) showed that using a semantic parser in information extraction can yield higher precision and recall in areas where shallow syntactic approaches have failed. This deeper analysis can also be applied to discover temporal and location-based propositions in documents. McCord et al. (2012) describe how the syntactic and semantic parsing components of the IBM Watson question answering system are applied to both the text content and the questions to find answers, and Fan et al. (2012) show how access to a large amount of knowledge was critical to its success.
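To make the notion of a proposition database concrete, the following is a minimal sketch, not the representation used by any of the cited systems. It assumes that a proposition is a predicate with a set of role-labeled arguments; the class names `Proposition` and `PropositionStore`, the PropBank-style role labels (`A0`, `A1`, `AM-TMP`), and the example entries are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A predicate with role-labeled arguments (hypothetical representation)."""
    predicate: str
    arguments: tuple  # (role, filler) pairs, e.g. (("A0", "IBM"), ("A1", "Watson"))


class PropositionStore:
    """Toy in-memory proposition database indexed by predicate."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def add(self, proposition):
        self._by_predicate[proposition.predicate].append(proposition)

    def query(self, predicate, **roles):
        """Return propositions with this predicate whose arguments match all given roles."""
        matches = []
        for prop in self._by_predicate.get(predicate, []):
            args = dict(prop.arguments)
            if all(args.get(role) == filler for role, filler in roles.items()):
                matches.append(prop)
        return matches


# Illustrative entries; the role labels are an assumption, not taken from the cited work.
store = PropositionStore()
store.add(Proposition("win", (("A0", "Watson"), ("A1", "Jeopardy!"), ("AM-TMP", "2011"))))
store.add(Proposition("develop", (("A0", "IBM"), ("A1", "Watson"))))

# "Who won Jeopardy!?" becomes a lookup on the predicate with one argument fixed.
winners = [dict(p.arguments)["A0"] for p in store.query("win", A1="Jeopardy!")]
```

In this sketch a question reduces to a lookup on a predicate with some arguments fixed and one left open. A realistic system would additionally need to handle paraphrase, ambiguity, and noise in the extracted propositions.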

Question answering systems are notable applications of semantic processing. They reached a milestone in 2011 when IBM Watson outperformed its human co-contestants in the Jeopardy! quiz show (Ferrucci et al., 2010; Ferrucci, 2012). Watson answers natural-language questions in any domain using knowledge extracted from Wikipedia and other textual sources, encyclopedias, dictionaries such as WordNet, and databases such as DBpedia and YAGO (Fan et al., 2012). One goal of the project, which will ensure its visibility, is to replicate the IBM Watson system for Swedish, with knowledge extracted from sources such as the newspaper corpora stored in Språkbanken and the classical Swedish literary works available in Litteraturbanken.