Reuters, the news service, is sponsoring Calais, an initiative to create metadata for material on the Web to help drive the Semantic Web. Here’s the press release from January.
Here’s a down and dirty quickie reminder about the Semantic Web if you’ve not thought about it for a while. Or read up at Wikipedia.
It’s the idea, put forward by Tim Berners-Lee, that ultimately we can make the Web smart enough to answer questions which are “beyond” search today. For example: Find all the barber shops that offer mohawks for under $25 in a radius of 20 miles from Boston and are open on Tuesday nights until 10 pm. You can do bits of that now, and much of the information needed to answer the question may already be on the Web. The challenge is that right now we can’t quite use it. The Semantic Web, with appropriate tagging, will make that data more findable and usable.
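To make that concrete, here’s a rough sketch of the barber shop question as a filter over tagged data. Everything in it is made up for illustration: the shop names, the field names and the numbers are all assumptions, and this isn’t how any particular Semantic Web tool works. It just shows why the question gets easy once price, distance and hours are data rather than prose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BarberShop:
    # All fields are hypothetical; real tagged data would look different.
    name: str
    miles_from_boston: float
    mohawk_price: Optional[float]      # None if the shop doesn't offer mohawks
    tuesday_close_hour: Optional[int]  # 24-hour clock; None if closed Tuesdays


# Invented sample records, one per way of failing the query (plus one match).
SHOPS = [
    BarberShop("Shear Luck", 3.0, 20.0, 22),
    BarberShop("Buzz Stop", 12.0, 30.0, 22),   # too expensive
    BarberShop("Late Fade", 45.0, 18.0, 23),   # too far away
    BarberShop("Clip Joint", 5.0, 22.0, 18),   # closes too early
    BarberShop("No Hawks", 2.0, None, 22),     # no mohawks
]


def find_shops(shops: List[BarberShop], max_price: float = 25.0,
               max_miles: float = 20.0, open_until: int = 22) -> List[str]:
    """Return names of shops that satisfy every part of the question."""
    return [
        s.name for s in shops
        if s.mohawk_price is not None and s.mohawk_price < max_price
        and s.miles_from_boston <= max_miles
        and s.tuesday_close_hour is not None
        and s.tuesday_close_hour >= open_until
    ]


if __name__ == "__main__":
    print(find_shops(SHOPS))
```

The hard part isn’t the filter. It’s getting the price, the distance and the Tuesday hours out of free text in the first place.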
OK, so this Calais tool, at this release (a few more are planned for the rest of the year), can create metadata (RDF). From the FAQ, here’s what Calais does:
From a user perspective it’s pretty simple: You hand the web service unstructured text (like news articles, blog postings, your term paper, etc) and it returns semantic metadata in RDF format. What’s happening in the background is a little more complicated.
Using natural language processing and machine learning techniques, the Calais web service looks inside your text and locates the entities (people, places, products, etc), facts (John Doe works for Acme Corp) and events (Jane Doe was appointed as a Board member of Acme Corp) in the text. Calais then processes the entities, facts and events extracted from the text and returns them to the caller in RDF format.
That metadata can then be used to answer questions. For now, there’s an API and some sample apps.
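Here’s a loose sketch of what “answering questions” from that kind of metadata could look like. RDF describes things as subject-predicate-object triples, so the FAQ’s John Doe and Jane Doe examples are written below as plain Python tuples. This is not Calais’s actual output format or API: the predicate names (type, worksFor, appointedBoardMemberOf) and the helper functions are hypothetical and only show the shape of the idea.

```python
# Hypothetical triples modeled on the FAQ's examples; not real Calais output.
TRIPLES = [
    ("John Doe", "type", "Person"),
    ("Jane Doe", "type", "Person"),
    ("Acme Corp", "type", "Company"),
    ("John Doe", "worksFor", "Acme Corp"),                # a "fact"
    ("Jane Doe", "appointedBoardMemberOf", "Acme Corp"),  # an "event"
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def people_connected_to(triples, company):
    """Names of people with any stated relationship to the company."""
    people = {s for s, _, _ in match(triples, predicate="type", obj="Person")}
    return sorted({s for s, _, _ in match(triples, obj=company) if s in people})


if __name__ == "__main__":
    print(people_connected_to(TRIPLES, "Acme Corp"))
```

A real application would more likely load the RDF into a proper triple store and query it there, but the pattern-matching idea is the same.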
So, why is this cool? Well, for one, it pulls out location using natural language processing (from a company called ClearForest). That could be useful for managing all that local search information, right?
For two, Reuters, a major publisher of news content, is behind it. Clearly, that company is thinking ahead about making its content available in new ways and, ideally, making money in new ways.