Semantics, Semantics, Everywhere, Nor Any Drop to Drink
March 20th, 2008 by Sean Gorman
It seems like there is a daily dose of semantic web on the tech blogs of late. Today it was Textwise’s Million Dollar Semantic Hacker Challenge, and a few days ago it was Yahoo opening their search platform to support a wide variety of semantic web standards. This has led to a good bit of proselytizing, mostly in the comments, that this heralds the arrival of the Semantic Web, or Web 3.0, or the Next Generation Web. All of which sounds like the circling of the marketing bandwagons.
Unfortunately, when the wagons circle, everything starts picking up the label, in this case “semantic”. This is especially dangerous when you have a word like “semantics” that can be defined in so many different ways. Just look at the definition tree created by Wikipedia:
*Semantics is the study of meaning in communication.
*In computer science semantics reflects the meaning of programs or functions.
*The Semantic Web refers to the extension of the World Wide Web through the embedding of additional semantic metadata
More often I see folks labeling things semantic that are really syntax. “Syntax” is the set of rules used to construct and define something like a sentence or a line of code, and “semantics” is the meaning of what those rules construct. Syntax is fairly easy and semantics are fairly hard, as most folks in artificial intelligence would argue. Some even go so far as to say that all programming languages other than LISP are syntax and not semantics.
This is a bit clearer with an example. Let’s take the Textwise announcement: a technology that parses plain text on a website or elsewhere and categorizes it into predefined topics. One example in the Techcrunch comments was the following:
input text:
Call us crazy, but we think there are some brilliant minds out there that can find some really amazing uses for this incredibly powerful and scalable technology. Think you’re up to the Challenge? We think you are!
categories (ranked from 0 (worst) to 100 (best)):
Shopping/Health/Alternative/Hypnotherapy/Audio_and_Video 43
Business/Telecommunications/Services/Wireless/Software 33
Arts/Music/Bands_and_Artists/311/Tablature 28
Computers/Internet/Consultants/Research 26
Shopping/Health/Alternative/Meditation/Audio_and_Video 25
The output is really not telling me anything about the meaning of the text; it is just applying rules to provide categorization. So I would definitely put this in the syntax and not the semantic category. I would also say what Yahoo! is doing is really more syntax than semantics, although there is the possibility of building truly semantic technologies on top of what they are enabling. They’ve created a set of rules based on rich standards to allow applications to be built. It remains to be seen what will come of it, but in the rush of market buzz I think it is easy to miss that building truly semantic technologies is quite hard. Some folks in AI (see the Chinese room argument) would argue machines are not even capable of semantic meaning or understanding.
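To make the syntax point concrete, here is a minimal sketch of a rule-based categorizer. It is not how Textwise works (I have no idea what is under their hood); the category names, keyword lists, and the `categorize` function are all made up for illustration. It just shows that you can produce ranked categories from plain text with nothing but word matching.

```python
import re

# Hypothetical keyword lists, invented for this sketch.
# This is NOT Textwise's taxonomy or scoring method.
CATEGORY_KEYWORDS = {
    "Computers/Internet/Research": {"technology", "scalable", "powerful"},
    "Shopping/Health/Alternative": {"minds", "think", "amazing"},
    "Arts/Music": {"band", "guitar", "song"},
}


def categorize(text):
    """Rank categories by the share of their keywords found in the text (0-100)."""
    # Lowercase the text and split it into a set of words.
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)  # pure string overlap, no understanding
        scores[category] = round(100 * hits / len(keywords))
    # Highest score first; ties keep the order of CATEGORY_KEYWORDS.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = (
        "We think there are some brilliant minds out there that can find "
        "some really amazing uses for this powerful and scalable technology."
    )
    for category, score in categorize(sample):
        print(category, score)
```

The scores say nothing about what the text means. Swap the keyword lists and the same sentence lands in entirely different categories, which is the sense in which this kind of output is syntax rather than semantics.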
From this perspective I think we’ll see a lot of people building applications based on syntax that reorganize and categorize content by giving the “page web” a bit of structure. Oddly, it’s like we’ve gone full circle back to DMOZ. While these technologies may be clever and useful, I do not think they will fundamentally change the Web. In the other category, I think we’ll see a few companies pushing towards something more sophisticated (call it a semantic, implicit, or computational web) where new data and services are mixed with existing web content to provide answers to users’ questions.