I think that…


Semantic Web Introduction

Filed under: Semantic Web — Jiří Procházka @ 12:25

This is a general introduction to and overview of the Semantic Web world, which I wrote for zdrojak.root.cz (in Czech). The audience is general web developers. Feel free to comment.

In the text I avoid the generic term “semantic web” and use “Semantic Web” to mean the use of RDF as the main model for representing information.

The idea of the Semantic Web is more than ten years old, yet it is still not widely known, although this has recently begun to change. So what is the Semantic Web, what is it composed of, and what could it be useful for?

In a world where information on the web is published using Semantic Web technologies, when you need to visit your dentist you don’t have to use various search engines to reach his website and find the opening hours. Instead, your intelligent calendar application finds out which dentist you are registered with, compares his opening hours with your timetable, arranges the transportation, and offers you suggestions. Of course this is a very idealistic vision, but a possible one.

The Semantic Web is a label for a group of technologies that allow information on the web to be expressed in such a way that it is comprehensible not only to humans but also to machines, or rather to the software running on them. It was initiated by Tim Berners-Lee and the W3C, who still lead development in this area. The Semantic Web does not compete with the current web; it is an addition that makes better use of the potential of networks such as the Internet. HTML describes documents and the relations between them, whereas the purpose of Semantic Web technologies is to describe anything (people, things, services, events, roles…) and the relations between those things. Such a network is called the Semantic Web, the Web of Data, the Giant Global Graph, or even Web 3.0.

The basic Semantic Web technologies are RDF, RDFS, OWL and SPARQL. RDF defines a model for expressing information as triples of the form “subject predicate object”, where the predicate defines the relation between the subject and the object. The identifiers, or “words”, forming these “sentences” are URIs. A set of triples forms an RDF graph. RDF alone is just a model, and it has many serializations for textual representation: the oldest is RDF/XML, then there is RDFa for embedding RDF inside HTML, the easily readable Turtle, the primitive N-Triples, and others.

To be of any use, our “words” need to be given some meaning. This is exactly what ontologies, also known as RDF vocabularies or RDF schemas, are for. Examples include FOAF, SIOC and GoodRelations. They are defined using RDFS and the more complex OWL, which are themselves ontologies too. SPARQL is a language with SQL-inspired syntax, meant for manipulating, and primarily querying, RDF databases, also known as triplestores.

Microformats are related to Semantic Web technologies, and their data can easily be extracted to RDF using GRDDL, but they are not as flexible as RDFa. Linked data is the name for a few principles that try to ensure some level of usefulness of the published information (creating a subset of the Semantic Web). The community project Linking Open Data publishes open databases as linked data, the best example being DBpedia, a dataset of information extracted from Wikipedia.
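To make the triple model and SPARQL-style querying a bit more concrete, here is a minimal sketch in plain Python. It is not a real RDF library: triples are just tuples, and the `match` helper and the `http://example.org/people/` namespace are made up for illustration (the FOAF namespace is the real one). A real application would use a proper triplestore and distinguish URIs from literal values.

```python
# Illustrative sketch only: a tiny in-memory "triplestore" built from tuples.
# EX is a hypothetical namespace; FOAF is the real FOAF vocabulary namespace.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"

# An RDF graph is a set of (subject, predicate, object) triples.
# For simplicity, literals such as "Alice" are plain strings here.
graph = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly what this SPARQL query asks for:
#   SELECT ?who WHERE { <http://example.org/people/alice> foaf:knows ?who }
known = [o for (_, _, o) in match(graph, s=EX + "alice", p=FOAF + "knows")]
print(known)
```

The wildcard pattern plays the role of SPARQL variables: anything left as `None` is “asked for”, and everything else must match exactly.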

In some industries, such as health care and life sciences or oil and gas, these technologies have gained wide acceptance because of the need to share information with clearly defined terms and to infer additional data automatically. The inherent decentralization of Semantic Web technologies is being used to tear down the walls of social networks like Facebook and Myspace: with RDF data using the FOAF ontology, any server can host FOAF profiles, creating one big, open, decentralized social network. Because not all information should be available to everyone, the WebAccessControl system was created. It uses the FOAF+SSL authentication protocol, which is faster and simpler than OpenID and in most cases even more user-friendly (no need to remember your username/ID or password; you just need your certificate).
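The “automatic inference of additional data” mentioned above can be sketched roughly as follows. This is a naive illustration, not how production reasoners work: it applies just two RDFS entailment rules (transitivity of `rdfs:subClassOf`, and type inheritance through it) until nothing new appears. The `rdf:type` and `rdfs:subClassOf` URIs are real; the `http://example.org/clinic#` ontology and the `drNovak` individual are hypothetical.

```python
# Naive RDFS inference sketch. The clinic ontology below is made up.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/clinic#"  # hypothetical namespace

triples = {
    (EX + "Dentist", RDFS_SUBCLASS, EX + "Physician"),
    (EX + "Physician", RDFS_SUBCLASS, EX + "Person"),
    (EX + "drNovak", RDF_TYPE, EX + "Dentist"),
}


def infer(triples):
    """Apply two RDFS rules repeatedly until no new triples are produced."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != RDFS_SUBCLASS:
                    continue
                if p1 == RDFS_SUBCLASS:
                    # subClassOf is transitive: A < B and B < D gives A < D
                    new.add((a, RDFS_SUBCLASS, d))
                elif p1 == RDF_TYPE:
                    # an instance of a subclass is an instance of the superclass
                    new.add((a, RDF_TYPE, d))
        if new <= inferred:
            return inferred
        inferred |= new


closure = infer(triples)
# The data never said drNovak is a Person, but it follows from the schema.
print((EX + "drNovak", RDF_TYPE, EX + "Person") in closure)
```

The point is that only one fact about the individual was published; the rest follows from the shared, clearly defined vocabulary.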

Yahoo and Google use RDF data to improve search results. KDE and GNOME have semantic desktop projects on Linux. Drupal 7 is going to publish its internal data in RDF.

The Semantic Web will never reach widespread awareness among the general public, and it shouldn’t, because it is a representation of internal information. RDF was created as an answer to the demand for a unified model for information exchange in a decentralized, heterogeneous network of systems. If you are developing an application, publish the information you want to share in RDF. Development of the Semantic Web has three branches:

  1. Technology development
  2. Publishing of information using the technologies
  3. Development of applications and automated agents working with the information

The first phase, when development centered on the first branch, is in the past. Now is the time to cultivate the second branch, because without it the third cannot blossom, and that is our goal: the simplification of our everyday life. The Semantic Web has great potential, but fame and wealth for taking advantage of it in an innovative way won’t come to you on their own; it is up to you to earn them.
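As a rough idea of what the second branch, publishing, can look like in practice, here is a small sketch that writes application data out as N-Triples, the simplest of the serializations mentioned earlier. The `Literal` class, the helper functions and the `http://example.org/` URIs are assumptions made for this example; the Dublin Core Terms namespace is real. The escaping covers only the most common cases, so treat it as a starting point rather than a complete serializer.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"  # real Dublin Core Terms namespace
EX = "http://example.org/"             # hypothetical site namespace


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI (illustrative helper)."""
    value: str


def format_term(term):
    """Render a URI as <...> and a Literal as a quoted, escaped string."""
    if isinstance(term, Literal):
        escaped = (
            term.value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} ."
        for s, p, o in triples
    )


post = EX + "post/1"
triples = [
    (post, DCTERMS + "title", Literal("Semantic Web Introduction")),
    (post, DCTERMS + "creator", EX + "people/jiri"),
]
print(to_ntriples(triples))
```

Serving such output at a stable URL next to the HTML version of a page is one simple way for an application to start sharing its data.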


  1. I try to make my websites as compatible with the semantic web as possible, but it requires a lot of work. So for small websites it is not worth the time.

    Comment by marketer — 2010/06/17 @ 15:41

  2. I share your point of view, but the semantic web really is a paradigm shift – it takes time to change our software development processes to align with it and, hopefully, to make them more sensible…

    Comment by Jiří Procházka — 2010/06/17 @ 16:14
