I think that…

2009/10/22

Semantic Web Introduction

Filed under: Semantic Web — Jiří Procházka @ 12:25

This is a general introduction to and overview of the Semantic Web world, which I wrote for [http://zdrojak.root.cz/clanky/uvod-do-semantickeho-webu/ zdrojak.root.cz] (in Czech). The audience is general web developers. Feel free to comment.

In the text I avoid the general term “semantic web” and use “Semantic Web” to mean the use of RDF as the main model for representing information.

The idea of the Semantic Web is more than ten years old. It is still not widely known, though this has recently begun to change. So what is the Semantic Web, what is it composed of, and what could it be useful for?

In a world where information on the web is published using Semantic Web technologies, when you need to visit your dentist you don’t have to use various search engines to get to his website and find the opening hours. Instead, your intelligent calendar application finds out which dentist you are registered with, compares his opening hours with your timetable, arranges the transportation, and presents you with suggestions. Of course this is a very idealistic vision, but it is possible.

[http://www.w3.org/2001/sw/ Semantic Web] is a label for a group of technologies that allow information on the web to be expressed in a way that is comprehensible not only to humans but also to machines, or rather to the software running on them. It was initiated by [http://en.wikipedia.org/wiki/Tim_Berners-Lee Tim Berners-Lee] and the [http://en.wikipedia.org/wiki/World_Wide_Web_Consortium W3C], who still lead development in this area. The Semantic Web is not a competitor to the current web but an addition to it, making better use of the potential that networks such as the Internet have. HTML describes documents and the relations between them; Semantic Web technologies, on the other hand, aim to describe anything (people, things, services, events, roles…) and the relations between them. The resulting network is called the Semantic Web, the Web of Data, the Giant Global Graph, or even Web 3.0.

The basic Semantic Web technologies are RDF, RDFS, OWL and SPARQL.
[http://www.w3.org/TR/rdf-primer/ RDF] ([http://www.w3.org/2007/02/turtle/primer/ Turtle version]) defines a model for expressing information as triples of the form “subject predicate object”, where the predicate defines the relation between the subject and the object. The identifiers, or “words”, that form these “sentences” are [http://en.wikipedia.org/wiki/Uniform_Resource_Identifier URI]s. A set of triples forms an RDF graph. RDF itself is just a model, which has many serializations for textual representation: the oldest, [http://www.w3.org/TR/rdf-syntax-grammar/ RDF/XML]; [http://www.w3.org/TR/xhtml-rdfa-primer/ RDFa] for embedding RDF in HTML; the easily readable [http://www.w3.org/TeamSubmission/turtle/ Turtle]; the primitive [http://www.w3.org/TR/rdf-testcases/#ntriples N-Triples]; and others…
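To make the triple model more concrete, here is a minimal Python sketch that holds a tiny RDF graph in memory and writes it out in N-Triples, the simplest of the serializations. It is only an illustration: the `Triple` type, the helper functions and the `http://example.org/people/` namespace are made up for this example, while `foaf:name` and `foaf:knows` come from the FOAF vocabulary. It also assumes that the object of a triple may be a plain string literal instead of a URI, and it handles only basic string escaping.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"  # hypothetical namespace, for illustration only


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relation
    obj: str        # URI, or a plain string when obj_is_literal is True
    obj_is_literal: bool = False


# An RDF graph is just a set of triples.
graph = {
    Triple(EX + "alice", FOAF + "name", "Alice", obj_is_literal=True),
    Triple(EX + "alice", FOAF + "knows", EX + "bob"),
}


def to_ntriples_line(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Simplified escaping: backslash, double quote and newline only.
        escaped = (
            t.obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


def to_ntriples(triples) -> str:
    """Serialize a whole graph, one triple per line, in a stable order."""
    return "\n".join(sorted(to_ntriples_line(t) for t in triples))


print(to_ntriples(graph))
```

Each printed line is one complete “sentence”, which is why N-Triples is easy to generate and parse but tedious to read compared to Turtle.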
To be of any use, our “words” need to be given some meaning. This is exactly what ontologies, also known as RDF vocabularies or RDF schemas, are for. Examples include [http://www.foaf-project.org/ FOAF] ([http://xmlns.com/foaf/spec/ spec.]), [http://sioc-project.org/ SIOC] ([http://rdfs.org/sioc/spec/ spec.]), [http://www.heppnetz.de/projects/goodrelations/ GoodRelations] ([http://www.heppnetz.de/projects/goodrelations/primer/ spec.])… These are defined using [http://www.w3.org/TR/rdf-schema/ RDFS] and the more complex [http://www.w3.org/TR/owl2-primer/ OWL], which are themselves ontologies too.
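As a rough illustration of how a vocabulary gives “words” meaning, the following Python sketch applies two RDFS rules to a small graph: subclass relations are transitive, and an instance of a class is also an instance of its superclasses. The `rdf:type` and `rdfs:subClassOf` URIs are the standard ones and FOAF does declare `foaf:Person` a subclass of `foaf:Agent`, but the `ex:Dentist` class, the `ex:drNovak` resource and the `infer_types` function are assumptions invented for this example. Real reasoners implement many more rules than these two.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

data = {
    (EX + "Dentist", SUBCLASS_OF, FOAF + "Person"),   # made-up class
    (FOAF + "Person", SUBCLASS_OF, FOAF + "Agent"),   # stated by FOAF
    (EX + "drNovak", RDF_TYPE, EX + "Dentist"),       # made-up resource
}


def infer_types(triples):
    """Return the triples plus everything that follows from the two
    subclass rules, repeating until nothing new can be derived."""
    inferred = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        for s, p, o in inferred:
            for sub, sup in subclass:
                if o != sub:
                    continue
                if p == RDF_TYPE:
                    # An instance of a class is an instance of its superclass.
                    new.add((s, RDF_TYPE, sup))
                elif p == SUBCLASS_OF:
                    # subClassOf is transitive.
                    new.add((s, SUBCLASS_OF, sup))
        if new <= inferred:
            return inferred
        inferred |= new


result = infer_types(data)
for triple in sorted(result - data):
    print(triple)
```

Nothing in the input says that `ex:drNovak` is a `foaf:Agent`; that fact is derived from the vocabulary, which is what lets software that only knows FOAF still make sense of the data.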
[http://www.w3.org/TR/rdf-sparql-query/ SPARQL] is a query language with an SQL-like syntax, meant for manipulating, and primarily querying, RDF databases, also known as [http://en.wikipedia.org/wiki/Triplestore triplestores].
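The core of a SPARQL query is a set of triple patterns in which some positions are variables. The Python sketch below shows the idea on an in-memory list of triples. It is a toy matcher, not a SPARQL engine: the `query` and `match_pattern` functions and the `http://example.org/people/` data are invented for this example, and any term starting with `?` is treated as a variable. The patterns correspond roughly to `SELECT ?friend ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }`.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"  # hypothetical namespace, for illustration only

triples = [
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "knows", EX + "carol"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "carol", FOAF + "name", "Carol"),
]


def match_pattern(pattern, triple, binding):
    """Try to match one pattern against one triple, given the variables
    bound so far. Return the extended binding, or None on a mismatch."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, data):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


friends_of_alice = query(
    [
        (EX + "alice", FOAF + "knows", "?friend"),
        ("?friend", FOAF + "name", "?name"),
    ],
    triples,
)
print(friends_of_alice)
```

Because `?friend` appears in both patterns, the second pattern only matches triples about the resource found by the first one, which is how such a query joins data without any table schema.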
[http://microformats.org/ Microformats] are related to Semantic Web technologies, and their data can easily be extracted to RDF using [http://www.w3.org/TR/grddl-primer/ GRDDL]; however, they are not as flexible as RDFa.
[http://linkeddata.org/ Linked data] is a name for a few principles that try to ensure a certain level of usefulness of the published information (creating a subset of the Semantic Web). The community project [http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/ Linking Open Data] publishes open databases as linked data, the best example being [http://dbpedia.org/About DBpedia], a dataset of information extracted from Wikipedia.

In some industries, such as health care and life sciences and the oil and gas industry, these technologies have gained wide acceptance because of the need to share information with clearly defined terms and to infer additional data automatically.
The inherent decentralization of Semantic Web technologies is being used to tear down the walls of social networks like Facebook, Myspace etc. by publishing RDF data using the [http://www.foaf-project.org/ FOAF] ontology, with which every server can act as a FOAF profile host, creating one big, open, decentralized social network. Because not all information should be available to everyone, the [http://esw.w3.org/topic/WebAccessControl WebAccessControl] system was created. It uses the [http://esw.w3.org/topic/foaf+ssl FOAF+SSL] authentication protocol, which is faster and simpler than [http://openid.net/ OpenID] and in most cases even more user friendly (no need to remember your username/ID or password, you just need your certificate).

[http://www.techcrunch.com/2008/03/13/yahoo-embraces-the-semantic-web-expect-the-web-to-organize-itself-in-a-hurry/ Yahoo] and [http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html Google] use RDF data to improve search results. There are semantic desktop projects on Linux by [http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/ KDE] and [http://live.gnome.org/SemanticDesktop Gnome]. [http://drupaleasy.com/blogs/ultimike/2009/06/rdf-drupal-future-rdf-drupal-7 Drupal 7 is going to publish its internal data in RDF].

The Semantic Web will never reach widespread awareness among the general public, and it shouldn’t, because it is a representation of internal information. RDF was created in answer to the demand for a unified model for information exchange in a decentralized, heterogeneous network of systems. If you are developing an application, publish the information you want to share in RDF. The development of the Semantic Web has 3 branches:

1. Technology development
2. Publishing of information using the technologies
3. Development of applications and automated agents working with the information

The first phase, when development centered on the first branch, is in the past. Now is the time to cultivate the second branch, because without it the third cannot come into blossom, and that is our goal: the simplification of our everyday life.
The Semantic Web has great potential, but fame and wealth for taking advantage of it in an innovative way won’t just come to you on their own; it is up to you to make it happen.
