The terminal man & the semantic web
With its feet firmly rooted in open standards and interoperability, the Semantic Web seeks to build a meaningful and heavily decentralised net which favours language and understanding over the bland concerns of control. Martin Howse investigates technologies which could radically rewrite the Web in truly emergent fashion
Proprietary interests have always favoured a heavily biased client-server model of networking, with key software components or content never even having to touch the paying customer's lowly and untrusted hardware. They argue that processing power and storage needs can readily be offloaded to centralised heavyweight servers, whilst at the same time pushing desktop product which puts former mainframes to shame, and happily caning Moore's law with the stick of runaway clock speeds and alien architectures.
This centralised model, highly reminiscent of the old days of green-screened terminals happily hanging off the shirt-tails of an overworked VAX, can be usefully applied in smaller scenarios, and semi-dumb terminals, as witnessed by the success of the LTSP (Linux Terminal Server Project), work wonders in schools, labs and less tech-infested environs. Yet in the wider context of the World Wide Web and emerging technologies such as P2P, it certainly begins to appear commercially transparent, and the focus on control is difficult to disguise in this already open environment.
The net is a vast entity which engages openly with meaning on a huge array of levels, and is thus resistant to any form of control, which could only really be implemented at a protocol level or, more aggressively, within client hardware itself. Proprietary interests can well supply services, but it's down to the consumer to subscribe to them. In the contemporary networked landscape it's becoming readily apparent that centralised and closed solutions simply cannot measure up to the task of dealing with a meaningful Web. Meaning is generated not by raw data, but within the context of necessarily decentralised groupings or communities of clients, or rather client agents, which can rewrite the codes; undressing, extracting and dressing up data which will barely resemble that pumped out by the holy server, and could readily have quite different meanings attached.
From a closed viewpoint, for those guarding data, it's a dog eat dog world. Yet from an open client's angle, it's all about emergence, community and perhaps subscription to a new model of desktop computing. And, in keeping with the resolutely low tech world of our famous text pistols from last issue, primarily code-based textual solutions are the order of the day. Text is opaque, text is transformation, text is code.
A question of semantics
Of course the Semantic Web that is implicated within this brave new networked world is far from resolving the protocol-embedded imbalance between client and server, but it does go a good distance in furthering the freedom of the Web by untying networks of meaning from otherwise ubiquitous search engines and the like; meanings which we ourselves, as creators rather than consumers, generate. And technologies such as RDF (Resource Description Framework), which opens things up hypertext-wise way beyond linking plain old URLs, also go hand in hand with enabling and highly social phenomena such as blogging, collaborative annotation, social bookmarking and the wiki. Access to and control over information, and more importantly meta-information, is the name of the game here, and it's easy to see, with search engines defined as creators of metadata, why no one proprietary entity can win out. The Internet ball is very much back in the court of open standards, with the W3C (World Wide Web Consortium) driving on the game.
In common with other such modern technologies, the Semantic Web is both described and implemented in hierarchical fashion with smaller components neatly slotting into a wider framework. XML (Extensible Markup Language), though far from a personal favourite in being mis-applied to all manner of otherwise unstructured matter, is nevertheless our base here, forming the foundation of the Semantic Web and harking well back to the Standard Generalised Markup Language (SGML) of the late 80s, of which it forms a subset.
Technologies such as RDF, and RSS, historically expanding to Rich Site Summary, RDF Site Summary and Really Simple Syndication, build on XML and form the cornerstones of the Semantic Web. Spawned by Netscape's attempts to snare the portal market back in the late 90s, the RSS family, which refers to multiple syndication formats, has borne witness to a strange history, with forked versions and brief flirtations with the more heavyweight RDF before emerging as the darling of the blogging community.
RSS is very much a popular format, in contrast to RDF as punted by the W3C, which is a blue-skies affair in respect of a good many of its ambitions, which could well be seen as pointing towards some kind of global AI. RSS is all about syndication, or re-distribution, of any content, from simple news or weblog text to audio by way of podcasting, under a subscription model. RSS-enabled sites offer up feeds which feed-reader software can poll at intervals for new content. The attraction is obvious, and alternative formats such as Atom, also based on XML, do exist.
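The polling half of that subscription model is easily sketched. The following toy feed reader, written in Python in keeping with SWAP later in this piece, parses a minimal RSS 2.0 document with the standard library and pulls out each item's title and link; the feed content is invented for illustration, not any real site's output.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, standing in for whatever a poll of a real
# site would have returned.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in read_items(FEED):
    print(title, "->", link)
```

A real reader would fetch the feed over HTTP at intervals and remember which items it had already seen; the parsing step stays this simple.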
Although popular and socially enabling, RSS pales in comparison with the ambitions of RDF which is further embedded by the W3C in the OWL Web Ontology Language, a seriously overarching affair as the name would imply, describing how the components of the Semantic Web are related.
Ontology is a suitably powerful term, with roots in metaphysics and philosophy, which is really all about describing concepts and relations, the essence of the Semantic Web. Beyond fancy formatting which could provide some clues as to meaning for a chunk of code or a software agent, HTML is more or less a flat affair, a field of equivalence meaning-wise. Of course we can read the meaning of, and relations between, elements of a site within a rich cultural context and ingrained habit, but machines have a tougher time, and it's machines which need to parse this mass of data which overwhelms the human.
The decentralised net must be self-organising and client-driven agents are the answer under the Semantic Web. RDF and brother OWL provide for a conceptual and descriptive framework which inserts meaning back into the equation for the poor old machine. As its name implies, RDF is all about resources. The revolutionary thing here is that such resources need not necessarily reside on the Web just as long as they can be referred to. As long as we can describe it, we can point to it. Moreover, these resources are identified and described within a graph of URIs (Uniform Resource Identifiers), of which the good old URL represents a seriously specialised form, and their properties.
The universality, decentralised nature and pure simplicity of URIs within a heterogeneous environment is what makes them attractive as relational carriers of meaning for RDF pundits. It's where interoperability enters the picture with all Web-based apps talking the same language. And it's a language which can be used to build new languages for talking about things both on and off-line. This is the essence of the Semantic Web. In such simplified terms assertion and quotation, or talking about assertions, naturally follow. That's what RDF is all about; representing knowledge in a universal yet specialised and extensible manner.
With URIs providing a way to refer to and link almost anything, the next step on the road to RDF is through combining URIs into statements which are officially penned in an XML representation of RDF. Such statements are unfettered from any centralised agency and can say anything about anything. Statements about resources are presented as graphs of inter and intra linkages at a defined number of levels; RDF properties can themselves sport linked URIs. Indeed it's all about linkage and meaning, with RDF/XML encoded graphs, or diagrams of relation, openly exchangeable and readily machine readable. With key reference to knowledge representation, RDF, as specified in highly readable primers from the W3C, is a rich seam to mine with both URI and the global and devastatingly open quality of the net as core topics.
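Stripped of its XML clothing, the RDF data model really is just a bag of (subject, predicate, object) statements, each term a URI (objects may also be literals). A minimal sketch, with all URIs invented for illustration:

```python
# Each RDF statement is a triple: subject, predicate, object.
# These URIs are contrived examples, not real published vocabularies.
statements = [
    ("http://example.org/page",
     "http://example.org/terms/creator",
     "http://example.org/staff/85740"),
    ("http://example.org/page",
     "http://example.org/terms/language",
     "en"),  # objects can also be plain literals
]

def objects_of(graph, subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects_of(statements, "http://example.org/page",
                 "http://example.org/terms/creator"))
```

Because every term is a URI, anyone anywhere can mint statements about the same resource, and graphs from different sources merge by simple set union; that is the decentralised linkage the article describes.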
RDF is all about creating shared vocabularies which can express machine-readable meanings, for further human and textual code processing in an open global feedback loop. Examples which compare the Semantic Web to database structures do little justice to the full unpacking of this concept. Indeed, some would argue that such parallels, and the reliance on XML as structured basic format, render the RDF/XML-implemented Semantic Web a doomed initiative. It's a supreme historical showdown between W3C director and World Wide Web initiator Tim Berners-Lee, and maverick hypertext inventor Ted Nelson, who virulently despises XML and, indeed, any form of embedded markup. He argues that embedded markup can do little to approach the true nature of hypertext in terms of transpublishing and open re-use. Embedded markup creates all manner of problems, and solves few, driving the World Wide Web down the cul de sac of trivial linkage.
His arguments, severely self-censored before online publication, are well worth considering in relation to the W3C's contemporary efforts, and indeed comparisons could readily be made between RDF, functioning as a parallel structural technology to the world of embedded markup, and Nelson's visionary use of a similar layered representational system in the Xanadu project. Nelson favours what he called intertwingularity over squeezing deep linkages to suit a hierarchy. Yet neither approach goes so far as to question the need for metadata itself, or indeed post facto knowledge representation in relation to the machine. And perhaps deeper questions of knowledge representation have been swamped by the technical needs of an implementation. Intriguing philosophical investigations here would doubtless uncover reference to the failed Cyc representation system which has recently, in OpenCyc form, been wedded to OWL.
RDF is only a minor part of the roadmap for the Semantic Web. In Web Architecture from 50,000 feet, Tim Berners-Lee further outlines the layers of the Semantic Web which well unravel sequentially along a timeline with missing pieces projected as future work. Next up from RDF in the stack we find schema languages on which we can build good old ontologies. Schemas define or describe vocabularies which specific interest groups within specialised problem domains can use to express those RDF statements.
Vocabularies are described as to their classes within the RDF Schema language. We're in the realm of the application specific and RDF Schema answers the question of what terms are needed to attack a domain. It's a circular affair, with RDF Schema itself making use of a specialised RDF vocabulary. Like our timelined onion skin, it's all layers, with meaning added at each skin and building on prior resources. As long as the graph or statement is valid it can well be interpreted by a machine at any level.
It's just that more meaning is attached the higher up we go through our stack. Adding more meaning means richer semantics for machines to make ever more meaningful connections and inferences from. This is where ontology languages such as DAML+OIL and OWL come into play, wrapping up our RDF Schema in a more complex relational form. But, according to Berners-Lee's map, it goes a whole lot further, with logical layers connecting all manner of RDF apps, and implying inference rules which allow for conversion between schemas. It's all about universal ways of talking and understanding, pushed on by code; an emergent and highly textual network of meaning for code and humans.
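What "richer semantics" buys the machine can be shown with a toy: given a handful of rdfs:subClassOf assertions, a reasoner can conclude class memberships nobody ever stated directly. The class and instance names below are invented for illustration.

```python
# Toy RDF Schema fragment: each class points at its superclass,
# mimicking rdfs:subClassOf. All names are contrived examples.
subclass_of = {
    "ex:Tenor": "ex:Singer",
    "ex:Singer": "ex:Person",
}

# One explicit rdf:type assertion.
types = {"ex:pavarotti": "ex:Tenor"}

def inferred_types(resource):
    """Walk the subclass chain to collect every class the resource
    belongs to, stated or inferred."""
    found = []
    cls = types.get(resource)
    while cls is not None:
        found.append(cls)
        cls = subclass_of.get(cls)
    return found

print(inferred_types("ex:pavarotti"))
```

Only "ex:pavarotti is an ex:Tenor" was ever asserted, yet the machine can now answer that he is also an ex:Singer and an ex:Person; each added schema layer widens the pool of such derivable facts.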
Not just for the machines
Contemporary examples of RDF in the field, and a brief glance at the syntax, remind us that the human is still very much the central spider in this Semantic Web. After all, other than being implicated in the delivery of encoded or binary data such as images or MP3s, the Semantic Web's languages are obviously totally textual. The executable is elsewhere. The W3C provides good documentation of both toy and real-world sample RDF vocabularies, though coverage of perhaps the first metadata initiative, Dublin Core, is certainly tough on any human.
Online material also well explains the syntax and format of RDF with reference to the Notation 3 (N3) syntax, a simplified teaching language. N3, with the three referring to a statement's triplet of subject, verb and object, is roughly equivalent to the standard XML syntax, but is easier for humans to read, write and understand. Machines can readily translate between the two formats. The W3C also supports and provides educational material for the excellent Semantic Web Application Platform (SWAP), coded in Python. Python does make a good fit with RDF, and the utilities and example code amply demonstrate how simple the Semantic Web is at its base and how easily we can toy with sample code and utilities. It's worth noting that SWAP equally well expands into "Semantic Web Area for Play".
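The subject-verb-object shape of N3 is plain enough that a few lines of Python can pick apart simple statements. This is a deliberately naive sketch: real N3 has @prefix declarations, string literals, semicolon and comma shorthand, and nesting that this toy parser ignores entirely.

```python
# Contrived N3-style statements: subject, verb, object, full stop.
SAMPLE = """
<#pat> <#knows> <#jo> .
<#pat> <#age> 24 .
"""

def parse_triples(text):
    """Pull (subject, verb, object) tuples out of whitespace-separated,
    full-stop-terminated statements. Ignores everything fancier."""
    triples = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[3] == ".":
            triples.append(tuple(parts[:3]))
    return triples

for s, v, o in parse_triples(SAMPLE):
    print(s, v, o)
```

Even this crude reading makes the point: every N3 statement decomposes into the same triplet a machine can store, merge and query.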
Yet perhaps the best introduction is to examine current examples of RDF vocabularies undergoing active use, development and, above all, documentation. It's a bootstrapping affair with incremental technologies aiding and abetting such efforts. The earlier RSS 1.0, with RSS referring to RDF Site Summary, is perhaps the best known manifestation of RDF, with parallel RSS efforts ditching the RDF in favour of simplicity of syndication. Plentiful tutorials exist for this simpler variant, and most blogging and wiki software automatically generates RSS feeds. Of more interest from a vocabulary point of view is the well documented and indeed rather ageing FOAF, or friend-of-a-friend, vocabulary which allows for auto-discovery and linkage of expressed social identities within spawning communities. Again, with an emphasis on human readability, online tutorials walk the novice coder through sample code, basic object oriented data structures, properties and supported values.
For example, the homepage property takes quite obvious values, and so on for name and phone number. The nested "knows" property is one key here, and identity is supremely established by way of unique email address rather than often-shared proper names. Further properties also liven up the playing field, with depiction, as in a photograph of the individual, being another attribute which can be shared. Two or more people can obviously appear in the same image, thus constituting a further quality of linkage. Collective endeavours can also be well represented, allowing for collaborative efforts and community building. And of course FOAF files can be further aggregated (see Golden del.icio.us) and transformed to good effect. Other vocabularies of note include MusicBrainz, for the creation of a community-led music metadata database, which well highlights the open qualities of the Semantic Web in contrast to initiatives such as the CDDB database, which locked in user-contributed metadata. FOAF plays well with the MusicBrainz vocab, alongside the vast WordNet RDF representation, which throws English nouns, verbs and further grammatical elements together into linked synonym sets representing underlying concepts. The RDF file for nouns alone weighs in at around 10 MB.
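Auto-discovery of those "knows" links is, at bottom, just namespace-aware XML walking. A minimal sketch, again with the standard library: the FOAF document below is a contrived fragment (the names and mailbox are invented), though the foaf: namespace and the Person, name, mbox and knows terms are the real vocabulary.

```python
import xml.etree.ElementTree as ET

FOAF_NS = "http://xmlns.com/foaf/0.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A made-up FOAF fragment: one person, identified by mailbox,
# claiming to know one other.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="{FOAF_NS}">
  <foaf:Person>
    <foaf:name>Alice Example</foaf:name>
    <foaf:mbox rdf:resource="mailto:alice@example.org"/>
    <foaf:knows>
      <foaf:Person><foaf:name>Bob Example</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

def known_people(doc):
    """Names of everyone the top-level person claims to know."""
    root = ET.fromstring(doc)
    person = root.find(f"{{{FOAF_NS}}}Person")
    return [p.findtext(f"{{{FOAF_NS}}}name")
            for p in person.iter(f"{{{FOAF_NS}}}Person")
            if p is not person]

print(known_people(DOC))
```

A FOAF crawler does little more than this, recursively: fetch a file, extract the people known, follow any seeAlso pointers to their files, and stitch the graphs together by matching mailboxes.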
And just when you thought we'd forgotten all about our hardcore Text Pistols, it's time to throw GNU Emacs back into the picture as an application which well exposes the interface of human and machine readability and meaning, extruding code in the form of easily parsable elisp files.
It's a seriously heady brew, though whether Gnus-reading and FOAF-using bloggers can achieve critical mass is another matter. What we're looking at here is a rich and deep extension and remodelling of the bad old desktop at a textual and terminal level. Hyperlinked meaning doesn't reside solely in the GUI browser.
If you're looking for further Emacs integration of everyday Semantic Web implementations, then it's all there. Indeed Emacs is the most widely used OWL editor, with a fully featured mode devoted to this Web ontology language, which describes the meaning of terms in vocabularies and the further relations of those terms. It's a language for languages, in a way. Again the W3C is the first port of call for those seeking further enlightenment. And if it's just plain old RSS subscription you're after, the newsticker.el code should do the job, or you could stick with Gnus as your usual client. When it comes down to editing wikis, and aside from emacs-wiki.el, which is a more self-contained affair purely concerned with page generation, wiki-remote.el should satisfy most popular engines.
Though GNU Emacs provides one route into the Semantic Web, and with a superb Python mode you can also play with fontified code under SWAP, it would be great if command-line tools could get a look-in on the action, to further round out a decent environment. Many use terminal-based browsers purely as a quick and dirty replacement for a GUI browser, say during an SSH session, but such command-line apps can readily be scripted and coaxed into engaging in Semantic Webbery. With power tools such as grep we can readily parse RDF or other metadata sources on a basic level. Lynx, which readily plays its own essential part in the history of the Web as an early manifestation of a distributed hypertext system, is the tool of choice here, with an array of useful command-line options such as post-data to read form data from stdin or a pipe.
Elinks, wget and w3m are also worthy of mention, and we can nudge Emacs into the limelight once again with emacs-w3m and emacs-wget building bridges for the latter two text mode browsers. Walking the command line, Snownews represents the Lynx of RSS aggregators or feed readers, well managing subscriptions and offering a text-based experience in common with that browser. And a vast array of Perl scripts are on offer within the Snownews creators' repositories, allowing for RSS feed generation from a number of sites which have yet to implement this feature. Such scripts also form a useful jumping-off point for fans of that language to dive into RSS, but what we're really after is a collision of old school Unix power tools with the impending Semantic Web.
And it doesn't take long to dig into SWAP again and pull out Cwm, well described as the sed and awk of the Semantic Web, or in simpler terms as a forward-chaining reasoner. Either way you pan it, Cwm, or Closed World Machine, performs all manner of querying, filtering and transforming operations on RDF/XML or the simpler N3 variant we encountered briefly. It's a Python-based tool with a powerful command-line syntax, with RDF as the language for specifying rules and operations. It's seriously hardcore, with an inference engine coming close to some of co-author Berners-Lee's specifications for logic within the Semantic Web, and a vast array of built-in functions which include the cryptographic. Cwm provides a free software solution which couples the power of a purely textual Unix approach with the depth and openness of an as yet unrealised Semantic Web. And its expanded name, Closed World Machine, poses deep questions as to deep linking and the representation of knowledge within this web. Whether proprietary interests can still manage to round up users by window-dressing their very own data is another open question.
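Forward chaining itself is a simple idea: fire rules against the store of triples, add whatever new facts emerge, and repeat until nothing changes. The following toy chainer illustrates the principle only; Cwm expresses its rules in N3, not Python tuples, and the "knows" vocabulary and facts here are invented.

```python
# Starting facts, as a set of (subject, predicate, object) triples.
facts = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
}

def knows_is_transitive(current):
    """Toy rule: if X knows Y and Y knows Z, infer X knows Z."""
    new = set()
    for x, p1, y in current:
        for y2, p2, z in current:
            if p1 == p2 == "knows" and y == y2 \
                    and (x, "knows", z) not in current:
                new.add((x, "knows", z))
    return new

# Forward chaining: keep firing the rule until a fixed point is
# reached and no further facts can be derived.
while True:
    derived = knows_is_transitive(facts)
    if not derived:
        break
    facts |= derived

print(sorted(facts))
```

One pass derives that alice knows carol; a second pass finds nothing new and the loop halts. Cwm's engine generalises this to arbitrary N3 rules, with unification over variables rather than a hard-coded pattern.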
Aside from formal specifications and perhaps one rather vast and obvious online industry, it's the clients who really drive on the net. Aggregation is the name of the game, driving users away from search engines with a flat and bulky model of the net into the arms of specialist groupings and community-led sub-nets. It's a wonderfully open model exemplified by the spawning Planet style sites such as Planet Lisp, Planet GNOME and even the recursively enabling Planet RDF aggregates, which by way of RSS feeds can collect and collate multiple thematically connected weblogs for further consumption.
There's also a wider feedback effect here, which means that, as well as making for economic and highly usable browsing, such sites also provide for a community front. Common grouped interests become more visible and open to review and comparison, both for the group and for welcome newcomers. Yet with such metadata-ridden technologies the temptation is always to get meta. Why not collect the collections? We'd simply get back to a centralised model if such collation were undertaken by just one impersonal agency, and it would certainly go against the grain to operate in such a way. Rather, such collections would have to be under the control of the people themselves, and del.icio.us does exactly that, throwing personal bookmarking and RSS into tight intimacy and thus passing on feeds of collected feeds and bookmarks.
Things can only get meta. Personal bookmarking is made public and the result is a giant feedback loop of tweaks, additions and removals. It functions both as vast knowledge map and as annotated archive. Tags categorise content and can further generate feeds which can be well specified with groupings of tags, exclusions and so on. Del.icio.us exemplifies the emergent qualities of the open net, even before we really get started with talk of a Semantic Web.
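The tag algebra behind such well-specified feeds, groupings plus exclusions, amounts to simple set operations over a bookmark store. A sketch, with entirely invented bookmark data:

```python
# Made-up bookmarks, each tagged with a set of labels.
bookmarks = [
    {"url": "http://example.org/rdf-intro", "tags": {"rdf", "semweb"}},
    {"url": "http://example.org/rss-howto", "tags": {"rss", "blog"}},
    {"url": "http://example.org/owl-notes", "tags": {"owl", "semweb"}},
]

def select(marks, include, exclude=frozenset()):
    """Bookmarks carrying all the 'include' tags and none of the
    'exclude' tags, as set-subset and set-intersection tests."""
    return [b["url"] for b in marks
            if include <= b["tags"] and not (exclude & b["tags"])]

print(select(bookmarks, {"semweb"}, exclude={"owl"}))
```

Publish the result of such a query as an RSS feed and you have a collected feed of collections: the giant feedback loop in miniature.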
See also: The Text Pistols
RDF Primer: http://www.w3.org/TR/rdf-primer
Ted Nelson on XML: http://www.xml.com/pub/a/w3j/s3.nelson.html
Web Architecture from 50,000 feet: http://www.w3.org/DesignIssues/Architecture.html
RDF Schema: http://www.w3.org/TR/rdf-schema
BBDB and FOAF: http://www.emacswiki.org/cgi-bin/wiki/
Planner mode: http://www.emacswiki.org/cgi-bin/wiki/