Turtles all the way down
For some time I’ve had this bon mot rattling around in my head, like an earworm, infinitely repeating as if it knew its own meaning. When this happens it usually means either that my brain has made a decision and it’s waiting for the rest of me to realise, or that it’s a request for more brain resources on a particularly tricky problem. I like to think that it’s the brain switching gears into automatic, popping on some hold music and working its way to the end of a problem.
It turns out that this time it was a combination of the two. I had both solved a problem and found another that I didn’t know the answer to. Last week I wrote that the web is just a representation of an arbitrary reality. It’s a representation of a representation. Representations all the way down. I wrote this while acknowledging my current challenges – how to get context into semantic data to distinguish between the same thing in different contexts. I’ve also been thinking about how to design some URLs for things that can live in various places on a big site.
SCENARIO: Imagine you’ve got a site with a page about Paris. You might want to put this Paris-thing in the context of France, you might want to refer to it historically, or from the point of view of Paris as a tourist attraction. In other places on the site you might want to point to this Paris-thing, but specifically Paris-in-1911.
For some time, the web was just documents. These hyperlinked texts form a graph of connected pages. The old web was like collecting lots of bits of paper into folders (and sometimes placing these folders into other folders). IA used to primarily be about organising folders. Matching your internal folder names with what your users would naturally call them. It was about understanding and constructing mental models through information architecture.
Now we have new forms of the web. We have the semantic web, which isn’t so much about documents as it is about things – with identifiers representing the things. Even when we’re not talking about the semantic web, dynamic publishing means that the content that used to fill our ring binders is a bit more magic. Now, those “pages” we grew accustomed to are often dynamically generated, and can belong in multiple ring binders at the same time.
“The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be.”(Hayes 2004)
This post is about URLs and ontologies – both of which are technologies based on the physics of statements. URLs make statements about resources by giving them a home. In the old world of the net, URLs revealed your IA. They showed your folder structure, the names you’d given your ring binders. The problem now, and a problem highlighted through ontology design, is that we’re no longer forcing content to fit into mono-hierarchical structures.
I think that the web, as experienced by users, has always been a graph. Sites may have had structured hierarchies, but as a user I can decide on the direction of travel. I can go down and up and sideways, as well as entirely off your site – there isn’t always an inherent direction of travel. On the other hand, journeys have direction. And certain real world things point in specific directions. This is especially true in the area that most interests me, learning. It makes sense to go from broad information to specialised, as you become more familiar with a subject. But how do we describe directions on a web that sometimes appears directionless? And how do we give a home to something that lives in multiple places?
In the world of semantics we recognise that even though a word can be used in any number of contexts, its precise meaning is never entirely the same. Human beings are fine with this. We adapt, because meaning isn’t solely the preserve of words. Meaning is naturally contextualised by the moment in which it is being conveyed. In the world of the old web we relied heavily on those ring binders. They helped to provide the context. But the ring binders are broken. Because content can live in multiple places, the ‘location’ of a resource can’t be relied on to solve our problems. We’re living in a world where URIs, not URLs, rule. We now need to describe our resources with a little more detail than simply where we left them.
Who are you calling stupid?
Content on the web is stupid – or rather, it might contain something clever, but because it’s expressed in natural language it’s not smart. People can learn from it. And people can often even work out where it belongs (online and in their own mental models). But sometimes finding a home for content is very difficult. And then there are our other users, computers. Often, without lots of effort, it’s just too difficult to work out the meaning of content on the web.
So we have a few problems – some that I’d realised we had, and some that the rattling earworm was leading me towards:
- The modern internet isn’t just a home for lots of hyperlinked documents. The internet is potentially everything for everyone – an interlinked web of sharable things.
- On an internet of things, stuff on the web doesn’t fit neatly into nicely labelled folders. The internet pre-supposes sharing, so if someone can take my thing and put it in their ring binder, the issue of where it lives is a bit more difficult.
- Locations are therefore much less important than identifiers. We need to be sure what the thing is (before we need to know where it should live).
- When you describe what a thing is (on the web) you make a statement about it. And when you make a statement you need to be confident that the statement is accurate. A big part of accuracy is choosing the right identifier.
URLs and ontologies (or rather statements made using the rules described in ontologies) are statements. I know saying statements are statements is a tautology (but I hoped the now overused brackets would confuse you into missing it). We make statements under the assumption that they’re true and that we know what we’re talking about. But as we’ve seen, it’s sometimes more complicated than that.
It turns out that the standard building blocks of both URLs and ontologies come with certain limitations. URLs should be RESTful so we can’t pass states and solve the problems of multiple homes that way. Ontologies force us to make statements in triples, ensuring that context is limited to various forms of reification (either through actual reification or creating additional classes and properties to handle the context). It turns out that the earworm turtle quote was hinting that this was all the same problem, and so perhaps has the same solution. It all has to do with meaning, context and identity.
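To make the reification point concrete, here’s a minimal sketch in Python (the URIs, property names and statement identifier are all invented for illustration, and the triples are modelled as plain tuples rather than with an RDF library). A bare triple can only assert one fact; to record the context in which that fact holds, standard RDF reification turns the statement itself into a resource you can then describe:

```python
# A plain RDF triple: (subject, predicate, object).
# All URIs below are hypothetical, for illustration only.
triple = ("http://example.org/thing/paris",
          "http://example.org/prop/capitalOf",
          "http://example.org/thing/france")

# Standard reification: the statement becomes a resource
# ("stmt") that we can make further statements about --
# here, pinning it to a temporal context.
stmt = "http://example.org/statement/1"
reified = [
    (stmt, "rdf:type", "rdf:Statement"),
    (stmt, "rdf:subject", triple[0]),
    (stmt, "rdf:predicate", triple[1]),
    (stmt, "rdf:object", triple[2]),
    # The context the original triple couldn't carry:
    (stmt, "http://example.org/prop/validIn", "1911"),
]

print(len(reified))  # five triples to contextualise one
```

The cost is visible immediately: one contextualised fact balloons into five triples, which is exactly the awkwardness the post is gesturing at.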
Ever since search engines started employing robots and spiders, we’ve had to think about users that weren’t exclusively fleshy and bi-pedal. In other words we need great content that works for computers and for our human users. The most practical way of doing this is to create fantastic content that delights our users, and to hard-wire into this some of the IA that will benefit users and make the content meaningful to computers.
Helpfully, Sheth (2005) provided some distinctions and definitions for us to use when thinking about the various levels of meaning on the semantic web:
- Implicit semantics reside within the minds of humans as a collective consensus.
- Formal semantics are in a machine-readable format in the form of an ontology and are primarily used for human consumption and for machine hardwiring.
- Soft semantics are extracted from probabilistic and fuzzy reasoning mechanisms supporting degree of membership and certainty.
I think my current fixation on the problem of where things live and how to contain context in RDF statements is because each of these types of semantics is embodied on the web, and when they combine as requirements they create a challenge.
I’ve sort of been arguing that implicit semantics on the web used to be primarily communicated through URLs. We settled on a dominant paradigm on our site and carried it through to its conclusion (whether logical or otherwise). This led to arbitrary classification of content. In good cases the classification followed good IA practice, so things worked nicely. But with a more complicated, webby (graphy) web, context cannot be conveyed through location. And without the defined rules of formal semantics, not only is fuzzy and probabilistic reasoning a distant dream, we can’t even accurately describe what a thing is or how it stands in relation to any other thing. We entirely lack perspective.
Meaning on the web must therefore begin in the formal semantics of an ontology. Using formal semantics we might continue to be unable to give a thing a home in the sense of a single hierarchical location. But we can put it somewhere in a relatively flat structure and give it meaning, turning the implicit meaning previously contained in location into explicit meaning contained in RDF statements. We make the implicit into formal meaning.
We can create a flat structure that represents a formal semantic relationship (usually based on ‘type’ or another broad/generic class-type thing) and contain the rest of the meaning (implicit and formal semantics) in RDF statements about the resource. I guess the distinction I’m pushing for here is that the location of a resource is much more about the class of the thing (what a thing is) than its properties (what a thing has). But what about the problem of the same thing in multiple ring binders (competing classes?) – is it really solved that simply?
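Here’s one way that flat, class-based structure might be sketched (the path scheme, class names and property names are invented for illustration, not a prescription). The URL encodes only what the thing *is*; everything the thing *has* lives in statements about the resource, not in nested folders:

```python
# A sketch of class-based canonical URLs. The location encodes
# only the broad class of the thing; its properties live in
# statements, not in the path.

def canonical_url(thing_class: str, slug: str) -> str:
    # One flat segment per broad class -- no nested ring binders.
    return f"/{thing_class.lower()}/{slug}"

paris_url = canonical_url("Place", "paris")
print(paris_url)  # /place/paris

# Context-specific views point at the canonical home rather than
# duplicating the resource in a second hierarchy:
statements = [
    (paris_url, "type", "Place"),
    (paris_url, "partOf", canonical_url("Place", "france")),
]
```

The design choice is that moving a resource between ring binders never changes its URL, because the URL was only ever about its class.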
I think ‘Yes’.
Choose the best place for your canonical home to be and put it there – linked data will do the rest – your ‘thing’ will appear and can be linked from the other places it is appropriate to. You still need to avoid repetition, which is just silly and creates version control problems.
Have one main thing, give it a home and link sensibly.
An ontology allows you to identify things. It lets you point at a thing and then say stuff (in this sense it’s quite like a bottle of vodka). Once we have an ontology we can describe the thing and its relationships. We can group things through relationships and distinguish between things. Distinguishing is powerful. It turns out that implicit semantics aren’t just this quirky little offshoot of meaning, they’re meaning’s bread and butter. Lots of meaning is implicit, and though they might all look the same, each one of that infinite stack of turtles is its own special, unique creation. Differentiation is the most important part of making statements online, otherwise ontologies become exactly the same as that vodka bottle – prone to enabling mis-pointing.
I mention all this because things change. Not only that, but the same thing can be altered due to a shift in context. Just think back to France:
- France in 1911
- France in the context of cooking
I think design is all about studying ‘reality’ and picking the best bits. For an IA the challenge of studying reality is replicating the implicit best bits – which are sometimes hard to recognise. You then occasionally need to make this implicit thing visible, and make sure it’s still recognisable as the thing that people have been taking for granted. I’m starting to think that when we’re talking about ‘France’, there is a big-mental-model-type-thing called France and then different versions – instances or perspectives – on this ‘canonical’ thing.
Towards a solution
I don’t think this is a solution just yet, but it’s definitely got potential. With a ‘thing and instances’ approach we can decide later which things get ‘pages’, a place to live, and which are just data to add to the graph. Things that are altered by context get their own identifier, and if they’re important enough they can be given a page and a URL. These ‘instances’ are still things, but they stand in relation to the big-mental-model-type-thing that they are an instance of.
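The ‘thing and instances’ idea can be sketched like this (all identifiers and property names are invented for illustration). The canonical thing gets one identifier; context-altered versions get identifiers of their own, each linked back to the canonical thing, and only the important ones are given a page:

```python
# A sketch of the 'thing and instances' approach.
# All identifiers below are hypothetical.

france = "http://example.org/thing/france"

graph = [
    # Context-specific instances are things in their own right...
    ("http://example.org/thing/france-1911", "instanceOf", france),
    ("http://example.org/thing/french-cuisine", "instanceOf", france),
    # ...and only the ones important enough get a page and a URL:
    ("http://example.org/thing/france-1911", "hasPage", "/place/france-1911"),
]

# Everything that is a perspective on the canonical France:
instances = [s for (s, p, o) in graph if p == "instanceOf" and o == france]
print(len(instances))  # 2
```

Under this sketch, deciding which instances get ‘pages’ becomes a later, editorial decision; the identifiers and the links back to the canonical thing exist either way.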
I reached this conclusion because I started thinking about turtles. That quote always makes me think of Douglas Adams (possibly via Terry Pratchett) and there’s a pleasing suspicion that I conflated two difficult problems because I couldn’t find a solution to either one when I was thinking about it on its own. Maybe a problem shared is a problem halved – because it’s starting to look as though, by designing for contexts and making workable URLs, we can solve all our problems with the same sorts of ideas.