Linked data – a beginner’s guide
“Linked data is the superstructure over which content is stretched”
I said this once, but I didn’t really elaborate on the definition. I moved quickly on to the benefits of linked data. In this post I’m going to try to go right back to the basics and describe exactly what linked data is.
So what is linked data?
I like to think of the internet as being made up of two separate types of stuff:
- bits of content
- data that describes the bits of content.
As the web has evolved we’ve always had this ‘data that describes bits of content’. We’ve got simple data like file names, published dates, name of author… which is implicit. Publish a story and it has a published date, it must do. Computers can use this data to order a list of articles by their publish date, or group content by author. This kind of metadata is really useful. But it doesn’t help computers understand the meaning of our content. And computers are our friends – at least until the revolution. Computers can make our lives easier by organising and recommending content. But only if they understand it.
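To make that concrete, here is a minimal sketch of the kind of ordering and grouping a computer can do with implicit metadata. The articles, authors and dates are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical articles; titles, authors and dates are made up.
articles = [
    {"title": "Baking basics", "author": "Dan", "published": date(2013, 5, 2)},
    {"title": "Circus skills", "author": "Sam", "published": date(2012, 11, 20)},
    {"title": "More cake", "author": "Dan", "published": date(2013, 1, 15)},
]

# Order the articles by publish date, oldest first.
by_date = sorted(articles, key=lambda a: a["published"])
ordered_titles = [a["title"] for a in by_date]

# Group the titles by author, keeping the date order within each group.
by_author = defaultdict(list)
for article in by_date:
    by_author[article["author"]].append(article["title"])
```

The computer can sort and group perfectly well here, but nothing in this data tells it what any of the articles are about.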
In an effort to help computers to help us, we’ve created other kinds of metadata like tags and categories. We can attach a tag to content to help group stuff together. This extra bit of data is incredibly useful. It helps us to describe our content to users and machines. For example, content management systems can use this sort of data to build aggregations and sort and order content. It gets us a step closer to computers understanding more about the relationships between content. But these tend to be bespoke solutions, creating silos of metadata that are unique to our product. They’re also quite light on meaning, so our computer cousins are still a little under-informed and therefore underused.
Linked data uses the same basic idea as tagging. We make statements that relate one object to another. But we do it in a much more precise and powerful way than we can with ‘category tags’ alone. And we can share both the statements and the ‘tags’ to create open, extensible and shareable banks of information.
Linked data allows us to create complex ‘models’. These can be used to describe our content and even relationships between bits of content and concepts online and in the real world. When we use those older-fashioned tagging technologies we often use the word ‘taxonomy’ to describe the basic structure of our tags. Taxonomy is a structured vocabulary, a set of words with a defined hierarchical relationship. Linked data takes it a step further. We’re not limited to describing hierarchical relationships. We can now build complex webs (or graphs) of data (statements of fact) that describe properties and relationships. Importantly we can share the models, as well as the data we’ve created using them.
Linked data allows us to create an extra layer of data, alongside our content, that is especially for computers. It’s a massive step forward in the way we can attach properties to content on the web. It’s a revolutionary evolution of metadata that supports our content.
Previously in this post…
So linked data enhances the web by adding an extra set of structured stuff to augment the ‘contenty stuff’ we already have. This enhancement is invisible, to most of us at least, because it’s not really for us. The enhancement is for computers. Linked data adds an additional layer of meta-content (data about content) so that computers can understand the meaning of the content itself. With added context, computers can make more useful recommendations and connections between content. Linked data is kind of like the computer’s ‘beginner’s guide’ to a thing. It provides an introduction so that machines can understand the content.
For example, think about a circus tent. If you saw a collection of men in a car that was gradually falling to pieces, while they busied themselves with carrying ladders in a dangerous manner and throwing buckets of sparkly confetti, you’d assume that something was quite wrong with the world. Without the context of the ‘circus-tent-setting’ these actions are meaningless. Most content on the web is like this to a computer. They can ‘read’ content and try to extract some meaning from it, but it’s basically just a collection of unusual symbols. Linked data adds the tent poles and canvas. It creates the context so that you know that you’re watching a performance. It helps you derive the intended meaning and to construct your own experience. You can sit back and relax and not feel an obligation to suggest a good mechanic and a course of talking therapies.
Statements and triples
So linked data builds a semantic web of meaning to describe our content. The building blocks of the linked data world are simple statements. We call these statements triples, because they have three elements. A triple follows the formula of:
SUBJECT – PREDICATE – OBJECT
They allow us to say that [some subject] [has some property] [with some value], and we use these building blocks to make statements, like:
DAN – LIKES – CAKE
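As a rough sketch, a triple like this can be modelled in Python as a three-part tuple. The `Triple` class and `describe` helper are hypothetical names chosen for illustration, not part of any linked data library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    obj: str  # 'object' is a Python built-in name, so we shorten it


def describe(triple: Triple) -> str:
    """Render a triple as a readable sentence."""
    return f"{triple.subject} {triple.predicate.lower()} {triple.obj}"


statement = Triple("DAN", "LIKES", "CAKE")
sentence = describe(statement)  # "DAN likes CAKE"
```

Every statement has the same three-part shape, whatever it happens to be about.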
We’re not tied to the same constraints we might face when using traditional databases. But the data is still structured. Linked data’s structure comes from ontologies.
Structure and ontologies
An ontology describes the classes and properties of the ‘things’ that can be talked about in our triples. It establishes the rules from which meaning can be derived. Whenever we make a linked data statement we refer to the ontology we’re using. It’s a little like making a statement in a text message and attaching an emoticon to denote context. The computer can take the formalised context defined by the ontology and derive the precise meaning, just like we can detect sarcasm that we might miss without that precious winky eye 😉
Ontologies should be manageable, so they’ll usually be limited to a fairly discrete ‘domain’ or area of interest. But we can create new ontologies to serve new purposes, or describe new types of content or relationships that we’ve become interested in. We can also combine our use of separate ontologies to build more complex pictures of the world.
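Here is a toy sketch of what ‘classes and properties’ might look like, assuming a tiny made-up ontology with two classes and one property. The names `ONTOLOGY`, `TYPES` and `conforms` are invented for this example. Real ontology languages such as RDFS and OWL typically use this kind of information to infer new facts rather than to reject statements, so treat the check below as a simplification.

```python
# A toy ontology: the classes of thing we can talk about, plus each
# property and the class expected at either end of it.
ONTOLOGY = {
    "classes": {"Person", "Food"},
    "properties": {
        "likes": {"domain": "Person", "range": "Food"},
    },
}

# Which class each individual thing belongs to (hypothetical data).
TYPES = {"dan": "Person", "cake": "Food"}


def conforms(subject: str, predicate: str, obj: str) -> bool:
    """Check that a statement fits the toy ontology's rules."""
    prop = ONTOLOGY["properties"].get(predicate)
    if prop is None:
        return False  # the ontology doesn't define this property
    return (
        TYPES.get(subject) == prop["domain"]
        and TYPES.get(obj) == prop["range"]
    )


ok = conforms("dan", "likes", "cake")         # a Person likes a Food
backwards = conforms("cake", "likes", "dan")  # wrong way round
```

With the rules written down, a computer can tell that ‘dan likes cake’ makes sense in this domain and ‘cake likes dan’ does not.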
The world of linked data is shareable, reusable and extensible.
If we wanted to get metaphorical about this we could say that Lego people conform to the same kind of structure that an ontology creates. Lego men are made up of several parts – legs, torso, head. The ‘Lego man ontology’ describes these rules – how objects can stand in relation to each other. We can make individual statements by choosing our own head, torso and leg combo. You might even be able to create your own component, provided it conforms to the rules. Using these simple ‘statements’ we can gradually add more and more triples to a ‘triple store’, which will helpfully hold our statements so that we can build up a bank of structured data about resources.
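A triple store can be sketched in a few lines. This is a minimal in-memory version, assuming made-up Lego part names, and the `TripleStore` class is hypothetical. Production triple stores are queried with languages like SPARQL, which this sketch does not attempt to imitate.

```python
class TripleStore:
    """A toy in-memory triple store, for illustration only."""

    def __init__(self):
        # A set, so repeating the same statement adds nothing new.
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return stored triples fitting the pattern; None means 'anything'."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self._triples
            if all(want is None or want == got
                   for want, got in zip(pattern, triple))
        )


store = TripleStore()
store.add("minifig-1", "has_head", "pirate-head")
store.add("minifig-1", "has_torso", "striped-torso")
store.add("minifig-1", "has_legs", "peg-leg")
store.add("minifig-2", "has_head", "pirate-head")

# Everything we know about one Lego man.
about_first = store.match(subject="minifig-1")

# Every Lego man wearing the pirate head.
pirate_heads = store.match(predicate="has_head", obj="pirate-head")
```

Because every statement has the same shape, one simple `match` method can answer lots of different questions about the bank of data.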
Resources and pointing
The idea of ‘resources’ is central to the power of linked data. Resources are the way we build connections between our content and data, as well as how we link the world of the web to the real world beyond the screen. Using linked data we can create a resource for anything in the real world or the world wide web.
Linked data is most powerful when we don’t use literal values (plain old boring words) to stand for things. Instead we create a ‘resource’ to stand for the thing we want to refer to. We give these resources a URI (uniform resource identifier), and we use this URI in our statements.
It’s probably worth a tiny digression to mention that URIs and URLs are different. All URLs are URIs – they’re unique references for a thing – but not every URI is a URL. Most of the URIs that we use on the semantic web of linked data are URLs. URLs contain a location (hence the L in the name). It’s good practice to provide URLs for our resources, so that they’re easily accessed and shared.
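As a rough illustration of the difference, Python’s standard `urllib.parse.urlparse` can split an identifier into its parts. The `looks_like_url` helper and both identifiers are hypothetical. The check is only a heuristic that treats an `http` or `https` URI with a host as something you could look up on the web.

```python
from urllib.parse import urlparse


def looks_like_url(uri: str) -> bool:
    """Heuristic: does this URI say where on the web to find the thing?"""
    parts = urlparse(uri)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


# A URI that is also a URL: it names the resource and locates it.
web_uri = "http://example.org/people/dan-ramsden"

# A URI that only names the resource; there is no location to visit.
name_only_uri = "urn:example:dan-ramsden"

web_is_url = looks_like_url(web_uri)
name_is_url = looks_like_url(name_only_uri)
```

Both strings are perfectly good unique identifiers, but only the first gives people and machines somewhere to go to find out more.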
The ability to use resources, instead of literal values is the real power of linked data. The idea of resources means we have a limitless supply of ‘symbols’ that we can use to stand for real and virtual things. These resources are unique and can be shared, creating a common ‘thing’ that people can use to make statements.
Creating shared resources means that linked data can be open and shared. In the old world of tagging we might have put ‘Dan Ramsden’ in a column of cake eaters in a database, or created a ‘cake eaters category’ in a CMS. With linked data we turn ‘Dan Ramsden’ into a resource and ‘cake’ into a resource. Now I can make more and more statements about the ‘dan-ramsden-resource’ and build a complex picture. I can also share the resource across the web so that others can make statements using it. The web of linked data can grow in complexity as more and more people use the shared resources and ontologies to make statements.
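Here is a small sketch of why shared resources matter, assuming made-up `example.org` URIs for the resources and properties. Two people publish statements separately, and because both use the same URI for the same thing, their statements can simply be pooled.

```python
# Hypothetical URIs; example.org is reserved for examples like this.
DAN = "http://example.org/people/dan-ramsden"
CAKE = "http://example.org/things/cake"
LINKED_DATA = "http://example.org/things/linked-data"
LIKES = "http://example.org/vocab/likes"
WRITES_ABOUT = "http://example.org/vocab/writes-about"

# Statements I publish about the resource.
my_statements = {
    (DAN, LIKES, CAKE),
}

# Statements somebody else publishes, using the same shared URIs.
their_statements = {
    (DAN, LIKES, CAKE),
    (DAN, WRITES_ABOUT, LINKED_DATA),
}

# Pooling is a set union; the repeated statement is stored only once.
merged = my_statements | their_statements

# Everything anyone has said about the dan-ramsden resource.
about_dan = sorted(t for t in merged if t[0] == DAN)
```

If each of us had used the plain words ‘Dan Ramsden’ instead, a computer would have no reliable way to know we meant the same person.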
Pulling it all together
The extensibility of linked data means that we can iterate and create new things with our content with much less effort. The same piece of content can have multiple descriptions in different places – so we can get more use out of content. By making meaning explicit, we let computers manipulate and use content on their own. Content on the web works because it usually appears in carefully constructed contexts. But people like to share and reuse content. With linked data, content can take its context with it – contained in the linked data associated with it.
Linked data allows us to build a huge store of structured data that describes our content, and how our content describes and relates to the real world. Modelling ‘domains’ and creating ontologies gives us a set of powerful tools to allow us to exploit more automation and reuse of our content. This will be hugely important for the future of our content. The proliferation of more and more devices and device types, and a society focused on consuming information wherever people are, both ask more of our content. We’ve begun to talk of ‘responsive design’, but linked data will allow us to extend responsiveness beyond interfaces, into the content itself. Linked data will empower all our consumers of content – whether they’re humans or computers – to construct and consume the best experiences possible.