Building data commons with the Semantic Web
The Web is overflowing with content. Every second, we share billions of pieces of information with each other: social network posts, blog articles, products for sale, wiki pages, restaurant menus, school reports and so on...
Although shared digitally, these sets of letters, words and phrases make sense only to the humans who read them. In the eyes of a machine, the Web represents nothing more than a series of symbols to be stored and displayed on a screen. This is how it was conceived by its creator Tim Berners-Lee in the early 90s: decentralized text documents linked together by hyperlinks.
Ten years later, Berners-Lee and others realized that this "document-based" approach would not be enough to meet the growing needs of a digital society: machines needed to understand the data in order to process it, organize it and, ultimately, generate it (hello AI). Thus was born the concept of the Semantic Web.
"If you think of the web today as turning all the documents in the world into one big book, then think of the Semantic Web as turning all the data into one big database, or one big mathematical formula."
Tim Berners-Lee, The Semantic Web
In simple terms, the Semantic Web is a way of representing the data exchanged over the Internet that, unlike plain HTML, a machine can understand.
The Semantic Web is not a black box
How does it work? Nothing very complicated. Take this (small) HTML document:
The machine can't fully understand it, but if we want to make the Semantic Web, we'd rather store something that looks like this:
Technically, there are several ways of representing this information using syntax from the RDF family. Here's an example using Turtle syntax:
Thanks to this semantic representation of data, a machine can identify the different subjects (Jane, Octree, the 'development' profession) and the links between them.
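To make the idea of subjects and characterized links more concrete, here is a minimal Python sketch of the same principle. It is only an illustration: the `ex:` identifiers, the predicate names (`ex:worksFor`, `ex:hasOccupation`) and the helper functions are hypothetical and do not come from a real vocabulary or from the examples above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a characterized link, and its target."""
    subject: str
    predicate: str
    obj: str


# Hypothetical data: every "ex:" name below is invented for illustration.
graph = {
    Triple("ex:jane", "rdf:type", "ex:Person"),
    Triple("ex:jane", "ex:name", '"Jane"'),
    Triple("ex:jane", "ex:worksFor", "ex:octree"),
    Triple("ex:jane", "ex:hasOccupation", "ex:development"),
    Triple("ex:octree", "rdf:type", "ex:Organization"),
}


def subjects(graph):
    """Return every identifier that appears as the subject of a statement."""
    return {t.subject for t in graph}


def links_from(graph, subject):
    """Return the (predicate, target) pairs leaving a subject.

    Quoted values such as '"Jane"' are plain text, not links, so they are skipped.
    """
    return sorted(
        (t.predicate, t.obj)
        for t in graph
        if t.subject == subject and not t.obj.startswith('"')
    )


print(subjects(graph))
print(links_from(graph, "ex:jane"))
```

Because each statement names its subject and the type of link explicitly, a program can list who is connected to what without having to guess anything from free text.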
By linking subjects through characterized relationships like these, we can create a web of linked information and obtain a network of knowledge. This is the idea behind the Linked Open Data Cloud project, which connects data from a multitude of sources (medical, governmental, geographic, scientific, media, etc.) in order to create new knowledge by cross-referencing information.
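Cross-referencing works because independent sources can reuse the same identifiers. The sketch below illustrates this under stated assumptions: the two "sources", the `ex:` identifiers and the `follow` helper are all hypothetical, and real linked data would use full URIs rather than short names.

```python
# Two hypothetical sources that were published independently.
# Every identifier here is invented for illustration.
hr_source = {
    ("ex:jane", "ex:worksFor", "ex:octree"),
}
geo_source = {
    ("ex:octree", "ex:locatedIn", "ex:someCity"),
    ("ex:someCity", "ex:partOf", "ex:someCountry"),
}

# Merging is a plain set union: shared identifiers connect the two graphs.
merged = hr_source | geo_source


def follow(graph, start, predicates):
    """Follow a chain of predicates from a starting subject.

    Returns the set of nodes reached after walking every link in order.
    """
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in graph if s in current and p == predicate}
    return current


# Neither source alone knows where Jane's employer is located...
print(follow(hr_source, "ex:jane", ["ex:worksFor", "ex:locatedIn"]))
# ...but the merged graph does.
print(follow(merged, "ex:jane", ["ex:worksFor", "ex:locatedIn"]))
```

The answer to "where is Jane's employer located?" exists in neither source alone; it only appears once the two graphs are joined on the shared `ex:octree` identifier.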
From knowledge sharing to digital commons
The Semantic Web breaks down silos by creating databases that are not tied to one service or product. Better still, it enables the creation of new services that are more respectful of users and give them back control over their data. This is the case with the Solid project, which is developing an alternative way of consuming and exchanging data. Another striking example is the Fediverse (of which Mastodon is the best-known service), which uses the ActivityPub protocol, itself largely based on Semantic Web technologies.
In other words, the Semantic Web gives freedom back to the individual. As such, it marks the beginning of a shift away from an economic model based on the predation of private data, by enabling us to design alternatives to platform economies.
And what about artificial intelligence?
AI can also take advantage of the Semantic Web. LLMs, the most widely known form of AI today, ingest huge amounts of textual data and build statistical models that predict which word comes next in order to generate new text (in a nutshell). It's a great breakthrough!
However, this is also extremely energy-intensive (slight problem, the planet is burning), and the Semantic Web could help: there is less need for extensive calculation and indexing when the information is already structured and available in a machine-understandable format. All you have to do is help yourself!
At Octree: linking interoperability and distributed governance
At Octree, we've started to implement and use semantic data. To get to grips with the concepts and technologies, we carried out a "semantization" project, which involved centralizing data from the various tools we use (Notion, Harvest, GitLab, Jelastic, etc.) in a single database in RDF format. This now enables us to run cross-functional queries on our activity and build metrics that facilitate decision-making in our distributed governance.
This success then enabled us to propose a semantization project to one of our customers, who wanted to reduce its dependence on a limited proprietary ERP solution. The project is still underway, but we have already been able to free up the company's data and make it available to an open source alternative such as Odoo without too much difficulty.
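As a rough idea of what a cross-functional query can look like once data from several tools lives in one graph, here is a small Python sketch. It does not reflect our actual schema or tooling: the `ex:` identifiers, the predicates, the sample values and the `project_metrics` helper are all hypothetical, and a real RDF store would typically be queried with SPARQL instead.

```python
from collections import defaultdict

# Hypothetical triples, as if exported from a time tracker and an issue tracker.
triples = [
    ("ex:entry1", "ex:loggedOn", "ex:projectA"),
    ("ex:entry1", "ex:hours", 3.5),
    ("ex:entry2", "ex:loggedOn", "ex:projectA"),
    ("ex:entry2", "ex:hours", 2.0),
    ("ex:entry3", "ex:loggedOn", "ex:projectB"),
    ("ex:entry3", "ex:hours", 1.0),
    ("ex:issue7", "ex:belongsTo", "ex:projectA"),
    ("ex:issue7", "ex:status", "open"),
    ("ex:issue8", "ex:belongsTo", "ex:projectA"),
    ("ex:issue8", "ex:status", "closed"),
]


def objects(triples, subject, predicate):
    """Return every value attached to a subject through a given predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def project_metrics(triples):
    """Combine time entries and issues into one metric record per project."""
    metrics = defaultdict(lambda: {"hours": 0.0, "open_issues": 0})
    for s, p, o in triples:
        if p == "ex:loggedOn":
            # s is a time entry, o is the project it was logged on.
            metrics[o]["hours"] += sum(objects(triples, s, "ex:hours"))
        elif p == "ex:belongsTo" and "open" in objects(triples, s, "ex:status"):
            # s is an issue that is still open, o is its project.
            metrics[o]["open_issues"] += 1
    return dict(metrics)


for project, values in sorted(project_metrics(triples).items()):
    print(project, values)
```

The point is that hours and issues, which originally lived in separate tools, can be combined in a single pass because both refer to the same project identifiers.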
This mastery of the Semantic Web opens the door to Open Data (which we hold dear). By design, this not-so-alternative Web focuses above all on opening up and sharing data to create a common knowledge network.
So it's an essential building block for our next projects, and also for defending an open Web.
Curious? Let's talk.
- Tim for Octree
Main image:
Network Lattice-Framework for a Zeiss Planetarium, n.d. Reprinted in László Moholy-Nagy, The New Vision: Fundamentals of Bauhaus Design, Painting, Sculpture, and Architecture (Mineola, NY: Dover, 1938/2005), 203. Source: Zeiss Archiv.