Semantic Knowledge for Better Data

4 August 2020

By Lambert Hogenhout, Chief of Data Analytics and Innovation, OICT

Databases have been around a long time, since the 1960s, when the world was less complicated and less inter-connected, at least in a data sense. Traditional databases focused on "things" with properties, like employees with birthdates, addresses and salaries. Even though they are called relational databases, the relationships were secondary.

Today, however, with ubiquitous digitization and data collection, the relationships outnumber the entities. Social platforms like Facebook and LinkedIn rely primarily on the connections between entities.

It is no wonder that a new type of database was invented that makes those relationships the focus: graph databases, based on the mathematical concept of a “graph”. Entities and relationships have equal weight in graph databases. Luckily, mathematicians had been obsessed with graphs for over 100 years and many theories, proofs and models existed that could readily be turned into algorithms to explore graphs and solve queries.

The Emerging Tech Lab (ETL) of the Office of Information and Communications Technology (OICT) has embarked on several projects that explore how to store knowledge in graphs.  

Use-case: SDGs

One use-case for such “knowledge graphs” could be the Sustainable Development Goals (SDGs). The SDGs are strongly inter-related: If you want to make progress on SDG 1 (Poverty), ensuring quality education is critical, so you are touching on SDG 4 (Education). In doing so, you may decide to ensure that education is available to women and girls as well, thereby supporting SDG 5 (Gender Equality).

The image above depicts a few of the connections between the three SDGs mentioned; however, reality is much more complex, with innumerable direct and indirect connections between the SDGs. Capturing these connections in a knowledge graph would make a very useful resource for anyone studying the SDGs and their inter-linkages. That is exactly what we in ETL are aiming for.
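To make the idea concrete, here is a minimal sketch in Python of how such inter-linkages could be stored as triples and explored. It is not ETL's implementation: the relation names, the intermediate concepts and the helper functions are assumptions made purely for illustration, based on the SDG 1, 4 and 5 example above.

```python
from collections import defaultdict, deque

# Illustrative triples only: (subject, relation, object). The relation names
# and intermediate concepts are assumptions made for this sketch.
TRIPLES = [
    ("SDG 1 (Poverty)", "advanced_by", "quality education"),
    ("quality education", "targeted_by", "SDG 4 (Education)"),
    ("quality education", "extended_to", "women and girls"),
    ("women and girls", "focus_of", "SDG 5 (Gender Equality)"),
]

def build_graph(triples):
    """Index triples so each node lists its neighbours in both directions."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((relation + "^-1", subject))  # inverse edge
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(TRIPLES)
path = find_path(graph, "SDG 1 (Poverty)", "SDG 5 (Gender Equality)")
print(" -> ".join(path))
```

Even in this toy graph, the link between SDG 1 and SDG 5 is indirect: it only appears by following the chain through education. A real knowledge graph would hold many thousands of such connections, which is where graph algorithms earn their keep.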

We organized a workshop earlier this year together with Accenture Labs to explore the idea of Knowledge Graphs for Social Good and form a community of people to help us towards this goal. We are continuing this project throughout 2020.

Humanitarian projects

Another example of an application of knowledge graphs is in humanitarian affairs. For its Country-based Pooled Funds, the United Nations Office for the Coordination of Humanitarian Affairs (OCHA) needs to align more than 100 proposals for humanitarian projects per year with its strategic priorities to ensure funds are used where they are needed most. This process of alignment is currently done by OCHA staff, who read through the proposals in detail.

ETL had already built a prototype using natural language processing (NLP) to find keywords in the proposals related to the strategic steers. But keyword matching has obvious limitations. For example, the phrase “education of women” may not appear in the text of a proposal, but the words “school” and “female students” may, and that still makes it a relevant project. ETL came up with the idea to use knowledge graphs to model the strategic priorities. By having semantic knowledge around these concepts, we have a stronger basis for comparing proposals against the priorities.
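The difference between plain keyword search and a concept-based comparison can be sketched in a few lines of Python. This is a simplified illustration, not the tooling built for OCHA: the priorities, the related terms and the function names are all hypothetical, and a real knowledge graph would hold far richer relationships than a flat set of terms.

```python
import re

# Hypothetical mini knowledge graph: each strategic priority is linked to
# related concepts. All terms are invented for illustration.
PRIORITY_CONCEPTS = {
    "education of women": {"school", "female students", "girls"},
    "food security": {"nutrition", "crop", "food distribution"},
}

def normalise(text):
    """Lower-case and collapse whitespace so phrase matching is predictable."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def match_priorities(proposal_text, priority_concepts):
    """Return {priority: sorted list of related concepts found in the text}."""
    text = normalise(proposal_text)
    matches = {}
    for priority, concepts in priority_concepts.items():
        # The priority phrase itself counts as a concept too.
        candidates = concepts | {priority}
        found = sorted(
            c for c in candidates
            if re.search(r"\b" + re.escape(c) + r"\b", text)
        )
        if found:
            matches[priority] = found
    return matches

proposal = "The project will rebuild a school and enrol 200 female students."
print(match_priorities(proposal, PRIORITY_CONCEPTS))
```

The sample proposal never mentions “education of women”, yet it is matched to that priority through the related concepts “school” and “female students”.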

We then partnered with the Slalom AI Center of Purpose, which very kindly helped us kick-start the implementation of this idea in an intense two-week effort that resulted in an awesome set of tools. We were very impressed by their work and their support.

ETL continues to develop the knowledge graphs and algorithms for this project.

The images below show some preliminary outputs. This is only a basis on which so much more can be built to increase efficiency, provide transparency and support decision making at OCHA.

Conclusion

In the 1960s, databases started out by simply storing the letters of words. An apple became just the letters a-p-p-l-e.

But the computer had no idea that it was a fruit. In the past few years, NLP has become good enough that machines can routinely analyze a sentence like “Lambert eats an apple” and conclude that “Lambert” may be a person’s name, “eats” is a verb, and “apple” is a noun. However, the computer still has no concept of what an apple is (and that it may cease to exist when eaten!).
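As a rough illustration of what a semantic layer adds, the Python sketch below stores a handful of facts and lets the computer work out that an apple is a fruit, and therefore something that can be eaten. The facts, relation names and functions are hypothetical, chosen only to mirror the apple example.

```python
# Hypothetical facts for illustration, stored as (subject, relation, object).
FACTS = {
    ("apple", "is_a", "fruit"),
    ("fruit", "is_a", "food"),
    ("food", "can_be", "eaten"),
    ("Lambert", "is_a", "person"),
}

def ancestors(entity, facts):
    """Follow is_a links transitively and return every broader category."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in facts:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

def has_property(entity, relation, value, facts):
    """True if the entity or any of its categories carries the property."""
    candidates = {entity} | ancestors(entity, facts)
    return any((c, relation, value) in facts for c in candidates)

print(sorted(ancestors("apple", FACTS)))                  # ['food', 'fruit']
print(has_property("apple", "can_be", "eaten", FACTS))    # True
print(has_property("Lambert", "can_be", "eaten", FACTS))  # False
```

Nothing in the facts says directly that an apple can be eaten; the answer follows from the chain apple, fruit, food. That kind of inference is what the letters alone, or even the grammar of the sentence, cannot provide.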

A semantic layer of knowledge is the next level of adding value to our data and it will help anyone involved in information management, analytics, AI or user interfaces. Search engines, chatbots, translation and automated document analysis will be taken to a whole new level by semantic knowledge. This is only the beginning.

Note: The views expressed herein are those of the author and do not necessarily reflect the views of the United Nations.