publications
For a full list of publications, please refer to my Google Scholar page.
2024
- ESWC 2024. OntoChat: a Framework for Conversational Ontology Engineering using Language Models. Bohui Zhang, Valentina Anita Carriero, Katrin Schreiberhuber, and 4 more authors. 2024.
Ontology engineering (OE) in large projects poses a number of challenges arising from the heterogeneous backgrounds of the various stakeholders, domain experts, and their complex interactions with ontology designers. This multi-party interaction often creates systematic ambiguities and biases from the elicitation of ontology requirements, which directly affect the design, evaluation and may jeopardise the target reuse. Meanwhile, current OE methodologies strongly rely on manual activities (e.g., interviews, discussion pages). After collecting evidence on the most crucial OE activities, we introduce OntoChat, a framework for conversational ontology engineering that supports requirement elicitation, analysis, and testing. By interacting with a conversational agent, users can steer the creation of user stories and the extraction of competency questions, while receiving computational support to analyse the overall requirements and test early versions of the resulting ontologies. We evaluate OntoChat by replicating the engineering of the Music Meta Ontology, and collecting preliminary metrics on the effectiveness of each component from users. We release all code at https://github.com/King-s-Knowledge-Graph-Lab/OntoChat.
2023
- LM-KBC 23. Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata. Bohui Zhang, Ioannis Reklos, Nitisha Jain, and 2 more authors. 2023.
In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE
- HHAI 2023. Towards Explainable Automatic Knowledge Graph Construction with Human-in-the-loop. Bohui Zhang, Albert Meroño Peñuela, and Elena Simperl. 2023.
Knowledge graphs are important in human-centered AI because of their ability to reduce the need for large labelled machine-learning datasets, facilitate transfer learning, and generate explanations. However, knowledge-graph construction has evolved into a complex, semi-automatic process that increasingly relies on opaque deep-learning models and vast collections of heterogeneous data sources to scale. The knowledge-graph lifecycle is not transparent, accountability is limited, and there are no accounts of, or indeed methods to determine, how fair a knowledge graph is in the downstream applications that use it. Knowledge graphs are thus at odds with AI regulation, for instance the EU’s upcoming AI Act, and with ongoing efforts elsewhere in AI to audit and debias data and algorithms. This paper reports on work in progress towards designing explainable (XAI) knowledge-graph construction pipelines with human-in-the-loop and discusses research topics in this space. These were grounded in a systematic literature review, in which we studied tasks in knowledge-graph construction that are often automated, as well as common methods to explain how they work and their outcomes. We identified three directions for future research: (i) tasks in knowledge-graph construction where manual input remains essential and where there may be opportunities for AI assistance; (ii) integrating XAI methods into established knowledge-engineering practices to improve stakeholder experience; as well as (iii) evaluating how effective explanations genuinely are in making knowledge-graph construction more trustworthy.
2022
- arXiv. Enriching Wikidata with Linked Open Data. Bohui Zhang, Filip Ilievski, and Pedro Szekely. 2022.
Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.