Making an Under-Resourced Language Available on the Wikidata Knowledge Graph: Quechua Language
Nov 22, 2024
Elwin Huaman
Image credit: Elwin Huaman
Abstract
Knowledge Graphs (KGs) encode human knowledge in a structured format that can be used in Artificial Intelligence (AI) applications, such as Large Language Models and Natural Language Processing. However, KGs are mainly built by harvesting web data, which may over-represent certain viewpoints and languages, while Under-Resourced Language (URL) communities are not considered. As a result, AI applications may inherit these over-representation biases. Therefore, we research the holistic lifecycle of KGs to make the Quechua URL available as a KG. Specifically, we ingest Quechua lexical data into the Wikidata KG and then explore Wikidata’s capabilities for a community-driven approach. In total, we added 1591 Quechua lexemes, along with their senses, forms, and pronunciation audio, to Wikidata. Our approach demonstrates the feasibility and potential impact of bringing a URL into a KG, which can contribute to a future in which AI and its applications take into account the languages and viewpoints of all.
Type
Publication
Springer

Authors
Research Scientist
Elwin Huaman is a Researcher, Project Manager, and Activist for Under-Resourced Languages. He has experience creating Knowledge Graphs and applying Semantic Web technologies in academia and industry. He co-authored the book Knowledge Graphs - Methodology, Tools and Selected Use Cases and leads the QICHWABASE Knowledge Graph, which supports the harmonization of the language and knowledge of Quechua communities across the world.