First name:Kristína
Surname:Sásiková
Title:Semantic Textual Similarity in Slovak Language
Supervisor:Mgr. Marek Šuppa
Year:2025
Keywords:Semantic Textual Similarity, Slovak Language, Language Models, Natural Language Processing, Dataset Creation, Multilingual Language Models
Abstract:Semantic Textual Similarity (STS) is a fundamental NLP task that measures how closely two texts resemble each other in meaning, going beyond mere word overlap. Accurate STS is crucial for applications such as machine translation evaluation, text summarization, information retrieval, and question answering. A main technique for determining STS relies on sentence embeddings, which are vector representations that capture the semantic meaning of a text. Once sentences are mapped into a dense vector space, their semantic similarity can be quantified with metrics such as cosine similarity. In this thesis, we explored various similarity metrics for measuring semantic similarity based on sentence embeddings. We also described two existing approaches in this area, SlovakBERT and Slovak T5 small. An important part of this research was the creation of high-quality, manually verified Slovak evaluation sets (test and validation) for STS and the use of a state-of-the-art machine translation system (DeepL) to translate the training portion. This translated dataset was created in an attempt to enhance the Slovak STS Benchmark. Furthermore, we looked at the place that generative models such as GPT occupy within the field of STS and examined the feasibility of creating a sufficiently good dataset using such generative models.

Thesis files:
The author did not consent to the publication of their thesis.

Defense presentation files:
