Abstract: This thesis explores the application of language models to the challenge of information extraction from unstructured data, specifically Curriculum Vitae (CV) documents. Traditional algorithmic approaches often struggle with the complexity and noise inherent in such data, prompting the exploration of more advanced techniques. A benchmark of 63 annotated samples was developed to lay the foundation for model evaluation and for comparison with existing commercial tools. Several small language models, including SmolLM-1.7B, LLaMA3.2-1B, and Qwen2-1.5B, as well as larger models such as LLaMA3.2-8B and GPT-4o-mini, were fine-tuned and tested on the benchmark. The findings reveal that large transformer models outperform the other tools in accuracy, while smaller models offer a practical trade-off for resource-constrained environments. The study provides performance and error analyses, as well as insights into the impact of different training data sizes on the small models.