Dec 13 – 14, 2025 HYBRID
Erzurum, Türkiye
Europe/Istanbul timezone

The Impact of Multi-Tagger Consistency on Semi-Supervised Natural Language Processing Models: A BERT-Based Tagging Approach

Dec 13, 2025, 4:15 PM
15m
D/1-2 - Hall 2 (Campus VSTS)

Oral Presentation | Artificial Intelligence and Machine Learning Applications | Optimization, Control and Decision Making

Speaker

Aleyna KARAOSMAN (Atatürk Üniversitesi)

Description

Manual annotation of large text datasets is both time- and cost-intensive, leading to a growing need for semi-supervised learning methods. Furthermore, inconsistencies among human labelers directly affect the quality of synthetic label generation, because semi-supervised models are sensitive to the initial labels. This study examines the impact of multi-labeler consistency on BERT-based semi-supervised learning models and proposes a holistic framework for statistically modeling labeler reliability, establishing a core training set, and optimizing the synthetic labeling process. The proposed approach involves calculating labeler consistency using methods such as Cohen's Kappa and Dawid-Skene, generating a core dataset of reliable examples, training the BERT model on this dataset, generating synthetic labels for unlabeled data, and redirecting low-confidence examples back to the labelers. The process is further enhanced with consistency adjustment and noise reduction techniques, and a labeling interface is developed for practical use. In conclusion, the study demonstrates that multi-labeler consistency plays a critical role in the stability and accuracy of semi-supervised BERT models and provides a scalable, reliable, and cost-effective automatic labeling infrastructure for large text datasets.
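To make the described pipeline concrete, the minimal sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation. It measures pairwise Cohen's Kappa among annotators, builds a core set from unanimously labeled items (a simple stand-in for the Dawid-Skene reliability model), trains a classifier on that core set, and routes low-confidence pseudo-labels back for manual annotation. The toy data, the TF-IDF + logistic-regression model used in place of BERT fine-tuning, and the 0.8 confidence threshold are all illustrative assumptions.

```python
# Sketch only: inter-annotator agreement, core-set selection, and
# confidence-based routing of pseudo-labels. The model is a light
# TF-IDF + logistic-regression stand-in for BERT fine-tuning.
from itertools import combinations

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Toy multi-annotator data: rows = items, columns = annotators (assumed, not from the paper).
texts = ["great product", "terrible service", "works fine",
         "do not buy", "love it", "meh"]
annotations = np.array([
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])

# 1) Pairwise Cohen's Kappa as a global reliability check.
pair_kappas = [cohen_kappa_score(annotations[:, i], annotations[:, j])
               for i, j in combinations(range(annotations.shape[1]), 2)]
print("mean pairwise kappa:", np.mean(pair_kappas))

# 2) Core set: keep only items where all annotators agree
#    (a crude stand-in for a Dawid-Skene reliability model).
agree_mask = (annotations == annotations[:, [0]]).all(axis=1)
core_texts = [t for t, keep in zip(texts, agree_mask) if keep]
core_labels = annotations[agree_mask, 0]

# 3) Train a classifier on the core set.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(core_texts, core_labels)

# 4) Pseudo-label unlabeled text; route low-confidence items back to annotators.
CONF_THRESHOLD = 0.8  # assumed cut-off, not from the paper
unlabeled = ["absolutely fantastic", "not sure about this one"]
proba = clf.predict_proba(unlabeled)
for text, p in zip(unlabeled, proba):
    if p.max() >= CONF_THRESHOLD:
        print(f"accept pseudo-label {p.argmax()} for: {text!r}")
    else:
        print(f"send back for manual annotation: {text!r}")
```

In a full-scale version, the core-set step would use a weighted aggregation such as Dawid-Skene rather than unanimous agreement, and the classifier would be a fine-tuned BERT model whose softmax confidence drives the routing decision.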

Author

Aleyna KARAOSMAN (Atatürk Üniversitesi)

Co-author

Yasemin Gültepe (Faculty of Engineering, Department of Software Engineering, Atatürk University, Erzurum, Türkiye)
