A comparative analysis of text classification for Turkish language

Yıldırım, Savaş; Yıldız, Tuğba



Pamukkale Univ Muh Bilim Derg. 2018; 24(5): 879-886 | DOI: 10.5505/pajes.2018.15931

Savaş Yıldırım, Tuğba Yıldız
İstanbul Bilgi University, Faculty of Engineering and Natural Sciences, Computer Engineering, İstanbul

Text categorization plays an important role in Natural Language Processing (NLP). The rapid growth of textual data and the need for automatic annotation have made the problem more important. Among traditional methods, the bag-of-words approach has been applied successfully to text categorization for years. More recently, Neural Network Language Models (NNLM) have achieved successful results on various NLP problems. Their most important advantage is that they provide effective word and document representations. These representations are lower dimensional and have been found to be more effective than traditional methods; they have been exploited successfully for semantic and syntactic analysis. On the other hand, traditional bag-of-words approaches, which use long one-hot vector representations, are still considered powerful in terms of document classification accuracy. However, these approaches have not previously been compared for Turkish. In this study, we compared them through a variety of analyses. We observed that the traditional bag-of-words representation, combined with effective feature selection and a machine learning algorithm suited to it, performs comparably to new-generation vector-based methods, namely word embeddings. We conducted various experiments comparing these approaches and identified an effective text categorization architecture for Turkish.

Keywords: Text Classification, Machine Learning, ANN
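
The two document representations contrasted in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the feature selection method, the classifier, or the embedding model, so the toy corpus, the document-frequency-gap score, the 2-dimensional embedding table, and all function names below are hypothetical.

```python
from collections import Counter

# Toy labelled corpus (tokens, label); invented for illustration only.
docs = [
    (["maç", "gol", "takım"], "spor"),
    (["takım", "gol", "lig"], "spor"),
    (["borsa", "dolar", "faiz"], "ekonomi"),
    (["faiz", "dolar", "banka"], "ekonomi"),
]

# Hypothetical 2-dimensional word vectors; real embeddings are learned
# from a large corpus and typically have a few hundred dimensions.
toy_embeddings = {
    "maç": [0.9, 0.1], "gol": [0.8, 0.2], "takım": [0.7, 0.1], "lig": [0.9, 0.2],
    "borsa": [0.1, 0.9], "dolar": [0.2, 0.8], "faiz": [0.1, 0.7], "banka": [0.2, 0.9],
}


def build_vocab(corpus):
    """Sorted list of distinct tokens; one vector dimension per token."""
    return sorted({tok for tokens, _ in corpus for tok in tokens})


def bow_vector(tokens, terms):
    """Bag-of-words count vector with one entry per term in `terms`."""
    counts = Counter(tokens)
    return [counts[term] for term in terms]


def select_features(corpus, vocab, k):
    """Keep the k terms whose document frequency differs most across classes.

    A crude stand-in for a real feature selection score (assumption).
    """
    labels = sorted({label for _, label in corpus})

    def score(term):
        dfs = [
            sum(1 for toks, lab in corpus if lab == c and term in toks)
            for c in labels
        ]
        return max(dfs) - min(dfs)

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-score(t), t))
    return sorted(ranked[:k])


def mean_embedding(tokens, table, dim):
    """Dense dim-dimensional document vector: mean of the known word vectors."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


vocab = build_vocab(docs)
selected = select_features(docs, vocab, k=4)
query = ["gol", "takım", "dolar"]

print(len(vocab), bow_vector(query, vocab))      # one dimension per vocabulary term
print(selected, bow_vector(query, selected))     # reduced by feature selection
print(mean_embedding(query, toy_embeddings, 2))  # dense, fixed low dimension
```

The bag-of-words vector grows with the vocabulary unless feature selection prunes it, while the averaged embedding keeps a fixed, small dimension regardless of vocabulary size. Either vector could then be passed to a classifier, which this sketch leaves out.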

Türkçe için karşılaştırmalı metin sınıflandırma analizi (A comparative text classification analysis for Turkish)

Savaş Yıldırım, Tuğba Yıldız
İstanbul Bilgi University, Faculty of Engineering and Natural Sciences, Computer Engineering, İstanbul

Text classification holds an important place in the field of Natural Language Processing (NLP). Recently, the growth of textual data and the need to label it automatically have increased the importance of the text classification problem. The bag-of-words method, prominent among the traditional approaches, has been successful in text classification for years. Recently, neural network language models have been applied successfully to NLP problems and have achieved great success in some areas. The most important advantage of architectures based on Artificial Neural Networks (ANN) is that they produce more effective word and text representations. These representations have been found to be lower dimensional and more effective than traditional methods. Successful applications have been made particularly in semantic and syntactic analysis. On the other hand, traditional bag-of-words methods, which use longer vector representations, still hold their ground as text representations. However, no comparison of these two approaches has been made for Turkish. In this study, the traditional bag-of-words approach and the new neural-network-based representation approaches were compared in terms of text classification. We observed in these experiments that traditional methods with effective feature selection can still compete with the new-generation word embeddings approach. Finally, we report experiments varied across these two approaches and discuss in detail a successful text classification architecture for Turkish.

Keywords: Text Classification, Machine Learning, ANN

Savaş Yıldırım, Tuğba Yıldız. A comparative analysis of text classification for Turkish language. Pamukkale Univ Muh Bilim Derg. 2018; 24(5): 879-886

Corresponding Author: Savaş Yıldırım, Türkiye
Manuscript Language: Turkish
