Volume 1, Issue 1, May 2013, Page: 1-6
Rule-Based Sentence Detection Method (RBSDM) for Turkish
Özlem AKTAŞ, Computer Engineering Department, Dokuz Eylul University, Izmir, Turkey
Yalçın ÇEBİ, Computer Engineering Department, Dokuz Eylul University, Izmir, Turkey
Received: Mar. 20, 2013; Published: May 2, 2013
DOI: 10.11648/j.ijll.20130101.11
The first step in generating a corpus that is representative of a language is determining sentence boundaries, which is a complicated and hard problem but an important part of corpus generation. Different approaches have been tried for finding sentence boundaries in some languages. For Turkish, the best-known approaches rely on statistics and machine learning. In this study, a rule-based method called “Rule-Based Sentence Detection Method for Turkish (RBSDM)” was developed to determine sentence boundaries in contemporary Turkish, taking into account the agglutinative and rule-based structure of the language. The method was tested on two test sets built from randomly selected columns from two Turkish newspapers. RBSDM determines sentence endings correctly and efficiently in terms of time and other costs, with a success rate between 99.60% and 99.80%.
Linguistics, Natural Language Processing, Corpus, Turkish, Morphological Analysis, Sentence Boundary Detection
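To make the idea of rule-based sentence boundary detection concrete, the following is a minimal Python sketch of the general technique. It is not RBSDM itself: the abstract does not list the method's rules, so the three rules, the `ABBREVIATIONS` set, and the `split_sentences` function below are all illustrative assumptions.

```python
import re

# Hypothetical abbreviation list, for illustration only (not taken from the paper).
ABBREVIATIONS = {"dr", "prof", "vb", "vs", "bkz"}


def split_sentences(text):
    """Split text at '.', '!' or '?' when a few hand-written rules allow it."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+", text):
        end = match.end()
        # Assumed rule 1: the punctuation must be followed by whitespace
        # or by the end of the text (this skips decimals such as "3.5").
        if end < len(text) and not text[end].isspace():
            continue
        # Assumed rule 2: a single period after a known abbreviation
        # does not end a sentence.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue
        # Assumed rule 3: whatever follows must look like a sentence start
        # (uppercase letter, digit, or opening quote/bracket).
        rest = text[end:].lstrip()
        if rest and not (rest[0].isupper() or rest[0].isdigit() or rest[0] in "\"'“("):
            continue
        sentences.append(text[start:end].strip())
        start = end
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Ayşe 3.5 km yürüdü. Sonra eve döndü! Saat kaçtı?"
    for sentence in split_sentences(sample):
        print(sentence)
```

On the sample text, the period after "Dr" and the one inside "3.5" are rejected as boundaries, so the text is split into three sentences. A method designed for Turkish would presumably also use morphological cues, which this sketch does not attempt.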
