ORTHOGRAPHIC ENRICHMENT FOR ARABIC GRAMMATICAL ANALYSIS

dc.contributor.advisor: Kuebler, Sandra
dc.contributor.author: Mohamed, Emad Soliman
dc.date.accessioned: 2010-12-13T21:02:34Z
dc.date.available: 2012-02-27T02:22:30Z
dc.date.issued: 2010-12-13
dc.date.submitted: 2010
dc.description: Thesis (Ph.D.) - Indiana University, Linguistics, 2010
dc.description.abstract: Arabic orthography is problematic in two ways: (1) it lacks short vowels, which leads to ambiguity because the same orthographic form can be pronounced in many different ways, each of which can have its own grammatical category; and (2) an Arabic word may contain several units, such as pronouns, conjunctions, articles, and prepositions, without intervening white space. These two problems make the automatic processing of Arabic difficult. The thesis proposes a pre-processing scheme that applies word segmentation and word vocalization for the purpose of grammatical analysis, namely part-of-speech (POS) tagging and parsing. The thesis examines the impact of human-produced vocalization and segmentation on the grammatical analysis of Arabic, then applies a pipeline of automatic vocalization and segmentation to Arabic POS tagging. The pipeline is then used, along with the POS tags produced, for dependency parsing, which produces grammatical relations between the words in a sentence. The study uses a memory-based algorithm for vocalization, segmentation, and POS tagging, and the natural language parser MaltParser for dependency parsing. The thesis represents the first approach to the processing of real-world Arabic, and finds that, through the correct choice of features and algorithms, the need for pre-processing for grammatical analysis can be minimized.
dc.identifier.uri: https://hdl.handle.net/2022/9821
dc.language.iso: en
dc.publisher: [Bloomington, Ind.] : Indiana University
dc.rights: This work is licensed under the Creative Commons Attribution 3.0 Unported (CC BY 3.0) License.
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/
dc.subject: machine learning
dc.subject: computational linguistics
dc.subject: Arabic
dc.subject: parsing
dc.subject: orthography
dc.subject: morphology
dc.subject.classification: Computer Science
dc.subject.classification: Language, Linguistics
dc.title: ORTHOGRAPHIC ENRICHMENT FOR ARABIC GRAMMATICAL ANALYSIS
dc.type: Doctoral Dissertation

Files

Original bundle

Name: Mohamed_indiana_0093A_10807.pdf
Size: 1.51 MB
Format: Adobe Portable Document Format