dc.contributor.advisor Jacob, Elin K. en_US
dc.contributor.author Yu, Ning en_US
dc.date.accessioned 2011-10-19T20:18:12Z
dc.date.available 2011-10-19T20:18:13Z
dc.date.available 2012-03-12T00:06:08Z
dc.date.issued 2011-10-19T20:18:12Z
dc.date.submitted 2011 en_US
dc.identifier.uri http://hdl.handle.net/2022/13690
dc.description Thesis (Ph.D.) - Indiana University, Information Science, 2011 en_US
dc.description.abstract Opinions published on the World Wide Web (Web) offer opportunities for detecting personal attitudes regarding topics, products, and services. The opinion detection literature indicates that both a large body of opinions and a wide variety of opinion features are essential for capturing subtle opinion information. Although a large amount of opinion-labeled data is preferable for opinion detection systems, opinion-labeled data is often limited, especially at sub-document levels, and manual annotation is tedious, expensive and error-prone. This shortage of opinion-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts). While a simple method for improving accuracy in challenging domains is to borrow opinion-labeled data from a non-target data domain, this approach often fails because of the domain transfer problem: Opinion detection strategies designed for one data domain generally do not perform well in another domain. However, while it is difficult to obtain opinion-labeled data, unlabeled user-generated opinion data are readily available. Semi-supervised learning (SSL) requires only limited labeled data to automatically label unlabeled data and has achieved promising results in various natural language processing (NLP) tasks, including traditional topic classification; but SSL has been applied in only a few opinion detection studies. This study investigates application of four different SSL algorithms in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. SSL algorithms are also evaluated for their effectiveness in sparse data situations and domain adaptation. Research findings suggest that, when there is limited labeled data, SSL is a promising approach for opinion detection in Web content. 
Although the contributions of SSL varied across data domains, significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based SSL strategy was implemented. en_US
dc.language.iso en en_US
dc.publisher [Bloomington, Ind.] : Indiana University en_US
dc.rights This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license.
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.subject Semi-Supervised Learning en_US
dc.subject Sentiment Analysis en_US
dc.subject Domain Transfer en_US
dc.subject Opinion Detection en_US
dc.subject blog
dc.subject text mining
dc.subject Co-Training en_US
dc.subject Self-Training en_US
dc.subject.classification Information Science en_US
dc.subject.classification Information Technology en_US
dc.subject.classification Computer Science en_US
dc.title Semi-Supervised Learning For Identifying Opinions In Web Content en_US
dc.type Doctoral Dissertation en_US
