Looking for a Haystack: Selecting Data Sources in a Distributed Retrieval System

Field | Value | Language
dc.contributor.advisor | Leake, David B. | en
dc.contributor.advisor | Gasser, Michael | en
dc.contributor.author | Scherle, Ryan | en
dc.date.accessioned | 2010-06-01T21:59:27Z |
dc.date.available | 2010-10-19T17:51:35Z |
dc.date.issued | 2010-06-01 |
dc.date.submitted | 2006 | en
dc.description | Thesis (PhD) - Indiana University, Computer Sciences, 2006 | en
dc.description.abstract | The Internet contains billions of documents and thousands of systems for searching over these documents. Searching for a useful document can be as difficult as the proverbial search for a needle in a haystack. Each search engine provides access to a different collection of documents. Collections may be large or small, focused or comprehensive. Focused collections may be centered on any possible topic, and comprehensive collections typically have particular topical areas with higher concentrations of documents. Some of these collections overlap, but many documents are available from only a single collection. To find the most needles, one must first select the best haystacks. This dissertation develops a framework for automatic selection of search engines. In this framework, the collection underlying each search engine is examined to determine how properties such as central topic, size, and degree of focus affect retrieval performance. When measured with appropriate techniques, these properties may be used to predict performance. A new distributed retrieval algorithm that takes advantage of this knowledge is presented and compared to existing retrieval algorithms. | en
dc.identifier.uri | https://hdl.handle.net/2022/7473 |
dc.language.iso | EN | en
dc.publisher | [Bloomington, Ind.] : Indiana University | en
dc.subject | distributed information retrieval | en
dc.subject | collection selection | en
dc.subject | search engine | en
dc.subject | metasearch | en
dc.subject | database selection | en
dc.subject | information retrieval | en
dc.subject.classification | Computer Science (0984) | en
dc.subject.classification | Information Science (0723) | en
dc.title | Looking for a Haystack: Selecting Data Sources in a Distributed Retrieval System | en
dc.type | Doctoral Dissertation | en

Files

Original bundle

Name: umi-indiana-1545.pdf
Size: 702.98 KB
Format: Adobe Portable Document Format