Boilerplate Detection Using Shallow Text Features
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text is typically unrelated to the main content, may deteriorate search precision, and thus needs to be detected properly. In this paper, we analyze a small set of shallow text features for classifying the individual text elements in a Web page. We compare the approach to complex, state-of-the-art techniques and show that competitive accuracy can be achieved at almost no cost. Moreover, we derive a simple and plausible stochastic model for describing the boilerplate creation process. With the help of our model, we also quantify the impact of boilerplate removal on retrieval performance and show significant improvements over the baseline. Finally, we extend the principled approach with straightforward heuristics, achieving remarkable accuracy.
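To make the idea of classifying text elements by shallow features concrete, here is a minimal sketch in Python. It is not the authors' implementation. The `TextBlock` type, the two features (word count and link density), and both thresholds are assumptions chosen for illustration; the abstract does not specify which features or decision rule the paper uses.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """One text element of a Web page (hypothetical structure for this sketch)."""
    text: str          # visible text of the block
    linked_words: int  # number of words that sit inside hyperlinks


def shallow_features(block: TextBlock) -> dict:
    """Compute two cheap, purely textual features of a block."""
    num_words = len(block.text.split())
    # Share of the block's words that are link text; 0.0 for an empty block.
    link_density = block.linked_words / num_words if num_words else 0.0
    return {"num_words": num_words, "link_density": link_density}


def is_boilerplate(block: TextBlock,
                   min_words: int = 10,
                   max_link_density: float = 0.33) -> bool:
    """Label a block as boilerplate if it is short or dominated by links.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    features = shallow_features(block)
    return (features["num_words"] < min_words
            or features["link_density"] > max_link_density)


nav = TextBlock("Home | About Us | Contact", linked_words=5)
body = TextBlock(
    "In this paper we analyze a small set of shallow text features "
    "for classifying text elements",
    linked_words=0,
)
print(is_boilerplate(nav), is_boilerplate(body))
```

A navigation bar is short and consists mostly of link text, so it is flagged, while a full sentence of running text is not. A learned classifier over such features would replace the hand-set thresholds in practice.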
Channel: VideoLectures
Category: Educational
Date Found: October 11, 2010
Date Produced: October 07, 2010