Search Engine Update System; Reducing capacity and bandwidth burden of web crawlers
August 2008
The fundamental approach to maintaining a search engine's index of the web has not changed since the earliest search engines. Implementations, user interfaces and underlying algorithms have all developed, but the core technique of crawling web sites for content remains the same. The web is now so large that crawlers cannot visit sites often enough for the index to reflect their current content. For web masters the problem is large enough that a new market for Search Engine Optimisation has evolved to address their desire to be ranked first on the leading search engines' results pages for relevant queries. Even so, they remain almost powerless to increase the frequency of crawler visits and so ensure that their most up-to-date content is represented.
This paper assesses the scale of the problem facing organisations that attempt to index or monitor the information published on the World Wide Web. It also explores the opportunities missed because content is hidden within inaccessible parts of the web, which have become known as the Deep Web.
A system of Site Update Notifications is described, through which web masters can automatically feed data about content and presentation updates from their web servers to interested parties. This alternative to the traditional polling of sites distributes the burden and improves efficiency by reducing bandwidth requirements across the internet.
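The contrast between polling and notification can be illustrated with a small simulation. The sketch below is not the protocol described in this paper; the names `Site`, `PollingIndexer`, `NotifiedIndexer` and `simulate`, the example URLs, and the chosen site and update counts are all hypothetical, and each request or message is assumed to cost the same. It only counts how many network exchanges each approach would need when a small fraction of sites change between crawls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Site:
    """A hypothetical web site that can push update notifications."""
    url: str
    version: int = 0
    subscribers: List[Callable[[str, int], None]] = field(default_factory=list)

    def publish(self) -> None:
        # Content changed: bump the version and notify every subscriber.
        self.version += 1
        for notify in self.subscribers:
            notify(self.url, self.version)


class PollingIndexer:
    """Traditional crawler: requests every site on every pass."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.requests = 0

    def crawl(self, sites: List[Site]) -> None:
        for site in sites:
            self.requests += 1  # one request per site, changed or not
            self.index[site.url] = site.version


class NotifiedIndexer:
    """Indexer that only hears from sites that have actually changed."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.messages = 0

    def on_update(self, url: str, version: int) -> None:
        self.messages += 1  # one message per real update
        self.index[url] = version


def simulate(num_sites: int = 100, rounds: int = 10,
             changed_per_round: int = 5) -> Tuple[int, int]:
    """Return (polling requests, notification messages) for the same workload."""
    sites = [Site(f"http://example.org/site{i}") for i in range(num_sites)]
    poller = PollingIndexer()
    listener = NotifiedIndexer()
    for site in sites:
        site.subscribers.append(listener.on_update)

    for r in range(rounds):
        # A few sites change in each round; the choice of sites is arbitrary.
        for k in range(changed_per_round):
            sites[(r * changed_per_round + k) % num_sites].publish()
        # The poller must still visit every site to discover what changed.
        poller.crawl(sites)

    return poller.requests, listener.messages


if __name__ == "__main__":
    polled, notified = simulate()
    print(f"polling requests: {polled}, notifications: {notified}")
```

Under these assumptions the polling cost grows with the number of sites multiplied by the number of crawl passes, whereas the notification cost grows only with the number of actual updates. The model ignores message size, failed deliveries and sites that do not send notifications.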
Keywords: Spider, web crawler scalability, optimization, change notification, site update notification
Published online
Henderson, C. 2008. Search Engine Update System; Reducing capacity and bandwidth burden of web crawlers: Online at