Products

The Site Update Notification suite of products for Search Engines and Crawlers, Web Masters and Hosting Services Companies provides a range of features and a resilient mechanism for delivering dynamic information about site updates.

This alternative to the traditional polling of sites distributes the burden across many servers and increases efficiency by reducing bandwidth requirements across the internet.

Goals

The Site Update Notification System seeks to ...

... address the problem with crawler scalability by distributing the work of the crawler among the web servers that host the content. With the system in place, it is no longer necessary for the crawler to access – and possibly download – every page on a site to determine if it has changed.

... reduce crawler bandwidth and index load by eliminating changes that are generated through the delivery or presentation of the page but do not affect the referenceable information on the page. Such changes were discussed in the case study, where a document delivered to a web crawler over HTTP changed between successive downloads although the source files had not changed. Client-side agents would not deliver any change notification in situations such as these.

... reduce the time that a web site’s change takes to become available in a search engine’s index. News and current affairs sites are updated constantly. In the UK, the national broadcaster, the BBC, advertises its news web site as “updated every minute”. It is unrealistic to expect a crawler to keep up with this turnaround of content, but a client-side agent is more than capable of feeding update information at this frequency.

... address the problem crawlers currently face in accessing the Deep Web. While current research centres on trying to penetrate the HTML-form access to deep web content, the Site Update Notification System provides a mechanism to access the Deep Web by working with content publishers.

... provide an infrastructure for agent data aggregation and delivery to subscribers that reduces the bandwidth load on individual web sites and provides a mechanism for wider exposure of changing content to customers via RSS feeds.
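The second goal above, ignoring changes that come only from delivery or presentation, can be illustrated with a minimal sketch. It assumes a client-side agent that fingerprints the source content rather than the page as served. The names `UpdateAgent`, `fingerprint` and `render`, and the shape of the notification, are hypothetical and are not taken from the Site Update Notification products.

```python
import hashlib


def fingerprint(source: bytes) -> str:
    """Return a SHA-256 digest of the source content (hypothetical helper)."""
    return hashlib.sha256(source).hexdigest()


def render(source: bytes, served_at: str) -> bytes:
    """Hypothetical presentation layer: appends a per-request timestamp,
    so the delivered bytes differ even when the source does not."""
    return source + b"<!-- served " + served_at.encode("ascii") + b" -->"


class UpdateAgent:
    """Hypothetical client-side agent that reports source changes only."""

    def __init__(self) -> None:
        self._seen = {}  # maps path -> last fingerprint reported

    def check(self, path: str, source: bytes):
        """Return a notification dict if the source changed, else None.

        The first sighting of a path counts as a change.
        """
        digest = fingerprint(source)
        if self._seen.get(path) == digest:
            return None  # source unchanged: no notification
        self._seen[path] = digest
        return {"path": path, "fingerprint": digest}


if __name__ == "__main__":
    agent = UpdateAgent()
    src = b"<p>Headline</p>"
    print(agent.check("/news", src))  # first sighting: notification
    print(agent.check("/news", src))  # unchanged source: None
    # Two deliveries of the same source differ at the HTTP level.
    print(render(src, "10:00") == render(src, "10:01"))
```

Because the agent compares fingerprints of the source, two deliveries that differ only in the timestamp added at render time produce no notification, whereas a crawler comparing downloaded pages would see a change.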

 

 

Read More...


Products for Search Engine Companies and others that Crawl the Web

Products for Web Masters

Products for Web Hosting Companies