The Site Update Notification suite of products for search engines and crawlers, webmasters and hosting service companies provides a range of features and a resilient mechanism for delivering dynamic information about site updates.
This alternative to the traditional polling of sites distributes the burden of change detection across the many servers that host the content, and it increases efficiency by reducing bandwidth requirements across the Internet.
The Site Update Notification System seeks to ...
... address the problem of crawler scalability by distributing the work of the crawler among the web servers that host the content. With the system in place, the crawler no longer needs to access, and possibly download, every page on a site to determine whether it has changed.
... reduce crawler bandwidth and index load by suppressing notifications for changes that arise from the delivery or presentation of a page but do not affect the referenceable information on it. Such changes were discussed in the case study, in which a document delivered to a web crawler over HTTP changed between successive downloads even though its source files had not changed. A client-side agent would not issue a change notification in situations such as these.
... reduce the time it takes for a change to a web site to become available in a search engine’s index. News and current affairs sites are updated constantly; in the UK, the national broadcaster, the BBC, advertises its news web site as “updated every minute”. It is unrealistic to expect a crawler to keep up with this turnover of content, but a client-side agent is more than capable of feeding update information at this frequency.
... address the problem crawlers currently face in accessing the Deep Web. While current research centres on penetrating the HTML forms that stand in front of Deep Web content, the Site Update Notification System provides a mechanism for accessing the Deep Web by working with content publishers.
... provide an infrastructure for aggregating agent data and delivering it to subscribers, reducing the bandwidth load on individual web sites and giving changing content wider exposure to customers through RSS feeds.
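The goal of ignoring changes that arise only from delivery or presentation can be illustrated with a minimal Python sketch of a client-side agent, assuming the agent runs on the web server with read access to the site's source files. Instead of comparing successive HTTP responses, it compares SHA-256 digests of the source files against a stored snapshot. The function names, the snapshot format and the use of hashing are all hypothetical; the text does not specify how agents detect changes.

```python
import hashlib
from pathlib import Path

# Hypothetical agent-side helpers: hashing source files is an assumed
# detection strategy, not one specified by the Site Update Notification System.


def fingerprint(path: Path) -> str:
    """Return the SHA-256 digest of a source file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(
    source_root: Path, previous: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare source-file digests under source_root with a previous snapshot.

    Returns (changed, removed, current): relative paths that are new or whose
    content differs, relative paths that have disappeared, and the snapshot to
    keep for the next run. Only source bytes are hashed, so output that varies
    per request (timestamps, session IDs, rotating adverts) cannot trigger a
    notification on its own.
    """
    current: dict[str, str] = {}
    changed: list[str] = []
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source_root).as_posix()
        current[rel] = fingerprint(path)
        if previous.get(rel) != current[rel]:
            changed.append(rel)
    removed = sorted(set(previous) - set(current))
    return changed, removed, current
```

An agent built this way would call `detect_changes` periodically, persist the returned snapshot, and send notifications only for the paths in `changed` and `removed`. Mapping file paths to public URLs is left out, since it depends on the server configuration.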
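The aggregation-and-delivery goal can be sketched as a function that turns a batch of agent notifications into an RSS 2.0 feed using only the Python standard library. The `UpdateNotification` record, the `build_rss` function and the guid scheme are hypothetical illustrations; the text does not define the notification format or how feeds are generated.

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class UpdateNotification:
    """A single change report from a client-side agent (hypothetical shape)."""
    url: str
    title: str
    changed_at: datetime


def build_rss(channel_title: str, channel_link: str, description: str,
              notifications: list[UpdateNotification]) -> str:
    """Serialise aggregated notifications as an RSS 2.0 document, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    for note in sorted(notifications, key=lambda n: n.changed_at, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = note.title
        ET.SubElement(item, "link").text = note.url
        # Assumed guid scheme: URL plus change timestamp, so each update of the
        # same page is a distinct item; it is not a permalink, hence the flag.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{note.url}#{int(note.changed_at.timestamp())}"
        # RSS 2.0 expects RFC 822 dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(note.changed_at)
    return ET.tostring(rss, encoding="unicode")
```

Because one feed can carry notifications from many sites, subscribers poll a single aggregated document rather than each site individually, which is the bandwidth saving the goal describes.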