Uses

There are two broad contexts in which this link awareness comes into play:

  1. Against the backdrop of today's published web content.
  2. Looking further ahead, against a backdrop of web content designed for link-aware users.

In the first case, a link-aware user finds content by exploring a locality of interlinked documents. For example, starting from a given target document, the user finds related documents by looking for documents that are cocited with it at dmoz.org. Loosely, two documents are cocited when some web page links to both of them; the documents cocited with a target are therefore those linked from a web page that also contains a link to the target.

Figure 2.2. Cocited Documents Are Often Related

Figure 2.2 illustrates the cocitation scenario. The user starts from a target document (represented by the black box) and, using Ila, follows one of its backlinks to a web page maintained at dmoz.org (represented by the red box). Then, by examining the forelinks of that page, the user finds the cocited documents. This search strategy underlies many related-page search algorithms.
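The cocitation strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ila's implementation: it assumes the link graph is already available as an in-memory map from each page's URL to the set of URLs it links to (its forelinks), derives backlinks by inverting that map, and ranks each cocited document by how many citing pages it shares with the target. The optional `host` filter stands in for restricting the search to directory pages such as those at dmoz.org. The function names `invert`, `on_host` and `cocited`, and all example URLs, are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlparse


def invert(forelinks: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build a backlink map (document -> pages linking to it) from a forelink map."""
    backlinks: dict[str, set[str]] = {}
    for page, targets in forelinks.items():
        for doc in targets:
            backlinks.setdefault(doc, set()).add(page)
    return backlinks


def on_host(url: str, host: str) -> bool:
    """True if url is served from host or one of its subdomains."""
    name = urlparse(url).hostname or ""
    return name == host or name.endswith("." + host)


def cocited(target: str, forelinks: dict[str, set[str]],
            host: str | None = None) -> list[tuple[str, int]]:
    """Rank documents cocited with target by the number of citing pages they share.

    If host is given (e.g. "dmoz.org"), only citing pages on that host are used.
    """
    backlinks = invert(forelinks)
    counts: Counter[str] = Counter()
    for citing_page in backlinks.get(target, set()):
        if host is not None and not on_host(citing_page, host):
            continue  # skip backlinks from outside the chosen directory site
        for doc in forelinks.get(citing_page, set()):
            if doc != target:
                counts[doc] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical link graph: page -> forelinks.
graph = {
    "http://dmoz.org/Computers/Hardware/": {"http://a.example/", "http://b.example/", "http://target.example/"},
    "http://dmoz.org/Computers/Storage/": {"http://b.example/", "http://target.example/"},
    "http://blog.example/post": {"http://c.example/", "http://target.example/"},
}
print(cocited("http://target.example/", graph, host="dmoz.org"))
# [('http://b.example/', 2), ('http://a.example/', 1)]
```

Counting shared citing pages, rather than returning an unordered set, is one common way to turn cocitation into a relevance score: a document that appears alongside the target on several directory pages ranks above one that appears only once.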

Ila represents each web resource (generally speaking, a URI) with a Node data structure that holds meta-information about the resource: its links, link statistics, existence status, various timestamps, and so forth. This information can be used to implement a variety of search algorithms, and those algorithms can in turn be customized and refined for specific types of information.
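A record of this kind might look roughly like the following Python sketch. The field names (`forelinks`, `backlinks`, `exists`, `last_checked`, `last_modified`) are hypothetical stand-ins for the categories of information listed above, not Ila's actual Node definition. The `live_backlinks` helper, also hypothetical, shows how a search algorithm might combine link data with existence status by ignoring backlinks from pages that are known to be dead or have not been checked yet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Node:
    """Hypothetical per-URI record; field names are illustrative, not Ila's."""
    uri: str
    forelinks: set[str] = field(default_factory=set)  # URIs this resource links to
    backlinks: set[str] = field(default_factory=set)  # URIs known to link here
    exists: Optional[bool] = None                     # None means not yet checked
    last_checked: Optional[datetime] = None           # when existence was last verified
    last_modified: Optional[datetime] = None          # server-reported, if known

    @property
    def in_degree(self) -> int:
        """Simple link statistic: number of known backlinks."""
        return len(self.backlinks)

    @property
    def out_degree(self) -> int:
        """Simple link statistic: number of forelinks."""
        return len(self.forelinks)


def live_backlinks(node: Node, index: dict[str, Node]) -> list[str]:
    """Backlinks whose source page is in the index and confirmed to exist."""
    return sorted(
        uri for uri in node.backlinks
        if (src := index.get(uri)) is not None and src.exists is True
    )


index = {
    "http://a.example/": Node("http://a.example/", exists=True),
    "http://gone.example/": Node("http://gone.example/", exists=False),
}
target = Node("http://target.example/",
              backlinks={"http://a.example/", "http://gone.example/", "http://unknown.example/"})
print(target.in_degree, live_backlinks(target, index))
# 3 ['http://a.example/']
```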

Consider, for example, that many web storefronts describe the products they sell by linking to the manufacturer's web site. Many computer hardware retailers do this (see Figure 2.3): instead of describing a product's technical specifications on their own web site, they refer users to the manufacturer's spec page. Consequently, some of a spec page's backlinks will come from hardware retailers. This existing linking habit can be exploited to craft an algorithm for finding retailers that sell a specific hardware component: given an OEM's spec page for that component, you navigate backward along its backlinks to the retailers that sell it.
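The backward navigation described above might be sketched as follows. The sketch assumes a backlink map (spec page URL to the set of pages linking to it) is already available, and it assumes some way of deciding whether a citing page belongs to a retailer. That decision is delegated to a caller-supplied predicate; the example predicate `host_in` simply checks the page's host against a hypothetical list of known retailer hosts. A practical classifier would need stronger evidence than a hard-coded list, and all names and URLs here are illustrative.

```python
from typing import Callable
from urllib.parse import urlparse


def find_retailers(spec_page: str,
                   backlinks: dict[str, set[str]],
                   is_retailer: Callable[[str], bool]) -> list[str]:
    """Navigate backward from an OEM spec page to citing pages judged to be retailers."""
    return sorted(page for page in backlinks.get(spec_page, set()) if is_retailer(page))


def host_in(known_hosts: set[str]) -> Callable[[str], bool]:
    """Hypothetical classifier: a page is a retailer if its host is in known_hosts."""
    def predicate(url: str) -> bool:
        return (urlparse(url).hostname or "") in known_hosts
    return predicate


KNOWN_RETAILERS = {"shop.example", "parts.example"}  # hypothetical retailer hosts

backlinks = {
    "http://oem.example/specs/x100": {
        "http://shop.example/item/42",
        "http://parts.example/x100",
        "http://review.example/x100",  # links to the spec page but is not a retailer
    },
}
print(find_retailers("http://oem.example/specs/x100", backlinks, host_in(KNOWN_RETAILERS)))
# ['http://parts.example/x100', 'http://shop.example/item/42']
```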

Figure 2.3. Hardware Retailers Link to OEM Spec Pages

Now, clearly, the computer hardware retailers in the example above do not currently link their web pages to the manufacturers' spec pages for the benefit of link-aware users: they are simply piggybacking on the spec information available at the OEM's web site. However, if our fictitious search algorithm became widely popular, it is not a stretch to imagine some retailers deliberately linking to spec pages just to advertise that they sell that piece of hardware. This illustrates a general point: a successful search algorithm gives publishers an incentive to change their publishing habits so as to skew the algorithm in their favor. From the publishers' perspective, the desired effect of this skewing is to have a given web page rank highly in the algorithm's search results.