Sign in or 

This may remind you of the chicken and the egg: to determine the importance of a page, we first need to know the importance of all the pages linking to it. However, we may recast the problem into one that is more mathematically familiar. Let's first create a matrix, called the hyperlink matrix,
in which the entry in the ith row and jth column is
Notice that H has some special properties. First, its entries are all nonnegative. Also, the sum of the entries in a column is one unless the page corresponding to that column has no links. Matrices in which all the entries are nonnegative and the sum of the entries in every column is one are called stochastic; they will play an important role in our story. We will also form a vector
whose components are PageRanks--that is, the importance rankings--of all the pages. The condition above defining the PageRank may be expressed as
In other words, the vector I is an eigenvector of the matrix H with eigenvalue 1. We also call this a stationary vector of H. Let's look at an example. Shown below is a representation of a small collection (eight) of web pages with links represented by arrows.
The corresponding matrix is ![]() | with stationary vector | ![]() |
The method is founded on the following general principle that we will soon investigate. | General principle: The sequence I k will converge to the stationary vector I. |
| I 0 | I 1 | I 2 | I 3 | I 4 | ... | I 60 | I 61 |
| 1 | 0 | 0 | 0 | 0.0278 | ... | 0.06 | 0.06 |
| 0 | 0.5 | 0.25 | 0.1667 | 0.0833 | ... | 0.0675 | 0.0675 |
| 0 | 0.5 | 0 | 0 | 0 | ... | 0.03 | 0.03 |
| 0 | 0 | 0.5 | 0.25 | 0.1667 | ... | 0.0675 | 0.0675 |
| 0 | 0 | 0.25 | 0.1667 | 0.1111 | ... | 0.0975 | 0.0975 |
| 0 | 0 | 0 | 0.25 | 0.1806 | ... | 0.2025 | 0.2025 |
| 0 | 0 | 0 | 0.0833 | 0.0972 | ... | 0.18 | 0.18 |
| 0 | 0 | 0 | 0.0833 | 0.3333 | ... | 0.295 | 0.295 |
![]() | with matrix | ![]() |
| I 0 | I 1 | I 2 | I 3=I |
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
. As we surf randomly, we will denote by
the fraction of time that we spend on page Pj. Then the fraction of the time that we end up on page Pi page coming from Pj is
. If we end up on Pi, we must have come from a page linking to it. This means that
where the sum is over all the pages Pj linking to Pi. Notice that this is the same equation defining the PageRank rankings and so
. This allows us to interpret a web page's PageRank as the fraction of time that a random surfer spends on that web page. This may make sense if you have ever surfed around for information about a topic you were unfamiliar with: if you follow links for a while, you find yourself coming back to some pages more often than others. Just as "All roads lead to Rome," these are typically more important pages. Notice that, given this interpretation, it is natural to require that the sum of the entries in the PageRank vector I be one. Of course, there is a complication in this description: If we surf randomly, at some point we will surely get stuck at a dangling node, a page with no links. To keep going, we will choose the next page at random; that is, we pretend that a dangling node has a link to every other page. This has the effect of modifying the hyperlink matrix H by replacing the column of zeroes corresponding to a dangling node with a column in which each entry is 1/n. We call this new matrix S. In our previous example, we now have ![]() | with matrix | ![]() | and eigenvector | ![]() |
and that
We will also assume that there is a basis vj of eigenvectors for S with corresponding eigenvalues
. This assumption is not necessarily true, but with it we may more easily illustrate how the power method works. We may write our initial vector I 0 as
Then
Since the eigenvalues
with
have magnitude smaller than one, it follows that
if
and therefore
, an eigenvector corresponding to the eigenvalue 1. It is important to note here that the rate at which
is determined by
. When
is relatively close to 0, then
relatively quickly. For instance, consider the matrix
The eigenvalues of this matrix are
and
. In the figure below, we see the vectors I k, shown in red, converging to the stationary vector I shown in green.
Now consider the matrix
Here the eigenvalues are
and
. Notice how the vectors I k converge more slowly to the stationary vector I in this example in which the second eigenvalue has a larger magnitude.
and | I 0 | I 1 | I 2 | I 3 | I 4 | I 5 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 1 | 0 |
and so the argument we gave to justify the power method no longer holds. To guarantee that
S is ![]() | with stationary vector | ![]() |
Links come into this box, but none go out. Just as in the example of the dangling node we discussed above, these pages form an "importance sink" that drains the importance out of the other four pages. This happens when the matrix S is reducible; that is, S can be written in block form as
Indeed, if the matrix S is irreducible, we can guarantee that there is a stationary vector with all positive entries. A web is called strongly connected if, given any two pages, there is a way to follow links from the first page to the second. Clearly, our most recent example is not strongly connected. However, strongly connected webs provide irreducible matrices S. To summarize, the matrix S is stochastic, which implies that it has a stationary vector. However, we need S to also be (a) primitive so that
between 0 and 1. Now suppose that our random surfer moves in a slightly different way. With probability
, he is guided by S. With probability
, he chooses the next page at random. If we denote by 1 the
matrix whose entries are all one, we obtain the Google matrix:
Notice now that G is stochastic as it is a combination of stochastic matrices. Furthermore, all the entries of G are positive, which implies that G is both primitive and irreducible. Therefore, G has a unique stationary vector I that may be found using the power method. The role of the parameter
is an important one. Notice that if
, then G = S. This means that we are working with the original hyperlink structure of the web. However, if
, then
. In other words, the web we are considering has a link between any two pages and we have lost the original hyperlink structure of the web. Clearly, we would like to take
close to 1 so that we hyperlink structure of the web is weighted heavily into the computation. However, there is another consideration. Remember that the rate of convergence of the power method is governed by the magnitude of the second eigenvalue
. For the Google matrix, it has been proven that the magnitude of the second eigenvalue
. This means that when
is close to 1 the convergence of the power method will be very slow. As a compromise between these two competing interests, Serbey Brin and Larry Page, the creators of PageRank, chose
.
matrices where n is about 25 billion! In fact, the power method is especially well-suited to this situation. Remember that the stochastic matrix S may be written as
and therefore the Google matrix has the form
Therefore,
Now recall that most of the entries in H are zero; on average, only ten entries per column are nonzero. Therefore, evaluating HI k requires only ten nonzero terms for each entry in the resulting vector. Also, the rows of A are all identical as are the rows of 1. Therefore, evaluating AI k and 1I k amounts to adding the current importance rankings of the dangling nodes or of all web pages. This only needs to be done once. With the value of
chosen to be near 0.85, Brin and Page report that 50 - 100 iterations are required to obtain a sufficiently good approximation to I. The calculation is reported to take a few days to complete. Of course, the web is continually changing. First, the content of web pages, especially for news organizations, may change frequently. In addition, the underlying hyperlink structure of the web changes as pages are added or removed and links are added or removed. It is rumored that Google recomputes the PageRank vector I roughly every month. Since the PageRank of pages can be observed to fluctuate considerably during this time, it is known to some as the Google Dance. (In 2002, Google held a Google Dance!) |
rahuldutt1 |
Latest page update: made by rahuldutt1
, Jun 20 2007, 2:20 PM EDT
(about this update
About This Update
4052 words added 53 images added view changes - complete history) |
|
Keyword tags:
google hack
google search
page
page rank
page rank hack
wjat iis page rank
More Info: links to this page
|
| Started By | Thread Subject | Replies | Last Post | ||
|---|---|---|---|---|---|
| Anonymous | It is good! | 0 | Jul 25 2007, 12:51 AM EDT by Anonymous | ||
|
|
Thread started: Jul 25 2007, 12:51 AM EDT
Watch
It is good! I think so.
|
||||