Over the years, the Web has grown from millions to billions of pages, but how big is it now, in 2008, 10 years after Google first indexed the Web's 26 million pages? The search giant has hit a new milestone, counting 1 trillion unique URLs on the Web.
The impressive number doesn't include duplicates, Google said, adding that the number of individual pages on the Web is growing by several billion pages a day.
Google arrived at the 1 trillion figure by starting from a set of initial pages and following each of their links to new pages. Many pages have multiple URLs with the same content, or URLs that are auto-generated copies of each other, but Google said those duplicates were not included in the count.
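The counting approach described above can be illustrated with a minimal sketch. It is not Google's crawler: the in-memory `TOY_WEB` dictionary, the example URLs, and the `count_unique_pages` function are all hypothetical, and exact-duplicate detection by hashing page content is an assumption made for illustration. The sketch starts from seed pages, follows links breadth-first, and counts a URL only if its content has not been seen before.

```python
from collections import deque
import hashlib

# Hypothetical toy "web": URL -> (page content, outgoing links).
TOY_WEB = {
    "a.example/": ("home", ["a.example/about", "a.example/index.html"]),
    # Same content as "a.example/", reachable under a second URL.
    "a.example/index.html": ("home", ["a.example/about"]),
    "a.example/about": ("about us", ["b.example/"]),
    "b.example/": ("another site", ["a.example/"]),
}


def count_unique_pages(web, seeds):
    """Crawl breadth-first from the seeds; return URLs with unseen content."""
    seen_urls = set(seeds)   # URLs already queued, so each is visited once
    seen_content = set()     # content hashes, used to skip exact duplicates
    queue = deque(seeds)
    unique = []
    while queue:
        url = queue.popleft()
        if url not in web:   # a link that points nowhere in the toy web
            continue
        content, links = web[url]
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen_content:
            seen_content.add(digest)
            unique.append(url)
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return unique


unique_urls = count_unique_pages(TOY_WEB, ["a.example/"])
print(len(unique_urls))  # prints 3: index.html duplicates the home page
```

In this toy example four URLs are reachable from the seed, but only three are counted, because two of them serve identical content.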
But how many unique pages does the Web contain? Google explained that an exact count is simply impossible, since the number of pages is, strictly speaking, infinite, so it relies on its search engine's index to measure what it can.
“To keep up with this volume of information, our systems have come a long way since the first set of web data Google processed to answer queries,” the search giant said.
In the past, Google processed the 26 million pages in a matter of hours and used the result as its index for a predetermined period of time. Now the technique has changed: Google continuously downloads and updates page information and re-processes the entire Web link graph several times a day.
Google compared its daily work to exploring every intersection of every road in the United States, except that the map would be 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.
“As you can see, our distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search,” the search giant concluded.