Google Grows Its Internet Index

Google continues to refine the way it indexes and displays information on the Internet to improve the search experience for its users. In the latest development, launched this week, Google can now display images from within PDF documents.

Google is still only testing the new technology, which means that not all users will see PDF images within the results. However, it is likely that the update will be rolled out to all users very soon.

For many years, Google focused on reading HTML web pages only; content in any other format was inaccessible to Googlebot. However, in the last few years Google started indexing Word documents and PDF files, which has massively increased the amount of information in its index.

Although Google could read and index the text within PDF files, until now it was unable to recognise and index individual images from a PDF document. You can read Google’s announcement here.

Many businesses and organizations store some of their most useful information in PDF format, which in the past meant that the information was invisible to searchers. Often, the best-quality information on a particular topic is only available in a PDF document, so Google has had to return lower-quality information and images to searchers.

Google is currently focusing on improving the quality of the results returned for all searches. To do so, Google has made some dramatic changes to its search engine. In SEO circles, the most talked-about updates are Panda and Penguin. These tackle low-quality information and web spam respectively, with the aim of removing spammy websites from the search results.

The current testing of images within PDFs suggests that Google has moved on to the next level of improving quality – by making the best information, regardless of how it is stored, more visible.

When this update is fully live, we expect to see many new images within the search results. These may include images from rare books that have been transferred to digital format, as well as figures and tables from scientific research papers.

Google has once again opened up a whole new world of information to its users with this new development. Many people search Google and assume, understandably so, that Google indexes everything on the Internet. When a piece of information, a document or an image cannot be found, the common response is to assume that it is simply not “out there” on the Internet. However, this is rarely the case.

Every day thousands of new documents are added to the Internet, and many of these are not in a standard web format. HTML pages, which make up the majority of the World Wide Web, are easily crawled and indexed by Google. But individual files, often in PDF or Word format, which themselves contain tens or hundreds of pages, images, tables and figures, are often hidden from Google and therefore “invisible” to searchers.

The question “is it on the Internet?” is now so entrenched that many people forget that what they are really asking is whether the document has been indexed by the Google search engine. While the two are still not synonymous, Google is making a great effort to close the gap.

Danny Hall is a co-director of the well-established digital marketing agency FSE Online. Danny has worked within the SEO sector for many years and keeps up to date with Google’s latest news.