3.1 Web Search Engines

As web crawlers scour the Internet's web sites for content, they catalog pieces of potentially useful information. Search engines, such as Google, now provide advanced search functions that allow attackers to build a clearer picture of the network that they plan to attack later.

In particular, the following types of information are easily found:

  • Employee contact details and information

  • Email addresses

  • DDI telephone numbers

  • Physical addresses of the offices where employees are based

  • Details of internal email systems

  • DNS layout and naming convention, including domains and hostnames

  • Documents that reside on publicly accessible servers

Direct-dial telephone numbers are especially useful to determined attackers, who may later launch war dialing and other telephone-based attacks. It is very difficult for organizations and companies to prevent this information from being ascertained; for example, it is made freely available every time a user posts to a mailing list with a signature. To manage this risk more effectively, companies should periodically run public record querying exercises themselves to ensure that the information an attacker can collect doesn't lead to a compromise.
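As a minimal sketch of one step in such an exercise, the following Python fragment pulls email addresses and telephone-like strings out of a block of text, such as a mailing list post that ends with a signature. The regular expressions, the extract_contacts function name, and the sample post are all assumptions made for illustration; real address and telephone formats vary far more than these patterns cover.

```python
import re

# Deliberately simple patterns (illustrative assumptions only); they will
# miss unusual formats and may match strings that are not contact details.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return the unique email addresses and phone-like strings found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted(set(m.strip() for m in PHONE_RE.findall(text)))
    return {"emails": emails, "phones": phones}


# Hypothetical mailing list post; the name, address, and number are made up.
sample_post = """Thanks for the patch.
--
J. Smith, Network Operations
jsmith@example.org
Direct: +1 (555) 010-0199
"""

contacts = extract_contacts(sample_post)
print(contacts["emails"])
print(contacts["phones"])
```

Running the same function over pages and posts that mention your own organization gives a rough inventory of the contact details an attacker could collect.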

3.1.1 Google Advanced Search Functionality

Google's powerful advanced search function can be used to indirectly map networks and gather potentially useful information. The advanced search function is directly accessible at http://www.google.com/advanced_search?hl=en. Searches can be refined in the following ways:

Filtering words

Exclude pages that don't include specific words or phrases

Language

Filter results using over 30 specific languages

File format

Search for text strings within supported file types, such as:

  • Adobe PDF (.pdf)

  • Adobe PostScript (.ps)

  • Microsoft Word and Rich Text Format (.doc and .rtf)

  • Microsoft Excel (.xls)

  • Microsoft PowerPoint (.ppt)


Occurrences

Search for a text string in specific areas of a document:

  • Title of the document

  • Body text of the document

  • Links within the document


Domains

Search under specific domains

Enumerating CIA contact details with Google

Google can easily enumerate staff at the CIA, along with their email addresses and their telephone and fax numbers. Figure 3-1 shows a Google search launched using the search string:

+"ucia.gov" +tel +fax

Figure 3-1. Using Google to enumerate users

Effective search query strings

The possibilities are virtually endless with Google searches, depending on the exact type of data you are trying to mine. For example, if you simply want to enumerate all the web servers Google knows under the sony.com domain, you can submit a query string such as sony site:.sony.com.
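Query strings of this kind are easy to compose programmatically when you need to repeat a search across many domains. The sketch below assembles a query from free terms plus a site: restriction and URL-encodes it. The build_query and search_url helpers are hypothetical names, and the https://www.google.com/search base URL with its q parameter is an assumption about the search endpoint rather than something defined in this section.

```python
from urllib.parse import urlencode


def build_query(*terms, site=None):
    """Join free search terms and an optional site: restriction."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query string; the base URL is an assumed endpoint."""
    return f"{base}?{urlencode({'q': query})}"


# Reproduce the sony.com example from the text.
q = build_query("sony", site=".sony.com")
url = search_url(q)
print(q)
print(url)
```

The sketch only builds the strings and sends nothing; submit the resulting URL through a browser, or through whatever interface the search provider permits for automated use.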

An effective security-related application of a Google search is to list misconfigured web servers whose directories can be indexed and browsed freely. Figure 3-2 displays the results of the following search string:

allintitle: "index of /" site:.redhat.com

Figure 3-2. Identifying indexed web directories under *.redhat.com

Web directories that provide file listings often contain interesting files that aren't web-related (such as Word and Excel documents). One example is a large bank that stored its BroadVision rollout plans (including IP addresses and administrative usernames and passwords) in an indexed /cmc_upload/ directory. An automated scanner, such as N-Stealth, can't identify such a directory, but Google can reach it by following links from elsewhere on the Internet.
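When auditing your own servers, you can apply the same title-based test that the allintitle: search relies on. The following sketch parses an HTML page and reports whether its title begins with "Index of /". It assumes that the server generates listing pages with a title of that form, which is what the search string above matches on; the TitleParser class, the looks_like_directory_index function, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_directory_index(html):
    """Return True if the page title starts with 'Index of /' (any case)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().lower().startswith("index of /")


# Made-up pages: one generated listing and one ordinary home page.
listing = "<html><head><title>Index of /cmc_upload</title></head><body></body></html>"
normal = "<html><head><title>Welcome</title></head><body></body></html>"

print(looks_like_directory_index(listing))
print(looks_like_directory_index(normal))
```

A check like this only flags pages you already know how to fetch, so it complements a search engine query rather than replacing it.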

Netcraft (http://www.netcraft.com) is another web querying site that actively scours Internet web sites. You can use it to map web farms and networks, as well as display the operating platform of each host and details of the web services running.

3.1.2 Searching Newsgroups

Internet newsgroup searches reveal much the same types of information as web searches. For example, using http://groups.google.com, you can issue a query of fedworld.gov, revealing usernames, machine names, accessible public servers, and other information, as depicted in Figure 3-3.

Figure 3-3. Searching Usenet newsgroups through Google

After conducting web and newsgroup searches, you should have an initial understanding of the target networks in terms of domain names and offices. NIC and DNS querying are used next to probe further and identify Internet-based points of presence, along with details of hostnames and operating platforms used.