Wednesday, November 4, 2009

Web server technology detection by using known filetypes

Web servers rely on server-side scripting engines to run web applications. Based on the web server configuration, dynamic pages are identified by their filename extensions and handed off to the relevant processors.

Below is an example of the IIS 6.0 application configuration. As you can see, .asp files are processed by asp.dll and .php files are handled by the php-cgi application. Other web servers use similar extension mappings.

IIS 6 application configuration
In some situations, web servers respond differently to nonexistent files with known extensions. Sending requests for random filenames with known extensions and comparing the HTTP responses may therefore reveal the server-side scripting technologies supported by the web server.

While scanning customer networks, I have seen various web servers respond differently to many known extensions, including asp, aspx, cfm, php, jsp and shtml. These responses can also reveal additional vulnerabilities, such as internal IP address disclosure and application errors.


You can see some real-life examples below. We will send simple requests for nonexistent asp, html and php files, then compare the responses. These tests are made against the root directory, but note that some subdirectories may be configured differently.

$ nc www.baidu.com 80
GET /CsARl9W0s9esF7Vl HTTP/1.1
Host: www.baidu.com

HTTP/1.1 302 Found
Date: Sat, 31 Oct 2009 15:19:34 GMT
Server: Apache/1.3.27
Location: http://www.baidu.com/search/error.html
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
...

GET /CsARl9W0s9esF7Vl.html HTTP/1.1
Host: www.baidu.com

HTTP/1.1 302 Found
Date: Sat, 31 Oct 2009 15:19:50 GMT
Server: Apache/1.3.27
Location: http://www.baidu.com/search/error.html
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
...

GET /CsARl9W0s9esF7Vl.asp HTTP/1.1
Host: www.baidu.com

HTTP/1.1 302 Found
Date: Sat, 31 Oct 2009 15:20:00 GMT
Server: Apache/1.3.27
Location: http://www.baidu.com/search/error.html
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
...

GET /CsARl9W0s9esF7Vl.php HTTP/1.1
Host: www.baidu.com

HTTP/1.1 302 Found
Date: Sat, 31 Oct 2009 15:20:22 GMT
Server: Apache/1.3.27
Location: http://www.baidu.com/forbiddenip/forbidden.html
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
...



As you can clearly see, the web server responds to nonexistent files with a 302 Found message and redirects us to /search/error.html. But for PHP files, we are redirected to /forbiddenip/forbidden.html instead.


Below is another example.

$ nc wiki.nginx.org 80
GET /CsARl9W0s9esF7Vl HTTP/1.1
Host: wiki.nginx.org

HTTP/1.1 404
Server: nginx/0.8.21
Date: Sat, 31 Oct 2009 15:26:10 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
Content-language: en
Vary: Accept-Encoding, Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=wikidbToken;string-contains=wikidbLoggedOut;string-contains=wikidb_session
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0

2e8c
...


$ nc wiki.nginx.org 80
GET /CsARl9W0s9esF7Vl.asp HTTP/1.1
Host: wiki.nginx.org


HTTP/1.1 404
Server: nginx/0.8.21
Date: Sat, 31 Oct 2009 15:26:44 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6
Content-language: en
Vary: Accept-Encoding, Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=wikidbToken;string-contains=wikidbLoggedOut;string-contains=wikidb_session
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0

2eec
...


GET /CsARl9W0s9esF7Vl.php HTTP/1.1
Host: wiki.nginx.org


HTTP/1.1 404
Server: nginx/0.8.21
Date: Sat, 31 Oct 2009 15:27:15 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.1.6

19
No input file specified.

0


The website responds differently to nonexistent PHP files, so the X-Powered-By: PHP header appears to be genuine. Even though URL rewriting is in place, we can verify that the wiki application behind nginx is written in PHP (it is actually MediaWiki).

I call this method technology detection using known filetypes. Using it alongside other fingerprinting techniques, such as HTTP response banner grabbing, can improve web security scanners.

I previously implemented this in Arachne, a simple web security scanner that I developed for my MSc thesis back in 2006. It sends requests for nonexistent files with known extensions, then compares the results to see whether a given technology is in use on the web server.
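
The idea is simple enough to sketch in a few lines. Below is a minimal illustration of the technique, not the actual Arachne code; it uses the Python 3 standard library and www.example.com as a placeholder target. It requests one random filename per known extension and flags any extension whose status code or Location header differs from the extensionless baseline.

import http.client
import uuid

EXTENSIONS = ["", ".html", ".asp", ".aspx", ".php", ".jsp", ".cfm", ".shtml"]

def probe(host):
    """Request one random filename per extension and record (status, Location)."""
    token = uuid.uuid4().hex              # random name, almost certainly nonexistent
    results = {}
    for ext in EXTENSIONS:
        conn = http.client.HTTPConnection(host, 80, timeout=10)
        conn.request("GET", "/%s%s" % (token, ext))
        resp = conn.getresponse()
        resp.read()                       # drain the body before closing
        results[ext] = (resp.status, resp.getheader("Location", ""))
        conn.close()
    return results

if __name__ == "__main__":
    responses = probe("www.example.com")  # placeholder target
    baseline = responses[""]              # the extensionless request is our baseline
    for ext, fingerprint in sorted(responses.items()):
        if ext and fingerprint != baseline:
            print("%s differs from baseline %s -> probably mapped to a "
                  "server-side handler: %s" % (ext, baseline, fingerprint))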

With technology detection using known filetypes:
  • Web application scans can be optimized for the detected technologies
  • Web application scanners can reduce the number of tests performed
  • Scanners can reduce false positives for nonexistent files
  • If URL rewriting is used, this method can still determine the technologies used by the web application

Wednesday, October 7, 2009

New release - subdomainLookup - Find subdomains using Google

For those who didn't notice, there is a new version of the subdomainLookup script available, which can be downloaded from here.

Version 0.4 is considerably improved, and its results really differ from the old version. Here is a comparison:

$ subdomainLookup.py example.com

19 subdomains found for example.com

beta.example.com
bp0.example.com
bp1.example.com
bp2.example.com
bp3.example.com
buzz.example.com
code.example.com
domains.example.com
draft.example.com
m.example.com
partners-test.example.com
play.example.com
pro.example.com
pro1.example.com
pro2.example.com
status.example.com
wireless.example.com
www.example.com
www2.example.com

The same lookup using the old version:
> subdomainLookup-v0.2.py example.com
beta.example.com
www.example.com
draft.example.com
www2.example.com

Warning: Use this script for your own domains only. By using this script, you may be violating Google's terms of service.

Friday, September 25, 2009

subdomainLookup - Find subdomains using Google

subdomainLookup is a Python script that uses Google search results to find subdomains of the target domain name.

I have been using it in security assessments and it works pretty well. Essential for network mapping. Test your own domain and see the results.

Sample usage:
> subdomainLookup.py blogger.com
beta.blogger.com
www.blogger.com
draft.blogger.com
www2.blogger.com

Download subdomainLookup v0.4 from here
* Uses only core Python libraries. Tested with Python 2.5.x on Linux and Windows.

Update: Bedirhan sent me some patches that improve the results of the subdomainLookup v0.2 script. With some tests and additional improvements, here is the new version, 0.4, which can be downloaded from the same location.
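
For the curious, the general approach can be sketched roughly as follows. This is a hypothetical Python 3 illustration, not the actual script: query Google for site:domain results, pull hostnames out of the result URLs, then repeat while excluding the hosts already found. Google's result markup changes over time and scripted queries may be blocked or violate its terms of service, so treat this purely as a sketch.

import re
import urllib.parse
import urllib.request

def google_subdomains(domain, max_rounds=5):
    """Collect *.domain hostnames from Google result pages, excluding known hosts each round."""
    found = set()
    host_re = re.compile(r"https?://([a-z0-9.-]+\." + re.escape(domain) + r")", re.I)
    for _ in range(max_rounds):
        # Exclude already-known hosts so that new subdomains surface in later rounds.
        query = "site:%s %s" % (domain, " ".join("-site:%s" % h for h in sorted(found)))
        url = "https://www.google.com/search?num=100&q=" + urllib.parse.quote(query)
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", "replace")
        new_hosts = {h.lower() for h in host_re.findall(html)} - found
        if not new_hosts:                 # nothing new surfaced, stop early
            break
        found |= new_hosts
    return sorted(found)

if __name__ == "__main__":
    for host in google_subdomains("example.com"):
        print(host)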

Wednesday, July 1, 2009

Finding almost all URLs with the fewest requests

Introduction
A web crawler usually starts from a given URL and tries to find and visit every link that can be followed. A search engine crawler, however, doesn't care much about the content it misses: while parsing everything would be better, it is not expected to handle all of the dynamic content.
Thanks to SEO, it is now the webmaster's duty to make the site search engine friendly.

The crawler component of a web application security scanner (WASS) is critical to its success. A scanner's crawler therefore has to parse all of the dynamic content and fight through bad HTML to find every page, variable and so on.

Crawling a whole website usually takes a lot of time. Here is an overview of how we can run a quick scan to find remote web application/site vulnerabilities with a minimum number of crawler requests. This also gives us a quick overview of the site structure, and when assessing a large network it sometimes lets us find web application vulnerabilities faster.


Robots.txt - The famous Robots Exclusion Protocol
If you don't know how and why a /robots.txt file is used, please read About /robots.txt.
I think some readers skip this part: "don't try to use /robots.txt to hide information"

If a search engine crawler wants to visit and index a website, it first fetches the /robots.txt file to learn the rules it should follow. If a path is disallowed, the search engine crawler excludes it.

Use these rules in your crawler to find more information, like new directories and links.
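
As a rough illustration (www.example.com is just a placeholder), a few lines of Python 3 are enough to pull the interesting parts out of /robots.txt: the Allow/Disallow paths, which often point at directories worth a closer look, and any Sitemap: lines.

import urllib.request
from urllib.parse import urljoin

def parse_robots(base_url):
    """Return (paths, sitemaps) extracted from the site's /robots.txt."""
    robots_url = urljoin(base_url, "/robots.txt")
    text = urllib.request.urlopen(robots_url, timeout=10).read().decode("utf-8", "replace")
    paths, sitemaps = set(), set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()            # drop comments
        if ":" not in line:
            continue
        field, value = [part.strip() for part in line.split(":", 1)]
        if field.lower() in ("allow", "disallow") and value:
            paths.add(urljoin(base_url, value))         # may still contain * wildcards
        elif field.lower() == "sitemap":
            sitemaps.add(value)
    return paths, sitemaps

if __name__ == "__main__":
    paths, sitemaps = parse_robots("http://www.example.com/")
    print("Paths:", *sorted(paths), sep="\n  ")
    print("Sitemaps:", *sorted(sitemaps), sep="\n  ")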


Sitemaps
"Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site."*

I love sitemaps. They allow me to find most of the links and build the directory structure of the whole website with 2-3 web requests! Of course, we still have to look for pages that don't appear in the sitemaps.

There are many reasons to use sitemaps. A huge website can use the Sitemap protocol to have search engine crawlers index only the required content, when necessary. A webmaster can also point out existing links that the search engine couldn't otherwise find and index.

Webmasters who follow Google's FAQ for submitting a Sitemap usually place their sitemap file at /sitemap.xml or /sitemap.xml.gz.
Most of the time, they also add the sitemap location to the /robots.txt file.

The sitemap location can be added to the /robots.txt file as:
Sitemap: http://domain.com/sitemap_location.xml

Another part of the Sitemap protocol is the Sitemap index file, which lists the locations of other sitemaps on the target website. A large sitemap can be split into multiple files, listed in a sitemap index file and then submitted to search engines as a single entry.
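
Fetching and walking these files is straightforward. The sketch below uses the Python 3 standard library; it handles both regular sitemaps and sitemap index files by recursing into nested <loc> entries, and decompresses gzip-compressed sitemaps on the fly. The default locations tried here are my own guesses, not a definitive list.

import gzip
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
DEFAULT_LOCATIONS = ["/sitemap.xml", "/sitemap.xml.gz", "/sitemap_index.xml"]

def fetch_sitemap(url):
    """Download a sitemap (or sitemap index) and return every URL it lists."""
    data = urllib.request.urlopen(url, timeout=10).read()
    if url.endswith(".gz"):
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    if root.tag == SITEMAP_NS + "sitemapindex":
        # A sitemap index lists other sitemaps; recurse into each of them.
        urls = []
        for loc in root.iter(SITEMAP_NS + "loc"):
            urls.extend(fetch_sitemap(loc.text.strip()))
        return urls
    # A regular sitemap lists page URLs inside <url><loc> elements.
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def quick_sitemap_scan(base_url):
    """Try a few common sitemap locations and collect every URL found."""
    found = []
    for path in DEFAULT_LOCATIONS:
        try:
            found.extend(fetch_sitemap(base_url.rstrip("/") + path))
        except Exception:
            pass        # location missing or not valid XML; try the next one
    return found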

Conclusion - How to find almost all URLs of a target website with minimum web requests?
My approach for a quick scan of a standard web application is:
  1. Request and parse the URL entered by the WASS user in the scan settings (1 request)
  2. Request /robots.txt, parse and extract links, sitemap index files and sitemaps (1 request)
  3. Request the default sitemap and sitemap index locations, then parse them and extract links (6 requests - more or less, depending on your list of likely default sitemap locations)
  4. Webmasters sometimes create a site map page for visitors, so that a visitor can quickly reach the content he is looking for. In the response to your first request, search for links whose text or URL contains the word "sitemap" or similar (a short snippet after this list shows one way to spot them). Access these manually created sitemap pages first! (1 request)
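
For step 4, something as naive as the following check on the first response body is usually enough; a real crawler would of course use its HTML parser instead of a regex.

import re

def find_html_sitemap_links(html):
    """Return hrefs of anchors whose URL or link text mentions "sitemap"."""
    anchor_re = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
    return [href for href, text in anchor_re.findall(html)
            if "sitemap" in href.lower() or "sitemap" in text.lower()]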

Using this methodology in our scanner's crawler component, we send fewer than 10 requests to the remote web server and can:
  • Extract the directory/file structure of the target web server/application,
  • Possibly learn the technologies in use, based on the file extensions found,
  • Find variables, if links like "news.asp?id=123" exist,
  • Find almost all URLs, even if your WASS can't parse JavaScript/Flash content,
  • Possibly find pages that a normal crawl would miss (some webmasters misconfigure sitemap generators, which leads to disclosure of sensitive files),
  • Visit these newly found URLs to find more links, variables, weaknesses etc.
  • And more!

Wednesday, June 24, 2009

Using WIVET to test your crawler

WIVET is a wonderful project for web security scanner developers. Using WIVET, you can analyse the link extraction and crawling ability of your WASS.

I recommend downloading the latest version or an SVN copy to your local web server and testing your scanner's crawling performance. You can also test your scanner's JavaScript, Flash and form parsing abilities against this web application.

Don't forget to exclude the offscanpages folder and the logoff link! Also, your crawler should have cookie support enabled, since WIVET tracks crawling coverage via a cookie.

So in order to succeed, your scanner should already have:
  • cookie support
  • an exclude capability
  • JavaScript support (to compete with other commercial scanners)
  • Flash support (to compete with other commercial scanners)
Project home
And here are the latest coverage results of some commercial scanners tested by the WIVET author. Have fun!

Friday, June 19, 2009

Detecting new URLs by disabling cookie support

An article for web application security scanner (WASS) developers.

You probably want to find every page in a web application. What if a clever web developer built a vulnerable "cookie support in your browser is disabled" page for website users? Can your scanner find that page?

Sample Cookies disabled error page
Let's think like a web developer. How can I detect whether a user's browser accepts cookies? I think the best way[1] is to set a temporary cookie while redirecting the user to a controller page. That page then checks whether the previously set temporary cookie was sent with the client's new request.

A WASS can find this page too. Here is one way of doing it:
  1. While crawling with cookie support enabled, remember every page that both sets a cookie and redirects at the same time. We can keep a NoCookieQueue for this. (Cookies are set with the "Set-Cookie" response header, and redirection is done with a "Location" header in a 3xx HTTP response.)
  2. After the crawling phase is complete, if the NoCookieQueue is not empty, the scanner should disable cookie support in its crawler module and re-visit the pages in the NoCookieQueue. This way we can see whether those pages redirect to a location that doesn't already exist in the scanner's completed or error queues. (A minimal sketch of this is shown after the list.)
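
Here is a minimal, self-contained sketch of that idea, written against the Python 3 standard library. The helper and queue names are hypothetical and the real bookkeeping would of course happen inside the crawler itself, but it shows the two steps: record pages that set a cookie and redirect, then re-visit them without cookies and report redirect targets the normal crawl never saw.

import http.client
from urllib.parse import urljoin, urlparse

def fetch(url, cookie_header=None):
    """Single GET without following redirects; optionally send a Cookie header."""
    parts = urlparse(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    headers = {"Cookie": cookie_header} if cookie_header else {}
    conn.request("GET", path, headers=headers)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp

def detect_cookie_error_pages(crawled_urls, known_urls, max_hops=5):
    # Step 1: remember every page that sets a cookie AND redirects at the same
    # time (a real scanner would record this during the normal crawl).
    no_cookie_queue = []
    for url in crawled_urls:
        resp = fetch(url)
        sets_cookie = resp.getheader("Set-Cookie") is not None
        redirects = 300 <= resp.status < 400 and resp.getheader("Location")
        if sets_cookie and redirects:
            no_cookie_queue.append(url)
    # Step 2: re-visit those pages with cookies disabled and follow the redirect
    # chain; report any location the normal crawl never saw.
    new_pages = set()
    for url in no_cookie_queue:
        current = url
        for _ in range(max_hops):
            resp = fetch(current)           # cookies deliberately not sent
            location = resp.getheader("Location")
            if not (300 <= resp.status < 400 and location):
                break
            current = urljoin(current, location)
            if current not in known_urls:
                new_pages.add(current)
    return new_pages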

Now your WASS can find new pages or test parameters that a cookieless client would see. While these cookie error pages are mostly static, you might find a vulnerable dynamic page, another directory on the server, an HTML comment with sensitive information, and so on.

_____
[1] A second method is to set a cookie and then use JavaScript to check whether it was actually set (if not, redirect the client to the cookie error page). But these "no cookie" pages can also be found with a JavaScript parser in our WASS.