Host-extract
Description
Aung Khant from the YGN Ethical Hacker Group has developed a tool that parses the content of web sites to reveal clues in the source code.
The tool can reveal architecture details such as the use of cache systems, Content Delivery Networks (CDN), load balancers, and local IP addresses.
Thanks to my friend Aung Khant for his reviews.
Installation
Prerequisites
You will need Ruby and RubyGems:
$ sudo apt-get install ruby
Then install mechanize via gem:
$ sudo gem install mechanize
host-extract.rb
$ mkdir -p /pentest/enumeration/www/
$ cd /pentest/enumeration/www/
$ svn co http://host-extract.googlecode.com/svn/trunk/ host-extract
Usage
host-extract.rb script
Syntax
To run host-extract against a single URL, use the following syntax:
$ ruby host-extract.rb [options] <url>
Options
- -a: find all IP/host patterns
- -j: scan all JS files
- -c: scan all CSS files
- -v: append view-source HTML snippet for manual verification
run.sh script
Launcher
To run it against a list of URLs, a script is included (run.sh). Fill the url-list file with the URLs to check (one per line) and start the script as follows:
$ ./run.sh url-list
----------------------------------------------
Extracting IP/Domain Patterns from url-list
----------------------------------------------
==================================================================
IP/Host Pattern Extractor (c) Aung Khant, aungkhant[at]yehg.net
YGN Ethical Hacker Group, Myanmar, http://yehg.net/
svn co http://host-extract.googlecode.com/svn/trunk/ host-extract
==================================================================
Target: http://www.aldeid.com/
host: www.aldeid.com
path: /
-> http://www.aldeid.com/ | 301
(Redirected to : http://www.aldeid.com/wiki/Main_Page)
host: www.aldeid.com
path: /wiki/Main_Page
[*] searching for internal IP patterns ...
[x] no internal IP(s) found
[*] searching for IP/domain patterns ...
- aldeid.com #view-source: .. content="aldeid.com is a wiki about network and web applica
- www.gnu.org #view-source: .. f="http://www.gnu.org/copyleft/fdl.html" />
[...Truncated...]
Note that the script runs host-extract.rb with the -a (find all IP/host patterns) option.
url-list file
The url-list (or whatever name you want to give) is a file that contains a list of URLs to check. Each entry must be placed on a new line.
$ cat url-list
http://www.aldeid.com/
http://www.google.fr/
http://www.mcafee.com/
http://www.sophos.fr/
http://www.amazon.com/
http://www.twitter.com/
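The per-URL loop can also be approximated in Ruby. The following is only a minimal sketch: the contents of run.sh are not reproduced on this page, so this is not its actual implementation. The helper name `commands_for` is hypothetical, and the sketch assumes host-extract.rb is in the current directory and that blank lines in the list should be skipped.

```ruby
# Hypothetical helper: build one host-extract command per URL in a list.
# Assumes host-extract.rb is in the current directory; -a mirrors the
# option that run.sh is described as passing.
def commands_for(url_list_text, script: "host-extract.rb")
  url_list_text.each_line
               .map(&:strip)
               .reject(&:empty?)
               .map { |url| ["ruby", script, "-a", url] }
end

urls = <<~LIST
  http://www.aldeid.com/
  http://www.google.fr/

  http://www.mcafee.com/
LIST

commands_for(urls).each do |argv|
  # Print instead of executing; replace with system(*argv) to really run it.
  puts argv.join(" ")
end
```

Building each command as an argument array (rather than one shell string) avoids quoting problems if a URL contains characters the shell would interpret.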
Examples
Cache system
The following example reveals the presence of cache servers:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:02:05
# --------------------------------------------------------------------------
# URL: http://gawker.com
# [*] searching for internal IP patterns ...
# [x] no internal IP(s) found
# [*] searching for IP/domain patterns ...
www.facebook.com #view-source: .. b="http://www.facebook.com/2008/fbml">
gawker.com #view-source: .. grpid: 'gawker.com',
betacache.gawkerassets.com #view-source: .. c="http://betacache.gawkerassets.com/assets/base.v10/js/gome
fastcache.gawkerassets.com #view-source: .. f="http://fastcache.gawkerassets.com/assets/base.v10/css/../
v10.gawker.com #view-source: .. om/assets/v10.gawker.com/css/style.css?rev=20110311" />
www.google.com #view-source: .. c="http://www.google.com/jsapi"></script>
[...TRUNCATED...]
Content Delivery Network (CDN)
The following example reveals the use of CDN servers:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:01:35
# --------------------------------------------------------------------------
# URL: http://digg.com
# -> http://digg.com | 302
# (Redirected to : /news)
#########################################################################
# URL: http://digg.com/news
# [*] searching for internal IP patterns ...
10.2.129.82 #view-source: .. an title="10.2.129.82 build: 210 - fri mar 11 1
# [*] 1 internal IP(s) found!
# [*] searching for IP/domain patterns ...
cdn1.diggstatic.com #view-source: .. f="http://cdn1.diggstatic.com/img/favicon.a015f25c.ico">
cdn2.diggstatic.com #view-source: .. f="http://cdn2.diggstatic.com/css/two_column/library/global.
ad.doubleclick.net #view-source: .. = 'http://ad.doubleclick.net/adj/dgg.tn/home_tn_uprrail1;pt=
cdn4.diggstatic.com #view-source: .. c="http://cdn4.diggstatic.com/story/ipad_2_first_video_look_
cdn3.diggstatic.com #view-source: .. c="http://cdn3.diggstatic.com/story/more_aftershocks_explosi
dads.new.digg.com #view-source: .. , 'http://dads.new.digg.com', 'kw=pos:3&kw=topics:%2a&kw=pag
about.digg.com #view-source: .. f="http://about.digg.com/blog/breaking-breaking-news" class=
[...TRUNCATED...]
Another example, for lemonde.fr, which makes use of Amazon Web Services (s3.amazonaws.com):
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:02:17
# --------------------------------------------------------------------------
# URL: http://www.lemonde.fr/
# [*] searching for internal IP patterns ...
# [x] no internal IP(s) found
# [*] searching for IP/domain patterns ...
monde.fr #view-source: .. <title>le monde.fr : actualité à la une</ti
medias.lemonde.fr #view-source: .. f="http://medias.lemonde.fr/medias/info/favicon.ico" />
[...TRUNCATED...]
google-analytics.com #view-source: .. www') + '.google-analytics.com/ga.js';
s3.amazonaws.com #view-source: .. "https://s3.amazonaws.com/" : "http://") +
Local IP addresses
The following example reveals the presence of a local IP address:
$ ruby host-extract.rb -a http://allrecipes.com
==================================================================
IP/Host Pattern Extractor (c) Aung Khant, aungkhant[at]yehg.net
YGN Ethical Hacker Group, Myanmar, http://yehg.net/
svn co http://host-extract.googlecode.com/svn/trunk/ host-extract
==================================================================
Target: http://allrecipes.com
host: allrecipes.com
path: /
[*] searching for internal IP patterns ...
- 192.168.5.143
[*] 1 internal IP(s) found!
[*] searching for IP/domain patterns ...
- 192.168.5.143
- www.icra.org
- allrecipes.com
- images.media-allrecipes.com
- secure.allrecipes.com
- www.facebook.com
- twitter.com
- ad.doubleclick.net
- bs.serving-sys.com
- hostedjobs.openhire.com
- mantestedrecipes.com
- www.tasteofhome.com
- www.rachaelraymag.com
- www.rd.com
- www.mozilla.org
- allrecipes.cn
- allrecipes.fr
- allrecipes.de
- allrecipes.jp
- allrecipes.co.uk
- metric.allrecipes.com
- an.tacoda.net
[*] total IP/Host pattern(s): 22
# Send bugs & suggestions to host-extract @ yehg.net
Another local IP address disclosure in the source code:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:02:51
# --------------------------------------------------------------------------
# URL: http://tinypic.com
# [*] searching for internal IP patterns ...
10.2.253.1 #view-source: .. <!-- 10.2.253.1 -->
# [*] 1 internal IP(s) found!
# [*] searching for IP/domain patterns ...
static.tinypic.com #view-source: .. f="http://static.tinypic.com/s/global_v4.3.28.css" type="tex
Another example, for moneycontrol.com, which reveals the loopback address:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:00:45
# --------------------------------------------------------------------------
# URL: http://moneycontrol.com
# -> http://moneycontrol.com | 301
# (Redirected to : http://www.moneycontrol.com/?)
#########################################################################
# URL: http://www.moneycontrol.com/?
# [*] searching for internal IP patterns ...
127.0.0.1 #view-source: .. ef=http://127.0.0.1/mccode/markets/homebody.php' /> <!--
# [*] 1 internal IP(s) found!
# [*] searching for IP/domain patterns ...
www.moneycontrol.com #view-source: .. rl=http://www.moneycontrol.com">
moneycontrol.com #view-source: .. ttp://www.moneycontrol.com">
[...TRUNCATED...]
127.0.0.1 #view-source: .. ef=http://127.0.0.1/mccode/markets/homebody.php' /> <!--
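To double-check which of the reported addresses are really internal, a small Ruby check can help. This is a sketch under stated assumptions: it treats the RFC 1918 private blocks plus the loopback block as "internal", which may or may not match the exact ranges host-extract tests for, and the name `internal_ip?` is hypothetical.

```ruby
require "ipaddr"

# Ranges treated as "internal" in this sketch: RFC 1918 private blocks
# plus loopback. The ranges host-extract itself uses are an assumption.
INTERNAL_RANGES = %w[10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8]
                  .map { |cidr| IPAddr.new(cidr) }

# Returns true when text parses as an IP address inside one of the ranges.
# Anything that is not a valid IP address (e.g. a host name) yields false.
def internal_ip?(text)
  ip = IPAddr.new(text)
  INTERNAL_RANGES.any? { |range| range.include?(ip) }
rescue IPAddr::Error
  false
end

%w[192.168.5.143 10.2.253.1 127.0.0.1 194.68.129.144].each do |candidate|
  puts "#{candidate}: #{internal_ip?(candidate) ? 'internal' : 'external'}"
end
```

Using `IPAddr` ranges instead of a string prefix test keeps addresses such as 172.32.0.1, which lies outside 172.16.0.0/12, from being misclassified.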
External IP addresses
The following example shows external IP addresses:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:01:19
# --------------------------------------------------------------------------
# URL: http://ovh.net/
# [*] searching for internal IP patterns ...
# [x] no internal IP(s) found
# [*] searching for IP/domain patterns ...
ovh.net #view-source: .. <title>ovh.net</title>
www.ripe.net #view-source: .. f="http://www.ripe.net/perl/whois?form_type=simple&full_quer
www.renater.fr #view-source: .. f="http://www.renater.fr/sfinx/" target="_blank">sfinx</a> (
194.68.129.144 #view-source: .. finx</a> (194.68.129.144)</b></td><td><small>(1x1gbps)</smal
www.freeix.net #view-source: .. f="http://www.freeix.net/" target="_blank">freeix</a> (213.2
213.228.3.225 #view-source: .. eeix</a> (213.228.3.225)</b></td><td><small>(1x1gbps)</small
www.ovh.com #view-source: .. f="http://www.ovh.com" target="_blank" title="dépôt de
ovh.com #view-source: .. ttp://www.ovh.com" target="_blank" title="dépôt de domaines
Sub domains
The following example reveals sub domains:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:00:45
# --------------------------------------------------------------------------
# URL: http://moneycontrol.com
# -> http://moneycontrol.com | 301
# (Redirected to : http://www.moneycontrol.com/?)
#########################################################################
# URL: http://www.moneycontrol.com/?
# [*] searching for internal IP patterns ...
127.0.0.1 #view-source: .. ef=http://127.0.0.1/mccode/markets/homebody.php' /> <!--
# [*] 1 internal IP(s) found!
# [*] searching for IP/domain patterns ...
www.moneycontrol.com #view-source: .. rl=http://www.moneycontrol.com">
moneycontrol.com #view-source: .. ttp://www.moneycontrol.com">
img1.moneycontrol.com #view-source: .. c="http://img1.moneycontrol.com/images/ad/xerox/swfobject.js
stat1.moneycontrol.com #view-source: .. c="http://stat1.moneycontrol.com/mcjs/common/relona_script_0
mmb.moneycontrol.com #view-source: .. n('http://mmb.moneycontrol.com/india/messageboard/po
www.macromedia.com #view-source: .. e='http://www.macromedia.com/go/getflashplayer' type='applic
im.in.com #view-source: .. rl(http://im.in.com/connect/images/enhanced_google.gif) no-r
stat2.moneycontrol.com #view-source: .. f="http://stat2.moneycontrol.com/mccss/revamp/mc20
[...TRUNCATED...]
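Because the report mixes sub domains of the target with third-party hosts, it can be handy to filter the list afterwards. A minimal sketch, assuming the host names have already been copied out of the report into an array; `subdomains_of` is a hypothetical helper and not part of host-extract.

```ruby
# Hypothetical helper: keep only hosts that are sub domains of the target.
# Matching on ".domain" (with the leading dot) excludes both the bare
# domain itself and look-alikes such as "notmoneycontrol.com".
def subdomains_of(domain, hosts)
  suffix = ".#{domain.downcase}"
  hosts.map(&:downcase)
       .select { |host| host.end_with?(suffix) }
       .uniq
       .sort
end

found = %w[
  www.moneycontrol.com moneycontrol.com img1.moneycontrol.com
  stat1.moneycontrol.com mmb.moneycontrol.com www.macromedia.com
  im.in.com stat2.moneycontrol.com
]

puts subdomains_of("moneycontrol.com", found)
```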
Development language
The following example, run against symantec.com, suggests the use of Java on the server side, since the site redirects to index.jsp (a JSP script):
$ ruby host-extract.rb http://www.symantec.com
==================================================================
IP/Host Pattern Extractor (c) Aung Khant, aungkhant[at]yehg.net
YGN Ethical Hacker Group, Myanmar, http://yehg.net/
svn co http://host-extract.googlecode.com/svn/trunk/ host-extract
==================================================================
Target: http://www.symantec.com
host: www.symantec.com
path: /
-> http://www.symantec.com | 301
(Redirected to : http://www.symantec.com/index.jsp?)
host: www.symantec.com
path: /index.jsp
[*] searching for internal IP patterns ...
[x] no internal IP(s) found
# Send bugs & suggestions to host-extract @ yehg.net
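The same reasoning (file extension in the redirect target hints at the server-side language) can be sketched in Ruby. This is an illustration only: the mapping table and the name `tech_hint` are assumptions made for this example, and an extension is a hint rather than proof, since URL rewriting can hide or fake it.

```ruby
require "uri"

# Illustrative mapping only; extend or correct it as needed.
TECH_HINTS = {
  ".jsp"  => "Java (JSP)",
  ".php"  => "PHP",
  ".aspx" => "ASP.NET"
}.freeze

# Returns a technology hint for the URL's path extension, or nil if the
# extension is missing or not in the table.
def tech_hint(url)
  path = URI.parse(url).path.to_s
  TECH_HINTS[File.extname(path).downcase]
end

puts tech_hint("http://www.symantec.com/index.jsp?").inspect
puts tech_hint("http://www.aldeid.com/wiki/Main_Page").inspect
```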
Use of dropbox
The following example reveals a dropbox host (dropbox.yousendit.com) that serves a PHP script (transfer.php):
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 09:00:32
# --------------------------------------------------------------------------
# URL: http://www.yousendit.com
# [*] searching for internal IP patterns ...
192.168.40.30 #view-source: .. 192.168.40.30 </div>^M
# [*] 1 internal IP(s) found!
# [*] searching for IP/domain patterns ...
yousendit.com #view-source: .. domain = "yousendit.com";^M
blog.yousendit.com #view-source: .. f="http://blog.yousendit.com" target="_blank">blog</a
ftf-641.yousendit.com #view-source: .. = "http://ftf-641.yousendit.com";
ftf.yousendit.com #view-source: .. =='http://ftf.yousendit.com') {
dropbox.yousendit.com #view-source: .. = "http://dropbox.yousendit.com/transfer.php?action=dropb
192.168.40.30 #view-source: .. 192.168.40.30 </div>^M
# [*] total IP/Host pattern(s): 6
Architecture
This example discloses the development server:
==================================================================
IP/Host Pattern Extractor (c) Aung Khant, aungkhant[at]yehg.net
YGN Ethical Hacker Group, Myanmar, http://yehg.net/
svn co http://host-extract.googlecode.com/svn/trunk/ host-extract
==================================================================
Target: http://gawker.com
host: gawker.com
path: /
[*] searching for internal IP patterns ...
[x] no internal IP(s) found
[*] searching for IP/domain patterns ...
[...TRUNCATED...]
- dev.gawker.com:8888 #view-source: .. 'host' : 'dev.gawker.com:8888',
[*] total IP/Host pattern(s): 22
# Send bugs & suggestions to host-extract @ yehg.net
Another example run against gazeta.pl:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 11:36:54
# --------------------------------------------------------------------------
# URL: http://www.gazeta.pl/
# -> http://www.gazeta.pl/ | 301
# (Redirected to : http://www.gazeta.pl/0,0.html?)
#########################################################################
# URL: http://www.gazeta.pl/0,0.html?
# [*] searching for internal IP patterns ...
# [x] no internal IP(s) found
# [*] searching for IP/domain patterns ...
google-analytics.com #view-source: .. '.google-analytics.com/ga.js';^M
jedynka:8130 #view-source: .. <!-- iw10jedynka:8130 hp 30 --> ^M
www.booking.com #view-source: .. f="http://www.booking.com/index.html?aid=332229&lang=pl&labe
clk.tradedoubler.com #view-source: .. p%3a%2f%2fclk.tradedoubler.com%2fclick%3fp%3d203575%26a%3d16
www.ciacha.net #view-source: .. f="http://www.ciacha.net/ciacha/0,0.html">ciacha</a></li><li
www.pracownicy.it #view-source: .. f="http://www.pracownicy.it/">pracownicy it</a></li><li clas
# [*] total IP/Host pattern(s): 6
Another example, run against a JavaScript file (an Omniture profile script) hosted on the McAfee website:
# Generated by host-extract (c) Aung Khant, http://yehg.net/lab/
# Send bugs/suggestions to host-extract @ yehg.net
# Date: 2011-03-13 11:36:59
# --------------------------------------------------------------------------
# URL: http://www.mcafee.com/js/omniture/omniture_profile.js
# [*] searching for internal IP patterns ...
172.31.30.227 #view-source: .. nname == "172.31.30.227" || domainname == "172.3
172.31.30.226 #view-source: .. nname == "172.31.30.226" || domainname == "daldevwebcms3:860
# [*] 2 internal IP(s) found!
# [*] searching for IP/domain patterns ...
www.mcafee.com #view-source: .. nname == "www.mcafee.com" || domainname == "mcaf
mcafee.com #view-source: .. e == "www.mcafee.com" || domainname == "mcafee.com"
secure.nai.com #view-source: .. nname == "secure.nai.com" || domainname == "vil.nai.com" ||
vil.nai.com #view-source: .. nname == "vil.nai.com" || domainname == "www.foundstone.com"
www.foundstone.com #view-source: .. nname == "www.foundstone.com" || domainname == "secure.mcafe
secure.mcafee.com #view-source: .. nname == "secure.mcafee.com")
internal.nai.com #view-source: .. nname == "internal.nai.com")
161.69.217.15 #view-source: .. nname == "161.69.217.15" ||
161.69.202.116 #view-source: .. nname == "161.69.202.116" || domainname == "161.69.202.117:8
dalqawebcms1:8600 #view-source: .. nname == "dalqawebcms1:8600" || domainname == "dalqawebcms2:
dalqawebcms2:8600 #view-source: .. nname == "dalqawebcms2:8600" ||
sncstgwebcms1:8600 #view-source: .. nname == "sncstgwebcms1:8600" || domainname =
sncstgwebcms2:8600 #view-source: .. nname == "sncstgwebcms2:8600" || domainname == "sncstgwebcms
172.31.30.227 #view-source: .. nname == "172.31.30.227" || domainname == "172.3
172.31.30.226 #view-source: .. nname == "172.31.30.226" || domainname == "daldevwebcms3:860
daldevwebcms3:8600 #view-source: .. nname == "daldevwebcms3:8600" ||
daldevwebcms3:8089 #view-source: .. nname == "daldevwebcms3:8089" || domainname ==
sncwebdevpview1:82 #view-source: .. nname == "sncwebdevpview1:82" ||
sncwebdevpview1:444 #view-source: .. nname == "sncwebdevpview1:444" || domainname =
sncqawebpview1:440 #view-source: .. nname == "sncqawebpview1:440" ||
sncwebdevpview1:81 #view-source: .. nname == "sncwebdevpview1:81" || domainname ==
vil.qa.nai.com #view-source: .. nname == "vil.qa.nai.com" || domainname == "sncwebdevpview1:
sncwebdevpview1:8084 #view-source: .. nname == "sncwebdevpview1:8084" ||
daldevwebcms4:8600 #view-source: .. nname == "daldevwebcms4:8600" ||domainname ==
devinternal.na.nai.com #view-source: .. nname == "devinternal.na.nai.com" || domainname
dalqawebcms1:8080 #view-source: .. nname == "dalqawebcms1:8080" || domainname == "sncwwwprod1.p
sncwwwprod1.prod.mcafee.com #view-source: .. nname == "sncwwwprod1.prod.mcafee.com"
sncwwwprod2.prod.mcafee.com #view-source: .. nname == "sncwwwprod2.prod.mcafee.com" || domainna
sncwwwprod3.prod.mcafee.com #view-source: .. nname == "sncwwwprod3.prod.mcafee.com" || domainname == "snc
sncwwwprod4.prod.mcafee.com #view-source: .. nname == "sncwwwprod4.prod.mcafee.com"
sncwwwprod5.prod.mcafee.com #view-source: .. nname == "sncwwwprod5.prod.mcafee.com" || domainna
sncwwwprod6.prod.mcafee.com #view-source: .. nname == "sncwwwprod6.prod.mcafee.com" || domainname == "snc
sncwebcms1:8600 #view-source: .. nname == "sncwebcms1:8600"
161.69.207.22 #view-source: .. nname == "161.69.207.22" || domainname == "161.69.
161.69.207.23 #view-source: .. nname == "161.69.207.23" ||domainname == "161.69.207.22:8600
sncwebcms2:8600 #view-source: .. nname == "sncwebcms2:8600"
searchmcafee.mcafee.com #view-source: .. nname == "searchmcafee.mcafee.com" || domainname =
phoenix-beta.mcafee.com #view-source: .. nname == "phoenix-beta.mcafee.com" || domainname == "dalqawe
eloqua.com #view-source: .. nname == "eloqua.com" || domainname == "daldevwebcms3:9696"
daldevwebcms3:9696 #view-source: .. nname == "daldevwebcms3:9696"
phoenix.qa.nai.com #view-source: .. nname == "phoenix.qa.nai.com"||domainname == "sph
sphoenix.qa.nai.com #view-source: .. nname == "sphoenix.qa.nai.com"
phoenix.qa.nai.com:8600 #view-source: .. nname == "phoenix.qa.nai.com:8600"||domainname ==
sphoenix.qa.nai.com:8443 #view-source: .. nname == "sphoenix.qa.nai.com:8443"
sphoenix-uat.qa.nai.com #view-source: .. nname == "sphoenix-uat.qa.nai.com"||domainname ==
phoenix-uat.qa.nai.com #view-source: .. name == "sphoenix-uat.qa.nai.com"||domainname == "
sphoenix-uat.qa.nai.com:8600 #view-source: .. nname == "sphoenix-uat.qa.nai.com:8600"||domainna
phoenix-uat.qa.nai.com:8443 #view-source: .. nname == "phoenix-uat.qa.nai.com:8443"
phoenix.corp.nai.org #view-source: .. nname == "phoenix.corp.nai.org"||domainname == "s
sphoenix.corp.nai.org #view-source: .. nname == "sphoenix.corp.nai.org"
phoenix.corp.nai.org:8600 #view-source: .. nname == "phoenix.corp.nai.org:8600"||domainname
sphoenix.corp.nai.org:8443 #view-source: .. nname == "sphoenix.corp.nai.org:8443"
phoenix.dev.nai.com #view-source: .. nname == "phoenix.dev.nai.com"||domainname == "sp
# [*] total IP/Host pattern(s): 53
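In a long report like this one, the entries most likely to point at development, QA or staging machines are those with an explicit port or a bare (dot-less) machine name. A minimal sketch for picking them out, assuming the entries have been copied from the report into an array; `suspicious_entries` is a hypothetical helper and the heuristic is this example's own, not a host-extract feature.

```ruby
# Hypothetical helper: split "host[:port]" entries and keep those that
# carry an explicit port or whose host part has no dot at all.
def suspicious_entries(entries)
  entries.map do |entry|
    host, port = entry.split(":", 2)
    # Skip ordinary dotted host names without a port (block yields nil).
    next unless port || !host.include?(".")
    { host: host, port: port && port.to_i }
  end.compact
end

sample = %w[
  www.mcafee.com dalqawebcms1:8600 sncwebdevpview1:82
  phoenix.qa.nai.com:8600 secure.nai.com
]

suspicious_entries(sample).each do |entry|
  puts "#{entry[:host]} port=#{entry[:port]}"
end
```

The heuristic is deliberately crude: IPv4 addresses contain dots and so are only kept when they carry a port.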