Basic Perl Scripting

Exercise 8

To be handed in (by email to kathrin@theochem.kth.se): Tuesday, October 30th

In this exercise you will analyze the content of a log file. Below is part of a log file that records accesses to our web server. Each line contains, among others, the following fields:

1. the IP address from which the request came
2. the date
3. the file that was opened
4. the response code (200 means success)
5. the size of the returned document/page
6. the browser that was used (Gecko = Firefox, MSIE = Internet Explorer)

75.33.250.83 - - [14/Oct/2007:04:03:11 +0200] "GET /research/projects.css HTTP/1.1" 200 761 "http://www.theochem.kth.se/research/xspectra/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"
75.33.250.83 - - [14/Oct/2007:04:03:11 +0200] "GET /research/xspectra/xray_process.gif HTTP/1.1" 200 160024 "http://www.theochem.kth.se/research/xspectra/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"
74.6.28.213 - - [14/Oct/2007:04:05:40 +0200] "GET /publications/abstract.php?id=568 HTTP/1.0" 200 1773 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.20.113 - - [14/Oct/2007:04:07:03 +0200] "GET /publications/ HTTP/1.0" 200 187532 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.29.31 - - [14/Oct/2007:04:08:51 +0200] "GET /molprop/addevent.php?id=60 HTTP/1.0" 200 2350 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
71.235.196.153 - - [14/Oct/2007:04:09:01 +0200] "GET / HTTP/1.1" 200 10409 "http://www.google.com/search?hl=en&q=se+theochem" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
71.235.196.153 - - [14/Oct/2007:04:09:01 +0200] "GET /image/bakgrund.gif HTTP/1.1" 200 107 "http://www.theochem.kth.se/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

Analyze the above log file and produce the following statistics:

- Which HTML files were accessed and how often. Report them in descending order of accesses.
- Which browsers were used and how often. Report them in descending order of usage.

Below is an example (for another log file) of how the output should look:

Most often accessed HTML files:
/favicon.ico                    993
/robots.txt                     561
/theochem.css                   501
/standard.css                   312
/~junjiang/bk.rar               273
/docs/latex/                    249
/%7Eshlchen/job/as-ls-p4-f.log  247
/dalton/user/dalton-2.0.tar.gz  205

Most often used browsers:
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/y  3727
msnbot/1.0 (+http://search.msn.com/msnbot.htm)                          2371
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)                 1930
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/2007  1851
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)                    1456
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4  1210
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)                      1052

Hand in your script and the output (remember to include your name in the file name). Good luck!
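
As a hint, the parsing and counting described above could be sketched roughly as follows. This is only one possible starting point, not a reference solution. The sub names parse_line, count_log and report are made up for this illustration. The sketch assumes that every request counts, whatever its response code, that the requested path is used as-is (including any query string), and that browser strings are cut to 70 characters, which is what the example output appears to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one log line into its fields and return (file, browser).
# Returns an empty list if the line does not look like a log entry.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+) \s+ \S+ \s+ \S+ \s+    # IP address, then two placeholder fields
        \[ ([^\]]+) \] \s+            # date in square brackets
        " \S+ \s+ (\S+) [^"]* " \s+   # request: method, file, protocol
        (\d{3}) \s+ (\S+) \s+         # response code and size
        " [^"]* " \s* " ([^"]*) "     # referring page, then browser
    }x;
    return ($3, $6);
}

# Read a log from a filehandle and count files and browsers.
# Returns two hash references: file => count and browser => count.
sub count_log {
    my ($fh) = @_;
    my (%files, %browsers);
    while (my $line = <$fh>) {
        my ($file, $browser) = parse_line($line);
        next unless defined $file;    # skip lines that do not match
        $files{$file}++;
        $browsers{$browser}++;
    }
    return (\%files, \%browsers);
}

# Format a title line followed by one line per key, highest count first.
# Keys with equal counts are ordered alphabetically (an arbitrary choice).
sub report {
    my ($title, $counts) = @_;
    my @lines = ("$title\n");
    for my $key (sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                 keys %$counts) {
        # %-70.70s pads the key to 70 characters and cuts it off there
        push @lines, sprintf("%-70.70s %d\n", $key, $counts->{$key});
    }
    return @lines;
}

# Only run when a log file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($files, $browsers) = count_log($fh);
    close $fh;
    print report('Most often accessed HTML files:', $files);
    print "\n";
    print report('Most often used browsers:', $browsers);
}
```

Saved under a hypothetical name such as analyze_log.pl, the sketch would be run as: perl analyze_log.pl access.log > output.txt. The optional whitespace (\s*) between the last two quoted fields lets the pattern also accept lines where the referring page and the browser string run together without a space.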