Archive for October, 2009


Seeing a real-time breakdown of web traffic by vhost

Occasionally our servers are hit by traffic spikes. Since we typically host a number of websites per server, we need a quick way to determine which site is receiving the bulk of the incoming requests. (Once we know which site it is, we can, for example, improve caching on it.) To see in real time which vhosts are being requested, we use the following awk script:

histo.awk

# creates a histogram of values in the first column of piped-in data
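# and reports the timestamps (column 6) of the first and last lines seen

# return the largest value stored in arr; the extra parameters after
# the gap are awk's idiom for declaring local variables
function max(arr,    big, k) {
    big = 0;
    for (k in arr) {
        if (arr[k] > big) { big = arr[k]; }
    }
    return big
}

NF > 0 {
    cat[$1]++;                     # count requests per vhost
    if (!start) { start = $6 }     # timestamp of the first line
    end = $6                       # timestamp of the last line
}
END {
    printf "from %s to %s\n", start, end
    maxm = max(cat);
    if (maxm == 0) { exit }        # no input: avoid dividing by zero
    for (i in cat) {
        scaled = 60 * cat[i] / maxm;       # the longest bar is 60 characters
        printf "%-25.25s  [%8d]:", i, cat[i]
        for (j = 0; j < scaled; j++) {     # separate counter so i is not clobbered
            printf "#";
        }
        printf "\n";
    }
}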

It can be used like this:

watch 'tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3'

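This displays a histogram of how often each vhost occurs in the last 100 lines of the Apache log, refreshed every 2 seconds (the default interval for watch). The sort -nrk3 step sorts the lines numerically, in reverse, on the third whitespace-separated field, which begins with the request count, so the most frequently requested vhosts appear at the top; the "from ... to ..." line has no number in that field and ends up at the bottom. (Note that this assumes an Apache log format with the vhost in the first column and the bracketed request timestamp in the sixth, since the script reads the "from" and "to" times from $6.) It looks something like this: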

Every 2.0s: tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3       Thu Oct  1 09:51:41 2009

www.dogwoodinitiative.org  [      49]:############################################################
www.wildliferecreation.or  [      24]:##############################
www.earthministry.org      [      14]:##################
blogs.onenw.org            [       3]:####
www.tilth.org              [       2]:###
www.oeconline.org          [       2]:###
www.audubonportland.org    [       1]:##
oraction.org               [       1]:##
oeconline.org              [       1]:##
dogwoodinitiative.org      [       1]:##
bandon.onenw.org           [       1]:##
209.40.194.148             [       1]:##
from [01/Oct/2009:09:51:21 to [01/Oct/2009:09:48:40
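
The bar lengths in this sample follow from the scaling in the END block. Each count is scaled so that the busiest vhost gets 60 characters, and because the inner loop prints one "#" for every integer counter value from 0 up to (but not including) the scaled value, the bar length is the scaled value rounded up. That is why a vhost with a single request still gets a visible two-character bar here:

$$
\text{bar}(v) = \left\lceil \frac{60 \cdot \text{count}(v)}{\max_{u} \text{count}(u)} \right\rceil,
\qquad
\left\lceil \tfrac{60 \cdot 24}{49} \right\rceil = \lceil 29.39\ldots \rceil = 30,
\qquad
\left\lceil \tfrac{60 \cdot 1}{49} \right\rceil = \lceil 1.22\ldots \rceil = 2
$$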

(Another useful variant produces a histogram of requests by client IP address, which can help determine what to block during a DoS attack.)
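
A minimal sketch of that IP variant as a POSIX shell function is shown below. It assumes the client IP is in the second column, as it would be with a log format beginning "%v %h" (vhost, then remote host); that column number, the function name ip_histo, and its arguments are illustrative assumptions rather than part of the setup above. Instead of drawing bars, it prints plain counts and sorts them with the busiest clients first:

```shell
# ip_histo FILE [N]: count requests per client IP in the last N lines
# (default 100) of an Apache access log, busiest IPs first.
# ASSUMPTION (hypothetical layout): the client IP is in column 2,
# e.g. a "%v %h ..." log format; change $2 if yours differs.
ip_histo() {
    tail -n "${2:-100}" "$1" |
        awk 'NF > 1 { count[$2]++ }                  # tally each client IP
             END { for (ip in count) printf "%8d %s\n", count[ip], ip }' |
        sort -nr                                      # highest counts first
}
```

For example, ip_histo /var/log/apache2/access_log 1000 | head -n 20 lists the twenty busiest clients among the last 1000 requests. Because watch runs its command through a fresh sh -c, a function defined in your interactive shell is not visible there; to refresh the output continuously, put the pipeline in a script file and watch that instead.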