Kibana helps us analyze CDN logs

Log analysis is extremely useful for knowing the state of a platform and understanding the behavior and trends of each of its components. Furthermore, it allows us to fix mistakes, prevent failures, and improve the product. Splunk is the best solution I have tested; its main disadvantage is its price.

Due to the high cost of Splunk, I have chosen Kibana + Elasticsearch + Logstash to analyze the logs from my company's CDNs, such as Akamai and Amazon CloudFront. The main goal of this post is to show a much cheaper alternative.

First, we are going to import the Akamai logs. This is the log format from the official documentation:

waf_logformat

apache_logformat

This is the Logstash filter:

filter {
    grok {
      type => "esw3c_waf"
      match => { "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest}) %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:cookies} \"%{WORD:WafPolicy}\|%{DATA:WafAlertRules}\|%{DATA:WafDenyRules}\"" }
    }

    date {
      type => "esw3c_waf"
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      locale => "en"
    }

}
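To sanity-check the grok expression, here is a minimal Python sketch that parses a hypothetical esw3c_waf line with an equivalent regular expression. The sample line and the regex are my own assumptions mirroring the grok pattern above, not Akamai's exact output:

```python
import re

# Hypothetical sample line in the esw3c_waf format (not real traffic).
line = ('192.0.2.1 - - [10/Feb/2014:12:34:56 +0000] '
        '"GET /index.html HTTP/1.1" 200 1234 '
        '"-" "Mozilla/5.0" "-" "default_policy|950001|981176"')

# Plain-regex equivalent of the grok pattern: client info, timestamp,
# request line, response, bytes, three quoted strings, then the WAF triple.
pattern = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d+) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookies>[^"]*)" '
    r'"(?P<WafPolicy>[^|"]*)\|(?P<WafAlertRules>[^|"]*)\|(?P<WafDenyRules>[^"]*)"'
)

m = pattern.match(line)
fields = m.groupdict()
print(fields['clientip'], fields['WafPolicy'], fields['WafDenyRules'])
```

If the regex fails to match a real log line, comparing it field by field against the grok pattern is a quick way to find which token diverges.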

Moreover, we can see the CloudFront log format here:

#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes
07/01/2012 01:13:11 FRA2 182 192.0.2.10 GET d111111abcdef8.cloudfront.net /view/my/file.html 200 www.displaymyfiles.com Mozilla/4.0%20(compatible;%20MSIE%205.0b1;%20Mac_PowerPC) - zip=98101 RefreshHit MRVMF7KydIvxMWfJIglgwHQwZsbG2IhRJ07sn9AkKUFSHS9EXAMPLE== d111111abcdef8.cloudfront.net http -

On the one hand, Logstash has an S3 input type to read gzipped log files directly from an S3 bucket. On the other hand, this is the filter applied:
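As a sketch, the S3 input could look like this; the bucket name, prefix, and region are placeholders, so check the s3 input options for your Logstash version:

```
input {
  s3 {
    # Placeholder values -- replace with your own bucket, prefix and region.
    bucket => "my-cloudfront-logs"
    prefix => "cdn/"
    region => "eu-west-1"
    type   => "aws"
  }
}
```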

filter {
    grok {
      type => "aws"
      pattern => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x-edge-location}\t(?:%{NUMBER:sc-bytes}|-)\t%{IPORHOST:c-ip}\t%{WORD:cs-method}\t%{HOSTNAME:cs-host}\t%{NOTSPACE:cs-uri-stem}\t%{NUMBER:sc-status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User-Agent}\t%{GREEDYDATA:cs-uri-query}\t%{GREEDYDATA:cookies}\t%{WORD:x-edge-result-type}\t%{NOTSPACE:x-edge-request-id}\t%{HOSTNAME:x-host-header}\t%{URIPROTO:cs-protocol}\t%{INT:cs-bytes}"
    }

    mutate {
      type => "aws"
      add_field => [ "listener_timestamp", "%{date} %{time}" ]
    }

    date {
      type => "aws"
      match => [ "listener_timestamp", "dd/MM/yyyy HH:mm:ss" ]
    }
}
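The mutate and date steps above can be sketched in Python: the sample line from the CloudFront documentation is split on tabs, date and time are joined into a listener_timestamp field, and the URL-encoded user agent is decoded. The field names are my own labels following the #Fields: header:

```python
import urllib.parse

# The documentation sample line shown above; fields are tab-separated
# in the real gzipped log files.
line = ("07/01/2012\t01:13:11\tFRA2\t182\t192.0.2.10\tGET\t"
        "d111111abcdef8.cloudfront.net\t/view/my/file.html\t200\t"
        "www.displaymyfiles.com\t"
        "Mozilla/4.0%20(compatible;%20MSIE%205.0b1;%20Mac_PowerPC)\t"
        "-\tzip=98101\tRefreshHit\t"
        "MRVMF7KydIvxMWfJIglgwHQwZsbG2IhRJ07sn9AkKUFSHS9EXAMPLE==\t"
        "d111111abcdef8.cloudfront.net\thttp\t-")

FIELDS = ["date", "time", "x-edge-location", "sc-bytes", "c-ip", "cs-method",
          "cs-host", "cs-uri-stem", "sc-status", "referrer", "user-agent",
          "cs-uri-query", "cookies", "x-edge-result-type", "x-edge-request-id",
          "x-host-header", "cs-protocol", "cs-bytes"]

record = dict(zip(FIELDS, line.split("\t")))
# Reproduce the mutate step: join date and time into one timestamp field.
record["listener_timestamp"] = f"{record['date']} {record['time']}"
# CloudFront URL-encodes the user agent; decode it for readability.
record["user-agent"] = urllib.parse.unquote(record["user-agent"])
print(record["listener_timestamp"], record["sc-status"], record["user-agent"])
```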

Also, I recommend enabling a Lifecycle rule on the S3 bucket to purge old logs; they grow very quickly!
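For example, a lifecycle rule that expires log objects after 30 days might look like this; the rule ID, prefix, and retention period are assumptions, so adjust them to your bucket layout:

```
<LifecycleConfiguration>
  <Rule>
    <!-- Placeholder ID and prefix; 30 days is an arbitrary retention choice. -->
    <ID>expire-cdn-logs</ID>
    <Prefix>cdn/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```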

Well, the next step is to install Elasticsearch 1.0, which has been released recently; I learned about the release from Honza Kral at FOSDEM 2014.

I want to highlight two essential plugins that sysadmins will like. The first is HEAD and the second is Marvel.

HEAD plugin screenshot, one index per day and CDN
ElasticHead

Marvel plugin screenshot:
Marvel

Here are some useful links:
GROK patterns list
Tool to build expressions

Finally, we use Kibana, a JavaScript application that reads data from the Elasticsearch instance, to create reports and charts. My first step was to create two basic dashboards.

AWS Cloudfront Dashboard
CloudfrontDashboard

Akamai Dashboard
AkamaiDashboard
