Log analysis is extremely useful for knowing the state of our platform and understanding the behaviour and trends of every component. It also lets us fix mistakes, prevent failures and improve the product. Splunk is the best solution I have tested; its main disadvantage is its price.
Because of Splunk's high cost, I have chosen Kibana + Elasticsearch + Logstash to analyze the logs of my company's CDNs, Akamai and Amazon CloudFront. The main goal of this post is to show a cheaper alternative.
First, we are going to import the Akamai logs. The log format (esw3c_waf) is described in the official Akamai documentation.
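The official sample is not reproduced here, but a purely illustrative line in that format, matching the grok pattern below (client IP, user fields, timestamp, request, status, bytes, referrer, user agent, cookies and the WAF policy/alert/deny fields), would look roughly like this:
192.0.2.10 - - [07/Mar/2014:10:15:32 +0000] GET /index.html HTTP/1.1 200 5120 "http://www.example.com/" "Mozilla/5.0" "name=value" "policy_1|950001|-"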
This is the logstash filter:
filter {
  grok {
    type  => "esw3c_waf"
    match => { "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest}) %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{QS:cookies} \"%{WORD:WafPolicy}\|%{DATA:WafAlertRules}\|%{DATA:WafDenyRules}\"" }
  }
  date {
    type   => "esw3c_waf"
    match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    locale => "en"
  }
}
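For completeness, here is a minimal sketch of the input side for these logs. The path is a hypothetical local directory where the Akamai logs have already been downloaded and uncompressed; only the type really matters, so that the filter above is applied:
input {
  file {
    path => "/var/log/akamai/*.log"   # hypothetical location of the downloaded, uncompressed logs
    type => "esw3c_waf"               # same type the grok and date filters above match on
  }
}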
Next, we can see the CloudFront log format here:
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes
07/01/2012 01:13:11 FRA2 182 192.0.2.10 GET d111111abcdef8.cloudfront.net /view/my/file.html 200 www.displaymyfiles.com Mozilla/4.0%20(compatible;%20MSIE%205.0b1;%20Mac_PowerPC) - zip=98101 RefreshHit MRVMF7KydIvxMWfJIglgwHQwZsbG2IhRJ07sn9AkKUFSHS9EXAMPLE== d111111abcdef8.cloudfront.net http -
On the one hand, logstash has an S3 input to read the gzipped log files directly from the S3 bucket (a sketch of such an input follows the filter below). On the other hand, this is the filter applied:
filter {
  grok {
    type    => "aws"
    pattern => "%{DATE_EU:date}\t%{TIME:time}\t%{WORD:x-edge-location}\t(?:%{NUMBER:sc-bytes}|-)\t%{IPORHOST:c-ip}\t%{WORD:cs-method}\t%{HOSTNAME:cs-host}\t%{NOTSPACE:cs-uri-stem}\t%{NUMBER:sc-status}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User-Agent}\t%{GREEDYDATA:cs-uri-query}\t%{GREEDYDATA:cookies}\t%{WORD:x-edge-result-type}\t%{NOTSPACE:x-edge-request-id}\t%{HOSTNAME:x-host-header}\t%{URIPROTO:cs-protocol}\t%{INT:cs-bytes}"
  }
  mutate {
    type      => "aws"
    add_field => [ "listener_timestamp", "%{date} %{time}" ]
  }
  date {
    type  => "aws"
    # DATE_EU captures the date as dd/MM/yyyy, so parse listener_timestamp accordingly
    match => [ "listener_timestamp", "dd/MM/yyyy HH:mm:ss" ]
  }
}
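And this is a rough sketch of the S3 input mentioned above. The bucket and prefix are hypothetical, and the credentials/region options are left out because their names depend on the Logstash release, so check the s3 input documentation for your version:
input {
  s3 {
    bucket => "my-cloudfront-logs"   # hypothetical bucket where CloudFront delivers its access logs
    prefix => "cdn/"                 # hypothetical key prefix inside the bucket
    type   => "aws"                  # same type the filter above matches on
    # AWS credentials and region options go here; their exact names vary between
    # Logstash releases, so they are omitted from this sketch.
  }
}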
Also, I recommend enabling a Lifecycle rule on the S3 bucket to purge old logs, since they grow very quickly!
Well, the next step is to install Elasticsearch 1.0, which has been released recently; I'm quite proud because Honza Kral announced it to me at FOSDEM 2014.
I want to highlight two essential plugins that sysadmins will love: the first one is HEAD and the other one is MARVEL.
HEAD plugin screenshot: one index per day and per CDN
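The one-index-per-day-and-per-CDN layout visible in HEAD can be produced by the elasticsearch output. This is only a sketch: the host is assumed to be a local Elasticsearch 1.0 node, and the index name simply reuses the type field set by each filter (esw3c_waf or aws):
output {
  elasticsearch {
    host  => "localhost"                 # assumed local Elasticsearch 1.0 node
    index => "%{type}-%{+YYYY.MM.dd}"    # e.g. esw3c_waf-2014.03.07 and aws-2014.03.07
  }
}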
Here are some useful links:
– GROK patterns list
– Tool to build expressions
Finally, we use Kibana, a JavaScript application that reads data from the Elasticsearch instance, to create reports and charts. My first step was to create two basic dashboards.