fiff: monitoring

Showing posts with label monitoring. Show all posts

Tuesday, 8 May 2018

Prometheus Overview and Setup

Overview

Prometheus is an opensource monitoring solution that gathers time series based numerical data. It is a project which was started by Google's ex-employees at SoundCloud.

To monitor your services and infra with Prometheus your service needs to expose an endpoint in the form of port or URL. For example:- {{localhost:9090}}. The endpoint is an HTTP interface that exposes the metrics.

For some platforms such as Kubernetes and skyDNS Prometheus act as directly instrumented software that means you don't have to install any kind of exporters to monitor these platforms. It can directly monitor by Prometheus.

One of the best thing about Prometheus is that it uses a Time Series Database(TSDB) because of that you can use mathematical operations, queries to analyze them. Prometheus uses SQLite as a database but it keeps the monitoring data in volumes.

Pre-requisites

A CentOS 7 or Ubuntu VM
A non-root sudo user, preferably one named prometheus

Installing Prometheus Server

First, create a new directory to store all the files you download in this tutorial and move to it.

mkdir /opt/prometheus-setup
cd /opt/prometheus-setup

Create a user named "prometheus"

useradd prometheus

Use wget to download the latest build of the Prometheus server and time-series database from GitHub.

wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz

The Prometheus monitoring system consists of several components, each of which needs to be installed separately.

Use tar to extract prometheus-2.0.0.linux-amd64.tar.gz:

tar -xvzf ~/opt/prometheus-setup/prometheus-2.0.0.linux-amd64.tar.gz .

Place your executable file somewhere in your PATH variable, or add them into a path for easy access.

mv prometheus-2.0.0.linux-amd64  prometheus
sudo mv  prometheus/prometheus  /usr/bin/
sudo chown prometheus:prometheus /usr/bin/prometheus
mkdir /etc/prometheus
mv prometheus/prometheus.yml /etc/prometheus/
prometheus --version

You should see the following message on your screen:

  prometheus,       version 2.0.0 (branch: HEAD, revision: 0a74f98628a0463dddc90528220c94de5032d1a0)
  build user:       root@615b82cb36b6
  build date:       20171108-07:11:59
  go version:       go1.9.2

Create a service for prometheus

sudo vi /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus

[Service]
User=prometheus
ExecStart=/usr/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /opt/prometheus-setup/

[Install]
WantedBy=multi-user.target

systemctl daemon-reload

systemctl start prometheus

systemctl enable prometheus

Installing Node Exporter

Prometheus was developed for the purpose of monitoring web services. In order to monitor the metrics of your server, you should install a tool called Node Exporter. Node Exporter, as its name suggests, exports lots of metrics (such as disk I/O statistics, CPU load, memory usage, network statistics, and more) in a format Prometheus understands. Enter the Downloads directory and use wget to download the latest build of Node Exporter which is available on GitHub.

Node exporter is a binary which is written in go which monitors the resources such as cpu, ram and filesystem.

wget https://github.com/prometheus/node_exporter/releases/download/v0.15.1/node_exporter-0.15.1.linux-amd64.tar.gz

You can now use the tar command to extract : node_exporter-0.15.1.linux-amd64.tar.gz

tar -xvzf node_exporter-0.15.1.linux-amd64.tar.gz .

mv node_exporter-0.15.1.linux-amd64 node-exporter

Perform this action:-

mv node-exporter/node_exporter /usr/bin/

Running Node Exporter as a Service

Create a user named "prometheus" on the machine on which you are going to create node exporter service.

useradd prometheus

To make it easy to start and stop the Node Exporter, let us now convert it into a service. Use vi or any other text editor to create a unit configuration file called node_exporter.service.

sudo vi /etc/systemd/system/node_exporter.service

This file should contain the path of the node_exporter executable, and also specify which user should run the executable. Accordingly, add the following code:

[Unit]
Description=Node Exporter

[Service]
User=prometheus
ExecStart=/usr/bin/node_exporter

[Install]
WantedBy=default.target

Save the file and exit the text editor. Reload systemd so that it reads the configuration file you just created.

sudo systemctl daemon-reload

At this point, Node Exporter is available as a service which can be managed using the systemctl command. Enable it so that it starts automatically at boot time.

sudo systemctl enable node_exporter.service

You can now either reboot your server or use the following command to start the service manually:

sudo systemctl start node_exporter.service

Once it starts, use a browser to view Node Exporter's web interface, which is available at http://your_server_ip:9100/metrics. You should see a page with a lot of text:

Starting Prometheus Server with a new node

Before you start Prometheus, you must first edit a configuration file for it called prometheus.yml.

vim /etc/prometheus/prometheus.yml

Copy the following code into the file.

# my global configuration which means it will applicable for all jobs in file
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. scrape_interval should be provided for scraping data from exporters 
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. Evaluation interval checks at particular time is there any update on alerting rules or not.

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'. Here we will define our rules file path 
#rule_files:
#  - "node_rules.yml"
#  - "db_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape: In the scrape config we can define our job definitions
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node-exporter'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'. 
    # target are the machine on which exporter are running and exposing data at particular port.
    static_configs:
      - targets: ['localhost:9100']

After adding configuration in prometheus.yml. We should restart the service by

systemctl restart prometheus

This creates a scrape_configs section and defines a job called a node. It includes the URL of your Node Exporter's web interface in its array of targets. The scrape_interval is set to 15 seconds so that Prometheus scrapes the metrics once every fifteen seconds. You could name your job anything you want, but calling it "node" allows you to use the default console templates of Node Exporter.

Use a browser to visit Prometheus's homepage available at http://your_server_ip:9090. You'll see the following homepage. Visit http://your_server_ip:9090/consoles/node.html to access the Node Console and click on your server, localhost:9100, to view its metrics.

Saturday, 16 September 2017

EC2 Ssh Connection Refused

When ssh: connect to host ip_address port 22 Connection refused

Unable to access server???

Exactly when you see the error - “ssh: connect to host ip_address port 22: Connection refused” while connecting your AWS EC2 Instance. In order to find solution of the problem, you will go to AWS forum and other channels where you need to answers several questions first. But it's very difficult to find the actual problem.

In order to get clues what the problem is, we should provide as many details as possible about what we have tried and the results we are getting. Because there are hundreds of reason why a server or service might not be accessible, also connectivity is one of the toughest issue to diagnose, especially when you are hosting something critical on your box.

I've seen several topics on this problem, but none offers a solution to it. I was not aware for what should I look at first. So I walk through from the very basics and investigated the following thing

Use of verbose while ssh

$ ssh -vvv user@x.x.x.x

This didn’t help me as I haven't found any meaningful information except connection refused.

After that I looked for my security groups, well they haven’t provide me any hint for further steps.
Then I tried telnet at port 22 from my public and private network which was again a hard luck for me.

$ telnet X.X.X.X 22

Tried creating AMI and building new instance of it.
I've mounted the EBS of a broken instance on a running instance, look for the file configuration of my ssh.

$ cat /etc/ssh/sshd_config

and compare that with running instance.

Also checked for the entries in /etc/fstab, but entries were all perfect as per knowledge.
Tried starting the instance from the broken instance, but again the same error occured on the screen.

Coming to AWS UI console :-

Further moved over the AWS UI, under Action I found option to put user data

So below entry were made

#cloud-config:

snappy:

ssh_enabled: True

I had gone through different option in UI , just went through the system logs

And found that the issue is with swap, which is showing error while mounting.

So I stopped the broken instance and mount the broken ebs volume to the running one and commented the swap entry from /etc/fstab

Finally I found that my instance is up and running, again I looked for the system logs in aws UI, where login was prompt was able to access my instance again.

Conclusion :-

If you come across any such error then follow the AWS console of the machine & look for the issue and get to the core of the problem.

Thursday, 3 October 2013

System Monitoring

One of the main task of a system administrator is system monitoring, system monitoring usually involves monitoring the ram & disk space usage of the system .... In this blog I'll be talking about my experience as a system admin & how I do it.

Usually system monitoring is divided into 2 parts Continuous system monitoring and troubleshooting system issues when system crosses a threshold value & you have to figure out the issue & try to resolve it.

In continuous system monitoring a system is put under continuous monitoring i.e the system ram usage is within defined limit or not, the disk space occupancy should not cross a predefined threshold .... To achieve continuous monitoring you can use couple of tools available in market such as nagios, omd we are primarily using these tools their would be other tools available also for this purpose.

Continuous system monitoring serves one purpose where they notify about any deviation from the expected state of the system, the next step is to troubleshoot this issue & resolve it accordingly. As a first step I usually execute top command, top is a very powerful command apart from just viewing the processes activity in real you can do a lot of things i.e

If you want to add/remove fields : press f & then you can choose the fields to add/remove
If you want to change ordering of fields : press O & then you can move fields
If you want to change the sort order : press F or O

there are lot of other options available as well, if you want to explore them pressing h will provide you a list of all the options.

You can also read about htop, htop is an advanced form of top where you can view some graphs as well though I'ven't used htop so much but I'm planning to :)

One thing to note sometimes you are not able ot run top command due to high resource utilization, in that case you have to use cat /proc/loadavg to view the load on the system & cat /proc/meminfo to view current memory state of the system.

One of the useful command if top doesn't work
ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -5
This command will give the top 5 processes by memory usage.

Also there are couple of other commands that you can use
free : To view the memory usage of system
df : To view the file system information
du : To view the disk usage

One tip : To increase the memory of system you can create a swap memory & it is always recommended to create a swap on a partition only. Another best practice for swap area is if your system RAM is below 8 gb your swap area should be double of your ram otherwise it should be half of your RAM size