CloudWatch Monitoring Setup on AL2023

🎯 Goal

Set up comprehensive monitoring for an EC2 instance running nginx by installing and configuring the CloudWatch Agent to collect system metrics and application logs.

📋 Prerequisites

Before beginning this exercise, you should:

  • Have an EC2 instance running Amazon Linux 2023
  • Understand basic IAM concepts (roles and policies)
  • Be familiar with SSH and Linux command line
  • Have basic knowledge of web servers (nginx)

📚 Learning Objectives

By the end of this exercise, you will:

  • Configure IAM roles with proper permissions for CloudWatch and Systems Manager
  • Install and configure CloudWatch Agent to collect metrics and logs
  • Use Systems Manager Parameter Store to centrally manage agent configuration
  • Create a CloudWatch Dashboard to visualize system and application metrics
  • Understand the difference between default EC2 metrics and custom CloudWatch Agent metrics

πŸ“ Why This Matters

In real-world applications, monitoring is crucial because:

  • It enables proactive issue detection before customers are affected
  • It’s essential for troubleshooting production incidents
  • CloudWatch is AWS’s native monitoring solution, making it the standard for AWS workloads
  • Understanding IAM permissions for monitoring is critical for security

🔧 Step-by-Step Instructions

Step 1: Configure IAM Role for Your EC2 Instance

First, create an IAM role with the necessary permissions for CloudWatch Agent.

  1. Navigate to IAM Console → Roles → Create role
  2. Select AWS service → EC2 → Next
  3. Search and attach this AWS managed policy: CloudWatchAgentAdminPolicy
  4. Name the role: CloudWatch-Agent-Role
  5. Attach the role to your EC2 instance:
    • EC2 Console → Select instance → Actions → Security → Modify IAM role
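
The console steps above can also be scripted. The following is a minimal sketch using the AWS CLI, assuming the CLI is installed and configured with credentials that are allowed to manage IAM. The instance ID, the trust policy file path, and the choice to reuse the role name as the instance profile name are placeholders for illustration, not values from this exercise.

```shell
# Sketch only: create the role from Step 1 with the AWS CLI.
ROLE_NAME="CloudWatch-Agent-Role"
TRUST_POLICY_FILE="${TMPDIR:-/tmp}/cw-agent-trust-policy.json"

# Trust policy that lets EC2 assume the role (the scripted equivalent of
# choosing "AWS service -> EC2" in the console).
cat > "$TRUST_POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# create_agent_role INSTANCE_ID
# Creates the role, attaches the managed policy, wraps the role in an
# instance profile, and associates that profile with the instance.
create_agent_role() {
    instance_id="$1"
    aws iam create-role --role-name "$ROLE_NAME" \
        --assume-role-policy-document "file://$TRUST_POLICY_FILE" &&
    aws iam attach-role-policy --role-name "$ROLE_NAME" \
        --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentAdminPolicy &&
    aws iam create-instance-profile --instance-profile-name "$ROLE_NAME" &&
    aws iam add-role-to-instance-profile \
        --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME" &&
    aws ec2 associate-iam-instance-profile --instance-id "$instance_id" \
        --iam-instance-profile "Name=$ROLE_NAME"
}

# Example (placeholder instance ID):
# create_agent_role i-0123456789abcdef0
```

IAM changes can take a few seconds to propagate, so the final association step may need a retry if it runs immediately after the profile is created.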

💡 Information

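  • CloudWatchAgentAdminPolicy: Allows the agent to write metrics and logs to CloudWatch, and to read and write agent configurations in Parameter Store under names starting with AmazonCloudWatch-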
  • Without SSM permissions, you cannot store configurations centrally or use the ssm: prefix

⚠️ Common Mistakes

  • Forgetting SSM permissions causes “Access Denied” when saving to Parameter Store
  • Not attaching the role to the EC2 instance means no agent metrics or logs will appear in CloudWatch

If your role still cannot write to Parameter Store (for example, because it uses a policy without ssm:PutParameter), you might need to set up your own policy like this one:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowWriteCloudWatchAgentConfigToSSM",
            "Effect": "Allow",
            "Action": "ssm:PutParameter",
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}
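
One way to apply the policy above is to attach it to the role as an inline policy. This sketch assumes the JSON has been saved to a local file; the file name ssm-put-parameter-policy.json and the policy name AllowCWAgentConfigToSSM are hypothetical.

```shell
# attach_ssm_write_policy ROLE_NAME POLICY_FILE
# Attaches the JSON document in POLICY_FILE to the role as an inline policy.
attach_ssm_write_policy() {
    role_name="$1"
    policy_file="$2"
    if [ ! -r "$policy_file" ]; then
        echo "Policy file not found: $policy_file" >&2
        return 1
    fi
    aws iam put-role-policy \
        --role-name "$role_name" \
        --policy-name AllowCWAgentConfigToSSM \
        --policy-document "file://$policy_file"
}

# Example (hypothetical file name):
# attach_ssm_write_policy CloudWatch-Agent-Role ./ssm-put-parameter-policy.json
```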

Step 2: Install Nginx and CloudWatch Agent

  1. SSH into your EC2 instance

  2. Install nginx as our sample application:

    # Update system and install nginx
    sudo dnf update -y
    sudo dnf install nginx -y
    
    # Start nginx
    sudo systemctl start nginx
    sudo systemctl enable nginx
    
    # Create a test page
    echo "<h1>CloudWatch Test</h1>" | sudo tee /usr/share/nginx/html/index.html
  3. Download and install CloudWatch Agent:

    # Download the agent
    wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
    
    # Install it
    sudo rpm -U ./amazon-cloudwatch-agent.rpm
  4. Install CollectD:

    # Install CollectD
    sudo dnf install collectd -y
  5. Amazon Linux 2023 uses journald for logging. To have logs written to the traditional files under /var/log, install rsyslog:

    # Install and activate rsyslog
    sudo dnf install rsyslog -y
    sudo systemctl start rsyslog
    sudo systemctl enable rsyslog
    
    # Configure SSH to log to traditional files
    echo "SyslogFacility AUTH" | sudo tee -a /etc/ssh/sshd_config
    echo "LogLevel INFO" | sudo tee -a /etc/ssh/sshd_config
    
    # Restart sshd
    sudo systemctl restart sshd

💡 Information

  • Package Installation: The RPM creates a cwagent user and installs files in /opt/aws/amazon-cloudwatch-agent/
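  • CollectD: The wizard in Step 3 enables CollectD collection, so the collectd package needs to be installed; without it the agent may fail to start
  • The agent can collect both system-level metrics (such as memory, disk, and swap usage) and custom application metrics (via StatsD or CollectD)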

After these steps, the logs are written to:

  • Nginx: /var/log/nginx/access.log
  • SSH: /var/log/secure
  • System: /var/log/messages
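
Checking that these files exist before pointing the agent at them can save a debugging round later. The helper below is a small sketch; the function name check_log_files is made up for this example.

```shell
# check_log_files FILE...
# Prints one line per file and returns non-zero if any file is missing.
check_log_files() {
    missing=0
    for log_file in "$@"; do
        if [ -e "$log_file" ]; then
            echo "OK       $log_file"
        else
            echo "MISSING  $log_file"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run as root if the log directories are not world-searchable):
# check_log_files /var/log/nginx/access.log /var/log/secure /var/log/messages
```

If /var/log/secure or /var/log/messages is reported missing, rsyslog may not be running yet, or no matching event has been logged since it started.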

Step 3: Configure CloudWatch Agent Using the Wizard

  1. Run the configuration wizard:

    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
  2. Select these options in the wizard:

    OS:                     1 (linux)
    EC2 or On-Premises:     1 (EC2)
    User:                   2 (root)
    StatsD:                 2 (no)
    CollectD:               1 (yes)
    Host metrics:           1 (yes)
    CPU per core:           2 (no)
    EC2 dimensions:         1 (yes)
    Collection interval:    4 (60s)
    Metrics level:          2 (Standard)
    Satisfied:              1 (yes)
    Import existing config: 2 (no)
    Monitor log files:      1 (yes)
  3. Configure Nginx access log as source:

    Nginx log:
      File path:   /var/log/nginx/access.log
      Group name:  access.log
      Stream name: {instance_id}
      Add another: 2 (no)
  4. Save to Parameter Store:

    Store in Parameter Store: 1 (yes)
    Parameter name:          AmazonCloudWatch-linux
    Region:                  [press Enter]
    Credentials:             1 (use IAM role)

💡 Information

  • Standard Metrics: Includes CPU, memory, disk, and swap usage
  • Log Groups: Organize different log types for easier querying
  • Parameter Store: Enables centralized configuration management across multiple instances
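
The wizard turns your answers into a JSON configuration. The sketch below writes an approximation of that output to a scratch file so you can see how the answers map to settings; the real file is likely to contain more metrics and keys, and details vary by agent version. The scratch path and the helper name show_stored_config are assumptions for this example.

```shell
# Illustrative only: an approximation of the wizard output for the answers
# chosen in Step 3. This writes to a scratch file, not the agent's real config.
SAMPLE_CONFIG="${TMPDIR:-/tmp}/cw-agent-sample-config.json"

cat > "$SAMPLE_CONFIG" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "access.log",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": { "metrics_aggregation_interval": 60 },
      "disk": { "measurement": ["used_percent"], "resources": ["*"] },
      "mem": { "measurement": ["mem_used_percent"] },
      "swap": { "measurement": ["swap_used_percent"] }
    }
  }
}
EOF

# show_stored_config PARAMETER_NAME
# Prints the configuration JSON that the wizard saved to Parameter Store.
show_stored_config() {
    aws ssm get-parameter --name "$1" \
        --query Parameter.Value --output text
}

# Example:
# show_stored_config AmazonCloudWatch-linux
```

Comparing the stored parameter with the local file at /opt/aws/amazon-cloudwatch-agent/bin/config.json is a quick way to confirm the upload worked.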

Step 4: Start the CloudWatch Agent

  1. Option 1: Start the agent using the configuration stored in Parameter Store:

    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
        -a fetch-config \
        -m ec2 \
        -s \
        -c ssm:AmazonCloudWatch-linux
  2. Option 2: Start the agent using the local configuration file written by the wizard:

    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
        -a fetch-config \
        -m ec2 \
        -s \
        -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
  3. Verify it’s running:

    # Check status
    sudo systemctl status amazon-cloudwatch-agent
    
    # View logs if there are issues
    sudo tail -f /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

💡 Information

  • fetch-config: Downloads and applies the configuration
  • -m ec2: Uses EC2 metadata for instance information
  • -s: Starts the agent after configuration
  • ssm: prefix: Fetches config from Parameter Store (requires SSM permissions)

⚠️ Common Mistakes

  • Using ssm: without SSM permissions will fail with “Access Denied”
  • Expecting metrics immediately: the agent needs 2-5 minutes before metrics appear in CloudWatch
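
Besides systemctl, the agent's control script has a status action that prints a small JSON document with a "status" field. The helper below is a sketch that pulls that field out with sed so it can be used in scripts; the function name agent_status_from_json is hypothetical, and the parsing assumes the field appears as "status": "value".

```shell
# agent_status_from_json
# Reads the agent's status JSON on stdin and prints the value of "status".
# The leading quote in the pattern keeps it from matching "configstatus".
agent_status_from_json() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example:
# sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
#     -m ec2 -a status | agent_status_from_json
```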

Step 5: Create CloudWatch Dashboard

  1. Navigate to CloudWatch Console → Dashboards → Create dashboard

  2. Name: test

  3. Add four widgets:

    Widget 1 - CPU:

    • Type: Number
    • Metrics → EC2 → Per-Instance Metrics
    • Select: CPUUtilization

    Widget 2 - Memory:

    • Type: Number
    • Metrics → CWAgent → Select your instance
    • Select: mem_used_percent

    Widget 3 - Disk:

    • Type: Number
    • Metrics → CWAgent → Select your instance
    • Select: disk_used_percent

    Widget 4 - Nginx Access Logs:

    • Type: Logs table
    • Log group: access.log
    • Query: fields @timestamp, @message | sort @timestamp desc | limit 10
  4. Save the dashboard

💡 Information

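  • EC2 Namespace (AWS/EC2): Contains the default metrics EC2 publishes without an agent, such as CPU utilization, network traffic, and status checks; memory and disk usage are not included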
  • CWAgent Namespace: Contains custom metrics from CloudWatch Agent
  • Metrics may take 5 minutes to appear after agent startup
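
The same dashboard can be created from the command line. The sketch below builds a dashboard body with the CPU widget and the log widget, then passes it to aws cloudwatch put-dashboard. The instance ID, region, and output path are placeholders. The memory and disk widgets are left out on purpose: the dimensions on CWAgent metrics depend on your agent configuration, so it is safer to copy them from the console's Source tab than to guess.

```shell
# write_dashboard_body INSTANCE_ID REGION OUTPUT_FILE
# Writes a dashboard body with a CPU number widget and a logs table widget.
write_dashboard_body() {
    instance_id="$1"
    region="$2"
    cat > "$3" <<EOF
{
  "widgets": [
    {
      "type": "metric", "x": 0, "y": 0, "width": 6, "height": 6,
      "properties": {
        "title": "CPU",
        "view": "singleValue",
        "region": "$region",
        "stat": "Average",
        "period": 300,
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "$instance_id"]]
      }
    },
    {
      "type": "log", "x": 6, "y": 0, "width": 18, "height": 6,
      "properties": {
        "title": "Nginx access log",
        "view": "table",
        "region": "$region",
        "query": "SOURCE 'access.log' | fields @timestamp, @message | sort @timestamp desc | limit 10"
      }
    }
  ]
}
EOF
}

# create_dashboard NAME BODY_FILE
create_dashboard() {
    aws cloudwatch put-dashboard \
        --dashboard-name "$1" \
        --dashboard-body "file://$2"
}

# Example (placeholder instance ID and region):
# write_dashboard_body i-0123456789abcdef0 us-east-1 /tmp/dashboard.json
# create_dashboard test /tmp/dashboard.json
```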

🧪 Final Tests

Run the Application and Validate Your Work

  1. Generate test data:

    # Install stress tool
    sudo dnf install -y stress-ng
    
    # Generate CPU load
    sudo stress-ng --cpu 2 --timeout 30s
    
    # Generate nginx traffic
    for i in {1..20}; do curl localhost; done
    
    # Create a system log entry (written to /var/log/messages via rsyslog)
    logger -t TEST "CloudWatch monitoring test"
  2. Open CloudWatch Dashboard and verify:

    • CPU utilization increases during stress test
    • Memory and disk percentages are displayed
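    • Nginx access log entries from the curl requests appear in the logs widget
    • Your logger test message appears only if you also added /var/log/messages as a log source in the wizard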
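
To cross-check what the dashboard shows, you can tally the requests nginx logged locally and list the metric names the agent has published. This is a sketch: count_status_codes assumes the default nginx access log format, where the status code is the ninth whitespace-separated field, and both function names are made up for this example.

```shell
# count_status_codes ACCESS_LOG
# Prints "STATUS COUNT" for each HTTP status code found in the log.
count_status_codes() {
    awk '{ counts[$9]++ } END { for (code in counts) print code, counts[code] }' "$1" | sort
}

# list_agent_metrics
# Lists the metric names currently present in the CWAgent namespace.
list_agent_metrics() {
    aws cloudwatch list-metrics --namespace CWAgent \
        --query 'Metrics[].MetricName' --output text
}

# Example:
# sudo sh -c 'cat /var/log/nginx/access.log' > /tmp/access.log
# count_status_codes /tmp/access.log
# list_agent_metrics
```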

✅ Expected Results

  • All three number widgets display percentage values
  • CPU shows spike during stress test (may take 1-2 minutes)
  • Logs table shows recent nginx access log entries
  • No error messages in agent logs

🔧 Troubleshooting

If you encounter issues:

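  • No metrics appearing: Check that the IAM role from Step 1 is attached to the instance and includes CloudWatchAgentAdminPolicy
  • SSM errors: Ensure the role allows ssm:GetParameter and ssm:PutParameter on AmazonCloudWatch-* parameters (through CloudWatchAgentAdminPolicy or the custom policy from Step 1)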
  • Agent won’t start: Review logs at /opt/aws/amazon-cloudwatch-agent/logs/
  • Missing CWAgent namespace: Wait 5 minutes, then restart agent

🚀 Optional Challenge

Want to take your learning further? Try:

  • Adding an alarm when CPU exceeds 80%
  • Creating a CloudWatch Logs Insights query to analyze nginx response codes
  • Setting up SNS notifications for critical metrics
  • Configuring log retention policies to manage costs
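
As a starting point for the first and third challenges, here is a sketch of a CPU alarm created with aws cloudwatch put-metric-alarm. The alarm name pattern, the 5-minute period, and the single evaluation period are assumptions chosen for illustration; the optional SNS topic ARN is a placeholder you would replace with your own topic.

```shell
# create_cpu_alarm INSTANCE_ID [SNS_TOPIC_ARN]
# Creates an alarm that fires when average CPU exceeds 80% over 5 minutes.
create_cpu_alarm() {
    instance_id="$1"
    topic_arn="${2:-}"
    # Build the argument list in the positional parameters.
    set -- cloudwatch put-metric-alarm \
        --alarm-name "high-cpu-$instance_id" \
        --namespace AWS/EC2 \
        --metric-name CPUUtilization \
        --dimensions "Name=InstanceId,Value=$instance_id" \
        --statistic Average \
        --period 300 \
        --evaluation-periods 1 \
        --threshold 80 \
        --comparison-operator GreaterThanThreshold
    # Notify an SNS topic only when one was supplied.
    if [ -n "$topic_arn" ]; then
        set -- "$@" --alarm-actions "$topic_arn"
    fi
    aws "$@"
}

# Example (placeholder instance ID):
# create_cpu_alarm i-0123456789abcdef0
```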

Done! 🎉

Great job! You’ve successfully implemented CloudWatch monitoring and learned how to use Systems Manager Parameter Store for configuration management. This setup provides comprehensive visibility into your EC2 instances and applications! 🚀