AWS Monitoring and Reporting

1. What is Observability?

- The ability to measure and understand how internal systems work in order to answer questions regarding performance, security, and faults within a system / application.

- To obtain observability you need to use: 

    * Metrics - A number that is measured over a period of time eg. if we measured the CPU usage and aggregated it over a period of time we could have an Average CPU

    * Logs - A text file where each line contains event data about what happened at a certain time.

    * Traces - A history of a request as it travels through multiple apps/services, so we can pinpoint performance issues or failures

    * Alarms - sometimes considered the fourth pillar of observability

- You have to use them together; using them in isolation does not gain you observability

2. CloudWatch


- CloudWatch is a monitoring solution for AWS resources

- CloudWatch is a centralized log management service

- CloudWatch is an umbrella service, meaning it is really a collection of monitoring tools as follows: 

    * Logs - any custom log data, Application Logs, Nginx Logs, Lambda Logs

    * Metrics - Represents a time-ordered set of data points. A variable to monitor eg. Memory Usage

    * Events - trigger an event based on a condition eg. every hour take a snapshot of a server (now known as Amazon EventBridge)

    * Alarms - triggers notifications based on metrics which breach a defined threshold

    * Dashboards - create visualizations based on metrics

    * ServiceLens - visualize and analyze the health, performance, availability of your app in a single place 

    * Container Insights - collects, aggregates, and summarizes metrics and logs from your containerized apps and microservices

    * Synthetics - test your web-apps to see if they're broken 

    * Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time

- All these CloudWatch services are built on top of CloudWatch Logs.


<CW> IMAGE



3. CloudWatch Logs 

- is used to :

    * monitor

    * store

    * access log files

- detail services:

    * Export Logs to S3 - You can export Logs to S3 to do things like perform custom analysis

    * Stream to Elasticsearch Service (ES) - You can stream logs to an ES cluster in near real-time to have more robust full-text search or use with the ELK stack

    * Stream CloudTrail Events to CloudWatch Logs - You can turn on CloudTrail to stream event data to CloudWatch Log Group

    * Log Security - By default, log groups are encrypted at rest using SSE. You can use your own Customer Master Keys (CMKs) with AWS KMS

    * Log Filtering - Logs can be filtered using a filtering syntax, and CloudWatch Logs has a sub-service called CloudWatch Logs Insights

    * Log Rotation 

        ** By default, logs are kept indefinitely and never expire. You can adjust the retention policy for each log group:

            *** keeping indefinite retention

            *** choosing a retention period between 1 day and 10 years

- Most AWS Services are integrated with CloudWatch Logs. Logging sometimes needs to be turned on or requires the IAM permission to write to CloudWatch Logs.


a) CloudWatch Logs - Log Groups

- Log Groups - A collection of log streams. 

- It's common to name log groups with the forward slash syntax:

/exampro/prod/app

/exampro/prod/db

/exampro/dev/app

/exampro/dev/db

- You can set the retention on a Log Group between never expire and 120 months (10 years)


b) CloudWatch Logs - Log Stream

- Log Streams - a log stream represents a sequence of events from an application or instance being monitored

- You can create Log Streams manually but generally this is automatically done by the service you are using

- Example Log Group for a Lambda function. Log Streams are named after the running instance. Lambdas frequently run on new instances, so the stream names contain timestamps:

2020/07/06/[$LATEST]ebcasfd438927nbdfkn

- Example Log Group for application logs running on EC2. You can see here the Log Streams are named after the running instance's Instance ID:

i-037623bdjhk8824890

- Example Log Group for AWS Glue. You can see here the Log Streams are named after the Glue Jobs:

exampro-events-crawler

c) CloudWatch Logs - Log Events

- Log Events - Represents a single event in a log file. Log events can be seen within a Log Stream.

- You can use Filter Events to filter out log events using simple or pattern-matching syntax
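For example, typing a plain term such as ERROR matches events containing it, and for JSON logs the filter-pattern syntax can match on fields (the errorCode field below is a hypothetical example):

```
{ $.errorCode = "AccessDenied" }
```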


d) CloudWatch Logs - Log Insights

- CloudWatch Logs Insights enables you to interactively search and analyze your CloudWatch log data and has the following advantages:

    * more robust filtering than using the simple Filter Events in a Log Stream

    * less burdensome than having to export logs to S3 and analyze them via Athena

- CloudWatch Logs Insights supports all types of logs.

- CloudWatch Logs Insights is commonly used via the console to do ad-hoc queries against logs groups.

- CloudWatch Logs Insights has its own language called:

    * CloudWatch Logs Insights Query Syntax

filter action="REJECT"

| stats count(*) as numRejections by srcAddr

| sort numRejections desc

| limit 20

- A single request can query up to 20 log groups

- Queries time out after 15 minutes if they have not completed

- Query results are available for 7 days.

e)  CloudWatch Logs - Logs Insights - Discovered Fields

- When CloudWatch Logs Insights reads a log, it will first analyze the log events and try to structure the content by generating fields that you can then use in your query.

- CloudWatch Logs Insights inserts the @ symbol at the start of fields that it generates.

- Five system fields will be automatically generated:

    * @message - the raw unparsed log event

    * @timestamp - the event timestamp contained in the log event's timestamp field.

    * @ingestionTime - the time when the log event was received by CloudWatch Logs.

    * @logStream - the name of the log stream that the log event was added to.

    * @log - a log group identifier in the form of account-id:log-group-name

-CloudWatch Logs Insights automatically discovers fields in logs from AWS services such as:

    *Amazon VPC Flow Logs

        **@timestamp

        **@logStream

        **@message

        **@accountId

        **@endTime

        **@interfaceId

        **@logStatus

        **@startTime

        **@version

        **@action

        **@bytes

        **@dstAddr

        **@dstPort

        **@packets

        **@protocol

        **@srcAddr

        **@srcPort

    *Amazon Route 53

        **@timestamp

        **@logStream

        **@message

        **@edgeLocation

        **@hostZoneId

        **@protocol

        **@queryName

        **@queryTimestamp

        **@queryType

        **@resolverIp

        **@responseCode

        **@version

    *AWS Lambda

        **@timestamp

        **@logStream

        **@message

        **@requestId

        **@duration

        **@billedDuration

        **@type

        **@maxMemoryUsed

        **@memorySize

        ** with X-Ray

            ***@xrayTracedId

            ***@xraySegmentId

    *AWS CloudTrail

        **@eventVersion

        **@eventTime

        **@eventSource

        **@eventName

        **@awsRegion

        **@sourceIPAddress

        **@userAgent

    *JSON Logs

        ** The fields of a JSON log will be turned into fields

    *Other Types of Logs

        ** For fields that CloudWatch Logs Insights doesn't automatically discover, you can use the parse command to extract and create ephemeral fields for use in the query.

- There are more fields; check the JSON of the CloudTrail events to see the full list
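As an illustration, a query using parse to extract ephemeral fields might look like this (the log format and the user/action field names are hypothetical):

```
fields @timestamp, @message
| parse @message "user=* action=*" as user, action
| filter action = "login"
| stats count(*) as logins by user
```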



4. CloudWatch Metrics

- A CloudWatch Metric represents a time-ordered set of data points. It's a variable that is monitored over time.

- CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service

- EC2 Per-Instance Metrics

    *CPUUtilization

    *DiskReadOps

    *DiskReadBytes

    *DiskWriteBytes

    *NetworkIn

    *NetworkOut

    *NetworkPacketsIn

    *NetworkPacketsOut

5. Availability of Data 


- When an AWS Service emits data to CloudWatch, the availability of the data varies based on the AWS Service:

    * EC2

        ** Basic Monitoring - 5 minute interval

        ** Detailed Monitoring - 1 minute interval

    * Other Services

        ** Basic Monitoring - 1 minute / 3 minute / 5 minute intervals (for the majority of AWS services, availability is 1 minute)

        ** Detailed Monitoring - N/A


6. CloudWatch Agent 

- The CloudWatch Agent can be installed using the AWS Systems Manager (SSM) Run Command onto the target EC2 instance, using the AWS-ConfigureAWSPackage document, which installs or uninstalls a Distributor package. You can install the latest version, the default version, or a version of the package you specify. Packages provided by AWS, such as AmazonCloudWatchAgent, AwsEnaNetworkDriver, and AWSPVDriver, are also supported.

    *AWS-ConfigureAWSPackage with parameters

        **Action - Install

        **Installation Type - Uninstall and reinstall

        **Name - AmazonCloudWatchAgent

        **Version - latest

        ** Choose which EC2 Instance you want to install Agent on:

            ***Specify instance tags

            ***Choose instances manually

            ***Choose a resource group

        ** You must attach the CloudWatchAgentServerRole IAM role to the EC2 instance to be able to run the agent on the instance

    

7. Host Level Metrics 

- Some metrics you might think are tracked by default for EC2 instances are not, and require installing the CloudWatch Agent

- Host Level Metrics - these are what you get without installing the Agent

    *CPU Usage

    *Network Usage

    *Disk Usage

    *Status Checks

        **Underlying Hypervisor status

        **Underlying EC2 Instance status


- Agent Level Metrics - These are what you get when installing the Agent

    *Memory utilization 

    *Disk Swap utilization

    *Disk Space utilization

    *Page file utilization

    *Log collection - The CloudWatch Agent is also used to collect various logs from an EC2 instance and send them to a CloudWatch Log Group

8. Custom High Resolution Metrics


-You can publish your own Custom Metrics using the AWS CLI or SDK


aws cloudwatch put-metric-data \
    --metric-name Enterprise-D \
    --namespace Starfleet \
    --unit Bytes \
    --value 2343324 \
    --dimensions HullIntegrity=100,Shield=70,Thrusters=maximum

-When you publish a custom metric, you can define the resolution as either:

    * standard resolution (1-minute granularity)

    * high resolution (down to 1-second granularity)

-With High Resolution you can track in intervals of:

    *1 second

    *5 seconds

    *10 seconds

    *30 seconds

    *any multiple of 60 seconds
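To make the two resolutions concrete, here is a small illustrative Node.js sketch (not AWS code) that aggregates 1-second samples into the 1-minute average datapoints that standard resolution effectively stores, while high resolution keeps the per-second values:

```javascript
// Aggregate per-second samples into per-minute averages.
// samples: array of [timestampSeconds, value] pairs.
// Returns an object of { minuteStartTimestamp: averageValue }.
function toStandardResolution(samples) {
  const buckets = new Map();
  for (const [ts, value] of samples) {
    const minute = ts - (ts % 60); // start of the minute bucket
    if (!buckets.has(minute)) buckets.set(minute, []);
    buckets.get(minute).push(value);
  }
  const averages = {};
  for (const [minute, values] of buckets) {
    averages[minute] = values.reduce((a, b) => a + b, 0) / values.length;
  }
  return averages;
}

// Two samples in minute 0 average to 15; minute 60 keeps its single value.
console.log(toStandardResolution([[0, 10], [30, 20], [60, 40]]));
```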

    

10. Log Collection

- The CloudWatch Agent can send logs running on your EC2 instance to a CloudWatch Log Group.

- To send logs:

1) the Agent Configuration needs to be updated to include the logs

2) the CloudWatch Agent service needs to be restarted 

- The Agent's configuration file is located at /etc/awslogs/awslogs.conf

[exampro_application_log]

log_group_name = /exampro/rails/logs/production

log_stream_name = {instance_id}

datetime_format = %Y-%m-%dT%H:%M:%S.%f

file = /var/www/my-app/current/log/production.log*


- To restart the CloudWatch Logs agent so it will send the added log files:

sudo service awslogsd stop

sudo service awslogsd start



11. EventBridge

- What is an Event Bus? An event bus receives events from a source and routes events to a target based on rules

<eventbus> image

- EventBridge is a serverless event bus service that is used for application integration by streaming real-time data to your applications

- EventBridge was formerly called CloudWatch Events.


a) Event Bridge Core Components

- Event Bus

    * Holds event data; you define rules on an event bus to react to events.

    * Kinds of Event Bus

        ** Default Event Bus - An AWS account has a default event bus        

        ** Custom Event Bus - Scoped to your account; can be shared with other AWS accounts

        ** SaaS Event Bus - Scoped to third-party SaaS providers

-Definitions:

    * Producers - AWS Services that emit events

    * Partner Sources - Third-party apps that can emit events to an event bus

    * Events - Data emitted by services. JSON objects that travel (stream) within the event bus

    * Rules - Determines what events to capture and pass to targets. (100 Rules per bus)

    * Targets - AWS Services that consume events (5 targets per rule)

b) EventBridge - Anatomy of an Event

-The top level fields here will always appear in every single event.

-The contents of fields appearing under detail will vary based on what AWS Cloud service emits the event

- example

{
    "version": "0",
    "id": "6a7e8feb-b391-4ef7-e9f1-bf3703467718",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "121212121212",
    "time": "2020-05-22T14:22:48Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
    ],
    "detail": {
        "instance-id": "i-1234567890abcdef0",
        "state": "terminated"
    }
}


version - by default, this is set to 0 (zero) in all events.

id - a unique value is generated for every event.

detail-type - identifies fields and values that appear in the detail field

source - identifies the service that sourced the event.

account - 12-digit number identifying AWS account.

time - the event timestamp

region - AWS region where the event originated

resources - a JSON array containing ARNs that identify the resources involved in the event

detail - JSON object containing data provided by the Cloud Service. Can contain 50 fields nested several levels deep


c) EventBridge - Scheduled Expressions

- You can create EventBridge Rules that trigger on a schedule. You can think of it as Serverless Cron Jobs

    *All scheduled events use UTC time zone

    *the minimum precision for schedules is 1 minute

-EventBridge supports:

    * cron expressions

        ** very fine grain control

        ** example: cron(0 10 * * ? *) - runs at 10:00 AM UTC every day

    * rate expressions

        ** easy to set, not as fine grained 

        ** example: rate(8 hours) - a fixed rate every 8 hours (units can be minutes, hours, or days)

d) EventBridge - Rules

- You can specify up to five Targets for a single rule. Commonly targeted AWS Cloud Services:

    * Lambda Function

    * SQS queue

    * SNS topic

    * Firehose delivery stream

    * ECS Task

- You may have some additional fields to select the target. eg.

    * Lambda Function

    * Lambda Alias

    * Lambda Version

- You can specify what gets passed along by changing Configure Input. This acts as a sort of filter. 

    * Matched events - The entire event pattern text is passed to the target when the rule is triggered. (Just Pass everything)

    * Part of the matched event - only the part of the event text that you specify is passed to the target

        **example: $.detail

    * Constant (JSON text) - send static content instead of the matched event data. (Mocked JSON)

        ** example: { "success": true }

    * Input transformer

        ** You can transform the event text into a different format, either a:

            *** string 

                **** example:

                { 

                   "instance": "$.detail.instance",

                    "state": "$.detail.state"

                }

                "instance <instance> is in <state>"


            *** or a JSON object

                { 

                   "instance": "$.detail.instance",

                    "state": "$.detail.state"

                }

                { 

                   "instance": <instance>,

                    "state": <state>

                }

        ** You can map fields to variables, then use those variables in a string or JSON object, and that is what gets passed along.

        ** You can't use these as variable names (reserved by AWS):

            *** aws.events.rule-arn

            *** aws.events.rule-name

            *** aws.events.event


e) EventBridge - Schema Registry

- EventBridge Schema Registry allows you to create, discover and manage OpenAPI schemas for events on EventBridge.

- What is a schema? A schema is an outline, diagram, or model. Schemas are often used to describe the structure of different types of data.

- It will monitor events within the Event Bus -> Catalogue changes to the schema -> View the schema and Download the Code Bindings

- Why would you want a schema of the events in your EventBridge event bus?

    * So you can see if the structure of the events have changed over time (changed version)

    * This makes it easier for developers to know what data to expect from a type of event, so it's easier to integrate into applications.

    * So we can download Code Bindings for various languages to make it easier for developers to work with events in their code.


import { EC2InstanceLaunchLifecycleAction } from './schema/aws/autoscaling/ec2instancelaunchlifecycleaction/EC2InstanceLaunchLifecycleAction'

const event = new EC2InstanceLaunchLifecycleAction()

event.eC2InstanceId

    * A Code Binding is when the schema is wrapped in a programming object. This standardizes how to work with event data in code, leading to fewer bugs and easier discovery of data


- By installing the AWS Toolkit for VSCode you can easily view schemas and install Code Bindings

    * VSCode is a very popular code editor. You can install "extensions" directly within the editor. This is how you would install the AWS Toolkit.

    * Once installed, you open the Command Palette (Command + Shift + P) and connect to your AWS account


f) EventBridge - CloudTrail Events

- Not all AWS Services emit CloudWatch Events

- For other AWS Services we can use CloudTrail

-Turning on CloudTrail allows EventBridge to track changes to AWS Services made by API calls or by AWS users.

-The Detail Type of CloudTrail will be called: "AWS API Call via CloudTrail"

-AWS API call events that are larger than 256 KB in size are not supported


g) EventBridge - Event Patterns

- Event Patterns are used to filter what events should be used to pass along to a target

- You can filter events by providing the same fields and values found in the original events.

- Let us say we want to create a rule that only reacts when an EC2 instance is terminated
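A pattern for this rule could look like the following (it mirrors the fields of the sample EC2 state-change event; only events whose fields match are passed to the target):

```json
{
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {
        "state": ["terminated"]
    }
}
```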

- Event Patterns support the following matching types:

    * Prefix Matching 

        ** match on the prefix of a value in the event source

        ** example 

"region": [{ "prefix" : "ca-" }]

    * Anything-but Matching

        ** matches anything except what's provided in the rule

        ** example

"state": [{ "anything-but" : ["stopped" , "overloaded"] }]

    * Numeric Matching

        ** matches against numeric operators: "<", ">", "=", "<=", ">="

        ** example

"x-limit": [{ "numeric" : [">", 0, "<=", 5 ] }]

    * IP Address Matching

        ** matching available for both IPv4 and IPv6 addresses

        ** example

"source-ip": [{ "cidr" : "10.0.0.0/24" }]

    * Exists Matching

        ** matches on the presence or absence of a field in the JSON

        ** example

"c-count": [{ "exists" : false }]

    * Empty Value Matching

        ** for strings you can use "" to match empty values; for other values you can use null

        ** example

"eventVersion": [""]

"responseElements": [null]

    * Complex example with Multiple Matching

        ** combine multiple matching rules into a more complex event pattern

        ** example

{
    "time": [{ "prefix": "2017-10-02" }],
    "detail": {
        "state": [{ "anything-but": "initializing" }],
        "c-count": [{ "numeric": ["<", 10] }],
        "x-limit": [{ "anything-but": [100, 200, 300] }]
    }
}
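To build intuition for how these matchers combine, here is an illustrative Node.js sketch (not AWS code) of simplified pattern matching that supports exact values and prefix matching; every field in the pattern must be satisfied, and an array means "any of these matchers":

```javascript
// Simplified EventBridge-style pattern matching, for intuition only.
// Supports exact-value lists and { prefix: ... } matchers, applied
// recursively to nested objects.
function matches(pattern, event) {
  return Object.entries(pattern).every(([key, matcher]) => {
    if (!(key in event)) return false;
    const value = event[key];
    if (!Array.isArray(matcher)) {
      // Nested object: recurse into the sub-pattern
      return matches(matcher, value);
    }
    // Array: the value must satisfy at least one matcher entry
    return matcher.some((m) => {
      if (m !== null && typeof m === 'object' && 'prefix' in m) {
        return typeof value === 'string' && value.startsWith(m.prefix);
      }
      return value === m;
    });
  });
}

const event = {
  source: 'aws.ec2',
  region: 'ca-central-1',
  detail: { state: 'terminated' },
};
const pattern = {
  source: ['aws.ec2'],
  region: [{ prefix: 'ca-' }],
  detail: { state: ['terminated'] },
};
console.log(matches(pattern, event)); // true
```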


h) EventBridge - Partner Event Sources

- A list of third-party service providers can be integrated to work with EventBridge

- Events will emit from the provider into your EventBus

- Example

    *Atlassian Opsgenie

    *Auth0

    *PagerDuty

    *DataDog

    *Zendesk

    *Whispir

    *Saviynt

    *Segment

    *SignalFx

    *SugarCRM



12. CloudWatch Alarms

- A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold

- When an alarm breaches (goes outside the defined threshold), it changes state

-Metric Alarm States

    *OK

        ** The metric or expression is within the defined threshold

    *ALARM

        ** The metric or expression is outside of the defined threshold

    *INSUFFICIENT_DATA

        ** The alarm has just started

        ** The metric is not available

        ** Not enough data is available

-When it changes state we can define what action it should trigger.

    * Notification

    * Auto Scaling Group

    * EC2 Action

a) CloudWatch Alarms - Anatomy of the Alarm

- Definitions

    * Threshold Condition - Defines when a datapoint is breached

    * Data point - Represents the metric's measurement at a given period

    * Evaluation Periods - number of previous periods

    * Metric - The actual data we are measuring

        **NetworkIn - The volume of incoming network traffic, measured in Bytes. When using 5-minute monitoring, divide by 300 to get Bytes/second

    * Period - How often it checks to evaluate the Alarm (eg. 5 minutes)

    * Datapoints to alarm - eg. 1 data point breached in the evaluation window going back 4 periods. This is what triggers the alarm.
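The definitions above can be sketched as a tiny evaluation function (an illustrative Node.js sketch, not AWS's exact algorithm): look back over the last evaluation periods and count how many datapoints breach the threshold.

```javascript
// "M out of N datapoints" alarm evaluation, simplified.
// datapoints: one measurement per period, oldest first.
function alarmState(datapoints, threshold, datapointsToAlarm, evaluationPeriods) {
  const recent = datapoints.slice(-evaluationPeriods); // last N periods
  const breaches = recent.filter((d) => d > threshold).length;
  return breaches >= datapointsToAlarm ? 'ALARM' : 'OK';
}

// NetworkIn averages (Bytes/second) over four 5-minute periods,
// against a static threshold of 300 Bytes/second:
console.log(alarmState([120, 150, 480, 90], 300, 1, 4)); // 'ALARM'
console.log(alarmState([120, 150, 280, 90], 300, 1, 4)); // 'OK'
```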


<CWM> image

b) CloudWatch Alarms - Alarm Conditions

- When you create an alarm you define the threshold. The most common type is a Static Threshold.

- Then you define the condition of the alarm

- Then you define the threshold value

- Example: You create a CloudWatch Alarm because you want to avoid unexpected charges (Billing Alarm)

    * You use the EstimatedCharges metric

    *You set the Threshold Type to Static

    * Set the Alarm condition to Greater

    *Set the threshold value to 50 USD



c) CloudWatch Alarms - Composite Alarms

- Composite Alarms are alarms that watch other alarms

- Using composite alarms can help you reduce alarm noise.

-Imagine you have two Alarms and you configure them to have no action:

    *CPU Utilization

    *NetworkIn

- You select both Alarms and create a Composite Alarm -> Create composite alarm -> Then you set the alarm conditions (ALARM("CPU_Utilization") OR ALARM("NetworkIn")) -> The action you can configure for a composite alarm is an SNS Topic




13. CloudWatch Dashboards

- CloudWatch Dashboards allow you to visualize your CloudWatch Metrics in the form of various graphs. 

- You create a widget, choose and configure a metric, and add it to your dashboard

14. CloudWatch ServiceLens

- CloudWatch ServiceLens gives you observability for your distributed applications by consolidating metrics, traces, logs, and alarms into one unified dashboard.

- What is a distributed application? Also known as a distributed system: network-isolated services or applications that have to communicate over a network and together make up a larger system/application

- Applications that could be defined as distributed systems generally utilize:

    *Microservices

    *Containers

    *Various Cloud Services, Compute and Databases tied together using Application Integration Services

-ServiceLens integrates CloudWatch with X-Ray to provide an end-to-end view of your application to help you efficiently:

    *pinpoint performance bottlenecks

    *identify impacted users

- Service Map displays your service endpoints as "nodes" and highlights the traffic, latency, and  errors for each node and its connections

- ServiceLens integrates with CloudWatch Synthetics

- ServiceLens supports log correlation with:

    * Lambda Functions

    * API Gateway

    * Java-based apps on EC2

    * Java-based apps on ECS

    * Java-based apps on EKS

    * Kubernetes with Container Insights

- To install and use ServiceLens you need to:

    * Deploy X-Ray (Instrument your services)

    * Deploy the CloudWatch Agent and X-Ray daemon

- ServiceLens has two modes :

    * Map View - the Service Map showing traces between nodes

    * List View - a flat list of the nodes that make up the service

- Clicking into a node gives a Service Dashboard with lots of information

-ServiceLens lets us quickly filter trace information to open in X-Ray Analytics


15. CloudWatch Synthetics

- Synthetics is used to test web applications by creating canaries that check for:

    * Broken or dead links

    * Step by step task completion

    * Page load errors

    * Load latencies of assets

    * Complex Wizard flows

    * Checkout flows

- What is a Canary? Canaries are configurable scripts that run on a schedule to monitor your endpoints and APIs. Canaries mimic the steps a real user would take, so you can continuously verify the customer experience.

- Canaries run on AWS Lambda using Node.js and Puppeteer.

- Puppeteer is a headless Chrome browser and an automated testing framework. You can code Puppeteer to open a web browser, click, and enter information into a website. 

-When creating a canary you can:

    *use a blueprint

    *use the inline-editor

    *import from S3

-There are 4 Blueprints:

    *Heartbeat monitoring - used to check a single page


const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');

const pageLoadBlueprint = async function () {
    // INSERT URL here
    const URL = "https://app.exampro.co/signup";

    let page = await synthetics.getPage();
    const response = await page.goto(URL, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // Wait for page to render
    // Increase or decrease wait time based on endpoint being monitored
    await page.waitFor(15000);
    await synthetics.takeScreenshot('loaded', 'loaded');
    let pageTitle = await page.title();
    log.info('Page title: ' + pageTitle);
    if (response.status() !== 200) {
        throw new Error('Failed to load page!');
    }
};

exports.handler = async () => {
    return await pageLoadBlueprint();
};

        ** supply a single URL

        ** wait a while and then take a  screenshot when page has loaded

        ** It's called Heartbeat Monitoring because it checks continuously to see if the page is still alive (check "run continuously" and pick a time interval)

    *API canary - Used to check an API endpoint

        **Supply the API endpoint

            *** Method (GET, POST)

            *** Headers

            *** Payload (data)

        ** Checks if 200 is returned for success, anything else is considered a failure

    *Broken link checker - Supply link(s); it looks for links on the page and follows them to see if any of those links are broken

        **Tell it what website it should look at and how many links on the page it should click on to see if they load

        ** You can supply multiple URLS

        ** it will log all the pages it was able to load or not load. Since Canaries use AWS Lambda, it will just log to a CloudWatch Log Group

    *GUI workflow builder - Test a sequence of steps that makes up a workflow. 

        ** You add actions such as Click, Input Text, Verify Text



16. CloudWatch Container Insights

-Container Insights collects, aggregates and summarizes information about containers from metrics and logs

- Container Insights works with:

    *Elastic Container Service (ECS)

    *ECS Fargate

    *Elastic Kubernetes Service (EKS) 

    *Kubernetes running on EC2 instances

- Metrics that Container Insights collects are available in CloudWatch automatic dashboards

-You can analyze and troubleshoot container performance and logs data with CloudWatch Log Insights

-Operational data is collected as performance log events

    *These are entries that use a structured JSON schema that enables high-cardinality data to be ingested and stored at scale.

-Container Insights can be filtered by 

    * Cluster

    * Node

    * Pod

    * Task

    * Service Level

- Contributor Insights allows you to view top contributors impacting the performance of your systems and application in real-time

    * Contributor Insights looks at your CloudWatch Logs and based on Insight rules you define shows real-time time-series data.

    * AWS has a bunch of sample rules you can use to get started


17. CloudWatch CheatSheet

- The Pillars of Observability are Metrics, Logs and Traces. You need to use all three together to obtain Observability 

- AWS CloudWatch is a collection of monitoring services:

    * 1) CloudWatch Logs - centralized log management service. Logs from AWS Services and application logs running on your EC2 instances

    * 2) CloudWatch Logs Insights - Robust way to filter your logs; it auto-detects log patterns, creates dynamic fields, and lets you compose queries.

    * 3) CloudWatch Metrics - Aggregates data points from AWS Services to produce a metric. eg AvgCPUUtilization

    * 4) CloudWatch Alarms - Define a threshold upon a metric and react by triggering an action such as an EC2, Auto Scaling, or SNS action

    * 5) CloudWatch Events (EventBridge) - A serverless account wide event bus. Create event-driven loosely coupled apps with AWS Services

    * 6) CloudWatch Dashboards - Create dashboards filled with graphs powered by CloudWatch metrics

    * 7) CloudWatch Container Insights - collect, aggregate, and summarize metrics and logs from your containerized apps and microservices

    * 8) CloudWatch Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time

    * CloudWatch Synthetics - Test your web-apps to see if they're broken. Uses puppeteer underneath

    * CloudWatch Service Lens - "Observability" for distributed or serverless apps by pulling X-Ray Traces, CloudWatch Logs, and Metrics


CloudWatch Log CheatSheet

- CloudWatch Logs is used to monitor, store, and access your log files

    * Export logs to S3 to analyze with Athena

    *Stream logs to ElasticSearch Service to run full-text search

    *Stream CloudTrail Events to CloudWatch Logs to allow you to react to CloudTrail Events

    *Encrypted by default , You can apply your own KMS Key

    *Retains logs indefinitely by default; you can choose a custom retention period between 1 day and 10 years 

    *Support a simple Filtering Syntax

    * Log Groups - A container for a collection of log streams. Uses the forward slash naming convention eg. /my-app/prod/us-east/

    * Log Streams - Represent a log; usually created by AWS Services, but can be manually created

    * Log Events - Represents a single event ( a single line from the log file)

    * In order to send custom application logs running on your EC2 instance you need to install the CloudWatch Agent (via SSM)

- When the agent is installed, you update /etc/awslogs/awslogs.conf on your EC2 instance to include the logs you want to send to CloudWatch


CloudWatch Log Insights CheatSheet

- CloudWatch Log Insights enables you to interactively search and analyze your CloudWatch log data

- CloudWatch Logs Insights supports all kinds of logs

- CloudWatch Logs Insights is generally used from the AWS Console to create ad-hoc queries

- CloudWatch Logs Insights has its own language called CloudWatch Logs Insights Query Syntax

- A single request can query up to 20 log groups.

- Queries time out after 15 minutes if they have not completed.

- Query results are available for 7 days

- CloudWatch Logs Insights comes with a bunch of sample queries

- CloudWatch Logs Insights analyzes the log events in imported logs to generate dynamic fields

- CloudWatch Logs Insights always generates the following 5 system fields:

    * @message - the raw unparsed log event

    * @timestamp - the event timestamp contained in the log event's timestamp field.

    * @ingestionTime - the time when the log event was received by CloudWatch Logs.

    * @logStream - the name of the log stream that the log event was added to.

    * @log - a log group identifier in the form of account-id:log-group-name


- CloudWatch Logs Insights auto-discovers fields for the following AWS Services:

    * Amazon VPC Flow Logs

    * Amazon Route53

    * AWS Lambda

    * AWS CloudTrail

- JSON log keys will all become discovered fields

- For other logs that are not automatically discovered you can use the parse command to create ephemeral fields.
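For example, assuming a log line format like `user=alice action=login` (an illustrative format, not from any specific service), parse can extract ephemeral fields and aggregate on them:

```
parse @message "user=* action=*" as user, action
| stats count(*) by action
```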



CloudWatch Metrics CheatSheet

- CloudWatch Metrics represents a time-ordered set of data points. It's a variable that is monitored over time

- Metrics are data about the performance of your systems

- CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service

- Very common predefined metrics are the EC2 Per-Instance Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut

- Metric data is kept for 15 months, enabling you to view both up-to-the-minute data and historical data.

- Metric math enables you to query multiple CloudWatch metrics and use math expressions to create new time series

- You can publish your own metrics using the AWS CLI or AWS SDK eg. aws cloudwatch put-metric-data
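A sketch of publishing a custom metric with the CLI (the namespace, metric name, and instance ID are illustrative; running it requires AWS credentials):

```
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name MemoryUtilization \
  --unit Percent \
  --value 37.5 \
  --dimensions InstanceId=i-0123456789abcdef0
```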

- You can define custom metrics in two different resolutions:

    * standard resolution (1 min)

    * high resolution (under 1 min, down to 1 second)

- With High Resolution you can track in intervals of: 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.

- Data availability (when you can see the data) varies based on service:

    * EC2 Basic Monitoring (5 min)

    * EC2 Detailed Monitoring (1 min)

    * Other Services generally 1 min but could also be 3 or 5 minutes

- You can turn on EC2 Detailed Monitoring for an additional charge

- Not all EC2 metrics are tracked by default; you have to install the CloudWatch Agent to collect:

    * Memory utilization (Am I running out of memory?)

    * Disk Swap utilization 

    * Disk Space Utilization (Am I running out of disk space?)

    * Page file utilization



CloudWatch Alarms CheatSheet

- A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold. When an alarm breaches (goes outside the defined threshold) it changes state

- Metric Alarm States

    * OK the metric or expression is within the defined threshold

    * ALARM The metric or expression is outside of the defined threshold

    * INSUFFICIENT_DATA 

        ** The alarm has just started

        ** The metric is not available

        ** Not enough data is available

- When it changes state we can define what action it should trigger:

    * Notification 

    * AutoScaling Group

    * EC2 Action 

- You can define the condition as either a Static threshold or Anomaly Detection

    * Static sets a static value as the threshold eg. 100 USD

    * Anomaly Detection sets a band around the data points; helps prevent false positives; more flexible if you have seasonal data.

- Composite Alarms - allow you to watch multiple alarms and require all of them to trigger before resulting in an alarm action

    * The alarms being watched must have no actions set

    * You can only trigger an SNS topic as the action (so no EC2 or ASG actions)
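A sketch of creating a static-threshold alarm with the CLI (the alarm name, instance ID, and SNS topic ARN are illustrative; running it requires AWS credentials):

```
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
```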


CloudWatch Events (EventBridge) CheatSheet

- EventBridge is a serverless, account-wide or organization-wide event bus. Create event-driven, loosely coupled apps with AWS Services

- EventBridge was formerly known as CloudWatch Events (both are still accessible within the AWS console)

- An event bus receives events from a source and routes events to a target based on rules

- EventBridge / CloudWatch Events are JSON objects that are used to pass data around

-Many AWS Services emit Event data, but for AWS services that do not, you can turn on CloudTrail and react to those events.

-AWS API call events that are larger than 256 KB in size are not supported

- To react to event data you need to create an EventBridge rule

    * You can choose up to 5 targets eg. Lambda function, SQS queue, SNS topic, Firehose delivery stream, ECS task

    * When you select a target you have additional fields to narrow down the AWS Service you want to target

    * You can apply Event Matching to filter what events should be passed to the target

        ** You simply provide a JSON pattern with the fields you want to match on, or you can use complex matchers:

            *** Prefix Matching - match on the prefix of a value in the event source

            *** Anything-but Matching - matches anything except what's provided in the rule

            *** Numeric Matching - matches against numeric operators for "<", ">", "=", "<=", ">="

            *** IP Address Matching - matching available for both IPv4 and IPv6 addresses

            *** Exists Matching - matches on the presence or absence of a field in the JSON

            *** Empty Value Matching - for strings you can use "" to match empty values; for other values you can use null

            *** Multiple Matching - combine multiple matching rules into a more complex event pattern

    * You can set Configure input which is used to transform/filter the event's data structure that will be passed:

        ** Match Event - the entire event pattern text is passed to the target when the rule is triggered

        ** Part of the matched event - only the part of the event text that you specify is passed to the target.

        ** Constant (JSON text) send static content instead of the matched event data. (Mocked JSON)

        ** Input transformer - you can transform the event text into a different format (a string or a JSON object)
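For example, an event pattern combining anything-but and prefix matchers might look like this sketch (the fields follow the shape of the EC2 instance state-change event; the instance-id prefix is illustrative):

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": [{ "anything-but": "terminated" }],
    "instance-id": [{ "prefix": "i-0" }]
  }
}
```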

- A common use case of EventBridge is to use it as a serverless cronjob eg. trigger a database backup every day

- EventBridge can schedule events using either Cron Expressions or Rate Expressions

    * Cron Expression - very fine-tuned control eg. 0 12 ? * MON-FRI * = 12 pm UTC Monday to Friday

    * Rate Expression - Easy to set, not as fine grained eg. choose every X Hours / Minutes / Days

    * All scheduled events use UTC time zone    

    * the minimum precision for schedules is 1 minute
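A sketch of both schedule styles via the CLI (the rule names are illustrative; running it requires AWS credentials):

```
# 12 pm UTC Monday to Friday
aws events put-rule --name weekday-noon \
  --schedule-expression "cron(0 12 ? * MON-FRI *)"

# Every 5 minutes
aws events put-rule --name every-5-min \
  --schedule-expression "rate(5 minutes)"
```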

- The event bus extends to third-party SaaS products to accept Partner Event Sources eg. react to a Datadog event and trigger an ASG

- EventBridge Schema Registry allows you to keep track of changes to your event schemas and:

    * It will automatically detect schema changes and create versions

    * You can download the schema as Code Bindings

    * You can use the AWS Toolkit VSCode plugin to view generated schemas or apply Code Bindings



18. CloudTrail

- Logs API calls between AWS services. When you need to know who to blame.

- AWS CloudTrail is a service that enables governance, compliance, operational auditing , and risk auditing of your AWS account.

- AWS CloudTrail is used to monitor API calls and Actions made on an AWS account. 

- Easily identify which users and accounts made the call to AWS eg.

    * Where - Source IP Address

    * When - Event Time

    * Who - User, UserAgent

    * What - Region, Resource, Action

- You can set up CloudTrail across all accounts in an Organization


a) CloudTrail - Event History

- CloudTrail is already logging by default and will collect logs for the last 90 days via Event History

- If you need more than 90 days you need to create a Trail

- Trails are output to S3 and do not have a GUI like Event History. To analyze a Trail you'd have to use Amazon Athena
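For example, assuming you've created an Athena table over your Trail's S3 bucket (the table name cloudtrail_logs is illustrative), you could ask "who deleted that object?":

```sql
SELECT useridentity.username, sourceipaddress, eventtime
FROM cloudtrail_logs
WHERE eventname = 'DeleteObject'
ORDER BY eventtime DESC
LIMIT 10;
```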



b) CloudTrail - Trail Options

- A Trail can be set to log to all regions

- A Trail can be set to log across all accounts in an Organization

- You can encrypt your logs using Server-Side Encryption via Key Management Service (SSE-KMS)

- To ensure the integrity of our logs and see if they have been tampered with, we need to turn on Log File Validation



c) CloudTrail to CloudWatch

- CloudTrail can be set to deliver events to a CloudWatch log. Configuring delivery to CloudWatch Logs enables you to receive SNS notifications from CloudWatch when specific API activity occurs. Standard CloudWatch and CloudWatch Logs charges will apply.

d) CloudTrail - Management vs Data Events

Management Events

-Tracks management operations. Turned on by default. Can't be turned off:

    * Configuring security - eg. IAM AttachRolePolicy API operations

    * Registering devices - eg. Amazon EC2 CreateDefaultVpc API operations

    * Configured rules for routing data - eg. Amazon EC2 CreateSubnet API operations

    * Setting up logging - eg. AWS CloudTrail CreateTrail API operations


Data Events

- Tracks specific operations for specific AWS Services. Data events are high volume logging and will result in additional charges. Turned off by default

- The two services that can be tracked are S3 and Lambda. So it would track actions such as GetObject, DeleteObject, PutObject
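A sketch of enabling S3 data events on an existing Trail via the CLI (the trail and bucket names are illustrative; running it requires AWS credentials):

```
aws cloudtrail put-event-selectors \
  --trail-name my-trail \
  --event-selectors '[{
    "ReadWriteType": "All",
    "IncludeManagementEvents": true,
    "DataResources": [{
      "Type": "AWS::S3::Object",
      "Values": ["arn:aws:s3:::my-bucket/"]
    }]
  }]'
```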

e) CloudTrail CheatSheet

- CloudTrail logs calls between AWS services

- governance, compliance, operational auditing, and risk auditing are keywords relating to CloudTrail 

- When you need to know who to blame think CloudTrail

- CloudTrail by default logs event data for the past 90 days via Event History

- To track beyond 90 days you need to create a Trail

- To ensure logs have not been tampered with you need to turn on Log File Validation option

- CloudTrail logs can be encrypted using KMS (Key Management Service)

- CloudTrail can be set to log across all AWS accounts in an Organization and all regions in an account.

- CloudTrail logs can be streamed to CloudWatch logs

- Trails are outputted to an S3 bucket that you specify

- CloudTrail logs two kinds of events: Management Events and Data Events

- Management events log management operations eg. AttachRolePolicy

- Data Events log data operations for resources (S3, Lambda) eg. GetObject, DeleteObject, and PutObject

- Data Events are disabled by default when creating a Trail

- Trail logs in S3 can be analyzed using Athena


