AWS Monitoring and Reporting

1. What is Observability?

- The ability to measure and understand how internal systems work in order to answer questions regarding performance, security, and faults within a system / application.

- To obtain observability you need to use: 

    * Metrics - A number that is measured over a period of time eg. if we measured the CPU usage and aggregated it over a period of time we could have an Average CPU

    * Logs - A text file where each line contains event data about what happened at a certain time.

    * Traces - A history of a request as it travels through multiple apps/services, so we can pinpoint performance issues or failures

    * Alarms - sometimes considered the fourth pillar of observability

- You have to use them together; using them in isolation does not gain you observability

2. CloudWatch


- CloudWatch is a monitoring solution for AWS resources

- CloudWatch is a centralized log management service

- CloudWatch is an umbrella service, meaning it is really a collection of monitoring tools as follows: 

    * Logs - any custom log data, Application Logs, Nginx Logs, Lambda Logs

    * Metrics - Represents a time-ordered set of data points. A variable to monitor eg. Memory Usage

    * Events - trigger an event based on a condition eg. every hour take a snapshot of a server (now known as Amazon EventBridge)

    * Alarms - triggers notifications based on metrics which breach a defined threshold

    * Dashboards - create visualizations based on metrics

    * ServiceLens - visualize and analyze the health, performance, availability of your app in a single place 

    * Container Insights - collects, aggregates, and summarizes metrics and logs from your containerized apps and microservices

    * Synthetics - test your web-apps to see if they're broken 

    * Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time

- All these CloudWatch services are built on top of CloudWatch Logs.


<CW> IMAGE



3. CloudWatch Logs 

- is used to :

    * monitor

    * store

    * access log files

- detail services:

    * Export Logs to S3 - You can export Logs to S3 to do things like perform custom analysis

    * Stream to Elasticsearch Service (ES) - You can stream logs to an ES cluster in near real-time to have more robust full-text search or use with the ELK stack

    * Stream CloudTrail Events to CloudWatch Logs - You can turn on CloudTrail to stream event data to CloudWatch Log Group

    * Log Security - By default, log groups are encrypted at rest using SSE. You can use your own Customer Master Keys (CMKs) with AWS KMS

    * Log Filtering - Logs can be filtered using a filtering syntax, and CloudWatch Logs has a sub-service called CloudWatch Logs Insights

    * Log Rotation 

        ** By default, logs are kept indefinitely and never expire. You can adjust the retention policy for each log group:

            *** keeping indefinite retention

            *** choosing a retention period between 1 day and 10 years

- Most AWS Services are integrated with CloudWatch Logs. Logging sometimes needs to be turned on or requires the IAM permission to write to CloudWatch Logs.


a) CloudWatch Logs - Log Groups

- Log Groups - A collection of log streams. 

- It's common to name log groups with the forward slash syntax:

/exampro/prod/app

/exampro/prod/db

/exampro/dev/app

/exampro/dev/db

- You can set the retention on a Log Group between never expire and 120 months (10 years)


b) CloudWatch Logs - Log Stream

- Log Streams - a log stream represents a sequence of events from an application or instance being monitored

- You can create Log Streams manually but generally this is automatically done by the service you are using

- Example Log Group for a Lambda function. Log Streams are named after the running instance. Lambdas frequently run on new instances, so the stream names contain timestamps:

2020/07/06/[$LATEST]ebcasfd438927nbdfkn

- Example Log Group for application logs running on EC2. You can see here the Log Streams are named after the running instance's Instance ID:

i-037623bdjhk8824890

- Example Log Group for AWS Glue. You can see here the Log Streams are named after the Glue Jobs:

exampro-events-crawler

c) CloudWatch Logs - Log Events

- Log Events - Represents a single event in a log file. Log events can be seen within a Log Stream.

- You can use Filter Events to filter out log events using simple or pattern-matching syntax
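For example, typing a plain term such as ERROR matches events containing it, and for JSON logs the filter-pattern syntax can match on fields (the errorCode field below is a hypothetical example):

```
{ $.errorCode = "AccessDenied" }
```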


d) CloudWatch Logs - Log Insights

- CloudWatch Logs Insights enables you to interactively search and analyze your CloudWatch log data and has the following advantages:

    * more robust filtering than using the simple Filter Events in a Log Stream

    * less burdensome than having to export logs to S3 and analyze them via Athena

- CloudWatch Logs Insights supports all types of logs.

- CloudWatch Logs Insights is commonly used via the console to do ad-hoc queries against logs groups.

- CloudWatch Logs Insights has its own language called:

    * CloudWatch Logs Insights Query Syntax

filter action="REJECT"

| stats count(*) as numRejections by srcAddr

| sort numRejections desc

| limit 20

- A single request can query up to 20 log groups

- Queries time out after 15 minutes if they have not completed

- Query results are available for 7 days.

e)  CloudWatch Logs - Logs Insights - Discovered Fields

- When CloudWatch Logs Insights reads a log, it will first analyze the log events and try to structure the content by generating fields that you can then use in your query.

- CloudWatch Logs Insights inserts the @ symbol at the start of fields that it generates.

- Five system fields will be automatically generated:

    * @message - the raw unparsed log event

    * @timestamp - the event timestamp contained in the log event's timestamp field.

    * @ingestionTime - the time when the log event was received by CloudWatch Logs.

    * @logStream - the name of the log stream that the log event was added to.

    * @log - a log group identifier in the form of account-id:log-group-name

-CloudWatch Logs Insights automatically discovers fields in logs from AWS services such as:

    *Amazon VPC Flow Logs

        **@timestamp

        **@logStream

        **@message

        **@accountId

        **@endTime

        **@interfaceId

        **@logStatus

        **@startTime

        **@version

        **@action

        **@bytes

        **@dstAddr

        **@dstPort

        **@packets

        **@protocol

        **@srcAddr

        **@srcPort

    *Amazon Route 53

        **@timestamp

        **@logStream

        **@message

        **@edgeLocation

        **@hostZoneId

        **@protocol

        **@queryName

        **@queryTimestamp

        **@queryType

        **@resolverIp

        **@responseCode

        **@version

    *AWS Lambda

        **@timestamp

        **@logStream

        **@message

        **@requestId

        **@duration

        **@billedDuration

        **@type

        **@maxMemoryUsed

        **@memorySize

        ** with X-Ray

            ***@xrayTracedId

            ***@xraySegmentId

    *AWS CloudTrail

        **@eventVersion

        **@eventTime

        **@eventSource

        **@eventName

        **@awsRegion

        **@sourceIPAddress

        **@userAgent

    *JSON Logs

        ** The fields of a JSON log will be turned into fields

    *Other Types of Logs

        ** For fields that CloudWatch Logs Insights doesn't automatically discover, you can use the parse command to extract and create ephemeral fields for use in the query.

- There are more fields; check the JSON of the CloudTrail events to see the full list
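As an illustration, a query using parse to extract ephemeral fields might look like this (the log format and the user/action field names are hypothetical):

```
fields @timestamp, @message
| parse @message "user=* action=*" as user, action
| filter action = "login"
| stats count(*) as logins by user
```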



4. CloudWatch Metrics

- A CloudWatch Metric represents a time-ordered set of data points. It's a variable that is monitored over time.

- CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service

- EC2 Per-Instance Metrics

    *CPUUtilization

    *DiskReadOps

    *DiskReadBytes

    *DiskWriteBytes

    *NetworkIn

    *NetworkOut

    *NetworkPacketsIn

    *NetworkPacketsOut

5. Availability of Data 


- When an AWS Service emits data to CloudWatch, the availability of the data varies based on the AWS Service:

    * EC2

        ** Basic Monitoring - 5 minute interval

        ** Detailed Monitoring - 1 minute interval

    * Other Services

        ** Basic Monitoring - 1 minute / 3 minute / 5 minute intervals (for the majority of AWS services, availability is 1 minute)

        ** Detailed Monitoring - N/A


6. CloudWatch Agent 

- The CloudWatch Agent can be installed using the AWS Systems Manager (SSM) Run Command onto the target EC2 instance, using the AWS-ConfigureAWSPackage document, which installs or uninstalls a Distributor package. You can install the latest version, the default version, or a version of the package you specify. Packages provided by AWS, such as AmazonCloudWatchAgent, AwsEnaNetworkDriver, and AWSPVDriver, are also supported.

    *AWS-ConfigureAWSPackage with parameters

        **Action - Install

        **Installation Type - Uninstall and reinstall

        **Name - AmazonCloudWatchAgent

        **Version - latest

        ** Choose which EC2 Instance you want to install Agent on:

            ***Specify instance tags

            ***Choose instances manually

            ***Choose a resource group

        ** You must attach the CloudWatchAgentServerRole IAM role to the EC2 instance to be able to run the agent on the instance

    

7. Host Level Metrics 

- Some metrics you might think are tracked by default for EC2 instances are not, and require installing the CloudWatch Agent

- Host Level Metrics - these are what you get without installing the Agent

    *CPU Usage

    *Network Usage

    *Disk Usage

    *Status Checks

        **Underlying Hypervisor status

        **Underlying EC2 Instance status


- Agent Level Metrics - These are what you get when installing the Agent

    *Memory utilization 

    *Disk Swap utilization

    *Disk Space utilization

    *Page file utilization

    *Log collection - The CloudWatch Agent is also used to collect various logs from an EC2 instance and send them to a CloudWatch Log Group

8. Custom High Resolution Metrics


-You can publish your own Custom Metrics using the AWS CLI or SDK


aws cloudwatch put-metric-data \
    --metric-name Enterprise-D \
    --namespace Starfleet \
    --unit Bytes \
    --value 2343324 \
    --dimensions HullIntegrity=100,Shield=70,Thrusters=maximum

-When you publish a custom metric, you can define the resolution as either:

    * standard resolution (1-minute granularity)

    * high resolution (down to 1-second granularity)

-With High Resolution you can track in intervals of:

    *1 second

    *5 seconds

    *10 seconds

    *30 seconds

    *any multiple of 60 seconds
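To make the two resolutions concrete, here is a small illustrative Node.js sketch (not AWS code) that aggregates 1-second samples into the 1-minute average datapoints that standard resolution effectively stores, while high resolution keeps the per-second values:

```javascript
// Aggregate per-second samples into per-minute averages.
// samples: array of [timestampSeconds, value] pairs.
// Returns an object of { minuteStartTimestamp: averageValue }.
function toStandardResolution(samples) {
  const buckets = new Map();
  for (const [ts, value] of samples) {
    const minute = ts - (ts % 60); // start of the minute bucket
    if (!buckets.has(minute)) buckets.set(minute, []);
    buckets.get(minute).push(value);
  }
  const averages = {};
  for (const [minute, values] of buckets) {
    averages[minute] = values.reduce((a, b) => a + b, 0) / values.length;
  }
  return averages;
}

// Two samples in minute 0 average to 15; minute 60 keeps its single value.
console.log(toStandardResolution([[0, 10], [30, 20], [60, 40]]));
```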

    

10. Log Collection

- The CloudWatch Agent can send logs running on your EC2 instance to a CloudWatch Log Group.

- To send logs:

1) the Agent Configuration needs to be updated to include the logs

2) the CloudWatch Agent service needs to be restarted 

- The Agent's configuration file is located at /etc/awslogs/awslogs.conf

[exampro_application_log]

log_group_name = /exampro/rails/logs/production

log_stream_name = {instance_id}

datetime_format = %Y-%m-%dT%H:%M:%S.%f

file = /var/www/my-app/current/log/production.log*


- To restart the CloudWatch Logs agent so it will send the added log files:

sudo service awslogsd stop

sudo service awslogsd start



11. EventBridge

- What is an Event Bus? An event bus receives events from a source and routes events to a target based on rules

<eventbus> image

- EventBridge is a serverless event bus service that is used for application integration by streaming real-time data to your applications

- EventBridge was formerly called CloudWatch Events.


a) Event Bridge Core Components

- Event Bus

    * Holds event data; you define rules on an event bus to react to events.

    * Kinds of Event Bus

        ** Default Event Bus - An AWS account has a default event bus        

        ** Custom Event Bus - Scoped to your account; can be shared with other AWS accounts

        ** SaaS Event Bus - Scoped to third-party SaaS providers

-Definitions:

    * Producers - AWS Services that emit events

    * Partner Sources - Third-party apps that can emit events to an event bus

    * Events - Data emitted by services. JSON objects that travel (stream) within the event bus

    * Rules - Determines what events to capture and pass to targets. (100 Rules per bus)

    * Targets - AWS Services that consume events (5 targets per rule)

b) EventBridge - Anatomy of an Event

-The top level fields here will always appear in every single event.

-The contents of fields appearing under detail will vary based on what AWS Cloud service emits the event

- example

{
    "version": "0",
    "id": "6a7e8feb-b391-4ef7-e9f1-bf3703467718",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "121212121212",
    "time": "2020-05-22T14:22:48Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
    ],
    "detail": {
        "instance-id": "i-1234567890abcdef0",
        "state": "terminated"
    }
}


version - by default, this is set to 0 (zero) in all events.

id - a unique value is generated for every event.

detail-type - identifies fields and values that appear in the detail field

source - identifies the service that sourced the event.

account - 12-digit number identifying AWS account.

time - the event timestamp

region - AWS region where the event originated

resources - a JSON array containing ARNs that identify the resources involved in the event

detail - JSON object containing data provided by the Cloud Service. Can contain 50 fields nested several levels deep


c) EventBridge - Scheduled Expressions

- You can create EventBridge Rules that trigger on a schedule. You can think of it as Serverless Cron Jobs

    *All scheduled events use UTC time zone

    *the minimum precision for schedules is 1 minute

-EventBridge supports:

    * cron expressions

        ** very fine grain control

        ** example: cron(0 10 * * ? *) - runs at 10:00 AM UTC every day

    * rate expressions

        ** easy to set, not as fine grained 

        ** example: rate(8 hours) - a fixed rate every 8 hours (units can be minutes, hours, or days)

d) EventBridge - Rules

- You can specify up to five Targets for a single rule. Commonly targeted AWS Cloud Services:

    * Lambda Function

    * SQS queue

    * SNS topic

    * Firehose delivery stream

    * ECS Task

- You may have some additional fields to select the target. eg.

    * Lambda Function

    * Lambda Alias

    * Lambda Version

- You can specify what gets passed along by changing Configure Input. This acts as a sort of filter. 

    * Matched events - The entire event pattern text is passed to the target when the rule is triggered. (Just Pass everything)

    * Part of the matched event - only the part of the event text that you specify is passed to the target

        **example: $.detail

    * Constant (JSON text) - send static content instead of the matched event data. (Mocked JSON)

        ** example: { "success": true }

    * Input transformer

        ** You can transform the event text into a different format, either a:

            *** string 

                **** example:

                { 

                   "instance": "$.detail.instance",

                    "state": "$.detail.state"

                }

                "instance <instance> is in <state>"


            *** or a JSON object

                { 

                   "instance": "$.detail.instance",

                    "state": "$.detail.state"

                }

                { 

                   "instance": <instance>,

                    "state": <state>

                }

        ** You can map fields to variables, then use those variables in a string or JSON object, and that is what gets passed along.

        ** You can't use these as variable names (reserved by AWS):

            *** aws.events.rule-arn

            *** aws.events.rule-name

            *** aws.events.event


e) EventBridge - Schema Registry

- EventBridge Schema Registry allows you to create, discover and manage OpenAPI schemas for events on EventBridge.

- What is a schema? A schema is an outline, diagram, or model. Schemas are often used to describe the structure of different types of data.

- It will monitor events within the Event Bus -> Catalogue changes to the schema -> View the schema and Download the Code Bindings

- Why would you want a schema of the events in your EventBridge event bus?

    * So you can see if the structure of the events have changed over time (changed version)

    * This makes it easier for developers to know what data to expect from a type of event, so it's easier to integrate into applications.

    * So we can download Code Bindings for various languages to make it easier for developers to work with events in their code.


import { EC2InstanceLaunchLifecycleAction } from './schema/aws/autoscaling/ec2instancelaunchlifecycleaction/EC2InstanceLaunchLifecycleAction'

const event = new EC2InstanceLaunchLifecycleAction()

event.eC2InstanceId

    * A Code Binding is when the schema is wrapped in a programming object. This standardizes how to work with event data in code, leading to fewer bugs and easier discovery of data


- By installing the AWS Toolkit for VSCode you can easily view schemas and install Code Bindings

    * VSCode is a very popular code editor. You can install "extensions" directly within the editor. This is how you would install the AWS Toolkit.

    * Once installed, you open the Command Palette (Command + Shift + P) and connect to your AWS account


f) EventBridge - CloudTrail Events

- Not all AWS Services emit CloudWatch Events

- For other AWS Services we can use CloudTrail

-Turning on CloudTrail allows EventBridge to track changes to AWS Services made by API calls or by AWS users.

-The Detail Type of CloudTrail will be called: "AWS API Call via CloudTrail"

-AWS API call events that are larger than 256 KB in size are not supported


g) EventBridge - Event Patterns

- Event Patterns are used to filter what events should be used to pass along to a target

- You can filter events by providing the same fields and values found in the original events.

- Let us say we want to create a rule that only reacts when an EC2 instance is terminated
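A pattern for this rule could look like the following (it mirrors the fields of the sample EC2 state-change event; only events whose fields match are passed to the target):

```json
{
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {
        "state": ["terminated"]
    }
}
```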

- Event Patterns support the following matching types:

    * Prefix Matching 

        ** match on the prefix of a value in the event source

        ** example 

"region": [{ "prefix" : "ca-" }]

    * Anything-but Matching

        ** matches anything except what's provided in the rule

        ** example

"state": [{ "anything-but" : ["stopped" , "overloaded"] }]

    * Numeric Matching

        ** matches against numeric operators: "<", ">", "=", "<=", ">="

        ** example

"x-limit": [{ "numeric" : [">", 0, "<=", 5 ] }]

    * IP Address Matching

        ** matching available for both IPv4 and IPv6 addresses

        ** example

"source-ip": [{ "cidr" : "10.0.0.0/24" }]

    * Exists Matching

        ** matches on the presence or absence of a field in the JSON

        ** example

"c-count": [{ "exists" : false }]

    * Empty Value Matching

        ** for strings you can use "" to match empty values; for other values you can use null

        ** example

"eventVersion": [""]

"responseElements": [null]

    * Complex example with Multiple Matching

        ** combine multiple matching rules into a more complex event pattern

        ** example

{
    "time": [{ "prefix": "2017-10-02" }],
    "detail": {
        "state": [{ "anything-but": "initializing" }],
        "c-count": [{ "numeric": ["<", 10] }],
        "x-limit": [{ "anything-but": [100, 200, 300] }]
    }
}
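To build intuition for how these matchers combine, here is an illustrative Node.js sketch (not AWS code) of simplified pattern matching that supports exact values and prefix matching; every field in the pattern must be satisfied, and an array means "any of these matchers":

```javascript
// Simplified EventBridge-style pattern matching, for intuition only.
// Supports exact-value lists and { prefix: ... } matchers, applied
// recursively to nested objects.
function matches(pattern, event) {
  return Object.entries(pattern).every(([key, matcher]) => {
    if (!(key in event)) return false;
    const value = event[key];
    if (!Array.isArray(matcher)) {
      // Nested object: recurse into the sub-pattern
      return matches(matcher, value);
    }
    // Array: the value must satisfy at least one matcher entry
    return matcher.some((m) => {
      if (m !== null && typeof m === 'object' && 'prefix' in m) {
        return typeof value === 'string' && value.startsWith(m.prefix);
      }
      return value === m;
    });
  });
}

const event = {
  source: 'aws.ec2',
  region: 'ca-central-1',
  detail: { state: 'terminated' },
};
const pattern = {
  source: ['aws.ec2'],
  region: [{ prefix: 'ca-' }],
  detail: { state: ['terminated'] },
};
console.log(matches(pattern, event)); // true
```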


h) EventBridge - Partner Event Sources

- A list of third-party service providers can be integrated to work with EventBridge

- Events will emit from the provider into your EventBus

- Example

    *Atlassian Opsgenie

    *Auth0

    *PagerDuty

    *DataDog

    *Zendesk

    *Whispir

    *Saviynt

    *Segment

    *SignalFx

    *SugarCRM



12. CloudWatch Alarms

- A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold

- When an alarm breaches (goes outside the defined threshold), it changes state

-Metric Alarm States

    *OK

        ** The metric or expression is within the defined threshold

    *ALARM

        ** The metric or expression is outside of the defined threshold

    *INSUFFICIENT_DATA

        ** The alarm has just started

        ** The metric is not available

        ** Not enough data is available

-When it changes state we can define what action it should trigger.

    * Notification

    * Auto Scaling Group

    * EC2 Action

a) CloudWatch Alarms - Anatomy of the Alarm

- Definitions

    * Threshold Condition - Defines when a datapoint is breached

    * Data point - Represents the metric's measurement at a given period

    * Evaluation Periods - number of previous periods

    * Metric - The actual data we are measuring

        **NetworkIn - The volume of incoming network traffic, measured in Bytes. When using 5-minute monitoring, divide by 300 to get Bytes/second

    * Period - How often it checks to evaluate the Alarm (eg. 5 minutes)

    * Datapoints to alarm - eg. 1 data point breached in the evaluation window going back 4 periods. This is what triggers the alarm.
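The definitions above can be sketched as a tiny evaluation function (an illustrative Node.js sketch, not AWS's exact algorithm): look back over the last evaluation periods and count how many datapoints breach the threshold.

```javascript
// "M out of N datapoints" alarm evaluation, simplified.
// datapoints: one measurement per period, oldest first.
function alarmState(datapoints, threshold, datapointsToAlarm, evaluationPeriods) {
  const recent = datapoints.slice(-evaluationPeriods); // last N periods
  const breaches = recent.filter((d) => d > threshold).length;
  return breaches >= datapointsToAlarm ? 'ALARM' : 'OK';
}

// NetworkIn averages (Bytes/second) over four 5-minute periods,
// against a static threshold of 300 Bytes/second:
console.log(alarmState([120, 150, 480, 90], 300, 1, 4)); // 'ALARM'
console.log(alarmState([120, 150, 280, 90], 300, 1, 4)); // 'OK'
```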


<CWM> image

b) CloudWatch Alarms - Alarm Conditions

- When you create an alarm you define the threshold. The most common type is a Static Threshold.

- Then you define the condition of the alarm

- Then you define the threshold value

- Example: You create a CloudWatch Alarm because you want to avoid unexpected charges (Billing Alarm)

    * You use the EstimatedCharges metric

    *You set the Threshold Type to Static

    * Set the Alarm condition to Greater

    *Set the threshold value to 50 USD



c) CloudWatch Alarms - Composite Alarms

- Composite Alarms are alarms that watch other alarms

- Using composite alarms can help you reduce alarm noise.

-Imagine you have two Alarms and you configure them to have no action:

    *CPU Utilization

    *NetworkIn

- You select both Alarms and create a Composite Alarm -> Create composite alarm -> Then you set the alarm conditions (ALARM("CPU_Utilization") OR ALARM("NetworkIn")) -> The action you can configure for a composite alarm is an SNS Topic




13. CloudWatch Dashboards

- CloudWatch Dashboards allow you to visualize your CloudWatch Metrics in the form of various graphs. 

- You create a widget, choose and configure a metric, and add it to your dashboard

14. CloudWatch ServiceLens

- CloudWatch ServiceLens gives you observability for your distributed applications by consolidating metrics, traces, logs, and alarms into one unified dashboard.

- What is a distributed application? Also known as a distributed system: network-isolated services or applications that have to communicate over a network and together make up a larger system/application

- Applications that could be defined as distributed systems generally utilize:

    *Microservices

    *Containers

    *Various Cloud Services, Compute and Databases tied together using Application Integration Services

-ServiceLens integrates CloudWatch with X-Ray to provide an end-to-end view of your application to help you efficiently:

    *pinpoint performance bottlenecks

    *identify impacted users

- Service Map displays your service endpoints as "nodes" and highlights the traffic, latency, and  errors for each node and its connections

- ServiceLens integrates with CloudWatch Synthetics

- ServiceLens supports log correlation with:

    * Lambda Functions

    * API Gateway

    * Java-based apps on EC2

    * Java-based apps on ECS

    * Java-based apps on EKS

    * Kubernetes with Container Insights

- To install and use ServiceLens you need to:

    * Deploy X-Ray (Instrument your services)

    * Deploy the CloudWatch Agent and X-Ray daemon

- ServiceLens has two modes :

    * Map View - the Service Map showing traces between nodes

    * List View - a flat list of the nodes that make up the service

- Clicking into a node gives a Service Dashboard with lots of information

-ServiceLens lets us quickly filter trace information to open in X-Ray Analytics


15. CloudWatch Synthetics

- Synthetics is used to test web applications by creating canaries that check for:

    * Broken or dead links

    * Step by step task completion

    * Page load errors

    * Load latencies of assets

    * Complex Wizard flows

    * Checkout flows

- What is a Canary? Canaries are configurable scripts that run on a schedule to monitor your endpoints and APIs. Canaries mimic the steps a real user would take, so you can continuously verify the customer experience.

- Canaries run on AWS Lambda using Node.js and Puppeteer.

- Puppeteer is a headless Chrome browser and an automated testing framework. You can code Puppeteer to open a web browser, click, and enter information into a website. 

-When creating a canary you can:

    *use a blueprint

    *use the inline-editor

    *import from S3

-There are 4 Blueprints:

    *Heartbeat monitoring - used to check a single page


const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');

const pageLoadBlueprint = async function () {
    // INSERT URL here
    const URL = "https://app.exampro.co/signup";

    let page = await synthetics.getPage();
    const response = await page.goto(URL, { waitUntil: 'domcontentloaded', timeout: 30000 });
    // Wait for page to render
    // Increase or decrease wait time based on endpoint being monitored
    await page.waitFor(15000);
    await synthetics.takeScreenshot('loaded', 'loaded');
    let pageTitle = await page.title();
    log.info('Page title: ' + pageTitle);
    if (response.status() !== 200) {
        throw new Error('Failed to load page!');
    }
};

exports.handler = async () => {
    return await pageLoadBlueprint();
};

        ** supply a single URL

        ** wait a while and then take a  screenshot when page has loaded

        ** It's called Heartbeat Monitoring because it checks continuously to see if the page is still alive (check "run continuously" and pick a time interval)

    *API canary - Used to check an API endpoint

        **Supply the API endpoint

            *** Method (GET, POST)

            *** Headers

            *** Payload (data)

        ** Checks if 200 is returned for success, anything else is considered a failure

    *Broken link checker - Supply link(s); it looks for links on the page and follows them to see if any of those links are broken

        **Tell it what website it should look at and how many links on the page it should click on to see if they load

        ** You can supply multiple URLS

        ** it will log all the pages it was able to load or not load. Since Canaries use AWS Lambda, it will just log to a CloudWatch Log Group

    *GUI workflow builder - Test a sequence of steps that makes up a workflow. 

        ** You add actions such as Click, Input Text, Verify Text



16. CloudWatch Container Insights

-Container Insights collects, aggregates and summarizes information about containers from metrics and logs

- Container Insights works with:

    *Elastic Container Service (ECS)

    *ECS Fargate

    *Elastic Kubernetes Service (EKS) 

    *Kubernetes running on EC2 instances

- Metrics that Container Insights collects are available in CloudWatch automatic dashboards

-You can analyze and troubleshoot container performance and logs data with CloudWatch Log Insights

-Operational data is collected as performance log events

    *These are entries that use a structured JSON schema that enables high-cardinality data to be ingested and stored at scale.

-Container Insights can be filtered by 

    * Cluster

    * Node

    * Pod

    * Task

    * Service Level

- Contributor Insights allows you to view top contributors impacting the performance of your systems and application in real-time

    * Contributor Insights looks at your CloudWatch Logs and based on Insight rules you define shows real-time time-series data.

    * AWS has a bunch of sample rules you can use to get started


17. CloudWatch CheatSheet

- The Pillars of Observability are Metrics, Logs and Traces. You need to use all three together to obtain Observability 

- AWS CloudWatch is a collection of monitoring services:

    * 1) CloudWatch Logs - centralized log management service. Logs from AWS Services and application logs running on your EC2 instances

    * 2) CloudWatch Logs Insights - Robust way to filter your logs; it auto-detects log patterns, creates dynamic fields, and lets you compose queries.

    * 3) CloudWatch Metrics - Aggregates data points from AWS Services to produce a metric. eg AvgCPUUtilization

    * 4) CloudWatch Alarms - Define a threshold upon a metric and react by triggering an action such as an EC2, Auto Scaling, or SNS action

    * 5) CloudWatch Events (EventBridge) - A serverless account wide event bus. Create event-driven loosely coupled apps with AWS Services

    * 6) CloudWatch Dashboards - Create dashboards filled with graphs powered by CloudWatch metrics

    * 7) CloudWatch Container Insights - collect, aggregate, and summarize metrics and logs from your containerized apps and microservices

    * 8) CloudWatch Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time

    * CloudWatch Synthetics - Test your web-apps to see if they're broken. Uses puppeteer underneath

    * CloudWatch Service Lens - "Observability" for distributed or serverless apps by pulling X-Ray Traces, CloudWatch Logs, and Metrics


CloudWatch Log CheatSheet

- CloudWatch Logs is used to monitor, store, and access your log files

    * Export logs to S3 to analyze with Athena

    *Stream logs to ElasticSearch Service to run full-text search

    *Stream CloudTrail Events to CloudWatch Logs to allow you to react to CloudTrail Events

    *Encrypted by default , You can apply your own KMS Key

    *Retains logs indefinitely by default; you can choose a custom retention period between 1 day and 10 years 

    *Support a simple Filtering Syntax

    * Log Groups - A container for a collection of log streams. Uses the forward slash naming convention eg. /my-app/prod/us-east/

    * Log Streams - Represent a log; usually created by AWS Services, but can be manually created

    * Log Events - Represents a single event ( a single line from the log file)

    * In order to send custom application logs running on your EC2 instance you need to install the CloudWatch Agent (via SSM)

- When the agent is installed, you update /etc/awslogs/awslogs.conf on your EC2 instance to include the logs you want to send to CloudWatch


CloudWatch Log Insights CheatSheet

- CloudWatch Log Insights enables you to interactively search and analyze your CloudWatch log data

- CloudWatch Logs Insights supports all kinds of logs

- CloudWatch Logs Insights is generally used from the AWS Console to create ad-hoc queries

- CloudWatch Logs Insights has its own language called CloudWatch Logs Insights Query Syntax

- A single request can query up to 20 log groups.

- Queries time out after 15 minutes if they have not completed.

- Query results are available for 7 days

- CloudWatch Logs Insights comes with a bunch of sample queries

- CloudWatch Logs Insights analyzes the log events in imported logs to generate dynamic fields

- CloudWatch Logs Insights always generates the following 5 system fields:

    * @message - the raw unparsed log event

    * @timestamp - the event timestamp contained in the log event's timestamp field.

    * @ingestionTime - the time when the log event was received by CloudWatch Logs.

    * @logStream - the name of the log stream that the log event was added to.

    * @log - a log group identifier in the form of account-id:log-group-name


- CloudWatch Logs Insights auto-discovers fields for the following AWS Services:

    * Amazon VPC Flow Logs

    * Amazon Route53

    * AWS Lambda

    * AWS CloudTrail

- JSON log keys will all become discovered fields

- For other logs that are not automatically discovered you can use the parse command to create ephemeral fields.
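For example, assuming a log line format like `user=alice action=login` (an illustrative format, not from any specific service), parse can extract ephemeral fields and aggregate on them:

```
parse @message "user=* action=*" as user, action
| stats count(*) by action
```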



CloudWatch Metrics CheatSheet

- CloudWatch Metrics represents a time-ordered set of data points. It's a variable that is monitored over time

- Metrics are data about the performance of your systems

- CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service

- Very common predefined metrics are the EC2 Per-Instance Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut

- Metric data is kept for 15 months, enabling you to view both up-to-the-minute data and historical data.

- Metric math enables you to query multiple CloudWatch metrics and use math expressions to create new time series

- You can publish your own metrics using the AWS CLI or AWS SDK eg. aws cloudwatch put-metric-data
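A sketch of publishing a custom metric with the CLI (the namespace, metric name, and instance ID are illustrative; running it requires AWS credentials):

```
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name MemoryUtilization \
  --unit Percent \
  --value 37.5 \
  --dimensions InstanceId=i-0123456789abcdef0
```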

- You can define custom metrics in two different resolutions:

    * standard resolution (1 min)

    * high resolution (under 1 min, down to 1 second)

- With High Resolution you can track in intervals of: 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.

- Data availability (when you can see the data) varies based on service:

    * EC2 Basic Monitoring (5 min)

    * EC2 Detailed Monitoring (1 min)

    * Other Services generally 1 min but could also be 3 or 5 minutes

- You can turn on EC2 Detailed Monitoring for an additional charge

- Not all EC2 metrics are tracked by default; you have to install the CloudWatch Agent to collect:

    * Memory utilization (Am I running out of memory?)

    * Disk Swap utilization 

    * Disk Space Utilization (Am I running out of disk space?)

    * Page file utilization



CloudWatch Alarms CheatSheet

- A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold. When an alarm breaches (goes outside the defined threshold) it changes state

- Metric Alarm States

    * OK the metric or expression is within the defined threshold

    * ALARM The metric or expression is outside of the defined threshold

    * INSUFFICIENT_DATA 

        ** The alarm has just started

        ** The metric is not available

        ** Not enough data is available

- When it changes state we can define what action it should trigger:

    * Notification 

    * AutoScaling Group

    * EC2 Action 

- You can define the condition as either a Static threshold or Anomaly Detection

    * Static sets a static value as the threshold eg. 100 USD

    * Anomaly Detection sets a band around the data points; helps prevent false positives; more flexible if you have seasonal data.

- Composite Alarms - allow you to watch multiple alarms and require all of them to trigger before resulting in an alarm action

    * The alarms being watched must have no actions set

    * You can only trigger an SNS topic as the action (so no EC2 or ASG actions)
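A sketch of creating a static-threshold alarm with the CLI (the alarm name, instance ID, and SNS topic ARN are illustrative; running it requires AWS credentials):

```
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
```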


CloudWatch Events (EventBridge) CheatSheet

- EventBridge is a serverless, account-wide or organization-wide event bus. Create event-driven, loosely coupled apps with AWS Services

- EventBridge was formerly known as CloudWatch Events (both are still accessible within the AWS console)

- An event bus receives events from a source and routes events to a target based on rules

- EventBridge / CloudWatch Events are JSON objects that are used to pass data around

-Many AWS Services emit Event data, but for AWS services that do not, you can turn on CloudTrail and react to those events.

-AWS API call events that are larger than 256 KB in size are not supported

- To react to event data you need to create an EventBridge rule

    * You can choose up to 5 targets eg. Lambda function, SQS queue, SNS topic, Firehose delivery stream, ECS task

    * When you select a target you have additional fields to narrow down the AWS Service you want to target

    * You can apply Event Matching to filter what events should be passed to the target

        ** You simply provide a JSON pattern with the fields you want to match on, or you can use complex matchers:

            *** Prefix Matching - match on the prefix of a value in the event source

            *** Anything-but Matching - matches anything except what's provided in the rule

            *** Numeric Matching - matches against numeric operators for "<", ">", "=", "<=", ">="

            *** IP Address Matching - matching available for both IPv4 and IPv6 addresses

            *** Exists Matching - matches on the presence or absence of a field in the JSON

            *** Empty Value Matching - for strings you can use "" to match empty values; for other values you can use null

            *** Multiple Matching - combine multiple matching rules into a more complex event pattern

    * You can set Configure input which is used to transform/filter the event's data structure that will be passed:

        ** Match Event - the entire event pattern text is passed to the target when the rule is triggered

        ** Part of the matched event - only the part of the event text that you specify is passed to the target.

        ** Constant (JSON text) send static content instead of the matched event data. (Mocked JSON)

        ** Input transformer - you can transform the event text into a different format (a string or a JSON object)
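For example, an event pattern combining anything-but and prefix matchers might look like this sketch (the fields follow the shape of the EC2 instance state-change event; the instance-id prefix is illustrative):

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": [{ "anything-but": "terminated" }],
    "instance-id": [{ "prefix": "i-0" }]
  }
}
```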

- A common use case of EventBridge is to use it as a serverless cronjob eg. trigger a database backup every day

- EventBridge can schedule events using either Cron Expressions or Rate Expressions

    * Cron Expression - very fine-tuned control eg. 0 12 ? * MON-FRI * = 12 pm UTC Monday to Friday

    * Rate Expression - Easy to set, not as fine grained eg. choose every X Hours / Minutes / Days

    * All scheduled events use UTC time zone    

    * the minimum precision for schedules is 1 minute
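A sketch of both schedule styles via the CLI (the rule names are illustrative; running it requires AWS credentials):

```
# 12 pm UTC Monday to Friday
aws events put-rule --name weekday-noon \
  --schedule-expression "cron(0 12 ? * MON-FRI *)"

# Every 5 minutes
aws events put-rule --name every-5-min \
  --schedule-expression "rate(5 minutes)"
```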

- The event bus extends to third-party SaaS products to accept Partner Event Sources eg. react to a Datadog event and trigger an ASG

- EventBridge Schema Registry allows you to keep track of changes to your event schemas and:

    * It will automatically detect schema changes and create versions

    * You can download the schema as Code Bindings

    * You can use the AWS Toolkit VSCode plugin to view generated schemas or apply Code Bindings



18. CloudTrail

- Logs API calls between AWS services. When you need to know who to blame.

- AWS CloudTrail is a service that enables governance, compliance, operational auditing , and risk auditing of your AWS account.

- AWS CloudTrail is used to monitor API calls and Actions made on an AWS account. 

- Easily identify which users and accounts made the call to AWS eg.

    * Where - Source IP Address

    * When - Event Time

    * Who - User, UserAgent

    * What - Region, Resource, Action

- You can set up CloudTrail across all accounts in an Organization


a) CloudTrail - Event History

- CloudTrail is already logging by default and will collect logs for the last 90 days via Event History

- If you need more than 90 days you need to create a Trail

- Trails are output to S3 and do not have a GUI like Event History. To analyze a Trail you'd have to use Amazon Athena
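For example, assuming you've created an Athena table over your Trail's S3 bucket (the table name cloudtrail_logs is illustrative), you could ask "who deleted that object?":

```sql
SELECT useridentity.username, sourceipaddress, eventtime
FROM cloudtrail_logs
WHERE eventname = 'DeleteObject'
ORDER BY eventtime DESC
LIMIT 10;
```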



b) CloudTrail - Trail Options

- A Trail can be set to log to all regions

- A Trail can be set to log across all accounts in an Organization

- You can encrypt your logs using Server-Side Encryption via Key Management Service (SSE-KMS)

- To ensure the integrity of our logs and see if they have been tampered with, we need to turn on Log File Validation



c) CloudTrail to CloudWatch

- CloudTrail can be set to deliver events to a CloudWatch log. Configuring delivery to CloudWatch Logs enables you to receive SNS notifications from CloudWatch when specific API activity occurs. Standard CloudWatch and CloudWatch Logs charges will apply.

d) CloudTrail - Management vs Data Events

Management Events

-Tracks management operations. Turned on by default. Can't be turned off:

    * Configuring security - eg. IAM AttachRolePolicy API operations

    * Registering devices - eg. Amazon EC2 CreateDefaultVpc API operations

    * Configured rules for routing data - eg. Amazon EC2 CreateSubnet API operations

    * Setting up logging - eg. AWS CloudTrail CreateTrail API operations


Data Events

- Tracks specific operations for specific AWS Services. Data events are high volume logging and will result in additional charges. Turned off by default

- The two services that can be tracked are S3 and Lambda. So it would track actions such as GetObject, DeleteObject, PutObject
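A sketch of enabling S3 data events on an existing Trail via the CLI (the trail and bucket names are illustrative; running it requires AWS credentials):

```
aws cloudtrail put-event-selectors \
  --trail-name my-trail \
  --event-selectors '[{
    "ReadWriteType": "All",
    "IncludeManagementEvents": true,
    "DataResources": [{
      "Type": "AWS::S3::Object",
      "Values": ["arn:aws:s3:::my-bucket/"]
    }]
  }]'
```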

e) CloudTrail CheatSheet

- CloudTrail logs calls between AWS services

- governance, compliance, operational auditing, and risk auditing are keywords relating to CloudTrail 

- When you need to know who to blame think CloudTrail

- CloudTrail by default logs event data for the past 90 days via Event History

- To track beyond 90 days you need to create a Trail

- To ensure logs have not been tampered with you need to turn on Log File Validation option

- CloudTrail logs can be encrypted using KMS (Key Management Service)

- CloudTrail can be set to log across all AWS accounts in an Organization and all regions in an account.

- CloudTrail logs can be streamed to CloudWatch logs

- Trails are outputted to an S3 bucket that you specify

- CloudTrail logs two kinds of events: Management Events and Data Events

- Management events log management operations eg. AttachRolePolicy

- Data Events log data operations for resources (S3, Lambda) eg. GetObject, DeleteObject, and PutObject

- Data Events are disabled by default when creating a Trail

- Trail logs in S3 can be analyzed using Athena


