AWS Monitoring and Reporting
1. What is Observability?
- The ability to measure and understand how internal systems work in order to answer questions regarding performance, security, and faults in a system / application.
- To obtain observability you need to use:
* Metrics - A number that is measured over a period of time eg. if we measured CPU usage and aggregated it over a period of time we could have an Average CPU
* Logs - A text file where each line contains event data about what happened at a certain time.
* Traces - A history of a request as it travels through multiple apps/services so we can pinpoint performance issues or failures
* Alarms - sometimes considered the fourth pillar of Observability
- You have to use them together; using them in isolation does not gain you observability
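As a toy illustration of the metric idea above (the numbers are hypothetical, not from any AWS API), aggregating raw CPU samples over a period yields a single metric value:

```python
# Hypothetical per-minute CPU utilization samples (%) over a 5-minute period
samples = [35.0, 42.0, 58.0, 61.0, 54.0]

# Aggregating the datapoints produces a metric such as Average CPU
average_cpu = sum(samples) / len(samples)
print(average_cpu)  # 50.0
```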
2. CloudWatch
- is a monitoring solution for AWS resources
- CloudWatch is a centralized log management service
- CloudWatch is an umbrella service, meaning it is really a collection of monitoring tools, as follows:
* Logs - any custom log data, Application Logs, Nginx Logs, Lambda Logs
* Metrics - Represents a time-ordered set of data points. A variable to monitor eg. Memory Usage
* Events - trigger an event based on a condition eg. every hour take a snapshot of a server (now known as Amazon EventBridge)
* Alarms - triggers notifications based on metrics which breach a defined threshold
* Dashboards - create visualizations based on metrics
* ServiceLens - visualize and analyze the health, performance, availability of your app in a single place
* Container Insights - collects, aggregates, and summarizes metrics and logs from your containerized apps and microservices
* Synthetics - test your web-apps to see if they're broken
* Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time
- all these CloudWatch services are built off of CloudWatch Logs.
<CW> IMAGE
3. CloudWatch Logs
- is used to :
* monitor
* store
* access log files
- detail services:
* Export Logs to S3 - You can export Logs to S3 to do things like perform custom analysis
* Stream to Elasticsearch Service (ES) - You can stream logs to an ES cluster in near real-time to have more robust full-text search or use with the ELK stack
* Stream CloudTrail Events to CloudWatch Logs - You can turn on CloudTrail to stream event data to CloudWatch Log Group
* Log Security - By default, log groups are encrypted at rest using SSE. You can use your own Customer Master Keys (CMKs) with AWS KMS
* Log Filtering - Logs can be filtered using a Filtering Syntax, and CloudWatch Logs has a sub-service called CloudWatch Logs Insights
* Log Rotation
** By default, logs are kept indefinitely and never expire. You can adjust the retention policy for each log group:
*** keeping the indefinite retention
*** choosing a retention period between 1 day and 10 years
- Most AWS Services are integrated with CloudWatch Logs. Logging sometimes needs to be turned on per service, or requires IAM permissions to write to CloudWatch Logs.
a) CloudWatch Logs - Log Groups
- Log Groups - A collection of log streams.
- It's common to name log groups with the forward-slash syntax:
/exampro/prod/app
/exampro/prod/db
/exampro/dev/app
/exampro/dev/db
-You can set the retention on a Log Group between never expire and 120 months (10 years)
b) CloudWatch Logs - Log Stream
- Log Streams - a log stream represents a sequence of events from an application or instance being monitored.
- You can create Log Streams manually, but generally this is done automatically by the service you are using
- Example Log Group for a Lambda function. Log Streams are named after the running instance. Lambdas frequently run on new instances, so the stream names contain timestamps
2020/07/06/[$LATEST]ebcasfd438927nbdfkn
- Example Log Group for application logs running on EC2. You can see here the Log Streams are named after the running instance's Instance ID
i-037623bdjhk8824890
- Example Log Group for AWS Glue. You can see here the Log Streams are named after the Glue Jobs
exampro-events-crawler
c) CloudWatch Logs - Log Events
- Log Events - Represents a single event in a log file. Log events can be seen within a Log Stream.
- You can use filter events to filter out logs based on simple or pattern-matching syntax
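A few examples of the filter syntax (these follow the standard CloudWatch Logs filter-pattern forms; the field names here are made up for illustration):

```
ERROR                                    events containing the term ERROR
?ERROR ?WARN                             events containing ERROR or WARN
{ $.errorCode = "AccessDenied" }         match a field inside JSON log events
[ip, user, timestamp, request, status=5*, bytes]   space-delimited log events
```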
d) CloudWatch Logs - Log Insights
-CloudWatch Logs Insights - enables you to interactively search and analyze your CloudWatch log data and has the following advantages:
* more robust filtering than using the simple Filter events in a Log Stream
* less burdensome than having to export logs to S3 and analyze them via Athena
- CloudWatch Logs Insights supports all types of logs.
- CloudWatch Logs Insights is commonly used via the console to do ad-hoc queries against logs groups.
-CloudWatch Logs Insights has its own language called:
* CloudWatch Logs Insights Query Syntax
filter action="REJECT"
| stats count(*) as numRejections by srcAddr
| sort numRejections desc
| limit 20
- A single request can query up to 20 log groups
- Queries time out after 15 minutes if they have not completed
-Query results are available for 7 days.
e) CloudWatch Logs - Logs Insights - Discovered Fields
-When CloudWatch Logs Insights reads a log, it will first analyze the log events and try to structure the content by generating fields that you can then use in your query.
-CloudWatch Logs Insights inserts the @ symbol at the start of fields that it generates.
-Five system fields will be automatically generated:
* @message - the raw unparsed log event
* @timestamp - the event timestamp contained in the log event's timestamp field.
* @ingestionTime - the time when the log event was received by CloudWatch Logs.
* @logStream - the name of the log stream that the log event was added to.
* @log - a log group identifier in the form of account-id:log-group-name
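These system fields are what the Logs Insights console's default query works with; for example:

```
fields @timestamp, @message
| sort @timestamp desc
| limit 20
```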
-CloudWatch Logs Insights automatically discovers fields in logs from AWS services such as:
*Amazon VPC Flow Logs
**@timestamp
**@logStream
**@message
**@accountId
**@endTime
**@interfaceId
**@logStatus
**@startTime
**@version
**@action
**@bytes
**@dstAddr
**@dstPort
**@packets
**@protocol
**@srcAddr
**@srcPort
*Amazon Route 53
**@timestamp
**@logStream
**@message
**@edgeLocation
**@hostZoneId
**@protocol
**@queryName
**@queryTimestamp
**@queryType
**@resolverIp
**@responseCode
**@version
*AWS Lambda
**@timestamp
**@logStream
**@message
**@requestId
**@duration
**@billedDuration
**@type
**@maxMemoryUsed
**@memorySize
** with X-Ray
***@xrayTraceId
***@xraySegmentId
*AWS CloudTrail
**@eventVersion
**@eventTime
**@eventSource
**@eventName
**@awsRegion
**@sourceIPAddress
**@userAgent
*JSON Logs
** The fields of a JSON log will be turned into fields
*Other Types of Logs
** For fields that CloudWatch Logs Insights doesn't automatically discover, you can use the parse command to extract and create ephemeral fields for use in the query.
- There are more discovered fields than listed here; check the JSON in the CloudTrail events to see the full list
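A sketch of the parse command (the log format and the field names user and action here are hypothetical):

```
parse @message "user=* action=*" as user, action
| stats count(*) by action
```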
4. CloudWatch Metrics
-A CloudWatch Metric represents a time-ordered set of data points. It's a variable that is monitored over time.
- CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service
- EC2 Per-Instance Metrics
*CPUUtilization
*DiskReadOps
*DiskReadBytes
*DiskWriteBytes
*NetworkIn
*NetworkOut
*NetworkPacketsIn
*NetworkPacketsOut
5. Availability of Data
-When an AWS Services emits data to CloudWatch the availability of the data varies based on the AWS Service:
* EC2
** Basic Monitoring - 5 minute interval
** Detailed Monitoring - 1 minute interval
* Other Services
** Basic Monitoring - 1 minute / 3 minute / 5 minute intervals (for the majority of AWS services, availability is 1 minute)
** Detailed Monitoring - N/A
6. CloudWatch Agent
-the CloudWatch Agent can be installed onto a target EC2 instance using the AWS Systems Manager (SSM) Run Command AWS-ConfigureAWSPackage, which installs or uninstalls a Distributor package. You can install the latest version, the default version, or a version of the package you specify. Packages provided by AWS such as AmazonCloudWatchAgent, AwsEnaNetworkDriver, and AWSPVDriver are supported.
*AWS-ConfigureAWSPackage with parameters
**Action - Install
**Installation Type - Uninstall and reinstall
**Name - AmazonCloudWatchAgent
**Version - latest
** Choose which EC2 Instance you want to install Agent on:
***Specify instance tags
***Choose instances manually
***Choose a resource group
** You must attach the CloudWatchAgentServerRole IAM role to the EC2 instance to be able to run the agent on the instance
7. Host Level Metrics
- Some metrics you might think are tracked by default for EC2 instances are not, and require installing the CloudWatch Agent
- Host Level Metrics - these are what you get without installing the Agent
*CPU Usage
*Network Usage
*Disk Usage
*Status Checks
**Underlying Hypervisor status
**Underlying EC2 Instance status
- Agent Level Metrics - These are what you get when installing the Agent
*Memory utilization
*Disk Swap utilization
*Disk Space utilization
*Page file utilization
*Log collection - The CloudWatch Agent is also used to collect various logs from an EC2 instance and send them to a CloudWatch Log Group
8. Custom High Resolution Metrics
-You can publish your own Custom Metrics using the AWS CLI or SDK
aws cloudwatch put-metric-data \
--metric-name Enterprise-D \
--namespace Starfleet \
--unit Bytes \
--value 2343324 \
--storage-resolution 1 \
--dimensions HullIntegrity=100,Shield=70,Thrusters=maximum
-When you publish a custom metric, you can define the resolution as either:
* standard resolution (1-minute granularity)
* high resolution (down to 1-second granularity)
-With High Resolution you can track in intervals of:
*1 second
*5 seconds
*10 seconds
*30 seconds
*multiple of 60 seconds
10. Log Collection
- The CloudWatch Agent can send logs running on your EC2 instance to a CloudWatch Log Group.
- To send logs:
1) the Agent Configuration needs to be updated to include the logs
2) the CloudWatch Agent service needs to be restarted
- The Agent's configuration file is located at /etc/awslogs/awslogs.conf
[exampro_application_log]
log_group_name = /exampro/rails/logs/production
log_stream_name = {instance_id}
datetime_format = %Y-%m-%dT%H:%M:%S.%f
file = /var/www/my-app/current/log/production.log*
- restart the CloudWatch Logs agent so it will send the added log files:
sudo service awslogsd stop
sudo service awslogsd start
11. EventBridge
- What is an Event Bus? An event bus receives events from a source and routes events to a target based on rules
<eventbus> image
- EventBridge is a serverless event bus service that is used for application integration by streaming real-time data to your applications
-EventBridge was formerly called CloudWatch Events.
a) EventBridge Core Components
- Event Bus
* Holds event data; define rules on an event bus to react to events.
* Kinds of Event Bus
** Default Event Bus - Every AWS account has a default event bus
** Custom Event Bus - Scoped to your account; can also receive events from other AWS accounts
** SaaS Event Bus - Scoped to third-party SaaS providers
-Definitions:
* Producers - AWS Services that emit events
* Partner Sources - Third-party apps that can emit events to an event bus
* Events - Data emitted by services. JSON objects that travel (stream) within the event bus
* Rules - Determines what events to capture and pass to targets. (100 Rules per bus)
* Targets - AWS Services that consume events (5 targets per rule)
b) EventBridge - Anatomy of an Event
-The top-level fields here will always appear in every single event.
-The contents of the fields appearing under detail will vary based on which AWS Cloud service emits the event
- example
{
"version" : "0",
"id" : "6a7e8feb-b391-4ef7-e9f1-bf3703467718",
"detail-type" : "EC2 Instance State-change Notification",
"source" : "aws.ec2",
"account" : "121212121212",
"time" : "2020-05-22T14:22:48Z",
"region" : "us-east-1",
"resources" : [
"arn:aws:ec2:us-east-1:123456789012:instance/i-1234567890abcdef0"
],
"detail" : {
"instance-id" : "i-1234567890abcdef0",
"state" : "terminated"
}
}
version - by default, this is set to 0 (zero) in all events.
id - a unique value generated for every event.
detail-type - identifies fields and values that appear in the detail field
source - identifies the service that sourced the event.
account - the 12-digit number identifying the AWS account.
time - the event timestamp
region - the AWS region where the event originated
resources - a JSON array containing ARNs that identify resources involved in the event
detail - a JSON object containing data provided by the Cloud Service. Can contain 50 fields nested several levels deep
b) EventBridge - Scheduled Expressions
- You can create EventBridge Rules that trigger on a schedule. You can think of it as Serverless Cron Jobs
*All scheduled events use UTC time zone
*the minimum precision for schedules is 1 minute
-EventBridge supports:
* cron expressions
** very fine-grained control
** example cron expression: cron(0 10 * * ? *)
* rate expressions
** easy to set, not as fine-grained
** example: rate(8 hours); the unit can be minutes, hours, or days
c) EventBridge - Rules
- You can specify up to five Targets for a single rule. Commonly targeted AWS Cloud Services:
* Lambda Function
* SQS queue
* SNS topic
* Firehose delivery stream
* ECS Task
- You may have some additional fields to select the target, eg.
* Lambda Function
* Lambda Alias
* Lambda Version
- You can specify what gets passed along by changing Configure Input. This acts as a sort of filter.
* Matched events - The entire event pattern text is passed to the target when the rule is triggered. (Just Pass everything)
* Part of the matched event - only the part of the event text that you specify is passed to the target
**example: $.detail
* Constant (JSON text) - send static content instead of the matched event data. (Mocked JSON)
** example: { "success": true }
* Input transformer
** You can transform the event text into a different format of either a:
*** string
**** example:
{
"instance": "$.detail.instance",
"state": "$.detail.state"
}
"instance <instance> is in <state>"
*** or a JSON object
{
"instance": "$.detail.instance",
"state": "$.detail.state"
}
{
"instance": <instance>,
"state": <state>
}
** You can map fields to variables. Then you can use those variables in a string or JSON object, and that is what gets passed along.
** You can't use these as variable names (reserved by AWS):
*** aws.events.rule-arn
*** aws.events.rule-name
*** aws.events.event
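To make the input-transformer behavior concrete, here is a local Python sketch (illustration only, not AWS code) that mimics mapping simple JSON paths to variables and substituting them into a template:

```python
# Local sketch of EventBridge's input transformer (illustration only):
# input-paths pulls values out of the event; the input-template
# references them as <name> placeholders.
def transform(event, input_paths, template):
    values = {}
    for name, path in input_paths.items():
        node = event
        # walk a simple "$.a.b" style path
        for part in path.lstrip("$.").split("."):
            node = node[part]
        values[name] = node
    for name, value in values.items():
        template = template.replace(f"<{name}>", str(value))
    return template

event = {"detail": {"instance": "i-1234567890abcdef0", "state": "terminated"}}
paths = {"instance": "$.detail.instance", "state": "$.detail.state"}
print(transform(event, paths, "instance <instance> is in <state>"))
# instance i-1234567890abcdef0 is in terminated
```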
d) EventBridge - Schema Registry
- EventBridge Schema Registry allows you to create, discover and manage OpenAPI schemas for events on EventBridge.
- What is a schema? A schema is an outline, diagram, or model. Schemas are often used to describe the structure of different types of data.
- It will monitor events within the Event Bus -> Catalogue changes to the schema -> View the schema and Download the Code Bindings
- Why would you want a schema of the events in your EventBridge event bus?
* So you can see if the structure of the events have changed over time (changed version)
* This makes it easier for developers to know. What data to expect from a type of event so its easier to integrate into applications.
* So we can download Code Bindings for various langugaes to make it easier for developers to work with events in their code.
import { EC2InstanceLaunchLifecycleAction } from './schema/aws/autoscaling/ec2instancelaunchlifecycleaction/EC2InstanceLaunchLifecycleAction'
const event = new EC2InstanceLaunchLifecycleAction()
event.eC2InstanceId
* A Code Binding is when the schema is wrapped in a programming object. This standardizes how to work with event data in code, leading to fewer bugs and easier data discovery
- By installing the AWS Toolkit for VSCode you can easily View Schemas and install Code Bindings
* VSCode is a very popular code editor. You can install "extensions" directly within the editor. This is how you would install the AWS Toolkit.
*Once installed you open the Command Palette (Command + Shift + P) and connect to your AWS Account
e) EventBridge - CloudTrail Events
- Not all AWS Services emit CloudWatch Events
- For other AWS Services we can use CloudTrail
-Turning on CloudTrail allows EventBridge to track changes to AWS Services made by API calls or by AWS users.
-The Detail Type of CloudTrail will be called: "AWS API Call via CloudTrail"
-AWS API call events that are larger than 256 KB in size are not supported
f) EventBridge - Event Patterns
- Event Patterns are used to filter what events should be used to pass along to a target
-You can filter events by providing the same fields and values found in the original events.
-Let us say we want to create a rule that only reacts when an EC2 instance is terminated
- We would supply an Event Pattern matching those fields. Beyond exact values, EventBridge supports richer matchers:
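For the terminated-instance example, a minimal event pattern (matching the sample event shown in the Anatomy of an Event section) could look like:

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["terminated"]
  }
}
```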
* Prefix Matching
** match on the prefix of a value in the event source
** example
"region": [{ "prefix" : "ca-" }]
* Anything-but Matching
** matches anything except what's provided in the rule
** example
"state": [{ "anything-but" : ["stopped" , "overloaded"] }]
* Numeric Matching
** Matches against numeric operators "<", ">", "=", "<=", ">="
** example
"x-limit": [{ "numeric" : [">", 0, "<=", 5 ] }]
* IP Address Matching
** matching available for both IPv4 and IPv6 addresses
** example
"source-ip": [{ "cidr" : "10.0.0.0/24" }]
* Exists Matching
** matches on the presence or absence of a field in the JSON
** example
"c-count": [{ "exists" : false }]
* Empty Value Matching
** for strings you can use "" to match empty. For other values you can use null
** example
"eventVersion": [""]
"responseElements": [null]
* Complex example with Multiple Matching
** combine multiple matching rules into a more complex event pattern
** example
{
"time" : [{ "prefix" : "2017-10-02" }],
"detail" : {
"state": [{ "anything-but" : "initializing" }],
"c-count": [{ "numeric" : ["<", 10 ] }],
"x-limit": [{ "anything-but" : [100, 200, 300 ] }]
}
}
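As a rough local sketch (plain Python, not the EventBridge implementation) of how these matchers combine, fields in a pattern are AND-ed together while entries in a list are OR-ed alternatives:

```python
# Illustrative re-implementation of a few EventBridge matchers (sketch only)
def matches(pattern, value):
    if isinstance(pattern, dict):
        if "prefix" in pattern:
            return str(value).startswith(pattern["prefix"])
        if "anything-but" in pattern:
            banned = pattern["anything-but"]
            if not isinstance(banned, list):
                banned = [banned]
            return value not in banned
        if "numeric" in pattern:
            ops = {"<": lambda v, b: v < b, "<=": lambda v, b: v <= b,
                   ">": lambda v, b: v > b, ">=": lambda v, b: v >= b,
                   "=": lambda v, b: v == b}
            terms = pattern["numeric"]
            return all(ops[op](value, bound)
                       for op, bound in zip(terms[::2], terms[1::2]))
    return pattern == value  # plain exact-value match

def event_matches(rule, event):
    # every field in the rule must match (AND); list entries are alternatives (OR)
    for key, expected in rule.items():
        if isinstance(expected, dict):
            if not event_matches(expected, event.get(key, {})):
                return False
        elif not any(matches(alt, event.get(key)) for alt in expected):
            return False
    return True

rule = {"detail": {"state": [{"anything-but": "initializing"}],
                   "c-count": [{"numeric": ["<", 10]}]}}
print(event_matches(rule, {"detail": {"state": "running", "c-count": 3}}))       # True
print(event_matches(rule, {"detail": {"state": "initializing", "c-count": 3}}))  # False
```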
g) EventBridge - Partner Event Sources
-A list of third-party service providers can be integrated to work with EventBridge
- Events will emit from the provider into your EventBus
- Example
*Atlassian -Opsgenie
*Auth0
*PagerDuty
*DataDog
*Zendesk
*Whispir
*Saviynt
*Segment
*SignalFx
*SugarCRM
12. CloudWatch Alarms
- A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold
- When an alarm breaches (goes outside the defined threshold) then it changes state
-Metric Alarm States
*OK
** The metric or expression is within the defined threshold
*ALARM
** The metric or expression is outside of the defined threshold
*INSUFFICIENT_DATA
** The alarm has just started
** The metric is not available
** Not enough data is available
-When it changes state we can define what action it should trigger.
* Notification
* Auto Scaling Group
* EC2 Action
a) CloudWatch Alarms - Anatomy of the Alarm
- Definitions
* Threshold Condition - Defines when a datapoint is breaching
* Data point - Represents the metric's measurement at a given period
* Evaluation Periods - the number of previous periods to evaluate
* Metric - The actual data we are measuring
**NetworkIn - The volume of incoming network traffic, measured in Bytes. When using 5-minute monitoring, divide by 300 to get Bytes/second
* Period - How often it checks to evaluate the Alarm (eg. 5 minutes)
* Datapoints to alarm - eg. 1 datapoint breaching within an evaluation going back 4 periods is what triggers the alarm.
<CWM> image
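The "Datapoints to alarm" idea above can be sketched in a few lines of Python (a simplification; real alarms also handle missing data and the INSUFFICIENT_DATA state):

```python
def alarm_state(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Look at the last `evaluation_periods` datapoints; the alarm goes to
    ALARM when at least `datapoints_to_alarm` of them breach the threshold."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for d in window if d > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

# NetworkIn bytes per 5-minute period (hypothetical values), threshold 1,000,000
print(alarm_state([200_000, 300_000, 250_000, 2_000_000], 1_000_000, 1, 4))  # ALARM
print(alarm_state([200_000, 300_000, 250_000, 400_000], 1_000_000, 1, 4))    # OK
```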
b) CloudWatch Alarms - Alarm Conditions
- When you create an alarm you define the threshold. The most common type is a Static Threshold.
- Then you define the condition of the alarm
- Then you define the threshold value
- Example: You create a CloudWatch Alarm because you want to avoid unexpected charges (Billing Alarm)
* You use the EstimatedCharges metric
* You set the Threshold Type to Static
* Set the Alarm condition to Greater
* Set the threshold value to 50 USD
c) CloudWatch Alarms - Composite Alarms
- Composite Alarms are alarms that watch other alarms
- Using composite alarms can help you reduce alarm noise.
-Imagine you have two Alarms and you configure them to have no action:
*CPU Utilization
*NetworkIn
- You select both Alarms and create a Composite Alarm -> Create composite alarm -> Then you set the alarm conditions, eg. ALARM("CPU_Utilization") OR ALARM("NetworkIn") -> The action you can configure for a composite alarm is an SNS Topic
13. CloudWatch Dashboards
- CloudWatch Dashboards allows you to visualize your Cloud Metrics in the form of various graphs.
-You create a widget, choose and configure a metric, and add it to your dashboard
14. CloudWatch ServiceLens
- CloudWatch ServiceLens gives you observability for your distributed applications by consolidating metrics, traces, logs, and alarms into one unified dashboard.
- What is a distributed application? Also known as a distributed system: network-isolated services or applications that communicate over a network and together make up a larger system/application
-Applications that could be defined as distributed systems generally utilize:
*Microservices
*Containers
*Various Cloud Services, Compute and Databases tied together using Application Integration Services
-ServiceLens integrates CloudWatch with X-Ray to provide an end-to-end view of your application to help you efficiently:
*pinpoint performance bottlenecks
*identify impacted users
- Service Map displays your service endpoints as "nodes" and highlights the traffic, latency, and errors for each node and its connections
- ServiceLens integrates with CloudWatch Synthetics
- ServiceLens supports log correlation with:
* Lambda Functions
* API Gateway
* Java-based apps on EC2
* Java-based apps on ECS
* Java-based apps on EKS
* Kubernetes with Container Insights
- To install and use ServiceLens you need to:
* Deploy X-Ray (Instrument your services)
* Deploy the CloudWatch Agent and X-Ray daemon
- ServiceLens has two modes:
* Map View - the Service Map showing traces between nodes
* List View - a flat list of the nodes that make up the service
- Clicking into a node gives a Service Dashboard with lots of information
-ServiceLens lets us quickly filter trace information to open in X-Ray Analytics
15. CloudWatch Synthetics
- Synthetics is used to test web applications by creating canaries to check for:
* Broken or dead links
* Step by step task completion
* Page load errors
* Load latencies of assets
* Complex Wizard flows
* Checkout flows
- What is a Canary? Canaries are configurable scripts that run on a schedule to monitor your endpoints and APIs. Canaries mimic the steps a real user would take so you can continuously verify the customer experience.
- Canaries run on AWS Lambda using Node.js and Puppeteer.
- Puppeteer is a Node.js library for driving a headless Chrome browser and can serve as an automated testing framework. You can code Puppeteer to open a web browser, click, and enter information into a website.
-When creating a canary you can:
*use a blue-print
*use the inline-editor
*import from S3
-There are 4 Blueprints:
*Heartbeat monitoring - used to check a single page
var synthetics = require('Synthetics');
const log = require('SyntheticsLogger');
const pageLoadBlueprint = async function () {
    // INSERT URL here
    const URL = "https://app.exampro.co/signup";
    let page = await synthetics.getPage();
    const response = await page.goto(URL, {waitUntil: 'domcontentloaded', timeout: 30000});
    // Wait for page to render
    // Increase or decrease wait time based on endpoint being monitored.
    await page.waitFor(15000);
    await synthetics.takeScreenshot('loaded', 'loaded');
    let pageTitle = await page.title();
    log.info('Page title: ' + pageTitle);
    if (response.status() !== 200) {
        throw 'Failed to load page!';
    }
};
exports.handler = async () => {
return await pageLoadBlueprint();
};
** supply a single URL
** wait a while and then take a screenshot when page has loaded
* It's called Heartbeat Monitoring because it checks continuously to see if the page is still alive (check "run continuously" and pick a time interval)
*API canary - Used to check an API endpoint
**Supply the API endpoint
*** Method (GET, POST)
*** Headers
*** Payload (data)
** Checks if 200 is returned for success, anything else is considered a failure
*Broken link checker - Supply link(s), then look for links on the page and follow them to see if any of those links are broken
**Tell it what website it should look at and how many links on the page it should click on to see if they load
** You can supply multiple URLs
** it will log all the pages it was able to load or not load. Since canaries use AWS Lambda, it will just log to a CloudWatch Log group
*GUI workflow builder - Test a sequence of steps that makes up a workflow.
** You add actions such as Click, Input Text, Verify Text
16. CloudWatch Container Insights
-Container Insights collects, aggregates and summarizes information about containers from metrics and logs
-Container Insights works with:
*Elastic Container Service (ECS)
*ECS Fargate
*Elastic Kubernetes Service (EKS)
*Kubernetes running on EC2 instances
- Metrics that Container Insights collects are available in CloudWatch automatic dashboards
-You can analyze and troubleshoot container performance and logs data with CloudWatch Log Insights
-Operational data is collected as performance log events
*These are entries that use a structured JSON schema that enables high-cardinality data to be ingested and stored at scale.
-Container Insights can be filtered by
* Cluster
* Node
* Pod
* Task
* Service Level
- Contributor Insights allows you to view top contributors impacting the performance of your systems and application in real-time
* Contributor Insights looks at your CloudWatch Logs and based on Insight rules you define shows real-time time-series data.
* AWS has a bunch of sample rules you can use to get started
17. CloudWatch CheatSheet
- The Pilars of Observability are Metrics, Logs and Traces. You need to use all three together to obtain Observability
- AWS CloudWatch is a collection of monitoring services:
* 1) CloudWatch Logs - centralized log management service. Logs from AWS Services, and application logs running on your EC2 instances
* 2) CloudWatch Logs Insights - Robust way to filter your logs; it auto-detects log patterns, creates dynamic fields, and lets you compose queries.
* 3) CloudWatch Metrics - Aggregates datapoints from AWS Services to produce a metric eg. AvgCPUUtilization
* 4) CloudWatch Alarms - Define a threshold upon a metric and react by triggering an action such as an EC2, Auto Scaling, or SNS action
* 5) CloudWatch Events (EventBridge) - A serverless account wide event bus. Create event-driven loosely coupled apps with AWS Services
* 6) CloudWatch Dashboards - Create dashboards filled with graphs powered by CloudWatch metrics
* 7) CloudWatch Container Insights - collect, aggregate, and summarize metrics and logs from your containerized apps and microservices
* 8) CloudWatch Contributor Insights - view the top contributors impacting the performance of your systems and applications in real-time
* CloudWatch Synthetics - Test your web-apps to see if they're broken. Uses puppeteer underneath
* CloudWatch Service Lens - "Observability" for distributed or serverless apps by pulling X-Ray Traces, CloudWatch Logs, and Metrics
CloudWatch Logs CheatSheet
- CloudWatch Logs is used to monitor, store, and access your log files
* Export logs to S3 to analyze with Athena
*Stream logs to ElasticSearch Service to run full-text search
*Stream CloudTrail Events to CloudWatch Logs to allow you to react to CloudTrail Events
*Encrypted by default , You can apply your own KMS Key
*Retains logs indefinitely by default; you can choose a custom retention period between 1 day and 10 years
*Supports a simple Filtering Syntax
* Log Groups - A container for a collection of log streams. Uses the forward-slash naming convention eg. /my-app/prod/us-east/
* Log Streams - Represent a log, usually created by AWS Services, can be manually created
* Log Events - Represents a single event ( a single line from the log file)
* In order to send custom application logs running on your EC2 instance you need to install the CloudWatch Agent (eg. via SSM)
- When the agent is installed, you update /etc/awslogs/awslogs.conf on your EC2 instance to include the logs you want to send to CloudWatch
CloudWatch Log Insights CheatSheet
- CloudWatch Log Insights enables you to interactively search and analyze your CloudWatch log data
- CloudWatch Logs Insights supports all kinds of logs
- CloudWatch Logs Insights is generally used from the AWS Console to create ad-hoc queries
- CloudWatch Logs Insights has its own language called CloudWatch Logs Insights Query Syntax
- A single request can query up to 20 log groups.
-Queries time out after 15 minutes if they have not completed.
- Query results are available for 7 days.
- CloudWatch Logs Insights comes with bunch of sample queries
- CloudWatch Logs Insights analyzes the log events in imported logs to generate Dynamic Fields
- CloudWatch Logs Insights always generates the following 5 system fields:
* @message - the raw unparsed log event
* @timestamp - the event timestamp contained in the log event's timestamp field.
* @ingestionTime - the time when the log event was received by CloudWatch Logs.
* @logStream - the name of the log stream that the log event was added to.
* @log - a log group identifier in the form of account-id:log-group-name
-CloudWatch Logs Insights auto-discovers fields for the following AWS Services
* Amazon VPC Flow Logs
* Amazon Route53
* AWS Lambda
* AWS CloudTrail
-The keys of JSON logs will all become discovered fields
-For other logs that are not automatically discovered you can use the parse command to create ephemeral fields.
CloudWatch Metrics CheatSheet
-A CloudWatch Metric represents a time-ordered set of data points. It's a variable that is monitored over time
- Metrics are data about the performance of your systems
-CloudWatch comes with many predefined metrics that are generally namespaced by AWS Service
-Very common predefined metrics are the EC2 Per-Instance Metrics: CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes, NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut
-Metric data is kept for 15 months , enabling you to view both up-to-the-minute data and historical data.
-Metric math enables you to query multiple CloudWatch metrics and use math expressions to create new time series
-You can publish your own metrics using the AWS CLI or AWS SDK eg. aws cloudwatch put-metric-data
-You can define custom metrics in two different resolutions:
* standard resolution (1-minute granularity)
* high resolution (down to 1-second granularity)
-With High Resolution you can track in intervals of: 1 second, 5 seconds, 10 seconds, 30 seconds, multiple of 60 seconds.
- Data available (when you can see the data) varies based on service:
* EC2 Basic Monitoring (5 min)
* EC2 Detailed Monitoring (1 min)
* Other Services generally 1 min but could also be 3 or 5 minutes
-You can turn on EC2 Detailed Monitoring for a price
- Not all EC2 metrics are tracked by default; you have to install the CloudWatch Agent to collect:
* Memory utilization (Am I running out of memory?)
* Disk Swap utilization
* Disk Space Utilization (Am I running out of disk space?)
* Page file utilization
CloudWatch Alarms CheatSheet
-A CloudWatch Alarm monitors a CloudWatch Metric based on a defined threshold. When an alarm breaches (goes outside the defined threshold) then it changes state.
- Metric Alarm States
* OK - The metric or expression is within the defined threshold
* ALARM - The metric or expression is outside of the defined threshold
* INSUFFICIENT_DATA
** The alarm has just started
** The metric is not available
** Not enough data is available
- When it changes state we can define what action it should trigger:
* Notification
* AutoScaling Group
* EC2 Action
- You can define a Condition of either Static or Anomaly Detection
* Static - set a static value as the threshold eg. 100 USD
* Anomaly Detection - sets a band around the data points; helps prevent false positives and is more flexible if you have seasonal data.
- Composite Alarms - allow you to watch multiple alarms and require the combined condition to trigger before resulting in an alarm action
* The alarms being watched must have no actions set
* You can only trigger an SNS topic as the action (so no EC2 or ASG actions)
CloudWatch Events (EventBridge) CheatSheet
- EventBridge is a serverless, account-wide (or organization-wide) event bus. Create event-driven, loosely coupled apps with AWS services
- EventBridge was formerly known as CloudWatch Events (both are still accessible within the AWS console)
-An event bus receives events from a source and routes events to a target based on rules
- EventBridge / CloudWatch events are JSON objects that are used to pass data around
-Many AWS Services emit Event data, but for AWS services that do not, you can turn on CloudTrail and react to those events.
-AWS API call events that are larger than 256 KB in size are not supported
- To react to event data you need to create an EventBridge rule
* You can choose up to 5 targets eg. Lambda function, SQS queue, SNS topic, Firehose delivery stream, ECS task
* When you select a target you have additional fields to narrow down the AWS service you want to target
* You can apply Event Matching to filter which events should be passed to the target
** You simply provide a JSON pattern with the fields you want to match on, or you can use complex matchers:
*** Prefix Matching - match on the prefix of a value in the event source
*** Anything-but Matching - matches anything except what's provided in the rule
*** Numeric Matching - matches against numeric operators: "<", ">", "=", "<=", ">="
*** IP Address Matching - matching available for both IPv4 and IPv6 addresses
*** Exists Matching - works on the presence or absence of a field in the JSON
*** Empty Value Matching - for strings you can use "" to match empty; for other values you can use null
*** Multiple Matching - combine multiple matching rules into a more complex event pattern
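To make the matcher semantics concrete, here is a minimal re-implementation of three of them (prefix, anything-but, numeric) in Python. The rule syntax mirrors EventBridge's JSON pattern format, but this is not the service's actual engine, just the rules expressed locally.

```python
# Minimal sketch of EventBridge matcher semantics.
OPS = {"<": lambda a, b: a < b, ">": lambda a, b: a > b,
       "=": lambda a, b: a == b, "<=": lambda a, b: a <= b,
       ">=": lambda a, b: a >= b}

def matches(rule, value):
    if "prefix" in rule:               # Prefix Matching
        return str(value).startswith(rule["prefix"])
    if "anything-but" in rule:         # Anything-but Matching
        return value not in rule["anything-but"]
    if "numeric" in rule:              # Numeric Matching eg. [">", 0, "<=", 100]
        ops = rule["numeric"]
        return all(OPS[ops[i]](value, ops[i + 1]) for i in range(0, len(ops), 2))
    return False

event = {"source": "aws.ec2", "detail": {"state": "running", "cpu": 42}}
print(matches({"prefix": "aws."}, event["source"]))                         # True
print(matches({"anything-but": ["terminated"]}, event["detail"]["state"]))  # True
print(matches({"numeric": [">", 0, "<=", 100]}, event["detail"]["cpu"]))    # True
```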
* You can set Configure input, which is used to transform/filter the event data that will be passed to the target:
** Matched event - the entire event text is passed to the target when the rule is triggered
** Part of the matched event - only the part of the event text that you specify is passed to the target
** Constant (JSON text) - send static content instead of the matched event data (mocked JSON)
** Input transformer - transform the event text into a different string or JSON object
- A common use case of EventBridge is to use it as a serverless cronjob eg. trigger a database backup every day
- EventBridge can schedule events using either Cron Expressions or Rate Expressions
* Cron Expression - very fine-grained control eg. cron(15 12 ? * MON-FRI *) = 12:15 pm UTC Monday to Friday
* Rate Expression - easy to set, not as fine-grained eg. choose every X minutes / hours / days
* All scheduled events use the UTC time zone
* The minimum precision for schedules is 1 minute
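The rate-expression format above can be sketched with a small parser that computes the next few trigger times. The `rate(value unit)` syntax matches EventBridge's; the start time is arbitrary, and real scheduling happens server-side in UTC.

```python
# Sketch of EventBridge rate expressions: rate(value unit), where unit is
# minute(s) / hour(s) / day(s).
import re
from datetime import datetime, timedelta, timezone

UNITS = {"minute": "minutes", "minutes": "minutes",
         "hour": "hours", "hours": "hours",
         "day": "days", "days": "days"}

def parse_rate(expr):
    m = re.fullmatch(r"rate\((\d+) (minutes?|hours?|days?)\)", expr)
    if not m:
        raise ValueError(f"invalid rate expression: {expr}")
    return timedelta(**{UNITS[m.group(2)]: int(m.group(1))})

start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # schedules run in UTC
interval = parse_rate("rate(5 minutes)")
next_runs = [start + interval * i for i in range(1, 4)]
print([t.strftime("%H:%M") for t in next_runs])  # ['00:05', '00:10', '00:15']
```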
- The event bus extends to third-party SaaS products to accept Partner Event Sources eg. react to a Datadog event and trigger an ASG
- EventBridge Schema Registry allows you to keep track of changes to your event schemas:
* It will automatically detect schema changes and create versions
* You can download the schema as Code Bindings
* You can use the AWS Toolkit VS Code plugin to view generated schemas or apply Code Bindings
18. CloudTrail
- Logs API calls between AWS services. When you need to know who to blame.
- AWS CloudTrail is a service that enables governance, compliance, operational auditing , and risk auditing of your AWS account.
- AWS CloudTrail is used to monitor API calls and Actions made on an AWS account.
- Easily identify which users and accounts made the call to AWS eg.
* Where - Source IP Address
* When - Event Time
* Who - User, User Agent
* What - Region, Resource, Action
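The who/when/where/what fields above map directly onto fields in a CloudTrail record. The field names below match the real CloudTrail record format; the values are invented for the example.

```python
# A trimmed CloudTrail record showing where the who/when/where/what fields live.
import json

record = json.loads("""{
  "eventTime": "2024-05-01T12:34:56Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userAgent": "aws-cli/2.15.0",
  "userIdentity": {"type": "IAMUser", "userName": "alice"}
}""")

summary = {
    "who":   record["userIdentity"]["userName"],          # Who - User
    "when":  record["eventTime"],                         # When - Event Time
    "where": record["sourceIPAddress"],                   # Where - Source IP Address
    "what":  (record["awsRegion"], record["eventName"]),  # What - Region, Action
}
print(summary["who"], summary["what"])  # alice ('us-east-1', 'StopInstances')
```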
- You can set up CloudTrail across all accounts in an AWS Organization
a) CloudTrail - Event History
- CloudTrail is already logging by default and will collect logs for the last 90 days via Event History
- If you need more than 90 days you need to create a Trail
- Trails are output to S3 and do not have a GUI like Event History. To analyze a Trail you'd have to use Amazon Athena
b) CloudTrail - Trail Options
- A Trail can be set to log to all regions
- A Trail can be set to log across all accounts in an Organization
- You can encrypt your logs using Server-Side Encryption via Key Management Service (SSE-KMS)
- To ensure the integrity of our logs and see if they have been tampered with, we need to turn on Log File Validation
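The idea behind Log File Validation can be sketched as a hash chain: CloudTrail writes digest files containing SHA-256 hashes of the delivered log files, and each digest references the previous one. The real feature also signs digests with a key pair; this toy version only shows the chaining.

```python
# Toy hash chain illustrating log file integrity validation.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

log_files = [b'{"Records": []}', b'{"Records": [{"eventName": "StopInstances"}]}']

# Build the chain: each digest commits to a log hash + the previous digest.
digests, prev = [], ""
for log in log_files:
    digest = sha256((sha256(log) + prev).encode())
    digests.append(digest)
    prev = digest

# Verification: recompute the chain; a tampered log file breaks every
# digest from that point on.
prev, valid = "", True
for log, expected in zip(log_files, digests):
    if sha256((sha256(log) + prev).encode()) != expected:
        valid = False
    prev = expected
print(valid)  # True for untampered logs
```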
c) CloudTrail to CloudWatch
- CloudTrail can be configured to deliver events to a CloudWatch Logs log group. Configuring delivery to CloudWatch Logs enables you to receive SNS notifications from CloudWatch when specific API activity occurs. Standard CloudWatch and CloudWatch Logs charges apply.
d) CloudTrail - Management vs Data Events
Management Events
-Tracks management operations. Turned on by default. Can't be turned off:
* Configuring security - eg. IAM AttachRolePolicy API operations
* Registering devices - eg. Amazon EC2 CreateDefaultVpc API operations
* Configuring rules for routing data - eg. Amazon EC2 CreateSubnet API operations
* Setting up logging - eg. AWS CloudTrail CreateTrail API operations
Data Events
- Tracks specific operations for specific AWS Services. Data events are high volume logging and will result in additional charges. Turned off by default
- The two services that can be tracked are S3 and Lambda, so it would track actions such as GetObject, DeleteObject, PutObject
e) CloudTrail CheatSheet
- CloudTrail logs API calls between AWS services
- governance, compliance, operational auditing, and risk auditing are keywords relating to CloudTrail
- When you need to know who to blame think CloudTrail
- CloudTrail by default logs event data for the past 90 days via Event History
- To track beyond 90 days you need to create a Trail
- To ensure logs have not been tampered with you need to turn on Log File Validation option
- CloudTrail logs can be encrypted using KMS (Key Management Service)
- CloudTrail can be set to log across all AWS accounts in an Organization and all regions in an account.
- CloudTrail logs can be streamed to CloudWatch logs
- Trails are outputted to an S3 bucket that you specify
- CloudTrail logs two kinds of events: Management Events and Data Events
- Management events log management operations eg. AttachRolePolicy
- Data Events log data operations for resources (S3, Lambda) eg. GetObject, DeleteObject, and PutObject
- Data Events are disabled by default when creating a Trail
- Trail logs in S3 can be analyzed using Athena