AWS Storage and Data Management
1) Service Catalog
- AWS Service Catalog enables organizations to create and manage catalogs of products that are approved for use on AWS, to achieve consistent governance and meet compliance requirements.
- The AWS Service Catalog is an alternative to granting direct access to AWS resources via the AWS Console
* Standardization
* Self-service discovery and launch
* Fine-grained access control
* Extensibility and version control
a) Service Catalog - Anatomy

b) Service Catalog - Administrator Portfolio
2) Elastic Block Store (EBS)
- A virtual hard drive in the cloud. Create new volumes, attach them to EC2 instances, back up via snapshots, and encrypt them easily.
- What is IOPS?
IOPS stands for Input/Output Operations Per Second. It is the speed at which non-contiguous reads and writes can be performed on a storage medium. High I/O = lots of small, fast reads and writes
- What is Throughput?
The data transfer rate to and from the storage medium in megabytes per second.
- What is Bandwidth?
Bandwidth is the measurement of the total possible speed of data movement along the network.
- Think of Bandwidth as the pipe and Throughput as the water flowing through it
- Elastic Block Store (EBS) is a highly available and durable solution for attaching persistent storage volumes to an EC2 instance.
- Volumes are automatically replicated within their Availability Zone (AZ) to protect from component failure.
a) EBS - Volume Type Usage
- There are 5 Types of EBS Storage
* 1. General Purpose (SSD) (gp2) - for general usage without specific requirements
* 2. Provisioned IOPS (SSD) (io1) - when you require really fast input & output
* 3. Throughput Optimized HDD (st1) - magnetic drive optimized for quick throughput
* 4. Cold HDD (sc1) - lowest cost HDD volume for infrequently accessed workloads
* 5. EBS Magnetic (standard) - previous generation HDD
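- A minimal boto3 sketch (the region, AZ, and instance ID below are hypothetical placeholders) of creating a gp2 volume and attaching it to an instance:
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# create a 100 GiB General Purpose SSD volume in the same AZ as the instance
volume = ec2.create_volume(AvailabilityZone='us-east-1a', Size=100, VolumeType='gp2')
ec2.get_waiter('volume_available').wait(VolumeIds=[volume['VolumeId']])

# attach the volume to an existing EC2 instance as /dev/sdf
ec2.attach_volume(VolumeId=volume['VolumeId'], InstanceId='i-0123456789abcdef0', Device='/dev/sdf')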
b) Storage Volumes
- Hard Disk Drive (HDD)
* HDD is magnetic storage that uses rotating platters, an actuator arm, and a magnetic head (similar to a record player)
* HDD is very good at writing a continuous stream of data
* HDD is not great for many small reads and writes (think of the arm of a record player having to lift up and down and move around)
* Better for Throughput
* Physical Moving Part
- Solid State Drive (SSD)
* Uses integrated circuit assemblies as memory to store data persistently, typically using flash memory. SSDs are typically more resistant to physical shock, run silently, and have quicker access times and lower latency.
* Very good for frequent reads and writes (I/O)
* No physical moving parts
- Magnetic Tape
* A large reel of magnetic tape. A tape drive is used to write data to the tape. Medium and large-sized data centers deploy both tape and disk formats. Tapes normally come in the form of cassettes. Magnetic tape is very cheap to produce and can store considerable amounts of data.
* Durable for decades
* Cheap to produce
-
c) EBS - Moving Volume
- From one AZ to another
* 1. take a Snapshot of the volume
* 2. create an AMI from the Snapshot
* 3. launch new EC2 instance in desired AZ
- From one Region to another
* 1. take a Snapshot of the volume
* 2. create an AMI from the Snapshot
* 3. copy the AMI to another Region
* 4. launch a new EC2 instance from the copied AMI.
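- A hedged boto3 sketch of those Region-move steps (the volume ID, AMI name, device name, and regions are placeholders):
import boto3

ec2_src = boto3.client('ec2', region_name='us-east-1')

# 1. take a Snapshot of the volume
snap = ec2_src.create_snapshot(VolumeId='vol-0123456789abcdef0', Description='move to eu-west-1')
ec2_src.get_waiter('snapshot_completed').wait(SnapshotIds=[snap['SnapshotId']])

# 2. create an AMI from the Snapshot
image = ec2_src.register_image(
    Name='my-moved-ami',
    RootDeviceName='/dev/xvda',
    BlockDeviceMappings=[{'DeviceName': '/dev/xvda', 'Ebs': {'SnapshotId': snap['SnapshotId']}}],
    VirtualizationType='hvm')

# 3. copy the AMI to the destination Region
ec2_dst = boto3.client('ec2', region_name='eu-west-1')
copied = ec2_dst.copy_image(Name='my-moved-ami', SourceImageId=image['ImageId'], SourceRegion='us-east-1')
ec2_dst.get_waiter('image_available').wait(ImageIds=[copied['ImageId']])

# 4. launch a new EC2 instance from the copied AMI
ec2_dst.run_instances(ImageId=copied['ImageId'], InstanceType='t3.micro', MinCount=1, MaxCount=1)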
-
d) EBS - Encrypted Root Volume
- When you go through the wizard to launch an EC2 instance you can encrypt the volume on creation
- If you want to encrypt an existing volume you'll have to do the following:
* Take a snapshot of the unencrypted volume
* Create a copy of that Snapshot and select the Encryption option
* Create a new AMI from the encrypted Snapshot
* Launch a new EC2 instance from the created AMI
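- The key step is the encrypted snapshot copy; a short boto3 sketch follows (the snapshot ID and key alias are placeholders), after which the AMI and launch steps mirror the Region-move sketch above:
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# copy the unencrypted snapshot, turning on encryption for the copy
encrypted = ec2.copy_snapshot(
    SourceSnapshotId='snap-0123456789abcdef0',
    SourceRegion='us-east-1',
    Encrypted=True,
    KmsKeyId='alias/aws/ebs',   # default EBS key; a customer managed key also works
    Description='encrypted copy')
ec2.get_waiter('snapshot_completed').wait(SnapshotIds=[encrypted['SnapshotId']])
# then register an AMI from the encrypted snapshot and launch a new instance from it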
-
e) EBS vs Instance Store Volumes
- An EC2 instance can be backed (root device) by an EBS Volume or Instance Store Volume
- EBS Volumes
* A durable, block-level storage device that you can attach to a single EC2 instance
* EBS Volume created from an EBS Snapshot
* Can start and stop instances
* Data will persist if you reboot your system
* Ideal for when you want data to persist. In most use cases you'll want EBS backed volume.
- Instance Store Volumes (Ephemeral - lasting for a very short time)
* A temporary storage type located on disks that are physically attached to a host machine.
* An Instance Store Volume is created from a template stored in S3
* Cannot stop instances; they can only be terminated.
* Data will be lost if the host fails or the instance is terminated
-
f) EBS CheatSheet
- Elastic Block Store (EBS) is a virtual hard disk. Snapshots are a point-in-time copy of that disk.
- Volumes exist on EBS. Snapshots exist on S3.
- Snapshots are incremental, only changes made since the last snapshot are moved to S3
- Initial Snapshots of an EC2 instance will take longer to create than subsequent Snapshots
- If taking a Snapshot of a root volume, the EC2 instance should be stopped before snapshotting
- You can take Snapshots while the instance is still running.
- You can create AMIs from Volumes, or from Snapshots
- EBS Volumes A durable, block-level storage device that you can attach to a single EC2 instance
- EBS Volumes can be modified on the fly eg. storage type or volume size.
- Volumes always exist in the same AZ as the EC2 instance.
- Instance Store Volumes A temporary storage type located on disks that are physically attached to a host machine.
- Instance Store Volumes (ephemeral) cannot be stopped. If the host fails then you lose your data.
- EBS Backed instances can be stopped and you will not lose any data.
- By default root volumes are deleted on termination.
- EBS Volumes can have termination protection (don't delete the volume on termination)
- Snapshots of encrypted volumes, and volumes restored from encrypted snapshots, will also be encrypted.
- You cannot share a snapshot if it has been encrypted.
- Unencrypted snapshots can be shared with either AWS accounts or made public.
3) AWS Storage Gateway
- Extending and backing up on-premises storage to the cloud.
- AWS Storage Gateway connects an on-premises software appliance with cloud-based storage.
- Provides seamless and secure integration between your organization's on-premises IT environment and AWS's storage infrastructure.
- Securely store your data in the AWS Cloud for scalable and cost-effective storage.
- Software appliance is available as a virtual machine (VM) image.
- Supports both VMware ESXi and Microsoft Hyper-V.
- Once installed and activated you can use the AWS Console to create your gateway
- There are 3 Types of Gateways:
* File Gateway (NFS)
** store your files in S3
* Volume Gateway (iSCSI)
** store copies of your hard disk drives in S3
** Stored Volumes
** Cached Volumes
* Tape Gateway (VTL)
** virtual tape library
-
a) File Gateway (NFS)
- Your files are stored as objects inside your S3 buckets.
- Access your files through a Network File System (NFS) or SMB mount point
- Ownership, permissions, and timestamps are all stored within S3 metadata of the object associated with the file.
- Once a file is transferred to S3, it can be managed as a native S3 object.
- Bucket Policies, Versioning, Lifecycle Management, and Cross-Region Replication apply directly to objects stored in your bucket.
b) Volume Gateway (iSCSI)
- Volume Gateway presents your applications with disk volumes using the Internet Small Computer Systems Interface (iSCSI) block protocol.
- Data that is written to volumes can be asynchronously backed up as point-in-time snapshots of the volumes, and stored in the cloud as AWS EBS Snapshots.
- Snapshots are incremental backups that capture only changed blocks in the volume.
- All Snapshot storage is also compressed to help minimize your storage charges.
c) Volume Gateway - Stored Volumes
- Primary data is stored locally, while asynchronously backing up that data to AWS.
- Provides your on-premises applications with low-latency access to their entire datasets, while still providing durable off-site backups.
- Create storage volumes and mount them as iSCSI devices from your on-premises servers.
- Any data written to stored volumes is stored on your on-premises storage hardware.
- Amazon Elastic Block Store (EBS) snapshots are backed up to AWS S3.
- Stored Volumes can be between 1GB - 16TB in size.
d) Volume Gateway - Cached Volumes
- Lets you use AWS S3 as your primary data storage, while retaining frequently accessed data locally in your storage gateway.
- Minimizes the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency data access.
- Create storage volumes up to 32TB in size and attach them as iSCSI devices from your on-premises servers.
- Your gateway stores data that you write to these volumes in S3, and retains recently read data in your on-premises storage gateway cache and upload buffer storage.
- Cached Volumes can be between 1GB - 32TB in size.
e) Tape Gateway (VTL)
- A durable, cost-effective solution to archive your data in the AWS Cloud
- The VTL interface it provides lets you leverage your existing tape-based backup application infrastructure.
- Store data on virtual tape cartridges that you create on your tape gateway.
- Each tape gateway is pre-configured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices.
- You add tape cartridges as you need to archive your data.
- Supported by NetBackup, Backup Exec, and Veeam.
f) Storage Gateway CheatSheet
- Storage Gateway connects on-premises storage to cloud storage (hybrid storage solution)
- These are three types of Gateways: File Gateway, Volume Gateway, Tape Gateway
- File Gateway lets S3 act as a local file system using NFS or SMB; it extends your local hard drive to S3
- Volume Gateway is used for backups and has two types: Stored and Cached
- Stored Volume Gateway continuously backs up local storage to S3 as EBS Snapshots. Primary data is kept on-premises.
- Stored Volumes are 1GB to 16TB in size
- Cached Volume Gateway caches the frequently used files on-premises. Primary data is stored on S3
- Cached Volumes are 1GB to 32TB in size
- Tape Gateway backs up virtual tapes to S3 Glacier for long-term archive storage
4) ElastiCache
- Managed caching service which either runs Redis or Memcached
- Deploy, run, and scale popular open-source-compatible in-memory data stores.
- Frequently run, identical queries are stored in the cache.
- ElastiCache is only accessible to resources operating within the same VPC, to ensure low latency.
- ElastiCache supports 2 open-source caching engines:
* 1. Memcached
* 2. Redis
a) What is an In-Memory Data Store?
- Caching
Caching is the process of storing data in a cache. A cache is a temporary storage area. Caches are optimized for fast retrieval, with the trade-off that data is not durable.
- In Memory Data Store
When data is stored in-memory (think of RAM). The trade-off is high volatility (low durability, risk of data loss), but access to data is very fast.
-
b) ElastiCache - Caching Comparison
- Memcached
* is generally preferred for caching HTML fragments. Memcached is a simple key/value store. The trade-off of being simple is that it is very fast.
- Redis
* can perform many different kinds of operations on your data. It's very good for leaderboards and keeping track of unread notification data. It's very fast, but arguably not as fast as Memcached.
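- A minimal cache-aside sketch in Python, assuming the redis-py client and a reachable ElastiCache Redis endpoint (the hostname and the db_lookup callback are placeholders):
import json
import redis  # pip install redis

r = redis.Redis(host='my-cache.abc123.0001.use1.cache.amazonaws.com', port=6379)

def get_user(user_id, db_lookup):
    key = f'user:{user_id}'
    cached = r.get(key)
    if cached is not None:                      # cache hit: skip the database
        return json.loads(cached)
    record = db_lookup(user_id)                 # cache miss: read from the primary database
    r.setex(key, 300, json.dumps(record))       # keep the copy for 5 minutes
    return record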
c) ElastiCache CheatSheet
- ElastiCache is a managed in-memory caching service
- ElastiCache can launch either Memcached or Redis
- Memcached is a simple key/value store preferred for caching HTML fragments and is arguably faster than Redis
- Redis has richer data types and operations. Great for leaderboard, geospatial data or keeping track of unread notifications.
- A cache is a temporary storage area.
- Most frequently identical queries are stored in the cache
- Resources only within the same VPC may connect to ElastiCache to ensure low latencies.
5) Identity Access Management (IAM)
- Manages access of AWS users and resources.
-
a) IAM - Core Components
- IAM allows management of access for users and resources
- IAM Identities
* IAM Users
** End users who log into the console or interact with AWS resources programmatically
* IAM Groups
** Group up your Users so they all share the permission levels of the group eg. Administrators, Developers, Auditors
* IAM Roles
** Associate permissions to a Role and then assign it to Users or Groups
- IAM Policies
* JSON documents which grant permissions for a specific user, group, or role to access services. Policies are attached to IAM identities.
- A user can belong to a group. Roles can be applied to groups to quickly add and remove permissions en-masse to users.
- A user can have a role directly attached. A policy can be directly attached to a user (called an Inline Policy)
- Roles can have many policies attached
- Various AWS resources allow you to attach roles directly to them.
-
b) IAM - Managed vs Customer Managed vs Inline Policies
- Managed Policies
* A policy which is managed by AWS, which you cannot edit.
* Managed policies are labeled with an orange box.
- Customer Managed Policies
* A policy created by the customer which is editable.
* Customer policies have no symbol beside them.
- Inline Policies
* A policy which is directly attached to the user.
c) IAM - Policies
- Version - policy language version. 2012-10-17 is the latest version.
- Statement - container for the policy elements; you are allowed to have multiples.
- Sid (optional) - a way of labeling your statements.
- Effect - Set whether the policy will Allow or Deny
- Principal - account, user, role, or federated user to which you would like to allow or deny access
- Action - list of actions that the policy allows or denies
- Resource - the resource to which the action(s) applies
- Condition (optional) - circumstances under which the policy grants permission
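- A hedged boto3 sketch tying those elements together (the policy name, Sid, and bucket are made up for illustration; identity-based policies like this one omit the Principal element):
import json
import boto3

iam = boto3.client('iam')

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowReadOnExampleBucket",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]
    }]
}

# create a customer managed policy from the JSON document above
iam.create_policy(PolicyName='ExampleS3ReadOnly', PolicyDocument=json.dumps(policy_document))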
d) IAM - Password Policy
- In IAM you can set a Password Policy
- To set minimum requirements for a password and rotate passwords so users have to update their password after X days.
e) IAM - Access Keys
- Access Keys allow users to interact with AWS service programmatically via the AWS CLI or AWS SDK.
- You are allowed two Access Keys per user.
f) IAM - MFA
- Multi-factor authentication (MFA) can be turned on per user.
- The user has to turn on MFA themselves; an Administrator cannot directly enforce users to have MFA.
- The Administrator account could create a policy requiring MFA to access certain resources.
g) IAM - CheatSheet
- Identity Access Management is used to manage access to users and resources
- IAM is a universal system. (applied to all regions at the same time). IAM is a free service
- A root account is the account initially created when AWS is set up (full administrator)
- New IAM accounts have no permissions by default until granted
- New users get assigned an Access Key ID and Secret Access Key when first created, when you give them programmatic access
- Access Keys are only used for the CLI and SDK (they cannot be used to access the console)
- Access Keys are only shown once when created. If lost, they must be deleted and recreated.
- Always setup MFA for Root Accounts
- Users must enable MFA on their own; an Administrator cannot turn it on for each user
- IAM allows you to set password policies to set minimum password requirements or rotate passwords
- IAM Identities are Users, Groups, and Roles
- IAM Users End users who log into the console or interact with AWS resources programmatically
- IAM Groups Group up your Users so they all share the permission levels of the group
- eg. Administrators, Developers, Auditors
- IAM Roles Associate permissions to a Role and then assign it to Users or Groups
- IAM Policies JSON documents which grant permissions for a specific user, group, or role to access services. Policies are attached to IAM Identities
- Managed Policies are policies provided by AWS and cannot be edited
- Customer Managed Policies are policies created by the customer, which you can edit
- Inline Policies are policies which are directly attached to a user
6) Simple Storage Service (S3)
- Object-based storage service. Serverless storage service in the cloud. Don't worry about filesystems or disk space.
- What is Object Storage (Object-based Storage)?
* data storage architecture that manages data as objects, as opposed to other storage architectures:
** file systems - which manage data as files in a file hierarchy, and
** block storage - which manages data as blocks within sectors and tracks.
- S3 provides you with unlimited storage. You don't need to think about the underlying infrastructure.
- The S3 Console provides an interface for you to upload and access your data.
- S3 Object
* Objects contain your data. They are like files.
* Object may consist of:
** Key - this is the name of the object
** Value - the data itself made up of a sequence of bytes
** Version ID - when versioning is enabled, the version of the object
** Metadata - additional information attached to the object.
* You can store data from 0 bytes to 5 Terabytes in size.
- S3 Bucket
* Buckets hold objects. Buckets can also have folders which in turn hold objects.
* S3 is a universal namespace so bucket names must be unique (think of it like having a domain name)
a) S3 - Storage Classes
- Trade Retrieval Time, Accessibility and Durability for Cheaper Storage
- Storage Classes (from most expensive to cheapest)
* 1. Standard (default)
** Fast. 99.99% Availability, 11 9's Durability. Replicated across at least three AZs
* 2. Intelligent Tiering
** Uses ML to analyze your object usage and determine the appropriate storage class. Data is moved to the most cost-effective access tier, without any performance impact or added overhead.
* 3. Standard Infrequently Accessed (IA)
** Still fast. Cheaper if you access files less than once a month. An additional retrieval fee is applied. 50% less than Standard (reduced availability)
* 4. One Zone IA
** Still fast. Objects only exist in one AZ. Availability is 99.5%, but cheaper than Standard IA by 20% (reduced durability). Data could be destroyed if the AZ fails. A retrieval fee is applied.
* 5. Glacier
** For long-term cold storage. Retrieval of data can take minutes to hours, but the trade-off is very cheap storage.
* 6. Glacier Deep Archive
** The lowest cost storage class. Data retrieval time is 12 hours.
-
- S3 Guarantees
* The platform is built for 99.99% availability
* Amazon guarantees 99.9% availability (SLA)
* Amazon guarantees 11 9s (99.999999999%) of durability
b) S3 - Security
- All new buckets are PRIVATE when created by default
- Logging per request can be turned on for a bucket. Log files are generated and saved in a different bucket (even a bucket in a different AWS account if desired)
- Access control is configured using Bucket Policies and Access Control Lists (ACL)
* Access Control Lists
** Legacy feature (but not deprecated) of controlling access to buckets and objects.
** Simple way of granting access
* Bucket Policies
** Use a policy to define complex access rules.
- In the console you can see buckets globally (from all regions), but every bucket is still assigned to a region
c) S3 - Encryption
- Encryption in Transit
* Traffic between your local host and S3 is encrypted via SSL/TLS
- Server Side Encryption (SSE) - Encryption at Rest
* Amazon helps you encrypt the object data
* SSE-AES (S3 Managed Keys) - S3 handles the key, uses the AES-256 algorithm
* SSE-KMS - envelope encryption, AWS KMS and you manage the keys
* SSE-C - customer provided key (you manage the keys)
- Client-Side Encryption
You encrypt your own files before uploading them to S3
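- A small boto3 sketch of requesting server-side encryption on upload (the bucket, key, and KMS alias are placeholders):
import boto3

s3 = boto3.client('s3')

# SSE-AES: let S3 manage the key
s3.put_object(Bucket='example-bucket', Key='report.csv', Body=b'col1,col2\n',
              ServerSideEncryption='AES256')

# SSE-KMS: envelope encryption with a KMS key you control
s3.put_object(Bucket='example-bucket', Key='report.csv', Body=b'col1,col2\n',
              ServerSideEncryption='aws:kms', SSEKMSKeyId='alias/my-app-key')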
d) S3 - Data Consistency
- New Objects (PUTS)
* Read After Write Consistency
* When you upload a new S3 object you are able to read it immediately after writing.
- Overwrite (PUTS) or Delete Objects (DELETES)
* Eventual Consistency
* When you overwrite or delete an object it takes time for S3 to replicate versions to AZs
* If you were to read immediately, S3 may return you an old copy. You need to generally wait a few seconds before reading.
e) S3 - Cross Region Replication (CRR)
- When enabled, any object that is uploaded will be automatically replicated to another region (or regions). Provides higher durability and potential disaster recovery for objects.
- You must have versioning turned on for both the source and destination buckets. You can have CRR replicate to another AWS account.
f) S3 - Versioning
- Store all versions of an object in S3
- Once enabled it cannot be disabled, only suspended on the bucket
- Fully integrates with S3 Lifecycle rules
- MFA Delete feature provides extra protection against deletion of your data
g) S3 - Lifecycle Management
- Automates the process of moving objects to different storage classes or deleting objects altogether.
- Can be used together with versioning.
- Can be applied to current and previous versions.
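- A hedged boto3 sketch of a lifecycle rule (the bucket name, prefix, and day counts are illustrative):
import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='example-bucket',
    LifecycleConfiguration={'Rules': [{
        'ID': 'archive-logs',
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},   # move to Standard-IA after 30 days
            {'Days': 90, 'StorageClass': 'GLACIER'}        # then to Glacier after 90 days
        ],
        'Expiration': {'Days': 365}                        # delete objects after a year
    }]})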
h) S3 - Transfer Acceleration
- Fast and secure transfer of files over long distances between your end users and an S3 bucket.
- Utilizes CloudFront's distributed Edge Locations.
- Instead of uploading to your bucket, users use a distinct URL for an Edge Location
- As data arrives at the Edge Location it is automatically routed to S3 over a specially optimized network path. (Amazon's Backbone Network)
i) S3 - Presigned Urls
- Generate a URL which provides temporary access to an object, to either upload or download object data. Presigned URLs are commonly used to provide access to private objects. You can use the AWS CLI or AWS SDK to generate Presigned URLs.
aws s3 presign s3://mybucket/myobject --expires-in 300
- You have a web application which needs to allow users to download files from a password-protected part of your web app. Your web app generates a presigned URL which expires after 5 minutes. The user downloads the file.
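- The SDK equivalent of the CLI command above, as a boto3 sketch (the bucket and key are placeholders):
import boto3

s3 = boto3.client('s3')

# presigned GET URL valid for 300 seconds (5 minutes), matching the CLI example above
url = s3.generate_presigned_url('get_object',
                                Params={'Bucket': 'mybucket', 'Key': 'myobject'},
                                ExpiresIn=300)
print(url)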
j) S3 - MFA Delete
- MFA Delete ensures users cannot delete objects from a bucket unless they provide their MFA code.
- MFA Delete can only be enabled under these conditions
* 1. The AWS CLI must be used to turn on MFA
* 2. The bucket must have versioning turned on
aws s3api put-bucket-versioning \
--bucket bucketname \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "your-mfa-serial-number mfa-code"
- Only the bucket owner logged in as the Root User can DELETE objects from the bucket
- Use the AWS CLI -> Pass MFA Code with API Request -> Bucket Versioning Must Be Turned On
k) S3 CheatSheet
- Simple Storage Service (S3) Object-based storage. Store unlimited amount of data without worry of underlying storage infrastructure
- S3 replicates data across at least 3 AZs to ensure 99.99% availability and 11 9s of durability
- Objects contain your data (they're like files)
- Objects can be anywhere from 0 Bytes up to 5 Terabytes in size
- Buckets contain objects. Buckets can also contain folders which in turn can contain objects.
- Bucket names are unique across all AWS accounts. Like a domain name.
- When you upload a file to S3 successfully you'll receive an HTTP 200 code
- Lifecycle Management Objects can be moved between storage classes or deleted automatically based on a schedule
- Versioning Objects are given a Version ID. When new objects are uploaded the old objects are kept. You can access any object version. When you delete an object the previous version is restored. Once Versioning is turned on it cannot be turned off, only suspended.
- MFA Delete enforces DELETE operations to require an MFA token in order to delete an object. You must have versioning turned on to use it. You can only turn on MFA Delete from the AWS CLI. Only the Root Account is allowed to delete objects
- All new buckets are private by default
- Logging can be turned on for a bucket to track operations performed on objects
- Access control is configured using Bucket Policies and Access Control Lists (ACL)
- Bucket Policies are JSON documents which let you write complex access control rules
- ACLs are the legacy method (not deprecated) where you grant access to objects and buckets with simple actions.
- Security in Transit Uploading files is done over SSL
- SSE stands for Server Side Encryption. S3 has 3 options for SSE:
- SSE-AES S3 handles the key, uses the AES-256 algorithm
- SSE-KMS Envelope encryption via AWS KMS and you manage the keys
- SSE-C Customer provided key (you manage the keys)
- Client-Side Encryption You must encrypt your own files before uploading them to S3
- Cross Region Replication (CRR) allows you to replicate files across regions for greater durability. You must have versioning turned on in both the source and destination buckets. You can have CRR replicate to a bucket in another AWS account
- Transfer Acceleration provides fast and secure uploads from anywhere in the world. Data is uploaded via a distinct URL to an Edge Location. Data is then transported to your S3 bucket via the AWS backbone network.
- A Presigned URL is a URL generated via the AWS CLI or SDK. It provides temporary access to write or download object data. Presigned URLs are commonly used to access private objects
- S3 has 6 different Storage Classes:
* 1. Standard (default)
** Fast. 99.99% Availability, 11 9's Durability. Replicated across at least three AZs
* 2. Intelligent Tiering
** Uses ML to analyze your object usage and determine the appropriate storage class. Data is moved to the most cost-effective access tier, without any performance impact or added overhead.
* 3. Standard Infrequently Accessed (IA)
** Still fast. Cheaper if you access files less than once a month. An additional retrieval fee is applied. 50% less than Standard (reduced availability)
* 4. One Zone IA
** Still fast. Objects only exist in one AZ. Availability is 99.5%, but cheaper than Standard IA by 20% (reduced durability). Data could be destroyed if the AZ fails. A retrieval fee is applied.
* 5. Glacier
** For long-term cold storage. Retrieval of data can take minutes to hours, but the trade-off is very cheap storage.
* 6. Glacier Deep Archive
** The lowest cost storage class. Data retrieval time is 12 hours.
7) Amazon S3 Glacier
- Long-term durable and secure cold storage. Long-term backup and data archiving.
- S3 Glacier is an extremely low-cost storage service. It provides durable storage with security features for data archiving and backup.
- Cost effective for months, years , or even decades.
- S3 Glacier is a serverless service, so you don't have to worry about:
* capacity planning
* hardware provisioning
* data replication
* hardware failure detection and recovery
* time-consuming hardware migrations.
-
a) S3 Glacier - Use Cases
- A company may need to hold financial tax records for a period of 7 years to meet government and financial regulations
-
-
b) S3 Glacier - Security
- S3 Glacier is automatically server-side encrypted using 256-bit Advanced Encryption Standard (AES-256)
- As an additional safeguard, AWS encrypts the key itself with a master key that is regularly rotated.
- Data-in-transit between S3 and S3 Glacier via lifecycle policies is encrypted using SSL.
- Vault Access policy controls who is able to access the vault.
- Vault Lock Policy controls how a vault can be modified for a period of time. This can help you enforce regulatory and compliance requirements. When the policy is applied to a vault, archives cannot be modified based on the policy's conditions.
- You can monitor S3 Glacier using:
* CloudWatch Alarms
* CloudTrail Logs
* AWS Trusted Advisor
- S3 Glacier is compliant with the following programs:
* SOC
* PCI DSS
* FedRAMP
* HIPAA
-
c) S3 Glacier - Data Model
- Vault
* A vault is a container for storing archives
* You name a vault and choose a region
* Each vault has a unique address per region
- Archive
* The Base unit of storage
* Can be photo, video or document
* Each archive has a unique ID
- Job
* 1. perform a select query on an archive
* 2. retrieve an archive
* 3. get an inventory of a vault
* e.g. use S3 Glacier Select, or get a list of S3 archive objects (a vault inventory)
* Each job has a unique ID
d) S3 Glacier - Notification Configuration
- Because jobs take time to complete, S3 Glacier supports a notification mechanism to notify you when a job is complete.
- Configure a vault to send notification to an Amazon SNS topic when the job completes.
- S3 Glacier stores the notification configuration as a JSON document.
- Notification configurations are associated with vaults; you can have one for each vault. Each notification configuration resource is uniquely identified by a URI of the form:
https://glacier.us-east-1.amazonaws.com/123456789012/vaults/myvault/notification-configuration
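- A hedged boto3 sketch of attaching such a configuration to a vault (the vault name and topic ARN are placeholders):
import boto3

glacier = boto3.client('glacier')

glacier.set_vault_notifications(
    accountId='-',                  # '-' means the account owning the credentials
    vaultName='myvault',
    vaultNotificationConfig={
        'SNSTopic': 'arn:aws:sns:us-east-1:123456789012:my-topic',
        'Events': ['ArchiveRetrievalCompleted', 'InventoryRetrievalCompleted']
    })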
-
e) S3 Glacier - Job Operations
- The following Job Operations can be performed on an S3 Glacier Vault:
- List Jobs lists jobs for a vault, including jobs that are in progress and jobs that have recently finished.
- Describe Job returns information about a job you previously initiated:
* the job initiation date
* the user who initiated the job
* the job status code/message
* the SNS topic to notify after S3 Glacier completes the job
- Initiate Job starts one of the operations you want to perform:
* 1. Inventory Retrieval get an inventory of a vault
* 2. Archive Retrieval retrieve an archive
* 3. Select perform a select query on an archive
- Get Job Output downloads the output of the job you initiated using Initiate Job.
- You can use the AWS API, CLI or SDK to trigger these Job Operations.
f) S3 Glacier - Vault Inventory
- A Vault Inventory refers to the list of archives in a vault.
- You can't explore Glacier like S3 via the console, so you have to get a list of files, then make individual requests for archives.
- After you upload your first archive to your vault, Glacier automatically creates a vault inventory
- On the first archive upload it takes about half a day to a day before that inventory is available for retrieval.
- The vault inventory automatically updates once a day.
- To retrieve a Vault Inventory you need to execute InitiateJob and GetJobOutput API calls.
aws glacier initiate-job --account-id - --vault-name my-vault --job-parameters file://job-archive-retrieval.json
aws glacier get-job-output --account-id - --vault-name my-vault --job-id kjdfnkjnalcsnlrkfmkls output.json
- The contents of the job-archive-retrieval.json
{
  "Type": "inventory-retrieval",
  "Description": "My inventory job",
  "Format": "CSV",
  "SNSTopic": "arn:aws:sns:us-east-1:12345678901:my-topic"
}
g) S3 Glacier - Archive Retrieval
- To Retrieve an Archive (File) you InitiateJob as the type Archive Retrieval
- You provide the unique Archive ID. You choose Retrieval Tier.
- Instead of retrieving the whole archive, you can return a byte range.
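- A boto3 sketch of an archive-retrieval job (the archive ID is a placeholder; the optional byte range must be megabyte-aligned):
import boto3

glacier = boto3.client('glacier')

job = glacier.initiate_job(
    accountId='-',
    vaultName='myvault',
    jobParameters={
        'Type': 'archive-retrieval',
        'ArchiveId': 'EXAMPLE-ARCHIVE-ID',
        'Tier': 'Standard',                 # or 'Expedited' / 'Bulk'
        'RetrievalByteRange': '0-1048575'   # optional: just the first 1 MiB
    })
print(job['jobId'])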
h) S3 Glacier - Select
- Before S3 Select, you would have to request 5GB of archives and all 5GB would be returned.
- With S3 Select you can write simple SQL expressions to pull only the bytes you need from those objects.
- Using S3 Select can lead to a performance increase of 400%
import boto3
glacier = boto3.client("glacier")
jobParameters = {
    "Type": "select",
    "ArchiveId": "ID",
    "Tier": "Expedited",
    "SelectParameters": {
        "InputSerialization": {"csv": {}},
        "ExpressionType": "SQL",
        "Expression": "SELECT * FROM archive WHERE _5='90210'",
        "OutputSerialization": {
            "csv": {}
        }
    },
    "OutputLocation": {
        "S3": {"BucketName": "glacier-results", "Prefix": "1"}
    }
}
glacier.initiate_job(vaultName="myVault", jobParameters=jobParameters)
- Choose your Tier - Expedited, Standard, or Bulk
- Choose the Input Serialization - eg. CSV, JSON, Parquet / GZIP, BZIP2
- Write your expression
- Choose your Output Serialization
- Choose your Output Location
- InitiateJob could be done via SDK or CLI
i) S3 Glacier - Retrieval Times
- When initiating a select or an archive retrieval job you can specify how fast you want the data.
- You pay per GB retrieved and number of requests. This is a separate cost from just the cost of storage.
- How fast you want to retrieve archives affects cost, and it's organized into tiers:
* Expedited Tier
** For urgent requests. Limited to 250 MB archive size.
** 1 - 5 mins
** $0.03 per GB
** $10.00 per 1000 requests
* Standard Tier
** No archive size limit. This is the default option
** 3 - 5 hours
** $0.01 per GB
** $0.05 per 1000 requests
* Bulk Tier
** No archive size limit, even petabytes worth of data.
** 5 - 12 hours
** $0.0025 per GB
** $0.025 per 1000 requests
j) S3 Glacier - Uploading Archives
- You cannot upload archives to S3 Glacier via the management console like S3. You need to use one of the following methods to move data into an S3 Glacier Vault.
- AWS Snowball, Snowball Edge or Snowmobile
* A rugged case containing computing and storage to physically transport terabytes of data directly to S3 or S3 Glacier.
* Snowmobile is a cargo container of computing and storage on a tractor trailer that can transport petabytes of data directly to S3 or S3 Glacier.
- S3 Lifecycle Policies
* You define lifecycle rules in the S3 console to automatically move data from an S3 bucket (after a minimum of 30 days) into S3 Glacier
- Multipart Upload API
* Using the AWS API, SDK or CLI you divide your archive into parts and upload them in parallel. When all parts are uploaded you use the API to tell S3 Glacier to assemble the final archive.
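- The multipart flow requires tree-hash checksums on every part, so the sketch below shows the simpler single-call UploadArchive operation instead (boto3 computes the checksum for you; the vault name and file are placeholders):
import boto3

glacier = boto3.client('glacier')

with open('backup-2020-01.tar.gz', 'rb') as f:
    result = glacier.upload_archive(
        accountId='-',
        vaultName='myvault',
        archiveDescription='January backup',
        body=f)                    # boto3 adds the required SHA256 tree-hash checksum
print(result['archiveId'])         # keep this ID; you need it to retrieve or delete the archive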
k) S3 Glacier - Deleting Vaults
- In order to delete an S3 Glacier vault you must:
* first delete all archives contained within the vault
* ensure there have been no writes to the vault since the last inventory
- S3 Glacier prepares an inventory every 24 hours. The inventory might not reflect the latest information, so you may have to wait for the next inventory update before deletion of the vault is possible.
- S3 Glacier Vaults cannot be deleted via the AWS Console. You have to use the AWS API, CLI or SDK
aws glacier delete-vault --vault-name my-vault --account-id 12345678901
-
l) S3 Glacier - Vault Access Policy
- A Vault Access Policy manages permissions to your vault. You can create one vault access policy per vault to manage permissions. You can modify permissions in a vault access policy at any time.
m) S3 Glacier - Vault Lock and Vault Lock Policy
- S3 Glacier Vault Lock allows you to easily deploy and enforce compliance controls for individual S3 Glacier Vaults with a Vault Lock policy.
- You can specify controls such as "write once read many" (WORM) in a Vault Lock policy, and lock the policy from future edits. WORM means once written, data cannot be modified.
- Once locked, the policy can no longer be changed.
- Locking a vault takes two steps:
* Step 1 - Initiate
** Initiate the lock by attaching a vault lock policy to your vault.
** The lock will be set to in-progress state and returns a lock ID.
** While in the in-progress state, you have 24 hours to validate your vault lock policy before the lock ID expires
* Step 2 - Validate
** Use the lock ID to complete the lock process.
** If the vault lock policy doesn't work as expected, you can abort the lock and restart from the beginning.
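- A hedged boto3 sketch of the two-step lock (the deny-delete WORM policy body, vault name, and ARN are illustrative):
import json
import boto3

glacier = boto3.client('glacier')

lock_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "deny-archive-deletion",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "glacier:DeleteArchive",
        "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/myvault"
    }]
}

# Step 1 - Initiate: attach the policy; the lock enters the in-progress state
lock = glacier.initiate_vault_lock(accountId='-', vaultName='myvault',
                                   policy={'Policy': json.dumps(lock_policy)})

# Step 2 - Validate: complete within 24 hours using the returned lock ID
glacier.complete_vault_lock(accountId='-', vaultName='myvault', lockId=lock['lockId'])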
n) S3 Glacier - Policy Controls
- Example policy: a cross-account principal is only allowed to perform Delete actions if the calling user has MFA turned on.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "add-mfa-delete-requirement",
    "Principal": {
      "AWS": [
        "arn:aws:iam::123456789012:root"
      ]
    },
    "Effect": "Allow",
    "Action": [
      "glacier:Delete*"
    ],
    "Resource": [
      "arn:aws:glacier:us-east-1:121212121212:vaults/myvault"
    ],
    "Condition": {
      "Bool": {
        "aws:MultiFactorAuthPresent": "true"
      }
    }
  }]
}
- Full List of Actions for Vault Access and Lock Policy
* AbortMultipartUpload
* AddTagsToVault
* CompleteMultipartUpload
* DeleteVault
* DeleteVaultAccessPolicy
* DeleteVaultNotifications
* DescribeJob
* DescribeVault
* GetJobOutput
* GetVaultAccessPolicy
* GetVaultLock
* GetVaultNotifications
* InitiateJob
* InitiateMultipartUpload
* ListJobs
* ListMultipartUploads
* ListParts
* ListTagsForVault
* RemoveTagsFromVault
* SetVaultNotifications
* UploadArchive
* UploadMultipartPart
* SetVaultAccessPolicy
o) S3 Glacier - Data Retrieval Policies
- Set data retrieval quotas and manage the data retrieval activities across your AWS account in each AWS Region
- S3 Glacier has 3 data retrieval policies
* 1. No Retrieval Limit (default)
** No retrieval quota is set and all valid data retrieval requests are accepted.
* 2. Free Tier Only
** Keep your retrievals within your daily free tier allowance and not incur any data retrieval cost.
* 3. Max Retrieval Rate
** Control the peak retrieval rate by specifying a data retrieval quota that has a bytes-per-hour maximum.
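- A boto3 sketch of switching to the Max Retrieval Rate policy (the 10 GB/hour figure is just an example):
import boto3

glacier = boto3.client('glacier')

glacier.set_data_retrieval_policy(
    accountId='-',
    Policy={'Rules': [{
        'Strategy': 'BytesPerHour',
        'BytesPerHour': 10 * 1024 ** 3    # cap retrievals at 10 GB per hour
    }]})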
p) S3 Glacier - Provisioned Capacity
- Pay a fixed up-front fee to expedite retrievals from Amazon S3 Glacier vaults for a given month.
- Each capacity unit costs 1000 USD / month
- Each unit of capacity ensures that at least 3 expedited retrievals can be performed every 5 minutes and provides up to 150 MB/s of retrieval throughput.
r) S3 Glacier - Archive Metadata
- An archive is any object, such as a photo, video, or document, that you store in a vault. It is the base unit of storage in Amazon S3 Glacier.
- When an archive is stored on S3 Glacier, S3 Glacier adds an additional 32 KB for index and related metadata which is used to identify and restore your object.
- So if you upload an archive that is 1MB, you are actually storing and being charged for 1MB + 32KB.
- It is not recommended to upload many small archives; upload fewer, larger archives so you are not incurring additional storage fees from the per-archive metadata.
- You can store a large archive and, if you want to fetch individual data inside it, select a range of bytes.
s) S3 Glacier CheatSheet
- S3 Glacier is an extremely low-cost storage service. Durable storage with security features for data archiving and backup
- Common use case: a company needs to hold financial tax records for a period of 7 years to meet government and financial regulations
- S3 Glacier is automatically server-side encrypted using 256-bit Advanced Encryption Standard (AES-256)
- As an additional safeguard, AWS encrypts the key itself with a master key that is regularly rotated.
- Data-in-transit between S3 and S3 Glacier via lifecycle policies is encrypted using SSL
- The S3 Glacier Data Model is made up of Vaults, Archives, and Jobs
* Vault: A vault is a container for storing archives
* Archive: The base unit of storage; can be a photo, video, or document
* Job: Perform an operation on the Archive
** perform a select query on an archive
** retrieve an archive
** get an inventory of a vault
- Because jobs take time to complete, S3 Glacier supports a notification mechanism to notify you when a job is complete.
- A Vault Inventory refers to the list of archives in a vault.
* The vault inventory automatically updates once a day.
- To Retrieve an Archive (File) you InitiateJob as the type Archive Retrieval.
- With S3 Select you can write simple SQL expressions to pull only the bytes you need from those objects
- When initiating a select or an archive retrieval job you can specify how fast you want the data.
* Expedited Tier (1-5 mins, costly) For urgent requests. Limited to 250MB archive size.
* Standard Tier (3-5 hours, cheap) No archive size limit. This is default option.
* Bulk Tier (5-12 hours, very cheap) No archive size limit, even petabytes worth of data.
- There are three ways to get data into S3 Glacier (there is no console like S3 to upload files):
* S3 Snowball, Snowball Edge or Snowmobile
* S3 Lifecycle Policies
* S3 Multipart Upload API
- To delete a Vault you must first delete all files, and there must have been no writes to the vault since the last inventory (an inventory is taken every 24 hours)
- Vault Access Policy controls who is able to access the vault
* Require MFA to delete files
* Don't let files be deleted for a period of time
- Vault Lock Policy controls how a vault can be modified for a period of time. Locking a vault takes two steps:
* Initiate the lock by attaching a vault lock policy to your vault (24-hour window where you can abort the lock)
* Use the lock ID to complete the lock process.
- S3 Glacier has 3 data retrieval policies
* 1. No Retrieval Limit (default) No retrieval quota is set and all valid data retrieval requests are accepted.
* 2. Free Tier Only Keep your retrievals within your daily free tier allowance and not incur any data retrieval cost.
* 3. Max Retrieval Rate Control the peak retrieval rate by specifying a data retrieval quota that has a bytes-per-hour maximum.
- S3 Glacier Provisioned Capacity allows you to pay a fixed up-front fee to guarantee expedited retrieval capacity.
- S3 Glacier stores metadata alongside the archive. Uploading many small archives can result in additional data, meaning greater costs.
* Try to store large archives to reduce this "metadata tax"
8) AWS Snowball
- Petabyte scale data transfer service. Move data onto AWS via physical briefcase computer.
- Low cost: it can cost thousands of dollars to transfer 100TB over high-speed internet. Snowball can reduce that cost to as little as 1/5th.
- Speed: it can take over 100 days to transfer 100TB over high-speed internet. Snowball can reduce that transfer time to less than a week.
- Snowball features and limitations:
* E-Ink display (shipping information)
* Tamper and weather proof
* Data is encrypted end-to-end (256-bit encryption)
* Uses a Trusted Platform Module (TPM) (a specialized chip on endpoint devices that stores RSA encryption keys specific to the host system for hardware authentication)
* For security purposes, data transfers must be completed within 90 days of the Snowball being prepared.
* Snowball can Import and Export from S3
- Snowballs come in two sizes:
* 50 TB (42TB of usable space)
* 80 TB (72TB of usable space)
a)
b)
c)
d)
e)
9) AWS Snowball Edge
- Petabyte-scale data transfer service. Move data onto AWS via a physical briefcase computer with more storage and on-site compute capabilities.
- Similar to Snowball but with more storage and with local processing
- Snowball Edge features and limitations:
* LCD display (shipping information and other functionality)
* can undertake local processing and edge-computing workloads
* can be used in a cluster in groups of 5 to 10 devices
* three options for device configurations:
** storage optimized (24 vCPUs)
** compute optimized (54 vCPUs)
** GPU optimized (54 vCPUs)
- Snowball Edge comes in two sizes:
* 100 TB (83 TB of usable space)
* 100 TB Clustered (45 TB per node)
10) Snowmobile
- A 45-foot-long ruggedized shipping container, pulled by a semi-trailer truck. Transfer up to 100PB per Snowmobile.
- AWS personnel will help you connect your network to the snowmobile and when data transfer is complete they'll drive it back to AWS to import into S3 or Glacier.
- Security Features
* GPS tracking
* Alarm monitoring
* 24/7 video surveillance
* an escort security vehicle while in transit (optional)
a) Snowball & Snowball Edge & Snowmobile CheatSheet
- Snowball and Snowball Edge is a rugged container which contains a storage device
- Snowmobile is a 45-foot-long ruggedized shipping container, pulled by a semi-trailer truck.
- Snowball and Snowball Edge are for petabyte-scale migration. Snowmobile is for exabyte-scale migration
- Low Cost: thousands of dollars to transfer 100TB over high-speed internet; Snowball is 1/5th of that cost
- Speed: 100TB takes over 100 days to transfer over high-speed internet; Snowball takes less than a week
- Snowball come in two sizes:
* 50 TB (42TB of usable space)
* 80 TB (72 TB of usable space)
- Snowball Edge comes in two sizes:
* 100 TB (83 TB of usable space)
* 100 TB Clustered (45 TB per node)
- Snowmobile comes in one size: 100PB
- You can both export or import data using Snowball or Snowmobile
- You can import into S3 or Glacier
- Snowball Edge can undertake local processing and edge-computing workloads
- Snowball Edge can be used in a cluster in groups of 5 to 10 devices
- Snowball Edge provides three options for device configurations:
* storage optimized (24 vCPUs)
* compute optimized (54 vCPUs)
* GPU optimized (54 vCPUs)
11) Relational Database Service (RDS)
- A managed relational database service. Supports multiple SQL engines, easy to scale, back up, and secure.
- Relational Database Service (RDS) is the AWS Solution for relational databases.
- There are 6 relational database options currently available on AWS
* Amazon Aurora
* MySQL
* MariaDB
* PostgreSQL
* Oracle
* Microsoft SQL Server
a) RDS - Encryption
- You can turn on encryption at-rest for all RDS engines. You may not be able to turn encryption on for older versions of some engines. It will also encrypt the automated backups, snapshots, and read replicas.
- Encryption is handled using the AWS Key Management Service (KMS)
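- A hedged boto3 sketch of launching an encrypted instance (the identifiers, credentials, and key alias are placeholders):
import boto3

rds = boto3.client('rds')

rds.create_db_instance(
    DBInstanceIdentifier='mydb',
    Engine='mysql',
    DBInstanceClass='db.t3.micro',
    AllocatedStorage=20,
    MasterUsername='admin',
    MasterUserPassword='CHANGE_ME',
    StorageEncrypted=True,         # encryption at rest, handled through KMS
    KmsKeyId='alias/aws/rds')      # default RDS key; a customer managed key also works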
b) RDS - Backup
- There are 2 backup solutions available for RDS
- Automated Backups
* Choose a Retention Period between 1 and 35 days.
* Stores transaction logs throughout the day.
* Automated backups are enabled by default
* All data is stored inside S3
* There is no additional charge for backup storage
* You define your backup window
* Storage I/O may be suspended during backup
- Manual Snapshots
* Taken manually by the user.
* Backups persist even if you delete the original RDS instance.
c) RDS - Restoring Backup
- When recovering AWS will take the most recent daily backup, and apply transaction log data relevant to that day. This allows point-in-time recovery down to a second inside the retention period.
- Backup data is never restored over top of an existing instance.
- When you restore an RDS instance from Automated Backup or a Manual Snapshot a new instance is created for the restored database.
- Restored RDS instances will have a new DNS endpoint.
d) RDS - Multi AZ
- Ensure database remains available if another AZ becomes unavailable
- Makes an exact copy of your database in another AZ. AWS automatically synchronizes changes in the database over to the standby copy.
- Automatic Failover protection: if one AZ goes down, failover will occur and the standby will be promoted to master.
-
e) RDS - Read Replicas
- Read Replicas allow you to run multiple copies of your database. These copies only allow reads (no writes) and are intended to alleviate the workload of your primary database to improve performance
- You must have automatic backups enabled to use Read Replicas
- How to create a read replica: Actions -> Create read replica
- Asynchronous replication happens between the primary RDS instance and the replicas
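- A small boto3 sketch of creating one (the instance identifiers are placeholders):
import boto3

rds = boto3.client('rds')

# requires automated backups to be enabled on the source instance
rds.create_db_instance_read_replica(
    DBInstanceIdentifier='mydb-replica',
    SourceDBInstanceIdentifier='mydb')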
-
f) RDS - Multi-AZ vs Read Replicas
- Multi-AZ keeps a synchronously replicated standby copy in another AZ purely for automatic failover; it does not serve reads.
- Read Replicas are asynchronously replicated copies used to scale read traffic; they do not provide automatic failover.
g) RDS - Performance Insights
-
h) RDS - Reserved Instance
-
i) RDS - Aurora
- Aurora can run in a serverless configuration (Aurora Serverless)
- Backups must be turned on, with a minimum retention of 1 day
j) RDS CheatSheet
- Relational Database Service (RDS) is the AWS Solution for relational databases.
- RDS instances are managed by AWS. You cannot SSH into the VM running the database.
- There are 6 relational database options currently available on AWS, Aurora, MySQL, MariaDB, Postgres, Oracle, Microsoft SQL Server.
- Multi-AZ is an option you can turn on which makes an exact copy of your database in another AZ that acts only as a standby
- For Multi-AZ AWS automatically synchronizes changes in the database over to the standby copy
- Multi-AZ has Automatic Failover protection: if one AZ goes down, failover will occur and the standby will be promoted to master.
- Read Replicas allow you to run multiple copies of your database. These copies only allow reads (no writes) and are intended to alleviate the workload of your primary database to improve performance
- Read Replicas use Asynchronous replication
- You must have automatic backups enabled to use Read Replicas.
- You can have up to 5 read replicas
- You can combine Read Replicas with Multi-AZ
- You can have Read Replicas in another Region (Cross-Region Read Replicas)
- Replicas can be promoted to their own database, but this breaks replication
- You can have Replicas of Read Replicas
- RDS has 2 backup solutions: Automated Backups and Database Snapshots
- Automated Backups: you choose a retention period between 1 and 35 days. There is no additional cost for backup storage; you define your backup window.
- Manual Snapshots: you manually create backups. If you delete your primary, the manual snapshots will still exist and can be restored
- When you restore an instance it will create a new database. You just need to delete your old database and point traffic to the new restored database.
- You can turn on encryption at-rest for RDS via KMS
12) DynamoDB
- A key-value and document database (NoSQL) which guarantees consistent reads and writes at any scale.
- What is NoSQL?
NoSQL is a database which is not relational and does not use SQL to query the data for results
- What is a Key/Value Store?
A form of data storage which has a key that references a value and nothing more
{ Title: 'S010e3dk' }
- What is a Document Store ?
A form of data storage with a nested data structure.
{
Series: 'DS9',
Episodes: [
{
Season: 1,
Episode: 19,
Title: 'Duet'
}
]
}
- DynamoDB is a NoSQL key/value and document database for internet-scale applications.
- Specify your read and write capacity per second; it just works at whatever capacity you need without you tweaking anything.
- Features
* Fully managed
* Multiregion
* Multimaster
* Durable database
* Built-in security
* Backup and restore
* In-memory caching
- Provides
* Eventual Consistent Reads (default)
* Strongly Consistent Reads
- All data is stored on SSD storage and is spread across 3 Availability Zones within a region.
a) DynamoDB - Table Structure
- Item - Row
- Attribute - cell
- Primary Key = Partition Key + Sort Key
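- A hedged boto3 sketch of a table using that structure (the table name, attributes, and capacity numbers are illustrative, following the DS9 example above):
import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='Episodes',
    AttributeDefinitions=[
        {'AttributeName': 'Series', 'AttributeType': 'S'},
        {'AttributeName': 'Episode', 'AttributeType': 'N'}
    ],
    KeySchema=[
        {'AttributeName': 'Series', 'KeyType': 'HASH'},    # Partition Key
        {'AttributeName': 'Episode', 'KeyType': 'RANGE'}   # Sort Key
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5})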
b) DynamoDB - Consistent Reads
- When data needs to be updated, it has to write updates to all copies. It is possible for data to be inconsistent if you are reading from a copy which has yet to be updated. You have the ability to choose the read consistency in DynamoDB to meet your needs.
- Eventual Consistent Reads (default)
* When copies are being updated it is possible for you to read and be returned an inconsistent copy
* Reads are fast but there is no guarantee of consistency
* All copies of data eventually become generally consistent within a second.
- Strongly Consistent Reads
* When copies are being updated and you attempt to read, it will not return a result until all copies are consistent.
* You have a guarantee of consistency but the trade off is higher latency (slower reads).
* All copies of data will be consistent within a second.
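- A short boto3 sketch of choosing the read consistency per request (the table and key values follow the earlier example):
import boto3

table = boto3.resource('dynamodb').Table('Episodes')

key = {'Series': 'DS9', 'Episode': 19}

# default: eventually consistent read (fast, may return a stale copy)
table.get_item(Key=key)

# strongly consistent read (waits until copies agree, higher latency)
table.get_item(Key=key, ConsistentRead=True)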
c) DynamoDB CheatSheet
- DynamoDB is a fully managed NoSQL key/value and document database.
- Applications that contain large amounts of data but require predictable read and write performance while scaling are a good fit for DynamoDB
- DynamoDB scales with whatever read and write capacity you specify per second
- DynamoDB can be set to have Eventually Consistent Reads (default) and Strongly Consistent Reads
- Eventually Consistent Reads: data is returned immediately but can be inconsistent. Copies of data will be generally consistent within 1 second.
- Strongly Consistent Reads will wait until data is consistent. Data will never be inconsistent but latency will be higher. Copies of data will be consistent with a guarantee of 1 second.
- DynamoDB stores 3 copies of data on SSD drives across 3 Availability Zones within a region.