This is a digital copy of my physical notes taken while studying for AWS Certifications. Not all content is here since I’ve only taken notes on services which I haven’t actively used during my time with AWS. I’ve given every service a heading but notes may be terse if I am already familiar with the service.
EC2 - Elastic Compute Cloud
- Can run Mac, Linux or Windows
- By default, terminating an instance deletes its root EBS volume; additional attached EBS volumes are kept unless “Delete on Termination” is enabled
- Naming Convention consists of “Instance Class” -> “Instance Generation” -> “Size”
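The naming convention above can be illustrated with a small parser. This is only a sketch: the `InstanceType` tuple and `parse_instance_type` function are hypothetical names, and the regular expression is an assumption that covers common names such as m5.2xlarge but not every instance family.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    instance_class: str  # e.g. "m" (general purpose)
    generation: int      # e.g. 5
    attributes: str      # optional extra letters, e.g. "gn"; empty if none
    size: str            # e.g. "2xlarge"


# Assumed shape: letters, digits, optional letters, a dot, then the size.
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z\-]*)\.([a-z0-9\-]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into class, generation and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised instance type: {name!r}")
    cls, gen, attrs, size = match.groups()
    return InstanceType(cls, int(gen), attrs, size)
```

For example, `parse_instance_type("m5.2xlarge")` yields class `m`, generation `5` and size `2xlarge`.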
Fundamentals
- Setting up a budget is extremely important!
- By Default IAM users cannot see billing information & it must be enabled in root
- EC2 is made up of many parts
- EC2 - Virtual Machines
- EBS - Storage
- ELB - Load Balancing
- ASG - Auto Scaling
Security Groups
- Only contains “Allow rules”
- Is like a firewall as in - it specifies inbound/outbound rules
- Can also specify access rules for other EC2 related services
- Many to Many - Security groups are per region
- Default inbound all is blocked
- Default outbound all is allowed
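A toy model of the "allow rules only" behaviour, as a sketch rather than how AWS evaluates rules internally. The `AllowRule` and `SecurityGroup` classes are hypothetical; the point is that traffic is denied unless some allow rule matches, so an empty group blocks all inbound traffic.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class AllowRule:
    protocol: str  # e.g. "tcp"
    port: int      # single port, for simplicity
    cidr: str      # allowed source range, e.g. "203.0.113.0/24"


@dataclass
class SecurityGroup:
    # No inbound rules by default, so everything inbound is blocked.
    inbound: List[AllowRule] = field(default_factory=list)

    def allows_inbound(self, protocol: str, port: int, source_ip: str) -> bool:
        """Traffic passes only if at least one allow rule matches it."""
        return any(
            rule.protocol == protocol
            and rule.port == port
            and ip_address(source_ip) in ip_network(rule.cidr)
            for rule in self.inbound
        )
```

There is deliberately no "deny" rule type in the sketch: removing access means removing the allow rule.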
AWS SDK Setup
- Never manually run aws configure from the EC2
- Instead, associate an IAM role with the instance
- EC2 will inherit perms from the role associated with instance
Purchasing Options
- Multiple ways of purchasing an EC2 instance
- On Demand
- Reserved (1 or 3 years)
- Convertible type available - Allows changing EC2 type
- Savings Plans (1 or 3 years)
- Instead of reserving specific instances, you dedicate a set spend
- Spot instances
- These instances can die at any point but are extremely cheap.
- It’s possible to sell reserved instances on the AWS Marketplace
Spot Instances
- A “Spot Request” defines a maximum price; AWS launches spot instances while the spot price is below it
- A one-time request launches instances once; a persistent request automatically relaunches them when they are reclaimed
- Using “Spot Fleets” you can purchase groups of instances at the lowest price
- Both are automatic, but unlike a single request a fleet can pick and choose instance types to fit within your budget.
Placement Groups
- Placement groups can control how AWS will provision your groups of EC2s
- Cluster - Same AZ / Will have low latency (Literally same rack)
- Spread - Across different physical hardware (can span AZs)
- Maximum 7 instances per AZ per group
- Partition - Instances are grouped into partitions, each on its own set of racks
- Maximum 7 partitions per AZ, can be spread across multiple AZs in the same region
Elastic Network Interface
- Virtual Network card
- 1x MAC Address
- 1x Primary Private IP Address
- One or more Secondary IP Addresses (the limit depends on instance type)
- One or more Security Groups
- Bound to a set AZ
- ENIs can be moved between EC2s and will retain MAC & IPs, allows for manual failovers etc.
Hibernation
- Different ways of “turning off” EC2 instances
- Terminate - Data is destroyed
- Stop - Disk is preserved for next start
- Hibernate - RAM & Disk is preserved for next start
- RAM is written to disk, there must be enough free space on disk to hibernate
AMI - Amazon Machine Image
- Customised EC2 instances, running own OS/Software/Etc
- Faster boots than default
- Created from snapshots of existing EC2s
- i.e. create an AMI of the EC2 instance that is fully configured running our application, we can then deploy that AMI to other instances, and they will also be fully configured.
EC2 Instance Store
- Real physical disks attached to EC2
- Ephemeral storage - Data is deleted on stop
- User’s responsibility to back-up
EBS - Elastic Block Store
- Similar to a Network Drive
- can only be attached to ONE instance
- can only exist in ONE AZ
- Snapshot & Restore to move to different AZ
- The root volume is deleted by default when an EC2 is terminated; other attached volumes are not
- Snapshotting EBS can be done at any time for backup
- It’s possible to “archive” snapshots, this delays restoration by 24 to 72 hours but saves up to 75% on storage cost
Volume Types
- Can be used to boot:
- GP2 / GP3 - General purpose SSD
- IO1 / IO2 - Block Express High perf SSDs
- Cannot be used to boot:
- ST1 - High Throughput HDD
- SC1 - Low cost HDD
EBS MultiAttach
- Only available to IO1 & IO2
- Allows for up to 16 instances to attach
EBS Encryption
- This is handled Transparently, EC2 does all the work itself
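- Once enabled on a volume, data is encrypted At Rest, In Flight (between instance and volume) and in its snapshots; it is not on by default unless enabled for the account in that region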
EFS - Elastic Filesystem
- Managed service network filesystem
- Can be mounted to Many EC2 instances across Multiple AZs
- Access is managed via Security Groups
- Linux only
- Scales storage automatically
ELB - Elastic Load Balancer
- Splits traffic between EC2 Instances
- Exposes a single path into EC2 with DNS
- Domain name -> ELB -> EC2s
- Provides health checks for EC2 Instances
- ELB can ping ports/paths on EC2 instances to know if the instance is working and can receive requests
- ELB can be Public (External) or Private (Internal, only reachable from inside the VPC)
- ELB can be the source on a security group - “Allow * From ELB”
CLB - Classic Load Balancer
- HTTP/HTTPS/TCP/TLS
- Deprecated
ALB - Application Load Balancer
- HTTP/HTTPS/Websocket
- Can target Multiple EC2s OR Multiple containers inside a single EC2
- Can route based on URL, Hostname, or Query Strings
- Good for Docker and ECS
- Possible to map to dynamic ports
- Can target Lambda
- ALB gets given a fixed hostname URL
- ALB forwards the client’s real IP in the X-Forwarded-For header
NLB - Network Load Balancer
- TCP/TLS/UDP
- Can handle millions of requests per second with < 100ms Latency
- One static IP per AZ, you can assign an Elastic IP
- Good if your security requires a fixed IP address
- Can be placed in front of an ALB
- Health checks can be done using TCP/HTTP/HTTPs
GWLB - Gateway Load Balancer
- Layer 3 (IP Network Protocol)
- For Fleets of 3rd party network virtual appliances
- e.g. Filter traffic through a scalable fleet of firewalls
- Uses the GENEVE protocol on port 6081
Sticky Sessions
- Possible to keep clients going to the same EC2 - “Sticky”
- Works with CLB, ALB & NLB
- Takes advantage of the user’s cookies
- “Application Based” - Generated by target EC2 (your application)
- “Application” - Generated by load balancer
- “Duration Based” - Generated by load balancer
Cross Zone Load Balancing
- With CZB - Each load balancer node distributes traffic evenly across all registered instances in all AZs. This is the default ALB behaviour
- Without CZB - Each load balancer node only distributes requests to the instances in its own AZ. This is the default NLB and GWLB behaviour
SSL/TLS on ELB
- SNI only works with ALB & NLB as well as Cloudfront
- This allows multiple certs on one load balancer, with the cert chosen based on the requested domain
Connection Draining
- Specifically on Exam
- Is called “Deregistration Delay” on ALB and NLB
- If an EC2 enters the draining state, ELB will send new traffic elsewhere and allow the existing traffic to wrap up
ASG - Auto Scaling Groups
- Scales EC2 instances to meet demand
- Can set Minimum required instances as well as the Maximum instances
- Automatically replaces instances that fail health checks
- Requires a Launch Template which contains all required EC2 configuration
- Multiple ways of handling scaling
- Cloudwatch Alarms
- “Simple/Step” scaling policy
- i.e. Scale out by x steps if alarm hits 25%/50%/etc
- “Target Tracking” scaling policy
- i.e. Keep all instances under 40% CPU
- “Scheduled” scaling policy
- i.e. At 5pm double instance count
- “Predictive” scaling policy
- Based on historic load, automatically scale
- Once a scaling action takes place there is a 5-minute cooldown (default) before the next action
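To make the “Target Tracking” idea concrete, here is a rough proportional model. It is an assumption for illustration, not AWS's actual algorithm: the function name and parameters are hypothetical, and the real service also applies cooldowns and warm-up periods.

```python
import math


def target_tracking_desired_capacity(
    current_capacity: int,
    current_metric: float,
    target_metric: float,
    min_size: int,
    max_size: int,
) -> int:
    """Estimate the instance count that brings the metric back to target.

    Simplified model: the metric is assumed to scale inversely with the
    number of instances, and the result is clamped to the ASG min/max.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_capacity * current_metric / target_metric)
    return max(min_size, min(max_size, raw))
```

With 4 instances at 80% CPU and a 40% target, the model asks for 8 instances, or fewer if the group maximum is lower.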
RDS - Relational Database Service
- Managed service for SQL Databases
- Postgres
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
- IBM DB2
- Aurora
- Automatically handles Backups, Patches & Multi AZ deployment
- Storage can auto-scale
- Compute is scaled by changing the instance class or adding Read Replicas
Read Replicas
- Specifically on Exam
- Can create up to 15 read only replicas across multiple AZ or Regions
- Eventually consistent - Replication is Async, data may be behind
- Good for heavy tasks like reporting or analytics, avoid slowing main instance
RDS Multi AZ
- Synchronous replication
- Single access point with automatic fail-over
- NOT for scaling, primarily for disaster recovery
- Read Replicas can be configured as Multi-AZ for disaster recovery.
- Converting from Single-AZ to Multi-AZ is a Zero Downtime operation (failover itself causes a brief interruption)
- This stat is specifically on Exam
RDS Custom
- Specifically for Oracle & Microsoft SQL Server
- Gives access to the Operating System and underlying EC2 instances
- More flexible lift & shift migrations
RDS Backups
- Automated every day during set backup window
- Transaction logs are backed up every 5 minutes, allowing restore to any point in time up to 5 minutes ago
- Automated backups have 1 to 35 days retention
- Manual backups can be triggered by the user at any time
- These are retained until deleted
- It’s cheaper to store snapshots than to keep RDS “Stopped”.
- This trick may specifically be on the Exam
Aurora
- Custom DB, Compatible with MySQL & Postgres drivers
- Claims 5x performance over MySQL
- Claims 3x performance over Postgres
- Underlying storage will automatically scale
- Handles up to 15 replicas with < 10ms latency
- Creates 6 copies across 3 AZ
- 4/6 copies are needed for writes
- 3/6 copies are needed for reads
- Data self-heals across instances in a P2P fashion
- One instance takes writes (Master instance)
- Offers “Reader endpoints”
- URLs that target specifically Read instances & Read Replicas
- This is specifically on Exam
- If high read traffic Aurora will auto-scale the read endpoint by adding more instances
- It’s possible to add multiple Reader Endpoints which target different instances with different capabilities, i.e. Endpoint for Aurora instances running on High RAM EC2s
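The 6-copy storage layout and its 4/6 and 3/6 figures can be read as quorum thresholds. The sketch below assumes that reading; the constant and function names are hypothetical.

```python
COPIES = 6         # 2 copies in each of 3 AZs
WRITE_QUORUM = 4   # copies needed for a write
READ_QUORUM = 3    # copies needed for a read


def can_write(healthy_copies: int) -> bool:
    """Writes stay available while at least 4 of the 6 copies are healthy."""
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies: int) -> bool:
    """Reads stay available while at least 3 of the 6 copies are healthy."""
    return healthy_copies >= READ_QUORUM
```

Under this model, losing a whole AZ (2 copies) still allows writes, and losing an AZ plus one more copy still allows reads.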
Aurora Serverless
- 100% Auto-scaling & hands off, Good for unpredictable workloads
- Pay per second of uptime
Global Aurora
- Can set a primary region and up to 5 secondary regions
- Data will replicate across all regions in < 1 second.
- This stat is specifically on Exam
- Up to 16 Read replicas can exist per secondary region
- Promoting a secondary region for disaster recovery takes < 1 minute
Aurora ML
- Directly integrates with services like Sagemaker or Comprehend to provide ML predictions around your data
- Allows for SQL requests that trigger ML services and returns standard data
- i.e. Get me all purchases that may be fraudulent etc.
Aurora Backups
- Automatically backed up daily
- Retention for 1 to 35 days
- If the instance is not encrypted, neither are its backups
Aurora Clones
- Ability to clone the database for Developer instances or testing
- Faster than a snapshot
- Copy-on-write
RDS Security
- At-Rest & In-Flight Encryption
- Replicas copy encryption of primary instance
- If the primary is not encrypted, neither are the replicas
- Possible to limit logins to IAM
RDS Proxy
- Allows applications to pool and share database connections
- Improves efficiency by reducing CPU & RAM usage on main instances and minimises open connections
- Considered Serverless, Auto-scales & is Highly Available
- Can reduce fail-over time by up to 66%
- Can enforce IAM logins & can limit access to only inside VPCs (Not Public!)
- Good for Lambda -> RDS connections since Lambda may leave stale connections open
Elasticache
- Managed service for implementing Redis & Memcache
- Redis is Multi-Az with automatic fail-over, Read Replicas & backups
- Memcache is Multi-Node (sharding) with no backups. However, it’s extremely fast
Elasticache Security
- IAM Authentication is available only to Redis
- Redis can also authenticate with a password/token for an extra layer of security
- Supports SSL for in-flight encryption
- Memcache uses SASL-Based authentication
Route 53
- Private Zones can be created to specify DNS rules inside a VPC
- Weighted Records can distribute traffic across different resources
- i.e. send 10% of traffic to a new A/B testing site
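- Latency Based Records direct traffic to the region with the lowest latency for the user
- About 15 global Health Checkers check each endpoint
- > 18% of the health checkers must report the endpoint as healthy (2XX/3XX response) for it to be considered “Healthy”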
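A sketch of how weighted records split traffic, such as the 10% A/B testing example above. This models the idea only and is not Route 53's implementation; `pick_weighted` and the record names are hypothetical, and the random roll is passed in so the behaviour is easy to check.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Mapping


def pick_weighted(records: Mapping[str, int], roll: float) -> str:
    """Pick a record name given weights and a roll in the range [0, 1).

    Each record receives a share of traffic proportional to its weight.
    """
    if not 0.0 <= roll < 1.0:
        raise ValueError("roll must be in [0, 1)")
    names = list(records)
    cumulative = list(accumulate(records[name] for name in names))
    point = roll * cumulative[-1]
    return names[bisect_right(cumulative, point)]
```

In real use the roll would come from `random.random()`, so roughly 10% of lookups would resolve to the record weighted 10 out of 100.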
Elastic Beanstalk
- Managed service that automatically deploys and manages EC2, ASG, ELB, RDS etc
- Cloudformation under the hood
S3 - Simple Storage Service
- Data is stored in “Buckets”
S3 Replication
- Versioning must be enabled to turn on replication
- Multiple types:
- CRR - Cross Region Replication
- SRR - Same Region Replication
- Async action, eventually consistent
- Can be cross-account
- By default, only new objects are replicated once enabled
- You must enable replicating existing objects
- S3 Analytics can suggest lifecycle rules for replicated objects
Requester Pays
- Specifically on Exam
- By default, the bucket owner pays for storage and bandwidth
- With Requester Pays enabled the requester pays for bandwidth
S3 Transfer Acceleration
- Speeds up cross region transfers by uploading to the nearest edge location, which then uses the AWS internal network to transfer to S3
S3 Byte Range Fetching
- Receive less data & use less bandwidth by requesting a set byte range from a file.
- You can also use this to split fetches into smaller chunks by running get requests in parallel.
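Splitting a fetch into chunks comes down to computing HTTP Range header values, which use inclusive byte offsets. A minimal sketch, with `byte_ranges` as a hypothetical helper; each returned string could be sent as the Range header of a separate GET request run in parallel.

```python
from typing import List


def byte_ranges(object_size: int, chunk_size: int) -> List[str]:
    """Return Range header values covering an object of object_size bytes.

    Ranges are inclusive on both ends, so a 10 byte object split into
    4 byte chunks gives 0-3, 4-7 and 8-9.
    """
    if object_size < 0 or chunk_size <= 0:
        raise ValueError("object_size must be >= 0 and chunk_size > 0")
    return [
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]
```

Requesting only the first range is also how you would read just the header of a large file.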
S3 Encryption Options
- S3 Encrypts objects based on the headers when object is created
- SSE-S3: Enabled by default, uses keys managed by AWS
- SSE-KMS: Use a key stored in KMS
- ON EXAM: KMS limits requests per second, this could cause issues with high throughput files.
- SSE-C: Key is provided by the client
- Client Side Encryption: Client encrypts file before uploading
S3 MFA Delete
- Can be enabled to require MFA when permanently deleting an object version or suspending versioning on the bucket.
- Only the root account can enable/disable this
S3 Glacier Vault Lock
- W.O.R.M (🪱) - Write once Read Many
- Allows the creation of lock policies that prevent deletion or modification, it’s possible to lock these policies to prevent changes
- S3 Object lock can be per object
- Per version of objects etc
- Offers modes:
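- Compliance Mode: Nobody, including root, can overwrite or delete the object, and the retention period cannot be shortened
- Governance Mode: Most users are blocked, but some IAM users with special permissions can delete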
- Legal Hold: Completely lock object regardless of modes or permissions
S3 Access Points
- Attach Policies to prefixes
- i.e. Allow the sales team to only access /sales/* in a bucket
- Possible to use these to limit access to only VPCs
S3 Object Lambda
- Attach a lambda function to an access point
- i.e. Run a file through a redaction function when it is requested.
- If requested via the public endpoint it will be redacted since it runs through the lambda
- If requested via another endpoint it will be the raw file
Cloudfront
Cloudfront Pricing
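- Pricing differs per edge location region; to reduce cost Amazon offers “Price classes” that limit which edge locations are used
- Price class All: All regions, best performance
- Price class 200: Most regions, excluding the most expensive ones
- Price class 100: Only the least expensive regions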
Global Accelerator
- Uses Anycast IPs to route to the closest Edge location. Traffic can then be routed anywhere on the AWS network internally, which is faster than hopping through the public internet.
Amazon FSx
- Allow 3rd party file systems to run on AWS
- Windows Server FS
- Supports SMB & NTFS
- Active Directory support
- Can be mounted to linux
- Supports Windows DFS
- Lustre
- For High Performance Computing - Linux Cluster
- Can directly read & write to an S3 bucket
- Offers two types:
- Persistent FS: Same AZ Replication
- Scratch FS: No Replication, 6x faster. Considered Temporary
- NetApp ONTAP
- For pulling over existing Netapp workloads
- Supports NFS, SMB or iSCSI
- Allows Point in Time recovery
- OpenZFS
- For pulling over existing ZFS workloads
- Allows Point in Time recovery
Hybrid Cloud
- Extend AWS Infrastructure to on premises & edge locations
AWS Storage Gateway
- Bridge storage from On-Prem to AWS
- Gateway allows on-prem users to use NFS or SMB and then translates these requests to S3 actions
- FSx Gateway: Possible to also target an FSx system instead of S3
- Volume Gateway: Allow iSCSI interfacing
- Tape Gateway: Allow iSCSI interfacing for direct Tape backup
- Storage Gateway Hardware Appliance is a physical server that can be run on prem as an interface for these services
AWS Transfer Family
- Managed Service for S3/EFS access using FTP/SFTP/FTPS
- Pay per hour of connection + GBs
- Can interface with Active Directory & LDAP
AWS DataSync
- Specifically on Exam
- Move large data to and from AWS to on-prem
- Can target S3, EFS or FSx
- Metadata & POSIX file permissions are preserved.
- Exam will likely ask about this
- If there is not enough network capacity, it’s possible to use DataSync with SnowCone.
- Exam will likely ask about this
- Can sync inter-service
- S3 -> EFS for example
- Available for S3, EFS, FSx
SQS - Simple Queue Service
- Queue service for AWS
- Unlimited Throughput with no max messages
- 4 day default message retention (maximum 14 days)
- Messages must be < 256KB
- Standard queues can deliver OUT OF ORDER and sometimes a message may be delivered twice (at-least-once delivery)
SQS Message visibility
- Specifically on Exam
- When a consumer receives a message it is “hidden” from other consumers for 30s by default (the visibility timeout) so it can be processed. This time can be modified & the consumer can request more time from SQS.
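The visibility behaviour can be modelled with a small in-memory queue. This is a toy sketch under stated assumptions, not the SQS API: `ToyQueue` and its methods are hypothetical, and time is passed in explicitly as `now` so the example is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare messages by identity, not by content
class Message:
    body: str
    visible_at: float = 0.0  # time from which the message can be received


class ToyQueue:
    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: List[Message] = []

    def send(self, body: str) -> None:
        self._messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        """Return the first visible message and hide it for the timeout."""
        for message in self._messages:
            if message.visible_at <= now:
                message.visible_at = now + self.visibility_timeout
                return message
        return None

    def extend(self, message: Message, now: float, extra: float) -> None:
        """Ask for more processing time before the message reappears."""
        message.visible_at = now + extra

    def delete(self, message: Message) -> None:
        """Consumers delete a message once it is processed successfully."""
        self._messages.remove(message)
```

If the consumer neither deletes the message nor extends the timeout, the message becomes visible again and can be processed a second time.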
Long Polling
- Specifically on Exam
- Consumers can Long Poll a queue: if it is empty the request waits, and returns as soon as a message appears
- Long polling can last 1 to 20 seconds
FIFO queue
- First in First out queue system, enforces order.
SQS and ASG
- SQS can trigger Auto Scaling Groups through a Cloudwatch alarm on ApproximateNumberOfMessages
- SQS should be used to buffer Database writes
- Combo with ASG to scale the consumer EC2s if the queue grows large
SNS - Simple Notification Service
- 1 to Many message sender (Pub/Sub)
- Provider can send to a “Topic” and then Subscribers can read the messages
- Can target Email, SMS, HTTP, SQS, Lambda or Kinesis
SNS Fan Out Pattern
- Client publishes once to an SNS topic & multiple subscribed SQS queues each receive a copy.
- i.e. Client sends an order to SNS; the Shipping SQS & Inventory SQS queues then handle it independently
- Reduces errors compared with the client writing to every queue itself and “missing” one
SNS FIFO
SNS Filters
- Subscribers can filter what messages they want to parse based on message content
- i.e. Only read failure messages etc
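A simplified view of how a subscription filter decides whether to deliver a message. Filter policies map attribute names to lists of accepted values; the sketch below assumes exact-match string values only, while the real service supports more operators. The `matches` function and the `status` attribute are hypothetical.

```python
from typing import List, Mapping


def matches(
    filter_policy: Mapping[str, List[str]],
    attributes: Mapping[str, str],
) -> bool:
    """Deliver only if every policy key has one of its accepted values."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )
```

A subscriber with the policy `{"status": ["failure"]}` would therefore only receive failure messages.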
Kinesis
- Specifically on Exam
- Service for easily processing real-time data from IOT, Logs, Metrics, etc.
Kinesis Data Streams
- Service for streaming big data
- Uses “Shards” which are provisioned ahead of time and are your throughput limit
- Producers push messages into shards
- 1MB/s or 1000 messages/s per shard
- Consumers pull messages from shards
- Different types:
- Shared - 2MB/s per shard across all consumers
- Enhanced - 2MB/s per shard per consumer
- Retention is between 1 - 365 Days
- Data can be replayed
- Data is immutable & cannot be modified
- Two capacity modes:
- Provisioned - Manually set shard count and pay per shard per hour
- On-Demand - Automatically scales based on usage
- Scales automatically based on observed peak throughput over the last 30 days
- Pay per stream per hour
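In provisioned mode the per-shard write limits above determine how many shards to provision. A back-of-the-envelope sketch, with `shards_needed` as a hypothetical helper that considers only the producer-side limits (1MB/s or 1000 messages/s per shard):

```python
import math


def shards_needed(write_mb_per_s: float, records_per_s: float) -> int:
    """Shards required to stay under both per-shard write limits."""
    by_bandwidth = math.ceil(write_mb_per_s / 1.0)   # 1MB/s per shard
    by_records = math.ceil(records_per_s / 1000.0)   # 1000 records/s per shard
    return max(1, by_bandwidth, by_records)
```

Whichever limit is hit first drives the shard count, so many small records can need more shards than their bandwidth suggests.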
Kinesis Data Stream security
- Control access via IAM
- In-Flight & At-Rest encryption
- Can use VPC Endpoints to keep data off the public net
Kinesis Data Firehose
- Can pull data from many services and push it into S3, Redshift (S3 -> Redshift copy), Elasticsearch etc
- Can modify data optionally with Lambda
- Can optionally write all data (or only errors) to a separate S3 bucket
- Only pay for data pushed through the service
- Near real time, because records are buffered before delivery
- The buffer is flushed by size (minimum 1MB) or by interval (up to 900s), whichever is hit first
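The size and time limits on the buffer can be pictured as "flush when either limit is reached". The class below is a toy model under that assumption, not Firehose itself: `ToyFirehoseBuffer` is a hypothetical name, time is passed in as `now`, and flushing is only checked when a record arrives, whereas the real service also flushes on a timer.

```python
from typing import List, Optional


class ToyFirehoseBuffer:
    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a record; return the batch if a limit was reached."""
        if self._opened_at is None:
            self._opened_at = now  # buffer window starts at first record
        self._records.append(record)
        self._size += len(record)
        size_hit = self._size >= self.max_bytes
        time_hit = now - self._opened_at >= self.max_seconds
        if size_hit or time_hit:
            batch = self._records
            self._records, self._size, self._opened_at = [], 0, None
            return batch
        return None
```

A larger buffer means fewer, bigger deliveries at the cost of higher latency.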
Data Streams vs Firehose
| Kinesis Data Streams | Kinesis Data Firehose |
| --- | --- |
| Streaming for ingest at scale | Load stream data into other services |
| Write custom producer/consumer code | Fully Managed |
| Real time (~200ms) | Near real time |
| User manages sharding | Auto scaling |
| 1 to 365 days data retention | No storage or data retention |
| Supports data replay | No replay capability |
Ordering Kinesis data
- By using the same “Partition Key” you can ensure that data is processed in order
- If you don’t care about order, use a random key to distribute data evenly across shards
- Exam question example:
- If you have 100 trucks each sending GPS coordinates to a Kinesis stream how would you ensure that the data is processed in order?
- Use the truck ID as the partition key; this ensures the data is processed in order per truck, since each truck’s data will always go to the same shard and data within a shard is processed FIFO.
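Why the same partition key always lands on the same shard: Kinesis hashes the key with MD5 into a 128-bit number and each shard owns a range of that hash space. The sketch assumes the shards split the space evenly, which is the case for a fresh stream but not necessarily after resharding; `shard_for_key` is a hypothetical helper.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE
```

Every record sent with the key for one truck resolves to one shard index, which is what gives per-truck ordering.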
Kinesis vs SQS - FIFO
- SQS Standard has no ordering
- SQS FIFO without a Group ID is first in first out across the whole queue
- If you use Group ID, messages are processed in order within each group and groups can be consumed in parallel
SQS vs SNS vs Kinesis
| SQS | SNS | Kinesis |
| --- | --- | --- |
| Consumers Pull data | Push data to many subscribers | Standard: 2MB per shard |
| Delete after consume | Up to 12.5m subscribers | Enhanced: 2MB per shard per consumer |
| Can have as many consumers as we want | Data lost if not delivered | Can replay data |
| No need to provision | Up to 100k topics | Meant for Real Time big data |
| Only Ordered if using FIFO | No need to provision | Shard level ordering |
| Supports individual message delay | Integrates with SQS for “Fan out” | Retention for 1 - 365 days |
| | FIFO capability with SQS FIFO | Provisioned or on demand available |
Kinesis Data Analytics
- Sits between Kinesis streams & firehose, can apply SQL statements to data in real-time
- Outputs back to a Kinesis stream or firehose
- Can reference S3 Data
- Pay for consumption rate
Kinesis Data analytics for Apache Flink
- Use Flink to process and analyse data
- Run Flink on a managed cluster
- Auto-scaling & backups provided
- Can ONLY read data from Kinesis streams & MSK
Amazon MQ
- SNS & SQS are “cloud native” and are AWS specific
- Traditional on-prem solutions use open standards like MQTT, AMQP, STOMP etc
- When migrating to the cloud, instead of porting everything to AWS services you can use Amazon MQ.
- Does not scale well since it requires dedicated servers
- Can run Multi-AZ for fail-over
- Requires EFS set up for shared storage
- Offers its own version of Queues (SQS) and Topics (SNS)
Containers
- Docker Containers
- Can run on ECS, EKS, Fargate, EC2
Amazon ECR
- Private Amazon docker repositories
- Amazon also offer a public “Gallery”
- Supports Vulnerability Scanning, versioning, tags, etc
- Access to images controlled by IAM
Amazon ECS
- When you launch containers on AWS it creates ECS tasks on an ECS cluster
- If using the EC2 Launch type, an ECS cluster is a group of provisioned EC2 instances
- Each EC2 instance must run the ECS Agent
- Fargate is a Serverless launch type for ECS
- No need to provision EC2 instances
- AWS will automatically handle running tasks based on desired CPU & RAM
- Scale by adding more tasks - not by adding more EC2 instances
- EXAM prefers Fargate
ECS IAM Roles
- Two separate IAM Role types
- ECS Instance Profiles (EC2 Launch types only)
- Used by the ECS Agent
- Allows the agent to register with ECS & make ECS API calls
- Allows sending logs to services like cloudwatch
- Allows pulling images from ECR
- ECS Task Roles
- Used by the Task itself
- Allows the task to make API calls to other AWS services
ECS Load Balancing
- ALB can be run in front of ECS Clusters assuming the ECS tasks expose HTTP/s endpoints
- NLB is only recommended for High Throughput, Low Latency applications or with AWS Private Link
- CLB is not recommended and does not work with Fargate
ECS Data Volumes (EFS)
- Can mount EFS instances to ECS Tasks
- Tasks running in any AZ share the same data in the EFS file system
- Fargate + EFS = Serverless
ECS Autoscaling
- Automatically increase/decrease the ECS Task count
- ECS Auto-scales using AWS Application auto scaling
- Average CPU, RAM, Request count per target, etc
- Target Tracking: Scale based on Cloudwatch Metrics
- Step Scaling: Scale based on Cloudwatch Alarms
- Scheduled Scaling: Scale based on time of day
- ECS Scaling != EC2 Scaling in the EC2 Launch type, it just launches more ECS Tasks
- Fargate Auto-scaling is handled by AWS, no need to set up ASG. It’s much easier and the preferred method on the exam
EC2 Launch Type Scaling
- Accommodate ECS service scaling by adding more EC2 instances
- Two Types
- Auto-scaling group scaling
- Scale ASG based on CPU utilisation
- Add EC2 over time
- ECS Cluster Capacity Provider
- Automatically Scales
- Paired with an ASG
- Add EC2 instances when missing CPU/RAM for required tasks
- Preferred method
Amazon EKS
- Launch managed Kubernetes clusters on AWS - Fully Managed Service.
- Alternate to ECS, Open Source
- Can deploy EC2 or Fargate
- Use cases are mainly if you already have Kubernetes experience or need to run Kubernetes
- Cloud Agnostic, works on all cloud providers
- EKS Nodes run EKS Pods
- These are similar to ECS Tasks but named differently
- Node Types:
- Managed Node Groups
- Manages nodes (EC2) for you
- Nodes in ASG managed by EKS
- Supports on-demand and spot instances
- Self-managed nodes
- Nodes created by you then registered with EKS
- Can use pre-made AMI or roll your own
- Supports on-demand and spot instances
EKS Data Volumes
- Need to specify storage class manifest on EKS
- Leverages a Container Storage Interface (CSI) driver
- This acronym is on the exam!
- Supports multiple filesystems:
- EBS
- EFS
- Fargate only supports EFS
- FSx Lustre
- FSx for NetApp ONTAP
AWS App Runner
- Managed service for deploying web apps on AWS
- Provide either a container or source code as well as basic system requirements
- AWS will automatically deploy ASG/ALB/ECS etc. and provide you with a public endpoint URL
- Possible to also deploy inside a VPC
Lambda
- Virtual Functions - serverless compute service
- 15 minutes max runtime
- Only runs when invoked
- Pay per request + compute time
- Easy to give access to more RAM
- Up to 10GB of RAM
- Increasing RAM also increases Compute and Networking power
- Large language support
- Node.js, Python, Java, C#, Golang, .NET, Ruby
- Custom Runtime API
- Rust (Community managed)
- etc…
- Lambda Container Images
- Must implement the Lambda Runtime API
- ECS/Fargate preferred for arbitrary containers
- Lambda for quick functions
- Unless specifically called for EXAM prefers ECS/Fargate
Lambda Limits
- Per Region
- Execution is maximum 15 minutes and 10GB of RAM
- 4KB ENV space
- 512MB - 10GB storage exists at /tmp
- Default 1,000 concurrent executions but can be increased via support
- Deployment limits
- 50MB Compressed
- 250MB Uncompressed
- Lambda Layers can be used to include larger dependencies
Lambda Snap Start
- For Lambdas running Java 11 or higher
- Increases startup speed by up to 10x at no extra cost
- Init phase in java takes a long time so AWS will pre-start the function, snapshot RAM and DISK and then on the next invoke it will restore the snapshot.
Edge Functions
- Attach to Cloudfront
- Deployed Globally
- Runs on the closest edge location to a user
Cloudfront Functions
- Lightweight JavaScript functions
- High Scale low latency CDN customisations
- Only on Viewer Request & Viewer Response
- No Access to body
Lambda@Edge
- Viewer & Origin Request / Viewer & Origin Response access
- Node.js & Python support
- Functions are authored in us-east-1, then replicated to the edge locations
- 1000s of requests/sec
Cloudfront Functions vs Lambda@Edge
| Cloudfront Functions | Lambda@Edge |
| --- | --- |
| Cache Key Normalisation | Several ms execution time available |
| Header Manipulation | Adjustable CPU/Memory |
| URL Rewriting | Allows 3rd party libraries |
| | Network access for external services |
| | Access to request body |
Lambda in a VPC
- By default, Lambda functions are deployed in the generic AWS VPC (“Not in a VPC”)
- Possible to attach a VPC to Lambda using an ENI.
- Lambda will create the ENI inside your selected subnet
Lambda with RDS Proxy
- Lambda may open many connections to RDS which may not close.
- Highly recommend connecting to RDS via RDS Proxy
- RDS Proxy is ONLY accessible from inside a VPC
Invoking Lambda from Aurora & RDS
- Possible to invoke functions from within the Database instance
- Allows processing data from within the Database
- Supported only by RDS for PostgreSQL & Aurora MySQL
- Ensure the DB instance has the IAM permissions to invoke the function and that outbound traffic from the DB to Lambda is allowed
- This is external from AWS events, AWS events do not include database data
DynamoDB
- Fully managed multi-AZ NoSQL Database
- Scales for millions of requests per second, infinite rows, 100TBs of Data
- Extremely low cost
- Two table types available - similar to S3 storage classes
- Standard
- Infrequent access
- Max 400KB object size
- Values can be null
DynamoDB capacity modes
- Provisioned (Default)
- Specific read/write capacity ahead of time
- Pay per provisioned unit
- Possible to auto-scale
- On demand mode
- Capacity autoscaling
- Good for unpredictable and sudden workloads
- Only pay for what you use
DynamoDB accelerator (DAX)
- In-memory cache for DynamoDB
- Solves read congestion
- Microsecond latency on cached data
- requires no code changes
DynamoDB Stream processing
- Possible to run Lambdas when a change is made to a DynamoDB table
- Ordered
- Two Options available
- DynamoDB Streams
- 24 Hour Data Retention
- Limited consumers
- Runs on Lambda
- Kinesis Data Streams
- Up to 1 year retention
- High # of consumers
- Connects to more AWS services
Global tables
- Two-way replication across regions
DynamoDB TTL
- A column (attribute) contains an epoch timestamp; once it has passed, the row is deleted automatically
- EXAM will likely ask about this, usually around storing user web sessions for 24 hours
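For the web session case, the TTL value is a Unix epoch time in seconds stored as a number. A minimal sketch of building such an item; the attribute names `session_id` and `expires_at` are hypothetical, and the TTL attribute name is whatever is configured on the table.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def session_item(
    session_id: str,
    created_at: datetime,
    lifetime: timedelta = timedelta(hours=24),
) -> Dict[str, Any]:
    """Build a session row whose TTL attribute expires after `lifetime`."""
    expires = created_at + lifetime
    return {
        "session_id": session_id,
        # TTL must be epoch seconds, not an ISO date string.
        "expires_at": int(expires.timestamp()),
    }
```

In practice `created_at` would be `datetime.now(timezone.utc)` at login time.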
DynamoDB Backups
- Continuous PITR
- Optionally enabled for the last 35 days
- PITR at any point within the window
- Recovery generates a new table
- On-demand
- Manual backups, retained until deleted
- Can be configured in the AWS Backup service
- Recovery generates a new table
API Gateway
- Proxy HTTP to lambda
- Offers caching, throttling, logging, monitoring, API Keys, etc
- Supports Websockets
- Handles versioning APIs
- Integrates with:
- Lambda
- Any AWS API
- Any HTTP endpoint (external/internal)
API Gateway endpoint types
- Edge-Optimized (Default)
- Requests routed via Cloudfront Edge locations
- Regional
- For clients within the same region
- Manually combine with Cloudfront for more Cache control
- Private
- Only accessible from inside a VPC via an ENI
API Gateway Security
- Can authenticate users with IAM Roles, Cognito or a Custom Authorizer
Step Functions
- Build serverless visual workflows to orchestrate AWS services - often Lambda
- Possible to implement human intervention
Cognito
- Gives users identities to interact with AWS apps
- Two types
- User Pools
- Sign in functionality
- Integrates with API Gateway & ALB
- Identity Pools (Federated Identities)
Choosing the right Database
EXAM will likely ask about all of these from a high level
- RDBMS (=SQL/OLTP) - Great for joins
- NOSQL
- DynamoDB
- ElastiCache
- Neptune (Graph)
- DocumentDB (Mongo-esque)
- Keyspaces (Apache Cassandra)
- Object Storage
- Data Warehousing (SQL Analytics/BI)
- Redshift (OLAP)
- Athena
- EMR
- Search
- OpenSearch (Elasticsearch)
- Graph
- Ledger
- Time Series
Amazon KeySpaces
- For Apache Cassandra
Athena
- Serverless SQL Query service
- Query S3 data with SQL (Based on Presto)
- $5 per TB scanned
- EXAM will ask about improving performance
- Use columnar data for cost savings
- Less scanning
- Recommend Parquet or ORC
- AWS Glue can handle conversion
- Compress data
- Partition datasets in S3
- Virtual columns, e.g. s3://data/year=2001/month=03/day=14
- Use larger files (> 128MB preferred)
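Partitioning for Athena relies on key=value folder names like the year=/month=/day= path in these notes, which become virtual columns that queries can filter on. A small sketch for generating such prefixes; `partition_prefix` is a hypothetical helper and the bucket name is only an example.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Build a key=value style S3 prefix for one day of data."""
    return (
        f"{base.rstrip('/')}"
        f"/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    )
```

Writing each day's files under its own prefix lets a query filtered by date scan only the matching folders, which reduces the data scanned and the cost.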
Athena Federated Query
- Run queries on relational, non-relational dataset both on and off AWS
- Uses data source connectors which run on Lambdas
Redshift
- Based on Postgres (But not used for OLTP)
- OLAP
- 10x better performance than other data warehouses
- SQL interfaces
- BI tools like Quicksight integrate with Redshift
- VS Athena: Faster Queries, joins, aggregations due to indexing
- Leader Nodes are used for query planning & Compute nodes are used for performing the queries
- These must be provisioned ahead of time
- Snapshots to S3 - Only stores the changes
- Loading data with large inserts is preferred
- Via S3 or Firehose
- Can use JDBC on EC2s
Redshift Spectrum
- Query S3 data without having to load data into Redshift
- Must have a cluster already configured and running
- Query is submitted to thousands of nodes for processing
Amazon Elastic Map Reduce (EMR)
- Create Hadoop clusters (Big Data)
- Hundreds of EC2 instances make up clusters
- Bundled with Apache Spark, HBase, Presto, & Flink
- Autoscaling & Integrated with Spot instances
- Use cases include data processing, ML, web indexing and any big data operation
- Node Types:
- Master node manages cluster & health - must be long-running
- Core node manages tasks & stores data - must be long-running
- Task nodes (optional) only run tasks and can be spot-nodes
- Purchasing options include
- On Demand
- Reserved instances
- Spot Nodes
- Clusters can be “Long Running” or “Transient”
- Transient clusters are only used for a single job and then terminated automatically
QuickSight
- Serverless BI application that creates dashboards
- Imports from most AWS services
- EXAM - In-memory computation is possible using the SPICE engine if data is imported directly into QuickSight
- Possible to set up column level security (CLS)
AWS Glue
- Managed ETL (Extract, Transform, Load) Service
- Serverless
- Transform data between formats and between services
- Exam: Convert CSV from an S3 Bucket into Parquet and store in another S3 bucket
Glue Crawler
- Crawls DB services and saves metadata that Glue can then use to perform jobs/actions
- Used by Glue Jobs, Athena, Redshift, EMR
- Glue job Bookmarks can prevent reprocessing data
- EXAM: Will likely ask about this
- Glue elastic views allow combining & replicating data across multiple sources using SQL
- No Custom code is required, auto monitors sources, serverless
- Glue Studio is a GUI for monitoring jobs, creating crawlers, etc
AWS Lake Formation
- Managed Service for making data lakes
- Discover, cleanse, transform, ingest data from S3, RDS, Relational & NoSQL DBs
- Can parse structured and unstructured data
- Offers “Blueprints” for ingesting data from AWS services automatically
- Offers fine-grained access control per column & row
- Exam -> If a company uses Athena and QuickSight to analyse data and users should only access their assigned data, Lake Formation can sit between the data sources & Athena to allow extremely fine-grained permissions
Amazon Managed Streaming for Apache Kafka (MSK)
- Kafka is an alternative to Kinesis
- Fully managed
- Offers manual cluster provisioning
- Serverless option available
Kinesis vs Kafka
Kinesis Data Streams | Amazon MSK (Kafka)
1MB size limit on messages | 1MB default, configurable up to 10MB
Data streams with shards | Kafka topics with partitions
Shard splitting and merging | Can only add partitions to a topic
TLS in-flight & KMS at rest | Plaintext or TLS in-flight & KMS at rest
Amazon Rekognition
- Find objects, people, text in images using ML
- Facial analysis & search
- Can create a database of “Familiar faces”
- Use cases include:
- Labeling
- Moderation
- Text detection
- User verification
- Pathing (Sports analysis etc.)
Rekognition moderation
- Examples in exam include
- Detecting inappropriate content
- Use on user generated content to create a safe environment
- Requires a minimum confidence threshold to trigger
- Can trigger manual reviews using A2I
- Helps comply with regulations
AWS Transcribe
- Auto convert speech to text using ML
- Can automatically redact PII
- Can automatically detect languages
AWS Polly
- Convert text to speech
- Offers pronunciation lexicons to allow for user defined words & pronunciations
- Can generate audio using SSML (Speech Synthesis Markup Language)
Translate
- Language translation, allowing automated localisation
- Works well with extremely large swaths of data
Lex + Connect
- Lex is what drives Alexa
- Automatic speech recognition & Natural Language Understanding
- Good for chatbots
- Connect is a virtual call centre that can integrate with Lex
- Receive calls, route to agents, connect to CRMs, etc
- No upfront payments required
AWS Comprehend
- Natural language processing
- Can use ML to find insights in text such as sentiment, entities, key phrases, language, etc
Comprehend medical
- Can detect medical data specifically
- Converts plain text medical notes into structured data
Sagemaker
- Fully managed service to build and train ML models
- Will provision infrastructure for you
- Labeling, training, tuning, deploying, monitoring all handled by Sagemaker
Forecast
- Fully managed service for forecasting data
- Pulls historical data from S3 & many third party services
Kendra
- Offers document search and allows indexing data and searching with natural language
Amazon Personalize
- ML driven personalisation from data passed via S3 or an API
- eg. recommended products, content, etc
- Can output to API, SES etc
Cloudwatch
- Logging across many services
Cloudwatch Metrics
- Cloudwatch provides metrics for every AWS service
- Metric is a variable that can be monitored
- They belong to namespaces (service etc.)
- Dimensions are attributes on a metric
- Instance ID, Environment, etc
- Possible to create custom metrics
- Can stream metric data to Firehose or other 3rd parties
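A rough sketch of how namespace, metric and dimensions fit together for a custom metric. The dict shape is modelled on the CloudWatch PutMetricData request; the namespace, metric name and dimension values are made up for illustration and nothing is sent to AWS.

```python
def metric_datum(name: str, value: float, unit: str, **dimensions: str) -> dict:
    """Describe one data point; dimensions are attributes that identify the series."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ],
        "Value": value,
        "Unit": unit,
    }


# Hypothetical custom metric living in its own namespace.
payload = {
    "Namespace": "MyApp/Orders",
    "MetricData": [
        metric_datum(
            "QueueDepth",
            42,
            "Count",
            InstanceId="i-0123456789abcdef0",
            Environment="dev",
        )
    ],
}

print(payload)
```

Each distinct combination of dimension values is tracked as its own metric, so the same QueueDepth name with Environment=prod would be a separate series.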
Cloudwatch Agent
- By default, logs from EC2 instances are not sent to CloudWatch
- Running the Cloudwatch Agent can send logs to Cloudwatch
- Can run on on-prem servers
Cloudwatch unified agent
- Collects additional Metrics
- Upgraded version of the basic agent
- Centralised config via SSM parameter store
Cloudwatch Alarms
- Alarms can be triggered by any Metric
- Forward alarm state to SNS, Lambda, etc
Cloudwatch container insights
- Summarise metrics and logs from a container
- ECS, EKS, EC2 Kubernetes, Fargate
Cloudwatch Contributor Insights
- Analyse logs & metrics to find anomalies
- Helps identify bad hosts / heavy network users
- E.g. VPC Flow Logs -> Cloudwatch Logs -> Contributor Insights -> Top 10 IPs by traffic
Event Bridge
- Offers a serverless event bus
- Scheduled Cron jobs
- can trigger Lambda, ECS, Fargate, etc
- Can respond/forward almost any AWS action/event
CloudTrail
- Provides governance, compliance and auditing for your AWS Account and is enabled by default
- Can view all events & API calls
- Multiple Event types
- Management events: Events that are performed on resources in your account
- By Default are analysed and logged
- Data events: S3 Bucket data events / Lambda execution events
- Not analysed and logged by default due to high volume
- Insight events: Detect unusual activity, will constantly analyse management events to create a baseline
- Will then alert on deviations from the baseline
- Events are stored for 90 days
- Can export to s3 to analyse later using Athena
- Possible to send events to Event Bridge to then trigger SNS or Lambda
AWS Config
- Helps by recording compliance & configuration changes over time
- Can answer basic questions like “Is there an S3 bucket open to the public?”, “is SSH enabled on this EC2?” etc.
- Can create custom rules to check for compliance like “Are any EC2s running that are not t2.micro”.
- Can trigger AWS events -> SNS etc
- No free tier, costs $0.003 per configuration item recorded per region, then $0.001 per rule evaluation per region
- Cannot explicitly change or fix any items that break rules but can trigger events to do so
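A quick worked example of the AWS Config pricing above, assuming the two per-unit prices from these notes still hold (they may not) and using made-up monthly volumes.

```python
CONFIG_ITEM_USD = 0.003  # per configuration item recorded (from these notes)
RULE_EVAL_USD = 0.001    # per rule evaluation (from these notes)


def monthly_config_cost(items_recorded: int, rule_evaluations: int) -> float:
    """Estimate the monthly AWS Config bill for one region."""
    total = items_recorded * CONFIG_ITEM_USD + rule_evaluations * RULE_EVAL_USD
    return round(total, 2)


# Hypothetical volumes: 10,000 items and 50,000 evaluations in a month.
print(monthly_config_cost(10_000, 50_000))
```

With those volumes the items cost about $30 and the evaluations about $50, so rule evaluations can easily dominate the bill.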
CloudWatch vs Cloudtrail vs Config
- Cloudwatch
- Perform monitoring and create dashboards
- Events & alerting
- Log aggregation & analysis
- Cloudtrail
- Logs all API calls made to account
- Can define trails for specific resources
- Global service
- Config
- Record configuration changes
- Evaluate resources against compliance rules
- Can show a timeline of changes and compliance
AWS organizations
- Global service
- Allows managing multiple AWS accounts
- “Main” account manages all rules
- Consolidates billing across accounts
- Share reserved instances & price benefits
- API available to automate account creation
- Allows central cloudtrail aggregation
IAM Conditions
- Allows for finer-grained permissions
- E.g. Block * except from a given IP range
- Can limit based on tags, time, etc
- Block user access to EC2 start/stop unless tagged with “Analysis” etc.
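A sketch of what the "block everything except from an IP range" condition could look like as a policy document, built as a Python dict. `NotIpAddress` and `aws:SourceIp` are real IAM condition elements; the helper name and the CIDR (a documentation range) are placeholders.

```python
import json


def deny_outside_ip_range(cidr: str) -> dict:
    """Build a policy that denies every action unless the caller is in `cidr`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                # The Deny applies when the source IP is NOT in the range.
                "Condition": {"NotIpAddress": {"aws:SourceIp": [cidr]}},
            }
        ],
    }


policy = deny_outside_ip_range("203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

A Deny like this only restricts; the user still needs separate Allow statements for whatever they should be able to do from inside the range.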
IAM Roles vs Resource based policies
- Cross account access:
- Attach a resource based policy to the resource (S3 Policy etc.)
- OR use a role as a proxy
- When assuming a role you give up your current permissions and take on the permissions of the role
- When using a resource based policy you keep your current permissions and the resource policy is added on top
- SNS, SQS, Cloudwatch, Lambda etc use resource based policies
- Other services use IAM roles
IAM Permission Boundaries
- Can set the maximum permissions a user can have
- E.g. if the boundary only allows S3:* but the policy allows EC2:*, the final permissions will block the EC2:* permissions.
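The boundary behaviour can be thought of as an intersection: an action is only allowed if both the identity policy and the boundary allow it. A simplified sketch, assuming only Allow patterns matter (explicit denies, SCPs and resource policies are ignored) and with helper names of my own:

```python
from fnmatch import fnmatchcase


def matches(action: str, patterns: list[str]) -> bool:
    """True if the action matches any pattern (action names are case-insensitive)."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effectively_allowed(
    action: str, identity_policy: list[str], boundary: list[str]
) -> bool:
    """Allowed only when BOTH the identity policy and the boundary allow it."""
    return matches(action, identity_policy) and matches(action, boundary)


boundary = ["s3:*"]
policy = ["s3:GetObject", "ec2:*"]  # hypothetical identity policy

print(effectively_allowed("ec2:RunInstances", policy, boundary))
print(effectively_allowed("s3:GetObject", policy, boundary))
```

The EC2 action is rejected even though the policy grants it, and the boundary on its own grants nothing: S3 actions the policy does not list stay denied.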
AWS IAM Identity Center
- Successor to AWS Single Sign-on
- One log in for multiple AWS Accounts
- Also works with Windows EC2, Salesforce etc. as long as they support SAML 2.0 sign-in
- Can use a built-in identity provider or a 3rd party
- Active Directory, Onelogin, Okta etc.
- Org users assume IAM roles in accounts
- Can do attribute based control with IAM
- I.e. only allow users with a specific attribute to assume a role
AWS Active Directory services
- Managed windows active directory
- Exam requires an extremely high level understanding of different offerings:
- Managed AD: New AD on AWS that can establish a trust with on-prem AD
- AD Connector: Proxy to on-prem AD
- Simple: AD Compatible managed directory on AWS - Cannot be joined to on-prem AD
AWS Control tower
- Easily govern & secure multiple accounts
- Uses AWS Organizations under the hood
- Automates account/env setup & policy management
- Provides a dashboard for compliance & security
- Allows configuring Guardrails to enforce policies
- Preventive: Uses SCPs (e.g. Prevent us-east-1 resource provisioning)
- Detective: Uses AWS Config (e.g. Find untagged resources)
KMS
- Managed service for creating & managing encryption keys
- Most AWS services integrate for encryption
- Has API to allow for encryption in custom applications
- Backing key is rotated yearly
- Customer master key (CMK) can be used to encrypt data
- Two KMS key types
- Symmetric: Single key for encryption & decryption
- AWS Services use these
- Never leaves KMS
- Asymmetric: Public/Private key pair
- Used for encrypting/decrypting as well as signing/verifying
- Public key can leave KMS but private key never does
- good for external non-aws encryption
Multi-region KMS
- Can replicate keys across regions
- Good for cross-region encryption, exact key id & contents replicated
- Not recommended unless explicitly required
- DynamoDB Global tables & Global Aurora require replicated keys for data encryption/decryption across regions
S3 Replication Encryption
- KMS Encrypted objects are not replicated by default
- You must specify a new key (or the same key) in the new region to re-encrypt the data
- Data is decrypted & then re-encrypted on replication
- You may get throttled by KMS if you have a large amount of data to re-encrypt, request a limit increase
Sharing encrypted AMIs
- If an AMI is encrypted in “Account 1” you must modify the AMI launch permission to specify that it can launch in “Account 2”
- You must also share the KMS key with “Account 2” to allow decryption of the AMI snapshot
- “Account 2” must also have the correct IAM permissions to use the KMS key
SSM Parameter Store
- Secure storage for configuration data & secrets
- Integrates with Cloudformation
- Can store plaintext, encrypted or secure strings
- “Parameter Policies” such as TTL incur an extra cost
Secrets Manager
- Different to parameter store since it can rotate keys
- Can connect a lambda function to generate new keys
- Secrets are encrypted using KMS
- Primarily used for RDS
- Multi-region secrets available
- Replication for disaster recovery
AWS Certificate Manager (ACM)
- Easily Provision TLS certificates
- Does not work with EC2 instances
- Works with Cloudfront, ALB, API Gateway, etc
Web Application Firewall (WAF)
- Protect against layer 7 (HTTP) exploits
- Deploys in-front of ALB, API Gateway, Cloudfront, Cognito etc
- Allows defining ACLs for limits, headers, body etc
- Regional except for Cloudfront which is at edge
AWS Shield
- DDOS Protection
- Offers an advanced service for $3,000/month which can mitigate more complex attacks and provides 24/7 live support
AWS Firewall Manager
- Org wide firewall rules at a regional level
- Can manage WAF, Shield, Security Groups, NACLs etc across multiple AWS accounts
Guard Duty
- ML Threat detection through log parsing
- One click enable with a 30-day free trial
- Can ping AWS event bridge to trigger events
Amazon Inspector
- Automatic security assessments
- EC2 using SSM Agent
- ECR Images inspected on push
- Lambda inspected on deploy
- Will look for common CVEs on any installed packages
Macie
- Using ML on S3 data to detect PII
- Can ping Event Bridge to trigger events
VPCs
- Maximum of 5 CIDR blocks per VPC
- Soft limit of 5 VPCs per region
Default VPC
- New EC2 Instances are placed in the default VPC if there is no subnet specified
- By default, they have Internet access and IPv4 DNS names
Subnets
- AWS reserves 5 IPs (the first 4 and the last 1) in each subnet. The following addresses cannot be used:
- *.*.*.0 - Network Address
- *.*.*.1 - VPC Router
- *.*.*.2 - Amazon Provided DNS
- *.*.*.3 - Future use
- *.*.*.255 - Network Broadcast Address (Not Supported)
- By default, EC2 instances in a new (non-default) subnet are not assigned a public IPv4 address
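A small sketch of the reserved-address rule using Python's standard `ipaddress` module. The function names are mine; the `.0` to `.3` and `.255` pattern above is the /24 case, and for other subnet sizes the "last" address is simply the end of the block.

```python
import ipaddress


def reserved_addresses(cidr: str) -> dict[str, str]:
    """Return the five addresses AWS reserves in a subnet."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net[0]),
        "vpc_router": str(net[1]),
        "dns": str(net[2]),
        "future_use": str(net[3]),
        "broadcast": str(net[-1]),
    }


def usable_count(cidr: str) -> int:
    """Number of addresses left for resources after the 5 reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - 5


print(reserved_addresses("10.0.0.0/24"))
print(usable_count("10.0.0.0/24"))
print(usable_count("10.0.0.0/28"))
```

This is the usual exam trap: a /28 has 16 addresses but only 11 are usable, so a requirement for more than 11 hosts needs at least a /27.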
Internet Gateway
- Allows EC2 instances to access the internet
- Only one may exist in a VPC
- Does not magically enable internet access. You must first attach it to a VPC and then update the route table.
Bastion Hosts
- Allows external users to connect to private EC2 Instances
NAT Instances
- Outdated but potentially still in exam
- Allows EC2 Instances in Private Subnet to access the internet
- NAT instance exists in a public subnet and has an Elastic IP
- Route tables need to be configured to route traffic from the private subnet to the NAT instance
- AMI is available in the marketplace
- Not recommended for high traffic - Not Highly available
NAT Gateway
- AWS Managed NAT, High Bandwidth, Highly Available
- Pay per hour & per GB
- NATGW is created in a specific AZ
- Cannot be used by EC2s in the same subnet
- Highly resilient within one AZ but must create multiple across zones for high availability
NACL & Security Groups
- NACL intercepts all traffic into a subnet
- NACL is stateless, so both inbound and outbound traffic are always checked against its rules
- Security groups are stateful, so they allow through traffic that is a response to an outbound request
- NACLs are evaluated before security groups
- Default NACL allows all traffic
- NACL rules are evaluated in order of rule number (lowest first) & the first match wins
- Recommend creating a new NACL and associating it with the subnet instead of modifying the default
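A toy model of ordered NACL evaluation, assuming rules match on a single port only (real rules also match protocol, CIDR and port ranges). The `Rule` type and function name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int  # lower numbers are evaluated first
    port: int
    allow: bool


def nacl_allows(rules: list[Rule], port: int) -> bool:
    """Walk the rules in rule-number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False  # nothing matched: the final catch-all rule denies


rules = [
    Rule(number=200, port=22, allow=True),
    Rule(number=100, port=22, allow=False),
    Rule(number=300, port=443, allow=True),
]

print(nacl_allows(rules, 22))
print(nacl_allows(rules, 443))
print(nacl_allows(rules, 80))
```

Port 22 is denied because rule 100 is hit before rule 200. A security group has no ordering and no deny rules, so there an allow for port 22 would always win.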
Security Groups vs NACLs
Security Groups | NACLs
Instance level | Subnet level
Stateful (Return Allowed) | Stateless (Always checked)
All rules evaluated | Weighted Rules
VPC Peering
- Private connection between two VPCs via the AWS network
- Must not have overlapping CIDR blocks
- Can be cross account & cross region
- Route tables must be updated
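The overlapping-CIDR restriction is easy to check up front. A minimal sketch with the standard `ipaddress` module; `can_peer` is a hypothetical helper that only checks the CIDR rule, not permissions or routing.

```python
import ipaddress


def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Two VPCs can only be peered if their CIDR blocks do not overlap."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return not net_a.overlaps(net_b)


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))
```

The second pair fails because 10.0.128.0/17 sits entirely inside 10.0.0.0/16, so a route to one VPC would be ambiguous with addresses in the other.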
VPC Endpoints (AWS PrivateLink)
- Allow instances in the VPC to access AWS services without needing to go over the internet
- Types:
- Gateway Endpoints
- Interface Endpoints
- Connect to services like SNS, SQS, etc
- Pay per-hour & per GB
VPC Flow Logs
- Capture IP traffic logs across VPC, Subnet, ENI
- Logs can be sent to S3, Cloudwatch, or Kinesis
- Possible to capture traffic from AWS managed services like ELB, RDS, Redshift etc
AWS Site to Site VPN
- Connect a datacenter to VPC
- Create a virtual private gateway (VPG) in your VPC
- Set up a customer gateway in your datacenter
- Configure the VPG with customer gateway IP
- OR use the public IP of the NAT device it is behind
- Route propagation must be enabled in your VPC route table for subnets to access the VPN
AWS VPN Cloudhub
- Allows access from multiple on premises connections
Direct Connect (DX)
- Provides a dedicated private connection from on-prem to your VPC
- Use-cases include:
- Increase bandwidth at a lower cost
- Consistent network performance
- Direct Connect Gateway allows connecting multiple VPCs across regions into a single DX connection
- Dedicated - 1, 10, 100 Gbps
- Physical ethernet port for customer
- Must be requested from AWS
- Hosted - 50Mbps to 1Gbps
- Virtual interface
- Can be requested from a partner
- Capacity can be increased on the fly
- Lead times for DX connections can often be long (1+ month)
- Resiliency is achieved with multiple connections
- High - 2 DX connections across 2 different locations
- Maximum - 4 DX connections across 2 different locations
Site to Site VPN as a backup
- In the case that direct connect fails you can set up a site to site VPN as a backup connection, this is usually cheaper than having a second direct connect connection
Transit Gateway
- For transitive routing between VPCs & on-prem (Hub & Spoke model)
- Cross region/account support & peer-able across regions
- Only service which supports multicast
VPC Traffic Mirroring
- Allows capture & inspection of VPC traffic
- Will not affect original traffic since it mirrors it to another ENI or NLB
IPv6 for VPC
- All IPv6 addresses are public & internet accessible
Egress Only Internet Gateway
- Similar to NAT but only for IPv6
- Allows IPv6 traffic to go out to the internet but not in
Disaster Recovery
- Traditional: On-premises -> On-premises fail-over
- Hybrid: On-premises -> AWS fail-over
- AWS: AWS -> AWS fail-over
- Key-Words
- RPO: Recovery Point Objective
- Data Between RPO & Disaster is lost
- RTO: Recovery Time Objective
- Time between Disaster & RTO is downtime
- Different strategies
- Backup & Restore (High RPO)
- Backing up snapshots to s3 or using a snowball, may take time to restore & may lose lots of data depending on backup periods
- Pilot Light
- Critical systems are always running & are replicated
- Database & core systems - Nothing extra
- Warm Standby
- Full system is running cloned but is at the smallest scale possible
- Can be scaled up quickly
- Multi-Site (Low RPO)
- Full production clone running at all times
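A worked example of the RPO/RTO definitions above with made-up timestamps: data written after the last recovery point is lost, and the time from the disaster until service is restored is downtime.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, disaster: datetime) -> timedelta:
    """Data written in this window is lost; compare it against the RPO."""
    return disaster - last_recovery_point


def downtime(disaster: datetime, recovered: datetime) -> timedelta:
    """Time the system was unavailable; compare it against the RTO."""
    return recovered - disaster


# Hypothetical incident: nightly backup at 02:00, failure at 09:30, restored at 13:30.
last_backup = datetime(2024, 1, 10, 2, 0)
disaster = datetime(2024, 1, 10, 9, 30)
recovered = datetime(2024, 1, 10, 13, 30)

print(data_loss_window(last_backup, disaster))
print(downtime(disaster, recovered))
```

Here 7.5 hours of data are lost and the system is down for 4 hours. Moving from Backup & Restore towards Multi-Site shrinks both numbers at increasing cost.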
Database Migration Service (DMS)
- Quickly migrate databases to AWS
- Source remains operational during migration
- Supports Homogeneous & Heterogeneous migrations
- Homogeneous: Oracle -> Oracle
- Heterogeneous: Oracle -> Aurora
- Replication service runs from an EC2 instance
AWS Backup
- Fully managed service for managing backups across almost all AWS services
Elastic Network Adapter (ENA)
- Extremely low latency EC2 Networking
- Up to 100 Gbps
- Used for high-performance computing
Elastic Fabric Adapter (EFA)
- Improved ENA but only works on Linux
- Great for inter-node tightly coupled workloads
AWS Parallel Cluster
- Open source cluster management on AWS
- Ability to auto-create VPCs, Subnets, etc
- EFA can be enabled on the cluster as a whole
Amazon Pinpoint
- 2-way SMS, Email, Push notifications for marketing