Not so long ago, I studied for the AWS Cloud Practitioner exam and realized that, even though there is plenty of preparation material out there, there are not many revision sheets. Once I am done with my initial deep dive into the study material, I find it really useful to create revision sheets to study from in the days immediately before the exam.
It’s something as simple as:
- Reading the study material and making personal notes of the main points that I find hard to remember by heart
- Going through the practice questions and complementing these notes with recurring themes from the practice questions
However, if you are about to take the exam and have not prepared a revision sheet, worry not, because that is what this blog is all about.
In this blog I cover the compute, storage, and databases parts of the Cloud Practitioner exam. I decided to differentiate databases from storage (even though databases ARE storage :D), because the study material is a lot for these services and it’s an easy way to separate them into two groups.
In general the blog aims to group services that are closely related or deal with similar functionalities. Some of these services are notionally connected and are easy to confuse during the exam, and these notes will hopefully help clear up some of the confusion.
In the compute section, we will focus on Amazon Elastic Compute Cloud (EC2) and its different types and features. This section also includes a small peek into Amazon Machine Image (AMI).
Amazon Elastic Compute Cloud (EC2)
Definition: EC2 is scalable computing capacity in the AWS Cloud. It provides virtual computing environments, known as instances.
EC2 auto-scaling: a feature of EC2 that automatically adjusts the number of EC2 instances (scaling in or out) based on demand. It needs to be activated manually, and it doesn’t automatically deploy AWS Shield.
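To make scaling in and out a bit more concrete, here is a minimal sketch in plain Python. It does not call AWS: the function name `desired_capacity` and its parameters are made up for illustration, and the proportional rule is only a simplification of how a target-based scaling policy behaves.

```python
import math

def desired_capacity(current_instances, avg_cpu_percent, target_cpu_percent,
                     min_size, max_size):
    """Hypothetical helper: size the group so average CPU moves toward the target."""
    # Scale the fleet in proportion to how far the metric is from its target.
    raw = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    # Never go below the group's minimum or above its maximum size.
    return max(min_size, min(max_size, raw))

print(desired_capacity(4, 90, 50, min_size=2, max_size=10))  # busy fleet: scale out
print(desired_capacity(4, 10, 50, min_size=2, max_size=10))  # idle fleet: scale in
```

The minimum and maximum sizes are the part worth remembering: scaling is automatic, but always within limits that you set.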
EC2 user data script: it is a way for a system administrator to specify a bootstrap script to be run on an EC2 instance during launch.
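As a rough illustration, the sketch below builds a small bootstrap script and base64-encodes it, which is the form the EC2 API expects for user data (the console and the SDKs normally do the encoding for you). The script contents and the helper name `encode_user_data` are my own example, not something the exam asks for.

```python
import base64

# Hypothetical bootstrap script; the package and commands are only an example.
USER_DATA = """#!/bin/bash
yum update -y
yum install -y httpd
systemctl enable --now httpd
"""

MAX_USER_DATA_BYTES = 16 * 1024  # raw user data is limited to 16 KB

def encode_user_data(script: str) -> str:
    """Return the script base64-encoded, after checking the size limit."""
    raw = script.encode("utf-8")
    if len(raw) > MAX_USER_DATA_BYTES:
        raise ValueError("user data exceeds the 16 KB limit")
    return base64.b64encode(raw).decode("ascii")

encoded = encode_user_data(USER_DATA)
print(encoded[:24])
```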
EC2 instance types:
- General purpose: balance of compute, memory, and networking resources
- Compute optimized: ideal for compute-bound applications that benefit from high-performance processors
- Memory optimized: designed to deliver fast performance for workloads that process large data sets in memory
- Storage optimized: designed for workloads that require high, sequential read and write access to very large data sets on local storage. Optimized to deliver low-latency, random I/O operations per second (IOPS) to applications
- Accelerated computing: used for floating point number calculations, graphics processing, or data pattern matching
EC2 instance purchasing options:
- On-demand: pay, by the second, for the instances that you launch
- Savings plans: commitment to a consistent amount of usage for a term of 1 or 3 years
- Reserved instances: commitment to a consistent instance configuration for a term of 1 or 3 years
- Spot instances: lets you request unused EC2 instances
- Dedicated hosts: workload runs on a specific, physical host (server) that is fully dedicated to running your instances
- Dedicated instances: pay, by the hour, for instances that run on single-tenant hardware
- Capacity reservations: reserve capacity for your EC2 instances in a specific AZ for any duration
Reserved instances: the benefits are the reserved capacity and the reduced cost. The most cost-effective way to purchase an EC2 reserved instance is the all upfront payment option with a standard 3-year term.
Standard reserved instances: they provide the most significant discount. They can be modified, but can’t be exchanged.
Convertible reserved instances: they provide a lower discount than standard reserved instances. However, they also let you exchange one or more convertible reserved instances for another convertible reserved instance with a different configuration.
Spot instances: suitable for workloads that can be interrupted. Available at up to 90% discount compared to on-demand.
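To see what "up to 90%" means in money, here is a tiny worked example. The hourly rate is a made-up number, and 730 is just the usual approximation for hours in a month, so treat the output as arithmetic rather than real pricing.

```python
def monthly_cost(hourly_rate, hours=730, discount=0.0):
    """Cost of one instance for a month at a given discount (0.0 to 1.0)."""
    return round(hourly_rate * hours * (1 - discount), 2)

ON_DEMAND_RATE = 0.10  # hypothetical price in USD per hour

print(monthly_cost(ON_DEMAND_RATE))                 # on-demand
print(monthly_cost(ON_DEMAND_RATE, discount=0.90))  # spot at the maximum 90% discount
```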
Dedicated hosts: suitable for existing server-bound software licenses. Allows an organization to bring their own licensing on host hardware that is physically isolated from other AWS accounts.
EC2 information relevant to the exam:
- EC2 is a service that can be deployed per Availability Zone.
- You cannot apply multiple roles to a single EC2 instance.
- A self-managed relational database can be installed on EC2.
- AWS Snowball Edge natively supports EC2.
- Linux-based EC2 instances have a minimum charge of one minute (60 seconds).
- In the EC2 section of the AWS console, you can also configure the Elastic Block Store volumes (EBS) and the Elastic Load Balancers (ELB).
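The per-second billing of on-demand instances and the one-minute minimum from the list above combine as in the sketch below. The function names and the hourly rate are my own, chosen only for illustration.

```python
def billable_seconds(runtime_seconds):
    """Per-second billing with a 60-second minimum per run."""
    return max(60, runtime_seconds)

def charge(runtime_seconds, hourly_rate):
    """Charge in USD for a single run at a (hypothetical) hourly rate."""
    return billable_seconds(runtime_seconds) * hourly_rate / 3600

print(billable_seconds(20))  # a 20-second run is billed as a full minute
print(billable_seconds(95))  # a 95-second run is billed for exactly 95 seconds
```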
Amazon Machine Image (AMI)
Definition: a supported and maintained image provided by AWS that can be used to launch a preconfigured EC2 instance.
AMI information relevant to the exam: the user must use an AMI from the same region as the EC2 instance. The AMI has no bearing on the performance of the EC2 instance.
In the storage section, we will explore different storage options.
Initially, we will focus on Amazon Simple Storage Service (S3), its different storage classes and functionalities. Then, we will focus on file storage options, like EFS and FSx. Further, we will look into the AWS Storage Gateway and the different gateway types it supports. Finally, we will zoom in on the EC2 instance store and Elastic Block Store (EBS) and its features.
Amazon Simple Storage Service (S3)
Definition: object storage service offering scalability, data availability, security, and performance. It stores objects in buckets.
S3 information relevant to the exam:
- Can store virtually unlimited data
- Can be accessed via a REST API
- Stores objects comprised of key-value pairs
- Cannot be attached to an EC2 instance
- Data ingress is free for all classes
- Data transfer between S3 and EC2 instances within the same region is free
Amazon S3 storage classes:
- S3 Standard: high durability, availability, and performance object storage for frequently accessed data. There is a per GB/month storage fee + data egress fee.
- S3 Standard IA (Infrequent Access): suitable for data that is accessed less frequently, but requires rapid access when needed. There is a minimum capacity charge per object + a retrieval fee.
- S3 One-Zone IA (Infrequent Access): it stores data in a single Availability Zone (the other classes store it in at least three). It is the class with the lowest availability. It costs 20% less than the Standard IA class. There is a minimum capacity charge per object + a retrieval fee.
- S3 Intelligent Tiering: it automatically moves data to the most cost-effective access tier based on access frequency.
- S3 Glacier Instant Retrieval: it delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds. There is a retrieval fee.
- S3 Glacier Flexible Retrieval: up to 10% lower cost than S3 Glacier Instant Retrieval. Suitable for archive data that is accessed 1-2 times per year and is retrieved asynchronously. There is a retrieval fee.
- S3 Glacier Deep Archive: lowest-cost storage class for data that may be accessed 1-2 times per year.
Glacier data access options (S3 Glacier Flexible Retrieval): expedited (1–5 mins, unless the archive is bigger than 250 MB), standard (3–5 hours), or bulk (5–12 hours).
Object lifecycle management: feature of S3 that enables you to set rules to automatically transfer objects between different storage classes at defined time intervals.
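A lifecycle rule is easier to remember once you have seen one. The dictionary below follows the general shape of an S3 lifecycle configuration, but the rule ID, prefix, and day counts are made-up example values, and `storage_class_for` is a toy evaluator I wrote to show the effect, not an AWS function.

```python
# Rule in the general shape of an S3 lifecycle configuration (example values).
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

def storage_class_for(key, age_days, config=LIFECYCLE):
    """Toy evaluator: which class would an object be in after age_days?"""
    current = "STANDARD"
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule["Filter"]["Prefix"]):
            continue
        # Apply transitions in order of age; the last one reached wins.
        for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
            if age_days >= step["Days"]:
                current = step["StorageClass"]
    return current

for age in (10, 45, 120):
    print(age, storage_class_for("logs/app.log", age))
```

Objects outside the `logs/` prefix are untouched by this rule and simply stay in the class they were uploaded to.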
Amazon S3 replication:
- Cross account replication: the contents of one bucket can be replicated from one account to another bucket in another account.
- Cross region replication: used to copy objects across buckets in different regions.
- Source and destination buckets must have versioning enabled.
- Destination bucket owner must have the destination region enabled for their account.
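The two prerequisites at the end of the list are the part that tends to show up in questions, so here is a small pre-flight check that encodes them. The bucket dictionaries and their field names are hypothetical; nothing here talks to S3.

```python
def replication_problems(source, destination):
    """Toy check based on the two replication requirements listed above."""
    problems = []
    for label, bucket in (("source", source), ("destination", destination)):
        if not bucket.get("versioning_enabled", False):
            problems.append(f"{label} bucket needs versioning enabled")
    if destination.get("region") not in destination.get("owner_enabled_regions", []):
        problems.append("destination region is not enabled for the owner's account")
    return problems

src = {"region": "eu-west-1", "versioning_enabled": True}
dst = {"region": "ap-east-1", "versioning_enabled": False,
       "owner_enabled_regions": ["eu-west-1", "us-east-1"]}

print(replication_problems(src, dst))
```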
Amazon Elastic File System (EFS)
Definition: a serverless, fully elastic file storage service.
EFS information relevant to the exam:
- It stores data and can be mounted on EC2 instances (on many at the same time, across different Availability Zones).
- It uses the Network File System (NFS) protocol.
- EC2 instances can access files on an EFS system across many Availability Zones, regions, and Virtual Private Clouds (VPCs).
- EFS Infrequent Access storage class: you pay a fee every time you read data from it or write data to it.
Amazon FSx for Windows File Server: it provides fully managed Microsoft Windows file servers, backed by a fully native Windows file system. It uses the Server Message Block (SMB) protocol.
AWS Storage Gateway
Definition: a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
Gateway types supported by AWS Storage Gateway:
- Amazon S3 File Gateway: virtual on-premises file server, which enables you to store and retrieve files as objects in Amazon S3.
- Amazon FSx File Gateway: fast, low-latency on-premises access to fully managed, highly reliable, and scalable file shares in the cloud using the SMB protocol.
- Tape Gateway: it is a virtual tape library (VTL) of virtual tape drives and a virtual media changer. It can be used to back up data with popular backup software.
- Volume Gateway: represents the family of gateways that support block-based volumes, previously referred to as gateway-cached and gateway-stored.
EC2 instance store
Definition: it provides temporary block-level storage for the EC2 instance.
It is located on disks that are physically attached to the host computer. It suits fault-tolerant architectures that need low latency and fast I/O performance and whose data does not need to persist for long.
Elastic Block Store (EBS)
Definition: it provides block level storage volumes for use with EC2 instances.
EBS information relevant to the exam:
- Raw block-level storage that can be attached to an EC2 instance and is used by Amazon Relational Database Service (RDS).
- It is a virtual hard disk on the cloud.
- Cannot be accessed simultaneously by multiple EC2 instances.
- EBS snapshots are stored on S3 (easiest way to store a backup of an EBS volume). They are stored incrementally (user is billed only for the changed blocks stored).
- Both root and non-root EBS volumes can be encrypted at launch time using AWS Key Management Service (KMS). Encryption is not automatic; the user has to enable it.
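The "stored incrementally" point about snapshots is worth a quick illustration. The sketch below models a volume as a dictionary of blocks (my own simplification, with invented block contents) and shows that a second snapshot only needs the blocks that changed.

```python
def incremental_snapshot(previous_blocks, current_blocks):
    """Return only the blocks that are new or changed since the last snapshot."""
    return {
        block_id: data
        for block_id, data in current_blocks.items()
        if previous_blocks.get(block_id) != data
    }

volume_day1 = {0: "boot", 1: "app-v1", 2: "logs-a"}
volume_day2 = {0: "boot", 1: "app-v2", 2: "logs-a", 3: "logs-b"}

first = incremental_snapshot({}, volume_day1)            # first snapshot: every block
second = incremental_snapshot(volume_day1, volume_day2)  # later snapshot: changes only
print(len(first), len(second))
```

Only the blocks in `second` would add to the bill, which is why regular snapshots are cheaper than they sound.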
In the databases section, we will cover the databases that are available on AWS, as well as some data processing services that show up along the databases on the Cloud Practitioner exam questions.
We cover relational database services (RDS and Aurora), NoSQL databases (DynamoDB and DocumentDB), data warehouses (Redshift), and in-memory caches (ElastiCache). We also cover a couple of data processing services (Elastic MapReduce and Athena).
Relational Database Service (RDS)
Definition: a collection of managed services that makes it simple to set up, operate, and scale relational databases in the cloud.
RDS information relevant to the exam:
- The user cannot access the database’s OS.
- It supports automated backups and software patching.
- Not global, but per AZ.
- It has a multi-AZ feature (AWS reliability pillar).
- Supports read replicas, which improve the database scalability.
RDS Automated backups:
- Full backup.
- Point-in-time recovery: restores to a specific point in time with a granularity of 5 mins.
- Incremental backup: taking a backup of items that have changed since the last backup.
Amazon Aurora
Definition: a relational database management system (RDBMS) built for the cloud with full MySQL and PostgreSQL compatibility.
Aurora information relevant to the exam:
- Fully managed relational database engine, part of RDS.
- Compatible with MySQL and PostgreSQL.
- Automated clustering, replication, and storage allocation.
- Automated backups.
Amazon DynamoDB
Definition: a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.
DynamoDB information relevant to the exam:
- NoSQL database.
- Can be scaled dynamically without incurring downtime (automatic scale).
- Global tables: auto-replication across regions.
- Stores items made up of key-value pairs.
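If "items made of key-value pairs" feels abstract, the toy table below may help. A plain Python dictionary stands in for the table, and `put_item`, `get_item`, and the `user_id` key are names I picked for the example; this is not the DynamoDB API.

```python
# A dict stands in for a table; each item is looked up by its key value.
table = {}

def put_item(item, key_name="user_id"):
    """Store an item (a dict of attributes) under its key attribute."""
    table[item[key_name]] = item

def get_item(key):
    """Fetch an item by key, or None if it does not exist."""
    return table.get(key)

put_item({"user_id": "u-1", "name": "Ada", "plan": "free"})
put_item({"user_id": "u-2", "name": "Lin", "tags": ["beta"]})  # different attributes are fine

print(get_item("u-2"))
```

Note that the two items do not share the same attributes: apart from the key, there is no fixed schema, which is the main contrast with the relational services above.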
Amazon DocumentDB
Definition: a fully managed native JSON document database that makes it easy and cost effective to operate critical document workloads at virtually any scale without managing infrastructure.
DocumentDB information relevant to the exam:
- Fully managed native JSON document database.
Amazon Redshift
Definition: uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning.
Redshift information relevant to the exam:
- Data warehouse for large volumes of aggregated data.
- Ideally suited to analytics using SQL queries.
Amazon ElastiCache
Definition: a fully managed, Redis- and Memcached-compatible caching service delivering real-time, cost-optimized performance for modern applications.
ElastiCache information relevant to the exam:
- In-memory data-store or cache.
- NOT a global service.
- Typically used to speed up read-heavy workloads, rather than as the primary store for writes.
- Sub-millisecond latency.
- Works both with Redis and Memcached engines.
- An ElastiCache cluster can be launched via the AWS Management Console and AWS CloudFormation.
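The usual way an in-memory cache earns its keep is the cache-aside pattern: look in the cache first and only go to the slower database on a miss. The sketch below is a local illustration under that assumption, with a dictionary standing in for a Redis or Memcached cluster and a fake `slow_db_lookup` function; all names are hypothetical.

```python
import time

cache = {}                # stands in for a Redis or Memcached cluster
stats = {"db_reads": 0}   # counts how often the slow store is hit

def slow_db_lookup(key):
    """Pretend database read; only counts calls and returns a fake value."""
    stats["db_reads"] += 1
    return f"value-for-{key}"

def get(key, ttl_seconds=60):
    """Cache-aside read: serve from memory if fresh, else load and store."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]
    value = slow_db_lookup(key)
    cache[key] = (value, time.monotonic() + ttl_seconds)
    return value

get("user:1")
get("user:1")  # second read is served from the cache
print(stats["db_reads"])
```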
Amazon Elastic MapReduce (EMR)
Definition: cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto.
EMR information relevant to the exam:
- Big data solution for petabyte-scale data processing.
- Based on Apache Hadoop.
Amazon Athena
Definition: an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
Athena information relevant to the exam:
- “Analyze petabyte-scale data where it lives”.
- Used for querying data in Amazon S3 using SQL.
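To get a feel for "SQL over files", here is a local stand-in. It is not Athena: the CSV text plays the role of a file sitting in an S3 bucket, Python's built-in `sqlite3` plays the role of the query engine, and the table and column names are invented. The query itself is the kind of standard SQL you would write.

```python
import csv
import io
import sqlite3

# Made-up CSV content, standing in for a file stored in an S3 bucket.
CSV_DATA = """region,amount
eu-west-1,10
us-east-1,25
eu-west-1,5
"""

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (:region, :amount)", rows)

QUERY = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

The difference in the real service is that there is no loading step and no server to manage: the data stays in S3 and you pay per query.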
The preparation for the Cloud Practitioner exam can be overwhelming, due to the huge amount of information introduced, especially for someone just starting with the AWS cloud. However, the actual exam itself is pretty doable if you have a good overview and understanding of the AWS landscape — no need to get into specific service deep-dives at this stage of your learning curve.
I do hope that this revision sheet helped to put some structure in the final preparation prior to the exam and I wish you good luck!