AWS Serverless Data Stores
Kanhaiya Chhipa & Surbhit Shrivastava.
September 17, 2020

Introduction

When designing and building an application, one of the key considerations is which database to use. A poor decision here can cost you severely, either by requiring costly ongoing maintenance or by forcing a risky migration of sensitive data to another solution. Scalability is another important aspect to keep in mind for handling variable application workloads.

Serverless Databases are gaining traction as they offer managed solutions to some of these traditional database problems. As part of this post we’re going to cover the following aspects of serverless databases to help you understand what they are and how they can add value to your solution: 

  1. Why Serverless Databases
  2. AWS Serverless Platform and Advantages
  3. A detailed discussion of a few key categories (Relational and NoSQL) of AWS Serverless database offerings

“The phrase Serverless doesn’t mean servers are no longer involved. It simply means that developers no longer have to think that much about them. Computing resources get used as services without having to manage physical capabilities or limits.”

Serverless provides scalability. It also solves the waste of having servers up and running when no one needs them, by instantiating and running your business functions only when needed. 

Building a serverless app requires one to change some architecture paradigms, so it is important to understand how AWS implements serverless and what modules they provide for such an architecture.

Why Serverless Databases

Key use cases:

Unpredictable Workloads: This is probably the most obvious case: you don’t know exactly what database load to expect, for example on a news site. Because traffic behavior is unpredictable, it would be ideal if the database behind the site scaled up when traffic is high and scaled all the way down when traffic is low.

Cyclical Workloads: The classic example is a development or test database. QA activity happens mostly during the day, so the workload is very cyclical. Why not simply shut the database down at night? AWS then charges you nothing for it while it sits unused.

Intermittent Workloads: Here it’s not that the capacity is unknown; it’s that you don’t know when that capacity will be needed. A low-volume blog site is a good example: you want the system to automatically scale up and down to match your actual needs.

AWS Serverless Platform



Here are the most common serverless building blocks: Lambda for compute, API Gateway for microservice interfaces, S3 for object storage, DynamoDB as an operational database, Aurora Serverless as a relational database, SNS and SQS for messaging and queueing (decoupling), Step Functions for workflow management, Kinesis and Athena for streaming and analytics, and several others for development tooling.
Together these tools make it easy to build solutions to complex problems with low administrative overhead.

AWS Serverless Platform Advantages

Types of AWS Serverless Data Stores

Amazon DynamoDB: Consistent, single-digit-millisecond latency at any scale. A good fit for internet-scale applications that need to grow to petabytes of data and trillions of items, such as real-time bidding (AdTech), gaming, and IoT.

Introduction to Aurora Serverless – a SQL-based Serverless datastore

Amazon Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora (MySQL-compatible and PostgreSQL-compatible editions), where the database automatically starts up, shuts down, and scales capacity up or down based on your application’s needs.
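As a concrete sketch, the function below builds the `create_db_cluster` request for such a cluster with boto3. The cluster name, credentials, and capacity bounds are illustrative assumptions, not values from a real deployment.

```python
def serverless_cluster_params(cluster_id, min_acu=1, max_acu=8,
                              pause_after_s=300):
    """Arguments for rds.create_db_cluster() in serverless mode."""
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-mysql",
        "EngineMode": "serverless",        # on-demand, auto-scaling mode
        "MasterUsername": "admin",         # use Secrets Manager in practice
        "MasterUserPassword": "change-me",
        "ScalingConfiguration": {
            "MinCapacity": min_acu,        # Aurora Capacity Units (ACUs)
            "MaxCapacity": max_acu,
            "AutoPause": True,             # shut down when idle...
            "SecondsUntilAutoPause": pause_after_s,  # ...after this long
        },
    }

# To actually create the cluster (requires AWS credentials):
#   import boto3
#   boto3.client("rds").create_db_cluster(**serverless_cluster_params("demo"))
```

With `AutoPause` enabled, the cluster scales to zero compute when idle, which is what makes the cyclical and intermittent workloads above so cheap to run.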

Relational DB capacity management: It’s hard

Capacity management is hard: the right capacity is a moving target that changes over time. Maintaining optimal database capacity means continuously monitoring load and resizing as demand shifts, and doing so with zero downtime is harder still.

How does Aurora Serverless Work?

With AWS Aurora Serverless, the database comes with an on-demand auto-scaling configuration: it starts up, scales capacity to match your application’s demand, and shuts down when not in use.

What’s more, you run your database in the cloud without managing instances or clusters. The serverless database model is built on the separation of storage and processing.

You create an endpoint, optionally set the minimum and maximum capacity, and issue queries to the endpoint. The endpoint works as a proxy to a fleet of database resources that is scaled as needed, which lets your connections remain intact while scaling operations occur behind the scenes.

The separation of storage and processing brings another benefit as well. You can scale down to zero processing and pay only for storage. Whenever your application demands it, scaling happens in roughly five seconds, drawing on a pool of “warm” resources that are ready to serve your requests.
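One way to issue queries against that endpoint without holding a connection open is the Data API for Aurora Serverless. The sketch below builds an `execute_statement` request for the `rds-data` client; the ARNs, database name, and SQL are placeholders.

```python
def data_api_request(cluster_arn, secret_arn, sql, parameters=None):
    """Arguments for rds-data execute_statement(): the Data API sends SQL
    over HTTPS, so no client-side connection pool is needed."""
    req = {
        "resourceArn": cluster_arn,  # the serverless cluster
        "secretArn": secret_arn,     # DB credentials held in Secrets Manager
        "database": "app",           # placeholder database name
        "sql": sql,
    }
    if parameters:
        req["parameters"] = parameters
    return req

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("rds-data").execute_statement(**data_api_request(
#       cluster_arn, secret_arn,
#       "SELECT name FROM employees WHERE id = :id",
#       [{"name": "id", "value": {"longValue": 42}}]))
```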

Features of Aurora Serverless

Aurora Serverless comes with several compelling features, including automatic scaling of capacity, the ability to pause to zero when idle, and per-second billing for only the capacity you use.

Limitations of Aurora Serverless

Introduction to DynamoDB – a NoSQL based Serverless datastore

What is AWS DynamoDB?

Serverless

If you are wondering what a serverless NoSQL database means, here is a quick overview. We use the term serverless when we don’t need to manage any servers (software updates, OS patching, OS security, etc.); someone else manages them for us and provides an abstracted view. In the case of DynamoDB, AWS manages the underlying infrastructure and software, exposing the high-level constructs of the NoSQL database: tables, indexes (GSI, LSI), throughput, auto scaling, and security policies.

With DynamoDB On-Demand, capacity planning is a thing of the past. You don’t have to specify the capacity upfront, and you pay only for usage of your DynamoDB tables. 

With DynamoDB, there are no servers to provision, patch, or manage, and no software to install, maintain, or operate.

DynamoDB automatically scales tables to adjust for capacity and maintains performance with zero administration. 

Availability and fault tolerance are built-in, eliminating the need to architect your applications for these capabilities.
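As a sketch of the on-demand model described above, the table definition below uses `PAY_PER_REQUEST` billing so no read/write capacity is declared up front; the table and key names are illustrative assumptions.

```python
def on_demand_table_params(name):
    """Arguments for dynamodb.create_table() with on-demand billing."""
    return {
        "TableName": name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: pay per read/write
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},   # partition key
            {"AttributeName": "sk", "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
        ],
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**on_demand_table_params("Orders"))
```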

Fast and Flexible non-relational database service for any scale

Common AWS DynamoDB Usage Principles

AWS DynamoDB is well suited to storing JSON documents and to use as a key-value store. Multiple index types and query options make it convenient for different kinds of storage and query requirements. The following list contains basic principles to follow when designing DynamoDB tables and queries.

  1. Don’t try to normalize your tables.
  2. Embrace eventual consistency.
  3. Design your tables, attributes, and indexes thinking of the nature of queries.
  4. Avoid the DynamoDB Scan operation whenever possible.
  5. Think about item sizes and using indexes effectively when listing items to minimize throughput requirements.
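To illustrate principles 3–5, the request below uses Query with a key condition (touching a single partition) and a projection to keep returned items small, instead of a table-wide Scan. The table, key layout, and attribute names are hypothetical.

```python
def recent_orders_query(customer_id, since_date):
    """A Query request that reads one partition; a Scan would read the
    whole table and consume far more throughput."""
    return {
        "TableName": "Orders",  # hypothetical table
        "KeyConditionExpression": "pk = :c AND sk >= :d",
        "ExpressionAttributeValues": {
            ":c": {"S": f"CUSTOMER#{customer_id}"},
            ":d": {"S": f"ORDER#{since_date}"},
        },
        "ProjectionExpression": "sk, total",  # fetch only needed attributes
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").query(**recent_orders_query("42", "2020-09-01"))
```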

Limitations of DynamoDB

DynamoDB Streams and AWS Lambda

This is one of the most important enablers: DynamoDB Streams with Lambda triggers can feed data from DynamoDB to other services (perhaps as part of a pipeline that includes a data lake).

The stream is sharded to scale out as throughput grows, and Lambda scales automatically as required to process the data and push it to the next step.

Together, Lambda and DynamoDB Streams provide reliable “at least once” event delivery: any “write” activity can become a trigger, and Lambda can filter and take action based on the change.

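A minimal sketch of such a trigger: the handler below filters stream records by change type and collects the new item images for the next stage of the pipeline. Where they are pushed (Kinesis, S3, a data lake) is left as a comment, since that part depends on the pipeline.

```python
def stream_handler(event, context=None):
    """Lambda handler for a DynamoDB Streams trigger: forward only
    newly inserted items, ignore updates and deletes."""
    forwarded = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue                          # filter on the change type
        forwarded.append(record["dynamodb"]["NewImage"])
        # ...push the image to the next step (Kinesis, S3, a data lake)
    return {"forwarded": len(forwarded)}
```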

Patterns for Serverless Microservices

AWS DynamoDB is used for Serverless Microservices with different configuration patterns for various use cases.

Direct Access from RESTful API

The most common pattern for serverless microservices is to connect DynamoDB to API endpoint code (inside AWS Lambda) invoked through AWS API Gateway. It is also possible to connect DynamoDB directly to API Gateway if the microservice’s operations map directly onto DynamoDB queries.

Note: It is also possible to invoke AWS Lambda as a RESTful endpoint if the client has AWS IAM credentials or AWS STS temporary credentials.
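A minimal sketch of the Lambda side of this pattern: the handler is built around an injected table object (anything exposing DynamoDB’s `get_item`), so the routing logic can be exercised without AWS. The key schema and path parameter are assumptions.

```python
import json

def make_handler(table):
    """Build an API Gateway proxy handler over a DynamoDB table resource."""
    def handler(event, context=None):
        item_id = event["pathParameters"]["id"]   # e.g. route /items/{id}
        resp = table.get_item(Key={"pk": item_id})
        item = resp.get("Item")
        if item is None:
            return {"statusCode": 404,
                    "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": json.dumps(item)}
    return handler

# Usage (requires AWS credentials):
#   import boto3
#   handler = make_handler(boto3.resource("dynamodb").Table("Items"))
```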

Event-Driven Updates

DynamoDB can also be updated based on events other than direct access from a RESTful API. For example, DynamoDB can be used to store metadata for files uploaded to Amazon S3: with an S3 upload trigger, a Lambda function is invoked on each file upload and updates the DynamoDB table. A similar approach can be used to perform DynamoDB updates in response to Amazon SNS messages.
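A sketch of the transformation step, assuming the standard S3 event notification shape; the key layout of the metadata item is an illustrative choice.

```python
def s3_metadata_items(s3_event):
    """Turn an S3 upload notification into DynamoDB metadata items."""
    items = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        obj = record["s3"]["object"]
        items.append({
            "pk": f"FILE#{obj['key']}",   # one item per uploaded file
            "bucket": bucket,
            "size": obj.get("size", 0),   # bytes, as reported by S3
        })
    return items

# In the Lambda handler, each item would then be written with table.put_item.
```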

Data Synchronization Between Microservices

If the same attributes are stored in the DynamoDB tables of multiple microservices, you can use Amazon Simple Notification Service (SNS) topics. With SNS, one service can announce attribute changes to another without either service knowing about the other.

For example, let’s say Service #1’s Company Profile table and Service #2’s Company Statistics table share a company-name attribute. If the company name is modified in Service #1, that change needs to propagate to Service #2’s Company Statistics table. Given these requirements, Service #1 can publish the attribute change to an SNS topic using DynamoDB Streams and a Lambda function; when the change happens, a Lambda function in Service #2 subscribed to the topic updates the Company Statistics table.
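A sketch of the publishing side in Service #1: the function inspects a stream record and, only when `companyName` actually changed, builds an `sns.publish()` request. The topic ARN, key name, and attribute names are placeholders.

```python
import json

def name_change_publish_args(record, topic_arn):
    """Return sns.publish() arguments if companyName changed, else None."""
    if record.get("eventName") != "MODIFY":
        return None
    images = record["dynamodb"]
    old = images.get("OldImage", {}).get("companyName", {}).get("S")
    new = images.get("NewImage", {}).get("companyName", {}).get("S")
    if old == new:
        return None  # nothing to propagate
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps({
            "companyId": images["Keys"]["companyId"]["S"],
            "companyName": new,
        }),
    }
```

A Lambda in Service #2, subscribed to the same topic, would parse the message and update its own table; neither service needs to know the other exists.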

Client Use case: Employee Benefit Management System

We designed an Employee Benefit Management System for one of our clients using DynamoDB together with Lambda and API Gateway. The system is supported on both web and mobile platforms and is used by supported merchants to log transactions. DynamoDB handles the heavy daily write volume with ease, thanks to its high write throughput.

Introduction to Amazon RDS Proxy

 Amazon RDS Proxy is a highly available database proxy that manages thousands of concurrent connections to relational databases, allowing you to build highly scalable, secure serverless applications that connect to relational databases. 

AWS has worked hard to make relational databases work better in Serverless applications.

First, AWS released Amazon Aurora Serverless. This is a serverless version of the proprietary Amazon Aurora database that can automatically scale up and down according to your usage. This release helped with the pricing model issues around using a relational database.

Second, AWS announced improved VPC networking for AWS Lambda functions. This update greatly decreased the cold start latency for Lambda functions that use a VPC. This makes it more acceptable to use VPC Lambda functions in user-facing applications.

Finally, AWS announced Amazon RDS Proxy, which handles the connection limits. Rather than managing connections in your Lambda functions, you can offload that work to RDS Proxy. All pooling happens in the proxy, so you can handle a large number of connections in a manageable way.
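On the Lambda side, the usual pattern is to open one connection per container and let RDS Proxy do the pooling. The sketch below keeps the driver call injectable (for instance `pymysql.connect` pointed at the proxy endpoint), so the caching logic itself is plain Python; the driver and endpoint are assumptions.

```python
_connection = None  # survives across invocations in a warm Lambda container

def get_connection(connect):
    """Create the connection once per container and reuse it; RDS Proxy
    multiplexes these onto a pooled set of database connections."""
    global _connection
    if _connection is None:
        _connection = connect()  # e.g. pymysql.connect(host=PROXY_ENDPOINT, ...)
    return _connection
```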

Benefits

Conclusion

In this post, we looked at the different factors you should consider in choosing a serverless database. Then we looked at a few categories of databases you may consider in your application.

The future of Serverless Databases looks promising. The features of this modern technology have enabled us to draw focus on essentials like real-time access, scalability, security, and availability.

There is no easy answer for which database you should choose in a serverless application. DynamoDB checks a lot of the boxes, but its steep learning curve and lack of flexibility have burned more than a few people. We still think it’s the right choice in most situations, but you have to make a call based on your team and application needs.