AWS Serverless Data Stores
Kanhaiya Chhipa & Surbhit Shrivastava.
September 17, 2020

Introduction

When designing and building an application, one of the key considerations is which database to use. A poor decision here can cost you severely, either by requiring costly ongoing maintenance or by forcing a risky migration of sensitive data to another solution. Scalability is another important aspect to keep in mind for handling variable application workloads.

Serverless Databases are gaining traction as they offer managed solutions to some of these traditional database problems. As part of this post we’re going to cover the following aspects of serverless databases to help you understand what they are and how they can add value to your solution: 

  1. Why Serverless Databases
  2. AWS Serverless Platform and Advantages
  3. A detailed discussion of a few key categories (Relational and NoSQL) of AWS Serverless database offerings

“The phrase Serverless doesn’t mean servers are no longer involved. It simply means that developers no longer have to think that much about them. Computing resources get used as services without having to manage physical capabilities or limits.”

Serverless provides scalability. It also solves the waste of having servers up and running when no one needs them, by instantiating and running your business functions only when needed. 

Building a serverless app requires one to change some architecture paradigms, so it is important to understand how AWS implements serverless and what modules they provide for such an architecture.

Why Serverless Databases

Key use cases:

Unpredictable Workloads: This is probably the most obvious case: you don’t know exactly what database load to expect, for example on a news site. Because traffic behavior is unpredictable, it would be ideal if the database behind the site scaled up when traffic is high and scaled all the way down when traffic is low.

Cyclical Workloads: The classic example is a development or test database. QA activity happens mostly during the day, so the workload is very cyclical. Why not simply shut the database down at night? AWS then charges you nothing for it while it sits unused.

Intermittent Workloads: Here it’s not that the capacity is unknown; it’s that you don’t know when that capacity will be needed. A low-volume blog site is a good example: you want the system to automatically scale up and down to match your actual needs.

AWS Serverless Platform



Here are the most common serverless building blocks: Lambda for compute, API Gateway for microservice interfaces, S3 for object storage, DynamoDB as an operational database, Aurora Serverless as a relational database, SNS and SQS for messaging and queueing (decoupling), Step Functions for workflow management, Kinesis and Athena for streaming and analytics, and several others for development tooling.
Together these tools make it easy to build solutions to complex problems with low administrative overhead.

AWS Serverless Platform Advantages

Types of AWS Serverless Data Stores

Amazon DynamoDB: Consistent, single-digit-millisecond latency at any scale. A good fit for internet-scale applications that need to grow to petabytes of data and trillions of items, such as real-time bidding (AdTech), gaming, and IoT.

Introduction to Aurora Serverless – a SQL-based Serverless datastore

Amazon Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora (MySQL-compatible and PostgreSQL-compatible editions), where the database automatically starts up, shuts down, and scales capacity up or down based on your application’s needs.
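As a concrete sketch, the function below builds the `create_db_cluster` request for such a cluster with boto3. The cluster name, credentials, and capacity bounds are illustrative assumptions, not values from a real deployment.

```python
def serverless_cluster_params(cluster_id, min_acu=1, max_acu=8,
                              pause_after_s=300):
    """Arguments for rds.create_db_cluster() in serverless mode."""
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "aurora-mysql",
        "EngineMode": "serverless",        # on-demand, auto-scaling mode
        "MasterUsername": "admin",         # use Secrets Manager in practice
        "MasterUserPassword": "change-me",
        "ScalingConfiguration": {
            "MinCapacity": min_acu,        # Aurora Capacity Units (ACUs)
            "MaxCapacity": max_acu,
            "AutoPause": True,             # shut down when idle...
            "SecondsUntilAutoPause": pause_after_s,  # ...after this long
        },
    }

# To actually create the cluster (requires AWS credentials):
#   import boto3
#   boto3.client("rds").create_db_cluster(**serverless_cluster_params("demo"))
```

With `AutoPause` enabled, the cluster scales to zero compute when idle, which is what makes the cyclical and intermittent workloads above so cheap to run.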

Relational DB capacity management: It’s hard

Capacity management is hard: the right capacity is a moving target that changes over time. Maintaining optimal database capacity means continuously monitoring load and resizing as demand shifts, and doing so with zero downtime is harder still.

How does Aurora Serverless Work?

With AWS Aurora Serverless, the database comes with an on-demand auto-scaling configuration: it starts up, scales capacity to match your application’s demand, and shuts down when not in use.

What’s more, you run your database in the cloud without managing instances or clusters. The serverless database model is built on the separation of storage and processing.

You create an endpoint, optionally set the minimum and maximum capacity, and issue queries to the endpoint. The endpoint works as a proxy to a fleet of database resources that is scaled as needed, which lets your connections remain intact while scaling operations occur behind the scenes.

The separation of storage and processing brings another benefit as well. You can scale down to zero processing and pay only for storage. Whenever your application demands it, scaling happens in roughly five seconds, drawing on a pool of “warm” resources that are ready to serve your requests.
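One way to issue queries against that endpoint without holding a connection open is the Data API for Aurora Serverless. The sketch below builds an `execute_statement` request for the `rds-data` client; the ARNs, database name, and SQL are placeholders.

```python
def data_api_request(cluster_arn, secret_arn, sql, parameters=None):
    """Arguments for rds-data execute_statement(): the Data API sends SQL
    over HTTPS, so no client-side connection pool is needed."""
    req = {
        "resourceArn": cluster_arn,  # the serverless cluster
        "secretArn": secret_arn,     # DB credentials held in Secrets Manager
        "database": "app",           # placeholder database name
        "sql": sql,
    }
    if parameters:
        req["parameters"] = parameters
    return req

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("rds-data").execute_statement(**data_api_request(
#       cluster_arn, secret_arn,
#       "SELECT name FROM employees WHERE id = :id",
#       [{"name": "id", "value": {"longValue": 42}}]))
```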

Features of Aurora Serverless

Aurora Serverless comes with several compelling features, including automatic scaling of capacity, the ability to pause to zero when idle, and per-second billing for only the capacity you use.

Limitations of Aurora Serverless

Introduction to DynamoDB – a NoSQL based Serverless datastore

What is AWS DynamoDB?

Serverless

If you are wondering what a serverless NoSQL database means, here is a quick overview. We use the term serverless when we don’t need to manage any servers (software updates, OS patching, OS security, etc.); someone else manages them for us and provides an abstracted view. In the case of DynamoDB, AWS manages the underlying infrastructure and software, exposing the high-level constructs of the NoSQL database: tables, indexes (GSI, LSI), throughput, auto scaling, and security policies.

With DynamoDB On-Demand, capacity planning is a thing of the past. You don’t have to specify the capacity upfront, and you pay only for usage of your DynamoDB tables. 

With DynamoDB, there are no servers to provision, patch, or manage, and no software to install, maintain, or operate.

DynamoDB automatically scales tables to adjust for capacity and maintains performance with zero administration. 

Availability and fault tolerance are built-in, eliminating the need to architect your applications for these capabilities.
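As a sketch of the on-demand model described above, the table definition below uses `PAY_PER_REQUEST` billing so no read/write capacity is declared up front; the table and key names are illustrative assumptions.

```python
def on_demand_table_params(name):
    """Arguments for dynamodb.create_table() with on-demand billing."""
    return {
        "TableName": name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand: pay per read/write
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},   # partition key
            {"AttributeName": "sk", "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "sk", "AttributeType": "S"},
        ],
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**on_demand_table_params("Orders"))
```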

Fast and Flexible non-relational database service for any scale

Common AWS DynamoDB Usage Principles

AWS DynamoDB is well suited to storing JSON documents and to use as a key-value store. Multiple index types and query options make it convenient for different kinds of storage and query requirements. The following list contains basic principles to follow when designing DynamoDB tables and queries.

  1. Don’t try to normalize your tables.
  2. Embrace eventual consistency.
  3. Design your tables, attributes, and indexes thinking of the nature of queries.
  4. Avoid the DynamoDB Scan operation whenever possible.
  5. Think about item sizes and using indexes effectively when listing items to minimize throughput requirements.
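To illustrate principles 3–5, the request below uses Query with a key condition (touching a single partition) and a projection to keep returned items small, instead of a table-wide Scan. The table, key layout, and attribute names are hypothetical.

```python
def recent_orders_query(customer_id, since_date):
    """A Query request that reads one partition; a Scan would read the
    whole table and consume far more throughput."""
    return {
        "TableName": "Orders",  # hypothetical table
        "KeyConditionExpression": "pk = :c AND sk >= :d",
        "ExpressionAttributeValues": {
            ":c": {"S": f"CUSTOMER#{customer_id}"},
            ":d": {"S": f"ORDER#{since_date}"},
        },
        "ProjectionExpression": "sk, total",  # fetch only needed attributes
    }

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").query(**recent_orders_query("42", "2020-09-01"))
```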

Limitations of DynamoDB

DynamoDB Streams and AWS Lambda

This is one of the most important enablers: DynamoDB Streams with Lambda triggers can feed data from DynamoDB to other services (perhaps as part of a pipeline that includes a data lake).

The stream is sharded to scale out as throughput grows, and Lambda scales automatically as required to process the data and push it to the next step.

Together, Lambda and DynamoDB Streams provide reliable “at least once” event delivery: any “write” activity can become a trigger, and Lambda can filter and take action based on the change.

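A minimal sketch of such a trigger: the handler below filters stream records by change type and collects the new item images for the next stage of the pipeline. Where they are pushed (Kinesis, S3, a data lake) is left as a comment, since that part depends on the pipeline.

```python
def stream_handler(event, context=None):
    """Lambda handler for a DynamoDB Streams trigger: forward only
    newly inserted items, ignore updates and deletes."""
    forwarded = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue                          # filter on the change type
        forwarded.append(record["dynamodb"]["NewImage"])
        # ...push the image to the next step (Kinesis, S3, a data lake)
    return {"forwarded": len(forwarded)}
```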

Patterns for Serverless Microservices

AWS DynamoDB is used for Serverless Microservices with different configuration patterns for various use cases.

Direct Access from RESTful API

The most common pattern for serverless microservices is to connect DynamoDB to API endpoint code (inside AWS Lambda) invoked through AWS API Gateway. It is also possible to connect DynamoDB directly to API Gateway if the microservice’s operations map directly onto DynamoDB queries.

Note: It is also possible to invoke AWS Lambda as a RESTful endpoint if the client has AWS IAM credentials or AWS STS temporary credentials.
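A minimal sketch of the Lambda side of this pattern: the handler is built around an injected table object (anything exposing DynamoDB’s `get_item`), so the routing logic can be exercised without AWS. The key schema and path parameter are assumptions.

```python
import json

def make_handler(table):
    """Build an API Gateway proxy handler over a DynamoDB table resource."""
    def handler(event, context=None):
        item_id = event["pathParameters"]["id"]   # e.g. route /items/{id}
        resp = table.get_item(Key={"pk": item_id})
        item = resp.get("Item")
        if item is None:
            return {"statusCode": 404,
                    "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": json.dumps(item)}
    return handler

# Usage (requires AWS credentials):
#   import boto3
#   handler = make_handler(boto3.resource("dynamodb").Table("Items"))
```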

Event-Driven Updates

DynamoDB can also be updated based on events other than direct access from a RESTful API. For example, DynamoDB can be used to store metadata for files uploaded to Amazon S3: with an S3 upload trigger, a Lambda function is invoked on each file upload and updates the DynamoDB table. A similar approach can be used to perform DynamoDB updates in response to Amazon SNS messages.
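A sketch of the transformation step, assuming the standard S3 event notification shape; the key layout of the metadata item is an illustrative choice.

```python
def s3_metadata_items(s3_event):
    """Turn an S3 upload notification into DynamoDB metadata items."""
    items = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        obj = record["s3"]["object"]
        items.append({
            "pk": f"FILE#{obj['key']}",   # one item per uploaded file
            "bucket": bucket,
            "size": obj.get("size", 0),   # bytes, as reported by S3
        })
    return items

# In the Lambda handler, each item would then be written with table.put_item.
```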

Data Synchronization Between Microservices

If the same attributes are stored in the DynamoDB tables of multiple microservices, you can use Amazon Simple Notification Service (SNS) topics. With SNS, one service can announce attribute changes to another without either service knowing about the other.

For example, let’s say Service #1’s Company Profile table and Service #2’s Company Statistics table share a company-name attribute. If the company name is modified in Service #1, that change needs to propagate to Service #2’s Company Statistics table. Given these requirements, Service #1 can publish the attribute change to an SNS topic using DynamoDB Streams and a Lambda function; when the change happens, a Lambda function in Service #2 subscribed to the topic updates the Company Statistics table.
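A sketch of the publishing side in Service #1: the function inspects a stream record and, only when `companyName` actually changed, builds an `sns.publish()` request. The topic ARN, key name, and attribute names are placeholders.

```python
import json

def name_change_publish_args(record, topic_arn):
    """Return sns.publish() arguments if companyName changed, else None."""
    if record.get("eventName") != "MODIFY":
        return None
    images = record["dynamodb"]
    old = images.get("OldImage", {}).get("companyName", {}).get("S")
    new = images.get("NewImage", {}).get("companyName", {}).get("S")
    if old == new:
        return None  # nothing to propagate
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps({
            "companyId": images["Keys"]["companyId"]["S"],
            "companyName": new,
        }),
    }
```

A Lambda in Service #2, subscribed to the same topic, would parse the message and update its own table; neither service needs to know the other exists.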

Client Use case: Employee Benefit Management System

We designed an Employee Benefit Management System for one of our clients using DynamoDB together with Lambda and API Gateway. The system is supported on both web and mobile platforms and is used by supported merchants to log transactions. DynamoDB handles the heavy daily write volume with ease, thanks to its high write throughput.

Introduction to Amazon RDS Proxy

 Amazon RDS Proxy is a highly available database proxy that manages thousands of concurrent connections to relational databases, allowing you to build highly scalable, secure serverless applications that connect to relational databases. 

AWS has worked hard to make relational databases work better in Serverless applications.

First, AWS released Amazon Aurora Serverless. This is a serverless version of the proprietary Amazon Aurora database that can automatically scale up and down according to your usage. This release helped with the pricing model issues around using a relational database.

Second, AWS announced improved VPC networking for AWS Lambda functions. This update greatly decreased the cold start latency for Lambda functions that use a VPC. This makes it more acceptable to use VPC Lambda functions in user-facing applications.

Finally, AWS announced Amazon RDS Proxy, which handles the connection limits. Rather than managing connections in your Lambda functions, you can offload that work to RDS Proxy. All pooling happens in the proxy, so you can handle a large number of connections in a manageable way.
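On the Lambda side, the usual pattern is to open one connection per container and let RDS Proxy do the pooling. The sketch below keeps the driver call injectable (for instance `pymysql.connect` pointed at the proxy endpoint), so the caching logic itself is plain Python; the driver and endpoint are assumptions.

```python
_connection = None  # survives across invocations in a warm Lambda container

def get_connection(connect):
    """Create the connection once per container and reuse it; RDS Proxy
    multiplexes these onto a pooled set of database connections."""
    global _connection
    if _connection is None:
        _connection = connect()  # e.g. pymysql.connect(host=PROXY_ENDPOINT, ...)
    return _connection
```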

Benefits

Conclusion

In this post, we looked at the different factors you should consider in choosing a serverless database. Then we looked at a few categories of databases you may consider in your application.

The future of Serverless Databases looks promising. The features of this modern technology have enabled us to draw focus on essentials like real-time access, scalability, security, and availability.

There is no easy answer for which database you should choose in a serverless application. DynamoDB checks a lot of the boxes, but its steep learning curve and lack of flexibility have burned more than a few people. We still think it’s the right choice in most situations, but you have to make a call based on your team and application needs.