Recently we wrote an article about some of the best and simplest ways to reduce cloud spending; you can read it here. Our software engineering teams that work on cloud-based architectures strive to apply measures that cut spending without damaging performance. One case was particularly effective: a Godel data engineering team applied measures to a client’s AWS cloud setup that reduced costs by about 50%. For some services, such as AWS ECS clusters and Kinesis, savings reached 80% on a mostly permanent basis.
Simply put, AWS is not about paying for what you use; it’s about paying for what you forget to turn off. Disorganised or poorly managed setups can incur small costs which add up to big bills over time. For this client, cloud provisioning is essential to their core platform, but having so many environments running different services meant that costs were creeping up. The client decided to create a ‘cost optimisation group’: a team of specialists from both the client and Godel that could take measures to reduce costs.
The first task for this new team was to analyse usage. It was essential to map out what services existed, why they were used and why they required the capacities they had. This exercise took place at the beginning of March 2020, so a major factor was that resources were being used less due to the pandemic. Temporary cost reduction measures could therefore be taken to ensure that services not in use were not being paid for.
More permanent measures were also required. Since the client had been operating the platform for several years, some resources were over-provisioned, misconfigured or simply legacy. Uncovering the details of such complex systems can be challenging, especially since some of them had been discontinued. Nonetheless, once the analysis was conducted it became clear that major optimisation could be achieved.
The team worked on optimising several AWS services, including Kinesis, DynamoDB and various ECS clusters. Most of these changes didn’t require much coding, which meant results could be reached quickly. In most cases, the client had over-provisioned AWS Kinesis shards for batch data processing, meaning some streams were sitting unused. To improve efficiency in this service, Godel ensured that a single shared stream would be used, rather than provisioning a single-use stream each time a new data pipeline was created.
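To illustrate why consolidating streams can cut the shard count, here is a minimal sketch rather than the team's actual tooling. It assumes provisioned-mode streams with per-shard write limits of roughly 1 MB and 1,000 records per second; those limits, the pipeline names and the traffic figures are all assumptions to check against current AWS quotas and real metrics.

```python
import math

# Assumed per-shard write limits for provisioned-mode streams; verify them
# against the current AWS quotas before relying on these numbers.
SHARD_BYTES_PER_SEC = 1_000_000
SHARD_RECORDS_PER_SEC = 1_000


def shards_needed(bytes_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that covers both write limits (minimum of one)."""
    by_bytes = math.ceil(bytes_per_sec / SHARD_BYTES_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)


def compare_layouts(pipelines: dict[str, tuple[float, float]]) -> tuple[int, int]:
    """Return (shards with one stream per pipeline, shards with one shared stream)."""
    dedicated = sum(shards_needed(b, r) for b, r in pipelines.values())
    # Summing the peaks is conservative: it assumes every pipeline peaks at once.
    total_bytes = sum(b for b, _ in pipelines.values())
    total_records = sum(r for _, r in pipelines.values())
    shared = shards_needed(total_bytes, total_records)
    return dedicated, shared


# Hypothetical pipelines: name -> (peak bytes/sec, peak records/sec)
example = {
    "orders": (200_000, 150),
    "clicks": (50_000, 400),
    "audit": (10_000, 20),
}

dedicated, shared = compare_layouts(example)
print(f"dedicated streams: {dedicated} shards, shared stream: {shared} shard(s)")
```

Each dedicated stream pays for at least one shard even when it is nearly idle, which is why low-volume pipelines tend to benefit most from sharing a stream.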
DynamoDB presented its own challenges. This service stored a large number of tables; some were outdated, rarely used or completely dormant, yet capacity was still being provisioned for all of them. To optimise this service Godel took a three-pronged approach. Firstly, they removed the legacy tables that had fallen out of use. Then, tables that saw occasional use were switched to “on-demand” provisioning, a pricing plan that suited this group of tables well because it is billed by usage. Finally, the team set up auto-scaling capacity across the service; since the tables were seeing unpredictable usage throughout 2020, this was the most cost-effective solution.
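The three-pronged approach can be sketched as a simple triage over per-table usage figures. This is only an illustration: the `TableUsage` fields, the thresholds and the table names are hypothetical, and in practice the numbers would come from monitoring data and be agreed with the table owners before anything is deleted.

```python
from dataclasses import dataclass


@dataclass
class TableUsage:
    name: str
    days_since_last_access: int   # hypothetical metric taken from monitoring
    avg_requests_per_hour: float  # hypothetical average over the review window


def plan_for(table: TableUsage,
             dormant_after_days: int = 90,
             occasional_below_rph: float = 100.0) -> str:
    """Map a table to one of three actions; both thresholds are illustrative."""
    if table.days_since_last_access >= dormant_after_days:
        return "delete"        # legacy table that has fallen out of use
    if table.avg_requests_per_hour < occasional_below_rph:
        return "on-demand"     # billed per request instead of reserved capacity
    return "auto-scaling"      # provisioned capacity that follows the load


# Hypothetical inventory used only to show the triage.
tables = [
    TableUsage("legacy_sessions", days_since_last_access=400, avg_requests_per_hour=0.0),
    TableUsage("monthly_reports", days_since_last_access=12, avg_requests_per_hour=3.5),
    TableUsage("live_orders", days_since_last_access=0, avg_requests_per_hour=25_000.0),
]

for t in tables:
    print(f"{t.name}: {plan_for(t)}")
```

Keeping the decision in one small function makes the thresholds easy to review and adjust as usage patterns change.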
Another effective technique was scheduling with AWS Batch: since most of the client’s resources ran on ECS clusters, it was possible to schedule availability for working hours only rather than 24/7. This meant teams could be assured services were off by default during evenings, nights and weekends, with the option to turn things on if necessary.
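The off-by-default rule can be expressed as a small decision function that a scheduled job might call before scaling a service. This is a sketch under assumptions, not the client's configuration: the 08:00 to 18:00 window, the Monday to Friday week and the `override_on` flag are hypothetical, and `now` is assumed to be in the team's local time.

```python
from datetime import datetime, time

# Assumed working window; the article only says "working hours".
WORK_START = time(8, 0)
WORK_END = time(18, 0)


def desired_task_count(now: datetime, normal_count: int,
                       override_on: bool = False) -> int:
    """Return how many tasks a service should run at the given local time."""
    if override_on:
        # Someone explicitly turned the environment on outside working hours.
        return normal_count
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = WORK_START <= now.time() < WORK_END
    return normal_count if (is_weekday and in_hours) else 0


# Wednesday morning, Wednesday night and Saturday morning in March 2020.
for moment in (datetime(2020, 3, 4, 10, 30),
               datetime(2020, 3, 4, 22, 0),
               datetime(2020, 3, 7, 10, 30)):
    print(moment.isoformat(), desired_task_count(moment, normal_count=3))
```

The override flag mirrors the option to turn things on when needed, while the default path keeps services off during evenings, nights and weekends.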
An efficient cloud setup makes use of services in a cost-optimised fashion 24/7, but as environments grow, this is not as easy to manage as it may seem. Only by considering how each cloud service is used, and taking unpredictable factors into account, does planning for efficient cloud performance become possible.