As the saying goes, the grass is always greener on the other side. The Cambridge dictionary defines this saying as “something that you say that means that other people always seem to be in a better situation than you, although they may not be.”
This saying really hit home for me many years ago when I bought a home in the mountains of northern Arizona. I would look at my land and think, “Wow, I wish my range grass was as green as my neighbor’s.” Then I would walk to my neighbor’s and realize that he had the same mediocre grass I had. It’s all perspective. Looking from a distance, that picture-perfect meadow looked beautiful, but if you walked through that same meadow, you’d see all the mud, weeds, and bare spots.
The same thing is true for the technology you use to run your business. As a consultant, I get to assess and work in many environments. From the outside, these companies seem to have everything well-polished, secure, and resilient. But, in reality, they have just as much technical debt and outdated or mismatched tools as everyone else. It is just part of the business of IT.
Southwest Airlines during the 2022 holiday season is a great example. They’re a large company with thousands of employees and millions of customers who rely on them to schedule flights and the crews that fly them. All airlines have issues from time to time, but who would have thought that neglecting IT systems would lead to such a big issue at the worst possible time? If we’re honest with ourselves, we all should have known this could happen… and will happen again.
I have been working in IT for decades (and consulting for many). Security and reliability have always been the most important areas that I focus on, whether that’s performing migrations or helping customers deploy new applications. In companies both large and small, there are certain things that I often see. Some of the most common issues are:
- Using on-prem monitoring and security tools in the cloud. Lift and shift has its place, but not everything can, or should, be moved to the cloud verbatim. You’re already running on highly redundant hardware, so the rules you followed on-premises to meet your SLAs don’t apply in the same way, especially with third-party appliances. Many of your vendors offer products that work well in the cloud. There are also native tools and solutions that work very well and provide most, if not all, of the features of your on-premises solutions.
- Single points of failure, even in seemingly well-architected environments. I have seen customers distribute their workloads across multiple availability zones, but then run a little service on a single instance that the dev team has not addressed yet. Look for those single points of failure.
- Lack of visibility/monitoring. If a tree falls in the forest and no one is there to hear it, did it make a sound? Many clients lack proper monitoring and alerting in their cloud environment. AWS has many well-documented tools that are available to use. Some examples are Amazon CloudWatch, AWS Security Hub, Amazon GuardDuty, and more. These tools have no upfront costs and can provide the vital visibility needed to ensure your applications are running efficiently and securely.
- Account and user management. How many accounts a company has is often a well-thought-out decision. Do you divide workloads by environment, application, business unit? The list goes on. I can tell you that if you only have one account and you have a production environment in the cloud, you don’t have enough accounts. Once you have more than one account, how do you manage them? AWS Organizations is there to help solve those management issues. AWS Control Tower, AWS SSO (now AWS IAM Identity Center), and AWS Security Hub are all there to help manage your environment. Yes, these products have overhead and require time to manage. Did you think operations tasks would disappear when you moved to the cloud?
- Lacking the ops side of DevOps. DevOps lets developers deploy quickly and speeds up application development. But when was the last time the environment was patched, or the container images updated? When was the base image of your instances created? Was image creation the last time patches were applied? Who is monitoring CPU and memory usage, or disk I/O and the myriad of other things a good operations team deals with?
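To make the single-point-of-failure check above a little more concrete, here is a minimal sketch in Python. It assumes you already have an inventory of running instances as plain records; the `service` and `az` field names, the sample inventory, and the function name are all hypothetical and would need to be mapped to whatever your own tagging and export look like.

```python
from collections import defaultdict


def find_single_points_of_failure(instances):
    """Flag services that run on one instance or in one availability zone.

    `instances` is an iterable of dicts with hypothetical keys:
    'service' (a name or tag value) and 'az' (the availability zone).
    """
    zones_by_service = defaultdict(set)
    count_by_service = defaultdict(int)
    for inst in instances:
        zones_by_service[inst["service"]].add(inst["az"])
        count_by_service[inst["service"]] += 1

    findings = {}
    for service in sorted(count_by_service):
        if count_by_service[service] == 1:
            findings[service] = "single instance"
        elif len(zones_by_service[service]) == 1:
            findings[service] = "single availability zone"
    return findings


# Illustrative inventory only; not taken from a real environment.
inventory = [
    {"service": "web", "az": "us-east-1a"},
    {"service": "web", "az": "us-east-1b"},
    {"service": "report-cron", "az": "us-east-1a"},
    {"service": "queue-worker", "az": "us-east-1c"},
    {"service": "queue-worker", "az": "us-east-1c"},
]

for service, reason in find_single_points_of_failure(inventory).items():
    print(f"{service}: {reason}")
```

A check like this only sees compute instances. Databases, NAT gateways, and third-party appliances deserve the same question.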
These are some of the most common issues I see. So, if you’re thinking all of these other companies have their act together, they might not. Everyone has technical debt, whether it’s in their code base or their infrastructure.
If you want more information about being well-architected and best practices in the cloud, email me at: jim.lentz@lightstream.io
How to address these issues in AWS:
AWS Well-Architected reviews can help identify how well you follow best practices.
https://aws.amazon.com/well-architected-tool/
Amazon CloudWatch can provide alerting and visibility into your environment.
https://aws.amazon.com/cloudwatch/
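As a rough illustration of what alerting can look like, the sketch below builds the parameters for a CloudWatch CPU alarm on one EC2 instance and, optionally, creates it with boto3. The instance ID, SNS topic ARN, alarm name, and the 80 percent threshold are placeholders I made up for the example. The helper functions are hypothetical, and `create_alarm` assumes boto3 is installed and AWS credentials are configured.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold_percent=80.0):
    """Return keyword arguments for CloudWatch's put_metric_alarm call.

    Alarms when average CPU stays above the threshold for two
    consecutive 5-minute periods. All values here are illustrative.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the sketch runs without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder identifiers; replace with your own before calling create_alarm.
params = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
print(params["AlarmName"])
```

Keeping the parameter building separate from the API call makes it easy to review or test alarm definitions before anything is created in an account.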
AWS Security Hub and related services can provide security visibility into your environment.
https://aws.amazon.com/security-hub/
AWS Systems Manager can provide the operations component that might be missing.
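For the patching questions raised earlier, a small report over patch data can show where the ops work is piling up. The sketch below is plain Python over records shaped like the per-instance patch summaries I understand Systems Manager's Patch Manager to return (`InstanceId`, `MissingCount`, `FailedCount`). Treat those field names as an assumption to verify against the API documentation; the sample data and function name are invented for illustration.

```python
def instances_needing_patches(patch_states):
    """Return (instance_id, missing, failed) tuples, worst first.

    `patch_states` is a list of dicts; the keys used here are assumed
    to match Patch Manager's per-instance patch state summaries.
    """
    flagged = []
    for state in patch_states:
        missing = state.get("MissingCount", 0)
        failed = state.get("FailedCount", 0)
        if missing > 0 or failed > 0:
            flagged.append((state["InstanceId"], missing, failed))
    # Sort by total outstanding patches, highest first.
    flagged.sort(key=lambda row: row[1] + row[2], reverse=True)
    return flagged


# Illustrative records only; not output from a real account.
sample_states = [
    {"InstanceId": "i-0aaa", "MissingCount": 0, "FailedCount": 0},
    {"InstanceId": "i-0bbb", "MissingCount": 12, "FailedCount": 1},
    {"InstanceId": "i-0ccc", "MissingCount": 3, "FailedCount": 0},
]

for instance_id, missing, failed in instances_needing_patches(sample_states):
    print(f"{instance_id}: {missing} missing, {failed} failed")
```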