Source: https://www.cultofmac.com/863106/install-bsod-wallpaper-mac-windows-blue-screen-of-death-day/
In an increasingly digitized world, recent events have starkly highlighted our vulnerabilities. When 8.5 million Windows computers were suddenly incapacitated by a single error, it became clear that our dependency on technology is fraught with risks. This article explores the implications of such failures, the role of corporate greed, and the looming challenges posed by AI integration. As both users and developers, we must take proactive steps to safeguard our data and ensure our systems' resilience.
What has happened?
Anyone who follows technology news, even casually, saw that last week 8.5 million Windows computers suddenly stopped working and showed the infamous Blue Screen of Death. There have been many articles about this incident, and while the details are important to some, the facts are straightforward: on a Friday, a company called CrowdStrike updated one file in its security tool, and an error in handling that file crashed the Windows kernel, so the computers stopped working (unrecoverable error).
I don't want to focus too much on the incident itself and how recovery will take place (though it will take some time). What I want to discuss is what this incident revealed to us and to the general public: we are terribly underprepared for technology failures. This thought aligns with my concerns about AI and social media, and I want to share my thoughts on the current state of technology. As a software developer with many years of experience, I have a series of insights and suggestions.
Company Greed
We know companies are greedy; it's practically the foundation of capitalism. However, not all companies are born and raised equal. The reason behind so many computers breaking down is, first and foremost, a consequence of company greed. And don't get me wrong, we are not talking about a small company cutting corners due to financial struggles. We are talking about two companies: CrowdStrike and Microsoft. The first is worth billions (with a B) and the second trillions (with a T!).
Why do I say the problem derives from their greed? The reason is simple. As a software developer, I can assure you that there are multiple ways this could have been avoided, intercepted, and/or mitigated. Let's start with the most obvious:
- CrowdStrike updated one single file (in user space), and it was not read correctly by their main application (in kernel space). A company with proper procedures and automation in place should never have shipped this error. The fact that the main application was not extensively tested against a faulty file is inexcusable. The fact that a corrupted, malformed, or empty file was deployed is inexcusable.
- Microsoft Windows is a mess. Think for a second: it's your operating system, and yet you can't fully control it, run it in a workable safe mode, or roll it back to a previous state. Microsoft's controlling nature leaves you with very little autonomy, and that shows little respect for its users and customers.
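To make the first point concrete, here is a minimal Python sketch of defensive loading: the update file is validated before use, and a bad file falls back to the last known good rules instead of crashing. The file layout (a 4-byte magic header, a 2-byte record count, fixed-size records) and every name in it are hypothetical, invented for illustration. It is not CrowdStrike's format or code.

```python
import struct

MAGIC = b"RULE"      # hypothetical 4-byte header
RECORD_SIZE = 8      # hypothetical fixed record size, in bytes
HEADER_SIZE = 6      # magic (4 bytes) + little-endian record count (2 bytes)


class InvalidUpdate(Exception):
    """Raised when an update file fails validation."""


def parse_update(data: bytes) -> list[bytes]:
    """Return the records contained in `data`, or raise InvalidUpdate."""
    if len(data) < HEADER_SIZE:
        raise InvalidUpdate("file too short")
    if data[:4] != MAGIC:
        raise InvalidUpdate("bad magic header")
    (count,) = struct.unpack_from("<H", data, 4)
    body = data[HEADER_SIZE:]
    if len(body) != count * RECORD_SIZE:
        raise InvalidUpdate("record count does not match file size")
    return [body[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in range(count)]


def load_update(data: bytes, previous: list[bytes]) -> list[bytes]:
    """Use the new rules if they are valid; otherwise keep the previous ones."""
    try:
        return parse_update(data)
    except InvalidUpdate:
        return previous
```

With this shape, an empty file, a file full of zeros, or a truncated file all end up in the same place: the old rules stay active and the failure can be logged and reported.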
But you were saying about AI
Yes, AI. This incident happened not only because of a faulty deployment but also because the tool was tightly coupled to the core of your operating system. And, if I'm not mistaken, just a month ago both Microsoft and Apple announced that their respective operating systems would be heavily integrated with AI. AI will monitor what you do, find information for you, and suggest things. This level of automation (and whatever follows in a couple of years) is dangerous. Most of us keep a significant amount of data on our computers, in our email, and in 'cloud' services, and the integration with AI tools will make us more dependent on these companies, not less.
This is scary because you will lose control and knowledge of how your system works, what it does, and where it stores its files. If something goes wrong for you (your iCloud/Microsoft/Google account is hacked or suspended), you might lose access to all your things. As the corporate world adopts more automated tools (and more controlling ones), you will be hopelessly linked to your corporation, and if anything stops working, nobody will question why or try to help you.
Even now, if you lose your Google Account password, there is no customer support you can speak with that could verify you are the account owner and restore your access (it happened to me, so I can tell you it's not a good situation).
The truth is, we have too much data, too many accounts, too many bills, emails, photos, and very little time (and knowledge) to even try to manage all of this. But we can do something, both as developers (if you are one and you are reading) and users. These are general suggestions, so they may not all apply to you. You don't have to follow all of them with the same degree of accuracy, but at least start being aware and take some actions.
What can a user do?
Here are a few ideas to at least try to mitigate the effects of encountering this sort of problem:
- Protect your accounts and passwords - This is the most obvious. All of us, to some degree, have many accounts. Some may have been used only once and never logged into again. Keeping track of your accounts and updating their passwords is crucial, not only for staying in control of your data but also for keeping yourself safe from malicious attacks. You might think that some accounts are not worth your time, but even an old, unused account can be a security risk (maybe an e-commerce site you used once and that stored your credit card information). The first obvious step is to check on Have I Been Pwned whether any of your accounts have ever been exposed. This is a good starting point for deciding which actions you can take immediately. A step further would be to use a password manager: although both Google and Apple provide one integrated with their browsers, I recommend a third-party one like 1Password or LastPass. I find 1Password the better of the two, and its paid plan is reasonably priced. Its advantage is that it allows you to use multiple accounts (if your company provides a Business 1Password account, the two can work together without problems), and you can share a set of passwords between accounts. Also, if your company uses LastPass, the two browser plugins usually don't conflict too much with each other.
- Don't keep all your data in one place - This seems simple to understand and implement but can be tricky. Many of us have a Google, Microsoft, or Apple account, especially for email and main internet services (like Calendar, Drive). What I suggest is not depending on only one of these. If you lose access, you would be cut off from your main email, unable to read messages or reset the passwords of other services. For this reason, I suggest a second account with a provider like Hey.com or Proton. You can easily forward your existing email to the new account, so if you lose access, you'll still be able to receive emails and reset passwords for other accounts. Hey.com requires a yearly subscription (and it's not particularly cheap), but it's privacy-driven and will reduce the amount of newsletter and spam emails you receive. Proton is more complete, and its Proton Drive could be a good cloud backup for your Google Drive, Apple iCloud, or Microsoft OneDrive. You'll probably need to sync the files manually, but a little safety is worth the effort.
- Think about privacy - Another step to keeping your data safe and in control is to review your main services, including social media, and spend some time on their privacy settings. Most services (like Facebook or Google) have a step-by-step tool to verify that your settings are correct, that no personal information is leaked, and that you have no old keys or devices connected. This may seem less important, but since AI feeds on public data, it becomes more important for your online presence.
- Backup! Backup! Backup! - Nothing can replace a good backup. Considering that a 1TB SSD costs around $100, you can get one (or two) and put all your photos, email, and documents on it. You probably need to update that physical backup only once or twice a year, so it's not a big effort. Remember, your data is yours, but a nice tool may stop working, be discontinued, or even prevent you from re-exporting the data you imported (I'm thinking of some photo-management tools that do that). Just as we keep old paper bills, it's a good habit to keep your digital paperwork in a nice and safe place.
- Have a Linux machine (or bootable USB key) - This might be too technical for some people, but having a Linux machine can save you if you run into a big problem. It can be as simple as a USB key with Linux, an old computer with Linux installed, or a home server like Umbrel or a simple Raspberry Pi. Any of these solutions, combined with a backup, can make you operational in minutes instead of days.
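For readers comfortable with a little scripting, a backup is only useful if it can be read back. The following is a minimal Python sketch, using only the standard library, that compares every file in a source folder with its copy on the backup drive and lists anything missing or different. The function names and the folder paths in the usage note are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source)
        copy = backup / relative
        if not copy.is_file() or file_digest(src) != file_digest(copy):
            problems.append(relative.as_posix())
    return problems
```

A call such as verify_backup(Path("/home/me/Documents"), Path("/media/backup/Documents")) (both paths are placeholders) returns an empty list when the copy is complete. It only checks files that exist in the source, so extra files on the backup drive are ignored.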
What can a developer do?
If you are a developer, this might affect you differently. While the previous suggestions still apply with a slightly different focus (you could save your AWS credentials in 1Password, or have a Linux machine as your local test server), there are guidelines you can follow in your job to make your software as resilient as possible:
- Write more tests - This seems obvious, and maybe it is, but writing more tests is one of the key takeaways from what happened with CrowdStrike. Maybe your software is not as crucial as theirs, but if you write software for a living, you need to think of the value you are giving your customers. And if your tests only test simple cases, you should step up your game and have better and more diversified testing. This will make your software more reliable and allow you to manage future situations better.
- Identify points of failure - As a software developer, I have often seen this part overlooked. Considering how dependent we are on different tools and cloud infrastructure, I feel obliged to point out: you need to identify points of failure in your application. It could be as simple as your main database being inaccessible, or more complex scenarios, like: Would your app or website still operate without cache? What happens if the user review service or your CMS is offline? Does your app still work? If you have an e-commerce site, can the user still complete the checkout? Don't get me wrong, some services are essential, like the payment system, but even in that case, you should just 'unplug' the service and see what you can do to keep the app operational or present a proper error message. For example, if the payment system is down, you could show the user a form to leave their email and be notified when the situation is back to normal, so they can complete the checkout they started.
- Create and test procedures - The previous point leads directly to this one. Whatever your way of dealing with failures, you need to think about and implement procedures. Your database is offline? Have a script you can run that spins up a new database, loads the latest backup, and restores full functionality. Did you collect the user's email, as in the previous example? Have a script in place that sends the user an email with the proper link to complete the checkout. Is your CMS not working correctly? Have an exported version of your content that can be read by your app as a fallback. But whatever your point of failure and the way to recover from it, be sure to test and document it. And be sure to train other developers on your team on how to behave in such situations.
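The checkout example from the last two points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Checkout class, the ServiceUnavailable exception, and the charge and send_link callables are hypothetical names, and a real system would persist the pending list instead of keeping it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


class ServiceUnavailable(Exception):
    """Raised by a dependency that cannot be reached."""


@dataclass
class Checkout:
    # Hypothetical payment call: charge(email, amount_cents).
    charge: Callable[[str, int], None]
    # Checkouts that could not be paid, kept so the user can resume later.
    pending: list[tuple[str, int]] = field(default_factory=list)

    def pay(self, email: str, amount_cents: int) -> str:
        """Charge the user, or defer the checkout if payments are down."""
        try:
            self.charge(email, amount_cents)
            return "paid"
        except ServiceUnavailable:
            self.pending.append((email, amount_cents))
            return "deferred"

    def resume_pending(self, send_link: Callable[[str], None]) -> int:
        """Recovery procedure: email each waiting user a link, then clear."""
        for email, _amount in self.pending:
            send_link(email)
        notified = len(self.pending)
        self.pending = []
        return notified
```

Because the payment call is passed in, 'unplugging' the service in a test is just a matter of handing Checkout a function that raises ServiceUnavailable and checking that the result is "deferred" rather than a crash.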
What can a company do?
If you own a company or work for one and have a say in its operations, here are a few other thoughts I would like to share:
- Print your stuff - It feels silly to say, but yes, you should print your important documents. It can be as simple as printing the bills of your services, your online bank statements, or your employees' contracts. In case of data loss (or inaccessibility), printed papers will still allow you to recover vital information. In some countries, it is required to keep these documents for up to five years, so it's a good practice.
- Prioritize security - As with regular users, but even more so if you are the owner/responsible party of a company, keep your employees and your users secure. This means it's your duty to implement software, procedures, and training to aim for the highest security in your company. We were talking about AI, and its use could affect your company: someone could use deepfakes to try to steal money or gain control of your infrastructure through an insecure computer or service.
- Own your data - This is probably the most crucial point in the age of AI. Own your data. Don't delegate all of your data and its management to a third party, at least not exclusively. Always have a Plan B for your data to be stored safely and, if sensitive, to be inaccessible to AI tools or insecure software (an LLM could be hacked to return sensitive data hidden somewhere). This also means that whatever AI tool you use for your computer or infrastructure, you should be able to detach from it at any time and retain as much data and knowledge as possible. For example, if you are using an AI tool to automate curriculum classification or an LLM in your app, be sure to save the original data and the LLM's response. In a few years, you might be able to write your own LLM and not depend on corporations.
- Test yourself offline - This applies to businesses that are not purely digital. We have seen the effects of the CrowdStrike problem on hospitals, banks, and more. If you own any business that has real physical customers, be sure your business can continue running when computers don't work. It may seem obvious, but it can be as simple as printing the daily orders on paper in the morning if you own a restaurant, or having procedures in place to keep your business running with minimal or no internet connection.
Conclusions
As stated in the title, we are overwhelmed and incredibly unprepared for what technology is doing for us right now. In a future where AI will be used and integrated more and more, we might lose control of our data, become lazy, and not think of how to recover from data loss or inoperability. Unfortunately, these tools have evolved beyond our knowledge and capability to spend time learning how to use them effectively. But it's our duty (through small and big actions) to at least try not to be unprepared.
Published: Monday, Jul 29, 2024, 03:29 PM - Updated: Monday, Jul 29, 2024, 03:36 PM