
Friday deployment security risk

· 6 min read
Cichy
Maintainer of IanaIO - security

Pushing updates on Fridays poses a significant security risk.

On Friday, July 19, 2024, the world witnessed one of the most spectacular IT failures ever seen. A botched software update from the cybersecurity firm CrowdStrike Holdings Inc. caused countless Microsoft Windows computer systems around the globe to crash. Even though Microsoft's direct involvement was limited, the impact on its shareholders and customers was significant: on the following Monday, the stock dropped nearly $10, to $422 per share.

Omitting the Peter Principle is very dangerous

IanaIO analysts attributed this decline primarily to the consequences of ignoring the Peter Principle, as well as to a lack of basic security knowledge and proper deployment practices. Microsoft's conclusions from this failure do not address the root problem; instead, they exacerbate it, according to IanaIO security expert Jaroslaw Cichon.

"Omitting the Peter Principle is very dangerous, not only for company providers but especially for clients and their customers." ~ Jaroslaw Cichon

Source: [https://www.gsb.stanford.edu/faculty-research/publications/peter-principle-theory-decline]

A "Scalable Solution" in Security Is Bad

Firstly, when faced with a security issue, the priority should be to identify the problem and neutralize the attack rather than publicizing it. Instead of adhering to the Peter Principle and avoiding system updates on Fridays, Microsoft implemented ineffective solutions. Rather than replacing the incompetent personnel in the security team responsible for the failure, Microsoft erroneously decided to outsource their security to third-party companies. This move complicates the identification of future failures outside their own environment, which is a highly problematic course of action.

Recognizing errors is more challenging with a "Scalable Solution", despite its short-term speed benefits. This approach should be treated as a temporary measure, which Microsoft has not explicitly stated.

Microsoft official statement

"We’re working around the clock and providing ongoing updates and support. Additionally, CrowdStrike has helped us develop a scalable solution that will help Microsoft’s Azure infrastructure accelerate a fix for CrowdStrike’s faulty update. We have also worked with both AWS and GCP to collaborate on the most effective approaches."


"This incident demonstrates the interconnected nature of our broad ecosystem — global cloud providers, software platforms, security vendors and other software vendors, and customers. It’s also a reminder of how important it is for all of us across the tech ecosystem to prioritize operating with safe deployment and disaster recovery using the mechanisms that exist. As we’ve seen over the last two days, we learn, recover and move forward most effectively when we collaborate and work together. We appreciate the cooperation and collaboration of our entire sector, and we will continue to update with learnings and next steps."

Learn more about how IanaIO addressed this issue. [https://security.iana.io/blog/ianaio-icss]

Microsoft Source: [https://blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/]

The often overlooked and unwritten rule of "do not deploy on Friday" has become not only a written rule but the most critical principle in software engineering security.

While many cybersecurity companies ignored the fact that the recent meltdown at airports and global stock exchanges was caused by Microsoft with the involvement of CrowdStrike, a security company working with Microsoft, IanaIO - Security identified and prioritized this issue as a security vulnerability. This is because cybersecurity encompasses the protection of the customers of software providers, such as patients, investors, and travelers.

All Aspects of Security

This seemingly minor issue led to significant threats in several areas:

Cons:

  • Patient safety (due to disabled computers in hospitals)
  • Financial security (risk of investors losing funds and disrupted liquidity due to disabled computers at global stock exchanges)
  • Freedom of movement (disabled computers at airports)

According to reputable sources such as Bloomberg News:

Bloomberg: Thousands of flights were cancelled across the world after the major Microsoft outage involving CrowdStrike ("From ATMs to Flights, Epic IT Crash Leaves Trail of Chaos"). [https://www.bloomberg.com/news/articles/2024-07-19/microsoft-cloud-service-issues-disrupt-air-travel-operations?embedded-checkout=true]

  • Disruptions rippled across systems from Asia through Europe to the US (including the UK stock exchange, airports, and hospitals)
  • Issues triggered by a botched update of CrowdStrike software

Pros:

IanaIO, as a pioneering company in cybersecurity, identifies this issue as a security concern because it impacts not only the provider but, more importantly, the safety of the customers who use this software daily. This problem also underscores the benefits of decentralization over centralization. While centralization is easier to control, it poses a significant risk of a complete meltdown if something goes wrong, as recently demonstrated. Decentralizing systems can offer an additional layer of protection by requiring continuous monitoring for security issues. However, this is only effective if access to SCS (Super Critical Systems) is isolated from the internet, restricted from third-party clients, and subject to constant scrutiny.

Broader Definition of Security Issues - Decentralization (Anti-Monopoly) vs. Centralization (Monopoly) vs. Application and User Software Security

Cons:

Corporations have deliberately overlooked the needs of people, prioritizing profits and influence. In cybersecurity, the worst practice is to transfer responsibility to external companies while intentionally disregarding the Peter Principle. Decentralized software is harder for a single company or individual to control. This is also why giants like Microsoft avoid it at the code level, masking their influence through agreements with other companies like CrowdStrike. They legally protect themselves from monopoly charges while endangering their customers' lives.

Monopolistic giants

Monopolistic giants can easily gain an advantage in financial markets by doing favors for the government, such as bypassing or declining to support legislation that could weaken their monopoly and reduce their control.

IanaIO Incorporates Unwritten Rules into Their Standards


Therefore, IanaIO - Security designates this issue as a High-Level Alert for Application and Software Security and incorporates the rules: "Never deploy updates on Friday" and "Do not grant kernel-level access to third parties" into their security policy, standardizing these principles as top priorities for software and application security.

Why is it important to adhere to the "Never deploy on Friday" rule? [link to blog www.iana.io]

IanaIO's On Distributed Communications Chapter 2

· 4 min read
Cichy
Maintainer of IanaIO - security

On Distributed Communications (back to Chapter One)

Communication that could survive a nuclear war

In 1964, Paul Baran published a paper on this concept titled "On Distributed Communications," which later contributed to the development of the ARPAnet, the research network that eventually evolved into the modern internet.

Paul Baran set out to build a means of communication that could survive a nuclear war. And he ended up inventing the fundamental networking techniques that underpin the internet.

In the early 1960s, as an engineer with the RAND Corporation (the US armed-forces think tank founded in the wake of the Second World War), Baran developed a new breed of communication system that could keep running even if part of it was knocked out by a nuclear blast. It was the height of the Cold War, and the nuclear threat was very much on the mind of, well, just about everyone.

Network of distributed nodes

Essentially, Baran devised a system capable of breaking communications into small packets and using a network of distributed "nodes" to transmit these packets. If one node failed, the remaining nodes would compensate.

Packet switching history

Baran's work anticipated what we now call "packet-switching," the fundamental technique for transferring information over the internet and its precursor, the ARPAnet. In a packet-switched network, data is divided into small blocks known as "packets." Although Baran didn't use this term, the network he envisioned utilized similar methods. "Paul Baran deserves recognition for conceiving this idea and demonstrating its potential benefits," says Vint Cerf, a key figure in the creation of the ARPAnet.
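As an illustration only (not Baran's actual design), the core idea of packet-switching can be sketched in a few lines of Python: a message is broken into numbered packets that may travel independently and arrive out of order, and the receiver restores the original by sequence number.

```python
import random

def to_packets(message: str, size: int) -> list[tuple[int, str]]:
    """Break a message into (sequence_number, payload) packets."""
    return [(i // size, message[i:i + size]) for i in range(0, len(message), size)]

def reassemble(packets: list[tuple[int, str]]) -> str:
    """Restore the original message regardless of arrival order."""
    return "".join(payload for _, payload in sorted(packets))

# Packets may take different routes through the node mesh and arrive
# out of order; shuffling simulates that.
packets = to_packets("survivable distributed communications", size=8)
random.shuffle(packets)
print(reassemble(packets))  # the original message, restored
```

Because each packet carries its own sequence number, no single path, and no single node, is essential to reconstructing the message.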

However, Baran wasn't alone in his thinking. Around the same time, Donald Davies at Britain's National Physical Laboratory was developing similar concepts and actually coined the term "packet-switching." Vint Cerf notes that the foundational design of the ARPAnet incorporated the work of both Baran and Davies, as well as significant contributions from Leonard Kleinrock, a UCLA professor, and Larry Roberts, an engineer.

The ARPAnet was an initiative funded by the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense. In the mid-1960s, ARPA enlisted Roberts to design the network, and Kleinrock was part of the UCLA team that, in 1969, transmitted the first message between the network's initial two nodes.

There is some debate over who should be credited for the packet-switching techniques that underpinned the ARPAnet, with some questioning the significance of Baran and Davies' contributions. "Baran was focused on developing communication strategies in the event of nuclear war and proposed packet-switching as one solution, but this research was somewhat distinct from the later internet," explains Marc Weber, founding curator of the Internet History Program at Silicon Valley's Computer History Museum. According to Vint Cerf, this controversy wasn't fueled by Baran himself, who was known for his modesty. "He was one of the smartest and most humble engineers," Cerf says. "He rarely took credit for much and was very clear that his work at RAND did not directly lead to the creation of the ARPAnet."

Jim Pelkey, who interviewed Baran, Davies, and Roberts in the 1980s, notes that Roberts devised the initial ARPAnet design before becoming aware of Baran's research. However, Pelkey also mentions that the two met in the mid-1960s, and some of Baran's ideas, including "hot-potato" routing—a method of quickly passing packets to their next destination—directly influenced the network's architecture. "This means that if you receive a packet, you should immediately forward it elsewhere," Cerf explains.

Chapter 2

Regardless of the controversy over Baran's role in the ARPAnet, his research undeniably marked a pivotal moment in network development, shaping the way networks are constructed to this day.

Learn more, Chapter 2: [https://security.iana.io/blog/fridaydeploymentsecurityrisk]

Resources:

http://www.computerhistory.org

http://www.wired.com/wiredenterprise/2012/09/what-do-the-h-bomb-and-the-internet-have-in-common-paul-baran/

historyofcomputercommunications.info

Wrong Privileged Binary on Production Systems

· 4 min read
Cichy
Maintainer of IanaIO - security

Wrong Privileged Binary on Production Systems

Note: If you haven't read the other posts, we strongly recommend doing so to gain a broader and often overlooked perspective on this issue.

  1. [https://security.iana.io/blog/ondistributedcommunications]
  2. [https://security.iana.io/blog/fridaydeploymentsecurityrisk]

You shouldn’t have an internet-connected privileged binary running on your production systems; in other words, do not grant kernel-level access to third-party vendors.


This is a wake-up reminder that you shouldn’t have an internet-connected privileged binary running on your production systems. What was a bad update could easily have been a massive adversary backdoor. A third-party vendor will always be the weakest link.

Endpoint protection

That's the case not only with all endpoint protection solutions on the market today, but with anything privileged and internet-connected. It's actually worse with endpoint protection, because a "cybersecurity" tool creates a false sense of security.

Not if you control the deployment of updates: you deploy to test servers, validate that all is good, then do the full deployment. That is the problem; we no longer believe in testing.
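That "test first, then prod" discipline can be sketched as follows. This is a hedged illustration: deploy(), health_check(), and the host names are hypothetical placeholders, not a real deployment API.

```python
# Sketch of gating a full deployment on validation against test servers.
def deploy(host: str, version: str) -> None:
    """Placeholder: push the given version to one host."""
    print(f"deploying {version} to {host}")

def health_check(host: str) -> bool:
    """Placeholder: probe the service, run smoke tests, compare error rates."""
    return True

def staged_release(version: str, test_hosts: list[str], prod_hosts: list[str]) -> bool:
    # 1. Deploy to test servers first.
    for host in test_hosts:
        deploy(host, version)
    # 2. Validate; abort before touching production if anything looks wrong.
    if not all(health_check(h) for h in test_hosts):
        print("validation failed on test servers; aborting full deployment")
        return False
    # 3. Only then do the full deployment.
    for host in prod_hosts:
        deploy(host, version)
    return True

staged_release("1.2.3", ["test-1"], ["prod-1", "prod-2"])
```

The key property is the early return: a failed check on the test fleet stops the release before it ever reaches production.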

How do you control a system remotely if it doesn’t have access to the internet?

If I know you're going to install binary updates, and I've hacked the sigs, I nicely put a signed binary in the place you expect to find it and wait for you to install it. It's definitely slower, but really not different.

Yes, but you cannot command and control the system after a malicious binary is installed because the system is not connected to the internet. I’m not following your argument.

I'm assuming that in all scenarios we have firewalls.

The argument is that if that machine is firewalled outbound to connect to a single update server, that's not "wide open".

The only difference between the two techniques is that one gets updates more slowly than the other.

I still think that even in the scenario where your egress is strictly limited to only update servers, that having an egress to the outside world opens up the possibility for compromise of that pathway to command and control. If you have no path to the internet, then there is no path for command and control full stop.
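For the scenario where some egress must exist, the "firewalled outbound to a single update server" posture looks roughly like the following sketch, assuming Linux iptables; the update-server address 203.0.113.10 is a placeholder, and truly critical systems should have no egress path at all.

```shell
# Hypothetical default-deny egress policy: outbound traffic is dropped
# unless it goes to the single designated update server over HTTPS.
iptables -P OUTPUT DROP                                          # deny all egress by default
iptables -A OUTPUT -o lo -j ACCEPT                               # keep loopback working
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  # allow replies to permitted flows
iptables -A OUTPUT -d 203.0.113.10 -p tcp --dport 443 -j ACCEPT  # update server only
```

Even with this in place, the allowed pathway remains a potential command-and-control channel if the update server or its signing keys are compromised, which is exactly the point made above.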

Also, letting things roll out more slowly and allowing for a bake-in testing period (even 24 hours) would let many problems be discovered before promoting to prod.

If that were true, supply chain attacks wouldn't exist. But they do. I can continue to broadcast information to your machine every time I release a new binary. Until you figure out the supply chain is corrupt, your machine will dutifully upgrade on schedule.

He's talking about a real-time continuous attack with a live connection. You're talking about a virus.

This is why your production servers should not be connected to the internet.

No, supply chain attacks are more like malware and less like a virus. And yes, real-time control becomes impossible if your server is offline, but then it's also not a target, or even useful to an attacker.

IanaIO's solution to this is Isolating Critical Systems

By using IanaIO's Cyber Security Standard, ICSS, the "Isolating Critical Systems Standard."

Elon Musk confirmed the security issue described by his employee, Christopher Stanley, who is responsible for cybersecurity at SpaceX.

IanaIO's solution for this security issue.

[link to blog iana.io]

IanaIO's Security Standard ICSS - Isolating Critical Systems Standard

· 3 min read
Cichy
Maintainer of IanaIO - security

Use cases with different scenarios

How to keep kernel systems up to date with the latest critical patches across a 10k server farm

Scenario:

Some will say this is a wake-up reminder that it's not possible to keep kernel systems up to date with the latest critical patches across a 10k server farm ... without some sort of privileged binary that can install updates. The problem is the "do it all at once" default behavior.

Direct access to the internet is a bad idea.

Solution:

You can patch systems without them having direct internet access. This doesn't mean you can't have privileged binaries; that wouldn't make sense.

Super critical systems should be isolated.

Scenario:

If you have code that connects to a specific address, verifies the cert, and verifies that the binary is signed, that should be as good as some staging area. The problem is the lack of a rolling schedule: the default should be 2%, wait X hours, then 10%, wait, and so on.

Solution:

When I say "to the internet," I mean outside of your network. Any time you open your network to the outside world, even with certs, checks, etc., you open the door. Super critical systems should be isolated. I also agree with rolling updates.
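A rolling schedule like the one described above can be sketched in a few lines; the stage percentages and bake times below are illustrative assumptions, not a prescribed standard.

```python
# Rolling-update plan: push to a small fraction of hosts first, wait and
# watch ("bake"), then widen. Stages are (fraction_of_fleet, bake_hours).
ROLLOUT_STAGES = [(0.02, 4), (0.10, 4), (0.50, 2), (1.00, 0)]

def rollout_plan(hosts: list[str]) -> list[tuple[list[str], int]]:
    """Return, for each stage, the batch of hosts to update and the bake time."""
    plan, done = [], 0
    for fraction, bake_hours in ROLLOUT_STAGES:
        target = max(1, int(len(hosts) * fraction))  # cumulative host count
        batch = hosts[done:target]                   # only the newly added hosts
        done = target
        plan.append((batch, bake_hours))
    return plan

hosts = [f"server-{i}" for i in range(100)]
for batch, bake in rollout_plan(hosts):
    print(f"update {len(batch)} hosts, then bake for {bake}h")
```

A real rollout would also halt and roll back automatically if health metrics regress during any bake window; that early-stop is the whole point of starting at 2%.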

System Hacks

Scenario:

Direct internet access to a host allows for command and control, whereas the alternative does not.

Solution:

There's not a huge, meaningful difference between downloading a binary to a staging area and then installing it, or downloading it "more directly". It's not safer because you put it in a folder first.

It's safe because you rolled the updates and verified the binary.

(Although I do use staging areas, because I run virus scanners on all binaries that enter my network.)
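The "verified the binary" step can be sketched as a digest check. This is a simplified illustration: a real pipeline would verify a cryptographic signature chain, not just a hash value obtained out of band.

```python
import hashlib
import hmac

def sha256_of(path: str) -> str:
    """Stream the file in chunks so large binaries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_binary(path: str, expected_digest: str) -> bool:
    """Compare against a known-good digest using a timing-safe comparison."""
    return hmac.compare_digest(sha256_of(path), expected_digest)
```

Note that if the attacker controls the same channel that delivers both the binary and its expected digest (the "hacked sigs" scenario above), this check alone proves nothing; the reference value must come from an independent path.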

Hack scenario with solution

Let's say I ship you a binary with command and control in it, and you install it. It's just slower.

How do you control a system remotely if it doesn’t have access to the internet?

'Hack'

Scenario:

Let's say that I know you're going to install binary updates, and I've hacked the sigs. I nicely put a signed binary in the place you expect to find it and wait for you to install it. It's definitely slower, but really not different.

Supply chain attacks

This is how all "supply chain" attacks happen, by the way. People don't stuff binaries into other people's machines; they upload them to a package repo and wait for someone to pull them.

Temporary Solution:

IanaIO does staging areas, because we run virus scanners on all binaries that enter our network.

IanaIO's Security Standard - ICSS

What's the ICSS

IanaIO's solution to this security issue is isolating critical systems, by using IanaIO's cybersecurity standard ICSS, the "Isolating Critical Systems Standard."

Wrong Privileged Binary on Production Systems

You shouldn’t have an internet-connected privileged binary running on your production systems; in other words, do not grant kernel-level access to third-party vendors.

Learn more: [https://security.iana.io/blog/toppriortysecurityissues#ianaios-solution-for-this-security-issues]