Why can’t I get over log4j?

Image of Tanya Janca

I haven’t written in my personal blog in a while, and I have good reasons (I moved to a new city, the new place will be a farm, I restarted my international travel, something secret that I can’t announce yet, and also did I mention I was a bit busy?). But I still can’t get over log4j (see previous article 1, article 2, and the parody song). The sheer volume of work involved (one company estimated 100 weeks of work, completed over the course of 8 days of time) in the response was spectacular, and the damage caused is still unknown at this point. We will likely never know the true extend of the cost of this vulnerability. And this bugs me.

Photos make blog posts better. People have told me this, repeatedly. Here’s a photo, I look like this.

I met up last month with a bunch of CISOs and incident responders, to discuss the havoc that was this zero-day threat. What follows are stories, tales, facts and fictions, as well as some of my own observations. I know it’s not the perfect story telling experience you are used to here, bear with me, please.

Short rehash: log4j is a popular java library used for application logging. A vulnerability was discovered in it that allowed any user to paste a short string of characters into the address bar, and if vulnerable, the user would have remote code execution (RCE). No authentication to the system was required, making this the simplest attack of all time to gain the highest possible level of privilege on the victim’s system. In summary: very, very scary.

Most companies had no reason to believe they had been breached, yet they pulled together their entire security team and various other parts of their org to fight against this threat, together. I saw and heard about a lot of teamwork. Many people I spoke to told me they had their security budgets increased my multitudes, being able to hire several extra people and buy new tools. I was told “Never let a good disaster go to waste”, interesting….

I read several articles from various vendors claiming that they could have prevented log4j from happening in the first place, and for some of them it was true, though for many it was just marketing falsehoods. I find it disappointing that any org would publish an outright lie about the ability of their product, but unfortunately this is still common practice for some companies in our industry.

I happened to be on the front line at the time, doing a 3-month full time stint (while still running We Hack Purple). I had *just* deployed an SCA tool that confirmed for me that we were okay. Then I found another repo. And another. And another. In the end they were still safe, but finding out there had been 5 repos full of code, that I was unaware of as their AppSec Lead, made me more than a little uncomfortable, even if it was only my 4th week on the job.

I spoke to more than one individual who told me they didn’t have log4j vulnerabilities because the version they were using was SO OLD they had been spared, and still others who said none of their apps did any logging at all, and thus were also spared. I don’t know about you, but I wouldn’t be bragging about that to anyone…

For the first time ever, I saw customers not only ask if vendors were vulnerable, but they asked “Which version of the patch did you apply?”, “What day did you patch?” and other very specific questions that I had never had to field before.

Some vendors responded very strongly, with Contrast Security giving away a surprise tool (https://www.contrastsecurity.com/security-influencers/instantly-inoculate-your-servers-against-log4j-with-new-open-source-tool ) to help people find log4j on servers. They could likely have charged a small fortune, but they did not. Hats off to them. I also heard of one org that was using the new Wiz.io, apparently it did a very fast inventory for them. I like hearing about good new tools in our industry.

I heard several vendors have their customers demand “Why didn’t you warn us about this? Why can’t your xyz tool prevent this?” when in fact their tool has nothing to do with libraries, and therefore it’s not at all in the scope of the tool. This tells me that customers were quite frightened. I mean, I certainly was….

Several organizations had their incident response process TESTED for the first time. Many of us realized there were improvements to make, especially when it comes to giving updates on the status of the event. Many people learned to improve their patching process. Or at least I hope they did.

Those that had WAF, RASP, or CNDs were able to throw up some fancy REGEX and block most requests. Not a perfect or elegant solution, but it saved quite a few company’s bacon and reduced the risk greatly.

I’ve harped on many clients and students before that if you can’t do quick updates to your apps, that it is a vulnerability in itself. Log4j proved this, as never before. I’m not generally an “I told you so” type of person. But I do want to tell every org “Please prioritize your ability to patch and upgrade frameworks quickly, this is ALWAYS important and valuable as a security activity. It is a worthy investment of your time.”

Again, I apologize for this blog post being a bit disjointed. I wasn’t sure how to string so many different thoughts and facts into the same article. I hope this was helpful.

Discoveries as a Result of the Log4j Debacle

Me, pre-log4j
Tanya making a silly face.
Happier times, before I knew anything about log4j.

Over the past 2 weeks many people working in IT have been dealing with the fallout of the vulnerabilities and exploits being carried out against servers and applications using the popular Log4J java library. Information security people have been responding 24/7 to the incident, operations folks have been patching servers at record speeds, and software developers have upgrading, removing libraries and crossing their fingers. WAFs are being deployed, CDN (Content Delivery Network) rules updated, and we are definitely not out of the woods yet.

​Those of you who know me realize I’m going to skip right over anything to do with servers and head right onto the software angle. Forgive me; I know servers are equally important. But they are not my speciality…

Although I already posted in my newsletter, on this blog and my youtube channel , I have more to say. I want to talk about some of the things that I and other incident responders ‘discovered’ as part of investigations for log4j. Things I’ve seen for years, that need to change.

After speaking privately to a few CISOs, AppSec pros and incident responders, there is a LOT going on with this vulnerability, but it’s being compounded by systemic problems in our industry. If you want to share a story with me about this topic, please reach out to me.

Shout-outs to every person working to protect the internet, your customers, your organizations and individuals against this vulnerability.

You are amazing. Thank you for your service.

Let’s get into some systemic problems.

Inventory: Not just for Netflix Anymore

I realize that I am constantly telling people that having a complete inventory of all of your IT assets (including Web apps and APIs) is the #1 most important AppSec activity you can do, but people still don’t seem to be listening… Or maybe it’s on their “to do” list? Marked as “for later”? I find it defeating at times that having current and accurate inventory is still a challenge for even major players, such as Netflix and other large companies/teams who I admire. If they find it hard, how can smaller companies with fewer resources get it done? When responding to this incident this problem has never been more obvious.

Look at past me! No idea what was about to hit her, happily celebrating her new glasses.

​Imagine past me, searching repos, not finding log4j and then foolishly thinking she could go home. WRONG! It turns out that even though one of my clients had done a large inventory activity earlier in the year, we had missed a few things (none containing log4j, luckily). When I spoke to other folks I heard of people finding custom code in all SORTS of fun places it was not supposed to be. Such as:

  • Public Repos that should have been private
  • Every type of cloud-based version control or code repo you can think of; GitLab, GitHub, BitBucket, Azure DevOps, etc. And of course, most of them were not approved/on the official list…
  • On-prem, saved to a file server – some with backups and some without
  • In the same repos everyone else is using, but locked down so that only one dev or one team could see it (meaning no AppSec tool coverage)
  • SVN, ClearCase, SourceSafe, subversion and other repos I thought no one was using anymore… That are incompatible with the AppSec tools I (and many others) had at hand.

Having it take over a week just to get access to all the various places the code is kept, meant those incident responders couldn’t give accurate answers to management and customers alike. It also meant that some of them were vulnerable, but they had no way of knowing.

Many have brought up the concept of SBOM (software bill of materials, the list of all dependencies a piece of software has) at this time. Yes, having a complete SBOM for every app would be wonderful, but I would have settled for a complete list of apps and where their code was stored. Then I can figure out the SBOM stuff myself… But I digress.

Inventory is valuable for more than just incident response. You can’t be sure your tools have complete coverage if you don’t know you’re assets. Imagine if you painted *almost* all of a fence. That one part you missed would become damaged and age faster than the rest of fence, because it’s missing the protection of the paint. Imagine year after year, you refresh the paint, except that one spot you don’t know about. Perhaps it gets water damage or starts to rot? It’s the same with applications; they don’t always age well.

We need a real solution for inventory of web assets. Manually tracking this stuff in MS Excel is not working folks. This is a systemic problem in our industry.

Lack of Support and Governance for Open-Source Libraries

This may or may not be the biggest issue, but it is certainly the most-talked about throughout this situation. The question posed is most-often is “Why are so many huge businesses and large products depending on a library supported by only three volunteer programmers?” and I would argue the answer is “because it works and it’s free”. This is how open-source stuff works. Why not use free stuff? I did it all the time when I was a dev and I’m not going to trash other devs for doing it now…. I will let others harp on this issue, hoping they will find a good solution, and I will continue on to other topics for the rest of this article.

Lack of Tooling Coverage

The second problem incident responders walked into was their tools not being able to scan all the things. Let’s say you’re amazing and you have a complete and current inventory (I’m not jealous, YOU’RE JEALOUS), that doesn’t mean your tools can see everything. Maybe there’s a firewall in the way? Maybe the service account for your tool isn’t granted access or has access but the incorrect set of rights? There are dozens are reasons your tool might not have complete coverage. I heard from too many teams that they “couldn’t see” various parts of the network, or their scanning tools weren’t authorized for various repos, etc. It hurts just to think about; it’s so frustrating.

Luckily for me I’m in AppSec and I used to be a dev, meaning finding workarounds is second nature for me. I grabbed code from all over the place, zipping it up and downloading it, throwing it into Azure DevOps and scanning it with my tools. I also unzipped code locally and searched simply for “log4j”. I know it’s a snapshot in time, I know it’s not perfect or a good long-term plan. But for this situation, it was good enough for me. ** This doesn’t work with servers or non-custom software though, sorry folks. **

But this points to another industry issue: why were our tools not set up to see everything already? How can we tell if our tool has complete coverage? We (theoretically) should be able to reach all assets with every security tool, but this is not the case at most enterprises, I assure you.

Undeployed Code

This might sound odd, but the more places I looked, the more I found code that was undeployed, “not in use” (whyyyyyyy is it in prod then?), the project was paused, “Oh, that’s been archived” (except it’s not marked that way), etc. I asked around and it turns out this is common, it’s not just that one client… It’s basically everyone. Code all over the place, with no labels or other useful data about where else it may live.

Then I went onto Twitter, and it turns out there isn’t a common mechanism to keep track of this. WHAT!??!?! Our industry doesn’t have a standardized place to keep track of what code is where, if it’s paused, just an example, is it deployed, etc. I feel that this is another industry-level problem we need to solve; not a product we need to buy, but part of the system development life cycle that ensures this information is tracked. Perhaps a new phase or something?

Lack of Incident Response/Investigation Training

Many people I spoke to who are part of the investigations did not have training in incident response or investigation. This includes operations folks and software developers, having no idea what we need or want from them during such a crucial moment. When I first started responding to incidents, I was also untrained. I’ve honestly not had near as much training as I would like, with most of what I have learned being from on the job experience and job shadowing. That said, I created a FREE mini course on incident response that you can sign up for here. It can at least teach you what security wants and needs from you.

The most important part of an incident is appointing someone to be in charge (the incident manager). I saw too many places where no one person was IN CHARGE of what was happening. Multiple people giving quotes to the media, to customers, or other teams. Different status reports that don’t make sense going to management. If you take one thing away from this article it should be that you really need to speak with one voice when the crap hits the fan….

No Shields

For those attempting to protect very old applications (for instance, any apps using log4j 1.X versions), you should consider getting a shield for your application. And by “shield” I mean put it behind a CDN (Content Delivery network) like CloudFlare, behind a WAF (Web Application Firewall) or a RASP (Run-Time Application Security Protection).

Is putting a shield in front of your application as good as writing secure code? No. But it’s way better than nothing, and that’s what I saw a lot of while responding and talking to colleagues about log4j. NOTHING to protect very old applications… Which leads to the next issue I will mention.

Ancient Dependencies

Several teams I advised had what I would call “Ancient Dependencies”; dependencies so old that the application would requiring re-architecting in order to upgrade them. I don’t have a solution for this, but it is part of why Log4J is going to take a very, very long time to square away.

Technical debt is security debt.

– Me

Solutions Needed

I usually try not to share problems without solutions, but these issues are bigger than me or the handful of clients I serve. These problems are systemic. I invite you to comment with solutions or ideas about how we could try to solve these problems.