Hunting Bugs in Your Sleep - How to Fuzz (Almost) Anything With AFL/AFL++
1:54PM Jul 26, 2020
Welcome back. Our friends at EFF are constantly standing up for digital rights and fighting endless battles. We at FAU are holding a fundraiser for EFF this year. Please do support it, as we want to continue to support them in whatever battles they are fighting and in helping the global community. Finding bugs in a program gives you the opportunity to research further potential vulnerabilities and exploit them. Our next talk is Hunting Bugs in Your Sleep: How to Fuzz (Almost) Anything With AFL/AFL++. Our speaker is a cybersecurity student and has several projects related to virtualization, containers, IoT security, and binary exploitation. So let's welcome him to deliver his talk today.
Hey, everybody, I hope you're all doing well. My name is Ron, and I'm going to be giving a presentation on getting started with AFL or AFL++, alternatively called How to Hunt Bugs in Your Sleep. A caveat before I move forward: I've given this talk a couple of times before, but I've never used the PowerPoint program I'm using now, which is called sent, and I've never done a prerecorded talk. I haven't written this talk for that scenario, but hopefully it works out just fine. Obviously, if you have any questions, just note them down and I'll be here for the Q&A after the talk is done. So who am I? I've been an undergrad for 12 years because I couldn't decide what I wanted to do for a long time. My actual job is data scientist. I work for a large international corporation doing Power BI and Tableau and Python stuff for them, basically anything they need data-wise, but that's not really my passion. It's just a job that pays the bills. On the side, I do research in cybersecurity, and I guess you could call it hacking. I would like to pivot to that one day soon. One way I've started on that is at UNC Charlotte, where we have an ethical hacking club called the 49th Security Division. This upcoming semester I'll be the president; previously, I was the executive coordinator. You can look them up. It's a really cool group. We do talks and presentations and projects, all related to cybersecurity and hacking. But enough about me, let's go ahead and talk about what we're going to talk about. The first thing I'm going to do is talk a little bit about fuzzing. This audience is probably more familiar with fuzzing than other audiences I've presented to, so I'll try to make that brief. And again, if you have any questions about what I covered, feel free to let me know. Then I'm going to go over AFL and AFL++, and I'm going to do a demo.
And then I'm gonna do a couple of other things I don't want to spoil right away, and then wrap up. So let's go ahead and get started with the fuzzing intro. If you don't know, fuzzing is technically just brute forcing. All it is is sending data into an application with the hope of getting the application to do something interesting. You can see there in the second bullet point that I put random in quotes. That's because it's not random. It's technically brute forcing, but fuzzers have gotten so sophisticated now that how they mutate the data before they try to insert it into a program again is smarter. Let's say it's smarter. It's not just A, B, C, AA, BB, etc., like when you think of plain brute forcing. It's a little more sophisticated than that. So it's not really random, it's pseudo-random. Because of the way fuzzing works, the more time you have, the more tests you can run. For example, the longest recorded fuzzing job that I could find (if you know of one that was longer, please let me know) was for the Windows Vista operating system tools. Microsoft fuzzed that program, or that series of tools, for 15 months before they decided to stop, and I think into the 14th month they were still finding unique crashes and hangs. So that gives you an idea: you have to know when to stop. We'll talk more about that in a little bit. But people run these jobs for a very long time. I've run jobs for months before, and I know people who run jobs for six months at a time, some for more. So more time means more tests. As for the things you need, the first two are obvious: something to fuzz and a fuzzer. The last two aren't as obvious. You need hardware, and most people use dedicated hardware for fuzzing for a reason I'll go into in a little bit. And then you need patience, but like I said on the last slide, you have to know when to stop. We're going to talk about AFL in a little bit, and AFL has a tool built into it to help you with this.
It basically tells you, hey, you've been fuzzing this for a while, and nothing interesting has happened, so now might be a good time to stop. I'll show you that as well when I do the demo. If you look at the landscape of fuzzers, you can break them into two different categories: dumb fuzzers and smart fuzzers. Dumb fuzzers, on the left of the screen, are what I was talking about earlier. That is actual brute forcing, where you just go A, B, C, AA, BB, CC, throwing that input or that data into a program, hoping to get it to crash.
It doesn't take any feedback. It doesn't take any instruction. It just says, cool, I'm going to throw some random things at this program with the hope of getting it to crash. And then on the right side, there are smart fuzzers, which break down again into targeted fuzzers and feedback-driven fuzzers. As I mentioned before, the one we're going to talk about is AFL, down in the bottom right. But there are other great fuzzers out there, like Peach Fuzz, which people use a lot. Sulley is another one, which I think is written fully in Python, that people use a lot. Those are called targeted fuzzers, though I think Peach Fuzz has a feedback-driven fuzzing capability built into it. Either way, all targeted fuzzing means is that you have an idea of what the input should look like. So you give it a template, and then it fuzzes within the bounds of that template. The feedback-driven fuzzers, which are the ones we're going to look at, take the data and see if anything interesting happened in the program. I'll explain how they do that in a second. If something interesting does happen, the fuzzer takes whatever data it put in to get the interesting thing, pulls it back, mutates it, and then tries it again, or rather adds it to the back of the queue to try again. So we can go ahead and start talking specifically about AFL and AFL++. If it's not obvious from the naming convention, AFL++ is just a newer version of AFL. AFL was written by a Google employee, Michał Zalewski. If you're familiar with p0f, the passive fingerprinting tool, he wrote that as well. He abandoned AFL a while back. Abandoned is the wrong word for it: he stopped contributing to the project and encouraged other people to contribute to it. With these tools, you can start fuzzing CLI tools immediately. There's actually a really impressive trophy case I'll show you here.
This is from Michał Zalewski's website; his handle is lcamtuf. These are just a few of the programs that have been fuzzed with AFL and that have returned bugs that have now been patched. You'll see names you recognize in here, like the OpenBSD kernel, Tor, and Ettercap. There's a bunch of things in here, and this isn't all of them. I've found bugs using AFL that are not in this trophy case. You can go to lcamtuf's site, lcamtuf.coredump.cx, to read more about that. With some work you can fuzz almost anything as well, which is what makes AFL and AFL++ so cool. Out of the box they work on CLI tools immediately, like I said, but you can get them to fuzz over a network. You can throw network packets at an application. You can fuzz things you don't have the source code to. You can fuzz things that are non-x86. That takes a little bit of tweaking, but it's doable. This slide is an image from lcamtuf's site, Michał Zalewski's site, and what it shows is a JPEG that was generated by AFL. This might seem a little complicated, so if you have a question about it, just mark it down and I'll answer it at the end. What happened here was that Michał Zalewski, to prove that AFL was a good feedback-driven fuzzer, was fuzzing the djpeg library. The first and only file he started with, in what's called the seed corpus (the files you use to begin your first job), was a text file. I don't remember what it said; it was something like hello world. It was just a .txt file, just plain text. AFL took that and threw it at the djpeg library; I don't remember specifically which one of the tools it was fuzzing. After it got feedback from the program saying nothing interesting happened, it mutated it and mutated it and mutated it.
I think this took about eight hours of fuzzing, and it mutated that text file in such a way that it created what you're looking at right now, which is an actual JPEG. So AFL was able to determine it could get further into the program by, well, making something look like a JPEG. I don't know how close to exact the header was for the JPEG file after it was mutated to this level, but you could open it and render the file to look just like what you're seeing now, and he ended up getting a bug, or a crash rather, out of djpeg. So how does this work in AFL? For right now I'm just going to be talking about basic white-box fuzzing, which means you have access to the source code. With that source code, you can compile a program using AFL's custom GCC or Clang compilers. What those compilers do is take the basic blocks of each program (I'll talk more about basic blocks in a moment) and add these flags to them, so AFL can track which basic blocks it has passed. It then keeps an execution chain of those basic blocks. So it knows: I do this before that, I access this part of the code before that one, is this a new path we found, etc. And again, that'll make a little bit more sense on the next couple of slides.
Once you instrument that code using AFL, you have to get some seed files together. Like I said, that's called your seed corpus. In the previous example with the JPEG, Michał Zalewski started with just a text file. You can start with anything, but obviously you want to have a smart seed corpus. Smaller files are always better, and so are files that you know might do something interesting already. For example, if in that previous example Zalewski had started with a JPEG file, he would have gotten further in eight hours than he did starting with the text file. As I mentioned previously, because this is a feedback-driven fuzzer, once a new path is found, whatever input found that new path is mutated and then put at the back of the queue to be thrown at the program again. Every now and again, AFL runs a cleanup process where it trims files down until the checksum of the execution path is affected. I keep talking about execution paths, and that might seem a little incomprehensible, so let's talk a little bit more about that. Take this image as an example. Anyone in the audience familiar with coding at all will recognize that the code on the left is just a simple program, and you can see exactly what it's doing: if x equals one, execute foo; if y equals two, execute bar; and then after everything, return. If you take the case of x and y both equaling three, the only parts of the code that will be hit are the first check for x, the second check for y, and then the return. On the right, I've labeled what are called basic blocks. There's an academic definition for a basic block; I think, in the assembly code, it has to have one entry point, one exit, and no jumps. I think that's correct, but that's beyond my expertise, let's say. So I've labeled them, and I'm showing you in green which parts of the code would be hit if x and y are both equal to three.
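The slide's example can be sketched in Python as follows. This is only an illustration, not AFL's code: the block labels (A for the check on x, B for foo, C for the check on y, D for bar, E for the return) are my assumption about how the slide is labeled, and the helper names `run_target`, `chain_checksum`, and `is_new_path` are hypothetical. Real AFL records coverage in a compact shared-memory map rather than storing literal chains, so treat this as a toy model of the idea.

```python
import hashlib


def run_target(x, y):
    """Toy version of the slide's program; returns the basic blocks it hit."""
    chain = ["A"]            # A: check x (assumed label)
    if x == 1:
        chain.append("B")    # B: call foo() (assumed label)
    chain.append("C")        # C: check y (assumed label)
    if y == 2:
        chain.append("D")    # D: call bar() (assumed label)
    chain.append("E")        # E: return (assumed label)
    return chain


def chain_checksum(chain):
    """Hash an execution chain so chains can be compared cheaply."""
    return hashlib.sha256("".join(chain).encode()).hexdigest()


def is_new_path(chain, seen):
    """Return True and remember the chain if its checksum was not seen before."""
    digest = chain_checksum(chain)
    if digest in seen:
        return False
    seen.add(digest)
    return True


if __name__ == "__main__":
    seen = set()
    for x, y in [(3, 3), (3, 3), (1, 3), (3, 2)]:
        chain = run_target(x, y)
        print((x, y), chain, "new path" if is_new_path(chain, seen) else "already seen")
```

Under these assumed labels, the input (3, 3) produces the chain A, C, E, and repeating the same input is reported as already seen, which is the signal a feedback-driven fuzzer uses to decide whether an input is worth mutating further.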
And then at the very bottom, on the left, you can see I've put the execution chain there: A, C, E. Those are the parts of the code that are hit. If x and y were both equal to one, the execution chain would be A, B, and E instead of A, C, and E. The way AFL determines whether it has found a new path is that it takes that execution chain and compares it to the previous execution chains that have already been found. It hashes them to create a checksum, and if the checksums don't match, it says, oh, we found a new path. That means the input was interesting. Let's take whatever input found this new path, reintroduce it to the queue, and throw it back at the program again to see if we can do something else interesting. This next slide is more just for your reference, but it is effectively what the algorithm for AFL does. It's basically what I've already explained: it takes a candidate and throws it at a program, and if something interesting happens, it mutates it and puts it at the back of the queue. If I didn't make that clear enough earlier, when I say it finds something interesting, AFL counts something as interesting if it is either a new path, a hang in the program, or a crash.
So I'm going to do a quick cut to a demo. Fortunately for me, I don't have to do this live. I've previously done it live, so nothing should go wrong, but if something does, I guess that's cause to be alarmed. Okay, so I'm going to run through a quick demo of AFL to show how easy it is to get started. Let me increase the screen size, or the text size rather. Because I work with students a lot, I write a lot of scripts to help people get started, for people who aren't as comfortable compiling from source code. So I've actually written a script to get AFL downloaded and installed on your system, but I've also set it up so that it shows what commands are being run. All you have to do is grab it from lcamtuf's site.
And the script, of course, will be available. It downloads a tarball version of the program, and then you just sudo make. Actually, you're supposed to make and then sudo make install; I don't know why I wrote sudo there. That's going to compile it for us and install it on our system. Very straightforward. If you've done this before with a program, this isn't going to be an issue for you, and like I said, if you haven't, I can make this script available. So anyway, AFL is now installed. As I mentioned before, or I should have mentioned before, sorry if I didn't, AFL comes with a bunch of different tools as well. There are two really important ones, the test minimizer and the corpus minimizer, which I will demo here. Okay, that's installed. That's good.
So next, I'm going to install an old version of SQLite that I've worked on before. This is really simple. All this is going to do is create some directories and then download the source code for SQLite and install it on my system. I'm going to take the test cases that AFL provides for you, and I'm going to use those in my example. I'll show you those before we actually get started. The only thing that really matters is that once this command runs, where it installs the SQLite source code, you're going to see that I've changed the CC options so I can compile SQLite using AFL's C compilers. I think that's about to pop up now, and I'll go over what the output looks like. Yes. So here you see I'm running configure from within the SQLite directory, and I've changed the C++ compiler to AFL's G++ compiler. Those are wrapper compilers, and they're what I'm going to use. That's what's going to insert those flags so AFL can keep track of the execution chain. As this compiles, you're going to see, well, it's hard to see because it's going by quickly, but you'll see afl-cc and afl-as; I think that's the linker. By the way, this is being compiled with AFL right now so those flags can be inserted and we can much more easily fuzz the program. Well, I guess it's mainly so we get better feedback whenever we fuzz the program. As for the test cases, AFL comes with some examples here. You can see there are all different kinds of files and formats and whatnot. And what I did in that script I just showed was (let me cd into the targets directory, SQLite)
move them into a folder called in.
There's a SQL file, there's a text file, there are image files, there are all kinds of things. These are the things I'm going to throw at SQLite to try to get some interesting crashes. If nothing happens, AFL will mutate those files and try to make something interesting happen with the mutated versions. But let me talk about SQLite first. Anyone familiar with SQLite, or just SQL in general, will know that to create a table, you just run create table and then a table name; it doesn't make a difference what it is. I don't even know if I need a parameter there. Okay, you do need a parameter, and then there you go, the table is created. I'm going to drop that now. Okay, so one of the cool things about SQLite is that you can throw commands at it from a file. So if I were to run something like sqlite, and then I were to... oh, sorry, I guess I have to create the file first.
Let's move it in here.
There's the file, and then I want to run SQLite.
So we got no errors there. If you didn't pick up what I was doing, basically, I was just taking a SQL command, putting it into a file, and then redirecting it into SQLite. That doesn't work if you don't pipe it in; obviously, it just runs SQLite and then never gets to the second part of the instruction. That's pretty apparent. The reason I'm bringing that up is that how you configure something to be fuzzed with AFL does make a difference. So let me go ahead and show a basic fuzz job really, really quickly. We have to make this a little bit smaller, actually. So I'm just gonna,
hopefully that's still visible.
So all we have to do is run afl-fuzz. Use -i for input, which is a required flag, and -o for output, also a required flag. If you don't remember, I created two directories here called in and out, so that's what I'm using for those. I'm not choosing those names arbitrarily; they're the names of the directories in this directory. The in directory has all of our test cases, and the out directory is empty. Then we just put in the path to our target, which is SQLite. What AFL does, if you use this parameter, the two at symbols together, is replace those two at symbols with the desired input. Remember how I just ran that command with SQLite where I didn't have the redirect operator? That's effectively what this is saying: fuzz this, but run it like sqlite plus whatever the input is on the command line. Ignore all that error output for now; I'm going to talk about that in a second. You'll see that we have no new paths. We have 40 total paths because that's the number of files we started with, but we have no new paths. This will run forever. I've tried it for a very long time, and nothing new will happen because of the way we ran it. So you do have to be careful about this. If you're running SQLite, or any program really, and you remove those at symbols, AFL tries to pass the input into the program, like what we saw with the redirect. Hopefully that's clear. Normally, when I do these presentations live, people can ask questions freely, and it's difficult to tell if this is being explained well or not unless you can see audience feedback. But if I put in those at symbols, it's saying: run this entire command with the input on the command line. If I remove them, it says: run this command and try to pass the input into the program. So I just run it like this.
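The two ways of invoking the fuzzer described above can be written out as command lines. The sketch below only builds and prints the argument lists; it does not run anything. The `-i` and `-o` flags and the `@@` placeholder are the ones from the demo, the `--` separator is the usual way to mark where the target command starts, and the helper name `afl_fuzz_command` and the target path `./sqlite3` are hypothetical.

```python
import shlex


def afl_fuzz_command(target, in_dir="in", out_dir="out", input_as_file_arg=False):
    """Build an afl-fuzz argument list.

    With input_as_file_arg=True the target gets '@@', which afl-fuzz replaces
    with the path of the current test case. Without it, the test case is fed
    to the target on standard input, like a shell redirect.
    """
    cmd = ["afl-fuzz", "-i", in_dir, "-o", out_dir, "--", target]
    if input_as_file_arg:
        cmd.append("@@")
    return cmd


if __name__ == "__main__":
    # Input passed as a file name on the command line (found no new paths in the demo).
    print(shlex.join(afl_fuzz_command("./sqlite3", input_as_file_arg=True)))
    # Input passed on stdin (the variant that started finding new paths).
    print(shlex.join(afl_fuzz_command("./sqlite3")))
```

Which variant is right depends on how the target reads its input, so it is worth checking by hand, as the demo does with the redirect, before starting a long job.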
Again, ignore that error; we'll talk about that in a little bit. You'll notice that our total paths are increasing significantly, and in just a few seconds we'll hit some unique crashes. While we're waiting for that to happen, I'll go ahead and talk a little bit about what you're seeing here on the screen. Process timing is fairly straightforward, I think. It's just statistics about your job: how long it's been running, when the last unique crash was found, etc. Overall results, sorry, that's also fairly straightforward. Now we're seeing some activity here. The paths are what I talked about earlier with that execution chain; that's what's being kept track of there. Unique crashes is straightforward, and unique hangs is straightforward. What's not as straightforward is the cycles. Remember, I mentioned earlier that AFL has a built-in tool to help you determine if you have been running your job for too long. A cycle is complete whenever all test cases in the queue have been gone through. So this number will increase, but it will turn yellow whenever all test cases in the queue have been gone through and no new paths were found. Let's say, for example, I've got a queue of 100 files. If it goes through all of those and this number doesn't increase at all, this will turn yellow, signifying, hey, you know, you've probably done all you can with this. Cycle progress, oops, sorry, cycle progress is exactly what it sounds like. Then there's stage progress. I didn't go into this either, just for time purposes, but AFL has, I think, six, maybe eight, anyway more than five different paths, or I'm sorry, different stages that it uses to manipulate a file. The one running right now is havoc, which is what it sounds like: it's pretty much, well, not a random mess but a pseudo-random mess. There are other things like bit flipping and arithmetic.
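To give a feel for what a deterministic stage such as bit flipping does, here is a minimal sketch of a walking single-bit flip. It is a simplification written for this transcript, not AFL's implementation: the function name `walking_bit_flips` is hypothetical, and AFL's real stages also include multi-bit flips, arithmetic, and the havoc stage mentioned above.

```python
def walking_bit_flips(data: bytes):
    """Yield one mutated copy of data per bit, each with exactly that bit flipped.

    Bits are visited from the most significant bit of the first byte to the
    least significant bit of the last byte.
    """
    for bit in range(len(data) * 8):
        mutated = bytearray(data)
        mutated[bit // 8] ^= 0x80 >> (bit % 8)   # flip a single bit
        yield bytes(mutated)


if __name__ == "__main__":
    seed = b"hi"
    for candidate in walking_bit_flips(seed):
        # A fuzzer would run the target on each candidate and keep the
        # ones that reach a new path, hang, or crash.
        print(candidate)
```

Because the stage walks every bit in order, it always produces the same candidates for the same seed, which is what makes it deterministic rather than random.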
Again, you can read more about that in AFL's documentation, but these are all just methods of manipulating the file. There we go: bit flip, arithmetic. They go in a certain order whenever it's working deterministically like this; I haven't set it to do anything random. I'm actually not going to explain the remaining fields, and the only reason for that is time. Frankly, those fields aren't as interesting unless you're super technical. The ones I've gone over already are all the ones you'll need if you want to find bugs and crashes with AFL. So I'm going to go ahead and stop this by hitting Ctrl-C, and let's take a look at what one of these crashes looks like. As I mentioned before, we had the in directory, which had our test files. Now we have the out directory, which was empty, but now it shows a bunch of other things as well. Whoops. These are all the unique crashes that we found. Let's go ahead and open one up. Let's just cat one out; I think that's probably the best. Let's see what our options are. 28, why not? 28 seems good.
Okay, so this is interesting, because as we said earlier, we can pass data into SQLite from a file. What's cool about this is that even though we didn't have a lot of test cases that looked anything like SQL commands, AFL has created these malformed SQL commands for us that are causing SQLite to crash. So if I were to run SQLite and then pass in this file,
I get a crash. What that's doing, again, is executing these lines in SQLite. What we have here is some debugging information, but either way we've crashed the program. This is an actual file that will crash an instance of SQLite. And it really is that simple to get started fuzzing with AFL. All you have to do is, like I said, either take the scripts that I have or download AFL or AFL++ itself. If you download AFL, you get it from lcamtuf's site; if you download AFL++, you get it from GitHub. But it's that simple. You can start fuzzing in less than five minutes, if you already have a target available, that is. So I want to talk about two more things. Like I mentioned, it's the minimizer series from AFL. Let's go ahead and look at our input directory. Let's list it out like this. You can tell by the bytes here in this column, sorry, in this column, that these are all fairly small files. Small movie is actually a little too big, and so is this one here, small archive, but these are all fairly small anyway. If I want to take the seed corpus I have, the initial test files, and weed out the ones that either duplicate the execution chain or don't do anything at all, I can use AFL's corpus minimizer functionality, afl-cmin. How you do that is you give it an input directory, for which we're going to use in, and then you give it a new output directory. It can exist already, but just make sure there's nothing in it. Then you write the path of what you want to fuzz and run the command. So when we do that,
you can see that we have 40 files, just like we had before; you know that we have 40 test cases. What this is doing is taking them, running them all through afl-fuzz, and trying to determine which files are duplicating the execution chain and which are unnecessary. You can see now we've narrowed it down to 16 files that do something interesting. If I had started with this, I would have actually found those crashes much more quickly, because I had, what's that, 24 files that were just doing duplicate work. So it's always good to run this. In fact, you can stop a job, take the crashes you found, run those against the program with the AFL corpus minimizer, and then start the job over with really grand results. Well, hopefully grand results. There's another part of AFL called the test minimizer, which focuses on a single file. Let's say, from my new in directory, we take small_archive.cpio. What I can do with that is run afl-tmin, for test minimizer. I say what file I want to start with: I want to start with small_archive.cpio. What file do I want to end with? This isn't a directory; you're creating this as an actual file. So we'll call it new file in new in. And then you write the path to the program you want to fuzz. What that's doing is clearing out bytes that don't seem to do anything to the program. What do I mean by that? Well, if you look down here at the bottom, it says the file size has been reduced by 35%. It's been reduced to 332 bytes from wherever it started; I'm not sure where that was. It determined that by actually running that file 340 times against the program to see what it can remove without affecting the execution chain made by the original file. This is meant specifically to make all your test cases smaller.
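The corpus minimizer and test minimizer invocations from the demo can be sketched as argument lists, including a batch run of the test minimizer over several files. As before, this only builds the commands. `afl-cmin` and `afl-tmin` with `-i`, `-o`, and the `--` separator are the tools shown in the demo; the helper names, the directory names, and the target path `./sqlite3` are hypothetical.

```python
from pathlib import Path


def cmin_command(in_dir, out_dir, target):
    """Corpus minimizer: keep only the files that add something to coverage."""
    return ["afl-cmin", "-i", str(in_dir), "-o", str(out_dir), "--", target]


def tmin_commands(files, out_dir, target):
    """Test minimizer: one afl-tmin run per input file, same name in out_dir."""
    commands = []
    for path in files:
        path = Path(path)
        commands.append(
            ["afl-tmin", "-i", str(path), "-o", str(Path(out_dir) / path.name), "--", target]
        )
    return commands


if __name__ == "__main__":
    print(cmin_command("in", "new_in", "./sqlite3"))
    # Assumed layout: minimize every file in the corpus that afl-cmin produced.
    corpus = sorted(p for p in Path("new_in").glob("*") if p.is_file())
    for cmd in tmin_commands(corpus, "min_in", "./sqlite3"):
        print(cmd)
```

Each command could then be handed to `subprocess.run` once the paths match your own setup; if the target expects a file argument rather than stdin, `@@` would need to be appended as in the fuzzing command.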
For the reasons I've mentioned, smaller test cases and more targeted seed corpuses are the best way to get fuzzing results. So you can write a quick script to run afl-tmin against all of your current crashes if you want to start fuzzing again and have, well, a higher chance of success finding crashes. That's all for this demo; I'm going to cut back to the presentation now.
So that's how easy it is to get started with AFL. Like I mentioned before, AFL++ is just a fork of this. The actual fuzzing algorithm has been updated very slightly, so you can get the same speed and performance out of AFL or AFL++. But AFL++ has some added features that aren't exactly built into AFL. You can make them happen in AFL, but you have to tweak it a little bit, whereas with AFL++, like I said, the work's already done for you. Another benefit is that it's an active community. If you find something interesting, you can contribute it to the project as well, and the pull request will probably be accepted, depending on what the feature is. Some of the things you can do with AFL and AFL++ include what's called black-box or non-x86 fuzzing; those are actually both done the same way. All black-box means is that you don't have access to the source code. AFL will fuzz it anyway: it will use a feature of QEMU called user emulation mode to not only execute the program but also determine when a crash has been found. You can also do parallel fuzzing; again, you can do this in both AFL and AFL++. All that means is you can have one main fuzzing job, which is called the master, and then you can have several slave fuzzing jobs, as many as you want really. The master fuzzing process fuzzes deterministically, meaning it's making the changes from the AFL algorithm that you would expect. It's using all of the stuff we've discussed previously.
But the slave processes are actually throwing randomized input at interesting-looking test cases, mutating them randomly. There are other things you can do in AFL; for example, you can make graphs, and there are add-ons and plugins for AFL and AFL++. Overall, even though the last demo I did was technically AFL, I'd recommend you go ahead and switch to AFL++ unless you really are in a resource-constrained environment. So I want to talk a little bit more about non-x86 fuzzing. The reason is that it's something I've worked on with a little bit of success, and it's a cool feature. I started doing this with AFL and switched to AFL++, which is why I kind of use them interchangeably in my presentation: I still have processes that are running with AFL, but all new processes I start run with AFL++.
Black-box fuzzing, like I mentioned previously, and non-x86 fuzzing are done the exact same way, using QEMU. It's significantly slower, and there's a problem: if you're trying to do this on an x86 machine, you need all these QEMU dependencies. That brings me to something called Fox, something I've called five o'clock shadow. Basically, what this does is bootstrap all of the annoying stuff you need for QEMU. The only limitation is that you need a firmware image that you can extract with binwalk to get it to work. If you have that, you can get started fuzzing with Fox and AFL++ in practically no time at all. I currently have a repository of about 280, maybe closer to 300, gigs of firmware images I've downloaded from Linksys and D-Link, and even some OpenWRT and some ASUS router stuff, that I pass through this to fuzz as much as I can. As I mentioned previously, as long as you have the hardware, you can really fuzz as much as you want. So what I'm going to do now is cut to that demo before we wrap up.
Okay, we're here for the Fox demo. Sorry about the aspect ratio on this; I'm using a VM that isn't allowing me to go fullscreen in the way I want, so we're just going to work with this. Let me increase the text size a little bit. I've talked a little bit about how you can start fuzzing x86 command-line tools immediately. I talked a little bit about how you can fuzz network applications and web applications using AFL++; I don't have time to demo that today. The other thing I talked about was black-box and non-x86 fuzzing. AFL accomplishes this by using QEMU user emulation mode, as I mentioned before in the presentation. Using that is really, really simple. The problem is you sometimes need library dependencies for QEMU to accurately use user emulation mode.
You also need to set a couple of environment variables. QEMU_LD_PREFIX is one; you have to set that to a directory that houses the library files for the architecture you're trying to fuzz. I got so frustrated with this because I came into contact with a library of about 300 gigabytes' worth of firmware for things like routers and cameras, and most of it was either ARM or MIPS, but none of it was x86. So I decided to go ahead and put together, I guess it's really just a script, called Fox, which deals with all the nonsense for you: the things that you would normally have to figure out on your own if you're trying to bootstrap AFL to work with non-x86 files on an x86 machine. I'm just going to show how easy this tool makes that. What I have here are two firmware images from ASUS; I think they're both router images. I've unzipped the GT-AX11000. Let me go ahead and go into that directory. I extracted the firmware with binwalk, and if you've ever used binwalk, you know there are a couple of unnecessary folders before you get to what you need. Let me go ahead and cd to where I need to be. Okay, that should be it. To make this easy, start in the directory that has the basic root directory files, such as bin, home, etc. The lib directory is what you need the most. If you start here and you run Fox, AFL and Fox will take care of everything you need. So let's go ahead and look at some stuff here we might want to fuzz. I actually have to make the text smaller for a little bit, sorry about that. So I'm in the bin directory, and of course there's stuff you recognize here. Let me make it all executable as well. There's bash, there's BusyBox. If you've done anything in the embedded space, you know all about BusyBox. Before I go to run one of these commands, let's say bash, I'm gonna run file on it.
The file command says it's an ARM executable right here. Which means if I try to execute it, like I do here, I get an error saying "No such file or directory." Basically, it's telling me that I don't have the library dependencies to run this because, well, I haven't set it up to do that. This is an x86 machine, and the binary depends on ARM library files. So if I go into the directory that has the lib directory, export the QEMU_LD_PREFIX environment variable set to this directory, and then use qemu-arm to run bin/bash, I'll be able to execute the ARM version of bash on my x86 hardware.
Effectively, this is all that Fox is doing. It does a couple of other things: it automatically uses the corpus minimizer and the test case minimizer to make your life easier, so you don't have to worry about that. It also keeps track of everything you have fuzzed before. For example, see what I have here: every time it fuzzes a new architecture, it creates a directory for that architecture. If you look at MIPS, you can see it's named all of the things I've fuzzed, and it's kept part of their checksum value. The reason for this is that, again, if you're familiar with BusyBox, you'll know that everything that's a symbolic link to BusyBox has the same checksum as BusyBox. So if you're fuzzing something that's part of BusyBox and you've already fuzzed BusyBox, Fox will just tell you, hey, look, you've probably fuzzed this program before; it looks like the same program. That gives you information that can help you, well, I guess, improve the details of the fuzzing job you're going to do. If you look here, you'll see that for this MIPS version of bzcat, I've got a whole, it looks like 46, maybe 47 (I think it starts at zero), unique crashes. I have hundreds of these now.
Like I said, the thing I found most frustrating was getting things set up to start fuzzing non-x86 files. So let me show you what Fox does to make that a little easier. We're going to fuzz bash; it should give me a warning. So the menu says: fuzz a new binary, or a directory of binaries. I planned on adding the feature to fuzz an entire directory of binaries all at once. The reason that's not implemented yet is something we've talked about before, namely that you need to know the format in which the program you're trying to fuzz accepts input.
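The duplicate check described above can be sketched as follows. The talk does not say which checksum Fox uses or how it stores its records, so SHA-256, the JSON registry file, and the function names here are assumptions for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path) -> str:
    """Hash a file's contents; symbolic links are followed, so a link hashes like its target."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_fuzzed(binary, registry_file):
    """Return the names previously recorded for this binary's checksum, then record it."""
    registry_path = Path(registry_file)
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    checksum = sha256_of(binary)
    earlier = list(registry.get(checksum, []))   # copy, so the caller sees the state before this call
    name = Path(binary).name
    if name not in earlier:
        registry.setdefault(checksum, []).append(name)
        registry_path.write_text(json.dumps(registry, indent=2))
    return earlier
```

A non-empty return value is the cue to warn the user: a BusyBox applet such as bzcat would come back with the name of whatever identical binary was fuzzed first.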
You really should target your input a little better than just using @@ on everything. For example, gzip does require a command-line argument, so if you run gzip @@, you're going to have more success than with nothing after gzip. But like we saw with SQLite, SQLite does not take a command-line argument, so if you have that @@ symbol after sqlite, you will just find nothing. You could run it for years and never find anything. That's why that feature hasn't been implemented yet.
So let's go ahead and do a new binary. Let's point this to bin/bash. There you go: I got the warning saying I've fuzzed this before. Like I said, it keeps a list of the checksums of all the files I've fuzzed. I have 300 gigs of firmware images, and if they're all running the same version of BusyBox, I only want to fuzz it once. Let's override that for now; we're going to fuzz it anyway. I might skip this part of the video because it can take up to five minutes. Basically, Fox is extracting the architecture of the file I'm trying to fuzz and then compiling AFL's QEMU mode to run on that architecture. If I leave this video up long enough, you'll see in this tiny text (maybe I can actually make it bigger now) that it says ARM quite a few times. If I had targeted a MIPS binary, it would automatically compile AFL to fuzz MIPS binaries. I'm probably going to cut the video now and jump to the part where the fuzz job actually starts, so we don't have to wait on this to compile.
Okay, so AFL has been compiled to fuzz ARM. I'm going to make this text smaller because I think you need to see this. It stops right before the job begins and asks what you would like the command-line arguments to be. Because I'm running bash, I know I'm not going to need a command-line argument, so I won't enter anything. But like I mentioned in the example earlier, if I were running gzip, I'd want to put an @@ here. Or if I were running gunzip, I would want to make it @@.gz. Because remember, this @@ (sorry, the back arrow doesn't work), this @@ gets replaced with the test input file. In this case, I'm just going to pass the input directly into bash.
You can see the corpus minimizer has already worked. We started with those same 40 test cases, and they were minimized to 16. Because this is black-box fuzzing, it doesn't have access to the source code, so I determined the best way is to start with some kind of seed corpus, in this case the AFL test cases, and then minimize it per fuzz job, just to get as close as we can to something usable. Obviously, if you know what you're fuzzing, you can start with something a little more targeted. But if you're going to be running this for a super long time, these binaries aren't big enough for that to really make a difference. You can start with something like this and AFL will do the work for you. It's always a better idea to start with better test cases, though. So you saw how easy it was before to get started with AFL. And now you can just use this script; it's available, obviously, on GitHub for free.
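How the @@ placeholder ends up in the fuzzing command can be sketched like this. The function names and the in/out directory names are hypothetical; the afl-fuzz flags used (-Q for QEMU mode, -i for the seed directory, -o for the output directory) are the standard ones. When no @@ is present, AFL feeds each test case to the target on standard input, which is what happens with bash in the demo.

```python
import shlex


def build_afl_command(target, target_args="", in_dir="in", out_dir="out"):
    """Build an afl-fuzz argv for a black-box (QEMU mode) fuzzing job."""
    argv = ["afl-fuzz", "-Q", "-i", in_dir, "-o", out_dir, "--", target]
    argv += shlex.split(target_args)   # "" for bash, "@@" for a tool that reads a file
    return argv


def substitute_input(target_argv, input_path):
    """Mimic the substitution AFL performs: every @@ becomes the test case path."""
    return [arg.replace("@@", input_path) for arg in target_argv]


def reads_from_file(target_argv):
    """True if the target receives its input as a file path rather than on stdin."""
    return any("@@" in arg for arg in target_argv)
```

Calling `build_afl_command("bin/bash")` gives a stdin-fed job, while `build_afl_command("bin/gzip", "@@")` gives one where the target is handed a file path on every run.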
You can get started fuzzing non-x86 binaries, as well as black-box fuzzing anything, really; you can fuzz black-box x86 programs using this as well. It doesn't have to be non-x86. It's just designed to work around some of the more annoying things about fuzzing MIPS and ARM binaries on an x86 machine. At this point, I've only gone through about 10 gigs of those 300 gigs of firmware, so feel free to reach out to me; I'm happy to give that to anyone. If you want to start fuzzing them, finding some crashes, and doing some analysis on the crashes, that'd be cool. I'm probably going to leave this one running, but that's it for this demo.
So that's pretty much it for the presentation. To recap: AFL is a feedback-driven fuzzer. Every time it finds something interesting with a particular input to a program, it mutates that input and tries again. You definitely want dedicated hardware running at as much capacity as you can to be effective at fuzzing. What's great about using AFL and Fox and all this stuff is that, while there are companies like Google that are fuzzing constantly, and you can't compete with Google, fuzzing something like phone applications, which you can do with AFL, or fuzzing something like non-x86 binaries and firmware images, like what I've done here, can all be done with a relatively low barrier to entry. In fact, everything's free. The only thing you need is, like I said, dedicated hardware. And it is cool to find a bug in some of your favorite programs. That's pretty much it. Thank you very much.
So thank you so much, Ron, for this amazing talk. We have a few questions now that the audience would like to ask. The first question is: what's the difference between AFL and AFL++?
Okay, I can answer that. First, I realized I didn't say this at the end of my talk, but thank you, everyone, for giving me the opportunity to do this. I appreciate that, and sorry I forgot to add it to the video. As for the difference between AFL and AFL++: I think I mentioned it at one point in the talk. Michał Zalewski stopped contributing to AFL and stopped updating it. AFL++ is a fork of that, and that team is a group of people who are passionate about AFL and fuzzing in general. They've added a couple of commonly asked-for features. For example, network fuzzing with preeny used to be this big workaround with AFL: you had to download AFL, edit some of the code, and download another program called preeny to get it to work. AFL++ is a fork of AFL that has that built in automatically. Excuse me. They claim to have made some changes to the fuzzing algorithm, but it's not a noticeable difference; they said this was for performance reasons. I'm not even positive what all code they changed, but effectively it's the current active version of AFL. A major difference, if you're doing something like I mentioned in the video in a resource-constrained environment, is that AFL++ is much bigger than AFL. But nowadays that's typically not an issue; it's not so huge that you can't use it.
Alright, there's another question.
This person missed the initial part of the talk, but he has a question: where is the repo of official fuzzing inputs?
Sorry, I was reading the question. So when you download AFL, if you go into the AFL directory, there's a test cases directory. In there, there's a bunch of subdirectories that basically hold different kinds of files: zip files, image files, video format files, SQL files, text files. The script that played in the talk just looks into that test cases directory recursively and copies everything that's of type f, just a regular file, not a directory, into the in directory. So those are just for demonstration purposes. Those are the test cases given to you in the original AFL repo.
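The seed-collection step described above might look roughly like this in Python. The function name, the directory names, and the numeric prefix that keeps same-named files from overwriting each other are assumptions; the script shown in the talk may handle this differently.

```python
import shutil
from pathlib import Path


def collect_seeds(testcases_dir, in_dir):
    """Copy every regular file found under testcases_dir into a flat in_dir."""
    src = Path(testcases_dir)
    dst = Path(in_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(src.rglob("*")):
        # Regular files only: skip directories and symbolic links.
        if path.is_symlink() or not path.is_file():
            continue
        # Prefix with a counter so files sharing a name in different folders don't collide.
        shutil.copy2(path, dst / f"{copied:04d}_{path.name}")
        copied += 1
    return copied
```

For example, `collect_seeds("AFL/testcases", "in")` would fill the in directory with the stock seeds, assuming AFL was unpacked into a directory named AFL.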
I think I also mentioned near the end of the talk that it's always better to have a more targeted seed corpus whenever you're starting. But for this, to keep it simple, and at the end, where I'm fuzzing firmware images or firmware binaries that are really small, I'm just taking all the test cases AFL already gives you and throwing them at the binary.
We have another question. Did you watch the talk yesterday on binary recompilation? It seems like it could be used to add the instrumentation that the AFL compiler adds, but right into precompiled binaries.
Honestly, I didn't see that talk, which is a shame because it sounds really, really interesting.
So I'm not positive about that. I spoke with someone a while back about utilizing fuzzing for CTF stuff, for a CTF team we have, and they mentioned using symbolic execution to help with fuzzing. Looking at the question, I don't know if that is heading in the same direction. I'm not sure; I'll have to look more into that. I'm not positive, though.
All right, no worries about that.
Well, if the audience has any more questions, please send them across in the Matrix chat. For now, thank you, Ron, for this wonderful talk. It was really amazing. For our audience, please do check out our schedule. We have multiple workshops and villages running in parallel. We will be back in another 10 minutes for the next talk. Until then, stay safe. Thank you so much. Thank you.