A Decepticon and Autobot Walk Into a Bar: A *New* Python Tool for Enhanced OPSEC
11:54PM Aug 1, 2020
natural language processing
Welcome back, everyone, to Hackers On Planet Earth 2020. Welcome to day eight of this incredible conference. Glad to see everyone who tuned in from around the world, and we also thank all of our attendees. Our attendees are the only ones that can ask questions of our presenters, so please feel free to join us in the Matrix chat for that. As a reminder, we are still raising funds for the EFF; if you haven't donated yet, please do so through the link on the screen. And now it's time for our next talk. When we see the terms natural language processing or machine learning, often our guts are correct: frequently it is vendor marketing material, and more often than not, it contains FUD. Our speaker Joe Gray served for seven years as a submarine navigation electronics technician. Currently he is a senior OSINT specialist at Qomplx. Please welcome Joe Gray for his talk, A Decepticon and Autobot Walk Into a Bar: A New Python Tool for Enhanced OPSEC.
Thank you very much for the introduction. And I would like to echo that: you totally should donate to the EFF. They're a great organization, especially with the work they do with privacy. One of the first things I do whenever I get a new cell phone is I get their "come back with a warrant" sticker and slap it on the back of the phone. So you should totally donate, and totally get those stickers as well. So, this is A Decepticon and Autobot Walk Into a Bar; it is my new Python tool for enhanced OPSEC. The way I'm presenting this today is different than one might expect. I'm coming at this from the perspective of: you do OSINT, or you're doing something where you need to maintain a sock puppet account, and you don't really want to spend the time, or don't have the time (if you have the same problem I do), to manage those accounts. It's a way, initially starting only with Twitter, to be able to do that anonymously, so that you can have a profile that is not absolutely bare. So, about me: by day, I'm a senior OSINT specialist at Qomplx. Honestly, I'm not a Python guru. This was a labor of love in many ways; I learned a lot from it. This is actually the byproduct of me working through tutorials in different Python books and some machine learning and natural language processing books. The demos that were provided in the books are good, but I'm one of those people that needs to build something to really get my head wrapped around it. That being said, I'm incredibly interested in data science, specifically things like machine learning and natural language processing. I see what they can do for OSINT and social engineering, and I'm just trying to get my bearings with it; that's what this utility was built for. Also about me: I'm a frequent competitor in the Trace Labs missing persons OSINT Search Parties, and in the most recent competition, my team, the Password Inspection Agency, got second place. The next competition is one week from today, and we're hoping to get first. Fingers crossed, but we'll see how that pans out. And I'm very passionate about things related to OSINT and OPSEC; I see those two as the yin to the yang. So, just to be honest: I'm not a programmer or developer, not a mathematician or a data scientist, and honestly, I'm not an expert when it comes to trafficking and domestic violence, which relates to an alternative use of this tool; it can be used for someone to abandon their account. My degree is in IT with a focus in security. The highest math I have taken has been basic statistics and college algebra. I'm currently taking some other coursework, but I won't bore you with that. So before we get started, I want to provide a few definitions, just so that we all kind of understand, when I use a term, what it means. OSINT: that's information gathered, in an intelligence context, from public sources. Oftentimes, especially when we're talking about people OSINT, people spoon-feed it to us, and it's something that we really don't have to work too terribly hard for. In some cases, we have to draw conclusions and cross-reference data and what have you, but by and large, everything is from public sources. OPSEC: that's operations security. Basically, that's your ability to hide from, masquerade to, or confuse a potential adversary, based on what you are actually doing. So if you're on vacation in the Bahamas, posting a picture to your Twitter with some EXIF data saying you're in Seattle, or something to that effect, that certainly works.
And I do know that Twitter and other social media platforms will remove that EXIF data, but that's just an example. Decepticon itself is a term I've been using for the last couple of years in doing presentations about OPSEC through disinformation and deception. Machine learning: it's some vendor buzzword bingo, but at the same time, the official definition is that it's the study of the math and the algorithms used to improve automation. Data science is the amalgam of several disciplines that use processes, programs, and algorithms to gain insight from data; and when we're talking about insight, we're talking about things that aren't necessarily visible to the human eye. Artificial intelligence: that's some more buzzword bingo, but basically, that's the study of any intelligence sought to be shown by machines through programming, where we're trying to mimic what is called natural intelligence, which is basically what's demonstrated by humans or animals.
And a GPU, a graphics processing unit: basically, it's a specialized system within a computer system that accelerates the processing of images or video, but it is very commonly used in machine learning to speed up processing. For example, I'm running this on an NVIDIA Jetson Nano. The board itself has four gigs of DDR4, but the memory contained within the GPU is GDDR6, so it's quite a bit faster. So, in a general sense, the problem that I started with was that one of the leading ways adversaries find their way into the lives of their victims is social media. It's just like a hammer: is a hammer a tool or a weapon? Really, that depends on the intent of the person in control of the hammer. So in this case, this was my initial plan for the tool: I was going to create it, and we were going to use it for people to abandon their accounts. But as I started messing around with the code, I was like, you know, I could probably manage a few sock accounts with this as well, which would be helpful in my OSINT and social engineering endeavors, for the ability to gain access to groups to learn more, or to be able to browse things without worrying about my regular accounts showing up in "people they may know" or suggested people for them to follow, depending on how those algorithms are written. So when we talk about adversaries through the lens of the initial idea behind the tool, basically we were looking at malicious people seeking financial gain or to harm the victim. It could be abusers, traffickers, and (take a drink if you're playing the buzzword drinking game) nation states. It could also be political opponents, or, for someone with very weak OPSEC, someone who doesn't understand that just because Facebook asks you how you feel today, or what you ate today, does not mean that you need to publicly put a picture of your broccoli casserole on Facebook. So those become victims of opportunity, and the adversaries that just happen to stumble across that become adversaries of opportunity. So when we talk about victims, we've got public figures, victims of domestic abuse or trafficking, and, per the final example from the previous slide, people with poor or misguided OPSEC. A friend of mine from the Navy and I were having a discussion not too long ago, and he was like, "I don't have to worry about that, I use a VPN." I was like, "It can still be attributed to you." And he went to tell me which VPN he was using, and I was like, "Here's a link to an article about a data breach with said VPN." He was like, "Well, I also use a specific DNS provider." I don't remember what it was; it wasn't Quad9 or anything like that. And I was like, "Still not how it works, because that's just not the way the internet was designed." So people with good intentions can have misguided OPSEC. Some of the advice that I've heard lately, especially for abuse victims, is to just abandon the account. Well, that's not always feasible, and it can clue the adversary in: hey, this person's not using this account now, let me go check and see if there's another account that meets the criteria of the victim.
People have to use social media for work. Honestly, I have to use social media for work, but at the same time, I've created accounts solely for use at work; not everybody segregates things. For me personally, one of the maxims that I live by in terms of social media and my day job is: as long as I'm a colleague of someone's, I will not send them, or accept from them, a friend request. Some people, the first thing they do is, "Hey, I finished onboarding yesterday, let me send friend requests to everybody." That's not smart to do to begin with, but from an OPSEC perspective, it's definitely not smart to do. And honestly, why should someone live in fear? Why should your sock account have to live in fear of getting shut down because you don't have the time to manage it? I created a Twitter account not too long ago for something I'm trying to get off the ground, called OSINT News Monthly. I followed 16 other Twitter accounts. I never tweeted, I never sent a DM, nothing. And the account got suspended, presumably because it had the word "news" in it, honestly. But if that were a sock account, why should I have to worry about it getting shut down? Sometimes the people who use these accounts as sources (journalists, pen testers, OSINT or threat intelligence professionals) have to manage and, for lack of a better term, babysit those accounts to make sure that they stay safe. And then honestly, some victims are not aware that there is an adversary until it's too late. Dissecting this a little bit more: well, why don't we just block the person? Well, they can continue to cause trauma, because they can create fake accounts, or they can create an alternative account under their real name and leverage things like friends-of-friends. If the victim is using an alternative account, then the abuser could go through and report those accounts for being fake and get them taken down. That being said, I don't know how aggressive Facebook would be with that, because I've reported several accounts that appear to be involved with child trafficking, and they are still there. People can still reset passwords, especially if it was a long-term relationship, a relationship that was very intimate; the abuser could potentially know all of the password reset questions, which amplifies some of the wisdom that we're hearing in the OPSEC world now: totally lie about your password reset questions. Don't use your mother's maiden name, because honestly, I can just hop on FamilyTreeNow or TruePeopleSearch and get that. And from a corporate perspective, companies can still be exploited. As an OSINT investigator, I'm always skeptical of subjects that don't have a presence, more so than those who have some presence. I prefer to use the term "subjects" when talking about people and "targets" when talking about companies. That's just because Trace Labs calls them subjects, and I do quite a bit of stuff with that team in terms of training and competing and contributing to their guides; but at the same time, I thought about it, and I feel a little bit less slimy going through and searching through social media using the term "subject" as opposed to "target." But at the end of the day, why should we not have autonomy and agency over what's posted, with minimal effort? And honestly, that goes for a victim as well as a sock account. Because if we want to maintain a sock, for example, we want to create a sock account that has a specific political leaning.
We can go find some of the influential people that people of that political persuasion follow and insert their handles within this version of the Decepticon code. And at an interval, this code will go out, read their tweets, run them through the model, and post. That is pretty good autonomy with minimal effort. So here are links to the repos; I've got two different forks of the code there. If you do the clone from GitHub, basically what will happen is you'll get the entire Decepticon bot. The bot itself, in the tensor directory, is the one that will read from your account and tweet in your likeness. The one in the sock edition will tweet in the likeness of others. I do realize that I need to go in and remove the line where it appends the hashtag #Decepticon and mentions HOPE, the conference, so I'll be making that change on GitHub as soon as this presentation is over. But
you can very easily run with that. So, about the code itself: it's written in Python. I've got several versions that I'm working on. The version that's published is written with TensorFlow and, I hope it's pronounced "Keras" (could also be "Kehras," not entirely sure on that, but it's spelled K-E-R-A-S), using this specific NVIDIA port of TensorFlow, version 2.2.0. The other one uses PyTorch; I'm not discussing it in this presentation, and I will explain why a little bit later. In terms of organizing the data, I use the Python pandas package to use data frames. I don't use Jupyter Notebook, but if you do, a window will show it to you kind of like a spreadsheet, so it's just kind of easy in that regard for me; there are other ways to do it. Within it, I'm using a long short-term memory model to generate the text, and the Keras package is the vehicle for that long short-term memory model. So, what is a long short-term memory model? In short, it's a type of recurrent neural network. A recurrent neural network basically takes into account past decisions to influence the outcome of new decisions. It's like that bot a few years ago that was just reading tweets and sentiments from all over the place: people found out about it and started posting in those places, and the bot ended up becoming racist because of all the people posting racist stuff to it as a prank. A recurrent neural network is susceptible to that. But basically, they have the capability to remember previous things learned, and that's saved in a model file. What's beautiful about this is the vector size is not a fixed size, which makes processing text, speech, and images through LSTMs and RNNs very ideal. For this particular version of Decepticon (I have plans to write it for both Facebook and LinkedIn at later times; I just have to find the time to do so), the thing about the Twitter version is that the vector could be up to 200 tweets of up to 280 characters each. But whenever you start including things like mentions, hashtags, and links, which the code actually scrubs out, then it certainly changes the game quite a bit, because nothing's going to be of a fixed length. And if anyone's actually analyzing the account, they are going to make note if every tweet is precisely 280 characters. So with that, an LSTM is going to learn those order dependencies for sequence prediction. And, admittedly, tweets are a very small data set. The API has a limit of 200 tweets that you can pull down if you have rate-limit sleeping turned on, which the code does, but even 200 tweets is a very small data set for this, so the accuracy is not there yet. As I said before, LSTMs are very popular for natural language processing because of the sequence prediction and dependency analysis. Another piece of code that I'm also working on, a fork of this called Intercepticon, would attempt to identify bot behavior: it would look for the dependencies and errors within the grammar associated with poor translation, basically the equivalent of not being a native speaker and relying on Google Translate. So, how does the code work? We start with a set of Twitter API keys and the python-twitter package; that's going to allow us to authenticate to Twitter and pull down the tweets for the user.
By default, the tool is only going to read those of the owner, but it has been modified to read anyone's tweets, as long as the person or account tied to the API keys can read them.
So you can't go reading protected tweets. I haven't tested it for reading protected tweets of people whose tweets I can actually read; that would be an interesting case study as well, but nevertheless, it's a potential limitation. If you don't have the sleep-on-rate-limit enabled, you will pull down about 30 tweets at a time. If you have it enabled, you can pull down 200. If you have access to a larger API, like an enterprise API or the firehose API, then the 200, as far as I understand, goes out the window. From there, we capture the text and parse everything out: we parse the text of the tweet into one column of the data frame; we capture the time, convert it to epoch time, and save it to another column; and then we measure the lexical diversity, which is basically the ratio of the words used to the length of the text. It's not very important in this particular iteration; it will be more important in future iterations, as it will be used to make sure the generated text is within tolerance, but at the same time not too perfect. From there, we tokenize the tweets, which basically just means we separate them into words so that we can create our corpus; and a corpus, in the world of natural language processing, is just a large body of text used to produce something. I called it the bag of words; the variable within the code is BoW. We collect some more stats: some standard stats, frequency analysis in terms of the most commonly used hashtags, the most commonly used links, and the most common accounts mentioned, and then also the posting interval, which is very important, and I'll explain why in just a second. Then we move into the generate module, where we establish our vector sizes and sequences and measure the number of patterns. To accomplish this, we actually convert all characters to numbers; that's how the model actually works to establish the sequences and do the predictions. So it runs through pre-modeling, which is going to build the model: it's going to create the model file and set the X and Y values. From there, we move into the trainer module, which creates a pseudo-random seed based off of the information contained in the corpus, and from that, the tweet creator actually executes the model and assesses the predictions. I've limited it to 40 words; anything more than that tends to go over 280 characters, incorporating spaces and all of that as well. We have a logic check next, and that's where we verify the 40 words: we remove any links and any characters known to occur erroneously. If you have been following along on my C_3PJoe account, or the DC865_Owl account where I've moved the tweets to, you probably saw a few tweets that were nothing but apostrophes and commas, and then there were a few that included brackets; I've written logic in there now to remove them. And then, after that, it tweets. As for the posting interval (it's not mentioned on this slide), the posting interval is basically a statistical measurement of the epoch times, and that is what determines how long the code will wait before it acts again. So I've got a demonstration on the next slides, but we'll take a look at the code first, so we won't go through it too fast.
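To make that collection step concrete, here is a minimal sketch, not the repo's actual code; the api_keys module, file names, and column names are assumptions standing in for whatever Decepticon actually uses:

```python
import re
import twitter      # the python-twitter package
import pandas as pd

# Hypothetical credentials module; the real tool keeps keys in a separate file.
from api_keys import CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET

api = twitter.Api(
    consumer_key=CONSUMER_KEY,
    consumer_secret=CONSUMER_SECRET,
    access_token_key=ACCESS_KEY,
    access_token_secret=ACCESS_SECRET,
    sleep_on_rate_limit=True,   # lets us pull the full 200-tweet API cap
)

def lexical_diversity(text):
    """Ratio of unique words to total words."""
    words = text.split()
    return len(set(words)) / len(words) if words else 0.0

rows, hashtags, mentions = [], [], []
for status in api.GetUserTimeline(screen_name="some_user", count=200):
    hashtags += re.findall(r"#\w+", status.text)   # for the frequency stats
    mentions += re.findall(r"@\w+", status.text)
    rows.append({
        "text": status.text,
        "epoch": status.created_at_in_seconds,     # tweet time as epoch seconds
        "lex_div": lexical_diversity(status.text),
    })

df = pd.DataFrame(rows).drop_duplicates(subset="text")
df.to_csv("tweets.csv", index=False)

# Tokenize the tweets into the "bag of words" corpus.
bow = [word for text in df["text"] for word in text.split()]
```

The gaps between those epoch values are what later drive the posting interval.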
But a big thanks to all the people who helped me out with either figuring out some of the Python stuff or some of the logic stuff, giving me insight as to how things worked, as well as the people we're about to see defined as users. For this part, let me go to the bottom first, before we start defining modules. With this, we get the stop words; that's not used in this particular iteration, but it could be later. For the directory, we just determine what directory we're in, and we check for the existence of files; that's where we're going to save the model files, and that's also where we're going to save the CSV that is the output of the data frame. So basically, at one point, we are going to write the data frame to a CSV and save it there.
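A rough sketch of that bottom-of-file setup, with the file and variable names assumed:

```python
import os
from nltk.corpus import stopwords

# Pulled in for later iterations; not actually used in this version.
stop_words = set(stopwords.words("english"))

# Work out of a files/ directory; the model files and the CSV dump of the
# data frame both land here.
files_dir = os.path.join(os.getcwd(), "files")
os.makedirs(files_dir, exist_ok=True)
csv_path = os.path.join(files_dir, "tweets.csv")
csv_exists = os.path.exists(csv_path)   # decides whether we read or rebuild it
```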
There we check for the presence of a GPU; if there is a GPU, then we set the memory growth. And then from here we have the API defined; I've got a separate file where we put the API keys in, and that's handled in the imports. We define the model as Sequential; that's for the LSTM, and we'll get to that a little bit later. And then for the results, we get the user timeline; it's just "get timeline" if it's your own account. Then, going back to the top: first we log in, I set the global variable for the data frame, and we move into the directory (that's just to make sure that we are in the files directory), and then I set users as a global variable. In this case, I don't have the accounts listed here on GitHub, but the accounts that I referenced, with consent of the account owners, were Dredjack, TheWahba, myself, Nostia, Curio, and Wondersmith Rae. Then we check to see if the CSV exists. If it doesn't, then we define the data frame and move into tokenization; if it does, then we read it, and then we move into tokenization, which is here. BoH, BoM, BoL: bag of hashtags, bag of mentions, bag of links. I'm really not creative when it comes to variable names. So we just get the user timeline; this is "for u in users," so you have to have something defined above for this to work. It pulls down the tweets, does a few regular expressions to pull out hashtags and mentions, and adds them to the bags, which will append them to the lists above. Then from there we move into the hopper and do the same thing again, but this time we measure epoch time and lexical diversity, substituting out mentions and hashtags, and we append everything to the data frame. Once that for loop is done, we drop duplicates, because if we don't, that file size is going to get unmanageable very quickly. Then we move into the sorter. The sorter basically is just going to convert all of the rows containing tweets into a list; we tokenize them, and from there we append it to the bag of words and write the data frame to the CSV. We do our stats on it, which, here, as you can see: we're doing lexical diversity, we get the mean of lexical diversity, we get the standard deviation of lexical diversity and of the times, and we determine the post interval. Then from there, we do a little pop-up on the screen about that. We do the frequency analysis, which is just showing the hashtags, links, and mentions. And from there we move to generate. Here we define characters; this is where we convert characters to numbers, and we determine the input length, the length of the characters, which it spits out on the screen. From here, this is just defining a few things before we get started, and we move into pre-modeling. Within pre-modeling, this is creating the model files, as a JSON file and an HDF5 (.h5) file. We compile using categorical cross-entropy and RMSprop for the optimization. Then it goes through it, with the GPU if it exists, and we run it for seven epochs and a small batch size. I've tinkered with the size of the LSTM and the batch size to try to prevent out-of-memory errors, but they definitely happen. Then from here, if not, we move into modeling. Here we define an LSTM of 160 units, with a 60% dropout, returning the sequences; we just add that a few times, activation softmax, and compile it as we saw above, and there's the file path for it.
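Piecing the generate and pre-modeling steps together, here is a minimal character-level LSTM sketch in the spirit of what's described; the sequence length, file names, and data wiring are assumptions, not the repo's:

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.utils import to_categorical

# Use GPU memory growth if a GPU is present, as the tool does on the Jetson.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

df = pd.read_csv("tweets.csv")                 # output of the collection step
corpus = " ".join(df["text"].astype(str))

# Convert all characters to numbers, as described above.
chars = sorted(set(corpus))
char_to_int = {c: i for i, c in enumerate(chars)}

SEQ_LEN = 100                                  # assumed window size
X, y = [], []
for i in range(len(corpus) - SEQ_LEN):
    X.append([char_to_int[c] for c in corpus[i:i + SEQ_LEN]])
    y.append(char_to_int[corpus[i + SEQ_LEN]])

X = np.reshape(X, (len(X), SEQ_LEN, 1)) / float(len(chars))  # normalize 0-1
y = to_categorical(y)

# An LSTM of 160 units with 60% dropout, stacked, softmax on the output.
model = Sequential([
    LSTM(160, input_shape=(X.shape[1], X.shape[2]), return_sequences=True),
    Dropout(0.6),
    LSTM(160),
    Dropout(0.6),
    Dense(y.shape[1], activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```

Dividing the integer-encoded input by the vocabulary size is the usual char-LSTM normalization trick; the repo may scale differently.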
The checkpoints are going to use that file path, monitoring the loss with verbosity and saving only the best. And then here's where we actually kick it off again: X and Y as we saw above, with seven epochs, same thing there.
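Continuing the sketch, the checkpointing and training step might look like this, with the file names and batch size assumed:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Save the architecture as JSON; the weights go to the .h5 file.
with open("model.json", "w") as f:
    f.write(model.to_json())

checkpoint = ModelCheckpoint(
    "weights.h5",
    monitor="loss",
    verbose=1,
    save_best_only=True,    # keep only the best weights by training loss
)

# Seven epochs and a small batch size, tuned to dodge out-of-memory errors
# on the Jetson Nano's limited RAM.
model.fit(X, y, epochs=7, batch_size=64, callbacks=[checkpoint])
```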
We will load the weights from the file name; that's going to be the .h5 file. Everything else here is the same, and it goes into the trainer. From here, we're going to have the random seed, just using NumPy's random with a random interval here, and then the tweet creator. We move all the characters there from the bag of words, convert them back to characters from numbers, and create the tweet; it's just going to run through this based on output from the LSTM. We go into cleanup, which is removing links and erroneous characters; as you can see here, mentions and hashtags. These are all the things that I observed. And then right here, post update: that's where it's actually going to post. And then the repeater, right here, uses the post interval, and I set a random interval between zero and 480 seconds to subtract from the post interval, just so it's not perfect every time, even though every time you tweet, the post interval is going to change to a degree anyway. And then for subsequent runs, it just reloads everything else and moves back into tokenization. So that's the code. Here is the actual demonstration of the code; I'm going to speed this up a little bit from time to time. So the code's just kicking off: it's connected to Twitter, retrieving the tweets. There we see tokenization, lexical diversity, and there's our post interval. Here's the frequency analysis. The accounts this one's looking at have been using hashtags like DEFCON, memories, EFF30, OSINT, OSINT Search Party, OSINT for the planet, Decepticon, and so forth and so on. It's been mentioning Kyle Bubp, Nostia, InfoSystir, Cory Doctorow, and so on. Then we see the number of characters and the vocabulary. So this is just everything running through in the pre-modeling. For this version, to be able to get a solid recording, I turned the epochs down from seven to three. You can use however many you want; the quicker you want it to pump tweets out, the fewer epochs you probably want to use, given the data set size. I did edit the time in between: each epoch takes between 350 and 450 seconds right now, so right here we see where it jumps really quickly. Within the demonstration itself, we have about another minute or so. As we can see here, the ETA says six minutes and it goes down to 10 seconds; that's just an edit in the interest of time.
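A minimal sketch of that trainer, cleanup, and posting flow, continuing from the snippets above; the regexes, the interval math, and the word cap are assumptions:

```python
import re
import time
import numpy as np

model.load_weights("weights.h5")
int_to_char = {i: c for c, i in char_to_int.items()}

# Pseudo-random seed: a random window taken from the corpus itself.
start = np.random.randint(0, len(corpus) - SEQ_LEN - 1)
pattern = [char_to_int[c] for c in corpus[start:start + SEQ_LEN]]

# Generate character by character, capped at roughly 40 words / 280 chars.
generated = ""
while len(generated) < 280 and len(generated.split()) < 40:
    x = np.reshape(pattern, (1, len(pattern), 1)) / float(len(chars))
    idx = int(np.argmax(model.predict(x, verbose=0)))
    generated += int_to_char[idx]
    pattern = pattern[1:] + [idx]

# Logic check: strip links, mentions, hashtags, and brackets.
tweet = re.sub(r"https?://\S+|@\w+|#\w+|[\[\]]", "", generated).strip()

api.PostUpdate(tweet[:280])

# Wait out the measured post interval, minus up to 480 seconds of jitter.
post_interval = df["epoch"].sort_values().diff().mean()
time.sleep(max(0, post_interval - np.random.randint(0, 480)))
```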
So as this finishes
We have the random seed created, creating the tweet.
And there is the tweet: "Oh, I needed to do GP update before they went to bed." I have a pretty good feeling as to which account said that, but anyway. Nevertheless, this is just reading from those accounts. Is it perfect? No. Is it accurate enough? Quite possibly. So then, from here, everything I'm showing you is off of jtop. This is before the epoch starts; we see a high memory utilization. In some cases, it ate up a lot of memory, so as you can see here, I've got 50 gigs of swap, and I'll show you, once we get to talking about the platform, how I accomplished that as well. But as we see here, everything's growing. The GPU is starting to rev up
And there goes the GPU. So at this point, I can tell you it is in an epoch.
So for about six minutes, everything's just going to tax that GPU very hard. We're not going to watch it for six minutes. But then from there, just looking at other stuff within this: oh, that was showing that it was going through an epoch.
This is just looking.
Next, we're going to look at the GPU stat usage.
That's the CPUs, all four of them; it's got a quad-core ARM.
Apparently my video froze right here as I was exporting it. There's really not much else to see; it's just showing data about the fan and what have you, so we're not missing the meat and potatoes of this. So, basically, the project started around December of last year. I ran into a lot of shiny objects, a lot of squirrels, a lot of distractions and delays. I had written a few things before, like WikiLeaker, the Recon-ng module, and I've been reading about this stuff, as I said before, and that's what kind of triggered it. The project stalled pretty well around March, but then I joined Bryan Brake's SeaSec East group to give a talk, and one of the other presenters was talking about the Jetson platform. I looked into it and was like, hey, this is cool. So I ported the code over, because before, I was doing it on a VM with three cores, eight gigs of RAM, and just a standard Radeon GPU on a MacBook Pro, and getting through one epoch took about an hour. This is what the Jetson Nano platform looks like. The pictures on the right are the bare-bones platform; on the left, that is with a T300 card attached to it using a USB bridge. On the bottom, you'll see that there's a solid state drive connected to it as well. I've got the main operating system of the Jetson running on a 256-gig microSD card on the mainboard, and right now I think I have my one-terabyte solid state drive on the bottom, so that works out for utilizing swap or anything else as necessary. But yeah, it's definitely efficient. Basically, the specs on it: it has a quad-core ARM processor with Tegra support, and the GPU has 128 CUDA cores.
This is the same case that I use; I don't have the Wi-Fi antennas or the camera attached to it. All in all, I have about $250 wrapped up in mine, but the Jetson platform itself is only about $100. Why I went with it, very simply: it was external to the regular host, and it had the GPU with the CUDA cores, so it was able to expedite that analysis. And then someone had mentioned it to me, so I decided to test it. So, lessons learned. Data science is not a walk in the park, especially if you lack that heavy math background. PyTorch is faster than TensorFlow but, as far as I have attempted thus far, it is less reliable in output. I'm sure it is an operator-error issue, but it's definitely something that I wasn't able to get working yet. That being said, if you attempt to play with this, you might have far better luck with PyTorch. I do plan to eventually port it over to PyTorch in subsequent versions, but I'm just not there yet. Keep in mind that if you test this on your regular account, be prepared for a few things: be prepared to lose followers, and be prepared for people to point out, "Hey, this is annoying, or obnoxious, borderline spammy." And I hold no ill will against the people who sent me those DMs; I actually appreciate them coming to me as opposed to just blocking me, unfollowing me, or reporting me. So keep that part in mind. If you don't tweet very frequently, you're going to have to edit the code to
adjust the posting interval, which is what I've done for this.
That, and most of my tweets have been "hey, I'm offering an OSINT course on such-and-such date," so my bot started offering courses from the past, as well as picking up on patterns and offering courses in the future with nonexistent dates. I also ran out of time, so to give it a data set to work off of, I tweeted, in 280-character chunks, some poems (Jabberwocky, The Raven, Still I Rise, We Wear the Mask), but then I also, out of jest, did Smash Mouth's All Star and a Rick Roll. When you're dealing with this kind of stuff, prepare to do a lot of hyperparameter tuning and a lot of A/B testing. And as I stated before, Twitter tweets, at around 280 characters each, are a very small data set for an LSTM. As of right now, the code's written to make use of GPUs and CUDA cores; if you do it on a server or a VM, it can be very time-consuming, and leasing GPU processing in the cloud is also very expensive. And again, the model is dependent upon what your account has posted, or what the accounts that you're pointing it at have posted. If you or the subjects of those accounts have wiped those accounts, or it's a fresh account, like for a sock, you're going to need to get something posted for it to work. So you could write some of your own stuff, or just immediately jump in and edit the users line (I think it's line 29) to take a look at a few accounts. I don't recommend doing more than about three to six accounts, just because of the size of the file; it can get tricky really quick. That being said, I'm going to open it up for questions.
We're back with Joe Gray. That was a really fascinating talk.
Sure. Can you hear me okay? Yes, I can hear you just fine.
Right. So we do have some questions from the Matrix chat. Again, the only way you can ask a question of any of our presenters at HOPE 2020 is by being in the livestream chat channel on our Matrix instance. So, first question: you mentioned the Trace Labs OSINT search parties. Our questioner has a general mistrust of the police, but these events do sound fun. Do you know more about how they partner with law enforcement?
I am not a staffer with Trace Labs, so I can't provide an official answer. My understanding is that it's very similar to a call for papers, kind of like a call for subjects. I will say that the bulletin provided to work off of typically comes from NamUs or the country's equivalent, something like NamUs or the National Center for Missing and Exploited Children, something to that effect; basically law enforcement themselves. Law enforcement may compete, but Trace Labs acts as a conduit in between the two, because as all the information is submitted, the judges and staff take some time to clean up the data, deduplicate the data, and assemble it. That's when you see their stats posted about the numbers of pieces of intelligence. Then it's handed off to law enforcement. To my knowledge, there's no way law enforcement is provided a name to be able to come back to, and I'm sure if something like that were to happen, law enforcement would go to Trace Labs, and Trace Labs would ask for consent from the user. But I don't know of any situations where that's been the case. Again, I'm not a Trace Labs staffer, so I can't speak officially on the topic. Okay.
Have you considered adding functionality to your bots that could handle the generation of seemingly convincing images, based on media the user has already posted?
I have considered that; it's something I do want to play with, for both audio and imagery. I've been looking at some stuff put out by, I think it's the University of Chicago, I don't recall. It's called Fawkes. Yes, I've got it pulled up: it is out of the University of Chicago, and it's called Fawkes, as in Guy Fawkes. It's a cloaking tool, so I've been looking at that from the OSINT perspective, with things like AI-created images.
It certainly is within the realm of possibility. The next three iterations, which I've kind of already started drawing the diagrams for, include the capability of responding to DMs, replying to mentions, and retweeting as well. Yeah, I see a question: what was the daughter card on the Jetson, and what interface? I'm gonna get to that.
Yes. My apologies.
No worries. The other card is called a T300. So if you look, if you click that Amazon idea list right there, it has everything that you need for the entire getup, including the Jetson, the T300 card, the solid state drive, the microSD card, everything you need. Except, actually, I may not have put the Wi-Fi antennas on there; I don't have them.
Well, I mean, this talk has convinced me to finally go get a Jetson of my own to work with for some projects. I'm really impressed by that.
If anyone from NVIDIA is listening, shoot me an email to send royalties.
Now, you were talking also about the lack of a math background for data science and machine learning. Do you have any recommendations for someone who wants to get up to speed on data science and machine learning, resources for individuals that are just starting out in these fields?
Absolutely. I'm turning around to look at my bookcase; it can't be seen even if I had the camera on, because I have a green screen. That being said, No Starch has a natural language processing book out right now; Mining Social Media is good as well; Neural Network Projects with Python is another good one. It doesn't really teach you how to do things, but Weapons of Math Destruction is a good book as well.
Shameless... sorry. Oh, go ahead. I was about to say, yeah, a quick shameless shout
out to our friends at No Starch Press. They do have a discount just for conference attendees; look up the wiki for more information on that.
On the other topic of shameless plugs: order my book, due out via No Starch Press on October 13. It is not available on the No Starch website yet, but I have a short link, preorder dot seosint dot xyz; it's nothing but a redirect link that'll take you to the Amazon page. If you're not comfortable with that, just search for Practical Social Engineering on Amazon. One third of the book is OSINT.
Awesome. Now, to speak back to the question about the tooling that you used: did you consider pandas or other higher-level data science tools, rather than trying to work directly with PyTorch or TensorFlow?
I didn't, because, being a relative noob with Python, pandas was what I was comfortable with. One of the books that I had been working through was a hardcore advocate of using pandas, so I just stuck with what I knew for now. That being said, if this blows up into something that requires a lot more love, I would be more open to using something else as well. The preorder link, I will put it in the chat. It is HTTP, but that's because it's a redirect domain.
And so again, we're trying to pull up that last page there so we can show the GitHub repo. Okay,
the GitHub repo, right? Yeah, that one right there. Go for it. And then I've actually got a coupon code for something that takes place in one hour and 15 minutes, if anyone is interested. Depending on where you are in the world, I'm giving six hours of OSINT training starting at
10 PM Eastern time.
I failed to update my slides, so: it's 10 PM Eastern time. It's going to be six hours of people OSINT, and it is directly aligned to the missing persons CTF. If you want to do it, use coupon code HOPE1337 for 25% off.
All right, so
One last question here. Um, in your experiments, how many followers did you find that you lost, or possibly even gained?
I gained quite a few. Lost, I would say it could be as many as, like, 200.
The one thing I didn't think about doing in the very beginning was to pull down every single follower and do, like, a diff of it. If anyone else attempts that, I would be interested to know their statistics. Given the number of followers I have, unless I'm on the mobile app, I can't see an exact number; I only see, like, 11.2K or 11.3K if I'm on the web, like on a browser. So that made it a lot harder, because I was spending a lot less time accessing it via the mobile app. Had I pulled every follower down at the beginning, I could have done some pretty good A/B testing in terms of who I lost and who I gained. Oh, actually, we've got like two more minutes. I didn't even show you this: here's actually the account that I've been posting to, and this is what it's been tweeting as of late. So, cool...
Right, here's a Rick Roll. And there's some stuff about...
And there's another Rick Roll. Yeah,
folks in the chat noted that this seems kind of like epic drunk posting. That's kind of
what it is. Because, like I said, my post interval starting out was about 12 days. So I was like, I can edit the post interval, or I can post a bunch of stuff. Tweeting All Star, The Raven, the Rick Roll, all that in 280-character chunks: doing that, I mean, it drove the interval down, but it didn't drive it down far enough, so I ended up having to modify the code anyway. So yeah, it's definitely interesting. Okay,
well, Joe Gray, thank you for joining us at HOPE 2020. I can't wait to see you when we can next meet up again in person. Sounds like a plan,
and do know that I will be toting around that quart of moonshine I threatened you all with before we came on the air. Oh
gods. All right, that sounds like a challenge.
All right, Ground Control, take it away.