Stop Botting My Baby: How to Protect Your New Streaming Platform from Malicious Automation
12:53PM Aug 2, 2020
Hi friends, welcome to day nine of HOPE 2020. We are nearing our fundraising goal of $15,000 for the EFF. Friends, please support and spread the word. We also have a small announcement: our lightning talks will begin today at 1500 hours. Watch the HOPE announcements channel for more information and links. Our first topic for today is how to protect your new streaming platform from malicious automation, by Randy Zelinsky. He's an application security engineer at WarnerMedia.
This talk is titled Stop Botting My Baby; technically, it's about protecting a streaming platform from malicious automation. A quick slide about me: I'm an application security engineer for WarnerMedia, sitting on our direct-to-consumer team, and therefore responsible for HBO Max. We'll cover what that is on the next slide. Working on the launch of HBO Max recently forms the basis of this talk, should you be wondering if you're in the right place. We're going to discuss application threats that popped up at launch, specifically what seems to be the state of the art for malicious automation, and go over what you're working against should you be writing your own bots of any kind.

Earlier in life, I worked for a couple of security consulting shops, mostly fantastic and different from my current work. The word consulting can be kind of a fluffy term, but that experience was worthwhile for seeing a variety of different programs at different places and different maturities. This talk's title, Stop Botting My Baby, derives from a phrase I heard in consulting, where going into different software shops equated to calling their babies ugly. Hopefully that's funny when you first hear it, but it's not something you really understand until you're no longer a security mercenary and are now an in-house person, sensitive about your own brand and your own apps. They do become your baby, to protect from the harsh world of computer attacks, Reddit comments, and whatever else. I've also spent some time on the offensive side, botting. Not super recently, but if there's a website where you can play Flash games against people for money, that's probably a place I've been banned from. In another life I was a full-time programmer, after studying at St. John's University, notable as the intended physical home of HOPE 2020. I'm looking forward to maybe HOPE 2022 there. You can find me on the internet by just my last name; however, I'm not a fan of social media, so I mean on places like GitHub.
Anyway, without further ado, we'll get started.
Working on HBO Max every day, I have an abnormal level of intimacy with the platform, but friends and family still ask me: what is HBO Max? Understandably, there's been some public confusion. The short answer is to say we're a streaming service, not unlike the other big names in this space. The more veteran answer is that HBO Max is a gateway to all the content under WarnerMedia. That's a media organization that includes Warner Brothers, Turner (with brands like TNT and CNN), Crunchyroll, HBO, and Cinemax. It's amazing how much content that really is, from Turner Classic Movies to Game of Thrones to the impending Justice League Snyder cut. We probably all know HBO, the original brand, and because that stands for Home Box Office, it makes sense to me why that's the namesake of our direct-to-consumer offering. Notably, it's a recent offering: we launched on May 27 of this year, which was a week after I joined the team. I had never worked a big public launch like this; it felt way different from pentesting banks or utilities. Behind it all is AT&T, ultimately, then WarnerMedia, and they've been very public about how HBO Max is a major investment for them. There was certainly upside to AT&T's investment, kind of a well-funded-startup environment, but for launch, it created a lot of pressure.

Before getting into specifics or war stories, it seems helpful to do a rough threat model and ask the question of who might attack us. We'll scope this to HBO Max the application. As a wider company, we might be subject to more sophisticated or well-funded threats, hypothetically state-sponsored ones like Sony Pictures saw in 2014. But for HBO Max, we can imagine our threats to be motivated mainly by money, and the main threat is credential stuffers, people that are generally attacking our login. It costs $14.99 per month to subscribe directly with us, though there are many folks who get access for free, like through an AT&T plan.
But that direct monthly cost is something we were criticized for, being high relative to competitors. So if you make your living doing sketchy computer stuff and can identify valid credentials for something worth almost $15 a month, that's an easy sell for a couple of dollars at least. You can sell those on various forums, or if you're really enterprising, just set up an automated shop yourself. Naturally, these actors aren't going to try typing in different credential sets by hand, so it makes a lot of sense to send automation at us. And as a new streaming service, there's a bigger, or at least fresher, target on our backs than on our competitors'. Separate but related to credential stuffing would be automation that targets something like a gift card endpoint, or anything where a string correlates to stored value. That doesn't exist in the scope of the HBO Max application; as a wider organization, we had legacy functionality that may have been applicable within a certain timeframe, but I'll ignore that for the sake of our discussion. Plus, if you generate gift card codes the right way, there should be a pretty high computational difficulty to guessing them, compared to just finding some already-breached credentials. Also, for the sake of completeness, it seems that somebody could try to take down HBO Max just to hurt AT&T's overall stock; you could short it and profit off a predictable dip in the price. However, this would suggest some critical vulnerability in the app, or something more related to cloud security, which puts it out of scope.
After thinking about our threats, and having acquired some tools as a modern cloud-based tech outfit, we can take a closer look at the moving parts of HBO Max and what's going on in the normal user path. If you come unauthenticated to our web app, you'll basically end up taking one of two paths. One, you might create a new account, usually also passing us your credit card information. Two, you log in, either with a native HBO Max account or through some other provider, like a traditional TV service. Before doing either of those, you might browse around our content or our help articles, but ultimately you're going to be pushed to do one of those two things. Or, of course, you might go to a torrent site; that's outside the scope of my job. So those user paths boil down into just three API calls: /api/login, /api/account, and /payment/methods. The first two, login and account, are on one service; payment methods is actually on another. As is common now, we have constant monitoring of our response code ratio for most services, or what you might just call the error rate. By far, we're either seeing statuses that are 200-something for good, or 400-something for errors, either related to the format of your request or the actual card information, like a payment declined. So this error rate is a quick way to find out if there's something going on, usually something that's louder in nature. Beyond that sort of ratio barometer, there's detailed logging taking place, but if you're under attack, it's harder to dig into that quickly. I think anyone who's acted as an analyst with an enterprise log searching program can probably empathize with that; perhaps those machines are never given enough power to do things quickly.
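[Editor's note: the response-code-ratio monitoring described above might look something like this minimal sketch. The endpoint names mirror the talk; the thresholds and class names are illustrative, not HBO Max's actual tooling.]

```python
from collections import Counter, defaultdict


class ErrorRateMonitor:
    """Track the ratio of 4xx responses per endpoint and flag spikes.

    threshold and min_requests are illustrative values, not anything
    from the talk; min_requests avoids alerting on tiny samples.
    """

    def __init__(self, threshold=0.20, min_requests=100):
        self.threshold = threshold
        self.min_requests = min_requests
        self.counts = defaultdict(Counter)  # endpoint -> {"ok": n, "err": n}

    def record(self, endpoint, status):
        bucket = "err" if 400 <= status < 500 else "ok"
        self.counts[endpoint][bucket] += 1

    def alerts(self):
        """Return endpoints whose error ratio exceeds the threshold."""
        flagged = []
        for endpoint, c in self.counts.items():
            total = c["ok"] + c["err"]
            if total >= self.min_requests and c["err"] / total > self.threshold:
                flagged.append(endpoint)
        return flagged
```

The important design point, per the talk, is that the ratio is tracked per endpoint, not only per host or per service.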
Something else to keep in mind is that even though consumers might spend more time with our platform than, say, online banking, the session management is much more liberal. People don't need to log in as much, which means there's less logging of them logging in, and that can make the way we respond to things different from one of those other types of sites under attack. Logistics-wise, we launched on Wednesday, May 27, Eastern time. All of our apps became available in the various app stores early that morning; the web app was opened more widely, and that was the only switch that was flipped there. For my team, we had a rotating schedule to ensure that, throughout the first week post-launch, there was a human operator manning the trenches, so to speak. All the other supporting teams around HBO Max had something similar in place, and some of our vendors have their own SOCs, so we'd be in persistent chat rooms with their people and whoever else wanted or needed to be in the security room. Basically, everybody was at peak vigilance. What were we looking at during launch week? Really, we were looking at all those tools described before, and I'll now break down in more detail what that was.
So you might start off with your client-based telemetry detecting some bots and keeping them away. That's clearly not a perfect science, and I see it as a cat-and-mouse game; not that most of this stuff isn't. You've got a product in place along the lines of Google reCAPTCHA, but it's just not going to be a silver bullet for every type of malicious automation. So you're also paying attention to things like error rates for individual services or endpoints. As pointed out before, /api/login was, and is, our big endpoint, and it's also part of a larger service; it's not just by itself. One trick we saw during launch week is someone would come in, and for every time they hit /api/login, they'd also hit this other endpoint on the same service. It's not hard to discern they're part of the same service, since they're just at a single host. And this other endpoint is one that will return a 200 status code virtually all the time. You can kind of work out the rest for yourself: the idea being that our error rate alarms won't go off if they're service-wide or host-wide, because even if every single request to the login endpoint is sending back a 400-something status, you're getting 200 statuses from the alternate endpoint. So that enables you as an attacker to keep control over the ratio of the responses coming back, and therefore the error rate, and thus not trip the alarms that we have.
So yeah, that seems kind of smart.
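[Editor's note: the dilution trick just described is easy to see with a little arithmetic. This sketch is illustrative; the counts and the 10% figure are made-up numbers, not observed traffic.]

```python
def error_rates(login_total, login_errors, decoy_total):
    """Compare host-wide vs per-endpoint error ratios.

    The decoy endpoint always returns 200, so padding with decoy
    traffic drags the host-wide error ratio down while the login
    endpoint itself can sit at 100% errors.
    """
    host_wide = login_errors / (login_total + decoy_total)
    per_endpoint = login_errors / login_total
    return host_wide, per_endpoint


# 1,000 failed logins padded with 9,000 hits to the always-200 endpoint:
host, login = error_rates(1_000, 1_000, 9_000)
# host-wide ratio is 0.10, likely under a host-wide alarm threshold,
# while the login endpoint by itself is at 1.00.
```

This is exactly why the alarms need to be scoped per endpoint rather than per host, as the talk goes on to say.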
When we saw this behavior, they were just slamming us with traffic too. So even if we were limited to host-wide error visualization and nothing more specific, we'd have seen something just by the count of requests. I guess over a long period of time, as you're looking at these tools and kind of staring into the abyss, very normal patterns emerge in the traffic for certain times of day and certain days of the week; it basically comes in waves. So if you throw that off with enough requests at any given time, especially at a slow time of day, that's just going to be obvious, even if you're not affecting the error rate in any way. So, beyond that, what do you do if you sense something like that? Let's say an attacker has fairly successfully tested a ton of credential sets on you, and other measures have thus far failed: client-side bot prevention has failed for whatever reason.
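[Editor's note: one common way to catch the "too much traffic for this time of day" signal described above is a per-hour z-score against historical baselines. This is a generic sketch, not HBO Max's actual detection; the threshold and data shapes are illustrative.]

```python
from statistics import mean, stdev


def volume_anomalies(hourly_counts, history, z_threshold=3.0):
    """Flag hours whose request count deviates from the historical
    mean for that hour of day by more than z_threshold sigmas.

    hourly_counts: today's 24 hourly request counts.
    history: list of prior days, each a list of 24 hourly counts.
    """
    flagged = []
    for hour, count in enumerate(hourly_counts):
        samples = [day[hour] for day in history]
        mu, sigma = mean(samples), stdev(samples)
        if sigma > 0 and abs(count - mu) / sigma > z_threshold:
            flagged.append(hour)
    return flagged
```

Comparing each hour against the same hour on previous days is what lets a burst at a normally quiet 4 a.m. stand out, even when the absolute count would be unremarkable at peak time.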
You start looking for a common thread. Some of these attributes that can be threads, as I call them, hadn't even occurred to me before getting into this position. As alluded to before, I've written some scrapers and some bots against unofficial APIs myself in the past, but probably not for any sites that have invested as much in security as WarnerMedia has. Obvious common attributes for an attacker might be broken out like this: where they come from, what they look like, and how they behave. Indeed, this is how you might get identified as a single actor, or a probable single actor, and then subsequently blocked, because if you can identify these common threads, you can block somebody without any modern magic. Mainly, this is stuff that's noticeable from the perimeter, and blocking at the perimeter is straightforward. If you find common threads that aren't obvious unless you get into that internal logging, that's going to be a lot more complicated to put a stop to, and you can really stare at this with your log ingestion tools if you're up to it. So first, where does an attacker come from? Probably the most basic thing under this category is identifying someone by an IP address. You might see an attack that sends millions of requests from a single IP in a relatively short timeframe, like a couple of hours. Of course, it's relative to the amount of traffic that your application gets, but most of the time that's going to be obvious and unusual. So you might say, well, just block that IP. But that tends to unfold in a whack-a-mole-like pattern, where someone's programmed their scripts in such a way that upon detecting a block, which is usually obvious (they'll start getting 403s, or requests will just start timing out, or get really weird), they'll scramble to switch to another IP. That's easy on any modern cloud platform.
Things get more meaningful at the perimeter when you start looking at a wider swath of network, either subnets or ASNs, autonomous system numbers. If you're unfamiliar with those: basically, an ASN is assigned by the various powers that be to an internet service provider, a hosting provider, or a server colocation place. Any place that has a lot of network traffic is eventually going to have an ASN. The ASN is separate from, and oftentimes more useful than, going by some subnet you derive from a bunch of IPs you sample; the ASN is just going to be more helpful and wider. For both of those, especially for ASNs, you can go searching to find out how to look them up yourself. There are tons of tools out there where you give an IP address and they'll give you the ASN; I think it's just part of a normal whois. For smaller sites, like if you're running your own forum software, there are also public block lists out there which will more or less ensure no hosting providers' or colos' ASNs can send traffic to your site. But an enterprise operation isn't likely to be liberal with indefinite-term network blocks like that. Depending on your target audience, even indefinite-term IP blocks may make people hesitant, because in the case of some consumer-facing thing, once an attacker does a DHCP release, somebody else could be blocked, and that's a legitimate person. That's why, with a lot of these WAF products and the different places you can install rate limits, you'll see that a lot of them are actually short-term bans: you find a naughty IP address, and then there's a rolling block with, like, ten minutes of time, until that behavior subsides.
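[Editor's note: the short-term ban behavior described above can be sketched as a block list with a TTL. The ten-minute window mirrors the talk's example; the class and the injected clock are illustrative.]

```python
import time


class TemporaryBlocklist:
    """Short-term IP blocks that expire on their own, so a reassigned
    consumer IP (say, after a DHCP release) isn't punished forever.

    `now` is injectable so the expiry logic can be tested without
    actually waiting ten minutes.
    """

    def __init__(self, ttl_seconds=600, now=time.time):
        self.ttl = ttl_seconds
        self.now = now
        self.expiry = {}  # ip -> unix time when the block lapses

    def block(self, ip):
        self.expiry[ip] = self.now() + self.ttl

    def is_blocked(self, ip):
        deadline = self.expiry.get(ip)
        if deadline is None:
            return False
        if self.now() >= deadline:
            del self.expiry[ip]  # block has lapsed; clean it up
            return False
        return True
```

A real WAF applies the same idea at the edge; the point is that the ban is a sliding window tied to observed behavior, not a permanent entry.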
Then, what does an attacker look like? This is in terms of the traffic that analysts look at. These attributes kind of transcend network origins, but might also be shared by a big pool of legitimate users, so there's that risk level as well. Probably one of the easiest things to see at the perimeter is just request headers. User-Agent strikes me as one of the most generally significant, and then of course you might have other headers that are proprietary to your app. You might not realize that a whole bunch of things in an incoming request can be used as a sort of computed fingerprint. This might be called a TLS hash by one product, or a device identifier by another. It's kind of a closed-box system, but my understanding is that all these things are taken into consideration: you have all the request headers, you have the different ways the TLS handshake is happening, and all of that gets computed on by your product, which tries to come up with a unique identifier for each user. The degree of success on that seems to vary. I guess you could write your own thing like this into a mobile app, but it seems more common for it to be computed by a product like a WAF, which does a bunch of magic and then spits out this hash. And this is just an interesting data point, because with a streaming application, you might find that all instances of a specific device, like a certain streaming stick, have the same hash. But in other cases the hash can be unique to a specific malicious automation script, or to a Docker container. So if you blew the container away and ran the same script, maybe with some basic intentional differences, you'd end up with a new hash. Overall, these types of identifier hashes are probably one of the harder elements to anticipate as an attacker. Then lastly, how does an attacker behave? Again, this is based on the traffic that analysts are seeing.
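[Editor's note: a toy version of the computed fingerprint described above, in the spirit of real-world schemes like JA3 TLS fingerprinting. Real products fold in many more signals and keep their method closed; this only shows the core idea that stable request attributes hash to a stable identifier, and any change yields a new one. All inputs here are illustrative.]

```python
import hashlib


def request_fingerprint(header_names, tls_ciphers, tls_extensions):
    """Hash together request attributes into a single identifier.

    header_names: header names in the order they appeared on the wire.
    tls_ciphers / tls_extensions: numeric IDs offered in the handshake.
    Ordering matters on purpose: two clients offering the same items
    in a different order are likely different implementations.
    """
    material = "|".join([
        ",".join(header_names),
        ",".join(map(str, tls_ciphers)),
        ",".join(map(str, tls_extensions)),
    ])
    return hashlib.md5(material.encode()).hexdigest()
```

The same script in the same container keeps producing the same hash; rebuild the container or tweak the client, and the hash changes, which matches the behavior the talk describes.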
This is very perimeter-based, and might include what paths the requests hit and in what order, and how much time there is between requests. For things like a login endpoint, or any kind of challenge endpoint, there's not a fixed set of inputs per se, but the security team is going to get an idea of what's usual and what's not. Like, you can imagine we see people who seem to get the password wrong nine times in pretty quick succession and then finally get it on their tenth try. That's still a 10% success rate for a single IP, and that seems pretty generous to begin with. So if you're below, say, that 10% success rate, that's pretty suspicious. And maybe we're not talking about one IP; maybe this is a cross-analysis of a whole ASN or a specific fingerprint. If the path you're requesting is just heavily skewed towards login, if that's essentially all you're hitting, or there's the obvious 50/50 split between login and some 200-yielding endpoint, I'm going to assess that you're one coordinated attacker, and that also gives me grounds to block you. For a streaming experience like HBO Max, as you can imagine, we're not just taking login credentials and payment methods. After people authenticate, we have a whole content service, and a lot of that comes from a bona fide CDN, but there's also metadata requested around that content that we can easily see from the perimeter. There are some other normal tells from the perimeter too. So if you're just hitting our services that do login and payment methods, and you never request anything related to content, like you never interact with it after logging in, that's pretty unusual.
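[Editor's note: the behavioral heuristics just described can be sketched as a scoring function over one source's request history. The 10% success floor echoes the "wrong nine times, right on the tenth" example from the talk; the 60% login-share cap and everything else here are illustrative numbers, not production rules.]

```python
def looks_like_stuffing(events, success_floor=0.10, login_share_cap=0.60):
    """Heuristic check over one source's request history.

    events: list of (path, status) tuples for a single IP, ASN,
    or fingerprint. Flags sources whose login success rate is
    abnormally low AND whose traffic is dominated by the login
    endpoint, i.e. they never touch content after authenticating.
    """
    logins = [(p, s) for p, s in events if p == "/api/login"]
    if not logins:
        return False
    successes = sum(1 for _, s in logins if s == 200)
    success_rate = successes / len(logins)
    login_share = len(logins) / len(events)
    return success_rate < success_floor and login_share > login_share_cap
```

Requiring both signals at once is what keeps an ordinary forgetful user, who fails a few logins but then browses content, from being flagged.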
Then, on top of all this stuff regarding how you behave, there's even more information available from client telemetry, should you have it. Often it's not feasible to correlate that with these more perimeter-friendly points, at least in a timely manner when responding to anything, but anything you're emitting as an attacker can eventually be taken and used against you. The front-end telemetry and bot detection varies from product to product; as mentioned before, that's a closed magic box. But my understanding is that even if you've got some scripted, headless-browser-based stuff and you think it's convincing, it's still going to lack the uniqueness of real user interaction. Even if you're defining some randomness in, like, how the cursor moves around or the time between things, over a long enough period and a big enough sample size, even some defined randomness from a script can become apparent. So here's my personal horror story, tying together everything we've seen thus far. My shift was towards the end of that first week, so I'd been in our persistent conference call since launch, but it's different when you're the one person responsible on your shift as the point person. The way our schedule fell, I'd start at, like, 2 a.m. Eastern and be on shift until 2 p.m., which really wasn't too bad. You just wake up super early, get your stimulants of choice and your dashboards going, and then you might be alone until eight or nine a.m. when other people start hopping on. At that point, we'd instituted a bunch of active blocks on the types of things we've mentioned so far. So we had a bunch of blocks in place since launch, and we also had a collaborative document of other potential blocks, where we'd look across the tools and make a case each time for why certain criteria pointed to an attacker and why blocking them would yield minimal risk to genuine users. So you know that background.
Early in my shift, around three or four in the morning, we were getting a lot of unnaturally behaving traffic from what seemed to be hosting services, particularly switching between two services, and you could just see that in the traffic visualization. Further making the case that this was something weird, maybe 80% of the traffic had a brand-new Chrome user agent, and the remaining 20-ish percent had a slightly older Chrome. Requests were majorly hitting either /api/login or that 200-yielding endpoint. It looks like there was actually some clicking through our landing page to the signup prompt before blasting traffic at /api/login and then other endpoints; that's how the vast majority of it fell. But interestingly, up to that point, the login requests, if you were to just focus on those, had about a five-times-higher success rate than other attackers we'd seen until then. This was still nowhere close to normal, but it was something I'd have to explain to my team for everyone to feel comfortable blocking. There are various services out there, like Have I Been Pwned, that you can use to spot-check emails or usernames against big past breaches. So, going that route, I found all these successful logins had one particular breach in common, and it was a breach that included pretty crackable password hashes, MD5. So that basically made for a full picture: a couple of hosting providers, probably driving headless Chrome with a script, where Puppeteer or Playwright would pull down a browser binary and not keep it updated, which explained the user agent versions. And the success rate came from having a good credential list to stuff with, seemingly one that hadn't been used against us thus far.
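[Editor's note: the breach spot-checking mentioned above has a well-known programmatic analogue in the Pwned Passwords range API from Have I Been Pwned, which uses a k-anonymity scheme: only the first five hex characters of a password's SHA-1 leave your machine, and the service returns all matching suffixes with counts. This sketch stubs out the HTTP call with a `range_lookup` callable so the parsing logic is self-contained; the function name and stub are illustrative.]

```python
import hashlib


def pwned_count(password, range_lookup):
    """Check a password against a Pwned Passwords-style corpus.

    range_lookup(prefix) stands in for the HTTP GET to the range
    endpoint and returns the response body: one 'SUFFIX:COUNT'
    entry per line. Returns how many times the password appears
    in the breach corpus, or 0 if it was not found.
    """
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    for line in range_lookup(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0
```

The same k-anonymity idea works for spot-checking whether a batch of suspiciously successful logins share a single breach in common, as described in the talk, without shipping the raw credentials anywhere.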
Thank you, Randy, for your presentation. This is the part where we have an interaction with our audience, with questions and answers. We have some questions from the audience that we'll read out. The first one: please summarize your toolset for traffic analysis.
So I can't go into too much detail on that, but I feel like we spend most of our time looking at a couple of dashboards that basically show us all the WAF-discernible details coming in at the perimeter. We also spend a lot of time doing more analysis with the log ingestion tool side of things. If we're really doing a more advanced investigation, we'll go into those and look into more details of what user identifiers might be linked to certain requests, and what ties them together.
All right, there's another one: can you describe your interaction with HBO concerning your work?
Sure. So I guess that's kind of a general question. I feel like HBO makes up the heart of our content, and we're kind of in this process of switching from my official employer being HBO to it just being WarnerMedia, with WarnerMedia as this one big thing. But I feel like the HBO content is one of the things that attracted me to the company; you might see I'm in this Sopranos sweatshirt. I feel like people are still very attracted to the HBO content, and it gives me some cool pictures for my slideshows.
Alright, there's another question.
Did you do most of this work solo, or do you work with a colleague or a team of people? And do you use a revision control system to keep track of all your tools?
Yeah, so it's kind of a combination of solo work and working with the team. I feel like a lot of times you'll get into this flow state with investigations solo, and then you'll take a bunch of findings and share them with the team and sort of get peer review and feedback. As for the state of our tools, we've gotten better at revision tracking over time, to where we can hit the APIs for a bunch of tools, and we have a Git repo that periodically tracks changes; we auto-generate a changelog based on whatever was changed in the last config version. So we're getting better at that, and I think that's something you strongly need to keep track of, especially when you're changing tools really quickly to try to respond to attacks.
All right, there's another one: when you see an attack come from a particular IP, if you block the IP, then the attacker just cycles to another IP. What if, instead of blocking, you just started sending a canned static stub response? That doesn't trip the typical script, because it's still seeing a response. Would that work?
Yeah, yeah. So that's something that we've kind of done. You see a variety of logic in the way people have scripted the things that are attacking us. Sometimes you'll see that if we just send back a blank 403, or I guess your response of choice, there are certain tools where even after you've blocked them, they don't realize they're being blocked. If the script's logic assumes that any 400-status response means the credentials were bad, it'll just keep hitting you and run up the cloud bill for this attacker. So definitely, I feel like, especially if you're running your own bots and protecting your own forum or some smaller site, you can play around with these stub responses and see how that affects who's coming at you. There's definitely a variety of ways you can respond, and some are more effective than others.
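[Editor's note: the stub-response idea from this answer can be sketched as a tiny request handler. Serving a canned rejection that looks like an ordinary failure keeps a naive script burning its credential list instead of rotating IPs. All names and the request shape here are illustrative, not any real HBO Max code.]

```python
def real_login(request):
    """Placeholder for the actual authentication logic."""
    return {"status": 200, "body": "ok"}


def handle_login(request, suspects):
    """Serve suspected bots a canned failure instead of a hard block.

    The stub has the same shape as a genuine bad-credentials
    rejection, so a script that treats any 4xx as 'credentials
    invalid' never realizes it has been cut off.
    """
    if request["ip"] in suspects:
        return {"status": 403, "body": ""}
    return real_login(request)
```

As the answer notes, which status and body work best depends on the attacking tool's logic, so it's worth experimenting with the stub you send.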
There's another one: what advice do you have for HOPE attendees who might see similar issues in their workplace? How do they get support and buy-in from management, colleagues, etc.?
Yeah, so I feel like I mentioned this a little bit in my talk, but a big part of our flow in responding to these things has been really justifying why we're putting in certain blocks before we do them. There's a lot of sensitivity to blocking legitimate people that are coming to our experiences; certainly, we don't want to do that. So we spend a great deal of time recording, one, why certain threads that we've blocked on would effectively block an attacker, and also why they wouldn't block someone else. I think over time you get better at communicating that to the wider team: not just "oh, we should block these things because we've seen this malicious traffic," but really detailing why this is a continuous attacker, maybe that these sources are proven to attack you regularly, and that blocking won't impact legitimate users. So I see that as probably half of what I do around this stuff, just really communicating that effectively. I think honing those skills is as valuable as just finding and blocking things, and that's the advice I'd offer.
Okay. You mentioned that you are hiring. Could you give us some details on where the jobs are being posted, or how we can apply?
Yeah, so I feel like this can be a little confusing. The HBO Max jobs are actually posted on our WarnerMedia hiring site. So if you go and start searching around for WarnerMedia jobs, I feel like you'll find the job postings, and a lot of them have "HBO Max" appended. I know that right now we're looking for someone to sort of lead up an internal red team, a pretty senior person for that, and also another cloud security engineer. But certainly through the end of this year, I think we're continuing to hire for all of our internal security-based roles. Certainly feel free to find me on LinkedIn, or Brian Lozada, who is kind of our CISO and is doing really inspiring stuff; feel free to find him on LinkedIn too.
We have one last question; I would request you keep it very short because we are running out of time. You mentioned sensitivity to false positives. Do end-user complaints come in, or how do you develop confidence that you are not locking out your customers?
Yeah, so on this one, I think it kind of depends on the culture around your blocking. We've gotten better at this over time: whenever we're putting in new blocks of any sort, we communicate pretty widely to the other teams that might be impacted. Especially, we have a whole Slack channel with our customer experience support team, who are interacting directly with customers, whether that's over live chat or phone or any of those forums. So we make sure to let them know that we're instituting new blocks, and to watch for signs that a legitimate user is being blocked, because hopefully we're not going to block any legitimate users. So that's our approach: really socializing that we're doing blocks, especially to the folks that are talking directly to customers.
All right, Randy, thank you so much for that insightful presentation and for interacting with our audience. A quick announcement for all our audience who are watching HOPE 2020: we have lightning talks starting at 1500 hours today. Please join us. Thank you so much, and we'll be back with the next talk in the next ten minutes.