Censorship Is No Longer Interpreted as Damage (And What We Can Do About It)
4:01PM Jul 27, 2020
Some of the know-how gathered and code written throughout this time.
Opinions voiced in this talk are my own and do not necessarily represent those of my employers. Also, I apologize for the sound quality on this recording. I started my online adventures, as I imagine many of you have, in a simpler time, when the internet was this new amazing technology, the information superhighway, a series of tubes, where a blog could be set up in a broom closet by everyone and their dog, and nobody could tell them apart. The dog and everyone, that is, although I'm sure there were and are many who still cannot tell a blog from a broom closet. In this cyber Land Before Time, the internet would interpret censorship as damage and route around it. Governments seemed comically helpless as far as anything digital was concerned. Whenever they tried to block anything, the internet would get a strong systemic allergic reaction known as the Streisand effect, and information would simply still flow.
But that was then.
Today, effective web censorship is within reach not only for China, which is willing to invest heavily in building its own in-house technology and capacity, but also for less dedicated but no less censorship-eager regimes like the UK, Russia, or Azerbaijan. Case in point: a few years ago, I saw a local independent media site blocked completely in Kazakhstan, even though the site was behind Cloudflare. The block was Server Name Indication (SNI) based, and almost certainly used off-the-shelf hardware sold by one of the major companies. Why even decrypt TLS traffic if the domain name is transmitted in clear text anyway? The point is, censorship technology has taken huge strides in the last decade. Hardware has become more powerful and less expensive, and predefined filters are available for sale on the open market. At the same time, and no less importantly, more and more content became accessible solely through fewer and fewer gatekeepers, making the censors' jobs considerably easier.
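To make the SNI point concrete, here is a minimal sketch of how little work it takes to read the requested host name out of the first TLS packet. It is an illustration only, not the code any particular filtering product uses. The function names `build_client_hello` and `extract_sni` and the host `blocked.example.org` are made up for this example, and the hand-built ClientHello is stripped down to the bare fields needed to carry a server name.

```python
import struct
from typing import Optional


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal TLS ClientHello record carrying an SNI extension."""
    name = hostname.encode("ascii")
    # server_name extension body: list length, name type 0 (host_name), name
    sni_entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_body = struct.pack("!H", len(sni_entry)) + sni_entry
    extensions = struct.pack("!HH", 0x0000, len(sni_body)) + sni_body

    body = (
        b"\x03\x03"                             # client version
        + bytes(32)                             # random (zeroed in this sketch)
        + b"\x00"                               # empty session id
        + struct.pack("!H", 2) + b"\x13\x01"    # a single cipher suite
        + b"\x01\x00"                           # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name from a ClientHello, or None if there is none."""
    if len(record) < 6 or record[0] != 0x16:    # not a handshake record
        return None
    pos = 5                                     # skip the record header
    if record[pos] != 0x01:                     # not a ClientHello
        return None
    pos += 4                                    # handshake type + length
    pos += 2 + 32                               # version + random
    pos += 1 + record[pos]                      # session id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len                       # cipher suites
    pos += 1 + record[pos]                      # compression methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000:                  # server_name extension
            # skip list length (2 bytes) and name type (1 byte)
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

Calling `extract_sni(build_client_hello("blocked.example.org"))` gives back the host name without touching any key material, which is why a filter never needs to decrypt anything to block by domain.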
These are just a few censorship or surveillance events or deployments that we know of. We know that Blue Coat has been used in Syria, thanks to Telecomix. We know that Huawei is used as the backbone of internet filtering in the UK. And we know that Rohde & Schwarz sold censorship equipment to Turkmenistan. These are off-the-shelf devices. This is a business that provides products to whoever can pay for them, and the price is not very high.
And the problem is centralization. Centralization is the damage to the internet, and there are multiple companies that are, let's say, most guilty of that. These are Facebook, Cloudflare, Amazon, Google, and Apple. Don't get me wrong, I am no fan of other big tech companies, but in the context of this talk, these are the most relevant.
For years, activists like myself warned that putting all your content in the big tech basket will end badly, that this creates single points of failure that will be difficult to work around in case of issues. Today we have specific examples of those companies, and other big tech players, thwarting attempts at keeping the information flowing. For example, Facebook experimented with their Explore Feed in Serbia and caused independent media sites to lose half of their traffic, just because there were more clicks for the users to get access to their posts. Or Cloudflare, which in 2019 had a half-hour outage affecting 12 million sites globally, that's 80% of Cloudflare-hosted sites at the time, due to a regular expression error. Then, of course, Google and Amazon killing off domain fronting suspiciously soon after Roskomnadzor complained about Telegram and Signal using Google's and Amazon's infrastructure for it, thus screwing Signal over. Finally, Apple banning VPNs from the Chinese App Store at the request of Beijing. This is a good overview of possible issues, from innocent mistakes, through callous experimentation, to outright willful complicity. And with the sheer number of users affected, this becomes a huge problem. Anyone can make a mistake, but if you're handling 12 million sites, then a lot depends on you not making such a mistake.
At the same time, these companies' services are billed as solutions to internet censorship, like encrypted SNI, which supposedly solves the issue of SNI-based censorship. It can only work with large providers like Cloudflare, where an innocent-looking domain can be used to front for a blocked site. Thus, it creates an incentive for more websites to move behind Cloudflare, centralizing the internet more.
This just moves the problem. Sooner or later, China and others will find ways to influence or pressure Cloudflare into dropping certain websites or not providing the service in China, like they can already clearly influence Google or Apple. It's as if we are trying to fix problems caused by centralization with just a bit more centralization. We need something better than what basically amounts to technological homeopathy.
Now, I am making a few assumptions in this talk. First of all, I am assuming we're not dealing with a Guardian, Washington Post, or New York Times level of traffic. If you're dealing with that, you probably know more than I do, and I would love to talk to you. Secondly, I'm assuming we're dealing with a public site, and one that is hosting reasonably static content, meaning visitors don't need to log in and content is not bespoke or dynamically created per visitor or per visit. I'm also assuming that there's some technical capacity in the organization running the website, so that the tools, techniques, or suggestions I will be mentioning can be deployed and managed in-house. As always, there are no silver bullets. Everything comes at a cost; everything is a trade-off. And finally, I am assuming that we're trying to deal with targeted web blocking, not with shutting down the whole internet in a particular region. If the internet is down, the internet is down, and there's not much that can be done by means of deploying something on your website.
Let's talk about the usual suspects. Obviously, the Tor Browser, VPNs, and Psiphon are all ways for determined users to access blocked resources. They're invaluable and effective. If you can, you should by all means deploy a Tor hidden service for your site. But they do not work at scale: it is not reasonable to expect the whole population of a country to download the Tor Browser or use VPNs. Instead, I want to focus on strategies website admins can employ to make their content available to visitors without requiring those visitors to download a piece of software, change their browser settings, or do anything different from what they usually do when they browse the internet.
And they get in a negative way.
And in fact, self-hosting might even improve the loading times of your website. Also, your NoScript-using visitors will greatly appreciate that they don't have to allow fetching scripts from a dozen random domains. But more importantly, it will also make some of the more funky anti-censorship stuff considerably easier a little bit later. And if you need a script to de-Googlify your fonts, there is a script for that. I can share a link later.
Another don't: just don't use a bunch of plugins. It is very tempting to install a CMS and then load it up with dozens and dozens of different small plugins, fixing or changing this thing or that thing or something else. But be very careful with plugins. One reason is that it's another thing that makes your site brittle, and another thing that makes updates difficult. The plugins might make assumptions that do not work very well with certain anti-censorship techniques. They will slow your website down.
And most importantly, they're a huge, gigantic security risk. They are one of the main attack vectors. Often they're developed by a kind of guy-and-his-dog shop. As soon as a plugin becomes somewhat popular, the capacity to handle reported bugs and publish security updates in a timely manner goes away quickly. At the same time, plugins usually have full access to the database and the file system, so a compromised plugin means a compromised site. There have even been cases of plugins being sold by the original developers to third parties, who then implemented backdoors in those plugins and issued updates that backdoored thousands of WordPress or other CMS installations. Of course, this is less of a problem with static site generators, and we will be talking about static sites.
Before we start even talking about censorship, we need to deal with, let's say, self-inflicted damage. The thing is that for a dynamic site running a CMS like WordPress, every request is really resource intensive. It means getting the data from the database, parsing the templates, laying out the HTML, and then serving it to the user. And then the same thing has to happen for another identical request, and another, and another, and so on and so forth. I've been asked to deal with targeted DDoS attacks against websites more times than I can remember. Each and every single one of these turned out to be organic traffic hitting the site due to a popular news item, and the site just going down under a reasonable amount of traffic. For dynamic websites, any sufficiently high organic traffic is indeed indistinguishable from a DDoS. So we really, really have to do something about that.
One way of dealing with that is going static. A static site is your best bet if you want to stay online and have as many options for dealing with censorship and decentralizing your web presence as possible. There are plenty of static site generators. For a while, the main problem with them was that publishing content required directly editing code: Markdown files, HTML, etc. There are, however, static site generators that have a user-friendly admin interface. Please go and check them out, and if you can, move to a static site. Another advantage is that the code needed to update the site and keep it running is different from the code actually serving the content to visitors, or attackers. There is no database to be SQL injected, there is no PHP code to be exploited, there are no third-party plugins to be backdoored, just static files on the server. This is the only thing that visitors or potential attackers will be able to reach.
Another step, or another technique: if you can't go static, or if you just want to be fancy, you can start doing some serious caching at the edge. This sounds complicated, but trust me, you don't need Cloudflare for this. Developing your own caching strategy is a lot of work, and there are several pitfalls, as I have learned the hard way. But here's the good news: I already did all of that work for you, and I'm sharing it in a way that lets you basically take this config and start using it. Fasada, as I call it, is an nginx config that has been tested and improved while in production for over five years at OCCRP. The basic idea is micro-caching: caching dynamic resources for short periods of time, so that when a spike in requests comes, most of them get served from cache, but the content stays reasonably fresh all the time. Of course, all static resources can and should be cached way longer. No need to cache them for years; an hour or two should suffice.
Now we can have a bonus round. Once you have a caching edge node deployed, and this can be a tiny VPS somewhere that really doesn't cost much, it will protect your back end from crashing and burning under load. Once you have one, you can have two; once you have two, you can have more. Depending on the amount of traffic you expect, you can easily scale up or down while maintaining control over your infrastructure. And since almost all requests get served from cache, the limiting factor becomes bandwidth. So adding a small VPS immediately adds bandwidth, and thus capacity, to your site, without you needing to add back-end capacity to handle more database requests and PHP-FPM processes. The additional benefit is that all the cached content, which usually means all of the popular content on your site, could stay up and online even if the back end is down for whatever reason: updates, maintenance, bugs, problems, accidental screw-ups.
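The micro-caching idea can be sketched in a few lines. This is not the nginx config mentioned in the talk, just a toy model of the same behavior in Python: responses are kept for a short time-to-live, and a stale copy is served if the back end fails. The class name `MicroCache`, the `backend` callable, and the five-second default are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class MicroCache:
    """Tiny TTL cache that serves stale content when the back end fails."""

    def __init__(self, backend: Callable[[str], str], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend      # the expensive call (CMS, database, ...)
        self.ttl = ttl              # seconds a response counts as fresh
        self.clock = clock          # injectable so the cache can be tested
        self.store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]        # fresh hit: the back end is not touched
        try:
            body = self.backend(path)
        except Exception:
            if cached is not None:
                return cached[1]    # back end is down: serve the stale copy
            raise                   # nothing cached, so the error surfaces
        self.store[path] = (now, body)
        return body
```

With a five-second TTL, a thousand requests for the same page arriving within those five seconds cost the back end a single render, and a page that was cached before an outage keeps being served during it.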
This is one of the things that this config was in fact designed for. Now that we have a reasonable way of handling organic traffic if somebody gets interested in your site, we can actually start thinking about targeted malicious actions like break-ins, DDoS, and censorship. And surprisingly, one of the most effective and easy-to-use techniques is a classic game of whack-a-node.
Meaning: you have multiple edge nodes deployed and ready. When a DDoS hits your site, most of the time the botnet nodes carrying out the DDoS will make the necessary DNS requests at the beginning and then just target the resulting IP addresses, to maximize the bandwidth used for the actual attack. So if you have a way of quickly deploying your site to a new IP address or addresses, or just deploying new edge nodes and putting the traffic through them, and you have a low TTL on your domain, you can move all the legitimate traffic, which is going to be checking with DNS servers often, to the new edge nodes while the DDoS pummels the old IP addresses. The organic traffic, the normal visitors, get a functional site, and the DDoS has its fun with the edge nodes that are now not used by the legitimate traffic.
This can also be used to deal with censorship on a slightly different level, by moving to a new domain, or perhaps new IP addresses, every few months, or whenever the current ones get blocked, provided of course that absolute URLs are not hard-coded anywhere in your site. Both of these approaches require manual intervention and are somewhat labor intensive, but they can be effective, and have been effective for me, in the short term, in emergency situations. Censors apparently move slowly.
Another thing you can do, and I would strongly advise you to, is to make sure that your site is available in the Wayback Machine. Censors might often miss the fact that censored websites can be accessed through the Wayback Machine. So the next step is to make sure that your website is available there. Check if snapshots are made on a regular basis (if not, you can contact archive.org), and verify that the content displays correctly. This is where simplifying your site and self-hosting your resources starts paying off. This is also extremely useful in case of any calamity. Of course, we all make backups, but having a public backup of all public content somewhere on the Web Archive is definitely a nice additional safety net, which I would strongly suggest.
The next step could be just having a zip file. Just zip your content; especially if you're running a static site, this is easy. Zip your content and publish it as a zip bundle. That way, interested parties can just download the zip file and distribute it all over, say, over the sneakernet, if everything else fails. Ideally, of course, automate and script this: any time you publish content to your static site, make sure that the zip file is refreshed, and make sure that the extracted files work well when viewed locally in a browser. This might require a little bit of fiddling with the URLs, but it's very much worth it. Which I would like to show you now. So, demo time.
I have downloaded the zip file of OCCRP's static projects. Here it is. I'm going to unzip it. This is over 400 megabytes of data, so it's plenty of content.
Now let's see how this looks in a browser.
Okay. Boom. Let's go to English, because why not. And let's say... we shall see.
Ah, welcome: Cocaine Wars.
This is a fully functional site. Of course, videos and other external embeds are still embedded from external sources, but all of the local content is in the zip file. And as you can see, when done properly, it is easily available to take away and browsable locally.
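A bundle like the one in the demo could be produced by a small script run on every publish. The sketch below is one way to do it under stated assumptions, not the script used for the demo: it assumes the static site lives in a single directory, that pages end in `.html`, and that the only URL fiddling needed is turning root-relative `href` and `src` values into relative ones. The names `bundle_site` and `relativize` are made up for the example.

```python
import re
import zipfile
from pathlib import Path


def relativize(html: str, depth: int) -> str:
    """Rewrite root-relative href/src values so they work from file://."""
    prefix = "../" * depth or "./"
    # (?!/) leaves protocol-relative URLs such as //host/path untouched
    return re.sub(r'\b(href|src)="/(?!/)',
                  lambda m: f'{m.group(1)}="{prefix}', html)


def bundle_site(site_dir: str, zip_path: str) -> int:
    """Zip every file under site_dir; return the number of files added."""
    root = Path(site_dir)
    out = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for path in sorted(root.rglob("*")):
            # skip directories, and the bundle itself if it lives in the site
            if not path.is_file() or path.resolve() == out:
                continue
            rel = path.relative_to(root)
            if path.suffix == ".html":
                # pages get their links rewritten relative to their depth
                page = path.read_text(encoding="utf-8")
                bundle.writestr(rel.as_posix(),
                                relativize(page, len(rel.parts) - 1))
            else:
                bundle.write(path, rel.as_posix())
            count += 1
    return count
```

Running `bundle_site("public", "public/site.zip")` after each build would keep the downloadable bundle in step with the live site. Real sites may need more rewriting than this (CSS `url(...)` references, directory-style links), so unpacking the result and clicking through it locally is still the real check.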
Consider this: once you have a zip file like this, you can also upload it to random places around the internet, like Dropbox folders or some Google Drive or whatever works, whatever makes most sense, to have the content, as a zip file or just as separate files, available later for your visitors if your site is down.
All you need to do in an emergency, if your site is down, is to publicize the links to the public folders with the data. Of course, getting the word out is another thing. If you know that you might be censored, I would strongly suggest creating a page on your website describing what you're doing to bypass censorship, or what your visitors could potentially do to access your content in case you're blocked. And of course, this is available in our zip file, because that's the whole point.
so far we've covered
Let me start with a new browser and a clean profile in Firefox, so that I can show you that there are no shenanigans in the background. Let's create a profile; let's call it samizdat.
And let's start Firefox.
Okay, so now let's visit the Samizdat website.
Of course, Samizdat is deployed on the Samizdat website. What you will notice is that once the website loads, the service worker is not working yet, and there is no favicon. The fun thing about this is that the favicon is not available on the server, but it is available through IPFS. So what we can try to do is reload the site, and lo and behold, we have a favicon, we have the service worker working, and we have IPFS and other means of getting content ready and working. So now we can do even more fancy things. For example, the /example URL does not exist on the server. If I request it directly, it tells me it's a 404 Not Found. But thanks to the magic of Samizdat, if I navigate there in a browser that has visited the site once, it will load, because this content has been pushed to IPFS and is now available in IPFS.
Of course, this also means that I can just outright block the site.
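The behavior shown in the demo boils down to a fallback chain: try the regular site first, and if that fails, try other transports until one delivers the content. Samizdat itself does this inside a browser service worker; the Python sketch below only illustrates the ordering logic and is not Samizdat's code. The function `fetch_with_fallback` and the two stand-in transports are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A transport is a name plus a function that returns content for a path
# or raises an exception if it cannot deliver it.
Transport = Tuple[str, Callable[[str], bytes]]


def fetch_with_fallback(path: str,
                        transports: Iterable[Transport]) -> Tuple[str, bytes]:
    """Try each transport in order; return (name, content) of the first hit."""
    errors: List[str] = []
    for name, fetch in transports:
        try:
            return name, fetch(path)
        except Exception as exc:
            errors.append(f"{name}: {exc!r}")   # remember why it failed
    raise LookupError(f"all transports failed for {path}: " + "; ".join(errors))


def blocked_origin(path: str) -> bytes:
    """Stand-in for a regular HTTPS fetch against a blocked site."""
    raise ConnectionError("connection reset")


# Stand-in for content that was published to an alternative transport.
ALTERNATIVE_STORE: Dict[str, bytes] = {"/example": b"<h1>still here</h1>"}


def alternative_transport(path: str) -> bytes:
    """Stand-in for fetching the same path over something like IPFS."""
    return ALTERNATIVE_STORE[path]
```

With these stand-ins, `fetch_with_fallback("/example", [("origin", blocked_origin), ("alternative", alternative_transport)])` returns the content via the second transport even though the origin is unreachable, which is the situation at the end of the demo.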
a little bit more work.
As with a lot of this kind of technology, there are privacy considerations: peer-to-peer protocols can flag visitors for regime scrutiny.
Of course, your mileage may vary, and it is important to choose the transports, the tools you're using, to fit the purpose and the threat model of your visitors. But visiting a blocked site is already enough of a red flag to a regime, and Samizdat could in fact be used in a way that improves the safety of visitors. As you've seen, it is possible to publish content through Samizdat that is not directly accessible on the regular site, which means you could have a fronting website with some innocent content, and when Samizdat kicks in for a visitor, the actual content becomes available. Samizdat could also be configured to not make fetch requests to the fronting site anymore once the service worker kicks in. So there are ways to use this technology such that it improves the safety and the privacy of users.
There are multiple next steps and things that need to be done. This is currently at a proof-of-concept, slash alpha, stage. I would definitely not suggest deploying this on a high-profile public website. But I would love to work on this a little bit more and get it to a point where it is deployed on at least a few sites. Mobile support is missing. This of course does not break the site; it just means that on mobile Samizdat doesn't currently work. This is something that can be fixed with a little bit more work, but again, I haven't had the time yet. The deployment procedure needs to be way, way simpler, and this is the immediate next step I'm going to focus on. There's also going to be a lot more documentation. And finally, I'd like to implement many more transport plugins.
There are of course similar projects. I spoke with a person from NetBlocks at MozFest about a year ago, and they had a similar idea, also using service workers and hitting some fallback IPs, that worked surprisingly well.
The only problem is that at that time, about three years ago, service workers were not available on mobile at all. Now the Service Worker spec is implemented in mobile browsers, so there is no reason not to develop this technology further. Another similar project I was looking at is not really developed anymore, but again, it used service workers to sidestep censorship through other means. Worth mentioning at this stage is also Push app, which is a mobile app generator that implements Tor. Meaning, if the back end is not available directly because of censorship or some other problem, the mobile app tries to fetch the content through Tor. We have also played with NewNode, and we tried to implement NewNode directly in Push app. This is work that's being done at OCCRP, but it proved a little bit more complicated than expected.
Finally, some closing thoughts. The main one, I guess, would be that browser vendors should lead in this space. And to be fair, Mozilla Firefox is doing somewhat well in this area. There's the Tor Uplift project, which is backporting Tor Browser patches into Firefox. There's Project Fusion, where the idea is to implement Tor directly in Firefox, perhaps to run by default in private mode. The problem is that Project Fusion is not really moving very fast; the last wiki edits are from over two years ago. One important thing that would really help would be improvements and enhancements for peer-to-peer protocols directly in browsers. That's an ask that the developers behind Dat made to Mozilla, and so far not much has happened. But there is Beaker Browser, which implements Dat directly in the browser, so this is not completely out of the question.
Finally, I'd like to make the point that the internet's original strengths and promise stem from its decentralized nature. We need to find a way to decentralize it again. This will require work, and it's not taking the easy way out. Of course it's easier to deploy your site behind Cloudflare, but I'm kind of hoping that this will be more fun. But it is also necessary, if we don't want to end up with a glorified cable TV.
And we're back. What a great talk that was. There are so many technical people at home, and so many of us spend time looking for vulnerabilities and protecting against vulnerabilities, and I feel like sometimes we forget that the web is ubiquitous. And yet it's also a lot of things, and you have to make a lot of choices, and I think you've guided us effectively through the different possibilities. We've had a number of questions. Yeah, thank you. A number of good questions on the chat, and we don't have a lot of time, so I'm going to leap in with some that I think are more asking for the kind of advice that you would offer, like we're having a conversation about what a hacker might offer. So, someone asked: you mentioned issues with Cloudflare, so what do you think about DigitalOcean?
I think, you know, any company that becomes a de facto monopolist, the de facto default for a particular kind of service, is becoming a problem, right? I am not personally using DigitalOcean. I am not a fan of DigitalOcean, because it started feeling to me like another Cloudflare, another Amazon, another GitHub, for example, where a lot of eggs are in a single basket. So of course, if it fits your threat model, if it fits your needs, go for it. But be wary of anything that smells of centralization and smells of abusing or using a position like that for their own gain.
Yeah, thanks for that. Another point of advice. I'm sure you've had these conversations, and I'm not sure how polite you are, but when you're interacting with someone who's using WordPress or Drupal, mostly because they just don't know how to do the types of things that you've talked about, what sort of guidance do you offer?
It's such a great invitation and a call for advocacy, because you're right, hackers hopefully have the skills that, say, content providers and small businesses don't have. We're just in our last minute or so here, and one thing that I'm also wondering about, that you can comment on: clearly it takes a lot of effort to keep up with all the technologies that you described, and you're spending a lot of time upgrading versions and finding bugs and all this stuff. I'm curious, in your experience, let's say we followed all of your recommendations and made all of the best choices. How much work is it, or how often are you still going to be making changes and new decisions, finding new technologies, that type of thing?
So that's a difficult question. It's hard for me to answer because I only have this one vantage point, where I was working for a reasonably large media organization, and one of the things I was handling was website hosting for member centers. For me that was a reasonable amount of work. It wasn't all of my work, but it was a reasonable amount. But that work would go dramatically down if, again, random plugins weren't a problem, if random third-party scripts weren't a problem. Every step you can take to simplify your, let's say, CMS footprint, or the complexity of your CMS setup, really matters, also in terms of how much time you will spend keeping it up and running and updated and all of that. The other point I'm going to make is that the caching setup we were using really helped us in a lot of ways, including whenever there was a member center, or a website that we were hosting, which did not have the capacity to upgrade their website, and we did not have the capacity to help them with that because it was a very complicated setup, or something else was in the way. At least we knew that, due to the caching and due to the setup, most of the resources were read-only for almost every visitor, which means this was a way smaller exposure, a way smaller attack surface, which also meant that we had a little bit more capacity, more breathing space, to implement necessary changes when they were necessary.
Thanks, yeah, that makes a lot of sense. Well, we really appreciate all your guidance and your knowledge, and I particularly appreciated that you're inviting hackers to help make this world a better place. It's been an illuminating talk. Right at the end of our time, this has been Michał "rysiek" Woźniak talking to us about how censorship is no longer interpreted as damage and, I think more importantly, what we can do about that damage.
Thank you. Thank you very much. Thank you.