Secure Data Storage - WG
7:56PM Jun 25, 2020
Everyone Hi, Serge.
Kalia Hello, hello. Clear Tobias Do we have a host code by chance?
I believe it was sent to us in an email, and it's hidden in the wiki.
I think I know where it is, and I'm going to go find it. Okay.
All right. Sounds good. It's fine if we can't share screen, but it would be great if we could. All right, we'll wait a couple more minutes for people to file in. And by a couple, I mean, like, one.
Chris mentioned that they had another meeting right before this one, so they might be a little late. Okay.
I got it. Man. I sent it to you. And
Thank you. Hello. Oh, hey, Chris. Welcome, welcome. Thank you.
You should leave because I said you'd be late. So
Well, it turns out our insurance agent never called us, so that made it very easy to make this meeting.
So, just FYI, the meeting is recorded, so just be aware of that.
All right. Sure.
Okay, invalid. Okay. There's another one. There's another one.
Let's try the other one.
Hi, everyone. While people are filing in, we're just settling the call logistics. Oh, stand by.
Okay, I read the whole email is
that someone that runs?
No worries, no worries. Okay. That does in fact, it does
in fact work.
Okay, so hello, everyone. Welcome to the Secure Data Storage Working Group weekly call. Our agenda is in chat. Here's the link. This meeting is recorded, as Zoom has been so kind to point out to us. All right, so let's start with the intellectual property rights reminder. This meeting is free to join, but if you're going to make any substantial contribution, which means things that are likely to affect the spec, you can only do so after having signed the IPR policy, which can be found on the front page of the Secure Data Storage group's wiki. All right, so let's get to introductions and reintroductions.
Who would like to volunteer for introductions?
David Mason, I forget: did we ask you to introduce yourself Friday?
Maybe. It's hard to keep track sometimes. But I'm
David Mason, and I work in a
digital identity group at the Government of Canada. I've been
following this for a while on my own, and I'm very interested to follow along, mostly the discussion.
Welcome, welcome. Troy or Sunita, would you like to do an introduction or reintroduction? Hi there. This is Troy Ronda from SecureKey in Toronto, Canada. All right, welcome.
Hi, this is Rita from USA.
Well, we have a lot to cover today, so let's jump into it. As a quick reminder, issue 35, naming of the spec and working group, is still out there. In previous meetings, we introduced the notion of architectural layers for encrypted data vaults and secure data hubs, and we did the beginning of a deep dive into Layer A, the byte storage layer. For this meeting, we thought we'd approach from the other end and give a quick overview of how encrypted data vaults and hubs relate to each other and to other components. What's the 30,000-foot view? Then we dive back down into Layer B and discuss data shards as one of the potential
specs that we could use for Layer B.
Okay, so here is you have the,
the agenda link
item four there, and I'll paste it directly. So this is a link to Google Slides that contains the proposed overview that we want to go over today. In this call, we want to explain what our initial thinking was, take questions, and possibly even edit the structure collaboratively. So let's see if I can share screen first of all.
before we get started
in with the with a diagram and
explanation of data shards and so on. Does anybody have questions or concerns?
Okay, here we go sharing screen.
Okay, can everybody see the diagram,
or hopefully pull it up on their own?
I'll start off explaining the lay of the land, and then I'll hand it over to the other chairs and editors to comment. As a quick key to the map: we want to start out with an overview of rough logical components and protocols. We are going to get to data models later on, but data models at the moment are not shown on this diagram. We are also experimentally denoting API endpoints
as these pyramid shapes.
Part of the structure here
is our lettered layers. At the bottom, or at the innermost core, we have the encrypted data vault. One thing that I would first draw attention to is Layer A, raw byte storage, like we discussed during the previous call. The important thing to keep in mind is that ultimately, once objects or streams or collections are written into the hub or into the data vault, they get encrypted and broken up into chunks. Their raw bytes can be stored on any number of back ends: in the cloud, in a local database, on disk, in your mobile phone's storage. They can be chiseled in titanium or stone; any sort of storage mechanism works. So Layer A is something we want to mention as part of this working group, but it's ultimately out of our scope, because each storage layer, whether it's a file system or a database or the cloud, each provider has their own API for storing the bytes. The working group really starts its discussion at Layer B. Everything that we're going to talk about on this call is, of course, subject to change by the working group. But to start with, the encrypted data vault, at least in the shape that it has right now in the original draft spec, exposes three main API endpoints. It can be used to read and write configuration files for the vault itself. It can be used to read and write encrypted documents and objects.
And it can create and query on
All the operations, the reading and the writing, are performed by the encrypted data vault client here, which isn't located on our initial alphabet layer proposal. But it's pictured here because it's a crucial integration point between a lot of these components. So at a high level, when you store objects in an encrypted data vault, you do so via the client,
the data vault client
receiving keys and authorizations from either a key management system, one or more wallets or agents, or from authorization servers, either internal, co-located, or external. Shout out to Adrian. Since our approach, as laid down in the charter of this working group, is that we're focusing on client-side data encryption, the client is responsible not only for the key management and permission management, but for actually encrypting the documents and storing them in the data vault. And then the data vault stores them transparently in Layer A, whatever storage back end it's set up with.
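Editor's sketch of the split described above, in Python. It is a rough illustration under stated assumptions, not the draft spec's API: the class and method names (`Vault`, `VaultClient`, `put_document`, and so on) are hypothetical, the third endpoint is modeled as an index only because the transcript cuts off at that point, and `toy_cipher` is a deliberately insecure stand-in for real authenticated encryption. What it tries to show is that the client holds the key and encrypts, while the vault only ever stores opaque bytes.

```python
import hashlib
import hmac
import json
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Placeholder only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Vault:
    """Server side: stores what it is handed and never sees a key."""

    def __init__(self):
        self.config = {}     # endpoint 1: vault configuration
        self.documents = {}  # endpoint 2: encrypted documents
        self.indexes = {}    # endpoint 3 (assumed): opaque tag -> document ids

    def put_document(self, doc_id, blob, tags):
        self.documents[doc_id] = blob
        for tag in tags:
            self.indexes.setdefault(tag, set()).add(doc_id)

    def get_document(self, doc_id):
        return self.documents[doc_id]

    def query(self, tag):
        return sorted(self.indexes.get(tag, set()))


class VaultClient:
    """Client side: owns the key, encrypts before writing, decrypts after reading."""

    def __init__(self, vault, key):
        self.vault = vault
        self.key = key

    def _blind(self, name, value):
        # The vault sees only an HMAC of the attribute, not the attribute itself.
        message = f"{name}={value}".encode()
        return hmac.new(self.key, message, hashlib.sha256).hexdigest()

    def write(self, doc_id, doc, indexed=()):
        nonce = os.urandom(16)
        blob = nonce + toy_cipher(self.key, nonce, json.dumps(doc).encode())
        tags = [self._blind(name, doc[name]) for name in indexed]
        self.vault.put_document(doc_id, blob, tags)

    def read(self, doc_id):
        blob = self.vault.get_document(doc_id)
        return json.loads(toy_cipher(self.key, blob[:16], blob[16:]))

    def find(self, name, value):
        return self.vault.query(self._blind(name, value))
```

In this sketch the same `Vault` could sit in front of a file system, a database, or a cloud bucket without any change to the client, which is the sense in which Layer A stays out of scope.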
your queue question on the queue. Do you want to take it off and chapstick I yes. And let me see if I can somehow
see chat because screen sharing messes with that a bit. But if you don't mind managing the queue for me. Yeah, you're gonna Yes, I can. Ah, haha. Fine. Yeah, you do it. You do it clear. Sorry. I'll be right, please. Okay.
What do we got?
We'll take the question though.
Okay. Mr. Yep. So
My question is: I'm seeing that the EDV client is being wrapped inside of the hub, which is surprising to me. Is that actually expecting that a user has some third party, in the general case, managing their keys, et cetera, for them? Or is the hub something that could actually be seen as running on somebody's desktop, et cetera?
Good question. So, yes to all of those.
Since this is a very rough sort of
logical components and naming discussion, both encrypted data vaults and hubs themselves can be running on the desktop, as a mobile app, in the cloud, on an IoT device, in any number of configurations.
Somebody's at the door. Okay, so we're showing the encrypted data vault nested inside the hub
partly to show one potential configuration, one potential relationship.
As we can see here on the side, encrypted data vaults can be standalone,
completely unrelated to hubs, if the use case desires it,
and can have standalone clients interacting with them directly or, alternatively, standalone clients interacting with them through the hub. So we'll get to the hub portion. But does that answer your question, hopefully?
I think so.
Okay. That'd be specific, Dimitri,
I think how we envisage hubs is that they can live anywhere. So you might have, like, three hub instances, two of which might be on your local devices, and one of which could be on a cloud provider. The cloud provider instance wouldn't have the keys to be able to do the encryption and so on, because it's not you. But the other instances could be empowered to have keys, because obviously you don't want some external party having those.
Orie, I believe you're next. Ah, another variation of that is basically that hubs have specific pieces of the way that they've been described, and their specification operates on plaintext. So you know that you need some way of seeing plaintext, and that means you need to have some way of decrypting ciphertext. That's why hubs wrap both the EDV client and the EDV server interface in this picture. This is just the way that we have currently come up with to try and show that these things can work together. If you think you have a better way of representing how these things could be joined together, we'd love to see comments on issues.
Thank you, Orie.
And Daniel left
a side comment in the
Google Slides document itself that it might help to show a second full stack of EDV and hub nearby. It's a good idea, and maybe we can show that on the next slide, just so that we don't crowd one slide too much.
It seems like there is a follow-up
question from Chris. Could you say it out loud? Because the call's transcribed, so if it isn't said, it doesn't show up in the transcription.
It's not real. Okay. Thank you. And I really appreciate you giving me a heads up. I can't see chat while presenting.
Chris, can you say your comment, please? Sure. So what I wrote in the chat is that "hub" as a term kind of feels vague. I can get the idea of what the encrypted data vault stuff is doing. My sense is that the hub has something to do with the indexing, but I'm not really sure. So I feel like if I got a more expanded definition of what the hub performs, maybe I'd be less confused by this diagram.
Excellent. Okay, so that's where we're going next. I just wanted to start out with the lower layer of EDV, encrypted data vault plus client. And then wrapping that, or maybe at a higher abstract layer, is the identity hub. These two things were the two source documents, the two input documents for our secure data storage spec, right? We have the encrypted data vault paper slash spec, and DIF's identity hub
repo, spec, and papers.
Part of the reason that those communities, plus several folks from other communities like Hyperledger Aries and so on, came together is that we saw that the work is largely orthogonal, and we can approach it in layers. The hub layer tends to deal with higher-level, more abstract stuff: things like messaging and inboxes (shout out to Linked Data Notifications), collections, and other higher-level authorization policies and so on. Whereas the encrypted data vault client is just for get and put of encrypted documents. So "identity hub" is not meant to be vague. It's meant to reference the spec.
Does that answer your question?
Daniel, go ahead.
Yeah, I mean, hopefully I can add some clarity. I'll give a use case example. Okay, if I wanted to go create a decentralized version of Craigslist, it would probably be such that users would want to put out plaintext data in a location that could be crawled, and it would have to be semantic. So let's say there are schema.org Offer objects, because that's, you know, what Craigslist already uses in their metadata tags. So why not use the thing that the whole industry is using? A user would craft, you know, their for-sale couch: they put it in an object and they stick it in their hub, which would go down into their EDV. It would be encrypted, I guess, just because the EDV encrypts everything, even though it would be plaintext. So the hub would respond when it gets a query that says, hey, do you have any schema.org Offer objects? And then some app can collect them all, just like Google crawls the web, and say: great, here's what looks like Craigslist, I went and found them in everyone's hubs everywhere, right? And the difference here is that if this was up on some cloud provider, those objects, I suppose, in this diagram (and tell me if I'm wrong) are ones the actual cloud provider would have the key to. Because they're intended to be public objects, it's basically being encrypted for sort of no reason, I suppose, because you're telling the cloud provider, I do want you to be able to have access to this. Whereas other objects, which might be inside of the EDV portion, are not accessible. They're not public objects. They're stuff that you've encrypted from the client that, you know, obviously the provider has no access to, because they're not intended to be public, publicly resolvable objects.
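Editor's sketch of the crawl described above, in Python. Everything here is hypothetical naming for illustration (`Hub`, `StoredObject`, `crawl`); the hub spec's real query interface is not shown in this transcript. The point is only that a hub answers an unauthenticated, type-based query with its public objects and keeps the rest to itself.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    type: str             # a semantic type, e.g. a schema.org type name
    data: dict
    public: bool = False  # private objects stay client-encrypted and unlisted


@dataclass
class Hub:
    owner: str
    objects: list = field(default_factory=list)

    def add(self, obj: StoredObject) -> None:
        self.objects.append(obj)

    def query(self, type_name: str) -> list:
        """Answer an unauthenticated query: public objects of that type only."""
        return [o.data for o in self.objects if o.public and o.type == type_name]


def crawl(hubs, type_name):
    """Collect matching public objects from many hubs, like a web crawler would."""
    listings = []
    for hub in hubs:
        for data in hub.query(type_name):
            listings.append({"seller": hub.owner, **data})
    return listings
```

A marketplace app in this model would call `crawl(all_known_hubs, "Offer")` and render whatever comes back; how it discovers the hubs in the first place is outside the sketch.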
Thank you. And yes, you're absolutely right. Maybe in this or in a further diagram we can denote that, from the outside, we're covering the use cases of private and encrypted objects, permission-controlled objects, and completely public ones. So even though the file or document is public from the outside, by the time it filters down into the green, into the encrypted data vault, it gets encrypted. By the time it hits raw byte storage, everything is encrypted. But to the outside, you can still have public-facing documents. Adrian, go ahead.
So from a privacy perspective, having the authorization server, which is the policy decision point, which means that it has policies in it, and plaintext in the same layer and in the same governance entity, the same business, whatever you want to call it, is a huge problem. And it's unnecessary to introduce this privacy issue, because just because I have certain policies doesn't mean that I want the entity that has access to my plaintext data to do that. That's the whole point of separating the policy decision point from the policy enforcement point.
Understood. We completely agree, and your concern is very understandable.
This is more the fault of the artist than the spec. It's not that everything up on the y-axis is plaintext and everything down is encrypted. No: any one, and likely all, of these components have access to encryption, including, of course, the authorization server. Part of the reason why I separated out the authorization server as a separate component is specifically to show that it can either be part of the hub, part of the stack, or it can be externalized. The user can bring their own authorization server. So it has nothing to do with whether it's encrypted or not. Of course it is. This is a separate component just to show that it's swappable. Does that make sense?
Well, it does. But it sort of begs the question of why we're doing this, in the broader sense. Why are we inventing this? Why are we talking about storage at all? But I don't want to distract us. That can be a topic for another day. I'm just lost from that perspective. But yes, you're right. You don't have to encrypt the data, but it most likely will be encrypted. I'm curious. No, I'm saying it most likely will not be encrypted. But again, we talked about that last week. I'll shut up.
No worries. I mean, obviously, if you have questions, we definitely want to make sure it's clear to you and to everybody else. We all want to be on the same page. Venu, I see you're next on the queue. I want to remind you that, hey, this is a collaborative diagram. Adrian, if you want to get on the queue, if you have concrete proposals, like move the authorization server box to the side or down below, we'd absolutely love to hear them. Venu, go ahead.
What confuses me is the difference I see between the green box and the blue box. I mean, I'm just using the box colors. Sure, sure. The green box is the EDV, or the encrypted data vault, which is a collection of layers or components or whatever you want to call it, and provides a specific set of services, right? All right. And it also contains all of those. I mean, when you say EDV, it means that it contains all those three layers that you have there. Whereas the identity hub feels like it is more of an abstraction over that stuff. Whatever it is, it's not a collection of components.
Yes, it is me looking at this.
Ah, no. Let me explain that. It's mostly because, when we were drawing the diagram, we were much more familiar with the EDV side than the identity hub side. So it is not as detailed. That's incidental, not by design.
Manu, go ahead.
Yeah, two things. One, I don't have edit access, and I want to modify the diagram to see if the modification would make more sense to Venu. And then, Adrian, I want to go back to some of the things that you're concerned about.
It's editable now.
Okay, great. Thank you. I probably have to reload. So Adrian, I want to go back to the things that you're concerned about, because I think that the goal here is to try and get the diagram into a state where you find it acceptable, not only you, but everyone else in the group. And I think what we're trying to do here is just iterate on what we have. So let me try one iteration here. This is really for what Venu brought up. So bear with me. Would something like that make more sense to you, where even
the client is part of the hub? That's what you're saying.
So this means it's a client that can be used, right? In theory, and I'm going to make a temporary edit and then delete it immediately, we could say that an EDV client could also go in the wallet. And the wallet, I don't know how to draw
this, but the wallet, you know, the wallet could actually talk to
This makes more sense. I mean, it's the same. I think somebody said earlier (sorry, I didn't get the names) that basically a lot of functionality and a lot of transformation happens in the client, like encryption, and even the authorization may happen in the client. Yes. If you're embedding the client into each of those components, it makes more sense to some extent. The wallet has an embedded client because that is where the secure part of the secure data storage is done, I suppose.
You're absolutely right. It is done on the client. And I like this edit, Manu. It shows more clearly that the hub has an EDV client, which it uses to talk to the encrypted data vault layer. And you're right that both the KMS and the wallet are likely to have their own client.
Okay, so my question is: what would be lost if we took the authorization server, the PDP, and moved that completely out of the blue identity hub block, such that the requests for authorization that have to be matched against the policies are... yeah, just like that. Now.
Are you moving? I love it.
Yeah. What about the scope of the SDS work group would be harmed if we did that?
Okay, that is an excellent question. And I think both Daniel and Manu have had some thoughts on the matter. I want to add real quickly that I want to make sure it's clear that it can be outside. It can absolutely be outside. But also, for developer convenience, it can be packaged into the full identity hub stack. Daniel, Manu, do you want to?
Yeah, I mean, here's how I think of it, right? I'll just give you the top-level flow. If I want to interact with someone, let's say Alice to Bob, I would look Bob's DID up, and I would find a service endpoint called, you know, identity hub, whatever this thing is, application hub. I don't really care what the name is. It would be an array of multiple instances. So let's say there's one on Microsoft's cloud, and there's one at home where you have a static IP. Doesn't matter. They're all instances. They're all peers. It's very masterless. I can address any one of them, and I should get the same replicated thing. They all expose the same hub interface, because we want to be able to talk over one cohesive application-style interface. Within each of those instances, let's say there are two of them, they would have the EDV layer. And when I ping one, the difference between my thing at home and the one in the cloud is that one might have keys and wallets sitting right next to it that can actually decrypt certain things, right? The one that's remote, that is maybe run by, say, Microsoft, doesn't, because you don't want some remote party to have the ability to decrypt those things. But they both follow the same hub-level interfaces, so that when I address them as a developer, I know I'm speaking one language to all the instances, and they replicate between each other. So the hub is sort of a coordinator for making things application-usable. Whereas with the EDV itself, if I'm an app developer and I want to say, please give me your music playlists, I can't be like, please give me this random index number. That doesn't make any sense, right? You need something there to make sense of this.
That's what hubs do.
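Editor's sketch of the "same interface, different key custody" idea above, in Python. The names (`HubInstance`, `store`, `fetch`, `read_plaintext`) are hypothetical, the replication is a naive push to every peer rather than any protocol from the hub spec, and `toy_xor` is an insecure placeholder for real encryption. It only illustrates that every instance serves the same replicated ciphertext, while only instances that hold a key can turn it into plaintext.

```python
import hashlib


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (repeating SHA-256 pad). Illustration only, NOT secure."""
    pad = hashlib.sha256(key).digest()
    return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))


class HubInstance:
    def __init__(self, name, key=None):
        self.name = name
        self.key = key    # None for a remote instance that must not decrypt
        self.blobs = {}
        self.peers = []

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def store(self, object_id, blob, _seen=None):
        """Store ciphertext locally, then push it to every peer (masterless)."""
        seen = _seen if _seen is not None else set()
        if self.name in seen:
            return
        seen.add(self.name)
        self.blobs[object_id] = blob
        for peer in self.peers:
            peer.store(object_id, blob, seen)

    def fetch(self, object_id):
        """Identical on every instance: returns the replicated ciphertext."""
        return self.blobs[object_id]

    def read_plaintext(self, object_id):
        """Only instances that sit next to the user's keys can decrypt."""
        if self.key is None:
            raise PermissionError(f"{self.name} holds no key")
        return toy_xor(self.key, self.blobs[object_id])
```

Writing to either instance leaves both holding the same bytes, so a caller can address any of them; whether it gets plaintext depends on where the keys live, not on which interface it spoke.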
Again, does that answer your question?
Not really, but it gets closer to it, because Daniel introduces the idea of the service endpoint, which is the point I made in the other, in the DID core discussion: the privacy engineering point that, at least for peer DIDs, how you use the service endpoints, whether they point to the storage or they point to the policy decision point, makes a big, big difference. So when Daniel describes his model, where you see a whole list of maybe replicated instances, I didn't quite follow why we want to do that. I think we need to be careful from a correlation and privacy engineering perspective. I'm not speaking from the developer perspective here; given my role, I'm not really concerned about that. I am concerned entirely about privacy. So what I'm saying is there will be many different types of service endpoints. Some of them will point to authorization servers, and some of them will point directly to, let's say, encrypted or unencrypted storage.
Right? Because there are going to be things that are unencrypted that don't belong in the DID document, and therefore end up behind a service endpoint. And there may be others; notification endpoints might be service endpoints. So, from a privacy perspective again, unless we have a clear idea of the kinds of service endpoints that we expect to put in the public DID document, which is going to be crawled and indexed by everybody, with or without traffic analysis on top of it, from a policy perspective I just can't get past that. So I'm saying I am totally happy with this conversation continuing if you just take your authorization server out of the blue box. And now we sort of have this idea that the identity hub has some access to metadata and unencrypted data,
and then magically, and I'm putting "magically" in quotes, whatever the authorization server has decided ends up at the policy enforcement point. But that's a technical detail. I'm saying "magically" because I don't necessarily care about that. But I do care about how we will use the service endpoints in the peer DID example.
Thank you. And we're definitely not showing service endpoints on this particular view. We should probably have another one that illustrates DID document service endpoints. So would this be a thinking model for moving out the authorization server? Adrian, do you feel better about this arrangement?
Well, yes, because the authorization client is then just the policy enforcement point. And so if you, for technical reasons, decided to take the encryption and metadata indexing things and separate them out, then in a sense what you're doing is separating out the policy enforcement point into this thing called the authorization client. I don't have any objection to that at all. My perspective is only about where the requesting party issues that access request. As long as the requesting party goes to the authorization server, rather than to the identity hub or the storage layer, I'm happy. In other words, I don't want to leak any information, any traffic analysis or any kind of policy-inference-based things. I don't want to leak that anywhere outside of the policy decision point. It needs to be at the policy decision point. So data minimization, if you will, needs to make sure that only capabilities or authorizations, however we encode them, leave the policy decision point, and under no circumstances should a request ever go anywhere else, because that's just unnecessary.
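Editor's sketch of the separation argued for above, in Python. It is a minimal model under stated assumptions, not a proposal from the call: the class names are hypothetical, the token is a bare HMAC over a JSON grant, and the decision point and enforcement point share one symmetric key purely to keep the example short. The property it illustrates is that policies live only in the decision point, and only a signed authorization ever leaves it.

```python
import hashlib
import hmac
import json


class AuthorizationServer:
    """Policy decision point (PDP): the only place the policies live."""

    def __init__(self, signing_key, policies):
        self._key = signing_key
        self._policies = policies  # {(requester, resource): {allowed actions}}

    def request_access(self, requester, resource, action):
        if action not in self._policies.get((requester, resource), set()):
            return None  # denied; nothing about the policy is revealed
        grant = json.dumps(
            {"sub": requester, "res": resource, "act": action}, sort_keys=True
        ).encode()
        tag = hmac.new(self._key, grant, hashlib.sha256).hexdigest()
        return grant, tag


class EnforcementPoint:
    """Policy enforcement point (PEP): holds data, checks tokens, knows no policy."""

    def __init__(self, verify_key, storage):
        self._key = verify_key
        self._storage = storage

    def read(self, resource, token):
        if token is None:
            raise PermissionError("no authorization presented")
        grant, tag = token
        expected = hmac.new(self._key, grant, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("authorization not issued by the PDP")
        claims = json.loads(grant)
        if claims["res"] != resource or claims["act"] != "read":
            raise PermissionError("authorization does not cover this request")
        return self._storage[resource]
```

A requesting party in this model calls `request_access` first and then presents the result to `read`; the enforcement point learns which grants exist but never sees why they were issued.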
Thank you, Adrian. Privacy-wise, got it. Venu, go ahead, you're on the queue.
So, listening to what Daniel said, I'm asking if the wallet is a client of the hub or a client of the EDV. Because if you are thinking of a hub as kind of a storage, whether it's local or remote storage, for the identity, and if I want to access somebody else's data, right, and the data is shared specifically to me, then that data has to go through my wallet to be able to be decrypted. Or at least that's the way I think about it. So in this picture, again, the wallet probably should be a client of the hub, not of
the EDV. Or at least in the current picture, it is both. So the arrow between the wallet and identity hub indicates the wallet is a client of the hub, and it also contains an encrypted data vault client for storing. Manu, you're on the queue.
Yeah, just to be clear, what I'm doing is moving things around until people stop yelling. That doesn't mean I know what I'm doing. I'm thoroughly confused about, you know, what goes in the authorization server and what the authorization client is requesting from the server. Maybe it's like a token of some kind, and then it gives the hub the ability to pass the token off to the EDV, to pass to the authorization layer so the EDV can be accessed. That is not an answered question in my head. But, you know, Adrian seems to be more or less happy with the diagram. So I think the next part of the discussion is going to have to get to: okay, now that the boxes are in the right places, what does that actually mean? So if anyone else understands what's going on better than I do, please take over drawing the diagram.
Because I'm still unsure. Thank you, and it is definitely an area for further discussion. Tobias, Orie, do you have comments?
Yes, I just put myself on the queue. So perhaps the language around "authorization server" could be adjusted to accommodate some of the other modes of access. Essentially you have a component, you have a client, that somehow needs to get some token or capability of authorization. One way that is done today, as you know, is that a website or a client relying party conducts an OAuth 2 flow and gets an access token out of that. That's a protocol for obtaining delegated access. One of the other patterns that I think is relatively common is using CHAPI to request access from someone's wallet. So I wonder whether we could be more inclusive with that language. "Authorization server" is very OAuth 2 terminology. Maybe we could tweak that language so it means authorization server and is inclusive of an OAuth 2 flow, but also doesn't have to exclude flows like those involving CHAPI. Maybe that would sit, you know, with a logical
Thank you. It's a good point.
I don't know what that term is, by the way, but I was just pointing out that maybe that's the term that needs to be tweaked.
Understood. So I see Adrian on the queue. I want to be mindful of the time. I
want to leave some time for the data shards discussion. Hi, um, any questions about
Actually, you know, it might just make sense to continue this discussion and postpone data shards for another day. Okay.
But we do want to say that the data shards spec essentially zooms in on this Layer B, and replaces or complements part of this architecture. It's not concerned with the more abstract stuff higher up.
You know, we could ask other people and I don't want to speak for the whole group. I'm just saying we I'm okay with moving that, Chris. I don't know what Chris thinks or anyone else. Chris, would you?
What do you think I'm excited about talking about it if we can have sufficient time to talk about it on this call?
All right. So, Adrian, go ahead. I was just going to respond to that last point. One of the things that does concern me from an engineering point of view is if somebody goes to the identity hub or to the storage first, because, for whatever reason, they picked that service endpoint, or they found it in an index somewhere, in a directory or a registry, and they are forced to go to the authorization server before they can get the token. Right? In other words, they showed up at the identity hub rather than at the authorization server, which is actually the UMA flow, the traditional flow. It is not the transaction authorization flow, which is an authorization-server-first flow. That is inefficient, and that point is obvious. What I want to say is that from a privacy perspective, that inefficiency is intentional. In other words, we can't keep people from making requests to something other than the authorization server, but we can force them to go to the authorization server, so that the policies can be kept in a separate layer and a separate place from the people that actually have the enforcement capability and the data. So I'll just leave it there. Sure. Yeah.
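Editor's sketch of the resource-first flow just described, in Python. It is loosely modeled on the shape of that flow and makes no claim to follow the UMA specification; `NeedsAuthorization`, `Hub.get`, `fetch`, and the `https://as.example` address are all hypothetical. The hub refuses an unauthorized request and only says where to go, the caller obtains a token from the authorization server, and then retries.

```python
class NeedsAuthorization(Exception):
    """Raised by the hub to refer the caller to the authorization server."""

    def __init__(self, authorization_server):
        super().__init__(f"obtain a token from {authorization_server}")
        self.authorization_server = authorization_server


class Hub:
    def __init__(self, authorization_server, valid_tokens, data):
        self.authorization_server = authorization_server
        self.valid_tokens = valid_tokens  # stand-in for real token verification
        self.data = data

    def get(self, resource, token=None):
        # The hub never evaluates policy; it only checks for a valid token.
        if token not in self.valid_tokens:
            raise NeedsAuthorization(self.authorization_server)
        return self.data[resource]


def fetch(hub, resource, get_token):
    """Resource-first flow: try the hub, follow the referral, retry once."""
    try:
        return hub.get(resource)
    except NeedsAuthorization as referral:
        token = get_token(referral.authorization_server, resource)
        return hub.get(resource, token)
```

The extra round trip is the inefficiency mentioned above; in this sketch it is what keeps the policy evaluation (inside `get_token`) away from the component that holds the data.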
Okay. Let's pause there. And I encourage you very much to bring up this point on
the issues, as always. And in fact,
towards the end of the call, we want to discuss what are some good next steps with this diagram. Do we want to continue to let people edit it? Do we want to make a possible PR to the spec? But okay, Daniel, do you have a quick comment?
Yeah, two things, and to address something Adrian's saying, and I'll make it quick. I think there should be another diagram. The next diagram should show exactly this, but it should show three instances: one on a user's local device, and how that has keys; one on, like, maybe an endpoint that's at their house, which has keys; and then a remote data store that doesn't have keys. But they're all still three instances of their thing. That would be awesome for me. I'll help work on it. The other thing I want to say is, Adrian, if you think about this as a thing that could hold data from across a huge spectrum of use cases, right, the hub plus EDV together, you have to think about the fact that it's going to try and replace things of the current day, right? Like blog posts, or a decentralized Twitter. I'm working on Bluesky at Twitter, right? It would be weird if, to read someone's public Twitter feed, you had to go to, like, an OAuth authorization server and ping it. It's a public Twitter feed. It should just be like I'm visiting a webpage, right? Webpages don't make me go do strange flows just so I can see public stuff. So I think we have to keep that in mind. And we can't force that model, because that model is only attuned for certain verticals and use cases, and the vast majority of the web does not function that way, on purpose, right? Thank you.
Yes. It's called surveillance capitalism. I'll shut up about that.
Thank you. Okay, so we've got 15 minutes. Serge and Chris, is that sufficient to start the Datashards discussion? Or do we want to say thank
you? I think you'd go Chris, because you talk to me.
Okay. Yeah, I think it's sufficient if we do part one of two, basically, or at least part one of n. I think we can do a high-level overview, if there's enough time and if people are interested in that. But it's just going to be a zoomed-out view of things and not a zoomed-in one. Would that be good?
I certainly think that could be helpful. Manu, do you have a comment?
Yeah, just a general one. I think we need more time for Datashards. I think it's perfect to do a high-level intro in the next 15 minutes, and I'd like to spend the majority of the next call's time diving in. Just one opinion. Perfect. Likewise.
I put some stuff on the mailing list, and I'm going to link it here. Serge has done some diagrams of the more low-level stuff, so I just linked a high-level one. It's ASCII art, go figure, because it's me, and I threw it together very quickly. There are really two layers, so we can see two major sections here, right? There's a thing called MDSC and a thing called IDSC. IDSC stands for immutable Datashards, and MDSC is for mutable Datashards. That means IDSC is for documents that don't change, and MDSC is for documents that do change, and it provides a way of basically giving revisions. So, for example, Bob takes a wonderful photo of Alice's cat. Bob wants to be able to share that with Alice, et cetera. After Bob does the upload, he gets back an IDSC URI, which is a capability. You can see at the bottom it says entry chunk, and data chunk, data chunk, with these kind of locks around them. The IDSC URI says, here's where you get the entry chunk, and here's the key to start decrypting this stuff. And then that entry chunk, if it has more chunks that it needs to point to, can basically be read. What's nice here is that you basically get just one URI that you can share around, and that URI gives you sufficient access to be able to read it, even though the underlying chunks, the encrypted, say, 32-kilobyte chunks that we pass around, might be reached via a URL or a hash link or whatever. Those can be stored on some sort of abstract store. So at the lowest layer of the IDSC diagram, you see a thing that says client store and maybe a server store. For example, the simplest store you could possibly do is just a local file system or a local in-process memory store, right? You're storing chunks in there. You don't need to go talk to any remote server.
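A minimal Python sketch of the immutable case just described may help: data is cut into 32-kilobyte chunks, each chunk is encrypted and put in a store under its hash, and the writer gets back a single URI naming the entry chunk and carrying the key. Everything here is an assumption made for illustration: the names `MemoryStore`, `write_immutable`, `read_immutable`, the `toy-idsc:` URI layout, and above all `toy_cipher`, which is a hash-based XOR stand-in and not the encryption Datashards actually uses.

```python
import hashlib
import secrets

CHUNK_SIZE = 32 * 1024  # the "say, 32 kilobyte" chunks from the discussion


class MemoryStore:
    """Simplest possible store: an in-process dict keyed by chunk hash."""

    def __init__(self):
        self._chunks = {}

    def put(self, chunk: bytes) -> str:
        address = hashlib.sha256(chunk).hexdigest()
        self._chunks[address] = chunk
        return address

    def get(self, address: str) -> bytes:
        return self._chunks[address]


def toy_cipher(key: bytes, nonce: int, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Illustration only, NOT secure crypto."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block_input = key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream.extend(hashlib.sha256(block_input).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def write_immutable(store, data: bytes) -> str:
    """Encrypt and store the chunks, then return one shareable URI."""
    key = secrets.token_bytes(32)
    addresses = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        # Nonce 0 is reserved for the entry chunk, so data chunks start at 1.
        nonce = offset // CHUNK_SIZE + 1
        addresses.append(store.put(toy_cipher(key, nonce, chunk)))
    # The entry chunk lists where the data chunks live; it is encrypted too.
    entry = "\n".join(addresses).encode()
    entry_address = store.put(toy_cipher(key, 0, entry))
    return f"toy-idsc:{entry_address}:{key.hex()}"


def read_immutable(store, uri: str) -> bytes:
    """Follow the URI to the entry chunk, then decrypt every data chunk."""
    _, entry_address, key_hex = uri.split(":")
    key = bytes.fromhex(key_hex)
    entry = toy_cipher(key, 0, store.get(entry_address)).decode()
    addresses = entry.split("\n") if entry else []
    return b"".join(
        toy_cipher(key, n + 1, store.get(address))
        for n, address in enumerate(addresses)
    )
```

In this sketch the store only ever sees ciphertext and hashes; whoever holds the URI holds both the location of the entry chunk and the key, which is what makes the URI itself the capability.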
But oftentimes, you will have some sort of store that interfaces with some remote store, like some sort of web service, or, you know, a global DHT. It could be an encrypted data vault that provides specific authorization. So you would actually instantiate this store that fulfills a store interface, which basically has get a chunk, put a chunk, right? Presumably you at least have get. And that could be, again, reading or writing local files that are just, you know, written by their hash. Or it could be something that ends up
talking to something, why don't
we share the top, Chris, because you're focusing way down at the bottom, and we're going to want to
focus. I'm almost there. So why don't I at least stay where I am, and then I'm going to move over there. I'm very close. So the important thing to understand here is that in this first scenario, Bob is taking the picture of Alice's cat and only wants Bob, Alice, and maybe Alice's friends to be able to see it. So even though there are these other stores that are holding the chunks, they're not actually aware of what the content is. But this first example is just an IDSC. It's not something that's updatable, right? So if you hand an IDSC URI to the client, it's able to talk to the stores and make sense of this. But we also aren't specifying whether a store is a local USB key, or an encrypted data vault, or whatever. So now if we go to the top, to the MDSC, what we're doing here is adding the ability to have multiple revisions. And what's important here is that with an IDSC document, there's just one URI for it, right? You're only going to have one URI for that thing. But for the MDSC document, there are actually three separate capability URIs. There's the read-write-verify capability. So if Alice is updating some sort of document that she's writing to, she's going to have read-write-verify, right? But if she's going to give Bob the ability to read it, she'll specifically attenuate it to just a read-verify capability. So now Bob can see what the latest updates to this document are, but Bob can't make changes. And then there's also a verify-only capability. The reason that we have a verify-only capability is that, you'll notice, the same way that IDSC has the idea of stores with chunks of data, MDSC has the idea of registries. The registries keep track of updates, and a registry can see, oh yes, this MDSC does have a new update, but it doesn't know what that update means. It can't read it. It can't make updates itself. If it tried sending to an MDSC client
"Oh, yeah, actually, haha, the update is this other thing," it actually can't do that, because the client will explode and say, nope, nope, I can tell that that wasn't a valid update. So all the registries are able to do is verify: okay, yes, somebody told me this was the newest update, it checks out, and I'm going to share that the next time somebody asks me what the latest updates are. So via some magic stuff that we borrowed entirely from Tahoe-LAFS, we basically take this approach to each one of these certificates. If you have the read-verify capability, or of course the read-write-verify capability, you'll see at the bottom of these certificates there's a thing that says EC, right? It's the encrypted location of what the actual thing is. So the thing is that each one of these certificates actually points at an IDSC, except it's hidden in plain sight. It's hidden and encrypted on the certificate, and only if you have access to a capability that is read-capable are you able to actually determine it. And that's why the registries are able to verify that updates are correct without actually knowing what they are. And that's the general overview of what Datashards does. So if we were going to look at Datashards from a layer perspective, we'd say that at the very bottom, you've got some sort of store that's storing chunks, and then IDSC clients that can either write out new immutable Datashards documents to those stores, or be given an IDSC URI and read them from those stores. And then at the top, we have these client registries, which are able to keep track of updates to MDSC documents but don't actually know what they are. And then we have clients that are able to create these new mutable Datashards documents, write updates to them, and read them.
But some of the clients may have access only to reading, and some of them may have access to reading and writing. It's just a URI that you share with somebody that actually gives them access to those. And there's some extra stuff inside the specification that explains how to point at specific revisions, the same way that you might point at a specific Git commit. That's the very high-level view of Datashards, given in a ridiculously short amount of time. I hope it was helpful.
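The attenuation from read-write-verify down to read-verify and verify-only, and the location that sits encrypted on each certificate, can be sketched in Python as below. This is not the Datashards scheme: it is a toy in the general spirit of Tahoe-LAFS-style one-way key derivation, and the names `Capability`, `Certificate`, `attenuate_to_read`, `attenuate_to_verify`, `make_certificate`, and `read_location` are all hypothetical. The signatures that let a registry or client reject forged updates are left out entirely, and the XOR cipher is not secure.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def derive(label: bytes, secret: bytes) -> bytes:
    """One-way derivation: the output reveals nothing usable about the input."""
    return hashlib.sha256(label + b":" + secret).digest()


@dataclass(frozen=True)
class Capability:
    write_key: Optional[bytes]  # present only in read-write-verify
    read_key: Optional[bytes]   # present in read-write-verify and read-verify
    verify_key: bytes           # present in all three


def new_read_write_verify(write_key: bytes) -> Capability:
    read_key = derive(b"read", write_key)
    return Capability(write_key, read_key, derive(b"verify", read_key))


def attenuate_to_read(cap: Capability) -> Capability:
    if cap.read_key is None:
        raise PermissionError("cannot derive read from verify-only")
    return Capability(None, cap.read_key, cap.verify_key)


def attenuate_to_verify(cap: Capability) -> Capability:
    return Capability(None, None, cap.verify_key)


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Hash-based XOR keystream. Illustration only, NOT secure crypto."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass(frozen=True)
class Certificate:
    revision: int
    encrypted_location: bytes  # where the immutable document for this revision lives


def make_certificate(cap: Capability, revision: int, location: str) -> Certificate:
    # A real design would sign here with a key only writers hold; this
    # in-process check merely stands in for that and enforces nothing.
    if cap.write_key is None:
        raise PermissionError("write capability required")
    key = derive(b"rev%d" % revision, cap.read_key)
    return Certificate(revision, toy_xor(key, location.encode()))


def read_location(cap: Capability, cert: Certificate) -> str:
    if cap.read_key is None:
        raise PermissionError("read capability required")
    key = derive(b"rev%d" % cert.revision, cap.read_key)
    return toy_xor(key, cert.encrypted_location).decode()
```

In the terms of the discussion, Alice would hold the result of `new_read_write_verify`, hand Bob `attenuate_to_read(alice)`, and a registry would hold only `attenuate_to_verify(bob)`. The registry can recognize which document a certificate belongs to, but because the derivation runs one way it cannot recover the read key and so cannot learn the location.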
Thank you so much.
I'm gonna go ahead.
So Chris, the server registry stuff and the server store stuff: my expectation is that this is, you know, generic, generalized, it can be anything. But in reality, what's going to be the first implementation of a server registry, and what's going to be the first implementation of a server store?
So we already have multiple implementations of stores, and different kinds of stores. We have one that's a web server, which is really simple, one that's a file system, and one that's just in memory. But of course, these are just toy versions, right? And for the registry, we have an example web server. Now, what I'm sure this group is interested in is, gosh, how do we add authorization to these? Right? That's the key part that really matters to the encrypted data vaults part. Because what actually happens here is that the stores and the registries describe an abstract interface, and anything that implements that interface, with the appropriate methods, you can end up using. So for the case of encrypted data vaults, let's say that we already have a zcap-ld way of handling authorization, or whatever the authorization method is. Really what you would do is instantiate the store, and instantiate the registry, as client-side objects with the appropriate material that lets you access them, right? So in the case of zcap-ld, you'd hand it the certificate and you'd hand it the key to be able to access that inside of the client to instantiate the thing. And then you could talk to it. So we already have web-based versions that work, and also other toy ones, file-based and in-memory and stuff like that. But you can absolutely compose them with any authorization system, including the ones that are being planned for the encrypted data vaults. Does that answer your question?
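As a rough Python illustration of instantiating a client-side store object with the material that lets it reach a protected backend: `GuardedService` and `AuthorizedStore` are made-up names, and the shared bearer token is only a stand-in for whatever the real authorization method would be (it is not zcap-ld). The point of the sketch is that the client-side object still exposes the same put-a-chunk, get-a-chunk interface as a plain local store.

```python
import hashlib
import hmac


class MemoryStore:
    """Plain store with the abstract interface: put a chunk, get a chunk."""

    def __init__(self):
        self._chunks = {}

    def put(self, chunk: bytes) -> str:
        address = hashlib.sha256(chunk).hexdigest()
        self._chunks[address] = chunk
        return address

    def get(self, address: str) -> bytes:
        return self._chunks[address]


class GuardedService:
    """Hypothetical remote service that checks a credential on every call."""

    def __init__(self, backend, accepted_token: bytes):
        self._backend = backend
        self._accepted_token = accepted_token

    def _check(self, token: bytes) -> None:
        # Constant-time comparison of the presented token.
        if not hmac.compare_digest(token, self._accepted_token):
            raise PermissionError("credential rejected")

    def put(self, token: bytes, chunk: bytes) -> str:
        self._check(token)
        return self._backend.put(chunk)

    def get(self, token: bytes, address: str) -> bytes:
        self._check(token)
        return self._backend.get(address)


class AuthorizedStore:
    """Client-side store object, instantiated with the access material."""

    def __init__(self, service: GuardedService, token: bytes):
        self._service = service
        self._token = token

    def put(self, chunk: bytes) -> str:
        return self._service.put(self._token, chunk)

    def get(self, address: str) -> bytes:
        return self._service.get(self._token, address)
```

Because `AuthorizedStore` and `MemoryStore` share the same two methods, code written against the store interface would not need to know which one it was given.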
Yes, but it raises another question around the EDV interface, but I'm going to leave that for another
day. Thank you. Thank you, Dan.
Just quickly. So, Chris, if I remember correctly, there are two reasons why you're proposing Datashards. Number one is so that in the case where the stored data is encrypted, the storage provider would not be liable for it, or would not be able to see it. And the other case is to avoid censorship. Right? Those are, if I remember correctly, the two reasons behind this whole architecture.
And one more would be privacy as well. Right. So, yeah: liability of the providers, censorship, and privacy.
Yes, well, privacy is a given. And actually, that's the nature of my question. So is it fair to then say that, given those two or three things, you agree with the idea that the policy decisions, however they're made, whatever the protocols there are, are not part of any of the things that you're talking about?
That's right. We intentionally designed it so that you can add those policy decisions at another layer. The registry and store systems were designed so that you can actually plug in the policy decisions that are appropriate to your use case.
And that's what I wanted. I wanted to get it into the minutes that these two or three things that you and I just both said are an important part of the overall privacy engineering of what we're doing, and they go hand in hand with moving the policy decision point elsewhere. That's it.
Thank you. We have a couple more minutes. So does anybody else have questions?
All right, well, we'll continue an in-depth discussion of Datashards on our next call, as well as issue review, if we have time.
In case it's not clear, one thing: in relation to the previous diagram, right around here would be the line between layer A and layer B, conceptually, right? So the encrypted chunks that the IDSC client provides will be handed off to the store, at which point they essentially enter layer A, the storage.
Yep, I think that's right. I think there's one more question, which is whether or not the certificate layer is similar enough. These certificates work a little bit differently from the way they work in zcap-ld, but the question is whether or not those certificate things might actually also fit in layer A as well. Interesting.
Okay. All right. Well, we will continue the discussion in issues and on the next call. By the way, Christopher, fabulous ASCII art skills. Thank you. Thank you.
All right, any other last minute questions?
Thank you all. Thanks.
All right. Bye