Secure Data Storage - WG
7:49PM Jul 23, 2020
All right, it is two minutes past the hour. Welcome, everybody, to the weekly Secure Data Storage Working Group call. I will have the agenda link in chat shortly. Over to Kalia.
Okay. Welcome. So this is an IPR protected call. If you want to make substantive contributions, please make sure you go sign the agreement. And actually next week, we should review the list of who signed and staff because there were some folks who hadn't signed it.
And then I guess we're, oh, let's do introductions. Is there anyone new on the call who'd like to introduce themselves?
Sure. This is Phil along from the p3 group. First time I've been on this call, but I've heard a lot about you.
John. I'm here at the invitation of Dimitri. I worked for Inrupt; now I'm working on a new project around data interoperability at shaperepo.com. So I actually have an initial version of the product at shaperepo.com. But I want to know what's happening in the world outside of Solid when it comes to projects that require interoperability. So I'm going to be hanging back and taking a listen.
All right, welcome.
Thanks. And then folks who've been on the call before if someone would like to reintroduce themselves.
Yeah, I can reintroduce myself. I've been here on a couple of calls, and Jace from Bloom filled in for me on one of them. I'm still trying to figure out how to actually get those signed, because I believe Bloom is part of DIF. I reached out via email; I just haven't heard back. So if anyone could help me on that, that'd be great. But I'm mostly just listening. We have a secure data vault implementation that came out open source prior to any spec being finalized, so I'm trying to see what needs to be changed on our end to be compliant, and just kind of following the space.
Great. Thanks so much. Do you
mind including a link to your implementation? I think a lot of people would be interested in checking it out if it's
posted in chat. Oh,
okay. And then if you can go over to the agenda and put your name in the attendees list, that would be great. And before we get into the main part of our call today, which is hearing from Jonathan Holt about IPLD, I just wanted to bring up this point that's been raised about the recordings and the voiceprints. I posted on the issue around this in GitHub, and we got a response back from the company. The voiceprints for this call are only within the organization; they're not in some global voiceprint database. So if we want to have a discussion about that, and whether we want to keep going with the way things are, we can have that another week. I think we should just go ahead with the call today, and have a deeper discussion later about whether we want to switch back to scribing or continue the way we're going. So,
so everybody, please take a look on page 243 in preparation
Yeah. Great. Thanks. And then I think, um, is there anything else we should cover? Dimitri, before we go on to hear from Jonathan.
So for people who are joining here for the first time, the story so far is that we're working on the secure data storage spec. There is rough consensus within the group that it should be done as a series of layers. We've had some architectural discussion on what those layers are. The two major pieces are the low-level encrypted data vaults and the higher-level hub or secure data store; names are still in flux. And over the last couple of calls, we've had presentations by similar adjacent projects, such as IPFS and Datashards, who have existing prior art on various layers. For example, Datashards is very, very similar to what we're aiming for in layers A and B. IPFS is in a similar area, though a little more cross-layer. So on this call we're continuing that trend and would like to hear from our very own Jonathan Holt about IPLD.
Cool. So let me just share my screen.
Now I can't see everyone. That's strange. Is that new? Yeah.
Yeah, I ran into that too. super annoying. Okay,
so you can see my screen
You'd have to just put your slides in your presentation, like open up, like in your presentation software, just open the presentation window and kind of click on them and then you can see both but we see the sidebar.
Yeah, that's fine. Okay, so last week we had a general overview of IPFS, and I want to dive a little bit deeper into IPLD and find out if there's a good fit for one or more of the layers that we're building upon. So I'm Jonathan Holt. I am a founder of TranSendX, a healthcare interoperability company, and I'm also in the role of CMIO, the Chief Medical Informatics Officer, for ConsenSys Health. I am a physician by training, triple board certified in internal medicine, genetics, and informatics. My background in informatics is mostly in bioinformatics, writing algorithms for species identification and barcoding, and then in healthcare interoperability, in HL7 and the Clinical Genomics Working Group, in standards organizations for representing genomes in FHIR and HL7. So, last week we heard about IPFS, and just a high-level overview of IPLD and how it's used as a meta-format for encoding and decoding Merkle-linked data structures, including things like IPFS, obviously, and Ethereum, Bitcoin, and Git. In contrast to other ways of linking data, where there is an authority, IPFS, and IPLD in particular, links data through immutable hashes that represent pointers to that content, no matter where it is. There is no authority. So let's talk about IPLD. Really, the concept is linking content, wherever it is, in a distributed data store, on your local device, on Mars, through the hash of the content. So it's really a cryptographic key-value pair. It's censorship-proof by design and future-proof by design. And it has a self-describing payload: by prepending significant bytes, it gives some semantic optics into the meaning of the data, and that is inherent in the link. This allows for semantic interoperability. So speaking about semantics, I'm sure you've all heard about JSON-LD.
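The "cryptographic key-value pair" idea described above can be sketched in a few lines of Python. This is a minimal illustration, not IPLD itself: the `ContentStore` class and its method names are hypothetical, and a plain hex SHA-256 digest stands in for a real content identifier.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the value itself."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is the hash of the content, not a name chosen by an authority.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Anyone holding the key can check that the returned bytes match it.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return data


store = ContentStore()
key = store.put(b"hello vault")
```

Because the key is computed from the bytes, storing the same content twice yields the same key, and a reader never has to trust whoever served the block.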
So JSON-LD is a lightweight, RDF-like way of describing the data attributes, the left-hand side of items in JSON, and it really gives you great context for meaning, so we're on the same page as far as what this string representation of "comment" means. And it describes a vocabulary; here it's schema, with a comment. But it really doesn't have much context for the meaning of the stringified value. Here, it obviously seems to be a hexadecimal representation, because I put that 0x in front of it, but we don't really know what that is; it's just a string blob representation. In contrast, IPLD gives us more meaning: if you want to do JSON-LD on top of IPLD, it gives you more of the left-hand side, but it also gives you a peek into what is on the right-hand side. In this case, this is a hash link, and I'll explain that more, but this big long string is a representation of a CID, a content identifier, which is a hash of the content and a pointer to what that item decodes to. So the real problem that this solves is that many of the data structures in this Web3 world are directed acyclic graphs. You can think of Bitcoin and Ethereum nodes, which have these hash-linked lists, where the previous field points to a hash of the previous block,
or the Merkle DAG structure of transactions in Bitcoin blocks. And so the idea of IPLD is that we're really entering this Merkle forest of multiple directed acyclic graphs that live in a decentralized world, and that there is an ability to link into them by using specific formatting. In this case, it's all done by multiformats, which includes multicodec, multibase, and multihash, which I'll explain more in a minute. But it really defines the linking of this data using these cryptographic hashes. And importantly, there's no authority. You can go through a gateway to resolve this, but you can host your own gateway, and you should get the exact same content, because the content is addressed by the hash, not by who says so. You're all probably familiar with the IP protocol as being a narrow-waist or thin-waist protocol: if you speak IP, you can speak so many different things on higher or lower levels of the stack that interface through IP. So what IPLD attempts to do is to be the thin-waist protocol that links multiple directed acyclic graphs, to be the link across the Merkle forest and different systems. The identifier for IPLD is simply the hash of the content. It is not the location or the protocol that you're using. It's not really about dereferencing or resolving or retrieving the resource, although that is really helpful, and whole systems, libp2p and Bitswap and IPFS, help do all that. But IPLD stands on its own, and in my mind it's really a state of being; it just is. You can represent it in a relational database, you can store it someplace, and you can have your own resolvers. But IPLD is just on its own. It's about the content and how you link that content, not about resolving, retrieving, or storing content, although that is an important and useful aspect of this.
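The hash-linked structure described above, where each block's previous field holds the hash of the block before it, can be sketched as follows. This is an illustration under stated assumptions: sorted-key JSON stands in for a deterministic codec such as DAG-CBOR, a hex SHA-256 digest stands in for a CID, and the function names are hypothetical.

```python
import hashlib
import json


def address(block: dict) -> str:
    # Sorted keys and fixed separators give a repeatable byte encoding
    # (a stand-in for a deterministic codec).
    encoded = json.dumps(block, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append(chain: dict, head, payload: str) -> str:
    # Each new block links to its predecessor by hash, not by location.
    block = {"previous": head, "payload": payload}
    key = address(block)
    chain[key] = block
    return key


def walk(chain: dict, head: str) -> list:
    # Follow the links back to the first block, re-checking every hash on the way.
    payloads = []
    while head is not None:
        block = chain[head]
        if address(block) != head:
            raise ValueError("block does not match the hash that links to it")
        payloads.append(block["payload"])
        head = block["previous"]
    return payloads


chain = {}
head = None
for item in ["genesis", "second", "third"]:
    head = append(chain, head, item)
history = walk(chain, head)
```

Since every link is re-verified during the walk, it does not matter which gateway or store served the blocks; altered content fails the hash check.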
So the components of IPLD are the CID, which I alluded to before but will explain a little more deeply. There is a data model for storing and resolving this, including DAG-CBOR, DAG-JSON, and YAML representations for storage on disk. And it facilitates deterministic serialization formats to get in and out of different encodings. For instance, when representing it in a DAG-CBOR structure, there's a deterministic serialization algorithm, so that you can parse it and get the exact same structure out of it, provided you have some constraints. There are also concepts of IPLD selectors for selecting subgraphs, and also transformations of IPLD into other formats. So here's an example of a CID in IPLD. Here's this big long string, but when you break it down, it is a self-describing identifier. This "bafy", the first four letters, represents, in this case, the multibase being used; this is base32 encoding. Then the version of this data model; in this case, this is CID version one. There is also a version zero; I believe we're still on version one. Then the multicodec, which describes how the content is formatted. In this case, the underlying format is CBOR, and a particular flavor of CBOR, which is DAG-CBOR. So it's represented in a deterministic format with all other tags stripped away except for tag 42, which I'll explain more in a minute. And then the hash algorithm that was used to hash the data; in this case it's SHA-256, and it should be 256 bits long. And then finally the hash, which in this instance is represented as hexadecimal, I believe. And all together, that is what this base32 CID, the string that starts with "bafy", represents. So, here's an example of peeking into a DAG-CBOR structure using, I think, the IPLD Explorer.
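The CID breakdown described above can be sketched with only the Python standard library. This is an illustration rather than a full implementation: it handles only single-byte varints, SHA-256, and base32, and the function names are hypothetical. The numeric codes (0x71 for dag-cbor, 0x12 for sha2-256) come from the multicodec table, and the leading "b" is the multibase prefix for lowercase base32.

```python
import base64
import hashlib

DAG_CBOR = 0x71   # multicodec code for dag-cbor
SHA2_256 = 0x12   # multihash code for sha2-256


def make_cid_v1(block: bytes, codec: int = DAG_CBOR) -> str:
    digest = hashlib.sha256(block).digest()
    # <version><codec><hash function><digest length><digest>; every value here
    # is below 0x80, so each varint fits in a single byte.
    raw = bytes([0x01, codec, SHA2_256, len(digest)]) + digest
    body = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + body  # "b" marks lowercase base32 without padding


def parse_cid_v1(cid: str) -> dict:
    if not cid.startswith("b"):
        raise ValueError("this sketch only understands base32 CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that was stripped
    raw = base64.b32decode(body)
    hash_len = raw[3]
    return {
        "multibase": "base32",
        "version": raw[0],
        "codec": raw[1],
        "hash_function": raw[2],
        "hash_length_bits": hash_len * 8,
        "digest": raw[4:4 + hash_len],
    }


cid = make_cid_v1(b"example block")
parts = parse_cid_v1(cid)
```

With these codes, the first four bytes are always 0x01 0x71 0x12 0x20, which is why base32 CIDs for DAG-CBOR content hashed with SHA-256 begin with "bafyrei".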
And it just gives you more of an example of the objects, in this case in a diagnostic mode for easy readability. It shows where there are links, which are multihashes, and opening up more of the data behind this, we can see that some of these are links, pointers that are multihashes that are DAG-CBOR. So here's an example of a link. In this case I'm overriding the JSON-LD context using a non-authoritative, self-describing CID. In that case, it was actually DAG-CBOR as the underlying representation, but it is serialized and encoded to JSON for easy readability. An alternative to this, which I think is where we're going next, after having registered IPFS and IPNS as IANA URI schemes, would be representing IPLD with an appropriate scheme name, for backwards compatibility with application/ld+json, where the context is a properly formatted URI. Again, this is a non-authoritative source, but it does have a hierarchy, which is why I believe the two slashes are
It's a self-describing authority. I'm not going to show any examples, just for time. So as I mentioned, IPLD as DAG-CBOR strips away all other tags except for tag 42. Tag 42 is registered with IANA as the IPLD CID, and so that tag is natively supported in CBOR. Most of you are probably familiar with my DID method, which is IPID, which basically cryptographically publishes the DID document to the IPNS namespace as IPLD. I think the older version of the Universal Resolver has this from what I presented originally at Rebooting the Web of Trust 5, the meeting in Boston, where I published it as just raw JSON, and now it's actually more extensible using IPLD. So I'll need to give an update to Markus for the resolver for my DID method. I'm not going to get into the selectors at all, but natively right now JSON paths are supported. So you can traverse the JSON representation using a path notation, including the typical JSON path of zero for the first index in an array, or key-based values. And it's probably important to note that in DAG-CBOR and CID paths, the reason the constraints prohibit numeric representations is because of JSON path selectors. But certainly, in formatting the DID methods for the CBOR representation, it would be convenient to have key representations in native CBOR, and to have those native mappings between, let's say, major type two and minus seven for Ed25519 elliptic curve signatures, so that those translate into a string representation from the IANA
tag registry into more human-readable terms, but you have the flexibility of the key as an int for compactness. And that's what I'm working on now, to update the examples in the DID Core CBOR section. So here's an example that came out of the paper that I wrote but never got a chance to present, since Rebooting the Web of Trust in Buenos Aires was cancelled. But this is the crux of the paper: the ability to have a COSE, CBOR-signed CID as the payload. There's a typo in there, but basically a CBOR-signed COSE is tag 98; you have protected and unprotected fields, but the payload would be a CID with tag 42, that is, a byte string with a CID as the payload. And here's just the example of the algorithm, with major type number one, minus seven, which is SHA-256 for ECDSA, and where the key ID is a DID with the whole key, including the key ID that comes from the DID document. And then of course the signature, and these are in hexadecimal format for easy readability, in CBOR diagnostic notation. So the idea of why IPLD might be a good fit is that it's natively supported in CBOR. And depending on what layer of the stack is representing the chunking of the data, it's really representing the metadata, both in a way that could be compatible with JSON-LD, and with more context for the payload, the right-hand side: what does this blob of stuff mean in the encrypted data? This would be a good mental model of how you drill down into describing the data and the chunks of data that may be in the encrypted data format on disk. And with that, I'll pause for questions.
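The "CID with tag 42 as a byte string" mentioned in the presentation can be shown at the byte level. The following is a hand-rolled sketch using only the Python standard library, not a general CBOR encoder: it assumes a CIDv1 with SHA-256 (so the byte string is short enough for a one-byte length), and the function names are hypothetical. In DAG-CBOR, the tagged byte string carries a 0x00 identity-multibase prefix followed by the binary CID.

```python
import hashlib


def binary_cid_v1(block: bytes, codec: int = 0x71) -> bytes:
    # <version 1><codec, dag-cbor by default><sha2-256><digest length><digest>
    digest = hashlib.sha256(block).digest()
    return bytes([0x01, codec, 0x12, len(digest)]) + digest


def cbor_tag42_link(cid: bytes) -> bytes:
    content = b"\x00" + cid  # identity multibase prefix, then the raw CID bytes
    if not 24 <= len(content) <= 255:
        raise ValueError("this sketch only handles one-byte lengths")
    tag = bytes([0xD8, 42])             # major type 6 (tag), tag number 42
    head = bytes([0x58, len(content)])  # major type 2 (byte string), one-byte length
    return tag + head + content


link = cbor_tag42_link(binary_cid_v1(b"example block"))
```

In CBOR diagnostic notation this reads as `42(h'00...')`. A signed structure such as the COSE example described above would carry these bytes as its payload.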
And as a quick reminder, we use either the Zoom raised hand for queue management, or you can type q+ in chat. Manu, go ahead. Um, hey, Jonathan.
That was great. Thank you. I hadn't seen a full front-to-back description of IPLD before, so that was great. Thank you for that. I have a question as to the layer that you think the IPLD stuff fits in. My assumption here is that it's probably, I mean, at the storage layer, at the lowest bit. Do you see IPLD showing up at any other layers in the stack as we go up, like in the document descriptions at the encryption layer, which is typically above storage, or at the hub layer? It would be helpful to know where you think the IPLD stuff fits, in which layers.
Well, I have a hammer looking for nails, and I see everything as IPLD, so I think everywhere. I think it has the ability to represent the pointer to the base layer, the chunks of raw data that is encrypted. It has the opportunity to represent the metadata, as JSON-LD on top of IPLD. It also has its own way of representing the data that's, you know, sort of protected. So it's just a matter of how you model it. But the beauty of it is that it's like batteries included; it really is the best of JSON-LD and the best of the world of CBOR and tags. So there are still some open-ended questions as far as how to extract it from a DAG-CBOR CID into native CBOR. And that's some of what I'm struggling with right now, actually getting good examples for your request for the DID Core specification.
I think it's dependent on how we chunk this up, how we split it up into different layers, but I see it as having a role at many different layers. And I think even if we don't adopt IPLD, this is what I'm going to be using underneath my encrypted data store, for integration with what we have already built.
Right. So I think I'm next on the queue. First of all, thank you so much for doing this presentation. I think a lot of people are interested in how IPFS and IPLD handle very similar problems to what we're trying to solve as a group. This is really interesting. My question was, if you could possibly go back to, I think it was the second-to-last slide, where you had the ASCII diagram with the encrypted block and the metadata.
I wanted to ask if you could say a couple of words on how IPLD handles authorization, if at all.
And I wanted to ask a question about this particular diagram: how is the encrypted data represented? Is it just the usual multiformats index, is it COSE, and so on?
So certainly the codec that could be used is a new one that's called DAG-COSE. I haven't seen any examples of it. I'm experimenting with my own; hopefully there's semantic interoperability with existing formats. But it would be a multicodec that represents
DAG-COSE. Okay.
So that means IPLD is the abstraction layer of data and metadata and how things are linked together. And one of the links to, let's say, the encrypted data chunk would be a
linked multicodec.
Understood. And one last question before Ori, and that is, does IPLD require the data to be encrypted? Or is that optional for the application? Totally optional.
And it's up to how you do it.
Got it. Thank you. Ori, you're up next.
So I think you almost got to the question that I was going to ask, which is, you know, assuming DAG-COSE, and the format of the encrypted data is a CWE, and the identifier for it is some DAG-COSE identifier. There's the static version of this, with CIDs, and there are the mutable identifiers, which are IPLD identifiers that are published on IPNS. And so I'm wondering if we could take this picture that you've got right here and add the extra layer that lets you start with a mutable identifier that wraps all of this. That seems like the thing that would be really helpful in unifying the fact that, for the most part, when we're talking about encrypted data vault URIs, we're talking about an identifier to a mutable resource, because permissions might be changing. And if it's a CWE, that means that the content ID would change. So I'm wondering, how can we start with a mutable IPLD identifier and then end with a DAG-COSE CWE? That seems like it would close the loop on a big chunk of the pure, you know, CBOR version of the exact data model that we're all talking about.
Yeah, I think that gets more into dereferencing and resolving of the underlying payload and content: you want to have a convenient, human-readable string that represents "my resume, version 10.1". So you want to have some way of then pointing to the underlying data structure. And I think that's, yes, IPNS, cryptographically publishing it, but I think there are so many other ways. Even DNS right now: for resolution of IPNS, you can go to IPNS and put in jonnycrunch.me, and that resolves my documents. So it's just a convenient pointer for metadata. But in the end, you need to have a backwards compatibility to verify that, yes, this is closing the loop, that there's actually a cryptographic backwards signature validating that this is indeed signed by me. You got here, but to also trust it, I think you've got to verify that it closes the full circle.
Do you need that for IPNS, though? I thought the whole point of IPNS was a stable identifier for mutable resources. So, ignoring DNS, I have an IPNS identifier and I'm meaning for that to point to a CWE that's mutating over time. That's totally a thing that's possible with IPNS, right?
Yeah, so IPNS really is just the hash of your public key. So you're cryptographically publishing content to the representation of your public key. So you can actually have stable links: let's say you publish IPLD to your IPNS, and you call those the key indexes into, like, a map of a JSON-LD, but you've changed the pointer to the record; that's totally fine. And I not guilty.
But that would not give you a portable mutable identifier; that gives you a mutable identifier that's isolated under a particular public key. So if you move to another instance of an encrypted data vault on top of IPNS plus IPLD, that new vault would have a different public key, because they don't share private keys, and then the identifiers that you would want to sync between them would not be the same. So there's no point.
But couldn't they actually share private keys, if I'm in control of my own vault, and I'm the one who's publishing it, and I'm moving it from one store to another store, but I've never actually shared the private key?
I mean, I'm assuming that this is Microsoft and Amazon running these things, not me. But the
keys, the records in IPNS are in a DHT, just like any other content in IPFS. So it happens to be a special use case, in that you're validating the signature on that record. But ultimately you can publish it into the DHT without even running a node. So you don't lose sleep at night needing to run a node to publish records to IPNS.
Does that make sense? It does. I want to ask a clarifying question about what Ori said: is it the case that you need a key pair for every document that you publish on IPNS?
No, just the latest hash of your manifest. So if your manifest is an IPLD document, and you're publishing that to your IPNS namespace, and you keep the same values but you've changed the CID of the link on the right-hand side, it will all still resolve; you've basically just updated the pointer. Okay, and I wish I could run through an example, but I just changed laptops and don't have everything running on this machine. Yep, no problem. Adrian in chat
expressed some confusion about the discussion, I suspect. Adrian, were you asking if the identifiers are mutable, or the documents? Feel free to put yourself on the queue and ask a question. All right. So it sounds like Manu is next?
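The IPNS exchange above turns on two points: a name is derived from a public key, and publishing a new manifest moves the pointer while the name stays the same. A toy sketch of that behaviour follows. Everything in it is a simplifying assumption: `ToyNameSystem` is hypothetical, the key is a placeholder byte string, a hex SHA-256 digest stands in for the real name and CID formats, and record signing and verification are omitted entirely.

```python
import hashlib


def address_of(content: bytes) -> str:
    # Stand-in for a CID: an immutable address derived from the content.
    return hashlib.sha256(content).hexdigest()


class ToyNameSystem:
    """Maps a name derived from a public key to the latest content address."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def name_for(public_key: bytes) -> str:
        # The name depends only on the key, never on what it points to.
        return hashlib.sha256(public_key).hexdigest()

    def publish(self, public_key: bytes, manifest_address: str) -> str:
        # A real record would be signed by the matching private key;
        # that step is left out of this sketch.
        name = self.name_for(public_key)
        self._records[name] = manifest_address
        return name

    def resolve(self, name: str) -> str:
        return self._records[name]


names = ToyNameSystem()
public_key = b"placeholder public key bytes"
v1 = address_of(b'{"resume": "version 1"}')
v2 = address_of(b'{"resume": "version 2"}')

name = names.publish(public_key, v1)
first = names.resolve(name)
same_name = names.publish(public_key, v2)
second = names.resolve(name)
other_name = ToyNameSystem.name_for(b"a different placeholder key")
```

The stable name resolves first to `v1` and then to `v2`. The last line illustrates the portability concern raised in the discussion: a vault publishing under a different key would necessarily produce a different name.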
Yeah, a couple of questions, I think. So, Jonathan, the thing that would help me the most to understand the ecosystem, and I don't know if other folks are having this issue, is if we could just focus in on one use case, which is the storage layer and how IPLD fits in there. I think that would be helpful. I saw a couple of other places where it could potentially fit in. But the questions that Ori is raising are
for us to nail down, right? I mean, I feel like, because we could use IPLD for everything, it becomes really difficult to talk about it, because we're talking about it in this general sense. So specifically, I would like to see what using IPLD would look like if we put it in as one of the metadata chunks in the current EDV document. I would like to understand exactly what an IPNS pointer would look like to a mutable piece of data that can then be updated. Without those things, I think I understand the general data structures, but they could be put together in a myriad of different ways, right? And right now, at least in my head, I'm seeing, like, 40, 48 options for the way this could be put together, and I don't know for certain which one's being proposed. So do you plan on that as a next step, to try and see how it fits in? Or are you presenting this hoping that other people understand the structures and then integrate that into the work? I guess the question is around next steps: what do you see as next steps here?
So I think I'm not proposing anything right now. I'm still trying to get my bearings as far as what this group is trying to accomplish. My first pass is finishing the CBOR specification in the documents and getting to semantic interoperability with the key formats for COSE, natively encoded in CBOR but then also in DAG-CBOR, and then having some examples for interoperability, showcasing it in the DID Spec Registries. That's my task for the next many weeks, to get that aligned. But I'm giving information as far as, hey, this specification is way cool. My hat's off to the Protocol Labs people; I'm really impressed with their work. This is where my mind is at as far as how it makes sense to approach it. And there are many different ways to skin this cat. In the back of my mind, I want to make sure that we have a solid foundation for something that is hacker-proof and that really keeps the power with the individual, because we have this massive surveillance-capitalism machine at play. I'm very much concerned that we're reinventing methods that are just going to do the exact same thing that has happened before. So mostly I'm approaching this with the purpose of making sure that we've thought this through in ways that keep the end user front and center. So I want you guys to ponder IPLD a little bit and where it might fit in. Perhaps not everyone's going to have the same approach at the different layers. I think it makes sense at many of the layers. Certainly, for resolving and retrieving, it may not make sense to all go over IPFS or libp2p or Bitswap. But my point here is that IPLD stands on its own and can be used in a variety of different settings for resolving and representing the material underneath.
And it is agnostic to the encrypted data methods underneath it, or the retrieval mechanisms on top of it; again, trying to be that narrow-waist protocol.
Thank you. I believe Adrian is next and then Wayne. After that.
Oh, I thought Manu might be next. Oh, I'm sorry, Manu. Are you
getting? No, no, I just went with thanks, Adrian. Okay.
Oh, well, I really didn't mean to break in. We've gone 40 minutes. I followed almost nothing. And the reason is that I can't map what's going on to even a single use case that I might understand. But that's my problem, and I think it's at a level that doesn't need to take up the time today.
All right, thank you, Wayne.
Yeah, I was going to echo similar things. I'm following, you know, what identifiers are and what the data packet formats might look like. But my general feedback is that what would be most helpful to me is something like a state transition diagram, or understanding the whole narrative of how a new baby data packet becomes, you know, big, beautiful, encrypted, SDS-compliant, whatever, and referencing all these different aspects of it. That would be incredibly helpful. One of the pieces of tech documentation that inspires this approach is a TCP/IP manual by Stevens that I think a lot of people liked, and he would do a really good job of drawing these state diagrams: okay, you can be in these different states as a packet, or the connection can be in these different states. If we can figure out how data interacts with the SDS system as a whole, and just talk about that at a high level and follow a specific journey or use case or narrative, that would help me immensely.
Thank you, Wayne.
Manu, go ahead.
Yeah. So I was building off of what Adrian and Wayne just said. Adrian, I don't think this is necessarily your problem. I think it's all of our problem to figure out exactly which use cases are benefited by the use of IPFS and IPLD and the whole, you know, Protocol Labs stack, basically. And I'm struggling with it as well; I think a number of us are. One use case that at least feels clear in my mind is the one around replication. So, you know, IPFS, you could argue, does replication beautifully with just a couple of modifications. And it's got the right kind of anti-censorship properties that Jonathan mentioned, right? I mean, if you're encrypting everything and you're pointing to things by content IDs, you don't really know what it is. And because you don't really know what it is, you might replicate it more readily than something where you do know what it is; it's harder to take things down, all that kind of stuff. So the first use case, to me, feels like it's at the lowest layer: if we use it at the storage layer, replication becomes much easier, right? Because we're replicating content identifiers, and that's a good thing. So that's one use case that I think is pretty solid for IPFS. I did want to point out that at least Digital Bazaar is of the position that encrypted content on public networks is a really bad idea in general, meaning that if you think it's not, you're in an exceptional use case, because encryption has a shelf life. And it doesn't matter how you chop the data up. If it escapes onto a public network, you should assume that eventually your encrypted data will be broken and will be decrypted. It's very easy for us to break 1980s encryption, and that really wasn't that long ago.
So there's a question around IPFS and whether or not we should use it. IPFS in public mode, I think, is an anti-pattern; let's please not do that. But IPFS in private mode, meaning within a provider, or within multiple data centers behind a single provider, feels like a really useful thing. And you could see that, but at that point it's kind of an implementation detail; the specification won't even see that you're using IPFS there. So I'm curious if anyone disagrees with that position around encrypted content on public networks. Because that is what I think you'd have to do to put it into a layer above the storage layer; it would have to escape onto a public network. Any thoughts from the group on that? Especially you, Jonathan?
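The replication use case raised above, where stores exchange blocks purely by content identifier without any insight into the plaintext, can be sketched as follows. This is a toy illustration, not how IPFS or Bitswap actually work: the stores are plain dictionaries, a hex SHA-256 digest stands in for a CID, and the ciphertext values are placeholders.

```python
import hashlib


def put(store: dict, ciphertext: bytes) -> str:
    # The store only ever sees opaque bytes and the hash that names them.
    key = hashlib.sha256(ciphertext).hexdigest()
    store[key] = ciphertext
    return key


def replicate(source: dict, target: dict) -> list:
    # Comparing identifiers is enough to find out which blocks are missing.
    missing = sorted(set(source) - set(target))
    for key in missing:
        block = source[key]
        # The receiver can verify each block without being able to read it.
        if hashlib.sha256(block).hexdigest() != key:
            raise ValueError("block does not match its identifier")
        target[key] = block
    return missing


primary, replica = {}, {}
a = put(primary, b"opaque ciphertext A")
b = put(primary, b"opaque ciphertext B")
put(replica, b"opaque ciphertext A")
copied = replicate(primary, replica)
```

Only the block the replica lacked is transferred, and running the replication again copies nothing. Whether such a scheme runs on a public network or inside a single provider's trust boundary is exactly the deployment question under discussion here.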
Yeah, I think you're starting to get into peer-to-peer routing and DHTs, and I think that's much more than I'm willing to take on as far as the IPLD discussion goes. I'm mostly focused on this data modeling layer, the right- and left-hand side of how to actually represent that. When we get into private mode for IPFS, accepting which peers you're going to connect to and the shared secret, that gets to be much more complex and overwhelming. So I think we want to piece that apart into smaller chunks of where on that networking layer we actually want to talk about where there's synergy. But just from the pure data modeling of the attributes, for a common vocabulary with that context and the pointers for hash-based linking, I think IPLD is a good fit for that.
Okay, let's see who is next. I think Dave Longley is next.
Okay, so I have a lot of thoughts that I'm trying to organize here. I think a lot of what's really interesting and powerful about IPLD and IPFS is the lack of an authority component: we're just doing content-based addressing. Unfortunately, I'm struggling to figure out how to fit that in, how to take advantage of those benefits, from the perspective of how we want EDVs to work. My understanding, and I think there's consensus here, is that we want to ensure that there is an authority model around the EDVs and that we're not going to be simply relying on encryption to keep data out of the hands of people who shouldn't have it. What that means is, if you're putting your private personal data out somewhere, you want to restrict who can get access to it even if it's encrypted. So this is restricting access to the encrypted information. And you also want to be able to have mutable stable identifiers for given documents or blobs of data, so that you can decide to hand out to certain people access to those blobs to read them or make changes. And I'm struggling to figure out how we can leverage the power of IPLD and IPFS under those scenarios. So, talking about replication: I think there are two places where replication matters. The first is within a storage provider, but I don't know how interesting that particular problem is. Within a given storage provider, an EDV server or set of servers under one trust boundary, they're going to do some kind of replication or backup strategy for all of the encrypted data, but there's no insight into the cleartext in those situations. Then we also have replication up at a higher level, where you do have insight into the unencrypted data and can make better decisions about what types of data to replicate, where you want those things to go, and how to better resolve conflicts.
And that's where the more interesting problem is, which I think falls under what we would call the hub layer. Because we pushed encryption to the client, so that we can enable these storage providers that don't get to see the data but can be enforcement points for the authority, the cleartext is only seen by clients. And so you have to have a client that is able to get that data, decrypt it, and make decisions about what other EDV servers to send it to. So if you're using storage provider A as an EDV server, and you want to pull some data down from that into a client, decrypt it, make some decisions about replication, and send it to EDV server B, that decision is made by the client, or by some software agent that's running as a client. It's not made by the EDV servers. And so I don't know how useful it would be to necessarily run, like, a private IPFS network on one of these servers. It would give you all the properties of replication, whatever. But like Manu said, I think that's an implementation detail, and I don't think we're really getting the benefits that we want to get out of IPFS, where its strengths are. So these are my thoughts on this. There's no question baked into any of that, other than here's a bunch of feedback.
Thank you, Dave.
Hello, can you hear me? Yes.
Okay, great. So two things. First of all, I want to
Yeah, so the first thing is that I think there's been an impression that I've kind of read from the room previously, that there's a Datashards versus IPFS type of perspective. At least from the conversations that I've had with Serge, I don't think we perceive it as that. I think there's an opportunity where we could possibly use the same network that IPFS has, basically using IPFS as a chunk storage network, which we have not tried yet but are interested in exploring. So I think that option is open, and I just wanted to dispel that impression, which I've felt was reverberating around the room, possibly earlier and on a previous call. I mean, we don't know if we can do that, but I just want to say that I think we're open to exploring it. The second thing is a concern of mine, which is that IPFS does not yet seem to have an official way to store encrypted content, if I understand correctly, and especially not encrypted content that's chunked in a way that provides ambiguity as to what the content is that's being stored. My impression is there isn't a spec for that yet, so at the moment it's an application decision about layering encryption on top. Now, that's a place where I do get a bit nervous. I would personally prefer to connect to a network where I know that everybody's doing encryption by default. But maybe if there is a collaboration on trying to use the same hashing format and storage operations as a storage back end with Datashards, there may be some way to achieve unity there anyway. And I saw something in the chat about Jonathan Holt maybe exploring an encryption specification, and maybe that's an opportunity for collaboration. But I kind of don't know, because I'm not really completely caught up.
Yeah, I think it's a lot to unpack, and I encourage everyone to just ponder this and not push IPFS to be a complete solution. But certainly IPLD is a great way of representing that left-hand, right-hand side.
Thank you. And just to add my own comment, I think.
Chris, I'm not sure if there's an explicit dichotomy or choice between IPLD and Datashards in people's minds, but they are both systems that basically span layer A and layer B on our diagram. So, as a group, we've got several different choices: How do we format our identifiers? What's the format of the payload? How do we chunk things? How do we encrypt things? And so it's good to have options for discussion while we create use cases and all that. We will have to make hard decisions at some point. Anybody else have any other questions? We've got five minutes left.
A comment in chat from Dave Longley says that IPLD could be really useful as a manifest format for chunks, which I absolutely agree with. I would love, if you have some pointers, maybe an issue or a mailing list, to know where in the documentation to get the format for the manifest of the chunks.
Quick question, if the queue is empty, when we talk about chunks do we talk about them in the sense of deduplication as well?
I think that's an excellent question, Adrian. Some do, some don't. And at some point, we're going to have to figure out what the spec's position on that is, or if it even needs to have one, right? So, for example, if we were to use IPLD, maybe in theory you get deduplication for free because they're content identifiers, right? But if we were not to use that, then we've got to think about how we do deduplication, or if we even care, because maybe deduplication isn't something we need to care about at the layer we're operating at. But it's an excellent question. All right.
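The "deduplication for free" point can be illustrated with a minimal sketch. It assumes a plain SHA-256 hex digest as a stand-in for a content identifier (real IPFS/IPLD CIDs use a multihash-based encoding), and the `ChunkStore` class, its methods, and the fixed-size chunking are hypothetical names chosen for illustration, not anything the group or IPLD specifies.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_id(chunk: bytes) -> str:
    """Stand-in content identifier: the SHA-256 hex digest of the chunk."""
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Hypothetical store keyed by content identifier."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, data: bytes, size: int = 4) -> list[str]:
        """Store data as chunks and return a manifest (ordered list of IDs)."""
        manifest = []
        for chunk in split_chunks(data, size):
            cid = content_id(chunk)
            # Identical chunks hash to the same key, so they are stored once.
            self.blobs.setdefault(cid, chunk)
            manifest.append(cid)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble the original data from its manifest."""
        return b"".join(self.blobs[cid] for cid in manifest)


store = ChunkStore()
manifest = store.put(b"abcdabcdabcd")  # three identical 4-byte chunks
assert store.get(manifest) == b"abcdabcdabcd"
```

Here the manifest has three entries but the store holds a single blob. Note that this only deduplicates identical bytes: if the same plaintext is encrypted under different keys or nonces, the ciphertext chunks differ and so do their identifiers.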
Alright. Unless there are any other questions, we may get to end a few minutes early. As always, everyone, please take a look through the issues. Raise questions. Help us make decisions. Next week, Kalia, I believe we have a presentation from MyData. Is that correct?
Yeah, from the operators group, from Iain Henderson. And then, well, I believe we're also going to their meeting to present to that group about what we're doing.
So cross pollination,
pollination. Right. And all their meetings are on Wednesday mornings. Usually I think that's the one we're going to but I'll post it in our mailing list when it gets confirmed if folks want to join and answer questions that they have.
Right, okay. Yeah. So nonzero probability that there's a MyData presentation next week. Yes. Well, oh, yeah. Wait, Porter. He asked me a question or
No, that's when the MyData operators call is.
Got it? Yeah.
I was asking if we're going to theirs and they're coming here. The neck. Yeah,
Both things are happening. So it's not a joint meeting. It's mutual presentations, I believe.
Well, thank you, everyone. Talk to you all next week.
See you in the issues.