Private Documents: Phase 1

    Problem

      Users, including ourselves, need a way to write private documents, and to collaborate in a close circle. Currently there's no way in Seed to share a document with a limited group of people.

    User Stories

        As an Author, I want to publish a Private Document that only the collaborators of its Home Document can view and edit.

        As a Collaborator, I want to view the list of private documents contained in the home document directory.

    Solution

      To limit the scope, the initial implementation of the feature described in this document is very basic. Most limitations described below are temporary, just to simplify the implementation, and start learning in practice ASAP.

      No new permissions

        For now, only WRITER members of the site and its owner will be able to access private documents. No dedicated read-only permissions yet, and no segmentation on a per-document level — all writers will have access to all private documents.

        This simplifies the business logic for enforcing permissions, and doesn't require creating any new roles and capabilities.
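
        As a minimal sketch of how small this check can stay, assuming a hypothetical Site record that maps account IDs to roles (the type and function names here are illustrative, not existing Seed APIs):

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    WRITER = "writer"


@dataclass
class Site:
    owner: str  # account ID of the site owner (hypothetical field)
    members: dict[str, Role] = field(default_factory=dict)  # account ID -> role


def can_access_private_documents(site: Site, account_id: str) -> bool:
    """Phase 1 rule: the owner and every WRITER can access all private documents."""
    if account_id == site.owner:
        return True
    return site.members.get(account_id) is Role.WRITER
```

        Note that the check takes no document argument: with no per-document segmentation, access is decided per site.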

      Paths and directories

        Private documents will have random path names, and no nesting (i.e. no directories).

        This gets rid of a whole class of problems like metadata leakage, cascading inheritance of permissions, privacy setting conflicts, and so on.
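
        For illustration, generating such a flat random path could look roughly like this. The alphabet and the length of 22 characters are assumptions; the proposal does not fix either.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed alphabet
PATH_LENGTH = 22  # assumed length; not specified in the proposal


def new_private_path() -> str:
    """Return a random single-segment path, so there is nothing to nest under."""
    name = "".join(secrets.choice(ALPHABET) for _ in range(PATH_LENGTH))
    return "/" + name
```

        Using the secrets module rather than random keeps the names unguessable, which matters if the path itself is treated as sensitive.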

        Hierarchies of private documents could be built with explicit links, and we'll see how it goes. Another goal with this limitation is to see whether we need an explicit concept of a directory, because to this day I'm still questioning whether our approach for implementing directories implicitly based on paths is a good idea. One day I'll write a separate document about it.

        Coincidentally, and depending on the outcomes of a rabbit hole described later in this document, there probably will be no ability to change document privacy after creation. You could still create different documents pointing to the same changes, but they'll be treated as separate documents — same as it works today.

      Networking

        To sync private documents the site owner needs to designate a server. Eventually this could be any PeerID, not only a web server.

        This lets us not worry about many things incidental to P2P syncing, like leaking metadata about private documents to peers that have no business knowing about them, and exposing relationships between Account IDs and Peer IDs to random peers.

        At least for now, in order to create private documents the site has to be configured with a web domain, using the existing webUrl attribute. In order to sync private documents, and publish changes to them, peers would have to talk to that server. This calls for a change in the way we currently push data to servers, which is a long-overdue fix for us, as our current push flow is a bunch of workarounds.

        Another change we need to make is in BitSwap. We need to make sure we don't accidentally send private data via BitSwap just because someone asked for the CID of that data. So we need to somehow extend BitSwap in such a way that it knows whether the requested data is private, and whether the requester has access to it, and, if not, pretends it doesn't know anything about it.
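
        The intended behavior can be sketched as follows. BlockProvider is a hypothetical stand-in for the part of BitSwap that answers block requests, not an actual BitSwap API, and the single set of authorized peers reflects the Phase 1 rule that all writers see all private documents.

```python
from typing import Optional


class BlockProvider:
    """Hypothetical stand-in for the component that answers block requests."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}       # CID -> data
        self.private_cids: set[str] = set()      # CIDs belonging to private documents
        self.authorized_peers: set[str] = set()  # authenticated peers allowed to read them

    def handle_want(self, peer_id: str, cid: str) -> Optional[bytes]:
        """Return the block, or None exactly as if the block were unknown."""
        data = self.blocks.get(cid)
        if data is None:
            return None
        if cid in self.private_cids and peer_id not in self.authorized_peers:
            # Same answer as for a missing block, so the requester learns nothing.
            return None
        return data
```

        The important property is that an unauthorized request and a request for a missing block are indistinguishable to the requester.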

      Peer <-> Account relationships

        Currently we don't publicly expose information about the relationships between peers and accounts.

        This was mostly an accident, but it made it much easier for us to implement multi-account in the app. Then we realized that it could be a privacy-preserving benefit, and now we don't want to let it go.

        Nevertheless, in order to enforce permissions for private documents, we need to authenticate peers. But we'll do it in a very limited fashion — we'd only expose this relationship to the servers, and only to those that are relevant for a given private document.

      Permanent data changes

        We decided to store the public/private status of a document at the level of the Ref blobs.

        This makes it possible for the same underlying changes to appear in both private and public documents. That, in turn, enables useful workflows — for example, keeping a private “work-in-progress” document alongside a public “published” version.

        The simplest implementation would be a private=true flag in the Ref. However, there are a few subtle implications to consider, and they lead into a bit of a rabbit hole (or a few).
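
        To make the idea concrete, here is a rough sketch of two Refs sharing the same underlying changes. The field names path and heads are illustrative assumptions and do not describe the actual Ref blob schema; only the private flag comes from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    path: str               # document path within the site (assumed field)
    heads: tuple[str, ...]  # CIDs of the changes the Ref points to (assumed field)
    private: bool = False   # the flag discussed above


# A private work-in-progress document and its public, published counterpart.
draft = Ref(path="/x7f3k9q2", heads=("cid-change-1", "cid-change-2"), private=True)
published = Ref(path="/release-notes", heads=("cid-change-1",), private=False)

# Both Refs can point at the same change without duplicating it.
shared = set(draft.heads) & set(published.heads)
```

        Because the flag lives on the Ref and not on the changes, the changes themselves carry no privacy status, which is part of what makes the implications subtle.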

    Rabbit Holes

      Signaling and metadata

        When resolving a URL for a document, comment, or any other resource, it would be useful to know in advance whether the resource is private.

        By default, the discovery process asks all connected peers if they know anything about the URL. However, if the resource is private, most of those peers shouldn’t even be aware that it exists, or that I’m interested in it. Ideally, we'd query only the relevant peers, which for now means the server.
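
        Assuming the privacy status were already known, the peer selection itself would be trivial. The function and parameter names below are hypothetical:

```python
def peers_to_query(is_private: bool, connected_peers: list[str], site_server: str) -> list[str]:
    """Choose which peers to ask about a URL during discovery."""
    if is_private:
        # Only the server designated by the site owner should see the request.
        return [site_server]
    return list(connected_peers)
```

        The hard part is the is_private argument, not the routing.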

        But how can we determine, before resolving, whether a URL points to a private resource, so we could only ask the relevant peers? This problem is often called "signaling".

        A similar question appears elsewhere — how do we know whether an ipfs://<cid> link points to an image or a video? In our case, we currently signal this information in the parent context. If a block type is image — we know the linked resource must be an image, if it’s video — it must be a video. But if you look at this link without the parent context, you can't reliably tell what it is. Granted, in case of media files at least, most file formats carry their own signaling inside the file itself, usually in the form of magic numbers.

        So, signaling can live in the parent (container) context, or inside the content itself. In the context of our discussion, the "content" is the URL of a private resource, and the question is whether such URLs need to carry any signaling about their privacy status.

        Signaling options

          One way to signal that a document is private could be with a special path prefix, such as /private.

          Initially, I had very mixed feelings about this approach. However, in my conversation with ChatGPT about the overall problem, it brought up the idea of making this prefix configurable by the site owner, and that made me like the approach more, even if we don't implement the configuration anytime soon.

          Another way could be pattern matching.

          If we say that private document paths are random, have fixed length, and don't have any slashes in them, then we could simply try to match the pattern of the URL to see if it could be private or not.

          This will probably work fine, but it doesn't seem very elegant.
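
          Both options can be sketched side by side. The /private prefix comes from the example above; the exact random-path shape (22 lowercase letters or digits) is an assumption made only for this illustration.

```python
import re

PRIVATE_PREFIX = "/private"  # could later be configurable by the site owner
# Assumed shape of a random path: exactly 22 lowercase letters or digits, no slashes.
RANDOM_PATH = re.compile(r"/[a-z0-9]{22}")


def is_private_by_prefix(path: str) -> bool:
    """Option 1: the path is explicitly marked with a reserved prefix."""
    return path == PRIVATE_PREFIX or path.startswith(PRIVATE_PREFIX + "/")


def may_be_private_by_pattern(path: str) -> bool:
    """Option 2: the path merely looks like a random private path."""
    # A match only means the path could be private: a user-defined path
    # of the same shape would match as well.
    return RANDOM_PATH.fullmatch(path) is not None
```

          The difference in the function names is deliberate: the prefix gives a definite answer, while the pattern can only give a guess, which may be part of why it feels less elegant.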

          Coincidentally, two other rabbit holes arise: readability of private document URLs, and comment URLs.

      Human-friendly private URLs

        If we say private document URLs are random — we lose the human-friendly URLs. We already lost them with comments, and to be honest the experience hasn't been very pleasant (especially because we don't have meaningful OG images for comments). So, if users are going to exchange links to their private documents anywhere outside Seed, it may become hard for them to distinguish different documents.

        This may or may not be a big deal. After all, many, many websites people use these days have this problem. We don't have human-readable URLs in Google Docs, YouTube, X, Instagram, Telegram, Bluesky, and so on. But we do have a human-readable component in Notion, Linear, and other more modern tools.

        So, I've been thinking that maybe private documents could have Notion-like URL paths, such as /this-is-the-slug-of-the-document-$<special-random-id>. The $ or any other sign could be used to extract the special random ID part, and to distinguish this type of URL from the user-defined paths that we currently use (we'd prohibit this sign in user-defined paths).
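
        A rough sketch of how such a path could be taken apart, assuming $ is the chosen sign and the random ID is everything after its last occurrence (the function name is hypothetical):

```python
from typing import Optional

ID_MARKER = "$"  # would be prohibited in user-defined paths


def split_private_path(path: str) -> Optional[tuple[str, str]]:
    """Return (slug, random_id) for paths like '/my-notes-$x7f3k9q2', else None."""
    name = path.lstrip("/")
    slug, marker, random_id = name.rpartition(ID_MARKER)
    if not marker or not random_id or "/" in name:
        # No marker, an empty ID, or a nested path: not a private document path.
        return None
    return slug.rstrip("-"), random_id
```

        Only the random ID would identify the document; the slug is decoration for humans and could change without breaking links.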

      Comment URLs

        Even if we decide that private document URLs should signal about their privacy, this wouldn't apply to comments. And it's pretty sad. Obviously, private documents also need comments, and those comments also need to stay private.

        Currently comment URLs look like this: hm://<commenter-account-id>/<comment-tsid>. It seems that leaving comment URLs without any relationship to their parent documents was a bit of a mistake.

        How can we tell by looking at this URL whether the comment is private or not? How can we even know that we need to go to the document's server to fetch that comment? The URL is not even tied to the namespace of the document, but rather to the namespace of the commenter ¯\_(ツ)_/¯.

        Sometimes I wonder whether we need different kinds of comments: the ones that are part of the document itself, for collaboration and shaping the document, and the external ones, for third-party feedback about the resulting document.

        Or maybe we just need to support more than one way to address comments?

        Or maybe we could add "signaling" about the parent document to the comment itself, e.g. with a query parameter?
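
        As a sketch of the query-parameter idea only, here is what attaching and reading back the parent document could look like. The parameter name parent is purely hypothetical; nothing like it exists today.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit

PARENT_PARAM = "parent"  # hypothetical parameter name


def with_parent(comment_url: str, document_url: str) -> str:
    """Append the percent-encoded parent document URL to a comment URL."""
    return f"{comment_url}?{PARENT_PARAM}={quote(document_url, safe='')}"


def parent_of(comment_url: str) -> Optional[str]:
    """Return the parent document URL carried by a comment URL, if any."""
    values = parse_qs(urlsplit(comment_url).query).get(PARENT_PARAM)
    return values[0] if values else None
```

        A comment URL without the parameter would still resolve the way it does today, so this would be additive, but it also means the signal is lost whenever someone strips the query string.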

        I'm honestly not sure what to do about it at this point. Help wanted.

    To Do

      Despite being a simplistic version, there's still a lot of work to do, and Alex Burdiyan and Julio both need to be involved.

      Here's the currently known list of things to do:

        [x] Update libp2p and IPFS related dependencies. We're pretty outdated on this front, and there have been some big improvements that might be relevant for this project.

        Implement a proper push algorithm with optional authentication (the gateway would probably allow pushes from anyone). List of things to push here

        [x] Extend BitSwap to make it aware of private blobs and authenticated peers.

        Address the rabbit holes described above, and make the necessary permanent data changes.

        TBD...

        Designs

        Latest Designs - 1st Iteration simplified

          These designs follow the proposal for the project Private Documents: Phase 1. This means the private status of a document will be decided when the document is created, and private documents will not be children of any other document. Initially, their status won't be changeable.