#7 Separation between a Request and Consideration in a Merge / Pull

Closed
opened 2 years ago by yookoala · 7 comments

As [discussed](https://framalistes.org/sympa/arc/git-federation/2018-06/msg00222.html) in the ML, it would be preferable to separate the request from the consideration in a merge / pull process.

In a federated network, a message always belongs to its creator. So the person / repository who makes a PR / MR would have to own it.

But in a source forge, a pull / merge process is always a communication between the source and destination. Both of them should have some rights over the process.

The logical choice is to break the PR down into 2 different entities over the federated network. That way, both ends can have full control over their side. Both can make updates on their end, and these updates should be federated to the other side as a "friend suggestion".

fr33domlover commented 8 months ago
Collaborator

Currently we're doing this for tickets using an `Offer` activity. We'll do it the same way for merge requests; the user controls the `Offer` object and the stuff inside, and if the offer is `Accept`ed, then the repo side creates its own copy that it controls. Whoever has authority to update the repo side's object (which likely includes the user who submitted it) can send an `Update`.
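
A minimal Python sketch of that flow, for illustration only. The helper names (`make_offer`, `accept_offer`), the example URLs, and the use of the `Ticket` type and the `result` and `context` fields are assumptions, not a wire format agreed in this thread.

```python
AS2 = "https://www.w3.org/ns/activitystreams"


def make_offer(author, repo, ticket_id, summary):
    """Build an Offer whose embedded object stays under the author's control."""
    return {
        "@context": AS2,
        "type": "Offer",
        "actor": author,
        "target": repo,
        "object": {
            "id": ticket_id,
            "type": "Ticket",  # assumed object type
            "attributedTo": author,
            "summary": summary,
        },
    }


def accept_offer(offer, repo_ticket_id):
    """Repo side: make its own copy under its own ID and answer with an Accept."""
    repo = offer["target"]
    # The copy keeps the author's content but gets an ID the repo controls.
    repo_copy = dict(offer["object"], id=repo_ticket_id, context=repo)
    accept = {
        "@context": AS2,
        "type": "Accept",
        "actor": repo,
        "object": offer["object"]["id"],
        "result": repo_ticket_id,  # assumed way to point at the repo's copy
    }
    return repo_copy, accept


offer = make_offer(
    "https://user.example/alice",
    "https://forge.example/repo",
    "https://user.example/tickets/1",
    "Crash on startup",
)
repo_copy, accept = accept_offer(offer, "https://forge.example/repo/tickets/7")
```

After this, two objects exist: the author's original at its own ID and the repo's copy at `repo_copy["id"]`, which only the repo side can change.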


> We'll do it the same way for merge requests; the user controls the `Offer` object and the stuff inside, and if the offer is `Accept`ed, then the repo side creates its own copy that it controls.

Sorry for repeating, but I'm not sure `Offer` is a good thing to do. It seems like just an additional level of complexity on top. Also, if the object is given a local copy, that will just cause confusion and duplicates around the federation.

See [my comment here](https://notabug.org/peers/forgefed/issues/58#issuecomment-17037).

I strongly believe it should be a more standard way of exchanging AP objects:

  • User A creates issue with ID https://domain.tld/issue/1 which relates to repository at ID https://repository.tld.
  • Repository adds the issue to its issues list after whatever validation it wants to do, and then sends an `Accept` back.
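
The two steps above could be sketched like this in Python. The `Repository` class, the `receive_create` method, the `https://domain.tld/users/a` actor URL, the `Note` type and the `context` check are assumptions made for illustration; the real validation would be whatever the repository wants.

```python
class Repository:
    """Hypothetical repository actor that lists remote issue IDs as is."""

    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.issues = []  # IDs of issues hosted elsewhere, no local copies

    def receive_create(self, activity):
        issue = activity["object"]
        # Placeholder validation: the issue must name this repository.
        if issue.get("context") != self.actor_id:
            return {"type": "Reject", "actor": self.actor_id, "object": issue["id"]}
        self.issues.append(issue["id"])
        return {"type": "Accept", "actor": self.actor_id, "object": issue["id"]}


create = {
    "type": "Create",
    "actor": "https://domain.tld/users/a",  # assumed actor URL for User A
    "object": {
        "id": "https://domain.tld/issue/1",
        "type": "Note",
        "context": "https://repository.tld",
    },
}
repo = Repository("https://repository.tld")
reply = repo.receive_create(create)
```

Here the repository's list holds only `https://domain.tld/issue/1`; no second ID is minted.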

User A can try to close the issue (which verb would that be?), but the repository might have a policy that only the owner/editors of the repository are allowed to close it. If User A isn't one of them, the repository will `Reject` the closure.

User C subscribes to the repository. They should not accept closure messages from User A directly except for their own fork of the repository.

Creating new IDs for issues will just create confusion, ESPECIALLY if the original remote creator sends the issue to other actors in the fediverse who start replying to it. If the same object is used, this will just work, since User A will do inbox forwarding for any reactions on the issue they created, so that they are also synced to the repository that was the target.

fr33domlover commented 8 months ago
Collaborator

@jaywink, there's no standard way on the Fediverse to ask for an object to be created remotely. See my comment on #58 briefly explaining why it's an Offer and not Create.

The question of who hosts the object had tons of discussion on the mailing list a long time ago; we decided the project/repo should host the issues. There's a whole mess with authenticity and access control when the user hosts it. For example, you can edit the issue without even letting the project know. Or their accept/reject of your edit may fail to reach some followers, who now tragically trust a fake version.


Let's forget the Offer for a while. I assume it comes from the fact that you want to create another object "owned" by the repository.

This is not how the federated web works. When you submit an object to something that someone owns, that object is generally added to the list of whatever reactions as is - not as a new copy. Doing this would encourage centralization, not decentralization - which I don't believe is the aim of forgefed.

For example, I find an interesting forgefed repo and fork it. The server of the author goes down because they go back to GitHub. I notice a problem and create an issue. Suddenly, my issue cannot be created or interacted with because the remote server is not around to create the "real" issue. You end up centralizing the repository to a single server instead of allowing repositories to live on multiple servers.

What if someone interacts with the issue I offered? For example I often create an issue and then multiple replies to add more details. If the remote server suddenly comes back in two weeks, what happens to all the local copies I have of the replies, pointing to the offered issue, not the real issue?

> The question of who hosts the object had tons of discussion on the mailing list a long time ago; we decided the project/repo should host the issues.

I'm not aware of a decision having been made, and hopefully we can still revisit this as an item to review.

> There's a whole mess with authenticity and access control when the user hosts it.

I don't see why. Forgefed compatible clients should always check some access control rules that define how remote payloads should be trusted. For example, a general rule could be "trust operations on a repository object sent by the repository actor". This immediately makes updating an issue impossible without the target repository using inbox forwarding to pass on the update.
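
A rough Python sketch of such a rule. The function name `apply_if_trusted`, the `sender` parameter and the `status` field are assumptions for illustration; how the sender is verified (for example by a signature check) is not modelled here.

```python
def apply_if_trusted(state, activity, sender, repo_actor):
    """Apply an Update only when the repository actor delivered it.

    `sender` stands for whoever verifiably delivered the payload.
    """
    if activity.get("type") != "Update" or sender != repo_actor:
        return state  # ignore operations the repository did not pass on
    return {**state, **activity["object"]}


REPO = "https://repository.tld"
issue = {"id": "https://domain.tld/issue/1", "status": "open"}
close = {
    "type": "Update",
    "actor": "https://domain.tld/users/a",  # assumed actor URL for User A
    "object": {"status": "closed"},
}

direct = apply_if_trusted(issue, close, close["actor"], REPO)  # sent by User A
forwarded = apply_if_trusted(issue, close, REPO, REPO)  # passed on by the repo
```

With this rule a follower keeps the issue open when User A sends the closure directly, and only applies it once the repository passes it on.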

I'm sure this is how moderation works across the federation. The owner of a thread can always remove comments, but the original remote commenter can only remove *their comments*. I'm sure we want to allow people to remove their own issues.

If we don't, forgefed won't be GDPR compatible by design, and the "right to be forgotten" is a strong thing in GDPR.

fr33domlover commented 8 months ago
Collaborator

> Let's forget the Offer for a while. I assume it comes from the fact that you want to create another object "owned" by the repository.

Yup, it comes from the fact that the repo needs to have control of its issues. Imagine your personal to-do list. What if your to-do list items were stored by other people and they would be able to change those items? You need to be able to control what changes get added to the list. If you're on server A and one of your to-do items is on server B, and an edit is made to that item and you `Reject` that edit, how can people get the approved-by-jaywink version of the item? The canonical URL is on server B, and when people GET that URL they get the modified version you disagree with. And they don't even know you disagree with it.

It's a bit like in a git workflow where there's no official version of a repo. Kind of like each person working on the Linux kernel just has their own copy of the repo. Controlling what gets into your copy is critical, for the security of the software. When people clone the Torvalds copy it's because they specifically choose to trust Torvalds.

Similarly here, the repo sees that issue that someone published, but it also wants to have its own copy of it, to be in control of what happens to it. For example, as a repo owner you want to have control over the closing of issues and avoid a case that someone maliciously closes all your issues. And you can't prevent it because they're hosted on servers you don't control. So, you get your own copies. There's no official version of the issue: There's the author's copy, there's the repo team's copy, there can be more copies by anyone who feels like having them. If you want to see the repo's version, you GET that version.

> This is not how the federated web works. When you submit an object to something that someone owns, that object is generally added to the list of whatever reactions as is - not as a new copy. Doing this would encourage centralization, not decentralization - which I don't believe is the aim of forgefed.

The initial goal of ForgeFed is to allow you to contribute to a project and to follow a project across servers. Each project still gets to have one canonical URL on one server; changing that is a separate topic, that we aren't touching for now (and Idk if we should because the Spritely stuff would make storage p2p for the whole fediverse anyway, not specific to forges). So we wouldn't be encouraging centralization.

Technically, the offer-object-to-remote-server thing isn't being done on the Mastodon-based Fediverse. The Fediverse doesn't have a way-of-how-this-thing-works at all, because it doesn't do this sort of thing. I talked with people on #social and heard all kinds of alternative workflows that involve a sequence of activities and signatures and even C2S, for remotely creating objects.

On the Fediverse right now, there is no need to pass access control over an object to another actor. This use case just happens not to be needed in the plain people-post-toots scenario. Toots don't need editing and they need control only by the person who posted them. Everything else is caching.

Why doesn't that scenario just map to our case?

Imagine I open an issue on your repo. You host the repo, but I host the issue on my server. Your repo has a `Collection` of issues, and the ID of my issue is listed there. Suppose that the policy of your repo says that only you can close issues. This just makes sure that people don't randomly close issues that aren't really solved.

Now I feel like being naughty and I close the issue. Just because I feel like it. I even add a comment saying that I fixed it, although I didn't. You send a Reject as a response, but, people who GET the issue see that it got closed. You want your Reject to reach all the issue followers, to let them know you disagree with the closing, but - oops, I control the list of issue followers. I simply choose not to do inbox forwarding. People don't get the Reject, they just see my Update closing the issue.

Most likely, as I continue and cause damage to many issues I opened on your repo, changing their titles, assigning random people to work on them and setting their due dates to tomorrow at 5AM, you realize you want to make copies of all the issues, to have something sane to work with. So that no matter what, you can keep working on your software. You realize you should have had those copies public in the first place, so that people who want to see the jaywink-approved versions can see them.

And if my server goes down, you still want people to be able to comment on the issues I opened. It's your repo, your project, you need to be able to track the open bugs and feature requests even if the servers they came from go down.

If you let me host the issues without keeping your own copies, how does one tell what the jaywink-approved version is? Imagine an issue gets 1000 edits, 500 of them are approved and 500 you rejected. How are people supposed to GET the issue and then analyze the 1,000 Update activities and tell what the approved version looks like? You may be keeping such an approved version in your cache, but people can't GET it if you don't give it a public ID URL.
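
To make the cost concrete, here is a rough Python sketch of the replay every reader would have to do. The `approved_version` helper, the `decisions` mapping and the field names are assumptions for illustration, and it presumes the reader actually received every `Accept` and `Reject`, which is exactly what isn't guaranteed.

```python
def approved_version(original, updates, decisions):
    """Replay only the Updates the repository accepted, in order."""
    state = dict(original)
    for update in updates:
        if decisions.get(update["id"]) == "Accept":
            state.update(update["object"])
    return state


original = {"title": "Crash on start", "status": "open"}
updates = [
    {"id": "u1", "object": {"title": "Crash on startup"}},
    {"id": "u2", "object": {"status": "closed"}},
]
# The repository's verdict on each Update, keyed by Update ID.
decisions = {"u1": "Accept", "u2": "Reject"}

approved = approved_version(original, updates, decisions)
```

If the repo hosts its own copy, readers GET that state directly instead of recomputing it.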

> This immediately makes updating an issue impossible without the target repository using inbox forwarding to pass on the update.

But that brings us back to repos-are-centralized, because although issues are hosted elsewhere, you trust only a version the repo approves. It's a bit like repos hosting their copy, except there's no public URL for the copy, so people GETing the issue still see the non-trustworthy version. There's no need for the man in the middle, the repo can just proudly host its copy.

> I'm sure we want to allow people to remove their own issues.

Well, something we don't allow is deleting stuff from someone else's inbox after you sent it, right? Even in email, you can't. The best you can do is to ask politely. And even then, there's no guarantee.

The problem is that in your suggestion the owner of the issue is the author. If people post comments on an issue of your repo, that are against the CoC and you want to remove those comments, you can't. Because you don't control the issue. Suddenly each issue has its own CoC and its own guidelines-for-phrasing-an-issue: The ones the author happens to prefer.

> If we don't, forgefed won't be GDPR compatible by design, and the "right to be forgotten" is a strong thing in GDPR.

Idk about that, can ActivityPub in general be GDPR compatible? If you want to delete a comment you made on my toot, it's up to me to decide if I feel like removing your comment from my cache. Your server can delete it, but my cache is under my control only. No single server on the Fediverse can guarantee that everyone else will delete your stuff from their cache. Same with ForgeFed issues, or any other federated object.

Idk much about law and GDPR but I have a question: Suppose I discover a very critical security bug in the Linux kernel. I open an issue. An hour later, I decide to delete it. According to GDPR, are they now legally required to delete that issue from the DB and forget that they ever saw it, despite obviously wanting to work on it and continue to track the progress and discussion?


I believe I already answered all of the above in my reasoning for why I think double IDs via `Offer` are not the way things should work. You can't have federation and centralization at the same time; you have to choose which one you want. And I think this way is more "centralized" than "federated". In a federated environment, data is stored in many places and the author of the data has some control over it. That is what federation is. Forgefed won't be federation if it tries to also keep centralization.

I'll have a look to see if there is an old issue for this and, if not, create one. This is a pretty core thing about federation via ActivityPub, and I feel pretty strongly that duplicating the IDs of everything is going to cause more issues than it will solve, so hopefully there is still room for discussion among a larger number of people in future comment rounds.

fr33domlover commented 7 months ago
Collaborator

"Data is stored by the author" applies to the case where the author needs to have authority over the content. This indeed happens to be the case for toots, but otherwise it's not always the case at all!

For example, when you submit a patch to a repo, you don't host the source code. The code gets added into the repo. The repo team hosts it.

I find it critical that whoever works on a project has authority over their to-do list and is able to enforce access control of it. Much like your git repos don't just allow everyone to anonymously push commits. Only you can push, and people to whom you've given access. You didn't write all the commits yourself, hopefully you got many MRs from nice contributors, but you still find yourself needing to enforce write access to the repo.

I see you opened a separate issue :) so I'm closing here.