#88 PROPOSAL: Representation of MR/patch

Open
opened 7 months ago by fr33domlover · 0 comments

Following the feedback on IRC and here on issues, and since my previous discussion about this has been documented as a weird confusing stream of thoughts, I'd like to start a proposal with a clear thorough description. Here it comes. Feedback wanted :)

I'm going to develop the proposal incrementally, to show you why it looks the way it does, so first read until the end and comment on the final proposal at the bottom.

If you don't feel like reading everything, feel free to skip to the bottom, but before you ask a question, please make sure it's not already answered in the whole thing.

(1) Unifying patches and MRs

What is a patch, conceptually? Put aside the communication protocols (email, activitypub). Put aside VCS specific details. A patch is a description changes to potentially be made to a file, or a tree of files. That description of changes is in some format that machines understand and the changes can be applied by a computer to the file / tree of files, producing an updated version of the file(s).

The exact thing that the word "patch" refers to varies, it can mean a slightly different thing in different VCSs. "Patch" or "change set" etc. may refer to:

  1. A change or set of changes to be applied to a single specific file (e.g. a diff)
  2. A change or set of changes to be applied to a tree of files (e.g. a git commit)
  3. A sequence of (usually named) change sets to be applied on top of each other against a tree of files (e.g. a linear sequence of git commits)

Now, what is a merge request? I propose to think of it as:

  • A sequence of one or more named change sets, i.e. the 3d meaning of "patch" that I wrote above, e.g. a sequence of git commits, or a sequence of darcs patches, etc.
  • A proposal that the change set(s) are considered to be applied to the repo
  • The proposal states a specific repo branch/tip on top of which the change set(s) are made and intended to be applied
  • The change set(s) are stored and produced from a fork of the repo, at a specific branch/tip, rather than being sent directly as a stand-alone description of changes generated by the user (e.g. the way patches-over-email are usually sent)
  • Sometimes a MR is a work item and can have a person assigned to work on it, have a priority, have labels, have a due date, be added to a milestone, etc.

When patches are sent over email, they too communicate/imply the proposal that the changes are considered to be applied to the repo. And like MRs, they can have discussion and code review. In other words, the difference between them is:

  • Patches are usually generated by the user while MR commits are taken from a repo fork on the server; but otherwise both a patch and a MR are basically a commit (or multiple commits) to potentially be made on a repo. For example, in Darcs, a patch-over-email is in the same format as the "committed" patches stored in the repo. The storage format is just VCS specific detail; the point is that both MR and patch describe or refer to changes, they just may do so in a different way / different representation
  • MR can be a work item in a web app, while patches are often just sent as change sets and may be attached to separate work items. But if we allow a patch/MR to be a work item, then we can represent patches and MRs in the same way: If you want to think about the thing you're submitting as a patch, simply don't use the work item features, or use software that doesn't offer those features to you.

Conceptually, a patch and a MR are both work items: They both basically tell the dev team "Hey look, someone proposed some diffs/commits to be applied to your repo, go over them when you have some time". Except when a patch is sent by email to a plain regular mailing list and not to a ticket tracker etc., an official work item would be created elsewhere, possibly manually, while in a web app it happens automatically.

I'm proposing to have a patch/MR unified concept. Before I outline an initial representation, I'd like to ask, what would we call this unified concept? "Patch set"? "Change set"? Anyway, for now, for the purpose of this proposal, I'm taking the 'P' from 'Patch' and attaching it to 'MR', and going to call the unified concept 'PMR' for the rest of this proposal :)

(2) Initial representation of patch/MR

Initial representation of a PMR:

  • A work item
  • A proposal-to-review-and-apply-the-changes is attached to the work item
  • The proposal refers to the actual changes that are proposed. It may do so by including patch diffs and/or referring to a source repo/branch in which the changes can be found
  • Discussion and code review are possible

In some of the examples below, PMRs specify both the source repo/patch and actual diffs. This is for demonstration; it's possible we'll do it that way for PMR federation, but it's a separate question; it's just for demonstration right now.

Also, in the examples below all those various components are be together in a single JSON-LD object/document. It doesn't have to be that way; I'm just doing that for simplicity right now.

Initial representation, representing the work item as a ForgeFed Ticket, the proposal as an ActivityPub Offer, and the actual patch diff(s) using a new hypothetical ForgeFed Patch type:

{ "@context": [ ... ],
  "id": "https://dev.example/alice/objects/34583485354",
  "type": "Ticket",
  "attributedTo": "https://dev.example/alice",
  "context": "https://forge.example/bob/coolBobRepo",
  "summary": "Fix the bug bla bla bla",
  "content": "This MR should fix issue #54 etc. etc.",

  "attachment": {
    "type": "Offer",
    "origin": "https://dev.example/alice/coolBobRepo/branches/bug54",
    "target": "https://forge.example/bob/coolBobRepo/branches/master",
    "object": {
      "type": "Patch",
      "attributedTo": "https://dev.example/alice",
      "mediaType": "application/x-darcs-patch",
      "content": "..."
    }
  }
}

In a user-generated patch, the object of the Offer is provided, and well as target, but origin probably isn't. In a MR, origin and target are provided and object probably isn't (unless we decide it is for federation purposes).

How does it look so far? Don't comment yet, read the rest of the proposal first :)

(3) Patch/MR versions

After a patch is submitted and gets code review, usually a new version is sent, with fixes based on the review. And that new version gets a review, and so on. There may be multiple rounds of review. Similarly in a MR, you can git push or force-push after opening the MR, adding or rewriting commits. Some facts about patch/MR versions:

  • Old versions remain available for viewing, and even interaction
    • In email, the old versions exist in the public mailing list archives
    • In web MRs, you can view the old versions, compare between them, etc. (e.g. GitLab CE allows to do that, maybe githu8 too, I didn't check)
    • Objects generated from old versions may be reused by new versions (e.g. if you commented on some specific diff, and then someone added/deleted/rewrote commits without changing the edits of that specific file, the specific diff and its code review remain relevant and available, e.g. this can be seen in GitLab)

So, the stuff attached to our work item isn't just a Patch object, it's a versioned patch. A sequence of patch versions. I haven't seen MR/patch versions need separate names or other properties; they just get numbered. I think it's enough to put them in an ordered list. So, replace that one Patch object with an ordered list (which starts with a single item, and later more can be added).

There are 2 ways to represent ordered lists, iirc: RDF lists and OrderedCollection. Since the object property is a standard AS2 property and doesn't have its JSON-LD @container set to @list, we need to represent the list/collection explicitly, so in terms of complexity they're similar. However I've never seen RDF lists used explicitly anywhere on the Fediverse, while collections and ordered collections are standard AS2/AP types and are used very often.

Would you prefer an RDF list?

Would you prefer some vocabulary for object versions, maybe grab from some existing RDF ontology?

In the examples below, I'm going to go with OrderedCollection.

When you send a patch / open a MR, obviously there's a single version. So, sounds kind of silly to already put a collection there. But in the Create flow, you host the MR you open. So, if you feel like making multiple versions and only then to let the repo know about your MR, that's up to you. In the Offer flow, where the repo hosts the MR, I guess it's the same, except maybe the repo will feel like taking only the last version and ignoring all the old ones. So, bottom line, no big deal having a collection. However of course in UI when you click "Open MR" or whatever, the UI probably sends a MR with a single version :)

Question: If there's a MR, and it only specifies the origin and target repos/branches, how are the different versions specified? The source and target branches stay the same between versions, but for each version we still may want to specify info such as:

  • List of comments / code review
  • Hash of last commit in the version
  • @id URI so that other stuff can refer to the version
  • A JSON description of the commits and changes in the version

Proposal: For a MR that specifies origin repo/branch without providing the patch diffs, still have an OrderedCollection of Patch objects, except those objects don't have mediaType and content. They have id and whatever else the forge wants to provide per version. So a "patch" here means a set of changes, that is provided either directly as a diff, or by referring to a repo/branch/commit/hash. MRs can do the latter.

(4) Okay, here's the proposal

In MRs, there are MR events that aren't tied to a specific version (such as work item stuff, and also discussion that isn't version specific; e.g. GitLab allows for that) and there are version-specific MR events, in particular code review.

The Ticket gets a replies collection with the version-independent discussion, and each patch version has its own replies collection with the discussion and code review specific to that version.

And here comes my actual proposal.

{ "@context": [ ... ],
  "id": "https://dev.example/alice/objects/34583485354",
  "type": "Ticket",
  "attributedTo": "https://dev.example/alice",
  "context": "https://forge.example/bob/coolBobRepo",
  "summary": "Fix the bug bla bla bla",
  "content": "This MR should fix issue #54 etc. etc.",
  "replies": "https://dev.example/alice/objects/34583485354/replies",

  "attachment": {
    "type": "Offer",
    "origin": "https://dev.example/alice/coolBobRepo/branches/bug54",
    "target": "https://forge.example/bob/coolBobRepo/branches/master",
    "object": {
      "type": "OrderedCollection",
      "totalItems": 1,
      "orderedItems": [
        { "id": "https://dev.example/alice/objects/34583485354/ver1",
          "type": "Patch",
          "attributedTo": "https://dev.example/alice",
          "mediaType": "application/x-darcs-patch",
          "content": "...",
          "replies": "https://dev.example/alice/objects/34583485354/ver1/replies"
        }
      ]
    }
  }
}

Unless I missed something, this should be enough for federation. The Patch content is the native VCS specific format, and the forge simply applies it to a repo when you click "Merge". And if MR is specified as origin repo/branch, the forge can just pull that branch and merge it into the local repo. That way, patches and MRs can be sent, discussed and merged, across servers.

There are more details of course, but we can decide them in separate issues. What this proposal does is define the basis for representing a patch/MR.

In your feedback remember we want to support C2S too, and although S2S federation is the main focus, the vocabulary choice should be possible to use in C2S and not conflict with C2S needs including UI and viewing stuff.

Thanks for reading & write thoughts ^_^

Following the feedback on IRC and here on issues, and since my previous discussion about this has been documented as a weird confusing stream of thoughts, I'd like to start a proposal with a clear thorough description. Here it comes. Feedback wanted :) I'm going to develop the proposal incrementally, to show you why it looks the way it does, so first read until the end and comment on the final proposal at the bottom. **If you don't feel like reading everything, feel free to skip to the bottom, but before you ask a question, please make sure it's not already answered in the whole thing.** # (1) Unifying patches and MRs What is a patch, conceptually? Put aside the communication protocols (email, activitypub). Put aside VCS specific details. A patch is a description changes to potentially be made to a file, or a tree of files. That description of changes is in some format that machines understand and the changes can be applied by a computer to the file / tree of files, producing an updated version of the file(s). The exact thing that the word "patch" refers to varies, it can mean a slightly different thing in different VCSs. "Patch" or "change set" etc. may refer to: 1. A change or set of changes to be applied to a single specific file (e.g. a diff) 2. A change or set of changes to be applied to a tree of files (e.g. a git commit) 3. A sequence of (usually named) change sets to be applied on top of each other against a tree of files (e.g. a linear sequence of git commits) Now, what is a merge request? I propose to think of it as: - A sequence of one or more named change sets, i.e. the 3d meaning of "patch" that I wrote above, e.g. a sequence of git commits, or a sequence of darcs patches, etc. - A proposal that the change set(s) are considered to be applied to the repo - The proposal states a *specific repo branch/tip* on top of which the change set(s) are made and intended to be applied - The change set(s) are stored and produced from a fork of the repo, at a specific branch/tip, rather than being sent directly as a stand-alone description of changes generated by the user (e.g. the way patches-over-email are usually sent) - Sometimes a MR is a work item and can have a person assigned to work on it, have a priority, have labels, have a due date, be added to a milestone, etc. When patches are sent over email, they too communicate/imply the proposal that the changes are considered to be applied to the repo. And like MRs, they can have discussion and code review. In other words, the difference between them is: - Patches are usually generated by the user while MR commits are taken from a repo fork on the server; but otherwise both a patch and a MR are basically a commit (or multiple commits) to potentially be made on a repo. For example, in Darcs, a patch-over-email is in the same format as the "committed" patches stored in the repo. The storage format is just VCS specific detail; the point is that both MR and patch describe or refer to changes, they just may do so in a different way / different representation - MR can be a work item in a web app, while patches are often just sent as change sets and may be attached to separate work items. But if we allow a patch/MR to be a work item, then we can represent patches and MRs in the same way: If you want to think about the thing you're submitting as a patch, simply don't use the work item features, or use software that doesn't offer those features to you. Conceptually, a patch and a MR are both work items: They both basically tell the dev team "Hey look, someone proposed some diffs/commits to be applied to your repo, go over them when you have some time". Except when a patch is sent by email to a plain regular mailing list and not to a ticket tracker etc., an official work item would be created elsewhere, possibly manually, while in a web app it happens automatically. I'm proposing to have a patch/MR unified concept. Before I outline an initial representation, I'd like to ask, what would we call this unified concept? "Patch set"? "Change set"? Anyway, for now, for the purpose of this proposal, I'm taking the 'P' from 'Patch' and attaching it to 'MR', and going to call the unified concept 'PMR' for the rest of this proposal :) # (2) Initial representation of patch/MR Initial representation of a PMR: - A work item - A proposal-to-review-and-apply-the-changes is attached to the work item - The proposal refers to the actual changes that are proposed. It may do so by including patch diffs and/or referring to a source repo/branch in which the changes can be found - Discussion and code review are possible In some of the examples below, PMRs specify both the source repo/patch and actual diffs. This is for demonstration; it's possible we'll do it that way for PMR federation, but it's a separate question; it's just for demonstration right now. Also, in the examples below all those various components are be together in a single JSON-LD object/document. It doesn't have to be that way; I'm just doing that for simplicity right now. Initial representation, representing the work item as a ForgeFed `Ticket`, the proposal as an ActivityPub `Offer`, and the actual patch diff(s) using a new hypothetical ForgeFed `Patch` type: ``` { "@context": [ ... ], "id": "https://dev.example/alice/objects/34583485354", "type": "Ticket", "attributedTo": "https://dev.example/alice", "context": "https://forge.example/bob/coolBobRepo", "summary": "Fix the bug bla bla bla", "content": "This MR should fix issue #54 etc. etc.", "attachment": { "type": "Offer", "origin": "https://dev.example/alice/coolBobRepo/branches/bug54", "target": "https://forge.example/bob/coolBobRepo/branches/master", "object": { "type": "Patch", "attributedTo": "https://dev.example/alice", "mediaType": "application/x-darcs-patch", "content": "..." } } } ``` In a user-generated patch, the `object` of the `Offer` is provided, and well as `target`, but `origin` probably isn't. In a MR, `origin` and `target` are provided and `object` probably isn't (unless we decide it is for federation purposes). How does it look so far? Don't comment yet, read the rest of the proposal first :) # (3) Patch/MR versions After a patch is submitted and gets code review, usually a new version is sent, with fixes based on the review. And that new version gets a review, and so on. There may be multiple rounds of review. Similarly in a MR, you can git push or force-push after opening the MR, adding or rewriting commits. Some facts about patch/MR versions: - Old versions remain available for viewing, and even interaction - In email, the old versions exist in the public mailing list archives - In web MRs, you can view the old versions, compare between them, etc. (e.g. GitLab CE allows to do that, maybe githu8 too, I didn't check) - Objects generated from old versions may be reused by new versions (e.g. if you commented on some specific diff, and then someone added/deleted/rewrote commits without changing the edits of that specific file, the specific diff and its code review remain relevant and available, e.g. this can be seen in GitLab) So, the stuff attached to our work item isn't just a `Patch` object, it's a *versioned* patch. A sequence of patch versions. I haven't seen MR/patch versions need separate names or other properties; they just get numbered. I think it's enough to put them in an ordered list. So, replace that one `Patch` object with an ordered list (which starts with a single item, and later more can be added). There are 2 ways to represent ordered lists, iirc: RDF lists and `OrderedCollection`. Since the `object` property is a standard AS2 property and doesn't have its JSON-LD `@container` set to `@list`, we need to represent the list/collection explicitly, so in terms of complexity they're similar. However I've never seen RDF lists used explicitly anywhere on the Fediverse, while collections and ordered collections are standard AS2/AP types and are used very often. Would you prefer an RDF list? Would you prefer some vocabulary for object versions, maybe grab from some existing RDF ontology? In the examples below, I'm going to go with `OrderedCollection`. When you send a patch / open a MR, obviously there's a single version. So, sounds kind of silly to already put a collection there. But in the `Create` flow, *you host the MR you open*. So, if you feel like making multiple versions and only then to let the repo know about your MR, that's up to you. In the `Offer` flow, where the repo hosts the MR, I guess it's the same, except maybe the repo will feel like taking only the last version and ignoring all the old ones. So, bottom line, no big deal having a collection. However of course in UI when you click "Open MR" or whatever, the UI probably sends a MR with a single version :) Question: If there's a MR, and it only specifies the `origin` and `target` repos/branches, how are the different versions specified? The source and target branches stay the same between versions, but for each version we still may want to specify info such as: - List of comments / code review - Hash of last commit in the version - `@id` URI so that other stuff can refer to the version - A JSON description of the commits and changes in the version Proposal: For a MR that specifies origin repo/branch without providing the patch diffs, *still* have an `OrderedCollection` of `Patch` objects, except those objects don't have `mediaType` and `content`. They have `id` and whatever else the forge wants to provide per version. So a "patch" here means a set of changes, that is provided either directly as a diff, or by referring to a repo/branch/commit/hash. MRs can do the latter. # (4) Okay, here's the proposal In MRs, there are MR events that aren't tied to a specific version (such as work item stuff, and also discussion that isn't version specific; e.g. GitLab allows for that) and there are version-specific MR events, in particular code review. The `Ticket` gets a `replies` collection with the version-independent discussion, and each patch version has its own `replies` collection with the discussion and code review specific to that version. And here comes my actual proposal. ``` { "@context": [ ... ], "id": "https://dev.example/alice/objects/34583485354", "type": "Ticket", "attributedTo": "https://dev.example/alice", "context": "https://forge.example/bob/coolBobRepo", "summary": "Fix the bug bla bla bla", "content": "This MR should fix issue #54 etc. etc.", "replies": "https://dev.example/alice/objects/34583485354/replies", "attachment": { "type": "Offer", "origin": "https://dev.example/alice/coolBobRepo/branches/bug54", "target": "https://forge.example/bob/coolBobRepo/branches/master", "object": { "type": "OrderedCollection", "totalItems": 1, "orderedItems": [ { "id": "https://dev.example/alice/objects/34583485354/ver1", "type": "Patch", "attributedTo": "https://dev.example/alice", "mediaType": "application/x-darcs-patch", "content": "...", "replies": "https://dev.example/alice/objects/34583485354/ver1/replies" } ] } } } ``` Unless I missed something, this should be enough for federation. The `Patch` `content` is the native VCS specific format, and the forge simply applies it to a repo when you click "Merge". And if MR is specified as origin repo/branch, the forge can just pull that branch and merge it into the local repo. That way, patches and MRs can be sent, discussed and merged, across servers. There are more details of course, but we can decide them in separate issues. What this proposal does is define the basis for representing a patch/MR. In your feedback remember we want to support C2S too, and although S2S federation is the main focus, the vocabulary choice should be possible to use in C2S and not conflict with C2S needs including UI and viewing stuff. Thanks for reading & write thoughts ^_^
Sign in to join this conversation.
Loading...
Cancel
Save
There is no content yet.