#25 Properties of a `Commit` object

Open
opened 4 months ago by fr33domlover · 5 comments

Besides the actual commit message, whose representation is discussed in issue #24, which properties should a Commit object have? A good way to check: Look at the web hook schemas of existing forges, including non-git ones (if they provide web hooks).

fr33domlover commented 4 months ago
Collaborator

Gitea: https://docs.gitea.io/en-us/webhooks

Essentially, a push event provides the following relevant info:

"ref": "refs/heads/develop",

The branch which was pushed into.

"before": "28e1879d029cb852e4844d9c718537df08844e03",

The hash of the commit at the tip of the branch before the push.

"after": "bffeb74224043ba2feb48d137756c8a9331c449a",

The hash of the commit at the tip after the push, i.e. the hash of the last commit in the push.

"compare_url": "http://localhost:3000/gitea/webhooks/compare/28e1879d029cb852e4844d9c718537df08844e03...bffeb74224043ba2feb48d137756c8a9331c449a",

URL of a web page displaying the changes between the 2 given revisions, before and after the push.


Now comes a list of commits, each looks like this:

{
  "id": "bffeb74224043ba2feb48d137756c8a9331c449a",
  "message": "Webhooks Yay!",
  "url": "http://localhost:3000/gitea/webhooks/commit/bffeb74224043ba2feb48d137756c8a9331c449a",
  "author": {
    "name": "Gitea",
    "email": "someone@gitea.io",
    "username": "gitea"
  },
  "committer": {
    "name": "Gitea",
    "email": "someone@gitea.io",
    "username": "gitea"
  },
  "timestamp": "2017-03-13T13:52:11-04:00"
}

There's also the whole repository object, and 2 user objects, "pusher" and "sender". To be honest, I'm not sure how they differ, or whether they ever do.

"pusher": {
  "id": 1,
  "login": "gitea",
  "full_name": "Gitea",
  "email": "someone@gitea.io",
  "avatar_url": "https://localhost:3000/avatars/1",
  "username": "gitea"
}
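
As a rough illustration of how a web hook handler might read the fields described above, here is a minimal Python sketch. The sample is trimmed from the payload quoted in this comment; the `commits` key name is assumed from Gitea's documented push payload, and `branch_name` and `summarize_push` are hypothetical helper names, not part of any forge API.

```python
import json

# Trimmed from the Gitea push event quoted above; real payloads carry more fields.
SAMPLE_PUSH = json.loads("""
{
  "ref": "refs/heads/develop",
  "before": "28e1879d029cb852e4844d9c718537df08844e03",
  "after": "bffeb74224043ba2feb48d137756c8a9331c449a",
  "commits": [
    {
      "id": "bffeb74224043ba2feb48d137756c8a9331c449a",
      "message": "Webhooks Yay!",
      "author": {"name": "Gitea", "email": "someone@gitea.io", "username": "gitea"},
      "timestamp": "2017-03-13T13:52:11-04:00"
    }
  ]
}
""")


def branch_name(ref):
    """Return the branch for a refs/heads/* ref, or None for other refs (e.g. tags)."""
    prefix = "refs/heads/"
    return ref[len(prefix):] if ref.startswith(prefix) else None


def summarize_push(payload):
    """Hypothetical helper: collect the push-level fields discussed above."""
    return {
        "branch": branch_name(payload["ref"]),
        "before": payload["before"],
        "after": payload["after"],
        "commit_ids": [c["id"] for c in payload.get("commits", [])],
    }
```

Calling `summarize_push(SAMPLE_PUSH)` gives the branch `develop` and a one-element list of commit hashes, without touching the repository itself.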
fr33domlover commented 4 months ago
Collaborator

GitLab CE: https://docs.gitlab.com/ce/user/project/integrations/webhooks.html#push-events

"before": "95790bf891e76fee5e1747ab589903a6a1f80f22",
"after": "da1560886d4f094c3e6c9ef40349f7d38b5d27d7",
"ref": "refs/heads/master",

Those are like in Gitea.

"checkout_sha": "da1560886d4f094c3e6c9ef40349f7d38b5d27d7",

This is identical to "after", not sure what it means.

There's some user data fields too, I guess describing the user who did the push?

"user_id": 4,
"user_name": "John Smith",
"user_username": "jsmith",
"user_email": "john@example.com",
"user_avatar": "https://s.gr4v4t4r.com/avatar/d4c74594d841139328695756648b6bd6?s=8://s.gravatar.com/avatar/d4c74594d841139328695756648b6bd6?s=80",

There's also a repository object, like in Gitea, and a list of commits, each of which looks like this:

{
  "id": "b6568db1bc1dcd7f8b4d5a946b0b91f9dacd7327",
  "message": "Update Catalan translation to e38cb41.",
  "timestamp": "2011-12-12T14:27:31+02:00",
  "url": "http://example.com/mike/diaspora/commit/b6568db1bc1dcd7f8b4d5a946b0b91f9dacd7327",
  "author": {
    "name": "Jordi Mallach",
    "email": "jordi@softcatala.org"
  },
  "added": ["CHANGELOG"],
  "modified": ["app/controller/application.rb"],
  "removed": []
},

And the total number of commits in the push, in case not all of them are listed in the web hook object:

"total_commits_count": 4
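
A handler could use `total_commits_count` to detect that the commit list was cut short. The sketch below assumes the list lives under a `commits` key and falls back to the listed count when no total is sent (as in the Gitea payload above); `commits_were_trimmed` is a hypothetical name.

```python
def commits_were_trimmed(payload):
    """Return True if the push contained more commits than the payload lists."""
    listed = len(payload.get("commits", []))
    # Fall back to the listed count when the forge sends no total.
    total = payload.get("total_commits_count", listed)
    return total > listed
```

When this returns `True`, a consumer that needs every commit would have to fetch the rest some other way, for example by cloning the repo.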
fr33domlover commented 4 months ago
Collaborator

Properties of a Commit object, based on the info above from Gitea and GitLab CE:

  • Commit hash, e.g. "bffeb74224043ba2feb48d137756c8a9331c449a"
  • Commit message: In Gitea and GitLab this seems to be a single string containing the whole commit message. In ForgeFed, possibly the title and description will be split; this is discussed in #24 and the corresponding forum thread.
  • URL where the commit can be viewed; this can be the @id of the Commit in ForgeFed
  • Commit author name (e.g. John Doe), email, username (e.g. jdoe)
  • Same details for the committer. I suppose this is for the case where one person submits a patch and another person applies it and pushes to the repo. TODO: read about this and make it clear here exactly what the difference between author and committer is, and in which cases they differ (patch? MR? manually specified on the command line?)
  • Timestamp: When the commit was made, including author's local timezone
  • List of names of files added in the commit
  • List of names of files modified in the commit
  • List of names of files removed in the commit

Properties of a Push, in addition to having a list of commits of course:

  • Total number of commits in the push; it may be bigger than the number of commits listed if, for performance reasons, the list of commits is trimmed
  • ref: e.g. refs/heads/master, the branch that was pushed to
  • before: hash of the commit at tip of branch before the push
  • after: hash of the commit at the tip after the push
  • compare_url: points to page displaying diff between the revisions
  • Repository object
  • Pusher info: name, username, email, avatar URL

Feel free to examine other forges, including non-git ones, and comment below with info about them the same way I did above with Gitea and GitLab :)
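
To make the two property lists above concrete, here is one possible shape for them as Python dataclasses. This is only a sketch for discussion, not a proposed ForgeFed vocabulary: the class and field names are assumptions, the repository object and pusher info are left out for brevity, and `commit_from_hook` is a hypothetical helper that accepts the Gitea- and GitLab-style commit dicts quoted earlier.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Person:
    name: str
    email: str
    username: Optional[str] = None  # absent from GitLab's commit author


@dataclass
class Commit:
    hash: str
    message: str         # one string for now; a title/description split is discussed in #24
    url: str             # could serve as the @id of the Commit
    author: Person
    timestamp: datetime  # timezone-aware, keeps the author's UTC offset
    committer: Optional[Person] = None
    added: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)


@dataclass
class Push:
    ref: str
    before: str
    after: str
    commits: List[Commit]
    total_commits: int   # may exceed len(commits) if the list was trimmed
    compare_url: Optional[str] = None


def person_from_hook(d):
    return Person(d["name"], d["email"], d.get("username"))


def commit_from_hook(d):
    """Build a Commit from a Gitea- or GitLab-style commit dict."""
    committer = d.get("committer")
    return Commit(
        hash=d["id"],
        message=d["message"],
        url=d["url"],
        author=person_from_hook(d["author"]),
        timestamp=datetime.fromisoformat(d["timestamp"]),
        committer=person_from_hook(committer) if committer else None,
        added=list(d.get("added", [])),
        modified=list(d.get("modified", [])),
        removed=list(d.get("removed", [])),
    )
```

Fields that only one forge provides (committer, username, file lists) are optional here, so the same type can hold what either web hook sends.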


I'd like to ask the question: do we really want all this data in the payloads? Because to me it seems like it is duplicating the data in git. Would ref, message, author and timestamp be enough?

I mean to get all the other stuff you just fetch the repository and look up the ref.

Also maybe asking an even simpler question - why do we have commits at all? Won't it be enough to have a ref to a commit in a merge request for example?

Maybe I'm oversimplifying things - I just want to challenge mirroring everything in the VCS to ForgeFed :)

fr33domlover commented 4 months ago
Collaborator

@jaywink, these are great questions! Here are the thoughts behind the stuff you mentioned :)

These properties, as you can see above, are simply based on web hooks in existing forges. So, let's ask the following question: Why does all this data exist in web hook payloads? I haven't looked deeply into that question, but I have some insights to get started:

  1. The properties aren't VCS specific, so a web hook handler can use them even without having to care about VCS differences
  2. A web hook may be able to use this data without having to clone the repo and do VCS commands on it. For example, if you want your web hook to announce the commit on IRC or on Matrix, there's no extra info needed. This is how it is in existing forges. If we removed properties, we'd force web hook handlers to clone repos and run e.g. git commands to grab that info
  3. Since this data is how things already work, changing it would force everyone who migrates from custom web hooks to ForgeFed to update their web hook handlers and make them more complicated. If we just sort of mirror the existing situation, then it's compatible and very easy to migrate.
  4. When you browse a commit using a hypothetical ForgeFed client on your laptop/phone/whatever, your client can fetch the AP representation and display it on your screen. But without those properties, every single time you browse some repo or commit, your client would have to git-clone it to grab data. Imagine that GitLab or githu8 had JS in them which, every time you browse a repo, git-clones the repo to generate the HTML page for your browser to display.
  5. Software wanting to handle multiple VCSs, or just all supported VCSs, would have to have all the VCS programs available. Imagine GitLab or githu8 etc. refused to work unless you installed Git, Darcs, Mercurial, SVN and Monotone on your laptop or phone.

If we can provide basic info about commits to allow browsing and web hook handling without VCS commands, that sounds to me like a valuable thing, especially since it's already how things work, so it's expected.

Of course, I'm not suggesting we model the entire contents of the .git directory as a JSON object. These are just basic properties, that are enough for many uses without diving into VCS details. And we can provide these, just like forges already do, and clients can always indeed clone the repo if they need more (for example, a web hook handler that runs CI builds will want to clone the repo anyway, to get the latest source files to build).
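
To illustrate point 2, here is a small sketch of a handler that builds an IRC/Matrix-style announcement purely from the web hook payload, with no clone and no VCS commands. The function name `announce_line` and the message format are made up for this example; the field names match the Gitea commit object quoted earlier.

```python
def announce_line(commit, ref):
    """Format a one-line announcement from a web hook commit dict and the pushed ref."""
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref
    # Use only the first line of the commit message as the title.
    lines = commit["message"].splitlines()
    title = lines[0] if lines else ""
    short_hash = commit["id"][:10]
    author = commit["author"]["name"]
    return f"{author} pushed {short_hash} to {branch}: {title} <{commit['url']}>"
```

Everything the line needs (author name, hash, branch, message, URL) is already in the payload, which is the argument for keeping these properties.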

I focused above only on web hooks. Another use of Commit objects would be AP representations for their web pages, which may have a different, perhaps more detailed view of the commit than a web hook has. I'll examine one use at a time, right now just doing web hooks.
