This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Sort commits topologically first instead of by author date #386

Open
cirosantilli opened this issue Apr 23, 2015 · 70 comments
Comments

@cirosantilli
Collaborator

cirosantilli commented Apr 23, 2015

Like git log --topo-order does by default. E.g.: https://github.com/cirosantilli/test-log-order/commits/master , screenshot 2015-05-01.

Generated with:

git init
# Parent commit '0' gets the LATER date (2011).
touch '0'
git add '0'
date='2011-01-01 00:01:01 +0000'
GIT_COMMITTER_DATE="$date" git commit -m '0' --date "$date"
# Child commit '1' gets the EARLIER date (2010).
touch '1'
git add '1'
date='2010-01-01 00:01:01 +0000'
GIT_COMMITTER_DATE="$date" git commit -m '1' --date "$date"

Tree:

 0 --> 1

Actual log:

Commit Date
0 2011
1 2010

Expected log:

Commit Date
1 2010
0 2011

The actual tree structure is more important than the timestamp, which is just an arbitrary value that can be controlled by users.

For example, if an old commit gets merged later, I'd expect to see it on the top of the log as the merge date is what matters most.

Not to mention my evil desire to annoy projects by making a future max commit cirosantilli/test-git-web-interface@ff86a7b, create a fake account, find a typo on some famous project with aspell and make a pull request. I bet it would pass, and when the project admins notice it, they would likely be forced to force push it away. MUAHAHAHA. But I won't do it :-)

Maybe this was mentioned at: https://help.github.com/articles/why-are-my-commits-in-the-wrong-order/

IMHO the best option is --topo-order from man git-log, as it shows the most "logical" topo sort possible. libgit2 even has it already: https://libgit2.github.com/libgit2/ex/HEAD/log.html#section-23
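The difference between the two orderings in the example above can be sketched without a repository. This is only an illustration under stated assumptions: the records are the two made-up commits from the example, the function names are hypothetical, and the standard `tsort` utility stands in for whatever topological sort an implementation would really use.

```shell
# "commit author_year" records for the two-commit example.
commits='0 2011
1 2010'

# Newest author date first: the order reported as the actual log.
log_by_author_date() {
  printf '%s\n' "$commits" | sort -k2,2nr | cut -d' ' -f1
}

# Child before parent: each tsort input pair reads "child parent".
log_by_topology() {
  printf '%s\n' '1 0' | tsort
}

log_by_author_date   # lists commit 0 first, then commit 1
log_by_topology      # lists commit 1 first, then commit 0
```

With only one parent/child edge there is a single valid topological order, so the result does not depend on how `tsort` breaks ties.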

@cirosantilli
Collaborator Author

Standard from Ivan today.

@jmm

jmm commented Nov 18, 2015

I just noticed this in the list of commits on one of my PRs. They're not in the order I expected and I wondered if I screwed up my rebase, until I checked. Does this mean that GitHub said that's their intended behavior?:

Standard from Ivan today.

@cirosantilli
Collaborator Author

@jmm hey, I cryptically meant that Ivan gave a standard reply that says nothing except acknowledge my email :-) So I don't know what they think about it.

@jmm

jmm commented Nov 18, 2015

@cirosantilli ah, gotcha.

@toejough

from https://help.github.com/articles/why-are-my-commits-in-the-wrong-order/

If you rewrite your commit history via git rebase or a force push, you may notice that your commit sequence is out of order ...

So, there is acknowledgement that the chosen sort method presents commits in the wrong order.

GitHub emphasizes Pull Requests as a space for discussion. All aspects of it ... are represented in a chronological order.

It's not clear to me how only showing things in the wrong order (where the 'right' order is the way the committer ordered them during rebase) fosters better discussion. If anything, I've experienced the opposite - that it confuses both the pull request submitters and reviewers, leading to wasted time and brain cycles better spent on creating and reviewing actual content.

If you always want to see commits in order, we recommend not using git rebase.

The rebase command is really useful for cleaning up commit histories by reordering, squashing, splitting, and rewording commits, making them easier for reviewers to understand. Would it be better if everyone just committed the right code in the right order with the right organization and messages the first time? Absolutely, but that is a fantasy world. Rather than give up a tool that enables better commit organization, it seems more sensible to display commits in the committer's intended order.

@andersk

andersk commented Jun 14, 2016

Fixing this would not involve a huge paradigm shift. GitHub currently sorts the commits in a PR by author date (not sure if this was always the case, but I just checked). There is no reason they couldn’t simply use the commit date, rather than the author date, as the commit’s insertion point in the same chronological stream. Ties in the commit date ordering would be broken by another monotonic measure, such as the number of ancestor commits.

For people who follow GitHub’s current advice and never use git rebase or similar tools, the commit date will be equal to the author date and nothing will change.

For people who do use git rebase, the commit date is a much better indicator of what Git users expect “chronological” to mean. It’s when work on the commit was finished, rather than when it was started. When rewriting a commit, Git bumps its commit date to the current time; then, when replaying its descendants on top of it, Git bumps their commit dates as well. Because the replays may happen within 1 second, the resulting commit dates may be equal—hence the need for the tie breaker—but they will never go backwards (barring clock skew across machines or deliberate manual manipulation).
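The proposed sort key (commit date first, ancestor count as the tie breaker) can be sketched as a two-key sort. This is a rough illustration, not GitHub's implementation: the function name and the sample records are made up, and the pointers to `git log --format='%ct %H'` and `git rev-list --count <commit>` only indicate where such fields could come from in a real repository.

```shell
# Input: one "commit_timestamp ancestor_count id" record per line.
# Key 1: committer timestamp. Key 2: ancestor count, which only matters
# when several commits share the same second.
sort_commit_date_then_depth() {
  sort -k1,1n -k2,2n | cut -d' ' -f3
}

# Three commits replayed by a rebase within the same second, plus their base.
printf '%s\n' \
  '1466000000 12 c' \
  '1466000000 10 a' \
  '1466000000 11 b' \
  '1465000000 9 base' | sort_commit_date_then_depth
```

Because a commit always has more ancestors than its parent, the second key keeps equal-timestamp commits in parent-to-child order.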

@timabbott

+1, sorting by commit date, rather than author date, is definitely the correct solution here (I just came to this issue with the intent of posting what @andersk wrote).

@cirosantilli
Collaborator Author

@timabbott note that I'm proposing topological sort, not commit date. If you want that, consider opening a separate issue.

@andersk

andersk commented Jun 20, 2016

@cirosantilli Your original proposal is unclear on a couple of points:

  • “instead of by commit date”—but GitHub doesn’t sort by commit date, it sorts by author date, and that is the real problem that causes commits to be misordered.
  • “Like git log does by default”—but the default for git log is reverse chronological order, not --topo-order. On any sufficiently complicated repository (I tried git.git itself), git log, git log --topo-order, git log --date-order, and git log --author-date-order produce four different orders. In fact, the default git log misorders a handful of commits with respect to their parents, while the other three do extra work to avoid that. But the default is otherwise closest to --date-order.
  • If GitHub were to sort commits topologically, what would they do with pull request comments posted between swapped commits?

I’m only trying to refine your proposal with a plausible solution that correctly orders commits with respect to their parents in all realistic cases, while allowing an implementation consistent with GitHub’s requirements, which they explained to me as follows:

Date: Fri, 12 Apr 2013 15:00:15 -0700
From: Yossef Mendelssohn support@github.com (GitHub Staff)
To: andersk@mit.edu
Subject: Re: Commits shown misordered on pull request page

Anders,

We view a Pull Request as an ongoing discussion, and so the main view ("discussion" tab) shows all the activity in chronological order. This includes comments and references as well as the commits. With that in mind, we show the commits in that same order, whether or not that matches the graph order that Git has them stored in.

This isn't perfect, especially in cases like barnowl/barnowl#130 where there is no extra discussion. We're working to improve the experience, as we run into this same sort of confusion from time to time if we rebase or otherwise alter the commit history.

Yossef

To literally implement --topo-order GitHub would almost certainly have to drastically modify the pull request UI, e.g. by separating comments from commits. To implement my refined proposal they need only make a small non-disruptive tweak to the sorting function.

Keeping in mind that this is an unofficial issue tracker, you’re of course welcome to propose other solutions. Just remember that if GitHub ever looks at this issue, they’re only likely to change anything if it’s easy and it won’t break anything for other users.

@cirosantilli changed the title from "Sort commits topologically first instead of by commit date" to "Sort commits topologically first instead of by author date" on Jun 20, 2016
@cirosantilli
Collaborator Author

@andersk agreed on first two points.

I'm not sure if it would be that hard to implement topo order on PRs, GitHub could just use the server's timestamp + topo order to order the magic commit comments.

@andersk

andersk commented Jun 20, 2016

@cirosantilli We’re talking about pull request comments, not commit comments. Pull request comments are not related to the Git graph and do not have a “topo order”.

@toejough

Ahh, thank you for posting that email, @andersk. So, the real problem here is that GH is conflating two different concepts - commit history & code review (PR comments/code comments).

It's reasonable and in some cases desirable for long-term readability and organization to re-order and re-write your commit history.

It's entirely unreasonable and confusing to re-order or re-write your code review history.

It doesn't make much sense to tie the ordering of something with un-editable history with something with editable history. I wonder why it was done that way... It's good to hear that github is aware of the conflict and is trying to design a better interface, though.

@andersk

andersk commented Jun 21, 2016

@toejough That may be a problem, but it is not this problem. Even if GitHub were to remember all previous commits when the commit history of a PR is rewritten, the result would make no sense if sorted by author date the way commits are sorted today, while it would work almost perfectly if sorted by commit date like I proposed.

@findepi

findepi commented Feb 16, 2017

it would work almost perfectly if sorted by commit date like I proposed.

Yes, but almost. If I rebase to reorder my commits (or apply a fixup to one), all commits will have equal commit time (http://stackoverflow.com/a/28238046/65458 says we have precision of 1 s here), so one cannot sort on that alone.

However, any form of topology-related ordering (git log default, git log --topo-order, ...) is going to make me happy, as far as PRs are concerned.

@andersk

andersk commented Feb 16, 2017

@findepi I addressed that point above: “Ties in the commit date ordering would be broken by another monotonic measure, such as the number of ancestor commits.”.

And again, the git log default is not topology-related.

@gvlasov

gvlasov commented Mar 31, 2017

I can't believe this is an issue, and I can't see why anyone would ever prefer time-based sorting of commits in a PR over topological sort. I hope this gets fixed one day.

@pnowojski

It really would be nice to finally fix this issue. It's annoying to do git rebase -i origin/master --exec "sleep 1" every time before publishing PR.

@shangxiao

Using rebase's --ignore-date seems to work for me as it forces the author date to be the same as commit date.

We shouldn't need to do this though, I'm not sure that ordering by author date provides any purpose.

@timabbott

Yeah, the thing that's super frustrating about this bug is that it displays commits in an order that is confusing and incorrect for everyone; there's essentially no application for which GitHub's ordering is better. And it should be trivial to fix.

In the Zulip project, we regularly hear from developers who get confused because of this issue.

@skullandbones

It would be helpful if the user could select how the view is sorted then it would be a personal preference. I expect everyone would select view by commit date. It is a really annoying issue.

@nickurak

nickurak commented Nov 6, 2017

Is there any way to view a sequence of git commits in github as they actually exist, not in the order they arrived conceptually in github? I'm used to caring strongly about how the sequence of commits tells a story, especially in terms of TDD-approach adding tests, then adding small commits that gradually make those tests pass, and not being able to visualize/review that story makes it awfully hard to use github PRs for code review.

@outofthecave

outofthecave commented Nov 7, 2017

@nickurak As a workaround, I use the "compare" feature a lot because it seems to show the commits in topological order. For example, if you have a PR from a branch called my-branch into master, then you can go to https://github.com/MyUserName/MyRepo/compare/master...my-branch to see a comparison between the two branches. You can also get there by clicking "New Pull Request" on the repo's home page and then selecting the base and compare branches from the drop-down menus.

Edit: The network is also very helpful. You can find it at https://github.com/MyUserName/MyRepo/network.

@Happy-Ferret

Curious.

Does anyone at Github actually care about ever fixing this or is it more important to make every developer's life more miserable for no obvious reason?

@hramrach

I think it is git default so nobody just shouted loud enough for somebody to notice.

@larsbrinkhoff

The Developer Support Team forwarded my request to the Product Team.

@boogisha

Coming back to a pull request almost two years later (git-tfs/git-tfs#1214), rebased, cleaned up and force-pushed... and very sad to see commits are (still) out of any sane order (where they're actually ordered pretty carefully to make review as easy as possible).

It's beyond me how one of the best places for project (exposure and) collaboration can keep failing so much in regards to one of the most important aspects of the very same collaboration (being pull request review process)... :(

I understand rebasing is not the only way, but being the one (and only?) which in fact does care about clean project history (which in turn does make collaboration easier), could we please get a benefit of choice, at least...? So even if left as-is by default, users who prefer to see pull request commits as they actually appear in history (even after rewriting), can do so.

@ilan-t

ilan-t commented Jan 19, 2020

One workaround is to rebase and update the dates. There are several ways to do that...
Manually, with edit and git commit --amend --date along the way
Or automatically, using git rebase --ignore-date

Another nuance is that what I really care about is the review order (when clicking next/prev), and not necessarily the display order of commits in a list. Maybe fixing just the review order is less intrusive in implementation?

Although having both ordered right will be nicer.

@boogisha

One workaround is to rebase and update the dates. There are several ways to do that...
Manually, with edit and git commit --amend --date along the way
Or automatically, using git rebase --ignore-date

Yeah, I know, and I did use it before in desperation, but especially for these specific changes it's kind of sad to throw away the fact they've been written almost two years ago... :( Not the most important thing in the world, but it still could provide valuable context in some cases, drilling through history at later time.

@hramrach

It's beyond me how one of the best places for project (exposure and) collaboration can keep failing

Maybe it isn't one of the best places. That's all

@chkno mentioned this issue Feb 5, 2020
@EamonNerbonne

EamonNerbonne commented Feb 15, 2020

The problem of comments-on-PRs interleaved with commits-in-topological order sounds easily solvable. After all, GH does have a date it can use to correlate the two: the push date. It even displays that in PRs. And since you can't push commits out-of-order (i.e. descendants before ancestors), the push date is always consistent with any topological order.

In short, a topological sort based on the tuple (push_or_comment_timestamp, author_date) would likely do exactly what everyone expects: unrelated commits from the same push are sorted by author date (rarely important, but hey), related commits are always descendants after ancestors, and comments appear between the pushes they're based on.
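The tuple key described above can be sketched as a plain two-key sort over timeline events. This is a simplified illustration under stated assumptions: the records, field layout, and function name are all made up, and the ancestor-before-descendant constraint inside a single push is not modeled (the two commits sharing a push here are treated as unrelated).

```shell
# Each record: "push_or_comment_ts author_date kind label".
# Key 1: when the push or comment reached the server.
# Key 2: author date, used only to order items sharing a timestamp.
sort_timeline() {
  sort -k1,1n -k2,2n | cut -d' ' -f3,4
}

printf '%s\n' \
  '500 0 comment looks-good' \
  '300 120 commit fix-docs' \
  '300 110 commit add-test' \
  '450 90 commit rebased-fix' \
  '400 0 comment please-rebase' | sort_timeline
```

Note that `rebased-fix` has the oldest author date but still lands after the `please-rebase` comment, because it was pushed later.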

@robinmoussu

Last Friday, on GitHub Enterprise, it seems that it was solved! I really hope it wasn't just a random thing, but I think the PR I did should have been out of order but wasn't.

@Greg-Smulko

Greg-Smulko commented Mar 11, 2020

It's so sad that there is a need for whole tutorials of how to fix it... https://andrewlock.net/how-to-fix-the-order-of-commits-in-github-pull-requests/ @andrewlock @github

@LabhanshAgrawal

As a workaround I do

git rebase origin/master -i --exec "git commit --amend --no-edit --date=now" --exec "sleep 10"

Posted if it might help someone.

@findepi

findepi commented May 14, 2020

@LabhanshAgrawal you can do it 10x faster. I use sleep 1.05. You can refer to https://gist.github.com/findepi/7abf17c8a26d3b74e8bb9527caadfdbe

@ilan-t

ilan-t commented May 14, 2020

@LabhanshAgrawal you can do it 10x faster. I use sleep 1.05. You can refer to https://gist.github.com/findepi/7abf17c8a26d3b74e8bb9527caadfdbe

If push comes to shove, you can do it even faster than that, by setting the clock with date -s and finally resyncing with ntpdate or similar...
although this has a system-wide effect, so it won't suit everyone.

@LabhanshAgrawal

@findepi well that was just an example, you can use whatever you find ok. I usually have only 4-5 commits per branch so it doesn't bother me.

@mxm

mxm commented Jun 3, 2020

Please fix this annoyance. Thank you!

@larsbrinkhoff

@mxm and everyone else,

Commenting here does nothing to get this fixed. Email to support@github.com.

@mxm

mxm commented Jun 4, 2020

Done.

Subject: [Bug] Commit Order in PRs

Dear support,

There is a bug in the listing of commits of a PR. Commits are sorted by date, not by their
hierarchical relationship. Since commits build on top of each other, it is important to
preserve their relationship to each other. Sorting by date does not guarantee that
because users typically change the order of commits via the rebase command.

Do you mind sending this to the engineering team?

Thanks,
Max

@EamonNerbonne

To follow @mxm's example and to add the suggestion on how to clarify topological ordering in the face of external things like PR comments, I also posted feedback to https://support.github.com/contact/feedback/:

Currently, pull requests can be hard to review when they contain commits that have out-of-order dates; a situation that commonly occurs when rebasing or otherwise rewriting history. In particular, the commits are shown ordered by date, not in the order as committed, which can make diffs hard to read. Worse, if there are comments made in the pull-request, those too can appear out of order, possibly even appearing to comment on things that look like they happened later. In principle, if the author waits a long time between committing and pushing, or if their clock is wrong, even without rebasing you can see weird and difficult to understand sequences of events.

It would be much easier to read such PRs if the events were sorted in the order they happened, not the order as declared by the authors. In other words: can you change the commit sort order to follow the topological git order and retaining author date at best as a fallback to disambiguate (linearize) the order?

To retain a logical ordering of commits with respect to comments on those commits, it is useful to observe that the topological commit ordering is always necessarily in agreement with the push-order, and the timestamp of the push is something github can observe and compare with the comment timestamp. Since you can't push commits out-of-order (i.e. descendants before ancestors), the push date is always consistent with any topological commit order.

In short, a topological sort based on the tuple (push_or_comment_timestamp, author_date) would likely do exactly what everyone expects: unrelated commits from the same push are sorted by author date (rarely important, but hey), related commits are always descendants after ancestors, and comments appear between the pushes they're based on.

Could you update the pull request event sequence to match that ordering? That way it's much easier to see which comment is in reaction to which push, and which commit depends on other commits, especially in the face of clock errors or rebases.

For more discussion, see #386

@dwijnand

dwijnand commented Jul 15, 2020

Fixed, apparently: https://github.blog/changelog/2020-07-14-pull-request-commits-now-ordered-chronologically/

(edit) I guess they went with "reverse chronological order", which is git's default. So it's not --author-date-order any more, but it's not --topo-order either, if that's what you wanted.

@graingert

This appears to be chronological rather than topological

@robinmoussu

We are changing the way commits are ordered in the pull request timeline and commits view. Commits are currently ordered by author date, which can cause commits to appear out of order in some scenarios, like after rebasing. With this change, commits are ordered according to their chronological order in the head branch, which is consistent with the ordering in Git.

This phrase is wrong (git ordering is topological, not chronological), but I assume that they meant topological instead of chronological…

@EamonNerbonne

So, I just tested this, and it's better... but still wrong. Seriously, how hard can it be :-/

@EamonNerbonne

At least the commits are now in topological order: yay!

However, PR comments are interleaved based on author date, not push date... that's weird.

@EamonNerbonne

EamonNerbonne commented Jul 15, 2020

Maybe it's coincidence that github addressed this feedback so quickly, but just in case I sent more feedback:

Recently, you improved the commit ordering, as announced here: https://github.blog/changelog/2020-07-14-pull-request-commits-now-ordered-chronologically/ - that's great, and makes PRs easier to review!

In particular, when the branch is rebased and commits have author dates in a different order than the git-DAG order, github PRs now show the commits in the order that they will be applied to the code (as opposed to re-sorted by author date).

However, sometimes people comment on PRs that are later rebased. And sometimes people forget to push commits, so push commits much later than they are authored. In these cases, the new ordering is still confusing (and conceptually wrong).

Consider the following scenario:

  • I author a branch B, add commit "B1", then push that and create a PR. Then I notice a typo, and fix it in commit B2 (but no push, so invisible to github).
  • A code reviewer looks at my PR, and notices the typo, and adds a comment: "Terrible PR, it has a typo!"
  • I notice I forgot to push, and push commit B2 only now.
  • I post "whoops, sorry - if you find anything else, maybe @some_random_user can fix those, I'm going on vacation!"
  • Result: The PR shows me creating it, and commits B1 and B2, and only then shows the reviewer's comment, and then my response
  • @some_random_user looks at the PR, and tries to find said typo, (because it looks like the reviewer reviewed after I made all my commits) and is confused.

If github interleaves PR comments with commits based not on author date, but on push date, PR's would read better. After all, it's irrelevant when the commits were authored from the perspective of those looking at the PR; what matters is the order in which they were added to the PR. Since pushes are necessarily topology-preserving, there is no conflict between a topological ordering of commits, and a push-date based ordering of comments.

(note to self: proofread before hitting send... oh well!)
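The scenario above can be replayed with made-up timestamps to compare the two interleavings. This is only a sketch: the numbers, labels, and function names are hypothetical, and a single sort key per view is a simplification of whatever the real timeline does.

```shell
# Each record: "author_or_post_ts push_or_post_ts label".
# For comments, both timestamps are the time the comment was posted.
events='10 20 commit-B1
30 50 commit-B2
40 40 reviewer-comment
60 60 author-reply'

# Interleave on author date: B2 appears before the review that prompted its push.
interleave_by_author_date() {
  printf '%s\n' "$events" | sort -k1,1n | cut -d' ' -f3
}

# Interleave on push date: the review sits between the two pushes.
interleave_by_push_date() {
  printf '%s\n' "$events" | sort -k2,2n | cut -d' ' -f3
}

interleave_by_author_date
interleave_by_push_date
```

In the first view the reviewer appears to have commented after B2 existed; in the second, the comment is shown against the state of the PR the reviewer actually saw.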

@EamonNerbonne

EamonNerbonne commented Jul 15, 2020

@robinmoussu

This phrase is wrong (git ordering is topological, not chronological), but I assume that they meant topological instead of chronological…

Well, they may be referring to the chronology of the git commit timestamp, which, if your clock is monotonic, would be consistent with a topological ordering. Even if your clock isn't monotonic, it's possible git pulls tricks to hide that (no idea) and in any case observing clock non-monotonicity is typically hard. In some sense "chronological" might be equivalent to "topological" - if you use the commit timestamp, and if the clock is "good" enough.
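Whether commit timestamps happen to agree with topology on a given branch can be checked mechanically. A minimal sketch, assuming a linear branch and timestamps listed parent first; the function name is hypothetical, and `git log --reverse --format=%ct` is mentioned only as one place such input could come from.

```shell
# Reads committer timestamps in parent-to-child order, one per line.
# Exit status 0: no commit is older than its parent (equal dates allowed).
# Exit status 1: at least one commit date goes backwards.
dates_follow_topology() {
  awk 'BEGIN { bad = 0 }
       NR > 1 && $1 < prev { bad = 1 }
       { prev = $1 }
       END { exit bad }'
}

# Equal timestamps (e.g. a rebase finishing within one second) still pass.
printf '%s\n' 100 100 101 | dates_follow_topology && echo consistent

# A child dated before its parent fails the check.
printf '%s\n' 2011 2010 | dates_follow_topology || echo inconsistent
```

When the check passes, sorting by commit date with a topological tie breaker and sorting topologically give the same result on that branch.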

@larsbrinkhoff

I reordered the commits on a branch yesterday (without touching author date), and lo and behold, I think they did sort topologically now on the GitHub pull request page.
