Skip to content
This repository has been archived by the owner on Jan 20, 2018. It is now read-only.

[design] .topmsg / .topdeps pollute commit history #38

Open
aspiers opened this issue Feb 9, 2015 · 15 comments
Open

[design] .topmsg / .topdeps pollute commit history #38

aspiers opened this issue Feb 9, 2015 · 15 comments

Comments

@aspiers
Copy link

aspiers commented Feb 9, 2015

.topmsg and .topdeps represent SCM meta-data, and so I believe it's a fundamental design mistake to include them in the commit history:

  • It violates the separation of concerns between storage of content (files) and metadata.
  • This pollution leads to extra noise appearing in the repo contents, and in every-day git commands and UIs.
  • Converting normal branches into topic branches requires rebasing the whole topic branch. (Worse, there isn't even a helper to do this in the general case - tg import creates a topic branch per commit which is usually undesirable.)
  • It prevents creation of a github pull request directly from a t/foo topic branch.
  • Changing .topmsg or .topdeps either creates more noise, or requires rebasing the whole topic branch.
  • Creating a topic branch requires two sets of edits: one for .topmsg, and one for the initial commit message.

Having said that, I can hazard a guess as to why this design decision was made: in order to adhere to the third stated tenet:

(iii) TopGit is specifically designed to work in a distributed environment. You can have several instances of TopGit-aware repositories and smoothly keep them all up-to-date and transfer your changes between them.

This is clearly a worthy goal, but I think the current implementation is the wrong way of going about it. (However the original author's comments on this decision are encouraging.) In fact topgit already stores other meta-data such as refs/top-bases/* outside the commit history. There are alternate approaches worth considering, e.g.

None of these will get fully and automatically transferred across remotes during using of normal git push / pull / fetch, but it wouldn't take much effort to fill in the gaps.

This was referenced Feb 9, 2015
@aspiers
Copy link
Author

aspiers commented Feb 10, 2015

Now that I think about it more, it seems to me that adding a tag to the base of the topic branch seems like the cleanest solution. This would solve #39 for free, since the tag could be lightweight or annotated. Of course there would need to be some way of tying the base tag to the topic branch, but this could easily be done by adopting a naming standard for the tag, just like topgit already does with refs/top-bases.

@greenrd
Copy link
Owner

greenrd commented Feb 13, 2015

One example is that it prevents creation of a github pull request directly from a t/foo topic branch.

The idea is that you would use tg export to create a cleaned-up branch for one or more topic branches, each of which would become a single commit in the cleaned-up branch, and then create a pull request for that.

@aspiers
Copy link
Author

aspiers commented Feb 13, 2015

@greenrd But what if I don't want to squash my topic into a single commit? That's generally a bad idea if the topic contains multiple logical changes. And it also requires an extra step, which is not ideal for the UX.

@imz
Copy link
Contributor

imz commented Feb 19, 2015

IMO, it's a very sensible and useful suggestion.

  1. If the meta-information was out of the trees, external Git-based tools that are not aware of topgit would be happy. For example, the maintainers of ALT distro work on packages in Git-based gear-repos. One workflow with gear is to use Git-branches for generating patches which are put into .src.rpm. I'd like to emphasize that here the patches are not a meaningful independent thing which is intentionally prepared and sent upstream, but rather a thing internal to the workflow. Package maintainers would collaborate on branches instead (and use TopGit to communicate with upstream, of course; here the repo represents work-in-progress, not necessarily prepared for sending upstream). But due to the extra meta-stuff from TopGit, these branches can't be cleanly fed into gear.
  2. It must be simpler to rewrite the descriptions or update dependencies (say, after a renaming), if they were external to the trees, Then rewriting a description or renaming a branch wouldn't cause rewriting the commit history, which is nice. (Not to say about how cumbersome it is to rewrite multiple commits to update all meta-files, but how easy it is to update the info at one place.)

@aspiers
Copy link
Author

aspiers commented Feb 20, 2015

@imz In reply to your points:

  1. Fantastic - thanks for the info which seems to prove that my thinking is along the right lines!
  2. If we used tags, it would be trivial to rewrite the topic branch description simply by deleting and recreating the annotated tag for that topic's base commit.

There's still the question of where to track a topic's dependencies though. I wonder if it shouldn't be inside the text of the annotated tag because

  1. clean separation of the description (unstructured text) from the dependency list (structured list of strings) will make the implementation programmatically cleaner, and
  2. it should probably be possible to change the dependencies and description independently of each other ... or should it? How often is the topic description likely to change without the dependencies changing, or vice-versa? Having said that, it probably makes sense to store a topic's dependencies and description via the same mechanism for consistency, so that there is a single unified workflow for reading / writing / sharing them. And maybe the practical benefits in simplicity of storing in the same object outweigh the architectural uncleanliness?

I've had some further thoughts on the possible locations for storing this metadata:

  • Store both inside (separate) annotated tags, e.g. if the branch is t/foo then the base of the branch would have two tags: top-base/foo (containing the optional description) and top-deps/foo (containing a list of dependencies, one per line).
    • Benefit: git fetch and git push --tags allow easy sharing of topics across remotes.
    • Visible via UIs such as gitk, which could be considered as an advantage (transparency) or disadvantage (clutter) depending on your point of view.
    • Benefit: gitk etc. would present both types of meta-data at the same location, i.e. on the same commit.
  • Store both in git notes on the base commit.
    • If they are to be kept separately, then we'd need to use two distinct namespaces: refs/top-bases/* and refs/top-deps/*.
    • Very nice advantage: history of dependency changes is preserved. (In contrast there is no reflog for tags.)
    • Disadvantage: visible via gitk etc. but as separate branches, and it's far less intuitive for the user to inspect the dependencies.
    • Can still be shared across remotes, e.g. via git rnotes, but this is more effort than just git fetch or git push --tags.
  • In .git/info:
    • Invisible to almost all existing git commands and tools, so probably a bad idea.
  • In .git/config:
    • Conceptually the cleanest approach.
    • Would allow editing of dependencies via a single git config command.
    • Hard to share across remotes, so not ideal.

Currently I'm leaning towards tags. I'm still undecided whether it should be one or two tags per topic though. It could be one tag by treating any line which starts with the magic string Dependency: as a dependency declaration. This would have the advantage that you could find out everything you need to know about a topic's metadata (i.e. description and dependencies) simply by viewing the contents of its base tag.

Thoughts welcome.

imz added a commit to imz/topgit that referenced this issue Feb 24, 2015
TODO: the problem with topgit internal files should be solved.
The problem is like this
(http://www.altlinux.org/Git/MergingBranches#.D0.B2.D0.BC.D0.B5.D1.81.D1.82.D0.B5_.D1.81_gear):

`tg export` exports nice patches, but it can't be called from `gear`.
Gear is stupid and includes them in the patches.

Possible solutions:

* teach Gear to use topgit (extra dependency?);
* modify TopGit not to store the meta-info (.topmsg, .topdeps)
  in Git trees (greenrd#38);
* somehow adapt `tg export` to the needs of a Gear-based workflow
  (some relevant remarks and objections:
  greenrd#42 (comment)).
imz added a commit to imz/topgit that referenced this issue Feb 26, 2015
TODO: the problem with topgit internal files should be solved.
The problem is like this
(http://www.altlinux.org/Git/MergingBranches#.D0.B2.D0.BC.D0.B5.D1.81.D1.82.D0.B5_.D1.81_gear):

`tg export` exports nice patches, but it can't be called from `gear`.
Gear is stupid and includes them in the patches.

Possible solutions:

* teach Gear to use topgit (extra dependency?);
* modify TopGit not to store the meta-info (.topmsg, .topdeps)
  in Git trees (greenrd#38);
* somehow adapt `tg export` to the needs of a Gear-based workflow
  (some relevant remarks and objections:
  greenrd#42 (comment)).
@aspiers
Copy link
Author

aspiers commented Apr 13, 2015

I am working on this issue this week as a Hack Week project.

I'm still undecided whether it should be one or two tags per topic though.

I think one tag per topic keeps it clean and simple, and it seems to me advantageous to be able to simultaneously view a topic's description and its dependencies (if both exist). For example, reading the description might make you realise that the dependency list is wrong. And I can't really see any disadvantages other than having to extract the dependencies from the tag text.

@aspiers
Copy link
Author

aspiers commented Apr 14, 2015

Another advantage of keeping the description and dependencies coupled together in the same chunk of metadata is that they can't accidentally get separated, e.g. when transferring between remotes, or if something goes wrong in the code. So it's clearly more robust.

However there is a disadvantage to using tags, which is that normal tags occupy a single namespace (refs/tags/*) which spans all remotes, so fetching tags from a remote would automatically pollute the local repo with remote topic branch metadata. I think it would be better to keep this metadata isolated per remote, just like normal branches are, in order to maintain the third goal stated in this issue's original description above. So currently I'm experimenting with the idea of manually creating tag objects via git mktag, and then associating them with the refs/topic-bases/* namespace via git update-ref (maybe better to use refs/topic-bases to distinguish new-style topic base tags from old-style refs/top-bases/* refs). This seems to work so far. It needs a tweak to gitk to help it understand tags which exist outside the refs/tags/* namespace, but this should be easy.

Whilst thinking about all this, further comparisons of the status quo (which relies on .topdeps and .topmsg at the beginning of the topic branch) with a tag-based approach occurred to me. Firstly, both are slightly vulnerable to loss of metadata. Currently you could lose .topdeps and .topmsg e.g. via a rebase -i which accidentally drops the commit. Similarly tags could accidentally be deleted, although this is harder if they live in refs/topic-bases/*.

A disadvantage of the status quo is that rewriting the metadata requires rebasing the whole branch, which changes all the SHA1s. This reinforces my believe that metadata should be kept separate from content! I'll update the issue description to include this.

@greenrd
Copy link
Owner

greenrd commented Apr 14, 2015

The idea is that changes to the metadata are tracked in the git history just like any other changes. How would you merge changes to the metadata if not?

@aspiers
Copy link
Author

aspiers commented Apr 15, 2015

One immediate observation is that tracking changes to metadata is supposed to be handled by the reflog, not the normal log.

But the key question here is, what are the use cases for tracking changes to the topic metadata, or being able to merge changes to the metadata?

For the latter, I can imagine that if two or more developers are collaborating on a topic and are separately editing the same .topmsg, it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit. After all, it's not even currently possible to merge commit messages, and without that, the ability to merge topic descriptions is somewhat dubious. The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Please let me know if I'm missing something. Thanks!

@greenrd
Copy link
Owner

greenrd commented Apr 15, 2015

it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit.

But it would still be a loss of functionality compared to the previous version of topgit. (Not to mention that you would need some way of converting topgit repositories to the new format.)

The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually

But it makes sure it happens.

, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Well that is what a merge semantically means - if you pull in someone else's changes and they have added a new dependency, that should be added to your topic branch as well. That's how TopGit's distributed nature works. We only support adding dependencies at the moment, so a merge would only add new dependencies, and therefore we don't officially support rejecting new dependencies (except by using git reset to undo the merge). I think there should be a post-commit hook to run tg update etc. if .topdeps has been changed.

@aspiers
Copy link
Author

aspiers commented Apr 15, 2015

On Wed, Apr 15, 2015 at 03:59:28AM -0700, Robin Green wrote:

it might occasionally be useful to be able to merge the differences if the .topmsg is intended to be used as a cover letter or pull request description, but I think the many significant disadvantages listed above outweigh this small benefit.

But it would still be a loss of functionality compared to the previous version of topgit.

Not really, because I'm going to keep it backwards-compatible via a new topmsg.metadata config variable :-) If it's set to files (the default) then nothing changes. If it's set to tags then the metadata is stored as tags instead. In this case yes you will lose a small amount of functionality (tracking / merging metadata), but on the flip side, all the problems listed above will vanish. I suspect that for a lot of people, the tags mode will be preferable.

(Not to mention that you would need some way of converting topgit repositories to the new format.)

Yes, I am hoping to add that later on. Shouldn't be hard.

The need to merge .topdeps seems even more tenuous, since it should be trivial to do this manually

But it makes sure it happens.

, and anyway it's not safe for this to happen automatically because there is no guarantee that the dependencies are equivalent across remotes.

Well that is what a merge semantically means - if you pull in someone else's changes and they have added a new dependency, that should be added to your topic branch as well.

Understood, but what if that dependency topic is present but different in both remotes? Then you've introduced a local dependency on the wrong thing. The only way to solve this would be to recursively merge all topics in the dependency tree, and that sounds pretty hard to handle cleanly, e.g. how would you rollback the whole tree if something went wrong during the middle of the process?

That's how TopGit's distributed nature works.

Sure - I definitely want to avoid breaking the existing support for distributed setup. That's why I'm experimenting with tags in the refs/topic-bases/* namespace, rather than in refs/tags/*.

We only support adding dependencies at the moment, so a merge would only add new dependencies, and therefore we don't officially support rejecting new dependencies (except by using git reset to undo the merge).

Thanks for the info - I hadn't looked at tg depend before. Should be pretty easy to extend this (and even easier when it only requires rewriting a tag).

I think there should be a post-commit hook to run tg update etc. if .topdeps has been changed.

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

@greenrd
Copy link
Owner

greenrd commented Apr 15, 2015

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

Because that is what tg depend add does. (Actually, it also checks for dependency loops, which a post-commit should probably also do.) If a new dependency comes in via a merge, I think it should behave the same as if it comes in from tg add. Admittedly, it would trigger tg update to merge in the latest changes from all the dependencies, recursively, not just the new ones - but that is what tg depend add does, too, so the "defect" - if it is a defect - is already there.

Another possible counterargument is that if it's a fast-forward (or indeed a commit resulting from running tg depend add!), the relevant tg update for the new dependencies has already been done. So maybe my proposed post-commit hook should only run if HEAD is a merge commit.

@aspiers
Copy link
Author

aspiers commented Apr 15, 2015

Why? We don't automatically run tg update if the contents of an existing dependency gets changed, so why automatically run it if a new dependency is introduced.

Because that is what tg depend add does.

Ahh, OK. But in that case, there is no risk of the user being surprised by tg update being triggered, because they've invoked a tg command. In contrast, the idea of things happening automagically via post-commit hooks makes me uncomfortable, because a) it violates the Principle of Least Surprise, and b) there may be use cases where the user doesn't want this to happen.

(Actually, it also checks for dependency loops, which a post-commit should probably also do.) If a new dependency comes in via a merge, I think it should behave the same as if it comes in from tg add.

Here you are referring to a normal git merge of (say) a remote fork of the topic branch, right? And BTW I assume you meant tg depend add not tg add? Again, I think it might be safer if this kind of behaviour only happened via tg subcommands, e.g. tg merge which was a topic-aware wrapper for git merge.

Admittedly, it would trigger tg update to merge in the latest changes from all the dependencies, recursively, not just the new ones - but that is what tg depend add does, too, so the "defect" - if it is a defect - is already there.

Another possible counterargument is that if it's a fast-forward (or indeed a commit resulting from running tg depend add!), the relevant tg update for the new dependencies has already been done. So maybe my proposed post-commit hook should only run if HEAD is a merge commit.

Sounds reasonable. Thanks for all the info - this is greatly helping my understanding, even though we have gone off on quite a tangent ;-)

Going back to the focus of this issue, I am pretty sure that there are many developers (myself included) who would find topgit very useful purely for managing local-only topic branches, without any need to publish them and merge changes from other forks of the same topic branch. And in this case, the more lightweight approach offered by using tags instead of .topmsg / .topdeps offers the substantial advantages, as explained in this issue's description. So as long as my changes retain backwards-compatibility via a dual-mode approach (ideally with a way to easily convert between the two), I hope you will consider merging them once they're ready.

@hrj
Copy link

hrj commented Jul 13, 2015

@aspiers

I am not sure about the technical details of this design, but the use-case you describe matches mine:

I am pretty sure that there are many developers (myself included) who would find topgit very useful purely for managing local-only topic branches, without any need to publish them and merge changes from other forks of the same topic branch.

I just want to manage local patch sets. I and my team haven't yet reached the point where we would find remote collaboration on patch-sets useful.

If this design helps with this use-case, I am willing to try it out. Do you have an implementation somewhere?

@aspiers
Copy link
Author

aspiers commented Feb 21, 2016

@hrj Sorry for the very slow reply! My work will always be found at https://github.com/aspiers/topgit/branches but I can't remember what state I last left it in. I am hoping to return to this at some point soon though.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants