Cool to run across a mention of one of my projects (transcrypt) out in the wild. I do like that blackbox isn't tied to just Git, and uses GPG...but that can also be difficult for adoption if GPG isn't a familiar tool on your team.
So, I've skimmed through most of the paper, and it seems like a bit of an apples-to-oranges comparison. My understanding is that GV2 is an extension to JGit that essentially performs GC on the repo (or, more accurately, waits to encrypt until post-GC) and then transparently encrypts everything together, rather than using clean/smudge filters to encrypt files individually like the tools it's comparing itself to. It's funny that the author dismissed git-remote-gcrypt as being "under development" even though it's a much closer comparison to GV2.
The paper mentions that it's measuring the worst-case scenario for the clean/smudge filter-style tools as it's much more likely that you only need to protect a few files and not the entire repository, but I didn't see how the second section actually reflected this more-realistic scenario. I'm not saying that encrypting the entire repository is bad, but the overhead of using filters to encrypt the entire repository is a documented/known limitation of the other tools...so it seems a little odd to gloss over that.
Side note, stuff like "This process is repeated a total of 10 iterations for an ample sample size to draw statistical conclusions." worries me, but that's another conversation.
Overall though, glad to see more research in this area, and it sounds like GV2 might be a decent solution for people looking to protect their data in certain scenarios.
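For anyone who hasn't used the clean/smudge filter style mentioned above, here's a rough sketch of what such a filter pair does to a single file. This is not transcrypt's or git-crypt's actual code: the XOR keystream is a toy stand-in for a real cipher, and the function names and the content-derived nonce are my own assumptions for illustration. The point is just that each file passes through the filter individually, which is why encrypting an entire repository this way adds up.

```python
import hashlib
import hmac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not vetted crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def clean(plaintext: bytes, key: bytes) -> bytes:
    """Roughly what a clean filter does on `git add`: plaintext in, ciphertext out."""
    # Assumption: derive the nonce from the content so the same file always
    # produces the same ciphertext; random output would make Git think the
    # file changed every time the filter runs.
    nonce = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def smudge(ciphertext: bytes, key: bytes) -> bytes:
    """Roughly what a smudge filter does on checkout: ciphertext in, plaintext out."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


key = b"k" * 32  # hypothetical key, for the example only
secret = b"db_password=hunter2\n"
stored = clean(secret, key)
restored = smudge(stored, key)
```

In a real setup the filter commands are wired up through `filter.<name>.clean` / `filter.<name>.smudge` in the Git config plus a `.gitattributes` entry, so only the paths you list get this treatment.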
I agree, I'm happy to see more research here. For my uses, a filter based approach makes the most sense, though I can understand wanting a more holistic solution.
My needs are such that I might only want to encrypt a few files, and filters do an adequate job here. But I can understand a requirement that all objects be encrypted, in which case filter performance becomes a problem.
Filters also have the limitation of leaking the commit and tree information, which may also be sensitive data on some projects.
This is an interesting strategy, and I'm glad to see more research done here, regardless.
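To make the metadata leak concrete, here's a small sketch that builds a Git tree object by hand. It assumes the default SHA-1 object format, and the file name and "ciphertext" bytes are made up. Even when the blob contents are encrypted by a filter, the tree that points at the blob still carries the file name in the clear.

```python
import hashlib
from typing import Tuple


def git_object(kind: bytes, body: bytes) -> Tuple[str, bytes]:
    """Return (hex id, raw bytes) of a Git object before zlib compression."""
    raw = kind + b" " + str(len(body)).encode() + b"\0" + body
    return hashlib.sha1(raw).hexdigest(), raw


def tree_entry(mode: bytes, name: bytes, hex_id: str) -> bytes:
    """One tree entry: '<mode> <name>' NUL, then the 20-byte binary object id."""
    return mode + b" " + name + b"\0" + bytes.fromhex(hex_id)


# Pretend a filter already encrypted the file contents (made-up bytes).
ciphertext = bytes(range(32))
blob_id, _ = git_object(b"blob", ciphertext)

# The tree records the path in plaintext regardless of what the blob holds.
tree_id, tree_raw = git_object(
    b"tree", tree_entry(b"100644", b"prod-db-password.txt", blob_id)
)
leaks_name = b"prod-db-password.txt" in tree_raw
```

Commit objects work the same way: author, date, and message are stored outside the filtered blobs.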
As the name suggests, it uses gpg to encrypt and decrypt the data.
Unlike git-encrypt and git-crypt, it doesn't use smudge/clean filters.
Instead, it uses a special command (`git-gpg push $remote`) to push changes to a local unencrypted mirror of the remote repository. It then encrypts any newly created git objects, and finally rsyncs the new objects to the remote repository. So the remote is just a directory of zipped, gpg-encrypted files.
It has worked well for me over the past two years, but I don't expect that it solves every edge case.
Feedback welcomed through issues and pull requests.
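A rough sketch of the push flow described above (mirror, encrypt new objects, rsync), in case it helps picture it. This is not git-gpg's actual code: the function name, the directory layout, and the exact `gpg` and `rsync` invocations are assumptions for illustration. The function only builds the commands; it does not run them.

```python
import posixpath
from typing import Iterable, List


def plan_push(
    mirror_objects: Iterable[str],
    encrypted_objects: Iterable[str],
    recipient: str,
    mirror_dir: str,
    staging_dir: str,
    remote: str,
) -> List[List[str]]:
    """Build the commands needed to publish objects the remote lacks."""
    # Only objects that have not been encrypted yet need any work.
    new_ids = sorted(set(mirror_objects) - set(encrypted_objects))
    commands = []
    for oid in new_ids:
        # Assumed layout: loose objects live at objects/<2 hex>/<rest>.
        src = posixpath.join(mirror_dir, "objects", oid[:2], oid[2:])
        dst = posixpath.join(staging_dir, oid + ".gpg")
        commands.append(
            ["gpg", "--encrypt", "--recipient", recipient, "--output", dst, src]
        )
    if new_ids:
        # One sync at the end copies the freshly encrypted files.
        commands.append(["rsync", "-a", staging_dir + "/", remote])
    return commands


plan = plan_push(
    mirror_objects=["aa11", "bb22"],
    encrypted_objects=["aa11"],
    recipient="team@example.com",  # hypothetical key id
    mirror_dir="/tmp/mirror.git",
    staging_dir="/tmp/staging",
    remote="backup:/srv/repo/",  # hypothetical rsync target
)
```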
I'm sure it works well for you, and you might already be aware of this, but just in case you aren't: that's what Git remote helpers are for. You can write a remote helper and run "git push gpg://...." instead of "git gpg push". That's how the p4, Mercurial, etc. client bridges are implemented.
See [0] for more. Obviously, this might be overkill for your use case, but this is the "right" way to do it.
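To give a feel for the remote-helper idea, here's a bare-bones sketch of the command loop such a helper runs. Git talks to the helper over stdin/stdout with line-based commands; this only answers `capabilities` and `list` (with an empty ref list) and is nowhere near a working helper. The name `git-remote-gpg` and everything about how a real one would encrypt and push are assumptions.

```python
import io
from typing import TextIO


def serve(commands: TextIO, out: TextIO) -> None:
    """Answer remote-helper commands read line by line from `commands`."""
    for line in commands:
        cmd = line.rstrip("\n")
        if cmd == "":
            # Blank lines separate command batches; nothing to answer here.
            continue
        if cmd == "capabilities":
            # One capability per line, terminated by a blank line.
            out.write("push\n\n")
        elif cmd.split(" ")[0] == "list":
            # A real helper would print "<value> <name>" per ref here;
            # this sketch reports no refs, so only the blank terminator.
            out.write("\n")
        else:
            raise SystemExit("unsupported command: " + cmd)
        out.flush()


# Simulate Git asking for capabilities and then the ref list.
reply = io.StringIO()
serve(io.StringIO("capabilities\nlist\n"), reply)
```

Saved as an executable named `git-remote-gpg` on your PATH (and wired to `sys.stdin`/`sys.stdout`), Git would invoke it for URLs starting with `gpg://`.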
As stated in Chapter I, distributed version control systems, particularly Git, have been rapidly increasing in popularity among software developers in recent years [4]. The problem exists when these organizations have sensitive data that they want to use with Git in an unsecure environment. To secure an environment, especially over the internet, involves high levels of cost. As the name implies, GV2 provides Git with a Virtual Vault in a remote location, such as on an Amazon Cloud, using their Amazon S3 storage service. GV2 provides new documented functionality and performance to the research community. This is new research that has high interest from the Department of Defense and other organizations who want to run applications using a third party cloud service provider but also want to maintain confidentiality and integrity of their application data. In the future, many traditional applications will be modified to support this same type of security in an unsecure environment in an efficient manner that is transparent to the user.
It's awesome to see the US Military (or more specifically, people who've studied at military schools) doing work in crypto and cyber-security, especially with all the effort on the part of civilian agencies to erode it.
Alternatives:
[0]: https://github.com/StackExchange/blackbox