For every contributor to an open source project in the old days, there might be fifty failed forks on github, sure. But for every five failed forks, there will be one that thrives and whose commits get accepted back.
What you are seeing is both an explosion in contributions and a permanent log of every failed contribution ever. This greatly affects your perception.
Back in the old days it was far harder to provide and apply a useful patch, so it wasn't done with nearly the same frequency. Contributions were limited to a small circle of people motivated and skilled enough to climb the huge hurdle.
Nowadays, creating and submitting a patch is trivial, so the hurdle is much smaller. Hence many, many more people will try to contribute, and because of how github works, every attempt stays visible to the public for all eternity.
At least in my case, none of the patches I sent in back in the old days that were not accepted are visible anywhere. Heck, most of the time you'd have a really hard time even finding the patches that were accepted.
Github is far from killing open source. Quite to the contrary. But as visibility increases and hurdles get torn down, you might have to adjust your perception of reality.
Oh man, when I first started regularly contributing to open source, I had to make a patch and manually increment the numeric prefix when I committed it to a public CVS repository's patches directory.
Then I had to write an email to let people know about it.
That was followed by posting patches as attachments to issues on an issue queue, with no version control net on my side, unless you count committing CVS repositories into subversion.
OP just isn't aware of all the failed forks/patches because they were never as readily available and easy to merge before.
In Ye Olde Days at least the main repository was something you visibly had to maintain. The original author A would set up a project on sourceforge, and add B, C and D with commit access. (Indeed, I remember this being cited as one of the great advantages of git - it breaks down the two-tier society where you had an inner circle of authorized committers, and then second-class citizens who had to submit their patches). But it did mean you would do succession planning around it - if A got hit by a bus, someone else would have access to the One True Repository, and they could take over merging patches from outside contributors.
Ironically, github is making it easier to have a single central point of failure. Because it's so easy for the original author to merge patches from other repositories, there's no longer any pressure to give other people access to the "master" repository. Which is fine until the original author stops merging (for whatever reason).
> For every contributor to an open source project in the old days, there might be fifty failed forks on github, sure. But for every five failed forks, there will be one that thrives and whose commits get accepted back.
Define "failed fork". GitHub et al., propelled by decentralized VCSs, definitely changed the meaning of 'fork'. Previously, creating a fork was a serious act, taken only when you did not have access to the central repository and had to make modifications, or when a given project was going nowhere anymore, or not where or how you wanted. A fork was either a rebirth or a schism.
But that changed.
A fork is now part of the global workflow. One forks and submits a pull request almost like one would previously check out, diff and mail a patch. The definition of a 'failed fork' is therefore wildly different, since the very notion of a fork has evolved dramatically. Forks that never submit pull requests have their own uses, so you can't even judge a fork on that basis.
It's the standard tokyo tyrant cache backend for django with a bugfix (it reconnects if the connection dies). The pull request was made in august last year, and I've heard nothing back. Oh well.
On the other hand, my connections to tokyo tyrant don't crap out, and I've got a nice permalinked submodule in my main project. (Or I did before I switched to redis.)
Is this a failed fork? By the definition of the author, sure. But it's made me happy. And if someone else needs this bugfix, it will probably make them happy too.
Besides, it makes upgrading your variant much easier, since you're tracking a remote and can easily (and more often than not, trivially) fetch from then merge with/rebase on it.
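For anyone who hasn't done the fetch-then-rebase dance, here is a rough sketch of what it looks like. Everything in it is made up for illustration: "upstream" and "fork" are scratch repositories created in a temp directory, and the file names are placeholders. In real use the remote would point at the original project's URL instead.

```shell
#!/bin/sh
# Sketch only: "upstream" and "fork" are throwaway local repositories,
# and the file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in for the original project.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "v1" > backend.txt
git add backend.txt
git commit -q -m "Initial release"
cd ..

# Your variant: a clone that carries one local bugfix.
git clone -q upstream fork
cd fork
git remote rename origin upstream
echo "reconnect on dropped connection" > bugfix.txt
git add bugfix.txt
git commit -q -m "Reconnect if the connection dies"
cd ..

# Upstream moves on in the meantime.
cd upstream
echo "v2" > backend.txt
git commit -q -a -m "Second release"
cd ..

# Upgrading the variant: fetch upstream, then replay the bugfix on top.
cd fork
git fetch -q upstream
git rebase -q upstream/master
git log --oneline
```

Swapping the rebase for `git merge upstream/master` keeps the fork's history intact at the cost of a merge commit; which one is nicer is mostly a matter of taste.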
Not necessarily. If a project doesn't merge a proposed patch, the patch could simply be deemed inappropriate for the project's chosen direction.
So if a fork doesn't get merged upstream, I see this as a failed fork.
If an upstream stops working on their project and stops accepting patches, the outlined problems can happen, but just look at any random sourceforge project not updated since 2006. In the old days, there was practically no way for other contributors to get back on track, but it wasn't logged for eternity either - the project just died.
Today, Github at least provides a chance to get back on track, but, again, your perception might be altered by the fact that on Github you don't just see the successful reboots, but also all the failed ones.
Hmm, the way I read it, the post you are replying to is speaking to the same issue as the article (proliferation of forks). What issue do you think the article is ranting about?
As someone who has dealt with this, it's not as big a deal as you might think. Most future forks are based on older forks, so all the person at the end of the line has to do is fast forward onto the end of the branch. One FF merge, push, end of story.
When you have bifurcations, you can do an octopus merge - git is really good at resolving these things. Very little human effort is needed except where multiple revisions change the exact same line in different ways.
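To make the two cases above concrete, here is a small sketch in a scratch repository. The branch names (fork-a through fork-d) stand in for other people's forks and are invented for the example, as are the file names; the only assumption is that the branches touch different files, which is the easy case being described.

```shell
#!/bin/sh
# Sketch only: branches stand in for contributors' forks; all names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -q -m "Base"

# Case 1: a chain of forks, each built on the previous one.
git checkout -q -b fork-a
echo a > a.txt
git add a.txt
git commit -q -m "Fork A's fix"
git checkout -q -b fork-b
echo b > b.txt
git add b.txt
git commit -q -m "Fork B's fix, on top of A"

# master has not diverged, so catching up is a single fast-forward.
git checkout -q master
git merge -q --ff-only fork-b

# Case 2: a bifurcation. Two forks start from the same commit...
git checkout -q -b fork-c master
echo c > c.txt
git add c.txt
git commit -q -m "Fork C's fix"
git checkout -q -b fork-d master
echo d > d.txt
git add d.txt
git commit -q -m "Fork D's fix"

# ...while the maintainer also commits to master.
git checkout -q master
echo m > m.txt
git add m.txt
git commit -q -m "Maintainer's own change"

# Naming several branches in one merge selects the octopus strategy.
git merge -q -m "Octopus merge of fork-c and fork-d" fork-c fork-d
git log --oneline --graph
```

The octopus strategy gives up as soon as there is a conflict that needs manual resolution, so it only covers the "different sections of code" situation; overlapping edits have to be merged one branch at a time.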
In addition to this, most patches that people submit are quite small. Even if you have 200 people submitting patches, the odds are that most of them fall into two categories: people fixing the same bug, and people working on completely different sections of code. Neither is a substantial problem to merge.
I think I can count on one hand the number of times I had to do any nontrivial merge work on patches from contributors... And you're pretty delighted to do it - it means they fixed something that really matters.
In theory, yes. And I never used to worry about this. But over time, in practice, it's been a bigger problem than I think you're giving credit to.
e.g. ...
"Even if you have 200 people submitting patches, the odds are that most of them fall into two categories: people fixing the same bug, and people working on completely different sections of code. Neither is a substantial problem to merge."
IME ... in practice, this is a HUGE problem. Because every one of those developers fixes the bug in a slightly different way.
The longer time goes on without the original Author fixing it, the worse it gets. And the cost to them - or anyone! - of sifting through the "100 variations on bug fix #123" becomes greater and greater.
Usually, you want to cherrypick individual lines and characters from 5-10 of the best "solutions" to the bug.
If you'd avoided the "100 alternative fixes", then those "improved" solutions would have been built on the "basic" solutions - and merging would be easy.
But because you've got to this massively-forked scenario, all of the patches have been written independently and incompatibly.
Not in practice. In practice, people base their work on the most recent work. You get maybe 4-5 versions of any given bug fix, max, proportionally to how easy it is to fix.
from within your checkout of your fork. In one of the projects on GitHub that I've forked, we are merging between each other without the original project owner even being involved.
If A disappears with merges pending … then B/C/D find they have 3 distinct codebases, and no way within GitHub to do a simple cross-merge.
Now, the situation is not lost – if B, C, and D get in contact (somehow) and negotiate which one of them is going to become “the primary SubAuthor” (somehow), and they issue manual patches to each other’s code (surprisingly tricky to do on GitHub)...
If B, C and D get in contact via, I dunno, github messages, and pick a primary subauthor, it's very easy to issue manual patches. If I'm B:
git remote add C ...
git remote add D ...
git pull C master
git push github master
I agree github might not have a button for this, but I'm pretty sure most github users are comfortable with the git command line.
I really did feel the same way as the author for a long time, but I haven't yet seen any of my fears manifest in practice. The vast majority of forks die without fanfare after serving some singular purpose. People who want to contribute code do so more easily than ever. People who fight over ownership of open source projects are just jerks like they've always been. There may be more of them, or more of them are more visible now that we all use Github, but I consider this a trivial downside of an otherwise remarkable ecosystem.
I've seen smooth transitions between de facto ownership of projects a ton, but never a bitter divide where both ends are actively maintained.
One pathological example is delayed_job, which has changed 'leadership' a few times over the years. It's still pretty easy to look at the 'network' graph and choose the endpoint you want... or just use the published gem.
I've led a project that I took over from another guy, who had taken it over from yet another guy, and the original guy even took leadership back after a while. No one missed a beat. If you're involved in this community, you are most likely capable of tracking down the "correct" fork.
Unfortunately, the "published gem" part is the one that has given me the most trouble historically. But now that pretty much everyone uses bundler for gems this should be a nonissue - you can even specify a branch of a fork that you'd like to build.
The author has this completely backward: this is fundamentally a social problem - the difference is that with GitHub it's actually visible. Anyone old enough to remember the pre-DVCS era should remember chasing down patches in bug trackers, blog posts, etc. and maintaining local forks — with the requisite terror-inducing periodic gigantic merges. Now we've lost all of the manual labor in that process and made it easy for anyone who wants to do things the right way to do so – it's still possible to waste your collaborators' time if you really want to but before it was almost a requirement of the process.
As a minor point of craft, this also illustrates an area where more training is needed: the problems described are most common when someone makes a fork and keeps every single commit in a single branch. Using feature branches – and it'd be awesome if Github started encouraging that with the fork & edit model – makes most of the listed problems far more manageable.
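A minimal sketch of the feature-branch habit, in case it helps. The repository, branch names and files are all invented for the example; the point is only that each change lives on its own branch cut from an untouched master, so a maintainer can take one without the other.

```shell
#!/bin/sh
# Sketch only: a scratch repository with hypothetical branch and file names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > app.txt
git add app.txt
git commit -q -m "Upstream state at fork time"

# One branch per change, each started from the untouched master.
git checkout -q -b fix-reconnect master
echo "reconnect logic" > reconnect.txt
git add reconnect.txt
git commit -q -m "Reconnect if the connection dies"

git checkout -q -b add-timeout master
echo "timeout option" > timeout.txt
git add timeout.txt
git commit -q -m "Add a configurable timeout"

# master still mirrors upstream; each branch carries exactly one change.
git checkout -q master
git log --oneline master..fix-reconnect
git log --oneline master..add-timeout
```

Because master never receives local commits, pulling new upstream work into it stays a fast-forward, and each branch can be rebased or dropped on its own.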
When you hear Linus speak about his workflow when working on the kernel, he always mentions his "Web of Trust" concept. I think the problem is not inherent to github, but rather to the idea we have when we foolishly think of the possibilities combining git and a social network.
The truth is, programming is still very much about people, and you need to trust the people in order to pull their code. Trusting the people goes beyond trusting the code. If you give me great code, then disappear or decide to make an unmergeable fork, it will harm my project as described in the article.
On the other hand, if I get to know the people behind the pull requests, learn to talk to them and get them to be more involved in the project, then the risks exposed can be easily circumvented.
Actually, if you fork from the main branch, you can still pull commits from other collaborators--though I don't know if you can send pull requests to other people, haven't tried. But it is doable. There is nothing to stop you from making another remote branch that tracks another person's repo and share code that way.
I don't agree that GitHub is killing open source. However, the author has a point. It is hard(er) to merge into other forks, which is a shame since git is so good at this.
I'm not criticising GitHub, their software is great, but in the next iteration, they should consider addressing this.
Most Open Source projects die. That's why CPAN, PyPi etc. are able to have such a huge number of packages: a significant part of them, if not most, have no documentation, tests or support, or are dead, all of which in practice is more or less the same.
"In the old days" you didn't notice it as much because those projects just disappeared but with Github they don't, in fact they're all over the place.
I'm not sure whether this is a problem, or one big enough to be worth caring about, but in any case Github isn't the problem.
It would be nice if authors could "archive" or "abandon" repositories which could be filtered out on searches by default and be displayed less dominantly on profiles.
To be fair, the usual consequence for a project that loses its Author is to die.
It seems that github could facilitate the migration of an "ownerless" project to a designated fork - including facilitating the selection of who has the designated fork. Just support for the informal process outlined in the submission.
It's interesting that Linus deliberately avoided having a "designated fork" in git, but instead made them all equal, and you just pull from who you trust. Of course, in his case, his fork was the socially designated one, so this was not a problem he experienced or had to solve.
The commit graph (gitk --all if you are using plain decentralised git, GitHub's network graph for the convenient everyone-github-knows online version) makes it quite obvious which author is good at reviewing and integrating patches. With a little bit of side-channel communication, a deficient maintainer is easy to replace.
Also, someone who is late at merging patches won't have a lot of difficulty catching up. If they did no divergent work at all, it's just a matter of picking the best integrator and fast-forwarding.
I don't think projects faltering out due to the bus factor [1] not being taken into account is github's fault.
set up a team repository and give multiple people commit access? Team/project accounts should probably be more of a standard feature of open source projects on github, once things get beyond a certain point.
Wow, no offense dude but this was a really crappy post.
This problem has existed for all eternity. As other people are stating, there are just a lot more projects, more people who contribute, and more transparency about both than before.
And choosing that title. It just seems you are writing one of those "Look at me! I am writing something controversial" articles.
Welcome to HackerNews! Since you're a new user (green name), a friendly explanation of why you seem to be getting downvoted: At HN, we try and encourage respectful discourse, even when we dislike or disagree with what is being said. If you'll look at the other articles, other people seem to agree with you that the article is poorly written and that the title is sensational, but they aren't being downvoted because the way they phrase those complaints comes across as less insulting or ad hominem.
I think the author is missing something that makes Open Source work that few people appreciate. It's a fundamentally lossy development model. A certain number of patches/features end up in /dev/null for any Open Source project.
You can think of each "fork" as a new start-up trying out a new idea. But instead of reinventing the entire world, they get to start with a functioning product. The vast majority of these start-ups will fail but the ability to experiment (and fail) with forking is fundamentally what makes Open Source development better (at least IMHO) than proprietary development.
A lot of people look toward Open Source development thinking that there's a lot of wasted development and that that's a problem worth solving, but that's like the government trying to make 100% of businesses successful.
This has nothing to do with github. Github is what the name suggests: a convenient hosting platform for git projects. The fix/merge issues are between developers and have been around since open source first started. These are people issues, not github issues.
This reminds me of the famous Churchill quote: "It has been said that democracy is the worst form of government except all the others that have been tried."
Most of the article is true, but GitHub is also leagues above anything else out there, and certainly leagues above the mailing list with hand-crafted patches, because it de-demonizes forks and turns projects into more of a meritocracy.
There is certainly room for improvement, but I think it's a step in the right direction.
The very fact that both forks are available on github means that you can check out both forks, then merge changes locally. After that, you can use the merged code to create a new github project that is not a github fork of the original ones.
If you really have a tangled web of failed forks, this is the way to fix it by starting afresh with a merger of the best forks.
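Roughly, the local part of that looks like the sketch below. All three repositories here ("original", "fork-b", "fork-c") are scratch directories created for the demo and the file names are invented; with real forks the two remotes would be the forks' clone URLs.

```shell
#!/bin/sh
# Sketch only: "original", "fork-b" and "fork-c" are throwaway local repositories.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# The abandoned original project.
git init -q original
cd original
git symbolic-ref HEAD refs/heads/master
echo base > core.txt
git add core.txt
git commit -q -m "Original project"
cd ..

# Two forks that each added something useful.
git clone -q original fork-b
git clone -q original fork-c
(cd fork-b && echo b > b.txt && git add b.txt && git commit -q -m "B's improvement")
(cd fork-c && echo c > c.txt && git add c.txt && git commit -q -m "C's improvement")

# Start from one fork, fetch the other, and merge locally.
git clone -q fork-b merged
cd merged
git remote rename origin fork-b
git remote add fork-c "$work/fork-c"
git fetch -q fork-c
git merge -q -m "Merge fork-c's work" fork-c/master
git log --oneline --graph
```

From there, publishing the result as a fresh project is a matter of adding the new, empty hosted repository as another remote and pushing master to it; that step is left out because the URL would be a placeholder.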
What would go a long way towards fixing this is having an organization plan that's free, but only allows public repositories. That way the code could be entrusted to more than a single individual as it is now. Commit rights are one thing, but having ultimate control over the repository is usually limited to one person.
The only way that you can stop this is by having multiple maintainers for a project so that projects don't just die if the main maintainer is hit by a bus.
And yes, this crappy, albeit well known situation is not just specific to Github.
reading that makes me wonder how open source projects existed at all before GitHub...
GitHub exposes the once private forks that people had lying around on their HDDs, so I count that as a plus.
As for developing open source in a collaborative way, that goes much further than just a git infrastructure: there are mailing lists, patch reviews, roadmap discussion etc... exactly like in ye olde days.
Agreed. But GitHub did a lot more than just that - across the board it removed the barriers to collaboration (I used to run a few projects on SourceForge, and contribute to others; the ease of GitHub was like a breath of fresh air).
It got people excited and feeling free and able to collaborate.
...and so (I suspect) we're today less tolerant of unexpected barriers to collaboration. GitHub gets you hooked, then makes it extremely difficult to manage the "handover" part of a project (something that SF - for all its failings - handled pretty well).
The projects that die this way may well never have existed without GitHub in the first place - but that's not an excuse to just kill them off under a burden of maintenance crud.
I think your post fails to recognize a few things. I am not going to repeat the various other arguments, but I think you need to acknowledge that, first, there are many more people contributing to OSS nowadays than during the "SF days". Moreover, with the acceleration of online collaboration in all its forms, we are overwhelmed with new trends that are sometimes hard to interpret. I genuinely believe that these trends tend to self-regulate over time and that users, in the end, learn to better leverage the tools they are introduced to.
What does this have to do with GitHub in particular? Isn't this a "problem" with git?
I don't think it is a problem at all, because without git (or hg, bazaar etc.) or github we wouldn't even have such a thriving open source and open development community. Collaboration didn't get harder with DVCS; it got easier.
His issues sound more like issues with open source project governance. Github has just made it easier to contribute, and thus it is becoming clear that good open source projects have good governance. The owner of a project on github does not have to be the only one who can merge pull requests; they can add multiple collaborators on a project...
If anything it does highlight the need for more people to form collectives around projects and use the organisation tools to own the projects...