10 Replies Latest reply on Jan 15, 2011 12:19 PM by meetoblivion

    Proposal to switch to Git

    rhauch

      I would like to propose that the ModeShape project switch to Git and start using GitHub. I originally made a similar proposal just over a year ago. At the time, the major concerns were that Git was still too new, that few of the contributors had any Git experience, and that switching would raise the bar for attracting new contributors. We were also in the midst of our rebranding from JBoss DNA to ModeShape, so we didn't want to take on too much all at once.

       

      A lot has changed since then.

       

      First, quite a few of ModeShape's developers have gotten more experience with Git, and have figured out for themselves why so many people love Git. Several of us have been using Git-SVN for some time, and are really enamored with creating branches for each JIRA issue and the ease of switching between branches. Committing changes to multiple branches is a piece of cake. In fact, everything is just far easier with Git. I can't ever imagine going back to using SVN clients.

       

      Second, Git has crossed the chasm. Git has been adopted on lots of open source projects, including many of the more prominent projects at JBoss.org (Hibernate, JBoss AS 7, and Infinispan, to name a few). In July 2010, GitHub announced that its 1 millionth repository had been created. I'd even venture to say that Git is now the VCS of choice for newer open source projects.

       

      Third, Git and GitHub make it much easier for people to get involved in a project, and would actually lower the bar for new contributors. Using GitHub's forking mechanism, any user could just fork the "official" Git repository for ModeShape, immediately start making fixes or changes, and then open a pull request to ask that their changes be incorporated into the "official" repository. Many high-caliber users are on GitHub, and these are exactly the kind of new contributors we want to attract.

       

      I understand that several contributors have still not yet tried Git, and are quite comfortable with SVN. But I think we've reached the point where we really can and should be taking advantage of the benefits of Git and GitHub, because it will make our day-to-day activities easier.

       

      Here are some of the benefits of Git:

       

      • Support for non-linear development, which means that branches are cheap and easy to create, and can be easily merged onto multiple other branches. So, it's very easy to create a branch for each JIRA issue, and then merge the changes for that issue onto one or more release branches. And it's easy to switch branches to work on something else.
      • Git is distributed, meaning every instance (or "clone") of the repository is a complete copy, and Git makes it easy to push and pull changes between repository instances. So each developer has a complete, local history of the entire ModeShape source code. Yet because Git stores that history so efficiently, a full local clone of the ModeShape repository actually takes less disk space than a single checkout from Subversion. (This is because a Subversion checkout contains two copies of every file: one that you work with and one in the ".svn" directory.) Git also allows developers to make local commits and see version history even when disconnected from the network. Once connected, these commits can be pushed to other repositories.
      • Git is fast since most operations are done locally.
      • Git is safe because Git clones of a repository each contain the full history and therefore can act as full repository backups. Creating backups -- and keeping them in sync -- is literally just a few commands on a cron job. And since every developer's local repository contains the full history, they can serve as backups to your backups.
      • Strong authenticated history means that every Git commit is dependent upon all prior development history, and every commit is uniquely identified by a secure hash of its contents. These identifiers mean it's not possible to change a version of the code without the identifier changing. Plus, these identifiers make it easy for a repository instance to know whether it has already seen a particular commit: two changesets with the same identifier are indeed the same commit.
      • File renames and moves are tracked implicitly because Git tracks snapshots of the entire directory structure. Git detects renames not when taking snapshots but when comparing histories, meaning renames and moves can be discovered and identified at any time between any points in time.
      • Using multiple remote repositories is easy, because Git is inherently distributed. This means that while each developer has their own local Git repository, they can interact with multiple remote repositories. For example, each developer might also have a public GitHub repository (which is a fork of the authoritative repository), and it is here they can push their changes to share with others before they are merged into the authoritative codebase, yet they can also easily pull changes from the authoritative repository. The ability to easily work with multiple remote repositories opens up all kinds of possibilities for collaboration that were simply not possible with a centralized version control system.
      • Git metadata doesn't pollute the directories, making it easy to use your operating system and other tools without stepping on or seeing the metadata. All Git information is stored within a single ".git" directory at the top of the directory structure. (Remember how Subversion littered your working area with ".svn" directories, and how removing or moving them wreaked havoc with Subversion? No more with Git!)
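
      To make the "strong authenticated history" point above a bit more concrete, here is a rough Python sketch. The `git_blob_id` function follows the way Git computes the identifier of a file's contents (SHA-1 over a small header plus the bytes). The `toy_commit_id` and `toy_history` helpers are made-up names and a deliberately simplified stand-in for real commit objects, which also record a tree, author, and timestamps. The only thing it tries to show is that each identifier covers the parent's identifier, so changing an old version changes every identifier after it.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Identifier Git assigns to a file's contents (a "blob" object)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(parent_id: str, blob_id: str, message: str) -> str:
    """Simplified, hypothetical commit id: NOT Git's real commit format.

    The parent's id is part of the hashed payload, so the result depends
    on all prior history.
    """
    payload = f"parent {parent_id}\nblob {blob_id}\n\n{message}\n"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def toy_history(versions):
    """Return the chain of toy commit ids for successive file versions."""
    ids = []
    parent = ""  # the first commit has no parent
    for content, message in versions:
        parent = toy_commit_id(parent, git_blob_id(content), message)
        ids.append(parent)
    return ids


original = toy_history(
    [(b"v1\n", "first"), (b"v2\n", "second"), (b"v3\n", "third")]
)
tampered = toy_history(
    [(b"v1 tampered\n", "first"), (b"v2\n", "second"), (b"v3\n", "third")]
)

# The empty file always gets the same well-known id in every repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Changing only the first version changes every later id as well.
print(all(a != b for a, b in zip(original, tampered)))  # True
```

      The same property is what lets two repositories agree on whether they already share a commit: identical identifiers mean identical content and identical history behind it.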

       

      We can use Git by itself, in which case we would use it much as we use SVN today, albeit with a lot of improvements. But we can also couple it with GitHub to gain even more.

       

      GitHub provides a place in the cloud where projects can put their "official" repository so it is easily accessed by anyone and everyone. GitHub makes it easy for each developer to fork the official repository to create their own publicly available repository, where they can publish "proposed changes" to the project's codebase, and ask the project to pull these changes into the official repository. Using GitHub, the submitter and the community can review these proposed changes and even make comments (even line-by-line comments), and the submitter can make changes to their proposed changes. In other words, GitHub has an out-of-the-box code-review tool!

       

      In short, Git and GitHub offer a lot of features and benefits, and I'd like to switch to Git and GitHub shortly after the upcoming 2.4 release. Sure, this change will require some learning on the part of those contributors who are unfamiliar with Git, and we'll provide as much help as possible to ease the transition.

       

      There's already a lot of really good content on the web for learning about Git and GitHub. In particular, I'd recommend:

      1. Pro Git by Scott Chacon. This is an excellent and very readable book that's available for free online or for purchase in e-book or paper form. Definitely the best place to start.
      2. ModeShape Development Workflow describes how contributors set up and use Git with ModeShape. Not all the examples work at this time, since we haven't yet created our official ModeShape repository on GitHub.
      3. Infinispan and GitHub is an excellent and detailed recipe for how Infinispan's community uses Git and GitHub. Many open source projects use this model, and ModeShape would too.