
Best Practices with Maven: OSS forks

Recently I came across a company that is forking several open source Java projects. I saw they were making a mistake that I also made a few years ago and have since learned from.

In Maven's distributed repository architecture, project artifacts such as JAR files are uniquely identified by a coordinate system composed of a group identifier, an artifact identifier, a version number, an optional classifier, and a packaging type. For instance, the most recent version of the Apache Commons Lang project has a Maven coordinate (i.e. groupId:artifactId:version:classifier:type) of commons-lang:commons-lang:2.5::jar.
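As a quick illustration, here is how that coordinate would typically appear as a dependency in a pom.xml. The classifier is empty for this artifact, so the element is simply left out, and jar is the default type, so it is shown only for clarity.

```xml
<!-- Depends on commons-lang:commons-lang:2.5::jar -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.5</version>
  <!-- jar is the default type and could be omitted -->
  <type>jar</type>
</dependency>
```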

A few years ago, if I wanted to make custom changes to this project, I would get the source, make my changes, and then deploy the result to our private Nexus repository under a new groupId, such as com.jaxzin.oss:commons-lang:2.5::jar. That might seem reasonable. A year or so later I tried something different and changed the artifactId instead, like this: commons-lang:commons-lang-jaxzin:2.5::jar.

Unfortunately, there is a serious problem with both of these approaches. Maven supports transitive dependencies, which means that if you include a dependency you get its dependencies 'for free'. But what happens when you depend on com.jaxzin.oss:commons-lang and indirectly include commons-lang:commons-lang? With either approach, Maven has lost all knowledge that these two artifacts are actually related. And when I say 'related' I mean they include different versions of the same classes. When Maven loses this relationship, it can't perform version conflict resolution and will include both versions in the output. It will compile with both on the classpath. If you are building a WAR file, it will include both in the WEB-INF/lib directory. If you are assembling or shading an "uber" jar, it will include the classes from both in your giant jar with all its dependencies. And unfortunately, which copy of a class 'wins' is close to nondeterministic, since it depends on classpath ordering.

So what's the solution? How do you properly fork an open-source project privately?

The trick is to change the version and leave the groupId and artifactId alone. That way, Maven can still detect the relationship and perform version conflict resolution. So, to complete the example, I would fork Commons Lang 2.5 to the new coordinate commons-lang:commons-lang:2.5-jaxzin-1::jar.
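A minimal sketch of what the fork's own pom.xml might declare is shown below. Only the version differs from the upstream coordinate. The rest of the upstream POM (dependencies, build configuration, and so on) is assumed to stay as it is and is omitted here.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- groupId and artifactId are unchanged from upstream -->
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <!-- only the version marks this build as a private fork -->
  <version>2.5-jaxzin-1</version>
  <packaging>jar</packaging>
</project>
```

Projects that want the fork then depend on commons-lang:commons-lang as usual and ask for version 2.5-jaxzin-1.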

Now I do have one further suggestion, but it's a questionable practice and I'm not sure how well it works. You might consider forking version 2.5 to version 2.6-jaxzin. This way, if Maven attempts to resolve version conflicts, it will know that your fork is 'newer/better' than 2.5. Maven sees versions with qualifiers as being older than the unqualified version. I think the assumption is that if you are qualifying a version, it's a pre-release version like 1.0-alpha-1, 1.0-beta-1, or 1.0-rc-1. You can read more about how Maven version conflict resolution works, and I know they have a major overhaul of this logic available in Maven 3.0 with the Mercury project.

But in practice, when I've run into version conflicts like this, I add an exclusion clause to the dependency that is pulling in the conflicting artifact transitively.
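Such an exclusion might look roughly like the following. The com.example:some-library artifact is hypothetical and stands in for whichever dependency drags in the unwanted upstream version. The fork is then declared as a direct dependency.

```xml
<dependencies>
  <!-- Hypothetical library that transitively depends on upstream commons-lang -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>some-library</artifactId>
    <version>1.0</version>
    <exclusions>
      <!-- Keep the upstream version from arriving via this dependency -->
      <exclusion>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Depend on the private fork directly -->
  <dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5-jaxzin-1</version>
  </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards is a handy way to check that only the forked version remains.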
