Friday, December 17, 2010

Trends of Vulgar Phrases

Of course my mind first went to typing vulgar phrases into Google's new N-gram Viewer, but the results were pretty neat.

Tuesday, September 14, 2010

Simplifying logging with Maven and SLF4J

UPDATE: Ceki commented below, which prompted me to rewrite the third paragraph.

UPDATE 2: I have a better way of configuring Maven and SLF4J now.

The mismatch between logging frameworks always seems to come up in projects I've developed over the years. Little by little I've learned and relearned how to navigate the nest of runtime logging that occurs in non-trivial applications. With my latest project I think I finally converged on a solution that I'll carry forward to future projects.

So what am I really talking about? Have you ever been stumped, even for a short time, about where a certain log message is going and why it might not appear in your log? Often this happens when you are trying to debug an issue with a third-party library that uses a different logging implementation than your application. If you are nodding from familiarity, skip the next paragraph.

Let's start from the beginning. There are several logging implementations available for Java, the best known being Log4J and the java.util.logging (JUL) API that was added in JDK 1.4. You may also encounter Apache commons-logging, which is a logging facade that lets library developers abstract their logging so library users are free to choose their own implementation, such as Log4J. But commons-logging has well-documented issues, so that's where SLF4J comes in as my logging facade of choice. I actually code all my new applications against the SLF4J API directly because it adds functionality over the logging implementations, like parameterized log messages that are only formatted if the associated logging level is enabled. But that's a tangent to this post.

So how have you solved the usage of multiple logging implementations? In the past I've tried different techniques:

  1. Configure each framework separately and have them write to their own log files.
    • Pro: Is there one?
    • Con: You need to monitor multiple files
    • Con: You need to understand the usage and configuration of each framework.
    • Con: If you are in a J2EE container you lose the log administration it provides.
  2. Redirect everyone to standard out and redirect standard out to a file.
    • Pro: It's all going to one place.
    • Con: It's all in standard out, even errors, which can make administration tools fail to filter messages properly.
    • Con: You still need to configure each secondary framework, including message formatting, so all log lines follow the same pattern.
  3. Write and connect bridges for each secondary framework so that they log to one framework that is responsible for writing to the file, which I'll call the primary framework.
    • Pro: It's all going to one place, and category names and levels are preserved through to the primary framework.
    • Pro: If the primary framework supports programmatic changes to log levels, it works for all frameworks.
    • Pro: If you are in a J2EE container that supports live updating of log levels, it works for everything.
    • Con: You need to know what you are doing to develop the bridges, configure the bridge in each secondary framework and connect it all together properly.
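To make option #3 concrete, here is a minimal sketch of what a hand-written bridge amounts to, assuming JUL is the primary framework. LegacyLevel and LegacyToJulBridge stand in for some hypothetical secondary framework. They are made up for illustration and are not part of any real library. The point is that the bridge keeps the category name and translates the level, so the primary framework's configuration governs everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Levels of a hypothetical "secondary" logging API (made up for this sketch). */
enum LegacyLevel { TRACE, DEBUG, INFO, WARN, ERROR }

/** Hypothetical bridge: forwards secondary-API calls to JUL, the primary framework. */
final class LegacyToJulBridge {
    private final Logger jul;

    LegacyToJulBridge(String category) {
        // Reuse the same category name in the primary framework.
        this.jul = Logger.getLogger(category);
    }

    /** One possible level mapping; a real bridge would follow the secondary API's semantics. */
    static Level toJul(LegacyLevel level) {
        switch (level) {
            case TRACE: return Level.FINEST;
            case DEBUG: return Level.FINE;
            case INFO:  return Level.INFO;
            case WARN:  return Level.WARNING;
            default:    return Level.SEVERE;
        }
    }

    void log(LegacyLevel level, String message) {
        jul.log(toJul(level), message);
    }
}

/** Collects published records so we can inspect what reached JUL. */
final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override public void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }
    @Override public void flush() {}
    @Override public void close() {}
}

public class BridgeDemo {
    public static void main(String[] args) {
        Logger category = Logger.getLogger("org.example.thirdparty");
        CapturingHandler capture = new CapturingHandler();
        category.addHandler(capture);
        category.setUseParentHandlers(false); // keep the demo off the console

        LegacyToJulBridge bridge = new LegacyToJulBridge("org.example.thirdparty");
        bridge.log(LegacyLevel.WARN, "disk almost full");
        // Dropped under the JDK's default configuration, where the root level is INFO.
        bridge.log(LegacyLevel.DEBUG, "some debug detail");

        LogRecord first = capture.records.get(0);
        System.out.println(first.getLoggerName() + " " + first.getLevel() + " " + first.getMessage());
    }
}
```

The level mapping is my own choice. Writing and wiring up one of these per secondary framework is exactly the plumbing I wanted to avoid.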

So what's the magic solution I found? Well, it's a variation of #3, but it doesn't require any coding or any real knowledge of how to configure secondary frameworks. And I'll show you how to accomplish it using Maven and a logging facade API called SLF4J.

Here is the overview of the steps:
  1. Decide what the primary framework will be.
  2. Ban all secondary logging frameworks in your projects.
  3. Update your dependencies to exclude the banned dependencies.
  4. Configure your project to depend on drop-in replacements of each secondary logging framework from SLF4J and to use SLF4J to send everything to your primary framework.

Decide what the primary framework will be.

For my latest project, I'll be deploying to Glassfish, which uses JUL internally and whose admin console supports live updating of log levels without restarting the server, so it was easy to pick JUL as my primary framework.
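Part of what makes JUL a good primary framework is that its levels are hierarchical and can be changed while the application runs. The sketch below shows this with plain java.util.logging: raising or lowering the level of a parent category changes what its child categories log. The category names are just examples. I'm assuming that an admin console's live level change boils down to something equivalent to Logger.setLevel, but I haven't checked how Glassfish does it internally.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RuntimeLevels {
    // Keep strong references: JUL only holds loggers weakly.
    static final Logger PARENT = Logger.getLogger("org.hibernate");
    static final Logger CHILD = Logger.getLogger("org.hibernate.SQL");

    public static void main(String[] args) {
        // CHILD has no level of its own, so it inherits from its nearest ancestor
        // that has one (the root logger, which is INFO in the JDK's default config).
        System.out.println("FINE enabled before: " + CHILD.isLoggable(Level.FINE));

        // Turn on debug-style logging for the whole org.hibernate category at runtime.
        PARENT.setLevel(Level.FINE);
        System.out.println("FINE enabled after:  " + CHILD.isLoggable(Level.FINE));

        // Quiet it down again, still without a restart.
        PARENT.setLevel(Level.WARNING);
        System.out.println("INFO enabled now:    " + CHILD.isLoggable(Level.INFO));
    }
}
```

Once every secondary framework is routed into JUL, those third-party categories are just JUL loggers, so a change like this applies to them too.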

Ban all secondary logging frameworks



Add the following plugin to your pom.xml


Add exclusions

This is where mvn dependency:tree -Dincludes=log4j:log4j is really helpful. If you try to build now, your build will fail on the enforcer rule. Run this command to find which dependency is including log4j and add the necessary <exclusion>. Repeat until you find no more dependencies on log4j, then repeat the process for each of the other banned dependencies.

Add dependency on SLF4J

Add the following to your pom.xml


Conclusion


So in 4 steps you can redirect all your logging to your primary logging framework without changing a line of code!

Sunday, March 21, 2010

Best Practices with Maven: OSS forks

Recently I came across a company that is forking several open source Java projects. I saw they were making a mistake that I also made a few years ago and have since learned from.

In Maven's distributed repository architecture, project artifacts like JAR files are uniquely identified by a coordinate system composed of a group identifier, an artifact identifier, a version number, optionally a classifier, and a packaging type. For instance, the most recent version of the Apache Commons Lang project has a Maven coordinate (i.e. groupId:artifactId:version:classifier:type) of commons-lang:commons-lang:2.5::jar.

A few years ago, if I wanted to make custom changes to this project I would get the source, make my changes and then deploy the result to our private Nexus repository under a new groupId, such as com.jaxzin.oss:commons-lang:2.5::jar. That might seem reasonable. Then a year or so later I tried something different and changed the artifactId instead, like this: commons-lang:commons-lang-jaxzin:2.5::jar.

Unfortunately there is a serious problem with both of these approaches. Maven supports transitive dependencies, which means that if you include a dependency you get its dependencies 'for free'. But what happens when you depend on com.jaxzin.oss:commons-lang and indirectly include commons-lang:commons-lang? With either approach, Maven has lost all knowledge that these two artifacts are actually related. And when I say 'related' I mean they contain different versions of the same classes. When Maven loses this relationship, it can't perform version conflict resolution and will include both versions in the output. It will compile with both on the classpath. If you are building a WAR file, it will include both in the WEB-INF/lib directory. If you are assembling or shading an "uber" jar, it will include the classes from both in your giant jar with all its dependencies. And unfortunately, which one 'wins' at runtime is practically unpredictable.

So what's the solution? How do you properly fork an open-source project privately?

The trick is to change the version and leave the groupId and artifactId alone. That way, Maven can still detect the relationship and perform version conflict resolution. So to complete the example, I would fork Commons Lang 2.5 to a new coordinate, commons-lang:commons-lang:2.5-jaxzin-1::jar.
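Here is a toy model of why changing only the version works. It is deliberately simplified and is not Maven's actual resolution algorithm. It keeps one artifact per groupId:artifactId key and lets the first one win, loosely like Maven's "nearest wins" rule when the list is ordered by depth. The Coordinate record and resolve() method are hypothetical names for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConflictToy {
    /** groupId:artifactId:version, ignoring classifier and type for brevity. */
    record Coordinate(String groupId, String artifactId, String version) {
        /** The identity used for conflict resolution: everything except the version. */
        String key() {
            return groupId + ":" + artifactId;
        }
    }

    /** Toy resolution: one artifact per groupId:artifactId, first occurrence wins. */
    static List<Coordinate> resolve(List<Coordinate> dependencies) {
        Map<String, Coordinate> chosen = new LinkedHashMap<>();
        for (Coordinate c : dependencies) {
            chosen.putIfAbsent(c.key(), c);
        }
        return List.copyOf(chosen.values());
    }

    public static void main(String[] args) {
        Coordinate upstream = new Coordinate("commons-lang", "commons-lang", "2.5");

        // Renamed groupId: the keys differ, so both "unrelated" artifacts survive.
        List<Coordinate> renamed = resolve(List.of(
                new Coordinate("com.jaxzin.oss", "commons-lang", "2.5"), upstream));
        System.out.println(renamed.size() + " artifacts with duplicate classes");

        // Only the version changed: same key, so exactly one artifact is kept.
        List<Coordinate> reversioned = resolve(List.of(
                new Coordinate("commons-lang", "commons-lang", "2.5-jaxzin-1"), upstream));
        System.out.println(reversioned.size() + " artifact: " + reversioned.get(0).version());
    }
}
```

The only thing that matters in this model is the key: as long as the fork shares groupId and artifactId with the original, the two can never both end up on the classpath.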

Now I do have one further suggestion, but it's of questionable practice and I'm not sure how well it works. You might consider forking version 2.5 to version 2.6-jaxzin. This way, if Maven attempts to resolve version conflicts, it will know that your fork is 'newer/better' than 2.5. Maven sees versions with qualifiers as being older than the unqualified version. I think the assumption is that if you are qualifying a version, it's a pre-release version like 1.0-alpha-1, 1.0-beta-1, or 1.0-rc-1. You can read more about how Maven version conflict resolution works, and I know they have a major overhaul of this logic available in Maven 3.0 with the Mercury project.

But, in practice, when I've run into version conflicts like this I will add an exclusion clause where I depend on an artifact that is including the conflict transitively.

Thursday, March 18, 2010

Not for Adoption

Last night was my first session as a volunteer at the Danbury Animal Welfare Society (DAWS). I had attended an orientation a few weeks back, and that's when I saw the facility for the first time, learned about the standard operating procedures and policies, and got to meet some of the cats I'll be working with. Now, I'm not a person who enjoys change or meeting new people, but other than my immediate family I don't think many people are aware of that, because I try very hard to hide my discomfort. Who knows, I could be wrong, so feel free to call me out in the comments!

So far, this entire experience is quite outside my normal comfort zone, but I'm forcing myself to do this for several reasons. I learned about DAWS after attending their Puppy Love Ball, a fund-raiser they held in February, in support of a friend of mine who was honored as their Person of the Year. They premiered this mission video at the event and it had me hooked. Learning that DAWS is a shelter that doesn't euthanize animals for non-medical reasons was a real inspiration to me. I didn't even know the idea of a 'no-kill' shelter existed. Another reason: I have wanted to volunteer somewhere simply to give back to the community. I'm not sure where my misunderstanding came from, but I've always imagined volunteering to be an unpleasant activity, like cleaning or manual labor. So when I found out that I could volunteer to be a 'cat socializer', it sounded like a perfect fit. Since my wife is very allergic to animal hair and dander, this would also be my chance to interact with cats on a regular basis without owning one.

This next statement may come off as brutally honest: another reason I was outside my comfort zone is the facility. Inside, it's a clean, spacious and wonderful facility for the animals, but when I first arrived its exterior wasn't exactly what I had expected, especially since my introduction to DAWS was the grandeur of the Ball held at the Ethan Allen. The building looks like a large residential home and could easily be overlooked as a place of business. After visiting the DAWS location in Bethel, I understood the importance of the sign and landscaping that my friend Paul created. Hearing about their goal to raise funds for a brand-new, state-of-the-art facility by 2019 made me even more impressed with and connected to DAWS. I'd love to be a small part of them reaching that goal.

I arrived last night a few minutes early, uncertain what to do. Seeing others waiting outside for the door to open, I introduced myself and found out they were waiting to become adopters. I was starting to get a little nervous since none of them were volunteers. Was I late? Was there a different entrance? Did I not read an email fully? Soon other volunteers arrived, and I realized I hadn't made a mistake.

After a few minutes we were let in, and I introduced myself and was introduced to the other cat program volunteers who were there. Unfortunately I'm joining the volunteer program at a time when everyone needs to be more hands-off with the cats because of a skin infection that is making the rounds in the shelter. They have done an amazing job handling the situation. I hope in a few weeks the issue will be behind DAWS.

From what I gathered from my first session, a two-hour evening consists of feeding the cats, playing with them, cleaning litter boxes and then finishing the night by getting everyone back in their cages. This is the kind of volunteering I can handle!

The thing I really took away from last night though was my experience with a cat named Daphne. She's a beautiful black & white cat and very reminiscent of the family cat I grew up with for 14 years named Purrina. She has some trouble with her hind legs, and she seems a bit of a special-needs case because of it. On her cage is a sign that reads "Not for Adoption" and those three words seemed to say so much about DAWS. Here is a cat that in many cases would be euthanized because no one goes to a shelter looking for a special-needs pet. But instead DAWS, with their no-kill policy, chose to give this cat a chance to be adopted and she's so close to going home.

Daphne cemented why I want to give two hours of my week to DAWS at a time when I feel like I don't have 5 extra minutes to spare. If you are located in the greater Bethel/Danbury area, I encourage you to donate your time or your money.



Friday, March 12, 2010

First Impressions from NoSQL Live

Today I drove up to Boston for the day to attend NoSQL Live. My experience so far within the NoSQL community has been limited to what we've built in-house at Disney and ESPN over the past decade to solve our scaling issues, more recently ESPN's use of WebSphere eXtreme Scale, and most recently my own experimentation with HBase, which hasn't gotten much further than setting up a four-node cluster. I've read a little about Cassandra, memcached and Tokyo Cabinet, and that's about it. So before the sandman wipes away most of my first impressions of the technologies discussed today, I wanted to record my thoughts for posterity or, at the very least, for tomorrow.


Cassandra
Cassandra seems to be the hottest NoSQL solution this month, with press about both Twitter and Digg running implementations. My impression: I'm wary of "eventual consistency". I don't feel I understand the risks and ramifications well enough to design a system properly. When Jonathan Ellis of Rackspace Cloud mentioned that Digg needed to implement ZooKeeper-based locking on top of Cassandra so that diggs get recorded correctly, I realized how poorly I understand eventual consistency and how risky it could be. But my impression of Cassandra isn't all negative; it definitely seems to have less baggage than HBase by not being built on top of HDFS. I'll get into what that means a little later.

Memcached
Unfortunately the speaker who 'represented' memcached gave off a vibe that really turned me off to the product. I know that's incredibly shallow, but these are first impressions after all, not carefully evaluated ones. Mark Atwood sat on the first panel of the day, "Scaling with NoSQL", and his whole attitude seemed to say "memcached is all you'll ever need and these guys next to me are just overdesigning hacks". His answers were short and his tone was quite condescending, even when addressing audience questions. Not a very good first impression of him. But luckily today wasn't my first impression of memcached, as I was pointed in its direction just last week by a Disney colleague. My research before today has me intrigued about using it as a replacement for ehcache as a second-level cache provider for Hibernate, which we use as an ORM in one system at ESPN right now.

Document Oriented Databases (Riak, CouchDB, MongoDB)
Wow, this is a subgroup of NoSQL technology that I had heard of in passing without really knowing what problem it was trying to solve. Riak had the best answers for scaling and operational ease. With homogeneous nodes and consistent hashing, Riak promises that adding and removing nodes is seamless. CouchDB and MongoDB sounded like a 'me too' answer, so I'm interested to find out what that really means for each, or better yet what it doesn't mean. But the concepts of document-oriented databases really mesh well with ESPN's current fantasy user database. Our fantasy user profiles are stored in a traditional RDBMS as serialized maps of maps, one row for each user. Since each profile is serialized to a BLOB column, it's completely opaque to reporting and analytics. Keeping that model while getting vendor support for divining information from it, and transparency into it, sounds exciting. I really need to look into these. Riak definitely won this round of first impressions.
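Since consistent hashing is the reason adding and removing nodes can be cheap, here is a generic sketch of a consistent-hashing ring in Java. This is the textbook technique, not Riak's actual implementation. The HashRing class, the MD5-based hash and the virtual node count are my own assumptions for illustration. The property worth noticing is that adding a node only moves keys onto that new node, and every other key stays where it was.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class HashRing {
    // Ring position -> node name. Each node appears at several positions ("virtual nodes").
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Uses the first 8 bytes of an MD5 digest as a position on the ring. */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** A key belongs to the first node position at or after its hash, wrapping around. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(100);
        for (String node : new String[] {"A", "B", "C"}) {
            ring.addNode(node);
        }

        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            before.put("user:" + i, ring.nodeFor("user:" + i));
        }

        ring.addNode("D");
        int moved = 0;
        boolean onlyMovedToD = true;
        for (Map.Entry<String, String> e : before.entrySet()) {
            String now = ring.nodeFor(e.getKey());
            if (!now.equals(e.getValue())) {
                moved++;
                onlyMovedToD &= now.equals("D");
            }
        }
        System.out.println("keys moved: " + moved + " of " + before.size());
        System.out.println("every moved key went to D: " + onlyMovedToD);
    }
}
```

Compare that with naive modulo placement (hash % nodeCount), where changing the node count reshuffles most keys.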

Tokyo Cabinet
This was a technology a colleague referred me to, and I read through their site last week. I was far from impressed then, since it seems much too low-level for my taste, similar to my impressions of Carbonado, which we use at ESPN. The lightning talk by Flinn Mueller got me a little more interested. He seems to be doing interesting things, but from an analytics and reporting perspective. He was vague on how he loads the data from his primary store and what the scale of the data is, so my first impression: it's a toy. I'm sure that's an unfair characterization, but I'm not trying to be fair tonight. But honestly, Tokyo Cabinet makes no bones about the fact that it punts on horizontal scaling, which is the deal breaker for me.

Hypertable
I looked at Hypertable (as in read their website) about 18 months ago on the suggestion of a colleague, when we were discussing HBase. This conference didn't change my opinion, which is "It's HBase, but written in C". It doesn't seem to bring anything else to the table, which to me is a blocker. JVM implementations are available for all the operating systems I use, so I don't like the idea of needing to find the right binary to download for a given box. When it comes to Java vs. C, I choose Java, but I'm also extremely biased, as I've been a Java developer for nearly my entire career.

Full-stack JavaScript
This was my favorite of the lightning talks, and possibly my favorite part of the conference. It felt a little tangential to the NoSQL topic, mainly because Jim Wilson covered more than just data storage. The idea: what if you could use JavaScript on your server and in your client, and use JSON both for talking between the layers and as the storage format? Crazy, right? I say brilliant. His few slides dissected each of today's popular stacks, to mildly embarrassing effect, by how many languages you need to learn (Java, XML, etc.) as well as by the various impedance mismatches between layers (ORM, object marshaling to JSON or XML, etc.). "ORM is an antipattern" was an enlightening take on something I've accepted as necessary. Full-stack JavaScript is something I'll be lusting after for a long time, especially since he made it sound so attainable with node.js, Rhino and MongoDB. As soon as his slides are online I'll be linking to them as well as passing them around the office.

HBase
Well, I saved HBase for last. It's the one I've had the most experience with, though that experience can still be measured in hours. As I hinted at earlier, this conference gave me the first impression that HDFS is a weight around the neck of HBase. I was surprised to get that feeling from the room, since my impression had been purely positive so far. It is also getting a lot of flak for the 'single point of failure' problem associated with the Name Node in the current HDFS architecture. Apparently performance is a dog, since HDFS was "only" designed to be highly distributed, with no promise of when you'll get your data. This burden seems to carry over to HBase. But after talking to Ryan Rawson one-on-one at the end of the HBase lab, it's clear he is of the strong opinion that it's getting a bad rap. He also makes very convincing arguments about the scale of what HBase is currently doing in real production environments vs. competitors like Cassandra. It's very persuasive, and you can read more of the details in a very active thread I kicked off on the HBase user group earlier this week.

Conclusion
HBase is still the front-runner among my personal candidates for a NoSQL option at ESPN, as it has been for a long time. Cassandra's design choice of eventual consistency is a little scary to me because I don't know yet how to design for it, not because it is an inherently bad choice. Document-oriented databases just made a big blip on my radar. Memcached is interesting if I want to stick with a traditional ORM-based architecture. Tokyo Cabinet and Hypertable are all but off my radar. And the lusty vixen of them all is a full-stack JavaScript architecture.

Disclaimer: Though I mention my employer ESPN in this post, these are my own personal opinions and don't represent the opinions of the company. The final decision on this stuff is "above my pay-grade" as they say.

Wednesday, February 24, 2010

Java Puzzler

I bought the book Java Puzzlers by Josh Bloch and Neal Gafter a few years ago and enjoyed it. If you aren't familiar with it, it covers rare, odd and usually counter-intuitive edge cases of Java in a brain-teaser style of presentation. For both a Java user and an analytical mind, it's a fascinating book.

Well today I stumbled onto a puzzler of my own and thought I'd share. Can you tell me what the main method of this class will output?
1 : public class Parser {
2 :
3 :     private static <T> T parse(String s, Class<T> type) {
4 :         // Simple implementation only supports long primitive
5 :         if(type == long.class) {
6 :             return (T) Long.parseLong(s);
7 :         }
8 :         throw new UnsupportedOperationException(
9 :                 "parse() only supports long right now!");
10:     }
11:
12:     public static void main(String[] args) {
13:         System.out.println(parse("1234", long.class).equals(1234));
14:     }
15: }

So what gets written to standard output? If you said 'true', then you'd be wrong. If you said 'false', then you'd also be wrong! The code above won't even compile! There are a few features of Java that are colliding here to give odd results, including generics, type erasure, primitive types and autoboxing.

So what happens exactly? You'll get a compile error like this:
Error:line (6) inconvertible types
found : long
required: T
The error is on the return statement line of the parse() method. Long.parseLong() returns a primitive long, but a type parameter can only stand for a reference type, so you can't cast a primitive to a type parameter. Auto-boxing also doesn't kick in for this cast, at least with the compiler I was using, so it fails with a compiler error.

But wait, doesn't the code explicitly use the Class object that represents the long primitive, long.class? It does, but 'long.class' evaluates to the same object as the constant field Long.TYPE, whose type is Class<Long>, because type arguments can only be reference types. So T is inferred as Long, not long.
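A quick way to convince yourself of this is to poke at the class literals directly. This small class only uses standard java.lang APIs:

```java
public class ClassLiterals {
    public static void main(String[] args) {
        Class<Long> primitive = long.class; // compiles: the type argument is Long
        Class<Long> boxed = Long.class;

        System.out.println(primitive == Long.TYPE);  // true: the very same Class object
        System.out.println(primitive == boxed);      // false: long and Long differ at runtime
        System.out.println(primitive.isPrimitive()); // true
        System.out.println(primitive.getName());     // long
    }
}
```

So at runtime long.class still represents the primitive type, but at compile time its generic type can only say Long.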

So realizing this, we can change the return statement to use a Long instead of a long:
    return (T) Long.valueOf(s);
Now the code will compile, but something is still wrong because it prints out 'false'.

Knowing that the return type of parse("1234", long.class) is Long, take another look at the puzzle. Can you see the problem? This one is subtle. The numeric literal 1234 inside the call to equals() needs to be auto-boxed, and since it is an int literal, it is boxed into an Integer. But Long.equals() only returns true when its argument is also a Long, so comparing against an Integer returns false even though the numeric values match.
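Putting both fixes together, here is a version of the class that compiles and behaves as intended, under the same "only long is supported" limitation. I changed parse() from private to package-private and added @SuppressWarnings so it can be called from a test. Otherwise it is the puzzle code with Long.valueOf() in the return statement and a long literal (1234L) in the comparison.

```java
public class Parser {

    @SuppressWarnings("unchecked") // safe here: T can only be Long when type == long.class
    static <T> T parse(String s, Class<T> type) {
        // Simple implementation only supports long right now
        if (type == long.class) {
            return (T) Long.valueOf(s); // box explicitly so the cast is to a reference type
        }
        throw new UnsupportedOperationException(
                "parse() only supports long right now!");
    }

    public static void main(String[] args) {
        Long parsed = parse("1234", long.class);
        System.out.println(parsed.equals(1234));  // false: 1234 is boxed to an Integer
        System.out.println(parsed.equals(1234L)); // true: 1234L is boxed to a Long
        System.out.println(parsed == 1234);       // true: parsed is unboxed, numeric comparison
    }
}
```

The last line is a reminder that == between a boxed Long and an int literal unboxes and compares values, which is exactly the opposite of what equals() does here.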

So a few interesting features combine to produce some strange behavior. Hope you enjoyed this puzzle!

Monday, January 25, 2010

UFOs, Ghosts and Bigfoot

So I hope you don't think I'm a complete loon, but I've always been interested in the paranormal, cryptozoology and lights in the sky. I think it is driven by my need to try to answer all those unanswered questions in life.

Lately I've been watching various 'reality' shows on these topics like MonsterQuest, Paranormal State and Ghosthunters. Don't get me wrong, I'm not gullible enough to believe these shows are finding evidence of anything; it's just some good ol'-fashioned, mind-numbing popcorn TV for me to fall asleep to.

Watching them has reminded me of a few unexplained incidents I've had in my life, and I wonder if you've experienced anything similar. I've got three that all happened to me over a decade ago but were so unnerving that I remember them vividly. So here are my campfire ghost stories; feel free to share yours!

Incident #1: The Mysterious Flash
I don't remember the exact date or even the season of this first incident, but my best guess is it happened during the summer of 1993. It was the summer before my sophomore year of high school. I had a goal to read 50 books that summer. I know what you're thinking: "Wow, was he really that cool?" My motivation was a challenge from my freshman English teacher that I couldn't do it, and I ended up proving him wrong. The reason I'm telling you all this is to set the scene: I was in my bedroom reading a book around 10 o'clock at night.

My Dad is awesome and years earlier had installed overhead track lighting in my room attached to a touch-sensitive dimmer switch. The track lights hung from the ceiling and I had most of them pointed outward to face the walls so I wouldn't be blinded. However, the one light that hung near the head of my bed I faced inward to act as a reading light. For whatever reason, I was lying in bed with the lights dimmed as low as I could and still be able to read. I can't explain the inner workings of my 15-year-old self, but I liked the lighting that way.

As I'm lying there in bed with my back propped against the wall, my entire room is flooded with brilliant, blinding light! I stare at my book and realize the light isn't coming from above me; it's coming from behind me, because I can see my shadow and the shadow of the book I'm holding cast against the far wall of my room, as if someone were holding a spotlight a few feet behind my shoulder. And before I can react, it's gone! The whole flash takes about half a second. Then, as I turn to see where the light came from, I realize that my back is against the wall, so the light couldn't possibly have come from behind me and cast the shadows I saw. Being rational, my first thought is that I imagined the shadows and it was a simple power surge, so I go downstairs and ask my parents if they saw anything, but they hadn't.

Incident #2: The 727 at 500'
For this next incident I wasn't alone. I was still in high school, and my Mom and I were in the car together coming home from an after-school activity, probably band practice or musical rehearsal. It was twilight, and as we drove home I saw a plane on the horizon. As I watched it I realized it wasn't near the horizon because it was far away; it was near the horizon because it was close and flying low to the ground. I pointed out the plane to my Mom and wondered out loud if something was wrong with it. She was surprised and decided to pull over to watch it. As the plane drew closer we both realized that it was a commercial airliner, not the small prop plane out of Danbury I was expecting. Now mind you, I grew up in the hills of New Milford, where the closest airport that can land an airliner is 40 miles away. It continued to approach us, and at this point I realized it was traveling extremely slowly for a jet of its size. Finally, as it passed directly overhead, I realized it was the size of a 727, no more than 500 feet from the ground, and going maybe 100 miles an hour, which is very close to, if not less than, the stall speed for a jet like that. But if all that wasn't strange enough, here's the part that haunts my mother and me to this day...it was dead silent. I mean so silent that the crickets stopped chirping. The jet passed and disappeared past the treeline and we never heard a thing. In that pre-internet age, my only recourse was to check the newspaper every day for at least a week. Nothing.

Incident #3: Close Encounters of the Orange Kind
My last weird event is probably the clearest of the three in my mind. It was September of 1995 near the beginning of my senior year of high school. My Dad and I were flying to Florida for the weekend so I could tour the Ringling School of Art and Design and see if it was where I wanted to spend the next four years of my life studying computer graphics. In the end, I went to Trinity College in Hartford but that's another story.

On this trip we flew down on a Friday night. It was one of the first night-time flights I'd ever been on, so I was pretty excited. We were seated on the left side of the plane, which faced the Atlantic Ocean for most of the trip. Even without much to see but the blackness of the ocean, I still sat watching out the window almost the entire trip. Midway through the trip I got to see thunderhead clouds out on the horizon. It was a lot of fun watching a lightning storm from 200 miles away. But as I was watching, I realized that one of the clouds had begun to light up with a constant orange light. The cloud wasn't lit in a diffuse way; it was very clear the light was emanating from a point within the cloud. If you've seen the scene in the movie Close Encounters of the Third Kind where the UFOs arrive, you can only see points of light within the clouds; it looked exactly like that, but with only one light source. I'm not saying I think this was a UFO, only that the special effects in this movie are a good approximation of what I saw. So I immediately got my Dad to lean over and take a look to see if he had any idea what it was. As we sat there and watched, it became extremely bright. It was so bright that you could see the light cast through the windows onto the right wall, similar to what happens when the setting sun is off the left wing. We watched it brighten for about 30 seconds, stay at full brightness for a minute or two, and then fade to nothing in about the same amount of time it took to arrive. To this day I have no idea what I saw.