Equals and HashCode

Java's Collections and Relational database (and thus Hibernate) relies heavily on being able to distinguish objects in a unified way. In Relational database's this is done with primary keys, in Java we have equals() and hashCode() methods on the objects. This page tries to discuss the best strategies for implementation of equals() and hashcode() in your persistent classes.

 

Why are equals() and hashcode() important

Normally, most Java objects provide a built-in equals() and hashCode() based on the object's identity; so each new() object will be different from all others.

This is generally what you want in ordinary Java programming. And if all your objects are in memory, this is a fine model. Hibernate's whole job, of course, is to move your objects out of memory. But Hibernate works hard to prevent you from having to worry about this.

Hibernate uses the Hibernate session to manage this uniqueness. When you create an object with new(), and then save it into a session, Hibernate now knows that whenever you query for an object and find that particular object, Hibernate should return you that instance of the object.  And Hibernate will do just that.

However, once you close the Hibernate session, all bets are off. If you keep holding onto an object that you either created or loaded in a Hibernate session that you have now closed, Hibernate has no way to know about those objects. So if you open another session and query for "the same" object, Hibernate will return you a new instance. Hence, if you keep collections of objects around between sessions, you will start to experience odd behavior (duplicate objects in collections, mainly).

The general contract is: if you want to store an object in a List, Map or a Set then it is an requirement that equals and hashCode are implemented so they obey the standard contract as specified in the  documentation.

 

What is the problem after all?

So let's say you do want to keep objects around from session to session, e.g. in a Set related to a particular application user or some other scope that spans several Hibernate sessions.

The most natural idea that comes to mind is implementing equals() and hashCode() by comparing the property you mapped as a database identifier (ie. the primary key attribute). This will cause problems, however, for newly created objects, because Hibernate sets the identifier value for you after storing new objects. Each new instance therefore has the same identifier, null (or <literal>0</literal>). For example, if you add some new objects to a Set:

// Suppose UserManager and User are Beans mapped with Hibernate

UserManager u = session.load(UserManager.class, id);

u.getUserSet().add(new User("newUsername1")); // adds a new Entity with id = null or id = 0
u.getUserSet().add(new User("newUsername2")); // has id=null, too so overwrites last added object.

// u.getUserSet() now contains only the second User

As you can see relying on database identifier comparison for persistent classes can get you into trouble if you use Hibernate generated ids, because the identifier value won't be set before the object has been saved. The identifier value will be set when session.save() is called on your transient object, making it persistent.

If you use manually assigned ids (e.g. the "assigned" generator), you are not in trouble at all, you just have to make sure to set the identifier value before adding the object to the Set. This is, on the other hand, quite difficult to guarantee in most applications.

 

Separating object id and business key

To avoid this problem we recommend using the "semi"-unique attributes of your persistent class to implement equals() (and hashCode()). Basically you should think of your database identifier as not having business meaning at all (remember, surrogate identifier attributes and automatically generated vales are recommended anyway). The database identifier property should only be an object identifier, and basically should be used by Hibernate only. Of course, you may also use the database identifier as a convenient read-only handle, e.g. to build links in web applications.

 

Instead of using the database identifier for the equality comparison, you should use a set of properties for equals() that identify your individual objects. For example, if you have an "Item" class and it has a "name" String and "created" Date, I can use both to implement a good equals() method. No need to use the persistent identifier, the so called "business key" is much better. It's a natural key, but this time there is nothing wrong in using it!

 

The combination of both fields is stable enough for the life duration of the Set containing your Items. It is not as good as a primary key, but it's certainly a candidate key. You can think of this as defining a "relational identity" for your object -- the key fields that would likely be your UNIQUE fields in your relational model, or at least immutable properties of your persistent class (the "created" Date never changes).

In the example above, you could probably use the "username" property.

 

Note that this is all that you have to know about equals()/hashCode() in most cases. If you read on, you might find solutions that don't work perfectly or suggestions that don't help you much. Use any of the following at your own risk.

 

Workaround by forcing a save/flush

If you really can't get around using the persistent id for equals() / hashCode(), and if you really have to keep objects around from session to session (and hence can't just use the default equals() / hashCode()), you can work around by forcing a save() / flush() after object creation and before insertion into the set:

// Suppose UserManager and User are Beans mapped with Hibernate

UserManager u = session.load(UserManager.class, id);

User newUser = new User("newUsername1");
// u.getUserSet().add(newUser); // DO NOT ADD TO SET YET!
session.save(newUser);
session.flush();             // The id is now assigned to the new User object
u.getUserSet().add(newUser); // Now OK to add to set.
newUser = new User("newUsername2");
session.save(newUser);
session.flush();
u.getUserSet().add(newUser); // Now userSet contains both users.

 

Note that it's highly inefficient and thus not recommended. Also note that it is fragile when using disconnected object graphs on a thin client:

// on client, let's assume the UserManager is empty:
UserManager u = userManagerSessionBean.load(UserManager.class, id);
User newUser = new User("newUsername1");
u.getUserSet().add(newUser); // have to add it to set now since client cannot save it
userManagerSessionBean.updateUserManager(u);

// on server:
UserManagerSessionBean updateUserManager (UserManager u) {
  // get the first user (this example assumes there's only one)
  User newUser = (User)u.getUserSet().iterator().next();
  session.saveOrUpdate(u);
  if (!u.getUserSet().contains(newUser)) System.err.println("User set corrupted.");
}

 

This will actually print "User set corrupted." since newUser's hashcode will change due to the saveOrUpdate call.

This is all frustrating because Java's object identity seems to map directly to Hibernate-assigned database identity, but in reality the two are different -- and the latter doesn't even exist until an object is saved. The object's identity shouldn't depend on whether it's been saved yet or not, but if your equals() and hashCode() methods use the Hibernate identity, then the object id does change when you save.

 

It's bothersome to write these methods, can't Hibernate help?

Well, the only "helping" hand Hibernate can provide is hbm2java.

hbm2java does not (anymore) generate equals/hashcode based on id's because of the described issues in this page.

You can though mark certain properties with <meta attribute="use-in-equals">true</meta> to tell hbm2java to generate a proper equals/hashcode.

 

Summary

To sum all this stuff up, here is a listing of what will work or won't work with the different ways to handle equals/hashCode:

 

no eq/hC at alleq/hC with the id propertyeq/hC with buisness key
use in a composite-idNoYesYes
multiple new instances in setYesNoYes
equal to same object from other sessionNoYesYes
collections intact after savingYesNoYes

 

Where the various problems are as follows:

 

use in a composite-id:

To use an object as a composite-id, it has to implement equals/hashCode in some way, == identity will not be enough in this case.

 

multiple new instances in set:

Will the following work or not:

HashSet someSet = new HashSet();
someSet.add(new PersistentClass());
someSet.add(new PersistentClass());
assert(someSet.size() == 2);

 

equal to same object from another session:

Will the following work or not:

PersistentClass p1 = sessionOne.load(PersistentClass.class, new Integer(1));
PersistentClass p2 = sessionTwo.load(PersistentClass.class, new Integer(1));
assert(p1.equals(p2));

 

collections intact after saving:

Will the following work or not:

HashSet set = new HashSet();
User u = new User();
set.add(u);
session.save(u);
assert(set.contains(u));

 

Any best practicies for equals and hashcode

Read the links in 'Background material' and the API docs - they provide the gory details.

Furthermore I encourage anyone with information and tips about equals and hashcode implementations to come forward and show their "patterns" - I might even try to incorporate them inside hbm2java to make it even more helpful ;)

 

Background material:

Effective Java Programming Language Guide, sample chapter about equals() and hashCode()

Java theory and practice: Hashing it out, Article from IBM

Sam Pullara (BEA) comments on object identity: Blog comment

Article about how to implement equals and hashCode correctly by Manish Hatwalne: Equals and HashCode

Forum thread discussing implementation possibilities without defining a business identity: Equals and hashCode: Is there *any* non-broken approach?

According to The java.lang.Object documentation it should be perfectly ok to always return 0 for the hashCode(). The positive effect of implementing hashCode() to return unique numbers for unique objects, is that it might increase performance. The downside is that the behavior of hashCode() must be consistent with equals(). For object a and b, if a.equals(b) is true, than a.hashCode() == b.hashCode() must be true. But if a.equals(b) returns false, a.hashCode() == b.hashCode() may still be true. Implementing hashCode() as 'return 0' meets these criteria, but it will be extremely inefficient in Hash based collection such as a HashSet or HashMap.