Pannonian Coder: September 2013

"Idiomatic" way to index entities in Neo4j is to do that only on few types of them, usually the ones that are most often used, or for some reason are the most practical to be accessed directly. Of course, top level entities (such as Company or User in some business domains) just have to be indexed since they cannot be fetched via some other entity.

So, let's say we have 2 types of entities - Company and Department, and they are in one-to-many relationship. Company would have to be indexed, but Department would not because it can be traversed to starting from the parent Company. This fetching via traversal is actually one of best selling points of Neo4j because the speed of that operation generally doesn't depend upon size of whole dataset, unlike SQL databases that have to perform JOIN-ing of different tables which involves tackling with their indexes and performance of that ultimately depends upon table size.

Anyway, all seems good, but it can have some impact on your service layer.

Until now, when you had your Department entities indexed, you had some service layer operation with only one argument needed to reference the entity:

 public interface DepartmentManager {  
  void activateDepartment(UUID departmentUuid);  
 ...  
 }

And now we must introduce another argument to identify parent Company to be able to traverse the graph to Department in question.

 public interface DepartmentManager {  
  void activateDepartment(UUID companyUuid, UUID departmentUuid);  
 ...  
 }

Of course, one can argue that we could decide to index Department entities also to simplify accessing them, but then this same reasoning can lead us to index almost all types of entities that we want to operate on at service layer, and we surely want to avoid that for reasons described in the beginning of this post.

Polyglot persistence is all the rage now, and one of more exotic types of databases around are graph DBs, so we decided to give it a shot for a part of larger system. We picked Neo4j. Even if we were not a Java shop, we would probably stumble on it anyway since it definitely looks the most popular graph database right now.

After working with it for some time I noticed a thing that I really like - object-graph mismatch is much lower than object-relational one. Although there are numerous things where object-relational mismatch shows its face, one of things that bothered me the most is that I always had to take good care of what type/role I will be referencing some object with.

In Java land, even with as poor meta-model as it has (compared to some other more exotic languages out there), we can reference some entity from another one by many ways - using class, subclass or interface, And we all know that one of great principles of good OO design is to reference objects by their role, which can be expressed in any of mentioned language constructs. Interface, if sufficient, is usually the most preferred way to express an object role.

Here's an example ...

Let's say we have a User class that has reference to its owner entity, described by UserOwner interface. This UserOwner interface is the role that the owner entity plays in that case.

 public class User {  
   private String name;  
   private UserOwner owner;  
 ....  
 }

And let's say this UserOwner role can be played by multiple different entities - company and department. Let's even say that users themselves can be the owners of other users. If we were to express this in Java, we would implement this UserOwner interface by many classes:

 public class User implements UserOwner {  
 ...  
 }  
 public class Company implements UserOwner {  
 ...  
 }  
 public class Department implements UserOwner {  
 ...  
 }

So how would we map this case to SQL world? We would have USERS table, but also COMPANIES and DEPARTMENTS table.

And to express the reference from user to its owner, we would need to have a foreign key that points from USERS table to .... to.... to what? We don't have a concept in SQL world that would "mark" USERS, DEPARTMENTS and COMPANIES tables belonging to some "USER_OWNERS" type so we could define the foreign key by that target. Problem is that SQL meta-model is still much poorer compared to OO meta-model, and it doesn't have a concept of supertables (for hierarchies of tables), or some other concept that would mark records as being of multiple types.

In Neo4j it is straightforward - unlike SQL database, it is schema-less so we don't burden ourselves with types, we just have a special relationship type (named let's say "BELONGS_TO") that corresponds to association between a User and its UserOwner.

You see how different users reference their owners via BELONGS_TO relationship, regardless if that entity is company, department or other user. Now you can write simple Cypher queries such as this one which without any fuss fetches the owner of some entity:

 START user=node(<someUserId>) MATCH user-[:BELONGS_TO]->owner RETURN owner;

In application layer, we would cast result of that query to UserOwner object and do with it whatever that role allows us (via methods on that interface).

Sweet!

Pannonian Coder

Monday, September 16, 2013

Referencing non-indexed Neo4j entities in service layer

Friday, September 6, 2013

Neo4j and beauty of role-based entity referencing