Vintage, Fashion and Non Recursive Hierarchical SQL Queries

C#,Fashion,Vintage,Hierarchical,SQL Modeling

Posted by Alex Peta on November 29, 2013 Copyright© from Bing images : Swimmers competing in the 2016 Ironman World Championship triathlon in Kailua-Kona, Hawaii (© Tom Pennington/Getty Images)

First of all , glad to say : I’m back!!!

Want to start off with 2 things really quickly before I begin with the topic :

  1. been extremely busy the last couple of months and will detail what I’ve been doing in a later blog post.
  2. I’ve upgraded the blog comment system to use Disqus.

Now to begin our topic.

Fashion

I’ve been assigned about a week ago to work on a really old nasty project , .NET 2.0 ASP Classic if you want to know the details, and my job was to upgrade and make it IE10 compatible.  I’m going to spare you the agony of reading the implementation details (hacks) and we agree not to speak about it. :-). I’m going to be fair here, this isn’t a job I was looking forward to.

And it all has to do with Fashion. That’s right. If you didn’t see by now, programmers are fashionable people when it comes to writing code. We are superficial, we care a lot about the looks! Also like in fashion, we follow and start trends, we gather at presentations and see how others write code and when it comes to “modeling”, coding is the place where you see them all :-)

Vintage

But unlike the fashion industry, besides all the resemblances, we have one thing that we don’t agree on  : vintage. Fashion says its cool, we say its old and smells.

Vintage will never be a trend in programming.

That’s because we always are interested in the new, the faster, the better technologies.

Non Recursive Hierarchical SQL Queries

And this is how I started struggling with the project to make it work. The more I struggled the more I had to dive deeper and understand what was going on there and with each breakpoint set I was getting closer to something I did not expect……(insert tension / drumroll here)

The app was showing a report  with a lot of rows based on a tree class and I was curious to see how the tree was populated so when I saw the implementation I felt like discovering a piece of gold inside a piece of crap dirt.

Here is our example hierarchy :  a simple basket that has fruits and vegetables in it.

hierarchy

The result that we want to achieve is to get a sub tree from this large tree as quickly as possible and not recursively.

The answer : pre-compute the data before inserting it into a different representation then the “parentId” foreign key usual representation.

Now its time to “mark” all the nodes by doing a “Depth-first” search in the tree by adding 2 values for each node:

  • left side index – meaning start point
  • right side index – meaning leaving point

Here is the tree after it has been marked:

marked_hierarchy

Let’s see how our end result table will look like with the data inserted from the marked tree:

Name LeftIndex RightIndex
Basket 1 16
Fruits 2 9
Apple 3 4
Cherry 5 6
Banana 7 8
Veggies 10 15
Tomato 11 12
Onion 13 14

Now lets see some queries in action :

  • Get all the fruits sub-tree:
SELECT LeftIndex as start, RightIndex as end FROM table where Name = “Fruits”
SELECT * FROM table WHERE LeftIndex > start AND RightIndex  < end

First I am getting the startIndex (left) and endIndex(right) for the subtree that  I want to retrieve and then I am doing the actual query just by comparing values that are in my expected interval.

And we can continue like this to get all the sub-trees that we want, you got the big picture.

Now lets write some code that will represent our tree in C#. We will need a node first of all :

namespace NonRecursiveHierarchicalSQLQueries.DataStructures.Concrete
{
    public class Node<T>
        where T : INode
    {
        #region Properties
        public int LeftIndex { get; set; }
        public int RightIndex { get; set; }
        public T Value { get; private set; }
        public List<Node<T>> Children { get; private set; }
        #endregion Properties

        #region Constructors
        public Node(T nodeValue)
        {
            this.Value = nodeValue;
            Children = new List<Node<T>>();
        }
        #endregion Constructors

        #region Public Methods
        public virtual void InsertChild(Node<T> child)
        {
            Children.Add(child);
        }

        public void MarkTree(ref int index)
        {
            LeftIndex = index++;

            foreach (var child in this.Children)
            {
                child.MarkTree(ref index);
            }

            RightIndex = index;
        }

        public void Visit(Action<Node<T>> action)
        {
            action(this);
            foreach (var child in Children)
            {
                child.Visit(action);
            }
        }
        #endregion Public Methods
    }
}

This is a quick generic implementation of a tree node. It has a Value property that can be any T where T is an INode and also 2 more numeric properties LeftIndex and StartIndex  that we are using to insert in our table columns in the database.

I’ve also create a Mark method that is the method to call when marking each node and setting the left and right indexes.

And finally we have a Visit method that accepts a action with the current node and passes this action recursively to its “children” – this is to offer different flexible actions to do on the tree.

Now lets use this generic node to create a concrete implementation :

public class MyDataHolder : INode
    {
        public string Name { get; private set; }
        public MyDataHolder(string name)
        {
            Name = name;
        }
    }

public class SimpleTreeNode : Node<MyDataHolder>
{
    public SimpleTreeNode(MyDataHolder value)
        : base(value)
    {
    }
}

After we got our concrete implementations, its time to create a tree and mark it and check if our Depth-first implementation works ok.

class Program
    {
        static void Main(string[] args)
        {

            //Creating the tree
            SimpleTreeNode root = new SimpleTreeNode(new MyDataHolder("Basket"));

            SimpleTreeNode fruitsNode = new SimpleTreeNode(new MyDataHolder("Fruits"));
            SimpleTreeNode appleNode = new SimpleTreeNode(new MyDataHolder("Apple"));
            SimpleTreeNode cherryNode = new SimpleTreeNode(new MyDataHolder("Cherry"));
            SimpleTreeNode bananaNode = new SimpleTreeNode(new MyDataHolder("Banana"));

            fruitsNode.InsertChild(appleNode);
            fruitsNode.InsertChild(cherryNode);
            fruitsNode.InsertChild(bananaNode);

            root.InsertChild(fruitsNode);

            SimpleTreeNode veggies = new SimpleTreeNode(new MyDataHolder("Veggies"));
            SimpleTreeNode tomato = new SimpleTreeNode(new MyDataHolder("Tomato"));
            SimpleTreeNode onion = new SimpleTreeNode(new MyDataHolder("Onion"));

            veggies.InsertChild(tomato);
            veggies.InsertChild(onion);

            root.InsertChild(veggies);


            //mark the tree left and right indexes
            int start = 0;
            root.MarkTree(ref start);


            //create a visit action delegate that will print to console
            //because the visit allows us to add any action delegate here
            //this is the place where we could create an action that inserts in our database.
            Action<Node<MyDataHolder>> consoleWriteAction = (node) =>
            {
                Console.WriteLine("{0}\tl={1} r={2}", node.Value.Name, node.LeftIndex, node.RightIndex);
            };

            root.Visit(consoleWriteAction);

            Console.ReadLine();

        }
    }

You can probably tell that instead of my consoleWriteAction I could have easily used a action to insert the values in the database and this would be exactly what we were looking for.

Of course this isn’t optimized in any way. We could eventually make a “IsDirty” property for each node so that we can insert/update only the necessary nodes not all at once etc.

Conclusion

Working with legacy code is ugly and undesirable. No product manager/client would ever spend one development hour on refactoring. The logic that they use all the time is : “so we have the same functionality but better, what’s in it for me?”.

..BUT sometimes it is worth it! really seldom, you come across pieces of code like this and see that great/smart code has been done even in the early days of building apps.

Note 

Be sure to check out all the code here back over on Github : https://github.com/alexpeta/Blog/tree/master/NonRecursiveHierarchicalSQLQueries