Category Archives: algorithm

Why not store numbers as diff of previous number?

This morning had a thought. We store numbers in a fixed sized memory space. So if we use four bytes to store numbers we would need eight bytes to store the numbers 5 and 6. But, what if we store 5 in four bytes and then 6 in two bit, as a delta? The bits can indicate +1, 0, -1., here an increment of the 5 by one. Larger increments would use more bits of course. Thus, we naturally get compression.

Subject areas: mobile computing, Internet of Things.

True, this wouldn’t work as a ‘live’ memory storage, the I/O would be complex. Or would it? In constrained devices such as Wearable Computing, for example, a smart watch, or in an Internet Of Things remote device, memory limitations may require compressed storage.

As usual, this is not a new idea. It is related to “Delta Encoding” or “Data differencing”. An interesting article on how delta coding could be used for compression is “Effective compression using frame-of-reference and delta coding”.

I have not seen this delta encoding memory approach mentioned anywhere yet.

Links

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Linear list sorting using weighted digraph possible?

In the shower this morning revisited the sorting subject. The prior post presented a non-efficient, though interesting (to me)) approach using a topological ordering graph algorithm. Is it possible to make it efficient?

Linear graph creation
So I went back to my initial idea of creating the graph with three types of edges: less than (LT), greater than (GT), and equal (EQ). Thus, it would be a linear process to iterate the list of objects and create the graph based on the relationship between neighbors. That is, compare the first with the second, the second with the third and so forth. With the original list [9, 2, 12, 40, 12, 3, 1]

The resulting graph is:

list transform

Embedded sub graphs
Is there any way to take that and break and merge it so that the algorithmic efficiency is high? With the labeling of edges we created a weighted digraph.
What we have is potentially three digraphs. Listed by taking the inverse of each edge (sink,source):
The less than graph LT: (2,9),(12,40),(3,12),(1,3)

separate less than digraph

The greater than graph GT: (2,12),(12x,40)

separate greater than digraph
separate greater than digraph

The equals graph EQ: in this example empty, but the 12 -> 12x equality is in the original data.

Sub digraph sorting and merge
One approach to sorting the complete list is taking each subgraph (LT, GT, EQ), sorting each and then merging the results.

The LT graph sorted is:

sorted LT DAG
sorted LT DAG

The GT graph sorted is:

sorted GT subgraph
sorted GT subgraph

Then we just apply a merge algorthm. The gain with this approach is that we broke the graph into three or two subgraphs. We can reapply this recursively, but this is just a rediscovery of existing sorts like Merge Sort?

Continued graph augmentation?
Combined into one graph we get:

embedded DAGs

Note that if we take the sinks end of each edge we have:
less direction: 2, 12, 3, 1
greater direction: 12x, 2

Compare this to the original graph from previous blog post where we we compared every node to each other:

presorted digraph
presorted digraph

Just noticed that in the presorted digraph above there is a link from vertex 1 to 2 then to 3, but there is also a link from 1 to 3. If we remove the shortest link and keep the longest multiple links (the reverse of some graph algorithm goals) we are left with the correct sequence 1, 2, 3. Hmmm.

No sort, just links between subgraphs?
Could be possible to do a merge of the later separated graphs? Or can the original graph be augmented to capture extra information? Lets see, if I walk GT, and compare each source with LT, and the first greater than or equal to source vertex create a link to it and to the sink of that node? If the vertexes already appear skip? Something like:

DAG with Internal links
DAG with Internal links
But this one shows an ambiguity ‘2’ points to both ‘9’ and ’12’.

Updates

Further reading

  1. Weighted graphs and networks
  2. M. Jessup’s sample code on StackOverflow
  3. Topological.java
  4. topological_sort
  5. 4.2 Directed Graphs. Text book, part of course content at Princeton
  6. Directed graph
  7. “JUnit Tests as Inner Classes”
  8. An informal introduction to O(N) notation
  9. Bogosort
  10. Algorithm of the Week: Dijkstra Shortest Path in a Graph
  11. Design Distributed Digraph Algorithms using MapReduce; Journal of Computational Information Systems 7: 7 (2011) 2267-2276. http://www.jofcis.com/publishedpapers/2011_7_7_2267_2276.pdf.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

List sorting using topological ordering of a digraph

By creating a pseudo weighted directed graph on a list of elements, it is possible to apply graph algorithms to the sorting problem. This offers an alternative to approaches such as QuickSort or Merge sort. As an experiment a Java program implementation is shown.

Terms: topological sorting, directed acyclic graph, Java, JUnit, algorithm

This subject is continued in Linear list sorting using weighted digraph possible?

Pseudo weighted DAG
Given a set of numbers, for example, [9, 2, 12, 40, 12, 3, 1], we can create a directed acyclic graph (DAG) where each number is a vertex and an outgoing edge points to all other elements greater then it. “9” will have an edge to 12 and 40. “12” will have an edge to 40 and also to the other “12” element. This latter edge will be a special ‘equal’ edge and indicates duplicates. If this were an ordinary edge, the graph would have cycles.

Transforming this list into a DAG yields:

list converted to digraph
list transform

Note that once this transformation is complete we essentially know the order of vertices, we just don’t have a linear arrangement yet. We don’t have to revisit any element and compare it with another as in some other sorting algorithms. Of course, this initial step is not efficient, O(n^2).

Topological sorting
A topological sorting (toposort) of a directed graph is a linear ordering of its vertices. An advantage of a toposort is that it runs in linear time. To this time must be added the graph construction phase, and in this implementation, a final step is inserting duplicate elements into the result.

A toposort of a complex graph can have many solutions. As applied here, there is only one solution. But, this remains to be proven or is already known to be true in Graph Theory. See uniqueness.

Implementation

Listing one below contains the Java implementation that sorts the list of elements given above. The program contains an embedded JUnit test. In the test the method loadDag(map, data) creates the DAG. While creating the edges, if a source and sink have values that are equal the ‘equal’ edge is created. The sink vertex is then marked as a duplicate.

The modified toposort in the program will skip vertices that are marked as duplicate. The resulting list is sorted, but is missing the duplicate vertices. In the final step each node in the result set is visited and if it has any outgoing equal edges these destination nodes are inserted into the final result list.

The resultant outputs of the program is:

Sort with dupes: [one:1, two:2, three:3, nine:9, twelve:12, twelve2:12, forty:40]
Sort with unique: [one:1, two:2, three:3, nine:9, twelve:12, forty:40]

Nested JUnit test classes
Though not part of the topic, the programming example used a nested JUnit test class, an interesting approach.

Summary
Presented was a possible approach to list sorting by use of graph algorithms. A Java programming language based example implementation was given.

Listing 1, source available here.

GraphSort.java
package com.octodecillion.sort;

import static org.hamcrest.core.Is.is;
import static org.hamcrest.core.IsEqual.equalTo;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

/**
 * 
 * Sorting using directed graph embedding based topological ordering.
 * 
 * @author Josef Betancourt
 * @since 20120101T1212
 * 
 * 
 *        Example code based on code created from M. Jessup's *
 *        {@link "http://stackoverflow.com/questions/2739392/sample-directed-graph-and-topological-sort-code"}
 *        Which itself is an implementation of one presented in Wikipedia
 *        article: {@link "http://en.wikipedia.org/wiki/Topological_sort"}
 * 
 * 
 */
public class GraphSort {

	/**
	 * Modification of algorthim presented in Wikipedia article.
	 * 
	 * {@link "http://en.wikipedia.org/wiki/Topological_sort"}
	 * 
	 * <pre>
	 * 		L ? Empty list that will contain the sorted elements
	 * 		S ? Set of all nodes with no incoming edges
	 * 		while S is non-empty do
	 * 		    remove a node n from S
	 * 		    insert n into L
	 * 		    for each node m with an edge e from n to m do
	 * 		        remove edge e from the graph
	 * 		        if m has no other incoming edges then
	 * 		            insert m into S
	 * 		if graph has edges then
	 * 		    return error (graph has at least one cycle)
	 * 		else 
	 * 		    return L (a topologically sorted order)
	 * </pre>
	 * 
	 * See also: Algorithm presented in "Algorithms", 4th Edition by R.
	 * Sedgewick and K. Wayne.
	 * {@link "http://algs4.cs.princeton.edu/42directed/Topological.java.html"}
	 * *
	 * 
	 * @param <T>
	 * @param allNodes
	 * @return sorted list or empty list if not possible
	 * @throws IllegalArgumentException
	 *             if graph has cycles
	 */
	public static <T> ArrayList<Node<T>> topoSort(final Node<T>[] allNodes) {

		ArrayList<Node<T>> workList = new ArrayList<Node<T>>();

		HashSet<Node<T>> sinks = new HashSet<Node<T>>();
		for (Node<T> n : allNodes) {
			if (n.inEdges.size() == 0) { // indegree zero? Or source?
				sinks.add(n);
			}
		}

		while (!sinks.isEmpty()) {
			Node<T> nextNode = sinks.iterator().next();
			sinks.remove(nextNode);

			if (nextNode.duplicate) {
				continue;
			}

			workList.add(nextNode);

			for (Iterator<Edge<T>> it = nextNode.outEdges.iterator(); it
					.hasNext();) {

				Edge<T> outNode = it.next();
				Node<T> dest = outNode.to;
				it.remove();
				dest.inEdges.remove(outNode);

				if (dest.inEdges.isEmpty()) {
					sinks.add(dest);
				}
			}
		}

		// All edges are removed? If not, we have a cycle.
		for (Node<T> n : allNodes) {
			if (!n.inEdges.isEmpty()) {
				throw new IllegalArgumentException("Cycle present. Node, "
						+ n.name
						+ " has inEdges. Topological sort not possible");
			}
		}

		ArrayList<Node<T>> resultList = new ArrayList<Node<T>>();
		for (Node<T> node : workList) {
			resultList.add(node);

			if (!node.equiEdges.isEmpty()) {
				for (Iterator<Edge<T>> iterator = node.equiEdges.iterator(); iterator
						.hasNext();) {
					Edge<T> edge = iterator.next();
					resultList.add(edge.to);
				}
			}

		}

		return resultList;
	} // topoSort

	/**
	 * A node.
	 * 
	 * @param <T>
	 */
	static class Node<T> {
		public final String name;
		public final T value;
		public final HashSet<Edge<T>> inEdges;
		public final HashSet<Edge<T>> outEdges;
		public final HashSet<Edge<T>> equiEdges;
		public boolean duplicate;

		public Node(final String name, final T value) {
			this.name = name;
			this.value = value;
			this.duplicate = false;
			inEdges = new HashSet<Edge<T>>();
			outEdges = new HashSet<Edge<T>>();
			equiEdges = new HashSet<Edge<T>>();
		}

		public Node<T> addEdge(final Node<T> node) {
			Edge<T> e = new Edge<T>(this, node);
			outEdges.add(e);
			node.inEdges.add(e);
			return this;
		}

		@SuppressWarnings("unchecked")
		public Node<T> addEdges(final Node<T>... nodes) {
			for (int i = 0; i < nodes.length; i++) {
				Node<T> node = nodes[i];
				Edge<T> e = new Edge<T>(this, node);
				outEdges.add(e);
				node.inEdges.add(e);
			}

			return this;
		}

		public Node<T> addEquiEdge(final Node<T> node) {
			Edge<T> e = new Edge<T>(this, node);
			equiEdges.add(e);
			// node.equiEdges.add(e);
			node.duplicate = true;
			return this;
		}

		@Override
		public String toString() {
			return String.format("%s:%s", name, value);
		}
	} // Node<T>

	/**
	 * A directed edge.
	 */
	static class Edge<T> {
		public final Node<T> from;
		public final Node<T> to;

		public Edge(final Node<T> from, final Node<T> to) {
			this.from = from;
			this.to = to;
		}

		@Override
		public boolean equals(final Object obj) {
			@SuppressWarnings("unchecked")
			Edge<T> e = (Edge<T>) obj;
			return (e.from == from) && (e.to == to);
		}

		@Override
		public int hashCode() {
			return super.hashCode();
		}

		@Override
		public String toString() {
			// TODO Auto-generated method stub
			return String.format("%s->%s", from, to);
		}
	} // Edge<T>

	// ====================================================================
	// Nested JUnit test.
	// ====================================================================

	/**
	 * 
	 * Tests {@link GraphSort}.
	 * 
	 * Uses nested JUnit testing approach advocated by Ben J. Christensen in <a
	 * href=
	 * "http://benjchristensen.com/2011/10/23/junit-tests-as-inner-classes/"
	 * >"JUnit Tests as Inner Classes"</a>
	 * 
	 * @author jbetancourt
	 * 
	 */
	@RunWith(JUnit4.class)
	public static class GraphSortTest {
		@SuppressWarnings("unchecked")
		private Node<Integer>[] allNodes = new Node[0];

		/**
		 * Sort list with no duplicate nodes.
		 */
		@SuppressWarnings("boxing")
		@Test
		public void should_sort_without_duplicates() {
			uniqueList();
			ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);

			assertFalse("empty result", result.isEmpty());
			assertThat(Integer.valueOf(result.size()), is(equalTo(6)));
			assertTrue("result not sorted", inOrder(result));

			System.out.println("Sort with unique: "
					+ Arrays.toString(result.toArray()));
		}

		/**
		 * Sort list with no duplicate nodes.
		 */
		@SuppressWarnings("boxing")
		@Test
		public void should_sort_with_duplicates() {
			dupeList();
			ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);

			assertFalse("empty result", result.isEmpty());

			System.out.println("Sort with dupes: "
					+ Arrays.toString(result.toArray()));

			assertThat(Integer.valueOf(result.size()), is(equalTo(7)));
			assertTrue("result not sorted", inOrder(result));

		}

		@SuppressWarnings("unchecked")
		private void uniqueList() {
			String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;";
			String dagData = "one,three;one,twelve;one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty";

			Map<String, Node<Integer>> nodes = loadNodes(nodeData);
			loadDag(nodes, dagData);

			List<Node<Integer>> list = new ArrayList<Node<Integer>>();

			for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
				list.add(entry.getValue());
			}

			allNodes = list.toArray(new Node[list.size()]);

		}

		@SuppressWarnings("unchecked")
		private void dupeList() {
			String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;twelve2,12";
			String dagData = "one,three;one,twelve;twelve,twelve2,one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty;twelve2,forty";

			Map<String, Node<Integer>> nodes = loadNodes(nodeData);
			loadDag(nodes, dagData);

			List<Node<Integer>> list = new ArrayList<Node<Integer>>();

			for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
				list.add(entry.getValue());
			}

			allNodes = list.toArray(new Node[list.size()]);

		}

		/**
		 * 
		 * @param nodes
		 * @param dagData
		 */
		private void loadDag(final Map<?, ?> nodes, final String dagData) {
			String[] dagStrings = dagData.split(";");

			for (int i = 0; i < dagStrings.length; i++) {
				String[] spec = dagStrings[i].split(",");

				@SuppressWarnings("unchecked")
				Node<Integer> source = (Node<Integer>) nodes.get(spec[0]);
				@SuppressWarnings("unchecked")
				Node<Integer> sink = (Node<Integer>) nodes.get(spec[1]);

				if (!sink.duplicate || (source.value != sink.value)) {
					source.addEdge(sink);
				} else {
					source.addEquiEdge(sink);
				}

			}

		}

		/**
		 * 
		 * @param nodeData
		 * @return
		 */
		private Map<String, Node<Integer>> loadNodes(final String nodeData) {
			Map<String, Node<Integer>> nodeMap = new HashMap<String, Node<Integer>>();

			String[] nodeSpec = nodeData.split(";");

			for (int i = 0; i < nodeSpec.length; i++) {
				String[] string = nodeSpec[i].split(",");
				String name = string[0];
				String value = string[1];

				@SuppressWarnings("boxing")
				Node<Integer> node = new Node<Integer>(name,
						Integer.parseInt(value.trim()));

				nodeMap.put(name, node);
			}

			return nodeMap;

		}

		/**
		 * Check that list is sorted.
		 * 
		 * @param nodes
		 * @return
		 */
		@SuppressWarnings("boxing")
		private boolean inOrder(final ArrayList<Node<Integer>> nodes) {
			boolean result = true;
			Iterator<Node<Integer>> iterator = nodes.iterator();
			Node<Integer> prev = iterator.next();

			while (iterator.hasNext()) {
				Node<Integer> current = iterator.next();

				// in order?
				if (!(current.value >= prev.value)) {
					result = false;
					break;
				}
			}

			return result;
		}
	}

} // End class GraphSort.java

The graph was created with GraphViz using the input:

## File: listDAG.gv
## Command to generate: dot -Tpng -o listDAG.png listDAG.gv 
digraph {
	
	rankdir=LR
	
	
	A [label="9"];      
	B [label="2"];     
	C [label="12"];      
	D [label="40"];      
	E [label="12"];      
	F [label="3"];     
	G [label="1"];      
	G->B
	G->F
	G->A
	G->C
	G->D
	G->E
	B->F
	B->A
	B->C
	B->E
	B->D
	F->A
	F->C
	F->E
	F->D
	A->C
	A->D
	A->E
	C->D
	C->E [label="equi", fontcolor=darkgreen]
	
	label="\nlist: [9, 2, 12, 40, 12, 3, 1] transformed to DAG by Josef Betancourt";
}

Genesis of this idea
One day I was wondering if randomness could be used to sort. The idea I had was, as it turns out, a “better” Bozosort. Instead of swapping indiscriminately, I would swap elements if they were out of order. I had other ideas. But, then I started drawing some diagrams, and it led to other ideas based on graph construction.

I gave up the use of randomization. Is the technique presented using graphs unique? Does it work? Can it scale? Or Does it add to the list of bozo sorts?

Funny xkcd comic: “Ineffective Sorts

Updates
20130104:

  • Recreated the image using LR ranking in GraphViz.
  • Fixed algorithm which broke when I added missing arc from second 12 to 40. TG for Unit tests.

Further reading

  1. Weighted graphs and networks
  2. M. Jessup’s sample code on StackOverflow
  3. Topological.java
  4. topological_sort
  5. 4.2 Directed Graphs
  6. Directed graph
  7. “JUnit Tests as Inner Classes”
  8. Bogosort
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.