This morning had a thought. We store numbers in a fixed sized memory space. So if we use four bytes to store numbers we would need eight bytes to store the numbers 5 and 6. But, what if we store 5 in four bytes and then 6 in two bit, as a delta? The bits can indicate +1, 0, -1., here an increment of the 5 by one. Larger increments would use more bits of course. Thus, we naturally get compression.

Subject areas: mobile computing, Internet of Things.

True, this wouldn’t work as a ‘live’ memory storage, the I/O would be complex. Or would it? In constrained devices such as Wearable Computing, for example, a smart watch, or in an Internet Of Things remote device, memory limitations may require compressed storage.

As usual, this is not a new idea. It is related to “Delta Encoding” or “Data differencing”. An interesting article on how delta coding could be used for compression is “Effective compression using frame-of-reference and delta coding”.

I have not seen this delta encoding memory approach mentioned anywhere yet.

In the shower this morning revisited the sorting subject. The prior post presented a non-efficient, though interesting (to me)) approach using a topological ordering graph algorithm. Is it possible to make it efficient?

Linear graph creation
So I went back to my initial idea of creating the graph with three types of edges: less than (LT), greater than (GT), and equal (EQ). Thus, it would be a linear process to iterate the list of objects and create the graph based on the relationship between neighbors. That is, compare the first with the second, the second with the third and so forth. With the original list [9, 2, 12, 40, 12, 3, 1]

The resulting graph is:

Embedded sub graphs
Is there any way to take that and break and merge it so that the algorithmic efficiency is high? With the labeling of edges we created a weighted digraph.
What we have is potentially three digraphs. Listed by taking the inverse of each edge (sink,source):
The less than graph LT: (2,9),(12,40),(3,12),(1,3)

The greater than graph GT: (2,12),(12x,40)

The equals graph EQ: in this example empty, but the 12 -> 12x equality is in the original data.

Sub digraph sorting and merge
One approach to sorting the complete list is taking each subgraph (LT, GT, EQ), sorting each and then merging the results.

The LT graph sorted is:

The GT graph sorted is:

Then we just apply a merge algorthm. The gain with this approach is that we broke the graph into three or two subgraphs. We can reapply this recursively, but this is just a rediscovery of existing sorts like Merge Sort?

Continued graph augmentation?
Combined into one graph we get:

Note that if we take the sinks end of each edge we have:
less direction: 2, 12, 3, 1
greater direction: 12x, 2

Compare this to the original graph from previous blog post where we we compared every node to each other:

Just noticed that in the presorted digraph above there is a link from vertex 1 to 2 then to 3, but there is also a link from 1 to 3. If we remove the shortest link and keep the longest multiple links (the reverse of some graph algorithm goals) we are left with the correct sequence 1, 2, 3. Hmmm.

No sort, just links between subgraphs?
Could be possible to do a merge of the later separated graphs? Or can the original graph be augmented to capture extra information? Lets see, if I walk GT, and compare each source with LT, and the first greater than or equal to source vertex create a link to it and to the sink of that node? If the vertexes already appear skip? Something like:

But this one shows an ambiguity ‘2’ points to both ‘9’ and ’12’.

By creating a pseudo weighted directed graph on a list of elements, it is possible to apply graph algorithms to the sorting problem. This offers an alternative to approaches such as QuickSort or Merge sort. As an experiment a Java program implementation is shown.

Pseudo weighted DAG
Given a set of numbers, for example, [9, 2, 12, 40, 12, 3, 1], we can create a directed acyclic graph (DAG) where each number is a vertex and an outgoing edge points to all other elements greater then it. “9” will have an edge to 12 and 40. “12” will have an edge to 40 and also to the other “12” element. This latter edge will be a special ‘equal’ edge and indicates duplicates. If this were an ordinary edge, the graph would have cycles.

Transforming this list into a DAG yields:

Note that once this transformation is complete we essentially know the order of vertices, we just don’t have a linear arrangement yet. We don’t have to revisit any element and compare it with another as in some other sorting algorithms. Of course, this initial step is not efficient, O(n^2).

Topological sorting
A topological sorting (toposort) of a directed graph is a linear ordering of its vertices. An advantage of a toposort is that it runs in linear time. To this time must be added the graph construction phase, and in this implementation, a final step is inserting duplicate elements into the result.

A toposort of a complex graph can have many solutions. As applied here, there is only one solution. But, this remains to be proven or is already known to be true in Graph Theory. See uniqueness.

Implementation

Listing one below contains the Java implementation that sorts the list of elements given above. The program contains an embedded JUnit test. In the test the method loadDag(map, data) creates the DAG. While creating the edges, if a source and sink have values that are equal the ‘equal’ edge is created. The sink vertex is then marked as a duplicate.

The modified toposort in the program will skip vertices that are marked as duplicate. The resulting list is sorted, but is missing the duplicate vertices. In the final step each node in the result set is visited and if it has any outgoing equal edges these destination nodes are inserted into the final result list.

The resultant outputs of the program is:

Sort with dupes: [one:1, two:2, three:3, nine:9, twelve:12, twelve2:12, forty:40]
Sort with unique: [one:1, two:2, three:3, nine:9, twelve:12, forty:40]

Nested JUnit test classes
Though not part of the topic, the programming example used a nested JUnit test class, an interesting approach.

Summary
Presented was a possible approach to list sorting by use of graph algorithms. A Java programming language based example implementation was given.

package com.octodecillion.sort;
import static org.hamcrest.core.Is.is;
import static org.hamcrest.core.IsEqual.equalTo;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.assertTrue;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
/**
*
* Sorting using directed graph embedding based topological ordering.
*
* @author Josef Betancourt
* @since 20120101T1212
*
*
* Example code based on code created from M. Jessup's *
* {@link "http://stackoverflow.com/questions/2739392/sample-directed-graph-and-topological-sort-code"}
* Which itself is an implementation of one presented in Wikipedia
* article: {@link "http://en.wikipedia.org/wiki/Topological_sort"}
*
*
*/
public class GraphSort {
/**
* Modification of algorthim presented in Wikipedia article.
*
* {@link "http://en.wikipedia.org/wiki/Topological_sort"}
*
* <pre>
* L ? Empty list that will contain the sorted elements
* S ? Set of all nodes with no incoming edges
* while S is non-empty do
* remove a node n from S
* insert n into L
* for each node m with an edge e from n to m do
* remove edge e from the graph
* if m has no other incoming edges then
* insert m into S
* if graph has edges then
* return error (graph has at least one cycle)
* else
* return L (a topologically sorted order)
* </pre>
*
* See also: Algorithm presented in "Algorithms", 4th Edition by R.
* Sedgewick and K. Wayne.
* {@link "http://algs4.cs.princeton.edu/42directed/Topological.java.html"}
* *
*
* @param <T>
* @param allNodes
* @return sorted list or empty list if not possible
* @throws IllegalArgumentException
* if graph has cycles
*/
public static <T> ArrayList<Node<T>> topoSort(final Node<T>[] allNodes) {
ArrayList<Node<T>> workList = new ArrayList<Node<T>>();
HashSet<Node<T>> sinks = new HashSet<Node<T>>();
for (Node<T> n : allNodes) {
if (n.inEdges.size() == 0) { // indegree zero? Or source?
sinks.add(n);
}
}
while (!sinks.isEmpty()) {
Node<T> nextNode = sinks.iterator().next();
sinks.remove(nextNode);
if (nextNode.duplicate) {
continue;
}
workList.add(nextNode);
for (Iterator<Edge<T>> it = nextNode.outEdges.iterator(); it
.hasNext();) {
Edge<T> outNode = it.next();
Node<T> dest = outNode.to;
it.remove();
dest.inEdges.remove(outNode);
if (dest.inEdges.isEmpty()) {
sinks.add(dest);
}
}
}
// All edges are removed? If not, we have a cycle.
for (Node<T> n : allNodes) {
if (!n.inEdges.isEmpty()) {
throw new IllegalArgumentException("Cycle present. Node, "
+ n.name
+ " has inEdges. Topological sort not possible");
}
}
ArrayList<Node<T>> resultList = new ArrayList<Node<T>>();
for (Node<T> node : workList) {
resultList.add(node);
if (!node.equiEdges.isEmpty()) {
for (Iterator<Edge<T>> iterator = node.equiEdges.iterator(); iterator
.hasNext();) {
Edge<T> edge = iterator.next();
resultList.add(edge.to);
}
}
}
return resultList;
} // topoSort
/**
* A node.
*
* @param <T>
*/
static class Node<T> {
public final String name;
public final T value;
public final HashSet<Edge<T>> inEdges;
public final HashSet<Edge<T>> outEdges;
public final HashSet<Edge<T>> equiEdges;
public boolean duplicate;
public Node(final String name, final T value) {
this.name = name;
this.value = value;
this.duplicate = false;
inEdges = new HashSet<Edge<T>>();
outEdges = new HashSet<Edge<T>>();
equiEdges = new HashSet<Edge<T>>();
}
public Node<T> addEdge(final Node<T> node) {
Edge<T> e = new Edge<T>(this, node);
outEdges.add(e);
node.inEdges.add(e);
return this;
}
@SuppressWarnings("unchecked")
public Node<T> addEdges(final Node<T>... nodes) {
for (int i = 0; i < nodes.length; i++) {
Node<T> node = nodes[i];
Edge<T> e = new Edge<T>(this, node);
outEdges.add(e);
node.inEdges.add(e);
}
return this;
}
public Node<T> addEquiEdge(final Node<T> node) {
Edge<T> e = new Edge<T>(this, node);
equiEdges.add(e);
// node.equiEdges.add(e);
node.duplicate = true;
return this;
}
@Override
public String toString() {
return String.format("%s:%s", name, value);
}
} // Node<T>
/**
* A directed edge.
*/
static class Edge<T> {
public final Node<T> from;
public final Node<T> to;
public Edge(final Node<T> from, final Node<T> to) {
this.from = from;
this.to = to;
}
@Override
public boolean equals(final Object obj) {
@SuppressWarnings("unchecked")
Edge<T> e = (Edge<T>) obj;
return (e.from == from) && (e.to == to);
}
@Override
public int hashCode() {
return super.hashCode();
}
@Override
public String toString() {
// TODO Auto-generated method stub
return String.format("%s->%s", from, to);
}
} // Edge<T>
// ====================================================================
// Nested JUnit test.
// ====================================================================
/**
*
* Tests {@link GraphSort}.
*
* Uses nested JUnit testing approach advocated by Ben J. Christensen in <a
* href=
* "http://benjchristensen.com/2011/10/23/junit-tests-as-inner-classes/"
* >"JUnit Tests as Inner Classes"</a>
*
* @author jbetancourt
*
*/
@RunWith(JUnit4.class)
public static class GraphSortTest {
@SuppressWarnings("unchecked")
private Node<Integer>[] allNodes = new Node[0];
/**
* Sort list with no duplicate nodes.
*/
@SuppressWarnings("boxing")
@Test
public void should_sort_without_duplicates() {
uniqueList();
ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);
assertFalse("empty result", result.isEmpty());
assertThat(Integer.valueOf(result.size()), is(equalTo(6)));
assertTrue("result not sorted", inOrder(result));
System.out.println("Sort with unique: "
+ Arrays.toString(result.toArray()));
}
/**
* Sort list with no duplicate nodes.
*/
@SuppressWarnings("boxing")
@Test
public void should_sort_with_duplicates() {
dupeList();
ArrayList<Node<Integer>> result = GraphSort.topoSort(allNodes);
assertFalse("empty result", result.isEmpty());
System.out.println("Sort with dupes: "
+ Arrays.toString(result.toArray()));
assertThat(Integer.valueOf(result.size()), is(equalTo(7)));
assertTrue("result not sorted", inOrder(result));
}
@SuppressWarnings("unchecked")
private void uniqueList() {
String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;";
String dagData = "one,three;one,twelve;one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty";
Map<String, Node<Integer>> nodes = loadNodes(nodeData);
loadDag(nodes, dagData);
List<Node<Integer>> list = new ArrayList<Node<Integer>>();
for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
list.add(entry.getValue());
}
allNodes = list.toArray(new Node[list.size()]);
}
@SuppressWarnings("unchecked")
private void dupeList() {
String nodeData = "nine,9;two,2;forty,40;twelve,12;three,3;one,1;twelve2,12";
String dagData = "one,three;one,twelve;twelve,twelve2,one,forty;one,two;one,nine;two,twelve;two,forty;two,three;two,nine;three,nine;three,forty;three,twelve;nine,twelve;nine,forty;twelve,forty;twelve2,forty";
Map<String, Node<Integer>> nodes = loadNodes(nodeData);
loadDag(nodes, dagData);
List<Node<Integer>> list = new ArrayList<Node<Integer>>();
for (Entry<String, Node<Integer>> entry : nodes.entrySet()) {
list.add(entry.getValue());
}
allNodes = list.toArray(new Node[list.size()]);
}
/**
*
* @param nodes
* @param dagData
*/
private void loadDag(final Map<?, ?> nodes, final String dagData) {
String[] dagStrings = dagData.split(";");
for (int i = 0; i < dagStrings.length; i++) {
String[] spec = dagStrings[i].split(",");
@SuppressWarnings("unchecked")
Node<Integer> source = (Node<Integer>) nodes.get(spec[0]);
@SuppressWarnings("unchecked")
Node<Integer> sink = (Node<Integer>) nodes.get(spec[1]);
if (!sink.duplicate || (source.value != sink.value)) {
source.addEdge(sink);
} else {
source.addEquiEdge(sink);
}
}
}
/**
*
* @param nodeData
* @return
*/
private Map<String, Node<Integer>> loadNodes(final String nodeData) {
Map<String, Node<Integer>> nodeMap = new HashMap<String, Node<Integer>>();
String[] nodeSpec = nodeData.split(";");
for (int i = 0; i < nodeSpec.length; i++) {
String[] string = nodeSpec[i].split(",");
String name = string[0];
String value = string[1];
@SuppressWarnings("boxing")
Node<Integer> node = new Node<Integer>(name,
Integer.parseInt(value.trim()));
nodeMap.put(name, node);
}
return nodeMap;
}
/**
* Check that list is sorted.
*
* @param nodes
* @return
*/
@SuppressWarnings("boxing")
private boolean inOrder(final ArrayList<Node<Integer>> nodes) {
boolean result = true;
Iterator<Node<Integer>> iterator = nodes.iterator();
Node<Integer> prev = iterator.next();
while (iterator.hasNext()) {
Node<Integer> current = iterator.next();
// in order?
if (!(current.value >= prev.value)) {
result = false;
break;
}
}
return result;
}
}
} // End class GraphSort.java

The graph was created with GraphViz using the input:

## File: listDAG.gv
## Command to generate: dot -Tpng -o listDAG.png listDAG.gv
digraph {
rankdir=LR
A [label="9"];
B [label="2"];
C [label="12"];
D [label="40"];
E [label="12"];
F [label="3"];
G [label="1"];
G->B
G->F
G->A
G->C
G->D
G->E
B->F
B->A
B->C
B->E
B->D
F->A
F->C
F->E
F->D
A->C
A->D
A->E
C->D
C->E [label="equi", fontcolor=darkgreen]
label="\nlist: [9, 2, 12, 40, 12, 3, 1] transformed to DAG by Josef Betancourt";
}

Genesis of this idea
One day I was wondering if randomness could be used to sort. The idea I had was, as it turns out, a “better” Bozosort. Instead of swapping indiscriminately, I would swap elements if they were out of order. I had other ideas. But, then I started drawing some diagrams, and it led to other ideas based on graph construction.

I gave up the use of randomization. Is the technique presented using graphs unique? Does it work? Can it scale? Or Does it add to the list of bozo sorts?