A very simple data file metaformat

What is the simplest data file metaformat you can create that can still handle future complexity? I started puzzling about this yet again.

Also see the follow-up post: Simple Java Data File
An example application is given here: Java testing using XStream, AspectJ, Vsdf

Scenario
I had some maintenance to do, so to reduce the big ball of mud I decided to use external data files. This is where the complexity came in. If there are c “components” and each has d data files, then the total number of data files is d*c. Maintaining that many files in the future is not optimal.

The required data files in this scenario vary: some would contain lists, others would hold key/value pairs, and so forth. Could these be combined into one data file? I looked at JSON, YAML, XML, and even GRON. Though good, they seemed excessive. What if, for example, I needed a simple list? In a simple text file this could be stored with one item per line, or with simple separators. In the aforementioned metaformats it is not that simple.

Solution
I revisited the Windows INI file format and just added metadata to a section. A section header, written “[>...]”, also indicates the data format of the section. Subsections are allowed as well: [>type:identifier/subsection]. This is similar to a URI. The subsection, which can be a hierarchical ‘path’, is optional. The type is also optional; the default is list (update: text). If the file has no sections, it is just a line-oriented file of data in a list (update: line-oriented string data).
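
As an illustration of the header layout described above, here is a minimal Groovy sketch that splits a header into type, identifier, and subsection. The helper name parseHeader and its regular expression are assumptions made for this example, not part of the Vsdf class in Listing 3, and the missing-type default follows the “text” update.

```groovy
// Hypothetical helper (not part of the original Vsdf class).
// Splits a header such as "[>list:credit/tests]" into its three parts.
def parseHeader(String line) {
    // optional "type:", required identifier, optional "/subsection/path"
    def m = (line.trim() =~ /^\[>\s*(?:([^:\/\]]+):)?([^\/\]]+)(?:\/([^\]]+))?\s*\]$/)
    if (!m.matches()) {
        return null                       // not a section header
    }
    [type      : m.group(1) ?: 'text',    // assumed default when no type is given
     id        : m.group(2),
     subsection: m.group(3) ?: '']        // empty when there is no subsection
}

assert parseHeader('[>list:credit/tests]') == [type: 'list', id: 'credit', subsection: 'tests']
assert parseHeader('[>credit]') == [type: 'text', id: 'credit', subsection: '']
assert parseHeader('one=alpha') == null
```

Making both the type and the subsection optional in a single pattern keeps the header a one-line parse, which is in the spirit of the format.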

In the original INI file format, the section data were key=value pairs. Here we follow the freedom of a HEREDOC.

The data type indication is practical when standard collections such as lists, maps, and arrays are being created. We can use a generic “text” type for an untyped string payload. Since a host application will know what data it is extracting from a data file, the higher-level types such as XML, JSON, and others are of limited value.
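
To make the point about standard collections concrete, the following Groovy sketch turns a section's raw text into a collection according to its declared type. The function name toCollection and the choice of result types are assumptions for illustration only; a host application would pick its own mapping.

```groovy
// Hypothetical helper: convert a section payload based on its declared type.
def toCollection(String type, String text) {
    def lines = text.readLines().findAll { it.trim() }   // drop blank lines
    switch (type) {
        case 'list':
            return lines                                  // one item per line
        case 'csv':
            return lines.collect { it.split(',')*.trim() } // one list per row
        case 'properties':
            def props = new Properties()
            props.load(new StringReader(text))
            return props.collectEntries { k, v -> [(k): v] } // plain Map copy
        default:
            return text                                   // untyped "text" payload
    }
}

assert toCollection('list', "one\ntwo\nthree\n") == ['one', 'two', 'three']
assert toCollection('properties', "one=alpha\ntwo=beta\n") == [one: 'alpha', two: 'beta']
```

Anything not recognized falls through as plain text, which matches the idea that richer payloads such as XML or JSON are left to the host application.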

The use of subsections in the section name allows scoping. This was also possible in the original INI file format, just not “formalized”. True subsections should probably be nested sections, i.e., a hierarchy, but then we lose the simplicity.

Subsections (though not nested) allow the use of cascading data. Data in a section is automatically reused or available in matching subsections.
See Cascading Configuration Pattern.
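
One way to read the cascading idea is sketched below in Groovy: the data for a path such as credit/tests is built by merging every matching parent section, with the more specific section overriding. The cascade function, the section map layout, and the sample keys are all assumptions for this example; the original post does not specify the merge rules.

```groovy
// Hypothetical sketch: 'sections' maps a section path to its key/value data.
def cascade(Map<String, Map> sections, String path) {
    def merged = [:]
    def prefix = ''
    path.split('/').each { part ->
        // plain String concatenation, so the map lookup key is a String
        prefix = prefix ? prefix + '/' + part : part
        merged.putAll(sections[prefix] ?: [:])   // later (deeper) values win
    }
    merged
}

def sections = [
    'credit'      : [timeout: '30', mode: 'batch'],
    'credit/tests': [mode: 'unit']
]

assert cascade(sections, 'credit/tests') == [timeout: '30', mode: 'unit']
assert cascade(sections, 'credit/report') == [timeout: '30', mode: 'batch']
```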

Example

# Example very simple data file
#
[>list:credit/tests]
one
two
three
[>csv:credit/report]
one,two,three
[>properties:credit/config]
one=alpha
two=beta
three=charlie
[>xml:credit/beans]
<description>
	<item>one</item>
	<item>two</item>
	<item>three</item>
</description>
[>json:credit/alerts]
["one","two","three"]
[>credit]
one
two
three
[>gron:credit/coverages]
["one","two","three"]

 

Section Production Rule
**** Note: this is an incorrect production *****
file ::= section* ;
section ::= '[>' (type ':')? identifier ('/' subsection)? ']' data sectionTerminator ;
data ::= (line lineEnd)* ;
type ::= name ;
identifier ::= name ;
subsection ::= name ('/' name)* ;
sectionTerminator ::= '[<' identifier ('/' subsection)? ']' ;
name ::= [a-zA-Z0-9_-]+ ;

 

What we have now is a line-oriented data file that can contain other data formats; with no sections, the file is just a line-oriented list. Listing 2 is a demo in the form of a Groovy language JUnit 4 test.

Listing 2, Groovy language JUnit test as a demo

import com.octodecillion.vsdf.*
import static com.octodecillion.vsdf.Vsdf.EventType.*
import org.junit.Test
import static org.junit.Assert.assertEquals

/** Test Vsdf */
class VsdfTest /*extends GroovyTestCase*/ {
	def LINESEP =  System.properties.get("line.separator")
	
	@Test
	void testshouldGetListData(){
		def reader = new Vsdf()
		reader.reader = new BufferedReader(
			new FileReader(new File("data-1.vsdf")))

		try{
			// pull events until the parser signals the end of the file
			def theEvent = reader.next()

			while(theEvent != Vsdf.EventType.END){
				def event = reader.getEvent()

				if(isSectionCreditWithList(event)){
					// the section text holds one item per line
					def data = event.text.split(LINESEP)
					assert data.size() == 3
					assert data[0] == 'one'
					assert data[1] == 'two'
					assert data[2] == 'three'
				}

				theEvent = reader.next()
			}
		}finally{
			// release the underlying file handle
			reader.reader.close()
		}
	}
	
	/** True when the event is a list-typed section whose id is 'credit'. */
	def isSectionCreditWithList(evt){		
		return evt.id == 'credit' && evt.dataType == 'list'
	}	

}

 

Sample run:

groovy -cp . VsdfTest.groovy
JUnit 4 Runner, Tests: 1, Failures: 0, Time: 281

Limitations
Not quite correct yet. One issue is the file encoding. If we want to include other formats (Java properties, JSON, XML, and so forth), each has its own requirements. For example, JSON is Unicode text. I don’t think this is a major issue: this solution is meant for config data, so ASCII files are adequate.

Also, should the sections have terminators? Right now, the end of a section is simply the start of another. (Update: the version of this concept in actual use is terminator based, i.e., [<] or [<id/subsection...])
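
A terminator-based reader can be sketched in a few lines of Groovy. This is only an assumption of how such a variant might look, based on the [<] and [<id/subsection...] terminators mentioned above; the function name readSections and the choice to key the result by the raw header text are illustrative.

```groovy
// Hypothetical sketch: collect section bodies from terminator-delimited input.
def readSections(Reader input) {
    def sections = [:]
    String current = null        // header text of the open section, if any
    def body = []
    input.eachLine { String line ->
        def t = line.trim()
        if (current == null && t ==~ /\[>.+\]/) {
            current = t[2..-2]   // strip the leading "[>" and trailing "]"
            body = []
        } else if (current != null && t ==~ /\[<.*\]/) {
            sections[current] = body   // terminator closes the section
            current = null
        } else if (current != null) {
            body << line         // payload line, kept verbatim
        }
        // lines outside any section (comments, blanks) are ignored here
    }
    sections
}

def text = '''[>list:credit/tests]
one
two
[<]
[>json:credit/alerts]
["one","two"]
[<credit/alerts]
'''
def parsed = readSections(new StringReader(text))
assert parsed['list:credit/tests'] == ['one', 'two']
assert parsed['json:credit/alerts'] == ['["one","two"]']
```

With explicit terminators the reader never has to look ahead, so no read-ahead buffer is needed.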

Implementation
Below in listing 3 is a very simple implementation in the Groovy language to show how easy this data file is to use. Note that this is just a proof of concept and has not been thoroughly tested. I don’t think the use of mark and reset in the file reading is robust; how do you determine the correct read-ahead buffer size? To make it easier to parse, I think the format will need section terminators, as HEREDOCs in Linux have.

Source code available as a gist.

Listing 3, Groovy implementation

// File Vsdf.groovy
// Author: Josef Betancourt
//

package com.octodecillion.vsdf
import groovy.transform.TypeChecked
import java.util.regex.Matcher

/**
 * @author Josef Betancourt
 *
 */
class Vsdf {
	String currentFolder
	String iniFilePath
	Reader reader
	int lineNum
	int sectionNum
	VsdfEvent data
	int READAHEADSIZE = 8*1024
	def LINESEP = System.properties.get("line.separator")

	enum State {
		INIT, ACCEPT, SHIFT, END
	}
	
	public enum EventType {
		COMMENT, SECTION, END
	}
	

	def state = State.INIT

	public Vsdf(){
		currentFolder = new File(".").getAbsolutePath()
	}
	
	/**
	 * Value object for parsed sections.
	 *  
	 */
	class VsdfEvent {
		EventType event
		String dataType
		String namespace
		String id
		String text
		int lineNum
		int sectionNum
		String sectionString
	}
	
	VsdfEvent getEvent(){
		return data		
	}

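	/**
	 * Reads the next event (comment or section) from the reader.
	 * The parsed details are then available through getEvent().
	 * @return the type of the event read, or EventType.END at end of file
	 */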
	@TypeChecked
	EventType next(){
		String line = reader.readLine();
		lineNum++
		
		if(line == null){
			return EventType.END
		}
		
		String type =''
		String namespace = ''
		String id = ''
		data = new VsdfEvent()
		EventType eventType
		
		// skip blank lines; stop at the first non-blank line or at end of file
		while(line != null && !line.trim()){
			line = reader.readLine()
			lineNum++
		}

		// trailing blank lines can run into the end of the file
		if(line == null){
			return EventType.END
		}
		
		def isComment = line =~ /^\s*#/
		if(isComment){
			data.text = line
			data.event = EventType.COMMENT
			data.lineNum = lineNum
			eventType = EventType.COMMENT			
		}
		
		if( line.trim() =~ /^\[>.*\]/){ // section?
			eventType = EventType.SECTION
			data.sectionString = line
			sectionNum++
			processSection(line, sectionNum, data)			
		} // end if section head

		return eventType 
	}
	
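	/**
	 * Parses a section header line and reads the section's data.
	 * @param line the header line, e.g. [>list:credit/tests]
	 * @param sectionNum ordinal of this section within the file
	 * @param data the event object to fill in
	 */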
	def processSection(String line, int sectionNum, VsdfEvent data){
		data.event = EventType.SECTION
		data.sectionNum = sectionNum
		
		Matcher m = (line.trim() =~ /^\[>(.*)\]/)
		String mString = m[0][1]
		def current = mString.trim()
		if(!current){
			def msg = "section $sectionNum is blank"
			throw new IllegalArgumentException(msg)
		}
		
		// both the type and the subsection are optional: [>type:identifier/subsection]
		def parts = (current =~ /^(?:([^:\/]+):)?([^\/]+)(?:\/(.*))?$/)
		if(!parts){
			data.id = current
			data.dataType = 'list'
		}else{
			data.dataType = parts[0][1] ?: 'list'
			data.id = parts[0][2]
			data.namespace = parts[0][3] ?: ''
		}
		
		String readData = readSectionData()
		data.text = readData	
	}
	
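	/**
	 * Reads lines up to the next section header or the end of the file.
	 * @return the section data, one line per LINESEP-terminated entry
	 */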
	String readSectionData(){
		StringBuilder buffer = new StringBuilder(READAHEADSIZE)		

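		while(true){
			// mark the position so a following section header can be un-read
			reader.mark(READAHEADSIZE)
			String line = reader.readLine()

			if(line == null){ // end of file
				break
			}

			if( line.trim() =~ /^\[>.*\]/){ // section?
				reader.reset()
				break
			}

			// count only the lines that are actually consumed
			lineNum++
			buffer.append(line).append(LINESEP)
		}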
		
		return buffer.toString()	
	}
}
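
The mark/reset concern raised in the implementation notes could be sidestepped by keeping a single line of pushback instead of a read-ahead buffer. The Groovy sketch below shows the idea; the PushbackLineReader class and this standalone readSectionData function are hypothetical and are not part of Listing 3.

```groovy
// Hypothetical sketch: a line reader that can "un-read" exactly one line,
// so no read-ahead buffer size has to be guessed.
class PushbackLineReader {
    private final BufferedReader reader
    private String pending              // the line that was pushed back
    private boolean hasPending = false

    PushbackLineReader(Reader source) {
        reader = new BufferedReader(source)
    }

    String readLine() {
        if (hasPending) {
            hasPending = false
            return pending
        }
        reader.readLine()
    }

    void unread(String line) {
        pending = line
        hasPending = true
    }
}

// Reads data lines until the next section header, which is pushed back.
String readSectionData(PushbackLineReader input) {
    def buffer = new StringBuilder()
    String line
    while ((line = input.readLine()) != null) {
        if (line.trim() ==~ /\[>.*\]/) {
            input.unread(line)          // leave the header for the caller
            break
        }
        buffer.append(line).append('\n')
    }
    buffer.toString()
}

def input = new PushbackLineReader(new StringReader("one\ntwo\n[>list:next]\nthree\n"))
assert readSectionData(input) == "one\ntwo\n"
assert input.readLine() == '[>list:next]'
```

Because only whole lines are pushed back, the approach works regardless of how long a data line is.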

Further Reading

  1. INI file
  2. Data File Metaformats
  3. Here document
  4. JSON configuration file format
  5. Groovy Object Notation (GrON) for Data Interchange
  6. Cloanto Implementation of INI File Format
  7. http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
  8. URI
  9. Designing a simple file format
  10. The Universal Design Pattern

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
