What is the simplest data file metaformat you can create that can still handle future complexity? I started puzzling about this yet again.
Also see the follow-up post: Simple Java Data File
An example application is given here: Java testing using XStream, AspectJ, Vsdf
Scenario
I had some maintenance to do. So, to reduce the big ball of mud I decided to use external data files. This is where the complexity came in. If I have d data files for each component, and c “components” then the total number of data files is d*c. Future maintenance of so many files is not optimal.
One thing about the required data files in this scenario: some would contain lists, others key/value pairs, and so forth. Could these be combined into one data file? I looked at JSON, YAML, XML, and even GRON. Though good, they seemed excessive. What if, for example, I needed a simple list? In a simple text file this could be stored with one item per line, or with simple separators. In the aforementioned metaformats, not so: even a plain list needs extra syntax.
Solution
I revisited the Windows INI file format and just added metadata to a section. A section header, "[>...]", also indicates what its data format is. We also allow subsections: [>type:identifier/subsection]. This is similar to a URI. The subsection, which can be a hierarchical 'path', is optional. The type is optional too, the default being list (update: text). If the file has no sections, it is just a line-oriented file of data in a list. (Update: line-oriented string data.)
The data type indication is practical when standard collections are being created, such as lists, maps, arrays, and so forth. We can use a generic "text" type for a non-typed string payload. Since a host application will know what data it is extracting from a data file, the higher-level types such as XML, JSON, and others are of limited value.
The use of subsections in the section name allows scoping, but this was also possible in the original INI file format, just not "formalized". True subsections should probably be nested sections, i.e., a hierarchy. But then we would lose the simplicity.
Subsections (though not nested) allow the use of cascading data. Data in a section is automatically reused or available in matching subsections.
See Cascading Configuration Pattern.
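The cascading idea can be sketched in a few lines of Java. This is only an illustration under assumptions, not the post's implementation: the `Cascade` class, the `resolve` method, and the keying of sections as "identifier" or "identifier/subsection" are all hypothetical, and it assumes each section has already been parsed into key/value pairs. Values from the parent section are applied first, then each subsection level overrides them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: merges a section's data into its matching subsections. */
final class Cascade {

    /**
     * @param sections   parsed sections keyed by "identifier" or "identifier/subsection"
     * @param id         the section identifier, e.g. "credit"
     * @param subsection dotted subsection path, e.g. "config" or "a.b"; may be null
     * @return parent values overridden by each matching subsection level
     */
    static Map<String, String> resolve(Map<String, Map<String, String>> sections,
                                       String id, String subsection) {
        Map<String, String> merged = new LinkedHashMap<>();

        // Start with the data of the plain [>id] section, if present.
        Map<String, String> base = sections.get(id);
        if (base != null) {
            merged.putAll(base);
        }

        // Then apply id/a, id/a.b, ... so deeper levels win.
        if (subsection != null) {
            StringBuilder path = new StringBuilder();
            for (String part : subsection.split("\\.")) {
                if (path.length() > 0) {
                    path.append('.');
                }
                path.append(part);
                Map<String, String> level = sections.get(id + "/" + path);
                if (level != null) {
                    merged.putAll(level);
                }
            }
        }
        return merged;
    }
}
```

With a "credit" section holding one=alpha and two=beta, and a "credit/config" section holding only two=override, resolving "credit" with subsection "config" would yield one=alpha and two=override.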
Example
# Example very simple data file
#
[>list:credit/tests]
one
two
three
[>csv:credit/report]
one,two,three
[>properties:credit/config]
one=alpha
two=beta
three=charlie
[>xml:credit/beans]
<description>
  <item>one</item>
  <item>two</item>
  <item>three</item>
</description>
[>json:credit/alerts]
["one","two","three"]
[>credit]
one
two
three
[>gron:credit/coverages]
["one","two","three"]
file ::= section* ;
section ::= sectionStart (type ':')? identifier ('/' subsection)? sectionEnd ;
sectionStart ::= '[>' ;
type ::= name ;
identifier ::= name ;
subsection ::= name ('.' name)* ;
sectionEnd ::= ']' ;
name ::= [a-zA-Z0-9_-]+ ;
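As a rough check of the grammar, here is a minimal Java sketch that parses a single section header line with a regular expression. The `SectionHeader` class and its fields are hypothetical names used only for illustration. The sketch assumes the default type is "text", following the update noted above (the Groovy proof of concept later in this post defaults to "list").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for a parsed header: [>type:identifier/subsection]. */
final class SectionHeader {
    private static final String NAME = "[A-Za-z0-9_-]+";

    // type and subsection are optional; subsection may be a dotted path.
    private static final Pattern HEADER = Pattern.compile(
        "^\\[>\\s*(?:(" + NAME + "):)?(" + NAME + ")"
        + "(?:/(" + NAME + "(?:\\." + NAME + ")*))?\\s*\\]$");

    final String type;        // defaults to "text" when omitted (assumption)
    final String id;
    final String subsection;  // null when omitted

    private SectionHeader(String type, String id, String subsection) {
        this.type = type;
        this.id = id;
        this.subsection = subsection;
    }

    /** Returns the parsed header, or null if the line is not a section header. */
    static SectionHeader parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String type = (m.group(1) != null) ? m.group(1) : "text";
        return new SectionHeader(type, m.group(2), m.group(3));
    }
}
```

Calling `SectionHeader.parse("[>properties:credit/config]")` gives type "properties", id "credit", and subsection "config", while a data line such as "one,two,three" returns null.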
What we have now is a line-oriented data file that can contain other data formats, and with no sections the file is just a line-oriented list. The following listing is a demo in the form of a Groovy language JUnit 4 test.
import com.octodecillion.vsdf.*
import static com.octodecillion.vsdf.Vsdf.EventType.*
import org.junit.Test
import static org.junit.Assert.assertEquals

/** Test Vsdf */
class VsdfTest /*extends GroovyTestCase*/ {
    def LINESEP = System.properties.get("line.separator")

    @Test
    void testshouldGetListData(){
        def reader = new Vsdf()
        reader.reader = new BufferedReader(
            new FileReader(new File("data-1.vsdf")))
        def found = false

        try {
            // pull events until the end of the file
            def theEvent = reader.next()
            while(theEvent != Vsdf.EventType.END){
                def event = reader.getEvent()
                if(isSectionCreditWithList(event)){
                    found = true
                    def data = event.text.split(LINESEP)
                    assert data.size() == 3
                    assert ( (data[0] == 'one') &&
                        (data[1] == 'two') &&
                        (data[2] == 'three') )
                }
                theEvent = reader.next()
            }
        } finally {
            reader.reader.close()
        }

        // guard against the test passing without seeing the section
        assert found
    }

    /** True if the event is a 'credit' section holding list data. */
    def isSectionCreditWithList(evt){
        return evt.id == 'credit' && evt.dataType == 'list'
    }
}
Sample run:
groovy -cp . VsdfTest.groovy
JUnit 4 Runner, Tests: 1, Failures: 0, Time: 281
Limitations
Not quite correct yet. One issue is the file encoding. If we want to include other formats, they have their own requirements: Java properties, JSON, XML, and so forth. For example, JSON is Unicode. I don't think this is a major issue; this solution is meant for config data, so ASCII files are adequate.
Also, should the sections have terminators? Right now, the end of a section is simply the start of another. (Update: the version of this concept in actual use is terminator based, i.e., [<] or [<id/subsection...])
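To illustrate why terminators simplify parsing, here is a small Java sketch that reads a section body up to a terminator line. It is only a sketch under assumptions, not the version in actual use: the `TerminatedSectionReader` class is hypothetical, and it treats any trimmed line that starts with "[<" and ends with "]" as the terminator. No read-ahead or mark and reset is needed, because the reader never has to push a line back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for terminator-based sections. */
final class TerminatedSectionReader {

    /**
     * Reads body lines until a terminator such as "[<]" or "[<id/subsection]".
     * The reader must be positioned just after the section header line.
     */
    static List<String> readBody(BufferedReader reader) {
        List<String> body = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("[<") && trimmed.endsWith("]")) {
                    return body; // terminator consumed, section complete
                }
                body.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Reaching end of file without a terminator is treated as an error here.
        throw new IllegalStateException("section is missing its [<...] terminator");
    }
}
```

One design consequence is that a body line which itself looks like a terminator cannot be stored as data, the same trade-off a here document makes with its delimiter.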
Implementation
Below is a very simple implementation in the Groovy language to show how easy this data file is to use. Note this is just a proof of concept and has not been thoroughly tested. I don't think the use of mark and reset in the file reading is robust; how do you determine the correct read-ahead buffer size? To make it easier to parse, I think the format will need section terminators, as here documents have in Unix shells.
Source code available as a gist.
// File Vsdf.groovy
// Author: Josef Betancourt
//
package com.octodecillion.vsdf

import groovy.transform.TypeChecked
import java.util.regex.Matcher

/**
 * Very simple data file (Vsdf) reader. Proof of concept.
 *
 * @author Josef Betancourt
 */
class Vsdf {
    String currentFolder
    String iniFilePath
    Reader reader
    int lineNum
    int sectionNum
    VsdfEvent data
    int READAHEADSIZE = 8*1024
    def LINESEP = System.properties.get("line.separator")

    enum State {
        INIT, ACCEPT, SHIFT, END
    }

    public enum EventType {
        COMMENT, SECTION, END
    }

    def state = State.INIT

    public Vsdf(){
        currentFolder = new File(".").getAbsolutePath()
    }

    /**
     * Value object for parsed sections.
     */
    class VsdfEvent {
        EventType event
        String dataType
        String namespace
        String id
        String text
        int lineNum
        int sectionNum
        String sectionString
    }

    VsdfEvent getEvent(){
        return data
    }

    /**
     * Reads the next comment or section from the reader.
     *
     * @return the type of event read, or END at end of file
     */
    @TypeChecked
    EventType next(){
        String line = reader.readLine()
        lineNum++

        // skip blank (empty or whitespace-only) lines
        while(line != null && !line.trim()){
            line = reader.readLine()
            lineNum++
        }

        // end of file, possibly reached while skipping blank lines
        if(line == null){
            return EventType.END
        }

        data = new VsdfEvent()
        EventType eventType

        def isComment = line =~ /^\s*#/
        if(isComment){
            data.text = line
            data.event = EventType.COMMENT
            data.lineNum = lineNum
            eventType = EventType.COMMENT
        }

        if( line.trim() =~ /^\[>.*\]/){ // section?
            eventType = EventType.SECTION
            data.sectionString = line
            sectionNum++
            processSection(line, sectionNum, data)
        } // end if section head

        return eventType
    }

    /**
     * Parses a section header and reads the section's data into the event.
     *
     * @param line the section header line
     * @param sectionNum ordinal of this section in the file
     * @param data the event to populate
     */
    def processSection(String line, int sectionNum, VsdfEvent data){
        data.event = EventType.SECTION
        data.sectionNum = sectionNum

        Matcher m = (line.trim() =~ /^\[>(.*)\]/)
        String mString = m[0][1]
        def current = mString.trim()
        if(!current){
            def msg = "section $sectionNum is blank"
            throw new IllegalArgumentException(msg)
        }

        // [>type:identifier/subsection]; type and subsection are both optional
        def parts = (current =~ /^(?:([^:\/]*):)?([^\/]*)(?:\/(.*))?$/)
        if(!parts){
            data.id = current
            data.dataType = 'list'
        }else{
            data.dataType = parts[0][1] ?: 'list'
            data.id = parts[0][2]
            data.namespace = parts[0][3] ?: ''
        }

        String readData = readSectionData()
        data.text = readData
    }

    /**
     * Reads lines up to the next section header or end of file.
     * Relies on mark/reset, so a line longer than READAHEADSIZE
     * will make the reset fail.
     *
     * @return the section data, one LINESEP-terminated line per data line
     */
    String readSectionData(){
        StringBuilder buffer = new StringBuilder(READAHEADSIZE)
        while(true){
            reader.mark(READAHEADSIZE)
            String line = reader.readLine()
            if(line == null){
                break
            }

            if( line.trim() =~ /^\[>.*\]/){ // next section?
                reader.reset() // push the header back; next() will count it
                break
            }

            lineNum++
            buffer.append(line + LINESEP)
        }

        return buffer.toString()
    }
}
Further Reading
- INI file
- Data File Metaformats
- Here document
- JSON configuration file format
- Groovy Object Notation (GrON) for Data Interchange
- Cloanto Implementation of INI File Format
- http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
- URI