Tag Archives: data

Groovy implementation of INIX file format, part 2

Continued experimentation with the INIX “ini” file format. I take the original post code and add an alias feature.

Background
In three prior posts I presented a very simple metadata file storage approach. Pretty much using the INI file format with the sections as “heredocs.” In original INI format, afaik, the data within the sections were properties only, x=y.

Updates

  • Dec 25, 2015: Just saw a mention of Tom’s Obvious, Minimal Language (TOML) This is not directly related to what I’m after with Inix, but interesting as an example of a simple markup language.
  • Alias
    Now I am working on adding the ability for a section of data to load data from another section. The sections to load are indicated by the ‘@’ symbol and then the section path. Multiple sections can be indicated. Though I’m using the term ‘alias’ for this, perhaps a better term is ‘importing’. So far, I can parse the alias from the tag string.

    I have not implemented the actual import. One complexity left to solve is recursion. If this section imports another section, what if that section imports others sections?

    Alias use case
    Since currently the Inix file format is being used for test data, aliasing allows reuse of data without duplication, i.e., DRY. This is problematic with hierarchical data like JSON or XML, but much easier with lists or maps. Further features like overriding and interpolation would be useful for Java Properties data. The goal would be to eventually support use of the Cascading Configuration Pattern.

    Example 1

    [>First]
    Data in first    
    [<]
    
    [>Second@First]
    Data in second
    [<]
    

    Now when the data in section “Second” is loaded, the data from the aliased section is prepended to the current section data:

    Data in first    
    Data in second
    

    Tag format
    The section tag format is now: [>path#fragment@aliases?querystring]. Note that unlike a URI, the fragment does not appear at the end of the string.

    The section ID is really the path#fragment. Thus, the end tag could be [<] or can use the section ID: [<path#fragment]. Example 2

    [>demo1/deploy#two@account897@policy253?enabled=true&owner=false]
    stuff here
    [<demo1/deploy#two]
    

    Grammar
    The start of a grammar follows, but has not been ‘checked’ by attempted use of a parser generator like Antlr.

    grammar Inix;
    section: start CRLF data end;
    start: '[>' path (fragment)?(alias)*('?' args)? ']';
    end: '[<' path? ']';
    path: NAME ('/' NAME)*;
    fragment: '#' NAME;
    alias: '@' NAME
    args: (NAME=NAME ('&' NAME=NAME)*)?;
    data: (ANYTHING CRLF)*;
    NAME: ('a'..'z' | 'A'..'Z')('a' .. 'z' | 'A'..'Z'|'0'..'9'|'_');
    
    TODO:

    1. Do the actual import of aliased section data.
    2. Allow multiple params per param key: ?measure=21,measure=34,measure=90. Or better yet, just allow array in arg string: measure=[21,34,90],color=red

     

    Implementation
    Source code available at Github: https://gist.github.com/josefbetancourt/7701645

    Listing 2, Implementation

    Test class
    Note that there are not enough tests and the implementation code has not been reviewed.

    Listing 3, Test class

    The test data is:

    Listing 4, data file

    Environment
    Groovy Version: 2.2.2 JVM: 1.7.0_25 Vendor: Oracle Corporation OS: Windows 7

    Further Reading

    1. The Evolution of Config Files from INI to TOML
    2. Groovy Object Notation using ConfigSlurper
    3. Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)
    4. INI file
    5. Data File Metaformats
    6. Here document
    7. JSON configuration file format
    8. Creating External DSLs using ANTLR and Java
    9. Groovy Object Notation (GrON) for Data
      Interchange
    10. Cloanto Implementation of INI File Format
    11. http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
    12. URI
    13. Designing a simple file format
    14. The Universal Design Pattern
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

    Simple Java Data File

    In a prior post “A very simple data file metaformat” I presented an idea for an INI type of file and used an example implementation in the Groovy language.

    I rewrote it in Java. One change was the use of section terminators. I’m still not sure about specifying of the data type in the section header.

    Note the only testing of this is the embedded JUnit tests as shown in the source.
    Update: A version of this is being successfully used to store test data for a suite of unit tests. Each file holds several sections with hundreds of lines each of XML and other types.

    API

    • load(): Load all sections into a map.
    • load(String id, String subsection): Load a specific section.
    • T loadAs(final Class type, final String id, final String subsection): load a section as the given class type
    • next(): Pull event style reading.
    • void parse(final BufferedReader reader,
      final Callable callable)
      : Push event style reading.
    • void parse(final String filePath, final Runnable runnable): Push event style reading.

    Update 1 20121208T1809-5
    Java Jar manifest files have a very useful property. Attributes which are in the ‘main’ section are inherited by individual entries unless redefined in the entry. For the application I envisioned this metaformat, that would be ideal. One way of doing this is that any data contained in a section ‘x’ is also applicable or in scope for any section x/y. And, so forth, x/y/z would have the data in x/y available.

    Update 2 20130107T2000-5
    Added method Map load() throws IOException. This is more easy to use method of accessing inix file section data. This is only in the source Gist repo.

    Update 3 20130113T1300-5
    Added loadAs(…) to make loading specific types easier.

    Update 4 20131028T0939-5
    Doh, forgot about mentioning Apache Commons Configuration. It does support hierarchical INI files. However, the sections in that format only support property mappings.

    The test data file is:

    # Example very simple data file
    ! Another comment
    
    #[>map:/junk]
    #[<]
    
    [>list:credit/tests]
    one
    two
    three
    [<]
    
    [>credit/report]
    one,two,three
    [<]
    
    [>properties:credit/config]
    one=alpha
    two=beta
    three=charlie
    [<]
        [>xml:credit/beans]
    	<description>
    	    <item>one</item>
    	    <item>two</item>
    	    <item>three</item>
    	</description>
    	[<]
    	
    [>notype/sub]
    [<]	
    
    [>credit/alerts]
    ["one","two","three"]
    [<]
    [>credit]
    one
    two
    three
    [<]
    
    [>credit/coverages]
    ["one","two","three"]
    [<]
    
    
    

     

    Further Reading

    1. INI file
    2. Java Properties file format
    3. Data File Metaformats
    4. Here document
    5. JSON configuration file format
    6. Groovy Object Notation (GrON) for Data
      Interchange
    7. Cloanto Implementation of INI File Format
    8. http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
    9. URI
    10. RFC822
    11. MIME
    Click to expand implementation source
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

    A very simple data file metaformat

    What is the simplest data file metaformat you can create and yet be able to handle future complexity? I started puzzling about this yet again.

    Also see follow up post: Simple Java Data File
    An example application is given here: Java testing using XStream, AspectJ, Vsdf

    Scenario
    I had some maintenance to do. So, to reduce the big ball of mud I decided to use external data files. This is where the complexity came in. If I have d data files for each component, and c “components” then the total number of data files is d*c. Future maintenance of so many files is not optimal.

    One thing about the required data files in this scenario, some would contain lists, others would be key, value pairs, and so forth. Could these be combined into one data file? I looked at JSON, YAML, XML, and even GRON. Though good they seemed excessive. What if, for example, I needed a simple list? In a simple text file this could be stored with an item per line, or using simple separators. In the aforementioned metaformats not so.

    Solution
    I revisited the Windows INI file format and just added metadata to a section. A section, indicated with a header “[…]”, also indicates what its data format is. Also, we allow subsections: [type:identifier/section]. This is similar to a URI. The subsection, which can be a hierarchical ‘path’, is optional. The type is optional, default being list (update: text). If the file has no sections, it is just a line oriented file of data in a list. (Update: line oriented string data).

    In the original ini file format, the section data were key=value pairs. Here we follow the freedom of a HEREDOC.

    The data type indication is practical when standard collections are being created such as lists, map, arrays, and so forth. We can use a generic “text’ type for a non-typed string payload. Since a host application will know what data it is extracting from a data file, the higher level types such as XML, JSon, and others are of limited value.

    The use of subsections in the section name allows scoping, but this was also possible in the original INI file format, just not “formalized”. True subsections should probably be nested sections, i.e. hierarchy. But, then we are now losing the simplicity.

    Subsections (though not nested) allow the use of cascading data. Data in a section is automatically reused or available in matching subsections.
    See Cascading Configuration Pattern.

    Example

    # Example very simple data file
    #
    [>list:credit/tests]
    one
    two
    three
    [>csv:credit/report]
    one,two,three
    [>properties:credit/config]
    one=alpha
    two=beta
    three=charlie
    [>xml:credit/beans]
    <description>
    	<item>one</item>
    	<item>two</item>
    	<item>three</item>
    </description>
    [>json:credit/alerts]
    ["one","two","three"]
    [>credit]
    one
    two
    three
    [>gron:credit/coverages]
    ["one","two","three"]
    

     

    Section Production Rule
    **** Note: this is an incorrect production *****
    file ::= section* ;
    section ::= ‘[>’ (type:)? identifier (‘/’ subsection)? ‘]’ (data)+ sectionTerminator;
    data ::= (line lineEnd)*;
    identifier ::= name
    subsection ::= name [/name]* ;
    sectionTerminator ::= ‘[<' identifier ('/' subsection)? name ::= [a-zA-Z0-9-_]+

     

    What we have now is a line oriented data file that can contain other data formats, and with no sections the file is just a line oriented list. Listing three is a demo in the form of a Groovy language JUnit 4 test.

    Listing 2, Groovy language JUnit test as a demo
    import com.octodecillion.vsdf.*
    import static com.octodecillion.vsdf.Vsdf.EventType.*
    import org.junit.Test
    import static org.junit.Assert.assertEquals
    
    /** Test Vsdf */
    class VsdfTest /*extends GroovyTestCase*/ {
    	def LINESEP =  System.properties.get("line.separator")
    	
    	@Test
    	void testshouldGetListData(){
    		def reader = new Vsdf()
    		reader.reader = new BufferedReader(
    			new FileReader(new File("data-1.vsdf")))
    		
    		def theEvent = reader.next()		
    		
    		while(theEvent != Vsdf.EventType.END){
    			def event = reader.getEvent()
    			
    			if(isSectionCreditWithList(event)){
    				def data = event.text.split(LINESEP)				
    				assert data.size() == 3				
    				assert ( (data[0] == 'one') && 
    					     (data[1] == 'two') && 
    					     (data[2] == 'three') )				
    			}
    			
    			theEvent = reader.next()
    		}
    			
    	}	
    	
    	/** */
    	def isSectionCreditWithList(evt){		
    		return evt.id == 'credit' && evt.dataType == 'list'
    	}	
    
    }
    

     

    Sample run:

    groovy -cp . VsdfTest.groovy
    JUnit 4 Runner, Tests: 1, Failures: 0, Time: 281
    

    Limitations
    Not quite correct yet. One issue is that file encoding format. If we want to include other formats they have their own requirements, Java properties, JSON, XML, and so forth. For example, JSON is Unicode. I don’t think this is a major issue, this solution is meant for config data, so ASCII files are adequate.

    Also, should the sections have terminators? Right now, the end of a section is simply the start of another. (Update: the version of this concept in actual use is terminator based, i.e., [<] or [<id/subsection...]) Implementation
    Below in listing 3 is a very simple implementation in the Groovy language to show how easy this data file is to use. Note this is just a proof of concept and has not been thoroughly tested. I don’t think the use of mark and reset in the file reading is robust; how do you determine the correct read ahead buffer? To make it easier to parse I think the format will need to have section terminators as does HEREDOCS in Linux.

    Source code available as a gist.

    Listing 3, Groovy implementation
    // File Vsdf.groovy
    // Author: Josef Betancourt
    //
    
    package com.octodecillion.vsdf
    import groovy.transform.TypeChecked;
    import java.text.BreakDictionary;
    import java.util.regex.Matcher
    import org.codehaus.groovy.control.io.ReaderSource;
    
    /**
     * @author Josef Betancourt
     *
     */
    class Vsdf {
    	String currentFolder
    	String iniFilePath
    	Reader reader
    	int lineNum
    	int sectionNum
    	VsdfEvent data
    	int READAHEADSIZE = 8*1024
    	def LINESEP = System.properties.get("line.separator")
    
    	enum State {
    		INIT, ACCEPT, SHIFT, END
    	}
    	
    	public enum EventType {
    		COMMENT, SECTION, END
    	}
    	
    
    	def state = State.INIT
    
    	public Vsdf(){
    		currentFolder = new File(".").getAbsolutePath()
    	}
    	
    	/**
    	 * Value object for parsed sections.
    	 *  
    	 */
    	class VsdfEvent {
    		EventType event
    		String dataType
    		String namespace
    		String id
    		String text
    		int lineNum
    		int sectionNum
    		String sectionString
    	}
    	
    	VsdfEvent getEvent(){
    		return data		
    	}
    
    	/**
    	 * 
    	 * @return
    	 */
    	@TypeChecked
    	EventType next(){
    		String line = reader.readLine();
    		lineNum++
    		
    		if(line == null){
    			return EventType.END
    		}
    		
    		String type =''
    		String namespace = ''
    		String id = ''
    		data = new VsdfEvent()
    		EventType eventType
    		
    		def isBlank = !line.trim() 
    		
    		// skip blank lines	
    		if( isBlank){
    			while(true){
    				line = reader.readLine()
    				lineNum++
    				if(line  || line == null){
    					break;
    				}
    			}
    		}	
    		
    		def isComment = line =~ /^\s*#/
    		if(isComment){
    			data.text = line
    			data.event = EventType.COMMENT
    			data.lineNum = lineNum
    			eventType = EventType.COMMENT			
    		}
    		
    		if( line.trim() =~ /^\[>.*\]/){ // section?
    			eventType = EventType.SECTION
    			data.sectionString = line
    			sectionNum++
    			processSection(line, sectionNum, data)			
    		} // end if section head
    
    		return eventType 
    	}
    	
    	/**
    	 * 
    	 * @param line
    	 * @return
    	 */
    	def processSection(String line, int sectionNum, VsdfEvent data){
    		data.event = EventType.SECTION
    		data.sectionNum = sectionNum
    		
    		Matcher m = (line.trim() =~ /^\[>(.*)\]/)
    		String mString = m[0][1]
    		def current = mString.trim()
    		if(!current){
    			def msg = "section $sectionNum is blank"
    			throw new IllegalArgumentException(msg)
    		}
    		
    		def parts = (current =~ /^(.*):(.*)\/(.*)$/)
    		if(!parts){
    			data.id = current
    			data.dataType='list'
    		}else{
    			long size = ((String[])parts[0]).length
    			data.dataType = size > 0? parts[0][1] : ''
    			data.id = size > 1  ? parts[0][2] : ''
    			data.namespace = size > 2 ? parts[0][3] : ''
    		}	
    		
    		String readData = readSectionData()
    		data.text = readData	
    	}
    	
    	/**
    	 * 
    	 * @return
    	 */
    	String readSectionData(){
    		StringBuilder buffer = new StringBuilder(READAHEADSIZE)		
    
    		while(true){
    			reader.mark(READAHEADSIZE)
    			String line = reader.readLine();
    			lineNum++
    			
    			if(line == null){
    				reader.reset()
    				break
    			}
    			
    			if( line.trim() =~ /^\[>.*\]/){ // section?				
    				reader.reset()	
    				break
    			}else{
    				buffer.append(line + LINESEP)	
    			}
    		}	
    		
    		return buffer.toString()	
    	}
    }
    

    Further Reading

    1. INI file
    2. Data File Metaformats
    3. Here document
    4. JSON configuration file format
    5. Groovy Object Notation (GrON) for Data
      Interchange
    6. Cloanto Implementation of INI File Format
    7. http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
    8. URI
    9. Designing a simple file format
    10. The Universal Design Pattern
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.