Category Archives: storage

Groovy implementation of INIX file format, part 2

Continued experimentation with the INIX “ini” file format. I take the original post code and add an alias feature.

Background
    In three prior posts I presented a very simple metadata file storage approach: essentially the INI file format, with the sections treated as “heredocs.” In the original INI format, afaik, the data within a section were properties only (x=y).

Updates

  • Dec 25, 2015: Just saw a mention of Tom’s Obvious, Minimal Language (TOML). This is not directly related to what I’m after with Inix, but it is interesting as an example of a simple markup language.
  • Alias
    Now I am working on adding the ability for a section of data to load data from another section. The sections to load are indicated by the ‘@’ symbol and then the section path. Multiple sections can be indicated. Though I’m using the term ‘alias’ for this, perhaps a better term is ‘importing’. So far, I can parse the alias from the tag string.

    I have not implemented the actual import. One complexity left to solve is recursion. If this section imports another section, what if that section imports other sections?

    Alias use case
    Since currently the Inix file format is being used for test data, aliasing allows reuse of data without duplication, i.e., DRY. This is problematic with hierarchical data like JSON or XML, but much easier with lists or maps. Further features like overriding and interpolation would be useful for Java Properties data. The goal would be to eventually support use of the Cascading Configuration Pattern.

    Example 1

    [>First]
    Data in first    
    [<]
    
    [>Second@First]
    Data in second
    [<]
    

    Now when the data in section “Second” is loaded, the data from the aliased section is prepended to the current section data:

    Data in first    
    Data in second
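
The import itself is not implemented in the post, so the following is only a minimal sketch of how the recursion question might be handled, written in Python rather than Groovy. It assumes the sections have already been parsed into a hypothetical dictionary that maps a section name to its alias list and data lines; the names `resolve` and `sections` are illustrative and are not taken from the Gist. Aliased data is prepended depth-first, and a chain that loops back on itself is rejected.

```python
def resolve(name, sections, active=()):
    """Return the data lines of `name` with all aliased sections prepended.

    `sections` maps a section name to (aliases, lines). `active` holds the
    chain of sections currently being resolved, so a cycle can be detected.
    """
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError("alias cycle: " + chain)
    aliases, lines = sections[name]
    merged = []
    for alias in aliases:
        # Recurse first, so imports of imports are expanded as well.
        merged.extend(resolve(alias, sections, active + (name,)))
    return merged + list(lines)


# Hypothetical in-memory form of Example 1.
sections = {
    "First": ([], ["Data in first"]),
    "Second": (["First"], ["Data in second"]),
}

print("\n".join(resolve("Second", sections)))
```

Resolving on demand like this keeps the stored sections unchanged. Whether a section reached through two different aliases should be included once or twice is a design decision this sketch does not settle; it includes it each time.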
    

    Tag format
    The section tag format is now: [>path#fragment@aliases?querystring]. Note that unlike a URI, the fragment does not appear at the end of the string.

    The section ID is really the path#fragment. Thus, the end tag could be [<] or can use the section ID: [<path#fragment].

    Example 2

    [>demo1/deploy#two@account897@policy253?enabled=true&owner=false]
    stuff here
    [<demo1/deploy#two]
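
As an illustration of the tag format, here is a small Python sketch that splits a start tag into its parts with a regular expression. It is not the Groovy code from the Gist; the pattern, the function name `parse_start_tag`, and the shape of the returned dictionary are assumptions made for this example.

```python
import re

# [>path#fragment@alias1@alias2?key=value&key=value]
TAG = re.compile(
    r"^\[>(?P<path>[^#@?\]]+)"        # path, up to '#', '@', '?' or ']'
    r"(?:#(?P<fragment>[^@?\]]+))?"   # optional fragment
    r"(?P<aliases>(?:@[^@?\]]+)*)"    # zero or more @alias entries
    r"(?:\?(?P<query>[^\]]*))?\]$"    # optional query string
)


def parse_start_tag(tag):
    """Split a section start tag into id, path, fragment, aliases and args."""
    match = TAG.match(tag.strip())
    if match is None:
        raise ValueError("not a section start tag: %r" % tag)
    path = match.group("path")
    fragment = match.group("fragment")
    aliases = [a for a in match.group("aliases").split("@") if a]
    args = {}
    for pair in filter(None, (match.group("query") or "").split("&")):
        key, _, value = pair.partition("=")
        args[key] = value
    section_id = path + ("#" + fragment if fragment else "")
    return {
        "id": section_id,
        "path": path,
        "fragment": fragment,
        "aliases": aliases,
        "args": args,
    }


tag = "[>demo1/deploy#two@account897@policy253?enabled=true&owner=false]"
print(parse_start_tag(tag))
```

The `id` value is the path#fragment string, which is what a named end tag such as [<demo1/deploy#two] would be compared against.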
    

    Grammar
    The start of a grammar follows. It has not yet been checked by running it through a parser generator such as ANTLR.

    grammar Inix;
    section: start CRLF data end;
    start: '[>' path (fragment)?(alias)*('?' args)? ']';
    end: '[<' (path fragment?)? ']';
    path: NAME ('/' NAME)*;
    fragment: '#' NAME;
    alias: '@' NAME;
    args: (NAME '=' NAME ('&' NAME '=' NAME)*)?;
    data: (ANYTHING CRLF)*;
    NAME: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;
    
    TODO:

    1. Do the actual import of aliased section data.
    2. Allow multiple params per param key: ?measure=21,measure=34,measure=90. Or better yet, just allow array in arg string: measure=[21,34,90],color=red
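
For the second TODO item, one possible approach is sketched below in Python. It is only an assumption about how the argument string could be read: both the repeated-key form and the bracketed-array form are accepted, and every key maps to a list of values. The function name `parse_args` and the choice to accept either ',' or '&' between pairs are made up for this example.

```python
import re

# key=value or key=[v1,v2,...]; pairs may be separated by ',' or '&'.
PAIR = re.compile(r"(\w+)=(\[[^\]]*\]|[^,&\[\]]*)")


def parse_args(text):
    """Parse an argument string into a dict of key -> list of values."""
    args = {}
    for key, raw in PAIR.findall(text):
        if raw.startswith("["):
            # Bracketed array: strip the brackets and split on commas.
            values = [v.strip() for v in raw[1:-1].split(",") if v.strip()]
        else:
            values = [raw]
        args.setdefault(key, []).extend(values)
    return args


print(parse_args("measure=[21,34,90],color=red"))
print(parse_args("measure=21,measure=34,measure=90"))
```

Returning a list for every key, even single-valued ones, keeps the calling code uniform at the cost of an extra index when only one value is expected.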

     

    Implementation
    Source code available at Github: https://gist.github.com/josefbetancourt/7701645

    Listing 2, Implementation

    Test class
    Note that there are not enough tests and the implementation code has not been reviewed.

    Listing 3, Test class

    The test data is:

    Listing 4, data file

    Environment
    Groovy Version: 2.2.2 JVM: 1.7.0_25 Vendor: Oracle Corporation OS: Windows 7

    Further Reading

    1. The Evolution of Config Files from INI to TOML
    2. Groovy Object Notation using ConfigSlurper
    3. Configuration Files Are Just Another Form of Message Passing (or Maybe Vice Versa)
    4. INI file
    5. Data File Metaformats
    6. Here document
    7. JSON configuration file format
    8. Creating External DSLs using ANTLR and Java
    9. Groovy Object Notation (GrON) for Data Interchange
    10. Cloanto Implementation of INI File Format
    11. http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
    12. URI
    13. Designing a simple file format
    14. The Universal Design Pattern
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

    Private P2P HSM, a network for optimal virtual storage

    In “Hierarchical Storage Management, from drive to cloud” I wrote about the idea of applying HSM by using a central storage or Cloud model. Here I discuss the next stage, using Peer-to-peer technology to create private clouds. The central remote cloud services are then useful as the expensive slow end point of data migration.

    Background
    Hierarchical storage management (HSM) is a system of using layers or tiers of storage resources to migrate files according to various criteria. Data would automatically be migrated from RAM to Solid-State Drives, to disks, and finally to tape. This is an old technology, around since the 1970s, and primarily used in enterprises. However, various forms of HSM have been used in consumer products, services, or at the OS level. One example is a new service such as Bitcasa.

    P2P Synchronization
    The use of P2P to synchronize storage to one’s own devices got a big boost with BitTorrent Sync (BTSync) created by BitTorrent. BTSync, though not a replacement for something like DropBox, offers an additional model to offset some limitations of cloud storage.

    P2P HSM
    BTSync showed that Peer-to-peer (P2P) sync works and is useful. Could this approach be used as a base to provide not only synchronization but also migration of files to different tiers? Call it peer-to-peer hierarchical storage management (p2pHSM).

    With this approach, the private storage hierarchy is a true cloud, a private network of storage resources. This private cloud could also be connected to traditional cloud storage vendors, like BitCasa or DropBox, creating a hierarchy of storage Clouds. A further optimization is for the private P2P HSM to arbitrate or bid with multiple external Cloud providers for best rates and other criteria.

    Scenario 1
    Your extended family and real friends provide many computing devices to the private cloud: smartphones, NAS, USB drives, STBs, vehicles, laptops, PCs, tablets, and so forth. Each of these devices has limited storage capacity and bandwidth.

    Each device allocates a percentage of its storage and bandwidth to the private cloud. The storage is secure and private, and its content is available via access control permissions (ACL). For example, children cannot access their parents’ content.

    1. You’re walking outside wearing a head-mounted device, like Google Glass, when a hummingbird flies into view. You turn on Record and create a video.
    2. The video gets transmitted to your mobile device.
    3. Your device notes that you are running out of space and a P2P storage request is made to the network of devices available.
    4. A high-end mobile phone is found that has extra space on a memory card.
    5. The original mobile device sends the video to that device.
    6. A week later, no one is viewing that video anymore so the system migrates that video to other storage.
    7. The next level is the laptop since it has plenty of room and is faster.
    8. A month later the video is again migrated to the family PC which has plenty of free space.
    9. Two months later it is again migrated to the family’s external cloud storage service if the monetary rules configured allow it.
    p2pHSM

    At every file migration, the system seeks the optimization of price, performance, access, security, and capacity. Only when the private cloud is approaching certain limits or for backup purposes is the top, slowest, and most expensive tier, the conventional Cloud, used.
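
The migration decision in the scenario can be sketched as a small selection function. This is a toy model in Python, not an existing p2pHSM implementation: the `Tier` fields, the idle thresholds (7, 30, and 90 days), the capacities, and the cloud price are all assumptions chosen to mirror the steps above. Only capacity, idle time, and monetary cost are modeled; performance, access, and security are left out.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class Tier:
    name: str
    free_bytes: int
    cost_per_gb: float   # 0.0 for devices the family already owns
    min_idle_days: int   # a file must be idle at least this long to land here


def pick_tier(tiers, size_bytes, idle_days, max_cost_per_gb=0.0):
    """Pick the coldest tier that has room and fits the monetary rule."""
    eligible = [
        t for t in tiers
        if t.free_bytes >= size_bytes
        and t.min_idle_days <= idle_days
        and t.cost_per_gb <= max_cost_per_gb
    ]
    if not eligible:
        return None
    # Prefer the highest idle threshold; on a tie, the cheaper tier wins.
    return max(eligible, key=lambda t: (t.min_idle_days, -t.cost_per_gb))


# Hypothetical tiers for Scenario 1.
TIERS = [
    Tier("phone memory card", 8 * GB, 0.0, 0),
    Tier("laptop", 200 * GB, 0.0, 7),
    Tier("family PC", 1000 * GB, 0.0, 30),
    Tier("external cloud", 10000 * GB, 0.05, 90),
]

video = 2 * GB
for days in (0, 10, 45, 120):
    tier = pick_tier(TIERS, video, days, max_cost_per_gb=0.10)
    print(days, "days idle ->", tier.name)
```

With `max_cost_per_gb` left at 0.0, the external cloud is never chosen, which corresponds to the "if the monetary rules configured allow it" condition in step 9.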

    Issues
    Of course, the history of file sharing systems is full of controversy. Some would say that a private P2P adds to the problems, notwithstanding that there are already plenty of public P2P networks and massive file sharing already occurring. Some issues are:

    • Security
    • Privacy
    • Copyrights and fair use
    • DRM
    • Is data accessed at the “Migration Level”, or migrated back?


    BitTorrent Sync on mobile

    Just tested the new free file share and sync software from BitTorrent Labs, BitTorrent Sync. I shared files between my home PC and my mobile phone, Samsung Note. It works and was easy to set up and use.

    Test
    1. Installed BTS on the Windows 7 PC.
    2. Copied some files into the shared folder.
    3. Installed BTS on the Android phone.
    4. Added a new folder to share.
    5. On the PC, copied the secret code (a very long alphanumeric string) to the BTS running on the mobile phone.
    6. Saw the files from the PC.
    7. On the PC, copied a new file into the shared folder.
    8. Saw that file on the mobile device.

    Sweet.

    One thing I noticed is that if you turn off WiFi you lose connectivity. I saw no settings on the mobile device for turning on 4G use.

    About
    You can read all about BTS on their official site or search the web. In a nutshell, it allows you to share your own files among your computers and devices without using a central server. What that gives you is security, privacy, speed, and no size limits. No cloud. Note that this is still in beta.

    Corporate issues
    Of course, this gives another avenue for corporate information to be compromised, by Bring Your Own Device (BYOD) initiatives, or just careless use.

    Alternatives
    The number one alternative is DropBox, of course. One of many advantages of DropBox is that it doesn’t require that both devices sharing files be turned on and connected. But in “Roll your own Dropbox with BitTorrent Sync on Amazon EC2” Sam Glover shows how to use your own server to do this, using the Amazon EC2 system. If you run your own home-based server, it would of course be much easier.


    play music app won’t show sd card

    On Samsung Galaxy 2, mp3 files stored on the SD card are not being recognized by Google Play Music app. Solution.

    I tried some ideas found on Android forums, even uninstalling and reinstalling the app. Nothing worked.

    BTW, there is some weird caching that is not shown in the system. Anyway, this is what I did:

    1. I copied everything on the sd card to a folder on a Kies Air attached PC.
    2. Deleted the files from the sd card.
    3. Powered down the phone.
    4. Turned it back on.
    5. Ran the Play Music app.
    6. It said there is no music on device.
    7. Copied all the music from the folder on the PC (from step 1).
    8. Powered down and back up.
    9. Ran the app again.
    10. Says it is scanning for media.
    11. Finds it.
    12. Give the phone back to its owner. Yuck.

    All the steps are probably not necessary, but this worked for me.

    Update
    March 2, 2013: It started happening again. Maybe it’s time to update to Android’s Ice Cream Sandwich version.


    Hierarchical Storage Management, from drive to cloud

    Years ago I came upon the HSM concept. Is this now applicable in today’s networked world for personal use? I propose HSM can be extended to encompass the Cloud.

    HSM is an enterprise data storage technique, a form of tiered storage. Data is automatically moved from expensive but fast storage systems like hard disk arrays to cheaper but slower systems like optical or tape drives.

    Conceptually, HSM is analogous to the cache found in most computer CPUs, where small amounts of expensive SRAM memory running at very high speeds is used to store frequently used data, but the least recently used data is evicted to the slower but much larger main DRAM memory when new data has to be loaded. — http://en.wikipedia.org/wiki/Hierarchical_storage_management

    In the consumer world we have in our PCs fast hard drives and on mobile devices fast flash memory. A simple two tiered system would migrate least recently used files from local media to the cloud. In the home or SOHO environment, a three-tiered system is possible. Solid-State Drives (SSD) of modest size could be the 1st tier, SATA disk the 2nd, and finally, Cloud services can provide the 3rd tier.

    For example, you have a PDF on your system that is a great resource, but you haven’t used it in a few weeks. The HSM manager would take that file and move it to the cloud (secure, private, encrypted, …, of course). In its place, to allow access by the user, is a link to the HSM managed storage location. Next time you use the file it will be migrated back to local storage (but now it is also backed up in the cloud).

    This is really an application of “file virtualization”.

    Note that the HSM in enterprise systems is not simply based on “files” but on the underlying storage mumbo jumbo (frames, and all that).

    This approach could make the potential future Windows 8 ‘Storage Spaces’ even more useful. On *nix systems this is possible to implement now. It probably already is.

    Demo
    Here is a conceptual demo. We’ll use a known cloud storage service provider like Dropbox. As far as I know, Dropbox does not offer HSM.

    On your PC you set a property on various folders that makes them eligible for HSM monitoring. This could be accomplished using a GUI and drag&drop. The HSM will immediately copy the folders to the SSD on your system or the main hard drive, if the files are not already on the fastest subsystem. In the original location of the folders, a link to the new locations will be created (soft links?). The end user will not see any difference. Kind of like “web folders” or WebDAV protocol.

    After a period of time, the local HSM monitor will record which files have not been used and invoke the Dropbox local service to stream the files to the cloud. All that remains on the file system are links to the remote files; storage space is reclaimed.
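
To make the demo more concrete, here is a minimal Python sketch of the monitor step. It makes several assumptions: "not used" is judged by the file's last access time, the cloud is stood in for by a plain local directory (`archive_root`), and the link left behind is a symbolic link. None of this uses a real Dropbox API; the function names `idle_files` and `migrate` are made up for the example. Access times are not always reliable (some file systems are mounted with them disabled), so a real monitor would likely need its own usage log.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86400


def idle_files(root, max_idle_days, now=None):
    """List regular files under `root` not accessed for `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # already migrated; only a link remains
            if os.stat(path).st_atime < cutoff:
                found.append(path)
    return sorted(found)


def migrate(path, root, archive_root):
    """Move `path` into `archive_root` and leave a symlink in its place."""
    relative = os.path.relpath(path, root)
    target = os.path.abspath(os.path.join(archive_root, relative))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(path, target)
    os.symlink(target, path)  # the user still sees the file at `path`
    return target
```

A typical run would call `idle_files(root, 21)` on a schedule and pass each result to `migrate`. Migrating a file back on first use, as described earlier, is not covered here.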

    Updates
    Feb 26, 2012: Another company will be competing with DropBox. As above it allows the user to designate specific folders to participate in cloud storage. Since remote files will be slower to access, this company will attempt to “predict” which files would be used more often. See this article.
