Tag Archives: HSM

Private P2P HSM, a network for optimal virtual storage

In “Hierarchical Storage Management, from drive to cloud” I wrote about applying HSM with a central storage or cloud model. Here I discuss the next stage: using peer-to-peer technology to create private clouds. The central remote cloud services then serve as the expensive, slow end point of data migration.

Background
Hierarchical storage management (HSM) is a system that uses layers, or tiers, of storage resources and migrates files between them according to various criteria. Data would automatically be migrated from RAM to solid-state drives, to disks, and finally to tapes. This is an old technology, around since the 1970s, and used primarily in enterprise settings. However, various forms of HSM have been used in consumer products and services, or at the OS level. One example is a newer service such as Bitcasa.

P2P Synchronization
The use of P2P to synchronize storage across one’s own devices got a big boost with BitTorrent Sync (BTSync), created by BitTorrent. BTSync, though not a replacement for something like Dropbox, offers an additional model that offsets some limitations of cloud storage.

P2P HSM
BTSync showed that peer-to-peer (P2P) sync works and is useful. Could this approach be used as a base to provide not only synchronization but also migration of files to different tiers? The result would be peer-to-peer hierarchical storage management (p2pHSM).

With this approach, the private storage hierarchy is a true cloud, a private network of storage resources. This private cloud could also be connected to traditional cloud storage vendors, like Bitcasa or Dropbox, creating a hierarchy of storage clouds. A further optimization is for the private P2P HSM to arbitrate among, or solicit bids from, multiple external cloud providers for the best rates and other criteria.

Scenario 1
Your extended family and real friends provide many computing devices to the private cloud: smartphones, NAS, USB drives, STBs, vehicles, laptops, PCs, tablets, and so forth. Each of these devices has limited storage capacity and bandwidth.

Each device allocates a percentage of its storage and bandwidth to the private cloud. The storage is secure and private, and its content is available only through access control lists (ACLs). For example, children cannot access their parents’ content.
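To make the allocation and permission idea concrete, here is a minimal Python sketch. The names `Device`, `StoredFile`, `share_pct`, and `readers` are hypothetical, invented for illustration; a real p2pHSM would also need authentication and encryption, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    capacity_bytes: int         # total storage on the device
    share_pct: int              # percent pledged to the private cloud (assumed setting)
    used_shared_bytes: int = 0  # pledged space already holding other peers' data

    def pledged_bytes(self) -> int:
        """Space this device has promised to the private cloud."""
        return self.capacity_bytes * self.share_pct // 100

    def free_shared_bytes(self) -> int:
        """Pledged space that is still unused."""
        return self.pledged_bytes() - self.used_shared_bytes


@dataclass
class StoredFile:
    name: str
    owner: str
    readers: set = field(default_factory=set)  # a very simple ACL

    def can_read(self, user: str) -> bool:
        """Only the owner and explicitly listed readers may read the file."""
        return user == self.owner or user in self.readers
```

With this model, a file owned by a parent with an empty `readers` set is unreadable by a child's account, which is the behavior described above.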

  1. You’re walking outside wearing a head-mounted device, like Google Glass, when a hummingbird flies into view. You turn on Record and create a video.
  2. The video gets transmitted to your mobile device.
  3. Your device notes that you are running out of space and a P2P storage request is made to the network of devices available.
  4. A high-end mobile phone is found that has extra space on a memory card.
  5. The original mobile device sends the video to that device.
  6. A week later, no one is viewing that video anymore so the system migrates that video to other storage.
  7. The next level is the laptop since it has plenty of room and is faster.
  8. A month later the video is again migrated to the family PC which has plenty of free space.
  9. Two months later it is again migrated to the family’s external cloud storage service if the monetary rules configured allow it.
[Figure: p2pHSM]

At every file migration, the system seeks to optimize price, performance, access, security, and capacity. The conventional cloud, which is the top, slowest, and most expensive tier, is used only when the private cloud is approaching certain limits or for backup purposes.
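The migration in Scenario 1 can be sketched as a small rule table. This is only an illustration: the idle thresholds (7, 30, and 60 days), the tier names, and the `cloud_cost` and `cloud_budget` parameters are assumptions chosen to mirror the scenario, and a real system would also weigh performance, security, and capacity.

```python
# Tiers ordered from nearest to farthest, with the minimum number of idle
# days before a file becomes eligible for each one (assumed values).
TIERS = [
    ("phone", 0),
    ("laptop", 7),
    ("family_pc", 30),
    ("external_cloud", 60),
]


def target_tier(idle_days: int, cloud_cost: float, cloud_budget: float) -> str:
    """Pick the farthest tier a file qualifies for.

    The external cloud is used only if the monetary rule allows it,
    that is, if its cost does not exceed the configured budget.
    """
    chosen = TIERS[0][0]
    for name, min_idle_days in TIERS:
        if idle_days < min_idle_days:
            break  # not idle long enough for this tier or any later one
        if name == "external_cloud" and cloud_cost > cloud_budget:
            break  # the monetary rule blocks the paid tier
        chosen = name
    return chosen
```

For example, `target_tier(45, cloud_cost=1.0, cloud_budget=5.0)` selects the family PC, while a file idle for 90 days moves to the external cloud only when the cost fits the budget.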

Issues
Of course, the history of file sharing systems is full of controversy. Some would say that a private P2P adds to the problems, notwithstanding that there are already plenty of public P2P networks and massive file sharing already occurring. Some issues are:

  • Security
  • Privacy
  • Copyrights and fair use
  • DRM
  • Is data accessed at the “Migration Level”, or migrated back?

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Hierarchical Storage Management, from drive to cloud

Years ago I came upon the HSM concept. Is this now applicable in today’s networked world for personal use? I propose HSM can be extended to encompass the Cloud.

HSM is an enterprise data storage technique, a form of tiered storage. Data is automatically moved from expensive but fast storage systems, like hard disk arrays, to cheaper but slower systems, like optical or tape drives.

Conceptually, HSM is analogous to the cache found in most computer CPUs, where small amounts of expensive SRAM memory running at very high speeds is used to store frequently used data, but the least recently used data is evicted to the slower but much larger main DRAM memory when new data has to be loaded. — http://en.wikipedia.org/wiki/Hierarchical_storage_management

In the consumer world we have fast hard drives in our PCs and fast flash memory on our mobile devices. A simple two-tiered system would migrate the least recently used files from local media to the cloud. In the home or SOHO environment, a three-tiered system is possible. Solid-state drives (SSDs) of modest size could be the 1st tier, SATA disks the 2nd, and cloud services the 3rd.

For example, you have a PDF on your system that is a great resource, but you haven’t used it in a few weeks. The HSM manager would take that file and move it to the cloud (secure, private, encrypted, …, of course). In its place, to allow access by the user, is a link to the HSM-managed storage location. The next time you use the file, it will be migrated back to local storage (but it is now also backed up in the cloud).
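Here is a rough Python sketch of that migrate-and-recall cycle. It assumes a POSIX-style file system with symbolic links, and it uses an ordinary directory as a stand-in for the cloud; the function names `migrate` and `recall` are made up for illustration, and no real cloud API is involved.

```python
import shutil
from pathlib import Path


def migrate(local: Path, remote_dir: Path) -> Path:
    """Move `local` into `remote_dir` and leave a symlink at the old path."""
    remote_dir.mkdir(parents=True, exist_ok=True)
    remote = remote_dir / local.name
    shutil.move(str(local), str(remote))
    # Use an absolute target so the link works from any directory.
    local.symlink_to(remote.resolve())
    return remote


def recall(local: Path) -> None:
    """Replace the symlink at `local` with a real copy of the file."""
    if not local.is_symlink():
        return  # the file is already local
    remote = local.resolve()
    local.unlink()               # removes only the link, not the remote file
    shutil.copy2(remote, local)  # the remote copy stays behind as a backup
```

While the file is migrated, opening the original path still works because the link is followed transparently. After `recall`, the remote copy remains, matching the "also backed up in the cloud" behavior.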

This is really an application of “file virtualization”.

Note that the HSM in enterprise systems is not simply based on “files” but on the underlying storage mumbo jumbo (frames, and all that).

This approach could make the upcoming Windows 8 ‘Storage Spaces’ feature even more useful. On *nix systems this can be implemented now, and it probably already has been.

Demo
Here is a conceptual demo. We’ll use a known cloud storage service provider like Dropbox. As far as I know, Dropbox does not offer HSM.

On your PC you set a property on various folders that makes them eligible for HSM monitoring. This could be accomplished using a GUI and drag and drop. If the files are not already on the fastest subsystem, the HSM will immediately copy the folders to the SSD or the main hard drive on your system. In the original location of the folders, links to the new locations will be created (soft links?). The end user will not see any difference. This is similar to “web folders” or the WebDAV protocol.

After a period of time, the local HSM monitor will record which files have not been used and invoke the Dropbox local service to stream the files to the cloud. All that remains on the file system are links to the remote files; storage space is reclaimed.
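The monitoring step could look something like the sketch below, which lists files in a monitored folder that have not been accessed for a given number of days. The function name `stale_files` and its parameters are hypothetical. It relies on the file system's last-access time, which is only approximate on systems mounted with options such as `relatime` or `noatime`, so a real monitor would likely track usage itself.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def stale_files(folder: Path, max_idle_days: float,
                now: Optional[float] = None) -> List[Path]:
    """Return regular files under `folder` not accessed in `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    result = []
    for path in sorted(folder.rglob("*")):
        # Skip links left behind by earlier migrations, and directories.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_atime < cutoff:
            result.append(path)
    return result
```

Each path returned would then be handed to whatever uploads the file and leaves a link in its place.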

Updates
Feb 26, 2012: Another company will be competing with Dropbox. As above, it allows the user to designate specific folders to participate in cloud storage. Since remote files will be slower to access, this company will attempt to “predict” which files will be used more often. See this article.
