
A Roundup of Storage Startups

The enterprise storage market has been a hotbed of innovation and entrepreneurship in the last several years.  While the storage industry has consolidated through acquisitions (such as HPE’s purchase of Nimble) and some vendors have simply shut down, there always seem to be new companies waiting in the wings to take their place.

These new companies are hoping to prove that they have a better way to address the growing challenge of managing massive data growth by implementing their own take on enterprise storage technology.  They all come to the market with both the hope and the potential to change how enterprise businesses store their data.

An up-to-date list of storage startups is hard to maintain, as the ranks are growing fast and companies can appear seemingly out of nowhere.  This latest crop has some game-changing ideas, and I look forward to seeing how their technology will shape the future of enterprise storage.  Some have been around for several years and are starting to mature, while others started less than a year ago.

Many new companies are hoping (betting?) that the market will see a need for a new data management software layer that provides improved management capabilities across multiple silos of data, both on-premises and in the cloud.  Some of the emerging suppliers of data management software are Actifio, Avere (now part of Microsoft), Catalogic, Cohesity, Delphix, Druva, Rubrik, Scality and Strongbox.  I’m going to be focusing more on the hardware suppliers in this post, so let’s take a closer look at some of the rising stars in the enterprise storage market.  I’m going to take a look at 24 companies in total in this post, in no particular order:  E8 Storage, Igneous Systems, Komprise, Portworx, Primary Data, Reduxio, Talena, Alluxio, Aparavi, Attala Systems, Datera, Datrium, Elastifile, Morro Data, Excelero, Leonovus, Minio, Nyriad, ScaleFlux, StorageOS, Storj Labs, Vexata, Wasabi and WekaIO.

I have no affiliation with any of these vendors; this is simply a list I compiled with some basic online research.  The data presented is based primarily on marketing information that I gathered.  For more detailed information I recommend reviewing their individual company websites, and links are provided for all of them.

E8 Storage | CEO: Zivan Ori

E8 Storage focuses on shared accelerated storage for data-intensive, top tier applications that require a large amount of IO. Their scalable solution is well suited for intense low latency workloads,  real time analytics, financial and trading applications, transactional processing and large scale file systems.  Their patented high performance shared NVMe storage solution delivers much higher performance, improved storage performance density, and lower costs when compared to legacy systems.   They promise NVMe performance without giving up reliability and availability.  The company is privately held and based in Santa Clara, CA with R&D in Tel Aviv, and they have channel partners in the US and in Europe.

Their hardware is built on industry standards, including converged ethernet with RDMA and standard 2.5″ NVMe SSDs.  Up to 96 host servers can connect to each storage controller, and each controller is concurrently linked to shared storage to deliver scalability into the petabytes.

Potential customers can purchase their software separately or as an integrated system if an appliance-based solution is a better fit. Bought independently, their software allows the use of hardware from any vendor, as long as the vendor is on their pre-qualified list.  It also allows businesses to take advantage of economies of scale within their own supply chains and purchase new units at a pace that suits their needs.

Igneous Systems | CEO: Kiran Bhageshpur

Igneous Systems is a Seattle-based, venture-backed company that built a secondary storage system designed to support massive file systems. Their Hybrid Storage Cloud solution provides enterprises a consolidated secondary storage tier with cloud support and scalability.  Igneous remotely manages all on-premises cloud infrastructure, which includes monitoring, troubleshooting, and non-disruptive software upgrades.

Their infrastructure scales from 100TB to 100PB and uses their RatioPerfect architecture, which consists of distributed nano-servers that make the infrastructure resistant to hardware failures. This cloud-like architecture enables Igneous to offer cloud economics in the enterprise data center.

Unlike traditional storage equipment, Igneous Hybrid Storage Cloud uses an integrated serverless environment designed for data-centric applications.  It features integrated backup and archive applications that are designed to integrate seamlessly with enterprise NAS and to tier data to the cloud.  Integrated search capabilities are built directly into the infrastructure and therefore require no separate backup catalogs to manage. Unlike legacy backup systems, Igneous Hybrid Storage Cloud is specifically designed for massive file systems managing billions of files.

They provide easy to deploy  storage that is a cost effective alternative to cloud data storage.  They provide a managed hardware solution on-premises and look after everything from maintenance and provisioning to performance tuning. Their pricing model is based on consumption. With a background at EMC-Isilon, the Igneous team has a great deal of experience in building infrastructure for unstructured data.

They were recognized by Gartner as a 2017 “Cool Vendor in Storage Technologies”.

Komprise | CEO: Kumar K. Goswami

Komprise aims to address the issues of storage sprawl and rising costs with storage analytics.  They contend that storage management requires getting as close as possible to realtime insight into what is happening, and their software addresses this by providing metrics together with analytic tools to build a variety of data policies.  They then manage data placement across storage tiers and multiple clouds. Their software allows for interactively modelling multiple scenarios before moving the configuration into production.

Their intelligent data management provides an alternative to more expensive solutions from larger, more established vendors. The company’s IDM platform enables customers to lower NAS costs and ongoing cloud operations by using analytics to intelligently automate archiving and disaster recovery. The  service also allows for transparent access of data across on-premises NAS storage and the cloud.

Their analytics processing identifies data that is most suited for the cloud and then transparently archives and replicates the data. User defined policies are automated to move and manage data across on-prem NAS storage tiers.

Portworx | CEO: Murli Thirumale

Portworx provides storage for containers and brings persistent storage to all of the common container schedulers.  All of the most popular databases are supported in the container environment.  They are an early player in the persistent container storage field, but have signed up some big names like GE Digital and Lufthansa Systems.  They are betting on the recent trends to replace hypervisors with containers and see persistent storage as the wave of the future.

They provide scheduler-integrated data services for production enterprise containers and allow users to deploy stateful containers on-prem, in the cloud, or in hybrid clouds.  In contrast to legacy storage that has container connectors built on key-value stores, they are designed and built for cloud-native applications, making container data more portable, persistent and protected.

Primary Data | CEO: Lance Smith

Primary Data’s storage is based on the idea of extensible metadata, using open ended tagging of data objects to control them (i.e. life-cycle management and priority of service), but they also add telemetry to the equation to allow real time automated data placement.

Parallel access to metadata and metrics processing has the effect of speeding up I/O performance, and they keep it cheap by implementing a “pay as you go” pricing model.  Their leadership team happens to include Steve Wozniak (how cool is that?), who is listed as chief scientist. In 2017 they announced $40 million in new funding and a new version of their storage platform.

Reduxio | CEO: Mark Weiner

Reduxio’s TimeOS software delivers high performance enterprise storage solutions with unique data management capabilities.  They put data at the center of their architecture and allow complete virtualization of all types of storage.

Their HX550 multi-tier storage solution with built-in BackDating allows customers to modernize and simplify their storage infrastructure and IT operations by deploying flash storage that is cost-effective and that can be used across all their applications.

Reduxio’s unified storage platform is designed to deliver near-zero RPO and RTO, while greatly simplifying the data protection process and providing built-in data replication for disaster recovery. The features in the TimeOS v3 released in June 2017 enabled a single platform for the end-to-end management of the life cycle of an application’s data.

They already have a global install base of more than 150 enterprise customers, many with multiple installed systems across a wide range of industrial sectors, including Managed Service Providers, Manufacturing, BioTech, Education, State and Local Government and Professional Services.  Their product seems to be catching on.

Talena | CEO: Srinivas Vadlamani 

Talena developed what they call the industry’s fastest data backup and recovery solution, with built-in machine intelligence to handle huge data sets for mission-critical applications sitting on top of modern data platforms such as DataStax/Cassandra, Couchbase, Hadoop HBase/Hive, MongoDB and Vertica.  Talena takes advantage of machine learning to ensure data resiliency in the event of disasters. They have the ability to back up and recover petabyte-sized and larger data sets much faster than other solutions on the market, minimizing the impact of data loss and greatly reducing downtime. Their growing customer base includes leading Fortune 500 businesses in the retail, financial services and travel industries, among others.

Targeting the big-data market, Talena provides backup, recovery, archiving and test data management for major unstructured databases.  Their key features include deduplication and replication control via user-defined policies. The technology supports data-masking algorithms to prevent data exposure as data is moved around or used in testing.

Alluxio | CEO: Haoyuan Li

Alluxio (formerly known as Tachyon) provides virtual distributed storage for Big Data.  They aim to become the storage abstraction layer for Big Data in the same manner that Apache Spark became the computation layer. Their memory centric architecture allows developers to interact with a single storage layer API without worrying about the configurations and complexities of the underlying storage and file systems.

Alluxio is a virtual distributed storage layer between big data computation frameworks and underlying storage systems that delivers data at memory speed to any target framework from any storage system regardless of its location.  They aim to address the challenge of data locality.  While in-memory storage is usually viewed as cache, their technology allows for separation of the function layer from the persistent storage layer.  Organizations can run any big data framework (like Apache Spark) with any storage system or filesystem underneath (like S3, EMC, NetApp, OpenStack Swift, Red Hat GlusterFS, etc.), and run it on any storage media (DRAM, SSD, HDD, etc.), and with that they support a unified global namespace by virtualizing disparate storage systems.

The company was founded by the creators of the Alluxio open source project from UC Berkeley AMPLab.

Aparavi | Chairman:  Adrian Knapp

Aparavi offers cloud data protection and remote disaster recovery as a service. Their cloud-forward solution offers a RESTful API, a policy engine, an open data format, and a multi-tenant architecture.  Their technology can reduce a customer’s storage footprint compared to more traditional methods while making sure compliance policies are adhered to.  They aim to address the issues of evolving global regulations and the huge amounts of data now being generated with long-term data retention solutions across modern, multi-cloud architectures.

At their core, they aim to better prepare their customers to meet the challenges of long-term data retention across multi-cloud architectures. They designed and built a new software-as-a-service platform from scratch to allow companies to protect data on-prem and in the cloud.  They also aim to break the typical barriers of cost, vendor lock-in, complexity and regulatory compliance requirements that cause businesses problems when utilizing more conventional solutions.  The company is run by management and engineers with a ton of experience in data retention, and the issue they are attempting to resolve is something I’ve seen directly in the companies I’ve worked for.  They may have something here.

Attala Systems | CEO: Taufik Ma

Attala offers high performance computing and primary cloud storage.  Their product utilizes a scale out fabric running on standard ethernet to interconnect servers and data nodes in a data center.  Because they focus on scale out cloud storage and use an FPGA based fabric, they  are able to effectively eliminate legacy storage management layers. They tout that their product provides over ten million IOPS per scale-out node with latencies as low as 16 microseconds.

The Attala fabric includes the Model HNA host PCIe adapters, providing full hardware emulation of NVMe SSDs, thus allowing their solution to expose pooled resources as virtual SSDs. The host OS, hypervisor or driver sees the virtual SSDs as real SSDs using standard NVMe drivers, so they can be used with any OS, hypervisor, or bare-metal provisioning software.

The software also offers a fully automated orchestration layer, where the fabric dynamically and securely attaches volumes from storage resources from across the network directly to bare-metal servers, virtual machines or containers. No host agents or other software is required, so deployment and maintenance of the system across heterogeneous environments is fairly simple.

Datera | CEO: Marc Fleischmann 

Datera aims to solve what they see as some of the biggest challenges in storage. Their key-value store approach uses NVDIMM to speed up write operations, coalesce writes, and provide a cache for reads. Their access protocol can aggregate massively parallel reads, and their software tools provide many of the same compression, snapshot and replication features offered by the bigger and more established storage vendors. The software also works with orchestration tools from VMware, OpenStack, and Docker.

The product is focused on DevOps and cloud native apps use cases.  It runs on x86 servers with flash, and there is iSCSI-based native integration with OpenStack, CloudStack, VMware vSphere and container orchestration platforms such as Docker, Kubernetes and Mesos.

Some of the key features of their Datera Elastic Data Fabric (DEDF) include a RESTful interface, API-first operations to provide web-scale automation with full infrastructure programmability, policy-based configuration, self-service provisioning, a scale-out model, a flash-first design that delivers high efficiency and low latency, multi-tenancy and QoS for cloud-native and traditional workloads, and heterogeneous component support for easily scaling across commodity x86 servers.

Datrium | CEO: Brian Biles

Datrium offers stackable appliances that act as servers.  Each appliance has a flash cache, and they are linked to a back-end storage unit with larger hard drives that then serves as the primary storage. Enterprise features such as compression, deduplication and end-to-end encryption are included.   They also offer an advanced snapshot tool that includes a catalog of snapshots.  Their product takes a slightly different approach than a hyperconvergence vendor like Nutanix that only pools drives that are built in to its servers.

They see compute, primary storage, secondary storage and cloud storage all coming together in a configuration that is scalable and easy to manage without the need for a silo for each storage class. As most IO requests will utilize the on-board flash cache on their nodes, they can deliver excellent performance without ever having to go to the data nodes.

Compute nodes can be supplied by Datrium, or clients can also use their existing infrastructure.  As persistent data resides on the data nodes, compute nodes are stateless and can go offline without risking data loss or corruption.  They support a wide variety of environments including vSphere 5.5-6.5, Red Hat 7.3, CentOS 7 1611, and Docker 1.2.

They have a very unique spin on convergence, and their DVX system really enhances storage efficiency, which is critical to getting the most out of the flash in the data nodes.

Elastifile | CEO: Amir Aharoni

Elastifile offers file storage and scale-out file storage.  Their product employs the Bizur consensus algorithm, a distributed metadata model using an adaptive data placement methodology to provide cloud enabled storage services capable of handling transactional workloads with very low latency.

Their software is designed to help large and midsize enterprises scale up through the cloud to thousands of nodes and millions of IOPs for their most mission critical workloads. It will run on any server and can use any type of flash (3D and TLC included).  They claim to bring flash performance to all enterprise applications while reducing the capex and opex of virtualized data centers, and simplify the adoption of hybrid cloud by extending file systems across on-prem and cloud deployments.

Morro Data | CEO: Paul Tien

Morro Data offers file storage and hybrid cloud solutions.  Their CloudNAS service combines an on-prem cache with S3 or Backblaze cloud storage, designed to give small/midsize businesses an alternative to using local file servers.

What separates them from others is a global distributed file system that synchronizes customer data between one or more on-prem CacheDrive hardware appliances and public cloud storage. The CacheDrives store frequently accessed data on site for better performance.

Their CacheDrive also serves as a cloud storage gateway to improve the performance of file transfers to object storage in the cloud. It is also designed to optimize bandwidth in order to accommodate less than ideal connections.  Morro also supports key enterprise storage features found in products from more established vendors, such as data encryption, compression, retention policies and data recovery.

CloudNAS is designed to be used as primary storage, with the master copy of the data stored in the cloud and synchronized to the CacheDrives at local sites.

Excelero | CEO: Lior Gal

Excelero is a software-defined storage technology startup, and their primary offering is their NVMesh Server SAN software. They designed the software to pool NVMe storage from multiple servers.  The pooled storage then offers very high performance and is intended to be used as primary storage.

Using an SDS approach with an NVMe over Fabrics mesh storage stack, they aim to address issues with hyperconverged infrastructure.  Accessing a drive utilizes the RDMA feature of NVMe over Fabrics, which results in very low latency and shifts the CPU load to the initiating system rather than the one holding the drive.

Leonovus | Chair: Michael Gaffney

Leonovus’ product is an advanced blockchain storage and compute solution with a marketplace for cloud applications.  Leonovus has invested over twenty million dollars in the development of distributed compute and distributed storage technology. They have been granted several patents and have numerous patent claims and patents pending. Their unique software-defined storage solution has strong intellectual property protection.

Their software defined object storage technology is designed for enterprise on-prem, hybrid or public cloud users that have governance, risk management and compliance requirements. The software is designed to run on existing storage hardware.

Minio | CEO: Anand Babu Periasamy

Minio is an easy to deploy open-source object storage server that uses an Amazon S3 compatible API.  They develop software for cloud-native and containerized applications to help businesses with the management of the exponential growth of unstructured data.  They also support Amazon AWS compatible lambda functions to perform useful actions like thumbnail generation, metadata extraction and virus scanning.

Their open source based object storage server is primarily designed to be used for cloud applications and DevOps departments. Application developers can containerize storage, apps and security simultaneously and use the same resources. The server enables applications to manage large amounts of unstructured data and enables cloud and SaaS application developers to more quickly and easily use and implement emerging cloud hosting providers like Digital Ocean, Packet and Hyper.sh.  It has proven to be popular in the Docker, Mesos and Kubernetes communities because of its cloud native architecture.
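Because Minio exposes an Amazon S3 compatible API, an application can typically talk to it with any standard S3 SDK simply by pointing the client at the Minio endpoint.  As a rough sketch (assuming a Minio server running locally on port 9000 and example credentials, which you would replace with your own), a Python application using boto3 might look like this:

import boto3

# Point a standard S3 client at the local Minio endpoint (example values)
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
)

# Ordinary S3 calls work unchanged against the Minio server
s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello object storage")
print(s3.get_object(Bucket="demo-bucket", Key="hello.txt")["Body"].read())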

Nyriad | CEO: Matthew Simmons

The core of Nyriad’s platform is their “NSULATE” technology, which uses a GPU to perform the processing. GPUs are specialized for floating point calculations, and Nyriad uses that enhanced capability to generate parity calculations that would otherwise be impossible with a CPU or a RAID controller.

They claim that NSULATE can handle dozens of simultaneous device failures in real-time and maintain consistent I/O performance.  Using Netlist’s NVvault non-volatile DIMMs to create a Linux storage platform, it can scale up to millions of IOPS and allow data to be directly sent to the GPU for storage processing.  As it directly bypasses the Linux kernel, it can offer improved performance.

NSULATE technology also allows for compute and storage to coexist on the same node. The idea is to enable storage nodes to be configured for computation in order to speed up I/O-related code, which can accelerate applications that typically hit a brick wall with storage IO bottlenecks.

ScaleFlux | CEO: Hao Zhong

ScaleFlux’s Converged Cloud Subsystem (CCS) is a tightly unified software and flash-based hardware subsystem solution that easily and cost-effectively integrates into Big Data scale-out servers. CCS collapses the traditional scale-up storage hierarchy that usually bottlenecks data movement and processing performance by enabling high density, commodity flash to be used as an extension to memory.  Deploying CCS throughout the entire data center infrastructure is designed to provide a significant boost to application performance while reducing data center TCO.

StorageOS | CEO: Chris Brandon

StorageOS is a software-based distributed storage platform designed to provide persistent container storage.  It’s available on commodity hardware, virtual machines or in the cloud.  With the addition of a 40MB container, developers can build scalable stateful containerized apps, with fast, highly available persistent storage.

StorageOS offers simple and automated block storage to stateless containers, allowing databases and other applications that need enterprise class storage functionality to run without the normal complexity and high cost.

They aim to provide an enterprise class storage offering that is simpler, faster, easier, and cheaper than legacy IT storage. They also aim to provide automated storage provisioning to containers which can be instantiated and torn down many thousands of times a day.

So, how does the product work?  It’s a very fast and easy process.  It installs as a container under Linux and it locates accessible storage, whether direct-attached, network-attached, cloud-attached, or on connected nodes.  That storage is then aggregated into a virtual multi-node pool of block storage. Volumes are then carved out for accessing containers, and are then thin provisioned, mounted, loaded up with a database and started.

Storj Labs | CSO: Shawn Wilkinson

Storj is based on blockchain technology and peer to peer protocols to provide secure, private, and encrypted cloud storage. Basically, it’s an open source decentralized cloud storage platform utilizing blockchain technology, and I looked at them in my previous article Blockchain and Enterprise Storage, where I dove in to what exactly blockchain is, how it works, how it may be applied in the enterprise storage space, and how it’s already starting to be used in various global industries.  Storj uses the spare storage capacity of its community members to store data that has been shredded and encrypted.  From a blockchain perspective, Storj uses their own storage coin token that is used to buy and sell space on the network.

For potential data farmers looking to share storage capacity, Storj verifies the integrity of their storage with challenges that perform remote audits.  As a distributed storage system, it is a highly-available solution with data sliced up into multiple segments that are stored redundantly across at least five different systems.  The distributed nature helps accelerate data access because data is retrieved from multiple sources simultaneously rather than just one.

Vexata | CEO: Zahid Hussain

Vexata’s active data infrastructure solution aims to improve performance at scale for I/O intensive applications.   The system presents a block or unstructured data I/O interface to enable applications to access and update large volumes of data at high throughput and low latency, and can be deployed as a fully contained or cloud deployed solution.  Based on their VX-OS software, their SSD systems can be deployed in both enterprise and cloud data center environments.

Their file storage system OS is well suited for business critical enterprise data architectures, media and entertainment workflows, and high performance data analytics.  VX-OS  is a scalable, resilient file storage system that supports industry standard protocols (like NFSv3 and GPFS), while providing over 1M random file IOPS, 50GB/s read and 20GB/s write bandwidth, and up to 180TB of protected capacity. It also supports enterprise class features such as file-based snapshots/clones and replication, as well as data-at-rest encryption without the huge performance penalty.

Wasabi | CEO: David Friend

Wasabi offers cloud based object storage as a service.  You can read more about Object storage in my Primer on Object Storage article, but in a nutshell it refers to a data storage approach that stores information as individual objects in digital buckets, as opposed to storing files in a hierarchical or block fashion.  They claim that their storage service is significantly faster and cheaper than competing products and offers the same levels of reliability, and that their service can read and write data more than six times as fast as Amazon’s S3, while maintaining 100% compatibility with the Amazon S3 API.  Their prices are also claimed to be around 1/5th the cost of S3, Microsoft Azure, and Google Cloud.

WekaIO | CEO: Liran Zvibel

WekaIO’s core product is WekaIO Matrix, which is a cloud-native scalable file system that provides all-flash storage performance with the simplicity of NAS storage. WekaIO Matrix offers dynamic scaling of resources based on application requirements. It is a distributed global namespace file system that can scale to thousands of compute nodes and petabytes of storage, and also provides integrated tiering to the cloud.

Their software deploys on industry-standard commodity servers. They have reference architectures for HPE Apollo and Dell EMC servers, with Supermicro and Lenovo in the pipeline.  It runs on bare-metal servers, virtual machines or in containers. The software scales from 6-240 nodes with little to no latency impact.  As a frame of reference, they note that a 30-node cluster can address up to 2PB of storage with up to 1.8M IOPS and 60GBps of bandwidth.  They also position themselves as a file access cloud storage option that sidesteps the limitations of existing Amazon storage services.


Storage Performance Benchmarking with FIO

Flexible IO tester (FIO) is an open-source synthetic benchmark tool first developed by Jens Axboe.  FIO can generate various IO workloads: sequential reads or random writes, synchronous or asynchronous, all based on the options provided by the user.  FIO provides various global options through which different types of workloads can be generated.  FIO is one of the easiest and most versatile tools for quickly performing IO performance tests on a storage system, and it allows you to simulate different types of IO loads and tweak several parameters, including the write/read mix and the number of processes.  I’ll likely make a few additional posts with some of the other storage benchmarking tools I’ve used, but I’m focusing on FIO for this post.  Why FIO?  It’s a great tool, and its pros outweigh its cons for me.

Pros

  • It has a batch mode and a very extensive set of parameters.
  • Unlike IOMeter, it is still being actively developed.
  • It has multi-OS support.
  • It’s free.

Cons

  • It is CLI only, with no GUI or graphical output.
  • It has a rather complex syntax and it takes some time to get the hang of it.

Download and Installation

FIO can be run from either Linux or Windows, although Windows will first require an installation of Cygwin.  FIO works on Linux, Solaris, AIX, HP-UX, OSX, NetBSD, OpenBSD, Windows, FreeBSD, and DragonFly.  Some features and options may only be available on some of the platforms, typically because those features only apply to that platform (like the solarisaio engine, or the splice engine on Linux).  Note that you can check github for the latest version before you get started.

You can run the following commands from a Linux server to download and install the FIO package:

cd /root

yum install -y make gcc libaio-devel || ( apt-get update && apt-get install -y make gcc libaio-dev  </dev/null )

wget https://github.com/Crowd9/Benchmark/raw/master/fio-2.0.9.tar.gz ; tar xf fio*

cd fio*

make

How to compile FIO on 64-bit Windows:

Install Cygwin (http://www.cygwin.com/). Install make and all packages starting with mingw64-i686 and mingw64-x86_64.

Open the Cygwin Terminal.

Go to the fio directory (source files).

Run make clean && make -j.

To build fio on 32-bit Windows, run ./configure --build-32bit-win before make.

FIO Cheat sheet

With FIO compiled, we can now run some tests.  For reference, I’ll start off with some basic commands for simulating different types of workloads.

Sequential Reads – Async mode – 8K block size – Direct IO – 100% Reads

fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=8 --size=1G --runtime=600  --group_reporting

Sequential Writes – Async mode – 32K block size – Direct IO – 100% Writes

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --size=2G --runtime=600 --group_reporting

Random Reads – Async mode – 8K block size – Direct IO – 100% Reads

fio --name=randread --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=16 --size=1G --runtime=600 --group_reporting

Random Writes – Async mode – 64K block size – Direct IO – 100% Writes

fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=64k --numjobs=8 --size=512m --runtime=600 --group_reporting

Random Read/Writes – Async mode – 16K block size – Direct IO – 90% Reads/10% Writes

fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=8 --rwmixread=90 --size=1G --runtime=600 --group_reporting

Host Considerations

To avoid IO being served from the host system cache, use the direct option, which reads and writes directly to the disk.  Use the Linux native asynchronous IO engine by setting the ioengine option to libaio.  When FIO is launched, it will create a file with the name provided in --name, of the size provided in --size, using the block size set in --bs.  If --numjobs is provided, it will create the files in the format name.n.0, where n is between 0 and --numjobs.

--numjobs = The more jobs, the higher the performance can be, based on resource availability.  If your server is limited on resources (TCP or FC), I’d recommend running FIO across multiple servers to push a higher workload to the storage array.
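If long command lines get unwieldy, fio also accepts the same options in an INI-style job file.  As a rough illustration (the job name and file name here are just examples), the 70/30 random read/write test used later in this post could be expressed as a job file saved as randrw.fio and run with fio randrw.fio:

[global]
ioengine=libaio
direct=1
iodepth=1
runtime=300
time_based
group_reporting

[randrw]
rw=randrw
rwmixread=70
bs=64k
size=512m
numjobs=8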

Block Size Matters

Many storage vendors will advertise performance benchmarks based on 4k block sizes, which can artificially inflate the total IO number that the array is capable of handling.  In my professional experience with the workloads I’ve supported, the most popular read size is between 32KB and 64KB and the most popular write size is between 8KB and 32KB.  VMWare-heavy environments may skew a bit lower in read block size.  Read IO is typically more common than Write IO, at a rate of around 3:1.  It’s important to know the characteristics of your workload before you begin testing, as we need to look at IO Size as a weight attached to the IO. An IO of size 64KB will have a weight 8 times higher than an IO of size 8KB since it will move 8 times as many bytes.  A 256K block has 64 times the payload of a 4K block.  Both examples take substantially more effort for every component of the storage stack to satisfy the IO request. Applications and the operating systems they run on generate a wide, ever-changing mix of block sizes based on the characteristics of the application and the workloads being serviced. Reads and writes are often delivered using different block sizes as well. Block size has a significant impact on the latency your applications see.
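To put that weighting into raw numbers, consider an array delivering 10,000 IOPS: at a 4KB block size that works out to only about 39MB/s of throughput, while the same 10,000 IOPS at a 64KB block size moves roughly 625MB/s, sixteen times the data for the exact same IO count.  That is why an IOPS figure quoted without its block size (and read/write mix) tells you very little.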

Try to understand the IO size distributions of your workload and use those IO size modalities when you develop your FIO test commands. If a single IO size is a requirement for a quick rule-of-thumb comparison, then 32KB has been a pretty reasonable number for me to use, as it is a logical convergence of the weighted IO size distribution of most of the shared workload arrays I’ve supported. Your mileage may vary, of course.

Because block sizes have different effects on different storage systems, visibility into this metric is critical. The storage fabric, the protocol, the processing overhead on the HBAs, the switches, the storage controllers, and the storage media are all affected by it.

General Tips on Testing

Work on large datasets.  Your dataset should be at least double the amount of RAM in the OS.  For example, if the OS RAM is 16GB, test 32GB datasets multiplied by the number of CPU cores.

The Rule of Thumb:  75/25.  Although it really depends on your workloads, typically the rule of thumb is that there are 25% writes and 75% reads on the dataset.

Test from small to large blocks of I/O.  Consider testing small blocks of I/O up to large blocks of I/O in the following order: 512 bytes, 4K, 16K, 64K, 1MB to get proper measurements that can then be visualized as a histogram. This makes it easier to interpret.
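As a simple sketch of that kind of sweep (the job name, size and runtime are placeholder values to adjust for your own environment), a small shell loop can run one random read test per block size and save each result to its own log:

for bs in 512 4k 16k 64k 1m; do
  fio --name=bs-sweep-$bs --rw=randread --direct=1 --ioengine=libaio \
      --bs=$bs --numjobs=4 --size=1G --runtime=120 --time_based \
      --group_reporting --output=randread-$bs.log
done

The same loop can then be pointed at the other workload patterns (randwrite, randrw, and so on) described in the next tip.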

Test multiple workload patterns.  Not everything is sequential read/write. Test all scenarios: read / write, write only, read only, random read / random write, random read only, and random write only.

Sample Output

Here’s a sample command string for FIO that includes many of the command switches you’ll want to use.  Each parameter can be tweaked to your specific environment.  It creates 8 files (numjobs=8), each 512MB in size (size), at a 64K block size (bs=64k), and will perform random reads and writes (rw=randrw) with a mixed workload of 70% reads and 30% writes. The job will run for a full 5 minutes (runtime=300 & time_based), even if the files have already been completely written and read.

[root@server1 fio]# fio --name=randrw --ioengine=libaio --iodepth=1 --rw=randrw --bs=64k --direct=1 --size=512m --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70

Output:

 randrw: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1
 Starting 8 processes

 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 Jobs: 8 (f=8): [mmmmmmmm] [2.0% done] [252.0MB/121.3MB/0KB /s] [4032/1940/0 iops] [eta 04m:55s]
randrw: (groupid=0, jobs=8): err= 0: pid=31900: Mon Jun 13 01:01:08 2016
 read : io=78815MB, bw=269020KB/s, iops=4203, runt=300002msec
 slat (usec): min=6, max=173, avg= 9.99, stdev= 3.63
 clat (usec): min=430, max=23909, avg=1023.31, stdev=273.66
 lat (usec): min=447, max=23917, avg=1033.46, stdev=273.78
 clat percentiles (usec):
 | 1.00th=[ 684], 5.00th=[ 796], 10.00th=[ 836], 20.00th=[ 892],
 | 30.00th=[ 932], 40.00th=[ 964], 50.00th=[ 996], 60.00th=[ 1032],
 | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1208], 95.00th=[ 1288],
 | 99.00th=[ 1560], 99.50th=[ 2256], 99.90th=[ 3184], 99.95th=[ 3408],
 | 99.99th=[13888]
 bw (KB /s): min=28288, max=39217, per=12.49%, avg=33596.69, stdev=1709.09
 write: io=33899MB, bw=115709KB/s, iops=1807, runt=300002msec
 slat (usec): min=7, max=140, avg=11.42, stdev= 3.96
 clat (usec): min=1246, max=24744, avg=2004.11, stdev=333.23
 lat (usec): min=1256, max=24753, avg=2015.69, stdev=333.36
 clat percentiles (usec):
 | 1.00th=[ 1576], 5.00th=[ 1688], 10.00th=[ 1752], 20.00th=[ 1816],
 | 30.00th=[ 1880], 40.00th=[ 1928], 50.00th=[ 1976], 60.00th=[ 2040],
 | 70.00th=[ 2096], 80.00th=[ 2160], 90.00th=[ 2256], 95.00th=[ 2352],
 | 99.00th=[ 2576], 99.50th=[ 2736], 99.90th=[ 4256], 99.95th=[ 4832],
 | 99.99th=[16768]
 bw (KB /s): min=11776, max=16896, per=12.53%, avg=14499.30, stdev=907.78
 lat (usec) : 500=0.01%, 750=1.61%, 1000=33.71%
 lat (msec) : 2=50.35%, 4=14.27%, 10=0.04%, 20=0.02%, 50=0.01%
 cpu : usr=0.46%, sys=1.60%, ctx=1804510, majf=0, minf=196
 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued : total=r=1261042/w=542389/d=0, short=r=0/w=0/d=0
 latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
 READ: io=78815MB, aggrb=269020KB/s, minb=269020KB/s, maxb=269020KB/s, mint=300002msec, maxt=300002msec
 WRITE: io=33899MB, aggrb=115708KB/s, minb=115708KB/s, maxb=115708KB/s, mint=300002msec, maxt=300002msec

Additional Samples

I’ll run through an additional set of simple examples of using FIO as well using different workload patterns.

Random read/write performance

If you want to compare disk performance with a simple 3:1 4K read/write test, use the following command:

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

This command string creates a 4 GB file and performs 4KB reads and writes using a 75%/25% split within the file, with 64 operations running at a time. The 3:1 ratio represents a typical database workload.

The output is below; the key IOPS numbers are in the read and write lines.

Jobs: 1 (f=1): [m] [100.0% done] [43496K/14671K /s] [10.9K/3667 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31214: Fri May 9 16:01:53 2014
read : io=3071.1MB, bw=39492KB/s, iops=8993 , runt= 79653msec
write: io=1024.7MB, bw=13165KB/s, iops=2394 , runt= 79653msec
cpu : usr=16.26%, sys=71.94%, ctx=25916, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=786416/w=262160/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=3071.1MB, aggrb=39492KB/s, minb=39492KB/s, maxb=39492KB/s, mint=79653msec, maxt=79653msec
WRITE: io=1024.7MB, aggrb=13165KB/s, minb=13165KB/s, maxb=13165KB/s, mint=79653msec, maxt=79653msec
Disk stats (read/write):
vda: ios=786003/262081, merge=0/22, ticks=3883392/667236, in_queue=4550412, util=99.97%

This test shows the array performed 8993 read operations per second and 2394 write operations per second.

Random read performance

To measure random reads, we’ll change the FIO command a bit:

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread

Output:

Jobs: 1 (f=1): [r] [100.0% done] [62135K/0K /s] [15.6K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31181: Fri May 9 15:38:57 2014
read : io=1024.0MB, bw=62748KB/s, iops=19932 , runt= 16711msec
cpu : usr=5.94%, sys=90.13%, ctx=1885, majf=0, minf=89
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1024.0MB, aggrb=62747KB/s, minb=62747KB/s, maxb=62747KB/s, mint=16711msec, maxt=16711msec
Disk stats (read/write):
vda: ios=259063/2, merge=0/1, ticks=951356/20, in_queue=951308, util=96.83%

This test shows the storage array performing 19,932 read operations per second.

Random write performance

Modify the FIO command slightly to use randwrite instead of randread for the random write test.

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

Output:

Jobs: 1 (f=1): [w] [100.0% done] [0K/26326K /s] [0 /6581 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31235: Fri May 9 16:16:21 2014
write: io=1024.0MB, bw=29195KB/s, iops=5434, runt= 35916msec
cpu : usr=77.42%, sys=13.74%, ctx=2306, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=29195KB/s, minb=29195KB/s, maxb=29195KB/s, mint=35916msec, maxt=35916msec
Disk stats (read/write):
vda: ios=0/260938, merge=0/11, ticks=0/2315104, in_queue=2316372, util=98.87%

This test shows the storage scoring 5,434 write operations per second.

Blockchain and Enterprise Storage

The biggest value of blockchain in enterprise storage will be what it enables, not what it is.  While it has yet to be fully embraced by the enterprise, blockchain is well poised to change enterprise IT much like open source software did 20+ years ago.  Interest is steadily rising, and there is evidence that businesses are starting to investigate how blockchain technology will integrate into their future business goals and objectives. In this post I’m going to dive in to what exactly blockchain is, how it works, how it may be applied in the enterprise storage space, and how it’s already starting to be used in various global industries.

What is Blockchain technology?

Blockchain is a distributed ledger that maintains a continuously growing number of data records and transactions. It is a chain of transaction blocks built in adherence to a defined set of rules. It allows organizations who don’t trust each other to agree on database updates. Rather than using a central third party or an offline reconciliation process, Blockchain uses peer-to-peer protocols. As a distributed database, Blockchain provides a near real-time, permanent record that’s replicated among the participants. Bitcoin, probably the most well-known cryptocurrency right now, was possible due to Blockchain. It’s the core of the Bitcoin payment system.

What are the main characteristics of Blockchain?

There are a defined set of characteristics that make blockchain what it is. It is both a network and a database. It has rules and built-in security and it maintains internal integrity and its own history. Let’s take a look at the main characteristics of blockchain.

1. Decentralized.  Blockchain is decentralized; there is no central authority required to approve transactions. It is a system of peer-to-peer validating nodes. Because there are no intermediaries, transactions are made directly and each node maintains the ledger of updates.

2. External clients manage changes.  Changes to the ledger are triggered by transactions proposed by external parties through clients. When triggered by transactions, blockchain participants execute business logic and follow consensus protocols to verify the results.

3. Shared and distributed publicly.  Participants in the ledger maintain the blocks. When consensus is reached under the network’s rules, transactions and their results are grouped into cryptographically secured, immutable data blocks that are appended to the ledger by each participant. All members of the blockchain network can see the same transaction history in the same order.

4. Trusted Transactions.  The nature of the network distribution requires nodes to come to a consensus that enables transactions to be carried out between unknown parties.

5. Secure Transactions.  Strong cryptography is added to each block. In addition to all of its transactions and their results, each block includes a cryptographic hash of the previous block, which ensures that any tampering with a particular block is easily detected (a minimal sketch of this hash chaining follows below). Blockchain provides transaction and data security. The ledger is an unchangeable record. Posts to it cannot be revised or tampered with, even by database operators.
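To make the hash-chaining idea concrete, here is a minimal toy Python sketch (not any particular blockchain implementation) that links blocks by hashing the previous block, then shows how editing an old block breaks every later link:

import hashlib
import json

def block_hash(block):
    # Hash the block's full contents, including the previous block's hash
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Build a tiny three-block chain
chain = []
prev_hash = "0" * 64  # placeholder hash for the genesis block
for txs in (["alice->bob:5"], ["bob->carol:2"], ["carol->alice:1"]):
    block = {"transactions": txs, "prev_hash": prev_hash}
    chain.append(block)
    prev_hash = block_hash(block)

# Tamper with an old block, then re-verify every link in the chain
chain[0]["transactions"][0] = "alice->bob:500"
for i in range(1, len(chain)):
    ok = chain[i]["prev_hash"] == block_hash(chain[i - 1])
    print("link %d -> %d: %s" % (i - 1, i, "intact" if ok else "TAMPERING DETECTED"))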

How Blockchain Works

Consensus in Blockchain

Consensus is at the heart of the blockchain. To keep the integrity of its database, a consensus protocol is used that considers the longest chain to always be the most trustworthy, and nodes are only allowed to add blocks to the chain if they solve an arbitrary mathematical puzzle.   These rules define which changes are allowed to be made to the database, who may make them, and when they can be made. One of the most important aspects of the consensus protocol concerns the rules governing how and when blocks are added to the chain. This is vitally important because in order for blockchains to be useful, they must establish an unchangeable timeline of events which is agreed upon by all nodes, so that all nodes can agree on the current state of the database.  The timeline cannot be subject to censorship, thus no single node may be entrusted with control over what enters it when.

Proof of Work is the original consensus protocol and is used by Bitcoin and Ethereum. Proof of Work is based on puzzles that are difficult to solve but have an easily verifiable solution.  It can be thought of like a jigsaw puzzle.  While many hours of effort may be required to piece a puzzle together, it takes only a momentary glance to see that it has been correctly assembled. With proof of work consensus, the effort required to solve a puzzle is the “work” and the solution is the “proof of work.”  The fact that the solution to the puzzle is known proves that someone did the work to find that solution.
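As a toy illustration of that idea (not Bitcoin's actual algorithm or parameters), the following Python sketch searches for a nonce whose SHA-256 hash starts with a few zero digits; finding the nonce takes many attempts, but anyone else can verify it with a single hash:

import hashlib

def mine(block_data, difficulty=4):
    # The "work": try nonces until the hash has `difficulty` leading zero hex digits
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(("%s%d" % (block_data, nonce)).encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block 42: alice->bob:5")
print("proof of work found: nonce=%d hash=%s" % (nonce, digest))

# The "proof": verification is a single hash, cheap for every other node to check
assert hashlib.sha256(("block 42: alice->bob:5%d" % nonce).encode()).hexdigest() == digest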

Blockchains that utilize proof of work consensus require proof for each new block to be added to the chain, thus requiring work to be done to create new blocks. This work is frequently referred to as mining. Proof of Work consensus protocols state that the chain containing the most blocks is the correct chain because it contains the most work. Blockchains which use proof of work are regarded as secure timelines because if one node attempted to rewrite history by changing an old block, its change would invalidate the work on the block it changed and all blocks after it by making the proofs incorrect.   While experimentation with different consensus mechanisms continues, proof of work is by far the most widely adopted.  There are alternatives, however, so let’s take a brief look at some of them.

Proof-of-Stake.  In proof of stake, participants are required to maintain stocks of the currency (or tokens) to use the system. Creators of a new block are chosen deterministically depending on their stake.

Proof-of-Activity.  In proof of activity, proof of work and proof of stake are used at the same time to help alleviate the issue of hash rate escalation.  Hash rate is the unit that measures how much computing power the Bitcoin network is consuming to remain continuously functional.

Proof-of-Burn.   With proof of burn, instead of trying arbitrarily large numbers of hashes to answer a puzzle as done with the proof of work method, the system runs a lottery and tokens are burned so a node can try to win a block.

Proof-of-Capacity.  Proof of capacity is similar to proof of stake, but it is measured in hardware capacity that is dedicated to the network.

Federated Byzantine Agreement.  This is designed for private, permissioned blockchains (like Hyperledger) where good behavior is an expectation, and it uses less resource-intensive methods. This method offers more flexibility with trust because a fork can be agreed upon by its members.

How can Blockchain be used in Enterprise Storage?

Enterprises that are looking for data access speed and physical security of their files, and businesses that must adhere to strict regulatory requirements about access policies and in-country data location, may have trouble applying the technology. Blockchain doesn’t meet those requirements in a traditional sense, most notably because of the distributed nature of blockchain.  For enterprise environments with less stringent regulatory requirements, it could still be an attractive option. The main benefits relate to its redundancy and reduced cost. The cost savings could be the major driver toward this technology in the enterprise.  Let’s take a look at some of the primary benefits of adopting the technology in the enterprise.

The primary benefits of blockchain in the enterprise

1. Decentralization and Redundancy.  Amazon S3 achieves redundancy by spreading files through all of its regional data centers, which makes each data center a point of failure. On a decentralized blockchain where data is stored on many individual nodes across the globe, it is much more difficult to create disruptions.

2. Privacy.  No third party controls user data or has access to user files. Each node only stores encrypted fragments of user data and users control their own keys.

3. Huge cost reductions.  Blockchain storage costs around $2 per terabyte per month. In comparison, S3 hosting from Amazon can cost over $20 per month per terabyte.

4. The Bottom Line.  Companies are always looking to increase revenues, cut costs, and reduce risks. Blockchain technology has the potential to address those core, bottom line issues.

The Elements of Blockchain in the Enterprise

How can blockchain be implemented in an existing enterprise storage environment?  Steve Todd from Dell EMC started by defining the basic elements of blockchain and the questions that need to be asked, all of which need to be answered in order to implement blockchain solutions in the enterprise. I’ve copied his questions below. It’s very high level, but it’s a good start in establishing a baseline for an enterprise blockchain implementation.

1. New business logic.  What new business logic is being written, and what is its purpose? Will modern application development processes be used to develop the new logic? How will this code be deployed when compared against existing application deployment frameworks? Will your business logic be portable across blockchains?

2. Smart Contracts. How are smart contracts deployed compared to existing application deployment? Are these contracts secure (e.g. encrypted)? Are they well-written? How easy are they to consume? Do they lock-in application developers to a certain platform? Are metrics collected to measure usage? Are access attempts logged securely?

3. Cryptography. Given the liberal use of cryptography within blockchains, which libraries will be used within the underlying ledger? How are these libraries maintained and used across ledgers? What role does cryptography play in different consensus algorithms?

4. Identity / Key Management. The use of private and public keys in a blockchain is foundational. How are these keys managed in comparison to other corporate key management systems? How do corporate identities translate to shared identities with other nodes on a blockchain network?

5. Network Programmability.  How will the network between blockchain nodes be instantiated, tuned, and controlled? How will application SLAs for latency be translated into adequately-performing network operations? Will blockchain transactions be distributed as cleartext or encrypted?

6. Consensus Algorithms.  How will decisions be made to accept/reject transactions? What is the “speed to finality” of these decisions? What are the scalability limits of the consensus algorithm? How much fault tolerance is built into the consensus? How much does performance suffer when fault tolerance limits are reached?

7. Off-chain Storage.  What kind of data assets are recorded within the ledger? Are they consistently referenced? How are access permissions consistently enforced between the ledger and off-chain assets? Do all consensus nodes have the ability to verify all off-chain data assets?

8. Data Protection.  How is data consistency enforced within the ledger? Do corrupted transactions throw an exception? How are corrupted transactions repaired? Does every consensus node always store every single transaction locally? Can deduplication or compression occur? Can snapshot copies of the ledger be created for analysis purposes?

9. Integration with Legacy.  Do the ledger and consensus engine exist on the same converged platform as other business logic? Will there be integration connectors that copy and/or transform the ledger for other purposes? Is the ledger accessible to corporate analytic workspaces?

10. Multi-chain.  How will the ledger interact with the reality of a multi-chain world (e.g. Quorum, Hyperledger, Ethereum, etc.)? How will the ledger interact with non-chain ledgers (e.g. Corda)? Will there be a common API to access different blockchains?

11. Cloud automation.  Can routine blockchain tasks be automated? Will cloud providers offer non-repudiation and/or blockchain governance? Can blockchain app developers execute test/dev processes in one cloud provider environment and then push to a (different) cloud production environment?

Blockchain Cloud Storage in the Marketplace

There are multiple blockchain powered distributed cloud storage offerings that I’m aware of, and there are likely more to come. These organizations are using blockchain technology to take advantage of the spare hard drive space of their users to build decentralized competitors to services like Amazon Web Services and Dropbox.

• Storj
• Filecoin
• Sia
• MaidSafe
• Cryptyk

All of these options provide decentralized cloud-based storage. Customers who use their services allocate a portion of their local storage for cloud-based storage. It’s akin to a decentralized, blockchain-powered version of Amazon Web Services. They all show that a public ledger can be used to facilitate a distributed public cloud, but I think it’s unlikely to be used for mission critical enterprise storage in the near future, at least until some of the basic questions about the elements of blockchain in the enterprise are answered, as I detailed in the previous section.

As cloud based storage becomes more relevant over time, the number of blockchain solutions similar to these projects will surely increase. Blockchain’s decentralization, speed, and reliability give it an inherent advantage over centralized cloud services, which require the storage of data in data centers with high costs and maintenance requirements. Blockchain technology will likely play an increasingly important role in decreasing costs and increasing the security and efficiency of how data storage is implemented.

Blockchain Storage Provider Operations

I thought it would be interesting to take a look at how these existing competitors implement blockchain and how they market their services.  In addition to the security benefits,  overall these decentralized cloud storage providers seem to be marketed as being inexpensive storage for general consumers. A terabyte of storage at Sia costs about $2 per month. Storj charges by gigabyte, starting at $0.015 per gigabyte per month.

Storj, Sia, MaidSafe and Filecoin are each built around their own storage marketplace where users can buy and sell storage space, and they all use mining to provide compute power for the network.

Filecoin miners are given token rewards for hosting files, but they must also prove that they are continuously replicating those files for more secure storage. Miners are rewarded for distributing content quickly as well; the miner that delivers content fastest earns the tokens. Filecoin and Sia both support smart contracts on the blockchain that set the consensus rules and requirements for storage, whereas Storj users simply pay for what they consume.  Filecoin also aims to allow the exchange of its tokens with fiat currencies and other tokens via wallets and exchanges.
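To make that incentive model more concrete, here is a minimal, hypothetical sketch of a storage deal in which a provider must answer random chunk challenges to earn its reward and keep its collateral. It is written as plain Python rather than on-chain code, and it is not Filecoin’s, Sia’s, or Storj’s actual protocol; real networks use Merkle proofs and proof-of-replication rather than raw chunk hashes.

```python
import hashlib
import random

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class StorageDeal:
    """Toy incentivized storage deal (illustrative only, not a real protocol).

    The client keeps only per-chunk hashes; the provider keeps the data and
    must answer random chunk challenges each period to earn the reward and
    keep its collateral.
    """

    CHUNK_SIZE = 256  # bytes per chunk, arbitrary for this example

    def __init__(self, data: bytes, reward: float, collateral: float):
        self.chunk_hashes = [sha256_hex(data[i:i + self.CHUNK_SIZE])
                             for i in range(0, len(data), self.CHUNK_SIZE)]
        self.reward = reward
        self.collateral = collateral
        self.failed = False

    def challenge(self) -> int:
        # The network picks a random chunk index the provider must produce.
        return random.randrange(len(self.chunk_hashes))

    def verify(self, index: int, chunk: bytes) -> bool:
        ok = sha256_hex(chunk) == self.chunk_hashes[index]
        self.failed = self.failed or not ok
        return ok

    def settle(self) -> float:
        # The provider is paid the reward plus its returned collateral only
        # if it never failed a challenge; otherwise the collateral is lost.
        return 0.0 if self.failed else self.reward + self.collateral
```

The point of this shape is that the verifier only needs small per-chunk commitments, not a full copy of the data, to check that the provider is still holding the file.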

In MaidSafe’s network, Safecoin is paid to users as data is retrieved. Rewards are distributed through a lottery system in which a miner is chosen at random, and the amount of Safecoin earned is directly linked to the resources a miner provides and how often its shared storage is available and online.  MaidSafe miners rent their unused compute resources (capacity, CPU, and bandwidth) to the SAFE network and are paid in Safecoin. The SAFE network also supports a marketplace of applications that are paid for with Safecoin, with part of each payment going to the application’s developer.  Miners can also sell the coins they earn for other digital currencies, and these transactions can happen either on the network or directly between individuals.

All of these service providers store data with erasure coding.  Files are split apart and distributed across many locations and servers, which greatly reduces the chance of a single point of failure causing catastrophic data loss. Filecoin uses the IPFS distributed web protocol, allowing nodes to continue to communicate even if the rest of the network goes down.
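As a rough illustration of the idea, here is a toy single-parity scheme in Python: a file is split into k shards plus one XOR parity shard, so any one lost shard can be rebuilt from the others. This is a deliberately simplified (k+1, k) example; the networks above use Reed-Solomon style codes that tolerate the loss of many shards at once.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_with_parity(data: bytes, k: int = 4) -> list:
    """Split data into k equal shards plus one XOR parity shard."""
    if len(data) % k:                                   # pad to equal shard size
        data += b"\x00" * (k - len(data) % k)
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    shards.append(reduce(xor_bytes, shards))            # parity shard
    return shards

def rebuild(shards: list, missing: int) -> bytes:
    """Reconstruct the shard at index `missing` by XOR-ing all the others."""
    return reduce(xor_bytes, (s for i, s in enumerate(shards) if i != missing))

pieces = split_with_parity(b"a file dispersed across many untrusted nodes")
lost = pieces[2]
pieces[2] = None                                        # simulate losing one shard
assert rebuild(pieces, 2) == lost                       # the shard is recoverable
```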

Business Benefits

Blockchain technology can provide a lot of benefits, most notably making interactions faster, safer and less expensive while helping to ensure data security.  Although blockchain technology is primarily associated with the financial industry, blockchain solutions have the potential to be a disruptive force in other business sectors as well.

At a high level, what are the main benefits of blockchain in a business environment?

Fewer Intermediaries.  Blockchain avoids centralized intermediaries by using a peer-to-peer business network.

Faster, More Automated Processes.  Businesses can automate their data exchange and the processes that depend on it, eliminating offline or batch reconciliation. Businesses can automatically trigger actions, events, and even payments based on preset conditions, with the potential for dramatic performance improvements.

Reduced Costs.  Businesses can lower costs by accelerating transactions and eliminating settlement processes, relying on a trusted, shared fabric of common information instead of centralized intermediaries or complex reconciliation processes.

Increased Visibility.  Businesses can gain near real-time visibility into their distributed transactions across their networks, and maintain a shared system of records.

Enhanced Security.  Businesses can reduce fraud and at the same time improve regulatory compliance with tamper-proof business-critical records. They can secure their data by using cryptographically linked blocks, so that records cannot be altered without detection.
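That last point, cryptographically linked blocks, is the core mechanism behind most of the benefits above, so a minimal sketch may help. The Python below is an illustrative toy, not any particular ledger’s implementation: each block stores the hash of the previous block, so changing any historical record breaks the chain and is immediately detectable.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # Hash the block's contents (excluding its own hash field) deterministically.
    payload = json.dumps({k: v for k, v in block.items() if k != "hash"},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain: list, record: dict) -> None:
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "record": record,
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    block["hash"] = block_hash(block)
    chain.append(block)

def verify_chain(chain: list) -> bool:
    """Return False if any record was altered or any link between blocks is broken."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

ledger = []
append_block(ledger, {"tx": "ship 100 units", "party": "A"})
append_block(ledger, {"tx": "receive 100 units", "party": "B"})
assert verify_chain(ledger)
ledger[0]["record"]["tx"] = "ship 90 units"   # tamper with history
assert not verify_chain(ledger)               # the tampering is detected
```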

With that in mind, let’s consider the most likely scenarios for Blockchain implementation in business. How exactly is blockchain technology being used in the industry today, and how may it be used in the future?

Blockchain in the Energy industry

The German company Share&Charge and California-based eMotorWerks announced they are testing the first phase of a peer-to-peer electric vehicle charging network with blockchain payments. The technology has been called an “Airbnb for EVs,” and it will allow EV owners to rent out their charging stations, set their own prices and receive payments via Bitcoin. The project aims to prove that blockchain technology can make sharing and payment easier and more efficient, while decreasing the range anxiety that EV drivers experience.

The companies say that the partnership is the first peer-to-peer charging network in North America to use blockchain technology. The new P2P network was made available in California starting in August 2017, with an expansion to other states planned.

Blockchain technology in Banking and Finance

Blockchain solutions are looking to revolutionize how we transfer funds in a business environment. Because transactions on a blockchain occur without intermediaries or any kind of central authority, a direct payment flow between customers around the world is easily accomplished. Blockchain application development is booming as more and more startups attempt to innovate the payment chain; Abra, a good example of a recent blockchain startup, offers a digital wallet mobile app built on Bitcoin.  There is intense interest in blockchain in the finance sector.  R3 CEV, a New York-based company that runs a consortium of banks, recently released a new version of its Corda blockchain platform that it hopes will make it easier for financial firms to use the technology.  Banks and other financial institutions have been investing in the technology for the past few years in the hope that it can be used to automate some of their back office processes, such as securities settlement and regulatory reporting.

A report from Accenture claimed blockchain technology could potentially reduce infrastructure costs for eight of the world’s ten largest investment banks by an average of 30%, which would result in $8 to $12 billion in annual cost savings. The savings, according to Accenture, would come from replacing the traditionally fragmented database systems that support transaction processing with blockchain’s distributed ledger, allowing banks to reduce or eliminate reconciliation costs and improve data quality.

In addition, Accenture, J.P. Morgan Chase and Microsoft were among 30 companies that announced the formation of the Enterprise Ethereum Alliance, aimed at creating a standard version of the platform for financial transaction processing and tracking.

Blockchain in the Insurance industry

Insurance interest in blockchain appears to be growing. Blockchain has the potential to vastly improve the nature of claims processing and fraud detection in the insurance industry.

Smart contracts on a blockchain could reduce many of the typical issues involved with insurance contracts. Insured individuals usually find insurance contracts long and confusing, and insurance companies are constantly battling fraud. Using blockchain and smart contracts, both sides could benefit from managing claims in a more responsive and transparent way, and recording and verifying contracts on the blockchain would be a great start. When claims are submitted, the blockchain could ensure that only valid claims are paid, as the network would know if multiple claims were submitted for the same accident. When specific criteria are met, a blockchain could trigger payment of the claim without any human intervention, improving the time it takes to resolve claims.
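As a rough sketch of what that could look like, the hypothetical Python below models a parametric-style policy: a payout is triggered automatically when a preset condition is met, and duplicate claims for the same incident are rejected because every node can see the incident was already paid. It is ordinary Python rather than on-chain smart contract code, and it is not any insurer’s actual system.

```python
from dataclasses import dataclass, field

@dataclass
class ParametricPolicy:
    """Toy parametric claims policy (illustrative only, not any insurer's system)."""
    payout: float
    wind_threshold_mph: float = 75.0          # hypothetical preset trigger condition
    paid_incidents: set = field(default_factory=set)

    def submit_claim(self, incident_id: str, measured_wind_mph: float) -> float:
        # A shared ledger lets every node see which incidents were already paid,
        # so duplicate claims for the same incident are rejected automatically.
        if incident_id in self.paid_incidents:
            return 0.0
        # Payment triggers automatically once the preset condition is met,
        # with no manual adjuster step in the loop.
        if measured_wind_mph >= self.wind_threshold_mph:
            self.paid_incidents.add(incident_id)
            return self.payout
        return 0.0

policy = ParametricPolicy(payout=10_000.0)
print(policy.submit_claim("storm-2017-09", measured_wind_mph=82.0))  # 10000.0 paid
print(policy.submit_claim("storm-2017-09", measured_wind_mph=82.0))  # 0.0, duplicate
print(policy.submit_claim("storm-2017-10", measured_wind_mph=40.0))  # 0.0, below trigger
```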

Blockchain also has great potential to detect and prevent fraudulent activity. Because validation is at the core of blockchain technology’s decentralized repository, its historical record can independently verify the validity of customers, policies and transactions.

In the summer of 2017, blockchain firm Bitfury partnered with insurance broker Risk Cooperative. The partnership seeks to combine Bitfury’s expertise in blockchain applications across a range of sectors with Risk Cooperative’s insurance placement platform and its partnership model with leading insurers, with the goal of spurring adoption of blockchain in the insurance space.

Blockchain perspectives in Supply Chain Management

Blockchain has the potential to transform the supply chain and disrupt the way we produce, market, purchase and consume goods. The added transparency and security could make supply chains, and the economies that depend on them, safer and more reliable by promoting trust and discouraging questionable business practices.

Microsoft’s blockchain supply chain group, Project Manifest, is testing the ability to track inventory on cargo ships, trains and trucks using RFID tags that link back to blockchain technologies. Though Microsoft hasn’t shared many details about the project yet, it appears to be working with partners to track items like auto parts across complex, cross-industry supply chains.
IBM offers a service that allows customers to test blockchains in a secure cloud and track high-value items through complex supply chains. The service is being used by Everledger, a firm that is using the blockchain to bring transparency to the diamond supply chain. Finnish startup Kouvola Innovation is working on a blockchain solution that enables smart tendering across the supply chain.

Blockchain smart contracts are being used to cover everything from shipment to receipt of inventory between all parties in various supply chains. Doing so could reduce complexity and the number of counterfeit items that enter the supply chain.

Blockchain in the Healthcare Industry

There are plenty of opportunities to leverage blockchain technology in healthcare, from medical records to the pharmaceutical supply chain to smart contracts for payment distribution. While progress has been slow, there are innovations in the healthcare industry taking place.

MediLedger successfully brings pharmaceutical manufacturers and wholesalers who compete with each other to the same negotiating table. They designed and implemented a process for using blockchain technology to improve tracking and tracing capabilities for prescriptions. They also successfully developed a blockchain solution that allows full privacy with no leaking of business intelligence, while still allowing the capability of drug verification and provenance reporting.

Built to support the requirements of the U.S. Drug Supply Chain Security Act (DSCSA), MediLedger also outlines steps to build an electronic, interoperable system to identify and trace certain prescription drugs, meeting not just the letter of the law but also the operational needs of the industry.

Additional projects have been kicked off by SimplyVitalHealth and Robomed, which focus on developing an audit trail and on smart contracts between healthcare providers and patients, respectively.

Blockchain solutions for Online Voting

Blockchain could be the missing link in the architecture of an effective and secure online voting system, and could resolve major issues related to the privacy, transparency, and security of online voting.

Using blockchain technology, we can make sure that those who are voting are who they say they are and are legally allowed to vote. We can also make voting online more accessible, since anyone who can use a cell phone can understand the technology required to vote. At the same time, it can make the election process more secure than it currently is and allow greater participation for all legally registered voters.

Sovereign was unveiled in September 2017 by Democracy Earth, a not-for-profit organization in Palo Alto, California. It combines liquid democracy, which gives individuals more flexibility in how they use their votes, with blockchains, the digital ledgers of transactions that keep cryptocurrencies like bitcoin secure. Sovereign’s developers hope it could signal the beginning of a democratic system that transcends national borders.

The basic concept of liquid democracy is that voters can express their wishes on an issue directly or delegate their vote to someone else they think is better placed to decide on their behalf. In turn, those delegates can pass the votes they hold further up the chain. Crucially, users can see how their delegate voted and reclaim their vote to use themselves.  Sovereign sits on existing blockchain software platforms, such as Ethereum, but instead of producing units of cryptocurrency it creates a finite number of tokens called “votes”. These are assigned to registered users, who can vote as part of organizations that set themselves up on the network, whether that is a political party, a municipality, a country or even a co-operatively run company.

No knowledge of blockchains is required – voters simply use an app. Votes are then “dripped” into their accounts over time like a universal basic income of votes. Users can debate with each other before deciding which way to vote.
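To make the mechanics concrete, here is a simplified, hypothetical Python sketch of the delegation model described above: a voter can vote directly, delegate to someone else (delegation is transitive), and reclaim the vote at any time by voting directly. The token and blockchain layers are deliberately omitted, and this is not Sovereign’s actual code.

```python
class LiquidVote:
    """Simplified liquid-democracy tally: vote directly, delegate, or reclaim."""

    def __init__(self, voters):
        self.voters = set(voters)
        self.delegate_of = {}          # voter -> chosen delegate
        self.direct_vote = {}          # voter -> direct choice

    def delegate(self, voter, delegate):
        self.direct_vote.pop(voter, None)
        self.delegate_of[voter] = delegate

    def vote(self, voter, choice):
        # Casting a direct vote reclaims it from any delegate.
        self.delegate_of.pop(voter, None)
        self.direct_vote[voter] = choice

    def _resolve(self, voter, seen=None):
        # Follow the delegation chain (guarding against cycles) to a direct vote.
        seen = seen or set()
        if voter in seen:
            return None
        seen.add(voter)
        if voter in self.direct_vote:
            return self.direct_vote[voter]
        if voter in self.delegate_of:
            return self._resolve(self.delegate_of[voter], seen)
        return None                    # abstained

    def tally(self):
        counts = {}
        for voter in self.voters:
            choice = self._resolve(voter)
            if choice is not None:
                counts[choice] = counts.get(choice, 0) + 1
        return counts

poll = LiquidVote(["alice", "bob", "carol"])
poll.vote("alice", "yes")
poll.delegate("bob", "alice")          # bob's vote follows alice
poll.delegate("carol", "bob")          # delegation is transitive
print(poll.tally())                    # {'yes': 3}
poll.vote("carol", "no")               # carol reclaims her vote
print(poll.tally())                    # {'yes': 2, 'no': 1}
```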

Blockchain usage in Stock Trading

Some of the most prominent stock exchanges are looking at ways to leverage blockchain to fundamentally overhaul traditional mechanisms. Blockchain could enable savings by reducing duplication of processes, settlement time, collateral requirements and operational overhead. That would minimize the financial resources firms must set aside to cover counterparty risk, while also supporting higher anti-money laundering standards and reducing risk exposure.

Nasdaq has been at the forefront of blockchain innovation. At the turn of 2015, Nasdaq unveiled the use of its Nasdaq Linq blockchain ledger technology to successfully complete and record private securities transactions for Chain.com—the inaugural Nasdaq Linq client. In May, Nasdaq and Citi announced an integrated payment solution using a distributed ledger to record and transmit payment instructions based on Chain’s blockchain technology. The technology overcomes challenges of liquidity in private securities by streamlining payment transactions between multiple parties.

Blockchain could revolutionize the core infrastructure systems of capital markets around the globe, bringing greater transparency and efficiency, but the path to adoption will require resolving issues such as scalability, common standards, regulation, and legislation.