5    Overview of Cloud Storage

5.1   Introduction

When discussing cloud storage and standards, it is important to distinguish the various resources that are being offered as services. These resources are exposed to clients as functional interfaces (i.e., data paths) and are managed by management interfaces (i.e., control paths). This international standard explores the various types of interfaces that are part of offerings today and shows how they are related. This international standard proposes a model for the interfaces that may be mapped to the various offerings and a model that forms the basis for cloud storage interfaces into the future.

Another important concept in this international standard is that of metadata. When managing large amounts of data with differing requirements, metadata is a convenient mechanism to express those requirements in such a way that underlying data services may differentiate their treatment of the data to meet those requirements.

The appeal of cloud storage is due to some of the same attributes that define other cloud services: pay as you go, the illusion of infinite capacity (elasticity), and the simplicity of use/management. It is therefore important that any interface for cloud storage support these attributes, while allowing for a multitude of business cases and offerings.

5.2   What is Cloud Storage?

The use of the term “cloud” in describing these new models arose from architecture drawings that typically used a cloud as the icon for a network. The cloud represents any-to-any network connectivity in an abstract way. In this abstraction, the network connectivity in the cloud is represented without concern for how it is made to happen.

The cloud abstraction of complexity produces a simple base upon which other features can be built. The general cloud model extends this base by adding a pool of resources that is drawn from, on demand, in small increments. A relatively recent innovation that has made this possible is virtualization.

Thus, cloud storage is simply the delivery of virtualized storage on demand. The formal term that is used for this is Data Storage as a Service (DaaS).

5.3   Data Storage as a Service

By abstracting data storage behind a set of service interfaces and delivering it on demand, a wide range of actual offerings and implementations is possible. The only type of storage that is excluded from this definition is that which is delivered in fixed-capacity increments rather than on demand.

An important part of any DaaS offering is the support of legacy clients. Support is accommodated with existing standard protocols such as iSCSI (and others) for block and CIFS/NFS or WebDAV for file network storage, as shown in Figure 1.

Figure 1 - Existing Data Storage Interface Standards

The difference between the purchase of a dedicated appliance and that of cloud storage is not the functional interface, but the fact that the storage is delivered on demand. The customer pays for either what they actually use or what they have allocated for use. In the case of block storage, a Logical Unit Number (LUN), or virtual volume, is the granularity of allocation. For file protocols, a file system is the unit of granularity. In either case, the actual storage space may be thin provisioned and billed for based on actual usage. Data services, such as compression and deduplication, may be used to further reduce the actual space consumed.

Managing this storage is typically done out of band for these standard data storage interfaces, either through an API, or more commonly, through an administrative browser-based user interface. This out-of-band interface may be used to invoke other data services as well (e.g., snapshot and cloning).

In this model, the underlying storage space exposed by the out-of-band interfaces is abstracted and exposed using the notion of a container. A container is not only a useful abstraction for storage space, but also serves as a grouping of the data stored in it and a point of control for applying data services in the aggregate.

Another type of DaaS offering is one of simple table space storage, allowing for horizontal scaling of database operations that certain applications need. Rather than virtualizing relational database instances, table space storage offers a new data storage interface that emphasizes scalability while placing known limits on functionality. Scalability allows the tables to be partitioned across multiple database nodes based on common key values. This model provides horizontal scalability at the expense of functions that may typically only be implemented by a vertically-scaled relational database.

Each data object is created, retrieved, updated, and deleted as a separate resource. In this type of interface, a container, if used, is a simple grouping of data objects for convenience. Nothing prevents the concept of containers from being hierarchical, although any given implementation might support only a single level. The type of container defined in this international standard is called a “soft” container, as shown in Figure 2.

Figure 2 - Storage Interfaces for Object Storage Client Data

5.4   Data Management for Cloud Storage

Many of the initial offerings of cloud storage focused on a kind of “best effort” quality of storage service and ignored most other types of data services. To address the needs of enterprise applications with cloud storage, however, there is an increasing need to offer better quality of service and to deploy additional data services.

Cloud storage may lose its abstraction and simplicity benefits if new data services that require complex management are added. Cloud storage customers are likely to resist new demands on their time (e.g., setting up backup schedules through dedicated interfaces, deploying data services individually for data elements).

The SNIA Storage Industry Resource Domain Model (SIRDM) provides a way to address the need for cloud storage to remain simple (see Figure 3 and SIRDM). By using the different types of metadata discussed in the SIRDM for a cloud storage interface, an interface may be created that allows offerings to meet the requirements of the data without adding unnecessary complexity to the management of that data.

Figure 3 - Cloud Storage Usage of SIRDM Model

By supporting metadata in a cloud storage interface and prescribing how the storage system and data system metadata is interpreted to meet the requirements of the data, the simplicity required by the cloud storage model may be maintained while still addressing the requirements of enterprise applications and their data.

User metadata is retained by the cloud and may be used to find the data objects and containers by performing a query for specific metadata values. The schema for this metadata may be determined by each application, domain, or user. For more information on support for user metadata, see 16.2.

Storage system metadata is produced/interpreted by the cloud offering and basic storage functions (e.g., modification and access statistics, access control). For more information on support for storage system metadata, see 16.3.

Data system metadata is interpreted by the cloud offering as data requirements that control the operation of underlying data services for that data. It may apply to an aggregation of data objects in a container or to individual data objects, if the offering supports this level of granularity. For more information on support for data system metadata, see 16.4.

The SIRDM (see SIRDM) defines information services as services that understand the context of the data. Information services are thus able to determine the requirements of the data and automatically mark the data system metadata for that data.

5.5   Data and Container Management

There is no reason that managing data and managing containers should involve different interfaces. Therefore, the use of metadata is extended from applying to individual data elements to applying to containers of data as well. Thus, any data placed into a container inherits the data system metadata of the container into which it was placed. When creating a new container within an existing container, the new container would similarly inherit the metadata settings of its parent's data system metadata. After a data element is created, the data system metadata may be overridden at the container or individual data element level, as desired.
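The inheritance rule described above can be sketched as follows. This is a minimal illustration, not part of the interface: the `Node` class, the `effective_data_system_metadata` function, and the metadata names `example_redundancy` and `example_retention` are all hypothetical and chosen only to show how a value set on a container flows down to its contents unless overridden.

```python
class Node:
    """A container or data element in a hypothetical in-memory hierarchy."""

    def __init__(self, name, parent=None, overrides=None):
        self.name = name
        self.parent = parent
        # Data system metadata set explicitly on this node (may be empty).
        self.overrides = dict(overrides or {})


def effective_data_system_metadata(node):
    """Merge metadata from the outermost container down to the given node."""
    chain = []
    current = node
    while current is not None:
        chain.append(current)
        current = current.parent
    merged = {}
    # Apply the outermost container first so that nearer settings win.
    for item in reversed(chain):
        merged.update(item.overrides)
    return merged


# Hypothetical metadata names, used only for illustration.
root = Node("root", overrides={"example_redundancy": "2", "example_retention": "30d"})
projects = Node("projects", parent=root, overrides={"example_redundancy": "3"})
report = Node("report.txt", parent=projects)

inherited = effective_data_system_metadata(report)
```

Here `report.txt` sets nothing itself, so it takes the retention value from the root container and the redundancy value overridden by its immediate parent.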

Even if the provided interface does not support setting metadata on individual data elements, metadata may still be applied to the containers. In such a case, the interface does not provide a mechanism to override metadata that an individual data element inherits from its parent container. For file-based interfaces that support extended attributes (e.g., CIFS, NFSv4), these extended attributes may be used to specify the data system metadata to override that specified for the container.

5.6   Reference Model for Cloud Storage Interfaces

The Cloud Storage Reference Model is shown in Figure 4.

Figure 4 - Cloud Storage Reference Model

This model shows multiple types of cloud data storage interfaces that are able to support both legacy and new applications. All of the interfaces allow storage to be provided on demand, drawn from a pool of resources. The storage capacity is drawn from a pool of storage capacity provided by storage services. The data services are applied to individual data elements, as determined by the data system metadata. Metadata specifies the data requirements on the basis of individual data elements or on groups of data elements (containers).

5.7   Cloud Data Management Interface

The Cloud Data Management Interface (CDMI™) shown in Figure 4 may be used to create, retrieve, update, and delete objects in a cloud. The features of the CDMI include functions that:

   allow clients to discover the capabilities available in the cloud storage offering,

   manage containers and the data that is placed in them, and

   allow metadata to be associated with containers and the objects they contain.

This specification divides operations into two types: those that use a CDMI content type in the HTTP body and those that do not. While much of the same data is available via both types, providing both allows for CDMI-aware clients and non-CDMI-aware clients to interact with a CDMI provider.

CDMI may also be used by administrative and management applications to manage containers, domains, security access, and monitoring/billing information, even for storage that is functionally accessible by legacy or proprietary protocols. The capabilities of the underlying storage and data services are exposed so that clients may understand the offering.

Conformant cloud offerings may support a subset of the CDMI, as long as they expose the limitations in the capabilities reported via the interface.

This international standard uses RESTful principles in the interface design where possible (see REST).

CDMI defines both a means to manage the data and a means to store and retrieve the data. The means by which the storage and retrieval of data is achieved is termed the "data path". The means by which the data is managed is termed the "control path". CDMI specifies both a data path and a control path interface.

CDMI does not need to be used as the only data path and is able to manage cloud storage properties for any data path interface (e.g., standardized or vendor specific).

Container metadata is used to configure the data requirements of the storage provided through the exported protocol (e.g., block protocol or file protocol) that the container exposes. When an implementation is based on an underlying file system to store data for a block protocol (e.g., iSCSI), the CDMI container provides a useful abstraction for representing the data system metadata for the data and the structures that govern the exported protocols.

A cloud offering may also support domains that allow administrative ownership to be associated with stored objects. Domains allow the standard to (among other things):

   determine how user credentials are mapped to principals used in an Access Control List (ACL),

   allow granting of special cloud-related privileges, and

   allow delegation to external user authorization systems (e.g., LDAP or Active Directory).

Domains may also be hierarchical, allowing for corporate domains with multiple children domains for departments or individuals. The domain concept is also used to aggregate usage data that is used to bill, meter, and monitor cloud use.

Finally, capabilities allow a client to discover the capabilities of a CDMI implementation.

5.8   Object Model for CDMI

The model for CDMI is shown in Figure 5.

Figure 5 - CDMI Object Model

For data storage operations, the client of the interface only needs to know about container objects and data objects. All data path implementations are required to support at least one level of containers (see 5.5). Using the CDMI object model (see Figure 5), the client may send a PUT via CDMI (see 5.6) to the new container URI and create a new container with the specified name. Container metadata are optional and are expressed as a series of name-value pairs. After a container is created, a client may send a PUT to create a data object within the newly created container. A subsequent GET will fetch the data object and its value.
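The sequence above (PUT a container, PUT a data object, GET it back) might be sketched as below. The sketch only builds request descriptions and sends nothing over the network. The `cdmi_request` helper is hypothetical, and the `application/cdmi-object` content type and the `mimetype` and `value` body fields are assumptions drawn from the wider CDMI specification rather than from this clause; the container content type and version header are taken from the example in 5.13.4.

```python
import json

SPEC_VERSION = "1.0.1"


def cdmi_request(method, path, media_type, body=None):
    """Describe a CDMI content-type request as a plain dictionary."""
    headers = {
        "Accept": media_type,
        "X-CDMI-Specification-Version": SPEC_VERSION,
    }
    payload = None
    if body is not None:
        # A Content-Type header is supplied only when a request body is present.
        headers["Content-Type"] = media_type
        payload = json.dumps(body)
    return {"method": method, "path": path, "headers": headers, "body": payload}


# 1. Create a container; metadata is optional name-value pairs.
create_container = cdmi_request(
    "PUT", "/MyContainer/", "application/cdmi-container",
    {"metadata": {"project": "demo"}},
)

# 2. Create a data object inside the new container (assumed body fields).
create_object = cdmi_request(
    "PUT", "/MyContainer/MyDataObject.txt", "application/cdmi-object",
    {"mimetype": "text/plain", "value": "Hello"},
)

# 3. Fetch the data object and its value.
fetch_object = cdmi_request(
    "GET", "/MyContainer/MyDataObject.txt", "application/cdmi-object",
)
```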

Queue objects are also defined (see Figure 5) and have special properties for in-order, first-in, first-out creation and fetching of queue values. More information on queues may be found in Clause 11.

5.9   CDMI Metadata

CDMI uses many different types of metadata, including HTTP metadata, data system metadata, user metadata, and storage system metadata.

HTTP metadata is metadata that is related to the use of the HTTP protocol (e.g., content-length, content-type). HTTP metadata is not specifically related to this international standard but needs to be discussed to explain how CDMI uses the HTTP standard.

CDMI data system metadata, user metadata, and storage system metadata are defined in the form of name-value pairs. Vendor-defined data system metadata and storage system metadata names shall begin with the reverse domain name of the vendor.

Data system metadata is metadata that is specified by a CDMI client and is a component of objects. Data system metadata abstractly specifies the data requirements associated with data services that are deployed in the cloud storage system.

User metadata consists of arbitrarily defined JSON strings that are specified by the CDMI client and is a component of objects. The namespace used for user metadata names is self-administered (e.g., using the reverse domain name), and user metadata names shall not begin with the prefix “cdmi_”.

Storage system metadata is metadata that is generated by the storage services in the system (e.g., creation time, size) to provide useful information to a CDMI client.

Table 2 - Creation/Consumption of Storage System Metadata

                      Created by User         Created by System
Consumed by User      User metadata           Storage system metadata
Consumed by System    Data system metadata    N/A

The matrix of the creation and consumption of storage system metadata is shown in Table 2.
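The matrix in Table 2 and the naming rule for user metadata can be expressed as a small lookup. This is a sketch only: the `classify_metadata` and `is_valid_user_metadata_name` functions and the `"user"`/`"system"` labels are hypothetical names introduced for illustration.

```python
# (created by, consumed by) -> metadata type, following Table 2.
METADATA_MATRIX = {
    ("user", "user"): "user metadata",
    ("system", "user"): "storage system metadata",
    ("user", "system"): "data system metadata",
    # ("system", "system") is N/A in Table 2 and is deliberately absent.
}


def classify_metadata(created_by, consumed_by):
    """Return the metadata type for a creator/consumer pair, or None for N/A."""
    return METADATA_MATRIX.get((created_by, consumed_by))


def is_valid_user_metadata_name(name):
    """User metadata names shall not begin with the reserved prefix 'cdmi_'."""
    return not name.startswith("cdmi_")
```

For example, `classify_metadata("user", "system")` yields the data system metadata cell, and a name such as `com.example.project` passes the prefix check while `cdmi_size` does not.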

5.10   Object ID

Every object stored within a CDMI-compliant system shall have a globally unique object identifier (ID) assigned at creation time. The CDMI object ID is a string with requirements for how it is generated and how it obtains its uniqueness. Each offering that implements CDMI is able to produce these identifiers without conflicting with other offerings.

Every cloud storage system shall allow object ID-based access to stored objects by allowing the object's ID to be appended to the root container URI. If the data object "MyDataObject.txt" has an object ID of "00006FFD001001CCE3B2B4F602032653", the following pair of URIs access the same data object:

http://cloud.example.com/root/MyDataObject.txt

http://cloud.example.com/root/cdmi_objectid/00006FFD001001CCE3B2B4F602032653

If containers are supported, they shall also be accessible by object ID. If the container "MyContainer" has an object ID of "00006FFD0010AA33D8CEF9711E0835CA", each of the following pairs of URIs accesses the same object (first the container itself, then a data object within it):

http://cloud.example.com/MyContainer/

http://cloud.example.com/cdmi_objectid/00006FFD0010AA33D8CEF9711E0835CA/

http://cloud.example.com/MyContainer/MyDataObject.txt

http://cloud.example.com/cdmi_objectid/00006FFD0010AA33D8CEF9711E0835CA/MyDataObject.txt
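The object ID-based URIs shown above follow a simple pattern that can be sketched as a string-building helper. The `object_id_uri` function and its parameter names are hypothetical; only the `cdmi_objectid` path component and the example values come from the text.

```python
def object_id_uri(root_uri, object_id, suffix=""):
    """Append 'cdmi_objectid/<object ID>' to the root container URI.

    suffix is "/" when addressing a container by ID, or "/<name>" when
    addressing a child of a container that is itself addressed by ID.
    """
    return f"{root_uri.rstrip('/')}/cdmi_objectid/{object_id}{suffix}"


CONTAINER_ID = "00006FFD0010AA33D8CEF9711E0835CA"

container_by_id = object_id_uri("http://cloud.example.com/", CONTAINER_ID, "/")
child_by_id = object_id_uri(
    "http://cloud.example.com/", CONTAINER_ID, "/MyDataObject.txt"
)
```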

5.11   CDMI Object ID Format

The offering shall create the object ID, which identifies an object. The object ID shall be globally unique and shall conform to the format defined in Figure 6. The native format of an object ID is a variable-length byte sequence and shall be a maximum length of 40 bytes. An application should treat object IDs as opaque byte strings. However, the object ID format is defined such that its integrity may be validated, and independent offerings may assign unique object ID values independently.

Byte offset    Field
0              Reserved (zero)
1-3            Enterprise Number
4              Reserved (zero)
5              Length
6-7            CRC
8-39           Opaque Data

Figure 6 - Object ID Format

The fields shown in Figure 6 are defined as follows.

   The reserved bytes shall be set to zero.

   The Enterprise Number field shall be the SNMP enterprise number of the offering organization that created the object ID, in network byte order. See RFC 2578 and http://www.iana.org/assignments/enterprise-numbers. 0 is a reserved value.

   The Length field (byte 5 in Figure 6) shall contain the full length of the object ID, in bytes.

   The CRC field shall contain a 2-byte (16-bit) CRC in network byte order. The CRC field enables the object ID to be validated for integrity. The CRC field shall be generated by running the algorithm (see CRC) across all bytes of the object ID, as defined by the Length field, with the CRC field set to zero. The CRC function shall have the following fields:

   Name    : "CRC-16",

   Width    : 16,

   Poly    : 0x8005,

   Init    : 0x0000,

   RefIn    : True,

   RefOut    : True,

   XorOut    : 0x0000, and

   Check    : 0xBB3D.

This function defines a 16-bit CRC with polynomial 0x8005, reflected input, and reflected output. This CRC-16 is specified in CRC.

   Opaque data in each object ID shall be unique for a given Enterprise Number.

The native format for an object ID is binary. When necessary, such as when included in URIs and JSON strings, the object ID textual representation shall be encoded using Base16 encoding rules described in RFC 4648 and shall be case insensitive.
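The object ID rules above (field layout, CRC-16 over the whole ID with the CRC field zeroed, and Base16 text form) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: all function names are hypothetical, the byte offsets are those of Figure 6 (Enterprise Number in bytes 1-3, Length in byte 5, CRC in bytes 6-7), and the enterprise number and opaque bytes in the example are simply those that appear in the sample object ID in 5.10.

```python
HEADER_LENGTH = 8   # bytes 0-7: reserved, enterprise number, reserved, length, CRC
MAX_LENGTH = 40     # maximum object ID length in bytes


def crc16(data):
    """CRC-16 with poly 0x8005, init 0, reflected in/out, no final XOR."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xA001 is the bit-reversed form of 0x8005 (reflected algorithm).
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def build_object_id(enterprise_number, opaque):
    """Assemble an object ID and fill in its Length and CRC fields."""
    length = HEADER_LENGTH + len(opaque)
    if not 0 < enterprise_number < 2 ** 24:
        raise ValueError("enterprise number must be nonzero and fit in 3 bytes")
    if length > MAX_LENGTH:
        raise ValueError("object ID would exceed 40 bytes")
    header = bytes([0]) + enterprise_number.to_bytes(3, "big") + bytes([0, length])
    # The CRC is computed over the full ID with the CRC field set to zero.
    crc = crc16(header + b"\x00\x00" + opaque)
    return header + crc.to_bytes(2, "big") + opaque


def validate_object_id(object_id):
    """Check reserved bytes, the Length field, and the CRC field."""
    if not HEADER_LENGTH <= len(object_id) <= MAX_LENGTH:
        return False
    if object_id[0] != 0 or object_id[4] != 0 or object_id[5] != len(object_id):
        return False
    zeroed = object_id[:6] + b"\x00\x00" + object_id[8:]
    return crc16(zeroed) == int.from_bytes(object_id[6:8], "big")


def to_text(object_id):
    """Base16 (RFC 4648) text form, as used in URIs and JSON strings."""
    return object_id.hex().upper()


def from_text(text):
    """Parse the Base16 text form; upper and lower case are both accepted."""
    return bytes.fromhex(text)


example_id = build_object_id(0x006FFD, bytes.fromhex("E3B2B4F602032653"))
```

Applications are still expected to treat object IDs as opaque; a check like `validate_object_id` is only useful for detecting corruption or truncation.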

5.12   Security

Security, in the context of CDMI, refers to the protective measures employed in managing and accessing data and storage. The specific objectives to be addressed by security include:

   provide a mechanism that assures that the communications between a CDMI client and server may not be read or modified by a third party;

   provide a mechanism that allows CDMI clients and servers to provide an assurance of their identity;

   provide a mechanism that allows control of the actions a CDMI client is permitted to perform on a CDMI server;

   provide a mechanism for records to be generated for actions performed by a CDMI client on a CDMI server;

   provide mechanisms to protect data at rest;

   provide a mechanism to eliminate data in a controlled manner; and

   provide mechanisms to discover the security capabilities of a particular implementation.

Security measures within CDMI may be summarized as

   transport security,

   user and entity authentication,

   authorization and access controls,

   data integrity,

   data and media sanitization,

   data retention,

   protections against malware,

   data at-rest encryption, and

   security capabilities.

With the exception of both the transport security and the security capabilities, which are mandatory to implement, the security measures may vary significantly from implementation to implementation.

When security is a concern, the CDMI client should begin with a series of security capability lookups (see 12.1.1) to determine the exact nature of the security features that are available. Based on the values of these capabilities, a risk-based decision should be made as to whether the CDMI server should be used. This is particularly true when the data to be stored in the cloud storage is sensitive or regulated in a way that requires stored data to be protected (e.g., encrypted) or handled in a particular manner (e.g., full accountability and traceability of management and access).

HTTP is the mandatory transport mechanism, and HTTP over TLS (i.e., HTTPS) is the mechanism used to secure the communications between CDMI clients and servers. To ensure both security and interoperability, all CDMI implementations shall implement the Transport Layer Security (TLS) protocol as described in Annex A, but its use by CDMI clients and servers is optional.

5.13   Required HTTP Support

5.13.1   RFC 2616 Support Requirements

A conformant implementation of CDMI shall also be a conformant implementation of HTTP 1.1 (see RFC 2616). The subclauses below list sections of RFC 2616 that shall be supported; however, this list is not comprehensive.

5.13.2   Content-Type Negotiation

A client may optionally supply an HTTP Accept header. If a request body is present, the client shall provide a Content-Type header. If a response body is present, the server shall provide a Content-Type header. If the client does not provide a Content-Type header when required, the server shall return a 406 Not Acceptable status code. See Section 12 of RFC 2616.
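A server-side check for the Content-Type rule above might look like the following sketch. The `check_request_content_type` function is hypothetical, and it models only the single rule stated in this subclause; a real server would apply further negotiation logic.

```python
def check_request_content_type(headers, body):
    """Return 406 if a request body is present without a Content-Type header.

    Returns None when the request passes this particular check.
    """
    has_body = body is not None and len(body) > 0
    # HTTP header names are case insensitive.
    has_content_type = any(name.lower() == "content-type" for name in headers)
    if has_body and not has_content_type:
        return 406  # Not Acceptable, as required by the text above
    return None
```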

5.13.3   Range Support

The server shall support HTTP Range headers and partial content responses (see Section 14.16 of RFC 2616).
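As a rough illustration of partial content handling, the sketch below applies a single byte range to a stored value. It is deliberately simplified: the `apply_range` function is hypothetical, only the `bytes=first-last` and `bytes=first-` forms are handled, and any other form is treated as if no Range header had been sent.

```python
import re


def apply_range(value, range_header):
    """Return (status, body, content_range) for a single byte range request."""
    size = len(value)
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        # Unsupported or malformed range: serve the whole value.
        return 200, value, None
    first = int(match.group(1))
    if match.group(2) and int(match.group(2)) < first:
        # A last position before the first is invalid; ignore the header.
        return 200, value, None
    if first >= size:
        # The range starts beyond the end of the value.
        return 416, b"", f"bytes */{size}"
    last = int(match.group(2)) if match.group(2) else size - 1
    last = min(last, size - 1)
    return 206, value[first:last + 1], f"bytes {first}-{last}/{size}"
```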

5.13.4   URI Escaping

Percent escaping of reserved characters specified in RFC 3986 shall be applied to all text strings used in URIs. This includes user-supplied field names, metadata names, object names, container names and domain names when used in URIs.

Field names and values contained within the request body and response body shall not be escaped.

EXAMPLE    A client retrieving a metadata item named "@user" from a container object with the name of "@MyContainer" would perform the following request:

GET /%40MyContainer/?objectName;metadata:%40user HTTP/1.1
Host: cloud.example.com
Accept: application/cdmi-container
X-CDMI-Specification-Version: 1.0.1

The response shall be:

HTTP/1.1 200 OK
Content-Type: application/cdmi-container
X-CDMI-Specification-Version: 1.0.1

{
   "objectName": "@MyContainer",
   "metadata": {
       "@user": "test"
   }
}
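The escaping in this example can be reproduced with a standard percent-encoding routine. The sketch below uses Python's `urllib.parse.quote`; the `escape_for_uri` and `metadata_query_path` helpers are hypothetical names introduced for illustration. Names are escaped only when placed in the URI; the same names appear unescaped in the JSON body.

```python
from urllib.parse import quote


def escape_for_uri(text):
    """Percent-escape a name for use in a URI, leaving no reserved character as is."""
    return quote(text, safe="")


def metadata_query_path(container_name, metadata_name):
    """Build the request path used to fetch one metadata item of a container."""
    return (
        f"/{escape_for_uri(container_name)}/"
        f"?objectName;metadata:{escape_for_uri(metadata_name)}"
    )


path = metadata_query_path("@MyContainer", "@user")
```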

5.14   Time Representations

Unless otherwise specified, all date/time values are in the ISO 8601:2004 extended representation (YYYY-MM-DDThh:mm:ss.ssssssZ). The full precision shall be specified, the sub-second separator shall be a ".", the Z UTC zone indicator shall be included, and all timestamps shall be in UTC time zone. The YYYY-MM-DDT24:00:00.000000Z hour shall not be used, and instead, it shall be represented as YYYY-MM-DDT00:00:00.000000Z.

Unless otherwise specified, all date/time intervals are in the ISO 8601:2004 start date/end date representation (YYYY-MM-DDThh:mm:ss.ssssssZ/YYYY-MM-DDThh:mm:ss.ssssssZ). The end-date shall be equal to or later than the start-date. The full precision shall be specified, the sub-second separator shall be a ".", the Z UTC zone indicator shall be included, and all timestamps shall be in UTC time zone. The YYYY-MM-DDT24:00:00.000000Z hour shall not be used, and instead, it shall be represented as YYYY-MM-DDT00:00:00.000000Z.
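A minimal sketch of producing these representations with Python's `datetime` module is shown below. The `cdmi_timestamp` and `cdmi_interval` function names are hypothetical. Because `datetime` never produces an hour of 24, the rule against the 24:00:00 form is satisfied without extra handling.

```python
from datetime import datetime, timedelta, timezone


def cdmi_timestamp(moment):
    """Format an aware datetime as YYYY-MM-DDThh:mm:ss.ssssssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the time zone is unknown")
    # %f always yields six digits, giving the full sub-second precision.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def cdmi_interval(start, end):
    """Format a start/end interval; the end shall not precede the start."""
    start_text = cdmi_timestamp(start)
    end_text = cdmi_timestamp(end)
    if end < start:
        raise ValueError("end date is earlier than start date")
    return f"{start_text}/{end_text}"


# A local time two hours ahead of UTC is converted before formatting.
local = datetime(2024, 1, 2, 5, 4, 5, 6, tzinfo=timezone(timedelta(hours=2)))
stamp = cdmi_timestamp(local)
```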

5.15   Backwards Compatibility: Value Transfer Encoding

CDMI version 1.0.1 introduces the concept of value transfer encoding to enable the storage and retrieval of arbitrary binary data via CDMI content-type operations. Data objects created by CDMI 1.0 clients through CDMI content-type operations shall have a value transfer encoding of "utf-8", and data objects created through non-CDMI content-type operations shall have a value transfer encoding of "base64".

Data objects with a value transfer encoding of "base64" shall not have their value field accessible to CDMI 1.0 clients through CDMI content-type operations. Attempts to read the value of these objects shall return an empty value field ("") to these clients. CDMI 1.0 clients can detect this condition when the cdmi_size metadata is not 0 and the value field is empty.
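The detection condition above, and the decoding a newer client would perform, can be sketched as follows. Both function names are hypothetical, and the JSON field name `valuetransferencoding` is an assumption taken from the wider CDMI specification; the `value` field and `cdmi_size` metadata are named in the text above.

```python
import base64


def value_hidden_from_1_0_client(data_object):
    """Detect a non-empty object whose value was withheld from a 1.0 client."""
    size = int(data_object.get("metadata", {}).get("cdmi_size", 0))
    return size != 0 and data_object.get("value", "") == ""


def decode_value(data_object):
    """Return the stored bytes according to the value transfer encoding."""
    # Field name assumed; objects without it are treated as "utf-8".
    encoding = data_object.get("valuetransferencoding", "utf-8")
    value = data_object["value"]
    if encoding == "base64":
        return base64.b64decode(value)
    return value.encode("utf-8")


# What a 1.0 client might see for a binary object of 4 bytes.
seen_by_old_client = {"metadata": {"cdmi_size": "4"}, "value": ""}
# What a 1.0.1 client might see for the same object.
seen_by_new_client = {
    "metadata": {"cdmi_size": "4"},
    "valuetransferencoding": "base64",
    "value": "AAEC/w==",
}
```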