More about Centera

As explained on the Centera Overview page, Centera uses standard SATA hard drives as the storage medium for data objects. The sections below give more detail on Centera, its architecture, and how applications communicate with the device to read and write data.

Centera Hardware Architecture

Centera is built as a Redundant Array of Independent Nodes (RAIN). Each node contains a CPU, a network interface, and four SATA hard drives for storage, and is interconnected with all other nodes in the cabinet via a private LAN. Each node runs an instance of CentraStar, the Centera operating software, in one of two operational modes, acting as either a storage node or an access node.

Storage nodes provide the physical storage of data objects and C-Clip Descriptor Files (CDFs), and access nodes provide the means for interaction between the application server and the storage nodes. The application’s throughput and storage requirements usually determine how many access nodes versus storage nodes are configured at installation.

As faster CPUs and larger-capacity SATA drives came to market, newer generations of Centera began allowing a node to act as a storage node and an access node concurrently.

How Centera provides WORM functionality

The C-Clip Content Address of a data object assures the authenticity of that object. If an object is retrieved and altered, the Centera API produces a new CDF with a new content address for the altered object. The original object remains in its original form at its original content address and is still accessible by its original address.

This feature of Centera provides a level of versioning integrity that standard file servers and operating systems cannot provide. Additionally, Centera features an operational mode in which an object cannot be deleted before the expiration date of a defined retention period. These non-rewriteable and non-erasable properties give Centera the Write-Once-Read-Many attributes required for compliance with SEC 17a-4, Sarbanes-Oxley, HIPAA, FDA, and many other data retention regulations.
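The WORM behavior described above can be illustrated with a small in-memory sketch. This is not the Centera API: the names WormStore and RetentionError are hypothetical, and MD5 is used only as an assumed stand-in because it produces a 128-bit digest. The sketch shows that altering an object yields a new address while the original stays readable, and that deletion is refused until the retention period has expired.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class RetentionError(Exception):
    """Raised when a delete is attempted before retention expires."""


class WormStore:
    """Toy in-memory store: write-once objects keyed by content address."""

    def __init__(self):
        self._objects = {}  # address -> (data, retention expiry)

    def write(self, data: bytes, retention: timedelta, now: datetime) -> str:
        # Assumption: a 128-bit MD5 digest stands in for the content address.
        address = hashlib.md5(data).hexdigest()
        # Writing identical content again never replaces the stored copy.
        self._objects.setdefault(address, (data, now + retention))
        return address

    def read(self, address: str) -> bytes:
        return self._objects[address][0]

    def delete(self, address: str, now: datetime) -> None:
        _, expires = self._objects[address]
        if now < expires:
            raise RetentionError(f"retained until {expires.isoformat()}")
        del self._objects[address]


store = WormStore()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
original = store.write(b"quarterly report v1", timedelta(days=365), t0)

# "Altering" an object means writing new content, which gets a new address.
altered = store.write(store.read(original) + b" (edited)", timedelta(days=365), t0)
```

In this sketch, original and altered are different addresses, and reading original still returns the unmodified bytes. Calling store.delete(original, ...) with a time inside the retention window raises RetentionError.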

What is Content Addressed Storage?

Content Addressed Storage is a method of data storage that stores and retrieves a data object by its content address within the storage system, rather than by its actual file name at some physical location on a hard drive.

The benefit of a content-addressable approach to storage is that an object is stored in such a way that it is authenticated and unalterable. In addition, objects cannot be deleted prior to the expiration of their defined retention period.

When an application delivers a data object to the EMC Centera, the API calculates a 128-bit “claim check” that is uniquely derived from the object’s binary representation. The metadata for the object, which includes the filename, creation date, and so on, is inserted into an XML file called a C-Clip Descriptor File (CDF), which in turn has its own content address calculated. The Centera repository then stores the object and a mirror copy.

Once two copies of the object and the CDF are stored in the repository, the Content Address is returned to the application. To access the data object later, the application submits the CDF’s Content Address to the Centera repository via the API, and the data is returned to the application.
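The write and read flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Centera API: the Repository class, the XML element names (cdf, filename, created, blob), and the use of MD5 as the 128-bit digest are all hypothetical choices made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def content_address(data: bytes) -> str:
    # Assumption: MD5 stands in for the 128-bit "claim check".
    return hashlib.md5(data).hexdigest()


class Repository:
    """Toy repository that keeps a primary and a mirror copy of everything."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def _put(self, data: bytes) -> str:
        address = content_address(data)
        self.primary[address] = data
        self.mirror[address] = data  # mirror copy
        return address

    def write(self, data: bytes, filename: str, created: str) -> str:
        # 1. Store the object under the address derived from its bytes.
        blob_address = self._put(data)
        # 2. Build a descriptor (hypothetical layout) holding the metadata.
        cdf = ET.Element("cdf")
        ET.SubElement(cdf, "filename").text = filename
        ET.SubElement(cdf, "created").text = created
        ET.SubElement(cdf, "blob").text = blob_address
        # 3. Store the descriptor and return *its* content address.
        return self._put(ET.tostring(cdf, encoding="utf-8"))

    def read(self, cdf_address: str) -> bytes:
        # Resolve the descriptor first, then the object it points to.
        cdf = ET.fromstring(self.primary[cdf_address])
        return self.primary[cdf.findtext("blob")]


repo = Repository()
address = repo.write(b"scanned invoice bytes", "invoice.tif", "2024-01-01")
data = repo.read(address)
```

Note that the application only ever holds the descriptor’s address. In this sketch, writing the same bytes under a different filename produces a second descriptor with a new address, while the object itself is stored only once.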

The Centera file system architecture eliminates directory structures, path names, and URL references to filenames, and only uses the C-Clip Content Address as a reference.

How applications write to Centera

Unlike a standard CIFS or NFS share, which can be accessed directly via a mapped drive letter or UNC path, a Centera system can only be accessed through an API (application programming interface).

There are essentially four ways to implement the Centera API, which are described as follows:

In-house development – The Centera API toolkit is available for download from EMC and contains all of the necessary command sets and documentation for a programmer to “custom build” an interface directly within a home-grown application.

Applications already including API support – As Centera gained popularity and demand grew, many software vendors across many industries acquired the Centera API and added an interface module so that their application is “Centera ready”, eliminating the need for custom programming.

Centera Universal Access (CUA) – EMC introduced CUA in 2005 as a means to provide simple access to a Centera in scenarios where the application didn’t have the API already integrated, and where the user didn’t have the resources or desire to custom build their own interface.

CUA was sold as a NAS server that connected to a network infrastructure and provided a direct read/write path to a Centera system. Because CUA sales diminished as third-party interfaces were brought to market, and because of object handling and other limitations within the CUA file system, EMC decided to end-of-life (EOL) the product.

Independent Software Vendors (ISVs) – As an alternative to CUA, a number of software vendors developed a Centera interface or “gateway” that can be installed on a Windows, Linux, or UNIX server and provides the same drive letter or UNC path access that CUA offered.

A few of these Centera gateway applications include EMC DiskXtender, QStar Archive Manager, and StorFirst EAS from Seven Ten Storage.
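Conceptually, a gateway of this kind has to translate path-based file access into content-address lookups. The sketch below shows one way that translation could work. It does not describe how any of the products named above are implemented: FakeCentera, Gateway, and all of their methods are hypothetical, and MD5 is an assumed stand-in for the 128-bit content address.

```python
import hashlib


class FakeCentera:
    """Stand-in for the repository; not the real Centera API."""

    def __init__(self):
        self._blobs = {}

    def store(self, data: bytes) -> str:
        address = hashlib.md5(data).hexdigest()  # assumed 128-bit digest
        self._blobs.setdefault(address, data)
        return address

    def fetch(self, address: str) -> bytes:
        return self._blobs[address]


class Gateway:
    """Presents path-based access on top of a content-addressed backend."""

    def __init__(self, backend: FakeCentera):
        self._backend = backend
        self._index = {}  # file path -> content address

    def write_file(self, path: str, data: bytes) -> None:
        # The backend never sees the path; only the gateway remembers it.
        self._index[path] = self._backend.store(data)

    def read_file(self, path: str) -> bytes:
        return self._backend.fetch(self._index[path])


backend = FakeCentera()
gateway = Gateway(backend)
gateway.write_file(r"\\archive\scans\invoice.tif", b"version 1")
first_address = gateway._index[r"\\archive\scans\invoice.tif"]
gateway.write_file(r"\\archive\scans\invoice.tif", b"version 2")
```

In this sketch, overwriting a path only repoints the gateway’s index to a new content address. The earlier content remains in the backend under its original address, which matches the non-rewriteable behavior described earlier.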