Monthly Archives: April 2014

Atomic Commands Part III: Atomic Vectored Commands

The real value of vectored commands and atomic commands becomes apparent when they are combined into vectored atomic writes. A database update that requires multiple writes to update a record (e.g., multiple data fields, or a data field and its links) will be significantly faster if the writes can be combined into a single vectored write command that is performed atomically. In this case, the data segments defined in the vectored command are either all written successfully or, if an error occurs, all restored to their original values. In other words, if the vectored write operation succeeds, all of the data segments will contain the new data. If not, all of the data segments will contain the data they had before the vectored write operation was attempted. This aggregation of writes, along with the atomicity properties, can lead to a significant improvement in database performance.
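The all-or-nothing behavior described above can be illustrated with a small in-memory model. This is only a sketch: `ToyDevice`, `Segment`, and `atomic_vectored_write` are hypothetical names invented for illustration, not part of any SCSI standard or driver API, and the out-of-range address simply stands in for whatever error a real device might hit.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One entry of a hypothetical vectored write: a start block plus its data blocks."""
    lba: int
    blocks: list


class ToyDevice:
    """In-memory stand-in for a block device; not a real SCSI target."""

    def __init__(self, num_blocks):
        self.blocks = [b"\x00"] * num_blocks

    def atomic_vectored_write(self, segments):
        """Apply every segment or none of them. Returns True on success."""
        saved = {}  # original contents of each block touched so far
        try:
            for seg in segments:
                for offset, data in enumerate(seg.blocks):
                    lba = seg.lba + offset
                    if not 0 <= lba < len(self.blocks):
                        raise IndexError(f"LBA {lba} out of range")
                    saved.setdefault(lba, self.blocks[lba])
                    self.blocks[lba] = data
        except IndexError:
            # Roll back: restore every touched block to its pre-command value.
            for lba, old in saved.items():
                self.blocks[lba] = old
            return False
        return True


dev = ToyDevice(8)
ok = dev.atomic_vectored_write([Segment(1, [b"A", b"B"]), Segment(5, [b"C"])])
# The second segment below runs off the end of the device, so nothing may change.
bad = dev.atomic_vectored_write([Segment(0, [b"X"]), Segment(7, [b"Y", b"Z"])])
```

After the failed command, blocks 0 and 7 hold exactly what they held before it was issued, even though both were briefly overwritten.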

So, what’s the problem?

First, there are issues with error reporting for vectored commands. If one of the segment writes fails, how do you tell the initiator where the command failed? If the writes are all atomic, it doesn’t matter, since every segment still holds the old data. The position of the error is still of interest for failure analysis, but it can be obtained in a variety of ways, including vendor-specific methods.

Another problem is related to support for bi-directional commands. Here’s a flow diagram for a typical, non-vectored READ command:

Vectors Figure 2

And here’s the flow for a vectored read command:

Vectors Figure 3

In this case the initiator needs to send the segment descriptor list to the target (a data out phase) and then turn the bus around to receive the incoming data (a data in phase), which makes this a bi-directional SCSI command. This has been the single biggest objection to vectored commands. Many implementations of SCSI transports were not designed to accommodate this type of bus transaction. And it’s not just a firmware change: much of this low-level processing has been embedded in SCSI controller state machines.

Finally, implementing vectored atomic write operations is difficult in both traditional rotating media and array controller systems. For these types of systems, the expense of implementing and maintaining such functionality simply does not make economic sense. But flash-memory-based storage systems typically use some sort of write logging mechanism, which makes implementing this functionality remarkably easy.
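To see why a write log makes this easy, consider a toy model in which new data is always appended to a log and a mapping table records where the current copy of each logical block lives. The sketch below is an assumption about how such a scheme might look, not a description of any vendor's flash translation layer; `ToyLogStore` and its `fail_after` parameter are hypothetical and exist only to simulate a mid-command failure.

```python
class ToyLogStore:
    """Append-only log plus a logical-to-log map; a toy model, not a real FTL."""

    def __init__(self):
        self.log = []        # every block ever written, in arrival order
        self.mapping = {}    # logical block address -> index into self.log

    def read(self, lba):
        idx = self.mapping.get(lba)
        return None if idx is None else self.log[idx]

    def atomic_vectored_write(self, segments, fail_after=None):
        """segments: list of (lba, data). fail_after simulates a failure mid-command."""
        staged = {}
        for count, (lba, data) in enumerate(segments):
            if fail_after is not None and count >= fail_after:
                return False               # staged entries never become visible
            self.log.append(data)          # old data is left untouched in the log
            staged[lba] = len(self.log) - 1
        self.mapping.update(staged)        # the single commit step
        return True


store = ToyLogStore()
store.atomic_vectored_write([(10, b"old1"), (20, b"old2")])
failed = store.atomic_vectored_write([(10, b"new1"), (20, b"new2")], fail_after=1)
after_failure = (store.read(10), store.read(20))
succeeded = store.atomic_vectored_write([(10, b"new1"), (20, b"new2")])
after_success = (store.read(10), store.read(20))
```

Because the old blocks are never overwritten in place, "rolling back" amounts to not updating the mapping; no undo copy is needed.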

…to be continued…

Atomic Commands Part II: Vectored Commands

Another factor that’s important to consider in the atomic write story is the advent of PCIe-based storage. Multiple vendors now produce devices that provide terabytes of storage on a single PCIe card. The NVMe interface was developed to take advantage of this class of device, and T10 was not far behind: the SCSI over PCIe (SOP) and PCIe Queuing Interface (PQI) specifications were developed in response and are now reaching maturity. These developments have allowed database servers to become much more efficient by giving them much faster access to large amounts of data, usually by several orders of magnitude.

With this increase in speed, the system overhead required to process each SCSI command CDB becomes a much larger part of the total time required to process a write or read operation. With rotating media, it may take many milliseconds to process a write command due to the latency inherent in the physical media. In this case, the time required to process the command CDB is very small compared to the overall operation. For flash devices, access time is measured in microseconds, and processing the CDB becomes a significant portion of the time needed to complete the write operation.

Vectors Figure 1
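A quick back-of-the-envelope calculation shows the effect. The timings below are assumed, illustrative values chosen only to match the orders of magnitude mentioned above (milliseconds for rotating media, microseconds for flash); they are not measurements of any device.

```python
def command_overhead_fraction(cdb_us, media_us):
    """Fraction of the total operation time spent on command processing."""
    return cdb_us / (cdb_us + media_us)


# Assumed, illustrative timings in microseconds (not measurements).
CDB_PROCESSING_US = 20.0
ROTATING_MEDIA_US = 8000.0   # "many milliseconds"
FLASH_MEDIA_US = 50.0        # "microseconds"

rotating = command_overhead_fraction(CDB_PROCESSING_US, ROTATING_MEDIA_US)
flash = command_overhead_fraction(CDB_PROCESSING_US, FLASH_MEDIA_US)

print(f"rotating media: {rotating:.1%} of the operation is command overhead")
print(f"flash:          {flash:.1%} of the operation is command overhead")
```

With these assumed numbers, the same command processing cost goes from a fraction of a percent of the operation on rotating media to more than a quarter of it on flash.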

A solution to this problem is to define vectored read and write operations. These commands permit read and write operations to multiple data segments which do not have to be contiguous, unlike normal read or write commands. This is analogous to the scatter/gather lists employed in typical HBA interfaces and provided in the PQI and SOP specifications. Historically, there has been strong resistance to vectored commands within the T10 committee due to the complexity of error processing. It also makes little sense in rotating media where the latencies are large and a queue of many individual commands works as well and is more easily implemented and managed. But for flash memory based storage with a PCIe interface, atomics may provide a strong motivator.
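As a rough illustration of what a segment descriptor list might look like on the wire, the sketch below packs several non-contiguous (LBA, block count) pairs into one payload that a single vectored command could carry. The 12-byte layout (8-byte LBA followed by a 4-byte block count, big-endian) is a hypothetical format chosen for this example; it is not taken from any T10 specification.

```python
import struct

# Hypothetical descriptor layout: 8-byte LBA + 4-byte block count, big-endian.
DESCRIPTOR = struct.Struct(">QI")


def pack_descriptor_list(segments):
    """Pack (lba, block_count) pairs into one contiguous payload."""
    return b"".join(DESCRIPTOR.pack(lba, count) for lba, count in segments)


def unpack_descriptor_list(payload):
    """Recover the (lba, block_count) pairs from a packed payload."""
    return [
        DESCRIPTOR.unpack_from(payload, offset)
        for offset in range(0, len(payload), DESCRIPTOR.size)
    ]


# Three non-contiguous segments described by one payload instead of three commands.
segments = [(100, 8), (5000, 16), (42, 1)]
payload = pack_descriptor_list(segments)
round_trip = unpack_descriptor_list(payload)
```

The point of the design is that the per-command processing cost is paid once for the whole list rather than once per segment.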

…to be continued…