A 10-year forecast shows watch-sized hard disks at 20 cents a gigabyte

Make Way for the Petabytes!

As databases keep getting larger, what keeps getting smaller? Disk prices. In fact, they are dropping around 50 percent per year. If this trend continues for the next 10 years, they will have decreased by an additional factor of 1,000. Many of us with gigabyte hard drives on our desktops today would then have terabyte drives--and a gigabyte of PC disk would cost about 20 cents.
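
Compounded over a decade, that 50 percent annual decline is exactly what yields the factor of 1,000. A quick sketch of the arithmetic (the 20-cent starting figure is only a rough stand-in for today's per-megabyte PC disk price):

```python
# Compound a 50 percent annual price decline over 10 years.
price_per_mb = 0.20  # dollars per megabyte, roughly today's PC disk price
for year in range(10):
    price_per_mb *= 0.5  # halves each year

print(round(0.5 ** 10, 6))            # 0.000977 -- roughly a factor of 1,000
print(round(price_per_mb * 1000, 2))  # 0.2 -- about 20 cents per gigabyte
```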

This development would have sweeping implications for the world of very large databases. Today, only the largest companies--the information technology world's elite--can afford to have a terabyte of data online. Most database vendors, racing to deal with the challenges of terabyte databases, no doubt thank their lucky stars that only a few of their customers have dared to put a terabyte online.

But if disk prices continue to decline at their present rates, every company will be able to have a terabyte of data online in less than 10 years. The larger enterprise data warehouses will be in the petabytes.

So if you muse about such matters as I do, you must be wondering if this scenario will ever become reality. To get an answer, I first talked with a few experts in disk technology and products, including Joe Molina, chair of the RAID Advisory Board (RAB), Robert Katzive, vice president of Disk/Trend Inc., and Dr. James B. Rothnie, Jr., executive vice president of marketing at EMC Corp. RAB is a consortium of 60 RAID manufacturers that develops standards and conducts educational activities. Much of the information in this column is based either on Molina's talk at Database Programming & Design's April 1996 VLDB Summit or on a subsequent interview. Disk/Trend is a consulting firm that has published leading market research and analysis on the disk industry for 20 years. EMC is a leading manufacturer of disk storage systems.

DISKS IN THE PAST
IBM Corp. introduced the magnetic rigid disk drive in 1956. It had a two-foot diameter disk, stored 2.5 characters of data per cubic centimeter, had a 100-millisecond access time, and cost $35,000 per megabyte. By 1963, the diameter had been reduced to 14 inches, a size that remained the standard for large-system disk storage for almost 30 years. A rarity in the early 1960s, the disk drive gradually grew in popularity. In 1975, 100,000 were shipped.

DISKS TODAY
Today the leading-edge hard disk is 3.5 inches in diameter, stores about six million characters of data per cubic centimeter, has a 12-millisecond seek time, and costs 20 to 25 cents per megabyte.[1] So cost has declined by a factor of up to 175,000, and physical storage efficiency has increased by a factor of about 2.4 million. About 87 million disk drives were shipped in 1995. Of these, 85 percent were 3.5-inch drives.
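
Recomputing those improvement factors directly from the 1956 and present-day figures quoted here (a sketch; all input numbers are the column's own):

```python
# Improvement factors computed from the figures quoted above.
cost_1956 = 35_000.0  # dollars per megabyte, 1956
cost_today = 0.20     # dollars per megabyte, low end of today's range
chars_per_cc_1956 = 2.5
chars_per_cc_today = 6_000_000

print(round(cost_1956 / cost_today))                  # 175000
print(round(chars_per_cc_today / chars_per_cc_1956))  # 2400000
```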

DISKS IN THE 21ST CENTURY
As part of his work for RAB, Molina develops consensus forecasts concerning technology and market factors of interest to the members. As far as it extends, RAB's forecast is consistent with continuing annual price declines of 50 percent for raw disk. When asked about my 10-year scenario for a thousandfold decrease in disk prices, Molina said RAB has no process for looking 10 years ahead. But he thinks that annual price declines of roughly 50 percent are likely for the next three to five years.

Perhaps more interesting than any particular figure is Molina's vision of where disk manufacturing is going in the long term. He says that the advances seen in disk technology over the last several years, such as increased areal density and improved performance, are intimately related to miniaturization.[2] And miniaturization is something we will see a lot more of.

Wristwatches provide a good analogy. At first, clocks could not be made small enough to wear on the wrist. But today, the internals of virtually all watches in the world are made in just a few massive factories in an extraordinarily efficient, high-quality manufacturing process focused on miniaturization. The whole works is turned out as a single unit. The volumes are staggering. Costs per unit are many times lower than even a few years ago.

Within a few years, disk drives will be turned out like watches, with somewhat similar manufacturing processes, focusing on miniaturization, very high volume, quality, and efficiency. RAB predicts that disk drive shipments will increase to 200 million units in 2000 and 350 million units in 2005. These volumes could be in the same range as the annual production of watches.

Remarkably, the physical devices will be about the same size, too. The watch I am wearing is about 1.375 inches in diameter and 10 millimeters thick. According to Molina's forecast, early in the 21st century (remember, the new century is only three years away!) we'll see single-platter head disk assemblies (HDAs) around one inch in diameter and five millimeters high--smaller and considerably thinner than my watch!

By 2000, Molina expects the leading-edge devices to sport rotational speeds of 12,000 RPM, resulting in an average latency of 2.5 milliseconds. These watch-size disks will have a projected capacity of a half to one gigabyte. Molina believes they will be mounted on printed circuit boards. Sixteen will fit on the same size board used by the 5.25-inch disk (about 6 3/8 inches), around which many cabinets are engineered today. These printed circuit disk drives will require no cabling or connectors, because all the wiring will be printed on the board.
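
The 2.5-millisecond figure follows directly from the spindle speed, since average rotational latency is half a revolution:

```python
# Average rotational latency from spindle speed.
rpm = 12_000
ms_per_revolution = 60_000 / rpm        # 60,000 ms per minute / revs per minute
avg_latency_ms = ms_per_revolution / 2  # requested sector is half a turn away, on average
print(avg_latency_ms)  # 2.5
```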

Early in the 21st century, drives under 2.5 inches will still command a price premium at a projected 7.5 to 10 cents per megabyte. At that point, the 3.5-inch drive will still be the most popular size at a price of 1.5 to 3 cents per megabyte. However, at the storage subsystem level the sub-2.5-inch printed circuit drive will begin to close in on price because of its reduced cost of integration.

Katzive's view of likely developments seems consistent with the one Molina outlined. Katzive points out that the primary driver here is areal density, which has increased at an average of 60 percent per year since the early 1990s. Katzive believes this trend is likely to continue at roughly the same pace at least through the end of the decade. However, this growth may begin to slow down early in the next decade as various mechanical and design barriers are confronted. Still, Katzive sees areal density increasing from today's maximum of 1.3GB per square inch to about 10GB per square inch by 2000. Such an increase will mean that a typical 3.5-inch hard drive (which holds 1GB today) will hold 6 to 12GB of data in a little over three years.
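
Katzive's two density figures are roughly consistent with his growth rate; compounding 60 percent a year over about four years (a sketch, using the density units as quoted in the column):

```python
# Compound 60 percent annual areal-density growth for four years.
density = 1.3  # today's maximum, in the units quoted above
for year in range(4):
    density *= 1.60
print(round(density, 1))  # 8.5 -- in the neighborhood of the projected 10
```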

A side effect of the progressive miniaturization of disk drives is that they will become practical for pocket devices, such as personal communicators, early in the 21st century. So in less than 10 years you might be walking around with a gigabyte disk in your pocket. What would Mae West say?

ENGINEERING ISSUES
The ongoing miniaturization of disk drives raises new engineering issues that will have to be resolved in the next few years. Capacity gains and cost reductions result primarily from increases in areal density. But as areal density increases, the distance between the read/write head and the recording surface must decrease. In the past, the heads have been "flown," riding on a thin cushion of air just above the surface. Soon, they will have to operate in contact with the surface. Contact recording will introduce issues of wear and heat dissipation. Although these problems are not yet resolved, solutions should be found within the next several years. And if they are, we should continue to see the rapid capacity advances and price reductions we've seen lately.

Although Molina's forecast does not extend 10 years into the future, I know of no physical barriers to continued increases in areal density within that time frame. Molina said that disks are "not yet near" the physical limits associated with atomic distances or electromagnetic wavelengths that microprocessors will face in the next decade or two. Although engineering challenges exist, it is "not unreasonable" to think that disk prices will decline by a factor of 1,000 in the next 10 years. According to Molina, then, my scenario is "not inconsistent" with history.

RAID
The extraordinary ongoing saga of density and drive mechanics is only part of the story. Tremendous momentum is also building behind the electronics and software engineering going into Redundant Arrays of Independent Disks (RAID).[3] RAID has been gaining popularity at an extraordinary rate. Its fundamental precepts are simple: stripe data across multiple drives so that they behave as one large, fast logical disk, and add redundancy so that the array survives the failure of any individual drive.

RAID STORAGE MANAGEMENT
Most disk drives today are subject to commodity pricing, but RAID storage subsystems are not a commodity. Their capabilities vary widely with respect to data protection, data availability, performance, and storage management support. And their prices range from 2.5 cents to almost five dollars per megabyte, depending on capabilities, architecture, warranty, service, and other factors.

Although large price variations might seem unreasonable, they exist because of the high value associated with solving storage management problems and because the subsystems offer a wide range of data protection and data availability alternatives. We are accustomed to thinking that the largest cost associated with online data is its physical storage on intricately engineered spinning disks. However, the long-term trend is toward reduced physical storage costs and increased management costs. Dealing with issues such as configuration, space allocation, backup, recovery, and tuning is immensely expensive in terms of labor, lost data, and other factors.

Industry estimates show that the capital cost of storing a megabyte on disk currently averages one dollar. By comparison, management costs an average of seven dollars per megabyte per year. Downtime costs an average of $500 per megabyte per year. Although these figures vary considerably by application and the data's business value, they still indicate the shape of things. Controlling large, complex storage systems and keeping the data available costs a lot more than simply placing it on disk.
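
Scaled up to a terabyte, those averages make the point starkly (a sketch using the column's figures; the downtime number in particular varies widely by application, as noted above):

```python
# First-year costs of a terabyte online, at the quoted per-megabyte averages.
mb_per_tb = 1_000_000
capital = 1.00 * mb_per_tb     # one-time storage cost: $1 million
management = 7.00 * mb_per_tb  # management: $7 million per year
downtime = 500.00 * mb_per_tb  # downtime exposure: $500 million per year

print(capital, management, downtime)  # 1000000.0 7000000.0 500000000.0
```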

Recognizing this fact, manufacturers are aiming to add value over the next several years through increased automation of functions related to storage management.

STORAGE SYSTEM DIRECTIONS
Today, setting up a large database on a storage system with RAID technology is a major undertaking. Database administration personnel must configure and partition the array, make decisions about the RAID level (for example, mirroring a la RAID-1 or parity groups a la RAID-5), make decisions about striping, and so forth. Performance experiments, measurements, and tuning processes--less a science than an art--typically follow the initial decisions.

RAID manufacturers are attempting to simplify this process by enabling the storage subsystem to perform more of these functions with little or no human interaction. For example, Hewlett-Packard Co. has introduced a RAID array capable of dynamically configuring the RAID level by observing the data access pattern. In general, according to RAB, the industry is moving to eliminate the need for users to think about RAID levels. Instead, users (or applications acting on their behalf) should be able to specify their performance and availability requirements and let the storage system configure a solution.
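
The end state RAB describes might look, in spirit, like the following sketch. The function, thresholds, and rules here are invented for illustration and come from no real product:

```python
# Hypothetical sketch: user states requirements, subsystem picks a RAID level.
def choose_raid_level(write_fraction: float, availability_critical: bool) -> str:
    """Map workload traits to a RAID level (illustrative rules only)."""
    if availability_critical and write_fraction > 0.3:
        return "RAID-1"  # mirroring: fast writes, full redundancy, 2x disk cost
    if availability_critical:
        return "RAID-5"  # parity groups: redundancy at lower disk cost, slower writes
    return "RAID-0"      # striping only: performance, no protection

print(choose_raid_level(0.5, True))   # RAID-1
print(choose_raid_level(0.1, True))   # RAID-5
print(choose_raid_level(0.2, False))  # RAID-0
```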

Similarly, the direction is to have the system take more responsibility in delivering availability. For example, by about 2000 the hardware should be able to predict most drive failures in advance. If so, the system can construct a twin for the drive prior to its failure, make the failing drive a spare, and see that it is replaced before it actually disrupts operations or results in data loss. With such an approach, systems will have less frequent need for complex and time-consuming recovery processes.
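
The predict-and-replace sequence described above amounts to a few bookkeeping steps; the class and data in this sketch are invented for illustration:

```python
# Hypothetical sketch of predictive drive replacement: build a twin on a
# hot spare before failure, then demote the failing drive to spare status.
class Array:
    def __init__(self, drives, spares):
        self.drives, self.spares = list(drives), list(spares)

    def handle_predicted_failure(self, failing):
        spare = self.spares.pop()                  # allocate a hot spare
        # construct the twin: spare takes the failing drive's place in the array
        self.drives[self.drives.index(failing)] = spare
        self.spares.append(failing)                # failing drive becomes the
                                                   # spare, replaced at leisure

a = Array(drives=["d0", "d1", "d2"], spares=["s0"])
a.handle_predicted_failure("d1")
print(a.drives)  # ['d0', 's0', 'd2']
print(a.spares)  # ['d1']
```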

RAID originally evolved primarily around drive failures. For applications that need yet more data protection and can afford it, another industry trend promises similar protection for all components--controllers, power supplies, fans, cache, and so forth. All components can be replicated and become hot pluggable (meaning that they can be unplugged and replaced while the RAID system continues to operate). In time, it may also be possible to predict failure for all components.

THE FUTURE OF STORAGE
According to Dr. Rothnie of EMC, the storage systems EMC delivers today incorporate processing power that is roughly the same as a Cray Y-MP delivered in 1992. That is, what we think of as a collection of disks actually contains a supercomputer focused on storage management. This capability is employed for such tasks as aiding in configuration, diagnosing problems, predicting failures, and supporting advanced flavors of business continuance, such as remote mirroring and remote copying.

Hundreds of EMC installations already exist in which the storage system maintains backup copies of data hundreds of miles from the main site. As a result, companies can protect critical data from earthquakes, fires, and floods.

Two key themes in EMC's direction are network-connected storage and shared data, both of which are made possible by storage systems' rapidly growing processing power.

The idea behind network-connected storage is that data--and therefore the devices that store it--is no longer peripheral to a central processor. Rather, it is the center of the system and should therefore reside in autonomous storage systems directly connected to the network, where all users can readily access it. This approach naturally leads to the idea of shareability. Data cannot really be owned exclusively by MVS, Unix, or NT hosts but must be shareable by all.

Thus, we are entering an era in which the very architecture and role of the storage system in the network is growing and changing.

VLDB IMPLICATIONS
The rapidly improving economics and performance of disk-storage systems have played a major role in the emergence of VLDBs. Without the huge gains we have seen, surely no one could afford terabyte data warehouses.

To the extent that disk storage advances enable much larger databases, it looks as if we're in for more of the same over the next five to 10 years. Petabyte data warehouses appear to be just around the next bend. And it won't be long before every company can afford to have a terabyte of data online.

Storage system manufacturers are preparing to make these enormous databases economical, physically more compact, more available, higher performing, and more easily manageable. Still, a few problems will be left for us in the database field to grapple with.

In the past several years, data transfer rates and disk I/O rates have not risen as rapidly as capacities. It is clear that improvements are ahead in these areas, but they are not expected to keep pace with the relentless, exponential gains in capacity and density.

From a database perspective, there's good news and bad news. The good news is that we'll be able to store more data, keep it available with less difficulty, and get help in managing those ever deeper oceans of data. The bad news is that we'll have to keep getting smarter in terms of how we search, access, and update that data. But to those of us who really like VLDBs, that's the fun part.

 

REFERENCES
1. Seek time is the time needed to move the head to the selected track. Latency is the time needed to rotate to the selected address on the track. Access time is the sum of seek time and latency.
2. Areal density is the product of recording density (bits per linear inch of track) and track density (tracks per inch).
3. When first introduced in the 1980s, RAID was more commonly referred to as Redundant Arrays of Inexpensive Disks.

Richard Winter is a specialist in large database technology and implementation. He is president of Cambridge, Massachusetts-based Winter Corp., an international consulting practice that advises executives on large database strategies, parallel architectures, risk management, and critical implementation projects. He can be reached via e-mail at [email protected] or by fax at (617) 547-1431.



This is a copy of an article published at http://www.dbpd.com/