A data warehouse, by its very nature, creates a security conflict. On the one hand, the goal of every data warehouse is to make valuable data accessible. Most of the hard issues in data warehousing have to do with how to make the data more understandable, more available, and easier to access. Most organizations have invested in significant networks to link together hundreds, if not thousands, of computers that all have access to the data warehouse. And almost all organizations have multiple links into and out of the Internet. On the other hand, the world seems to be filled with hackers, crackers, and industrial spooks of various kinds who are constantly mounting attacks on our networks, our email, our hardware systems, and ultimately our data warehouses. These hostile individuals range from the legendary 14-year-olds who view cracking a computer system as a game, to adults trying to manipulate business information fraudulently, to merely curious employees who can't resist the temptation to explore.
In many ways the average data warehouse team still lives in a world of naive innocence. The team is so busy sourcing data and deciding on hardware and software that a comprehensive security plan simply hasn't been drawn up. The design of data warehouse security also often falls through the cracks. There is a tendency to leave network security to someone else, while most data warehouse teams simultaneously assume that their DBMS security features will be "enough" to handle their needs.
The state of affairs is really pretty ugly. Many corporate networks are so large and complicated that no one really understands how many access points they have or who all the entities on the networks are. In many cases, sensitive information is lying right on the table and hasn't been abused only because no one has tried to grab it yet. The situation is similar to leaving a car unlocked in a shopping center parking lot. You might go for years without having the car broken into just because the thieves have not turned their attention to it yet.
Even though it is easy to paint a bleak picture of the possibility of ever having adequate data warehouse security, a lot can be done. The first step is a data warehouse security plan. The emphasis here is on the data warehouse, not just the network. The data warehouse team must have a security architect whose job is as complex and far-reaching as the master data architect's job. The security architect must plan for:
The second step for getting security under control is to start using security technology in a serious way. A surprisingly large and vigorous security industry offers a variety of powerful solutions. With the right technology and the right design consultants, you can assemble a security system for a data warehouse that will balance your users' needs to access the data against your organization's needs to keep the data confidential. A state-of-the-art system that can be implemented today might look like Figure 1.
In Figure 1, a remote client (upper left) identifies himself or herself to the client machine through the user identification subsystem by using a simple password, smart card, or hardware token. Once properly identified, the user can open an encrypted connection over a physically insecure external network, such as a public telephone line or the Internet.
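The identification step can be illustrated with the simplest of the three options, a password. The sketch below is an assumption rather than part of the architecture in Figure 1: it uses PBKDF2 from Python's standard library so that the system stores a salted hash instead of the password itself. The iteration count, salt size, and the function names `enroll` and `identify` are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical parameters for this sketch; tune the iteration count to your hardware.
ITERATIONS = 200_000
SALT_BYTES = 16


def enroll(password: str) -> tuple:
    """Return (salt, digest) to store in place of the plaintext password."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def identify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest from the offered password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)


if __name__ == "__main__":
    salt, digest = enroll("correct horse battery staple")
    print(identify("correct horse battery staple", salt, digest))  # True
    print(identify("guess", salt, digest))  # False
```

The constant-time comparison (`hmac.compare_digest`) avoids leaking, through response timing, how much of a guessed digest matched.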
The link-encrypted session is highly resistant to eavesdropping and spoofing. The receiving end of the connection is handled by a firewall router that filters out packets from unknown external clients. The firewall router then passes approved packets to an authentication agent server. The authentication agent grants access credentials to the requesting client. These access credentials can then be presented to the access-control server, the application server, and ultimately the data servers to allow specific data to be accessed.
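The flow from firewall router to authentication agent to downstream servers can be sketched in a few lines. This is a simplified illustration, not how any particular product works: the firewall is reduced to an allowlist of source networks, and the access credential is an HMAC-signed token carrying a user name and an expiry time that every downstream server can check with a shared key. The network range, key, lifetime, and function names are all hypothetical; a production system would rely on an established authentication protocol such as Kerberos rather than a hand-rolled token.

```python
import base64
import hashlib
import hmac
import ipaddress
import json
import time
from typing import Optional

# Hypothetical configuration for this sketch.
KNOWN_CLIENT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
SHARED_KEY = b"replace-with-a-secret-shared-by-the-security-servers"
CREDENTIAL_LIFETIME_S = 8 * 60 * 60  # one working day


def firewall_accepts(source_ip: str) -> bool:
    """Packet-filter step: pass only traffic from known client networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in KNOWN_CLIENT_NETWORKS)


def issue_credential(user: str, now: Optional[float] = None) -> str:
    """Authentication agent: sign (user, expiry) so other servers can trust it."""
    issued = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": issued + CREDENTIAL_LIFETIME_S}).encode()
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(tag).decode())


def verify_credential(credential: str, now: Optional[float] = None) -> Optional[str]:
    """Access-control, application, or data server: return the user if valid."""
    try:
        payload_b64, tag_b64 = credential.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or tampered credential
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["user"] if current < claims["exp"] else None
```

Because every downstream server verifies the same signature, the user proves identity once to the authentication agent and then presents the credential, not the password, to each later server.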
An internal client must run the same gauntlet with the authentication server and the access-control server. The only difference is that the internal client is already inside the secure intranet defined by the firewall. Data can actually be accessed only if the access-control server also approves the operation being requested by the authenticated user. For instance, even though the user may be properly authenticated, the user may not have read privileges on certain data elements. A sophisticated access-control server may also be able to implement "single sign-on," which allows the requesting client to identify himself or herself to the data warehouse environment exactly once in a session with a single password.
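The access-control server's two jobs described above, approving specific operations on specific data elements and remembering an authenticated user for the rest of a session, can be sketched as follows. The privilege table, session identifiers, and `AccessControlServer` class are hypothetical; the point is only that approval requires both a signed-on session and an explicit privilege for the requested element.

```python
from dataclasses import dataclass, field

# Hypothetical privilege table: (user, data element) -> permitted operations.
PRIVILEGES = {
    ("alice", "sales.revenue"): {"read"},
    ("alice", "sales.region"): {"read"},
    ("bob", "sales.region"): {"read"},
}


@dataclass
class AccessControlServer:
    privileges: dict
    sessions: dict = field(default_factory=dict)  # session id -> authenticated user

    def sign_on(self, session_id: str, authenticated_user: str) -> None:
        """Single sign-on: record the already-authenticated user once per session."""
        self.sessions[session_id] = authenticated_user

    def approve(self, session_id: str, operation: str, element: str) -> bool:
        """Approve only if the session is signed on AND the user holds the privilege."""
        user = self.sessions.get(session_id)
        if user is None:
            return False  # no authenticated user for this session
        return operation in self.privileges.get((user, element), set())


if __name__ == "__main__":
    acs = AccessControlServer(PRIVILEGES)
    acs.sign_on("session-1", "bob")
    print(acs.approve("session-1", "read", "sales.region"))   # True
    print(acs.approve("session-1", "read", "sales.revenue"))  # False: authenticated, not authorized
```

Note that Bob is fully authenticated in the second call yet is still refused, which is exactly the distinction between authentication and access control.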
The application server is a typical middle layer in a data warehouse environment, such as Hewlett-Packard Co.'s Intelligent Warehouse or Information Advantage Inc.'s DecisionSuite Server. The application layer must acquire the proper authenticated access rights from the upstream security servers, and then it can access some or all of the physical data servers to satisfy the user's requests.
Perhaps the biggest architectural insight from Figure 1 is that much data warehouse security is handled outside of the relational DBMS. The relational DBMS certainly enforces access privileges at the read/write level, but many of the important issues are handled upstream by the authentication server and the access-control server.
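The DBMS still serves as the last line of defense at the read/write level. As an illustration built on assumptions, using SQLite (which has no GRANT statement), Python's `sqlite3` module lets you install an authorizer callback that SQLite consults while compiling each statement. The schema, the denied `payroll.salary` column, and the read-only policy below are hypothetical.

```python
import sqlite3

# Hypothetical policy for this sketch: columns no warehouse user may read.
DENIED_COLUMNS = {("payroll", "salary")}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite for each action while it compiles a statement."""
    if action == sqlite3.SQLITE_READ and (arg1, arg2) in DENIED_COLUMNS:
        return sqlite3.SQLITE_DENY  # arg1 is the table, arg2 the column
    if action in (sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE):
        return sqlite3.SQLITE_DENY  # warehouse users get read-only access
    return sqlite3.SQLITE_OK


def open_warehouse() -> sqlite3.Connection:
    """Build a tiny in-memory warehouse, then lock it down."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount REAL);
        CREATE TABLE payroll (employee TEXT, salary REAL);
        INSERT INTO sales VALUES ('East', 100.0), ('West', 250.0);
        INSERT INTO payroll VALUES ('alice', 90000.0);
        """
    )
    conn.set_authorizer(read_only_authorizer)  # install only after loading the data
    return conn
```

A query such as `SELECT salary FROM payroll` then fails with `sqlite3.DatabaseError`, while queries against `sales` succeed. In a full-scale warehouse DBMS the same role would be played by its own privilege system, with the authentication and access-control servers doing the heavier lifting upstream.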
It is beyond the scope of this article to do justice to all of the vendors offering security products that are interesting to the data warehouse community. However, the Internet has a number of very good sites with listings of vendors, and all of the vendors in turn have their own sites. One good site for certified firewall products is the National Computer Security Association listing at www.ncsa.com/fpfs.
The best overall site I have found for listing vendors and commercial products for security is Rodney Campbell's site at the Telstra Corp. in Australia. The Web address is www.telstra.com.au/info/security/vendor.html.
The best textbook on security issues, in my opinion, is Charles Pfleeger's Security in Computing from Prentice Hall (www.prenhall.com). Be sure to pick up his second edition, which has a 1997 publication date. I like this book very much. It can be read at two distinctly different levels. The high level, suitable for data warehouse architects, explains the current science and technology of computing security in a very readable way. Many interesting asides in Pfleeger's book illuminate the history, politics, and motivations behind the technology. The book is also quite an eye-opener regarding the various security risks confronting the owner of a data warehouse.
The second level of Pfleeger's book is a series of detailed explanations of key algorithms that implement security mechanisms, especially encryption. This level of the book would be a suitable basis for an advanced undergraduate computer science course. If you are no longer an undergraduate, you can, as I did, smoothly glide past these detailed explanations with Pfleeger's encouragement.
If you keep your copies of DBMS, go back to the January 1997 issue. The Internet Systems supplement contains a very elegant glossy foldout, courtesy of Open Horizon Inc., that has a security network diagram consistent with Figure 1. (If you can't find the foldout, you can get a new one by calling Open Horizon at 415-869-2200.) The foldout shows most of the current security buzzwords. If you use Pfleeger's book to learn what all the buzzwords mean, you will be in a good position to hire a consultant to help you design your data warehouse system to be resistant to hackers, crackers, and industrial spooks.
