The data warehouse manager's job is potentially huge, offering many opportunities and just as many risks. The data warehouse manager has been given control of one of the most valuable assets of any organization: the data. Furthermore, the data warehouse manager is expected to interpret and deliver that asset to the rest of the organization in a way that makes it most useful. All eyes are on the data warehouse manager.
In spite of all this visibility, many newly appointed data warehouse managers are simply given their titles without a clear job definition or a clear sense of what is and is not their responsibility. As an industry, we have been groping for a definition of the data warehouse manager position. Perhaps this was appropriate in past years because we needed to define the job. We needed time to get some accumulated experience. We needed to test the boundaries of what being a data warehouse manager means. I think we are somewhat overdue in defining the data warehouse manager's job. It is not sufficient or helpful to just say that a data warehouse manager's job is to "bring all the data into a central place and make it available for management to make decisions." Although such a definition may be correct, it isn't precise enough for anyone to tell if the data warehouse manager has done the job well. In this column I'll begin to tackle the data warehouse manager's job definition. I will suggest a metaphor for the job that, if accurate, may provide a rich set of criteria to judge a "job well done." Furthermore, a clear definition will help senior IS executives understand what the data warehouse manager needs to do and, just as importantly, what things a data warehouse manager should not have to do.
A good metaphor for the job of the data warehouse manager is the job of an editor-in-chief. Think about what an editor-in-chief of books, magazines, or newspapers does. At a high level, the editor-in-chief:
These statements seem a little obvious because we all know, based on experience, what the job title "editor-in-chief" implies. And most editors-in-chief understand very clearly that they don't create the content about which they write, report, or publish. They are, rather, the purveyors of content created by others.
I hope that you have been struck by the many parallels between the job of an editor-in-chief and the job of a data warehouse manager. Perhaps a good way to sum this up is to say that the job of the data warehouse manager is to publish the enterprise's data.
Let's examine the parallels between these two jobs. In most cases, the data warehouse manager is aggressively pursuing the same goals as the editor-in-chief. In some cases, the data warehouse manager could learn some useful things by emulating the editor-in-chief. At a high level, the data warehouse manager:
In addition, the data warehouse manager has a number of responsibilities that most editors-in-chief do not have to think about. These special data warehouse responsibilities include backing up all the data sources and all the final, published versions of the data. These backups must be available, sometimes on an emergency basis, to recover from disasters or provide detail that wasn't published the first time around. The data warehouse manager must deal with overwhelming volumes of data and must diligently avoid being stranded by obsolete backups. The data warehouse manager must replicate published data in a highly synchronized way to each of the downstream "publications" (data marts) and provide a detailed audit trail of where the data came from and what its lineage and provenance are. The data warehouse manager must be able to explain the significance and true meaning of the data and justify each editing step that the data staging area may have performed on the data before it was published. The data warehouse manager must protect published data from all unauthorized readers. Of all the responsibilities the data warehouse manager has in addition to the classic editorial responsibilities, this security requirement is the most nightmarish; it is also the biggest departure from the editing metaphor. The data warehouse manager must somehow balance the goal of publishing the data to everyone with the goal of protecting the data from everyone. No wonder data warehouse managers have an identity problem.
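To make the audit-trail responsibility concrete, here is a minimal sketch of what a lineage record for one published data set might look like. The class names, fields, and sample values are all hypothetical and chosen only for illustration; a real warehouse would keep this information in its own metadata tables.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EditStep:
    """One edit applied in the data staging area (hypothetical record shape)."""
    description: str    # what was done to the data
    justification: str  # why the edit was made


@dataclass
class LineageRecord:
    """Audit-trail entry for one published data set (illustrative only)."""
    dataset: str
    source_system: str
    extracted_at: datetime
    steps: list[EditStep] = field(default_factory=list)

    def add_step(self, description: str, justification: str) -> None:
        # Record each staging-area edit together with its justification.
        self.steps.append(EditStep(description, justification))

    def explain(self) -> str:
        # Render the provenance and every editing step in order.
        lines = [
            f"{self.dataset} <- {self.source_system} "
            f"(extracted {self.extracted_at:%Y-%m-%d})"
        ]
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"  {i}. {step.description}: {step.justification}")
        return "\n".join(lines)


# Hypothetical usage: names and dates are made up for the example.
record = LineageRecord(
    dataset="monthly_sales",
    source_system="order_entry",
    extracted_at=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
record.add_step("dropped test orders", "not real revenue")
print(record.explain())
```

The point of the sketch is that every editing step carries its own justification, so the data warehouse manager can answer "where did this number come from?" without reconstructing the staging logic from memory.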
In the discussion of the traditional editor-in-chief's set of responsibilities, we remarked that nearly all editors-in-chief understand that they are merely the purveyors of content created by others. Most editors don't have a boundary problem in this area. Many data warehouse managers, on the other hand, do. Frequently, the data warehouse manager agrees to be responsible for allocations, forecasts, behavior scoring, modeling, or data mining. This is a major mistake! All these activities are content creation activities. It is understandable that the data warehouse manager is drawn into these activities, because, in many cases, there is no prior model for the division of the new responsibilities between IS and an end-user group such as finance. If an organization has never had good allocated costs, for example, and the data warehouse manager is asked to present these costs, then the data warehouse manager is also going to be expected to create these costs.
The data warehouse manager should treat allocations, forecasts, behavior scoring, modeling, and data mining as clients of the data warehouse. These activities should be the responsibilities of various analytic groups in the finance and marketing departments, and these groups should have an arm's length relationship to the data warehouse. They should consume warehouse data as inputs to their analytic activities and, possibly, engage the data warehouse to republish their results in a data mart when they are done. But these activities should not be mixed into all the mainline publishing activities of the data warehouse.
Creating allocation rules that let you assign infrastructure costs to various product lines or various marketing initiatives is a political hot potato. It is easy for a data warehouse manager to get pulled into creating allocations because it is a necessary step in bringing up a profit-oriented data mart. The data warehouse manager should be aware of the possibility of this task being thrust on the data warehouse and should tell management that, for example, the "data warehouse will be glad to publish the allocation numbers once the finance department has created them." In this case, the editorial metaphor is a useful guide.
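The division of labor described above can be sketched in a few lines. In this hypothetical example, the finance department delivers finished allocation numbers and the warehouse only publishes them; the function and field names are invented for illustration, and the reconciliation check against a stated total is an assumption about what a reasonable editorial check might be, not something prescribed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationResult:
    """Allocated costs delivered by a content-creating group (hypothetical shape)."""
    owner: str                               # group that created the numbers
    costs_by_product_line: dict[str, float]  # finished allocations, as delivered


def publish_allocations(
    result: AllocationResult, total_cost: float, tolerance: float = 0.01
) -> dict:
    """Publish allocation numbers as delivered; the warehouse does not compute them.

    The only check is editorial (an assumption of this sketch): the delivered
    numbers must add up to the stated total, or they go back to their owner.
    """
    allocated = sum(result.costs_by_product_line.values())
    if abs(allocated - total_cost) > tolerance:
        raise ValueError(
            f"allocations sum to {allocated}, expected {total_cost}; "
            f"returning to {result.owner}"
        )
    # Publish with provenance so readers know who created the content.
    return {"source": result.owner, "rows": dict(result.costs_by_product_line)}


# Hypothetical usage: finance owns the allocation rule and its output.
finance_numbers = AllocationResult("finance", {"widgets": 600.0, "gadgets": 400.0})
published = publish_allocations(finance_numbers, total_cost=1000.0)
```

Note that nothing in the warehouse code knows how the costs were split between product lines. That rule stays with finance, which is exactly the boundary the editorial metaphor suggests.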
Beyond the boundaries defined by the editorial metaphor, we must add the responsibilities of backing up data, auditing the extract and transformation processes, replicating to the data marts, and managing security. These tasks make the data warehouse manager's job more technical, more intricate, and at the same time, broader than the job of an editor-in-chief.
Perhaps this column can stimulate the development of criteria for data warehouse manager training and help IS executive management appreciate what data warehouse managers face in doing their jobs. Many of the responsibilities discussed in this column have been implicitly assumed, but data warehouse managers have neither had them spelled out nor been compensated for them.
Finally, by focusing on boundaries, we can see more clearly some content creation activities the data warehouse manager should leave on the table. Allocating, forecasting, behavior scoring, modeling, and data mining are valuable and interesting, but they are done by the readers (end users) of the data warehouse. Or, to put it another way, if the data warehouse manager takes on these activities, then the data warehouse manager (of course) should get two business cards and two salaries. Then the boundaries can be twice as large.