Mastering the Yin and Yang of Information Governance Part 2

The first part of this post described how Information Governance consists of two domains. The first, Performance, concerns itself with the contribution of Information Technology to the organization’s competitive position. The second, Conformance, concerns itself with the compliance of the organization’s Information Technology with laws and regulations, as well as with any voluntary compliance standards the firm may adopt.

These two domains, while largely different, share commonalities, but more important, it is the CIO who is responsible for melding both aspects of governance into a single strategy and executing against it. It is this balancing of what are often opposite forces that requires a CIO to master the Yin and Yang of Information Governance.

The second part of this post deals with how to successfully balance the Conformance and Performance aspects of Information Governance.

How the Conformance and Performance functions differ:

Information Governance Performance exists on a continuum of return on value. It is judged based on its enablement of the firm’s strategy. Does the governance of IT result in a price advantage to the firm or provide a differentiating strategic capability to the firm? There is a clear opportunity cost to IT Performance. Most of the IT budget will be allocated to maintaining shared functions, so only a small portion of the budget is available to be used for enabling differentiating capabilities. Assuming a cost leadership strategy based on IT is also difficult since most IT capabilities are readily available for adoption by your competitors. In the end, IT leadership is as difficult and as rare as business leadership. In devising your IT governance strategy you have complete freedom bounded only by your imagination, budget and capacity to create change in your organization and in your market.

Information Governance Conformance is closer to a binary test: does the IT function comply with the required regulations and laws for IT in your industry and in the jurisdictions in which you do business? In most firms, there are far more compliance needs than there is available budget to deal with them. Compliance needs must be filtered through a risk assessment to deal with the most serious risks first. The risks themselves should be filtered through the firm’s ‘risk appetite’. If your business has been found in violation of a requirement to retain records or protect privacy, you will likely be judged more harshly for subsequent violations. In devising strategies to achieve compliance, you are more limited in your choices. In the end, your choices are those that will be accepted by regulators, judges and juries.

How the Performance and Conformance functions are alike

Notwithstanding the differences between Performance and Conformance, they share three common foundational approaches as well as the need for some common tools. Here is where the Yin and Yang balance comes into play.

IT Inventory

As the old adage says, “You can’t manage what you can’t measure, you can’t measure what you don’t understand, you can’t understand what you haven’t defined.” You would not try to manage a business’s funds without a complete dynamic knowledge of the firm’s cash flows in and out. You would not try to manage the firm’s human resources without a dynamic knowledge of its employees. You would not try to manage a supply chain without a complete list of your vendors. So how in the world do you think you can manage a firm’s IT without an inventory of what information you have and where it is? I am amazed how often my clients have no clear picture of their Information and IT resources.

An IT Inventory is required to support both the Performance and Conformance domains of Information Governance. You cannot begin to measure the benefit of information or its compliance without a current inventory of Information, Hardware and Software assets.

A series of overlapping inventory measures are needed to assure the conformance of IT, among these are:

    • Data Map: A Data Map is a requirement for eDiscovery. Counsel need to be prepared to identify where potentially relevant ESI resides and who the custodians of that ESI are. For the purposes of both discovery and to comply with cross-border data transfer prohibitions, you need to know the geography/jurisdiction in which the data resides.
    • Records Retention Schedule: Every firm needs a Records Retention Schedule, but that is just the beginning. You can’t manage the retention of electronic records without knowing the applications that contain the records as well as the custodians responsible for managing them.
    • Private Information Inventory: Given the requirements to protect private information, you need to know where the private information is.
    • Essential Information Inventory: The foundation of a functioning disaster recovery plan requires knowing what information and applications are essential and ensuring that provisions for backup and recovery of these are sufficient for your purposes.
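
To make the overlap among these inventories concrete, here is a minimal sketch of how a single inventory record could feed all four views. The field names, the sample applications and the helper functions are my own illustrative assumptions, not the schema of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryEntry:
    """One application or repository in a hypothetical combined inventory."""
    application: str
    custodian: str                   # who is responsible for the records
    jurisdiction: str                # where the data resides
    retention_years: Optional[int]   # None = not yet on the retention schedule
    holds_private_info: bool = False
    essential: bool = False          # needed for disaster recovery


def data_map(entries):
    """Group (application, custodian) pairs by jurisdiction."""
    result = {}
    for e in entries:
        result.setdefault(e.jurisdiction, []).append((e.application, e.custodian))
    return result


def private_info_inventory(entries):
    """Applications known to contain private information."""
    return [e.application for e in entries if e.holds_private_info]


def essential_inventory(entries):
    """Applications that the disaster recovery plan must cover."""
    return [e.application for e in entries if e.essential]


def unscheduled(entries):
    """Applications with no retention period assigned yet."""
    return [e.application for e in entries if e.retention_years is None]


# Illustrative data only.
entries = [
    InventoryEntry("email", "IT Messaging", "US", 3,
                   holds_private_info=True, essential=True),
    InventoryEntry("hr_system", "HR Operations", "DE", 7,
                   holds_private_info=True),
    InventoryEntry("shared_drive", "Facilities", "US", None),
]
```

The point of the sketch is that the four inventories are views over the same underlying records, so they can be maintained once rather than four times.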

Tools available for creating and managing information and IT inventories are not as mature or integrated as we would like, but are getting better all the time. Mature application portfolio management suites are available from IBM, CA, HP and others. Most of the functions in these tools are geared more to the Performance than the Conformance domains. To these tools you can add and integrate Conformance Management tools from vendors such as PSS and Exterro. The latter tools provide functions for Information Policy Management, Data Mapping, Litigation Hold and Privacy Mapping (though to implement privacy mapping you may well need a Data Leak Protection application). One of the most difficult tasks in bringing your applications under control is identifying their actual contents; tools such as IBM’s Automated Content Assessment suite and IBM’s Data Discovery suite can be used for this purpose.

Balanced Scorecard:

As Frederick the Great said, “He who defends everything, defends nothing”. A Balanced Scorecard is the best means I know of to focus on those things that are most important to a business, including the value management of its Information portfolio as well as its information compliance efforts. The Balanced Scorecard was developed in the early 1990s by Robert Kaplan and David Norton (KPMG Peat Marwick/Nolan Norton & Co., where I worked during the time it was developed). The Balanced Scorecard is imperfect, but it is the best tool available to translate strategy into action.

There is so much information about the Balanced Scorecard available that I won’t go into detail here. Suffice it to say that two elements make a Balanced Scorecard so valuable for both the Performance and Conformance functions: it requires you to prioritize your objectives into a small, actionable list, and it requires you to devise a set of metrics for tracking progress against those objectives. A Balanced Scorecard forces you to filter your strategic priorities and measures progress towards success through a combination of quantitative financial, operational and satisfaction measures. As Kaplan recently noted in an interview, the recent financial crisis underscores the requirement to supplement these measures with a set of ‘Risk’ metrics which account for a range of business risks, including those risks which, though rare, can have a material impact on the business (what the author Nick Taleb calls “Black Swans”).

It is no surprise that, while Balanced Scorecard is the most widely used strategic framework in the Fortune 1000 with more than 53% of firms using it, according to a recent Forrester Research paper on the transformation of the CIO function, the most widely used IT strategy tool is ITIL.

What could say more about the disconnect between business leadership and IT leadership than their divergence on the tools used to run their respective domains? Moreover, while a balanced scorecard is highly customized to a business’s particular competitive environment, ITIL is a cookbook for IT.

IT Chargeback:

According to a recent Forrester survey, half of IT shops have no mechanism at all to charge back IT costs to business units and only 25% charge back all of their IT costs to business units. It’s no wonder that IT budgets are out of control and business units prioritize all IT needs as “high” thereby frustrating both Performance and Conformance efforts. I have seen this in practice in my consulting work. You talk to the business unit representative and they tell you that they need to keep every bit of information forever, yet when you talk to their IT support people they tell you that the business never asks for any information more than a year old and relies on the data warehouse for long term views. These same organizations now have warehouses full of tapes, not to mention Records Management and eDiscovery nightmares.

IT Chargeback underscores how Performance and Conformance can be brought into a complementary balance. Reducing information to only that which provides real value to the business reduces the cost to the business which frees up funds to spend on IT capable of providing differentiation. At the same time, reduction of information simplifies most compliance tasks. IBM and AIIM both refer to something they call “Information in the Wild” or uncontrolled information. Information which is worth being retained is worth bringing under control. If the information is not worth bringing under control, then the only question is if you have a compliance need for it.

IT chargeback is where you will find the money to pay for the Performance and Conformance improvements you want to make as well as the lever to accelerate the change to governable information. The movement to Cloud Computing will make it easier to implement chargebacks. Most internal IT shops have neither the time nor the inclination to develop robust IT provisioning and chargeback infrastructures, whereas most Cloud vendors have spent considerable resources developing the means to support multiple flavors of chargebacks.
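
For readers who have never seen a chargeback in practice, the simplest flavor allocates a shared cost in proportion to each business unit’s consumption. The sketch below assumes storage retained (in terabytes) as the usage measure; the function name, the units and the figures are illustrative assumptions only, and real chargeback schemes are usually more elaborate.

```python
def allocate_chargeback(total_cost, usage_by_unit):
    """Split a shared IT cost across business units in proportion to usage.

    usage_by_unit maps a business unit name to its measured usage,
    for example terabytes of information retained.
    """
    total_usage = sum(usage_by_unit.values())
    if total_usage == 0:
        raise ValueError("no usage recorded, nothing to allocate against")
    return {
        unit: total_cost * usage / total_usage
        for unit, usage in usage_by_unit.items()
    }


# Illustrative figures: a 120,000 annual storage bill and terabytes retained.
charges = allocate_chargeback(
    120_000,
    {"Sales": 30, "Finance": 10, "Legal": 20},
)
```

Once a business unit sees its own share of the bill, the claim that it needs to keep every bit of information forever tends to get a second look.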


The Yin and Yang of Information Governance is all about carefully selecting the handful of initiatives which yield improved information governance performance and then quantitatively measuring your success against your goals. Jazz musicians often say, “It’s not about the notes you play, it’s about the notes you don’t play”. The same is true of information governance strategy: you can only afford to play a few notes, so choose them very carefully.

Posted in Information Governance | 1 Comment

Mastering the Yin and Yang of Information Governance Part 1

I just launched my new firm, Strategic Governance Solutions, so this is a propitious time for me to lay out my views on Information Governance. Information Governance is a very ‘hot topic’. All of the leading enterprise IT vendors and IT consulting firms have issued Information Governance frameworks and a broad range of tools to support Information Governance are available in the market with new announcements every quarter. Yet, Information Governance is a gigantic, amorphous topic that, at its core, encompasses the value, or lack of value, of every dollar spent on IT by corporate America. It also encompasses all of those things Compliance Guys worry about. Indeed, it is the duality of purpose of Information Governance that makes it so difficult to do well and calls for creating a balance between Information Governance Yin and Yang.

The Two Worlds of Information Governance

Most people have heard the Indian tale of the Blind Men and the Elephant, where each of them touches a different part of the elephant. The man touching the tail concludes that it is a rope, the one touching the ear a fan, the one touching the leg a pillar. Of course all are right and none are completely right.

So too with Information Governance. Ask Data People what Information Governance is and they will talk about the need to ensure the efficient and efficacious use of the enterprise’s data or the need to improve application development through the use of Master Data Management. Ask the same question of Compliance People and they will talk about Privacy, Security, Retention Management and eDiscovery with their own shopping list of tools to aid in this endeavor. Ask the IT Strategy People about this and they will talk about the need to create competitive advantage from IT focusing on the methods and tools to ensure that competitive advantage and business value drive IT investment. They are all correct, yet none is completely correct.

The Blind Men and the Elephant metaphor also extends to most IT vendors working in the Information Governance space. Compliance Tool vendors see it as a compliance play. Even within this sphere, vendors who work in any particular segment of the market (security and privacy, retention management, or data management) appropriate the entire information compliance label despite the fact that they do not cover all areas of information compliance.

The truth of the matter is that Information Governance is so crucial and so difficult precisely because it requires the management of the entire elephant, not just the individual components. The ISO standard for Corporate Governance of Information Technology sets forth the scope of IT Governance as consisting of two realms – Conformance and Performance.

Conformance concerns itself with abiding by the rules, regulations and laws governing the use of IT. Within the purview of this domain are issues such as:

  • Security
  • Privacy
  • Spam
  • Trade Practices
  • Intellectual Property
  • Record Keeping
  • Safety & Health
  • Accessibility
  • Libel & Slander
  • Environmental Standards
  • Social Standards
  • Political/National Security Standards

In essence, this is a Policing Model of IT Governance

Performance is all about extracting from the IT Investment the most business value for the organization. While there is a long list of examples I could offer, it all boils down to supporting, enabling or, if you are really good or lucky, creating the basis of the company’s competitive strategy. Unfortunately, for most companies, this ends up being interpreted as creating and managing IT as a shared service; in other words, making sure the train runs on time at the lowest cost. I call this the Stewardship Model of IT Governance. In great companies, Information Technology Performance means more than just Stewardship; it means that IT is an indispensable catalyst for the company’s strategy, whether that strategy is ‘Low Cost’, ‘Differentiation’ or ‘Unique Niche Provider’. I call this the Innovation Model of IT Governance.

Now, I don’t need to tell you that the kinds of personalities, skills and educations typically associated with the Police who run Conformance, the Stewards who manage shared services and the Innovators who create new and improved bases of competition tend to be very different. It’s not an ironclad rule, but it is a tendency. There are highly innovative lawyers and those who just fill out forms; there are clock-punching Stewards and those whose dedication to customer service makes their shared services a hard-to-duplicate competitive advantage. There are those who are ‘Innovators’ in name only, taking credit for innovation when all they have done is mindlessly follow someone else’s strategy in a way that is all too easy for a competitor to duplicate.

The Information Governance model is not unique. Governance Models for corporations as a whole and their subordinate functions, including those for Finance, Supply Chain, HR, etc., all have the same basic features and the same dichotomy between Performance and Conformance.

Smart leaders know how to meld all of these types into a cohesive, effective team. Effective Information Governance requires mastery of both of its realms and a balancing act between Performance and Conformance. Balance between Yin and Yang is the essence of Tai Chi and the essence of Information Governance.

My career has been spent almost equally divided between helping clients obtain the most strategic business value for their IT investment and helping them address the broad range of compliance requirements for IT. In Part 2 of this post, I’ll lay out the differences and similarities in dealing with Performance vs. Conformance as well as the best methods to enhance and blend the combined functions.

Posted in Compliance, Information Governance | Leave a comment

Jeopardy, eDiscovery & Records Management Revisited

Last year I wrote a blog post on IBM’s Risk Technology site about the application of natural language/semantic computing tools to endeavors as diverse as answering questions on the TV show Jeopardy (which IBM is currently working on) and Records Management and eDiscovery. In that post, I asserted that we have reached a point where computers are at least as good as people at tasks such as categorizing documents as records or non-records or classifying documents for purposes of eDiscovery. A recent study provides further proof for this assertion.

In Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, published in the January 2010 issue of the Journal of the American Society for Information Science and Technology, authors Roitblat, Kershaw and Oot (all members of the Electronic Discovery Institute) set forth the results of a study they conducted comparing human review of documents collected for eDiscovery vs. computer review of the same collection. They concluded “on every measure, the performance of the…computer systems was at least as accurate…as that of a human review”.

The Roitblat et al. study has implications even beyond those set forth in their paper, so let’s look in a bit more detail at their methodology.

Roitblat started with a collection of 1.6 million deduplicated documents that had been collected for the Department of Justice/FTC antitrust review of the Verizon acquisition of MCI. 225 attorneys, representing Verizon at a cost of $13.6 million, conducted the original review of the documents. After the Verizon review, 176,000 items were produced to DOJ.

From the original collection Roitblat and his colleagues selected a random sample of 5,000 documents, which they had reviewed by two human review teams from two volunteer litigation support firms. The researchers also used two volunteer litigation support firms to do a computer software categorization and selection of responsive documents. As the researchers describe in statistical detail in their paper, the differences among the original human review, the two human re-reviews and the two machine categorizations were not statistically significant. Based on these results, the authors conclude that machine categorization is at least as accurate as human review and thus is “reasonable” and compliant under the Federal Rules of Civil Procedure.

The Roitblat et al. study is a significant empirical advance for advocates of the use of machine classification for eDiscovery, but as I said above, it has implications beyond those set forth in the paper:

Vendors and Software were not critical factors

Interestingly, both litigation support vendors and both machine classification engines achieved similar results. While every vendor will try to convince you of their ‘unique’ capabilities, it appears that any of the leading litigation support vendors and any of the widely used machine classification tools will do an acceptable job. This should not be too surprising; most vendors use variations of the same families of algorithms in their classification engines and most of these algorithms have the same academic roots. (Indeed, a recent article points out that most eDiscovery engines use the same OEM and open source components.) Where the engines will vary most (when using the same family of algorithm, such as two different implementations of Bayesian logic) is in their efficiency and speed. Where vendors will vary most is in the overall quality and skills they bring to the management of the project.

Two engines might be better than one

In my earlier Jeopardy blog post, I noted that one of the advantages of open standards for text mining tools is that they enable one to combine different text mining and classification tools. The Roitblat study points to the potential value of doing this. In the good old days of data entry shops, before Optical Character Recognition, vendors widely used a technique called ‘double keying’. Two operators would key the same information; if both versions agreed, the chance of both having made the same keying error was so remote that no further review was necessary. The double keying principle can be applied to text classification as well. Run the same document through two different engines; if they both agree, no further review is necessary. Only those documents on which the engines do not agree need higher levels of human review. Vendor engines that conform to standards such as UIMA or GATE simplify the task of creating a voting algorithm among multiple classification engines or between two different types of algorithms available from the same engine.
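
The double keying idea can be sketched in a few lines. The two ‘engines’ below are deliberately trivial stand-ins of my own invention (a keyword test and a word-count test), not real classification engines and not the UIMA or GATE APIs; the only part that matters is the agree-or-escalate logic.

```python
def double_key(documents, engine_a, engine_b):
    """Classify each document with two engines.

    Returns (accepted, needs_review): labels both engines agreed on,
    and the ids of documents on which they disagreed.
    """
    accepted = {}
    needs_review = []
    for doc_id, text in documents:
        label_a = engine_a(text)
        label_b = engine_b(text)
        if label_a == label_b:
            accepted[doc_id] = label_a      # agreement: no further review
        else:
            needs_review.append(doc_id)     # disagreement: send to humans
    return accepted, needs_review


# Hypothetical toy engines, for illustration only.
def keyword_engine(text):
    return "responsive" if "merger" in text.lower() else "non-responsive"


def length_engine(text):
    return "responsive" if len(text.split()) > 5 else "non-responsive"


documents = [
    ("d1", "Notes on the proposed merger pricing and timeline for the board"),
    ("d2", "Lunch on Friday?"),
    ("d3", "Merger call moved"),
]

accepted, needs_review = double_key(documents, keyword_engine, length_engine)
```

One caution: double keying worked because two operators made independent errors. Since, as noted above, many engines share the same families of algorithms, two engines may well make the same mistake, so I would pair engines built on different approaches where possible.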

Even with machine classification, there is no such thing as a fully automated eDiscovery project

A key to accuracy in both the Roitblat study and in the recent NIST TREC studies is the use of an expert ‘adjudication’ review team to assess and normalize differences in classification among the various methods. Given the often subjective judgments that determine categorization, adjudication often works to fine tune the classification engines, yielding better results. Perhaps an even more important reason to have a subset of the documents classified by a skilled human review team is to demonstrate the equivalency of the automated results to the human results, thereby supporting the argument that the software is a reasonable substitute for a completely human review for the particular documents and issues in the immediate case. Finally, in keeping with best practices for achieving quality in eDiscovery (see the Sedona standards on this), there is probably no way around conducting a human review on a random, statistically significant sample.

Not only are computers better than humans at document classification, they are less expensive and more efficient

Roitblat noted that the original attorney review of the 1.6 million documents cost $13.6 million, or $8.50 per document. The cost of automated document classification continues to drop; I estimate that processing and classification from a reputable vendor is now in the range of $700-$1,000 per gigabyte, so in the Verizon case the cost for the 1.3 Terabyte collection would have been in the range of $4-$6 per document. In addition to a lower cost per document, automated review takes far less time. The original human review of the Verizon documents took the 225 attorneys 4 months to conduct. An automated review, even with time for training the engine, appropriate levels of human adjudication and quality control, would likely have taken no more than 2 months.
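
As a rough check on the arithmetic, the sketch below works only from the figures stated above and assumes 1.3 Terabytes means 1,300 gigabytes. On those inputs alone, per-gigabyte processing comes to well under a dollar per document, so the $4-$6 range evidently reflects more than the per-gigabyte rate (presumably items such as adjudication and quality control); that breakdown is not given here, so the sketch leaves it out.

```python
DOCUMENTS = 1_600_000            # deduplicated documents in the collection
HUMAN_REVIEW_COST = 13_600_000   # dollars, original attorney review
COLLECTION_GB = 1_300            # assumption: 1.3 TB taken as 1,300 GB
RATE_LOW, RATE_HIGH = 700, 1_000 # dollars per gigabyte, estimated range


def per_document(total_cost, documents=DOCUMENTS):
    """Average cost per document for a given total."""
    return total_cost / documents


human_per_doc = per_document(HUMAN_REVIEW_COST)

# Processing and classification totals at each end of the per-GB range.
processing_low = RATE_LOW * COLLECTION_GB
processing_high = RATE_HIGH * COLLECTION_GB

# Per-document cost implied by the per-GB rate alone.
processing_per_doc_low = round(per_document(processing_low), 2)
processing_per_doc_high = round(per_document(processing_high), 2)
```

Either way the comparison with $8.50 per document for human review runs in the same direction.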

Posted in eDiscovery, Text Mining & Analytics | 3 Comments

Potential New SEC Disclosure Requirements and Enterprise Content Management

As reported yesterday, the SEC is getting ready to propose new rules for asset backed securities that will require standardized loan level detail on all assets packaged in the security or derivative. While the sell side of the financial services industry will no doubt scream bloody murder about more regulation, the fact of the matter is that all of the key information, from descriptions of the assets to credit scores of the borrowers, is typically captured as part of the loan origination process and can easily be converted to standard XML and carried along for the life of the loan along with new loan servicing information as it accumulates. Moreover, anyone who has an ECM system which can consume XBRL can easily set up the disclosure system to update the SEC systems that are being converted to XBRL.

Given the success of the banking lobbyists in killing or watering down every significant regulatory reform that has been introduced in the past two years, I am not particularly confident that this reform will ever become law, but if it does, the loan origination systems of every major bank will likely require no more than a few tweaks to send the data to the warehouse.

Posted in Compliance | Leave a comment

Recommended new book

Dr. Anthony Tarantino, a Senior Advisor with IBM’s Governance, Risk, and Compliance Center of Excellence has written a new book:

Governance, Risk and Compliance Handbook : Technology, Finance, Environmental, and International Guidance and Best Practices:

Posted in Recommended Resources | Leave a comment

Why Don’t We Just Keep Everything?

There is a records retention crisis in the world of business.

We know of large organizations that have more data in their IT systems than they have backup cycles available to preserve it. At the same time, these organizations have warehouses full of paper, some of it going back 50 to 75 years. We have also seen organizations that have many hundreds of thousands of tapes in storage, some of them 25 years old (and for which they no longer have the hardware or software to make use of the tape). This torrent of information is growing ever larger (see, e.g., George L. Paul and Jason R. Baron, Information Inflation: Can the Legal System Adapt? 13 RICH. J.L. & TECH. 10 (2007)).

Notwithstanding this ever-growing torrent of information, organizations have an obligation to identify, classify and, in many cases, produce this information for purposes of regulatory compliance and eDiscovery. The common wisdom and advice to organizations facing this situation is to start by putting a Records Retention Schedule in place. The underlying philosophy behind a Records Retention Schedule is to keep what you need to keep to run your business and to comply with the law while disposing of everything else. The standard retention period usually ranges from 30 days (especially for email and other messaging media) to 10 years, all subject to suspensions of the clock for Legal Holds. At the end of this period, records are subject to “disposition” actions, Records Management speak for either determining to keep something in perpetuity (typically for corporate vital records) or destroying the records.
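
The interaction between a retention period and a Legal Hold can be sketched as a date calculation. This is a simplified model under my own assumptions: the retention clock starts at creation, each hold pauses the clock for its full duration, holds do not overlap, and no hold starts before the record was created. Real schedules and hold rules are defined by counsel, not by a formula.

```python
from datetime import date, timedelta


def disposition_date(created, retention_days, holds=()):
    """Earliest date a record becomes subject to disposition.

    holds is an iterable of (start, end) date pairs. A hold that begins
    before the clock has run out suspends the clock for its duration.
    """
    due = created + timedelta(days=retention_days)
    for start, end in sorted(holds):
        if start < due:
            due += end - start   # clock was paused while the hold was active
    return due


def eligible_for_disposition(today, created, retention_days, holds=()):
    """True once the (possibly extended) retention period has elapsed."""
    return today >= disposition_date(created, retention_days, holds)


# Illustrative record: a 30-day email retention period with one 10-day hold.
created = date(2010, 1, 1)
hold = [(date(2010, 1, 11), date(2010, 1, 21))]

without_hold = disposition_date(created, 30)
with_hold = disposition_date(created, 30, hold)
```

Note that reaching the disposition date only makes the record subject to a disposition decision; as described above, that decision may be to keep a vital record in perpetuity.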

In the face of this difficult challenge to bring order to what IBM likes to call the “digital landfill”, some clients ask, “why bother?” Given that courts or regulatory enforcement agencies only penalize organizations if they have destroyed or failed to produce what is required, they ask, “Why don’t we just keep everything?” Admittedly, IT rather than Legal folks typically pose this question because for IT folks keeping everything simply translates into more IT capacity and they are rarely averse to adding more IT capacity. So then, here are six good reasons not to just keep everything:

  • It increases litigation costs.

In a well-known case study DuPont examined the corpus of documents it was producing in nine litigations. They found that 50% of the records they were going to produce were past the date of ‘disposition’ according to DuPont’s Records Retention Schedule and that the cost of this over-retention for just the litigation support fees in these nine cases was approximately $12 million. Extrapolate this number, year after year, to the number of cases in a typical litigation docket of a large corporation and you are talking about very large amounts of money.

The increase in litigation costs is not just for litigation support processing costs; documents are fodder for discovery. Every additional document kept and produced that need not be kept translates into more depositions, more questions in each deposition, more interrogatories, requests for admission, etc. It is probably no coincidence that one well known music company president I knew kept no personal documents except for his calendar; he was a former litigator and knew how much additional time he would spend in discovery if he kept an office full of documents. He sent everything he wanted to keep to corporate files and trashed the rest.

  • It potentially increases liability

Some people are of the opinion that keeping everything provides the ability to find the “white knights”, the exculpatory documents that will win the case, but a 2006 survey by Lawyers Weekly found that 4 of the top 10 jury verdicts in 2005 were directly related to ‘smoking guns’ discovered in email systems. It may well be that these emails were properly retained under a corporate retention program but given the 50% over retention figure noted above, in all likelihood some were unnecessarily retained.

So, is it impermissible to set up a retention program partially motivated by keeping records out of the hands of some future litigant, absent a particular preservation obligation? Not according to the Supreme Court. (See Arthur Andersen LLP v. U.S., 544 U.S. 696, 704 (2005): “‘Document retention policies,’ which are created in part to keep certain information from getting into the hands of others, including the Government, are common in business ….”)

The liability issues are not solely litigation related; the more records you maintain without control, the more risk you entertain with respect to privacy, secrecy, copyright, disaster recovery and other information management issues.

  • It increases infrastructure costs

One of the dirty little secrets of over-retention addicts is that they are not simply keeping everything; they are keeping 20 copies of everything. With the growth of the markets for email and application data archiving solutions, both Gartner and Forrester have reviewed the state of over-retention and have concluded that it is a good planning parameter to assume that there is a 20:1 duplication ratio in any large organization. Think about that as you prepare your IT budgets; for every 20 Terabytes of data you are managing, you could reduce it to 1 Terabyte simply by deduplicating. This does not even get into the issue of reducing retention by the application of a retention schedule; simply keeping only one of everything will get you a 20:1 reduction.

Keep in mind that it is not simply storage you are reducing through deduplication. Production email and applications usually run on highly redundant, highly available systems, the most expensive kind of computing environments. Archiving systems usually can be run on less expensive platforms.

Under most properly planned records management programs, where a record is to be retained under the program, a single ‘copy of record’ is retained and any other copies (sometimes called ‘convenience copies’) are disposed of. While it is typical that all copies of a record are preserved for purposes of a Legal Hold, here we are discussing records preservation absent a legal hold on a particular record.
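
To illustrate the ‘copy of record’ idea, here is a minimal sketch of exact-duplicate detection using a content hash. It is an assumption-laden toy: it treats the first copy encountered as the copy of record, it only catches byte-identical duplicates, and the file names are invented. Commercial archiving products use more sophisticated techniques, and nothing flagged here should be disposed of while a Legal Hold applies.

```python
import hashlib


def deduplicate(items):
    """Separate a collection into copies of record and convenience copies.

    items is an iterable of (name, content_bytes). The first item seen
    with a given SHA-256 digest is kept as the copy of record; any later
    item with identical content is listed as a convenience copy.
    """
    copies_of_record = {}      # digest -> name of the retained copy
    convenience_copies = []    # names of byte-identical duplicates
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        if digest in copies_of_record:
            convenience_copies.append(name)
        else:
            copies_of_record[digest] = name
    return list(copies_of_record.values()), convenience_copies


# Illustrative mailbox contents.
items = [
    ("a.msg", b"Q3 forecast"),
    ("b.msg", b"Q3 forecast"),
    ("c.msg", b"Lunch?"),
    ("d.msg", b"Q3 forecast"),
]

kept, duplicates = deduplicate(items)
```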

  • It increases insurance costs

Insurance companies have recognized that businesses that do not have well designed and managed programs for Records Management and eDiscovery have higher litigation costs and, probably, a higher risk of liability. Therefore, both AON and AIG, for example, are now considering this in their underwriting of general business and directors insurance and are charging premium rates for businesses that maintain “digital landfills” while providing discounts for businesses that are attending to their Records Management and eDiscovery preparedness needs.

  • It sends a signal that you are not in control of your records

As noted above, there is no practical business reason to keep everything nor a legal requirement to keep everything while there is considerable financial incentive not to keep everything. This being the case, it is fairly natural to assume that the reason a business chooses to keep everything is that it is very afraid it will be unable to retain what it needs to retain or find what it needs to find absent the draconian decision to keep everything. This in turn sends out a signal to Plaintiffs that, for this Defendant, discovery will be a very expensive proposition and that this Defendant, with all that ESI, is much more likely to run afoul of its discovery obligations. In other words, this business can be more easily induced to settle a case. This is not a good image to project.

  • It increases the likelihood of overly broad legal holds

In the course of the eDiscovery preparedness consulting engagements I have participated in, I have noticed that the very same succor that ‘keeping everything’ provides to the IT staff can also trickle down to the lawyers. Where the company is going to keep everything anyway, it seems there are more legal holds that preserve entire IT applications or that there are 25 overlapping legal holds on the entire email system for many years running.

If the company policy is to keep everything forever, it is going to be very difficult to argue that any given hold represents a burden or undue cost. Moreover, we usually find that in these same companies, there is no program for IT chargebacks for legal holds, so the cost of the hold becomes an amorphous part of the overall IT infrastructure cost. How much does this cost in total? It is difficult to say, but there are cases in which a hold on a single large application was found to cost $27,000 per month (see Douglas L. Rogers, A Search for Balance in the Discovery of ESI Since December 1, 2006, 14 RICH. J.L. & TECH. 8 at p22). So look at the number of applications you have on hold, the number of active holds you have and do the math.
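
Doing that math takes one line. The sketch below assumes, purely for illustration, that every application on hold costs the same $27,000 per month cited above; in practice the figure will vary widely by application, and the application counts and durations used here are hypothetical.

```python
MONTHLY_COST_PER_APPLICATION = 27_000  # dollars, the single-application figure cited above


def hold_cost(applications_on_hold, months,
              monthly_cost=MONTHLY_COST_PER_APPLICATION):
    """Total cost of keeping a number of applications on hold for some months."""
    return applications_on_hold * months * monthly_cost


one_app_one_year = hold_cost(1, 12)
five_apps_two_years = hold_cost(5, 24)   # hypothetical portfolio
```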

I hope you see that keeping everything is not a panacea for addressing the compliance and eDiscovery issues attendant to IT systems. If my experience is any guide, if a business has attempted to keep everything, eventually, it will determine that this is no longer a sustainable policy and that now this decades-old ‘digital landfill’ must be cleaned up. In later posts, I will discuss how to go about doing this.

Posted in eDiscovery, Records management | Leave a comment