29 November 2018

This article was originally published in Commercial Litigation Journal on 27 November 2018.

A full understanding of any case may require a review of potentially hundreds of thousands or even millions of documents. Such a review is likely to be highly impracticable and/or cost prohibitive if done manually by human reviewers. Technology promises to make such a large-scale document review possible in a time- and cost-effective manner - to prioritise, categorise and visualise what you need to know faster, more cheaply and more accurately. Consequently, there is much debate about whether technology will replace lawyers or simply free up lawyer-time for the more complex, valuable tasks in an investigation. This article considers the approach to document review in investigations, the need for technology harnessing machine learning, and the changing but crucial role of lawyers. It explains that technology in investigations changes the role of lawyers but does not replace them.

What does an investigation require?

Investigations usually require significant document review from various sources, to determine what has happened as a matter of fact. The initial challenge is therefore to accurately and efficiently identify the location of that evidence.

Investigations usually begin with identification of the parameters. There are various document sources to consider for review, including:

  • communications
  • document management systems
  • reporting and accounting systems.

Each document source is likely to be in a different location, on a different platform, and require a different process for collection. The relevant technology and approach to reviewing each document source is likely to be different.

Communications is a large umbrella term. Email is the most obvious source of information, but modern communication encompasses phone records, text messages, instant message services and social media, all of which may require review. It is also important to check use of personal accounts. For example, investigations into foreign exchange and LIBOR manipulation have focused on reviews of instant messaging systems that traders used to send coded messages.

Lawyers will therefore still be a necessary part of the investigation and review process - to identify these sources and probe whether further sources are required. Clients’ IT teams are often involved at this stage to preserve and extract sources of documents.

Be ready for the unknown

It is difficult to know from the outset where an investigation may lead. The investigative process depends largely on the nature of the allegation, but also on the issues the document review uncovers.

What may initially appear as a potential breach of employment could turn into a matter with regulatory or criminal consequences. Lawyers are in a position to spot potential legal risk as part of the investigation. This article does not comment on criminal investigations, but care should be taken throughout an investigation to preserve data in case it is required for other purposes.

Traditional approach to document review

Once sources are identified, the traditional approach to document review is to use date ranges and keywords to identify potentially relevant documents. Both are familiar to practitioners because of their frequent use, such as searching for emails in their own mailbox. As a result, some practitioners are inclined to continue with the processes and technology that they know.

However, there are two key reasons why a new approach is needed.

First, it is difficult to identify the right search parameters. The date range may be unknown and it may be unclear which keywords should be used. In an investigation into alleged bribery, a search for 'bribe' is unlikely to produce useful results. Whilst lawyers have traditionally used simple keyword searches, it is often not possible to know whether the keywords are useful until document review is under way.

Second, even if correct keywords and date ranges are identified, they are unlikely to return a manageable number of relevant documents that can be reviewed in time. There is an increasing amount of data from an increasing number of sources. Keywords may return 'false positives' – documents that are responsive to the keyword but not relevant to the investigation. This adds to the reviewer's task.
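
Both problems can be illustrated with a minimal sketch of a keyword and date-range search. Everything here is an assumption for illustration: the `Document` fields, the `keyword_date_search` helper and the four sample documents are invented and do not represent any particular review platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    text: str
    relevant: bool  # in practice this is only known after human review


def keyword_date_search(docs, keywords, start, end):
    """Return documents dated within [start, end] that contain any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        d for d in docs
        if start <= d.sent <= end and any(k in d.text.lower() for k in lowered)
    ]


# Invented sample data, for illustration only.
docs = [
    Document("A", date(2017, 3, 1), "Payment to the agent was approved off the books.", True),
    Document("B", date(2017, 3, 2), "Invoice payment reminder for office stationery.", False),
    Document("C", date(2017, 4, 9), "Lunch on Friday? My treat.", False),
    Document("D", date(2016, 1, 5), "Second payment to the agent, keep it quiet.", True),
]

hits = keyword_date_search(docs, ["payment"], date(2017, 1, 1), date(2017, 12, 31))

# Responsive to the keyword but not relevant to the investigation.
false_positives = [d.doc_id for d in hits if not d.relevant]

# Relevant but outside the chosen parameters, so never put before a reviewer.
missed = [d.doc_id for d in docs if d.relevant and d not in hits]
```

In this toy data the search returns documents A and B. B is a false positive, and D is missed because it falls outside the assumed date range.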

The result would often be that responsive documents would then be reviewed by a potentially large team of fee-earners. The reviewers would code each document according to a series of issues such as relevance. This approach often results in significant costs for the document review part of the investigation, given the volume of documents and time required to review. It also runs the risk of human error and inconsistency.

Machine learning

Investigations now increasingly rely on technology that uses machine learning to improve results. Machine learning can be broken into two parts – unsupervised and supervised.

Unsupervised machine learning

Unsupervised machine learning is where an algorithm recognises relationships between data without human input.

Technology is available to help identify concepts in documents. Such technology may recognise that in a mailbox there is a conceptual group of documents related to a relevant issue whereas another conceptual group is related to an individual’s personal life and is irrelevant to the investigation.

A tool known as keyword expansion helps identify whether different words are conceptually similar to a proposed keyword. For example, the investigator may use the word ‘meeting’ to find examples of discussions. Keyword expansion may identify employees who instead referred to discussions as ‘updates’. The benefit of this is that it is based on the data and the words actually used, not on a thesaurus or guesswork. This process identifies documents potentially relevant to an issue that would otherwise be missed by use of a specific keyword.
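
The idea behind keyword expansion can be sketched in a few lines. This is a simplified illustration, not how any commercial tool works: it assumes that two words are conceptually similar when they tend to appear alongside the same other words, and it measures that with cosine similarity over document-level co-occurrence counts. The function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(docs):
    """Map each word to counts of the other words it shares a document with."""
    vectors = defaultdict(Counter)
    for text in docs:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_keyword(seed, docs, top_n=3):
    """Return the words whose contexts most resemble those of the seed keyword."""
    vectors = context_vectors(docs)
    if seed not in vectors:
        return []
    scores = {w: cosine(vectors[seed], vec) for w, vec in vectors.items() if w != seed}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented sample data, for illustration only.
docs = [
    "meeting with the supplier on thursday",
    "update with the supplier on thursday",
    "meeting about the contract pricing",
    "update about the contract pricing",
    "holiday photos from the beach",
]

suggestions = expand_keyword("meeting", docs, top_n=1)
```

Here 'update' is suggested for 'meeting' because the two words are used in the same contexts in the data, not because a thesaurus links them. A practical tool would also need to handle common words, plurals and much larger volumes.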

Identifying concept groups and categories of documents helps investigators to identify potentially relevant and irrelevant sources of documents. However, unsupervised machine learning may not help to reduce the number of documents for review to a manageable number. Lawyers are still required for further review of the potentially relevant documents. This is where supervised machine learning is used.

Supervised machine learning

This is sometimes known as technology assisted review (TAR), computer assisted review or predictive coding.

There are a variety of definitions for this process, so some explanation may be needed.

TAR can be viewed broadly. The Practice Direction for the Disclosure Pilot Scheme defines TAR to include:

'… all forms of document review that may be undertaken or assisted by the use of technology, including but not limited to predictive coding and computer assisted review.'

However, others view TAR as the process of having computer software electronically classify documents based on input from expert reviewers in an effort to expedite the organisation and prioritisation of documents (as defined in the Technology and Construction Bar Association Guide to Disclosure in England and Wales, and by the Electronic Discovery Reference Model widely used in the US).

In a straightforward example, TAR is used in the following way. A sample set of documents (‘seed-set’) is created from the overall set of documents to be reviewed. A senior reviewer then codes each document in the seed-set, and an algorithm uses this coding to detect patterns and applies those patterns to the remaining dataset, ranking documents according to how likely they are to respond to specific issues. Further seed-sets may be required to improve the precision of the algorithm’s results to an acceptable level.
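
The seed-set workflow can be sketched as follows. This is only an illustration under stated assumptions: it uses a small naive Bayes text classifier as a stand-in for whatever algorithm a given TAR product actually uses, and the function names and sample documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set):
    """Learn word counts from (text, is_relevant) pairs coded by a senior reviewer."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in seed_set:
        counts[label].update(tokenize(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def relevance_score(text, model):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    counts, n_docs, vocab = model
    totals = {label: sum(counts[label].values()) for label in counts}
    score = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed-set carry no signal
        p_relevant = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irrelevant = (counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_relevant / p_irrelevant)
    return score


def rank(unreviewed, model):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda t: relevance_score(t, model), reverse=True)


# Invented seed-set, coded by a hypothetical senior reviewer.
seed_set = [
    ("wire the commission to the consultant before the tender", True),
    ("consultant asked for the commission in cash", True),
    ("team lunch menu for friday", False),
    ("printer on the third floor is broken again", False),
]
unreviewed = [
    "friday lunch is cancelled",
    "has the consultant received the commission",
    "the printer is fixed",
]

ranked = rank(unreviewed, train(seed_set))
```

The document about the consultant and the commission is ranked first. Coding a further seed-set and retraining corresponds to calling `train` again with the enlarged set of coded documents.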

The aim is to limit the number of potentially relevant documents for further review. In my experience, the use of TAR reduced a dataset of 1.5 million documents to 100,000 documents that were statistically likely to be relevant, dramatically reducing the number of documents subject to substantive review. 

The continuing role of lawyers

Case law has demonstrated the need for lawyers to ensure the transparent and reliable use of TAR in the context of disclosure as part of civil litigation. While an investigation may never lead to proceedings, there is sufficient similarity between disclosure and investigations to make comparisons useful.

Equally, case law emphasises the need for the TAR process to be transparent and reliable so that other parties and the court can understand what has been done. This allows the court to know what evidential weight should be given to the disclosed documents. Similarly, whether the results of an investigation are only required internally or are later required in other proceedings, whoever receives the results needs to know whether they can be used as the basis for any further decisions.

Turning to the relevant case law, the first reported approval of TAR in the English courts was in Pyrrho Investments Ltd v MWB Property Ltd [2016]. Both parties applied for the court to approve the use of predictive coding to help review over three million electronic documents held by one of the parties. The court ordered the use of predictive coding provided a transparent methodology was used. This was to ensure the other party and the court could understand the review process.

Master Matthews said (para 68):

'As technology assisted review combines man and machine, the process must contain appropriate checks and balances which render each stage capable of independent verification. A balance must be struck between the right of the party making discovery to determine the manner in which discovery is provided and participation by the requesting party in ensuring that the methodology chosen is transparent and reliable.'

Lawyers need to confirm those checks and balances are in place to ensure the use of TAR is reliable. Reliability can be demonstrated through verification. Some technology allows users to export the logical steps the algorithm has taken so that others, such as another party, can verify the results.

Verification also depends on an explanation of the TAR process. Such an explanation is usually documented in a methodology produced at the start of an investigation and updated as the investigation evolves. 

At the start of an investigation, it may not be clear who will eventually need to be informed of the process. It could be that only the company’s board need be informed. This does not mean a methodology can be dispensed with. The TAR process may need to be explained to criminal or regulatory investigators depending on the documents found. Lawyers involved with the document review are well placed to produce the methodology.

The need for lawyers to ensure transparency in the process is, in part, the result of their position as officers of the court.

In Brown v BCA Trading Ltd [2016], the Registrar approved the use of predictive coding by BCA despite Brown’s opposition. The judgment emphasises the role of lawyers at paras 13 and 14:

'13… it has to be borne in mind that the parties must do their best to achieve reasonable and proportionate results in any event. That is in their own interests and meets the overriding objective. I am also encouraged in reaching my decision by the fact that directions will cause the parties to sit down before the predictive coding begins in order to discuss the criteria to adopt and the general process of disclosure…'

'14 Furthermore, of course, I bear in mind in this case that those discussions will be between experienced solicitors who can be relied upon to hold the reins within the context of them owing their duties as officers of the court, as well as their duties to their clients.'

The lawyers’ role as officers of the court is, in part, to ensure that the court is not misled. In investigations, lawyers will continue to have a role in ensuring that the document review process and results allow others to rely on those results for further decisions.

The changing role of lawyers

The role of lawyers in investigations will change as a result of technology in two key respects.

First, senior lawyers are likely to be engaged with document review at an earlier stage. TAR relies on a limited number of senior lawyers who are familiar with the matter to review seed-sets for the algorithm to use. This is in contrast to traditional review structures that use potentially dozens of junior fee-earners from the outset.

This does lead to an initial upfront cost. However, an aim of machine learning is to identify potentially relevant documents faster to reduce the total number of documents to be reviewed. As a result, any junior document review team should be smaller in number and required for less time.

Second, lawyers will develop technical expertise in the technology, particularly TAR. The need for a transparent process means senior lawyers will need to explain the document review methodology. This goes beyond explaining which data sources were selected and which keywords used. Lawyers will need to understand how the technology works with the data they are reviewing, including any limitations to their review. 

In particular, lawyers need technical expertise to understand the limitations of the technology. TAR does not guarantee perfection. TAR may miss relevant documents and return irrelevant documents. This has always been the case in civil litigation. In Digicel (St Lucia) Ltd v Cable and Wireless plc [2008] Morgan J said: 

'… the rules do not require that no stone should be left unturned. This may mean that a relevant document, even ‘a smoking gun’ is not found. This attitude is justified by considerations of proportionality.'

Lawyers, with technical specialists, will need to explain how error rates are within an acceptable margin.
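
One common way to express error rates is through precision (how many of the documents flagged as relevant really are relevant) and recall (how many of the truly relevant documents were flagged). The sketch below assumes a validation sample that a human reviewer has checked against the technology's predictions. The figures are invented, and what counts as an acceptable margin is a matter for the investigation, not something this code decides.

```python
def review_metrics(sample):
    """Compute precision and recall from a human-checked validation sample.

    sample: list of (predicted_relevant, actually_relevant) pairs.
    """
    true_pos = sum(1 for predicted, actual in sample if predicted and actual)
    false_pos = sum(1 for predicted, actual in sample if predicted and not actual)
    false_neg = sum(1 for predicted, actual in sample if not predicted and actual)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented validation sample of 100 documents, for illustration only.
sample = (
    [(True, True)] * 40      # flagged and relevant
    + [(True, False)] * 10   # flagged but irrelevant
    + [(False, True)] * 5    # relevant but missed
    + [(False, False)] * 45  # correctly set aside
)

precision, recall = review_metrics(sample)
```

In this sample, 40 of the 50 flagged documents are relevant and 40 of the 45 relevant documents were flagged. Recording figures of this kind in the methodology gives stakeholders a concrete basis for judging whether the review was reliable.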


Technology will play an increasingly important role in investigations, just as it does in litigation. It will improve the quality and efficiency of some of the tasks that have traditionally been performed by lawyers.

However, technology will still require management and input by lawyers. The role of lawyers will therefore change to reflect what is required of them and how they use the technology.

TAR has been used extensively in investigations and litigation in the US. As a result, there has been commentary and case law in the US that provides useful guidance. In the US, Judge Peck said in Da Silva Moore v Publicis Groupe [2012]:

'The Court recognises that computer-assisted review is not a magic… solution appropriate for all cases. The technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the court needs to examine.'

Those using technology in investigations need to be ready to explain the interaction between the investigation, the lawyers and the technology. The key points are:

  • engage with any stakeholders early to determine what is needed and by when
  • identify your objectives, sources and witnesses, including those which are essential and those which are not
  • maintain a thorough record of your methodology so that you can explain to stakeholders what you have done and any caveats to the investigation
  • ensure constant oversight of the process by relevant technical experts and those who are familiar with the facts of the investigation.

Key contact

Tom Whittaker Director
