Savannah Seymour explores how TrialView can support you with managing duplicates as quickly and seamlessly as possible.
Date: 22/05/23
Managing duplicates can be a time-consuming challenge in the run-up to a hearing. Duplicates can exist in a dataset for many reasons, such as collecting the same document from multiple sources, collecting multiple versions of the same document (near-duplicates), or collecting non-inclusive emails (earlier messages whose content is fully contained in a later email in the same thread).
The Commercial Court Guide states that “no more than one copy of any one document should be included, unless there is good reason for doing otherwise”. Legal teams therefore have a responsibility to ensure that unnecessary duplicates are removed from the bundle.
Deduplication is the process by which exact duplicates are eliminated from the dataset. It can usually be carried out by your chosen software provider, such as your eDiscovery platform, ebundling or trial presentation software. Most deduplication tools compare documents by their hash values. A hash value is a fixed-length identifier calculated from a document’s content, so identical content always produces the same value and, in practice, different content produces different values. If two documents share the same hash value, they are treated as exact duplicates of one another, and all but one copy is eliminated.
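To illustrate the idea, here is a minimal sketch of hash-based deduplication in Python. It is not any particular provider’s implementation: the choice of SHA-256, the function name `deduplicate` and the sample file names are all assumptions made for the example.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document seen for each content hash.

    `documents` is a list of (name, content_bytes) pairs.
    Returns (kept_names, dropped), where `dropped` pairs each
    removed duplicate with the name of the copy that was kept.
    """
    kept = {}      # hash value -> name of the retained copy
    dropped = []   # (duplicate name, retained name)
    for name, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in kept:
            dropped.append((name, kept[digest]))
        else:
            kept[digest] = name
    return list(kept.values()), dropped


# Hypothetical sample data: two identical files and one revised version.
sample = [
    ("contract_from_client.pdf", b"contract v1"),
    ("contract_from_counsel.pdf", b"contract v1"),
    ("contract_revised.pdf", b"contract v2"),
]
kept_names, dropped = deduplicate(sample)
```

Note that the revised contract survives: a change of even one byte produces a different hash value, which is why hash comparison only catches exact duplicates.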
Near-duplicates, on the other hand, are often handled differently within the context of an investigation or trial. It may be relevant to the matter that, for example, multiple versions of a contract existed, or that someone relevant to the proceedings forwarded a confidential email. These near-duplicates may need to be included in the bundle alongside their counterparts for comparison. Hash value deduplication is not the best tool here, because even a small change to a document produces a different hash value. Instead, your provider may be able to help you with near-duplicate analysis, which can identify documents that are, for example, 95% or more similar to one another. Documents with slight changes to metadata or text are then highlighted for human review, so that someone can decide whether to include them in the trial bundle.
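As a rough illustration of near-duplicate analysis, the sketch below scores pairs of documents by text similarity and flags those at or above a threshold for human review. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; commercial tools are likely to use different and more scalable techniques. The function name, the 0.95 threshold and the sample texts are assumptions for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(texts, threshold=0.95):
    """Return (name_a, name_b, score) for every pair of documents
    whose text similarity is at or above `threshold`.

    `texts` maps a document name to its extracted text.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(texts.items(), 2):
        # ratio() returns a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 3)))
    return pairs


# Hypothetical sample data: two contract versions and an unrelated document.
texts = {
    "contract_v1": "The supplier shall deliver the goods within 30 days of the order date.",
    "contract_v2": "The supplier shall deliver the goods within 45 days of the order date.",
    "minutes": "Minutes of the board meeting held on 3 March.",
}
flagged = near_duplicate_pairs(texts)
```

Here only the two contract versions are flagged. The tool does not decide which version belongs in the bundle; it simply surfaces the pair so that a reviewer can.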
TrialView can support you with managing duplicates as quickly and seamlessly as possible. Our workspace assigns a unique ID to each individual document, making it easy to identify where a document exists across different locations. This means that if a document exists in two separate bundles, both link to the same underlying document, which is essential for bundle and version management.
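The general idea of bundles linking to a single underlying document can be sketched as follows. This is a generic illustration only and does not reflect how TrialView is actually built: the `Workspace` and `Document` classes, their methods and the use of UUIDs are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    title: str


@dataclass
class Workspace:
    documents: dict = field(default_factory=dict)  # doc_id -> Document
    bundles: dict = field(default_factory=dict)    # bundle name -> list of doc_ids

    def add_document(self, title):
        """Store a document once and return its unique ID."""
        doc_id = str(uuid.uuid4())
        self.documents[doc_id] = Document(doc_id, title)
        return doc_id

    def add_to_bundle(self, bundle, doc_id):
        """Add a reference to an existing document; no copy is made."""
        if doc_id not in self.documents:
            raise KeyError(f"unknown document ID: {doc_id}")
        self.bundles.setdefault(bundle, []).append(doc_id)

    def bundles_containing(self, doc_id):
        """List every bundle that references the given document."""
        return sorted(name for name, ids in self.bundles.items() if doc_id in ids)


# Hypothetical usage: one document referenced from two bundles.
workspace = Workspace()
agreement = workspace.add_document("Supply agreement")
workspace.add_to_bundle("Core bundle", agreement)
workspace.add_to_bundle("Witness bundle", agreement)
locations = workspace.bundles_containing(agreement)
```

Because each bundle holds only the ID, the document is stored once, and any update to it is seen from every bundle that references it.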