My last post showed how you can use pre-reviewed seed sets of relevant and irrelevant documents to help prioritize unreviewed documents, using WordSmith 6 from Lexical Analysis Software Ltd. (USD $88.73 or EUR €67.57. Please note that I’m not affiliated with Lexical.)

Here’s how you can help your document reviewers hit the ground running by investing a few hours of attorney or paralegal time with WordSmith before you’ve reviewed any documents. You can quickly and easily create a dictionary of case-specific words – particularly initials, abbreviations, acronyms, and names of key players – to help you and the document reviewers understand the documents faster. This initial investment will pay you back many times in speed and accuracy.

In short, you create a list of words in your documents. Then, you review the list, view words of interest in their original context, create a definition for each, and give the list of definitions to your reviewers.

More specifically,

1. Create a plain text file containing the text extracted from your documents.

2. Set WordSmith’s WordList options to create case-sensitive word lists, so that capitalized abbreviations, initials, and names are listed separately from lowercase words.


3. Use WordSmith’s WordList module to create a word list from the extracted text.

4. Optionally, use WordSmith’s KeyWords module to compare that word list to a “reference corpus wordlist” of text unrelated to your case. You will get a list of key words in order of descending “keyness” (which is based on the relative frequency of each word in the two text lists). This step is not necessary but it tends to move case-specific words toward the top of the list, letting you scan fewer words to find the important ones.

(I’ve compiled a sample reference corpus word list in WordSmith format that is available here. I make no representations about the utility of this list.)

5. Scan the list for words that you believe your reviewers will not know. When you find one, click to select it, then right-click it and select “Concordance.”


6. A Concordance window will open up showing you a list of lines of text, in which the selected word appears in context in the middle of each line. This window can be stretched so that you can see more context and more lines.


7. The Concordance list can be sorted in many ways. For example, to look for the definition of an abbreviation, one strategy is to sort by L1, then L2, then L3. This means that WordSmith sorts the list first by the word immediately to the left of the selected abbreviation, then by the next word to the left, and finally by the word to the left of that.



8. Then scan down the Concordance list to the lines where the first word to the left of the selected abbreviation starts with the last letter of the abbreviation and the abbreviation itself appears in parentheses. For “GAO,” for example, look for lines where the word to the left begins with “O.”


9. In this example, the matching lines show that “GAO” stands for the US Government Accountability Office.

10. As another example, where the abbreviation might be defined in a contract in your document set, you can sort by R1-R2-R3 and scan down to where the word “means” appears just to the right of the abbreviation.

11. You can also use WordList with sorted Concordance lists to find names associated with initials (e.g., to find that “FP” stands for “Frank Poole”) or to find the full name behind a first or last name that appears frequently in the list. To find the full name from a last name, use an L1-L2-L3 sort; to find the full name from a first name, use an R1-R2-R3 sort.

12. Once you have the full name of the person of interest, you can use WordSmith’s concordance function described above to find out more about that person, such as the person’s title, responsibilities, and contacts.
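If you want to see what steps 2 through 4 are doing, here is a minimal Python sketch of a case-sensitive word list and a keyness ranking. It is not WordSmith’s implementation: the tokenizing rule, the function names, and the two sample texts are all assumptions made for illustration, and the score used is Dunning’s log-likelihood, one common keyness statistic that may or may not match the settings in your copy of WordSmith.

```python
import math
import re
from collections import Counter

# Assumed tokenizing rule: a word starts with a letter or digit and may
# contain apostrophes and hyphens. WordSmith's own settings may differ.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9'-]*")


def word_list(text):
    """Case-sensitive frequency list: 'The' and 'the' are counted separately."""
    return Counter(TOKEN_RE.findall(text))


def log_likelihood(a, b, total_a, total_b):
    """Dunning log-likelihood for a word seen a times in the case text
    and b times in the reference text."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a:
        score += a * math.log(a / expected_a)
    if b:
        score += b * math.log(b / expected_b)
    return 2 * score


def key_words(case_counts, ref_counts):
    """Words relatively more frequent in the case text, by descending keyness."""
    total_case = sum(case_counts.values())
    total_ref = sum(ref_counts.values())
    scored = []
    for word, a in case_counts.items():
        b = ref_counts.get(word, 0)
        # Keep only words that are over-represented in the case text.
        if a / total_case > b / total_ref:
            scored.append((word, log_likelihood(a, b, total_case, total_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample texts standing in for the extracted case text
# and for a reference corpus unrelated to the case.
case_text = "The GAO report was sent to FP. The GAO asked FP for the audit files."
reference_text = "The cat sat on the mat. The dog asked for the ball and the cat left."

case_counts = word_list(case_text)
ref_counts = word_list(reference_text)
ranked = key_words(case_counts, ref_counts)

for word, score in ranked[:5]:
    print(f"{word}\t{case_counts[word]}\t{score:.2f}")
```

With these toy texts, the case-specific tokens “GAO” and “FP” rise to the top of the ranking, while common words shared with the reference text score near zero or drop out. This is the effect step 4 relies on.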

This just scratches the surface of how you can exploit the searchability of electronic documents using WordSmith 6 to perform computational linguistic analysis in technology-assisted review.
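For readers who want to experiment with the Concordance sorting described in steps 6 through 10, here is a rough Python sketch. It is only an approximation under stated assumptions: the tokenizing rule, the function names, and the sample text are hypothetical, and the helper that looks for expansions only catches the simple case where the initials of the words to the left spell out the abbreviation exactly.

```python
import re

# Assumed tokenizing rule; punctuation such as parentheses is dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9'-]*")


def concordance(text, target, width=3):
    """Return (left_words, right_words) for each case-sensitive hit of target."""
    tokens = TOKEN_RE.findall(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, right))
    return hits


def sort_left(hits):
    """L1-L2-L3 sort: nearest word on the left first, then the next one out."""
    return sorted(hits, key=lambda hit: [w.lower() for w in reversed(hit[0])])


def sort_right(hits):
    """R1-R2-R3 sort: nearest word on the right first, then the next one out."""
    return sorted(hits, key=lambda hit: [w.lower() for w in hit[1]])


def expansion_candidates(hits, abbreviation):
    """Left contexts whose initials spell the abbreviation.

    Requires width >= len(abbreviation) when building the hits.
    """
    n = len(abbreviation)
    found = set()
    for left, _ in hits:
        words = left[-n:]
        if len(words) == n and all(
            w[0].upper() == c for w, c in zip(words, abbreviation.upper())
        ):
            found.add(" ".join(words))
    return sorted(found)


# Hypothetical sample text standing in for the extracted documents.
sample = (
    "The Government Accountability Office (GAO) issued a report. "
    "Staff at GAO reviewed the files. "
    "Per the Government Accountability Office (GAO), the audit continues. "
    "Frank Poole said GAO means trouble."
)

hits = concordance(sample, "GAO")
for left, right in sort_left(hits):
    print(f"{' '.join(left):>40} | GAO | {' '.join(right)}")

print(expansion_candidates(hits, "GAO"))
```

Sorting on the left context groups the lines that end in “Office” together, which is what makes the definition easy to spot by eye. An R1-R2-R3 sort via `sort_right` does the same for words such as “means” that follow the abbreviation.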