Divining the Legal Mysteries of the All-Powerful Search Box: Part 1

Posted on 01-28-2013 by
Tags: Real Law

Brought to you by the Real Law Editorial Team

“… the English or American lawyer resembles the hierophants of Egypt, for, like them, he is the sole interpreter of an occult science.”

—Alexis de Tocqueville, Democracy in America

Instead of mystical tomes, modern lawyers have the search box, their one true source of arcane knowledge and wisdom. Searching legal content for use in cases is one of the most important, yet most time-consuming, tasks a lawyer, paralegal or law librarian can have. It’s important because a case may live or die on the strength of the lawyer’s argument. It’s time-consuming because even simple cases may require large numbers of citations and very thorough research to ensure that the arguments are strong. While the legal information industry has innovated by improving search accuracy and efficiency, the effectiveness of search still relies largely on the research expertise of lawyers, paralegals and law librarians. The practice of law hangs on words and their interpretation. When it comes to search, what you get out still depends on what you put in.

Today, legal case research is overwhelmingly done online. In fact, recent studies have shown that law firms consider online search the most important thing a new attorney needs to learn. Law schools will need to respond and adapt their curricula. While new attorneys are very Web-savvy, they often don’t know how to evaluate sources and tools. One respondent said, “They rely too much on Google™ or other Internet search engines these days. They do not have a plan in mind when researching.” Part of that plan is knowing the advantages and disadvantages of different types of tools.

Looking at the tools we have available today, it’s impressive how far electronic text search has come, and how quickly. Vannevar Bush, one of the first scientists to even propose a computer-accessible, networked source of knowledge (he called it a “memex”), thought the task would be so difficult that a special language would need to be created so computers could understand what we wanted.

Luckily, that wasn’t necessary. By the 1960s, as computers became common, legal text searching got its start with what’s called Boolean search: exact matching of given terms, combined with operators such as AND, OR and NOT. This type of search was a step ahead of Bush’s computer-specific language, but it can also look a bit like programming. If you want a document from a specific source or source type, from a specific date range, or that includes your search terms in a particular way, this is the way to do it.
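To make the idea concrete, here is a minimal sketch of exact-match Boolean search in Python. It is an illustration only, not how any commercial legal research product is built. The three-document corpus, its dates and sources, and the function names (build_index, boolean_search) are all invented for this example.

```python
from datetime import date

# Hypothetical mini-corpus: sources, dates, and text are invented for illustration.
DOCS = {
    1: {"source": "state", "decided": date(1998, 3, 2),
        "text": "landlord breached the lease and tenant sought damages"},
    2: {"source": "federal", "decided": date(2005, 7, 19),
        "text": "tenant breached the lease by subletting without consent"},
    3: {"source": "federal", "decided": date(2011, 1, 10),
        "text": "employer breached the contract and employee sought damages"},
}


def build_index(docs):
    """Map each word to the set of document ids that contain that exact word."""
    index = {}
    for doc_id, doc in docs.items():
        for word in doc["text"].split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Documents containing the exact word (no stemming, no synonyms)."""
    return INDEX.get(word, set())


def boolean_search(all_of=(), any_of=(), none_of=(), source=None, after=None):
    """AND every all_of term, require one any_of term, exclude none_of terms,
    then apply optional source and date filters."""
    hits = set(DOCS)
    for word in all_of:
        hits &= term(word)
    if any_of:
        hits &= set().union(*(term(word) for word in any_of))
    for word in none_of:
        hits -= term(word)
    if source is not None:
        hits = {d for d in hits if DOCS[d]["source"] == source}
    if after is not None:
        hits = {d for d in hits if DOCS[d]["decided"] > after}
    return sorted(hits)


# "lease AND breached" finds documents 1 and 2.
print(boolean_search(all_of=["lease", "breached"]))
# Exact matching is unforgiving: "breach" does not match "breached".
print(boolean_search(all_of=["breach"]))
```

Note the second query: because matching is exact, a researcher who types "breach" misses every document that says "breached." That is the kind of gap the searcher has to anticipate and work around.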

While this was all that search technology was initially capable of (the first Boolean-based legal text search projects started in the ’60s), it quickly became clear that this was not enough. Research in 1985 showed that Boolean text searches might return only about 20 percent of the relevant material, even though they gave the impression of being thorough. Having too much faith in your search results can be dangerous, but smart filters and search modifiers go a long way toward making Boolean search a strong way to find something, if you know what you are looking for.

Making Search More Natural

A decade later, scientists began to develop natural language search methods. This technology went beyond exact text matches to produce better results, taking into account both a more flexible range of inputs and a better understanding of the source content. This included tallying the number of documents a search returned, and ranking documents by their relevance based on how often exact and related terms appear.
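As a rough illustration of relevance ranking, the sketch below scores documents by how often query terms and related terms appear, giving rarer terms more weight (a simple TF-IDF-style score). This is one textbook approach chosen for illustration, not the method used by any particular legal research tool. The sample documents, the RELATED table, and the 0.5 weight for related terms are assumptions.

```python
import math
from collections import Counter

# Hypothetical documents, invented for illustration.
DOCS = {
    "A": "tenant withheld rent after landlord failed to repair the heating",
    "B": "landlord sued tenant for unpaid rent and rent increases",
    "C": "employee sued employer for unpaid wages",
}

# Hypothetical hand-built table of related terms; real systems derive these differently.
RELATED = {"rent": ["rental"], "unpaid": ["withheld"]}


def tokenize(text):
    return text.lower().split()


def rank(query, docs, related=RELATED):
    """Return (doc_id, score) pairs, most relevant first."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Exact query terms get full weight; related terms get half weight (an assumption).
    weights = {}
    for word in tokenize(query):
        weights[word] = 1.0
        for alt in related.get(word, []):
            weights.setdefault(alt, 0.5)

    scores = {}
    for doc_id, doc_counts in counts.items():
        score = 0.0
        for word, weight in weights.items():
            doc_freq = sum(1 for c in counts.values() if word in c)
            if doc_freq == 0:
                continue  # term appears nowhere in the corpus
            idf = math.log(1 + n_docs / doc_freq)  # rarer terms count for more
            score += weight * doc_counts[word] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


results = rank("unpaid rent", DOCS)
print(len(results), "documents found")
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Document A never uses the word "unpaid," yet it still ranks above document C because it mentions "rent" and the related term "withheld." A strict Boolean query for "unpaid AND rent" would have returned only document B.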

Natural language allows us to research general issues rather than very specific topics. This is extremely helpful when we don’t know much about an issue or are researching something broad and complex. It’s also a way to be more thorough by complementing a specific, technical search with one that is more conceptual.

Our Semantic Web of Legal Content: Past

Bringing in Citations

When it came to searching legal text, one challenge with the Boolean and basic natural language models was that they treated all text equally. This was a problem for citations, which were treated like just another string of words. Citation-based retrieval is a crucial part of legal research, and lawyers rely heavily on cites to find specific cases. With early search tools, it was up to the user to come up with variations in the search box that would give them what they wanted.

Fortunately, legal information providers were able to adapt their tools to meet the unique requirements of legal research. Twenty years ago, tools like Lexcite and Lexsee incorporated retrieval by citation, either as a search or as a link from the citing case. By recognizing and normalizing embedded citations, and by creating an authority list of citable cases, these tools gave lawyers a new approach for retrieving important documents during legal research, one that nicely complemented traditional Boolean and natural language searches.
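Here is a small sketch of what recognizing and normalizing citations can look like. It is a simplified illustration and does not describe how Lexcite or Lexsee work internally. The regular expression only handles United States Reports cites ("U.S."), and the two-entry AUTHORITY table and the function names are assumptions made for the example.

```python
import re

# Hypothetical authority table: normalized citation -> case name.
AUTHORITY = {
    "347 U.S. 483": "Brown v. Board of Education",
    "410 U.S. 113": "Roe v. Wade",
}

# Matches variants such as "347 US 483", "347 U.S. 483" and "347 U. S. 483".
CITE_RE = re.compile(r"\b(\d{1,3})\s+U\.?\s?S\.?\s+(\d{1,4})\b")


def find_citations(text):
    """Return every U.S. Reports citation in the text, in one canonical form."""
    return [f"{volume} U.S. {page}" for volume, page in CITE_RE.findall(text)]


def resolve(text):
    """Pair each normalized citation with its case name, or None if unknown."""
    return [(cite, AUTHORITY.get(cite)) for cite in find_citations(text)]


brief = "See 347 US 483; accord 410 U. S. 113 and 999 U.S. 1."
for cite, case in resolve(brief):
    print(cite, "->", case)
```

Because every variant collapses to the same canonical string, the user no longer has to guess which spelling of a cite a document happens to use. A cite that is not in the authority table (the made-up "999 U.S. 1" here) simply resolves to None.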

The Costs of “Free”

In more recent years, Google has consistently been the leader in advancing general search. Its initial innovation was in analyzing the linkage-based relationships among documents and exploiting those relationships to improve search results. Today it is a leader in applying linkage and user analytics to improve search result ranking, recommendations, ad click-through rates and other areas. But Google has limitations by its nature. One limitation is content. Very recent research at Stanford University revealed that free legal research tools were simply not reliable, since they had no access to unpublished cases, and could not connect cases with higher court reversals or overrules. Another limitation is the business model itself: if you’re not paying for it, you become the product.
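For readers curious about the linkage analysis mentioned above, the sketch below shows a simplified PageRank-style calculation: a document scores higher when other well-regarded documents link to it. This is a textbook simplification, not Google's production ranking. The four-document link graph, the function name link_rank, and the damping value of 0.85 are assumptions for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score documents by inbound links using a simple power iteration."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every document starts each round with a small baseline score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A document splits its score evenly among the documents it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A document with no outbound links spreads its score over everyone.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all link to C; C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores = link_rank(LINKS)
for doc in sorted(scores, key=scores.get, reverse=True):
    print(doc, round(scores[doc], 3))
```

Document C comes out on top because three documents point to it, and A ranks second because the top-ranked document points to it. The same idea carries over naturally to case law, where citations between opinions play the role of links.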

In our next article, we look at where legal search technology will take us next.
