Reprinted with permission from the October 2017 issue of ALI CLE’s The Practical Lawyer.

On Lex Machina’s platform, counsel can use the “motion kickstarter” to survey recent motions before the assigned trial judge. The “motion chain” links together the briefing and the eventual order for each motion, so counsel can identify the arguments that have succeeded in recent cases and review both the parties’ briefs and the judge’s order.

Ravel Law offers extensive resources to help counsel craft their arguments. As counsel does her research, Ravel Law shows visualizations demonstrating how different passages of a case have been cited, and by which judges, enabling counsel to zero in quickly on the passages that judges have found most persuasive. Or the research can be approached from the other direction, by identifying the cases and passages your judge most often cites for particular principles. How does the judge typically explain the standards for granting a motion to dismiss, or for summary judgment? Does the judge tend to cite Latin legal maxims, or even sports analogies? How does your federal judge handle the state law of his or her home jurisdiction? How has your judge ruled in rapidly evolving areas of the law, such as class certification, arbitration and personal jurisdiction? Now it’s easy to find out.

And when the case finally goes to trial, there’s still a role for judicial analytics. How often do the judge’s cases go to trial? What kinds of cases have tended to go to trial before your judge? What were the results? The data you pulled at the outset on the length of the judge’s previous trials might suggest just how liberal or strict the judge tends to be with the parties at trial. Did either party waive a jury, and if so, what happened? How has your trial judge handled jury instructions in recent trials where the parties didn’t waive the jury? What were the awards of damages, plus any awards of attorneys’ fees or punitive damages?

Post-trial is an often overlooked opportunity to cut litigation short by limiting or entirely wiping out an adverse verdict through motions for a new trial or for judgment notwithstanding the verdict. Using Lex Machina’s motion comparator, Ravel Law’s motions database or Bloomberg’s Litigation Analytics, counsel can determine how likely the judge is to overturn or modify a jury verdict. A close look at the data and at recent orders and motions will help inform the decision whether to file a motion for judgment notwithstanding the verdict or a motion for a new trial. If your client has been hit with a punitive damages award, you’ll need to review not only the judge’s record on post-trial review of punitives, but also drill down from there to the orders and briefing on those motions to evaluate what approaches worked (or didn’t).

Analytics have tremendous potential in appellate work too. All of the major vendors have enormous collections of data on state and federal appellate courts and judges. But for my firm’s appellate practice, I was interested in tracking a number of variables that would be difficult to extract through computer searches, so rather than relying on any of the vendors, I built two databases in-house. Our California and Illinois Supreme Court databases are modeled after Professors Spaeth and Segal’s Supreme Court database, tracking many of the same variables. My California Supreme Court database encompasses every case the court has decided since January 1, 1994: 1,004 civil cases and 1,293 criminal, quasi-criminal and attorney disciplinary matters. My Illinois Supreme Court database is even bigger, including every case that court has decided since January 1, 1990: 1,352 civil and 1,529 criminal.

For each of these 5,000+ cases, I’ve extracted roughly one hundred different data points. Was the plaintiff or the defendant the appellant in the Supreme Court? Is there a government entity on either side? Where did the case originate, and who was the trial judge? Before the intermediate appellate court, we track dissents, publication, the disposition and the ideological direction of the result. We track three dates for each case: the date review was granted, the date of argument and the date of decision. Before the Supreme Court, we note the specific issue and the area of the law involved, the prevailing party and the vote, the writers and length of all opinions, the number of amicus curiae briefs and which party each amicus supported, and of course each Justice’s vote. In addition, our database includes data from every oral argument at the Illinois Supreme Court since 2008, and from every argument at the California Supreme Court since May 2016, when that court first began posting video and audio of its sessions.
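
For readers who want to picture the structure, here is a minimal sketch in Python of what a single record in such a database might look like. The field names are purely illustrative, not our actual schema; the real databases track roughly one hundred variables per case:

    from dataclasses import dataclass, field
    from datetime import date

    # One record in a hypothetical appellate-analytics database.
    # All field names are illustrative only.
    @dataclass
    class CaseRecord:
        docket_number: str
        area_of_law: str                # e.g., "tort", "insurance"
        appellant_was_plaintiff: bool   # who sought Supreme Court review?
        government_party: bool          # government entity on either side?
        originating_court: str
        trial_judge: str
        appellate_dissent: bool         # dissent below?
        appellate_published: bool       # published opinion below?
        date_review_granted: date
        date_argued: date
        date_decided: date
        prevailing_party: str
        vote: str                       # e.g., "7-0", "4-3"
        amicus_briefs: int
        justice_votes: dict = field(default_factory=dict)  # justice -> vote

    # Days from grant of review to argument, one of the lag times
    # discussed below.
    def grant_to_argument_lag(case: CaseRecord) -> int:
        return (case.date_argued - case.date_review_granted).days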

Conventional wisdom in most jurisdictions holds that unless the intermediate appellate court’s decision was published with a dissent, it’s not worth seeking Supreme Court review. We’ve demonstrated that, in fact, a significant fraction of both the California and Illinois Supreme Courts’ civil dockets arises from unpublished, unanimous decisions. We track not just aggregate reversal rates for intermediate appellate courts, but also reversal rates broken down by area of law.

Lag times are particularly interesting in California, since the Supreme Court is generally required to decide cases within ninety days of oral argument. As a result, the vast majority of the lag between grant of review and final decision in California falls between grant and argument, rather than between argument and decision. Not only have we tracked the average time to resolution for civil and criminal cases, but we’ve also demonstrated a correlation between the lag time from grant to argument and the Court’s ultimate decision. We’ve tracked the individual Justices’ voting records, not just overall, but one area of law at a time.

Only in the past few years have data analysts begun to take a serious look at appellate oral arguments. The earliest study appears to be Sarah Levien Shullman’s 2004 article for the Journal of Appellate Practice and Process. Shullman analyzed oral arguments in ten cases at the United States Supreme Court, noting each question asked by the Justices and assigning each a score from one to five, depending on how helpful or hostile she considered the question to be. Based upon her data, she made predictions as to the ultimate result in the three cases that had not yet been decided. Comparing her predictions to the ultimate results, Shullman concluded that it was possible to predict the result in most cases by a simple measure: the party asked the most questions generally lost.
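
Shullman’s basic measure is simple enough to express in a few lines of code. The sketch below is my own illustration of the tally, with an invented data format rather than her actual coding sheet:

    from collections import Counter

    # Shullman-style tally: each question records which side it was
    # directed to and a one-to-five helpful-to-hostile score. The data
    # format here is invented for illustration.
    def predict_loser(questions):
        counts = Counter(q["to"] for q in questions)
        # The side asked the most questions is predicted to lose.
        return counts.most_common(1)[0][0]

    argument = [
        {"to": "appellant", "score": 4},
        {"to": "appellant", "score": 3},
        {"to": "appellee", "score": 2},
    ]
    print(predict_loser(argument))  # appellant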

John Roberts addressed the issue of oral argument the year after Shullman’s study appeared. Then-Judge Roberts counted the questions asked in the first and last cases of each of the seven argument sessions in the Supreme Court’s 1980 Term, and did the same for the 2003 Term. Like Shullman, Roberts found that the losing side was almost always asked more questions.

Timothy Johnson and three other professors published their analysis in 2009. Johnson and his colleagues examined transcripts from every Supreme Court case decided between 1979 and 1995: more than 2,000 hours of argument in all, and nearly 340,000 questions from the Justices. After controlling for a number of other factors that might explain case outcomes, the study concluded that, all else being equal, the party asked more questions generally wound up losing the case.

Professors Lee Epstein and William M. Landes and Judge Richard A. Posner published their study in 2010. Epstein, Landes and Posner used Professor Johnson’s database, tracking the number of questions and the average number of words used by each Justice. Like Johnson and his colleagues, they concluded that the more questions a Justice asks of a party, all else being equal, the more likely that Justice is to vote against the party, and that the greater the difference between the total questions asked of each side, the more likely a lopsided result. Our study of every oral argument at the Illinois Supreme Court from 2008 through 2016 came to the same conclusion: the larger the margin between the total questions directed to your side and to your opponent’s, the lower your chance of winning.
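
The relationship these studies describe can be tested with a standard logistic regression. The sketch below runs one on synthetic data built to mimic the reported pattern; the numbers are invented for illustration and are not drawn from any of the studies:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic data mimicking the reported pattern: the bigger the
    # margin of questions directed to a party over its opponent, the
    # lower that party's chance of winning.
    rng = np.random.default_rng(0)
    margin = rng.integers(-25, 26, size=500)  # your questions minus opponent's
    p_win = 1 / (1 + np.exp(0.12 * margin))   # assumed curve, for illustration
    won = rng.random(500) < p_win

    model = LogisticRegression().fit(margin.reshape(-1, 1), won)
    print(model.coef_)  # negative: more questions, lower odds of winning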

Litigation analytics can uncover useful insights outside of courtrooms as well. Corporate legal departments are increasingly using analytics to track and manage their outside counsel. Does the company have more or less litigation than its competitors? Do its lawsuits last a comparable length of time, and is its win rate comparable to its peers’? What are the trends over time? When the company is selecting counsel for a particular lawsuit, it should be possible, by consulting Premonition, Lex Machina or Bloomberg (depending on where the case is venued), to compare each candidate counsel’s winning percentage in the jurisdiction and before the particular judge, as well as to develop far more background information than was ever possible before.

From the viewpoint of the law firms competing for business, analytics offers invaluable insight into the nature of a target client’s business. All the questions the legal department is likely to be interested in are valuable to the outside attorneys as well. Is the target’s current counsel winning cases less often than other companies’ counsel? What is the nature of the company’s litigation? And if candidate counsel can discover the names of the other firms competing for the business, analytics databases can provide detailed information about those lawyers’ experience and relevant background. Premonition’s Vigil court alerts system can get lawyers word of a new filing or case development involving a client or potential client within an hour or two, not a few days later.

So how does the future look? We’re still in the early days of the revolution in litigation analytics. As the federal PACER system is upgraded and more and more states put some or all of their dockets in electronic form, more litigation data will become available to analytics vendors. Analytics scholars will develop new methods to turn additional aspects of litigation into usable data. Advances in artificial intelligence will enable analytics systems to gather more subtle data from court records: the kind of variables that require understanding and interpretation, rather than simply searching for text strings. More analytics vendors will inevitably enter the market.

Lawyers will have to become comfortable working with analytics data in situations where decisions were once made based upon intuition and experience, both in courtrooms and in clients’ counsel searches. More law firms will likely develop in-house analytics databases like mine for other large states.

We’ve barely scratched the surface of the statistical and theoretical techniques that can uncover new insights about litigation and judicial decision making. Several academics have proposed algorithms for predicting case outcomes based on information such as the composition of an appellate panel and the ideology, gender and background of the judges, and these algorithms have generally performed better than law professors’ predictions based on the legal issues involved. Regression modeling is a natural next step, not just to predict case results but to estimate the real impact of particular variables, such as how much (if at all) amicus support increases a party’s odds of winning. Several vendors have touted their data on lawyers’ winning percentages, but regression modeling could isolate how much impact a particular counsel really has on a party’s chances, or whether the jurisdiction or the nature of a lawyer’s clients explains his or her record. As Judge Posner and Professors Epstein and Landes suggested in The Behavior of Federal Judges, computerized sentiment analysis of the content of judicial opinions could produce more nuanced insights about particular judges’ attitudes and ideologies. Game theory is another well-developed academic discipline with largely untapped potential for understanding how appellate courts work.
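
To illustrate the regression approach, here is a sketch on synthetic data with hypothetical variable names. The point is the method — estimating the marginal effect of amicus support while holding other case characteristics constant — not the invented numbers:

    import numpy as np
    import statsmodels.api as sm

    # Synthetic data, hypothetical variables: estimate the marginal
    # effect of amicus support on the odds of winning while holding
    # other case characteristics constant.
    rng = np.random.default_rng(1)
    n = 1000
    amicus_support = rng.integers(0, 5, n)    # briefs supporting the party
    tort_case = rng.integers(0, 2, n)         # example control variable
    government_party = rng.integers(0, 2, n)  # another control
    log_odds = (-0.2 + 0.3 * amicus_support
                - 0.4 * tort_case + 0.5 * government_party)
    won = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(float)

    X = sm.add_constant(
        np.column_stack([amicus_support, tort_case, government_party]))
    result = sm.Logit(won, X).fit(disp=0)
    print(result.params)  # the amicus coefficient, net of the controls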

We end with the question every analytics scholar (and vendor) is asked sooner or later: will litigation analytics replace lawyers?

The answer is no, for two reasons.

The first is what I think of as the orange used car problem.

A few years ago, a company that conducts data-mining competitions for corporate clients ran a contest in hopes of building an algorithm to determine which of the used cars available at auction were likely to have mechanical problems. They collected the data, ran the correlations, and it turned out the strongest correlate of “few or no mechanical problems” was, you guessed it, that the vehicle was orange.

A few people facetiously proposed theories as to why orange used cars might be more trouble-free (maybe car fanciers with better maintenance habits are drawn to them?), but this is an example of one of the most fundamental rules in data analytics: correlation does not necessarily indicate causation. Saying two variables are highly correlated doesn’t necessarily mean one is causing the other; both could be caused by a third, unidentified variable, the correlation could be random, or your dataset could be biased or simply too small. Much of litigation analytics (at least short of the more sophisticated logistic regression modeling) currently consists of identifying correlations. It takes an experienced lawyer intermediary to review the data and understand what are valuable, actionable insights and what are just orange used cars.
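
A toy simulation makes the point. In the sketch below, a hidden third variable — here, a careful owner — drives both paint color and reliability, so color correlates with reliability without causing anything; every probability is invented:

    import numpy as np

    # A hidden confounder (a careful owner) drives both paint color
    # and reliability, so color correlates with reliability without
    # causing it. All probabilities are invented for illustration.
    rng = np.random.default_rng(2)
    n = 10_000
    careful_owner = rng.random(n) < 0.3
    is_orange = rng.random(n) < np.where(careful_owner, 0.10, 0.02)
    reliable = rng.random(n) < np.where(careful_owner, 0.90, 0.50)

    print(reliable[is_orange].mean())   # noticeably higher than...
    print(reliable[~is_orange].mean())  # ...other colors, with no causal link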

The second reason is even more fundamental: all litigation analytics require interpretation, and one must keep constantly in mind, and remind clients early and often, that nothing in analytics is a guarantee of any particular result. The more heavily questioned party does sometimes win in the appellate courts. That Justices A and B have voted together in 75 percent of the tort cases over the past five years is no guarantee they won’t disagree about the next one. The academic algorithms developed for predicting results at the Supreme Court are wrong anywhere from twenty percent to a third of the time. And some often-quoted statistics can mislead through over-aggregation. For example, perhaps an intermediate court’s overall reversal rate is two-thirds, but on further analysis it turns out that the reversals are concentrated in tort cases, while the court is generally affirmed in other areas of the law.
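
A few lines of code show how aggregation can hide that pattern; the figures below are invented to match the hypothetical:

    import pandas as pd

    # Invented figures matching the hypothetical: every tort case is
    # reversed and every other case affirmed, yet the headline number
    # is a two-thirds reversal rate overall.
    cases = pd.DataFrame({
        "area": ["tort"] * 40 + ["contract"] * 20,
        "reversed": [True] * 40 + [False] * 20,
    })
    print(cases["reversed"].mean())                  # 0.67 overall
    print(cases.groupby("area")["reversed"].mean())  # tort 1.0, contract 0.0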

Does this mean that litigation analytics are irrelevant? No, no more than the bank would find irrelevant the experiential data on the hypothetical mortgage bundle we discussed at the outset. Attorneys have been predicting what courts are likely to do for generations, based on intuition, experience and anecdote. The business world began moving away from that approach a generation ago, and now that revolution has struck the law full force. Today there’s data for most aspects of litigation, and the trend builds every year. The advent of litigation analytics and data-driven decision making is a game-changer for the intelligent management of litigation risk.
