Recently in New York, Adam Smith, Esq. had the opportunity to invite a couple of dozen law firms to ‘An Introduction to IBM Watson’ at the brand-new $1bn IBM Watson facility down on Astor Place. This is not going to be a report on that event, except insofar as it helped advance our thinking on the general concept of ‘machine learning,’ which was also the topic of a lead article in the current McKinsey Quarterly.
That piece, “An Executive’s Guide to Machine Learning: It’s no longer the preserve of artificial-intelligence researchers and born-digital companies like Amazon, Google, and Netflix,” is a primer on the need for machine learning in a world of big data, and on how it is emerging as a mainstream management tool.
The proper starting point is simply to define ‘machine learning.’ Here’s what McKinsey has to say:
Machine learning is based on algorithms that can learn from data without relying on rules-based programming. It came into its own as a scientific discipline in the late 1990s as steady advances in digitization and cheap computing power enabled data scientists to stop building finished models and instead train computers to do so.
Even this definition would benefit from a bit of unpacking, so let me take a stab at it with an example. For years, AI researchers focused on writing exhaustive lists of rules to guide computers in tasks such as recognizing whether a given image was a cat. Yet no matter how many rules were piled on top of rules, exceptions and unforeseen situations invariably arose, at which point the system was essentially helpless: no available rule applied, so the rule set was exhausted.
Machine learning took the opposite approach: Without relying on any rules whatsoever, it simply ingested a large data set of (in this case) images labeled as cats or not cats, and generated its own scoring system to arrive at probabilities on whether image X was in fact a cat. Note one key difference from rules-based systems: Machine learning is designed to produce probabilities, so whereas a rules-based system that came to a ‘dead end’ with its rules couldn’t offer any opinion whatsoever, machine learning can at least venture an opinion joined to a probabilistic confidence level.
Another point: The larger the data set the machine learning algorithm can review, the more accurate it will become (and it never forgets what it’s already seen).
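Purely to make that distinction concrete, here is a minimal sketch in Python, using scikit-learn on synthetic stand-in features rather than real images (and bearing no relation to Watson’s actual machinery): no rule is ever written down, the model infers its own scoring weights from labeled examples, and for a new example it ventures a probability rather than a hard yes or no.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training set: 1,000 feature vectors standing in for images,
# labeled 1 = "cat", 0 = "not cat".
X_train = rng.normal(size=(1000, 20))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # toy labeling rule

# No rules are hand-written; the model infers its own scoring weights from the examples.
model = LogisticRegression().fit(X_train, y_train)

# For a never-before-seen example, the model doesn't hit a dead end --
# it ventures an opinion with a confidence level attached.
new_image = rng.normal(size=(1, 20))
p_cat = model.predict_proba(new_image)[0, 1]
print(f"Probability this is a cat: {p_cat:.2f}")
```

Feed it ten times as many labeled examples and, all else equal, those probability estimates sharpen, which is the ‘more data, more accurate’ point above.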
I feel compelled to intercept a debate that may be going on in some readers’ minds by this point: If you’re asking yourself whether the humans or the machines will be the ultimate winners in this, might I respectfully suggest that is massively the wrong question. As a practical matter, we simply don’t know; but then, we have never known at the introduction of any major new technology. As an optimist and an armchair business historian, though, my money is on the machines providing a platform from which humans can vault to the next level of productivity, insight, creativity, and intellectual achievement.
Back to the thread.
Now, recognizing cats is obviously a trivial example, but Watson and other implementations are addressing far more sophisticated and consequential decisions. It’s probably nowhere more advanced than in assisting medical diagnoses, and those of us who could attend the event at IBM last week saw a powerful example of that type of domain expertise/functionality, which was literally life-saving.
How is machine learning different from classical statistics, including such things as regression analysis?
Probably the most important distinction is that to arrive at hypotheses about correlations, statistics has to rely on the conjectures of statisticians (or other experts in the field). As we have all learned over the past decade or so, unconscious and unrecognized biases creep in when humans are applying judgment. Here’s an example from the McKinsey article (emphasis supplied):
Closer to home, our colleagues have been applying hard analytics to the soft stuff of talent management. Last fall, they tested the ability of three algorithms developed by external vendors and one built internally to forecast, solely by examining scanned résumés, which of more than 10,000 potential recruits the firm would have accepted. The predictions strongly correlated with the real-world results. Interestingly, the machines accepted a slightly higher percentage of female candidates, which holds promise for using analytics to unlock a more diverse range of profiles and counter hidden human bias.
In other words, with classical statistics, to test hypotheses statisticians first have to have hypotheses, and that’s where human bias can come in.
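To see that contrast in miniature, consider this sketch on synthetic ‘recruiting’ data (the column names and effect sizes are invented purely for illustration; this is not McKinsey’s model). The classical route fits only the predictors the analyst thought to hypothesize; the machine learning route is handed every column and ranks what actually predicts the outcome, including a variable nobody thought to test.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 2000
data = {
    "gpa": rng.normal(3.0, 0.4, n),
    "years_experience": rng.integers(0, 15, n).astype(float),
    "num_typos": rng.poisson(2, n).astype(float),  # a predictor no one hypothesized
}
# Hidden ground truth: typos matter a lot, experience a little, GPA not at all.
outcome = 0.5 * data["years_experience"] - 2.0 * data["num_typos"] + rng.normal(0, 1, n)

# Classical statistics: the analyst hypothesizes that GPA and experience matter
# and fits only those -- the typo effect is never examined.
X_hypothesized = np.column_stack([data["gpa"], data["years_experience"]])
print(LinearRegression().fit(X_hypothesized, outcome).coef_)

# Machine learning: hand the model every column and let it rank what actually predicts.
X_all = np.column_stack(list(data.values()))
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_all, outcome)
print(dict(zip(data.keys(), forest.feature_importances_.round(2))))
```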
We saw an example in an IBM Watson case study in which a 9-year-old boy presented at the emergency room with a very high fever and a lump on his neck. Doctors unaided by Watson focused on the fever and attempted to determine its underlying causes while treating it, to zero effect, with antibiotics and standard fever reducers such as ibuprofen. After six straight days in the hospital and innumerable inconclusive tests, an attending nurse happened to have an idea about what the underlying condition might actually be. She was right.
By contrast, given the same case with the physicians aided by Watson, they arrived at the correct diagnosis—the obscure but potentially life-threatening Kawasaki’s Disease, which can include complications involving the heart—within 24 hours with no invasive testing. Watson was ‘free’ to home in on Kawasaki’s Disease because it comes unfettered by natural human instincts such as focusing on a high fever and assuming it’s probably driven by infection (hence the irrelevant prescription of antibiotics).
Another signal difference between classical statistics (again, typically regression analysis) and machine learning is the very fine level of detail machine learning can generate from the data.
One reason it’s called ‘Big Data’ is that, given the massive computing firepower we can increasingly bring to bear, we can let machines loose on parsing the data without needing any ingoing hypotheses. The Big Data will reveal correlations we hadn’t thought of—and maybe never would have thought of. My favorite example of this (OK, I admit it’s almost too memorable) is when the irreverent CEO of the online dating service OkCupid published the finding that his data scientists had discovered a very high correlation between liking beer and being open to sex on a first date. Not an association, I imagine, most of us would have hypothesized.
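A toy version of ‘letting the machine loose on the data’ looks something like this (the column names are hypothetical and the planted relationship is made up purely for illustration): compute the correlation between every pair of columns with no ingoing hypothesis, and surface the strongest pairs, whatever they turn out to be.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 5000
df = pd.DataFrame({
    "drinks_beer": rng.integers(0, 2, n),
    "likes_hiking": rng.integers(0, 2, n),
    "owns_cat": rng.integers(0, 2, n),
})
# Plant a relationship no analyst ever hypothesized.
df["first_date_sex_ok"] = (df["drinks_beer"].astype(bool) & (rng.random(n) < 0.7)).astype(int)

# Scan every pair of columns for correlation, no hypothesis required,
# and surface the strongest pairs.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
print(upper.stack().sort_values(ascending=False).head(3))
```

At OkCupid scale the columns number in the thousands and the methods are far more sophisticated, but the principle is the same: the data, not a human conjecture, nominates the candidate relationships.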
But back to Law Land.
Where could you possibly start? McKinsey invokes a thought-provoking and instructive analogy to M&A strategic planning. Because, not to be oblique about it, if you don’t have a strategy for how machine learning could benefit your lawyers and your clients, don’t even pass Go.
We find the parallels with M&A instructive. That, after all, is a means to a well-defined end. No sensible business rushes into a flurry of acquisitions or mergers and then just sits back to see what happens. Companies embarking on machine learning should make the same three commitments companies make before embracing M&A. Those commitments are, first, to investigate all feasible alternatives; second, to pursue the strategy wholeheartedly at the C-suite level; and, third, to use (or if necessary acquire) existing expertise and knowledge in the C-suite to guide the application of that strategy.
And of course employ the familiar and standard techniques of change management:
Start small—look for low-hanging fruit and trumpet any early success. This will help recruit grassroots support and reinforce the changes in individual behavior and the employee buy-in that ultimately determine whether an organization can apply machine learning effectively. Finally, evaluate the results in the light of clearly identified criteria for success.
I hear you saying, ‘But we’re not there yet.’
Fair enough, but that’s a truism about any powerful new technology. When it first arrives on the scene, how to apply it, never mind its eventual ‘highest and best’ application, is always far from obvious. It requires trial and error. To use some hopefully helpful examples from the past, the Wright Brothers had no idea what heavier-than-air flight would actually be used for in any commercial (much less military) context.
Similarly, when the World Wide Web first began to dawn on the collective consciousness, no one had a clue what it would become and ultimately empower. (Remember ‘brochureware’?) It takes time to figure out that global real-time connectivity is actually about activities like collaboration and online communities, and about the immense power unleashed when constraints of time and distance disappear. Not to mention creating the fertile soil for entirely unprecedented and hitherto-unthinkable businesses like Airbnb, Uber, eBay, Facebook, Google, and the granddaddy of them all, Amazon.
So we direct your attention to this wisdom embedded in the McKinsey article. You may debate their precise timeframe for adoption and the transition from ‘invention’ to ‘invisibility,’ but it’s a useful construct for thinking about tools like Watson:
New technologies introduced into modern economies—the steam engine, electricity, the electric motor, and computers, for example—seem to take about 80 years to transition from the laboratory to what you might call cultural invisibility. The computer hasn’t faded from sight just yet, but it’s likely to by 2040. And it probably won’t take much longer for machine learning to recede into the background.
The ‘invisibility’ observation is particularly subtle.
AI is one of those fields that seems to have had so much promise for so long, with so little to show for it, that it has reduced even its true believers to the point of exhaustion. McKinsey frankly acknowledges this, hearkening back to the bona fide pioneer Alan Turing, who did his most inspired work in the World War II era, three-quarters of a century ago, saying, ‘But [neural networks and other] techniques stayed in the laboratory longer than many technologies did and, for the most part, had to await the development and infrastructure of powerful computers [in just the past few decades].’
Or, in the far pithier and more memorable phrase which now has a thousand fathers, ‘It’s only AI when you don’t know how it works; once it works, it’s just software.’
Does, then, IBM Watson work? Without a doubt.
Here are my top takeaways from the Watson meeting:
- The firms attending agreed unanimously and without reservation that Watson is already having a greater impact on knowledge work than any previous technology, by an order of magnitude. Watson is not only a big deal, it’s the Real Deal.
- Given the immense resources IBM has and promises to continue to put behind it, the position of commercial leader in this class of powerful technologies is Watson’s to lose; the conversation will be shaped around Watson and not something else.
- In other words, the conversation has shifted from ‘Is Watson for real?’ to pragmatic and operational issues centered on questions such as how much it would cost a given law firm to develop its own proprietary ‘instance’ of Watson and whether lawyers would actually use it.
- Because no law firm has yet adopted it, there are, not surprisingly, no successful ‘use cases’ in law so far, and lawyers are quick to jump to the self-satisfied conclusion, ‘I told you so.’ I firmly believe that’s a highly perishable, wasting argument: My own intuitive prediction is that in 6-12 months it will no longer be true.
Readers of Adam Smith, Esq., and technologists who have had encounters with Law Land know a few other things as well, which explain why, as of mid-2015, the immense promise and potential of Watson have yet to yield a concrete case study of its deployment in legal.
The first and most pragmatic is that IBM has prioritized several knowledge domains, notably medicine, consumer-facing applications, finance, and banking risk/compliance, well ahead of law. Law isn’t a top five and probably isn’t a top ten priority for IBM Watson at the moment.
Second is everything we know about law firms’ approach to technology over the past four or five decades of experience: We prefer to be brutally late to the party. Refreshingly, more than a few of the firms in attendance last week vowed this time would be different for them.
Third is the fruit of the #1 lesson law firms learned in the wake of the GFC: Clients, not firms, are calling the shots. (Or, to use a corporate governance analogy, at least they hold majority control of the seats on the board.) No one doubts for a moment that clients, including banks, the single most critical species of client for most of BigLaw, will deploy Watson before law firms do. For once it would behoove us to respond with alacrity and nimbleness in place of denial, resentment, and behind-the-lines guerrilla resistance.
I mentioned the Wright Brothers earlier and alluded to their being oblivious as to potential military uses of their invention. To state the blisteringly obvious, that didn’t stop more than a few folks from coming up with such uses, in an escalating spiral of reliability, effectiveness, power, and just plain fearsomeness. Given IBM’s focus on other knowledge industries ahead of Law Land, it’s only being fair (and not unkind) to say that IBM’s degree of sophistication and depth of thought about potential legal-vertical uses for Watson is not much more advanced than the Wright Brothers on aerial warfare.
I have news for you: That’s not actually IBM’s job, and not IBM’s problem. It’s our job and our challenge. At this point the most realistic way to think of IBM is as an arms dealer, ready and willing to sell instances of Watson to anyone interested.
Why wouldn’t your firm want to explore what’s possible? I guarantee you others will be doing just that. Some of them, I’ll wager, were at last week’s event.
Bruce MacEwen is the president of Adam Smith, Esq. You can read his blog here.