Friday, October 9, 2015

Review: Nuance Dragon NaturallySpeaking 13

About 15 years ago at a trade exhibition I was blown away by a demonstration of Dragon NaturallySpeaking. With the style and panache of a stage conjuror, the American presenter took suggestions - jokes, sayings, lines from songs - from the rapt audience and transferred them to the big screen behind him using only his voice. Then, by issuing spoken commands, he juggled the words into new and sometimes hilarious sentences without ever once looking around. Truly this was the future.

Except, of course, it wasn't. When I finally got my hands on a copy, the experience was sadly anything but magical. Slow, inaccurate and clunky in the extreme, it eventually invoked the blue screen of death on my PC, at which point the future was consigned to the bin. I could only conclude that I had fallen victim to a modern version of the infamous Turk, the chess-playing "automaton" that astonished audiences in the 1820s, but which in fact concealed a highly accomplished (if uncomfortable) chess-playing human in its base. Either that or the software only answered to a Silicon Valley accent.

Now, of course, that future really has arrived. Speech recognition is commonplace on our smartphones in the shape of Siri, Google Now and Cortana, and on the PC and Mac Dragon NaturallySpeaking has now notched up 13 versions.

I tested the Premium version of NaturallySpeaking 13 on a reasonably powerful PC (Core-i7, 4GB RAM, Windows 7). However, the installation still took quite a few minutes and involved a couple of optional reboots, which was all a bit mysterious. Apparently the installer analyses the hardware and optimises the functionality accordingly, so features such as natural language commands that demand more grunt will not be installed by default on lower-powered systems. This might be the reason for the lengthy installation process.

NaturallySpeaking eventually showed up on screen in the shape of a small toolbar, the DragonBar, at the top of the screen. A start-up screen prompts the user to select a language and to read a few sentences to acclimatise NaturallySpeaking to the speaker's voice. There was an initial hiccup, however, in that the software didn't recognise the headset and mic I had plugged into a mini-jack port, instead defaulting to a USB webcam, through which it recognised about one in three words. Not an auspicious start. However, things improved markedly once I had realised what was going on and replaced the headset with a USB one. The commercial version of Premium comes with its own USB headset and mic so this shouldn't be a problem for most users.

That hiccup aside, getting started proved to be ridiculously easy. Selecting "Standard UK English" then reading out a few sentences in my neutral southern tones was all it took for Dragon to recognise correctly the vast majority of words. Whether it would struggle with a broad regional accent is another matter. A quick scan of commentary on the internet suggests that it might.

There is something of a learning curve to using NaturallySpeaking, simply because of the sheer number of commands. There is a core set of 50 or so global commands, then there are commands specific to a certain application, for example Microsoft Paint. Optionally, Dragon also supports natural language commands in Microsoft Office, Firefox and other applications, so you can say "bold that" or "make that bold" to achieve the same result. According to Nuance, the list of applications that can use natural language commands is growing all the time, and now includes Gmail and Hotmail on all the popular web browsers. And if that's not enough, you can add your own custom commands.

But the learning process is a two-way thing. While you are learning about NaturallySpeaking, NaturallySpeaking is also learning more about you. When you close the program it spends a minute or two refining your "profile" so as to increase its accuracy next time. And you can always take matters into your own hands and read Dragon a spot of Isaac Asimov ("Captain Dimitri Chandler [M2973.04.21/93.106//Mars//Space-Acad3005//*//] - or 'Dim' to his very best friends") if you want to really put it through its paces.

NaturallySpeaking is a pretty comprehensive program and I only had time to scratch the surface. It should be perfectly possible, given enough time and effort, to navigate all aspects of the PC and the internet - and even do some light programming if you fancy it - without ever having to lay hands on keyboard or mouse.

My immediate needs - and the reason I was keen to reacquaint myself with Dragon - were much simpler. I have the typing skills of a drunken ox in boxing gloves, which makes transcribing interviews - a necessary evil in this line of work - a particularly tedious task.

The Holy Grail for my particular use case would be the ability to distinguish accurately between multiple voices, for example to transcribe a meeting automatically. Sadly though, this is not yet possible. NaturallySpeaking (and, I believe, similar programs) can only cope with one voice at a time and it struggles with background noise. Professional transcribers can breathe easy for now.

The workaround is to listen to the recording and "parrot" what you hear. Much slower than a direct transcription, but quite a bit quicker and certainly much more accurate than the keyboard for a hamfisted typist like me.

Nuance claims 99 per cent accuracy out of the box, which seems a rather bold claim. While dictating from a recording, an error rate of one word in 20, or 95 per cent accuracy, would seem to be closer to the truth.

Unlike Siri and similar, which process input in the cloud, NaturallySpeaking does all its crunching locally, which means that whatever you say appears on screen pretty much immediately. This is very helpful when dictating because when mistakes are made you can be ready for them and go back and correct them quickly. I made a lot of mistakes early on, while getting used to the commands. Things improved reasonably quickly but there will always be ambiguities and misunderstandings.
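
For anyone who wants to put a number on accuracy for their own dictation, the "one word in 20" arithmetic can be sketched in a few lines of Python. This is only an illustration, not how Nuance measures its 99 per cent figure: the function names `word_errors` and `word_accuracy` are made up for this example, the sample sentences are invented, and the sketch simply counts the fewest word substitutions, insertions and deletions needed to turn what was said into what appeared on screen.

```python
def word_errors(reference, hypothesis):
    """Return (errors, reference_word_count) for two pieces of text.

    Errors are the minimum number of word substitutions, insertions and
    deletions needed to turn the reference into the hypothesis.
    Hypothetical helper for illustration; not part of any Dragon API.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Word-level edit distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Accuracy as 1 minus (errors / words actually spoken)."""
    errors, total = word_errors(reference, hypothesis)
    return 1.0 - errors / total if total else 1.0


if __name__ == "__main__":
    said = "I need to write about Azure today"
    typed = "I need to right about as your today"
    print(word_errors(said, typed))    # (3, 7)
    print(word_accuracy(said, typed))
```

By this measure one wrong word in a 20-word passage gives 1 - 1/20 = 0.95, which is the 95 per cent figure above. Note that a single misheard word can count as more than one error if it comes out as two words.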

NaturallySpeaking generally does a good job of choosing the correct homophone according to context, but it does sometimes make mistakes, for example selecting "right" instead of "write". You also have to train it to use specialised words. I found it learned easy ones like Hadoop first time, but after many attempts I still failed to get it to output "Azure" instead of "as your".

When transcribing recordings I found it quicker and easier to correct mistakes using a keyboard, but in time and with practice there is no reason why you couldn't do everything via voice commands. However, I struggled to navigate the web using NaturallySpeaking. On pages where links exist as hypertext there is no problem: just say "click XYZ" and off you go. But with the checkboxes in Yahoo! Mail, for example, or on websites where links hide behind graphics, you can spend a long time shouting at your computer to no avail, whereas a simple mouse click does the job in a couple of seconds.

NaturallySpeaking comes in a number of different versions, including specialist editions for the legal and healthcare markets. The Premium edition tested includes functionality such as full text control for Excel and PowerPoint, the ability to play back your speech in documents, transcription from approved digital recorders and other features that are not available on the Home Edition. However, for my purposes - parroting interviews from recordings - the functionality of the latter, which is about £55 cheaper at £71.99, would have been quite adequate.

Overall, NaturallySpeaking is an impressive package. I was expecting a much lengthier training process, but you really can get going pretty much immediately. The training and help files are genuinely useful and easy to follow, and adding new vocabulary and other customisations is straightforward.

Now, if they could rise to the challenge of transcribing multiple voices, that really would be something to speak home about.



