Military Voice Recognition Software
This study evaluated the implementation of voice recognition (VR) for documenting outpatient encounters in the electronic health record (EHR) system at a military hospital and its 12 outlying clinics. Seventy-five clinicians volunteered to use VR, and 64 (85 percent) responded to a post-implementation online questionnaire designed to identify variables related to VR continuance or discontinuance. The variables investigated were user characteristics, training experience, logistics, and VR utility. Forty-four respondents (69 percent) continued to use VR and overall felt that the software was accurate, was faster than typing, improved note quality, and permitted closing a patient encounter the same day. The discontinuation rate of 31 percent was related to location at an outlying clinic and to perceptions of inadequate training, decreased productivity due to VR inaccuracies, and no improvement in note quality.
Lessons learned can impact future deployment of VR in other military and civilian healthcare facilities.

Introduction

Voice recognition (VR) and electronic health records (EHRs) have both entered mainstream medicine in the past decade. Currently, the increased time burden of data entry into EHRs is one of the reasons that the EHR adoption rate is low.
With voice recognition software continuing to improve in speed and accuracy, it could potentially improve the process of inputting data into electronic health records and thereby decrease one of the key barriers to EHR adoption. Similar to the introduction of many new technologies, VR may succeed or fail based on personal experience, training, or technical or logistical reasons.
We sought to explore the factors that influence the continuation or discontinuation of voice recognition as an input method for an electronic health record by surveying all clinicians who volunteered to receive the software.

Background

Voice recognition is a relatively new means of entering patient data. Clinicians who tried VR in the early 1990s used “discrete” voice recognition, which required the user to pause after each word. Continuous voice recognition became available around 1998 and rapidly became the industry standard.
In the same time frame, specific medical vocabularies were created that greatly improved accuracy. The earliest adopters of VR were often radiologists and pathologists because they depended on dictation services that were associated with high costs and delays in report completion. With traditional dictation, the clinician dictates a note or report, which a transcriptionist transcribes and returns for proofreading and approval. Reports are incorporated into EHRs by either the clinician or the transcriptionist. This process usually takes several days to complete, so it is not ideal when rapid access to a record is needed. Timely completion and closure of an encounter improves the coding, billing, and payment process. Clinicians and healthcare organizations are looking for solutions to rapidly and cost-effectively generate a legible record.
Early adopters have embraced VR as a potential solution and have developed templates that format a report into standardized sections, as well as macros that insert a body of standard text into a report in response to a spoken command such as “insert normal chest x-ray” or “insert normal gross appendix.” With dictation costs of approximately 6 to 20 cents per line, resulting in annual costs of $5,000 to $15,000 per clinician, organizations are looking for less expensive alternatives. Many early studies of voice recognition have limited current applicability because of significant improvements since then in VR software and computer hardware. Earlier versions of VR were slower and less accurate.
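To make the macro mechanism described above concrete, here is a minimal sketch of how a recognized trigger phrase can expand into standard report text. The trigger names and boilerplate strings are hypothetical illustrations, not the templates of any particular product.

```python
# Minimal sketch of a voice macro: a recognized trigger phrase expands
# into a body of standard report text. All trigger names and boilerplate
# strings here are hypothetical.

MACROS = {
    "insert normal chest x-ray": (
        "CHEST X-RAY: The lungs are clear. The cardiomediastinal "
        "silhouette is within normal limits. No acute osseous abnormality."
    ),
    "insert normal gross appendix": (
        "GROSS DESCRIPTION: Received is a grossly unremarkable appendix "
        "measuring 6 cm in length with attached mesoappendix."
    ),
}

def expand_macros(transcript: str) -> str:
    """Replace recognized trigger phrases with their standard text."""
    for trigger, body in MACROS.items():
        transcript = transcript.replace(trigger, body)
    return transcript

if __name__ == "__main__":
    dictated = "Patient seen for cough. insert normal chest x-ray Follow up in 2 weeks."
    print(expand_macros(dictated))
```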
Frequently, studies did not report whether medical vocabularies, which improve the accuracy of VR, were used. Until recently, computers had inadequate processor speed and random-access memory (RAM) for optimal VR performance. The manufacturer of the VR software Dragon NaturallySpeaking (version 9) recommends that it be installed on computers with at least a 1 GHz processor and 1 GB of RAM. Previous VR studies have included only small numbers of clinician users, thus limiting meaningful statistical analyses.
In many studies, training methods either were not described or varied considerably between implementations. Cost analyses of voice recognition conflicted, with return on investment ranging from six months to six years. The cost of clinician time spent editing VR mistakes is considerable and frequently goes unreported.
Studies of VR use with EHRs focused solely on inputting specialty and ancillary reports and not on typical outpatient encounters. The Department of Defense (DOD) uses a system known as AHLTA as the EHR for the 9.1 million beneficiaries receiving care at its military healthcare facilities. Clinicians can enter data with 1) MEDCIN, a point-of-care medical terminology database, 2) dictations “cut-and-pasted” into the EHR, 3) free-text typing directly into the EHR, or 4) point-and-click condition-specific automated input methodology (AIM) templates. While MEDCIN provides clinical elements that are codified, clinicians have found the clinical notes slow to create and cumbersome to read. Studies have shown that clinicians prefer to create a natural narrative that can only be achieved with handwriting, dictation, or voice recognition. Clinicians are also reluctant to use data entry methods that reduce productivity. With its continuous improvement in speed and accuracy, voice recognition has the potential to streamline data entry. We are unaware of any published studies that evaluated data entry methods, including voice recognition, into AHLTA.
Given the unanswered questions in the medical literature, we studied the implementation and use of VR at a medium-sized military treatment facility and its outlying clinics. We examined the factors associated with continuation or discontinuation of voice recognition software used to input patient data into an electronic health record.

Methods and Materials

Naval Hospital Pensacola (NHP) delivers inpatient and outpatient care to active-duty personnel, military retirees, and their families at the hospital, and outpatient care to active-duty personnel at its 12 branch clinics. The NHP medical staff consists of 149 military and civilian clinicians (physicians, physician assistants, and nurse practitioners). Prior to 2008, the majority of clinical notes were handwritten, with the exception of the orthopedic and internal medicine clinics, where the majority were dictated.
In 2008 all outpatient clinical notes were required to be entered into AHLTA. In early 2008 NHP offered speech recognition software, Dragon NaturallySpeaking Medical 9, to the entire medical staff on a voluntary basis in order to decrease transcription costs and to potentially improve entry of clinical notes into AHLTA. The medicine-specific package included 14 preconfigured medical specialty vocabularies and a headset microphone. Seventy-five clinicians volunteered to use the software with no penalty if they decided to discontinue use.
We did not study those clinicians who did not volunteer to have the software installed. Software was installed on desktop computers with 3.4 GHz processor speed and 1 GB RAM in the clinicians' offices and/or exam rooms. While the participants received headset microphones, they had the option to purchase handheld noise-reduction microphones. The deployment of the software and training were staggered over approximately a 12-month period. Individual VR “user profiles” were stored on a server so the voice profiles could be used on multiple computers.
Training was offered by a vendor trainer, an NHP information technology (IT) trainer, a physician champion, or a clinical peer, through a software tutorial (self-training), or through a combination of these methods. The training method was not randomized and was selected largely by availability of the trainers and/or the comfort level of the clinician.
The vendor provided “train the trainer” sessions for clinicians and nonclinicians with above-average technology aptitude (“superusers”). A physician champion spearheaded the VR effort for six months before his transfer to another facility. Clinicians were told training must be completed before the software would be installed on their computers. All clinicians who received the VR software were asked to complete a voluntary, Web-based questionnaire at least three months post implementation. The assessment consisted of 24 questions about VR user characteristics, training, logistics, and utility.
The Web-based anonymous survey was developed and responses collected with the online survey tool SurveyMonkey. The assessment questions were developed and pilot tested by a team of “superuser” clinicians on the medical staff.
The research protocol was approved by the Naval Medical Center Portsmouth Institutional Review Board.

Statistical Analysis: The statistical analysis was performed with GraphPad InStat 3.10 software (San Diego, CA). We analyzed nominal data from the questionnaire in contingency tables; if any cell contained an expected (not observed) value of 5 or less, an assumption of the chi-square test was violated, and categories were collapsed. As a result of collapsing categories, results were analyzed in 2-by-2 contingency tables using the Fisher exact test. If the data were ordinal, values were assigned dummy codes (0–4), and group differences were analyzed with the nonparametric Mann-Whitney test. Two-tailed p-values were used, and a p-value of less than .05 was considered statistically significant. Filters and cross-tabulation tools in the survey software were used to analyze variables related to continuation and discontinuation of VR.
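As an illustration of the analysis just described, the following sketch runs the same two tests in Python's scipy rather than GraphPad InStat, using hypothetical counts and dummy-coded ratings rather than the study's actual data.

```python
# Sketch of the analysis pipeline described above, on hypothetical data.
from scipy.stats import fisher_exact, mannwhitneyu

# 2-by-2 contingency table after collapsing categories, e.g.
# rows = hospital vs. outlying clinic, cols = continued vs. discontinued.
table = [[38, 10],   # hypothetical hospital counts
         [ 6, 10]]   # hypothetical clinic counts
odds_ratio, p_fisher = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact: OR={odds_ratio:.2f}, p={p_fisher:.4f}")

# Ordinal responses dummy-coded 0-4 (e.g., "not helpful" through
# "extremely helpful"), compared between continuers and discontinuers.
continuers    = [4, 3, 4, 2, 3, 4, 4, 3]   # hypothetical ratings
discontinuers = [1, 0, 2, 1, 1, 0, 2]      # hypothetical ratings
stat, p_mw = mannwhitneyu(continuers, discontinuers, alternative="two-sided")
print(f"Mann-Whitney: U={stat:.1f}, p={p_mw:.4f}")
```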
The results of the post-implementation assessment were reported as percentages by category and rounded to whole numbers, so totals could be less than or greater than 100 percent.

Results

The survey was completed by 64 clinicians, for a return rate of 85 percent. The following are their responses, divided into sections based on the questionnaire.
User Characteristics: Most participants were military clinicians (78 percent) located at the hospital (75 percent). Fifty-nine percent of respondents were primary care clinicians, and 41 percent were non-primary-care clinicians. Only 14 percent of participants had prior experience with voice recognition. Sixty-seven percent rated their comfort level with technology in the novice to moderately comfortable range, whereas 33 percent considered themselves very comfortable to expert (see Table: User Characteristics for Continuing and Discontinuing Voice Recognition).

Training: A majority of participants (92 percent) received at least one type of training, with the software tutorial being the most common method (53 percent). Five participants reported receiving no training, four of whom accounted for 20 percent of the discontinuers.
Clinicians in branch clinics received less face-to-face training than clinicians at the hospital (44 percent vs.

Logistics: Participants used a headset (56 percent), a handheld microphone (36 percent), or both (8 percent). A majority of participants used VR in the office (74 percent); fewer used it in one exam room (5 percent), in multiple exam rooms (8 percent), or in multiple clinics (7 percent).
Seventeen percent, all discontinuers, did not indicate where VR was used. Ninety-eight percent of users did not use it while the patient was still in the exam room. A majority of continuers (73 percent) used voice recognition immediately after seeing a patient, while only a few of the discontinuers did (12 percent).
For continuers, voice recognition was the most common method of inputting outpatient encounters into AHLTA; 80 percent used it more than 75 percent of the time. For discontinuers, typing was the most common method, with 82 percent using it more than 50 percent of the time. Most respondents never used dictation (67 percent), MEDCIN templates (72 percent), or AIM templates (41 percent) for outpatient encounters. Continuers did infrequently use VR for medical boards, operative notes, discharge summaries, e-mail, and Microsoft Word documents, while all but one discontinuer never used it for these purposes.

VR Utility: Mann-Whitney testing revealed statistically significant group differences (p-values from .0001 to .0007) between continuers' and discontinuers' perceived utility of the VR software. A majority of the continuers felt VR was very to extremely helpful (74 percent), saved 11 to more than 60 minutes per day (93 percent), improved EHR notes (93 percent), and resulted in same-day closing of the encounter more than 75 percent of the time (63 percent). In contrast, the majority of the discontinuers felt VR was only slightly to moderately helpful (70 percent), saved less than 10 minutes per day or increased documentation time (100 percent), did not improve EHR notes (100 percent), and did not result in same-day closure of encounters (59 percent).
Compared with typing, VR was rated as faster by continuers (88 percent) and slower by discontinuers (65 percent). The majority of the continuers (72 percent) rated VR accuracy at 85 to 95 percent, while a majority of discontinuers (76 percent) rated it at 80 percent or less. Macros, voice commands that insert a body of text, were used by 91 percent of the continuers, with 72 percent rating them very to extremely helpful. In contrast, 41 percent of discontinuers had used macros, with 17 percent rating them very to extremely helpful.

VR Discontinuation: Twenty of the 64 clinicians (31 percent) stopped using the VR software. For the clinicians in the internal medicine and orthopedic clinics who routinely dictated their notes prior to VR use, the discontinuation rate was 21 percent (4 of 19 clinicians). User characteristics significantly related to discontinuation were location at an outlying clinic and inadequate or no training.
Factors not related to quitting were clinician status, military/civilian status, technology comfort level, and prior VR experience. Compared with clinicians who continued using VR, discontinuers rated it much lower in helpfulness, accuracy, minutes saved per day, improvement in the quality of EHR notes, and the ability to close an encounter the same day.
The main reasons cited for quitting were slowness of the method due, in part, to the time required to correct VR errors (70 percent); failure to recognize the user's voice (35 percent); inadequate training (30 percent); and failure to live up to expectations (30 percent).

Discussion

President Bush established the goal of universal adoption of interoperable EHRs by 2014. Title XIII of the American Recovery and Reinvestment Act of 2009 established Medicare and Medicaid reimbursement, beginning in 2011, for practitioners and hospitals who demonstrate “meaningful use” of certified electronic health records, with the goal of increasing the adoption rate. Healthcare organizations and clinicians will need to evaluate the various methods of inputting data into EHRs for maximal productivity and satisfaction.
After a successful pilot program, the Army Medical Command announced in 2009 that it would purchase 10,000 copies of voice recognition software and distribute them to 42 healthcare facilities worldwide for clinical documentation in AHLTA. Lessons learned from our study will be relevant to their large-scale deployment and training strategies. Our retrospective study assessed user characteristics, training, and the perceived impact of voice recognition on productivity. A majority of those who responded to the questionnaire were young, active-duty military physicians working at the hospital. Most were new to voice recognition but generally comfortable with technology. In spite of this comfort level, 31 percent discontinued use of the software. Discontinuation was associated with location at an outlying clinic; low perceptions of training, performance, and time saved; and a perceived lack of improvement in patient note quality.
Inadequate training was perceived to be the reason for quitting by 30 percent of participants. Although it was originally stipulated that staff would not receive the software without formal training, some had only software-tutorial training available because of delays in training. Other studies of voice recognition noted training times as short as 30 minutes and as long as four hours. In the latter study, in spite of extensive training, only 25 percent of physicians persisted, so there is debate regarding the importance of extensive initial training. In our study, the 80 percent discontinuation rate among personnel who received no training supports the need for required training. The authors are unaware of published studies regarding the importance of additional follow-up training for individuals who struggle with or quit using VR software. Respondents used voice recognition primarily in their offices, even though they were given the option of having the software installed in the office or the exam room.
With the majority of continuers and only a minority of discontinuers using VR immediately after seeing a patient, it is unclear whether user preference or exam room logistics (i.e., changing exam room assignments) accounted for this difference. Mobile laptops with wireless headsets and wireless connectivity were not available but potentially could have improved the efficiency and acceptance of VR use. VR accuracy was rated at less than 90 percent by 56 percent of all participants and at less than 80 percent by 49 percent of discontinuers, far lower than the “up to 99 percent” accuracy quoted by the vendor.
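The accuracy figures above are user perceptions rather than measured values; recognition accuracy is conventionally quantified as word error rate (WER). The sketch below computes WER from a word-level edit distance; the reference and hypothesis sentences are hypothetical examples.

```python
# Sketch: how recognition accuracy is conventionally measured as word
# error rate (WER), for comparison with the perceived-accuracy ratings
# discussed above. The example strings are hypothetical.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

ref = "patient denies chest pain shortness of breath or palpitations"
hyp = "patient denies chest pain shortness of breast or populations"
print(f"WER: {word_error_rate(ref, hyp):.0%}")  # 2 errors / 9 words, about 22%
```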
Time wasted correcting VR notes or potentially missing an important error will continue to be a major barrier to acceptance of VR over traditional dictation. According to an American Health Information Management Association practice brief, the time required to edit is about twice that needed to dictate.
The literature confirms that clinicians consider the self-editing of voice recognition to be a burden. Clinicians who continued using voice recognition felt there was a clear-cut benefit in terms of productivity and improved documentation in the electronic health record (93 percent). The high use of macros (voice commands that insert text) by the continuers (91 percent) could also contribute to their perceptions of higher productivity and accuracy. The strengths of our study are a high questionnaire return rate; a large medical group with representation by primary and specialty care clinicians; inclusion of a hospital and its associated clinics; and the use of recent VR software with preconfigured medical vocabularies. The limitations that may reduce its accuracy and generalizability are the small numbers analyzed in the contingency tables, which mandated the collapsing of categories; possible selection bias from the use of volunteers; the fact that the selection of VR users and training methods was neither randomized nor standardized; and the completion of questionnaires at variable times post implementation. Additionally, more information about the types of VR errors encountered and the time required to correct them would have been helpful. Furthermore, the 50 percent of clinicians who did not volunteer to install VR were not studied, so the reasons for non-adoption remain poorly understood.
This adoption rate of new technology is probably not unexpected. According to Rogers's innovation adoption curve, approximately 50 percent of individuals fall into the “late majority” and “laggards” categories, for whom technology adoption is a challenge.

Lessons Learned: Based on our experience, we recommend the following to improve VR utility and acceptance:

Conclusion

Our study reported the experiences of clinicians who continued and discontinued the use of voice recognition software at a medium-sized military facility and its outlying clinics. Continued use of VR was associated with location at the hospital, a positive training experience, and a positive perception of how VR improved note quality and clinician productivity. Almost one-third of voice recognition users discontinued using the software, primarily because they felt the training was inadequate; their perception of the utility of VR was also much lower than that of those who continued using the software.
While voice recognition holds great promise for timely documentation into an electronic health record, training and implementation are challenging. The variables affecting success must be planned for prior to purchase, training, and implementation and must be followed by frequent reassessment. Future studies are needed to further delineate the key user characteristics, training methods, and logistical considerations that improve VR adoption and continuation rates. In addition, future research is needed to determine how to encourage adoption of new technologies such as voice recognition for the late majority and laggards who tend to resist innovation.
ALEX CASTRO has been patient. Ever since his teenage years, when he volunteered to work on speech-recognition projects during an internship at AT&T Bell Labs, Mr Castro has been waiting for the technology to work well enough to become widely adopted. “I always felt that voice recognition was a technology that would someday be applied to mainstream uses,” he says. While waiting for “someday” to arrive, the 32-year-old had time to finish his college degree, earn a Masters at Cornell, do a stint at Microsoft's MSN Entertainment business and oversee the launch of Amazon's online marketplace. Now Mr Castro has finally started his own firm, called Pluggd, a podcast directory with a nifty audio search-engine that can search audio clips (and the soundtracks of video clips) for keywords, using speech-recognition technology. “This is a huge market opportunity,” he says.
It is not just Mr Castro who has been waiting. Speech recognition has taken a long time to move from the laboratory to the marketplace. Researchers at Bell Labs first developed a system that recognised numbers spoken over a telephone in 1952, but in the ensuing decades the technology has generally offered more promise than product, more science fiction than function. For years, call-centre and dictation applications have been better known for the frustration they cause than for their ability to recognise words. Thanks to technical improvements and carefully chosen applications, however, speech recognition finally seems to be catching on in mobile phones, cars and search applications, and its effectiveness in call centres and dictation has improved. “There is good infrastructure, there are industry standards and accuracy is good enough that it's no longer a painful experience interacting with a voice application,” says Tom Furlong of Granite Ventures, a venture-capital firm based in San Francisco.
“A number of companies are thinking about voice for the first time.” Optimistic forecasts from market-research firms also suggest that the technology is on the rise. The market for speech recognition is dominated by server-based systems used in call centres, directory-assistance services and voice portals (speech-driven data services that supply news, weather forecasts, share prices, travel information and so on). Companies spent $1.2 billion on such systems in 2005, and this is forecast to grow by 22% a year to reach $3.2 billion by 2010, according to Datamonitor, a consultancy. The market for embedded speech-recognition technology, which goes into mobile phones, car-navigation systems and so on, will grow from $46m in 2006 to $239m in 2011, says Dan Miller of Opus Research, a consultancy based in San Francisco.

Find me pizza, now

An area of great interest at the moment is voice-driven “mobile search” technology, in which search terms are spoken into a mobile device rather than typed in using a tiny keyboard. With technology giants Google and Microsoft getting into the picture, “we have the makings of very robust mobile-search capabilities,” says Mr Miller.
Microsoft acquired Tellme Networks, a voice-recognition company based in Mountain View, California, in March. The software giant plans to use Tellme's software to enable users of mobile phones and hand-held computers to search the internet using voice commands. “Voice can serve as a mouse for the mobile internet and bypass the arduous keypad interface,” says Seamus McAteer of M:Metrics, a market-research firm. “The appeal of speech is to flatten menus and to handle names that don't lend themselves to a ten-digit keypad.” In February Nuance Communications, one of the leading firms in the field, bought BeVocal, a smaller rival, to gain access to its mobile-services technology. And in April Google launched 1-800-GOOG-411, an experimental voice-driven search service that can be used to find local businesses by telephone within America. “There is going to be a lot more investment in speech for mobile search,” says Daniel Hong of Datamonitor.
This upsurge in interest is due in large part to technological improvements. Companies have worked out ways around many of the problems that befuddled previous speech-recognition technologies.
Modern systems often work by identifying vocal sounds, called phonemes, rather than entire words, which can make them more reliable. Deliberately limiting the scope of the words being recognised also improves reliability. Speech-recognition systems do not have to be able to take dictation to be useful; simply recognising a handful of commands or address-book entries is often enough. The “Star Trek”-style communications badges made by Vocera, based in Cupertino, California, are used in hundreds of hospitals to link up doctors, nurses and other staff by speaking a few simple voice commands such as “call” and “find” followed by a name. The falling costs of processing power and storage capacity have also helped make speech recognition more accurate.
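A toy sketch of the limited-vocabulary idea above: the recognizer only has to pick the closest match among a few known commands, and can reject anything that matches none of them well. The command set, the string-similarity shortcut, and the threshold are illustrative assumptions, not Vocera's actual algorithm.

```python
# Sketch of the "limited vocabulary" idea: instead of open dictation,
# the system only chooses among a handful of commands, rejecting input
# that matches none of them well. Command set and threshold are hypothetical.
from difflib import SequenceMatcher

COMMANDS = ["call", "find", "broadcast", "locate"]

def match_command(decoded: str, threshold: float = 0.75):
    """Return the closest known command, or None if nothing is close."""
    best = max(COMMANDS, key=lambda c: SequenceMatcher(None, decoded, c).ratio())
    score = SequenceMatcher(None, decoded, best).ratio()
    return best if score >= threshold else None

print(match_command("fynd"))       # -> "find" (close enough to accept)
print(match_command("xylophone"))  # -> None (rejected as out of vocabulary)
```

Shrinking the search space this way is why a badge that knows a dozen commands can be far more reliable than a dictation system that must distinguish tens of thousands of words.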
“We're better at speech recognition today because of Moore's law,” says Brian Garr, director of enterprise speech solutions at IBM, referring to the industry's rule of thumb that the cost of a given amount of computing power falls by half roughly every 18 months. Another trick is to hand off the work of speech recognition to a powerful remote computer, rather than relying on the processing power of a small portable device. That is how the Vocera badges work. “You can have all the heavy lifting done by one central server,” says Brent Lang of Vocera. And technical standards such as VXML, which provides a standard way to program voice dialogues, have made things easier too. The resulting lower cost and greater reliability mean that speech-based systems can even save companies money.
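The thin-client pattern just described can be sketched in a few lines: the device merely records audio and posts it to a central recognition service. The endpoint URL and response format below are entirely hypothetical, stand-ins for whatever server-side API a deployment actually exposes.

```python
# Sketch of the thin-client pattern: the device captures audio and ships
# it to a central server that does the heavy recognition work. The
# endpoint URL and response schema are hypothetical.
import requests

def recognize_remotely(wav_path: str) -> str:
    """Send raw audio to a (hypothetical) recognition service and
    return the transcript it produces."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://speech.example.com/v1/recognize",  # hypothetical endpoint
            data=f.read(),
            headers={"Content-Type": "audio/wav"},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()["transcript"]  # hypothetical response field
```

This is the design choice that lets a cheap badge or handset tap a powerful server-side recognizer instead of doing the work locally.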
Last August, for example, Lloyds TSB, a British bank, switched all of its 70m annual incoming calls over to a speech-recognition system based on technology from Nuance and Nortel, a Canadian telecoms-equipment firm. “'Press one for this and two for that' is not that customer-friendly,” says Sally Jones-Evans, managing director of telephone banking at Lloyds TSB, who notes that most British banks use touch-tone systems in their call centres. Using speech recognition instead, she says, provides a competitive advantage because it is easier to use and more efficient. Lloyds TSB has also been able to close one of its 11 manned call centres, reaping “very attractive” cost savings, says Ms Jones-Evans.
But speech-recognition systems do not necessarily spell the end for manned call centres. West Corporation, a company that manages call centres for other firms, is using phoneme-based speech technology from Nexidia, based in Atlanta, Georgia, to analyse recordings of calls made to customer-service lines. Rather than requiring human staff to trawl through hours of recordings, West uses Nexidia's technology to search for keywords and phrases, such as positive or negative adjectives, or the names of competitors. “We try to create a customer-mood meter,” says Bruce Pollock of West.
This can both help clients understand customer preferences and improve the performance of call-centre operators. (Mr Castro's firm, Pluggd, does a similar thing with podcasts, spotting keywords so listeners can jump directly to segments of interest.)
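A much-simplified sketch of the keyword-spotting idea behind the “customer-mood meter”: tally positive terms, negative terms and competitor mentions in a call. Nexidia searches phoneme streams rather than word transcripts, and the word lists here are hypothetical.

```python
# Simplified sketch of a "customer-mood meter": scan a call transcript
# for positive and negative terms and competitor mentions. (Nexidia works
# on phoneme streams; this sketch uses plain word transcripts, and all
# word lists are hypothetical.)
POSITIVE = {"great", "helpful", "thanks", "excellent"}
NEGATIVE = {"cancel", "terrible", "frustrated", "complaint"}
COMPETITORS = {"acme", "globex"}

def mood_meter(transcript: str) -> dict:
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    rivals = sorted({w for w in words if w in COMPETITORS})
    return {"score": pos - neg, "competitor_mentions": rivals}

call = "I'm frustrated and want to cancel, Acme was more helpful"
print(mood_meter(call))  # {'score': -1, 'competitor_mentions': ['acme']}
```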
Another promising area is in-car use. As drivers juggle mobile phones, BlackBerrys, navigation systems, iPods and satellite radios, “the challenge the auto industry is facing now is to offer services to customers in a safe manner,” says Thilo Koslowski, an analyst at Gartner, a consultancy. Voice, he says, is the obvious answer, “because you don't have to take your hands off the wheel or your eyes off the road.” Nearly 70% of premium vehicles (such as those made by BMW and Mercedes) and 20% of mass-market models (from makers such as Toyota and Volkswagen) around the world now have the option of speech-driven functions, says Mr Koslowski.
There are military uses, too. Since last year American soldiers have been testing two-way, speech-enabled translation software from SRI International and IBM to help in training sessions with Iraqi soldiers and policemen. “There is a shortage of human linguists,” says Wayne Richards of the United States Joint Forces Command. But using the software, American soldiers can speak English into their laptops, which then speak the Arabic translation to the trainees.
The idea is to get locals trained “so that our forces can come home as soon as possible,” says Mr Richards. Meanwhile the Phraselator, a hand-held device made by VoxTec of Annapolis, Maryland, lets soldiers maintain eye-contact when conversing with non-English speakers.
At a checkpoint, for example, a soldier can say one of a thousand or so predefined phrases such as “Please stand over here”, and the device will say the phrase out loud in Arabic.

Talking the talk

Still, plenty of pitfalls remain. For one thing, companies frequently fall into the trap of excessive voice-enabling. John Hall, the president of VoxTec, drives a Honda minivan that can respond to several hundred verbal commands. He likes being able to ask his navigation system to “Show me the nearest hospitals”, whereupon it calls up a list of nearby facilities.
But 90% of the car's voice commands are useless, he says. It is much quicker to turn up the radio's volume control by hand than it is to press a special button and say “Radio—raise volume,” he notes. Another difficulty will be encouraging sceptical consumers to give the technology another try. “People have a lot of negative perceptions of speech technology, because the speech systems deployed first were pretty bad,” says Mr Hong. Mr Castro agrees. “There's a history of disappointment and failed expectations,” he says. When setting up his firm, he presented his idea to some venture capitalists.
They were impressed by the technology but were put off by the term “voice recognition” which, like “artificial intelligence”, is associated with systems that have all too often failed to live up to their promises. Pluggd attracted a group of angel investors instead, including Intel Capital, the investment arm of the world's biggest chipmaker, and it is now raising its first round of venture capital. Perhaps, after decades of waiting in the wings, the technology is finally about to hear its name called out from the stage.