The Button Fallacy

When I was studying at Carnegie Mellon University I worked on a research project, READ, whose main purpose was to explore the use of voice recognition in children's reading experiences on a connected TV platform. As you can imagine, this came with all of the usual problems of developing voice recognition for kids, but I'm not going to get into that very much here. I'm more interested in a lesson I learned from this project: take existing mechanics and advice from others with a grain of salt. Think about your own project and its goals before you assume those mechanics apply and implement them.

For my project the mechanic in question was a button, hence the title of this post. While researching voice recognition experiences for kids, we found several great examples of it being done well, and one of my good friends was working at a studio that was making one too. Most of these experiences had one thing in common: a button. One of the prime examples is Thomas and Friends Talk To You, which is a great experience, and the button totally works for them.

Screenshot: Thomas and Friends Talk To You

There it is, in the bottom right: a button that needs to be pressed, and in some cases held down, while talking.

The button method has benefits. Mainly, it helps clean up your audio input: you know that the sound you are trying to recognize is the user deliberately interacting with your system rather than accidental noise. This really does matter for voice recognition, and it works for some experiences. When I went to my friend for advice, I was told this button was a must for voice recognition experiences, and he gave the improved-input explanation above. Hell, a trusted advisor and a handful of successful voice recognition experiences all advocated it, so it must be great. I accepted it as a necessity and didn't look back.

A week or two later we had a prototype far enough along that we could test. We could not get kids to press our button, never mind press and hold. It could not be done. We iterated, adding feedback and indirect control to try to get kids to interact with the button. With each test we saw our testers trying to speak to the device, but failing because of this button. And why did we need it? Just to get better audio input? We weren't getting any input at all because of it. It had to go.

We already knew when we expected users to interact with their voice, so we didn't need a button to be pushed. When the time comes, we just turn the microphone on and let programming magic take care of the possibly lower-quality sound input. It finally worked.

I learned my lesson the hard way. Had I questioned what others had done before me instead of simply accepting it, I would have saved my team at least one month. (Fortunately, for part of that time we were also pushing other areas of the experience forward and testing more than voice recognition interaction with each test.)

I wish I had more pictures from my own experience to illustrate this, but many of them are lost, because the project runs on a platform that is very difficult to set up again without the required servers and hardware.