At its I/O conference in May, Google demonstrated the new capabilities of Google Assistant by playing two recordings of the technology making calls on its user's behalf.
In the first, Assistant successfully made a hair salon appointment. In the second, it made a restaurant booking and managed to navigate the conversation even when some answers were not what you might expect.
Check out the video below...
A simplistic test
The calls felt like a simple series of trigger-based responses and the conversations were basic (intentionally so, I'd assume).
This is hardly surprising given the types of business being called.
When you call a hair salon, it's a safe bet that the first question you'll be asked is a form of 'when do you want your appointment'. Questions about the haircut and service you’re looking for are likely to be asked next, with a request for the caller’s name following not long after.
These question types are very easily programmed and simple for a computer to follow. They leave little opportunity for any deviation that could confuse the AI, so what nuances did it actually understand as part of this call? I didn't spot any.
The second call (to a small restaurant) was more impressive, but it still came across as a series of simple scripted responses to trigger keywords: when, how many people, what time. The assistant's realisation that it couldn't book, and its ability to ask about the wait time instead, were impressive, but its response at the end ("Ooh, I gotcha") didn't sound natural.
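To illustrate just how little intelligence a trigger-based flow like this requires, here is a hypothetical sketch (my own illustration, not Google's implementation; all phrases and names are made up) of how such a call could be scripted in a few lines of Python:

```python
# Hypothetical keyword-trigger script for a booking call.
# Each tuple of trigger keywords maps to one canned reply.
TRIGGERS = {
    ("when", "what time"): "12pm, if you have it.",
    ("how many", "party"): "Two people.",
    ("name",): "It's under Lisa.",  # illustrative name
    ("service", "looking for"): "A woman's haircut.",
}

def reply(utterance: str) -> str:
    """Return the first canned response whose trigger keyword appears."""
    lower = utterance.lower()
    for keywords, response in TRIGGERS.items():
        if any(keyword in lower for keyword in keywords):
            return response
    return "Mmm, could you repeat that?"  # filler when nothing matches

print(reply("And how many people will it be for?"))  # → Two people.
```

Nothing here learns or adapts; an unexpected question simply falls through to a filler phrase, which is exactly the kind of brittleness the demo never had to expose.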
Even the end results are not particularly impressive. In the hair salon call, the assistant initially asked for an appointment at 12pm, was told none was available and was offered 1:15pm instead.
The assistant responded by asking if anything was available between 10am and noon, before eventually landing on a 10am appointment.
However, this doesn't seem like a truly intelligent response. The assistant has only used the core data provided in its instructions; it hasn't really learned or adapted in any meaningful way.
It'd be interesting to see this technology put to the test in a scenario where 12pm is the only option available on that day. Would it be able to really learn and ask, for example, if another day is available?
A natural voice
People seem more impressed that the assistant said 'mmm' and 'errmm' than by the actual quality of its understanding. While this kind of filler does lend a more naturalistic tone, it's easy to program; it doesn't make the assistant more intelligent.
Even with these flourishes, there are still numerous points where the inflexion sounds unnatural. For example, listen when the assistant says "between 10am and 12pm". It sounds robotic and far from natural.
What's more, when the receptionist asks for the service required, the most natural response isn't 'a woman's haircut'. This tells the receptionist nothing that they don’t already know: that a haircut for a woman is being requested.
'Just a trim', 'a cut and blow' or a reference to a celebrity's style is a more useful response, but the technology doesn’t seem smart enough to offer that level of detail or nuance yet.
Finally, why wasn't this a live demo? Of course, there are some practical reasons: the recipient might not answer, background noise in the room might disturb the results, or the quality of the line might make it hard to hear.
But demonstrating live is part of the Google handbook for presenting. If you've ever seen Google present anything, even if it was just one of their agency teams, you’ll know they love a live demo.
It struck me as telling that in this instance they didn't, and the cynic in me was questioning how many trial calls it took to get one right.
In a nutshell
With all their money, technology and knowhow, I have no doubt Google are going to crack this code and will be placing appointments for us before we know it. But right now it just felt half-baked to me and far less impressive than I was expecting.
What do you think? Am I being too harsh? Let us know on Twitter: @fastwebmedia