The Story of Siri, by its founder Adam Cheyer

18 Dec 2014 conference, community

Last month we hosted the LISTEN conference in San Francisco (full recap here). Adam Cheyer, cofounder of Siri, shared the fascinating story of the long journey that led to Siri. Today we are pleased to release the full video of Adam’s keynote.

Walking backward in time, Adam discussed the technical history of Siri as well as how the vision of virtual personal assistants evolved over time. He wowed the audience with a 1987 concept video from Apple that predicted a Siri-like device 24 years in the future and was only off by two weeks.

Thanks again to Adam and all LISTEN participants, it was a wonderful day of learning and discovery.

Team Wit


Guest Post: Hacking a Voice Interface into a Microwave

13 Nov 2014 guest post, hackathon, community

Wit.ai participated in a number of hackathons this fall and we saw tons of cool hacks using the Wit.ai API. One of our favorites was from Hack the North at the University of Waterloo in Ontario, Canada. Team Home Ease won our API prize for their voice-controlled microwave and toaster oven project. Hackers Nick Mostowich, Mayank Gulati, Ram Sharma, and Austin Feight shared their experience of the hackathon and working with Wit.ai. Great work, Home Ease!


Team Home Ease

As a group, we were really tired of people telling us that the Internet of Things is “coming soon.” We’ve seen demos of ten thousand dollar fridges at CES, seen Robert Downey Jr. use a fully automated garage in the Iron Man series, and seen glimpses of greatness with devices like Nest. Still, we don’t have the ability to buy smart devices off the shelf for a reasonable price.

So we decided to just build them ourselves.

Hackathons aren’t always the greatest place to build cutting-edge hardware, since development iterations are much longer than for most software projects. Most teams at hackathons hunker down three or four to a small table and code up a web app. Instead, we wanted to do something different, something with hardware, and we knew it would be related to the Internet of Things.

So, we pushed five tables together in the biggest room we could find in the beautiful E5 building at Waterloo. We covered the tables with all the hardware we could find. Austin brought an entire tub full of parts, tools, and wires. Nick brought a full bag of goodies. Mayank helped one of the organizers for an hour and in return got access to every single microcontroller, actuator, and piece of hardware available for use at the hackathon. Ram drove Mayank home to pick up his own appliances, and we were off to the races. You could say this gave us a bit of a hometown advantage, but hey, you work with what you’ve got!

Toaster Oven

In total we had a waffle iron, a toaster, a toaster oven, and a microwave to work with. The goal was just to connect them to the internet so we could control them remotely and automatically. We felt that this would be a good starting point and a solid accomplishment for the weekend.

The waffle iron and toaster were simply too mechanical to bother with. Levers and springs don’t lend themselves easily to retrofitting, so we moved on to the toaster oven. We used spark.io as our microcontroller, and connected a servo directly to the potentiometer that controls the power level to the heating coils, giving us control over on/off and temperature. We ripped the timer right out; all our timing was done server side.
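
Server-side, the temperature control then reduces to mapping a requested temperature onto a servo angle on the dial. A minimal sketch of that idea (the dial range, servo limits, and function name below are made up for illustration, not the actual Home Ease code):

// Hypothetical mapping from a target temperature to a servo angle.
// The dial range (150–450 °F over 0–180°) is an assumption for illustration.
const DIAL_MIN_F = 150;
const DIAL_MAX_F = 450;
const SERVO_MIN_DEG = 0;
const SERVO_MAX_DEG = 180;

function temperatureToServoAngle(tempF: number): number {
  const clamped = Math.min(Math.max(tempF, DIAL_MIN_F), DIAL_MAX_F);
  const fraction = (clamped - DIAL_MIN_F) / (DIAL_MAX_F - DIAL_MIN_F);
  return Math.round(SERVO_MIN_DEG + fraction * (SERVO_MAX_DEG - SERVO_MIN_DEG));
}

// e.g. "preheat to 350 °F" -> the angle to drive the servo to (about 120°)
console.log(temperatureToServoAngle(350));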

The microwave was a lot harder. Rather than control it mechanically, we decided to interface directly with its input controller. After ripping off the control panel, we realized that it was a relatively simple 2D input matrix. Whenever two of the thirteen input pins were set high, a specific code was activated on the internal controller of the machine. For example, pins 2 and 13 corresponded to “stop”. The only problem was that we had a maximum of 7 outputs from our relay. The breakthrough came with a relatively simple observation: we only needed enough buttons to turn the machine on and off and set the power level; timing was unnecessary because we could just start and stop the machine from the server. In the end we only hooked up start, stop, power level, 5, and 9. This was enough to control everything about the microwave a user would expect.
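
To make the keypad idea concrete, here is a rough sketch of the lookup involved: a virtual button press is just a pair of matrix pins to drive high through the relays, and cook time is a plain timer on the server. Apart from the stop pair mentioned above, the pin numbers and helper names are placeholders, not the actual wiring.

// Hypothetical button -> keypad pin-pair map. Only "stop" (pins 2 and 13)
// comes from the write-up; the other pairs are placeholders.
const BUTTONS: Record<string, [number, number]> = {
  start: [1, 13],
  stop: [2, 13],
  powerLevel: [3, 13],
  five: [4, 12],
  nine: [5, 12],
};

// Stub for the hardware side: in the real hack, relays closed so the keypad
// matrix saw both pins pulled high, triggering that button's code.
async function pulsePins(pins: [number, number]): Promise<void> {
  console.log(`pulsing keypad pins ${pins[0]} and ${pins[1]}`);
}

// Timing lives on the server: press start, wait, then press stop.
async function microwaveFor(seconds: number): Promise<void> {
  await pulsePins(BUTTONS.start);
  await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
  await pulsePins(BUTTONS.stop);
}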

Thus we ended up with two kitchen appliances connected to a simple Node.js web backend. Things could be turned on and off at various power levels, all via the internet. We had accomplished our goal with some time to spare. We thought, what would make this hack even better?

Toaster Oven

Enter Wit.ai.

We went to see Wit’s demo and were blown away. Jen was able to get simple voice control up and running in about fifteen minutes. Austin sat down with his phone and built a simple application to send audio to Wit and pipe the results back to our Node server. The intent/entity system made actually controlling the devices insanely easy. For our applications, the intent was the device to use and the entity was a recipe. For example, you could say “Microwave me popcorn” and the server would look up what the microwave recipe for popcorn was. It would then put the microwave on high for 4 minutes.
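
Concretely, once Wit has done the understanding, the server’s job is small: read the intent (which device), pull out the recipe entity, and run the device with that recipe’s power and time. A minimal sketch of that mapping, with a simplified outcome shape and made-up recipe values and function names:

// Simplified shape of a Wit.ai outcome, reduced to what this hack needs.
interface WitOutcome {
  intent: string; // the device to use, e.g. "microwave"
  entities: { recipe?: { value: string }[] };
}

// Hypothetical recipe book; the values are illustrative.
const MICROWAVE_RECIPES: Record<string, { power: string; seconds: number }> = {
  popcorn: { power: "high", seconds: 240 }, // "Microwave me popcorn" -> high, 4 minutes
  bacon: { power: "high", seconds: 180 },
};

// Stub: in the real hack this drove the Spark-connected microwave.
async function runMicrowave(power: string, seconds: number): Promise<void> {
  console.log(`microwave on ${power} for ${seconds}s`);
}

async function handleOutcome(outcome: WitOutcome): Promise<void> {
  const recipe = outcome.entities.recipe?.[0]?.value.toLowerCase();
  const settings = recipe ? MICROWAVE_RECIPES[recipe] : undefined;
  if (outcome.intent === "microwave" && settings) {
    await runMicrowave(settings.power, settings.seconds);
  }
}

// "Microwave me popcorn" -> intent "microwave", recipe entity "popcorn"
handleOutcome({ intent: "microwave", entities: { recipe: [{ value: "popcorn" }] } });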

What really blew us away was how Wit.ai adapted to different types of input. There’s no need to be verbose and specific. You can say “I’m feeling like some pizza,” or “Cook me some mother fucking bacon!” and Wit just magically knows what you mean. It worked in noisy environments, and adding tons of recipes didn’t seem to slow it down noticeably. We were also able to validate the input messages through Wit.ai’s excellent web interface.

From our point of view it made our hack look really impressive with a minimal amount of work. Hands down, Wit.ai is one of the best APIs we’ve ever worked with.

Team Home Ease and Team Wit


Listen 2014 Conference Recap

07 Nov 2014 conference, community

Podium sign

Find below three great keynotes from Listen 2014.

Yesterday, we were happy to host the first-ever conference on voice interfaces for the internet of things, Listen 2014. It was an exciting day, bringing together industry leaders in speech technology and IoT to discuss the future of our connected lives. Hosted at the beautiful Bluxome Street Winery in San Francisco, the sold-out event was intimate and filled with engaging conversation and thought-provoking debate, both on stage and off.

Bluxome Street Winery

Kicking off the conference was our own Hacker in Residence, Jen Dewalt, with a talk titled “The Voice Revolution: Why now is the time for voice interfaces.” In her presentation, Jen discussed how voice interfaces are the natural solution for internet of things devices and how getting the voice interface design right is key to the growth of the IoT era.

Jen Dewalt of Wit.ai

Next up was Siri and Viv founder Adam Cheyer with a fascinating talk on the history of Siri. Walking backward in time, Adam discussed the technical history of Siri as well as how the vision of virtual personal assistants evolved over time. He wowed the audience with a 1987 concept video from Apple that predicted a Siri-like device 24 years in the future and was only off by two weeks. Adam closed with his vision of the future with Viv.

Adam Cheyer, founder of Siri and Viv

In the next session, Adam joined a blockbuster panel including Rob Chambers of Cortana; Vishal Sharma, former VP of Google Now; Ron Croen, founder of Nuance; and Sunil Vemuri, PM of Google Now, with moderator Rachel Chalmers of Ignition Partners. The group discussed the future of voice-activated personal assistants. The conversation included a vigorous debate over how search fits into personal assistants and whether the final solution would be one all-inclusive app or several directed apps with specialized contexts.

The Future of Personal Assistants panelists

Dan Jurafsky followed with a captivating talk on the Language of Food. Dan opened the talk with the history of ketchup and how the language of food can expose historical revisionism. He then moved on to his research on food marketing and potato chips, sharing that every negative word on a potato chip bag adds 4 cents to the cost. Dan wrapped up with a look at the language of menus and speed dating.

Dan Jurafsky speaking about the language of food

After a delicious lunch, Roberto Pieraccini of Jibo took us through the history of speech technology, starting with the earliest speaking machines. He covered the evolution of research in speech recognition and included a clip of the first electronic speaking machine. Roberto’s recreation of HAL 9000 using actual 2001 technology was met with much laughter from the audience.

Roberto Pieraccini of Jibo

In the Developer Case Studies panel, we met three developers who had used voice interfaces in their applications. Daniel Sposito led by showing us DayRev, a personal narrator to help you stay up-to-date on your favorite subjects. Next up was Paul Beck from Hulu, discussing the process of using Cortana to add a voice interface to the Hulu app. Finally, Joel Wetzel, CTO of Affirma, showed us his running assistant app MARA. The session provided an interesting perspective on putting voice interfaces into production.

Developer case studies panel

After a quick break, Oren Jacob, CEO of ToyTalk, took the stage with a hilarious presentation on developing voice interfaces for children. He shared clips of children talking to the ToyTalk apps and anecdotes of his research with children to demonstrate the challenges of designing voice applications for kids. He also touched on the regulatory issues around recording children and the challenges they present. His bubbly and passionate style evoked plenty of laughs.

Oren Jacob, CEO of ToyTalk

Oren joined the last panel of the day with Eric Migicovsky, founder of Pebble; Konstantine Othmer, CEO of CloudCar; Chaz Flexman, VP of Strategic Partners at Wink; and moderator Derrick Harris of Gigaom to discuss user interfaces and the future of connected devices. The panel was a lively discussion of how we currently engage with voice interfaces like Siri and how they will be integrated into internet of things devices going forward. When Derrick asked the panelists what baseball inning the industry of voice interfaces and IoT was in, they showed their cheeky individual spirit, responding:

  • Oren: 3rd inning,
  • Eric: 1st period (hockey),
  • Chaz: 1st quarter (basketball),
  • Konstantine: 1st wave (kiteboarding)

IoT panelists

The closing keynote speaker was Wit.ai founder and CEO Alex Lebrun. He opened with a joke about trying to book Spike Jonze for the closing talk, but when finalizing the deal, he realized Jonze was a little out of our price range. Alex spoke about his passion for voice interfaces, beginning with Cybelle, and his vision for the future of voice. Showing pictures from the day, Alex expressed his gratitude to the speakers and attendees for coming together to share their experiences and make Listen 2014 such a great event.

Alex Lebrun, CEO of Wit.ai

We closed the day with a wine and cheese reception, where we were able to taste six of Bluxome Street Winery’s delicious wines and enjoy conversation about the day’s events.

Wine reception

We’re working on gathering and editing the videos of the talks and hope to make them available online soon. Stay tuned!

Wit.ai as a whole would again like to thank everyone who participated in Listen 2014; we hope you had a wonderful day. It’s because of you that the event was a success, and we’d love to hear any comments or feedback you may have to make Listen 2015 even better.

Conversation over breakfast

Networking

Roberto Pieraccini chatting with attendees

Attendees enjoying breakfast

The Future of Personal Assistants panelists

Ron Croen speaking on the Personal Assistants panel

Audience members laughing

Attendees chatting

Attendees chatting

Attendees chatting

Attendees chatting

Wine bottles

Team Wit


New API Version Released

22 Oct 2014 update

We’ve released a new version of the API that makes working with entities like datetimes easier. The new API version (version 20141022) only affects the entities field in the outcome object when you receive a response from Wit.ai. To update to the new version, change the version number in your call to the API. The new way to call the API is:

curl -H "Authorization: Bearer $TOKEN" 'https://api.wit.ai/message?v=20141022&q=hello'

or

curl -H "Authorization: Bearer $TOKEN" -H 'Accept: application/vnd.wit.20141022+json' 'https://api.wit.ai/message?q=hello'

The new format not only makes it easier for you to parse entities but also paves the way for richer information for many entities, such as datetime. To compare, if we pass the phrase “Set alarm tomorrow at 7am” to Wit.ai, the response from the old API version (v. 20140620) might look like this:

curl -H "Authorization: Bearer $TOKEN" 'https://api.wit.ai/message?v=20140620&q=set%20alarm%20tomorrow%20at%207am'

{
  "msg_id" : "dcdac1af-adad-41dd-aeb6-c40f29a20e08",
  "_text" : "set alarm tomorrow at 7am",
  "outcomes" : [ {
    "_text" : "set alarm tomorrow at 7am",
    "intent" : "alarm_set",
    "entities" : {
      "on_off" : [ {
        "value" : "on"
      } ],
      "datetime" : [ {
        "value" : {
          "from" : "2014-09-27T07:00:00.000-07:00",
          "to" : "2014-09-27T08:00:00.000-07:00"
        }
      } ]
    },
    "confidence" : 0.995
  } ]
}

In the new version, the response looks like the following:

curl -H "Authorization: Bearer $TOKEN" 'https://api.wit.ai/message?v=20141022&q=set%20alarm%20tomorrow%20at%207am'

{
  "msg_id" : "07a8edd6-3503-4fc7-857d-bc506b85c720",
  "_text" : "set alarm tomorrow at 7am",
  "outcomes" : [ {
    "_text" : "set alarm tomorrow from 7am",
    "intent" : "alarm_set",
    "entities" : {
      "datetime" : [ {
        "type" : "value",
        "value" : "2014-09-27T07:00:00.000-07:00",
        "grain" : "hour",
      } ],
      "on_off" : [ {
        "value" : "on"
      } ]
    },
    "confidence" : 0.999
  } ]
}

The examples below illustrate all of the changes in the new API response:

on July 15th at 5pm (Datetime)
  Current format: {"value" : {"from" : "2015-07-15T17:00:00.000-07:00", "to" : "2015-07-15T18:00:00.000-07:00"}}
  New format: {"type" : "value", "value" : "2013-07-15T17:00:00.000-02:00", "grain" : "hour"}

from 7am to 8am (Datetime)
  Current format: {"value" : {"from" : "2014-09-26T06:00:00.000-07:00", "to" : "2014-09-26T07:00:00.000-07:00"}}
  New format: {"type" : "interval", "from" : {"value" : "2014-09-26T07:00:00.000-02:00", "grain" : "hour"}, "to" : {"value" : "2014-09-26T08:00:00.000-02:00", "grain" : "hour"}}

this afternoon (Datetime)
  Current format: {"value" : {"from" : "2014-09-26T12:00:00.000-07:00", "to" : "2014-09-26T19:00:00.000-07:00"}}
  New format: {"type" : "interval", "from" : {"value" : "2014-09-26T12:00:00.000-02:00", "grain" : "hour"}, "to" : {"value" : "2014-09-26T19:00:00.000-02:00", "grain" : "hour"}}

2 hours (Duration)
  Current format: {"value" : 7200}
  New format: {"type" : "value", "unit" : "hour", "value" : 2, "normalized" : {"value" : 7200, "unit" : "second"}}

70 degrees (Temperature)
  Current format: {"value" : {"temperature" : 70}}
  New format: {"type" : "value", "unit" : "degree", "value" : 70}

70°C (Temperature)
  Current format: {"value" : {"temperature" : 70, "unit" : "C"}}
  New format: {"type" : "value", "unit" : "celsius", "value" : 70}

70 (as an implicit Temperature)
  Current format: {"value" : {"temperature" : 70}}
  New format: {"type" : "value", "value" : 70}

30 miles (Distance)
  Current format: {"value" : {"unit" : "miles", "distance" : 30}}
  New format: {"type" : "value", "unit" : "mile", "value" : 30}

30 (Quantity or Number)
  Current format: {"value" : 30}
  New format: {"type" : "value", "value" : 30}

about $20 (Amount_of_money)
  Current format: {"value" : {"currency" : "$", "amount" : 20}}
  New format: {"type" : "value", "unit" : "$", "value" : 20}

one cup of sugar (Quantity)
  Current format: {"value" : {"product" : "sugar", "unit" : "cups", "value" : 1}}
  New format: {"type" : "value", "unit" : "cup", "value" : 1, "product" : "sugar"}

350ml (Volume)
  Current format: {"value" : {"unit" : "litre", "volume" : 23}}
  New format: {"type" : "value", "value" : 330, "unit" : "millilitre"}
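
In practice, client code can now branch on the type field instead of inferring meaning from the shape of value. A minimal sketch of reading a datetime entity in the 20141022 format (the TypeScript type and helper names here are just for illustration):

// Minimal shapes for the two datetime forms in version 20141022.
interface DatetimeValue { type: "value"; value: string; grain: string; }
interface DatetimeInterval {
  type: "interval";
  from: { value: string; grain: string };
  to: { value: string; grain: string };
}
type DatetimeEntity = DatetimeValue | DatetimeInterval;

// Returns the start instant of the entity, whichever form it takes.
function startOf(entity: DatetimeEntity): Date {
  return entity.type === "value"
    ? new Date(entity.value)
    : new Date(entity.from.value);
}

// e.g. "set alarm tomorrow at 7am" -> a single value with hour grain
const alarm: DatetimeEntity = {
  type: "value",
  value: "2014-09-27T07:00:00.000-07:00",
  grain: "hour",
};
console.log(startOf(alarm).toISOString());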


With the release of the new API, any old version (any version earlier than 20141022) is deprecated, and a ‘WARNING’ field with the value ‘DEPRECATED’ will be added to any response to a request made with an old version. Old API versions will persist until the next API update (the version after v. 20141022), at which point they will be removed from our system.

Please contact us if you have any questions or concerns.

Team Wit


Inbox Voice: When your users aren't radio hosts

21 Oct 2014 feature, speech

If you see a demonstration of almost any speech engine today, it’s going to be done by a person who has a completely fluent, radio-ready command of English. They’ll speak deliberately and slowly, and voilà, you’ll see that they’re understood.

But what happens when you enter the real world, and you realize your users aren’t all radio hosts?

Our team boasts an array of accents and dialects, and we dreamed of a day when we could say “Turn on the lights” and the major speech engines wouldn’t think we were talking about our wives. Wouldn’t it be great if we could train a speech engine on what users actually say? That way, you could create an acoustic model tailored exactly to them.

Introducing Inbox Voice

Inbox Voice is a new training tool to help make your instances even more accurate. Every time one of your users speaks to your app, Wit will try to transcribe what they said and display the result on the Inbox Voice page. If the transcription is correct you can validate it. If we made a mistake, you can correct the text and our friendly robot will start learning.

create custom acoustic models

The more you correct your audio transcriptions, the better Wit will become at understanding your users. This means you can start to customize your voice interface from voice to text to intent.

correct acoustic models

Inbox Voice gives you specialized voice recognition tailored to your users. So, no matter what accents your users have, Wit will understand what they are saying. Beyond that, the natural language processing is tailored to your app as well, so your users’ intents get mapped much more accurately.

You can sign up and try it out on your instance. Let us know what you think and we look forward to seeing what you create.

Team Wit