July 09, 2008

Given that I’ve written here before about Peer-to-Peer (P2P) SIP (and why it’s of interest to us), I thought I’d invite you to join in tomorrow (July 9, 2008) at 11am US Eastern time when I’ll be interviewing David Bryan, co-chair of the IETF’s P2PSIP Working Group and CEO of SIPpeerior Technologies, about P2P SIP… what it is, who might use it, and why it matters.  It should be an enjoyable and interesting call for those of us who are interested in the underlying networks that make VoIP possible.  

If you would like to join the call and are a Facebook user, you can easily do so through the Calliflower application for Facebook. If you don’t use Facebook, you can just go to the Calliflower.com website (you will need to create a free account there). If you can’t join us at 11am US Eastern time tomorrow (Wednesday, July 9th), you will also be able to listen to the show later in the day when I post it to Alec Saunders’ blog at Saunderslog.com.

Here’s a brief video preview:
Voxeo Blog Video Entry: Invitation to join Squawk Box conf call tomorrow about P2P SIP (http://www.seesmic.com/video/hykT2kiGjn)

P.S. “Squawk Box” is a daily podcast on technical topics of the day hosted and produced by Alec Saunders and appearing on his Saunderslog blog. I often participate in the calls and as Alec is on vacation this month, he asked me to fill in as a host for him this week. Other than my involvement as a participant, there is no formal connection at all between Voxeo and the Squawk Box podcast.



July 08, 2008

So if we get to the point where we can truly “trust” the identity of the person calling us on the other end of a SIP connection, what will that look like to the end user? How will I know - easily - that I can trust that the “Caller ID” displayed on my IP phone is in fact who it says it is? Is there a “visual identifier” of some type that I could have on my IP phone (or softphone) that would clue me in? Kind of like the “lock” icon in web browsers that indicates a connection is encrypted?

Those were the questions I was looking to address in a new Internet-Draft I submitted yesterday:

draft-york-sip-visual-identifier-trusted-identity-00

One of the things we focus on here at Voxeo is to ensure that the user experience is as simple and easy as possible. That’s why we rolled out our Designer tool a few years back. That’s why we spent a good amount of time looking to make Prophecy Log Search as simple to use as possible - and why we continue to improve it.

So in the discussions that have been going on within IETF circles around the pressing need to nail down “trusted identity” for SIP-based communication (which I wrote about previously), one of the questions I’ve kept asking myself is “how will this appear to a regular end-user?” Based on some comments on the SIP mailing list the other day, I decided to write up this draft.

Feedback is welcome. I’ve already received the comment that I didn’t address the whole issue of PSTN interconnectivity, i.e. if a call is coming from a PSTN gateway, how do you deal with the fact that the Caller ID could have been spoofed on the PSTN side? I’m sure other comments will come in as well.

As I say in the draft, it’s not entirely clear to me that the IETF is the right place to have this discussion since ultimately it is about the user interface in vendor products… but at least it’s a place to start.




Do you trust the “Caller ID” you see on your phone when someone calls? Do you realize that it can be easily changed? Do you realize that spoofing that Caller ID gets even easier when we start communicating more over SIP?

Right now, one of the greatest challenges being addressed by the Real-time Applications and Infrastructure (RAI) area of the IETF is the whole concept of being able to trust the identity of the user who is calling you. You see, being able to trust the identity of the person on the other end of a SIP connection is incredibly hard. John Elwell recently summarized the issues well in his draft: “End-to-End Identity Important in the Session Initiation Protocol”. Getting “identity” right is one of the largest issues on the agenda of the various groups in the RAI area of the IETF.

Why? Given that we don’t have any way to trust the identity of a caller on the PSTN, why does it matter for SIP? I mean, Caller ID on the PSTN can be easily spoofed… either through any number of web sites or simply through hooking up your own IP-PBX to the PSTN (it’s even possible to do through applications built on our platform). Yet the vast majority of people I’ve asked still trust “Caller ID” on their PSTN phone.

I’d argue that this probably is mostly due to history… for the longest time, you couldn’t easily change your Caller ID. It was set within the carrier networks that make up the PSTN. People have grown to trust it. I expect that will change as unethical telemarketers will no doubt start to make more changes to get around all the call blocking users are doing. If it looks like the call is from your friend, you’re probably going to take the call.

The thing is that SIP makes this incredibly easy. Like SMTP for email, SIP is entirely text based and so just as you can change your email client to say you are sending mail from “elvis@heaven.gov”, you can change many SIP clients to say you are calling from whatever name or phone number you want. If you can’t change the client, you can set up and run your own SIP server.
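To make the "entirely text based" point concrete, here is a minimal Python sketch that reads the caller identity out of a hand-typed SIP INVITE. The message, the addresses and the `caller_id` helper are all made up for illustration, and the regular expression is a simplification rather than the full header grammar from the SIP specification. The point is only that the name and URI a receiver sees are whatever text the sender chose to put in the From header.

```python
import re

# A hand-typed SIP INVITE. Every value below is hypothetical; nothing in the
# message text itself stops a sender from writing any name or address here.
INVITE = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP client.example.net;branch=z9hG4bK776asdhds",
    'From: "Elvis" <sip:elvis@heaven.example>;tag=1928301774',
    "To: <sip:bob@example.com>",
    "Call-ID: a84b4c76e66710@client.example.net",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
    "",
    "",
])

# Simplified pattern: optional quoted display name, then a URI in angle brackets.
FROM_RE = re.compile(
    r'^From:\s*(?:"(?P<name>[^"]*)"\s*)?<(?P<uri>[^>]+)>',
    re.MULTILINE | re.IGNORECASE,
)


def caller_id(message):
    """Return (display name, URI) exactly as claimed in the From header."""
    match = FROM_RE.search(message)
    if match is None:
        raise ValueError("no From header found")
    return match.group("name") or "", match.group("uri")


name, uri = caller_id(INVITE)
print(name, uri)
```

A phone that simply displays `name` and `uri` is showing the sender's claim, not a verified identity, which is the gap the identity work discussed here is trying to close.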

The danger that many of us see is that if this capability gets widely abused, there is the strong potential that we could wind up in a situation where your identity over SIP is dismissed and not trusted… just like email addresses are today. Given the huge volume of email spam, how many of us actually trust that the “From” address on an email message is really who it is? We have to go into the email message to really see if it is someone we know… which is something you can’t really easily do with real-time communication like voice. You have to actually accept the call and start talking.

I don’t think we as an industry want to see SIP identity go that way… so we need to make sure that we get SIP identity right. We need to get to a state where users can trust that the “Caller ID” they see displayed on their IP phone, softphone, or other device is actually who it says it is.

From a Voxeo perspective, we’re interested because we’d like to see more and more communication occur over SIP. Our Prophecy product is a SIP application and media server. Our hosted platform allows inbound and outbound SIP connections to and from applications. On the back end, we’re a huge consumer of SIP trunks. We want to be able to trust the identity information we see.

Because we also host tens of thousands of voice applications (55,000+ right now), we are also very interested in ensuring that any identity mechanisms allow your SIP identity to be extended to a service provider. If you have pushed your voice applications out into our hosted cloud, right now your apps can set our PSTN Caller ID to be a number that is identified as you. We want to see the same capability within SIP - and want the recipients to be able to trust that the identity of the caller is in fact you - even if we may actually be hosting the infrastructure.

Obviously, while most communication today still occurs over the PSTN, some of these issues aren’t immediate. But as we all go about building the great big SIP interconnect that lets us bypass the traditional PSTN, these issues become increasingly important.

We have to get SIP identity right - or risk being dismissed.

If you’re interested in getting more involved, I’d encourage you to subscribe to the IETF’s SIP and SIPPING mailing lists (but obviously be aware that “identity” is not the only topic being discussed there - and beware, they can be high volume lists). Here are some pointers to pieces to read for background:

  • RFC 4474 - SIP Identity
  • RFC 4916 - SIP Connected Identity
  • RFC 3325 - P-Asserted-Identity
  • SIP Identity Using Media Path
  • Why SIP URIs Change Across Domains
  • Adding Asserter Identification to P-A-I of RFC 3325

There will be a great amount of discussion in the weeks and months ahead… feel free to join in!



    July 04, 2008

    Great intro to Speech Server!

    Sent to you by Brandon Tyler via Google Reader:

    via Search results for '"Speech Server"' by dszabo on 7/3/08

    Office Communications Server 2007 is Microsoft's IP communication solution. It allows companies to leverage their network infrastructure for voice and video calls, instant messaging and much more. You may ask what the benefit is of using the computer network as opposed to the internal telephony network, since you don't pay by the minute on either of them. However, on the computer network, integrating telephony, video and IM with desktop applications is only a matter of software. By routing the IP packets through a server in the DMZ, users can call each other at no cost wherever they are. Presence is also very important to mention: users can see each other's status (whether they are online, busy, away, in a meeting with their laptop, out of the office, etc.). The presence icon is integrated into every piece of the Office System (SharePoint and the Office client products), and it tells the caller in advance whether the other party is likely to answer the call or whether it's not the right time for the call. Online users will stop using mobiles; it's a whole change in the communication culture. This article outlines OCS 2007 Speech Server, which is an additional server role for Office Communications Server.

    OCS 2007 Speech Server

    Every software client/device is a UC endpoint in OCS - whether it's an IP phone, Office Communicator (the client of OCS, like Messenger), a video camera in an A/V meeting room, etc. Imagine that you have not only these endpoints, but also non-human endpoints connected to your OCS/telephony infrastructure. These endpoints are software-driven and can communicate with callers on the phone. An example of such an endpoint is Exchange Server Voice Access, where you can get Exchange to read out your emails and do other clever things (say "Clear my calendar for today", which sends a cancellation to every attendee of your meetings for today). You can write these applications using managed code, and they can be deployed and enabled in your OCS infrastructure. Their numbers can even be enabled for callers outside of your organization (this is how Exchange Server Voice Access works at Microsoft).

    How to write programs for Speech Server?

    There are 3 important areas in a voice enabled system:

    1. Speech: the quality of the speech engine,
    2. Voice recognition: the quality of the voice recognition engine, and
    3. Programmability: how easy it is to develop voice-enabled applications on this platform.

    I'll start with programmability and let you judge the other two. There are two programming models that you can use. In the web programming model, the voice application is hosted in IIS as a web page and the dialog is represented by a set of postbacks. The other programming model uses Windows Workflow Foundation to design the conversation's flow. I'll focus on the latter today and skip the web-based one. For the workflow programming model, you need Visual Studio 2005 SP1, IIS, MSMQ and Speech Server installed on your PC (see the prerequisites section).

    Fire up Visual Studio; once you have installed the development components, there's a project template called "Voice Response Workflow Application". You can start dragging and dropping workflow activities into the workflow designer right away to describe the conversation's flow. There are many workflow activities that you can use: Statement, QuestionAnswer, GetAndConfirm (this one won't step to the next activity unless the caller has confirmed his/her answer), Menu, etc. When your workflow asks something, you define the question for the activity, like "Can I have your employee ID please?", and then you define what format you expect the answer in. This definition is called the "Grammar". The grammar is a pre-defined pattern that defines the different ways the answer can be said. For example, "yes, it's 1234", or "my employee id is 1234", or "1234", or "it is 1234" and so on. We define a placeholder in this pattern for the number because that's the only thing that we are interested in, and we define the different ways the answer can be phrased. There's a designer that helps you create the grammar.

    The grammar has an output variable which you can read once the caller has answered your question. Here, you need to write code: when the caller has answered the question, you get the employee ID in a variable that you can convert to a numeric value, and you can act on this number - for example, you can look up the employee ID in Active Directory.
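The idea of a pattern with a placeholder can be sketched outside Speech Server in a few lines of Python. This is not the Speech Server API or its grammar format; it is a text-only illustration that assumes the recognizer has already turned the audio into words, and the names `PHRASINGS`, `compile_grammar` and `recognize_employee_id` are made up for the example.

```python
import re

# Hypothetical phrasings a caller might use; {id} marks the placeholder for
# the only part of the answer we care about.
PHRASINGS = [
    "yes, it's {id}",
    "my employee id is {id}",
    "it is {id}",
    "{id}",
]


def compile_grammar(phrasings):
    """Turn each phrasing into a regex with a named group for the number."""
    patterns = []
    for phrase in phrasings:
        before, after = phrase.split("{id}")
        patterns.append(
            re.compile(
                re.escape(before) + r"(?P<id>\d+)" + re.escape(after),
                re.IGNORECASE,
            )
        )
    return patterns


GRAMMAR = compile_grammar(PHRASINGS)


def recognize_employee_id(utterance):
    """Return the employee ID as an int, or None if no phrasing matches."""
    text = utterance.strip()
    for pattern in GRAMMAR:
        match = pattern.fullmatch(text)
        if match:
            return int(match.group("id"))  # the "output variable", converted
    return None


print(recognize_employee_id("My employee ID is 1234"))
```

The integer that comes back is the kind of value you would then use for the lookup; an utterance that fits none of the phrasings yields `None`, which roughly corresponds to the workflow having to ask again.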

    I've prepared a small application just to show how this works. It calls you up, asks you about the number of computers and persons in your household, and submits the answers to a database. I could have written a more intelligent application, but this will be enough to understand how it works.

    Prerequisites

    To be able to play with the product, you need to install all components of it on your development environment.

    In order to install the Speech Recognition Server component, you need to enable a few features if you don't have them already enabled. You need to re-start the installer every time you have enabled a feature - it won't refresh automatically. To save some time, copy my features list (Vista):

    [Screenshot: Enable for OCS VR]

    You also need Visual Studio 2005 with SP1 (VS 2005 RTM is a no-go) and the Visual Studio 2005 extensions for .NET Framework 3.0 (Windows Workflow Foundation) package in order to be able to install the Development Tools component of the product. Installing the Visual Studio 2005 Service Pack 1 Update for Windows Vista is also recommended if you run Visual Studio 2005 on Vista.

    After the product is set up, at least one language pack needs to be installed (you can find them on the installation DVD or you can download them from the Internet) for the Windows services to start. I've installed the English/UK pack. US and Australian English are also available on the DVD, along with 11 additional languages. The full list is:

    • Chinese (People's Republic of China)
    • Chinese (Taiwan)
    • English (Australia)
    • English (United Kingdom)
    • English (United States)
    • French (Canada)
    • French (France)
    • German (Germany)
    • Italian (Italy)
    • Japanese (Japan)
    • Korean (Korea)
    • Portuguese (Brazil)
    • Spanish (Spain)
    • Spanish (United States)

    What is a Grammar? What is a Rule?

    The "grammar" is a collection of "rules". A rule is a mini workflow where you describe the structure of the expected sentence. In my case, I expect the answer "I have X computers at home" or something similar from the end user. I designed my rule to accept more elaborate answers as well, like "I have only 2 computers at my household" or "I have got no computers". It's up to you how fine-grained and resilient you make your rules. The result of the rule is a value, which is the number of computers in my case. The following is a screenshot of my rule from the Visual Studio Rule Editor:

    [Screenshot: the rule in the Visual Studio Rule Editor]

    The green shapes are called Lists, the white ones are Phrases. Only one of the Phrases inside a List shape applies. The pink shapes are Rule references (RuleRefs); they are used to refer to other rules. The two RuleRefs in my case are references to numeric rules: they recognize the number 0 and the numbers 1 to 999 and convert the recognized words to a numeric value. Looking into those rules, they have several lines where they combine the recognized words and calculate the numeric result. The result is written to the $$._value member variable, which is then copied to the $._value member variable by a Script Tag (blue). The value in $._value can then be referenced in the Voice workflow and used for further tasks (in my case, confirming the number of computers to the end user). Compiling the grammar produces an XML file with a .grxml extension.
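To give a feel for what such a numeric rule has to do, here is a small Python sketch that combines recognized English number words into a value between 0 and 999. It is not the built-in Speech Server rule and does not use the grammar's script syntax; the function name and word tables are assumptions made for the example, and the return value merely plays the role of what the rule would write to $._value.

```python
# Word tables for the sketch (English only).
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(words):
    """Convert recognized words such as 'two hundred and five' to an int."""
    tokens = words.lower().replace("-", " ").split()
    if tokens == ["zero"]:
        return 0  # the separate "number 0" rule
    total = 0
    for token in tokens:
        if token == "and":
            continue  # filler word, carries no value
        if token == "hundred":
            total = max(total, 1) * 100  # a bare "hundred" counts as 100
        elif token in UNITS:
            total += UNITS[token]
        elif token in TENS:
            total += TENS[token]
        else:
            raise ValueError(f"unrecognized number word: {token!r}")
    if not 1 <= total <= 999:
        raise ValueError("value outside the 1 to 999 range of this rule")
    return total


print(words_to_number("two hundred and forty-five"))
```

The sketch is deliberately lenient about word order; a real rule constrains the order through its structure, which is part of what the Lists and Phrases in the editor express.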

    What is the Voice Workflow?

    After designing the rule, let's work on the workflow, which is the part that controls the main flow. The rule that I've described above is evaluated in the HowManyComputers QuestionAnswer activity.

    [Screenshot: the voice workflow]

    How can I start?

    I recommend installing the developer samples. Once they are installed, I suggest opening the HelloWorld project from the C:\Program Files\Microsoft Office Communications Server 2007 Speech Server\Samples\Workflow\HelloWorld folder and playing with it.

    You can test your application by pressing the F5 key. You'll get the Voice Response Debugging window, where you need to click on the Call button:

    [Screenshot: the Voice Response Debugging window]

    When the workflow starts and the server asks you a question, you are redirected to the second tab and need to click on the Start Recording button. Say your answer and Speech Server will recognize it.

    [Screenshot: the second tab of the debugging window]

    When your answer is recognized, click on the Submit button to post back your input to your workflow.

    Questions?

    Don't hesitate to ask!


     
     



    July 02, 2008

    As I’ve written about before, we’re fans of the SIPit interoperability events that are sponsored by the SIP Forum, as they provide a great way to test how well different vendors’ SIP implementations interoperate. We recently attended SIPit 22 at the University of New Hampshire and the feedback was extremely helpful in our continual effort to improve our products.

    Anyway, SIPit 23 was recently announced for October 13-17 in Lannion, France. The event is hosted by the ETSI Interopolis Service and France Telecom-Orange Labs. ETSI has a website for the SIPit 23 event that is full of information about the event.

    I honestly don’t know yet whether we’ll be attending, but I do encourage vendors to take a serious look at attending. It’s a great place to learn how well your SIP implementation plays with others.



    July 01, 2008

    I found this in my email inbox this morning.

    Dear Marshall Harrison,

    Congratulations! We are pleased to present you with the 2008 Microsoft® MVP Award! The MVP Award is our way to say thank you for promoting the spirit of community and improving people’s lives and the industry’s success every day. We appreciate your extraordinary efforts in Communications Server technical communities during the past year....

     

    I'm sure that GotSpeech and the impact it has on the Speech Server community had a lot to do with me receiving the award again (3rd year now). GS has become a very active and thriving community and I am always thrilled to see new members come on board. I am really excited when I see members who started out with lots of questions progress to the point that they are now answering other people's questions. That is what it is all about.

    So, thanks to each of you for the time you spend on GotSpeech and the contributions that you make to the site and the community.

     

    You can find more info on the Microsoft MVP program by visiting https://mvp.support.microsoft.com/.


    June 27, 2008

    I originally interviewed Mike Wehrs, Nuance’s vice president of evangelism and industry affairs, for an FYI about vSearch in the July/Aug issue of Speech Technology Magazine. Unfortunately, the time crunch was such that we weren’t able to slot the quotes into the story. (Editor: You really need to meet your deadlines. Ryan: I’ll work on that later.) Mike gave [...]



    June 26, 2008

    I've been noticing that the amount of activity on the forums has been increasing with lots of new posters. That is encouraging as it means the new people are trying out Speech Server. Interest in Speech Server is building and that is good for all of us.

    One of the things I do quite often is look at the stats towards the bottom of the Forums page. Yesterday when I looked, this is what I saw:

    GotSpeech Stats

    5 new threads, 50 new posts and 13 new users in a 24 hour period. That is the sign of an active and thriving community. My heartfelt thanks to everyone who contributes to this community.

    It just goes to prove what I've been telling everyone - GotSpeech is the place to go for information on OCS 2007 Speech Server.


    Last updated: July 09, 2008 01:01 AM All times are UTC.
    Powered by: Planet

    Speech Connection and Logos are Trademarks of Nu Echo Inc. Copyright 2006.