Tomorrow’s video services – seeing will be believing
Date: Tue, 02/23/2010 - 13:51
According to Gartner: “By 2015 over 200 million workers globally will run corporate-supplied video conferencing from their desktops.” But why stop there? Let everyone, everywhere join the party on their PC, Mac, smart phone, or head-up display. And how will broadcast TV morph into a visually connected IP world?
What would it take to realise this vision? Simple: get the experience in line with users' cost and performance expectations – something both video conferencing and interactive TV services have fallen short of achieving. Caught in the race for megabits per second, we forgot what users want, and traditional video architectures simply can’t offer the performance and QoE at a realistic price. But the tide is turning: newer systems, custom built for the Internet, promise to leap this hurdle.
So, where’s this all taking us? According to Ofer, this is just the beginning. Extend a visual species’ vision to see across continents and you make a profound shift in what becomes, literally, their worldview. Just as the Internet has shifted our understanding of culture and knowledge, and TV has transformed politics, so will video conferencing impact business, recreation and society. Are we ready for it?
Manek Dubash, Editorial Director, NetEventsTV
Ofer Shapiro is a man who at RADVISION developed the first IP video conferencing bridge and gatekeeper technology. He was a contributor and Editor of the H.323 standard and he is now CEO of Vidyo, maker of high definition video conferencing, the only company that can do it at 30 frames per second and which powers Google video. Now if that’s not clue enough, get the man himself. Coming down, CEO of Vidyo, Ofer Shapiro.
Ofer Shapiro, Founder and CEO Vidyo, Inc.
Thank you Manek. We’re going to talk today about the future of video communication and future video services. But because this is a very visual thing, I just wanted to start with a demo. So actually I want to take three of our friends, but it seems like four of our friends over there, to join us in a conference call just so that everyone will understand what we see here. We see two endpoints actually that are personal telepresence endpoints. One is running on this computer and the second one is running on my laptop and they are both connected to a server that actually happens to be in California. These three people that we have on the call are also connected to the same server in California. So essentially, we have a five-way call hosted in California and all the participants are in Europe. It’s sort of almost the worst case you could do, networking-wise.
I’ll just ask my colleagues to present themselves and where they are calling from. Anne? Anne, can you hear us? Fraser. Oh, of course not, because of this microphone. Hold on. Fraser, can you hear us?
Yes I can.
Okay, good. So where are you calling from?
I’m calling from my office which is just outside of London. This is our part of a managed conference screen. My connectivity to the Internet is through just a generic standard ADSL connection. I’m actually using my laptop PC with a £40 webcam and speaker box.
Thank you and Anne, can you hear us?
I can hardly hear you.
If you could just tell us where are you calling from please?
So I’m from [Marish], so in Romania. This is connected from home right now using an ADSL connection which is 1MB when I receive as well. I’m using my laptop as well with camera and headphones.
Thank you Anne and Holger?
I’m Holger. I’m calling you from Munich in Germany. I’m here at my home office. I’ve got a standard connection with a laptop which I’m using with a [Logitech] camera.
Okay, thank you very much.
So essentially, this is the type of set up we call personal telepresence and we wanted to give people the feel of this because when we talk about what’s expected by users in terms of what they want to do with video communication, we believe this is more or less what it is.
Thank you very much. You can disconnect now. We’re going to the presentation. Thank you.
So what’s in store for the future? I want us to take a little look into my crystal ball and see if we can suggest future ideas and what lies ahead when we talk about future video.
One of the things that is very important when you talk about video, and it always was a problem historically, is that it’s all about quality. Quality of video has to be such that people will get what they expect. What do they expect? When you say video, the only reference point we have as users is essentially TV. So we expect an image that looks more or less like a TV image and the reality is that in most traditional video conferencing set ups that’s not what we got. Either it was a set up of one camera in not very high resolution covering a whole room, so the people there are kind of tiny as you can see in this image, or the picture was broken up due to transmission errors.
Did anyone ever see video images broken up in a transmission? If you ever saw it raise your hand please. Okay.
So the other thing which is a problem is latency. Latency is an issue that is subtle. Latency is something that is very hard to capture in an image, but happens to be something that is a problem for many types of discussions especially when we try to interact with people.
The latency that people expect is the same latency as a phone call and with video conferencing, especially on a multi-party video conferencing, traditionally, that’s not what was happening.
Anyone ever experienced a problem with video conferencing being kind of too slow, people talking unnaturally, stepping over each other? If you experienced that, raise your hand please? Okay.
So this is the status of the video conferencing industry. At least 80% of the users have participated in bad video calls. Just to add insult to injury, essentially, people were paying a lot of money for it. So you get a very interesting value proposition. It doesn’t work very well. It’s very, very expensive and doesn’t provide the performance that people expect.
That’s an interesting product to sell and personally I have been making money selling these types of things for many, many years, as have many of my colleagues. The issue is why does the video conferencing industry even exist? Because there is a huge need to have visual communication on top of voice communication. But really, in order for this to go beyond the initial application, we need the quality to get better.
So some people like Cisco and Hewlett Packard actually came up with this concept of telepresence. Telepresence is actually not a new term. Anyone here know where it was invented? It was invented by Isaac Asimov probably 50, 60 years ago in a story where people wanted to project themselves across far distances. So if you think about telepresence, it’s like being present there from remote. The point is to actually feel like you’re talking to the people that are at the far side and it’s natural, just like I feel I’m talking to you. See them, interact with them in a low latency fashion.
Telepresence is doing it using the traditional video conferencing technology by just bumping up the cost.
Of course, the other part of it is people have also developed much, much cheaper solutions than traditional video conferencing that don’t work very well, but they’re at least available for free. If you’re not happy, you can always get a full refund.
So telepresence gets to this high reliability, high quality, pictures never break, low latency, providing this natural experience, but the issue with it is that it’s extremely expensive. With telepresence, when you go into a telepresence room, after you amortise the network cost and the equipment cost and account for everything, it’s $5 per minute. So it’s a great, great replacement for the corporate jet. It’s not a great replacement for everyday communication. So it provides the experience that people want but not the price point that people want.
So what lies ahead of us in the future? Just to kind of map the problems a little bit better, let’s look at the different price performance curves. We started from a typical Gartner view of the different quality points and the different form factor points, I would say – the desktop, the executive, the room system, the telepresence systems – and there is the quality scale, and this is a typical graph that I’m sure was created by someone in the US because it implies a continuum. It implies that everything is fine, is okay, and it implies that we have quality which is okay, good, better and superb. The reality is, as experience shows, that not all the price points are okay.
If you look at traditional room systems, their average utilisation is 10 hours per month. Think about two hours per week. So most of them are installed for places like for the weekly partner meeting, for a very, very specific application. For the Board calls. That’s why they utilise it very little, because if you try to use them outside of that, it’s hard to manage, it’s getting too expensive, the quality is not there. There’s just not the concept of repetitive use.
I just talked to someone who is at a big investment firm some time ago. They have calls that they hold between their Hong Kong, London and US branches over video conferencing. They set it up and invested probably hundreds of thousands of dollars for these three sites. I asked him what do you use it for. He said for the monthly Board meetings, and I said do you ever talk to your buddies in the office in Hong Kong, in London over the phone. Of course, he does it all the time. Ever occurred to you to actually go to the conference room and use the video conferencing system that you have and you paid so much for? No, no, no.
It’s not interactive, the picture sometimes breaks, it is not worth it to deal with it unless you have a business case.
Telepresence is different. Telepresence has much, much higher utilisation. We believe that it is because of this crossing of the threshold quality. What’s the threshold quality in quantitative terms? Essentially, it’s getting to at least standard definition per face. So instead of having one big screen that covers lots of small people, you have HD cameras that are split between two or three people at a time so you get at least standard definition. It never breaks. It looks like a TV image.
Then what happens also is you get very low latency again because by bumping up the network you eliminate all sorts of bottlenecks and then people like it and then people say wow, it was natural communication.
The problem of course is that telepresence, which is the experience of remotely communicating, is synonymous with these types of rooms that are connected to expensive networks that are cost prohibitive. We think the future, and this is where the bonanza of video services will be, is in creating what’s called personal telepresence, which is something that provides those fundamental properties like we showed in the demo, at least standard definition per face and this consistent quality and the low latency, but bringing it to the price points of the personal devices. We think this is key.
Immersive telepresence will always be there. It’s an amazing thing. It’s an experience that is like sitting there in a studio environment, but it really needs to be complemented by the much larger number of endpoints maybe by several orders of magnitude of personal small meeting rooms, executive appliances and of course mobile devices in the future.
So another thing that is happening is that enterprise deployment is going from just room system deployments to desktop deployments. Just to get a feel of what does it mean number wise, the video conferencing appliance world today is really an aberration from a communication market point of view. 200,000 systems every year. What Gartner are saying to us is they believe we will get in a few years to 200 million paid endpoints. This is not counting Skype. 200 million endpoints. So this huge, huge growth in the number of endpoints will yield new challenges because it is not going to be about I have a good endpoint, no, no, no. Buy from me because I have a better endpoint. It’s about how do we manage it. How do we manage the bandwidth, how do we manage the infrastructure?
Let’s just go through some numbers and we see that this is kind of mind boggling. We start from looking at a typical deployment in a medium size enterprise today. 100 people that want to communicate and there are four of them in a room, so you get like 25 rooms, and you use it for 20 hours a month and you need so much bandwidth.
When you go to personal devices, just by going personal, now you need four times the bandwidth because you have instead of four people in one room, you have four people in front of personal devices. Then you start to grow your deployment according to the type of numbers that I have shown in the previous Gartner slide and you say you have more participants and they’re using it more because now it is more useful. It’s for the daily calls, not just for the partner meeting calls.
Then if you kind of extrapolate it and you get, just because you increased the number of participants from 100 participants to 1,000 but you made it more useable and you moved from room systems to personal devices, you get 200X bandwidth increase. 200X.
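The arithmetic behind the 200X figure can be sketched as a product of three factors. The talk states the move from four people per room to personal devices (4X) and the growth from 100 to 1,000 participants (10X); the remaining 5X usage factor is not given explicitly and is an assumption here, chosen because it is what the stated total implies. The function and variable names are illustrative only.

```python
def bandwidth_multiplier(people_per_room, participant_growth, usage_growth):
    """Relative bandwidth need after moving from shared rooms to personal devices."""
    # One video stream per person instead of one per room of `people_per_room`.
    personal_device_factor = people_per_room
    return personal_device_factor * participant_growth * usage_growth


people_per_room = 4                 # from the talk: four people share a room system
participant_growth = 1000 // 100    # from the talk: 100 participants grow to 1,000
usage_growth = 5                    # ASSUMPTION: implied by 200 / (4 * 10), not stated

multiplier = bandwidth_multiplier(people_per_room, participant_growth, usage_growth)
print(f"{multiplier}X bandwidth increase")
```

Under that assumption, the 20 hours a month of room use would correspond to roughly 100 hours a month of personal use.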
Now we already mentioned that telepresence, for example, is expensive already because of its bandwidth requirement. Multiplied by 200X, there are two options. Either you have to multiply Cisco sales by 200. I’m sure they will be happy, that’s why they believe in video communication as a driver for the growth. But clearly, there is no way this is sustainable from an economic point of view.
So there has got to be a way to manage the bandwidth better in two parameters. One is getting rid of the need to have high quality of service, and the second one is getting rid of some of the bandwidth that has to go between the sites by having the right architecture to deal with it.
Talking about architecture, let’s look at traditional architecture for conferencing, because on the desktop, it’s worth noting, almost every call becomes a conference call when you go to personal devices. Traditionally, this was done by some central entity that’s called a Multipoint Control Unit, or an MCU for short, and that type of bridge was basically decoding and re-encoding and kind of shaping the information for every participant. That means that for every cycle of processing that you have for video compression at the endpoints you need the same thing at the MCU. Since the endpoint number is growing, it’s just not sustainable to put in that much hardware, and what you get is a lot of sensitivity to packet loss. You get a lot of cost because of the amount of hardware that is required and a lot of latency because this extra processing in the conference bridge adds to the latency that is native to Voice over IP calls or Video over IP calls.
So the MCU architecture, the traditional way to do conferencing, simply does not scale in that world. What does scale? Imagine that you could have solved the problem of the conference bridge. Eliminate it. Replace it with something that is more like a router, that scales like a router, does not have any latency, is based purely on software and can have all sorts of good mechanisms that people would need. For example, in some part of the day the ports are being used in Hong Kong and in another part of the day they are being used in New York, as they follow the sun. Having the ability to have no latency conferencing allows you to actually localise conferences, localise local participants so to speak, in the continent, in an office, and save the bandwidth between the different offices. If you are able to run on a non-QoS network, you further decrease the problem because now it’s less expensive. You can run on the converged network.
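The contrast between a decoding and re-encoding conference bridge and a router-like element can be sketched with a toy load model. This is a simplification under stated assumptions, not a description of any product: it assumes the bridge decodes every incoming stream and encodes one shaped stream per participant, and that the router-like element forwards each participant's stream to every other participant without touching the codec. Function names are hypothetical.

```python
def transcoding_bridge_load(participants):
    """Codec work at a central bridge that decodes and re-encodes per participant."""
    return {
        "decodes": participants,   # one decode per incoming stream
        "encodes": participants,   # one shaped, re-encoded stream per receiver
    }


def forwarding_router_load(participants):
    """Work at a router-like element that only forwards packets (ASSUMED model)."""
    return {
        "decodes": 0,
        "encodes": 0,
        # ASSUMPTION: every stream is forwarded to every other participant.
        "forwarded_streams": participants * (participants - 1),
    }


for n in (5, 50, 500):
    print(n, transcoding_bridge_load(n), forwarding_router_load(n))
```

In this model the bridge's compression work grows with every endpoint added, while the forwarding element does no compression at all; its cost is packet forwarding, which is the kind of work routers already scale for.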
So the future world, if we can deal with that fundamental infrastructure problem, looks like this. You have the telepresence rooms just like you have them today, but they connect to those personal devices and there would be 10 times or 100 times more personal devices than telepresence rooms.
And that’s not all because now you’re starting to add all those things on the mobile side like netbooks on 3G or a smartphone that runs on a 3G or a 4G connection. In that environment, you’re getting to a more stable value proposition. If video communication is segregated to the room, then it’s, by definition, less useful: you’re snowed in and everyone stays at home, you’re sick, or you’re on the road, and you just don’t have access to it. All of those types of parameters make it less useful and less dependable. Once you get to the point that you can access it from anywhere, or almost anywhere, on any device and any network that you connect to, then you’ve got ubiquity, and then people get to rely on it, coupled with the right quality and the right price points.
So when we started the company, our goal was to put video conferencing price and performance in line with user expectation. Just as a reminder, expectation is TV-like and latency like a phone. The price point, like voice. You want it to be a no-brainer. If you start to say I will pay double or triple or quadruple, not to mention 100 times more like in a telepresence scenario, you have to start thinking is it worth it to have a video call.
We don’t want people to think about whether it is worth it. We want people to just use it just like they use a phone today. I remember my grandmother’s generation, they were thinking about placing a phone call. It was like an event to call our uncles in the US when we lived in Israel because it was something you did for the holidays and the rest of the time it was letters. At some point it became a no-brainer. We call just to say hey, what’s the weather over there. The same thing will happen with video communication. That’s growth potential that Vidyo should have.
So the next generation infrastructure that is needed is something that will address those challenges. How do we get it off the QoS network so we can use low cost bandwidth? How do we work on any network to make the experience ubiquitous? 3G, 4G, WiFi, on top of the general Internet and of course the converged corporate networks and of course, end users’ broadband connections.
We believe a big part of it is in the architecture based on H.264 SVC and we believe that a key thing is to design an infrastructure that could really scale from the 200,000 endpoints of today to the 200 million endpoints of 2015. This is the key thing and people are starting to realise it. We talk to a lot of people in the enterprise, in the carrier space, in the service provider space and everyone starts to gradually realise that you have to take care of these two parameters – how do you scale the infrastructure while maintaining the cost, and how do you provide the quality that people expect so they’ll actually use it.
Just leave you with one more interesting thought. Service revenue in general is always bigger by a factor of 10 than the equipment revenue in a given market. So in the video conferencing market, we have an interesting anomaly historically. The service revenue is 10 times smaller than the equipment market and why is that? Because people don’t have a lot of repeated use.
So the $1.5b market for equipment that video conferencing has today suggests that there is a potential for a $15b service industry which is about to happen. Carriers are very hot about providing it and this is something that we expect to see in the coming years.
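The revenue estimate above is a simple ratio argument, which can be written out as a short sketch. The $1.5b equipment figure and the factor of 10 come from the talk; the function names are illustrative, and the historical service figure is only what the "10 times smaller" remark implies, not a number quoted in the talk.

```python
EQUIPMENT_REVENUE_USD_B = 1.5   # from the talk: video conferencing equipment market
SERVICE_RATIO = 10              # from the talk: services are typically 10x equipment


def potential_service_revenue(equipment, ratio):
    """Service revenue if the market behaved like a typical communication market."""
    return equipment * ratio


def implied_historical_service_revenue(equipment, ratio):
    """Service revenue implied by the anomaly: services 10x smaller than equipment."""
    return equipment / ratio


print(potential_service_revenue(EQUIPMENT_REVENUE_USD_B, SERVICE_RATIO))
print(implied_historical_service_revenue(EQUIPMENT_REVENUE_USD_B, SERVICE_RATIO))
```

The gap between the two results, a factor of 100, is the room for growth the argument points to if repeated use becomes the norm.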
We believe that if all these parameters are fulfilled we will see that long promised explosion in video communication. We personally, as an organisation, are talking to people that are early adopters and we see those deployments that are happening. We see specific organisations that are buying infrastructure to communicate to 100,000 people worldwide. That’s a single organisation. So these types of things actually happen and they will happen more and more.