Into the Cloud on auto-pilot?
Date: Tue, 01/03/2012 - 18:56
An essential characteristic of cloud computing is self-service and auto-provisioning. This gives rise to a requirement for end-to-end automation and management across all the elements of cloud infrastructure, including virtual machines, storage, firewalls and network. This is referred to as the orchestration layer of cloud computing infrastructure. Early providers of cloud services built their own orchestration layer using tools from different vendors and some they built themselves.
This was a major barrier to time to market for new cloud service providers entering the market. However, with the introduction of converged cloud architectures and end-to-end automated provisioning and management solutions from leading vendors, it is now possible to considerably speed up the time to market for new providers of cloud services. But what of enterprises that want to implement their own private cloud solutions? To realise the full potential of virtualisation and cloud initiatives, enterprises must adopt more automated tools, ranging from IP address automation to configuration and management tools for both physical and virtual devices. At the NetEvents EMEA Press Summit Rome 2011, Peter Hall, Principal Analyst, Ovum, explored the challenges facing enterprise IT managers as they seek to accelerate their virtualisation deployment and meet growing demand for cloud, self-service, on-demand applications, and provided insight into where and how to start addressing them.
Panellists: Steve Garrison, VP of Marketing, Infoblox; Andreas Stern, Director Business Development Europe, Middle East & Africa, Spirent Communications; Joe Baguley, Chief Cloud Technologist in EMEA, VMware; Phil Tilley, VP Marketing EMEA, Alcatel-Lucent; Global Marketing Co-chair, MEF
Thanks, everyone, and welcome. The title of my presentation Into the Cloud on Autopilot isn't my title. I think it's a NetEvents title. But I think the reason for that will become apparent as we go through the presentation, so I won't try and cover that now.
What I want to do is spend - this is supposed to be a 30-minute session although I notice we're running about 12 minutes late. Hopefully we'll still have our 30 minutes.
I want to spend about 10 minutes just giving some background and then I'll hand over to the three panellists to give their view of a number of key questions that I see here. I'm going to start by just saying a little bit about Ovum's view of the cloud computing market. We're one of the last of the analyst companies to produce a global forecast for cloud computing and I was part of the team of five analysts who cover IT, services, software and telecoms, who've recently produced our first forecast for this market. In fact, this has only been published in the last couple of weeks so it's really hot off the press. And I guess we've had the benefit, because we're the last to produce the forecast, we've had the benefit of having the longest time to really observe what's happening in the marketplace. And in fact when I looked at our analyst peers in this market and I looked at the various forecasts for them, I found such a colossal variation that it's no wonder that people are confused about this market, because clearly there's an issue about what are cloud services, what are people forecasting, how are they defining this. And so we decided that we'd be pretty rigid or rigorous, I would say, not rigid about that and being very clear about our definitions.
The way we see this is that the cloud computing market today is very much dominated by software as a service. Software as a service is pretty mature as a delivery model.
Salesforce.com I think have been offering service for over 10 years and there are some other mature players, WebEx, for instance. And it's a market today where I think no individual player has more than about - Salesforce.com has perhaps 15% of the total software as a service market. So there's no individual player which has a huge amount of that marketplace.
Infrastructure as a service, which I guess many of us think about when you talk about cloud computing, is very different. It's very much dominated by Amazon, Amazon Web Services. It's a much, much smaller market. We predict that 2010 revenues from infrastructure as a service were well below $1bn; I think $800m is the actual figure.
Amazon doesn't go public on its breakdown of revenues, but we think Amazon have close to half of that. The next largest player, Rackspace, barely 10% of the total. So it's a very different sort of structure.
When we look at platform as a service, that's also a less mature market again, certainly much less mature than software as a service. But many more players that are active, I think the largest player there having 20% or 30% of the market. I think that's Salesforce.com again with their Force.com application. But again, quite a lot of players there.
So this is a very interesting market at the moment. And I think infrastructure as a service, which is the one which I really want to focus on in terms of this discussion on auto-pilot, is one which is still very embryonic. And I think one of the great things that we've seen in the last 12 to 18 months is a host of new players coming into this market, which potentially can be very large. And the global telcos are really amongst these. So players like Verizon, AT&T, BT, Orange, T-Systems, there are many others, Telstra, TATA. And really I think changing the face, if you like, of infrastructure as a service.
Infrastructure as a service, as I say, has been very much dominated by Amazon. I would call them a commodity player. You're certainly not going to use them if you want to support mission-critical applications. But they've been extremely successful in attracting a very large number of customers to do test and dev, proof of concept and things like that.
But I think we're now going into a second generation, if you like, of infrastructure as a service, where we're starting to see players offer the sort of SLAs that James was talking about, SLAs that are capable of supporting mission-critical applications.
That's the phase that we're going into. And I think what's also interesting is that the telcos are being successful in picking up some big public cloud deals. They're not only offering public cloud, they're offering private as well. But when we look at the likes of Amazon and even Rackspace, we find that the average customer is spending well below $10,000 a year. I think the average is closer to $1,000 than $10,000.
When you look at the telcos, they're achieving ARPUs from their public cloud services of in excess of $100,000 a year. So we're talking about a very different positioning, attracting different customers. And I think this is going to be a big part of the growth of infrastructure as a service.
So I just want to, before we go into this auto-pilot sort of theme, I just want to remind you what cloud computing is. And we had this discussion about cloud services being a pretty meaningless term and people cloud washing, people using it to mean all sorts of things. There is a definition produced by NIST. Most of the industry use it. I think it probably needs a bit of updating. I think it was on its 17th iteration in the current version, but this market is evolving.
But in principle, the term cloud computing doesn't have to mean something different to everyone. There are definitions like this, and there are extensions you can make to them and be very clear about. But I think the important thing to say is really that cloud computing is defined by the essential characteristics. It has to be on demand. It has to be self-service. It has to be flexible, elastic services. It has to be a measured service, which means you can have all sorts of billing models: you can bill by megabytes, by number of VMs and type of CPUs, by bandwidth of network access, and all of these things. And it's that which makes it difficult. That's why you can't just become a cloud computing service provider by saying we've got a data centre, we've done a bit of virtualisation, we can turn on a cloud service. Well you can't, because you can't do all of those things. You need end-to-end management. You need automation and you need all of those things to do that.
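The "measured service" point can be illustrated with a toy metering calculation. The resource names and rates below are invented for the example, not any provider's actual pricing:

```python
# Toy illustration of a "measured service": each resource is metered
# and rated separately, which is what makes cloud billing flexible.
# Rates are invented for the example.
RATES = {
    "vm_hours":     0.10,   # per VM-hour
    "storage_gb":   0.05,   # per GB-month
    "bandwidth_gb": 0.08,   # per GB transferred
}

def monthly_bill(usage: dict) -> float:
    """Rate each metered quantity against the price book and total it."""
    return round(sum(RATES[item] * qty for item, qty in usage.items()), 2)

# 720 VM-hours + 100 GB stored + 50 GB transferred:
print(monthly_bill({"vm_hours": 720, "storage_gb": 100, "bandwidth_gb": 50}))
# 72.00 + 5.00 + 4.00 = 81.0
```

The point of metering per resource is that each billing dimension can be added, dropped, or re-priced independently, which is what makes on-demand billing models possible in the first place.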
So becoming a proper cloud service provider isn't easy. And some of the guys – I mentioned the telcos, which is my greatest area of interest within the cloud area.
Most of those spent between 12 and 18 months launching their first cloud service and that's because of all of this automation and the portals and providing end-to-end management, being able to take bits of kit from different vendors and be able to turn on virtual machines, turn on storage, turn on bandwidth and do all of that in near real time is a pretty difficult task.
And in fact most of those organisations built their own management systems, the so-called orchestration layer, in order to do that. Well now there are solutions from a variety of vendors which actually make that task easier. So a telco coming into the market now can perhaps do something in six to 12 months instead of 12 to 18 months.
So what I want to do is really focus on the task of end-to-end management, because this is where the auto-pilot theme comes in. But as I say, auto-pilot wasn't my choice of words, and I think it's quite a dangerous choice of words, because the implication is that if you're taking a public cloud service, it's all just fully automated, with no human intervention. But as we go from this first generation of infrastructure as a service today, which is really commodity services, to the new generation, which is about delivering mission-critical SLAs and being able to support mission-critical applications, then it is more than auto-pilot. Okay, you still need all that automation, but you also need consulting, you need monitoring; the cloud service provider needs to take much more responsibility than just saying hey, you define what resources you want, and whether your application works or not is up to you, not down to us. That's pretty much where infrastructure as a service has been, but I think in the new generation of infrastructure as a service the service provider takes on some responsibility to make sure that your applications are actually running and performing satisfactorily.
So turning to my last chart, which is really where we get into the debate, I'd like to really get the panel's views about - the first here is what advice would the panel give to service providers and enterprises, service providers who are launching public infrastructure as a service, enterprises who are also launching their own internal private infrastructure as a service. Now they might be doing that because they're part of a big group and providing cloud computing services to different companies within the group. They might want individual departments within the enterprise to be able to just go onto the portal and turn on their own cloud services. So we see enterprises doing this as well as service providers.
So what would your advice be on this task of delivering end-to-end automation and management? Starting right to left, Steve, would you like to make a first comment on that?
Okay. Steve Garrison, Infoblox. I was just on a panel a bit ago so I'll try not to repeat myself. That's my key here. So for the enterprise, our advice is to not ignore the mundane. And what I mean by that is, again, what Infoblox offers is mission-critical support of two key protocols in a turnkey appliance, namely DNS and DHCP.
That's mapping the IP address to the domain name to the application. And we've found large customers get this sooner than smaller customers because they've understood the pain of failure. And that's why I mentioned that we prevent natural disasters every day.
One of our largest banks says we can't do our thousands or millions of transactions without Infoblox. So people really use our technology to bring an end-to-end automation and closed-loop framework.
The main advice, the reason I'm bringing it up, is don't ignore the mundane and boring, because if you do, you're probably going to have the problem that we saw with VMware, where they weren't able to issue the IP addresses in real time. As I mentioned, the network engineers in the survey said that 40% of the time it takes hours to days.
We've fully automated that. So look at all the bits. Build a process chart and look at where the bottlenecks are and look at where the humans are and make sure the humans, if they can operate at machine speed, put machines in to run at machine speed and you solve that problem. I will not say virtual mainframe during this panel.
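The bottleneck Steve describes, manual IP address issuance that takes hours to days, is the kind of mundane step that can run at machine speed once automated. A minimal sketch of the idea in Python (this is an illustration only, not Infoblox's actual product API):

```python
# Toy IPAM: hand out the next free address from a subnet instantly,
# instead of waiting on a manual ticket queue.
import ipaddress

class Ipam:
    def __init__(self, cidr: str):
        self.leases = {}                               # hostname -> address
        self._free = ipaddress.ip_network(cidr).hosts()

    def allocate(self, hostname: str) -> str:
        """Record and return a lease for the next available host address."""
        addr = str(next(self._free))
        self.leases[hostname] = addr
        return addr

ipam = Ipam("10.0.0.0/28")
print(ipam.allocate("vm-01"))   # 10.0.0.1
print(ipam.allocate("vm-02"))   # 10.0.0.2
```

A real DNS/DHCP/IPAM appliance also handles lease expiry, conflict detection and the DNS mapping, but the core win is the same: the allocation decision happens in milliseconds, not in a human's inbox.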
The second thing is, and Peter still owes me a drink for his sarcastic quip, the second one: service providers. We think the challenge with service providers is twofold, but I'll focus mainly on cloud, which is how do I offer more value. And we've got three large service providers offering our software as a multi-tenant platform for hosting DNS/DHCP/IPAM services, basically carving up our software environment so that they can resell it to multiple smaller customers.
The only one I can disclose is Neustar. The reason they're doing this is not just to sell bandwidth, not just to sell pipes, but to be able to go into a strategic discussion with the CIO and say hey, network services such as DNS and DHCP are very complex. You need a specialist. You may lose that specialist, and all your expertise walks out the door. Let me manage that for you. That's a totally different discussion than saying here's a pipe with a certain amount of SLAs based on latency and packet loss and bandwidth. So we see the telcos leveraging bits of automation to build cloud as a means to have a higher-level sales pitch with their clients and therefore create a stickier relationship, a tighter relationship with their customers.
And I think both of them play into what we've been hearing all day about commoditisation, consumerisation, adding value. You have to figure out a way to bring more value to the market.
Okay. Thanks for that, Steve. Phil, can you give your view on this?
Yes. I think for sure the big challenge in getting to the cloud is that there are so many multiple components. As Steve said, talking about the DNS part, that is one small part. You've got the servers and obviously the virtual VMware stuff, the orchestration layer. There are so many components that have to be brought together here. And I think our advice and guidance is to work together with an integrator, or somebody who's got the capacity to pull all the components together, validate it, test it, and actually almost build a blueprint solution initially. Clearly the problem with a blueprint solution is that everybody's slightly unique, so it's just the building block, but start there.
I'm obviously with Alcatel-Lucent, where we've partnered with HP and have an alliance with HP. We say, okay, Alcatel-Lucent's strength is in the network; HP's is in the data centre. Together we've got two big companies building a cloud proposition, which is a pod environment from HP with a network from Alcatel-Lucent. And all we have to do together is build the connection between the orchestration layer and the network management layer, to put that glue together: how does the data centre environment, the cloud environment, actually interface with the network management layer, so we can manage the network and assure the bandwidth is there?
So my advice really is: look for and work with people that have the scale and the ability to really pre-validate these things.
Thanks, Phil. Andreas?
Yes. My name is Andreas Stern. I work for Spirent Communications. Our passion is test and measurement, and we focus on simulation. So I would like to pick up the airline example that Matthias started. I think here as well you have to pick the airline; you would never fly with a never-come-back airline.
So you have to pick the right partner to start with. And the most important thing is to make sure that you are aware of your own criteria. At Spirent we say that the cloud must PASS your criteria, where PASS stands for performance, availability, security and scalability.
So starting with the first one, performance. We all know performance, throughput and latency from our day-to-day work. Then availability. We heard before in a presentation about the [3-9s] and the [4-9s]. If the information you give to the cloud is really absolutely mission-critical for you, [3-9s] still means more than eight hours of outage a year; [4-9s] still means about 52 minutes of outage. So if your business can work with that outage per year, it's okay. If not, you need to find a partner who can really overcome this and drive it to [5-9s] or even further. Security: you have to make sure that there are different layers of security involved, such as encryption, firewalls and IPS/IDS systems. And then at the end, scalability: how many users would you like to bring to the cloud, and what do you expect them to do during their working time? So you really need to be aware of your criteria. Does the cloud pass your criteria? And then, as Phil mentioned before, you need to do this with a partner, a partner who understands the business.
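The downtime behind each "number of nines" is straightforward arithmetic over the 525,600 minutes in a year; a quick sketch makes the figures easy to check:

```python
# Annual downtime implied by an availability of "N nines":
# 3 nines (99.9%) is roughly 8.8 hours/year;
# 4 nines (99.99%) is roughly 53 minutes/year.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600

def annual_downtime_minutes(nines: int) -> float:
    unavailability = 10 ** -nines   # e.g. 3 nines -> 0.001
    return MINUTES_PER_YEAR * unavailability

for n in range(3, 8):
    print(f"{n} nines: {annual_downtime_minutes(n):10.3f} min/year")
```

Each extra nine cuts the permitted outage by a factor of ten, which is why pushing from four to five nines or beyond is such a different engineering (and cost) proposition.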
Thanks, Andreas. One particular thing: this is an event where we also talk a lot about networking, of course. So let's focus on the network component, which is clearly a big part of the auto-provisioning. And potentially here we're not just talking about configuring public internet access and allocating IP addresses. Customers increasingly are using other network technologies to access public cloud services, so they might be using MPLS, for instance, or they might be using Ethernet. So what are the challenges there that service providers will have in auto-configuring the network component?
Phil, can you start on that?
Yes. I guess the question is: is it really better or worse to have a substandard service or no service at all? One of the challenges here is whether it's more frustrating to have a wireless network that almost works, or nothing at all. Well, the trouble is I'm not sure whether "almost works" is usable. It's like trying to watch a movie that's pixelated; it's possibly better to be told sorry, there's no bandwidth available, you can't have that film right now. So I think one of the challenges is: do we say we can't deliver a service that we think would be usable? We tell you there is no service availability, for whatever reason; try and find an alternative.
So I think one of the biggest challenges there for providers to face is actually: do we have the capacity to enable auto-provisioning in the first place? So yes, we'll always try and find the capacity, but there are times when congestion and overload situations occur where there is just not the capacity. So one of the first things, I think, is that the provider has to have good inventory management systems and good inventory control to actually understand what capacity is there. Can I support the level of service that's required? And so I think that's one of the starting points, and then we grow from there.
So it's a question of what is the capacity in the network? What is the capacity in the service systems? Do we have the capacity to enable a service at all, yes or no? And so it starts with understanding before auto provisioning is let's understand what's available.
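The inventory check Phil describes boils down to a simple admission-control rule: know what capacity is free, and refuse cleanly rather than over-commit. A minimal sketch, with the bandwidth figures purely illustrative:

```python
# Admission control for auto-provisioning: track free capacity and
# refuse requests that would over-commit it, rather than degrading
# everyone's service.
class CapacityError(Exception):
    pass

class Provisioner:
    def __init__(self, capacity_gbps: float):
        self.free_gbps = capacity_gbps

    def provision(self, requested_gbps: float) -> None:
        if requested_gbps > self.free_gbps:
            # An honest "no service available" beats a degraded one.
            raise CapacityError("insufficient capacity; try an alternative")
        self.free_gbps -= requested_gbps

p = Provisioner(capacity_gbps=10)
p.provision(6)              # accepted: 4 Gbps left
try:
    p.provision(5)          # refused: only 4 Gbps remain
except CapacityError as e:
    print(e)
```

The design choice here is exactly the one debated on the panel: the system fails a request explicitly instead of silently oversubscribing, which is only possible if the inventory figure it checks is accurate.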
Okay. Steve, I know you've already referred to network, but do you want to add anything?
Yes. I'm just first thinking about getting a screen that says I'm sorry, I can't serve you right now, and wondering if that's actually a good change for the world. In an honest way it is, but in a frustrating way it isn't. I actually am getting my wireless to work well enough that I'm getting my email in. But anyway, I think that's bold of you to bring up, because in a new world like that, how do we have competition? You'd be searching around to find who has the inventory at that time. You'd want a new provider that gives you the inventory of them all. Who does wine searching?
Winesearcher.com, great site if you like wine, it tells you where to buy any wine you want in the world. That's what we'd need in that model. Okay, I need broadband. I don't know who provides it. Broadbandsearcher.com. That's cool.
It's a competitive world.
It is a competitive world. Maybe that's a new opportunity. Let's talk offline.
Well I want to go back to, I think his point is so profound, I think our whole mantra at Infoblox is to automate the mission-critical services so you don't have to worry about people making mistakes, delays initiating the service, getting back to that time to value comment. So Peter, for now, I'll pass the microphone.
Okay. Over the last year we've spent quite a lot of time testing together with different vendors. And one of the most critical and annoying parameters we came across was failover time. Now, all the building blocks of a cloud are made by human beings for human beings, so nothing is perfect. But there should be a system so that, in case of failure, the system is able to help itself and switch from one component to another. And we found out that this failover can take from milliseconds to seconds, to minutes sometimes. It sometimes happened that if a single component failed, a complete cloud was dragged down. You get this route flapping happening, and then one single building block is killing the cloud. And this is something that needs to be taken into account when doing the automation of cloud services.
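Failover time of the kind Andreas describes can be measured by polling the service across an induced failure and timing the gap until the standby answers. A minimal sketch; the probe below is a hypothetical stand-in for a real health check:

```python
# Measure failover time: after inducing a failure, poll until the
# service answers again and report the elapsed gap.
import time

def measure_failover(probe, timeout_s=60.0, interval_s=0.05):
    """Seconds from the induced failure until probe() first succeeds."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if probe():
            return time.monotonic() - start
        time.sleep(interval_s)
    raise TimeoutError("no failover within timeout")

# Simulated standby that starts answering 0.2 s after the failure:
t0 = time.monotonic()
gap = measure_failover(lambda: time.monotonic() - t0 > 0.2)
print(f"failover took {gap:.2f} s")
```

In a real test the probe would be an actual request against the service's public endpoint, and the measurement repeated many times, since, as the panel notes, the same failure can recover in milliseconds on one run and minutes on another.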
I think that's a great point. And I think it also builds on the point made earlier: one of the reasons failover is an issue is that you don't know whether, when you fail over, you've still got sufficient capacity available. There are some protocol issues in some of the network things as to how quickly you can rebuild, but fundamentally it's: have you got spare capacity in a failover scenario? If you have, it'll fail over okay. If you haven't, it's a problem.
So you get back to, I think, [Broadhead's] point that you could failover to the wrong network and all of a sudden lose service or have service degrade.
So I think I'm getting the message don't trust the auto-pilot.
I think if you sit in an aeroplane, you have to believe in the pilot, in the technology and the service, in all the environment behind it. And there is no difference between first, business and economy class in seatbelts. The seatbelts must be designed in the right way to make sure most of us are rescued after a crash.
Yes. I think the auto-pilot thing is a misnomer too here. I don't think anybody, any of us, you can ask us all, when we say automation, we're not saying some glorious point and click and 1,000 things align behind the scenes. I think that's probably a great vision for 50 years from now, but that's not going to happen. No customer I've talked to even wants to do that. What they want to do is help their senior staff not fix mundane tasks or implement mundane processes. And if you are trying to build a very complex cloud environment, you still want decision points to be made.
For example, the failover. If I'm doing a mission-critical service and something wants to failover, I'd love to see a dashboard come up that says we need to failover. Pick A, B or C as your alternative. Here's the stats on A, B or C. I wouldn’t want that to be automated day one. I want to gain trust in the automation sequence before I fully let that go. And there's a reason there's live pilots on planes. When the auto-pilot goes out, we still don't die, right. So most of the time anyway.
And I think that trust is an important factor because, again, the first generation of cloud services, of cloud computing services I should say, particularly infrastructure as a service, was very much a supermarket-type thing, where you go in, you specify what you want and you get something which may or may not meet your expectations. And I think the new generation of services, which are largely coming, I think, from SIs and from telcos, really changes that.
And maybe I disagree with James' vision of the exchange, or whatever you want to call it, where you can just buy services there, or rather trade services. Because I think companies will want to run more and more mission-critical applications in the cloud, and inevitably they will. This is really what it's about in the medium to long term: taking over some of your mission-critical applications, not just doing test and dev. Then things like SLAs, things like understanding that your service provider has an interest in your applications actually performing to the specification you want, are a big part of what the service provider needs to provide.
The very last point there which I think relates to that as well is how can enterprises monitor and measure the service delivered by their cloud computing service provider?
Should they - clearly, again, the more sophisticated services provide dashboards and all sorts of things to tell you how your service is performing. Is that sufficient or should you be doing something yourself to measure your cloud service provider's performance? Who's got a view on that?
Yes, I'll kick off. I think enterprises, and everybody else, have got used to network-based SLAs. And I guess the issue with an SLA is that it's great, it's a guarantee from your provider that he's going to deliver a service, but initially, I guess, the SLA is almost a marketing gimmick. It can actually mean nothing unless you can measure it.
We've got to a situation where we can measure network SLAs. We're now in a phase where we're going towards application SLAs, and certainly some cloud providers, I think, are providing application SLAs. But in a lot of cases those are still fairly gimmicky; it's a marketing tool to say I can meet that application. Over the next period of time we've got to get the tools in place to actually prove application performance, using deep packet inspection as one of the techniques, rather than just network performance.
Yes. I'd like to add to that that most of our customers are looking at private cloud first, as I mentioned on the last panel, because of verticals and regulatory issues, and their lawyers not being able to guarantee the data is secure. But I think the other reason is that when we ask them why not public cloud first, except for dev and test, they don't understand how to monitor it and they don't understand the success criteria. So they want to build their own private cloud, which again means internal within their customer perimeter, for definition's sake, and learn from that.
And I think these, again, I'm talking about global 2,000 have been burnt by turnkey systems where they can't see inside. They pay an enormous service fee to renew that.
They pay professional services to maintain it and they have no understanding of what's inside the black box. So this time they want to learn what's inside the black box by doing it themselves a little bit just to get a taste, and enough taste so they can challenge the service providers and say that portal, that's a marketing gimmick, give me a real portal where I see real-time data that means something to me. So it gets back to trust and learning, I think.
Thanks. I'm being urged to wind down, but I've been short-changed. I've only had 23 minutes of my 30 minutes. So can we open it to the audience for a few minutes? Any questions or are you so desperate to get your coffee? A question here.
Thierry Outrebon from Windows News. You didn't speak about reversibility, the fact that you could change provider like you could change an aeroplane, and also about the way you could measure the quality of security, just as the strength of a seatbelt in a crash or something like that. So could you speak about the way you could change your partner?
Well I wasn't kidding about Winesearcher.com. There are portals in the States that let you find, by rate, a better place to offload tier-three data to infrastructure-as-a-service storage. So part of my quip there was based in reality. I think once we learn what the basic SLAs are, and trust what's inside the black box, you will find people who are going to become mediators, helping you offload or bid on different cloud services that, at the end of the day, are the same. You're just looking for a better time, dollars or euros-per-minute usage rate. I think that's about 10 years away for sophisticated services, though. But today you can find it in the States for storage.
Are you specifically referring to security? That's a difficult one isn't it, because how do you measure? How does an enterprise -?
Then it goes back to Spirent and other tools, right, so let's get the Spirent guy to jump in here on how they do that.
It is done. It's done at different levels. It starts on one side with the application side, through to the high-performance side. So all the different sides can be tested against attacks, against viruses. That's what we do quite a lot with our tools: really testing, on one side, what effect does an attack have? What effect does a virus have? And also, what effect does this have on good traffic? So we run simulations with thousands of different attacks, with real, good traffic running at the same time, to measure the impact on each other. And it is doable. It's done.
I think every cloud provider is aware that it's a target for attacks. The bigger the data centre, the bigger the chance that it gets attacked. And they invest quite a lot in testing the security part of it.
Yes. I think, like I was saying, a lot of it is down to a risk assessment. I guess one has to balance the cost of doing infinite testing, and the time and delay, versus how big the risk actually is. And I think that's the real challenge people face: I can spend thousands of hours doing all of the testing and validation, I can invest a fortune in all of the security measures, but then find that going down the cloud route is actually not costing me any less. Perhaps I'm doing overkill on the risk. So I think this is one of the challenges: how do I assess the risk properly and then work out what my investment profile needs to be to match that risk? And that's a real challenge.
One danger on the security side is the speed at which new attacks and viruses are invented.
Now there are hundreds, thousands of clever guys out there doing nothing else but finding another way to attack networks, and fighting this with tools is really a challenge. It's really hard work. We do frequent updates, and there are thousands of new viruses and attacks every month. It will always be a challenge for clouds that there will be one guy, one morning, waking up with a fantastic idea of how to get the cloud down. And how does a cloud react to this? How does a cloud block itself, protect itself against these attacks and make sure that only small parts of the data centre are affected by the attack?
Thanks. One quick last question anyone? Okay. Well I'd like to thank the panel. I think it's been an interesting discussion. And it's quite an important one as well in terms of really how do you make this whole cloud infrastructure very slick and automated but, at the same time, high performance, secure and all those things. So thanks everyone.