Ever feel like someone is telling you everything you want to hear while their true raison d’être is for their own gain?
Maybe I’m off base here, so I’ll ask you to help me decide.
I believe we all agree that a few of the core goals of implementing VMware are simplifying operations and driving down infrastructure costs. So am I missing something with this post from Chad?
The Challenge of VDI
VDI is a great means to drive down the operational cost associated with providing and servicing an end user environment. The challenge to VDI is that the environments are very large. I mean moderate installations are commonly 1,500 seats and at the high end we have customers looking at 50,000 – 100,000 seats. At this scale it is easy to understand why the storage infrastructure cost can be a large hurdle facing an organization considering the adoption of VDI.
This sounds like a winner
Most enterprise storage arrays have a means of cloning LUNs in some manner where each clone only consumes a fraction of the actual amount of storage that is being provisioned. These types of clones only charge you for the uniqueness of each copy. At NetApp we offer this feature with FlexClone.
Zero storage costs for every virtual desktop? Sounds like a match made in heaven for VDI doesn’t it? At this point you might be saying, “Where do I sign up for such a solution?” I know my competitors are hoping you are, but before you buy into the smoke and mirrors, can I ask you to watch their demo of this solution?
There’s an older version of this demo starring Chad here: (click on the image to view)
Knowledge is power!
As demonstrated in my competitor’s demo, each Virtual Desktop requires its own LUN (or LUN clone) in order to be provisioned. See, the EMC crew is banking on you not being aware of one major point:
An architecture requiring one LUN per Virtual Desktop is simply unmanageable for any sizable installation.
In the NetApp VDI demo which we will highlight at VMworld 2008, customers will see NetApp deploy 5,440 virtual desktops in less than 25 minutes while only consuming the storage required for a single desktop. On the surface this solution sounds exactly like what I am arguing against. The blurring of this distinction by EMC is good for business (note: good for their business, not necessarily yours).
In Chad’s blog he graciously provided us with a backhanded compliment regarding our VDI demo. He mocked the scale of 5,440 desktops, trumpeting that EMC has already provided VDI tests scaling to 10,000 desktops.
See, what I think Chad and the EMC guys want to do is impress you with scale… remember, VDI environments can be massive. However, did you catch the magic trick in the video? Did you realize that with EMC, in order to deploy 10,000 desktops, one has to provision and manage 10,000 LUNs?
I guess if you love LUNs then this is cool.
OK, seriously, who in their right mind thinks that this design is easy, simple, & scalable?
In our VDI demo for VMworld, NetApp creates 5,440 desktops with 32 volumes. If we had done 10,000 desktops it would have taken 64 volumes (or a few less).
Let me see here, which is more impressive… 64 volumes or 10,000 LUNs? That’s right, VDI with NetApp requires less than 1% of the storage objects required by EMC to create and manage this solution.
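As a rough check of that arithmetic, the sketch below scales the demo's desktops-per-volume ratio up to 10,000 desktops and compares object counts. It assumes desktops spread evenly across volumes and one LUN per desktop on the other side, as claimed in this post; the function name is made up for illustration.

```python
import math

def volumes_needed(desktops, demo_desktops=5440, demo_volumes=32):
    """Scale the demo's desktops-per-volume ratio to another desktop count."""
    per_volume = demo_desktops / demo_volumes  # 170 desktops per volume in the demo
    return math.ceil(desktops / per_volume)

desktops = 10_000
volumes = volumes_needed(desktops)   # volumes needed at the demo's density
luns = desktops                      # one LUN per desktop, as claimed in this post

print(volumes)                       # 59, i.e. "64 or a few less"
print(f"{64 / luns:.2%}")            # 0.64%, under 1% of the object count
```

At the demo's density the count lands a little under 64, which is where the "or a few less" comes from.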
As the goals of Virtualization include driving down cost while adding simplicity, I believe that NetApp’s VDI solution is like a golf score, where lower is better.
We stand alone
While almost everyone in the storage industry can clone LUNs, only NetApp can clone files, and virtual disk files make up the storage for virtual desktops.
In an upcoming post I will share with you more information on optimized storage for VDI. I’ll give Chad and crew a head start here… It has to do with eliminating boot storms and reducing the post-deployment storage growth in the VDI GOS and user storage space.
The used car salesman approach revisited
Maybe Chad has made some assumptions about our technology, or maybe he’s working too hard; I mean, he keeps misspelling my name in his blogs (thanks buddy).
For those of you who recognize me at VMworld (see my picture at the top of this page) stop by and say hi. I’d love to show you the demo and hear about your virtualization plans; however, while you’re there watch out for this guy. He might try to get you into a new car…
This is gonna be fun…
Chad Sakac says
Vaughn (and my apologies for the typos previously, I meant no ill intent):
1) I’m sorry for anything that I’ve said that has made you angry or upset, or that you think is unfair.
2) I really struggle with the personal attack, I’ve tried consistently to keep this about technology dialog, but what’s one to do.
I do think that respect amongst colleagues is possible, and will continue to try to see whether that can work. My last dialog with Nick will remain my commitment – I will try to stay above the fray as much as I can.
I’ve emailed you separately, and if you want to tell me where you think I went over a line, we can talk next week, and I’ll apologize personally.
In any case, to the technical matter at hand. You sent me an email and asked the question – is the relationship of the VM to LUN a 1:1 relationship, and you are replicating the LUN, right?
As I said to you, and I will say in this public forum: so long as the unit of replication is the LUN, that’s what you’re replicating.
There are ways around this for block devices, but only using mechanisms like block lists, which ESX currently doesn’t support in mainstream builds.
When the solution is designed, the count of VMs per container (i.e. VMFS volume) is a function of fan-in, and the ESX cluster limits, which are a core scaling limit. The other scaling consideration is the VM/core density.
In our block design, to date, we’ve been deploying around 500-1000 VMs per ESX cluster. So, given the ESX cluster LUN count limits, this means that the minimum is 5 VMs per LUN. We also need to consider the underlying array’s limits in terms of the number of snapshots available per source LUN target – this is 1000 for the Celerra, 128 for the DMX, and 8 for the CX. We can do 96 filesystem snapshots for CIFS/NFS, 16 of which can be writeable at a time. The other consideration in the design is the LUN count of the array itself.
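To make the sizing concrete, here is a minimal sketch of how those limits interact. It assumes a target of 1000 VMs per cluster and 224 usable LUNs per cluster (the figure quoted for the DMX example in this thread); the helper names are hypothetical, and the snapshot limits are the ones listed above.

```python
import math

# Snapshot limits per source LUN, as quoted in this comment.
SNAPSHOT_LIMITS = {"Celerra": 1000, "DMX": 128, "CX": 8}

def min_vms_per_lun(vms_per_cluster, cluster_lun_limit):
    """Fewest VMs each LUN must hold to fit under the cluster's LUN limit."""
    return math.ceil(vms_per_cluster / cluster_lun_limit)

def source_luns_needed(replica_luns, snapshots_per_source):
    """Source LUNs required when each one can back only so many snapshots."""
    return math.ceil(replica_luns / snapshots_per_source)

vms_per_lun = min_vms_per_lun(1000, 224)      # 224 is an assumed usable-LUN limit
replica_luns = math.ceil(1000 / vms_per_lun)  # replica LUNs needed for 1000 VMs

for array, limit in SNAPSHOT_LIMITS.items():
    print(array, source_luns_needed(replica_luns, limit))
```

Under these assumptions the floor works out to 5 VMs per LUN and 200 replica LUNs per cluster, so the per-source snapshot limit decides how many source LUNs each array type needs.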
Same comment that I made earlier with Nick – Limits aren’t bad – the important thing is to state them and communicate them. I think it’s fair to say I will update my post to make this part more clear.
For example, in the 10K scaling test, where a DMX was used for the boot images (the user data being redirected to CIFS), the math works like this – we start with a gold image. Our target is 1000 VMs per cluster. We know that the limit of the number of user LUNs is going to be 224 (the cluster limit – number of nodes in the cluster), but in this configuration, we can only do 128 snapshots. So, what we do is we deploy two LUNs, and create 10 copies of the gold image – 5 per source, then create 100 replicas.
I’ll be clear here, it’s not 10000 for the price of 1 (a 99.99% cost savings), it’s accurate to say it’s 10000 for the price of 100 (a 99% cost savings).
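A quick way to check the savings figures is to work the layout through. The sketch below reads the description as two source LUNs per cluster, five gold copies on each, 100 replicas per source, and ten clusters to reach 10,000 VMs; that reading, the function names, and the choice to ignore the space used by snapshot deltas are all assumptions for illustration.

```python
def layout(clusters, source_luns, gold_copies_per_source, replicas_per_source):
    """Return (total VMs, full-size image copies) for the replica layout."""
    full_copies = clusters * source_luns * gold_copies_per_source
    vms = full_copies * replicas_per_source
    return vms, full_copies

def savings(vms, full_copies):
    """Fraction of storage avoided versus one full image per VM."""
    return 1 - full_copies / vms

vms, copies = layout(clusters=10, source_luns=2,
                     gold_copies_per_source=5, replicas_per_source=100)
print(vms, copies)                      # 10000 100
print(f"{savings(vms, copies):.2%}")    # 99.00%
print(f"{savings(vms, 1):.2%}")         # 99.99%
```

Paying for 100 full images out of 10,000 is a 99% saving; paying for a single image would be 99.99%.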
YouTube only supports 10-minute clips, man; there is no way to explain any topic in depth in 10 minutes. We have published our whitepapers that state these limits and design principles in detail.
LUN count itself isn’t a scaling problem – though as I said in our email dialog, there is a perception of the object count causing management scaling issues. Where you are balancing VMs manually, and where you are managing LUNs, that is true, but in this case, the LUN deployment is being handled completely automatically – and the management is the VDM connection broker pools themselves.
There’s a ton of stuff we’re not mentioning here that all affect scaling (for example, I’m curious to see how you guys are getting 177 VMs per ESX server if I read Nick’s post correctly, that’s much higher than we’ve seen to date), but we should discuss it openly at VMworld – for example, number of VC servers, and rates of VM registration, etc. that are also part of the design problem at these scales.
Now, I don’t love LUNS/block devices, any more than I love files and filesystems. I reserve my love for my wife and family, and affection for friends and colleagues.
Ok, so same thing I proposed for Nick – I’ll publicly state operational limits, pitfalls and considerations for the EMC solution. Will you guys do the same?
Mike Shea says
Well, I have to admit – the video at the end *was* seriously funny – especially if you’ve met Chad – it does look like him a ton! (OK, maybe not the powder blue leisure suit, and certainly not the approach). Yer famous! ;^D (At first I thought Vaughn had gone nuts and begun to see your face in potato chips, toasted bread, wall stains and now YouTube!)
If you show up to VMworld wearing a powder blue suit – I am bringing my camera… ROFL
Chad – FWIW – NetApp is and has always been very open on limits – in public. Please see: http://www.netapp.com/us/library/.
If it ain’t there, it ain’t written yet. Some things, as everyone is aware, take time.
Would EMC do the same? Not bloody likely. Why not publish PowerLink material publicly? Doing so would help people to make a truly informed decision in the marketplace.
Why does EMC publish one CLARiiON best practice guide for external and one for Internal consumption only? Ditto for many other things. Performance group data is even limited to a tighter internal community under the guise of ‘you won’t understand it if we don’t explain it to you, so you cannot have it without being a ‘trained’ performance guru’??
The EMC watchword was and is: “What our field, our partners and prospects don’t know, they can’t talk about and hurt us”
EMC treats prospects and customers the same way. (Just try to download serious technical material from the EMC website if you are a prospect…)
In the dark, you cannot see what might get you.
I worked in the field there for five years. The name of the game is obfuscation by the EMC field. I hated it, Chad, and that is why I and thousands of other EMC people left. Hundreds, if not thousands, are now highly motivated and candid NetAppians.
If that is not your manner, then bravo to you, but you are an island in a sea of darkness. Been there. Bluntly, you are guilty by association. He who sleeps with dogs, gets fleas.
The point Vaughn makes with authority, that is still being sidestepped, is that a few storage objects are easier to manage than many. The point is operational – **LIVING WITH YOUR SOLUTION IS GOING TO HAVE AN IMPACT** What is it? It makes a difference.
Virtualization’s promise is, in part, to create a ‘many to few’ consolidation. VMware, Hyper-V, Citrix and others achieve this.
Here is the key point:
NetApp does *precisely* for storage what VMware does for servers and applications. We also eliminate the tradeoffs associated with any centralized consolidation. We also do not inject new tradeoffs.
Storage without compromise. It is a good feeling. What a great place to work.
Chad Sakac says
Mike – I do have a sense of humor, and it does indeed look like me 🙂
I’m furiously trying to search through my dad’s old suits, since he has the same build as me. You might just get your laugh seeing me there like that 🙂
I’ve said it before to you Mike, I’m sorry you left EMC, I’m glad you’re enjoying yourself there at NetApp, they are a great company. We all work to better the world in our own way, but we all have to be happy in what we do day to day.
I argue that EMC is a great company also.
I don’t think I’ve ever said that the NetApp approach on the topic of VDI or any other solution or NetApp’s philosophy in general is bad/broken/disastrous. I’m sure others have, just as I’m sure NetApp folks have said it about EMC approaches.
What I have said is that there is more than one way to solve a problem, and each has tradeoffs. What I’ve constantly heard in return is – the NetApp way is the best way, regardless of circumstance.
Personally, I don’t see the conspiracy you see to keep information away from the world at large.
In the end, individual customers decide, and the market decides, and each company needs to execute as well as we can.
On the question of more vs. fewer storage objects – I’ll try to be as clear and as non-confrontational as I can.
I agree with Vaughn’s belief that “virtualization changes everything”, but I don’t think it’s about consolidation per se (though that’s a part of it), I think it’s about getting more out of less (at every level of the IT stack), and being more flexible as a business. The storage layer is but one part of that overall picture.
NetApp’s approach has fewer datastores presented to the ESX cluster than EMC’s approach (for example, in Vaughn’s case he said 64 vs. 10,000, but as I pointed out in my correction, it’s actually 64 vs. 200 for a given cluster).
The other thing that we’ve found is that in practical deployments (as opposed to the scale like mad from a single image), maintaining the storage-driven approach gets really hard day to day, even with all the scripting and integration we can provide. Note that this is not because of LUN management but rather the end-to-end lifecycle of deploying, updating, moving VMs between persistent and non-persistent use cases – those are really, really hard. My view is that we (i.e. both NetApp and EMC) can’t solve those problems without VMware updates to VDM and ESX, which they are doing.
Re: your last sentence, I don’t view compromise as a bad thing. I think not seeing compromises and tradeoffs in every choice, in every decision is short-sighted.
Chad Sakac says
One last thought – watching the Mad TV video again… who’s being religious about one approach vs. another 🙂
Paul Galjan says
Hi Vaughn. And hi Mike! We met a couple of times at NTAP SE conferences and had some interesting discussions – not likely you remember me though.
I worked at NTAP Federal until about a year ago, when I moved to EMC Commercial. My job function remained almost identical – Microsoft technology specialist (CSE @ NTAP, TBC @ EMC).
I don’t know the exact numbers (I’m dubious of the “thousands”), but what I do know is that there is a healthy movement of people back and forth between EMC and NTAP. When I say healthy, I mean that in the true zen-like sense. Both companies have aspects of their technology and culture that are worthwhile to study, if not wholly adopt.
To my point: in my 4 years @ NTAP, we were constantly looking to EMC as an example. For instance, we tried for a year to get filers into the Microsoft Technology Centers. It was only when we mentioned EMC’s presence at the MTCs that we managed to get them procured and installed. To everyone’s joy, it drove a lot of business. Imagine that, we could learn something from EMC! I could name a bunch more examples (including product feature innovations that were eventually adopted at NTAP), but that would be boring.
I’m one of the guys that went NTAP to EMC – and I know a lot of other ex-NTAPers here. The reasons for my choice were as individual as I am. I was secure in my position at NTAP, widely respected in Federal, and so forth. Surely you can appreciate that the process of gaining security and establishing a reputation at any new company is an exhausting proposition. So you can imagine that it was not a decision I took lightly.
I was nervous about moving. I’d heard all the bad things about the “Evil Machine Corporation,” but figured it couldn’t be much worse than the particular situation in which I’d been placed. I spoke to probably 3 dozen people from NTAP and partners in the days after my departure, and a few ex-EMCers reassured me that it was a great move to a great company. I was shocked at that, because the overall tone about EMC was so vitriolic (I remember more than one instance where someone on the dl’s got a talking to regarding what is and isn’t appropriate to say about a competitor).
My personal experience has not been anything like what you or some of the other ex-EMCers describe. I cover a $100M territory with dozens of people, both technical and sales folks. In the past year, I’ve had contact with over 100 customers in one way or another. Never, not once, not a single time, have I EVER been asked to avoid mention of a limitation of one of our products. In fact, the opposite has been true – the folks I work with do not have the time or patience to deal with the customer satisfaction issues associated with the implementation of a mis-represented product. Everyone from the area manager to the most junior reps is very clear that our first priority is not to make the sale, but to make sure that anything we do sell meets the customer’s requirements.
Nobody at NTAP asked me to misrepresent products, either. But the reality of NTAP’s catalog of products tempts people to position something that might not be in the customer’s best interests. An example might be a customer that’s happy with their storage, but is having difficulty managing 10GB Exchange mailboxes. If all you sell is storage, then the temptation is to address the storage need, and sell snapshots as a way around some of the difficulty. At EMC, we’ll help the customer’s complete environment by introducing methods to manage the information in a more holistic manner. That’s why people keep coming back.
Now, EMC is a big company, with lots of areas and lots of products. So some of the 35,000 people that work here might just be out to sell people stuff they don’t need and that doesn’t work – like the Chad look-alike in the MadTV segment. But the activity on the internal EMC distribution lists indicates to me that the vast majority of customer-facing EMC folks conduct business like I do. And any that do operate like that won’t be here that long. The managers will very quickly tire of giving away gear and software to address customer sat problems.
In reality, the notion that we conduct business fast and loose like that doesn’t pass the sniff test – if everything we sell is broken, costs more than our competitors to implement, or doesn’t deliver on expectations, we wouldn’t be growing. I think we can all agree that the used car salesman approach doesn’t scale to the $13B level. To get there, you need repeat (and happy) customers.
BTW Chad – you absolutely need to get one of those powder-blue suits for the TC conference. I’ll pay for it if you can’t dig one up.
Jeff Browning says
First, congrats on the blog. You actually beat me to the punch. I do like the title. Imitation is the best form of flattery.
Second, responding to Mike’s comments about thousands of EMC folks becoming happy NetAppians, I will say that that process goes both ways. I have personally been involved in recruiting many NetApp folks into EMC (now that my non-recruiting clause has expired), and those folks are very happy EMCers indeed. Let’s be honest. There is a lot of cross-pollination between NetApp and EMC. As Chad and Dave Hitz agreed in their mutual blog posts a while back, the competition between EMC and NetApp is very healthy. Having worked at both companies I can tell you that both companies have virtues and vices. Neither is perfect, nor is the other evil. Both are full of honorable people who are trying to do the right thing for their customers.
And finally, responding to Vaughn’s original post, I find your presentation somewhat technically inaccurate. I watched Chad’s chalk talk which you kindly linked to, and Chad did say that the VDI data could be stored on EMC using either iSCSI, NFS or FCP. In the case of either iSCSI or FCP, it is true that there would be a LUN for each client OS image. In the case of NFS, there would be a file for each such image.
Either you have thousands of LUNs or thousands of files. On NetApp a LUN is a file. If NetApp did the same thing using iSCSI or FCP, then that would involve both thousands of files and thousands of LUNs. Does this make that significantly more complex?
Honestly, I would not claim that. Either way, thousands of items are going to need to be managed. Fortunately, computer software has a marvelous way of handling large numbers of similar items fairly conveniently. I could design such a piece of software. Fortunately, VMware has already done that.
In the case of the Oracle virtualization solution that I created recently, and which will be shipped by EMC in our upcoming launch, I used a pure NFS solution for all VMware OS images, for both the database servers and the clients. VMware cloning of these images was heavily used, with all of the savings in terms of space and time that implies.
Yes, EMC does NFS too. NFS has a place in the VMware space. And EMC’s NAS solution, the Celerra NS series multi-protocol array, is coming on very strong.
Dan Baskette says
Hey guys, I have spent a TON of time doing this for 10 clients, 100 Clients, and now 1000’s of clients. LUNs, File Systems, Used Cars, whatever… That’s really not the hard part of scaling these solutions. In fact, there really is no LUN Management to be done. Software sutomates the creation of the LUNS, Software sutomates the Masking of the Luns, Software automates the surfacing of the LUNS… it renames everything and then hands it off to ESX. So whether it’s 1 Storage object or 1000, I don’t really see why it matters.
This is where the fun begins. Registration of 10,000 Clients is no small feat. Booting 10,000 Clients is even more fun. Software automates all that as well, but limitations in areas outside of the storage make this a bit more challenging.
Okay, now you have your VMs and they are registered with VDM and pools are created. What’s the strategy for patching? Do you patch the gold image and then re-snap EVERY TIME? How do you bump the users off to accomplish that task? How do you handle your Anti-Virus software, which typically needs to be locked to a particular MAC Address? What’s the AD Strategy for grouping of the new objects and retiring of the old objects when new snaps are created?
My point here is that how many images you put in a FS or on a LUN is such a tiny part of this solution that it’s a pointless argument. The Management of the overall solution is a FAR more challenging task and where the real work for our customers begins.
Dan Baskette says
Apparently, I need software to automate Spell-checking as well. AUTOMATES. that’s not that tough to spell, huh? Sutomates?
The VMware folks told us that we can use the next release of VDM and a new cloning technology called SVI. This looks like the technology that we will use when deploying VDI as it will be using the native vmfs volume.
Dan, with the next release you can use the VMsafe API for AV protection and for what everyone is concerned about, namely rootkits.
Is that the next gen vdi solution that will change the way ntap and evil machine corp will approach vdi, or will you take a hybrid approach?
Either way I think the citrix approach is the best way to do vdi.
Don’t get me wrong, VMware is the best there is for servers, but citrix owns the desktop space and I don’t see any solutions from EMC.
Chad Sakac says
Terry, our plan is to do the hybrid approach.
Now, it’s still too early to see how VDM 3.0 and SVI scale up, but we’re on the second beta drop now, and so far it’s looking pretty good.
A lot of stuff we found in the process of the scale-up test with VDM 2.1 and the array mass replica approach were just a lot of non-storage related challenges per se (for example, the rate at which VMs could populate VC, the VDM pools, and AD itself as they initially booted). VDM 3.0 is much faster than VDM 2.1, and there are a ton of important features beyond storage savings that it adds.
The goal of exercises we’re all undertaking (I can speak for ours) wasn’t actually to show off, but to find out how it would work for real.
I just got back from VMworld, and while we were there, I made the offer to the NetApp team (our booths were right beside one another) that perhaps we should consider doing the testing together in the next round.
We have a few more tests to run on the current 2.1 rig and then will do a mass update.
If VDM 3.0 and SVI (or VMware View Composer as it’s now called) can scale well, it can do something (at least in my opinion) much better than the pure array based methods can do, which is the whole desktop lifecycle.
I don’t entirely agree with the VMFS user redirection that is in the VDM 3.0 preview (a.k.a. the beta build) and that VMware is proposing. I think there is a strong argument for CIFS redirection for that portion of the desktop (better scale, archiving/dedupe techniques can be applied, simpler snapshot-based backup/restore).
Perhaps the VMFS user data folder mechanism will work well for people without NAS solutions, and it certainly will be very “easy”, but they will have to think a lot about backup/recovery (and man, VCB – aka vStorage Backup Framework now – is NOT the way).
We’ll see, and we’ll report the findings as fast as we can.
So, netting out the EMC view:
1) VDM 3.0/composer seems like a good solution for the desktop boot disk. We will see if there is a sustained customer value for a blended approach (perhaps leveraging the vStorage block list tools)
2) I think CIFS is the right way to do the user folders for customers as soon as they get reasonably large – but we are open-minded to see how well the VMFS mechanisms work.
3) We want to see how well thinapp can work for real-time app streaming.
That 1-2-3 approach is going to be the next focus round here at EMC in conjunction with VMware, and hopefully our respected colleagues at NetApp.