One of the original projects of OpenStack at the time that it launched was Swift, an object storage platform born out of Rackspace. A couple of months ago, we interviewed Joe Arnold, SwiftStack Founder and CPO. This week, we talked to John Dickinson, who serves as Director of Technology at SwiftStack and Project Technical Lead for the OpenStack Swift project. This was a notable day for John and the global SwiftStack contributors because just minutes before the podcast started, two years of effort had finally paid off with the addition of a new erasure coding feature. Watch the video, download the podcast, or read the transcript below to learn more.
You can follow John on Twitter at @notmyname and on IRC at the same name. His blog and other links can be found on his calling card page.
Jeff and I are taking the show to the OpenStack Summit in Vancouver! If you'd like to be considered for a guest spot, tweet us at @openstackpod
See past episodes, subscribe, or view the upcoming schedule on the OSPod website.
Jeff Dickey: Good morning, everyone. I'm Jeff Dickey from Redapt.
Niki Acosta: I'm Niki Acosta from Cisco and we have an awesome guest with us today. John Dickinson aka @notmyname on Twitter. Please introduce yourself.
John Dickinson: Thanks for having me. My name is John Dickinson. I work on OpenStack Swift where I am the Technical Project Lead and I've been doing that for a while. That's what I do.
Niki Acosta: We're excited to have you. We had Joe Arnold on the show a while back, definitely somebody technical as well. You're probably a little bit more involved in the day to day within your role at SwiftStack and your PTL role at OpenStack. I definitely want to know more about the life of a PTL, but before we do that, tell us about you. How'd you get into tech? Were you the geeky kid with the computer?
John Dickinson: I was. Yes. That's how it seems to start a lot of times. Let's see, how did it all start? It all started when my grandfather gave my family an old Commodore 64 and I started working on that, playing games. The really cool thing about that era of computer was that it came with a big, thick manual. The manual was, here's how to write a program for this, and with the computers of that era, you had to choose not to program them.
I naturally got into that and started figuring out how to make this little box do what I wanted it to do, which was cool. The really cool thing is that I'm doing today what I wanted to be when I grew up. I've wanted to do programming and stuff like that for decades now. It's cool because it's pretty fun, and so I'm self taught and school taught and have been doing it professionally for about ten years or so now.
Niki Acosta: How did you get into OpenStack?
John Dickinson: Almost by accident, I guess. At the time I was ... Slightly before OpenStack, I was employed but looking for another job and ended up landing at Rackspace. The day that I started at Rackspace was pretty much the day that Cody and Ernest started off to rewrite the storage engine behind the Rackspace Cloud Files product. We started working on that and worked on it for about nine months and launched into production, but a few weeks before we had done that, maybe a month or so, the Rackspace executives came to our team and said, "Hey guys, what would you think about open sourcing this?" I'm like, sure, that sounds great. It turns out that that was a part of OpenStack, so that became the Swift project, and so I've been a part of OpenStack since the very, very beginning.
Niki Acosta: Will you give us some insight into the history of the Swift name? How did Swift get its name?
John Dickinson: I don't know. It was chosen way before I joined Rackspace. It was something that was sketched out in various incarnations. I will say something that probably has absolutely nothing to do with how the name actually came up, but it's a really cool fact. A swift is a kind of bird, right? What's really interesting is that swifts are the birds that spend the most time in the air at one time. Seriously, the birds spend something like six months flying before they ever land. You can say the swift is the kind of bird with the most uptime, just like OpenStack Swift has an extraordinarily high amount of uptime.
Niki Acosta: That's so cool. That's really geeky and really cool at the same time. On your screen it has your name down there and it has the little Swift Bird and your little icon. I had no idea that Swifts flew that long without landing.
John Dickinson: It was pretty cool when I figured that out too. I think it's a complete accident that we're called the same thing. It's still a really cool thing and I think it's something fun that we share.
Niki Acosta: Go ahead Jeff.
Jeff Dickey: I want to know more about Swift and the evolution of it from where you started with it to where it is today.
John Dickinson: It's really been cool. I've been a part of working on Swift for about five years now. The original goal was to rewrite the storage engine behind Rackspace Cloud Files to replace something that was already there, to solve some of the problems that it was having with scaling. The marching orders that were given were basically two things: make it better, and customers can't know. In other words, you have to support the same API and be able to have some sort of migration plan. We completely solved those things. It was better in every measurable way.
We were able to have a migration plan for all new customers immediately going on there and, over the next several months, migrating the existing data from the existing system into Swift. Since then, one thing that's been tremendously exciting is that although Swift started as a storage engine in a public cloud hosting provider, something that was very much analogous to S3, it's continued to grow and meet more use cases for different people out there. In addition to being something that is used by many public service providers today all over the world, it is something that is used internally by a lot of different companies as well.
Those sorts of different use cases and those sorts of different experiences are things that have really shaped the evolution of Swift itself over the years. We've been able to do simple sorts of things like, oh great, now we have the ability to create signed URLs so people can share URLs to content. That's fine. It's kind of a small feature, but it's really powerful and really interesting. We also added support for arbitrarily large objects as the different use cases came up. It has also resulted in some incredibly huge and substantial features that are directly the result of these different people doing stuff.
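To make the signed URL idea concrete, here is a minimal sketch of how such a link can be built, assuming the HMAC-based scheme used by Swift's TempURL middleware (a signature over the method, expiry time and object path, passed as temp_url_sig and temp_url_expires query parameters). The host, account, object path and secret key below are hypothetical, and the SHA-256 digest is an assumption; check which digests a given cluster accepts.

```python
import hmac
import hashlib
from urllib.parse import quote


def make_temp_url(key: str, method: str, path: str, expires: int,
                  host: str, digest: str = "sha256") -> str:
    """Build a time-limited, signed URL for one object.

    key     -- secret shared between the account owner and the cluster
    method  -- HTTP method the link is allowed to perform, e.g. "GET"
    path    -- object path, e.g. "/v1/AUTH_demo/photos/cat.jpg" (hypothetical)
    expires -- Unix timestamp after which the link stops working
    """
    # The signed message is method, expiry and path joined by newlines.
    message = f"{method}\n{expires}\n{path}"
    signature = hmac.new(key.encode(), message.encode(),
                         getattr(hashlib, digest)).hexdigest()
    return (f"{host}{quote(path)}"
            f"?temp_url_sig={signature}&temp_url_expires={expires}")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(make_temp_url("not-a-real-key", "GET",
                        "/v1/AUTH_demo/photos/cat.jpg",
                        1700000000, "https://swift.example.com"))
```

Anyone holding the resulting URL can fetch that one object until the expiry time, without needing an account or a token, which is what makes the feature useful for sharing content.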
A couple of years ago we released support for global clusters so that one logical Swift cluster could span continents and oceans. Again, not something that was a part of the original use case, but it has since been used many times by people who need to have either locality of access for their content or just need to have their data durably spread across a wide area so that they can survive even entire data centers going down. Since then, last year, we released something called Storage Policies, which allows you to directly expose nuances of your particular deployment, whether that's locality or region or particular hardware.
You could, for example, say this data is going to be spread across the US east coast and US west coast, and this other data is only going to stay in Asia and won't ever leave Asia. You could also say this kind of data is going to have a different SLA, perhaps a different pricing model, because it's going to be backed by flash storage instead of spinning drives. That's allowed us to have a lot of flexibility in how deployments look, so that deployments on the public cloud side, such as the one at Rackspace, look very different than, say, a deployment at a gaming company or CDN provider or some other web or mobile content hoster that's more on the private storage side.
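From a client's point of view, choosing a policy can be sketched as below. This assumes the X-Storage-Policy header that Swift accepts when a container is created; the policy names, storage URL and token are hypothetical, since the actual policy names are whatever the cluster operator has defined. The request is only constructed here, not sent.

```python
from urllib.request import Request


def create_container_request(storage_url: str, token: str,
                             container: str, policy: str) -> Request:
    """Build (but do not send) a PUT that creates a container under a policy."""
    return Request(
        url=f"{storage_url}/{container}",
        method="PUT",
        headers={
            "X-Auth-Token": token,
            # The policy is chosen once, when the container is created.
            "X-Storage-Policy": policy,
        },
    )


if __name__ == "__main__":
    # Hypothetical policy names an operator might define.
    for container, policy in [("us-media", "us-both-coasts"),
                              ("asia-records", "asia-only")]:
        req = create_container_request(
            "https://swift.example.com/v1/AUTH_demo", "example-token",
            container, policy)
        print(req.get_method(), req.full_url, dict(req.header_items()))
```

Every object written into a container then follows that container's policy, so an application only has to pick the right container for its data.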
Recently, the thing we've been able to build on top of that, again in response to what these people are asking for, is support for erasure-coded content, and that's something we've been working on quite a bit. We actually just recently finished it up.
Jeff Dickey: That's awesome. I know people have been asking for that for a long time. Can you describe, too, what that is?
John Dickinson: Yes. The technology has been around for a long time. I think it was first invented or proposed and described in the 1960s. It's been used forever. If you've ever watched a DVD, you've used erasure codes, or if you've used a RAID card, you've used erasure codes. It's a way that you can get really high durability in the case of failures without using as much raw storage as is required in a pure replication storage model. In the pure replication model, let's say you're going to have three copies and you're going to store those on different hard drives on different servers and things like that, which is really great.
If a whole server goes down, then you still have access to your data. The problem is that in some use cases, especially when you've got large content like backups and stuff like that, if you're going to store one gigabyte of data you've got to store it three times, so that's three gigabytes that you have to store. With erasure codes you can break it up into some smaller pieces and compute some other pieces, called parity pieces, and store those. Maybe you're effectively only storing, say, 1.6 or 1.8 times the amount of data. You only need 1.8 gigabytes to store your 1 gigabyte piece of data.
You can still be protected against many failures when you're doing that, as many or even more failures than in replicated storage. The cost, though, is that it takes a lot of CPU horsepower to do this. It's not particularly good for every use case, but it is really good for those use cases where you have this large set of data that needs to be stored fairly cheaply and you're not going to be accessing it frequently. We've been working on that inside of Swift for the past year or two, depending on how you're counting, and literally in the last 48 minutes we have been living in a post-EC world in Swift. Working late last night and early this morning to get all of that stuff merged has been a truly global effort from a lot of really talented and great to work with contributors in the Swift community.
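The arithmetic behind those numbers, and the basic idea of rebuilding a lost piece from parity, can be sketched as follows. This is an illustration only: it uses a single XOR parity fragment, which can repair just one lost fragment, as a stand-in for the more general codes a real system would use, and the 10+6 and 10+8 fragment counts are assumed examples chosen to match the 1.6x and 1.8x figures mentioned above. All function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from typing import List, Optional


def storage_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    return Fraction(data_fragments + parity_fragments, data_fragments)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def rebuild(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can only repair one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


if __name__ == "__main__":
    print("3 replicas:", float(storage_overhead(1, 2)), "x raw storage")
    print("10 + 6    :", float(storage_overhead(10, 6)), "x raw storage")
    print("10 + 8    :", float(storage_overhead(10, 8)), "x raw storage")

    pieces = encode(b"backup archive bytes", k=4)
    pieces[2] = None  # simulate a failed drive
    restored = rebuild(pieces)
    print(b"".join(restored[:4]).rstrip(b"\0"))
```

Three-way replication survives the loss of two copies at 3x the raw storage, while a 10+6 layout would survive the loss of any six fragments at 1.6x; the price, as described above, is the CPU work of computing and decoding the parity.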
Niki Acosta: You're a PTL and you have been for quite some time I think, right?
John Dickinson: Yes.
Niki Acosta: What is that like? I think a lot of people just think this is the guy that's going to say no to my code or ... You actually do have the ability to make decisions but you're also held accountable by the community, right?
John Dickinson: Sure. Yes. It's an interesting balancing act because inside of an open source community there's not ... I would say it's very similar, and OpenStack itself, to be honest, is very similar, to the organization of a large company. You've got different levels of bureaucracy and you've got different levels of people talking to each other and politics and all that kind of stuff. The difference is that you don't have nearly as much accountability within that as you would in a company, not to put too fine a point on it.
You can't actually hire and fire people inside of an open source community. Which means that getting people to work together is much more along the lines of making sure that people have the tools they need both to get their work done but also to know what is being done by other people and how to take that to their employer and tell that story to their employer and to show this is why the community is good and this is why we're working on these sort of things because it helps us over here. It means that there's a lot of time spent on making sure that people aren't stepping on each other's toes.
They're working together and not against each other and that the things that are being focused on are actually things that are moving the entire project forward rather than people going in a hundred different directions at once. At least in my experience the life of a PTL involves a lot of that sort of thing. It does still involve a little bit of writing code, not as much as it used to. It does still involve reviewing code and making sure that I can be available for anything that's necessary there.
It also involves a lot of talking to people and making sure people know what's going on, even outside of the developer community, not just in the immediate Swift community but in the broader OpenStack community and even the ecosystem as a whole, figuring out that, okay, we know that, for example, HP is seeing this issue and has this concern and needs to work on this. Then we see that Intel is over here doing this and Red Hat's doing this and SwiftStack is doing this and so on and so forth.
You've got all of these different people together and it's been really fun building the tools and the processes and staying involved to help people out. Everything from tech support to evangelism to actual typing code sometimes and making sure the other people who are primarily doing the code writing, reviewing are able to do their job with as few distractions as possible.
Niki Acosta: We didn't define PTL for people, that's my mistake, first. For someone who's not familiar with that term, can you explain what that is?
John Dickinson: It stands for Project Technical Lead, and the basic idea is that inside of all of the OpenStack projects, in some ways it's the point person, it's the face of the project to the rest of the community. It's someone who's elected every six months by the people who are contributing to that project. It's chosen from a group of your peers and by the community in order to ... In some cases set directions and priorities, but in a lot of cases to coordinate and to make sure things still happen and to be that face and voice in the community. It's the person who gets the short straw every time somebody needs to stand up on stage and say something or type up things, or, if there needs to be a tie-breaking decision, the person who can do that and whom the other people will trust and respect to do so.
Niki Acosta: That's not a long tenure for people. Six months is not a very long time. How do you grapple with staying in your position of PTL? Do you have to be the ultimate nice guy and make a lot of friends or is it based on technical merit?
John Dickinson: I think it's a lot of both. In some cases I'm fortunate to ... Nobody else has wanted to do it, so I'm the one who's, okay, I don't mind the tedious parts as much as I enjoy some of the other parts, so it's ... To be honest, there's been nobody else who's stepped up to say yes, I'd really like to do this. There are many people in the community who are very capable and able to do it if they chose to do so, but for the time being they are happier not focusing on a lot of the external facing work and instead focusing inward on what's happening to the Swift code itself. I've got to run for my job every six months, but so far it's been a joy to do and not a burden, and something that I'm proud to do, and I'm really honored to work with all of the people in the community. If there's anything good that's happening in Swift it's not because I'm PTL, it's because of the awesome contributors.
Niki Acosta: That's how you keep your job, ladies and gentlemen. You're such a good representative for your project. How much of your time ... I certainly understand that Rackspace was super open source friendly. Obviously as a founder of OpenStack they were, yes, go be a PTL and at SwiftStack you have that same luxury. How much of your time do you spend working on stuff for SwiftStack directly versus PTL duties?
John Dickinson: I spend 24 hours on both! The really great thing about being at SwiftStack is that my duties as PTL do not conflict with my duties as a SwiftStack employee. We are very focused on providing an awesome storage product for our customers and on making sure that people's storage problems are solved. One of the key pieces of doing that is to stay involved in the open source community so that we know that our customers have a voice inside of the project itself and that we are also able to bring to our customers high quality software developed by world class engineers all over the world. The two work together very, very well in that, and I've yet to find any conflict of interest or any sort of tension between the two. Occasionally, it gets busy, but for the most part it's a joy to do both, and by working for one, I'm also working for the other at the same time, and there's been no struggle there. It's been great.
Niki Acosta: A lot of what you guys do, if I remember correctly based on our conversation with Joe Arnold, pretty much all of the stuff that you guys do is contributed back, right?
John Dickinson: Yes.
Niki Acosta: Where do you draw the line between your IP and the stuff ...
John Dickinson: The truth is we sell software, so if we're giving it all away, how are we making money, sort of thing. Anything that affects the actual storage engine, and pretty much what you're talking about there is the read and write data path for the object storage system, we contribute. That's everything from performance improvements to security fixes to new features, things like global clusters and erasure codes and things like that that we've been able to take a leading role on.
Beyond that, though, I think it's important to consider that Swift, like all of the OpenStack projects, is not a product; it is a project. It is an open source code project, and it's almost unfair to compare OpenStack projects to other products that are out there in the world, in the wild, because a product has a lot more around it. A product has a lot more polish around it. It has a lot more of the operational things worked out. It has a lot more of the deployment things worked out and it's needed ... you've got a lot of those integration points that you've got to deal with in a product.
Those are the kind of things that we do for Swift itself to bundle up a product called SwiftStack. To do that, we have spent a lot of time working on the integration with the rest of the IT infrastructure, so making sure that it works with your existing identity management system so you don't have to have something brand new, whether that's LDAP or whatnot. Being able to integrate with chargeback and billing systems so that a company can easily see who's using what and monitor that and appropriately measure that.
Just figuring out what's going on in a cluster with alerting and monitoring and metrics, and displaying those so that operators know what's the state of the cluster and what should I do when something happens. No matter who you are and no matter what's going on, if you're deploying any storage system you have to have those sorts of tools. If you're doing Swift, you're going to have to have that information. You need to know what's going on. While Swift has a lot of data that it tells about itself that you can hook into to figure out what's going on, putting it all together and integrating it into the rest of the systems that you may have is exactly what SwiftStack is doing, so that you can actually focus on building apps instead of worrying about your storage system. We spend a lot of time on that sort of thing.
On the question, I guess, of where we draw the line: if it's about the object storage system, it's completely, one hundred percent open source. There's no lock-in on that and we are using upstream Swift code. We don't have our own special version of Swift that we're doing. If it's about the operational tool sets and the monitoring and, say, a file system gateway on top of that, those are the types of things that we have as part of our product that our customers have access to.
Niki Acosta: In terms of Swift, a lot of the criticism that I've heard is that it's really freaking hard to ... it's not that hard to install but it's really hard to scale. Do you think that's true? I mean you're a Swift expert, but do you think that's a solid deduction?
John Dickinson: I think that's a good question. That's an interesting thought. There's two ways you can take that. One is the question on does Swift itself scale and the answer is unequivocally, yes. That's not just me saying that as an advocate for Swift. That is me saying that with the backing of many, many, many multi petabyte deployments on Swift some of whom are very large including people like Rackspace and HP and even other private ones. The truth is, yes, Swift is a very massively scalable object storage system.
As for any kind of struggles you have with scaling Swift and managing that, at times I've in fact heard somewhat the opposite: the great thing is that once you've got it installed, it's very easy to add new capacity and you don't have to have any sort of downtime to do that. Yes, you are dealing with software that's running across a cluster of machines, so yes, there are operational and management needs that you have there. That again, to plug it, is part of what we do at SwiftStack.
That's absolutely something that we're very concerned with in the open source project as well, to make sure that the operator's needs are accounted for and that we can always maintain things like, great, we've got a stable API and you're always going to be able to upgrade with no downtime. You're going to have a stable release that is not going to frequently change API versions or config variables, and you're always going to have a migration plan from one to the next.
Niki Acosta: With the customers that you're working with, are you seeing that people are migrating data to Swift from something else?
John Dickinson: Yes.
Niki Acosta: If so, what are they migrating from? What is Swift meant to replace or augment?
John Dickinson: That's a really good question. The answer is yes. There is ... A lot of people are migrating to Swift away from something else. I'll talk about what those are in just a second. In general, let's talk about what is Swift good for? Why would you ... What are the use cases that you would actually use for it? There's lots of other projects that people ask about and say well how does it compare to this other storage system X, Y or Z or whatever it may be. Swift is designed for storing unstructured data. What's that?
Think about ... The common thing that we think about a lot is files. Think about documents. Think about your pictures and videos. Think about stuff you see on the internet. That is unstructured data. Basically it means that if you and I, Niki, are using the same phone app, for example, and we each take a picture, it doesn't really matter. Those are completely unrelated pictures. It doesn't matter where we were when we took those or what order those specifically happened in; they are unstructured relative to one another. In aggregate, all of our pictures together are just a big bunch of unstructured data. Another really common use case, and one of the things people start using Swift for initially, is backups. I've got a server, I've got a lot of servers, and many times it's using something like Nova. I've got all of these servers and need to back them up, so where do I put all the backups for those servers? You can take a backup of a server, you can put that inside of Swift, and now you know it's going to be available when you need it. Hopefully you won't. If you do, it's right there, and it's also going to be durably stored so that you're not going to lose it even if you have hard drives go down or servers go down or need to replace them over time.
When you're asking the question of where people are migrating from, many times it's people who have been trying to solve these problems already and have run into scaling problems, either technically or financially, with traditional storage systems. There is definitely some degree of migration away from some public cloud providers like Amazon S3 or something like that. Many times people will start balking at a bill that is measured in tens or hundreds of thousands of dollars a month. It's dominated by network costs, and maybe they don't even want to give another company their data, so they need to keep that in house because that's their ... that's actually key to their business.
In that case there are people who are using Swift for cost savings over public cloud storage. On the other hand, there are a lot of people that I see, especially people who are doing more of the private cloud thing, who are migrating away from traditional storage providers, NAS and SANs, things like that, that really have inelegant scaling and upgrading models and get very expensive over time, especially when you're locked into some sort of support contract or something like that. I see all of that, and that's much more common than the general, I think I'm going to have this idea and it's going to store a lot of data, I guess I'd better deploy Swift and do this greenfield deployment.
To give you a really practical example, or a few really practical examples: one use case of Swift I like, as far as public web content goes, that we talk about a lot is Wikipedia. If you go to Wikipedia and you look at a picture, it's coming from the open source cluster that they are running. You've got this web content that is incredibly popular, one of the most popular websites on the entire internet, and the data has to be there. By the same token, looking at those pictures and videos, you've got businesses whose livelihood requires that the content be there, like online auction sites.
If there's not a picture of whatever you're trying to buy, you're not going to buy it. They're using Swift to reliably store that data. There are a lot of people who are using Swift for media and entertainment, videos and pictures and games. I recently talked about a CDN provider who's using Swift as an origin for a lot of the content that they're serving, especially content that's popular in the gaming industry. The financial industry is really sensitive about where their data lives. A lot of them are saying, I've got all of these documents and I can't give them to a different company, so I have to make sure that they are safely and securely stored in my own data center and I need a way to do that scalably. They're using Swift for that.
Niki Acosta: Does SwiftStack have any opinions on the actual underlying hardware that customers can use?
John Dickinson: Yes. Swift itself is hardware agnostic but we ... That's one of the things that we do here at SwiftStack: we definitely work with other companies like Redapt (hi, Jeff) to make sure that our customers have a set of hardware that is tailored to their use case and is actually something that we can support well, just to make sure that they're successful.
Niki Acosta: What are your baseline hardware recommendations? Are you putting a ridiculous amount of drives in commodity hardware or does the [inaudible 00:30:44] matter?
John Dickinson: In a lot of cases, yes, that's what it looks like, but I think the important thing there is it's generic off the shelf hardware. It's not any sort of custom hardware. If your preferred vendor is HP, good, you can use HP. If it's Dell, you can use Dell. If it's Cisco, you can use Cisco. You can do all of that kind of stuff and you can tailor it for your exact needs. In a lot of cases, yes, it does look like I've got a box with a little bit of CPU and a whole lot of drives. That's actually one of the fun things that I get to do: I get to talk to these guys who are these storage media vendors.
I've got a box of 8 terabyte drives sitting on my desk that I'm about to rack up for a testing cluster, and I'm going to get another box of those in the mail pretty soon from a different vendor. The really great thing is that as this new storage media is coming into play, we get to make sure that Swift is keeping up with that, and actually we're being approached in the community by these people who are making the underlying storage media to say, hey, we want Swift to work natively and well with this out of the box, the moment we launch stuff.
It's not only new technologies with hard drives but flash vendors too, and even recently there's been some noise about a couple of companies that have written tape library connectors for Swift, so that if you need the ultra cold, ultra large scale archive data storage sort of thing, there are people out there who are actively working on that right now in the ecosystem. It's been really exciting. That's actually what's really exciting me a lot right now in the open ecosystem: being able to work with the storage media so that Swift can natively talk to those very, very well. Ultimately the whole thing is about applications, and application developers really don't care what kind of hard drive it is; they just want to make sure the data is stored.
Being that abstra