Ready to learn about Trove? Oh, sure, you know it's OpenStack's database project. But do you really know what it does?
Amrith Kumar is the founder and CTO of Tesora, and on OpenStack Podcast #26, he sat down with us to talk about Trove, Tesora, and the database applications they work with. Specifically, he covered:
To see who we're interviewing next, or to sign up for the OpenStack Podcast, check out the show schedule! Interested in participating? Tweet us at @nextcast and @nikiacosta.
Jeff Dickey: Good morning everyone. I am Jeff Dickey from Redapt.
Niki Acosta: I'm Niki Acosta from Cisco and we have an awesome guest with us today. Amrith Kumar please introduce yourself.
Amrith Kumar: Hi. Good morning or afternoon to you wherever you are. My name is Amrith, I'm the founder and CTO of Tesora. We're a small company and we work on Trove, which is the OpenStack database as a service project.
Niki Acosta: Awesome. We are really excited. We've had a lot of guests that cover storage, a lot of people obviously running compute, but not too many doing Trove. We're really excited to dig into what Tesora is all about. Typically, we like to start by asking about you. How did you get into tech and how did you end up as the CTO and founder of Tesora now working on OpenStack?
Amrith Kumar: How did I get involved in tech? I guess I've always been interested in how things work. More interested in taking things apart and putting them back together. There's a large number of slightly dismantled coasters at home for no apparent reason. With computers, I've been working on a whole bunch of different things. A couple of database companies, a couple of companies building very large enterprise hardware. More recently, I got interested in this whole notion of how people consume software in the cloud. Everybody wants everything as a service. Compute as a service with things like Nova and database as a service with Trove seemed like really attractive propositions. I've been working on this now for the last 18 months.
Niki Acosta: Was Tesora ... Did you come up with roots already in OpenStack or did OpenStack seem like a natural extension for what you were already trying to do?
Amrith Kumar: Yeah, OpenStack was a natural extension. About four years ago was when Ken Rugg and myself started the company. We were always of the opinion that people are going to want to consume databases as a service. The old way of consuming a database of course was you bought yourself an expensive server and you bought yourself some software and there was a lot of cursing and swearing involved and eventually maybe you had a database.
The way in which people want a database today is: go to a webpage, click a button, you get a database. Logically, there's a set of problems you need to solve in order to make this possible and OpenStack seems to be an excellent framework in which to solve a lot of those. Trove was a logical project for us to gravitate towards.
Niki Acosta: As a sort of platform for databases, what databases do you offer as a service?
Amrith Kumar: Trove currently supports relational and non-relational databases. It supports MySQL and PostgreSQL on the relational side. It supports Mongo, Cassandra, and Couchbase on the non-relational side. This is as of the Kilo [inaudible 00:03:13]. Trove only came out as an integrated project in Icehouse. It's not been that long. In Juno we had a bunch of features, and in Kilo we're also going to introduce support for Vertica, support for DB2 and support for CouchDB.
Niki Acosta: Tell us about the customers that are using your service. I think there's still probably a large number of people that think, "I have to do this in-house. I have to do it on my own hardware. I'm not ready to put stuff in the cloud." Who's using Tesora today and what are they using it for?
Amrith Kumar: Let's start with who's using Trove today. Trove has been around for a short period of time, and there are people like, say, eBay. eBay runs Trove in their data centers. HP's cloud operates Trove as part of their cloud offering. Rackspace operates Trove as part of their cloud offering. There's a whole long list of people who use Trove. There's a lot of people who are using our software and who are bringing our software into production in a variety of different verticals. Financial is one area where people are doing this. Telco, there's a lot of telcos who are using this.
Think about people who are already using OpenStack who've got large amounts of data, it's just a matter of time before they realize that it's no fun manually configuring and operating databases. Over time everybody will start using Trove in some form or another and we offer the best version of Trove and the most complete version of Trove. That's where people will gravitate to. We have a community and enterprise edition and the enterprise edition has a whole lot of features over and above Trove.
Jeff Dickey: Who's your typical customer? Are these DBA folks? Are they infrastructure folks or devs? Who's typically using and integrating Trove?
Amrith Kumar: Typically, you see people in an IT organization who are looking to offer database as a service within the [inaudible 00:05:19]. Think about it, there was a time when if you were a customer within the organization you wanted a database. You made a request to IT, some months later you got a database. That's not going to fly very far. IT organizations are being required to offer database as a service within their own companies.
They are therefore looking to deploy database as a service, and therefore they are using Trove. That's one class of user. The other is people who are offering a public or managed private cloud and looking to offer database services as part of that. We have some customers who are doing that as well. If you go to get database as a service from them you're effectively using Tesora's DBaaS under the covers.
Jeff Dickey: Can Tesora be used separate from OpenStack like Swift or is it very coupled?
Amrith Kumar: Tesora is an extension of the Trove project and Trove is basically built on other OpenStack services. Trove is a database as a service project in OpenStack. When Trove wishes to spin up a VM it goes to Nova. For identity management it goes to Keystone, for storage it goes to Cinder or to Swift. By its very nature, Trove cannot be operated independent from OpenStack. It's an interesting question you asked and definitely, something we've thought about a lot. There's utility in being able to get the same Trove look and feel into other clouds. We're doing some things in that area.
Niki Acosta: Obviously, you know a lot of people take OpenStack projects and they add their own secret sauce or their own IP. Where does your IP reside within Tesora?
Amrith Kumar: First of all, let me say we are a major contributor to the open source version of Trove. We're the largest contributor to Trove at this point. A lot of what we do is in the public. There are some databases, which we support, which Trove doesn't support. An example is Oracle. We recently introduced support for Oracle. By virtue of the fact that Oracle is a closed source database and you can't just redistribute Oracle under Apache, there are components that we have which are part of our enterprise product. Remember our enterprise product is also an open source product. Our IP is therefore in the things we do in that product and the licensing which go along with it.
Jeff Dickey: How does that work with Oracle? Are they fans of you guys? It seems on one hand you could spin up a bunch of Oracle licenses, on the other hand it is a little bit different than their model.
Amrith Kumar: At the end of the day it's what licensing Oracle has for the Oracle server. We're not in the licensing game at all. If you were to spin up an Oracle instance, it behooves you to get an Oracle license for it. We're not in any way going to change that part. What we're doing is we're making it easy for you to be able to say, "I have a standard form deployment that Oracle would like to do. I'm using OpenStack. I want my Oracle instance for my development desk in my OpenStack environment, how do I do it?" That's all we do.
Niki Acosta: In terms of the database technologies that you guys support. Certainly, over the last couple of years, it seems like there's always a database that's hot for a minute and then we move on to something else. What are the hot databases right now and why?
Amrith Kumar: What are the hot databases and why? Right now if you look at the majority of people who are looking to get a database, Mongo is one, which they've tried. Cassandra is one, which they've tried. If you're using relational databases, it's probably going to be one of MySQL, SQL Server or Oracle. Each of these databases has specific benefits and limitations. The value of OpenStack Trove is that you don't have to go decide upfront which database you want. You could try a database. Figure out whether it works for you or not. If it doesn't work, very inexpensively, try the next one. That's where Trove comes in.
There used to be a time when you needed to make a costly decision: which database do I want to pick up front? That would drive a lot of your design choices and in a lot of ways restrict the things your application could do and the way in which you could innovate. You can always say I'm starting with MySQL but down the road, I find there's a particular use case where MySQL is not the best one and I need to do both. You could say, "I tried Cassandra and it's not the right one for some reason and I now want to go to Mongo." You could do that too. That's the benefit of Trove.
Niki Acosta: Do you help customers if they are wanting to explore different database technologies? Are you guys helping them with migration of the data behind the scenes or is that something that the customers have to handle?
Amrith Kumar: That is not something we're directly helping people with. In most cases, people already know how to do this. Trove makes it easy for you to get the database up and running. At that point moving the data in is not that hard. At the end of the day, Trove is not a database. Trove makes it easier for you to use existing databases. Whatever tools and techniques you used with your database before, they work even after you start using Trove.
Niki Acosta: Essentially the API connections to Trove itself don't change. What would change is whatever it is that's plugging in underneath of Trove, right?
Amrith Kumar: Kind of. Let me explain it to you something like this. When you have a database, there's two classes of operations you could do with it. One is a set of things like provisioning and deprovisioning. Management, resizing the instance on which it's running, and starting and stopping it. That's one class of use. The other thing that you could be doing is running queries against it. Inserting data, selecting data, retrieving data, so on and so forth. The first class of use cases is effectively what we call the management plane. The second is the data plane.
Trove almost entirely operates in the management plane. The data plane is still whatever your application wants to do with the database. Typical workflow is you go to Trove and say provision me a database. Once Trove is done and you have that database, your application connects directly to that database in whatever way it's normally used. Trove doesn't get into that path at all. Therefore, we're helping you start, stop, manage the lifecycle of the database and allowing your application to interact with the native database just as it used to. We don't get into the path. We migrate the data most of the time.
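The split described here can be illustrated with a minimal Python sketch. It is a toy model, not Trove's API: `ManagementPlane`, `DatabaseInstance`, and every method name are hypothetical, chosen only to show that lifecycle calls go through the manager while queries go straight to the database.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseInstance:
    """Toy stand-in for a provisioned database (the data plane)."""
    name: str
    datastore: str
    running: bool = True
    rows: list = field(default_factory=list)

    def insert(self, row):
        # Data-plane call: goes straight to the database, not through the manager.
        if not self.running:
            raise RuntimeError(f"{self.name} is stopped")
        self.rows.append(row)

    def select_all(self):
        if not self.running:
            raise RuntimeError(f"{self.name} is stopped")
        return list(self.rows)


class ManagementPlane:
    """Hypothetical lifecycle manager playing the role described for Trove."""

    def __init__(self):
        self.instances = {}
        self.log = []  # records management operations only

    def provision(self, name, datastore):
        self.log.append(("provision", name))
        instance = DatabaseInstance(name=name, datastore=datastore)
        self.instances[name] = instance
        # The caller gets a direct handle, standing in for a connection endpoint.
        return instance

    def stop(self, name):
        self.log.append(("stop", name))
        self.instances[name].running = False

    def start(self, name):
        self.log.append(("start", name))
        self.instances[name].running = True

    def deprovision(self, name):
        self.log.append(("deprovision", name))
        del self.instances[name]


manager = ManagementPlane()
db = manager.provision("orders", "mysql")

# The application talks to the database directly; the manager never sees these calls.
db.insert({"id": 1, "item": "book"})
db.insert({"id": 2, "item": "pen"})
print(db.select_all())
print(manager.log)
```

In this sketch the manager's log contains only the provisioning call, even though two inserts and a select were executed, which mirrors the point that the lifecycle service stays out of the query path.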
Niki Acosta: In effect, you probably reduce some latency because it's not all coming to this bottleneck of Trove?
Amrith Kumar: We're not. We're never in the data path. The application talks directly to the database and you can get very, very good response times and Trove is not even involved in that.
Niki Acosta: Here is an interesting thing that we've certainly been hearing about as people still continue to talk about federation. Are you starting to see instances where maybe people are combining maybe VMware and then connecting that through Trove, to a database and maybe it's in different data centers? Are we at a point where that's possible yet or are we still sort of limited by the speed at which information can travel?
Amrith Kumar: We are still limited by the speed at which information can travel, but a lot of people are beginning to see, with databases like Cassandra and Mongo for example, that you can in fact have a distributed database and you can in fact access it very effectively even in a distributed manner. Again, Trove helps orchestrate that provisioning. But at the end of the day, it's the power of underlying database technology, which people are leveraging, and Trove makes it easier to do that.
Niki Acosta: I hope you have some talks at the OpenStack Summit about this.
Amrith Kumar: Yes, we do.
Niki Acosta: Yes.
Amrith Kumar: We're talking about ... Actually at the OpenStack Summit, we have a bunch of talks about how you can use Trove, simple things. How you can use some advanced features in Trove like replication and clustering which is now new in Trove. We definitely do and I understand you have some interesting talks there as well.
Niki Acosta: I do. I have a couple of talks. One is "Are You Ready for OpenStack?", which I will be jointly presenting with Scott Sanchez. The other is a product management strategies panel, and I think you guys had one as well. Thank you for the kind gesture.
Amrith Kumar: Yeah, [inaudible 00:14:33].
Niki Acosta: Trove is fascinating. I haven't heard of many people using it. I think, in most cases, there are a lot of people that are still trying to wrap their heads around the earlier core components. As more people move to OpenStack I'm definitely seeing people trying to figure out and solve for the database question. It sounds like you guys are probably further ahead in trying to help customers solve that problem. One distinction that I don't think we made and I guess I may have a question about this as well, are you guys also handling backups of databases?
Amrith Kumar: Absolutely. Maybe a good idea is to talk about what are the things Trove does well. It gives you a simple Horizon-based dashboard where you can provision a database. You can provision a replica for the database, which is replication for high availability and so on. You can easily say take a backup. The backup will automatically be generated for you and will be sent off to some Swift storage. You can have a collection of backups and you can say, "I have this backup from this point in time. Launch me a new database with that backup as the data loaded onto it." Trove will completely manage that for you and you get a connection end point with that data loaded onto it.
You're running a database and you find that the database is underpowered. You can say resize that and choose a new, larger flavor, and Trove will do a complete migration of the data and get you the larger flavor. You can say I don't have enough storage because I've loaded so much data, expand the storage for me, and Trove will manage that for you as well. That's just a quick list of some of the things Trove can do for you to make it easier for you to use the database. In fact, it will do this for all the databases I mentioned.
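The lifecycle operations listed above (take a backup, launch a new database from a backup, move to a larger flavor, grow the storage) can be modeled in a few lines of Python. This is a sketch under stated assumptions: `Instance`, `BackupStore`, and the four functions are hypothetical names, the in-memory store merely stands in for Swift, and none of it reflects Trove's real interfaces.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy database instance: a flavor, a volume size, and some data."""
    name: str
    flavor: str
    volume_gb: int
    data: dict = field(default_factory=dict)


class BackupStore:
    """In-memory stand-in for the object storage that holds backups."""

    def __init__(self):
        self._backups = {}

    def put(self, backup_id, snapshot):
        self._backups[backup_id] = snapshot

    def get(self, backup_id):
        return self._backups[backup_id]


def create_backup(instance, store, backup_id):
    # Deep-copy so later writes to the instance do not alter the backup.
    store.put(backup_id, copy.deepcopy(instance.data))
    return backup_id


def restore_to_new_instance(store, backup_id, name, flavor, volume_gb):
    # Launch a fresh instance preloaded with the point-in-time data.
    data = copy.deepcopy(store.get(backup_id))
    return Instance(name, flavor, volume_gb, data)


def resize_flavor(instance, new_flavor):
    # Model the move to a larger flavor as a new instance carrying the same data.
    return Instance(instance.name, new_flavor, instance.volume_gb, instance.data)


def resize_volume(instance, new_size_gb):
    if new_size_gb <= instance.volume_gb:
        raise ValueError("new volume size must be larger than the current size")
    instance.volume_gb = new_size_gb
    return instance


store = BackupStore()
primary = Instance("orders", "small", 10, {"order-1": "book"})
create_backup(primary, store, "monday")

primary.data["order-2"] = "pen"  # written after the backup was taken

clone = restore_to_new_instance(store, "monday", "orders-clone", "small", 10)
bigger = resize_volume(resize_flavor(primary, "large"), 50)
print(clone.data)
print(bigger.flavor, bigger.volume_gb, bigger.data)
```

The clone holds only the data that existed when the backup was taken, while the resized instance keeps everything written since.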
Niki Acosta: Is this typically deployed in the customer's data center or do you also offer it as a hosted service somewhere?
Amrith Kumar: We do not offer it as a service, but we have partners who do offer it as a service. We have customers who are using this in their own data centers. They would be deploying this in their private OpenStack clouds. We have customers who are operating managed private clouds and offering this as a service. Of course, Trove is being used by HP's Helion cloud. It operates Trove, Rackspace operates Trove, and so on.
Niki Acosta: Kind of an interesting question I'd like to ask just because I think there's a lot of divided opinions on this, but does the hardware really matter?
Amrith Kumar: With databases it turns out that it does. It depends on what you're trying to do with it. This brings up a very interesting thing. You asked me about how we differ from Trove, so I'll tell you a little bit about that. Databases are notorious in that they've traditionally consumed a significant portion of the IT budget because they are very resource intensive. Database queries can be very costly not only in terms of CPU time but also network and storage. There are classes of databases, which for example, will never be properly run in a virtualized environment.
Ironic may be a kind of solution but there's just dedicated hardware and very complex infrastructure for databases which is never going to be available as part of a virtualized environment. One of our extensions to Trove makes it possible for you to take a database which is running on dedicated, bare metal, super expensive hardware and make it available to somebody through Trove. That's an extension we have over and above Trove in our Tesora platform.
So, if you have a large Oracle 12c RAC deployment, in our DBaaS platform you can provision a container database into that, and that database is running on some hardware entirely not managed by OpenStack. In those cases, performance really does depend on the hardware you have and we make it possible to get the benefits of that.
Niki Acosta: That's interesting. You said it's deployed in a container on the metal.
Amrith Kumar: Yeah.
Niki Acosta: Are you at liberty to tell us what container technology you're using?
Amrith Kumar: No. Not that kind of container. This is container database as in the way in which Oracle describes these containers. It's a pluggable database, so it's Oracle's view of multi-tenancy where you have a large database infrastructure and you can come along and say give me a database effectively rather than asking for the old way of saying give me an Oracle instance. You can get an Oracle virtual instance if you will running in this pluggable database infrastructure. That's called a container. Don't confuse it with Docker and things like that. Totally different story.
Niki Acosta: What's the hook into dedicated bare metal? Is it an agent that runs? Is it through Ironic?
Amrith Kumar: In our case no. We're doing it through a thing called a proxy guest agent. In a typical Trove deployment, the guest image you provision through Nova is a VM which includes a database of your choice and a Trove component called a guest agent. We decoupled those two. We have a guest agent on a small Nova instance somewhere and the database itself running somewhere else. That's our remote proxy mechanism. If you wanted to operate with bare metal, as long as you have Nova configured to talk to bare metal properly, Trove will work this for you.
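The decoupling described in this answer can be pictured with a small Python sketch. The classes below are hypothetical and only illustrate the idea of an agent that may or may not share a host with the database it manages; they are not Tesora's or Trove's actual guest agent code, and the host names are invented for the example.

```python
class Database:
    """Toy database that lives on some host."""

    def __init__(self, host, datastore):
        self.host = host
        self.datastore = datastore
        self.running = False


class GuestAgent:
    """Hypothetical agent that applies management commands to a database."""

    def __init__(self, agent_host, database):
        self.agent_host = agent_host
        self.database = database

    @property
    def is_proxy(self):
        # A proxy agent runs on a different host from the database it manages.
        return self.agent_host != self.database.host

    def handle(self, command):
        if command == "start":
            self.database.running = True
        elif command == "stop":
            self.database.running = False
        else:
            raise ValueError(f"unknown command: {command}")
        return (
            f"{command} applied to {self.database.datastore} on "
            f"{self.database.host} via agent on {self.agent_host}"
        )


# Typical layout: agent and database share one VM.
colocated = GuestAgent("vm-1", Database("vm-1", "mysql"))

# Proxy layout: agent on a small VM, database on separately managed hardware.
proxy = GuestAgent("small-vm", Database("bare-metal-rack", "oracle"))

print(colocated.is_proxy, colocated.handle("start"))
print(proxy.is_proxy, proxy.handle("start"))
```

The management commands are identical in both layouts; only the location of the database relative to the agent changes.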
Niki Acosta: Very cool. Very, very cool. Jeff you have a question? You just [inaudible 00:20:13]
Jeff Dickey: I do. I'm curious about how Trove works in zones and regions. Is that just the databases themselves talking to each other or just Trove working across regions?
Amrith Kumar: At this point ... Let's talk about that in a couple of levels. First of all Trove is a consumer of the existing OpenStack services. Trove consumes Nova services, it's just another client of Nova. It's just another client of storage and so on. When Trove comes along and requests a new instance it requests it and specifies which region it would like it on, but beyond that it has no control over it. Therefore, the entire set of conversations about regions, is something which you provide as instance Nova and then you choose optimized storage for that.
Now you could have a highly available deployment based on this, by specifying some instances in one region and some in another, and so on. When you are building a database, let's assume you're building a clustered database with [inaudible 00:21:21] it would be very, very bad if you had [inaudible 00:21:25] in different regions. You want to have [inaudible 00:21:27] nearby. You can provide that ability in Trove at this point and choose where you want [inaudible 00:21:32] but that's about as far as Trove goes. Remember Trove has been around for a relatively short time and we're rapidly adding some of these features on as people are coming up with these use cases. Some of those are still [inaudible 00:21:48]
Niki Acosta: Talk to us about some of the innovations that are happening in Trove or what's on the wish list for Trove users.
Amrith Kumar: One of the things, which we're finding more and more, is that a database operated as a service requires a lot more than just taking the existing database and slapping it in a VM and saying it's now managed by Trove. Every workload, which a person would want to have at that database, needs to completely be offered as a service. We're working in Trove on projects which will make it much easier for you to do all of this. For example with no access to shell on the machine [inaudible 00:22:29]. That's a class of thing that we're doing.
Also, Trove is a project, which very quickly came up with a large number of databases supported in the Icehouse and Juno releases, and we have a lot of technical debt that we have to repay. We're doing that with a significant portion of Kilo and we're definitely going to do some more of that in Liberty. We're a diverse community and we have participation from a whole bunch of companies. We're getting new databases added on a regular basis. As I mentioned, in Kilo we're going to have support for Vertica, DB2 and for CouchDB.
There's plans for new databases, which are going to be supported in the coming releases. That's one kind of thing which we do. Another thing we're working on is, for a given database, more capabilities which make it easy for you to use the database in the cloud. When Trove came out in Icehouse you could only support single instance data [inaudible 00:23:29]. In Juno we had support for replication with MySQL and clusters with Mongo. In Kilo, we're extending the replication capability. We're going to have to continue down that path and address more and richer use cases in the future.
I think of Trove as being extensible in three directions or it's a framework, which allows you to extend in three directions. One, you can add new databases. The other is for each database, there are capabilities, which may not currently be supported, which you need to have in order to properly use that database. The third is, some databases are really versatile and you can do things in several different ways. You can take backups in MySQL in several different ways. Each of these is an extensibility in a different direction.
In all of these cases, we're actually seeing projects, which extend and improve Trove. We started with MySQL [inaudible 00:24:27]. We went to InnoBackupEx and we have incremental backup. We're going to extend it in the same way for other databases, add more databases, and add more capabilities for each of those databases as well.
Niki Acosta: Sounds complicated. Obviously, there's a lot of people who live and breathe and eat and sleep databases for a living. I can imagine it's probably hard to keep up with all the database technologies. For a short time I was working with the Rackspace team over at ObjectRocket, and similar types of questions had to be answered, and decisions were made for some of the technologies to not go with Trove yet just because they didn't feel like it was ready.
It sounds like part of the magic that you guys provide is stabilizing those components to make sure that they are going to operate within a reasonable amount of predictability for a lot of your customers. How are you handling upgrades? It seems to be one thing that certainly the folks at Metacloud have spent a ton of time on and certainly something that's still a sore spot for OpenStack overall.
Amrith Kumar: Sure. Let me go back a little bit to one of the things you mentioned earlier about the complexity involved. Databases are notoriously complex. When the balance of power in an organization was such that IT controlled what databases you would have, they had all the knowledge about the technologies involved. It was easy for them to say, you can choose any database as long as it's Oracle or you can choose any database you want as long as it's SQL Server.
That doesn't work anymore and people want the flexibility of choosing. Given that there is so much diversity in the database choices you have, one organization cannot have all the expertise about each of these database technologies. Therefore, in order for this to be successful, Trove has to have built into it the best practices for each database. Take the people from eBay, who contributed the code for Mongo support in Trove. They operate Mongo at incredible scale. They know how to operate Mongo at that scale. They contribute therefore code with the best practices for Mongo. Tomorrow if you want to deploy Trove in your organization, you get the benefit of all their expertise.
That's how Trove is making it easier for people to do things in their data centers. You asked about upgrades. Trove again operates in the management plane and therefore upgrades are something, which we have to deal with. Right now one of the things we do with upgrades is: when you want to deploy a new instance of [inaudible 00:27:17], you launch a new server and have it migrate the data off and deal with the upgrade by itself. Eventually all the data is stored on a mount point and leave the database to deal with the upgrade. The best we're able to do at this point with that is, on a per instance basis there's some interruption.
Again, if you have something like replication going you can do it with less interruption to the end user. This is something, which is definitely a complex problem, and we're definitely working on it for each of the databases which we support.
Niki Acosta: Are you finding that the customers are still having a hard time trusting a service provider like yourselves to come in and handle that component? Or do you think that since it's more built for OpenStack that people know and probably can attest to the difficulties in Trove themselves, so they see you guys as sort of the trusted provider to do that for them?
Amrith Kumar: Let me be very clear. We do not offer a service. We offer software, which people can use in their data centers. The databases, which they use, are databases they already know. We offer them guest images for those databases, which work well with Trove. The people who are using these databases already operate these and understand the complexities. They have realized therefore that what Trove is doing is helping simplify a lot of those.
While there are still a lot of people who are trying to get their heads around the idea of first adopting OpenStack and then adopting database as a service within OpenStack, one thing we're not seeing is people saying, "I don't trust OpenStack" or "I don't trust Trove to manage my database." At the end of the day, they realize that Trove is still helping ease provisioning, management and the lifecycle of the database. It's not the database itself.
Niki Acosta: Got you.
Amrith Kumar: