Monday, October 29, 2001

Interview with Sleepycat President and CEO Michael Olson

How to make money with the GPL. How to promote and spread free software. How the easy access that open source gives developers hands companies a competitive edge. Sleepycat President and CEO Michael Olson shows us what happens when free software meets intelligent business strategy.

Could you tell us a little about Sleepycat?

Sleepycat Software was founded in 1996 to develop, maintain and support the open source Berkeley DB product. Our approach to business has been very different from that of many other software companies that started during the past several years. We've always been funded by our revenues, and have never taken any capital from outside investors. We've been profitable since inception.

We have a dual licensing strategy that permits us to distribute an open source product but still make a living off of software licensing for that same product. Open source licensing has given us an enormous installed base and a large pool of developers who know and like our product. Our for-pay licensing strategy has allowed us to hire developers, salespeople and marketing staff, and to promote and support Berkeley DB.

We've doubled revenues annually since we started Sleepycat. Despite the tech downturn in 2001, we expect to record substantial growth in revenues this year as well. The company was started by two people, Margo Seltzer and Keith Bostic. Today we have thirteen employees, mostly in Boston and the SF Bay Area, with a few elsewhere.

Nearly all of our customers are original equipment manufacturers, or OEMs. They embed Berkeley DB in the products that they build, and then ship those through to end users. We have a very small direct sales force to reach our OEM customers.

We have a couple of hundred paying customers, and an unbelievable number of non-paying users under the open source license. I did some work last year to quantify our installed base. Counting all the projects and products that bundle and redistribute Berkeley DB, we estimate that there are more than 200 million copies deployed worldwide. We get about 1200 copies downloaded daily from our Web site. That doesn't count copies from mirrors, copies bundled with other open source distributions, or copies shipped by proprietary vendors.

If you surf the Web, send email, or shop on-line, the chances are that you use our software. Berkeley DB is embedded in network infrastructure products like routers and switches, DNS and Web content caches, email servers and clients, and is used by an enormous number of ISPs and ASPs for Web content delivery or back office services. Companies like Cisco, Sun, HP, IONA, Amazon and Sendmail use Berkeley DB. Open source projects like Cyrus, Squid, RPM, Postfix, and MySQL include it.

We're proud of the success we've had. It's due to the quality of the people that we've managed to attract and retain. We're small, but everyone here is very smart. We have a very senior team of technical people working on an established, mature software product.

We don't look like a typical technology company. In a male-dominated field, we're about evenly split on gender lines. In a work-all-the-time industry, we emphasize the importance of families and interests outside the company. We've always earned more money than we've spent, so we have never had to do layoffs. Lots of companies talk about empowering employees -- at Sleepycat, everybody makes important decisions every day. We have a remarkably liberal benefits policy for a company our size.

Sleepycat is the best place I've ever worked.

How did the Berkeley DB code base come into existence originally?

In 1991, Keith Bostic, Margo Seltzer and I were all at UC Berkeley. Keith was working for the Computer Systems Research Group (CSRG), which produced the Berkeley Software Distributions (BSD, popularly known as Berkeley UNIX). At that time, the CSRG was trying hard to produce a version of the distribution that contained no AT&T-copyrighted code, so that people could get the source code for Berkeley UNIX without having to buy a source license from AT&T.

Margo and I were doing graduate research in database systems. Keith approached us and asked us to produce a version of the dbm library that was unencumbered by an AT&T copyright. (dbm is still a widely used single-user data store on UNIX systems.) We thought it was an interesting project, and agreed to work on it.

The result was eventually shipped with the 4.4BSD release as "Berkeley DB". That version had a dbm-compatible interface and supported two storage structures, hash tables and btrees. It was distributed under the same BSD licensing terms as the rest of the BSD software. Not long after it shipped, I went off to do other things. Keith and Margo continued to maintain it, and eventually got to release number 1.85 under that license.

The 1.x code was picked up by a lot of different open source and proprietary developers. Notable among them, because of what they later meant for our business, were Sendmail, the SLAPD group at the University of Michigan, and the Cyrus project.

In 1996, Netscape Communications decided to build a suite of server tools. That suite was to include an LDAP server, and Netscape recruited several members of the University of Michigan team that had done the LDAP work there. With guidance from those developers, Netscape approached Keith and Margo and asked them to add new features to the 1.85 version of Berkeley DB, including support for multiple users, transactions, and disaster recovery.

Margo and Keith agreed, and on the strength of that deal founded Sleepycat. The agreement with Netscape left ownership of the new intellectual property with Sleepycat. Margo and Keith wrote a lot of code, hired an attorney, crafted a new license for the 2.x release of Berkeley DB, and filed incorporation papers in Massachusetts.

Since the release of 2.0 in 1997, we've done about three releases per year. We're currently at release 3.3. We'll ship version 4.0 late this year. Our staff of 13 includes nine software developers. That's the team that is doing the engineering work on new releases, the testing, and the software support. The 3.3 release is about 150K lines of C code, with fairly thin API layers for C++ and Java. There are Perl, Python, Tcl, and PHP bindings as well.

How does Sleepycat's dual licensing model work?

The original version of Berkeley DB was, as I said above, released under a BSD license. When Margo and Keith formed Sleepycat in 1996, they wanted a license that would encourage open source projects to use the library, but would allow them to make money from proprietary vendors. They crafted a new license, called the "Sleepycat license," and used that for version 2 (and, later, versions 3 and 4) of Berkeley DB. Version 1.85, the last of the pre-Sleepycat releases, is still available (you can even get it off of our Web site), and it's still under the BSD license. However, the multi-user transactional engine is only available under the Sleepycat license.

The Sleepycat license says that you may download and use Berkeley DB at no charge, provided that

- you do not redistribute your application code off of a single physical site; or

- you make the complete source code for your application freely available at no charge.

These are, effectively, the same terms as the GPL. We didn't use the GPL for historical reasons -- carrying the BSD license and copyrights from 1.85 would not have been possible under a straight GPL. However, the license was designed to work exactly the way the GPL does.

Proprietary software vendors generally can't agree to these terms. They can't afford to give away the source code to products they sell. If a company wants to redistribute Berkeley DB as a part of a proprietary product, they can come to Sleepycat and pay us a fee to purchase different license terms from us. In that case, we sign a pretty conventional license agreement permitting use and redistribution in binary form, without forcing them to ship source. We make the usual representations and warranties, indemnify the customer against certain damages, and so on.

In effect, Sleepycat's dual licensing strategy says that

- if you're open source, so are we; but

- if you're a proprietary software vendor, we look exactly like all of your other proprietary suppliers.

This works for two very important reasons.

First, Berkeley DB is a library. In order to use it, developers must link it with their applications. That gives us leverage over the terms under which the embedding application is distributed. We can force them to use an open source license or to pay us money. This strategy doesn't work for standalone applications like Web servers, relational database servers, or mail servers, because the end user doesn't change those or link directly with them. Note also that this wouldn't work if we applied something like the LGPL to Berkeley DB -- it's only the full-blown GPL-style license we have that gives us the leverage to charge money.

Second, Sleepycat owns the intellectual property in Berkeley DB. Unlike many other projects, there's no developer community outside the company that's contributing code to Berkeley DB. We do the development. In some rare cases, we do get code contributed from a customer. When that happens, we require that ownership of that code be transferred to Sleepycat before we'll incorporate it into our source tree. If we allowed third party contributions that we didn't own, we would not have the standing we need to cut proprietary licenses for our paying customers.

So for example, if MegaISPCorp downloads Berkeley DB and uses it to build the authentication and user database for their Web site, but it runs only inside their data center, then they don't have to release their source code or pay us any money. They're not shipping our code. None of the users who visit MegaISPCorp's Web site need to release anything, because they're not redistributing our software either.

The restrictions apply only to people who actually ship Berkeley DB. That's the action that requires either payment or release of source code. Building a Web service on top of Berkeley DB and making it available via HTTP doesn't require payment or release of code.
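As a rough illustration, the two conditions of the Sleepycat license and the MegaISPCorp example can be modeled as a small decision function. This is a simplified sketch of the rules as Michael describes them here, not the license text itself; the `Deployment` class, its fields, and `needs_commercial_license` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical summary of how an application uses Berkeley DB."""
    redistributes_off_site: bool  # does the code ship beyond a single physical site?
    releases_full_source: bool    # is the complete application source free at no charge?


def needs_commercial_license(d: Deployment) -> bool:
    """Simplified reading of the two conditions described in the interview.

    Use is free if the application never leaves a single physical site,
    or if its complete source is released at no charge. Anything else
    requires a separate, paid proprietary license.
    """
    free_under_sleepycat_terms = (not d.redistributes_off_site) or d.releases_full_source
    return not free_under_sleepycat_terms


# MegaISPCorp runs its authentication database only in its own data center.
# A Web service reached over HTTP ships nothing, so it is modeled the same way.
mega_isp = Deployment(redistributes_off_site=False, releases_full_source=False)
# An open source project that ships its complete source.
oss_project = Deployment(redistributes_off_site=True, releases_full_source=True)
# A proprietary vendor that ships binaries only.
appliance_vendor = Deployment(redistributes_off_site=True, releases_full_source=False)

for name, d in [("MegaISPCorp", mega_isp),
                ("open source project", oss_project),
                ("appliance vendor", appliance_vendor)]:
    print(name, "needs paid license:", needs_commercial_license(d))
```

Only the last case, shipping closed source, falls outside the free terms. That is exactly the situation the separate proprietary license exists for.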

It's not quite right to say that "under the Sleepycat license" they can ship closed source. They can't do that at all under the Sleepycat license -- it's effectively the GPL. If they want to ship closed source, they need to pay us for a different license. That license looks just like all the other agreements that vendors sign with each other. The "dual licenses" are the Sleepycat license and a separate license agreement for proprietary use and redistribution.

How does bug fixing work at Sleepycat? That's a big draw of many open source models.

We don't have a large number of third party developers posting bug fixes. Occasionally we'll get a proposed patch from the field. Most often, we get very good bug reports: "At line X in file foo.c, you release a mutex that you already released on line Y in file bar.c because you're not checking condition baz." Customers use the source to investigate problems thoroughly. We generally produce the patch, integrate it into the source tree, and run it through our regression and coverage suites prior to the next release.

We still have many eyes making all bugs shallow, but we don't have many hands making the patches.

One important reason for this is that, in Berkeley DB at least and likely in other database engines, you can't make changes to (say) the locking subsystem unless you understand the assumptions behind recovery processing. People who have been building database servers for a long time understand how all the pieces fit together, but it's hard for a casual programmer to join a project like Berkeley DB and make contributions quickly. There's just too much state to absorb. By contrast, a casual contributor can get up to speed quickly on projects like Apache or Linux, where you can work in an area that's entirely independent of the bulk of the system.

Interestingly, the places where third party contributions *do* happen in Berkeley DB are completely outside the core library. For example, Robin Dunn does a fantastic job on the Python language bindings for Berkeley DB. Likewise, Paul Marquess keeps the CPAN archive up to date with the Perl bindings for the latest release of Berkeley DB. The API bindings don't depend on library internals in any way, and that's a place where we do get some leverage from developers in the open source community. We don't own these, and we can't charge money for them, but we don't need to. It's good for us that people writing code in those languages can use our software.

Do customers often come to Sleepycat asking for custom services? Does the dual license allow Sleepycat to keep developing Berkeley DB successfully?

Customers do tell us what new features to put into the product. When we do our release planning, we look at the customer requests we've gotten, decide which ones are interesting to our customer base generally, and include those.

We very seldom get requests for custom development, however. We really don't like those. They take expensive engineering talent and put it on a project that only matters to a single customer, and we can't charge very much for the work. I can count maybe four instances in the last three years where we've done any amount of custom development at all, and all of those were very small projects. Even in those cases, we owned the changes and they got rolled into our main code line, even though the majority of our customers won't take advantage of them.

We vastly prefer to make a living off of software licensing, not services. In fact, three quarters of the money we make comes from licensing, and only a quarter from support and related services. Given that, it's much more in Sleepycat's interest to have our high-powered developers working on features that we can go sell to lots of customers, rather than projects that we can sell to just one or two.

Could Sleepycat exist if Berkeley DB were under the GPL? Do you think the work Sleepycat has done would be (commercially) possible if the original code had been GPL'd?

Sleepycat could absolutely exist if Berkeley DB were under the GPL. Our business model depends on our ownership of the intellectual property in Berkeley DB, and on our ability to use dual licensing for companies that don't want to comply with the open source terms of the Sleepycat license. The GPL would permit this in the same way that the Sleepycat license does.

Both Sleepycat and the Free Software Foundation have looked hard at the two licenses, and we agree that the Sleepycat license is compatible with the GPL. This means that GPL'ed projects can use Berkeley DB under the Sleepycat license, because the GPL meets the "open source" requirement of the Sleepycat license and the Sleepycat license imposes no additional restrictions beyond those in the GPL.

A big reason for Sleepycat's success has been the widespread adoption of the 1.85 Berkeley DB code under the BSD license, dating back to 1991. Kirk McKusick has an apt characterization of the BSD license: There's copyleft, which in some sense requires broad distribution of copies, and there's copyright, which is intended to limit the distribution of copies. Then, according to Kirk, there's copy center, as in, "Take it down to the copy center and make all the copies you want." BSD is a copy center license. You can make copies and use them for whatever you want without paying anyone any money.

It's hard to say how Berkeley DB would have fared under the GPL in the early 1990s. Certainly it was well-written and useful, and it would have had some success. However, I can't say whether it would have been picked up by the projects, like SLAPD, that directly led to the formation of the company.

I will say this, though: Sleepycat couldn't exist if the current release of Berkeley DB were under the BSD license. I'm not taking a political stance, here -- I think that open source licenses like the GPL and the BSD license are valuable, and that both have created enormous value. As a business matter, though, the BSD license wouldn't allow Sleepycat to pursue the dual licensing strategy that we have with Berkeley DB.

The business lesson here is that you need to consider your product strategy, your business model, and your licensing terms as a coherent whole. Our answers are embedded storage management, revenues from product licensing, and dual GPL/proprietary terms. If you change any one of those three, the business doesn't work anymore.

Why aren't dual licenses more common among free software businesses?

Most free software projects are standalone utilities. Unless you can impose restrictions on the end user's application code, you don't have the leverage you need for dual licensing. This is simplest for libraries that are released under GPL-style terms, like ours. There are a few cases besides us that I know about. For example, MySQL AB in Sweden has GPL'ed their client-side library, but they'll sell customers proprietary licenses to build MySQL clients using exactly the same dual licensing strategy that we have.

And, as noted earlier, ownership of the IP is crucial. If you've got ownership shared among developers all over the globe, there's no single entity that customers can approach for a closed-source redistribution license.

Was relaxing the GPL's redistribution requirements valuable to some customers?

There are GPL'ed packages -- like Linux and the GCC toolchain -- that have enormous installed bases, but there aren't so many *libraries* that are widely redistributed under the GPL. The FSF created the LGPL to address exactly this problem: proprietary vendors can't use libraries under the GPL in their closed source products, but the LGPL allows that.

We weren't really being calculating when we released Berkeley DB 1.0 under the BSD license. All Berkeley software was under the BSD license. We just did what the rest of the people in our building were doing. If we'd chosen the LGPL instead, it likely wouldn't have made any difference to how broadly our software got picked up and used by other projects and by proprietary vendors.

I can't say what the difference would have been if Berkeley DB 1.0 had been GPL'ed instead. I can't point to any single early user who would have declined to use Berkeley DB under the GPL.

That said, starting with a BSD license and switching to the Sleepycat license certainly worked for us. It's ironic, really. You often hear that the BSD license is business-friendly, and that the GPL is the great destroyer of intellectual property. Well, in Sleepycat's case, switching to a BSD license would kill our company. Our ability to charge money for our intellectual property depends entirely on a license that's just like the GPL.

Do people ever break licensing terms? How do you manage that?

It happens. Generally it's an accident -- no real company wants to be in violation of another company's intellectual property rights, so it very seldom happens intentionally. When we find out about a case like this, we contact the person or company involved, explain the terms of the Sleepycat license, and point out the violation. Almost every time, the other party has quickly taken out a paid license. In one or two cases, once they understood the problem, they stopped using our software.

The most common way that we find out about these cases is that someone contacts us for technical support on the product, but we have no record of them in our sales database.

How do businesses feel about using open source software? Does it give you a competitive advantage or disadvantage?

We compete with proprietary database vendors on a lot of fronts -- I'd argue that we generally win on performance, reliability, and scalability. Other factors, including open source, play a role in helping us win deals, but the major impact of open source for us is that it gets us into the deal in the first place.

Certainly companies care about control over, and visibility into, the development process. Because they get the complete source code for Berkeley DB, our customers know they don't need to talk to us about new ports or custom features. Whether or not they ever do their own ports or custom features, that independence matters for planning. During development, having our source code makes writing to the APIs, figuring out how they work, and debugging problems much simpler. That speeds up development, and that's valuable to customers.

Most importantly, though, developers can come to our Web site and download the complete source for our product quickly and easily. There's no charge for developer licenses and no feature-crippled evaluation version. They get the actual product they'll ship, and they can try it out and integrate it into their products. This is much easier and faster for our customers, and it's good for us, as well: By the time they've decided we have a good solution, we're pretty well entrenched in their product. That makes it harder for our competitors to dislodge us.

This last issue -- ease of access for developers -- is a big competitive advantage for Berkeley DB over proprietary products, whose vendors generally can't offer open-ended, no-cost evaluations of the full version. It helps us win business.

One last comment on this point: The market is generally much smarter about open source licensing than it used to be. Most of our customers at least know the term, and have heard of the GPL. That's both good and bad -- many have heard some of the more polarizing claims about open source, and need to be educated about our license and the business value it confers on them. There's more fear, uncertainty and doubt among customers than there was a year or two ago, when the words "open source" never entered our conversations with many proprietary vendors.

Why the embedded market? Plans to go elsewhere?

Sleepycat's core strength has always been high-end Internet infrastructure applications -- we dominate the messaging and directory server markets, and we're deployed at the big ISPs and portal sites. We continue to increase our sales across the board in this horizontal market. We think we've got an outstanding product for these applications.

In the last year or so, we've begun to do substantial new business among vendors building "embedded systems." This term gets abused, but it usually means a special-purpose device, typically without a desktop-style UI, that provides a single service. Examples range from the fuel mixture sensor in your car's engine, to a palmtop computer, to a set-top box, to an eight-way multiprocessor providing storage virtualization services. It's a *very* broad market.

The companies using Berkeley DB in this market are generally building appliances that need to scale to moderate numbers of users (say, in the thousands), and that need very fast, predictable response times. Examples include network file servers, wireless network gateways, and optical switches. The particulars of each of these are very different from the others, but all of them need fast, reliable data management. Most importantly, you can't ship a relational database administrator with every box you sell.

Berkeley DB's an ideal storage engine for products like that. There are two reasons that we're so excited about this emerging market.

First, it's growing explosively. Storage virtualization is $8B today, headed for $37.5B in three years. Telco and datacomm, despite the poor performance of the public players today, has a CAGR of 23% through 2005. New companies are forming, getting funding, and buying tools like Berkeley DB for the products they're building.
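To put the storage virtualization figures in the same terms as the telco number, the implied compound annual growth rate can be worked out directly. This is a back-of-the-envelope check on the two quoted figures only, assuming "three years" means three annual compounding periods:

$$\text{CAGR} = \left(\frac{37.5}{8}\right)^{1/3} - 1 = 4.6875^{1/3} - 1 \approx 0.674$$

That is roughly 67% per year, nearly three times the 23% quoted for telco and datacomm.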

Second, there is no established leader selling databases in this market today. There's simply no Oracle here yet, dominating the market and booking most of the business. We believe that because of the unique technical characteristics of our product, our strong track record in the business, and our clear focus on the opportunity, we can be that leader. There's a lot of money to be made.