Chapter 1. Introduction

Apache is the most widely used Web server on the Internet according to Netcraft's Web server survey. The survey shows that Apache has been the most widely used Web server since early 1996, and remains so by a wide margin. Of the 35,114,328 Web sites under survey in October 2002, 60.54% are running Apache. The second most used Web server is software giant Microsoft's Internet Information Server at 28.89%. Judging by its success, Apache must be a real feat of engineering. Oddly enough, the Web server is developed by a group of volunteer programmers distributed across the United States and Europe. There has been no central project management, and a mailing list has been the only means of communicating and synchronizing the development effort.

We are taught that software systems development is an engineering discipline, akin to building bridges or architecting buildings. Certain elements must be in place for a software development project to succeed. As Booch, Jacobson and Rumbaugh write:

There is a belief held by some that professional enterprises should be organized around the skills of highly trained individuals. They know the work to be done and just do it! … This belief is mistaken in most cases, and badly mistaken in the case of software development. … [D]evelopers need organizational guidance, which … we refer to as the "software development process". [BOOCH1999](Booch et al. 1999, p. XVII)

The problem domain must be charted, and the system requirements formulated. The system requirements detail the software system's functional requirements. By identifying the software system's functional requirements, we are assured that the right system is developed. Once this initial survey is done, it is about developing the system right. A detailed plan is laid down to show how to build the system. Charts and diagrams are drawn to show the system's architecture. Upon completion of these, it is time for the programmers to code the system and make it into runnable software. Surveying, planning, building: these are the steps required for any engineering activity to succeed.

Looking at the Apache Web server development, two of these elements are missing: surveying and planning. Apart from a standard describing the network protocol used by the Web server to communicate with its clients, there are no other written requirements. This does not mean the Web server is no more than a piece of software processing HTTP requests. It is a file server, returning requested files to its clients. Even such a fundamental piece of functionality as where the server is to look for the requested files is absent from any standard or requirements document. Apart from returning a syntactically correct HTTP error response to the client, how are errors handled internally in the Apache Web server? Not to mention all the added functionality such as access control, database interfaces and virtual hosting, to mention just a few. There was never an explicit plan for building these; no initial surveying was done and no charts were produced. The features were simply implemented.

"Distributed engineering and development depends upon careful planning, coordination, and supervision" [MOON2000](Moon and Sproull 2000). The work process used in developing the Apache Web server is described in a single, short Web page [HARTILL1995](Hartill and Fielding 1995). This is an in-depth description of the project's decision-making process. It is about how to determine whether a new feature or a bug fix goes into the next release or not. A ballot is held to determine what new features and bug fixes to include with the upcoming release. The vote of every participant on the mailing list counts. The timing of the ballot depends on someone feeling like releasing a new version of the Web server. Anybody is free to declare a ballot, and prepare the new release. There is no supervision, although it is possible for any developer to veto a change or new feature. Careful planning, none existent. The only means of coordination is communication pr. e-mail on an electronic mailing list.

Yet, the Apache developers succeed with their endeavors. A year or so after they start working on it, theirs is the most widely used Web server on the Internet. In 1996 it is adopted by the Internet Engineering Task Force as a reference implementation of the new release of the Hypertext Transfer Protocol, version 1.1. Despite doing almost everything wrong from an engineering perspective, the Apache developers are obviously succeeding with their efforts. How come?

Can it be that software systems development is more than just an engineering activity? Can it be that other factors play an equally important role in developing software? What exactly is it that the Apache developers are doing, and how can it affect the way we understand software systems development?

During the mid- and later parts of the 1990s, both Microsoft and Netscape funneled enormous sums into developing the emerging Web technology. With no commercial backing until IBM starts showing interest in 1998, Apache is one of the prime actors in setting the agenda for innovating Web technology. Features that are taken for granted today were not even thought of when Tim Berners-Lee developed the first Web server. Even though money is spent on corporate research, innovations spring from the volunteer efforts. The PHP scripting language, for instance, has its roots as an embedded scripting language for server-side processing of Web pages on Apache. Microsoft launches its competing technology, Active Server Pages, a year after PHP's launch in 1995. PHP was originally written by a young programmer from Canada for tracking access to his resume. From this it grew into one of today's most widely used embedded scripting languages for server-side processing of Web pages.

The story of PHP is not unique in the history of the Apache Web server. Instead, it seems to be closer to the norm. Several of the technologies that today make up the World Wide Web originated as part of the Apache Web server. What is particular to the team developing Apache? What makes the Apache developers innovate? Maybe innovation is more than just pure analytic, cranial work? It may seem that other factors influence the evolution of technology. An interesting fact about the Apache developers is that all of them are also users of the Web server. They are webmasters or even professional consultants selling services on the World Wide Web. Can the particular relationship between user and developer play a role in innovating?

Towards the end of the 1990s, software created by the hacker underground emerged as a viable alternative to its commercial counterparts. Linux and Apache are perhaps the most renowned and widely used software systems to emerge from hacker communities. There is a mythical ring to the name hacker. Popular scientific literature [LEVY1984](Levy 1984) has had a tendency to romantically portray the hacker as a techno-anarchist with an uncanny ability to write ground-breaking software. Some credit the hacker communities with significant innovative abilities [DIBONA1999](DiBona et al. 1999). Others accuse the hackers of simply chasing taillights, imitating software already developed by commercial actors [VALLOPPOLLI1998](Valloppolli 1998). During the worst part of the 1990s IT craze, hacking was even claimed to be the foundation of a new software economy [YOUNG1999](Young 1999). But what is hacking, and how does it relate to software systems development? What is it about hacking that makes it different from software engineering, and how does that relate to innovation?

Attempts have been made to prescribe hacking as an approach to software systems development [BROWNE1998](Browne 1998) [HANNEMYR1999](Hannemyr 1999) [RAYMOND1998](Raymond 1998). Hallmarks of hacking have been identified. Eric Raymond (1998) presents a list of enabling conditions for succeeding in developing large-scale projects through hacking. His work is based on personal experience. Raymond is one of the hacker community's most outspoken representatives. His work describes only the trappings of hacking, so the question still remains: what is hacking? Maybe a way to understand what hacking is, is by looking at what a hacker does? Would it be possible for such an approach to uncover new insight into hacking as a form of software systems development? If it is so that hacking has produced great innovations, would it maybe be possible to learn something about innovation as well by studying hacking?

With the Internet's rise from academic obscurity in the early 1990s to mass use by the middle of the decade, telecommuting and computer-supported distributed work became the order of the day. A wide range of software for computer-supported cooperative work was developed to assist the new distributed form of working. Large corporations developed and installed elaborate groupware systems, perceiving massive gains in productivity from the introduction of technology [MONTEIRO2000](Monteiro and Hepsøe 2000). With little or no work apart from introducing the technology, it was believed that groupware and the Internet would change the way organizations cooperated and exchanged expertise [ORLIKOWSKI1992](Orlikowski 1992).

The Apache project uses only an electronic mailing list and a file server in its collaborative work. Yet, it seems to suffice. The mailing list is archived for future reference. That is all there is when it comes to saving organizational knowledge electronically. No elaborate groupware seems to be required for a relatively large, distributed collaboration effort to succeed. What lessons can be learnt from the Apache project about distributed collaborative work? Aren't the Apache developers exchanging information and knowledge? Can it be that elements other than technology need to be in place in order for distributed, collaborative work to succeed?

This thesis is an effort to understand the software systems development practices of the Apache hacker community based on a case study of the Apache project. Is there something in their seemingly ad hoc approach that can shed some new light on how better to develop software? Is there something in their approach that makes this way of developing software better suited for innovation? And in what way does it enable us to say something about distributed computer-supported collaborative work?