
Introduction

Should You Read This Book?

You probably picked this book off the shelf because you have some level of interest in the PHP language. If you are new to programming in general and are looking to get into the industry with a robust but easy-to-use language, this is not the title for you. Have a look at PHP and MySQL Web Development or Teach Yourself PHP in 24 Hours. Both titles will get you accustomed to using PHP and have you writing applications in no time.

After you become familiar with the syntax and structure of PHP scripts, you’ll be ready to delve into this title. Encyclopedic knowledge of the userspace functions available within PHP won’t be necessary, but it will help to know what wheels don’t need reinventing, and what proven design concepts can be followed.

Because the PHP interpreter was written in C, its extension and embedding API was written from a C language perspective. Although it is certainly possible to extend from or embed into another language, doing so is outside of the scope of this book. Knowing basic C syntax, datatypes, and pointer management is vital.

It will be helpful if you are familiar with autoconf syntax. Don’t worry about it if you aren’t; you’ll only need to know a few basic rules of thumb to get by, and you’ll be introduced to these rules in Chapter 17, “Configuration and Linking,” and Chapter 18, “Extension Generators.”

Why Should You Read This Book?

This book aims to teach you how to do two things. First, it will show you how to extend the PHP language by adding functions, classes, resources, and stream implementations. Second, it will teach you how to embed the PHP language itself into other applications, making them more versatile and useful to your users and customers.

Why Would You Want to Extend PHP?

There are four common reasons for wanting to extend PHP. By far, the most common reason is to link against an external library and expose its API to userspace scripts. This motivation is seen in extensions like mysql, which links against the libmysqlclient library to provide the mysql_*() family of functions to PHP scripts.

These types of extensions are what developers are referring to when they describe PHP as “glue.” The code that makes up the extension performs no significant degree of work on its own; rather, it creates an interpretation bridge between PHP’s extension API and the API exposed by the library. Without this, PHP and libraries like libmysqlclient would not be able to communicate on a common level. Figure I.1 shows how this type of extension bridges the gap between third-party libraries and the PHP core.

Figure I.1. Glue Extensions
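
To make the “glue” idea concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is hypothetical: thirdparty_count_byte() stands in for an external library call, and script_value is a simplified stand-in for the way the engine represents userspace values, not the real PHP data structure. The glue function does no real work of its own; it only checks and unpacks the script-level arguments, calls the library, and repacks the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a third-party library function:
 * counts how many times byte c occurs in buf[0..len). */
size_t thirdparty_count_byte(const char *buf, size_t len, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == c) {
            n++;
        }
    }
    return n;
}

/* Hypothetical, simplified script-level value (not PHP's real type). */
typedef enum { SV_NULL, SV_LONG, SV_STRING } script_type;

typedef struct {
    script_type type;
    union {
        long lval;
        struct { const char *val; size_t len; } str;
    } u;
} script_value;

/* Convenience constructors for the stand-in value type. */
script_value script_string(const char *s)
{
    script_value v;
    v.type = SV_STRING;
    v.u.str.val = s;
    v.u.str.len = strlen(s);
    return v;
}

script_value script_long(long n)
{
    script_value v;
    v.type = SV_LONG;
    v.u.lval = n;
    return v;
}

/* The glue: validate and unpack script arguments, call the library,
 * and repack its result as a script value.
 * Returns 0 on success, -1 if the arguments have the wrong shape. */
int glue_count_byte(const script_value *args, size_t argc, script_value *ret)
{
    assert(args != NULL && ret != NULL);
    ret->type = SV_NULL;

    if (argc != 2 ||
        args[0].type != SV_STRING ||
        args[1].type != SV_STRING ||
        args[1].u.str.len != 1) {
        return -1;
    }

    size_t n = thirdparty_count_byte(args[0].u.str.val,
                                     args[0].u.str.len,
                                     args[1].u.str.val[0]);
    ret->type = SV_LONG;
    ret->u.lval = (long)n;
    return 0;
}
```

Calling glue_count_byte() with the two string values "banana" and "a" fills ret with the integer 3. Passing an integer where a string is expected is rejected before the library is ever called, which is typical of the checking a glue layer has to do.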


Another common reason to extend PHP is performing special internal operations like declaring superglobals, which cannot be done from userspace because of security restrictions or design limitations. Extensions such as apd (Advanced PHP Debugger) and runkit perform this kind of “internal only” work by exposing bits of the virtual machine’s execution stack that are ordinarily hidden from view.

Coming in third is the sheer need for speed. PHP code has to be tokenized, compiled, and stepped through in a virtual machine environment, which can never be as fast as native code. Certain utilities (known as opcode caches) can allow scripts to skip the tokenization and compilation steps on repeated execution, but they can never speed up the execution step. By translating a script to C code, the maintainer sacrifices some of the ease of design that makes PHP so powerful, but gains a severalfold speed increase.

Lastly, a script author may have put years of work into a particularly clever subroutine and now wants to sell it to another party, but doesn’t want to reveal the source code. One approach would be to use an opcode encryption program; however, an encrypted script is more easily decoded than a machine code extension. After all, in order to be useful to the licensed party, that party’s PHP build must, at some point, have access to the compiled bytecode. After the decrypted bytecode is in memory, it’s a short road to extracting it to disk and displaying the code. Bytecode, in turn, is much easier to parse into source script than a native binary. What’s worse, rather than gaining a speed advantage, the encrypted script actually runs slightly slower because of the decryption phase.

What Does Embedding Actually Accomplish?

Let’s say you’ve written an entire application in a nice, fast, lean, compiled language like C. To make the application more useful to your users or clients, you’d like to provide a means for them to script certain behaviors using a simple high-level language where they don’t have to worry about memory management, or pointers, or linking, or any of that complicated stuff.

If the usefulness of such a feature isn’t immediately obvious, consider what your office productivity applications would be without macros, or your command shell without batch files. What sorts of behavior would be impossible in a web browser without JavaScript? Would you be able to capture the magic Hula-Hoop and rescue the prince without being able to program your F1 key to fire a triple shot from your rocket launcher at just the right time to defeat the angry monkey? Well, maybe, but your thumbs would hurt.

So let’s say you want to build customizable scripting into your application; you could write your own compiler, build an execution framework, and spend thousands of hours debugging it, or you could take a ready-made enterprise class language like PHP and embed its interpreter right into your application. Tough choice, isn’t it?

What’s Inside?

This book is split into three primary topics. First you’ll be reintroduced to PHP from the inside out in Part I, “Getting to Know PHP All Over Again.”

You’ll see how the building blocks of the PHP interpreter fit together, and learn how familiar concepts from userspace map to their internal representations.

In Part II, “Extensions,” you’ll start to construct a functional PHP extension and learn how to use additional features of the PHPAPI. By the end of this section, you should be able to translate nearly any PHP script to faster, leaner C code. You’ll also be ready to link against external libraries and perform actions not possible from userspace.

In Part III, “Embedding,” you’ll approach PHP from the opposite angle. Here, you’ll start with an ordinary application and add PHP scripting support into it. You’ll learn how to leverage safe_mode and other security features to execute user-supplied code safely, and coordinate multiple requests simultaneously.

Finally, you’ll find a set of appendices containing a reference guide to API calls, solutions to common problems, and where to find existing extensions to crib from.

PHP Versus Zend

The first thing you need to know about PHP is that it’s actually made up of five separate pieces shown in Figure I.2.

Figure I.2. Anatomy of PHP.


At the bottom of the heap is the SAPI (Server API) layer, which coordinates the lifecycle process you’ll see in Chapter 1, “The PHP Lifecycle.” This layer is what interfaces to web servers like Apache (through mod_php5.so) or the command line (through bin/php). In Part III, you’ll be linking against the embed SAPI which operates at this layer.

Above the SAPI layer is the PHP Core. The core provides a binding layer for key events and handles certain low-level operations like file streams, error handling, and startup/shutdown triggering.

Right next to the core you’ll find the Zend Engine, which parses and compiles human readable scripts into machine readable bytecode. Zend also executes that bytecode inside a virtual machine where it reads and writes userspace variables, manages program flow, and periodically passes control to one of the other layers such as during a function call. Zend also provides per-request memory management and a robust API for environment manipulation.
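
If the idea of “executing bytecode inside a virtual machine” feels abstract, the following toy interpreter may help. It is only a sketch: the opcode names, the toy_op structure, and the stack-based design are all assumptions chosen for brevity, and they do not reflect the Zend Engine’s actual opcode format or handlers. What it does show is the general shape of the execution step: a loop that fetches one compiled instruction at a time and dispatches on its opcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not Zend's). */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN } toy_opcode;

typedef struct {
    toy_opcode op;
    long operand;   /* used only by OP_PUSH */
} toy_op;

#define TOY_STACK_MAX 16

/* Execute a compiled op array. On success, store the value on top of
 * the stack in *result and return 0. Return -1 on a malformed program
 * (stack underflow or overflow, unknown opcode, or no OP_RETURN). */
int toy_execute(const toy_op *ops, size_t count, long *result)
{
    long stack[TOY_STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction to run */

    assert(ops != NULL && result != NULL);

    while (pc < count) {
        const toy_op *o = &ops[pc++];

        switch (o->op) {
        case OP_PUSH:
            if (sp == TOY_STACK_MAX) {
                return -1;
            }
            stack[sp++] = o->operand;
            break;

        case OP_ADD:
        case OP_MUL: {
            if (sp < 2) {
                return -1;
            }
            long b = stack[--sp];
            long a = stack[--sp];
            stack[sp++] = (o->op == OP_ADD) ? a + b : a * b;
            break;
        }

        case OP_RETURN:
            if (sp < 1) {
                return -1;
            }
            *result = stack[sp - 1];
            return 0;

        default:
            return -1;
        }
    }
    return -1;   /* ran off the end without an OP_RETURN */
}
```

A program equivalent to the expression (2 + 3) * 4 would be the op sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, RETURN. An opcode cache, in these terms, saves the work of producing the toy_op array, but the loop itself still has to run every time.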

Lying above PHP and Zend is the extension layer where you’ll find all the functions available from userspace. Several of these extensions (such as standard, pcre, and session) are compiled in by default and are often not even thought of as extensions. Others are optionally built into PHP using ./configure options like --with-mysql or --enable-sockets, or built as shared modules and then loaded in the php.ini with extension= or in userspace scripts using the dl() function. You’ll be developing in this layer in Part II and Part III when you start to perform simultaneous embedding and extending.

Wrapped up around and threaded through all of this is the TSRM (Thread Safe Resource Management) layer. This portion of the PHP interpreter is what allows a single instance of PHP to execute multiple independent requests at the same time without stepping all over each other. Fortunately most of this layer is hidden from view through a range of macro functions that you’ll gradually come to be familiar with through the course of this book.

What Is an Extension?

An extension is a discrete bundle of code that can be plugged into the PHP interpreter in order to provide additional functionality to userspace scripts. Extensions typically export at least one function, class, resource type, or stream implementation, often a dozen or more of these in some combination.

The most widely used extension is the standard extension, which defines more than 500 functions, 10 resource types, 2 classes, and 5 stream wrappers. This extension, along with the zend_builtin_functions extension, is always compiled into the PHP interpreter regardless of any other configuration options. Additional extensions, such as session, spl, pcre, mysql, and sockets, are enabled or disabled with configuration options, or built separately using the phpize tool.

One structure that each extension (or module) shares in common is the zend_module_entry struct defined in the PHP source tarball under Zend/zend_modules.h. This structure is the “start point” where PHP introduces itself to your extension and defines the startup and shutdown methods used by the lifecycle process described in Chapter 1 (see Figure I.3). This structure also references an array of zend_function_entry structures, defined in Zend/zend_API.h. This array, as the data type suggests, lists the built-in functions exported by the extension.

Figure I.3. PHP extension entry point.


You’ll examine this structure in more depth starting with Chapter 6, “Returning Values,” when you begin to build a functioning extension.
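
Until then, a rough mental model may be useful. The sketch below uses hypothetical, heavily simplified stand-ins (demo_module_entry and demo_function_entry) rather than the real zend_module_entry and zend_function_entry, which carry more fields and different handler signatures. It shows only the relationship described above: a module entry names the extension, points at its startup and shutdown routines, and references a table of exported functions. As in the real API, the table ends with an all-NULL entry so that it can be walked without a separate count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified handler type; real handlers look different. */
typedef long (*demo_handler)(long arg);

/* Simplified stand-in for one row of the function table. */
typedef struct {
    const char *fname;      /* name visible to scripts */
    demo_handler handler;   /* C function that implements it */
} demo_function_entry;

/* Simplified stand-in for the module entry. */
typedef struct {
    const char *name;
    const demo_function_entry *functions;   /* ends with {NULL, NULL} */
    int (*module_startup)(void);
    int (*module_shutdown)(void);
    const char *version;
} demo_module_entry;

/* Two functions the demo extension exports. */
long demo_double(long x) { return x * 2; }
long demo_square(long x) { return x * x; }

/* Lifecycle hooks: here they only record whether the module is active. */
static int demo_started = 0;
int demo_startup(void)    { demo_started = 1; return 0; }
int demo_shutdown(void)   { demo_started = 0; return 0; }
int demo_is_started(void) { return demo_started; }

/* The function table, terminated by an all-NULL entry. */
const demo_function_entry demo_functions[] = {
    { "demo_double", demo_double },
    { "demo_square", demo_square },
    { NULL, NULL }
};

/* The module entry ties the name, table, and hooks together. */
const demo_module_entry demo_module = {
    "demo",
    demo_functions,
    demo_startup,
    demo_shutdown,
    "0.1"
};

/* One way a host could resolve an exported function by name.
 * Returns NULL if the module does not export fname. */
demo_handler demo_find_function(const demo_module_entry *m, const char *fname)
{
    assert(m != NULL && fname != NULL);
    for (const demo_function_entry *fe = m->functions; fe->fname != NULL; fe++) {
        if (strcmp(fe->fname, fname) == 0) {
            return fe->handler;
        }
    }
    return NULL;
}
```

A host would call demo_module.module_startup() once, look up "demo_square" with demo_find_function(), invoke the returned handler, and call demo_module.module_shutdown() when finished. The real lifecycle has more stages than this, as Chapter 1 explains.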

How Is Embedding Accomplished with PHP?

Ordinarily, the PHP interpreter is linked into a process that shuttles script requests into the interpreter and passes the results back out.

The CLI SAPI does this in the form of a thin wrapper between the interpreter and the command line shell while the Apache SAPI exports the right hooks as an apxs module.

It might be tempting to embed PHP into your application using a custom-written SAPI module. Fortunately, it’s completely unnecessary! Since version 4.3, the standard PHP distribution has included a SAPI called embed, which allows the PHP interpreter to act like an ordinary dynamic link library that you can include in any application.

In Part III, you’ll see how any application can leverage the power and flexibility of PHP code through the use of this simple and concise library.

Terms Used Throughout This Book

PHP: Refers to the PHP interpreter as a whole including Zend, TSRM, the SAPI layer, and any extensions.

PHP Core: A smaller subset of the PHP interpreter as defined in the “PHP Versus Zend” section earlier in this chapter.

Zend: The Zend Engine, which handles parsing, compiling, and executing script opcodes.

PEAR: The PHP Extension and Application Repository. The PEAR project (http://pear.php.net) is the official home for free, open source, community-generated projects. PEAR houses several hundred object-oriented classes written in PHP script, providing drop-in solutions to common programming tasks. Despite its name, PEAR does not include C-language PHP extensions.

PECL: The PHP Extension Community Library, pronounced “pickle.” PECL (http://pecl.php.net) is the C-code offshoot of the PEAR project that uses many of the same packaging, deployment, and installation systems. PECL packages are usually PHP extensions, but may include Zend extensions or SAPI implementations.

PHP extension: Also known as a module. A discrete bundle of compiled code defining userspace-accessible functions, classes, stream implementations, constants, ini options, and specialized resource types. Anywhere you see the term extension used elsewhere in the text, you may assume it is referring to a PHP extension.

Zend extension: A variant of the PHP extension used by specialized systems such as opcode caches and encoders. Zend extensions are beyond the scope of this book.

Userspace: The environment and API library visible to scripts actually written in the PHP language. Userspace has no access to PHP internals or data structures not explicitly granted to it by the workings of the Zend Engine and the various PHP extensions.

Internals (C-space): Engine and extension code. This term is used to refer to all those things that are not directly accessible to userspace code.