Drift Into View 4: A PhD On PHP

Monday, 20 December 2004

This is the fourth of five articles detailing the tragic series of events that culminated in the web project crime against humanity known as Electron Drift. The first article was on motivation and initial design. The second article covered the use of MySQL as a database to hold site articles. In the third article, I moaned about the hard time I experienced setting up IIS as a development environment web server on my PC. Now that we're over that particular hurdle, we can discuss PHP, the programming language that Electron Drift is built on.

Unlike in most of my articles, I am not going to expend paragraphs on how to "set up" PHP on a PC. I did have to tell IIS to call the PHP parser when it handled a file with a PHP extension, but that was about it. I did not experience any major problems, which is a crying shame, as most developers love to complain. Developers love to complain about their coworkers, but if that fails, then the systems admin guys can always take a fall. There is also no end of coders complaining that this language or platform sucks more than others. Guess where all those gaming teenagers who used to write terabytes of newsgroup posts in threads like "your Nintendo machine sucks!" and "Sega rules!" went? They grew up, became developers and are now employed in your organization.

Even Ma And Pa Could Use It

Using and abusing PHP is straightforward so let's just jump straight into code. In any file with a recognised PHP extension, PHP code fragments are delimited by <?php and ?>. Programming syntax is reminiscent of Perl. Here's an excerpt from the Electron Drift code, which outputs the full article index that is available from the navigation bar on the left of this page.

<?php
$articlelist = db_retrieve_all_titles($live);
while($result = mysql_fetch_array($articlelist))
{
   print '<p><a href="index.php?article='.$result['articleid'].'">'.
      $result['title'].'</a> ('.
      produce_site_date($result['pubdate']).')';
   if ('PREP' == $result['status'])
      print ' <span class="note">Not currently published.</span>';
   print '</p>';
}
?>

Let's go through the above code fragment and assume you've never seen PHP code in your entire life. If you're used to C or one of its derivatives you should find the code readable. Unlike C, you will observe that the $ symbol is used to represent a variable. Even more unlike C, PHP is quite forgiving when playing with variables. There is no need to explicitly define what datatype is in use, or declare variables prior to using them. Variables can also slip from datatype to datatype on the fly. This is nice because it encourages speedy development with minimal planning but this is also bad because it encourages speedy development with minimal planning.
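To make that forgiving nature concrete, here is a minimal sketch. The variable name $value is made up for the illustration and does not come from the Electron Drift code. One variable holds a string, then an integer, then an array, with no declarations anywhere.

```php
<?php
// Hypothetical variable, not taken from the Electron Drift code.
$value = '42';                 // starts life as a string
$types = [gettype($value)];

$value = $value + 8;           // the numeric string is converted; $value is now an integer
$types[] = gettype($value);

$value = [$value, 'fifty'];    // and now the very same variable holds an array
$types[] = gettype($value);

print implode(' -> ', $types) . "\n";
```

Nothing complains at any step, which is exactly the speed and exactly the danger described above.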

In this example, $articlelist receives something akin to a recordset, the results from a database query. The while statement traverses the results, retrieving an array for each row returned. If I had wanted to shield the main code from the choice of database then I should have called mysql_fetch_array inside db_retrieve_all_titles and relocated the results into a two-dimensional array there. As it was, I was not particularly bothered, especially considering PHP and MySQL are such a popular pairing, although they have had a public spat in recent times. No divorce on the cards though.
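The shielding idea could be sketched roughly as follows. This is not what Electron Drift does. The helper collect_rows and the fake result set are invented for the illustration, and the stand-in closure plays the part that mysql_fetch_array would play inside db_retrieve_all_titles.

```php
<?php
// Hypothetical helper: drain any "fetch the next row" callable into a plain
// two-dimensional array, so the calling code never touches the database API.
function collect_rows(callable $fetch_next_row): array
{
    $rows = [];
    while ($row = $fetch_next_row()) {   // a falsy return means "no more rows"
        $rows[] = $row;
    }
    return $rows;
}

// Stand-in for a real result set, so the sketch runs without a database.
$fake_result = [
    ['articleid' => 1, 'title' => 'First article'],
    ['articleid' => 2, 'title' => 'Second article'],
];
$fetch = function () use (&$fake_result) {
    return array_shift($fake_result) ?? false;
};

// The main code now loops over an ordinary array and knows nothing of MySQL.
$articles = collect_rows($fetch);
foreach ($articles as $article) {
    print $article['articleid'] . ': ' . $article['title'] . "\n";
}
```

The price is that every row is copied into memory before the page starts printing, which is probably fine for an article index but worth remembering for larger result sets.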

Going into the while loop, the print statement is used to throw data back out into the real world, into the HTML body. What is printed will replace the PHP code block. To make a bit more sense of the print statement, note that PHP uses the period character to attach strings together.

Array data can be accessed in a number of different ways, but I am particularly fond of using names in PHP rather than explicit numerical indices. As explained a moment ago, the mysql_fetch_array command will reformat one database result into an array, but each element of the array will be assigned the corresponding column name from the query. Thus, $result['articleid'] gets me the articleid column of this particular $result row.
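A small sketch of both points, string gluing and named access, using a made-up row. By default mysql_fetch_array returns each column twice, once under its position and once under its name; the hand-built array below imitates that shape so the example runs without a database.

```php
<?php
// Hypothetical row, shaped like the default mysql_fetch_array result:
// every column appears under its numeric position and under its name.
$result = [
    0 => 7,              'articleid' => 7,
    1 => 'A PhD On PHP', 'title'     => 'A PhD On PHP',
];

// The period glues strings together; named and numeric access reach the same data.
$by_name  = '<a href="index.php?article=' . $result['articleid'] . '">'
          . $result['title'] . '</a>';
$by_index = '<a href="index.php?article=' . $result[0] . '">'
          . $result[1] . '</a>';

print $by_name . "\n";
```

The named version survives a reordering of the columns in the query; the numeric version does not, which is one reason to prefer names.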

Astute readers will notice there is no error handling present. As this was my first, experimental PHP outing, I concentrated on raw programming and decided that proper error handling would be something for a second PHP project. I am not usually known for skipping out Good Programming Practices like error handling, but this was a personal project and I wanted to make visible progress quickly - recall from the previous episode that Electron Drift had been held up for months. Still, this came back to bite me on my shiny proverbial. A lot of those missing error checks would have been handy when it came to debugging.

The previous example does nothing but call a few functions and throw data into the HTML body. Electron Drift has two specific features which demanded a little more concentration than Hello World: database access and session management.

Touching Base

On paper, MySQL database access is fairly simple. There are functions provided with PHP for MySQL database access. It's just a case of initiating a connection with something like mysql_pconnect, selecting the database with mysql_select_db, sending queries with mysql_query and then dissecting any results with mysql_fetch_array. Maybe I have simplified things a bit, but that's it in a nutshell. As an aside, if your site accepts any user input, then you, as a responsible web developer, must guard against SQL injection attacks.
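One common guard against SQL injection is the prepared statement. The sketch below is not the Electron Drift code: it swaps the mysql_* functions for PDO (the mysql_* family was removed in PHP 7) and assumes the PDO SQLite driver is installed, so that an in-memory database can stand in for MySQL. The table layout and the data are made up for the illustration.

```php
<?php
// Assumption: PDO with the SQLite driver is available. An in-memory database
// stands in for MySQL so the sketch runs without a server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE articles (articleid INTEGER PRIMARY KEY, title TEXT)');
$db->exec("INSERT INTO articles (title) VALUES ('A PhD On PHP')");

// Pretend this arrived from a search box, with an injection attempt attached.
$user_input = "x' OR '1'='1";

// The placeholder keeps the input as data; it is never spliced into the SQL text.
$stmt = $db->prepare('SELECT articleid, title FROM articles WHERE title = ?');
$stmt->execute([$user_input]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

print count($rows) . " row(s) matched\n";
```

Had the input been concatenated straight into the query string, the WHERE clause would have matched every row; bound as a parameter, it is just an odd-looking title that matches nothing.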

I was not sure where to put my database parameters. All the (non-web) work I have done in the past has resulted in the database access details being located in a private safe that can only be unlocked by reciting a special magic incantation while dancing to the tune of Climb Every Mountain. However, it seems that this is not the done thing with PHP. Every example I found seemed to have the parameters in the code somewhere. If I was lucky, they might put them into constants instead of hard-coding them directly into the database connection call. The whole thing made me shudder a little, because the password details are embedded in a PHP file which has to be publicly accessible. Your web server should strip out PHP commands before they reach the real world, of course, but...

I should note that not all was plain sailing with the database connectivity on my PC. From MySQL version 4.1, a new authentication system was introduced. Great, sounds more secure. However, the MySQL client libraries which were included as part of the PHP 4 package I had installed were unable to connect through this new authentication procedure and I kept getting the touching message "Client does not support authentication protocol requested by server; consider upgrading MySQL client". I opted for a simple solution: I just told MySQL to use the old authentication system. You will notice the setting old-passwords=1 in the MySQL article. Note that all of the passwords need to be refreshed for this to work.

Sessions

If a user can set preferences, refine search criteria or sign in to a web site, a session is required. Think of a session in terms of a user session. The user starts accessing the web site and the user's browsing session begins; the code must somehow be aware that this is the same user session in progress when calling up different pages. That way, the site can respond more permanently to a user's ongoing actions. How else will you display the message "You are logged on as theL33Tone" to theL33Tone and no other user?

Where does Electron Drift need sessions? If the site was merely a viewer for articles then sessions would not be necessary. However, if you were able to take a peek under the hood, you would find that Electron Drift has article input and editing functionality, as shown by the image below. The article functions are only available if I log on as a special user; to keep the user logon status alive, a session must be used.

[Image: Electron Drift article input screenshot]

Sessions are not automatic. On every web page where session data is in play, special PHP must be used to invoke a session. A session_start() must be issued in the first PHP code block on the page and there must be nothing else before this code block, because the session cookie travels in the HTTP headers and those have to leave before any page content does. Nothing at all, no comments, no DOCTYPE declarations, no dust, nothing. Put even a single character in and the PHP god will strike you down. I know. It happened to me and I couldn't walk for days.

Session management, however, is automatic in PHP so you do not need to worry about the details. For example, sessions are usually managed through the use of cookies but PHP takes care of that for you. Variables can be kept alive throughout the duration of a session by storing data in a special session array called $HTTP_SESSION_VARS (or, from PHP 4.1 onwards, the shorter superglobal $_SESSION).

My last comment on sessions is that, indeed, it is a little confusing that session_start appears on every web page regardless of whether a new session is starting or not. This is particularly strange when you want to destroy a session; you really, absolutely have to issue a session_start() on the page first. It is better interpreted as make_session_data_visible_thanks().
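The session life cycle described above can be sketched as three "pages" run back to back in a single command-line script. On a real web server each part would be a separate request; the sketch also assumes the system temp directory is writable for session files. The session key username is made up for the illustration.

```php
<?php
// Three "pages" in one script. Assumption: run from the command line, with a
// writable system temp directory to hold the session file.
session_save_path(sys_get_temp_dir());

// Page 1: log on. session_start() comes before any output whatsoever.
session_start();
$_SESSION['username'] = 'theL33Tone';
session_write_close();            // end of the "page": the data is saved

// Page 2: a later page calls session_start() again to make the data visible.
session_start();
$greeting = 'You are logged on as ' . $_SESSION['username'];
session_write_close();

// Page 3: log off. Even destroying a session needs session_start() first.
session_start();
$_SESSION = [];                   // forget the data within this request
$destroyed = session_destroy();   // remove the stored session data

print $greeting . "\n";
```

Note that the print comes last: any output before one of the session_start() calls would trigger exactly the divine retribution described earlier.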

Perspective Matters

I want to change topic and move away from coding concepts into something a little more ambiguous. Recall that Electron Drift was my first dynamic web project; I had not implemented code for a web site before. JavaScript, PHP, Cold Fusion and ASP were just empty words to me, or at least empty acronyms. My original understanding of dynamic web page development was executable code embedded in the HTML. The web server would notice that the page had a code stowaway and then call the appropriate interpreter to run the code on the fly; output, if any, would be pasted onto the page where the code had once dwelled.

How valid was this perspective? Not at all.

A PHP file is primarily a PHP script that might have HTML in it. It is easy to overlook this point, because HTML can often dominate the structure of a PHP file and, in fact, there is no requirement for there to be any PHP code in there at all. Conversely, there are PHP files that contain no HTML; functions are commonly stored in separate PHP files and such PHP function libraries are unlikely to harbour any explicit HTML.

Here is an example that proves that PHP is in charge of a PHP file. It is representative of the sleight-of-hand techniques that make PHP very useful, but sadly send you to the local mental hospital if you think too much about them. To summarise, the HTML in the lower part of this code segment will only be displayed when the else condition of the PHP is triggered, even though it lies outside the PHP code.

<?php
if (!$admin_user) {
  print '<p>Hail and well met, fair user!</p>';
  $done_sheep_user_header=1;
}
else {
  ?>
<h1>Welcome my lord Administrator Acrinimiril.
Here is your staff, sir.</h1>
<p>This text only appears if we come through
the else condition.</p>
  <?php
  $done_godadmin_header=1;
}
?>

Add to this potential confusion the issue of web structure imposed on code. Let's take an example from this site.

If I click a button which represents "save the article" to the web site, I probably want to save the article and then show the saved article. The normal way this is managed in PHP is to incorporate all of the code handling the different user options into a single page; if you hit "save" then the same web page will be run again, but parameters will be different this time around. I dislike this because it smacks of creating a single subroutine (the PHP page) which can perform different functions based on its input (the user action), breaking function modularity. The web page is more than HTML for the browser; the page itself has become a code library that the user interacts with.
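A minimal sketch of that single-page pattern follows. It is not the Electron Drift code: the function handle_article_page, the action parameter and the array standing in for the database are all invented for the illustration.

```php
<?php
// Hypothetical single-page handler: the same page is requested for both
// viewing and saving, and a request parameter decides which path runs.
function handle_article_page(array $request, array &$store): string
{
    $action = $request['action'] ?? 'view';
    $id     = $request['articleid'];

    if ($action === 'save') {
        // Save first, then carry on to show the article just saved.
        $store[$id] = $request['body'];
    }

    $body = $store[$id] ?? 'No such article.';
    return '<p>' . htmlspecialchars($body) . '</p>';
}

// Stand-ins for the request parameters and for the database.
$store = [];
$after_save = handle_article_page(
    ['action' => 'save', 'articleid' => 4, 'body' => 'PHP & me'],
    $store
);
$plain_view = handle_article_page(['articleid' => 4], $store);

print $after_save . "\n";
```

One entry point, two behaviours, selected by input: the page really is acting as a subroutine with a mode flag.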

If I was programming in a non-web language, I would have created separate functions or maybe, egad, a switch statement. Because I hated this idea so much, I ended up creating evil, trying to recapture the spirit of clearly delineated code. You do not want to know what I did, but simply know this: only seven people in this small village called Earth have seen the code and they have all paid the price. Two disappeared mysteriously, four died in horrific circumstances and one is utterly insane.

You are now wondering what the Hell my point is. To a seasoned web developer, none of this is confusing at all. To your newbie, web programming involves a subtle but unavoidable paradigm shift, particularly because the phrase "paradigm shift" is sexy and I just had to include it once. Web development proves to be different on a number of levels that are not obvious immediately.

As I developed in PHP, I could see how one could easily lose sight of basics like input validation, error handling and the most important feature of all - security. Keeping primary code, reusable code sections, visual style and content safe, secure and clearly distinct is a tall order. I wouldn't be surprised if there were a large body of messy web work out there. These abstractions don't just leak, they've drowned whole programming teams.

After Considerable Consideration

To my pleasant surprise, I picked up the basics of PHP quickly, although I did find that I made better progress by not wrestling with my personal language demons and just getting on with the coding. It took a bit more effort to become accustomed to treating web pages as spaces containing functionality as opposed to web pages with code attached. The web site you are now admiring is the result, although please excuse the lack of error handling. Cheers.

I have just one more thing to say before I leave you. PHP is still evolving, such is the nature of open source, and I do have some concerns that the language may get too complicated. I see some similarities in this situation to that of Visual Basic.

VB is an effective RAD tool, but much-maligned as it is perceived by some as the tool of the amateur making an easy but shoddy buck in the IT market; you wouldn't find VB in a professional's toolbox. PHP has been a great tool for throwing together a web site quickly, yet it did not originally have "professional" features like full object-oriented programming support.

Do you know what VB is being replaced with? Nothing. VB is dead. VB.NET is just C# in another guise. Where are all the RAD tools going?
