8 Mbits on the left lane
Web development, Google, PHP, Firefox, HD DVD, Canon HV20, HTML, NAS, satay, privacy
PHP 6 has been in the making for a while now and will hopefully come out this year. Although no official release - not even an alpha - has been made, it's quite possible to use it right now by compiling the nightly build. Why would you want to use PHP 6 to begin with? The biggest reason is Unicode: handling non-Latin languages with PHP 5 is a tricky exercise because it doesn't recognize Unicode character encodings natively (PHP treats every character as 1 byte long). As such, the only solution is to use dedicated extensions like mbstring and be extra careful with other functions that expect strings, filtering and converting where necessary.
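To make the "1 byte per character" problem concrete, here is a minimal sketch for PHP 5. The sample word and the PCRE fallback for systems without mbstring are my own assumptions, not something PHP requires you to do this way.

```php
<?php
// "é" written as its two UTF-8 bytes, so the demo does not depend on
// the encoding this file is saved in.
$word = "caf\xC3\xA9";

// strlen() counts bytes, so the accented letter counts twice.
$bytes = strlen($word);

// mb_strlen() counts characters when told the encoding. The fallback
// (an assumption for systems without mbstring) counts UTF-8 characters
// with PCRE's /u modifier.
$chars = function_exists('mb_strlen')
    ? mb_strlen($word, 'UTF-8')
    : preg_match_all('/./us', $word, $matches);

echo "bytes=$bytes chars=$chars\n";
```

The same mismatch affects substr(), strtoupper() and friends, which is why every string function has to be double-checked in PHP 5.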
PHP 6, however, knows how to use Unicode (UTF-8 to be precise) natively and, properly configured, will handle all strings as Unicode (some changes have been made to handle binary strings as a different type). Best of all, to make upgrading easy, it can handle a mix of different character encodings seamlessly: you can input and output proper UTF-8 Unicode and store data in a MySQL database configured with a different character set, all with scripts written in yet another one.
Disclaimer: the following is written using whatever information I could gather at the time. As PHP 6 is still in the making, things could change, and I'll try to update this post accordingly.
Since the code is not officially out and is subject to bugs as well as compatibility-breaking changes, you might just want to make your current code future-proof and ready for the big release. Or you can try the nightly build at your own risk, but it might not be polished enough for a production environment. If you do install it right away, the following php.ini setting will make it run in Unicode mode:
unicode.semantics = on
And for input and output in UTF-8:
unicode.http_input_encoding = utf-8
unicode.output_encoding = utf-8
As I wrote above, you don't have to write your scripts in Unicode to process and output Unicode, but it will certainly make things a lot easier. You can tell PHP what your default script encoding is inside php.ini with:
unicode.script_encoding = utf-8
From there you can either convert or write your scripts in UTF-8 with your favorite text editor, or tell PHP which encoding each file uses by starting your code with an encoding declaration such as declare(encoding="iso-8859-1"); (it has to be the very first line of PHP code). The above declaration is PHP 5 compatible (where it doesn't do anything, of course), so it's a good habit to use it everywhere right now. It also makes it easy to share include files between scripts running PHP 5 and PHP 6 and/or written in different encodings (very convenient for progressively rolling out your upgrade).
First, if you are still using PHP 4 (but why?), you should obviously start by applying all the changes necessary to work with PHP 5.
The ereg extension (regular expressions) is moved to PECL, which means it's no longer installed by default. The PCRE extension, however, stays, and as it's more powerful as well as faster, there's no reason not to convert all your code to it right away (all right, so you spent nights trying to get those regexps working and now you have to change them again).
register_globals is gone for good (and it was already just an option in PHP 5), so you need to access all your input parameters through $_REQUEST[] or the other superglobal arrays. $HTTP_POST_VARS and $HTTP_GET_VARS are gone too.
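As a rough illustration of life without register_globals, here is one way to read input explicitly. The input_param() helper and the parameter names are hypothetical, made up for this sketch; it works the same on PHP 5.

```php
<?php
// Hypothetical helper: reads one parameter from an input array such as
// $_GET, $_POST or $_REQUEST, returning $default when it is missing.
function input_param(array $source, $name, $default = null)
{
    return isset($source[$name]) ? $source[$name] : $default;
}

// With register_globals on, ?page=3 would have created $page by magic.
// Without it, the value has to be read explicitly:
$fakeGet = array('page' => '3'); // stands in for $_GET in this demo
$page = (int) input_param($fakeGet, 'page', 1);
$sort = input_param($fakeGet, 'sort', 'date');

echo "page=$page sort=$sort\n";
```

In a real script you would pass $_GET or $_POST instead of $fakeGet.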
magic_quotes are no more. Good riddance! You can also wave goodbye to safe_mode.
Then there are a few rough edges: an example is urlencode(), which is supposed to process and output non-Unicode strings only, but rather than silently converting your string to a binary string (1 byte per char) it just fails and raises a warning. The simple fix is to typecast your string to binary first, using something like urlencode((binary)$some_string). This is also normally compatible with PHP 5.2.1 and above.
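To show what "1 byte per char" means for urlencode(), here is a small sketch run under PHP 5, where strings are already byte strings and no cast is needed. The sample string is my own; this only illustrates the byte-by-byte behavior, not the PHP 6 warning itself.

```php
<?php
// "café au lait" with "é" spelled out as its two UTF-8 bytes.
$someString = "caf\xC3\xA9 au lait";

// urlencode() escapes byte by byte: "é" becomes %C3%A9 and the
// spaces become "+".
$encoded = urlencode($someString);

echo $encoded, "\n";
```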
Once your PHP code is cleaned up and you have made the switch, you might also want to make sure your database is Unicode compatible. If you are using MySQL, it's usually configured to use a Western European (Latin) encoding by default (in which case PHP 6 will send queries in Unicode, and MySQL will transparently convert as best it can). You can convert each table to UTF-8 and use its full character set by running an ALTER TABLE name_of_table CONVERT TO CHARACTER SET utf8 query (as usual, don't forget to do backups first in case something goes wrong). Do note that as utf8 can use multiple bytes for a single character, it reduces the amount of text a text or varchar field can accommodate. If your table has such fields that are already full in a 1 byte/char encoding, converting it to UTF-8 might end up cutting off the end of the text for lack of space, which means data loss.
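If you have many tables, the conversion queries can be generated rather than typed. This is only a sketch: the helper name and the table names are invented for the example, and it just builds the query strings without running them, so you can review them (and do your backups) first.

```php
<?php
// Builds one "ALTER TABLE ... CONVERT TO CHARACTER SET utf8" statement
// per table. The table names used below are made up for the example.
function utf8_conversion_queries(array $tables)
{
    $queries = array();
    foreach ($tables as $table) {
        // Skip anything that is not a plain identifier rather than
        // trying to escape it.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
            continue;
        }
        $queries[] = "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8";
    }
    return $queries;
}

$queries = utf8_conversion_queries(array('posts', 'comments', 'bad name;'));
echo implode(";\n", $queries), ";\n";
```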
You might also want to serve Javascript, plain HTML, XML or even CSS as UTF-8: if you run Apache, you can do that from your .htaccess by adding an AddCharset utf-8 .html line for every extension.
You can read one of Andrei Zmievski's presentations about Unicode and PHP 6, which greatly helped me upgrade my own code. Also check out the Minutes PHP Developers Meeting, which lists the planned major changes.
There are probably over 100 different PHP template engines out there (the most famous one being Smarty). Templates are a good concept: keep the presentation and layout in one file, keep the business logic in another (and data storage in a third: the database). Use a simple and straightforward language to make templates, so that any graphic designer or HTML artist can use it, and leave the messy programming to something and someone else. Yes, templates are good with Java, C#, Python or whatever language you can think of. But not PHP.
Why is that? Well, here's a hint: PHP stands for "Hypertext Preprocessor". That sounds an awful lot like the description of an HTML template engine. And in fact, PHP is a sort of template engine: it was designed as a simple and straightforward language that even a graphic designer could use, and that you put directly inside your HTML to control the output (just as any other template language does). So what all those PHP template engines do is run a template engine on top of another template engine. Duh!
Consider for example these lines of a Smarty template:
{$title}
{include file='header.tpl'}
{if $a}
Hello
{else}
Bye
{/if}
And now the same thing in PHP:
<?php
echo($title);
include('header.php');
if ($a)
echo('Hello');
else
echo('Bye');
?>
I mean, what's the point of reinventing the wheel?
Template engines require learning a new syntax, slow things down (because the template has to be parsed and processed) and basically, as any extra layer of complexity does, increase the chances of hitting a bug or a security hole while making debugging harder. They only make sense in languages where outputting HTML is hard and painful. Is that the case with PHP? Obviously not.
Of course, I'm not saying you should not separate presentation and logic. But you can do that without having to drop PHP: write a main PHP page with the presentation and include the logic from external files, or do the opposite and handle the output with templates written in plain PHP and HTML. Or do both and keep everything in neatly organized files.
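Here is a minimal sketch of the "templates written in plain PHP and HTML" approach. The render_template() helper is hypothetical, and the template is written to a temporary file only to keep the example self-contained; in a real site it would simply be its own file.

```php
<?php
// Hypothetical helper: renders a plain PHP template file with the
// given variables and returns the resulting HTML as a string.
function render_template($path, array $vars)
{
    extract($vars, EXTR_SKIP); // expose $vars as local variables
    ob_start();
    include $path;
    return ob_get_clean();
}

// Presentation: plain HTML with a bit of PHP, nothing else to learn.
$templatePath = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents(
    $templatePath,
    '<h1><?php echo htmlspecialchars($title); ?></h1>' . "\n" .
    '<p><?php if ($a): ?>Hello<?php else: ?>Bye<?php endif; ?></p>' . "\n"
);

// Logic: compute the values, then hand them to the template.
$html = render_template($templatePath, array('title' => 'Tom & Jerry', 'a' => true));
unlink($templatePath);

echo $html;
```

The logic file never prints HTML itself, and the template never does anything but display the values it is given.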
In my first installment of Fast Web Sites I mentioned the possibility of using a script to serve several Javascript (or CSS) files together to gain speed. Here's a simple and rough PHP script that could do such packaging:
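<?php
// Let browsers cache the bundle for an hour.
header('Cache-Control: max-age=3600, must-revalidate');
header('Content-Type: text/javascript');
// Compress the output when the browser supports it.
ob_start('ob_gzhandler');
if (!empty($_REQUEST['f'])) {
    // "+" in the query string is decoded as a space, hence the separator.
    $files = explode(' ', trim($_REQUEST['f']));
    foreach ($files as $file) {
        // Only serve .js files from this directory or its subdirectories.
        if (preg_match('/^([0-9a-z_\-]+\/)*[0-9a-z_\-]+\.js$/i', $file)) {
            readfile($file);
        }
    }
}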
The script is supposed to be installed in the directory where Javascript files are stored. In order to use it you would do something like this to serve up 3 different Javascript files at once:
<script src="/javascripts/script.php?f=file1.js+file2.js+file3.js" type="text/javascript"></script>
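If the list of files comes from PHP, the URL above can be built rather than typed by hand. A small sketch, assuming the script lives at the path used in the example; bundle_url() is a name I made up.

```php
<?php
// Hypothetical helper: builds the packaging URL for a list of files.
// $scriptUrl is wherever the packaging script is installed.
function bundle_url($scriptUrl, array $files)
{
    // rawurlencode() each name, then join with "+", which PHP decodes
    // back to the space the script splits on.
    return $scriptUrl . '?f=' . implode('+', array_map('rawurlencode', $files));
}

$url = bundle_url('/javascripts/script.php', array('file1.js', 'file2.js', 'file3.js'));
echo '<script src="' . htmlspecialchars($url) . '" type="text/javascript"></script>', "\n";
```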
A few words on the code:
Adapting the script for CSS should be trivial. As it turns out, Rakaz explains and develops the same idea, and goes even further by using mod_rewrite to present a clean URL that hides the packaging script.
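For what it's worth, here is one way the CSS adaptation could look: the same whitelist check with the extension made configurable. The function names, the content-type map and the file names are assumptions for this sketch. It only selects the files; a real script would send the matching Content-Type header and readfile() each one.

```php
<?php
// Hypothetical map of supported bundle types to their Content-Type.
$contentTypes = array(
    'js'  => 'text/javascript',
    'css' => 'text/css',
);

// Same whitelist idea as the original regular expression, with the
// extension made configurable. Dots are not allowed in directory
// names, so "../" can never get through.
function is_allowed_asset($file, $extension)
{
    $pattern = '/^([0-9a-z_\-]+\/)*[0-9a-z_\-]+\.' . preg_quote($extension, '/') . '$/i';
    return preg_match($pattern, $file) === 1;
}

// Collects the requested files that pass the whitelist, in order.
function select_assets($request, $extension)
{
    $selected = array();
    foreach (explode(' ', trim($request)) as $file) {
        if (is_allowed_asset($file, $extension)) {
            $selected[] = $file;
        }
    }
    return $selected;
}

$files = select_assets('reset.css layout/main.css ../secret.css evil.js', 'css');
print_r($files);
```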