8 Mbits on the left lane
PHP 6 has been in the making for a while now and will hopefully come out this year. Although no official release, not even an alpha, has been made, it's quite possible to use it right now by compiling the nightly build. Why would you want to use PHP 6 to begin with? Well, the biggest reason is Unicode: handling non-Latin languages with PHP 5 is a tricky exercise because it doesn't recognize Unicode character encodings natively (PHP treats every character as 1 byte long). As such, the only solution is to use dedicated extensions like mbstring and be extra careful with other functions that expect strings, filtering and converting when necessary.
PHP 6, however, knows how to use Unicode natively (strings are held internally as UTF-16, with UTF-8 being the usual choice for input and output), and when properly configured it will handle all strings as Unicode (some changes have been made to handle binary strings as a different type). Best of all, to make upgrading easy, it can use a mix of different character encodings seamlessly: you can input and output proper UTF-8 Unicode and store data in a MySQL database configured with a different character set, all with scripts written in yet another one.
Disclaimer: the following is written using whatever information I could gather at the time. As PHP 6 is still in the making, it could change over time, and I'll try to update this post accordingly.
Since the code is not officially out and is subject to bugs as well as compatibility-breaking changes, you might just want to make your current code future-proof and ready for the big release. Or you can try the nightly build at your own risk, but it might not be polished enough for a production environment. If you do install it right away, the following php.ini setting will make it run in Unicode mode:
unicode.semantics = on
And for input and output in UTF-8:
unicode.http_input_encoding = utf-8
unicode.output_encoding = utf-8
As I wrote above, you don't have to write your scripts in Unicode to process and output Unicode, but it will certainly make things a lot easier. You can tell PHP what your default script encoding is inside php.ini with:
unicode.script_encoding = utf-8
From there you can either convert/write your scripts in UTF-8 with your favorite text editor, or tell PHP which encoding each file uses by starting your code with an encoding declaration such as declare(encoding="iso-8859-1"); (it has to be the very first line of PHP code). The above declaration is PHP 5 compatible (where it doesn't do anything, of course), so it's a good habit to use it everywhere right now. It also makes it easy to share include files between scripts running PHP 5 and PHP 6 and/or in different encodings (very convenient for progressively rolling out your upgrade).
First, if you are still using PHP 4 (but why?), you should obviously start by applying all the changes necessary to work with PHP 5.
The ereg extension (POSIX regular expressions) is moved to PECL, which means it's no longer installed by default. The PCRE extension, however, stays, and as it's more powerful as well as faster, there's no reason not to convert all your code to it right away (all right, so you spent nights trying to get those regexps working and now you have to change them again).
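To give an idea of what the ereg to PCRE conversion looks like, here is a minimal sketch. The two helper functions and their patterns are made up for illustration; the main differences to watch for are the delimiters around the pattern and the /i flag replacing eregi().

```php
<?php
// Old style (ereg extension, moving to PECL):
//   if (eregi('^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}$', $email)) { ... }
//   $clean = eregi_replace('[^a-z0-9]', '', $text);

// Hypothetical helper: PCRE version of the eregi() check above.
// PCRE patterns need delimiters (here "/"), and the "i" flag
// replaces the case-insensitive eregi() variant.
function looks_like_email($email)
{
    return preg_match('/^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}$/i', $email) === 1;
}

// Hypothetical helper: PCRE version of the eregi_replace() call above.
function strip_non_alnum($text)
{
    return preg_replace('/[^a-z0-9]/i', '', $text);
}
```

Since preg_match() and preg_replace() already exist in PHP 5, this kind of change can be rolled out before the upgrade.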
register_globals is gone for good (and it was already just an option in PHP 5), so you need to access all your input parameters through $_REQUEST or the other superglobal arrays ($_GET, $_POST, $_COOKIE). $HTTP_POST_VARS and $HTTP_GET_VARS are gone too.
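As an illustration, code that relied on register_globals can be moved to explicit lookups along these lines. request_param() is a hypothetical helper of my own, not a PHP function, and the 'page' parameter is just an example name.

```php
<?php
// Hypothetical helper: fetch one input parameter from a superglobal
// array, falling back to a default when it is missing.
function request_param(array $source, $name, $default = null)
{
    return isset($source[$name]) ? $source[$name] : $default;
}

// With register_globals the script would have used $page directly.
// Now the value is read explicitly from $_GET (example parameter name).
$page = (int) request_param($_GET, 'page', 1);
```

Reading from $_GET or $_POST rather than $_REQUEST also makes it obvious where each value is expected to come from.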
magic_quotes are no more. Good riddance! You can also wave goodbye to safe_mode.
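If your current scripts have to run on servers where magic_quotes may still be switched on, one way to get a consistent starting point is to undo the quoting at the top of the script. This is only a sketch: undo_magic_quotes() is a hypothetical helper, and the function_exists() guard assumes that get_magic_quotes_gpc() may disappear together with the feature.

```php
<?php
// Hypothetical helper: recursively remove the backslashes that
// magic_quotes_gpc adds to incoming values.
function undo_magic_quotes($value)
{
    if (is_array($value)) {
        return array_map('undo_magic_quotes', $value);
    }
    return stripslashes($value);
}

// Only touch the input when the feature exists and is enabled
// (assumption: the function may be removed along with magic_quotes).
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $_GET    = undo_magic_quotes($_GET);
    $_POST   = undo_magic_quotes($_POST);
    $_COOKIE = undo_magic_quotes($_COOKIE);
}
```

After that, escaping has to be done explicitly wherever the data is used, for instance when building SQL queries.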
Then there are a few rough edges: an example is urlencode(), which is supposed to process and output non-Unicode strings only, but rather than silently converting your string to a binary string (1 byte per char) it just fails and raises a warning. The simple fix is to typecast your string to binary first, using something like urlencode((binary)$some_string). This should also be compatible with PHP 5.2.1 and above.
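To avoid sprinkling casts all over the code, the workaround can be kept in one place. A minimal sketch, where urlencode_bytes() is a hypothetical wrapper name and the input is assumed to already be in the encoding you want in the URL:

```php
<?php
// Hypothetical wrapper: cast to a binary (byte) string before
// encoding, as described above. In PHP 5.2.1+ the (binary) cast is
// accepted and simply yields an ordinary string.
function urlencode_bytes($text)
{
    return urlencode((binary) $text);
}

// Example use when building a query string.
$link = 'search.php?q=' . urlencode_bytes('a b&c');
```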
Once your PHP code is cleaned up and you have made the switch, you might also want to make sure your database is Unicode compatible. If you are using MySQL, it's usually configured to use some Western Latin encoding by default (in which case PHP 6 will send queries in Unicode, and MySQL will transparently convert as best it can). You can convert each table to UTF-8 and use its full character set by running an ALTER TABLE name_of_table CONVERT TO CHARACTER SET utf8 query (as usual, don't forget to do backups first in case something goes wrong). Do note that as utf8 can use multiple bytes for a single character, it can reduce the number of characters a text or varchar field can accommodate. If your table has such fields that are already full in a 1 byte/char encoding, converting it to UTF-8 might end up "cutting" the end of the text for lack of space, which means data loss.
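To get a feel for how much a value grows, here is a rough sketch that estimates the UTF-8 size of a string currently stored in a 1 byte/char encoding. It assumes plain ISO-8859-1, where every byte from 0x80 to 0xFF becomes two bytes in UTF-8; other Western encodings can contain characters that need three. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: number of bytes an ISO-8859-1 string needs
// once converted to UTF-8. ASCII bytes stay at 1 byte, bytes in the
// 0x80-0xFF range take 2 bytes each.
function utf8_length_of_latin1($latin1)
{
    $highBytes = preg_match_all('/[\x80-\xFF]/', $latin1, $unused);
    return strlen($latin1) + $highBytes;
}

// Hypothetical helper: would the converted value still fit into a
// field limited to $maxBytes bytes?
function fits_after_conversion($latin1, $maxBytes)
{
    return utf8_length_of_latin1($latin1) <= $maxBytes;
}
```

Running something like this over the longest values of a table before the ALTER TABLE gives an early warning about fields that are close to their limit.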
You might also want to serve JavaScript, plain HTML, XML or even CSS as UTF-8: you can use your .htaccess to do that (if you run Apache) by adding an AddCharset utf-8 .html line for every extension.
You can read one of Andrei Zmievski's presentations about Unicode and PHP 6, which greatly helped me upgrade my own code. Also check out the Minutes PHP Developers Meeting, which has a list of the planned major changes.
Also, do not initialize objects with the reference operator, i.e.:
$obj =& new SomeKindOfObject(); //WRONG!
$obj = new SomeKindOfObject(); //RIGHT!