Sunday 25th May 2008

Bare-bones Rails-style MVC Request Router for PHP

After playing with Rails properly for a week or so now (more on this shortly!) and looking enviously at it for like, forever, I decided to see if I could replace all the messy MVC routing code a lot of my PHP projects have with a simple class where routes can be configured a bit like routes.rb.

Enter router.php

Before we start, THIS IS NOT YET ANOTHER MVC FRAMEWORK FOR PHP, there are plenty of those to choose from already. Rather, this code assumes you already have most of the infrastructure in place for talking to the database, object->relational mapping, etc. This code is intended as a lightweight (around 200 lines) model for handling rewritten urls and dispatching requests to the appropriate controller and action.

What you need:

  • A recent(ish) PHP - 5.1.3 or later ought to do it. Older versions of PHP 5 don't have all the reflection voodoo we need
  • Apache webserver with mod_rewrite turned on (I think it's probably on by default in most setups). You may be ok with other servers if they support rewritten urls, but the sample code works with apache.

Get the code

It's worth downloading a copy of the code before reading any further, as it'll all make a lot more sense when you see the examples below in context.

Grab a tarball of the newest version (the only version at time of writing!) here:

https://github.com/pokeb/php-mvc-router/tarball/master

Alternatively, grab a copy of the git repository:

$ git clone git@github.com:pokeb/php-mvc-router.git

Installing the basic example

1) Stick the mvc-routes folder somewhere your webserver can get to it, and set up a virtual host to point at the htdocs folder. eg:

<VirtualHost *:80>
    ServerName mvc-routes
    DocumentRoot "/Users/ben/Sites/mvc-routes/htdocs"
</VirtualHost>

In this example, I'm just hosting it on my local apache, so I've just added mvc-routes to my hosts file.

2) Copy htdocs/_htaccess.default to htdocs/.htaccess, and change the first line to point at the mvc-routes folder, eg:

php_value include_path .:/Users/ben/Sites/mvc-routes

3) Copy config/config.default.php to config/config.php and set SITE_PATH to the same path as above:

<?php
define("SITE_PATH","/Users/ben/Sites/mvc-routes");

That should be enough to get you up and running.

How it works

You specify routes in config/routes.php. This file is automatically included by bootstrap.php, and defines a mapping between urls and your controller objects.

Each item in the $GLOBALS['routes'] associative array defines a single route, where the key is the url, and the value is the controller (and optionally action) it should map to.

Simplest possible example

<?php
$GLOBALS['routes'] = array(
	'/hello-world' => 'hello_world'			
);

Requests in the browser for '/hello-world' will instantiate a new hello_world_controller (defined in controllers/hello_world_controller.php), and call the method 'index'. 'index' is the default action that will be called if no specific action is specified (see below).

Calling a specific action

<?php
$GLOBALS['routes'] = array(
	'/hello-world/say_hello' => 'hello_world:say_hello'			
);

In this example, requests to '/hello-world/say_hello' will instantiate a new hello_world_controller and call the say_hello method.
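
To make the two routes above concrete, here's a minimal sketch of what the controller might look like, plus a rough stand-in for the dispatch step. The return values and the dispatch_route_value() helper are my own assumptions for illustration, not code lifted from router.php.

```php
<?php
// Hypothetical controller; going by the routes above, the router would look
// for it in controllers/hello_world_controller.php.
class hello_world_controller
{
    // Default action, used when the route names no action ('/hello-world').
    public function index()
    {
        return 'Hello, world!';
    }

    // Named action, reached via the 'hello_world:say_hello' route value.
    public function say_hello()
    {
        return 'Hello!';
    }
}

// Rough stand-in for what happens to a 'controller:action' route value.
function dispatch_route_value($value)
{
    $parts = explode(':', $value);
    $class = $parts[0] . '_controller';
    // Fall back to 'index' when no action is given.
    $action = isset($parts[1]) ? $parts[1] : 'index';
    $controller = new $class();
    return $controller->$action();
}
```

Calling dispatch_route_value('hello_world') runs index(), while dispatch_route_value('hello_world:say_hello') runs say_hello().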

Automatically call an appropriate action based on a url

<?php
$GLOBALS['routes'] = array(
	'/hello-world/[action]' => 'hello_world'			
);

When you specify the [action] in the url, the router will call the method with the same name, so:

/hello-world/say_cheese	- will call the say_cheese action
/hello-world/go_to_sleep - will call the go_to_sleep action
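
If you're curious how a pattern like this can be matched, here's one way to sketch it with a regular expression. The action_from_url() function is hypothetical and only handles the [action] placeholder; the real router may well go about it differently.

```php
<?php
// Hypothetical helper: match a url against a pattern containing [action]
// and return the method name to call, or null when the url doesn't match.
function action_from_url($pattern, $url)
{
    // Escape the pattern for use in a regex, then swap the (now escaped)
    // [action] placeholder for a capture group matching one path segment.
    $regex = preg_quote($pattern, '#');
    $regex = str_replace('\[action\]', '([^/]+)', $regex);
    // The 'i' modifier gives case-insensitive matching.
    if (!preg_match('#^' . $regex . '$#i', $url, $matches)) {
        return null;
    }
    return $matches[1];
}
```

Anything that calls methods named in a url should check that the method exists and is meant to be public before calling it.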

Displaying a view without a controller

For relatively straightforward pages, creating a controller to display them may be overkill. If the router can't find a controller that matches the name you specified, it will look in /views for a view instead.

'/fairly-static-page' => 'fairly_static_page'

In the above example, if no controller exists in controllers/ called 'fairly_static_page_controller.php', the router will attempt to just include 'views/fairly_static_page.php'.
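
The fallback boils down to a couple of file_exists() checks. Here's a rough sketch under that assumption; resolve_target() is a made-up name, and $site_path stands in for SITE_PATH from config.php.

```php
<?php
// Hypothetical helper: decide whether a route target is a controller or a
// bare view. Returns array(type, path), or null when neither file exists.
function resolve_target($site_path, $name)
{
    // Controllers win if they exist.
    $controller = $site_path . '/controllers/' . $name . '_controller.php';
    if (file_exists($controller)) {
        return array('controller', $controller);
    }
    // Otherwise fall back to a view with the same name.
    $view = $site_path . '/views/' . $name . '.php';
    if (file_exists($view)) {
        return array('view', $view);
    }
    return null; // nothing to show; a real router would send a 404 here
}
```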

Auto-magic instantiation

'/projects/(project)' => 'projects:view',

In the above example, the router understands that '(project)' is a reference to a project model, and will try to create an instance of project by passing whatever appeared in the url as a parameter to the project class's constructor.

So, a request for

/projects/123

..will perform the equivalent of:

<?php
$project = new project(123);
$projects_controller = new projects_controller();
$projects_controller->view($project);

Notice how the newly instantiated object is passed as a parameter to the action we are calling.
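
This is where the reflection requirement comes in. Below is a minimal sketch of the idea using ReflectionClass::newInstanceArgs(), which arrived in PHP 5.1.3. The project and projects_controller classes are stand-ins, and dispatch_with_model() is a hypothetical helper that only copes with a single bracketed model per route.

```php
<?php
// Stand-in model and controller (hypothetical), just enough to run the sketch.
class project
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }
}

class projects_controller
{
    public function view($project)
    {
        return 'Viewing project ' . $project->id;
    }
}

// Hypothetical helper: match a pattern like '/projects/(project)' against a
// url, build the model named in the brackets, and pass it to the action.
function dispatch_with_model($pattern, $target, $url)
{
    // Find the model name inside the brackets, e.g. 'project'.
    if (!preg_match('#\((\w+)\)#', $pattern, $model)) {
        return null;
    }
    // Turn the pattern into a regex with a capture group for that segment.
    $regex = str_replace(
        preg_quote($model[0], '#'),
        '([^/]+)',
        preg_quote($pattern, '#')
    );
    if (!preg_match('#^' . $regex . '$#i', $url, $matches)) {
        return null;
    }
    // Instantiate the model, passing the url segment to its constructor.
    $class = new ReflectionClass($model[1]);
    $instance = $class->newInstanceArgs(array($matches[1]));

    // Split 'projects:view' into the controller name and the action.
    list($controller_name, $action) = explode(':', $target);
    $controller_class = $controller_name . '_controller';
    $controller = new $controller_class();
    return $controller->$action($instance);
}
```

Note that the constructor receives the url segment as a string ("123"), so the model should validate or cast it before it goes anywhere near a query.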

Named parameters

For more control, you can name parameters in your url:

'/friends/:user/:friend' => 'friends:view_friend'

In this case, a request for:

/friends/bobsmith/billjones

...will perform the equivalent of:

<?php
$friends_controller = new friends_controller();
$friends_controller->parameters = array("user" => "bobsmith", "friend" => "billjones");
$friends_controller->view_friend();

Note that named parameters are set in an instance variable of the controller, rather than passed as parameters to the action method.

Using named parameters is especially useful if you need to instantiate objects based on several components of a url. For example, if you have a friend class where 'friend' is a joining table between users in a database, you might use their two usernames as a composite key, and create a friend object using:

$friend = new friend("bobsmith","fredblogs");
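
Here's a sketch of how an action might use those parameters, along with one way the :name components could be pulled out of a url. The friend class is a stand-in for your own model, and named_parameters() is a hypothetical helper that assumes the route contains no regex special characters.

```php
<?php
// Stand-in for the joining-table model described above (hypothetical).
class friend
{
    public $user;
    public $friend;

    public function __construct($user, $friend)
    {
        $this->user = $user;
        $this->friend = $friend;
    }
}

class friends_controller
{
    // Filled in by the router from the :user and :friend url components.
    public $parameters = array();

    public function view_friend()
    {
        $friend = new friend($this->parameters['user'], $this->parameters['friend']);
        return $friend->user . ' is friends with ' . $friend->friend;
    }
}

// Hypothetical helper: extract named parameters from a url.
// Assumes $pattern contains no regex special characters or '#'.
function named_parameters($pattern, $url)
{
    // Collect the parameter names, e.g. array('user', 'friend').
    preg_match_all('#:(\w+)#', $pattern, $names);
    // Replace each :name with a capture group for one path segment.
    $regex = preg_replace('#:\w+#', '([^/]+)', $pattern);
    if (!preg_match('#^' . $regex . '$#i', $url, $values)) {
        return null;
    }
    array_shift($values); // drop the full match, keep the captured groups
    return array_combine($names[1], $values);
}
```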

Other features

The router class performs case-insensitive matching, so /Things/Stuff is treated the same as /things/stuff.

It also converts '-' to '_' when calling controller methods, so you can use '-' or '_' in your URLs interchangeably.
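
Both behaviours amount to a small normalisation step before the method lookup. Something along these lines would do it, though segment_to_method() is a name I've made up for the sketch:

```php
<?php
// Hypothetical helper: normalise one url segment into a method name.
function segment_to_method($segment)
{
    // Lower-case first, then swap hyphens for underscores.
    return str_replace('-', '_', strtolower($segment));
}
```

PHP method names are case-insensitive anyway, so the lower-casing mostly matters for matching routes and finding controller files.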

Wrap up

Get the code, poke around, and if you find router.php useful, please let me know!

Posted by Ben @ 16:05 PM

Sunday 31st Aug 2008

BBEdit 9

Barebones have just released a new version of the venerable (17 years and counting!) Mac text editor, BBEdit.

For me, and I suspect most other people, the most important addition is the new Projects feature. Over the last few years, it seems like I've been experimenting with a different text editor every other week. It wasn't that I didn't like BBEdit (more on this in a bit), but that I couldn't help thinking that constantly switching to the finder to open files was sucking up an awful lot of time. So, having tried and failed to find another editor that I really liked, I now find that Barebones have finally got around to answering my prayers. The BBEdit implementation of projects is not without its problems, however.

Open, Slowly (⌘⇥)

For starters, there doesn't appear to be an 'Open Quickly' feature (as in Xcode or TextMate).

This is a curious omission. Since setting up a project implies you'll be working on several files at once, it's strange that opening a different file in the same project is still awkward. I should have thought this is the one area where users could get the biggest productivity boost from a project-orientated workflow, but no dice.

I do hope this feature will show up in 9.1.

Not got Git?

Michael Tsai highlights the lack of Git integration in his beautifully succinct roundup of the new release. I was thinking the same thing when I tried out the new version of Coda a few days ago - I know I said I wanted SVN support when you brought out 1.0, but, well, I...

Then I realised that I never actually use SCM integration features anyway. BBEdit has had SVN integration for a while, as has Xcode, and I still have a decent number of projects in SVN, but I guess I just don't see the point. A visual SCM tool needs to make some task that would be a pita to accomplish in the terminal straightforward. GitHub seems to pull this off nicely - the visual diffs are great, but I suppose I need this fairly infrequently anyway, so having it built in to an editor isn't that important to me. For my workflow, something integrated into the file browser, like Tortoise SVN or SCPlugin, would probably make the most sense. Adding new files to a repository tends to be about the only thing with SCM that regularly takes any time at all, but I guess once you're used to the command line tools, it's hard to break the habit.

Actually, the projects feature doesn't really fit with the SCM stuff in BBEdit. You still specify your SCM servers in the preferences window, rather than attaching them to a project. In fact, projects don't really seem to have any settings at all - once you've made a project, and added files, you've pretty much reached the end of their capabilities. You can't specify SFTP servers, or remote/local URLs for testing. You can't even give a project a name, beyond the file name of the project file, and even this isn't exactly prominently displayed for ease of switching between open projects. I guess a lot of this comes down to the fact that BBEdit is intended to be a general purpose text editor, rather than an IDE for web development, but given the array of HTML tools it provides, I can't help wanting a little more.

A waste of space

Something I really dislike about BBEdit's projects implementation is the way it handles the list of open files. BBEdit added the open files drawer a few versions back, and I can't say I've ever really taken to it:

For starters, it's a drawer, but I won't go into why they suck so badly here. But look what happens to a project window in BBEdit 9:

That's right: by default, you lose screen real estate on BOTH sides of the edit window! The left side is the list of projects, the right side is the list of open files. Simple, you think, I'll just hide the open files list. Then, when I come to open another file in the same project - ARGH, it pops open again! Don't worry: BBEdit lets you turn off the drawer in the preferences window, so it won't keep popping out. But wait, this is a global preference! You'll have to turn it back on when you're not working on a project, unless you want to resort to switching the active file with the toolbar menu or the Window menu.

This is not an easy problem to solve. If you have files that haven't been saved yet, or are editing a file that isn't in the project, you can't just show them in the project's file list. So how do you get to them?(i)

TextMate uses the tabbed approach to handle this:

This is great for saving screen space, but this model soon breaks when you want to open more than 5 or 6 files at once:

Here is my (Coda-inspired) suggestion for what might have been a better approach:

Basically, we show both the open files list and the list of files in a project in the same area, with tabs to switch between the two lists.

This way, we always have a list of open files on the left hand side, even if we aren't working in a project, but we can swap it for the list of the files in a project when we are. The project files list SHOULD also show which files are open (I've put them in bold, like TextMate): again, this small feature seems like a no-brainer for 9.1.

Relatedly, those project-orientated toolbar buttons that appear above the files list are more or less useless. I don't need an add or remove button, and I ought to be able to rename project items by clicking on their names and waiting.

The complete package

BBEdit 9 also introduces auto-complete, as in the good kind that finishes words for you, rather than the bad kind that starts pairing up your HTML tags and function curly braces. It doesn't seem to work perfectly yet (it failed to autocomplete a couple of standard PHP functions I tried), but I think this is a welcome, albeit long overdue, addition. It even allows you to tab between the various arguments for a function (I still haven't figured out how to do this in Xcode!).

I find you very unfamiliar

The find and replace functionality has had an overhaul in BBEdit 9. The find dialog is now non-modal, and multi-file find has been split into a separate dialog. I could write about how much better this new arrangement is, but I must have spent so much time looking at the old dialog, I think it's going to take me a while to adjust.

Still broken

There are a few things that still bug me about BBEdit:

  1. Preferences window

    It's still ugly. It's still really hard to find what you want(ii). The text is still too small. I still don't really think that stuff like FTP sites or SCM repos belong in global preferences.
  2. No colour themes

    It's a small thing, but I really like being able to try out different colour themes easily (as in Terminal or TextMate). BBEdit lets you change all the text colours, but offers no theme support. I really don't want to waste my time trying to find a black background set of colours that work for me. Someone else has done this before, and they probably have better taste than me. Why not give me a range of presets, and the ability to customise and share new versions?
  3. No CSS validation

    Yeah, it's easier to pick up CSS problems than HTML problems, especially since Firefox gets so shouty about them in the Error Console, but this would be a helpful timesaver.

Still golden

There are a few things about BBEdit that I really love. Every time I experiment with another code environment, these are the things that I find myself unable to do without:

  1. Speed

    I don't think I've ever found a faster editor that I actually wanted to use. It appears to handle big files with ease, multi-file find is lightning fast, and you very rarely see it trying to catch up with syntax colouring. I suppose this isn't really a feature - I'm just reminded of it whenever I play with TextMate, an otherwise great piece of software that fails so MISERABLY speed-wise.
  2. It doesn't attempt to guess what I'm trying to do.

    It won't try to close my tags for me. No doubt you can turn this feature off in many editors, but first impressions are important - just because I can turn off Power Pup, doesn't mean he should be there in the first place. I find this stuff really off-putting in TextMate.
  3. Information I want frequently is always visible

    The charset and line ending format for the current document is always shown at the bottom of the window, and I can convert between charsets and line endings with a single click. This is not rocket science. Why doesn't everyone else do it?
  4. Really nice HTML tools

    HTML / XHTML validation. HTML Tidy. Tag editor for when I can't remember the obscure attribute name. TextMate gets some of this right, and while technically using the W3C Validator tool may give you the most correct results, it doesn't make it super easy to see and fix problems in your document. It just presents the problems in a mini browser window, you need to find the line number, then switch back to the editor window to find the issue in your code.
  5. Really useful text tools

    Zap Gremlins and Find Differences stand out for me, as I seem to find myself using them all the time, but there are so many wonderful text tools in BBEdit. Some of these you might only ever use rarely (e.g. Sort Lines, Normalize Line Endings etc), but it saves writing a script to do this stuff for you.

In short, BBEdit remains the king of Mac text editors in my book. Hopefully some of the niggles mentioned earlier will disappear in the next couple of point releases, and Projects will in time grow into an implementation that software this wonderful deserves.

  1. You use Open Quickly, of course. Oh wait...
  2. You can use the preferences sidebar and search for the feature you want. But this looks to me like an admission that the prefs window is unwieldy, rather than a real solution to the problem.

Posted by Ben @ 20:08 PM

Tuesday 6th May 2008

How to set up your PHP site to use UTF8

According to Google (via: DF), UTF8 is now the most popular character set on the web! I wonder how much this is down to sensible defaults in web authoring tools, rather than a conscious shift in mindset. It's a long time since I looked at it, but as far as I can remember Dreamweaver defaults to UTF8 for new web pages, so a lot of beginning web designers are probably building Unicode sites without even realising it.

I think there are a couple of reasons that many web designers and developers still aren't using Unicode across the board.

“I don't need Unicode, because my site is in English!”

I'll bet this is the most common (and stupid) excuse. Even assuming all your content is in English, many of your visitors may not use English as their first language. If you've got any areas where users can contribute content (for example, forums, contact us, blog comments etc), things will go badly. Even if all your visitors are native English-speaking monoglots, it's more than likely that some will have characters in their name that can't be represented in Windows Latin or ASCII.

Even if all the pages on your website are hand-coded by you, and users have no opportunities to break your site by posting content in a different character set, there are still huge advantages to using Unicode. You can stop worrying about accents in English words like Résumé(i), or the pounds sterling symbol (£), or “quotation marks”. Basically, Unicode means you can stop worrying about HTML entities (except for &amp; / &lt; / &gt; / &quot;) forever.

“Unicode is hard!”

Actually, this one is partly valid, for a website at least, because there are quite a few steps involved in making a site fully Unicode-compliant. Let's go through the key steps for a typical PHP + MySQL / Postgres site:

A quick note on UTF-8 and Unicode

There are actually several formats of Unicode data, but UTF-8 is the most commonly used online. In this post, I'll refer to UTF-8 and Unicode as being the same thing. UTF-8 is a variable-width encoding, where each character takes up between 1 and 4 bytes. This sounds confusing and dumb, but there are actually two pretty good reasons for this:

  • Very frequently used characters (e.g. roman letters, numbers, punctuation) only use 1 byte, while less frequently used characters use more. This means text takes up less space than it would if every character took 4 bytes (in a typical English-language document, around 4 times less!) Obviously, this probably won't be quite so much of a benefit if you generally write in a language that doesn't use these characters.
  • UTF-8 is backwards compatible with ASCII. UTF-8 stores the characters that are valid in an ASCII file in the same way as they would be if that file was saved as ASCII text. This means that an ASCII text file is also a valid UTF-8 text file. This makes converting your legacy files much easier for the most part.
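
Both points are easy to check from PHP. The sketch below prints how many bytes a few sample characters take, then uses PCRE's 'u' modifier as a quick validity check. The is_valid_utf8() helper is my own name for illustration, not a built-in.

```php
<?php
// Byte escapes are used so the result doesn't depend on how this file is saved.
$samples = array(
    'A (U+0041)'         => "A",
    'e-acute (U+00E9)'   => "\xC3\xA9",
    'euro sign (U+20AC)' => "\xE2\x82\xAC",
    'emoji (U+1F600)'    => "\xF0\x9F\x98\x80",
);

foreach ($samples as $label => $char) {
    // strlen() counts bytes, not characters.
    echo $label . ': ' . strlen($char) . " byte(s)\n";
}

// The 'u' modifier makes PCRE reject subjects that aren't valid UTF-8,
// so an empty pattern doubles as a validity check.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}
```

Plain ASCII text passes is_valid_utf8(), while a lone Windows Latin byte such as "\xE9" (é in that charset) does not.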

STEP 1: Set up your text editor / IDE to talk in UTF8

I probably use a different text editor from you, so I won't go into the steps involved for any particular editor. What you need to do is set your editor so that:

  • New files are created in UTF-8, no BOM (more on this in a sec) format
  • Existing files are read as UTF-8 when the character set could not be detected

On BOMs

A BOM (or Byte Order Mark) is a character that appears at the very start of a text file, to indicate which character set it is encoded in. Since plain text files are the simplest type of file there is, they don't have headers or meta data to tell software what type of data they contain. As a particular character code could represent two totally different characters in two different character sets, a hint on which encoding is used becomes useful when it is no longer possible for software to guess the encoding based on the content. UTF-8 text files can optionally use a BOM to tell software that reads them that they contain UTF-8 data. If your editor supports Unicode, you won't see this character, as it will be removed from the top of the file when you open it, and written to the start of the file when you save it.

But, you probably don't want to use a BOM if you're building a site in PHP, since PHP will include the BOM character in the output at the top of included files. As long as your editor is set up to assume UTF-8 where appropriate, this shouldn't be a problem.
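
If you inherit files that already have a BOM, it's simple to spot: in UTF-8 it is always the three bytes EF BB BF. Here's a small sketch for stripping it from text you've read in; strip_utf8_bom() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (the bytes EF BB BF).
function strip_utf8_bom($text)
{
    if (substr($text, 0, 3) === "\xEF\xBB\xBF") {
        // The cast covers older PHP versions, where substr() returns false
        // when the string is nothing but a BOM.
        return (string) substr($text, 3);
    }
    return $text;
}
```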

STEP 2: Add the appropriate <meta> tag to your HTML header

For HTML

<meta http-equiv="content-type" content="text/html; charset=utf-8">

For XHTML

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Another alternative for XHTML documents is to use the XML declaration to set the encoding for your web pages:

<?xml version="1.0" encoding="utf-8"?>

However, this approach has one serious downside: IE 6 will jump back to 1997 and render the page in quirks mode. So, it's best if you just stick to the meta tag approach.

STEP 3: Set up PHP to work with Unicode

Of course, Unicode isn't just about how data is stored on disk. Any program that handles that data needs to be able to handle multi-byte characters, and ensure that the data remains valid UTF-8 when it changes.

Unicode is not quite a first class citizen in PHP, so you'll have to do some tweaking to get it to grok UTF-8.

Firstly, you need to ensure that you have MBString enabled in your copy of PHP. If you're on Linux and using a packaged PHP, it may be installed by default. If not, it's probably just a case of adding it with:

$ yum install php-mbstring

...or whatever the equivalent is for your package manager.

On Mac OS X, Marc Liyanage's excellent PHP package includes it (go here for the leopard version).

If you build PHP from source, all you need to do is make sure you add

--enable-mbstring

...to your configure string.

Assuming you have multi-byte support built-in, now you need to make sure PHP knows that you want to handle text as UTF-8 internally. Add the following to an include that gets parsed before anything else, and you should be good to go:

//set up php for working with Unicode data
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
//note: mb_http_input() only reports the detected input encoding, it can't set it
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');

If you're doing anything with strings other than reading them from a database and outputting them, you'll probably want to read about PHP's multi-byte functions. Basically, many string functions have multi-byte capable alternatives, with the prefix 'mb_'. So, substr() becomes mb_substr().
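
To see why this matters, here's a small sketch comparing the byte-oriented functions with their mb_ counterparts on the word "Résumé". It assumes the mbstring extension is loaded.

```php
<?php
mb_internal_encoding('UTF-8');

// "Résumé", written with byte escapes so the file encoding doesn't matter.
$word = "R\xC3\xA9sum\xC3\xA9";

echo strlen($word), "\n";    // counts bytes
echo mb_strlen($word), "\n"; // counts characters

// substr() cuts after 2 bytes, splitting the two-byte "é" in half and
// leaving invalid UTF-8; mb_substr() cuts after 2 characters.
$broken = substr($word, 0, 2);
$clean  = mb_substr($word, 0, 2);

var_dump(mb_check_encoding($broken, 'UTF-8'));
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

The same trap applies to strtoupper(), strpos() and friends, so it's worth searching your code for the byte-oriented versions when you convert a site.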

STEP 4: Set up your database to store UTF-8

It's probably best to set your server to use UTF8 at database level, that is, specifying the charset to use for each database, rather than having a single charset for the whole server. If you do things this way, you won't get caught out by the default character set changing when you move your database to a different server.

MySQL

Sample SQL for creating a database using UTF-8:

CREATE DATABASE mydatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Postgres

In Postgres, you can create databases from the terminal:

$ createdb -E UTF8 mydatabase

...or in SQL:

CREATE DATABASE mydatabase WITH ENCODING 'UTF8';

STEP 5: Set up your database server to handle UTF-8

We also need to tell our database server that we want to talk to it in UTF-8.

MySQL

MySQL has a bewildering range of options for configuring charsets in my.cnf. The safest way to ensure your scripts are sending and receiving UTF-8 from MySQL is to set the character set of the connection _after_ you connect to the server, by sending these queries:

SET NAMES utf8;
SET CHARACTER SET utf8;

Postgres

For Postgres, it's nearly the same thing:

SET NAMES 'UTF8';
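
From PHP, this just means running those queries straight after you connect. Here's a sketch that assumes you connect with PDO; if you use a different database layer, the same queries apply. Both function names are made up for the example.

```php
<?php
// Hypothetical helper: the queries to send straight after connecting,
// keyed on the PDO driver name.
function utf8_connection_queries($driver)
{
    switch ($driver) {
        case 'mysql':
            return array('SET NAMES utf8', 'SET CHARACTER SET utf8');
        case 'pgsql':
            return array("SET NAMES 'UTF8'");
        default:
            return array(); // unknown driver: leave the connection alone
    }
}

// Run them on a freshly opened PDO connection.
function use_utf8(PDO $db)
{
    $driver = $db->getAttribute(PDO::ATTR_DRIVER_NAME);
    foreach (utf8_connection_queries($driver) as $sql) {
        $db->exec($sql);
    }
}
```

You'd call use_utf8($db) once, immediately after creating the connection and before running any other queries.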

Phew!

It might sound like all this is a lot of effort, but once you build this in to your workflow, it becomes trivial, and your sites can enjoy Unicode goodness to the end of their days.

  1. Well, I laughed.

Posted by Ben @ 15:05 PM