After playing with Rails properly for a week or so now (more on this shortly!) and looking enviously at it for like, forever, I decided to see if I could replace all the messy MVC routing code a lot of my PHP projects have with a simple class where routes can be configured a bit like routes.rb.
Enter router.php
Before we start, THIS IS NOT YET ANOTHER MVC FRAMEWORK FOR PHP, there are plenty of those to choose from already. Rather, this code assumes you already have most of the infrastructure in place for talking to the database, object->relational mapping, etc. This code is intended as a lightweight (around 200 lines) model for handling rewritten urls and dispatching requests to the appropriate controller and action.
What you need:
- A recent(ish) PHP - 5.1.3 or later ought to do it. Older versions of PHP 5 don't have all the reflection voodoo we need
- Apache webserver with mod_rewrite turned on (I think it's probably on by default in most setups). You may be OK with other servers if they support rewritten URLs, but the sample code works with Apache.
Get the code
It's worth downloading a copy of the code before reading any further, as it'll all make a lot more sense when you see the examples below in context.
Grab a tarball of the newest version (the only version at time of writing!) here:
https://github.com/pokeb/php-mvc-router/tarball/master
Alternatively, grab a copy of the git repository:
$ git clone git@github.com:pokeb/php-mvc-router.git
Installing the basic example
1) Stick the mvc-routes folder somewhere your webserver can get to it, and set up a virtual host to point at the htdocs folder. eg:
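<VirtualHost *:80>
ServerName mvc-routes
DocumentRoot "/Users/ben/Sites/mvc-routes/htdocs"
# .htaccess must be allowed to set php_value and rewrite rules
<Directory "/Users/ben/Sites/mvc-routes/htdocs">
AllowOverride All
</Directory>
</VirtualHost>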
In this example, I'm hosting it on my local Apache, so I've simply added mvc-routes to my hosts file.
2) Copy htdocs/_htaccess.default to htdocs/.htaccess, and change the first line to point at the mvc-routes folder, eg:
php_value include_path .:/Users/ben/Sites/mvc-routes
3) Copy config/config.default.php to config/config.php and set SITE_PATH to the same path as above:
<?php
define("SITE_PATH","/Users/ben/Sites/mvc-routes");
That should be enough to get you up and running.
How it works
You specify routes in config/routes.php. This file is automatically included by bootstrap.php, and defines a mapping between URLs and your controller objects.
Each item in the $GLOBALS['routes'] associative array defines a single route, where the key is the url, and the value is the controller (and optionally action) it should map to.
Simplest possible example
<?php
$GLOBALS['routes'] = array(
'/hello-world' => 'hello_world'
);
Requests in the browser for '/hello-world' will instantiate a new hello_world_controller (defined in controllers/hello_world_controller.php), and call the method 'index'. 'index' is the default action that will be called if no specific action is specified (see below).
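For reference, here's a minimal sketch of what controllers/hello_world_controller.php might contain. It assumes a controller is just a plain class with one public method per action; whether router.php expects controllers to extend a base class is an assumption worth checking against the repository.

```php
<?php
// controllers/hello_world_controller.php (sketch)
// Assumption: the router only needs a class named <name>_controller
// with a public method for each action.
class hello_world_controller
{
    // 'index' is the default action, called when the route names no action
    public function index()
    {
        echo "Hello, world!";
    }
}
```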
Calling a specific action
<?php
$GLOBALS['routes'] = array(
'/hello-world/say_hello' => 'hello_world:say_hello'
);
In this example, requests to '/hello-world/say_hello' will instantiate a new hello_world_controller and call the say_hello method.
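The 'controller:action' value is simple to pick apart. As a rough sketch (parse_route_target is a hypothetical helper, not necessarily how router.php does it), the value splits on the colon, with 'index' as the fallback action:

```php
<?php
// Hypothetical helper: splits a route value such as 'hello_world:say_hello'
// into a controller class name and an action, defaulting to 'index'.
function parse_route_target($target)
{
    $parts = explode(':', $target, 2);
    $controller = $parts[0] . '_controller';
    $action = (isset($parts[1]) && $parts[1] !== '') ? $parts[1] : 'index';
    return array($controller, $action);
}

list($class, $action) = parse_route_target('hello_world:say_hello');
// $class is now 'hello_world_controller' and $action is 'say_hello'
```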
Automatically call an appropriate action based on a url
<?php
$GLOBALS['routes'] = array(
'/hello-world/[action]' => 'hello_world'
);
When you specify the [action] in the url, the router will call the method with the same name, so:
/hello-world/say_cheese - will call the say_cheese action
/hello-world/go_to_sleep - will call the go_to_sleep action
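One way to implement the [action] placeholder is to turn the route into a regular expression. This is a sketch under that assumption; match_action_route is a hypothetical name, and the real router may match differently:

```php
<?php
// Hypothetical sketch: turn a pattern containing [action] into a regex and
// return the captured action name, or null if the path doesn't match.
function match_action_route($pattern, $path)
{
    // Escape the whole pattern literally, then swap the escaped [action]
    // token for a capture group matching exactly one path segment.
    $regex = preg_quote($pattern, '#');
    $regex = str_replace(preg_quote('[action]', '#'), '([^/]+)', $regex);
    // 'i' flag: match case-insensitively, as the router does
    if (preg_match('#^' . $regex . '$#i', $path, $m)) {
        return $m[1];
    }
    return null;
}

$action = match_action_route('/hello-world/[action]', '/hello-world/say_cheese');
// $action is 'say_cheese'; a router would check method_exists() before calling it
```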
Displaying a view without a controller
For relatively straightforward pages, creating a controller to display them may be overkill. If the router can't find a controller that matches the name you specified, it will look in /views for a view instead.
'/fairly-static-page' => 'fairly_static_page'
In the above example, if no controller exists in controllers/ called 'fairly_static_page_controller.php', the router will attempt to just include 'views/fairly_static_page.php'.
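The lookup order can be sketched like this. It's a hypothetical helper (find_handler_file), and $site_path stands in for SITE_PATH; router.php's actual lookup may differ in the details:

```php
<?php
// Hypothetical sketch of the fallback: prefer
// controllers/<name>_controller.php, else views/<name>.php.
function find_handler_file($site_path, $name)
{
    $controller = $site_path . '/controllers/' . $name . '_controller.php';
    if (file_exists($controller)) {
        return array('controller', $controller);
    }
    $view = $site_path . '/views/' . $name . '.php';
    if (file_exists($view)) {
        return array('view', $view);
    }
    // Neither exists: a real router would send a 404 here
    return null;
}
```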
Auto-magic instantiation
'/projects/(project)' => 'projects:view',
In the above example, the router understands that '(project)' is a reference to a project model, and will try to create an instance of project by passing whatever appeared in the url to the project class's constructor.
So, a request for
/projects/123
..will perform the equivalent of:
<?php
$project = new project(123);
$projects_controller = new projects_controller();
$projects_controller->view($project);
Notice how the newly instantiated object is passed as a parameter to the action we are calling.
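Here's a self-contained sketch of that step. The project and projects_controller classes are stand-ins (a real model would load its row from the database), and dispatch_with_model is a hypothetical helper rather than router.php's own code:

```php
<?php
// Stand-in model: a real one would fetch the project from the database.
class project
{
    public $id;
    public function __construct($id)
    {
        $this->id = (int) $id;
    }
}

// Stand-in controller that just remembers what it was given.
class projects_controller
{
    public $last_project;
    public function view($project)
    {
        $this->last_project = $project;
    }
}

// Hypothetical dispatcher step: the name inside (...) is used as the model
// class, and the matching URL segment is passed to its constructor.
function dispatch_with_model($model_class, $url_value, $controller, $action)
{
    $model = new $model_class($url_value);
    call_user_func(array($controller, $action), $model);
    return $model;
}

$controller = new projects_controller();
dispatch_with_model('project', '123', $controller, 'view');
```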
Named parameters
For more control, you can name parameters in your url:
'/friends/:user/:friend' => 'friends:view_friend'
In this case, a request for:
/friends/bobsmith/billjones
...will perform the equivalent of:
<?php
$friends_controller = new friends_controller();
$friends_controller->parameters = array("user" => "bobsmith", "friend" => "billjones");
$friends_controller->view_friend();
Note that named parameters are set in an instance variable of the controller, rather than passed as parameters to the action method.
Using named parameters is especially useful if you need to instantiate objects based on several components of a url. For example, if you have a friend class where 'friend' is a joining table between users in a database, you might use their two usernames as a composite key, and create a friend object using:
$friend = new friend("bobsmith","fredblogs");
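Extracting named parameters can be sketched by turning each ':name' segment into a named capture group. This is an assumption about the approach, and match_named_params is a hypothetical name:

```php
<?php
// Hypothetical sketch: ':name' segments become named capture groups;
// returns an associative array of parameters, or null on no match.
function match_named_params($pattern, $path)
{
    $parts = array();
    foreach (explode('/', $pattern) as $segment) {
        if ($segment !== '' && $segment[0] === ':') {
            // ':user' becomes (?P<user>...) capturing one path segment
            $parts[] = '(?P<' . substr($segment, 1) . '>[^/]+)';
        } else {
            $parts[] = preg_quote($segment, '#');
        }
    }
    if (!preg_match('#^' . implode('/', $parts) . '$#i', $path, $m)) {
        return null;
    }
    // preg_match also fills numeric keys; keep only the named ones
    $params = array();
    foreach ($m as $key => $value) {
        if (is_string($key)) {
            $params[$key] = $value;
        }
    }
    return $params;
}

$params = match_named_params('/friends/:user/:friend', '/friends/bobsmith/billjones');
```

A controller action could then build the composite-key object with new friend($params['user'], $params['friend']).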
Other features
The router class performs case-insensitive matching, so /Things/Stuff is treated the same as /things/stuff.
It also converts '-' to '_' when calling controller methods, so you can use '-' or '_' in your URLs interchangeably.
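The normalisation amounts to something like the following sketch (url_segment_to_method is a hypothetical name). PHP method names are case-insensitive anyway, so the lower-casing matters most when segments are also used to build file or class names on a case-sensitive filesystem:

```php
<?php
// Hypothetical sketch: lower-case a URL segment and map '-' to '_'
// before using it as a controller method name.
function url_segment_to_method($segment)
{
    return str_replace('-', '_', strtolower($segment));
}

$method = url_segment_to_method('Say-Cheese'); // 'say_cheese'
```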
Wrap up
Get the code, poke around, and if you find router.php useful, please let me know!
Posted by Ben @ 16:05 PM
According to Google (via: DF), UTF8 is now the most popular character set on the web! I wonder how much this is down to sensible defaults in web authoring tools, rather than a conscious shift in mindset. It's a long time since I looked at it, but as far as I can remember Dreamweaver defaults to UTF8 for new web pages, so a lot of beginning web designers are probably building Unicode sites without even realising it.
I think there are a couple of reasons that many web designers and developers still aren't using Unicode across the board.
“I don't need Unicode, because my site is in English!”
I'll bet this is the most common (and stupid) excuse. Even assuming all your content is in English, many of your visitors may not use English as their first language. If you've got any areas where users can contribute content (for example, forums, contact us, blog comments etc), things will go badly as soon as someone submits text your character set can't represent. Even if all your visitors are native English-speaking monoglots, it's more than likely that some will have characters in their name that can't be represented in Windows Latin or ASCII.
Even if all the pages on your website are hand-coded by you, and users have no opportunities to break your site by posting content in a different character set, there are still huge advantages to using Unicode. You can stop worrying about accents in English words like Résumé(i), or the pounds sterling symbol (£), or “quotation marks”. Basically, Unicode means you can stop worrying about HTML entities (except for & / < / > / ") forever.
“Unicode is hard!”
Actually, this one is partly valid, for a website at least, because there are quite a few steps involved in making a site fully Unicode-compliant. Let's go through the key steps for a typical PHP + MySQL / Postgres site:
A quick note on UTF-8 and Unicode
There are actually several formats of Unicode data, but UTF-8 is the most commonly used online. In this post, I'll refer to UTF-8 and Unicode as being the same thing. UTF-8 is a variable-width encoding, where each character takes up between 1 and 4 bytes. This sounds confusing and dumb, but there are actually two pretty good reasons for this:
- Very frequently used characters (e.g. roman letters, numbers, punctuation) only use 1 byte, while less frequently used characters use more. This means text takes up less space than it would if every character took 4 bytes (in a typical English-language document, around a quarter of the space!) Obviously, this probably won't be quite so much of a benefit if you generally write in a language that doesn't use these characters.
- UTF-8 is backwards compatible with ASCII. UTF-8 stores the characters that are valid in an ASCII file in the same way as they would be if that file was saved as ASCII text. This means that an ASCII text file is also a valid UTF-8 text file. This makes converting your legacy files much easier for the most part.
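To see the variable width in practice: PHP's strlen() counts bytes, not characters, so it shows how many bytes each character takes. This sketch uses the "\u{...}" string escapes from PHP 7.0 onwards so the source file itself stays plain ASCII:

```php
<?php
// strlen() counts bytes, so it shows how wide each UTF-8 character is.
// "\u{...}" escapes need PHP 7.0 or later.
$samples = array(
    'latin A'       => "A",         // U+0041, ASCII: 1 byte (41)
    'e acute'       => "\u{00E9}",  // U+00E9: 2 bytes (c3 a9)
    'euro sign'     => "\u{20AC}",  // U+20AC: 3 bytes (e2 82 ac)
    'grinning face' => "\u{1F600}", // U+1F600: 4 bytes (f0 9f 98 80)
);

foreach ($samples as $name => $char) {
    printf("%-14s %d byte(s): %s\n", $name, strlen($char), bin2hex($char));
}
```

Note that "A" comes out as the single byte 41, exactly as it would in ASCII.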
STEP 1: Set up your text editor / IDE to talk in UTF8
I probably use a different text editor from you, so I won't go into the steps involved for any particular editor. What you need to do is set your editor so that:
- New files are created in UTF-8 format, without a BOM (more on this in a sec)
- Existing files are read as UTF-8 when the character set could not be detected
On BOMs
A BOM (or Byte Order Mark) is a character that appears at the very start of a text file, to indicate which character set it is encoded in. Since plain text files are the simplest type of file there is, they don't have headers or meta data to tell software what type of data they contain. As a particular character code could represent two totally different characters in two different character sets, a hint on which encoding is used becomes useful when it is no longer possible for software to guess the encoding based on the content. UTF-8 text files can optionally use a BOM to tell software that reads them that they contain UTF-8 data. If your editor supports Unicode, you won't see this character, as it will be removed from the top of the file when you open it, and written to the start of the file when you save it.
But, you probably don't want to use a BOM if you're building a site in PHP, since PHP will send the BOM bytes as output from the top of every included file, which can also cause "headers already sent" errors if you call header() afterwards. As long as your editor is set up to assume UTF-8 where appropriate, this shouldn't be a problem.
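If you suspect a stray BOM, it's easy to check for: in UTF-8 it's always the three bytes EF BB BF. Here's a small sketch (the helper names are hypothetical) that detects and strips it, plus a loop that reports PHP files in the current directory starting with one:

```php
<?php
// The UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded as UTF-8).
function has_utf8_bom($text)
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0;
}

function strip_utf8_bom($text)
{
    return has_utf8_bom($text) ? substr($text, 3) : $text;
}

// Report any PHP files in the current directory that start with a BOM.
foreach (glob('*.php') ?: array() as $file) {
    if (has_utf8_bom(file_get_contents($file))) {
        echo "BOM found in $file\n";
    }
}
```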
STEP 2: Add the appropriate <meta> tag to your HTML header
For HTML
<meta http-equiv="content-type" content="text/html; charset=utf-8">
For XHTML
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Another alternative for XHTML documents is to use the XML declaration to set the encoding for your web pages:
<?xml version="1.0" encoding="utf-8"?>
However, this approach has one serious downside: IE 6 will jump back to 1997 and render the page in quirks mode. So, it's best if you just stick to the meta tag approach.
STEP 3: Setup PHP to work with Unicode
Of course, Unicode isn't just about how data is stored on disk. Any program that handles that data needs to be able to handle multi-byte characters, and ensure that the data remains valid UTF-8 when it changes.
Unicode is not quite a first class citizen in PHP, so you'll have to do some tweaking to get it to grok UTF-8.
Firstly, you need to ensure that you have the mbstring extension enabled in your copy of PHP. If you're on Linux and using a packaged PHP, it may be installed by default. If not, it's probably just a case of adding it with:
$ yum install php-mbstring
...or whatever the equivalent is for your package manager.
On Mac OS X, Marc Liyanage's excellent PHP package includes it (there's a separate version for Leopard).
If you build PHP from source, all you need to do is make sure you add
--enable-mbstring
...to your configure string.
Assuming you have multi-byte support built-in, now you need to make sure PHP knows that you want to handle text as UTF-8 internally. Add the following to an include that gets parsed before anything else, and you should be good to go:
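//setup php for working with Unicode data
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
//no mb_http_input() call: it only reports the detected input encoding
//for an input type such as 'G' or 'P', it doesn't set one
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');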
If you're doing anything with strings other than reading them from a database and outputting them, you'll probably want to read about PHP's multi-byte functions. Basically, many string functions have multi-byte capable alternatives, with the prefix 'mb_'. So, substr() becomes mb_substr().
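Here's a quick illustration of why that matters. It assumes the mbstring extension is loaded, and uses PHP 7's "\u{...}" escapes to build "Résumé" (6 characters, 8 bytes in UTF-8):

```php
<?php
// substr() counts bytes; mb_substr() counts characters.
$word = "R\u{00E9}sum\u{00E9}"; // "Résumé"

$bytes = substr($word, 0, 2);             // "R" plus half of "é": invalid UTF-8
$chars = mb_substr($word, 0, 2, 'UTF-8'); // "Ré"

var_dump(strlen($word), mb_strlen($word, 'UTF-8')); // int(8) int(6)
var_dump(mb_check_encoding($bytes, 'UTF-8'));       // bool(false)
```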
STEP 4: Setup your database to store UTF-8
It's probably best to set UTF-8 at the database level, that is, specifying the charset for each database rather than relying on a single charset for the whole server. If you do things this way, you won't get caught out by the default character set changing when you move your database to a different server.
MySQL
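Sample SQL for creating a database using UTF-8:
CREATE DATABASE mydatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Be aware that MySQL's utf8 charset only stores characters of up to 3 bytes, so 4-byte characters such as emoji won't fit. On MySQL 5.5.3 and later, utf8mb4 (with a collation such as utf8mb4_unicode_ci) covers the full range.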
Postgres
In Postgres, you can create databases from the terminal:
$ createdb -E UTF8 mydatabase
...or in SQL:
CREATE DATABASE mydatabase WITH ENCODING 'UTF8';
STEP 5: Setup your database server to handle UTF-8
We also need to tell our database server that we want to talk to it in UTF-8.
MySQL
MySQL has a bewildering range of options for configuring charsets in my.cnf. The safest way to ensure your scripts are sending and receiving UTF-8 from MySQL is to set the character set of the connection _after_ you connect to the server, by sending these queries:
SET NAMES utf8;
SET CHARACTER SET utf8;
Postgres
For Postgres, it's nearly the same thing:
SET NAMES 'UTF8';
Phew!
It might sound like all this is a lot of effort, but once you build this in to your workflow, it becomes trivial, and your sites can enjoy Unicode goodness to the end of their days.
Posted by Ben @ 15:05 PM
It must have been months ago that I watched Linus Torvalds' presentation on Git. At that time, I remember thinking it probably wasn't for me. I already had a source control system that worked and did everything I wanted. I generally worked on projects on my own, and didn't need to worry about how long it took to merge other people's changes. Git was supposed to be better at branching, but I never branched my code anyway. Git had a distributed model, but the centralised approach worked fine for me. People were saying that Git was harder to use. Etc.
If I think back to before I started using source control a few years ago, I had a whole set of reasons for why I didn't need it. I largely wrote code on my own. I kept copious backups. A lot of my source code was stored in binary format RealBasic project files that couldn't be diffed. I didn't see why I should have to keep fiddling with the terminal every time I changed something. Excuses, excuses.
Of course, source control is one of those things that you find impossible to live without as soon as you start using it, and all the excuses I'd made became unimportant overnight once I began using SVN. Starting to use source control was a revelation: things that were fiddly became easy, and things that were time consuming became things the computer did for me.
After a fair amount of prevaricating, I've made the jump to Git for several larger projects. This switch has been a revelation too, albeit a smaller one than starting to use source control.
Unless you're the smartest and most organised person in the world (in which case, what are you doing reading this?) or are currently in higher education, you probably have limited time available to learn new things. Why should you devote time to learning Git, over and above, say, playing GTA? I'll note a few reasons why you should give Git a go, with an emphasis on migrating from Subversion, since that's the way I came in. However, it's worth noting that a fair few of the things I find so compelling about Git are features of other modern SCM systems too, so do shop around for the best deal.
Why do I want Git?
Apart from a really cool name, Git has lots of great things going for it. A couple of the obvious ones:
- Every checkout is a complete repository, so you can still commit, revert and explore the history of a project, even when you're offline
- De-centralised design means you can push batches of commits to another checkout (eg a repository that other developers publish their changes to) whenever you like
- It's very, very fast, compared to SVN at least. If this doesn't sound like a big deal, wait till you've been using Git for a couple of weeks - you'll be amazed at how much time you spent waiting for Subversion.
- Git keeps all its files in a single hidden (.git) directory at the top of your source tree, so you don't have .svn folders hiding all over your source
- Git is used to manage the Linux kernel, and Ruby on Rails, so people a great deal smarter than me think it's the bee's knees too!
- Git makes branching really easy
Hold on: I don't need Git because I never do branching!
Chances are, you never do branching because your source control system makes it hard. Branches in git are super-easy and super-quick to do:
Show the list of branches
$ git branch
Make a new branch and switch to it
$ git branch mybranch
$ git checkout mybranch
>>make changes<<
Commit changes
$ git commit -a
Switch back to main branch
$ git checkout master
Merge mybranch into main
$ git merge mybranch
There's no fiddling about with branch paths, and git won't waste time and disk space duplicating all the stuff in trunk, so you won't have to clean it up later. The fundamental difference between Git and SVN in this regard is that Git 'knows' about branches, while branches in SVN are simply a pattern you use in laying out your repository structure: each branch is just an 'svn copy' of an existing directory.(i) Git shields you from the details, so there's no need to bother with trunk/tags/branches folders ever again.
The fact that Git makes branching easy is a lot more important than it might sound. You'll find you start to make temporary branches as part of your day to day work - experiments to try things out. This helps remove one of the mental barriers to getting things done - what if I try something and it doesn't work - how much time will I waste trying to revert?
GitHub - like Trac, without the pain
Trac is a neat web frontend to SVN that allows you to browse your source history visually, and keep notes about your project in a wiki. It's also a real PITA to install in my experience...
GitHub is a software-as-a-service tool that's quite similar to Trac. They host (and back up(ii)) your Git repositories for you, so there's minimal setup required. It's beautifully designed and very easy to use. It's also pretty inexpensive ($7 a month for the cheapest version with private repositories), or free for open source projects.
Surely Git must be hard to install?
I installed git with MacPorts:
$ sudo port selfupdate
$ sudo port install git-core +svn
On Linux, I was able to build from source without any headaches, so if you aren't on Mac OS, you should still find Git trivial to install.
Super-helpful links
Git is certainly a little confusing when you arrive, as I did, with your head full of how things are supposed to work in another source control system. I found these pages particularly helpful in getting to grips with Git; I hope you will too:
Posted by Ben @ 11:05 AM