RegEx for Redirection: A Reference Guide

Regular expressions are handy and save a lot of time when you need to manipulate a lot of data quickly. When multiple strings share parts that are the same, those shared parts are considered a pattern. Regular expressions can be built to match those patterns and manipulate them in many ways.

Meta Characters

“A metacharacter is a character that has a special meaning (instead of a literal meaning) to a computer program, such as a shell interpreter or a regular expression engine.” (Source: http://en.wikipedia.org/wiki/Meta_Character.) For a full list of meta characters, visit php.net.

Examples of Meta Characters:

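^     Start of the string. In a redirect the string is the path after the domain, so matching begins at the root of the domain.
$     End of the string - nothing may follow what is on the left of this.
.     (period) Match any single character
\     Escape character: gives the next character a special meaning (\d, \w, \s) or makes a meta character literal (\.)
\d    Match a single digit
\D    Match a single non-digit
\w    Match a single word character (letter, digit, or underscore)
\W    Match a single non-word character
\s    Match a single white space character
\S    Match a single non-white space character
*     Match the item to the left 0 or more times
+     Match the item to the left one or more times
( )   Capture whatever is matched inside; the first group becomes $1, the second $2, and so on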
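
The meta characters in the table can be tried out directly. The following is a minimal sketch using Python's built-in re module, chosen here only for illustration (the Redirection plugin itself runs on PHP, and the sample string is made up). For plain ASCII input like this, the meta characters shown behave the same way.

```python
import re

sample = "post 42 by_admin"  # made-up sample text

digits = re.findall(r"\d", sample)       # every single digit
numbers = re.findall(r"\d+", sample)     # runs of one or more digits
words = re.findall(r"\w+", sample)       # runs of word characters
spaces = re.findall(r"\s", sample)       # every white space character
only_digits = re.sub(r"\D", "", sample)  # delete every non-digit

# * allows zero occurrences of the slash, + requires at least one
star_matches = re.fullmatch(r"/post/*", "/post") is not None
plus_matches = re.fullmatch(r"/post/+", "/post") is not None

print(digits, numbers, words, spaces, only_digits)
print(star_matches, plus_matches)
```

Note that \w treats the underscore as a word character, which is why by_admin comes back as a single match.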

*Note: If the character following the \ is not d, w, s (or another meta character letter), that character is treated literally. You can also escape a backslash: \d tells regex to match a digit, whereas \\d tells regex to look for a literal backslash followed by the letter d.

As another example, if you have a directory with spaces in the name such as “directory name,” you would need to escape the space (escape means put \ in front of something) as follows:

directory\ name == “directory name”
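
The escaping rules can be checked with a short sketch. This again assumes Python's re module purely for illustration; the file and folder names are made up.

```python
import re

# \. matches only a literal period; a bare . matches any character
escaped_dot = re.search(r"\.html$", "pagexhtml") is not None
bare_dot = re.search(r".html$", "pagexhtml") is not None

# \\d looks for a literal backslash followed by the letter d
literal_backslash_d = re.search(r"\\d", r"folder\d") is not None
digit_instead = re.search(r"\\d", "folder7") is not None

# An escaped space matches a literal space
escaped_space = re.fullmatch(r"directory\ name", "directory name") is not None

# re.escape adds the backslashes for you
auto_escaped = re.fullmatch(re.escape("directory name"), "directory name") is not None

print(escaped_dot, bare_dot, literal_backslash_d, digit_instead)
print(escaped_space, auto_escaped)
```

The unescaped period is the common trap: .html also matches xhtml, so file extensions should normally be written as \.html.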

Regular Expression Examples

== Strip “articles” out of the URL ==

http://www.bees.com/articles/bees/bee-left-in-the-cold
Regex:        ^    /articles/(.*)/       (.*)
Variables:         /          $1 /        $2
http://www.bees.com/        /bees/bee-left-in-the-cold
Resulting URL: http://www.bees.com/bees/bee-left-in-the-cold
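
As a rough check of this rewrite, here is a minimal sketch in Python's re module (an assumption for illustration; the plugin uses PHP). Python writes the captured groups as \1 and \2 in the replacement, where the plugin uses $1 and $2.

```python
import re

source = r"^/articles/(.*)/(.*)"   # pattern applied to the path after the domain
target = r"/\1/\2"                 # \1 and \2 stand in for $1 and $2

path = "/articles/bees/bee-left-in-the-cold"
section, slug = re.match(source, path).groups()
result = re.sub(source, target, path)

print(section, slug)
print(result)
```

Because the first (.*) is greedy, it gives back characters only until the last slash, so $1 holds everything between /articles/ and the final slash.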

== Strip out the .html from http://www.bees.com/filename.html ==

http://www.bees.com/honey_bees.html
Regex:     ^       /    (.*) \.html$
Variables:         /     $1   /
http://www.bees.com/honey_bees/
Resulting URL: http://www.bees.com/honey_bees/
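
The same rewrite can be sketched in Python's re module (used here only as an illustration, with \1 standing in for the plugin's $1). The second path is a made-up example of a URL that should be left alone.

```python
import re

source = r"^/(.*)\.html$"   # escaped period, anchored at the end of the path
target = r"/\1/"

result = re.sub(source, target, "/honey_bees.html")
untouched = re.sub(source, target, "/honey_bees.htm")  # no .html, so no change

print(result)
print(untouched)
```

The $ anchor matters here: without it, a path such as /page.html/extra would also be rewritten.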

== The difference between / and not / ==

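/title-of-post is not equal to /title-of-post/, however the pattern /title-of-post/*$ matches both of them, because /* allows zero or more trailing slashes.
If you are redirecting http://www.bees.com/honey-bee-shortage to http://www.bees.com/honey-bee-population-comeback, this is how you would do it in the Redirection plugin (with the Regex option enabled):

Source URL: ^/honey-bee-shortage/*$
Target URL: /honey-bee-population-comeback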
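
The trailing-slash behaviour can be checked with a small sketch. It assumes Python's re module for illustration, and the redirect helper is hypothetical; it is not how the plugin is implemented.

```python
import re

SOURCE = re.compile(r"^/honey-bee-shortage/*$")
TARGET = "/honey-bee-population-comeback"

def redirect(path):
    """Return the target path if the source pattern matches, else None."""
    return TARGET if SOURCE.search(path) else None

print(redirect("/honey-bee-shortage"))        # matches without the slash
print(redirect("/honey-bee-shortage/"))       # matches with the slash
print(redirect("/honey-bee-shortage/extra"))  # does not match
```

The $ after /* is what keeps longer paths from being caught by the same rule.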

== Detecting Digits ==

As shown in the Meta Characters section above, regex can match digits.

http://bees.com/   2012   /     06     /   regex-is-amazingly-powerful   /
Regex:     ^   /   (\d*?) /   (\d*?)   /             (.*?)               /    *$
Variables:     /     $1   /     $2     /              $3                 /

The output of the above would be just what the input is because I didn’t change anything – I just wrote down what everything was. However, if I wanted to display the URL above as http://bees.com/regex-is-amazingly-powerful/06/2012, here’s how I would do it:

http://bees.com/     $3   /    $2      /              $1
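
The reordering can be sketched as follows, assuming Python's re module for illustration (\1, \2, \3 stand in for $1, $2, $3). The sketch uses a lazy (.*?) for the last group so that the optional trailing slash is left to /*; a greedy (.*) would capture that slash as part of $3 and produce a double slash in the output.

```python
import re

source = r"^/(\d*?)/(\d*?)/(.*?)/*$"
target = r"/\3/\2/\1"

path = "/2012/06/regex-is-amazingly-powerful/"
year, month, slug = re.match(source, path).groups()
result = re.sub(source, target, path)

# Same result when the incoming URL has no trailing slash
result_no_slash = re.sub(source, target, "/2012/06/regex-is-amazingly-powerful")

print(year, month, slug)
print(result)
```

Writing (\d+) instead of (\d*?) would be a slightly stricter choice, since it refuses empty year or month segments.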

Here’s what it would look like in the Redirection plugin (with the Regex option enabled):

Source URL: ^/(\d*?)/(\d*?)/(.*?)/*$
Target URL: /$3/$2/$1

== Strip out part of a word ==

Occasionally you may have a WordPress category such as types-of-honey-1. Suppose that the -1 shouldn’t be in the category name, which is actually supposed to be types-of-honey. Suppose also that there are 1,241 articles in the types-of-honey-1 category, yet all the links point to types-of-honey. This would be a nightmare without regex.

Stripping out the -1 is very simple:

http://www.bees.com        /    types-of-honey-1    /    clover-honey/
Regex:     ^               /        (.*)-1          /    clover-honey/
Variable:                  /         $1             /    clover-honey/
Result: http://www.bees.com/    types-of-honey      /    clover-honey/
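
A minimal sketch of this rule in Python's re module (an assumption for illustration, with \1 in place of $1) is shown below. The second pattern is a hypothetical generalisation, not taken from the example above: it captures the article slug instead of hard-coding clover-honey, so one rule could cover every article in the category.

```python
import re

# The rule as written in the example: one specific article
source = r"^/(.*)-1/clover-honey/"
target = r"/\1/clover-honey/"
result = re.sub(source, target, "/types-of-honey-1/clover-honey/")

# Hypothetical generalisation: any article under the misnamed category
general_source = r"^/types-of-honey-1/(.*)$"
general_target = r"/types-of-honey/\1"
general_result = re.sub(general_source, general_target,
                        "/types-of-honey-1/manuka-honey/")

print(result)
print(general_result)
```

Naming the category explicitly in the general pattern avoids rewriting unrelated URLs that merely happen to contain -1/.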