Regular Expression Syntax
By Joe Gardiner Thursday, 3rd March 2011
Rules in the .htaccess and .caccess files are written using ‘regular expressions’, together with some syntax specific to Apache.
Here is the basic regex syntax you need to write your own regular expressions in your .htaccess and .caccess files.
| Syntax | Description | Examples |
|---|---|---|
| . | Matches any single character except a line break. | The dot is a placeholder for any single character. `th.` matches the and thy, as well as th followed by any other single character. |
| * | Matches zero or more of the previous character. | The `*` states that the previous character may appear zero or more times in sequence. `th.*` matches the, thistle, thought, and so on: it is th followed by zero or more of the previous character, which here is `.` (any character), so it matches every word starting with th. `to*` matches only t, to, too, and so on, because the `*` follows the o, so the pattern is t followed by any number of o’s, including none. `tre*..` matches tread and treat, because it is tr followed by zero or more e’s, followed by any two characters. |
| + | Matches one or more of the character before the + symbol. | The plus sign states that the previous character must appear one or more times in sequence. `tre+` matches tre and tree. `tre+..` matches trees and treat, because the + is satisfied by the e in tre and two characters follow, e.g. es or at. |
| ? | Matches zero or one of the previous character. | The `?` states that the previous character is optional: it may appear once or not at all. `flo?at` matches flat and float, because the `?` makes the o optional. |
| ( ) | Groups patterns together. | Parentheses group patterns of characters so they can be treated as a single unit, for example to combine several alternatives. `(tall\|short)` matches tall and short, because the parentheses group the two alternatives: tall or short. |
| [] | Matches any one character from the set inside the brackets. | Square brackets hold a set of characters, and the whole bracket expression matches a single character from that set. `ha[ts]` matches hat and has, because it is ha followed by one character from the set inside the square brackets. Here’s another example: `f[aou]r` matches far, for, and fur. Combined with the + symbol, `r[aeiou]+t` matches rat, ret, rot, and rut, and also riot and root, because the + allows one or more characters from the set, so any run of vowels can appear before the t. |
| [^] | Matches any one character that is not in the set. | A caret (^) at the start of the square brackets negates the set, so it matches any single character that is not listed inside the brackets. This is useful when it is easier to list the characters to exclude than the ones to allow; for example, `[^0-9]` matches any character that is not a digit. `t[^aeiou]+.*s` matches thanks, this, and trappings, because it is t followed by one or more characters that are not vowels, followed by zero or more of any character, followed by an s. |
| {min,max} | Sets the number of times the preceding item must occur. | Curly brackets specify how many times the preceding character, or character from a set or range, should appear. `[a-z]{3}` matches exactly three consecutive lower case letters. `[a-z]{5,10}` matches between 5 and 10 consecutive lower case letters. |
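
If you want to try the patterns from the table before putting them in a live file, a short script can check them for you. The sketch below uses Python's `re` module purely as an assumption for illustration: Apache has its own regex engine, but the basic syntax covered here should behave the same way for these simple patterns. The helper names `matches_whole_word` and `check_examples` are made up for this example, and a few of the sample words (such as abc and hello) are not from the table.

```python
import re

# Patterns and sample words based on the table above.
EXAMPLES = {
    r"th.": ["the", "thy"],
    r"th.*": ["the", "thistle", "thought"],
    r"to*": ["t", "to", "too"],
    r"tre*..": ["tread", "treat"],
    r"tre+..": ["trees", "treat"],
    r"flo?at": ["flat", "float"],
    r"(tall|short)": ["tall", "short"],
    r"ha[ts]": ["hat", "has"],
    r"f[aou]r": ["far", "for", "fur"],
    r"r[aeiou]+t": ["rat", "ret", "rot", "rut", "riot", "root"],
    r"t[^aeiou]+.*s": ["thanks", "this", "trappings"],
    r"[a-z]{3}": ["abc"],
    r"[a-z]{5,10}": ["hello", "expression"],
}


def matches_whole_word(pattern, word):
    """Return True if the pattern matches the entire word (hypothetical helper)."""
    return re.fullmatch(pattern, word) is not None


def check_examples(examples=EXAMPLES):
    """Return the (pattern, word) pairs that did NOT match; empty means all passed."""
    return [
        (pattern, word)
        for pattern, words in examples.items()
        for word in words
        if not matches_whole_word(pattern, word)
    ]


if __name__ == "__main__":
    failures = check_examples()
    print("all examples match" if not failures else failures)
```

Note that `re.fullmatch` only succeeds when the pattern covers the whole word, which is how the examples in the table are described. A pattern used without that restriction can also match part of a longer string.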