Notes on Apache URL rewriting and regular expressions
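mod_rewrite is an Apache module that lets the server modify the URL of an incoming request. Each incoming URL is compared against a series of rules. Every rule contains a regular expression that detects one particular pattern. If that pattern is found in the URL and the rule's preset conditions are satisfied, the matched pattern is replaced by a preset string or triggers a preset action.
This process continues until no unprocessed rules remain or the process is explicitly stopped.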
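This can be summarized in three points:
- The rules form an ordered set and are processed in sequence
- When a rule matches, the conditions attached to that rule are checked as well
- If every check is a go, a substitution or some other action is performed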
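The three points above can be modeled in a few lines of JavaScript. This is only a toy sketch of the processing loop, not how Apache implements mod_rewrite; the `rewrite` function, the shape of the rule objects, and the `last` property are hypothetical names made up for illustration.

```javascript
// Toy model of ordered rule processing (hypothetical, not Apache's implementation).
function rewrite(url, rules) {
  let current = url;
  for (const rule of rules) {
    // 1. Does the rule's pattern match the current URL?
    if (!rule.pattern.test(current)) continue;
    // 2. Are all conditions attached to this rule satisfied?
    const conditions = rule.conditions || [];
    if (!conditions.every((cond) => cond(current))) continue;
    // 3. Everything is a "go": apply the substitution.
    current = current.replace(rule.pattern, rule.substitution);
    // Stop explicitly if the rule asks for it (similar in spirit to the [L] flag).
    if (rule.last) break;
  }
  return current;
}

const rules = [
  {
    pattern: /^\/products\/(\d+)$/,
    conditions: [(u) => !u.includes('..')],
    substitution: '/index.php?page=products&id=$1',
    last: true,
  },
  { pattern: /^\/(.*)$/, substitution: '/404.html' },
];

console.log(rewrite('/products/42', rules)); // /index.php?page=products&id=42
console.log(rewrite('/unknown', rules));     // /404.html
```

The first rule stops processing once it fires; without `last: true` the catch-all rule would rewrite the result again.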
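Benefits of mod_rewrite
Some of the benefits are obvious, others less so:
mod_rewrite is very commonly used to turn ugly, hard-to-understand URLs into so-called "friendly" or "clean" URLs.
In addition, the rewritten URLs are search-engine friendly.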
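Regular expression tokens:
\s{2,} two or more whitespace characters
\1 backreference: matches the same text as capture group 1
\\ matches a literal '\'
\b word boundary position, e.g. next to whitespace or at the start or end of the string
\B not a word boundary position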
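A short sketch of these tokens in JavaScript; the sample strings and variable names are made up for illustration.

```javascript
// \1 refers back to whatever capture group 1 matched: detect a doubled word.
const doubled = /\b(\w+) \1\b/;
const hasDouble = doubled.test('this is is a typo'); // true
const noDouble = doubled.test('this is a test');     // false

// \s{2,} matches runs of two or more whitespace characters.
const collapsed = 'a  b   c d'.replace(/\s{2,}/g, ' ');

// \b matches at word boundaries, \B at every other position.
const wholeWord = /\bcat\b/.test('concatenate');  // false
const insideWord = /\Bcat\B/.test('concatenate'); // true

console.log(hasDouble, noDouble, collapsed, wholeWord, insideWord);
// true false 'a b c d' false true
```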
(?=ABC) Positive lookahead. Matches a group after your main expression without including it in the result.
(?!ABC) Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).
(?<=ABC) Positive lookbehind. Matches a group before your main expression without including it in the result.
(?<!ABC) Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).
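The four lookaround forms can be compared side by side on one sample string. This sketch assumes a JavaScript engine with lookbehind support (ES2018 or later); the sample text and variable names are made up.

```javascript
const text = 'price: $42, cost: 17 USD, id 99';

// Positive lookahead: digits followed by " USD"; " USD" is not part of the match.
const beforeUsd = text.match(/\d+(?= USD)/g);

// Negative lookahead: digit runs not followed by another digit or by " USD".
const notBeforeUsd = text.match(/\d+(?!\d| USD)/g);

// Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
const afterDollar = text.match(/(?<=\$)\d+/g);

// Negative lookbehind: digit runs not preceded by "$" or by another digit.
const notAfterDollar = text.match(/(?<![$\d])\d+/g);

console.log(beforeUsd);      // [ '17' ]
console.log(notBeforeUsd);   // [ '42', '99' ]
console.log(afterDollar);    // [ '42' ]
console.log(notAfterDollar); // [ '17', '99' ]
```

The extra `\d` inside the negative forms prevents the engine from backtracking to a shorter digit run such as `1` out of `17`.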
*? : matches zero or more of the preceding token. This is a lazy match and will match as few characters as possible before satisfying the next token
+? : matches one or more of the preceding token. This is a lazy match and will match as few characters as possible before satisfying the next token
{5} : matches exactly 5 of the preceding token
{2,5} : matches 2 to 5 of the preceding token. Greedy match
{2,5}? : matches 2 to 5 of the preceding token. Lazy match
(ABC) groups multiple tokens together and creates a capture group. This allows you to apply quantifiers to the full group
(?:ABC) groups multiple tokens without creating a capture group
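A minimal sketch of the counted quantifiers and the two kinds of groups; the sample strings are made up for illustration.

```javascript
const digits = '1234567';

const exactlyFive = digits.match(/\d{5}/)[0];  // exactly five digits
const greedy = digits.match(/\d{2,5}/)[0];     // greedy: as many as allowed
const lazy = digits.match(/\d{2,5}?/)[0];      // lazy: as few as allowed

// A capture group lets the quantifier apply to the whole group
// and stores the text of its last repetition.
const captured = 'abcabcabc'.match(/(abc){2}/);

// A non-capturing group groups the tokens but stores nothing.
const grouped = 'abcabcabc'.match(/(?:abc){2}/);

console.log(exactlyFive, greedy, lazy);  // 12345 12345 12
console.log(captured[0], captured[1]);   // abcabc abc
console.log(grouped.length);             // 1 (only the full match)
```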
$$ : escaped $ symbol
$` : inserts the portion of the string that precedes the match
$& : inserts the matched substring
$' : inserts the portion of the string that follows the match
$1 : inserts the result of the first capture group
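These replacement tokens are the ones understood by JavaScript's `String.prototype.replace`. A small sketch on a made-up sample string:

```javascript
const s = 'abc-123-xyz';
const re = /\d+/;

const matched = s.replace(re, '[$&]');  // $& = the matched text
const before = s.replace(re, '[$`]');   // $` = the text before the match
const after = s.replace(re, "[$']");    // $' = the text after the match
const dollar = s.replace(re, '$$');     // $$ = a literal dollar sign
const swapped = s.replace(/(\d)(\d+)/, '$2$1'); // $1, $2 = capture groups

console.log(matched); // abc-[123]-xyz
console.log(before);  // abc-[abc-]-xyz
console.log(after);   // abc-[-xyz]-xyz
console.log(dollar);  // abc-$-xyz
console.log(swapped); // abc-231-xyz
```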
m multiline: ^ and $ match at the start and end of each line
i ignore case
s dotall: . also matches line breaks; when dotall is off, . matches any character except line breaks
g search globally
var str = 'The price of tomato is 5, the price of apple is 10';
str.replace(/(\d+)/g, '$1.00'); // 'The price of tomato is 5.00, the price of apple is 10.00'
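The flags can be tried out on a small multi-line string. This is only a sketch; the sample text is made up, and the `s` flag assumes an ES2018 or later engine.

```javascript
const lines = 'Apple\nbanana\napple pie';

// i: ignore case; g: find every match instead of only the first.
const apples = lines.match(/apple/gi);

// m: ^ and $ match at the start and end of each line.
const lineStarts = lines.match(/^\w+/gm);

// Without s, "." stops at a line break; with s (dotall) it crosses lines.
const withoutDotAll = /Apple.banana/.test(lines);
const withDotAll = /Apple.banana/s.test(lines);

console.log(apples);        // [ 'Apple', 'apple' ]
console.log(lineStarts);    // [ 'Apple', 'banana', 'apple' ]
console.log(withoutDotAll); // false
console.log(withDotAll);    // true
```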
? zero or one
\ escapes the following character so that it is matched literally:
\. \\ \+ \* \? \^ \$ \[ \] \( \) \{ \} \/ \' \#
[ABC] Any single character in ABC set
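/th(a|i)nk/ matches the same strings as /th[ai]nk/
() : capturing group. /(.+)@(163|126|188)\.com$/ checks the format of a NetEase email address
(?:) : non-capturing group. /(.+)@(?:163|126|188)\.com$/
In JavaScript, str.match(regexp) returns the captured substrings so that they can be used: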
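The difference between the two patterns shows up in the array returned by `match`. A minimal sketch; the addresses are made-up samples.

```javascript
const capturing = /(.+)@(163|126|188)\.com$/;
const nonCapturing = /(.+)@(?:163|126|188)\.com$/;

// Array.from drops the extra index/input properties of the match result.
const withDomain = Array.from('someone@163.com'.match(capturing));
const withoutDomain = Array.from('someone@163.com'.match(nonCapturing));
const rejected = 'someone@example.com'.match(capturing);

console.log(withDomain);    // [ 'someone@163.com', 'someone', '163' ]
console.log(withoutDomain); // [ 'someone@163.com', 'someone' ]
console.log(rejected);      // null
```

Both patterns accept the same addresses; the non-capturing form simply does not store the domain number.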
var url = 'http://blog.163.com/album?id=1#comment';
var reg = /(https?:)\/\/([^\/]+)(\/[^\?]*)?(\?[^#]*)?(#.*)?/;
var arr = url.match(reg);
var protocol = arr[1]; // http:
var host = arr[2];     // blog.163.com
var pathname = arr[3]; // /album
var search = arr[4];   // ?id=1
var hash = arr[5];     // #comment
+ one or more
* zero or more
| alternation: matches the full expression before or after the '|', e.g. (https?|ftp)://
^ matches the beginning of the string
$ matches the end of the string
$1 refers to the first capture group
$2 refers to the second capture group
?: inside parentheses makes the group non-capturing (^.+(?:jpg|png|gif)$)
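A sketch of alternation, anchors, and group references in JavaScript. The URLs and file names are made-up samples.

```javascript
// ^ anchors at the start; | picks one of the alternatives inside the group.
const scheme = /^(https?|ftp):\/\//;
const httpsScheme = 'https://example.com'.match(scheme)[1];
const ftpScheme = 'ftp://example.com'.match(scheme)[1];
const mailto = scheme.test('mailto:someone');

// $ anchors at the end; (?:...) groups the extensions without capturing.
const image = /^.+(?:jpg|png|gif)$/;
const isImage = image.test('photo.png');
const isBackup = image.test('photo.png.bak');

// $1 and $2 refer to the first and second capture groups in a replacement.
const swapped = 'album/42'.replace(/^(\w+)\/(\d+)$/, '$2-$1');

console.log(httpsScheme, ftpScheme, mailto); // https ftp false
console.log(isImage, isBackup);              // true false
console.log(swapped);                        // 42-album
```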
[^ABC] Any single character not in the set
[a-z] any single character in the a-z range
[^b-e] any single character that is not in range b-e
[0-9] any single digit
[\w'-] any word character, single quote, or hyphen
\t tab; \r carriage return; \n newline
\xFF specifies a character by its hexadecimal code
\xA9 => copyright symbol
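The character sets and hex escapes can be checked quickly in JavaScript. A small sketch with made-up sample strings:

```javascript
// [^ABC]: strip every character that is not A, B, or C.
const onlyABC = 'ABCDEF'.replace(/[^ABC]/g, '');

// [a-z]: collect the lowercase letters.
const lower = 'Hello World'.match(/[a-z]/g).join('');

// [^b-e]: strip everything outside the range b-e.
const bToE = 'abcdef'.replace(/[^b-e]/g, '');

// [\w'-]: word characters, single quotes, and hyphens.
const words = "rock-'n'-roll isn't dead".match(/[\w'-]+/g);

// \xA9 is the copyright symbol (U+00A9).
const hasCopyright = /\xA9/.test('\u00A9 2013');

console.log(onlyABC, lower, bToE); // ABC elloorld bcde
console.log(words);                // [ "rock-'n'-roll", "isn't", 'dead' ]
console.log(hasCopyright);         // true
```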
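How do you match a string that does not contain a given run of consecutive characters?
^(?!.*ab).*$ : matches only if "ab" does not appear anywhere in the string
How do you use lazy mode to match as few characters as possible?
alert( "123 456".match(/\d+ \d+?/g) ); // 123 4
Note that the ? in the code above is what switches the digit match to lazy (minimal) mode!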
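A quick check of this pattern in JavaScript; the test strings are made up.

```javascript
// The negative lookahead at the start rejects any string with "ab" anywhere in it.
const noAb = /^(?!.*ab).*$/;

const results = ['acb', 'ba', '', 'cab', 'ab'].map((s) => noAb.test(s));
console.log(results); // [ true, true, true, false, false ]
```

Letters in the wrong order ("ba") or separated by another character ("acb") still pass, because only the consecutive sequence "ab" is excluded.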
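Comparing the greedy and lazy forms on the same input makes the difference visible. A minimal sketch; the HTML snippet is a made-up sample.

```javascript
// Greedy \d+ takes every digit it can; lazy \d+? stops after the first one.
const greedyDigits = '123 456'.match(/\d+ \d+/g);
const lazyDigits = '123 456'.match(/\d+ \d+?/g);

// The same effect with .* and .*? between a pair of tags.
const html = '<b>one</b> and <b>two</b>';
const greedyTags = html.match(/<b>.*<\/b>/g);
const lazyTags = html.match(/<b>.*?<\/b>/g);

console.log(greedyDigits); // [ '123 456' ]
console.log(lazyDigits);   // [ '123 4' ]
console.log(greedyTags);   // [ '<b>one</b> and <b>two</b>' ]
console.log(lazyTags);     // [ '<b>one</b>', '<b>two</b>' ]
```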
http://javascript.info/regexp-greedy-and-lazy
https://24ways.org/2013/url-rewriting-for-the-fearful/ billed as the most human-readable article about URL rewriting