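# Regular Expressions

## 1. Regex Basics

### 1.1 Creating Regular Expressions

Clojure creates regular expressions with the `#"..."` literal syntax. The reader compiles each literal into a `java.util.regex.Pattern`, and backslashes are written once rather than doubled as they would be in a string: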
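```clojure
#"\d+"                  ; one or more digits
#"hello"                ; the literal text "hello"
#"[a-z]+"               ; one or more lowercase letters
#"^\d{4}-\d{2}-\d{2}$"  ; a date such as 2024-03-27, anchored at both ends
```

Every regex literal is a Java `Pattern` object:

```clojure
(class #"\d+")  ;; => java.util.regex.Pattern
```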
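### 1.2 Using re-pattern

`re-pattern` builds a pattern from a string at runtime. Because the pattern is written as an ordinary string literal, each regex backslash must be doubled:

```clojure
(re-pattern "\\d+")               ; same pattern as #"\d+"
(re-pattern "[a-z]+")
(def pattern (re-pattern "\\w+"))
```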
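When part of a pattern comes from user input, characters such as `.` or `*` would be read as regex syntax. A minimal sketch, assuming the input should be matched literally, escapes it with Java's `java.util.regex.Pattern/quote` before handing it to `re-pattern` (`literal-pattern` is a hypothetical helper name):

```clojure
(import '(java.util.regex Pattern))

(defn literal-pattern
  "Hypothetical helper: builds a Pattern that matches the string s literally."
  [s]
  (re-pattern (Pattern/quote s)))   ; quote wraps s in \Q...\E

;; Unquoted, "." matches any character, so "a.b" also matches "axb"
(re-matches (re-pattern "a.b") "axb")       ;; => "axb"
;; Quoted, only the literal text "a.b" matches
(re-matches (literal-pattern "a.b") "axb")  ;; => nil
(re-matches (literal-pattern "a.b") "a.b")  ;; => "a.b"
```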
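### 1.3 Regex Syntax

Clojure uses Java's regex syntax. The most common elements:

| Syntax | Meaning | Example |
|---|---|---|
| `.` | Any character except a line terminator | `a.c` matches "abc" |
| `^` | Start of input (start of each line with `(?m)`) | `^hello` |
| `$` | End of input (end of each line with `(?m)`) | `world$` |
| `*` | Zero or more times | `ab*c` |
| `+` | One or more times | `ab+c` |
| `?` | Zero or one time | `ab?c` |
| `\d` | Digit | `\d+` |
| `\w` | Word character (`[a-zA-Z_0-9]`) | `\w+` |
| `\s` | Whitespace character | `\s+` |
| `[abc]` | Character class | `[aeiou]` |
| `[^abc]` | Negated character class | `[^0-9]` |
| `()` | Capturing group | `(ab)+` |
| `\|` | Alternation (or) | `a\|b` |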
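## 2. Matching Functions

### 2.1 re-matches

Succeeds only if the pattern matches the entire string:

```clojure
(re-matches #"\d+" "123")              ;; => "123"
(re-matches #"\d+" "123abc")           ;; => nil (the trailing "abc" is not matched)
(re-matches #"(\d+)-(\d+)" "123-456")  ;; => ["123-456" "123" "456"]
```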
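Return value:

- No match: `nil`
- Pattern without groups: the matched string
- Pattern with groups: a vector holding the full match followed by each group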
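### 2.2 re-find

Returns the first match anywhere in the string (a vector if the pattern has groups), or `nil`:

```clojure
(re-find #"\d+" "abc123def456")  ;; => "123"
(re-find #"(\d+)" "abc123def")   ;; => ["123" "123"]
(re-find #"\d+" "no numbers")    ;; => nil
```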
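### 2.3 re-seq

Returns a lazy sequence of all matches (`nil` if there are none):

```clojure
(re-seq #"\d+" "abc123def456ghi789")   ;; => ("123" "456" "789")
(re-seq #"[a-z]+" "Hello World Clojure") ;; => ("ello" "orld" "lojure")
(re-seq #"\w+" "one two three")        ;; => ("one" "two" "three")
```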
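Since `re-seq` returns an ordinary sequence, it composes with the rest of the sequence library. For example, a hypothetical `word-frequencies` helper can count words case-insensitively:

```clojure
(require '[clojure.string :as str])

(defn word-frequencies
  "Hypothetical helper: counts case-insensitive occurrences of each word token."
  [text]
  (->> (re-seq #"\w+" text)   ; lazy seq of word tokens (nil if none)
       (map str/lower-case)   ; normalize case before counting
       frequencies))          ; token -> count map

(word-frequencies "The cat and the hat")
;; => {"the" 2, "cat" 1, "and" 1, "hat" 1}
```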
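### 2.4 re-matcher

Creates a `java.util.regex.Matcher`. Each call to `re-find` on it advances to the next match and returns `nil` once the matches are exhausted. A Matcher is mutable, so do not share one between threads:

```clojure
(def m (re-matcher #"\d+" "abc123def456"))
(re-find m)  ;; => "123"
(re-find m)  ;; => "456"
(re-find m)  ;; => nil
```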
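A `Matcher` also exposes each match's position through `.start` and `.end`. A minimal sketch (the helper name `matches-with-positions` is hypothetical) that loops over a matcher and collects every match together with its offsets:

```clojure
(import '(java.util.regex Matcher))

(defn matches-with-positions
  "Hypothetical helper: returns every match of re in s with its start/end offsets."
  [re s]
  (let [^Matcher m (re-matcher re s)]
    (loop [acc []]
      (if (.find m)                           ; advance to the next match
        (recur (conj acc {:match (.group m)
                          :start (.start m)    ; inclusive start index
                          :end   (.end m)}))   ; exclusive end index
        acc))))

(matches-with-positions #"\d+" "abc123def456")
;; => [{:match "123", :start 3, :end 6} {:match "456", :start 9, :end 12}]
```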
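## 3. Capture Groups

### 3.1 Basic Groups

```clojure
(re-matches #"(\d+)-(\d+)-(\d+)" "2024-03-27")
;; => ["2024-03-27" "2024" "03" "27"]
(re-find #"(\w+)@(\w+)\.(\w+)" "user@example.com")
;; => ["user@example.com" "user" "example" "com"]
```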
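### 3.2 Named Capture Groups

Java (since version 7) supports named groups written as `(?<name>...)`, but Clojure's `re-matches`, `re-find`, and `re-seq` return groups by position only. A simple workaround is to destructure the positional groups into a map:

```clojure
(defn parse-date [s]
  (let [[_ year month day] (re-matches #"(\d{4})-(\d{2})-(\d{2})" s)]
    (when year
      {:year year :month month :day day})))

(parse-date "2024-03-27")  ;; => {:year "2024", :month "03", :day "27"}
```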
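Java itself supports named groups written as `(?<name>...)`, and they can be read from the underlying `java.util.regex.Matcher` with `.group` and the group name. A sketch with a hypothetical `named-groups` helper; the caller passes the group names explicitly:

```clojure
(import '(java.util.regex Matcher))

(defn named-groups
  "Hypothetical helper: if re matches all of s, returns a map from each
  group name (as a keyword) to its captured text; otherwise nil."
  [re s names]
  (let [^Matcher m (re-matcher re s)]
    (when (.matches m)                                   ; full match, like re-matches
      (into {} (for [n names]
                 [(keyword n) (.group m ^String n)]))))) ; Matcher.group(String name)

(named-groups #"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})"
              "2024-03-27"
              ["year" "month" "day"])
;; => {:year "2024", :month "03", :day "27"}
```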
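### 3.3 Nested Groups

Groups are numbered by the position of their opening parenthesis, so the outer group comes first:

```clojure
(re-matches #"((\d+)-(\d+))" "123-456")  ;; => ["123-456" "123-456" "123" "456"]
```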
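### 3.4 Non-Capturing Groups

`(?:...)` groups a subpattern without capturing it, so no group entry appears in the result:

```clojure
(re-find #"(?:abc|def)+" "abcabc")  ;; => "abcabc"
```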
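The difference is visible in the return value. With a capturing group, `re-find` returns a vector, and a repeated group keeps only its last repetition:

```clojure
;; Capturing group: [full-match capture-of-the-last-repetition]
(re-find #"(abc|def)+" "abcdef")    ;; => ["abcdef" "def"]
;; Non-capturing group: no groups, so only the matched string is returned
(re-find #"(?:abc|def)+" "abcdef")  ;; => "abcdef"
```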
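## 4. Replacement

### 4.1 Basic Replacement with clojure.string/replace

`str/replace` replaces every match of the pattern:

```clojure
(require '[clojure.string :as str])

(str/replace "hello world" #"o" "0")       ;; => "hell0 w0rld"
(str/replace "abc123def456" #"\d+" "NUM")  ;; => "abcNUMdefNUM"
```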
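### 4.2 Replacing with a Function

The function receives each match and returns its replacement:

```clojure
(str/replace "abc123def456" #"\d+"
             (fn [match]
               (str "<" match ">")))
;; => "abc<123>def<456>"
```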
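When the pattern contains capture groups, the replacement function receives a vector (the full match followed by the groups) instead of a string, so it can be destructured. A sketch with a hypothetical `iso->dmy` helper:

```clojure
(require '[clojure.string :as str])

(defn iso->dmy
  "Hypothetical helper: rewrites every yyyy-MM-dd date in s as dd.MM.yyyy."
  [s]
  (str/replace s #"(\d{4})-(\d{2})-(\d{2})"
               (fn [[_ year month day]]        ; vector: full match, then each group
                 (str day "." month "." year))))

(iso->dmy "2024-03-27 and 2025-01-02")  ;; => "27.03.2024 and 02.01.2025"
```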
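### 4.3 Replacing with Groups

In a replacement string, `$1`, `$2`, ... refer to the capture groups:

```clojure
(str/replace "hello world" #"(\w+)" "$1!")                           ;; => "hello! world!"
(str/replace "2024-03-27" #"(\d{4})-(\d{2})-(\d{2})" "$2/$3/$1")  ;; => "03/27/2024"
```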
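Because `$` introduces a group reference in replacement strings, a literal dollar sign has to be quoted. `clojure.string/re-quote-replacement` does that:

```clojure
(require '[clojure.string :as str])

;; Unquoted, "$5" would be read as a reference to group 5, which #"\d+" does not
;; have, so the replacement would throw an exception instead of inserting "$5".
(str/replace "price: 5" #"\d+" (str/re-quote-replacement "$5"))
;; => "price: $5"
```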
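### 4.4 Replacing Only the First Match

`str/replace-first` accepts either a plain string or a regex as the match:

```clojure
(str/replace-first "aaa bbb aaa" "aaa" "XXX")   ;; => "XXX bbb aaa"
(str/replace-first "123-456-789" #"\d+" "NUM")  ;; => "NUM-456-789"
```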
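## 5. Splitting Strings

### 5.1 Basic Splitting

```clojure
(str/split "a,b,c" #",")                  ;; => ["a" "b" "c"]
(str/split "one two three" #"\s+")        ;; => ["one" "two" "three"]
(str/split "line1\nline2\nline3" #"\n")   ;; => ["line1" "line2" "line3"]
```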
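### 5.2 Limiting the Number of Parts

The optional limit caps the number of parts; the last part holds the unsplit remainder:

```clojure
(str/split "a,b,c,d,e" #"," 3)  ;; => ["a" "b" "c,d,e"]
```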
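Java's splitting rules also drop trailing empty strings, which can be surprising when parsing CSV-like data. A short illustration; passing a negative limit keeps them:

```clojure
(require '[clojure.string :as str])

;; By default, as with Java's Pattern.split, trailing empty strings are dropped
(str/split "a,b,," #",")     ;; => ["a" "b"]
;; A negative limit keeps them
(str/split "a,b,," #"," -1)  ;; => ["a" "b" "" ""]
```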
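### 5.3 Splitting While Keeping Delimiters

A zero-width lookahead `(?=...)` splits *before* each delimiter without consuming it, so each delimiter stays at the start of the following piece:

```clojure
;; Split before each match of pattern (a regex source string), keeping the delimiter
(defn split-keep [pattern s]
  (str/split s (re-pattern (str "(?=" pattern ")"))))

(split-keep "," "a,b,c")  ;; => ["a" ",b" ",c"]
```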
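A different technique, if each delimiter should be its own element, is to tokenize instead of split: one alternation describes both the delimiter and the text between delimiters, and `re-seq` collects them in order. A sketch for a comma delimiter (note that no empty token appears between two adjacent commas):

```clojure
;; Either a single comma, or a run of non-comma characters
(re-seq #",|[^,]+" "a,b,,c")
;; => ("a" "," "b" "," "," "c")
```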
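## 6. Common Patterns

### 6.1 Numbers

```clojure
(def int-pattern #"-?\d+")              ; optional minus sign, then digits
(def float-pattern #"-?\d+\.\d+")       ; requires a fractional part
(def hex-pattern #"0x[0-9a-fA-F]+")     ; 0x prefix (lowercase x only)
(def number-pattern #"-?\d+\.?\d*")     ; integer or decimal; also accepts "12."
```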
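Because `re-seq` returns strings, extracting numbers is usually followed by parsing. A minimal sketch (the helper name `extract-numbers` is hypothetical) that combines the integer and decimal cases into one pattern, using a non-capturing group so `re-seq` still returns plain strings rather than vectors:

```clojure
(defn extract-numbers
  "Hypothetical helper: finds integers and decimals in text and parses them as doubles."
  [text]
  (->> (re-seq #"-?\d+(?:\.\d+)?" text)   ; optional fractional part, not captured
       (map #(Double/parseDouble %))))

(extract-numbers "x=3, y=-1.5, z=10")              ;; => (3.0 -1.5 10.0)
(reduce + (extract-numbers "x=3, y=-1.5, z=10"))   ;; => 11.5
```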
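### 6.2 Dates and Times

These patterns check only the shape of the text, not the values; `date-pattern` also accepts `"2024-13-45"`.

```clojure
(def date-pattern #"\d{4}-\d{2}-\d{2}")                            ; e.g. 2024-03-27
(def time-pattern #"\d{2}:\d{2}:\d{2}")                            ; e.g. 10:30:45
(def datetime-pattern #"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")      ; e.g. 2024-03-27T10:30:45
```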
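A regex can only check the shape of a date, so a string such as `2024-02-30` passes a `\d{4}-\d{2}-\d{2}` check. One option, sketched here with a hypothetical `valid-date?`, is to follow the cheap regex check with `java.time.LocalDate/parse`, which rejects impossible dates:

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeParseException))

(defn valid-date?
  "Hypothetical helper: true only if s has the yyyy-MM-dd shape AND is a real date."
  [s]
  (boolean
    (and (re-matches #"\d{4}-\d{2}-\d{2}" s)      ; cheap shape check first
         (try (LocalDate/parse s)                 ; ISO parsing rejects impossible dates
              (catch DateTimeParseException _ nil)))))

(valid-date? "2024-02-29")  ;; => true (leap year)
(valid-date? "2024-02-30")  ;; => false
(valid-date? "2024-13-01")  ;; => false
```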
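### 6.3 Email, URL, and Phone Numbers

```clojure
;; Simplified email check; it does not implement the full RFC 5322 grammar
(def email-pattern #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
(def url-pattern #"https?://[^\s]+")       ; http(s) URL up to the next whitespace
(def phone-pattern #"\d{3}-\d{3}-\d{4}")   ; e.g. 123-456-7890
```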
### 6.4 Identifiers

```clojure
(def word-pattern #"[a-zA-Z_]\w*")   ; C/Java-style identifier
(def clojure-symbol-pattern #"[a-zA-Z*+!_?-][a-zA-Z0-9*+!_?-]*")  ; simplified Clojure symbol
```
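## 7. Practical Examples

### 7.1 Parsing Log Lines

```clojure
;; Groups: timestamp, level, message
(def log-pattern
  #"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)")

(defn parse-log [line]
  (let [[_ timestamp level message] (re-matches log-pattern line)]
    (when timestamp
      {:timestamp timestamp
       :level     level
       :message   message})))

(parse-log "2024-03-27 10:30:45 [INFO] Application started")
;; => {:timestamp "2024-03-27 10:30:45", :level "INFO", :message "Application started"}
```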
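### 7.2 Extracting URL Components

```clojure
;; Named url-parts-pattern so it does not shadow url-pattern from section 6.3
(def url-parts-pattern
  #"(\w+)://([^/:]+)(?::(\d+))?(/[^?]*)?(?:\?(.*))?")

(defn parse-url [url]
  ;; returns nil when the URL does not match
  (when-let [[_ protocol host port path query] (re-matches url-parts-pattern url)]
    {:protocol protocol
     :host     host
     :port     (when port (Integer/parseInt port))  ; port group is nil when absent
     :path     path
     :query    query}))

(parse-url "https://example.com:8080/path?query=value")
;; => {:protocol "https", :host "example.com", :port 8080, :path "/path", :query "query=value"}
```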
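For well-formed URLs, Java's `java.net.URI` class already splits out these components, which avoids maintaining a regex. A sketch of that swapped-in approach (the name `parse-url-with-uri` is hypothetical); note that the `URI.` constructor throws `java.net.URISyntaxException` on malformed input:

```clojure
(import '(java.net URI))

(defn parse-url-with-uri
  "Hypothetical alternative to a regex-based URL parser, delegating to java.net.URI."
  [url]
  (let [u (URI. url)]
    {:protocol (.getScheme u)
     :host     (.getHost u)
     :port     (let [p (.getPort u)] (when (not= p -1) p))  ; -1 means no port given
     :path     (.getPath u)
     :query    (.getQuery u)}))

(parse-url-with-uri "https://example.com:8080/path?query=value")
;; => {:protocol "https", :host "example.com", :port 8080, :path "/path", :query "query=value"}
```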
### 7.3 Validating Input

```clojure
(defn valid-email? [s]
  (boolean (re-matches #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" s)))

(valid-email? "user@example.com")  ;; => true
(valid-email? "invalid-email")     ;; => false

(defn valid-phone? [s]
  (boolean (re-matches #"\d{3}-\d{3}-\d{4}" s)))

(valid-phone? "123-456-7890")      ;; => true
```
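### 7.4 Analyzing Code

```clojure
;; Counts identifier-like tokens; a hyphenated Clojure name counts as several
(defn count-words [code]
  (count (re-seq #"[a-zA-Z_]\w*" code)))

;; Double-quoted string literals (escaped quotes inside a string are not handled)
(defn extract-strings [code]
  (re-seq #"\"[^\"]*\"" code))

;; Line comments starting with ";" (a ";" inside a string is matched too)
(defn find-comments [code]
  (re-seq #";[^\n]*" code))
```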
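The simple string pattern ends a literal at the first quote character, even when that quote is escaped with a backslash. A sketch of a more careful variant (the name `extract-strings-escaped` is hypothetical) treats a backslash plus the character after it as a single unit:

```clojure
(defn extract-strings-escaped
  "Hypothetical variant: finds double-quoted literals, allowing backslash escapes inside."
  [code]
  ;; quote, then any run of (non-quote non-backslash char | backslash + any char), then quote
  (re-seq #"\"(?:[^\"\\]|\\.)*\"" code))

;; The input text is: (println "say \"hi\"" "x")
(count (extract-strings-escaped "(println \"say \\\"hi\\\"\" \"x\")"))  ;; => 2
```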
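## 8. Performance Considerations

### 8.1 Precompiling Regexes

A `#"..."` literal is compiled once, when the code is read, so a literal inside a function is not recompiled on each call. Calling `re-pattern` inside a frequently called function does compile the pattern every time, so hoist such patterns into a `def`. Naming a pattern with `def` also lets several functions share it:

```clojure
(def email-regex #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

(defn valid-email? [s]
  (boolean (re-matches email-regex s)))
```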
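When a pattern has to be assembled at runtime, for example from a word supplied by the caller, `re-pattern` compiles it on every call. One way to avoid repeated compilation, sketched below with hypothetical names, is to cache compiled patterns with `memoize`. This is safe because `Pattern` objects are immutable and thread-safe, but the cache grows without bound, so it suits a small, fixed set of pattern strings:

```clojure
(import '(java.util.regex Pattern))

;; Caches one compiled Pattern per source string (entries are never evicted)
(def cached-pattern (memoize re-pattern))

(defn contains-word?
  "Hypothetical helper: does s contain word as a whole word (case-sensitive)?"
  [word s]
  (boolean (re-find (cached-pattern (str "\\b" (Pattern/quote word) "\\b")) s)))

(contains-word? "cat" "the cat sat")  ;; => true
(contains-word? "cat" "concatenate")  ;; => false
```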
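### 8.2 Avoiding Unintended Greedy Matching

`.*` is greedy: it first runs to the end of the input and backtracks to the *last* possible `</div>`. `.*?` is lazy and stops at the first one. With more than one element in the input, the results differ:

```clojure
(str/replace "<div>a</div>text<div>b</div>" #"<div>.*</div>" "")   ;; => ""
(str/replace "<div>a</div>text<div>b</div>" #"<div>.*?</div>" "")  ;; => "text"
```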
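### 8.3 Using Anchors

`re-matches` already requires the whole string to match, so `^` and `$` are redundant there. With `re-find`, the anchors make the search equivalent to a full match, and `^` lets every match attempt after the first position fail immediately:

```clojure
(re-matches #"^\d+$" "123")  ;; => "123" (anchors redundant)
(re-find #"^\d+$" "123")     ;; => "123"
```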
## 9. Regex Utilities

### 9.1 Testing Helper

```clojure
(defn test-regex [pattern text]
  {:matches (re-matches pattern text)
   :find    (re-find pattern text)
   :all     (re-seq pattern text)})

(test-regex #"\d+" "abc123def456")
;; => {:matches nil, :find "123", :all ("123" "456")}
```
### 9.2 Debugging Helper

```clojure
(defn explain-match [pattern text]
  (let [result (re-matches pattern text)]
    (cond
      (nil? result)    "No match"
      (string? result) (str "Full match: " result)
      (vector? result) (str "Matched with groups: " result))))

(explain-match #"(\d+)-(\d+)" "123-456")
;; returns the string: Matched with groups: ["123-456" "123" "456"]
```
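## 10. Common Pitfalls

### 10.1 Escaping

In a string passed to `re-pattern`, every regex backslash must be doubled; in a `#"..."` literal it is written once. To match a literal `.` or `\`, escape it in the regex:

```clojure
(re-pattern "\\d+")     ; string form: backslash doubled
#"\d+"                  ; literal form: the same pattern
(re-find #"\." "a.b")   ;; => "."
(re-find #"\\" "a\\b")  ;; => "\\" (the string "a\\b" contains a single backslash)
```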
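### 10.2 Multiline Matching

By default `.` does not match a newline, and `^` matches only at the start of the input. The `(?m)` flag makes `^` and `$` match at every line:

```clojure
(def text "line1\nline2\nline3")
(re-seq #".+" text)         ;; => ("line1" "line2" "line3")
(re-seq #"(?m)^line" text)  ;; => ("line" "line" "line")
```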
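Multiline mode does not change what `.` matches. To let `.` cross line breaks, use the `(?s)` (DOTALL) flag:

```clojure
(def block "start\nmiddle\nend")

;; By default "." stops at line terminators, so nothing matches
(re-find #"start.*end" block)      ;; => nil
;; (?s) lets "." match newlines as well
(re-find #"(?s)start.*end" block)  ;; => "start\nmiddle\nend"
```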
### 10.3 Case-Insensitive Matching

```clojure
(re-find #"(?i)hello" "HELLO World")  ;; => "HELLO"
(re-find #"(?i)hello" "Hello World")  ;; => "Hello"
```
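Inline flags such as `(?i)` work in literals. When a pattern is built from a runtime string, the flags can instead be passed to `java.util.regex.Pattern/compile` as a bit mask; a short sketch:

```clojure
(import '(java.util.regex Pattern))

;; Same effect as #"(?i)hello", with the flag passed separately
(def hello-ci (Pattern/compile "hello" Pattern/CASE_INSENSITIVE))
(re-find hello-ci "Say HeLLo")  ;; => "HeLLo"

;; Several flags are combined with bit-or
(def multi (Pattern/compile "^hello" (bit-or Pattern/CASE_INSENSITIVE Pattern/MULTILINE)))
(re-seq multi "Hello\nhello\nbye")  ;; => ("Hello" "hello")
```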
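## 11. Summary

Regex functions at a glance:

| Function | Purpose | Return value |
|---|---|---|
| `re-matches` | Match the entire string | String or vector; `nil` if no match |
| `re-find` | Find the first match | String or vector; `nil` if no match |
| `re-seq` | Find all matches | Lazy sequence; `nil` if no match |
| `re-matcher` | Create a Matcher | `java.util.regex.Matcher` object |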
Common patterns:

```clojure
#"\d+"           ; digits
#"\w+"           ; word characters
#"\s+"           ; whitespace
#"[a-z]+"        ; lowercase letters
#"[a-zA-Z0-9]+"  ; letters and digits
```
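Regular expressions are a powerful tool for string processing, and mastering them will greatly improve your ability to work with text.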
Last updated: 2026-03-27