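Regular Expressions

1. Regular Expression Basics

1.1 Creating Regular Expressions

Clojure creates regular expressions with the #"..." reader literal:

clojure
#"\d+"                   ;; one or more digits

#"hello"                 ;; the literal text "hello"

#"[a-z]+"                ;; one or more lowercase letters

#"^\d{4}-\d{2}-\d{2}$"   ;; a date shaped like 2024-03-27

clojure
(class #"\d+")           ;; => java.util.regex.Pattern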

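1.2 Using re-pattern

clojure
;; re-pattern compiles a string; inside a string each backslash is doubled
(re-pattern "\\d+")      ;; same pattern as #"\d+"

(re-pattern "[a-z]+")

(def pattern (re-pattern "\\w+"))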
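
re-pattern is most useful when a pattern has to be assembled at run time. The sketch below is illustrative only: whole-word-pattern is a hypothetical helper, and it assumes java.util.regex.Pattern/quote is an acceptable way to neutralise regex metacharacters in the input.

```clojure
(import 'java.util.regex.Pattern)

;; Hypothetical helper: a pattern that matches `word` literally,
;; but only as a whole word (\b is a word boundary).
(defn whole-word-pattern [word]
  (re-pattern (str "\\b" (Pattern/quote word) "\\b")))

(re-find (whole-word-pattern "cat") "a cat sat")     ;; => "cat"
(re-find (whole-word-pattern "cat") "concatenate")   ;; => nil
;; Pattern/quote makes the dot literal, so "axb" is skipped
(re-find (whole-word-pattern "a.b") "axb a.b")       ;; => "a.b"
```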

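1.3 Regular Expression Syntax

Syntax   Meaning                                     Example
.        Any character except a line terminator      a.c matches "abc"
^        Start of input (of each line with (?m))     ^hello
$        End of input (of each line with (?m))       world$
*        Zero or more times                          ab*c
+        One or more times                           ab+c
?        Zero or one time                            ab?c
\d       Digit                                       \d+
\w       Word character                              \w+
\s       Whitespace character                        \s+
[abc]    Character set                               [aeiou]
[^abc]   Negated character set                       [^0-9]
()       Group                                       (ab)+
|        Alternation (either side)                   a|b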
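
To make the syntax elements above concrete, here is a small set of checks. The inputs are made up for illustration, and re-matches is used so that each pattern has to cover the whole string.

```clojure
(re-matches #"a.c" "abc")        ;; => "abc"  . matches any one character
(re-matches #"ab*c" "ac")        ;; => "ac"   * allows zero b's
(re-matches #"ab+c" "ac")        ;; => nil    + needs at least one b
(re-matches #"ab?c" "abc")       ;; => "abc"  ? allows zero or one b
(re-matches #"[^0-9]+" "abc")    ;; => "abc"  no digits anywhere
(re-matches #"(ab)+" "abab")     ;; => ["abab" "ab"]
(re-matches #"cat|dog" "dog")    ;; => "dog"
```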

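2. Matching Functions

2.1 re-matches

Succeeds only when the entire string matches the pattern:

clojure
(re-matches #"\d+" "123")                 ;; => "123"

(re-matches #"\d+" "123abc")              ;; => nil

(re-matches #"(\d+)-(\d+)" "123-456")     ;; => ["123-456" "123" "456"]

Return value:

  • No match: nil
  • No groups: the matched string
  • With groups: a vector (the full match followed by each group)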

2.2 re-find

Finds the first match anywhere in the string:

clojure
(re-find #"\d+" "abc123def456")    ;; => "123"

(re-find #"(\d+)" "abc123def")     ;; => ["123" "123"]

(re-find #"\d+" "no numbers")      ;; => nil

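2.3 re-seq

Returns a lazy sequence of all matches:

clojure
(re-seq #"\d+" "abc123def456ghi789")        ;; => ("123" "456" "789")

;; Uppercase letters are outside [a-z], so they are skipped
(re-seq #"[a-z]+" "Hello World Clojure")    ;; => ("ello" "orld" "lojure")

(re-seq #"\w+" "one two three")             ;; => ("one" "two" "three")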

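2.4 re-matcher

Creates a Matcher object. Each call to re-find on the matcher returns the next match:

clojure
(def m (re-matcher #"\d+" "abc123def456"))

(re-find m)    ;; => "123"

(re-find m)    ;; => "456"

(re-find m)    ;; => nil (no matches left)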
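
When the pattern contains groups, clojure.core/re-groups returns the groups of the matcher's most recent match. The following is a small sketch with an invented key=value input; kv-matcher is just an illustrative name.

```clojure
(def kv-matcher (re-matcher #"(\w+)=(\d+)" "a=1, b=22"))

(re-find kv-matcher)      ;; => ["a=1" "a" "1"]
(re-groups kv-matcher)    ;; => ["a=1" "a" "1"]   groups of the last match
(re-find kv-matcher)      ;; => ["b=22" "b" "22"]
(re-find kv-matcher)      ;; => nil
```

A Matcher is mutable and not safe to share between threads, so re-seq is usually the simpler choice when all matches are needed.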

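3. Capturing Groups

3.1 Basic Groups

clojure
(re-matches #"(\d+)-(\d+)-(\d+)" "2024-03-27")
;; => ["2024-03-27" "2024" "03" "27"]

(re-find #"(\w+)@(\w+)\.(\w+)" "user@example.com")
;; => ["user@example.com" "user" "example" "com"]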

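3.2 Named Capturing Groups

Java has supported named groups, written (?<name>...), since Java 7, but re-matches and re-find return groups only by position. Destructuring gives each group a name on the Clojure side:

clojure
(defn parse-date [s]
  (let [[_ year month day] (re-matches #"(\d{4})-(\d{2})-(\d{2})" s)]
    (when year
      {:year year :month month :day day})))

(parse-date "2024-03-27")
;; => {:year "2024", :month "03", :day "27"}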
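
Java's regex engine does accept named groups of the form (?<name>...), and they can be read through interop on the Matcher. The sketch below is one possible way to do that; parse-date-named is a hypothetical name, and it calls the Java methods Matcher.matches and Matcher.group(String) directly.

```clojure
;; Hypothetical variant of parse-date that uses named groups via Java interop.
(defn parse-date-named [s]
  (let [m (re-matcher #"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})" s)]
    ;; .matches requires the whole string to match, like re-matches
    (when (.matches m)
      {:year  (.group m "year")
       :month (.group m "month")
       :day   (.group m "day")})))

(parse-date-named "2024-03-27")
;; => {:year "2024", :month "03", :day "27"}
(parse-date-named "27/03/2024")
;; => nil
```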

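3.3 Nested Groups

clojure
;; Groups are numbered by the position of their opening parenthesis
(re-matches #"((\d+)-(\d+))" "123-456")
;; => ["123-456" "123-456" "123" "456"]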

3.4 Non-Capturing Groups

clojure
;; (?:...) groups without capturing, so the result is a plain string
(re-find #"(?:abc|def)+" "abcabc")
;; => "abcabc"
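
The effect of (?:...) is easiest to see next to its capturing counterpart. This is a small comparison with a made-up input:

```clojure
;; Capturing group: the result is a vector, and a repeated group
;; reports only its last repetition.
(re-find #"(abc|def)+" "abcdef")      ;; => ["abcdef" "def"]

;; Non-capturing group: same overall match, nothing captured,
;; so the result is a plain string.
(re-find #"(?:abc|def)+" "abcdef")    ;; => "abcdef"
```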

4. Replacement

4.1 Replacing with clojure.string/replace

clojure
(require '[clojure.string :as str])

(str/replace "hello world" #"o" "0")          ;; => "hell0 w0rld"

(str/replace "abc123def456" #"\d+" "NUM")     ;; => "abcNUMdefNUM"

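4.2 Replacing with a Function

clojure
;; The function receives each match and returns its replacement
(str/replace "abc123def456" #"\d+"
             (fn [match]
               (str "<" match ">")))
;; => "abc<123>def<456>"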
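
When the pattern contains groups, the replacement function is called with the same vector that re-groups would return (full match first, then each group), so it can be destructured. A minimal sketch with an invented key=value input:

```clojure
(require '[clojure.string :as str])

;; Double every numeric value; [_ k v] destructures [full-match key value].
(str/replace "width=10 height=20" #"(\w+)=(\d+)"
             (fn [[_ k v]]
               (str k "=" (* 2 (Long/parseLong v)))))
;; => "width=20 height=40"
```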

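4.3 Replacing with Group References

clojure
;; $1, $2, ... in the replacement string refer to the captured groups
(str/replace "hello world" #"(\w+)" "$1!")
;; => "hello! world!"

(str/replace "2024-03-27" #"(\d{4})-(\d{2})-(\d{2})" "$2/$3/$1")
;; => "03/27/2024"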
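
Because $ is special in a replacement string, a literal dollar sign has to be escaped. The sketch below shows two ways, using clojure.string/re-quote-replacement or a manual backslash; the inputs are made up.

```clojure
(require '[clojure.string :as str])

;; Quote the whole replacement so $ and \ are taken literally
(str/replace "cost: 10" #"\d+" (str/re-quote-replacement "$5"))
;; => "cost: $5"

;; Or escape the dollar sign by hand
(str/replace "cost: 10" #"\d+" "\\$5")
;; => "cost: $5"

;; Unescaped, "$5" is read as a reference to a group that does not exist
(try
  (str/replace "cost: 10" #"\d+" "$5")
  (catch IndexOutOfBoundsException _ :no-such-group))
;; => :no-such-group
```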

4.4 Replacing Only the First Match

clojure
(str/replace-first "aaa bbb aaa" "aaa" "XXX")     ;; => "XXX bbb aaa"

(str/replace-first "123-456-789" #"\d+" "NUM")    ;; => "NUM-456-789"

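5. Splitting Strings

5.1 Basic Splitting

clojure
(str/split "a,b,c" #",")                    ;; => ["a" "b" "c"]

(str/split "one  two   three" #"\s+")       ;; => ["one" "two" "three"]

(str/split "line1\nline2\nline3" #"\n")     ;; => ["line1" "line2" "line3"]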

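5.2 Limiting the Number of Pieces

clojure
;; At most 3 pieces; the last piece keeps the remaining text
(str/split "a,b,c,d,e" #"," 3)
;; => ["a" "b" "c,d,e"]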
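
The limit argument also controls what happens to trailing empty strings: by default they are dropped, and a negative limit keeps them. A short illustration with a made-up input:

```clojure
(require '[clojure.string :as str])

(str/split "a,b,,," #",")       ;; => ["a" "b"]            trailing empties dropped
(str/split "a,b,,," #"," -1)    ;; => ["a" "b" "" "" ""]   trailing empties kept
```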

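5.3 Splitting While Keeping the Delimiter

clojure
;; Split at the zero-width position before each delimiter (lookahead)
(defn split-keep [pattern s]
  (str/split s (re-pattern (str "(?=" pattern ")"))))

(split-keep "," "a,b,c")    ;; => ["a" ",b" ",c"]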

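6. Common Patterns

6.1 Numbers

clojure
(def int-pattern #"-?\d+")              ;; integer with optional sign

(def float-pattern #"-?\d+\.\d+")       ;; decimal with a fractional part

(def hex-pattern #"0x[0-9a-fA-F]+")     ;; hexadecimal with 0x prefix

(def number-pattern #"-?\d+\.?\d*")     ;; integer or decimal (also matches "7.")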
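
One detail worth checking with the general number pattern is a trailing dot. The sketch below compares it with a stricter variant; strict-number-pattern is an assumed alternative for illustration, not part of the original list.

```clojure
;; Assumed variant: the fractional part is an optional group,
;; so a dot is only consumed when digits follow it.
(def strict-number-pattern #"-?\d+(?:\.\d+)?")

(re-seq #"-?\d+\.?\d*" "x=7. y=-3.5 n=42")
;; => ("7." "-3.5" "42")

(re-seq strict-number-pattern "x=7. y=-3.5 n=42")
;; => ("7" "-3.5" "42")
```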

6.2 Dates and Times

clojure
;; These check the shape only, not validity: "2024-99-99" also matches
(def date-pattern #"\d{4}-\d{2}-\d{2}")

(def time-pattern #"\d{2}:\d{2}:\d{2}")

(def datetime-pattern #"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

6.3 Email and URL

clojure
;; A simplified email shape, not a full RFC-compliant check
(def email-pattern #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

(def url-pattern #"https?://[^\s]+")

(def phone-pattern #"\d{3}-\d{3}-\d{4}")    ;; e.g. 123-456-7890

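6.4 Identifiers

clojure
;; A letter or underscore followed by word characters
(def word-pattern #"[a-zA-Z_]\w*")

;; A simplified approximation of Clojure symbols
(def clojure-symbol-pattern #"[a-zA-Z*+!_?-][a-zA-Z0-9*+!_?-]*")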

7. Practical Examples

7.1 Parsing Logs

clojure
;; Groups: timestamp, level, message
(def log-pattern
  #"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)")

(defn parse-log [line]
  (let [[_ timestamp level message] (re-matches log-pattern line)]
    (when timestamp
      {:timestamp timestamp
       :level level
       :message message})))

(parse-log "2024-03-27 10:30:45 [INFO] Application started")
;; => {:timestamp "2024-03-27 10:30:45", :level "INFO", :message "Application started"}
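
In practice a parser like this is run over many lines, some of which will not match. The following self-contained sketch shows one way to do that with keep; the names log-line-re, parse-line, error-messages, and sample-log are hypothetical, and the sample text is invented.

```clojure
(require '[clojure.string :as str])

(def log-line-re
  #"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)")

;; Returns a map for a well-formed line, nil otherwise.
(defn parse-line [line]
  (when-let [[_ timestamp level message] (re-matches log-line-re line)]
    {:timestamp timestamp :level level :message message}))

;; keep drops the nils produced by lines that do not match.
(defn error-messages [text]
  (->> (str/split-lines text)
       (keep parse-line)
       (filter #(= "ERROR" (:level %)))
       (map :message)))

(def sample-log
  (str "2024-03-27 10:30:45 [INFO] Application started\n"
       "not a log line\n"
       "2024-03-27 10:30:46 [ERROR] Disk full"))

(error-messages sample-log)
;; => ("Disk full")
```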

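7.2 Extracting URL Components

clojure
;; Named url-parts-pattern so it does not redefine url-pattern from 6.3
(def url-parts-pattern
  #"(\w+)://([^/:]+)(?::(\d+))?(/[^?]*)?(?:\?(.*))?")

(defn parse-url [url]
  ;; when-let returns nil when the input is not a URL
  (when-let [[_ protocol host port path query]
             (re-matches url-parts-pattern url)]
    {:protocol protocol
     :host host
     :port (when port (Integer/parseInt port))
     :path path
     :query query}))

(parse-url "https://example.com:8080/path?query=value")
;; => {:protocol "https", :host "example.com", :port 8080, :path "/path", :query "query=value"}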

7.3 Validating Input

clojure
(defn valid-email? [s]
  (boolean (re-matches #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" s)))

(valid-email? "user@example.com")    ;; => true

(valid-email? "invalid-email")       ;; => false

(defn valid-phone? [s]
  (boolean (re-matches #"\d{3}-\d{3}-\d{4}" s)))

(valid-phone? "123-456-7890")        ;; => true
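
Validators built on re-matches throw when handed nil or a non-string value, which matters if the input comes from a form or a parsed request. The following is a minimal sketch of a guarded version; safe-valid-email? and email-re are hypothetical names.

```clojure
(def email-re #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

;; Hypothetical nil-safe variant: anything that is not a string is invalid.
(defn safe-valid-email? [x]
  (boolean (and (string? x) (re-matches email-re x))))

(safe-valid-email? "user@example.com")   ;; => true
(safe-valid-email? nil)                  ;; => false
(safe-valid-email? 42)                   ;; => false
```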

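7.4 Code Analysis

clojure
;; Count identifier-like tokens
(defn count-words [code]
  (count (re-seq #"[a-zA-Z_]\w*" code)))

;; Extract double-quoted strings (escaped quotes are not handled)
(defn extract-strings [code]
  (re-seq #"\"[^\"]*\"" code))

;; Find ; line comments (a ; inside a string also matches)
(defn find-comments [code]
  (re-seq #";[^\n]*" code))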
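
A simple quote-to-quote pattern breaks on strings that contain escaped quotes. The sketch below shows one common refinement that treats a backslash plus the following character as a unit; string-literal-re and sample are hypothetical names, and this is still an approximation rather than a real Clojure reader.

```clojure
;; Assumed refinement: a string body is a run of either
;; non-quote/non-backslash characters or backslash escapes.
(def string-literal-re #"\"(?:[^\"\\]|\\.)*\"")

;; The text being scanned is:  (println "say \"hi\"") "x"
(def sample "(println \"say \\\"hi\\\"\") \"x\"")

(count (re-seq #"\"[^\"]*\"" sample))       ;; => 3  (splits at the escaped quotes)
(count (re-seq string-literal-re sample))   ;; => 2  (the two real string literals)
```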

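8. Performance Considerations

8.1 Precompiling Regular Expressions

clojure
;; Define the pattern once at the top level and reuse it
(def email-regex #"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

(defn valid-email? [s]
  (boolean (re-matches email-regex s)))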
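
A #"..." literal is compiled when the code is read, so the real cost to avoid is calling re-pattern (or Pattern/compile) on every invocation. The sketch below illustrates the difference with identical?; literal-pattern and dynamic-pattern are hypothetical helper names.

```clojure
;; A regex literal inside a function is still a single compiled Pattern.
(defn literal-pattern [] #"\d+")
(identical? (literal-pattern) (literal-pattern))    ;; => true

;; re-pattern on a string compiles a fresh Pattern on every call.
(defn dynamic-pattern [] (re-pattern "\\d+"))
(identical? (dynamic-pattern) (dynamic-pattern))    ;; => false
```

For patterns built from strings at run time, binding the result once with def or let, or memoizing the builder, avoids the repeated compilation.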

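8.2 Avoiding Greedy Matching

clojure
;; Greedy: .* extends to the last </div> in the input
(str/replace "<div>content</div>" #"<div>.*</div>" "")     ;; => ""

;; Lazy: .*? stops at the first </div>; the results differ once
;; the input holds several <div> elements
(str/replace "<div>content</div>" #"<div>.*?</div>" "")    ;; => ""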
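
With a single element the greedy and lazy forms give the same result; the difference only shows when the input contains more than one. A small illustration with invented markup:

```clojure
(require '[clojure.string :as str])

;; Greedy: one match from the first <b> to the last </b>
(str/replace "<b>one</b> and <b>two</b>" #"<b>.*</b>" "X")
;; => "X"

;; Lazy: each element is matched separately
(str/replace "<b>one</b> and <b>two</b>" #"<b>.*?</b>" "X")
;; => "X and X"
```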

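8.3 Optimizing with Anchors

clojure
;; re-matches always tests the whole string, so ^ and $ are redundant here
(re-matches #"^\d+$" "123")    ;; => "123"

;; re-find needs the anchors to reject surrounding text
(re-find #"^\d+$" "123")       ;; => "123"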
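
An anchored re-find is close to re-matches but not identical: in Java's regex engine, $ also matches just before a final line terminator. A short sketch of the edge case, using \z as the strict end-of-input anchor:

```clojure
(re-matches #"\d+" "123\n")     ;; => nil    the newline is not a digit
(re-find #"^\d+$" "123\n")      ;; => "123"  $ matches before the final \n
(re-find #"^\d+\z" "123\n")     ;; => nil    \z is the very end of input
```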

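9. Regular Expression Utilities

9.1 Testing Helper

clojure
;; Run the three main functions against the same input
(defn test-regex [pattern text]
  {:matches (re-matches pattern text)
   :find (re-find pattern text)
   :all (re-seq pattern text)})

(test-regex #"\d+" "abc123def456")
;; => {:matches nil, :find "123", :all ("123" "456")}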

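9.2 Debugging Helper

clojure
;; Describe which of the three re-matches result shapes came back
(defn explain-match [pattern text]
  (let [result (re-matches pattern text)]
    (cond
      (nil? result) "No match"
      (string? result) (str "Full match: " result)
      (vector? result) (str "Matched with groups: " result))))

(explain-match #"(\d+)-(\d+)" "123-456")
;; => "Matched with groups: [\"123-456\" \"123\" \"456\"]"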

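10. Common Problems

10.1 Escaping

clojure
(re-pattern "\\d+")       ;; in a string, each backslash is doubled

#"\d+"                    ;; in a regex literal, it is not

(re-find #"\." "a.b")     ;; => "."   (\. is a literal dot)

(re-find #"\\" "a\\b")    ;; => "\\"  (one literal backslash)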

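10.2 Multiline Matching

clojure
(def text "line1\nline2\nline3")

;; . does not match \n, so each line is a separate match
(re-seq #".+" text)           ;; => ("line1" "line2" "line3")

;; (?m) lets ^ match at the start of every line
(re-seq #"(?m)^line" text)    ;; => ("line" "line" "line")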
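
Two embedded flags cover most multi-line needs: (?s) lets . match line terminators, and (?m) makes ^ and $ work per line. The inputs below are made up for illustration.

```clojure
;; Without (?s), . stops at the newline
(re-find #"a.*z" "a\nz")                  ;; => nil
(re-find #"(?s)a.*z" "a\nz")              ;; => "a\nz"

;; Without (?m), $ only matches at the end of the input
(re-seq #"\d+$" "a1\nb22\nc333")          ;; => ("333")
(re-seq #"(?m)\d+$" "a1\nb22\nc333")      ;; => ("1" "22" "333")
```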

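10.3 Case-Insensitive Matching

clojure
;; (?i) turns on case-insensitive matching; the original casing is returned
(re-find #"(?i)hello" "HELLO World")    ;; => "HELLO"

(re-find #"(?i)hello" "Hello World")    ;; => "Hello"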
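
Flags can be combined inside the pattern, or passed as constants to java.util.regex.Pattern/compile through interop. The following sketch shows both forms side by side; error-re is a hypothetical name and the input is invented.

```clojure
(import 'java.util.regex.Pattern)

;; Embedded flags: i = case-insensitive, m = multiline
(re-seq #"(?im)^error" "Error one\nok\nERROR two")
;; => ("Error" "ERROR")

;; The same flags as Pattern constants
(def error-re
  (Pattern/compile "^error"
                   (bit-or Pattern/CASE_INSENSITIVE Pattern/MULTILINE)))

(re-seq error-re "Error one\nok\nERROR two")
;; => ("Error" "ERROR")
```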

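11. Summary

Summary of the regular expression functions:

Function     Purpose                  Return value
re-matches   Match the whole string   String or vector (nil if no match)
re-find      Find the first match     String or vector (nil if no match)
re-seq       Find all matches         Lazy sequence
re-matcher   Create a Matcher         Matcher object

Common patterns:

clojure
#"\d+"           ;; digits
#"\w+"           ;; word characters
#"\s+"           ;; whitespace
#"[a-z]+"        ;; lowercase letters
#"[a-zA-Z0-9]+"  ;; letters and digits

Regular expressions are a powerful tool for string processing, and mastering them will greatly improve your text-processing skills!

Last updated: 2026-03-27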