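Regular Expression Basics

1. Overview of Regular Expressions

Perl has powerful built-in regular expression support, which makes it a strong tool for text processing.

1.1 Basic Matching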

perl
my $str = "Hello, World!";

if ($str =~ /World/) {
    print "Found 'World'\n";
}

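1.2 Binding Operators

Operator  Description
=~        True if the string matches the pattern
!~        True if the string does not match the pattern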
perl
my $str = "Hello, World!";

if ($str =~ /Perl/) {
    print "Found Perl\n";
} else {
    print "Not found\n";
}

if ($str !~ /Perl/) {
    print "Does not contain Perl\n";
}

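2. Metacharacters

2.1 Special Characters

Metacharacter  Description
.              Any character except newline
^              Start of the string (start of each line with the m modifier)
$              End of the string or before a final newline (end of each line with the m modifier)
\              Escapes the following character
|              Alternation (matches the left or the right alternative)
()             Grouping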
perl
my $str = "Hello, World!";

if ($str =~ /W.rld/) {
    print "Matched\n";
}

if ($str =~ /^Hello/) {
    print "Starts with Hello\n";
}

if ($str =~ /World!$/) {
    print "Ends with World!\n";
}
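
The examples above do not exercise \, | or (). As a minimal sketch (the file names are made-up sample data), the three can be combined like this:

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @inputs = ("cat.jpg", "dog.png", "catXjpg", "bird.gif");

# (cat|dog) groups two alternatives; \. matches a literal dot,
# whereas an unescaped . would also accept the "X" in "catXjpg".
my @hits = grep { /^(cat|dog)\.(jpg|png)$/ } @inputs;

print "Matched: @hits\n";   # Matched: cat.jpg dog.png
```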

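2.2 Quantifiers

Quantifier  Description
*           Zero or more times
+           One or more times
?           Zero or one time
{n}         Exactly n times
{n,}        At least n times
{n,m}       Between n and m times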
perl
my $str = "aaaabbbb";

if ($str =~ /a+b+/) {
    print "Matched\n";
}

if ($str =~ /a{4}/) {
    print "Exactly 4 a's\n";
}

if ($str =~ /b{2,4}/) {
    print "2 to 4 b's\n";
}
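
The remaining quantifiers from the table (*, ? and the open-ended {n,} form) can be sketched the same way. The test strings below are illustrative assumptions, not part of the original examples:

```perl
use strict;
use warnings;

# u? : the "u" may appear zero or one time.
my @spellings = grep { /^colou?r$/ } ("color", "colour", "colouur");

# ab* : an "a" followed by zero or more "b"s.
my @ab = grep { /^ab*$/ } ("a", "abbb", "ba");

# \d{3,} : at least three digits in a row.
my @long_numbers = grep { /^\d{3,}$/ } ("12", "123", "12345");

print "colou?r : @spellings\n";      # color colour
print "ab*     : @ab\n";             # a abbb
print "\\d{3,}  : @long_numbers\n";   # 123 12345
```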

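2.3 Greedy and Non-Greedy Matching

Quantifiers are greedy by default and match as much as possible; appending ? makes them non-greedy, so they match as little as possible: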

perl
my $str = "<div>content</div>";

if ($str =~ /<div>(.+)<\/div>/) {
    print "Greedy: $1\n";
}

if ($str =~ /<div>(.+?)<\/div>/) {
    print "Non-greedy: $1\n";
}
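
With a single <div> pair, both patterns above capture the same text. The difference only shows once the closing delimiter occurs more than once. A small sketch, assuming a sample string with two <b> elements:

```perl
use strict;
use warnings;

# Sample input with two closing tags (an assumption for illustration).
my $html = "<b>one</b> and <b>two</b>";

# Greedy: .+ runs to the end, then backtracks to the LAST </b>.
my ($greedy) = $html =~ /<b>(.+)<\/b>/;

# Non-greedy: .+? stops at the FIRST </b> it can reach.
my ($lazy) = $html =~ /<b>(.+?)<\/b>/;

print "Greedy:     $greedy\n";   # one</b> and <b>two
print "Non-greedy: $lazy\n";     # one
```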

3. Character Classes

3.1 Custom Character Classes

perl
my $str = "Hello123";

if ($str =~ /[aeiou]/) {
    print "Contains vowel\n";
}

if ($str =~ /[0-9]/) {
    print "Contains digit\n";
}

if ($str =~ /[a-zA-Z]/) {
    print "Contains letter\n";
}

3.2 Negated Character Classes

perl
my $str = "Hello123";

if ($str =~ /[^0-9]/) {
    print "Contains non-digit\n";
}

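3.3 Predefined Character Classes

Class  Description
\d     Digit ([0-9] for ASCII input)
\D     Non-digit ([^0-9] for ASCII input)
\w     Word character ([a-zA-Z0-9_] for ASCII input)
\W     Non-word character
\s     Whitespace character
\S     Non-whitespace character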
perl
my $str = "Hello 123";

if ($str =~ /\d+/) {
    print "Contains digits\n";
}

if ($str =~ /\w+/) {
    print "Contains word chars\n";
}

if ($str =~ /\s+/) {
    print "Contains whitespace\n";
}

3.4 Anchors and Boundaries

Anchor  Description
\b      Word boundary
\B      Non-word boundary
\A      Start of the string
\z      End of the string
perl
my $str = "hello world";

if ($str =~ /\bworld\b/) {
    print "Found word 'world'\n";
}

if ($str =~ /\Ahello/) {
    print "Starts with hello\n";
}
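
One place where \z and $ differ is a string that ends in a newline: $ can match just before a final newline, while \z matches only at the very end. A minimal sketch, assuming an un-chomped input line:

```perl
use strict;
use warnings;

my $line = "hello\n";   # e.g. a line read from a file without chomp

my $dollar_matches = ($line =~ /hello$/)  ? "yes" : "no";
my $z_matches      = ($line =~ /hello\z/) ? "yes" : "no";

print "\$ matches:  $dollar_matches\n";   # yes
print "\\z matches: $z_matches\n";        # no
```

For strict validation of a whole string, \A ... \z is therefore usually the safer pair.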

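4. Modifiers

4.1 Common Modifiers

Modifier  Description
i         Case-insensitive matching
m         Multi-line mode (^ and $ match at line boundaries)
s         Single-line mode (. also matches newline)
x         Extended mode (whitespace and comments allowed in the pattern)
g         Global matching (find all matches)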
perl
my $str = "Hello, World!";

if ($str =~ /hello/i) {
    print "Matched (case insensitive)\n";
}

my $text = "Line1\nLine2\nLine3";
if ($text =~ /^Line/m) {
    print "Multiline match\n";
}
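
The s and x modifiers from the table are not shown above. The following sketch illustrates both; the sample strings are assumptions chosen for the demonstration:

```perl
use strict;
use warnings;

# s modifier: lets . match a newline as well.
my $text   = "start\nend";
my $plain  = ($text =~ /start.end/)  ? "match" : "no match";
my $with_s = ($text =~ /start.end/s) ? "match" : "no match";
print "without s: $plain\n";    # no match
print "with s:    $with_s\n";   # match

# x modifier: whitespace and comments inside the pattern are ignored.
my $date  = "2024-03-27";
my @parts = $date =~ /
    \A (\d{4})    # year
    -  (\d{2})    # month
    -  (\d{2})    # day
    \z
/x;
print "parts: @parts\n";        # parts: 2024 03 27
```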

4.2 Global Matching

perl
my $str = "hello world hello perl";

while ($str =~ /hello/g) {
    print "Found 'hello'\n";
}

my @matches = $str =~ /hello/g;
print "Count: " . scalar(@matches) . "\n";
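
In scalar context, each /g match resumes where the previous one ended, so the loop can also record where every match begins. A small sketch using the built-in @- array, whose first element holds the start offset of the last successful match:

```perl
use strict;
use warnings;

my $str = "hello world hello perl";
my @starts;

while ($str =~ /hello/g) {
    # $-[0] is the offset at which the current match began.
    push @starts, $-[0];
}

print "Start offsets: @starts\n";   # Start offsets: 0 12
```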

5. Captures

5.1 Basic Captures

Parentheses () capture the text matched by the enclosed subpattern:

perl
my $str = "Hello, World!";

if ($str =~ /(\w+), (\w+)/) {
    print "First: $1\n";
    print "Second: $2\n";
}

5.2 Multiple Captures

perl
my $date = "2024-03-27";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    print "Year: $1\n";
    print "Month: $2\n";
    print "Day: $3\n";
}

5.3 Named Captures

perl
my $date = "2024-03-27";

if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "Year: $+{year}\n";
    print "Month: $+{month}\n";
    print "Day: $+{day}\n";
}

5.4 Non-Capturing Groups

perl
my $str = "hello world";

if ($str =~ /(?:hello|hi) (\w+)/) {
    print "Word: $1\n";
}

6. Special Variables

6.1 Match Variables

perl
my $str = "Hello, World!";

if ($str =~ /(\w+)/) {
    print "Match: $&\n";
    print "Before: $`\n";
    print "After: $'\n";
    print "Group 1: $1\n";
}
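
On Perl versions before 5.20, using $&, $` or $' anywhere in a program could slow down every regex match. One alternative, sketched here under the assumption of Perl 5.10 or later, is the p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}:

```perl
use strict;
use warnings;
use v5.10;   # ${^PREMATCH} and friends need Perl 5.10 or later

my $str = "Hello, World!";
my ($pre, $match, $post, $start, $end);

if ($str =~ /World/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
}

say "Before: '$pre'";        # Before: 'Hello, '
say "Match:  '$match'";      # Match:  'World'
say "After:  '$post'";       # After:  '!'
say "Span:   $start-$end";   # Span:   7-12
```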

6.2 All Captures

perl
my $str = "a1 b2 c3";

my @captures = $str =~ /(\w)(\d)/g;
print "@captures\n";
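
In list context, /g returns all captures as one flat list (here: a 1 b 2 c 3), which loses the pairing between groups. When the pairs matter, a while loop keeps $1 and $2 together. A minimal sketch; the hash name %digit_for is purely illustrative:

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";
my %digit_for;   # hypothetical name: maps each letter to its digit

while ($str =~ /(\w)(\d)/g) {
    $digit_for{$1} = $2;   # $1 and $2 belong to the same match
}

my $summary = join " ", map { "$_=$digit_for{$_}" } sort keys %digit_for;
print "$summary\n";   # a=1 b=2 c=3
```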

7. Practice Exercises

Exercise 1: Email Validation

perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $email = "test@example.com";

if ($email =~ /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/) {
    say "Valid email";
} else {
    say "Invalid email";
}

Exercise 2: Extracting URLs

perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $text = "Visit https://www.example.com or http://test.org";

while ($text =~ m{(https?://\S+)}g) {
    say "Found URL: $1";
}

Exercise 3: Word Count

perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $text = "Hello, World! This is a test.";

my @words = $text =~ /\b\w+\b/g;
my %count;

$count{$_}++ foreach @words;

foreach my $word (sort keys %count) {
    say "$word: $count{$word}";
}
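
The count above is case-sensitive, so "The" and "the" would be tallied separately. One possible variation, sketched with a made-up sample sentence, folds each word to lower case and sorts by frequency:

```perl
use strict;
use warnings;
use v5.10;

# Sample text chosen for illustration.
my $text = "The cat and the hat. THE END.";
my %count;

# lc normalizes case before counting.
$count{lc $_}++ for $text =~ /\b\w+\b/g;

# Most frequent first; ties broken alphabetically.
my @ranked = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

say "$_: $count{$_}" for @ranked;
# the: 3
# and: 1
# cat: 1
# end: 1
# hat: 1
```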

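8. Summary

This chapter covered:

  • Basic regular expression syntax
  • Metacharacters and quantifiers
  • Character classes
  • Modifiers
  • Capture groups

The next chapter covers advanced regular expressions.

Last updated: 2026-03-27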