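Regular Expression Basics
1. Overview of Regular Expressions
Perl has powerful regular expression support built into the language, which makes it well suited to text processing.
1.1 Basic Matching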
perl
my $str = "Hello, World!";
if ($str =~ /World/) {
print "Found 'World'\n";
}
1.2 Match Operators
| Operator | Description |
|---|---|
| =~ | True if the pattern matches |
| !~ | True if the pattern does not match |
perl
my $str = "Hello, World!";
if ($str =~ /Perl/) {
print "Found Perl\n";
} else {
print "Not found\n";
}
if ($str !~ /Perl/) {
print "Does not contain Perl\n";
}
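2. Metacharacters
2.1 Special Characters
| Metacharacter | Description |
|---|---|
| . | Any character except newline |
| ^ | Start of the string (start of each line with the m modifier) |
| $ | End of the string, or just before a final newline (end of each line with the m modifier) |
| \ | Escape character |
| \| | Alternation (or) |
| () | Grouping |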
perl
my $str = "Hello, World!";
if ($str =~ /W.rld/) {
print "Matched\n";
}
if ($str =~ /^Hello/) {
print "Starts with Hello\n";
}
if ($str =~ /World!$/) {
print "Ends with World!\n";
}
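The table also lists escaping, alternation, and grouping, which the example above does not exercise. A minimal sketch combining the three; the file names are hypothetical and chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the tutorial.
my @files = ("report.txt", "report_txt", "photo.jpg", "notes.md");

# \. matches a literal dot instead of "any character";
# (txt|md) groups two alternatives; $ anchors the match at the end.
my @docs = grep { /\.(txt|md)$/ } @files;

print "Documents: @docs\n";
```

Without the backslash, the dot would also accept the underscore in "report_txt".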
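2.2 Quantifiers
| Quantifier | Description |
|---|---|
| * | Zero or more times |
| + | One or more times |
| ? | Zero or one time |
| {n} | Exactly n times |
| {n,} | At least n times |
| {n,m} | Between n and m times |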
perl
my $str = "aaaabbbb";
if ($str =~ /a+b+/) {
print "Matched\n";
}
if ($str =~ /a{4}/) {
print "Exactly 4 a's\n";
}
if ($str =~ /b{2,4}/) {
print "2 to 4 b's\n";
}
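Note that an unanchored pattern such as /a{4}/ succeeds whenever the string contains four consecutive a's anywhere, so it also matches "aaaaa". To require exactly four and nothing else, the pattern has to be anchored. A small sketch, using the hypothetical helper names is_exactly_four_a and has_three_or_more_b:

```perl
use strict;
use warnings;

# True when $s consists of exactly four a's and nothing else.
sub is_exactly_four_a {
    my ($s) = @_;
    return $s =~ /\Aa{4}\z/ ? 1 : 0;
}

# True when $s contains a run of at least three b's ({n,} form).
sub has_three_or_more_b {
    my ($s) = @_;
    return $s =~ /b{3,}/ ? 1 : 0;
}

for my $s ("aaaa", "aaaaa", "abbb") {
    printf "%-6s exactly four a's: %d, three or more b's: %d\n",
        $s, is_exactly_four_a($s), has_three_or_more_b($s);
}
```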
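2.3 Greedy and Non-Greedy Matching
Quantifiers are greedy by default: they match as much as possible. Appending ? to a quantifier makes it non-greedy, so it matches as little as possible: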
perl
my $str = "<div>content</div>";
if ($str =~ /<div>(.+)<\/div>/) {
print "Greedy: $1\n";
}
if ($str =~ /<div>(.+?)<\/div>/) {
print "Non-greedy: $1\n";
}
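With only one closing tag in the string, both patterns above capture the same text ("content"). The difference shows up once the closing tag appears more than once. A short sketch, assuming a sample string with two <b> elements:

```perl
use strict;
use warnings;

# Assumed sample input with two closing tags.
my $html = "<b>one</b> and <b>two</b>";

# Greedy: .+ runs to the LAST </b> it can reach.
my ($greedy) = $html =~ /<b>(.+)<\/b>/;

# Non-greedy: .+? stops at the FIRST </b>.
my ($lazy) = $html =~ /<b>(.+?)<\/b>/;

print "Greedy:     $greedy\n";
print "Non-greedy: $lazy\n";
```

The greedy capture spans both elements ("one</b> and <b>two"), while the non-greedy capture is just "one".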
3. Character Classes
3.1 Custom Character Classes
perl
my $str = "Hello123";
if ($str =~ /[aeiou]/) {
print "Contains vowel\n";
}
if ($str =~ /[0-9]/) {
print "Contains digit\n";
}
if ($str =~ /[a-zA-Z]/) {
print "Contains letter\n";
}
3.2 Negated Character Classes
perl
my $str = "Hello123";
if ($str =~ /[^0-9]/) {
print "Contains non-digit\n";
}
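3.3 Predefined Character Classes
| Class | Description |
|---|---|
| \d | Digit; [0-9] for ASCII text |
| \D | Non-digit; [^0-9] for ASCII text |
| \w | Word character; [a-zA-Z0-9_] for ASCII text |
| \W | Non-word character |
| \s | Whitespace character |
| \S | Non-whitespace character |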
perl
my $str = "Hello 123";
if ($str =~ /\d+/) {
print "Contains digits\n";
}
if ($str =~ /\w+/) {
print "Contains word chars\n";
}
if ($str =~ /\s+/) {
print "Contains whitespace\n";
}
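The bracketed equivalents such as [0-9] hold for ASCII text. On Unicode strings, \d can also match digits from other scripts; the a modifier (available from Perl 5.14) restricts \d, \w, and \s to ASCII. A minimal sketch of the difference, using U+0663 as an assumed test character:

```perl
use strict;
use warnings;
use v5.14;   # the /a modifier needs Perl 5.14 or later

# U+0663 is ARABIC-INDIC DIGIT THREE, a digit outside ASCII.
my $arabic_three = "\x{0663}";

my $unicode_match = $arabic_three =~ /\d/  ? 1 : 0;  # Unicode-aware \d
my $ascii_match   = $arabic_three =~ /\d/a ? 1 : 0;  # ASCII-only \d

print "\\d matches U+0663:         $unicode_match\n";
print "\\d with /a matches U+0663: $ascii_match\n";
```

When input validation must accept only 0 through 9, either /a or an explicit [0-9] is the safer choice.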
3.4 Boundary Matching
| Boundary | Description |
|---|---|
| \b | Word boundary |
| \B | Not a word boundary |
| \A | Start of the string |
| \z | End of the string |
perl
my $str = "hello world";
if ($str =~ /\bworld\b/) {
print "Found word 'world'\n";
}
if ($str =~ /\Ahello/) {
print "Starts with hello\n";
}
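\z and $ are not interchangeable: $ also matches just before a trailing newline, while \z only matches at the true end of the string. A short sketch, assuming a line that still carries its newline (as when read from a file without chomp):

```perl
use strict;
use warnings;

# Assumed input: a line with its trailing newline still attached.
my $line = "hello\n";

my $dollar_ok = $line =~ /hello$/  ? 1 : 0;  # $ tolerates the final newline
my $z_ok      = $line =~ /hello\z/ ? 1 : 0;  # \z requires the very end

print "Matches with \$:  $dollar_ok\n";
print "Matches with \\z: $z_ok\n";
```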
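4. Modifiers
4.1 Common Modifiers
| Modifier | Description |
|---|---|
| i | Case-insensitive matching |
| m | Multi-line mode (^ and $ match at line boundaries) |
| s | Single-line mode (. also matches newline) |
| x | Extended mode (whitespace and comments allowed in the pattern) |
| g | Global matching (find all matches) |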
perl
my $str = "Hello, World!";
if ($str =~ /hello/i) {
print "Matched (case insensitive)\n";
}
my $text = "Line1\nLine2\nLine3";
if ($text =~ /^Line/m) {
print "Multiline match\n";
}
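The /^Line/m test above would also succeed without m, because the string itself starts with "Line". The sketch below uses patterns whose result actually depends on the modifier, and adds s and x for comparison; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $text = "Line1\nLine2\nLine3";

# Without m, ^ matches only at the start of the string.
my $plain = $text =~ /^Line2/  ? 1 : 0;
# With m, ^ also matches right after each newline.
my $multi = $text =~ /^Line2/m ? 1 : 0;

# Without s, . does not match a newline; with s it does.
my $dot_plain = $text =~ /Line1.Line2/  ? 1 : 0;
my $dot_s     = $text =~ /Line1.Line2/s ? 1 : 0;

# With x, whitespace and comments inside the pattern are ignored.
my ($last_number) = $text =~ /
    Line    # literal prefix
    (\d)    # capture the line number
    \z      # only at the very end of the string
/x;

print "m: $plain vs $multi, s: $dot_plain vs $dot_s, last: $last_number\n";
```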
4.2 Global Matching
perl
my $str = "hello world hello perl";
while ($str =~ /hello/g) {
print "Found 'hello'\n";
}
my @matches = $str =~ /hello/g;
print "Count: " . scalar(@matches) . "\n";
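Two related idioms may be useful here: counting matches without keeping the list, and reading pos() inside the loop to learn where each match was found. A minimal sketch reusing the same sample string:

```perl
use strict;
use warnings;

my $str = "hello world hello perl";

# Assigning to an empty list forces list context; the outer scalar
# assignment then yields the number of matches.
my $count = () = $str =~ /hello/g;

# In scalar context, each /g match resumes from pos($str),
# which points just past the previous match.
my @offsets;
while ($str =~ /hello/g) {
    push @offsets, pos($str) - length("hello");
}

print "Count: $count\n";
print "Start offsets: @offsets\n";
```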
5. Captures
5.1 Basic Captures
Use () to capture the matched text; the captured groups are then available as $1, $2, and so on:
perl
my $str = "Hello, World!";
if ($str =~ /(\w+), (\w+)/) {
print "First: $1\n";
print "Second: $2\n";
}
5.2 Multiple Captures
perl
my $date = "2024-03-27";
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
print "Year: $1\n";
print "Month: $2\n";
print "Day: $3\n";
}
5.3 Named Captures
perl
my $date = "2024-03-27";
if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
print "Year: $+{year}\n";
print "Month: $+{month}\n";
print "Day: $+{day}\n";
}
5.4 Non-Capturing Groups
perl
my $str = "hello world";
if ($str =~ /(?:hello|hi) (\w+)/) {
print "Word: $1\n";
}
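The practical effect of (?:...) is on group numbering: the group still controls alternation, but it does not occupy a capture slot. A brief side-by-side sketch, with an assumed sample string:

```perl
use strict;
use warnings;

my $str = "hi there";

# Capturing group: the greeting takes slot 1, the word takes slot 2.
my @capturing = $str =~ /(hello|hi) (\w+)/;

# Non-capturing group: only the word is captured, in slot 1.
my @non_capturing = $str =~ /(?:hello|hi) (\w+)/;

print "Capturing group:     @capturing\n";
print "Non-capturing group: @non_capturing\n";
```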
6. Special Variables
6.1 Match-Related Variables
perl
my $str = "Hello, World!";
if ($str =~ /(\w+)/) {
print "Match: $&\n";
print "Before: $`\n";
print "After: $'\n";
print "Group 1: $1\n";
}
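On Perl versions before 5.20, using $&, $` or $' anywhere in a program could slow down other regex matches. Perl 5.10 and later offer the p modifier with ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} as an alternative that confines the cost to the pattern that asks for it. A minimal sketch:

```perl
use strict;
use warnings;
use v5.10;   # the /p modifier and ${^MATCH} need Perl 5.10 or later

my $str = "Hello, World!";
my ($pre, $match, $post);

# /p asks Perl to keep the pre-match, match, and post-match text
# for this pattern only.
if ($str =~ /World/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "Before: '$pre'\n";
print "Match:  '$match'\n";
print "After:  '$post'\n";
```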
6.2 All Captures
perl
my $str = "a1 b2 c3";
my @captures = $str =~ /(\w)(\d)/g;
print "@captures\n";
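With two groups and the g modifier, the match returns a flat list that alternates between the first and second capture (a, 1, b, 2, c, 3). When the pairs are key and value, that list can be assigned straight to a hash. A sketch, using the hypothetical hash name %digit_for:

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";

# The flat list (a, 1, b, 2, c, 3) becomes key/value pairs.
my %digit_for = $str =~ /(\w)(\d)/g;

for my $letter (sort keys %digit_for) {
    print "$letter => $digit_for{$letter}\n";
}
```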
7. Practice Exercises
Exercise 1: Email Validation
perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $email = "test@example.com";
if ($email =~ /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/) {
say "Valid email";
} else {
say "Invalid email";
}
Exercise 2: Extracting URLs
perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $text = "Visit https://www.example.com or http://test.org";
while ($text =~ m{(https?://[^\s]+)}g) {
say "Found URL: $1";
}
Exercise 3: Word Count
perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $text = "Hello, World! This is a test.";
my @words = $text =~ /\b\w+\b/g;
my %count;
$count{$_}++ foreach @words;
foreach my $word (sort keys %count) {
say "$word: $count{$word}";
}
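As written, the count is case-sensitive, so "The" and "the" would be tallied separately. One possible variation folds case with lc before counting; count_words is a hypothetical helper name and the sentence is assumed sample input:

```perl
use strict;
use warnings;

# Returns a hash reference mapping each lowercased word to its count.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{ lc $_ }++ for $text =~ /\w+/g;
    return \%count;
}

my $counts = count_words("The cat saw the other cat.");

for my $word (sort keys %$counts) {
    print "$word: $counts->{$word}\n";
}
```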
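8. Summary
This chapter covered:
- Basic regular expression syntax
- Metacharacters and quantifiers
- Character classes
- Modifiers
- Capture groups
The next chapter covers advanced regular expressions.
Last updated: 2026-03-27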