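Regular Expression Basics

1. Overview of Regular Expressions

Perl has powerful built-in regular expression support, which makes it well suited to text processing.

1.1 Basic Matching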

perl

my $str = "Hello, World!";

if ($str =~ /World/) {
    print "Found 'World'\n";
}

1.2 Match Operators

Operator	Description
=~	True if the pattern matches
!~	True if the pattern does not match

perl

my $str = "Hello, World!";

if ($str =~ /Perl/) {
    print "Found Perl\n";
} else {
    print "Not found\n";
}

if ($str !~ /Perl/) {
    print "Does not contain Perl\n";
}

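2. Metacharacters

2.1 Special Characters

Metacharacter	Description
.	Any character except newline
^	Start of the string (start of each line with /m)
$	End of the string, or just before a trailing newline (end of each line with /m)
\	Escape character
|	Alternation (or)
()	Grouping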

perl

my $str = "Hello, World!";

if ($str =~ /W.rld/) {
    print "Matched\n";
}

if ($str =~ /^Hello/) {
    print "Starts with Hello\n";
}

if ($str =~ /World!$/) {
    print "Ends with World!\n";
}
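
The table above also lists \, | and (), which the example does not exercise. A minimal sketch of how they might be used; the sample strings and variable names are illustrative only:

```perl
use strict;
use warnings;

my $price = "Total: 3.50 USD";   # illustrative sample string

# An unescaped dot matches any character, so "3x50" slips through;
# escaping it with a backslash restricts the match to a literal ".".
my $loose  = "3x50" =~ /3.50/  ? 1 : 0;
my $strict = "3x50" =~ /3\.50/ ? 1 : 0;

# () groups the alternatives and | separates them.
# In list context the match returns the captured text.
my ($currency) = $price =~ /(USD|EUR)/;

print "loose=$loose strict=$strict currency=$currency\n";
```

This should print loose=1 strict=0 currency=USD.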

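2.2 Quantifiers

Quantifier	Description
*	Zero or more times
+	One or more times
?	Zero or one time
{n}	Exactly n times
{n,}	At least n times
{n,m}	Between n and m times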

perl

my $str = "aaaabbbb";

if ($str =~ /a+b+/) {
    print "Matched\n";
}

if ($str =~ /a{4}/) {
    print "Exactly 4 a's\n";
}

if ($str =~ /b{2,4}/) {
    print "2 to 4 b's\n";
}
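
The remaining quantifiers from the table (*, ? and {n,}) behave as sketched below. The word lists are made up for illustration, and the patterns are anchored with ^ and $ so that each whole string has to match:

```perl
use strict;
use warnings;

# "?" makes the preceding item optional: "color" and "colour" both match.
my @spellings = grep { /^colou?r$/ } ("color", "colour", "colouur");

# "*" allows zero occurrences, so "ac" matches as well as "abbc".
my @star = grep { /^ab*c$/ } ("ac", "abc", "abbc", "adc");

# "{2,}" requires at least two repetitions.
my @at_least_two = grep { /^b{2,}$/ } ("b", "bb", "bbbbb");

print "@spellings\n";      # color colour
print "@star\n";           # ac abc abbc
print "@at_least_two\n";   # bb bbbbb
```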

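2.3 Greedy and Non-Greedy Matching

Quantifiers are greedy by default and match as much as possible. Appending ? makes them non-greedy, so they match as little as possible: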

perl

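my $str = "<div>one</div><div>two</div>";

# Greedy: .+ runs on to the last </div>
if ($str =~ /<div>(.+)<\/div>/) {
    print "Greedy: $1\n";        # one</div><div>two
}

# Non-greedy: .+? stops at the first </div>
if ($str =~ /<div>(.+?)<\/div>/) {
    print "Non-greedy: $1\n";    # one
}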

3. Character Classes

3.1 Custom Character Classes

perl

my $str = "Hello123";

if ($str =~ /[aeiou]/) {
    print "Contains vowel\n";
}

if ($str =~ /[0-9]/) {
    print "Contains digit\n";
}

if ($str =~ /[a-zA-Z]/) {
    print "Contains letter\n";
}

3.2 Negated Character Classes

perl

my $str = "Hello123";

if ($str =~ /[^0-9]/) {
    print "Contains non-digit\n";
}

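3.3 Predefined Character Classes

Class	Description
\d	Digit ([0-9] for ASCII text)
\D	Non-digit ([^0-9] for ASCII text)
\w	Word character ([a-zA-Z0-9_] for ASCII text)
\W	Non-word character
\s	Whitespace character
\S	Non-whitespace character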

perl

my $str = "Hello 123";

if ($str =~ /\d+/) {
    print "Contains digits\n";
}

if ($str =~ /\w+/) {
    print "Contains word chars\n";
}

if ($str =~ /\s+/) {
    print "Contains whitespace\n";
}

3.4 Boundary Matching

Anchor	Description
\b	Word boundary
\B	Non-word boundary
\A	Start of the string
\z	End of the string

perl

my $str = "hello world";

if ($str =~ /\bworld\b/) {
    print "Found word 'world'\n";
}

if ($str =~ /\Ahello/) {
    print "Starts with hello\n";
}
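
\z and \B are listed in the table but not shown above. The sketch below contrasts \z with $ on a string that ends in a newline and uses \B to match only inside a word; the sample strings are illustrative:

```perl
use strict;
use warnings;

my $line = "hello world\n";

# "$" tolerates one trailing newline; "\z" demands the very end of the string.
my $dollar = $line =~ /world$/  ? "yes" : "no";
my $end    = $line =~ /world\z/ ? "yes" : "no";

# "\B" matches only where there is no word boundary, i.e. inside a word.
my $inside   = "password" =~ /\Bword/ ? "yes" : "no";
my $at_start = "word"     =~ /\Bword/ ? "yes" : "no";

print "dollar=$dollar end=$end inside=$inside at_start=$at_start\n";
```

This should print dollar=yes end=no inside=yes at_start=no.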

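4. Modifiers

4.1 Common Modifiers

Modifier	Description
i	Case-insensitive matching
m	Multi-line mode (^ and $ match at every line)
s	Single-line mode (. also matches newline)
x	Extended mode (whitespace and comments allowed in the pattern)
g	Global matching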

perl

my $str = "Hello, World!";

if ($str =~ /hello/i) {
    print "Matched (case insensitive)\n";
}

my $text = "Line1\nLine2\nLine3";
if ($text =~ /^Line/m) {
    print "Multiline match\n";
}
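
The s and x modifiers from the table can be sketched in the same way. The strings and variable names here are assumptions chosen for illustration:

```perl
use strict;
use warnings;

my $text = "start\nend";

# Without /s the dot stops at the newline, so this does not match.
my $plain  = $text =~ /start.end/  ? 1 : 0;
# With /s the dot also matches "\n".
my $single = $text =~ /start.end/s ? 1 : 0;

# /x ignores whitespace in the pattern and allows trailing comments.
my $date = "2024-03-27";
my $is_date = $date =~ /
    \A (\d{4})    # year
    -  (\d{2})    # month
    -  (\d{2})    # day
    \z
/x ? 1 : 0;

print "plain=$plain single=$single is_date=$is_date\n";
```

This should print plain=0 single=1 is_date=1.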

4.2 Global Matching

perl

my $str = "hello world hello perl";

while ($str =~ /hello/g) {
    print "Found 'hello'\n";
}

my @matches = $str =~ /hello/g;
print "Count: " . scalar(@matches) . "\n";

5. Captures

5.1 Basic Captures

Use () to capture the matched text:

perl

my $str = "Hello, World!";

if ($str =~ /(\w+), (\w+)/) {
    print "First: $1\n";
    print "Second: $2\n";
}

5.2 Multiple Captures

perl

my $date = "2024-03-27";

if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    print "Year: $1\n";
    print "Month: $2\n";
    print "Day: $3\n";
}
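
A match in list context returns its captured groups, so the same date can also be unpacked straight into named variables instead of reading $1 to $3. A small sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

my $date = "2024-03-27";

# On failure the list is empty and all three variables stay undefined.
my ($year, $month, $day) = $date =~ /(\d{4})-(\d{2})-(\d{2})/;

if (defined $year) {
    print "$year/$month/$day\n";
}
```

This should print 2024/03/27.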

5.3 Named Captures

perl

my $date = "2024-03-27";

if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "Year: $+{year}\n";
    print "Month: $+{month}\n";
    print "Day: $+{day}\n";
}

5.4 Non-Capturing Groups

perl

my $str = "hello world";

if ($str =~ /(?:hello|hi) (\w+)/) {
    print "Word: $1\n";
}

6. Special Variables

6.1 Match Variables

perl

my $str = "Hello, World!";

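if ($str =~ /(\w+)/) {
    print "Match: $&\n";      # the matched text: Hello
    print "Before: $`\n";     # text before the match: empty here
    print "After: $'\n";      # text after the match: , World!
    print "Group 1: $1\n";    # first capture group: Hello
}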

6.2 All Captures

perl

my $str = "a1 b2 c3";

my @captures = $str =~ /(\w)(\d)/g;
print "@captures\n";
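
Because the flat list alternates between the first and second group, it can also be assigned to a hash to pair each letter with its digit. This is a sketch; the hash name %digit_for is made up for illustration:

```perl
use strict;
use warnings;

my $str = "a1 b2 c3";

# Each match contributes (letter, digit), which becomes a key/value pair.
my %digit_for = $str =~ /(\w)(\d)/g;

print join(" ", map { "$_=$digit_for{$_}" } sort keys %digit_for), "\n";
```

This should print a=1 b=2 c=3.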

7. Exercises

Exercise 1: Email Validation

perl

#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $email = "test@example.com";

if ($email =~ /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/) {
    say "Valid email";
} else {
    say "Invalid email";
}

Exercise 2: Extracting URLs

perl

#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $text = "Visit https://www.example.com or http://test.org";

while ($text =~ m{(https?://\S+)}g) {
    say "Found URL: $1";
}

Exercise 3: Word Count

perl

#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $text = "Hello, World! This is a test.";

my @words = $text =~ /\b\w+\b/g;
my %count;

$count{$_}++ foreach @words;

foreach my $word (sort keys %count) {
    say "$word: $count{$word}";
}

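8. Summary

This chapter covered:

Basic regular expression syntax
Metacharacters and quantifiers
Character classes
Modifiers
Capture groups

The next chapter covers advanced regular expressions.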