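Parse the contents of a txt file and add the corresponding HTML tags to the document according to the rules and filters below.
Rule: use regular expressions to filter the content of each text block. The filters match text inside angle brackets, text between asterisks, and URLs and email addresses.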
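The following four regular expressions correspond to these filters, in the same order (angle brackets, asterisks, URLs, email addresses):
r'\<(.+?)\>'
r'\*(.+?)\*'
r'(http(s){0,1}://[\.a-zA-Z0-9/]+)'
r'([\.a-zA-Z0-9]+@[\.a-zA-Z0-9]+[a-zA-Z]+)'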
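As a rough illustration of how the four filters might be applied, here is a minimal sketch using re.sub. The patterns follow the text; the replacement templates (escaping angle brackets, <em> for asterisks, <a> links for URLs and email addresses) and the name apply_filters are assumptions, since the post does not say which tags it emits.

```python
import re

# (pattern, replacement) pairs. The patterns follow the post; the
# replacement templates are assumptions made for this sketch.
FILTERS = [
    # Angle brackets go first so the tags added below are not re-matched.
    (re.compile(r'\<(.+?)\>'), r'&lt;\1&gt;'),
    (re.compile(r'\*(.+?)\*'), r'<em>\1</em>'),
    (re.compile(r'(http(s){0,1}://[\.a-zA-Z0-9/]+)'), r'<a href="\1">\1</a>'),
    (re.compile(r'([\.a-zA-Z0-9]+@[\.a-zA-Z0-9]+[a-zA-Z]+)'),
     r'<a href="mailto:\1">\1</a>'),
]


def apply_filters(block):
    """Run every filter over one text block and return the marked-up text."""
    for pattern, replacement in FILTERS:
        block = pattern.sub(replacement, block)
    return block


print(apply_filters("Visit *Foodly* at http://foodly.example/home"))
```

The order matters in this sketch: if the angle-bracket filter ran last, it would also rewrite the tags inserted by the other three filters.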
Text blocks are separated from one another by one or more blank lines. Sample document:
Welcome to Foodly ,Inc.There π are the corporate web pages of *Foodly*.We hope you find your stay enjoyable,and that you will sample many of our product.A short history of th company......*Parsing HTML*Use the BeautifulSoup class to parse an HTML document. Here are some of the things that BeautifulSoup knows: - Some tags can be nested () and some can't (). - Table and list tags have a natural nesting order. For instance, tags go inside tags, not the other way around. - The contents of a
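The blank-line rule for separating text blocks can be sketched as a small generator. The name blocks and the choice to join the lines of a block with single spaces are assumptions for illustration, not necessarily what the original markup.py does.

```python
def blocks(lines):
    """Yield each run of consecutive non-blank lines as one string."""
    current = []
    for line in lines:
        if line.strip():
            # Non-blank line: it belongs to the block being collected.
            current.append(line.strip())
        elif current:
            # Blank line after some content: the block is complete.
            yield ' '.join(current)
            current = []
    if current:
        # Emit the last block if the input does not end with a blank line.
        yield ' '.join(current)


sample = ["Welcome to Foodly\n", "\n", "\n", "A short history\n", "of the company\n"]
for block in blocks(sample):
    print(block)
```

Because any number of consecutive blank lines only ends the current block once, one or more blank lines behave the same way as a single separator.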
How to run:
python3 markup.py < test_input.txt > test_input.html
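The command above redirects the input file to standard input and captures standard output as the HTML file, so the script only needs to read one stream and write another. A minimal sketch of that shape follows; the function name main and the <p> wrapper around each block are assumptions, and in-memory streams stand in for the redirected files.

```python
import io
import re


def main(infile, outfile):
    """Read plain text from infile and write one <p> element per block."""
    text = infile.read()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r'\n\s*\n', text.strip()):
        block = ' '.join(block.split())
        if block:
            outfile.write('<p>%s</p>\n' % block)


# Demo with in-memory streams in place of the redirected files.
source = io.StringIO("Welcome to Foodly\n\n\nA short history\nof the company\n")
result = io.StringIO()
main(source, result)
print(result.getvalue(), end='')
```

In a script run with the shell redirection shown above, the same function would be called as main(sys.stdin, sys.stdout) after importing sys.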
test_input.html renders in Google Chrome as follows:
Source repository:
Reposted from: http://euiti.baihongyu.com/