How do you parse HTML with Python?

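import os

# Note: imagefound() is called below but is not defined in this excerpt;
# it must be defined before this code runs.

htmlnames = []
imagelist = []
tempstring = ''
filenames = os.listdir('/home/gregp/development/Scribus15x/doc/en/')
# Keep only the HTML files from the directory listing
for name in filenames:
    if name.endswith('.html'):
        htmlnames.append(name)
# print(htmlnames)
for htmlfile in htmlnames:
    all_text = open('/home/gregp/development/Scribus15x/doc/en/' + htmlfile).read()
    linelength = len(all_text)
    index = 3
    # Walk the text looking for '=' immediately preceded by 'src'
    while index < linelength:
        if (all_text[index] == '='):
            if (all_text[index-3] == 's') and (all_text[index-2] == 'r') and (all_text[index-1] == 'c'):
                imagefound(all_text, imagelist, index)
        index += 1  # always advance, otherwise the loop never terminates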
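The loop above looks for the characters `src=` one index at a time, so it matches any `src=` in the file, not only those inside image tags. As a point of comparison, here is a minimal sketch of the same idea using `html.parser` from the Python 3 standard library. This is a different technique, not the method used in the script above. The names `ImageSrcCollector` and `find_image_sources` are hypothetical and chosen only for this illustration.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        # attrs is a list of (name, value) pairs; value is None when absent.
        if tag == 'img':
            for attr_name, attr_value in attrs:
                if attr_name == 'src' and attr_value is not None:
                    self.sources.append(attr_value)


def find_image_sources(html_text):
    """Return the src values of all <img> tags in html_text, in document order."""
    collector = ImageSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


# Sample markup made up for this example; the <script src=...> is ignored
# because only <img> tags are inspected.
sample = ('<p>Intro</p><img src="images/a.png" alt="A">'
          '<script src="x.js"></script><IMG SRC="images/b.png">')
print(find_image_sources(sample))  # ['images/a.png', 'images/b.png']
```

To apply this to the files listed earlier, one could read each file's text as the script does and pass it to `find_image_sources`, extending `imagelist` with the result.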

