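Goal: automate the steps from the previous post on building a web page. I use Python here, with the BeautifulSoup library to handle the HTML and the Pillow library to process images (resizing and so on).
Preparing the materials
Suppose a chapter of a digital publication consists of four kinds of material: text, images, video, and audio, each stored in its own folder. The code reads these materials automatically, uses BeautifulSoup to create tags that wrap the content, and inserts them into a template HTML file to produce a complete HTML page. The file structure is roughly as follows: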
.
├── base.txt          # template file, stored as plain text
├── Charpter1.html    # generated HTML page
├── Charpter2.html
├── main.py           # main program
├── pics              # image folder
│   ├── cover.jpg
│   ├── pic1.jpg
│   ├── pic2.jpg
│   ├── pic3.jpg
│   ├── pic4.jpg
│   ├── pic5.jpg
│   ├── pic6.jpg
│   └── small_pic6.jpg
├── README.md         # documentation
├── requirement.txt   # library requirements
├── sounds            # audio folder
│   └── sound1.mp3
├── text              # text folder
│   ├── Charpter1.txt
│   └── Charpter2.txt
└── videos            # video folder
    └── video1.mp4
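Before any page is generated, the chapter files in the text folder have to be found. A minimal sketch of that step, assuming every chapter is a `.txt` file directly inside the folder; the helper name `list_chapters` is hypothetical and is not taken from the original main.py.

```python
import os


def list_chapters(text_dir):
    """Return the chapter file names in text_dir, sorted by name.

    Hypothetical helper: only files ending in '.txt' count as chapters.
    """
    names = [name for name in os.listdir(text_dir) if name.endswith('.txt')]
    return sorted(names)
```

Each returned name could then play the role of `filename` in the rest of the code.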
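Reading the template and the text
import os
import re

from bs4 import BeautifulSoup
from PIL import Image

# read the HTML template, which is stored as a plain text file
with open('base.txt', 'r') as f:
    text = f.read()
# parse the template
temp = BeautifulSoup(text, "lxml")
# open the chapter file and read its paragraphs
# (filename is the name of a file in ./text, e.g. 'Charpter1.txt')
with open(os.path.join('./text/', filename), 'r') as f:
    paras = [p.strip() for p in f.readlines() if len(p) > 3]
# replace the cover image
cover = temp.find('img', {'id': 'cover'})
cover['src'] = './pics/cover.jpg'
# the first paragraph is used as the chapter title
title = temp.find('h3')
title.string = paras[0]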
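The reading step treats the first usable line as the title and everything after it as body paragraphs. A small sketch of that split on its own, assuming the same `len(p) > 3` filter; `split_chapter` is a hypothetical name introduced only for illustration.

```python
def split_chapter(lines):
    """Split raw lines into (title, body paragraphs).

    Lines of three characters or fewer (newline included) are dropped,
    mirroring the len(p) > 3 filter in the reading code.
    """
    paras = [line.strip() for line in lines if len(line) > 3]
    if not paras:
        raise ValueError('chapter has no usable paragraphs')
    return paras[0], paras[1:]
```

Raising on an empty chapter is a design choice of this sketch; the original code would fail with an IndexError on `paras[0]` instead.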
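Inserting images
To insert an image, the program has to know where in the article it belongs, so the position must be marked in the text itself.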
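Judging from the regular expression used in insert_img, a marker is the keyword followed by a number, such as pic1. A minimal sketch of how such a marker can be picked out of a paragraph; `find_marker` is a hypothetical helper, and escaping the keyword is an extra precaution that the original code does not take.

```python
import re


def find_marker(keyword, para):
    """Return the first marker such as 'pic3' found in para, or None."""
    match = re.search(re.escape(keyword) + r'\d+', para)
    return match.group() if match else None
```

A paragraph that merely contains the keyword without a number (for example the word "picture") yields None rather than an error.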
# handle paragraphs
textbox = temp.find('div', {'id': 'text'})
count = [0, 0]
for i in range(1, len(paras)):
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = paras[i]
    # handle an image marked in the paragraph
    img_result = insert_img('pic', paras[i], temp, count)
    new_img_div, count = img_result[0], img_result[1]
    if new_img_div:
        textbox.append(new_img_div)
    textbox.append(new_p)
    textbox.append(new_br)
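The code above first finds the div that holds the text and then goes through the paragraphs one by one. For each paragraph, the insert_img function checks whether it contains the image keyword. If the returned tag is not None, an image tag was created, and it is inserted before the paragraph. Here is the insert_img function: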
def insert_img(img_keyword, para, temp, count):
    """
    :param img_keyword: keyword that marks a picture in the text, such as 'img', 'pic', '圖片'
    :param para: one paragraph of a chapter.
    :param temp: the HTML template (a BeautifulSoup object)
    :param count: [left, right] counters of pictures placed at each side
    :return: (new_div, count); new_div is the picture tag to insert into the
             HTML, or None if the paragraph has no picture marker.
    """
    # search for the picture id in the current paragraph, like 'pic1', 'img1'
    match = re.search(img_keyword + r'(\d+)', para)
    if match is None:
        # no marker, or the keyword appears without a number
        return None, count
    pic_id = match.group()
    print('==========insert img ' + pic_id + '==========')
    # get the file name of the picture, like 'pic1.jpg'
    pic_url = [
        url for url in os.listdir('./pics') if url.startswith(pic_id)][0]
    # use the Pillow library to open the picture
    im = Image.open(os.path.join('./pics', pic_url))
    # decide where to place the picture
    # rules: 1. if the picture's width > 1/3 of the browser width
    #           and its width > height: place it in the center
    #        2. if the picture's width > 1/3 of the browser width
    #           and its width < height: shrink it and place it at the side
    #        3. if the picture's width < 1/3 of the browser width: place it at the side
    #        4. when placing pictures at the side, use the left first, then the right.
    if im.size[0] > 400 and im.size[0] > im.size[1]:
        # create a div to hold the img
        new_div = temp.new_tag('div')
        # create an img tag
        new_pic = temp.new_tag('img', src='./pics/' + pic_url)
        # add a class to the div
        new_div['class'] = 'pic_in_text_center'
        # add the img to the div
        new_div.append(new_pic)
    elif im.size[0] > 400 and im.size[0] < im.size[1]:
        # change_img_size is not shown in this post
        new_pic_url = 'small_' + pic_url
        im = change_img_size(im, new_pic_url)
        im.save(os.path.join('./pics', new_pic_url))
        if count[0] > count[1]:
            new_div = temp.new_tag('img', src='./pics/' + new_pic_url)
            new_div['class'] = 'pic_in_text_right'
            count[1] += 1
        else:
            new_div = temp.new_tag('img', src='./pics/' + new_pic_url)
            new_div['class'] = 'pic_in_text_left'
            count[0] += 1
    else:
        if count[0] > count[1]:
            new_div = temp.new_tag('img', src='./pics/' + pic_url)
            new_div['class'] = 'pic_in_text_right'
            count[1] += 1
        else:
            new_div = temp.new_tag('img', src='./pics/' + pic_url)
            new_div['class'] = 'pic_in_text_left'
            count[0] += 1
    return new_div, count
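The post does not show change_img_size. A minimal sketch of the size calculation such a helper might perform, under the assumption that the goal is to cap the width at 400 pixels while keeping the aspect ratio. Both `scaled_size` and `max_width` are hypothetical names; the resulting tuple is the kind of value that Pillow's `Image.resize` accepts.

```python
def scaled_size(width, height, max_width=400):
    """Return a (width, height) no wider than max_width, same aspect ratio.

    Hypothetical helper: the 400-pixel cap is an assumption, chosen to match
    the threshold used in insert_img.
    """
    if width <= max_width:
        # already narrow enough, leave the size unchanged
        return width, height
    ratio = max_width / float(width)
    # never let the height collapse to zero for extremely wide images
    return max_width, max(1, int(round(height * ratio)))
```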
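Here count is a two-element list, [left, right], that records how many pictures have been placed at each side so far. It decides whether the next side picture goes on the left or on the right.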
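The placement rules and the left/right counter can be looked at separately from the tag-building code. A minimal sketch, assuming the same 400-pixel threshold and class names as insert_img; `choose_layout` is a hypothetical helper and not part of the original program.

```python
def choose_layout(width, height, count, threshold=400):
    """Return (css_class, needs_shrink) and update count in place.

    count is [left, right]: how many images sit at each side so far.
    """
    wide = width > threshold
    if wide and width > height:
        # wide landscape picture: centered, counters untouched
        return 'pic_in_text_center', False
    # side placement: left first, then whichever side has fewer pictures
    if count[0] > count[1]:
        css_class, index = 'pic_in_text_right', 1
    else:
        css_class, index = 'pic_in_text_left', 0
    count[index] += 1
    # only wide portrait pictures are shrunk before being placed at the side
    return css_class, wide and width < height
```

Because the list is mutated in place, the same `count` object has to be passed for every paragraph of a chapter.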
Inserting video and audio
This works much like inserting images:
def insert_sound(sound_keyword, para, temp):
    """
    :param sound_keyword: keyword that marks a sound file in the text, such as 'sound', 'music', '音樂'
    :param para: one paragraph of a chapter.
    :param temp: the HTML template (a BeautifulSoup object)
    :return new_div: the audio tag to insert into the HTML, or None.
    """
    # search for the sound id in the current paragraph, like 'sound1'
    match = re.search(sound_keyword + r'(\d+)', para)
    if match is None:
        return None
    sound_id = match.group()
    print('==========insert sound ' + sound_id + '==========')
    # get the file name of the sound, like 'sound1.mp3'
    sound_url = [
        url for url in os.listdir('./sounds') if url.startswith(sound_id)][0]
    new_div = temp.new_tag('audio', src='./sounds/' + sound_url, controls="controls")
    new_div['class'] = 'sound_in_text'
    return new_div


def insert_video(video_keyword, para, temp):
    """
    :param video_keyword: keyword that marks a video file in the text, such as 'video'
    :param para: one paragraph of a chapter.
    :param temp: the HTML template (a BeautifulSoup object)
    :return new_div: the video tag to insert into the HTML, or None.
    """
    # search for the video id in the current paragraph, like 'video1'
    match = re.search(video_keyword + r'(\d+)', para)
    if match is None:
        return None
    video_id = match.group()
    print('==========insert video ' + video_id + '==========')
    # get the file name of the video, like 'video1.mp4'
    video_url = [
        url for url in os.listdir('./videos') if url.startswith(video_id)][0]
    new_div = temp.new_tag(
        'video',
        src='./videos/' + video_url,
        controls="controls",
        width="600",
        height="450"
    )
    new_div['class'] = 'video_in_text'
    return new_div
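insert_sound and insert_video differ only in the folder, the tag name, the class, and a couple of extra attributes. A sketch of how those differences could be kept in one lookup table; `MEDIA_SPECS` and `media_attrs` are hypothetical names, and this is one possible refactoring rather than what the original program does.

```python
# Hypothetical table: everything that differs between the two media kinds.
MEDIA_SPECS = {
    'sound': {'tag': 'audio', 'folder': './sounds/',
              'css': 'sound_in_text', 'extra': {}},
    'video': {'tag': 'video', 'folder': './videos/',
              'css': 'video_in_text',
              'extra': {'width': '600', 'height': '450'}},
}


def media_attrs(keyword, file_name):
    """Return (tag name, attribute dict) for a media file of the given kind."""
    spec = MEDIA_SPECS[keyword]
    attrs = {
        'src': spec['folder'] + file_name,
        'controls': 'controls',
        'class': spec['css'],
    }
    attrs.update(spec['extra'])
    return spec['tag'], attrs
```

The attributes could then be set on a new tag one at a time, the same way the functions above set `class`.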
The paragraph loop then becomes:
for i in range(1, len(paras)):
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = paras[i]
    # handle an image marked in the paragraph
    img_result = insert_img('pic', paras[i], temp, count)
    new_img_div, count = img_result[0], img_result[1]
    if new_img_div:
        textbox.append(new_img_div)
    # handle a sound file marked in the paragraph
    new_sound_div = insert_sound('sound', paras[i], temp)
    if new_sound_div:
        textbox.append(new_sound_div)
    # handle a video marked in the paragraph
    new_video_div = insert_video('video', paras[i], temp)
    if new_video_div:
        textbox.append(new_video_div)
    textbox.append(new_p)
    textbox.append(new_br)
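When everything is done, write the result to an HTML file:
# prettify("utf-8") returns bytes, so the file is opened in binary mode
with open(filename[:-4] + '.html', 'wb') as f:
    f.write(temp.prettify("utf-8"))
print('==========finish ' + filename + '==========')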
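When prettify is given an encoding it returns encoded bytes rather than text, so on Python 3 the output file needs binary mode. A minimal sketch of the write step in isolation; `write_page` and its `out_dir` parameter are hypothetical, and `os.path.splitext` is used here in place of the `[:-4]` slice so that the extension length does not matter.

```python
import os


def write_page(filename, html_bytes, out_dir='.'):
    """Write encoded HTML for a chapter, e.g. Charpter1.txt -> Charpter1.html.

    Hypothetical helper: html_bytes is expected to be bytes, such as the
    value returned by prettify("utf-8").
    """
    stem = os.path.splitext(filename)[0]
    out_path = os.path.join(out_dir, stem + '.html')
    with open(out_path, 'wb') as f:
        f.write(html_bytes)
    return out_path
```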