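[Epub] - Digital Publication Production - Web Version - [2]

Goal: automate the web-page steps from the previous post. I use Python, with the BeautifulSoup library to process the HTML and the Pillow library to process images (resizing and so on).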

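Preparing the materials

Assume a chapter of the digital publication has four kinds of material: text, images, video and audio. Each kind is kept in its own folder. The code reads these materials automatically, wraps the content in tags created with BeautifulSoup, and inserts the tags into the HTML template. The result is a complete HTML page. The file structure looks roughly like this: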

.
├── base.txt # plain-text form of the HTML template
├── Charpter1.html # generated HTML page
├── Charpter2.html
├── main.py # main program
├── pics # image materials
│   ├── cover.jpg
│   ├── pic1.jpg
│   ├── pic2.jpg
│   ├── pic3.jpg
│   ├── pic4.jpg
│   ├── pic5.jpg
│   ├── pic6.jpg
│   └── small_pic6.jpg
├── README.md # documentation
├── requirement.txt # required libraries
├── sounds # audio materials
│   └── sound1.mp3
├── text # text materials
│   ├── Charpter1.txt
│   └── Charpter2.txt
└── videos # video materials
    └── video1.mp4
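The post does not show the driver that produces one page per chapter. The following is a minimal sketch. It assumes every chapter is a `.txt` file in `./text/` and that each page keeps the chapter's base name, as in the tree above. `chapter_jobs` is a hypothetical name.

```python
import os


def chapter_jobs(names):
    """Pair each chapter text file with the HTML page it should produce.

    Hypothetical helper: `names` is a directory listing such as
    os.listdir('./text'). Only .txt files count as chapters, and each page
    keeps the chapter's base name (Charpter1.txt -> Charpter1.html).
    """
    jobs = []
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        if ext.lower() == '.txt':
            jobs.append((name, base + '.html'))
    return jobs
```

In `main.py`, each pair from `chapter_jobs(os.listdir('./text'))` would then drive one run of the steps below, with the `.txt` name as `filename`.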

Reading the template and the text

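import os
import re
from bs4 import BeautifulSoup
from PIL import Image

# read the plain-text HTML template
with open('base.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# parse the template ('lxml' needs the lxml package installed)
temp = BeautifulSoup(text, "lxml")

# filename is one chapter file in ./text/, e.g. 'Charpter1.txt'
# read the paragraphs, skipping blank or very short lines
with open(os.path.join('./text/', filename), 'r', encoding='utf-8') as f:
    paras = [p.strip() for p in f.readlines() if len(p) > 3]

# replace the cover image
cover = temp.find('img', {'id': 'cover'})
cover['src'] = './pics/cover.jpg'

# the first paragraph is the chapter title
title = temp.find('h3')
title.string = paras[0]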
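The filter `len(p) > 3` runs before `strip()`, so it measures the raw line with its newline. Blank lines and very short lines are dropped, and the first line that survives becomes the `h3` title. The sketch below restates that logic as a pure function, so a chapter file can be checked without the template. `split_chapter` is a hypothetical name.

```python
def split_chapter(raw, min_len=3):
    """Split raw chapter text into (title, body paragraphs).

    Mirrors the filter above: a line is kept only if its raw length,
    newline included, exceeds min_len; kept lines are then stripped.
    """
    lines = raw.splitlines(keepends=True)  # like f.readlines(): keeps '\n'
    paras = [p.strip() for p in lines if len(p) > min_len]
    if not paras:
        raise ValueError('chapter has no usable lines')
    return paras[0], paras[1:]
```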

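Inserting images

To insert an image, the script has to know where in the chapter it belongs. The position is therefore marked in the text itself.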
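The post does not spell out the marker format. Judging from the regular expression in the helpers, a marker is the keyword followed directly by a number, such as `pic1` or `sound1`. It is written inside the paragraph that the media should precede. Below is a small sketch of that lookup under this assumption. `find_marker` is hypothetical, and it returns `None` when the keyword appears without a number instead of failing.

```python
import re


def find_marker(keyword, para):
    """Return the first marker such as 'pic1' in para, or None.

    Assumption: a marker is the keyword immediately followed by digits.
    re.escape keeps keywords containing regex metacharacters literal.
    """
    match = re.search(re.escape(keyword) + r'\d+', para)
    return match.group() if match else None
```

For example, `find_marker('pic', 'see pic12 here')` gives `'pic12'`, while a paragraph that only contains the word "picture" gives `None`.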

# handle paragraphs
textbox = temp.find('div', {'id': 'text'})
# [left, right]: how many pictures have been placed on each side so far
count = [0, 0]
for para in paras[1:]:
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = para
    # insert a picture before the paragraph if it contains a marker like 'pic1'
    new_img_div, count = insert_img('pic', para, temp, count)
    if new_img_div is not None:
        textbox.append(new_img_div)
    textbox.append(new_p)
    textbox.append(new_br)

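The code above first finds the div that holds the text. It then walks through the paragraphs after the title. For each paragraph, insert_img checks for the image keyword. If it finds one, it returns a tag for the picture, and that tag is appended before the paragraph. Here is insert_img: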

def insert_img(img_keyword, para, temp, count):
    """
    :param img_keyword: keyword that marks a picture in the text, e.g. 'img', 'pic', '圖片'
    :param para: one paragraph of a chapter
    :param temp: the BeautifulSoup template
    :param count: [left, right] counts of pictures already placed at each side
    :return: (tag for the picture or None, updated count)
    """
    match = re.search(img_keyword + r'\d+', para)
    if match is None:
        return None, count
    # picture id in the current paragraph, e.g. 'pic1' or 'img1'
    pic_id = match.group()
    print('==========insert img ' + pic_id + '==========')
    # file whose name without extension is the id, e.g. 'pic1.jpg' (not 'pic10.jpg')
    pic_url = [url for url in os.listdir('./pics')
               if os.path.splitext(url)[0] == pic_id][0]
    # open the picture with Pillow to read its size
    im = Image.open(os.path.join('./pics', pic_url))
    width, height = im.size
    # placement rules (400 px is taken as about 1/3 of the browser width):
    # 1. width > 400 and width > height: center it
    # 2. width > 400 and width < height: shrink it and place it at a side
    # 3. otherwise (narrow or square pictures): place it at a side
    # 4. side pictures go left first, then alternate with right
    if width > 400 and width > height:
        new_div = temp.new_tag('div')
        new_div['class'] = 'pic_in_text_center'
        new_div.append(temp.new_tag('img', src='./pics/' + pic_url))
        return new_div, count
    if width > 400 and width < height:
        # change_img_size(im, new_pic_url) is the author's resize helper;
        # its code is not included in the post
        new_pic_url = 'small_' + pic_url
        im = change_img_size(im, new_pic_url)
        im.save(os.path.join('./pics', new_pic_url))
        pic_url = new_pic_url
    new_img = temp.new_tag('img', src='./pics/' + pic_url)
    if count[0] > count[1]:
        new_img['class'] = 'pic_in_text_right'
        count[1] += 1
    else:
        new_img['class'] = 'pic_in_text_left'
        count[0] += 1
    return new_img, count

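Here, count holds how many pictures have been placed on the left and on the right so far. It decides which side the next side-placed picture goes to, so pictures alternate sides, starting on the left.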
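The placement rules can be checked without Pillow or BeautifulSoup. This sketch follows the post's assumption that 400 px is about one third of the browser width. `place_picture` and `scaled_size` are hypothetical names. `scaled_size` is only one possible stand-in for `change_img_size`, whose code is not in the post. Here it is assumed to shrink the picture to at most 400 px wide, keeping the aspect ratio.

```python
def place_picture(width, height, count, side_limit=400):
    """Decide where a picture goes, following the rules in insert_img.

    count is [left, right] and is updated in place for side pictures.
    Returns (placement, needs_shrink).
    """
    if width > side_limit and width > height:
        return 'center', False
    # wide pictures in portrait orientation are shrunk before going to a side
    needs_shrink = width > side_limit and width < height
    if count[0] > count[1]:
        count[1] += 1
        return 'right', needs_shrink
    count[0] += 1
    return 'left', needs_shrink


def scaled_size(width, height, max_width=400):
    """Hypothetical resize rule: fit width to max_width, keep aspect ratio."""
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))
```

With Pillow, `im.resize(scaled_size(*im.size))` would then produce the smaller copy that insert_img saves under the `small_` name.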

Inserting video and audio

This works the same way as inserting images:

def insert_sound(sound_keyword, para, temp):
    """
    :param sound_keyword: keyword that marks a sound in the text, e.g. 'sound', 'music', '音樂'
    :param para: one paragraph of a chapter
    :param temp: the BeautifulSoup template
    :return: an <audio> tag to insert into the page, or None
    """
    match = re.search(sound_keyword + r'\d+', para)
    if match is None:
        return None
    # sound id in the current paragraph, e.g. 'sound1'
    sound_id = match.group()
    print('==========insert sound ' + sound_id + '==========')
    # file whose name without extension is the id, e.g. 'sound1.mp3'
    sound_url = [url for url in os.listdir('./sounds')
                 if os.path.splitext(url)[0] == sound_id][0]
    new_div = temp.new_tag('audio', src='./sounds/' + sound_url, controls="controls")
    new_div['class'] = 'sound_in_text'
    return new_div


def insert_video(video_keyword, para, temp):
    """
    :param video_keyword: keyword that marks a video in the text, e.g. 'video', '視頻'
    :param para: one paragraph of a chapter
    :param temp: the BeautifulSoup template
    :return: a <video> tag to insert into the page, or None
    """
    match = re.search(video_keyword + r'\d+', para)
    if match is None:
        return None
    # video id in the current paragraph, e.g. 'video1'
    video_id = match.group()
    print('==========insert video ' + video_id + '==========')
    # file whose name without extension is the id, e.g. 'video1.mp4'
    video_url = [url for url in os.listdir('./videos')
                 if os.path.splitext(url)[0] == video_id][0]
    new_div = temp.new_tag(
        'video',
        src='./videos/' + video_url,
        controls="controls",
        width="600",
        height="450",
    )
    new_div['class'] = 'video_in_text'
    return new_div
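The three helpers above repeat the same file lookup. A prefix match such as `url.startswith(pic_id)` cannot tell `pic1` from `pic10`: if both files exist, whichever one `os.listdir` lists first wins, and that order is arbitrary. Comparing the file name without its extension avoids this. Below is a sketch of a shared helper. `find_media_file` is a hypothetical name.

```python
import os


def find_media_file(names, media_id):
    """Return the file in names whose base name equals media_id, or None.

    names is a directory listing such as os.listdir('./pics'). Comparing
    the name without its extension keeps 'pic1' from matching 'pic10.jpg'.
    """
    for name in sorted(names):
        if os.path.splitext(name)[0] == media_id:
            return name
    return None
```

Returning `None` lets the caller report a missing file clearly. Indexing `[0]` on an empty list would instead raise an `IndexError`.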

The paragraph loop then handles all three kinds of media:

for para in paras[1:]:
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = para
    # picture, audio and video markers each add their tag before the paragraph
    new_img_div, count = insert_img('pic', para, temp, count)
    if new_img_div is not None:
        textbox.append(new_img_div)
    new_sound_div = insert_sound('sound', para, temp)
    if new_sound_div is not None:
        textbox.append(new_sound_div)
    new_video_div = insert_video('video', para, temp)
    if new_video_div is not None:
        textbox.append(new_video_div)
    textbox.append(new_p)
    textbox.append(new_br)
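For each paragraph, the loop appends media before the text in a fixed order: picture, audio, video, then the `p` and a `br`. Below is a sketch of that order as plain data, which can be used to check a chapter's markers before generating any HTML. `paragraph_layout` is hypothetical. Like the helpers, it only uses the first marker of each kind in a paragraph.

```python
import re


def paragraph_layout(para, kinds=(('pic', 'picture'), ('sound', 'audio'), ('video', 'video'))):
    """List what the loop would append for one paragraph, in order.

    kinds pairs each marker keyword with a label for the media it produces;
    only the first marker of each kind counts.
    """
    layout = []
    for keyword, label in kinds:
        match = re.search(re.escape(keyword) + r'\d+', para)
        if match:
            layout.append((label, match.group()))
    # the paragraph text and a line break always come last
    layout.append(('p', None))
    layout.append(('br', None))
    return layout
```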

Finally, write the finished page to an HTML file:

# write the page next to main.py, e.g. Charpter1.txt -> Charpter1.html
with open(os.path.splitext(filename)[0] + '.html', 'w', encoding='utf-8') as f:
    f.write(temp.prettify())
print('==========finish ' + filename + '==========')
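In Python 3, BeautifulSoup's `prettify('utf-8')` returns `bytes`, so it can only be written to a file opened in binary mode. The alternative is to call `prettify()`, which returns `str`, and open the file in text mode with an explicit encoding. The sketch below covers the output step, assuming the page is written under the chapter's base name. `write_page` is hypothetical and takes the finished HTML as a string.

```python
import os


def write_page(html, txt_name, out_dir='.'):
    """Write html as <chapter>.html in out_dir and return the path.

    os.path.splitext is safer than txt_name[:-4] when the extension
    is not exactly four characters long.
    """
    base = os.path.splitext(os.path.basename(txt_name))[0]
    path = os.path.join(out_dir, base + '.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return path
```

In the script above this would be called as `write_page(temp.prettify(), filename)`.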

zhu_free
