2.3. Building
This section explains the options for building your project. They fall into four categories:
Sanity checks
- d2lbook build linkcheck: checks whether all internal and external links are accessible.
- d2lbook build outputcheck: checks that no notebook contains code outputs.
Building results
- d2lbook build html: builds the HTML version into _build/html.
- d2lbook build pdf: builds the PDF version into _build/pdf.
- d2lbook build pkg: builds a zip file containing all .ipynb notebooks.
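Additional features
- d2lbook build colab: converts all notebooks that can run on Google Colab into _build/colab. See Section 2.9 for more information.
- d2lbook build lib: builds a Python package so that code can be reused in other notebooks. See XXX for more information.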
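Internal stages, which are usually triggered automatically
- d2lbook build eval: evaluates all notebooks and saves them as .ipynb notebooks in _build/eval.
- d2lbook build rst: converts all notebooks into rst files and creates a Sphinx project in _build/rst.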
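When builds are scripted, for instance in CI, the targets listed above can be kept in a small lookup. The following is a minimal sketch: the grouping mirrors the list above, while the names BUILD_TARGETS and build_command are hypothetical and not part of d2lbook. The helper only assembles the command line and does not run it.

```python
# Hypothetical helper, not part of d2lbook: it only assembles the command line.
BUILD_TARGETS = {
    "sanity checks": ["linkcheck", "outputcheck"],
    "building results": ["html", "pdf", "pkg"],
    "additional features": ["colab", "lib"],
    "internal stages": ["eval", "rst"],
}


def build_command(target):
    """Return the argument list for `d2lbook build <target>`."""
    known = {t for group in BUILD_TARGETS.values() for t in group}
    if target not in known:
        raise ValueError(f"unknown build target: {target!r}")
    return ["d2lbook", "build", target]


print(" ".join(build_command("html")))  # d2lbook build html
```

With d2lbook installed, the returned list could be passed to subprocess.run(..., cwd=project_dir, check=True), where project_dir is whatever directory holds the project.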
2.3.1. Build Cache
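We encourage you to evaluate your notebooks to obtain the code cell results, rather than keeping those results in the source files, for two reasons:
1. Stored results make code review difficult, especially when they vary between runs because of numerical precision or random number generators.
2. A notebook that has not been evaluated for a long time may be broken by package upgrades.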
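Evaluation does add overhead to the build, however. We recommend limiting the runtime of each notebook to a few minutes. In addition, d2lbook reuses the results of the previous build and evaluates only the notebooks that have been modified.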
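For example, in Dive into Deep Learning, the average runtime of a notebook (section) is about 2 minutes on a GPU machine, because the notebooks train neural networks. The book contains more than 100 notebooks, which brings the total runtime to 2-3 hours. In practice, each code change modifies only a few notebooks, so the build time is usually less than 10 minutes.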
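The idea of evaluating only modified notebooks can be illustrated with a timestamp comparison. This is a sketch under a stated assumption: this section does not describe how d2lbook actually decides that a notebook is outdated, so the rule here (a source is stale when its evaluated .ipynb is missing or older) and the function name outdated_notebooks are illustrative only.

```python
import os
import tempfile


def outdated_notebooks(src_dir, eval_dir):
    """Return the .md sources whose evaluated .ipynb is missing or older."""
    stale = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".md"):
            continue
        src = os.path.join(src_dir, name)
        tgt = os.path.join(eval_dir, name[:-3] + ".ipynb")
        if not os.path.exists(tgt) or os.path.getmtime(tgt) < os.path.getmtime(src):
            stale.append(name)
    return stale


# Demo in a throwaway directory; timestamps are set explicitly with os.utime.
with tempfile.TemporaryDirectory() as root:
    eval_dir = os.path.join(root, "_build", "eval")
    os.makedirs(eval_dir)
    for name in ("index", "get_started"):
        src = os.path.join(root, name + ".md")
        tgt = os.path.join(eval_dir, name + ".ipynb")
        open(src, "w").close()
        open(tgt, "w").close()
        os.utime(src, (1000, 1000))  # source written at t=1000
        os.utime(tgt, (2000, 2000))  # evaluated later, at t=2000
    fresh = outdated_notebooks(root, eval_dir)
    # Simulate editing get_started.md after the last build.
    os.utime(os.path.join(root, "get_started.md"), (3000, 3000))
    after_edit = outdated_notebooks(root, eval_dir)

print(fresh)       # []
print(after_edit)  # ['get_started.md']
```

A content hash would be a sturdier staleness test than modification times, at the cost of reading every source file on each build.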
Let's see how this works. First, create a project as we did in Section 2.1.
!mkdir -p cache
%%writefile cache/index.md
# My Book
The starting page of my book with `d2lbook`.
````toc
get_started
````
Writing cache/index.md
%%writefile cache/get_started.md
# Getting Started
Please first install my favorite package `numpy`.
Writing cache/get_started.md
!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO 2 notebooks are outdated
[d2lbook:build.py:L149] INFO [1] ./get_started.md
[d2lbook:build.py:L149] INFO [2] ./index.md
[d2lbook:build.py:L153] INFO Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO Starting task "Evaluating ./get_started.md" on CPU [0]
[d2lbook:resource.py:L159] INFO Status: 1 running tasks, 0 done, 1 not started
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./get_started.md" on CPU [0] is running for 00:00:00
[d2lbook:resource.py:L196] INFO Starting task "Evaluating ./index.md" on CPU [3]
[d2lbook:resource.py:L159] INFO Status: 2 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./get_started.md" on CPU [0] is running for 00:00:02
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./index.md" on CPU [3] is running for 00:00:00
[d2lbook:resource.py:L223] INFO Task "Evaluating ./get_started.md" on CPU [0] is finished in 00:00:03
[d2lbook:resource.py:L223] INFO Task "Evaluating ./index.md" on CPU [3] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO All 2 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO - 00:00:02 on CPU [3] for Evaluating ./index.md
[d2lbook:resource.py:L148] INFO - 00:00:03 on CPU [0] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO === Finished "d2lbook build eval" in 00:00:13
[d2lbook:build.py:L322] INFO 2 rst files are outdated
[d2lbook:build.py:L324] INFO Convert _build/eval/index.ipynb to _build/rst/index.rst
[d2lbook:build.py:L324] INFO Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L56] INFO === Finished "d2lbook build rst" in 00:00:14
[d2lbook:build.py:L56] INFO === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
making output directory... done
checking bibtex cache... out of date
parsing bibtex file /home/d2l-worker/workspace/d2l-book/docs/_build/eval/user/cache/_build/rst... WARNING: could not open bibtex file /home/d2l-worker/workspace/d2l-book/docs/_build/eval/user/cache/_build/rst.
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2 source files that are out of date
updating environment: [new config] 2 added, 0 changed, 0 removed
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 1 warning.
The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO === Finished "d2lbook build html" in 00:00:15
You can see that index.md was evaluated. (Although it contains no code, it is fine to evaluate it as a Jupyter notebook.)
If we build again, no notebook is evaluated.
!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO 0 notebooks are outdated
[d2lbook:build.py:L153] INFO Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:build.py:L56] INFO === Finished "d2lbook build eval" in 00:00:00
[d2lbook:build.py:L322] INFO 0 rst files are outdated
[d2lbook:build.py:L56] INFO === Finished "d2lbook build rst" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
no targets are out of date.
build succeeded.
The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO === Finished "d2lbook build html" in 00:00:00
Now let's modify get_started.md. You will see that it is re-evaluated, while index.md is not.
%%writefile cache/get_started.md
# Getting Started
Please first install my favorite package `numpy>=1.18`.
Overwriting cache/get_started.md
!cd cache; d2lbook build html
[d2lbook:build.py:L147] INFO 1 notebooks are outdated
[d2lbook:build.py:L149] INFO [1] ./get_started.md
[d2lbook:build.py:L153] INFO Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO Starting task "Evaluating ./get_started.md" on CPU [7]
[d2lbook:resource.py:L159] INFO Status: 1 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./get_started.md" on CPU [7] is running for 00:00:00
[d2lbook:resource.py:L223] INFO Task "Evaluating ./get_started.md" on CPU [7] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO All 1 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO - 00:00:02 on CPU [7] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO === Finished "d2lbook build eval" in 00:00:03
[d2lbook:build.py:L322] INFO 1 rst files are outdated
[d2lbook:build.py:L324] INFO Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L56] INFO === Finished "d2lbook build rst" in 00:00:03
[d2lbook:build.py:L56] INFO === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.
The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO === Finished "d2lbook build html" in 00:00:04
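The "N notebooks are outdated" line in the logs above is a convenient signal when checking that the cache behaves as expected. The sketch below extracts that count from captured output. It relies only on the log wording shown in this section, which may differ in other d2lbook versions; the name count_outdated is hypothetical.

```python
import re

# Matches lines such as: [d2lbook:build.py:L147] INFO 2 notebooks are outdated
OUTDATED = re.compile(r"INFO (\d+) notebooks are outdated")


def count_outdated(log_text):
    """Return the number of outdated notebooks reported, or None if absent."""
    match = OUTDATED.search(log_text)
    return int(match.group(1)) if match else None


sample = """[d2lbook:build.py:L147] INFO 1 notebooks are outdated
[d2lbook:build.py:L149] INFO [1] ./get_started.md
"""
print(count_outdated(sample))  # 1
```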
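One way to trigger a full build is to remove the saved notebooks in _build/eval, or simply to delete _build. Another way is to specify dependencies. For example, in the following cell we add config.ini to the dependencies. Whenever config.ini is modified, the cache of every notebook is invalidated and a build from scratch is triggered.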
%%writefile cache/config.ini
[build]
dependencies = config.ini
Writing cache/config.ini
!cd cache; d2lbook build html
[d2lbook:config.py:L12] INFO Load configure from config.ini
[d2lbook:build.py:L147] INFO 2 notebooks are outdated
[d2lbook:build.py:L149] INFO [1] ./get_started.md
[d2lbook:build.py:L149] INFO [2] ./index.md
[d2lbook:build.py:L153] INFO Evaluating notebooks in parallel with 8 CPU workers and 8 GPU workers
[d2lbook:resource.py:L196] INFO Starting task "Evaluating ./get_started.md" on CPU [5]
[d2lbook:resource.py:L159] INFO Status: 1 running tasks, 0 done, 1 not started
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./get_started.md" on CPU [5] is running for 00:00:00
[d2lbook:resource.py:L196] INFO Starting task "Evaluating ./index.md" on CPU [2]
[d2lbook:resource.py:L159] INFO Status: 2 running tasks, 0 done, 0 not started
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./get_started.md" on CPU [5] is running for 00:00:02
[d2lbook:resource.py:L164] INFO - Task "Evaluating ./index.md" on CPU [2] is running for 00:00:00
[d2lbook:resource.py:L223] INFO Task "Evaluating ./get_started.md" on CPU [5] is finished in 00:00:03
[d2lbook:resource.py:L223] INFO Task "Evaluating ./index.md" on CPU [2] is finished in 00:00:02
[d2lbook:resource.py:L142] INFO All 2 tasks are done, sorting by runtime:
[d2lbook:resource.py:L148] INFO - 00:00:02 on CPU [2] for Evaluating ./index.md
[d2lbook:resource.py:L148] INFO - 00:00:03 on CPU [5] for Evaluating ./get_started.md
[d2lbook:build.py:L56] INFO === Finished "d2lbook build eval" in 00:00:05
[d2lbook:build.py:L322] INFO 2 rst files are outdated
[d2lbook:build.py:L324] INFO Convert _build/eval/get_started.ipynb to _build/rst/get_started.rst
[d2lbook:build.py:L324] INFO Convert _build/eval/index.ipynb to _build/rst/index.rst
[d2lbook:build.py:L56] INFO === Finished "d2lbook build rst" in 00:00:05
[d2lbook:build.py:L56] INFO === Finished "d2lbook build ipynb" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build colab" in 00:00:00
[d2lbook:build.py:L56] INFO === Finished "d2lbook build sagemaker" in 00:00:00
Running Sphinx v5.3.0
loading pickled environment... checking bibtex cache... up to date
done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2 source files that are out of date
updating environment: 0 added, 2 changed, 0 removed
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
generating indices... genindex done
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.
The HTML pages are in _build/html.
[d2lbook:build.py:L56] INFO === Finished "d2lbook build html" in 00:00:06
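The dependency rule described above can be sketched with the standard library. This is an illustration, not d2lbook's implementation: it reads the dependencies entry from the [build] section with configparser and reports whether any listed file changed after a given build time. The comma-separated format for multiple entries is an assumption (the example above lists a single file), and the names read_dependencies and needs_full_rebuild are hypothetical.

```python
import configparser
import os
import tempfile


def read_dependencies(config_path):
    """Return the entries of [build] dependencies as a list of file names."""
    parser = configparser.ConfigParser()
    parser.read(config_path)
    raw = parser.get("build", "dependencies", fallback="")
    # Assumption: multiple entries would be separated by commas.
    return [item.strip() for item in raw.split(",") if item.strip()]


def needs_full_rebuild(project_dir, last_build_time):
    """True if any listed dependency changed after the last build."""
    deps = read_dependencies(os.path.join(project_dir, "config.ini"))
    return any(
        os.path.getmtime(os.path.join(project_dir, dep)) > last_build_time
        for dep in deps
    )


with tempfile.TemporaryDirectory() as project:
    config = os.path.join(project, "config.ini")
    with open(config, "w") as f:
        f.write("[build]\ndependencies = config.ini\n")
    os.utime(config, (2000, 2000))  # config.ini modified at t=2000
    deps = read_dependencies(config)
    rebuild_after_edit = needs_full_rebuild(project, last_build_time=1000)
    rebuild_unchanged = needs_full_rebuild(project, last_build_time=3000)

print(deps)                # ['config.ini']
print(rebuild_after_edit)  # True
print(rebuild_unchanged)   # False
```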
Finally, let's clean up our workspace.
!rm -rf cache