Workload generation
/src/synth-benchmark-1.ts generates documents with a variable number of paragraphs, each containing a hard-coded copy of Lorem Ipsum (i.e., without using our shortcode).
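The generator itself is a TypeScript script that is not reproduced here. As a rough illustration of the kind of document it is described as producing, the following Python sketch builds a Markdown document from n copies of a fixed paragraph. The function name `synth_document`, the placeholder text, and the blank-line separator are assumptions made for illustration, not the script's actual contents.

```python
# Hypothetical sketch only; the real generator is src/synth-benchmark-1.ts.
LOREM = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua."
)


def synth_document(n_paragraphs: int) -> str:
    """Return a Markdown document made of n_paragraphs identical paragraphs."""
    # Markdown paragraphs are separated by a blank line.
    return "\n\n".join([LOREM] * n_paragraphs) + "\n"


# A work factor of 7 corresponds to a size argument of 1 << 7 == 128.
doc = synth_document(1 << 7)
paragraph_count = len(doc.strip().split("\n\n"))
print(paragraph_count)
```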
First, we generate a number of synthetic documents:
import os
import pathlib
import pandas as pd
import timeit
import glob

support_path = pathlib.Path("../_supporting_docs/synth-benchmark-1")
(support_path / "inputs").mkdir(parents=True, exist_ok=True)
(support_path / "outputs").mkdir(parents=True, exist_ok=True)

for work_factor in range(7, 13):
    # I know.
    cmd = "quarto run ../src/synth-benchmark-1.ts %s > ../_supporting_docs/synth-benchmark-1/inputs/_size-%s.qmd" % (1 << work_factor, work_factor)
    result = os.system(cmd)
    if result != 0:
        raise Exception("os.system failed: %s" % result)
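The shell redirection above works, but it relies on `os.system` and string formatting. One possible alternative, sketched here on the assumption that the generator writes the document to standard output (as the `>` redirection suggests), is to run the command without a shell and write its output to the file from Python. The helper name `run_to_file` is hypothetical, and a stand-in command replaces `quarto` so the sketch runs anywhere.

```python
import pathlib
import subprocess
import sys
import tempfile


def run_to_file(args, out_path):
    """Run a command (argument list, no shell) and save its stdout to out_path."""
    out_path = pathlib.Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as out:
        # check=True raises CalledProcessError on a nonzero exit code.
        subprocess.run(args, stdout=out, check=True)
    return out_path


# Stand-in command; for the benchmark, args would instead be something like
# ["quarto", "run", "../src/synth-benchmark-1.ts", str(1 << work_factor)].
tmp_dir = pathlib.Path(tempfile.mkdtemp())
demo = run_to_file(
    [sys.executable, "-c", "print('hello')"],
    tmp_dir / "inputs" / "_size-demo.qmd",
)
demo_text = demo.read_text()
print(demo_text)
```

Passing an argument list avoids shell quoting issues, and `check=True` replaces the manual exit-code test.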
Benchmarks
Pandoc
Then, we run Pandoc under several configurations to try to isolate the cost of different stages of its pipeline:
cols = {
    "work_factor": [],
    "html_writer": [],
    "markdown_reader": [],
    "json_reader": [],
    "json_writer": [],
    "file_size": []
}
def time_call(s):
    # Average wall-clock time over 5 runs; raise if any run fails.
    def call_it():
        result = os.system(s)
        if result != 0:
            raise Exception("call to '%s' failed with exit code %s" % (s, result))
    return timeit.timeit(call_it, number=5) / 5
for f in glob.glob("../_supporting_docs/synth-benchmark-1/inputs/*.qmd"):
    p = pathlib.Path(f)
    json_output = p.parent.parent / "outputs" / p.name.replace(".qmd", ".json")
    # File names look like _size-<work_factor>.qmd
    cols['work_factor'].append(int(f.split("-")[-1].split(".")[0]))
    cols['file_size'].append(p.stat().st_size)
    # Baseline run: markdown input with an empty Lua filter.
    md_reader_time = time_call("quarto pandoc -f markdown -t html -L ../_supporting_docs/filters/empty.lua %s -o /dev/null" % f)
    cols['markdown_reader'].append(md_reader_time)
    # Writer times are estimated as (total time) - (baseline time).
    html_total_time = time_call("quarto pandoc -f markdown -t html %s -o /dev/null" % f)
    cols['html_writer'].append(html_total_time - md_reader_time)
    json_writer_total = time_call("quarto pandoc -f markdown -t json %s -o %s" % (f, json_output))
    cols['json_writer'].append(json_writer_total - md_reader_time)
    cols['json_reader'].append(time_call("quarto pandoc -f json -t html -L ../_supporting_docs/filters/empty.lua %s -o /dev/null" % json_output))
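Because each writer time is the difference between two separately measured averages, measurement noise can push small differences below zero. One possible way to reduce that noise, sketched here, is to repeat each measurement with `timeit.repeat` and keep the fastest run instead of the mean. The helper `time_call_min` and the use of `subprocess.run` in place of `os.system` are assumptions, not what the benchmark above does. Stand-in commands are used so the sketch runs without Quarto installed.

```python
import subprocess
import sys
import timeit


def time_call_min(args, repeat=5):
    """Return the fastest of `repeat` single runs of a command, in seconds."""
    def call_it():
        # check=True raises if the command exits with a nonzero status.
        subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
    # timeit.repeat returns one total time per repetition.
    return min(timeit.repeat(call_it, number=1, repeat=repeat))


# Stand-in commands playing the role of the baseline and the total run.
baseline = time_call_min([sys.executable, "-c", "pass"])
total = time_call_min([sys.executable, "-c", "sum(range(10**6))"])
stage_time = total - baseline  # same subtraction as in the benchmark
print(f"baseline={baseline:.3f}s total={total:.3f}s stage={stage_time:.3f}s")
```

The minimum is less sensitive to one-off slowdowns than the mean, although the difference of two minima can still be slightly negative when the stage being isolated is very cheap.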
from great_tables import GT

df = pd.DataFrame(cols).sort_values(by='work_factor')
(
    GT(df)
    .cols_hide('work_factor')
    .fmt_bytes(columns="file_size")
    .fmt_number(columns=['html_writer', 'markdown_reader', 'json_reader', 'json_writer'], decimals=3, use_seps=False)
    .cols_label(
        html_writer="w:html (s)",
        markdown_reader="r:md (s)",
        json_reader="r:json (s)",
        json_writer="w:json (s)",
        file_size="File size"
    )
)
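| w:html (s) | r:md (s) | r:json (s) | w:json (s) | File size |
|------------|----------|------------|------------|-----------|
| 0.009      | 0.491    | 0.464      | −0.021     | 29.7 kB   |
| 0.030      | 0.507    | 0.487      | 0.008      | 59.4 kB   |
| 0.033      | 0.563    | 0.504      | 0.002      | 118.8 kB  |
| 0.073      | 0.665    | 0.565      | 0.004      | 237.6 kB  |
| 0.205      | 0.863    | 0.660      | 0.021      | 475.2 kB  |
| 0.445      | 1.253    | 0.864      | 0.067      | 950.3 kB  |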
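File size doubles from row to row while the reader times grow much more slowly, which suggests a fixed per-invocation cost plus a size-dependent term. As a rough check, the following sketch fits a straight line to the r:md column by ordinary least squares. The linear model is an assumption, and with six noisy points the result is only indicative.

```python
# Values copied from the results table.
size_kb = [29.7, 59.4, 118.8, 237.6, 475.2, 950.3]
md_reader_s = [0.491, 0.507, 0.563, 0.665, 0.863, 1.253]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(size_kb, md_reader_s)
print(f"fixed cost ~ {intercept:.2f} s, marginal cost ~ {slope * 1000:.2f} ms per kB")
```

On these numbers the intercept comes out a little under half a second and the slope a bit under one millisecond per kB, so for the smaller inputs most of the measured time would be fixed overhead rather than parsing.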
Appendix
Software versions:
os.system("quarto check")
Quarto 99.9.9
[✓] Checking versions of quarto binary dependencies...
Pandoc version 3.2.0: OK
Dart Sass version 1.70.0: OK
Deno version 1.41.0: OK
Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
Version: 99.9.9
commit: 913e49a20fa409530a4a4b92070b22c80ac06e2c
Path: /Users/cscheid/repos/github/quarto-dev/quarto-cli/package/dist/bin
[✓] Checking tools....................OK
TinyTeX: v2024.04
Chromium: 869685
[✓] Checking LaTeX....................OK
Using: TinyTex
Path: /Users/cscheid/Library/TinyTeX/bin/universal-darwin
Version: 2024
[✓] Checking basic markdown render....OK
[✓] Checking Python 3 installation....OK
Version: 3.10.13
Path: /Users/cscheid/virtualenvs/homebrew-python3/bin/python3
Jupyter: 5.4.0
Kernels: echo, python3, deno, julia-1.9
[✓] Checking Jupyter engine render....OK
[✓] Checking R installation...........OK
Version: 4.3.2
Path: /Library/Frameworks/R.framework/Resources
LibPaths:
- /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
knitr: 1.45
rmarkdown: 2.25
[✓] Checking Knitr engine render......OK