Setting up a synthetic workload generator

Published

August 30, 2024

Workload generation

/src/synth-benchmark-1.ts generates documents with a variable number of paragraphs (each containing a hard-coded copy of Lorem Ipsum, i.e., without using our shortcode).

First, we generate six synthetic documents, one for each work factor from 7 to 12:

Code
import os
import pathlib
import pandas as pd
import timeit
import glob
support_path = pathlib.Path("../_supporting_docs/synth-benchmark-1")
(support_path / "inputs").mkdir(parents=True, exist_ok=True)
(support_path / "outputs").mkdir(parents=True, exist_ok=True)

for work_factor in range(7, 13):
    # I know.
    cmd = "quarto run ../src/synth-benchmark-1.ts %s > ../_supporting_docs/synth-benchmark-1/inputs/_size-%s.qmd" % (1 << work_factor, work_factor)
    result = os.system(cmd)
    if result != 0:
        raise Exception("os.system failed: %s" % result)
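
The shell redirection above works, but it depends on the shell and on a hand-assembled command string. A minimal sketch of the same step using `subprocess.run`, writing stdout straight to the target file, might look like the following. The helper names `run_to_file` and `generate_inputs` are hypothetical; the `quarto run` arguments are the ones used in the loop above.

```python
import pathlib
import subprocess


def run_to_file(args, out_path):
    """Run a command and write its stdout to out_path (hypothetical helper)."""
    out_path = pathlib.Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as out:
        # check=True raises CalledProcessError on a non-zero exit code,
        # replacing the manual check on os.system's return value.
        subprocess.run(args, stdout=out, check=True)
    return out_path


def generate_inputs(input_dir, work_factors=range(7, 13)):
    """Generate one .qmd per work factor, mirroring the loop above."""
    for work_factor in work_factors:
        run_to_file(
            ["quarto", "run", "../src/synth-benchmark-1.ts", str(1 << work_factor)],
            pathlib.Path(input_dir) / ("_size-%s.qmd" % work_factor),
        )
```

Passing the arguments as a list avoids quoting problems if a path ever contains spaces, and a failed run surfaces as an exception that carries the exit code.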

Benchmarks

Pandoc

Then, we run Pandoc with several different combinations of readers, writers, and filters, to try to isolate the cost of different stages of Pandoc’s pipeline:

Code
cols = {
    "work_factor": [],
    "html_writer": [],
    "markdown_reader": [],
    "json_reader": [],
    "json_writer": [],
    "file_size": []
}


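def time_call(s):
    # Average wall-clock time (in seconds) over five runs of a shell command.
    def call_it():
        result = os.system(s)
        if result != 0:
            raise Exception("call to '%s' failed with exit code %s" % (s, result))
    # Time call_it rather than os.system directly, so a failing command raises.
    return timeit.timeit(call_it, number=5) / 5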

for f in glob.glob("../_supporting_docs/synth-benchmark-1/inputs/*.qmd"):
    p = pathlib.Path(f)
    json_output = p.parent.parent / "outputs" / p.name.replace(".qmd", ".json")
    cols['work_factor'].append(int(f.split("-")[-1].split(".")[0]))
    cols['file_size'].append(p.stat().st_size)
    md_reader_time = time_call("quarto pandoc -f markdown -t html -L ../_supporting_docs/filters/empty.lua %s -o /dev/null" % f)
    cols['markdown_reader'].append(md_reader_time)
    html_total_time = time_call("quarto pandoc -f markdown -t html %s -o /dev/null" % f)
    cols['html_writer'].append(html_total_time - md_reader_time)
    json_writer_total = time_call("quarto pandoc -f markdown -t json %s -o %s" % (f, json_output))
    cols['json_writer'].append(json_writer_total - md_reader_time)
    cols['json_reader'].append(time_call("quarto pandoc -f json -t html -L ../_supporting_docs/filters/empty.lua %s -o /dev/null" % json_output))
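
Because the writer columns are computed as the difference of two separately timed runs, run-to-run noise can outweigh a small difference and even produce a negative value. A sketch of a timing helper that keeps the individual samples, so the spread can be reported alongside the mean, follows. `time_samples` and `summarize` are hypothetical names and are not part of the benchmark above.

```python
import statistics
import subprocess
import time


def time_samples(args, repeats=5):
    """Return wall-clock durations (seconds) for `repeats` runs of a command."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Mean and sample standard deviation of a list of durations."""
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread
```

If the standard deviation of two measurements is comparable to their difference, the derived value is best read as "too small to resolve" rather than as a precise timing.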
Code
from great_tables import GT
df = pd.DataFrame(cols).sort_values(by='work_factor')
(
  GT(df)
  .cols_hide('work_factor')
  .fmt_bytes(columns="file_size")
  .fmt_number(columns=['html_writer', 'markdown_reader', 'json_reader', 'json_writer'], decimals=3, use_seps=False)
  .cols_label(
    html_writer = "w:html (s)",
    markdown_reader = "r:md (s)",
    json_reader = "r:json (s)",
    json_writer = "w:json (s)",
    file_size = "File size"
  )
)
w:html (s)   r:md (s)   r:json (s)   w:json (s)   File size
0.009        0.491      0.464        −0.021       29.7 kB
0.030        0.507      0.487        0.008        59.4 kB
0.033        0.563      0.504        0.002        118.8 kB
0.073        0.665      0.565        0.004        237.6 kB
0.205        0.863      0.660        0.021        475.2 kB
0.445        1.253      0.864        0.067        950.3 kB
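
One way to read the table is to separate a fixed per-invocation cost from a size-dependent cost. The sketch below fits a straight line to the r:md column, using numbers copied from the table. It assumes the time is roughly linear in file size over this range, which the table alone does not establish, and `fit_line` is a hypothetical helper.

```python
# File sizes (kB) and r:md times (s), copied from the table above.
sizes_kb = [29.7, 59.4, 118.8, 237.6, 475.2, 950.3]
md_reader_s = [0.491, 0.507, 0.563, 0.665, 0.863, 1.253]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(sizes_kb, md_reader_s)
print("fixed cost: %.3f s, per-kB cost: %.5f s" % (intercept, slope))
```

The intercept is an estimate of the time spent regardless of input size (for example, process startup), and the slope is the incremental cost per kilobyte of input.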

Appendix

Software versions:

Code
os.system("quarto check")
Quarto 99.9.9
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 99.9.9
      commit: 913e49a20fa409530a4a4b92070b22c80ac06e2c
      Path: /Users/cscheid/repos/github/quarto-dev/quarto-cli/package/dist/bin

[✓] Checking tools....................OK
      TinyTeX: v2024.04
      Chromium: 869685

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /Users/cscheid/Library/TinyTeX/bin/universal-darwin
      Version: 2024

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.10.13
      Path: /Users/cscheid/virtualenvs/homebrew-python3/bin/python3
      Jupyter: 5.4.0
      Kernels: echo, python3, deno, julia-1.9

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.3.2
      Path: /Library/Frameworks/R.framework/Resources
      LibPaths:
        - /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      knitr: 1.45
      rmarkdown: 2.25

[✓] Checking Knitr engine render......OK
0