Turning dpANS into new Specification Documents

Jan Moringen (scymtym on IRC)

Agenda

Goal: Introduce the project, motivate and explain the approach, outline possibilities.

Part I (today)
  1. Introduction
    • dpANS, X3J13 and the Common Lisp HyperSpec
    • This project
  2. Motivation: Sources, Renderings and Problems
  3. Our Method
    • Representation
    • Parsing
    • Processing and Generation
  4. Applications
  5. Open Problems and Future Work
Part II (not today)
  • Well-Specified Common Lisp

Introduction

Introduction: Organizations, Processes and Artifacts

[Diagram: organizations, processes and artifacts]

Introduction: History of this Project

  • 2015 Common Lisp Ultra Spec
  • Wednesday, 23rd of March 2016 – Relevant discussion

    <kenanb> KMP mentions he wrote a lisp program to parse the tex sources
             to generate Hyperspec automatically, so it must be possible,
             but surely hard, considering it is Tex
    <kenanb> I only wish they released the program
    <kenanb> I also remember seeing a program that translate the draft
             into emacs info files.
    
  • Sunday, 28th of May 2017 – First mention of Well-Specified Common Lisp

    <beach> Speaking of which, I decided to write down my ideas for
            improved Common Lisp standard a bit more concretely:
            https://github.com/robert-strandh/Well-Specified-Common-Lisp
    
  • Wednesday, 28th of April 2021 – Start of this project

    <scymtym> beach: is there more information on this idea of
              reformatting the dpANS sources anywhere? like previous
              attempts, necessary steps, expected outcome, etc?
    

Introduction: Goals of this Project

  • Produce new specifications without legal complications
  • Produce new HTML renderings of the specification
    • "Page granularity", table of contents, indices
    • Highlighting (syntax and otherwise)
    • Finer grained issue annotations
    • But most importantly: repeatable, customizable conversion
  • Correct errors that cannot be corrected in the Common Lisp HyperSpec (both conversion errors and errors in the TeX sources)
  • Data for tools such as IDEs and documentation browsers
    • Currently, however, only the HTML generator, an in-progress CLIM browser and the converter framework are available
  • Future
    • Derived specifications such as WSCL (Part II)
    • Common Lisp Extensions
    • Possibly interlinked documentation

Starting Point and Motivation

Starting Point: dpANS TeX Documents

\begincom{lambda}\ftype{Symbol}

\issue{DECLS-AND-DOC}

\label Syntax::

\Defspec lambda {lambda-list {\DeclsAndDoc} \starparam{form}}

\label Arguments::

\param{lambda-list}---an \term{ordinary lambda list}.
\param{declaration}---a \misc{declare} \term{expression}; \noeval.
\param{documentation}---a \term{string}; \noeval.
\param{form}---a \term{form}.

\label Description::

A \term{lambda expression} is a \term{list} that can be used in place of a
\term{function name} in certain contexts to denote a \term{function} by
directly describing its behavior rather than indirectly by referring to the
name of an \term{established} \term{function}.

\param{Documentation} is attached to the denoted \param{function} (if any
is actually created) as a \term{documentation string}.

\label See Also::

\specref{function},
\funref{documentation},
{\secref\LambdaExpressions},
{\secref\LambdaForms},
{\secref\DocVsDecls}

\label Notes::

The \term{lambda form}

\code
((lambda \param{lambda-list} . \param{body}) . \param{arguments})
\endcode

is semantically equivalent to the \term{function form}

\code
(funcall #'(lambda \param{lambda-list} . \param{body}) . \param{arguments})
\endcode

\endissue{DECLS-AND-DOC}

\endcom%{lambda}
  • TeX (not LaTeX)
  • Lots of custom macros
    • Some for typesetting
    • Some for semantic markup
    • Macros for "Dictionaries"
    • Index and glossary entries
    • Issue references
    • Other references
  • Annotations omitted from the standard document and Common Lisp HyperSpec
    • Reviewer comments
    • Editor comments
    • TeX comments

Starting Point: X3J13 Issues

Status: Proposal FIRST passed, Nov 89 X3J13

Forum:          Cleanup
Issue:          &ENVIRONMENT-BINDING-ORDER
References:     CLtL p. 145-146, 63
                Issue DEFMACRO-LAMBDA-LIST
Category:       CLARIFICATION
Edit History:   V1, 24 Oct 1989, Sandra Loosemore
                V3, 02 Nov 1989, Sandra Loosemore (comments from Moon)

Problem Description:

Issue DEFMACRO-LAMBDA-LIST states that &ENVIRONMENT can appear once
anywhere at top level of a macro lambda list, but doesn't say anything
about the order in which the &ENVIRONMENT variable is bound relative
to the other lambda-list variables.

…

Proposal (&ENVIRONMENT-BINDING-ORDER:FIRST):

Clarify that the &ENVIRONMENT parameter is bound along with &WHOLE
before any of other variables in the lambda list, regardless of where
&ENVIRONMENT appears in the lambda list.

  Rationale:

  This proposal provides a convenient explanation for the special
  treatment of &WHOLE and &ENVIRONMENT at top-level in a DEFMACRO-style
  lambda list.

  …

Proposal (&ENVIRONMENT-BINDING-ORDER:LEFT-TO-RIGHT):

Clarify that the all lambda variables in a DEFMACRO-style lambda list
are bound left-to-right, including the &WHOLE and &ENVIRONMENT parameters.

  Rationale:

  This is more consistent with the order in which variables in ordinary
  lambda lists are bound.

Current Practice:

Lucid CL, Utah CL, and KCL implement proposal FIRST.  CMU CL
implements proposal LEFT-TO-RIGHT.

…
  • Plain text, but with a relatively strict format
    • Header: Status, Related Issues, History
    • Fixed set of sections (more or less)
    • Proposals
  • Probably written with cross-referencing in mind (CL symbols, issues, proposals)
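
Because the header is so regular, it lends itself to simple line-based processing. The following is a minimal sketch in portable Common Lisp, not the project's actual issue parser (which is rule-based, as shown in the issue parser section). The names parse-header-line and parse-header are made up for illustration, and joining continuation lines with "; " is an arbitrary choice.

```lisp
;; Hypothetical helpers for illustration only.
(defun parse-header-line (line)
  "Split LINE at its first colon into a (KEYWORD . VALUE) pair.
Return NIL unless LINE starts with a letter and contains a colon."
  (let ((colon (position #\: line)))
    (when (and colon
               (plusp colon)
               (alpha-char-p (char line 0)))
      (cons (intern (substitute #\- #\Space
                                (string-upcase (subseq line 0 colon)))
                    :keyword)
            (string-trim " " (subseq line (1+ colon)))))))

(defun parse-header (lines)
  "Collect the header fields in LINES into an alist.
Indented lines continue the value of the preceding field."
  (let ((fields '()))
    (dolist (line lines (nreverse fields))
      (let ((field (parse-header-line line)))
        (cond (field
               (push field fields))
              ((and fields
                    (plusp (length line))
                    (char= (char line 0) #\Space))
               (let ((current (first fields)))
                 (setf (cdr current)
                       (concatenate 'string (cdr current) "; "
                                    (string-trim " " line))))))))))
```

Applied to the header lines of the example issue, this yields entries such as (:FORUM . "Cleanup"). The caller is expected to pass only the header lines; section headings like "Problem Description:" would otherwise be picked up as fields with an empty value.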

Motivation: Section Structure Errors

Sections in dpANS
streams-section-error-dpans.png
Sections in the HyperSpec
streams-section-error-hyperspec.png

Why did this happen?

\beginsubsubsection{Abstract Classifications of Streams}
[hundreds of lines]
\beginsubsubsubsection{Interactive Streams}
\DefineSection{InteractiveStreams}
[hundreds of lines]
\endsubsubsubsection%{Interactive Streams}
\beginsubsubsubsection{File Streams}
[hundreds of lines]
\endsubsubsubsection%{File Streams}
\endsubsubsection%{Abstract Classifications of Streams}

Motivation: Macro Lambda Lists

Grammar in dpANS
macro-lambda-list-dpans.png
Grammar in the HyperSpec
macro-lambda-list-hyperspec.png

Motivation: Even TeX Wizards Make Mistakes

dpANS TeX source
(flet ((test (x)
         (let ((*print-pretty* t))
           (print x)
           (format t "~%~S " x)
           (terpri) (princ x) (princ " ")
           (format t "~%~A " x))))
  (test '#'(lambda () (list "a" #\b 'c #'d))))
\OUT #'(LAMBDA ()
\OUT     (LIST "a" #\b 'C #'D))
\OUT #'(LAMBDA ()
\OUT     (LIST "a" #\b 'C #'D))

Listing in dpANS
bold-macro-dpans.png
Listing in the HyperSpec
bold-macro-hyperspec.png

Motivation: Other Errors and Downsides

Other errors and downsides in dpANS (and thus in the ANSI standard as well as the Common Lisp HyperSpec)
  • No checking, highlighting or linking in listings
  • Markup errors (for example \funref{most-positive-fixnum})
  • Misspelled things (for example \macref{destruct})
  • Errors that affect the semantics but have an obvious fix (for example the famous prog2 error)
  • Unclear or contradictory parts of the specification, incorrect examples
Downsides specific to the Common Lisp HyperSpec
  • Legal situation: my understanding is that distributing modified versions of the Common Lisp HyperSpec is not permitted
  • As a consequence, the HTML conversion is fixed: it was done in one particular way with respect to visual style, page granularity, indices and amount of annotations, and cannot be changed

Our Method

Our Method: Overview

[Diagrams: method overview]

Document Object Tree Representation

Example Tree
atom-component-clim-render.png
atom-component-document-object-tree.png
(Meta-)Meta-model
  • Meta-meta-model based on architecture.builder-protocol system

    Built around nodes, initargs and relations:

    [Diagram: nodes, initargs and relations]
  • Meta-model currently not specified

Our Method: TeX Parser

(defrule argument (environment)
    (and (has-syntax? '#\# :argument environment)
         (bounds (start end)
           (seq (+ (<<- level #\#))
                (<- number
                    (:transform
                       (guard digit digit-char-p)
                     (digit-char-p digit))))))
  (bp:node* (:argument :level  (length level)
                       :number number
                       :bounds (cons start
                                     end))))

(defrule editor-note (environment)
    (bounds (start end)
       (seq/ws "\\editornote{"
               (<- editor (person)) #\:
               (<- content (balanced-content))
               #\}))
  (bp:node* (:editor-note :editor  editor
                          :content content
                          :bounds  (cons start
                                         end))))
  • Written as a set of rules for a parser generator
    • Mostly recursive descent
    • Rules are heavily parameterized:
      • Current syntax of particular characters
      • Current environment
      • Defined macros
    • Performs source location tracking
  • Mixture of general-purpose rules for TeX and special-purpose rules for dpANS semantic and typographic markup
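
To illustrate what the argument rule above recognizes, here is a hand-written equivalent in plain Common Lisp. It is only a sketch: parse-argument is a hypothetical name, the result is a plain list instead of a node built through the builder protocol, and the check of the current syntax of the # character is omitted.

```lisp
;; Hypothetical hand-written counterpart of the ARGUMENT rule.
(defun parse-argument (input start)
  "Try to parse a macro argument such as #1 or ##2 in INPUT at START.
On success, return a node description and the position after the match.
Otherwise return NIL."
  (let* ((hashes-end (or (position-if-not (lambda (c) (char= c #\#))
                                          input :start start)
                         (length input)))
         ;; The level is the number of consecutive # characters.
         (level (- hashes-end start))
         ;; The argument number is the single digit after the hashes.
         (number (and (< hashes-end (length input))
                      (digit-char-p (char input hashes-end)))))
    (when (and (plusp level) number)
      (let ((end (1+ hashes-end)))
        (values (list :argument :level level
                                :number number
                                :bounds (cons start end))
                end)))))
```

For example, (parse-argument "##2 and" 0) describes an argument with level 2 and number 2 whose bounds are (0 . 3), which is the kind of source location information the real parser tracks for every node.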

Our Method: TeX Parser: Example

Input
\DefsetfMulti
  {bit-array {\rest} subscripts}
  {new-bit}
  {\entry{bit} \entry{sbit}}
Document Object Tree
bit-array-document-object-tree.png
Parser Call
(let* ((input (format nil "\\DefsetfMulti~@
                             {bit-array {\\rest} subscripts}~@
                             {new-bit}~@
                             {\\entry{bit} \\entry{sbit}}"))
       (tree (dpans-conversion.parser::parse-tex-string
              'list input "file.tex")))
  (render-to-file tree "images/bit-array-document-object-tree.png"))

Our Method: TeX Macro Expander

(defun expand
    (builder environment macro arguments)
  (typecase macro
    (function ; built-in macro
     (apply macro builder environment arguments))
    (t ; user-level macro defined in TeX source
     (let* ((body (bp:node-relation*
                   '(:body . *) macro))
            (first-parameter
             (first
              (bp:node-relation*
               '(:argument . *) macro)))
            (level (if (null first-parameter)
                       1
                       (getf (bp:node-initargs*
                              first-parameter)
                             :level))))
       (mapcar
        (lambda (element)
          (substitute-arguments
           builder element level arguments))
        body)))))
Macro Definition
  • Represented as body ASTs containing argument nodes
    • Arguments have numbers and "levels" (for macros-in-macros)
  • Stored in an environment under the given name
Macro Expansion
  1. Look up macro by name in environment
  2. Determine "level"
  3. Clone body ASTs, replacing arguments at determined "level" with supplied arguments
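
The three steps can be sketched without the builder protocol by using plain lists as a stand-in for the document object tree. Everything here is an assumption made for illustration: nodes are either strings or lists of the form (kind child...), argument nodes are (:argument :level L :number N), and the environment is a hash table that maps a macro name to (level . body).

```lisp
;; Hypothetical list-based stand-in for the real macro expander.
(defvar *macros* (make-hash-table :test #'equal)
  "Maps a macro name to a cons (LEVEL . BODY).")

(defun define-macro (name level body)
  (setf (gethash name *macros*) (cons level body)))

(defun argument-node-p (element)
  (and (consp element) (eq (first element) :argument)))

(defun substitute-in-element (element level arguments)
  "Return a copy of ELEMENT with argument nodes at LEVEL replaced by
the corresponding entry of ARGUMENTS (numbered from 1)."
  (cond ((and (argument-node-p element)
              (eql (getf (rest element) :level) level))
         (nth (1- (getf (rest element) :number)) arguments))
        ((argument-node-p element)
         ;; Argument of a different level: keep it for another expansion.
         (copy-list element))
        ((consp element)
         ;; Ordinary node: keep the kind, process the children.
         (cons (first element)
               (mapcar (lambda (child)
                         (substitute-in-element child level arguments))
                       (rest element))))
        (t element)))

(defun expand-macro (name arguments)
  ;; Step 1: look up the macro; step 2: determine its level;
  ;; step 3: clone the body while substituting the arguments.
  (destructuring-bind (level . body)
      (or (gethash name *macros*)
          (error "Undefined macro ~S" name))
    (mapcar (lambda (element)
              (substitute-in-element element level arguments))
            body)))
```

With a definition corresponding to \def\foo#1#2{\it#1 and #2}, calling expand-macro with the arguments "a" and "b" returns a fresh body in which both argument nodes have been replaced, while argument nodes of other levels are left in place.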

Our Method: TeX Macro Expander: Example

(defparameter *env* (dpans-conversion::make-environment))
(defparameter *tree*
  (let ((input (format nil "\\def\\foo#1#2{\\it#1 and #2}~
                            \\foo{a}{b}")))
    (dpans-conversion.parser::parse-tex-string
     'list (coerce input '(simple-array character 1)) "file.tex")))
(render-to-file *tree* "images/macro-example-original.png")
(let ((expanded (dpans-conversion.transform::apply-transform
                 (make-instance 'dpans-conversion.transform::expand-macros
                                :builder 'list :environment *env*)
                 *tree*)))
  (render-to-file expanded "images/macro-example-expanded.png"))
macro-example-original.png

macro-example-expanded.png

Our Method: Issue Parser

(defrule issue (filename process)
    (bounds (start end)
      (seq (? (<- preamble (preamble)))
           (+ (or (<-  name            (section-name))
                  (<-  related-issues  (section-related-issues))
                  (<-  required-issues (section-required-issues))
                  (<-  forum           (section-forum))
                  (<-  category        (section-category))
                  (<<- sections        (section))
                  (seq (whitespace*) #\Newline)))))
  (unless (find :proposal sections :key #'bp:node-kind*)
    (cerror "Use the issue anyway" "No Proposal"))
  (bp:node* (:issue :filename filename
                    :process  process
                    :bounds   (cons start end))
    (1    (:name           . 1)    name)
    (*    (:required-issue . *)    required-issues)
    (*    (:related-issue  . *)    related-issues)
    (1    (:forum          . 1)    forum)
    (1    (:category       . 1)    category)
    (*    (:section        . *)    (nreverse sections))
    (bp:? (:preamble       . bp:?) preamble)))

Demo: Document Object Tree Inspection and Queries

inspector-query-screenshot.png
  • Uses previously mentioned (meta-)meta-model:

    [Diagram: (meta-)meta-model]
  • Presentation and "folding" work as an extension for McCLIM's inspector Clouseau
  • Queries are currently simplistic, but easy to extend
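
A query over a tree built from nodes, initargs and relations can be very small. The sketch below does not use architecture.builder-protocol or Clouseau; the node structure and the functions map-nodes and find-nodes are hypothetical stand-ins that only mirror the three concepts of the (meta-)meta-model.

```lisp
;; Hypothetical stand-in for document object tree nodes.
(defstruct (node (:constructor make-node (kind &key initargs relations)))
  kind        ; a keyword such as :section or :term
  initargs    ; a plist of scalar properties
  relations)  ; an alist of (relation-name . list-of-child-nodes)

(defun map-nodes (function root)
  "Call FUNCTION on ROOT and, depth-first, on all of its descendants."
  (funcall function root)
  (loop for (nil . children) in (node-relations root)
        do (dolist (child children)
             (when (node-p child)
               (map-nodes function child)))))

(defun find-nodes (kind root)
  "Return all nodes of KIND below (and including) ROOT in document order."
  (let ((result '()))
    (map-nodes (lambda (node)
                 (when (eq (node-kind node) kind)
                   (push node result)))
               root)
    (nreverse result)))
```

More elaborate queries, for example on initargs or on the relation through which a node is reached, could be expressed by passing a different function to map-nodes.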

Our Method: (Some of the) Transformations

Transformation HTML CLIM browser
Drop unused nodes
Parse listings
Expand macros
Massage {tables,math,comp.,issues}
Attach labels
Add dictionary sections
Split into files
{symbol,table,issue,note} index
Note output files
Note parents
Build references
Render HTML CLIM
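
Conceptually, each transformation takes a tree and returns a new tree, so a conversion is a chain of such steps. The following sketch shows that idea with plain lists of the form (kind child...) and strings as leaves. It is not the project's apply-transform machinery, and drop-kinds and apply-transforms are names invented here.

```lisp
;; Hypothetical, list-based illustration of chained transformations.
(defun drop-kinds (kinds)
  "Return a transformation that removes all child nodes whose kind is
in KINDS, in the spirit of the \"Drop unused nodes\" step."
  (labels ((transform (node)
             (if (consp node)
                 (cons (first node)
                       (loop for child in (rest node)
                             unless (and (consp child)
                                         (member (first child) kinds))
                               collect (transform child)))
                 node)))
    #'transform))

(defun apply-transforms (transforms tree)
  "Apply each function in TRANSFORMS to the result of the previous one."
  (reduce (lambda (tree transform) (funcall transform tree))
          transforms
          :initial-value tree))
```

A different output format would then simply use a different list of transformations over the same parsed tree.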

Our Method: Syntax Highlighting

Goal
\def\alfa{$\alpha$}
\code
 (alpha-char-p #\\a) \EV \term{true}
 (alpha-char-p #\\5) \EV \term{false}
 (alpha-char-p #\\Newline)
   \EV \term{false}
 ;; This next example presupposes
 ;; an implementation in which
 ;; #\\\alfa is a defined character.
 (alpha-char-p #\\\alfa)
   \EV \term{implementation-dependent}
\endcode

alpha-char-p-html-screenshot.png

Solution
  1. Parse listings using Eclector
  2. Within symbols, strings, character literals, comments, etc.
    • Recognize TeX macros
    • Expand TeX macros
  3. Output
    • Non-macro parts of input with highlighting
    • Skip over macro parts of input
    • Splice macro expansion into output
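
Steps 2 and 3 can be sketched for a single token, such as the text of a character literal. The code below assumes, as in the listing above, that a doubled backslash stands for one literal backslash and that a backslash followed by letters names a macro. It does not use Eclector, and split-macros, render-token and the expansion table are hypothetical.

```lisp
;; Hypothetical sketch: separate literal text from TeX macros in a token.
(defun split-macros (text)
  "Split TEXT into (:text STRING) and (:macro NAME) segments."
  (let ((segments '())
        (literal (make-string-output-stream))
        (i 0)
        (n (length text)))
    (flet ((flush ()
             ;; Emit the literal text collected so far, if any.
             (let ((string (get-output-stream-string literal)))
               (when (plusp (length string))
                 (push (list :text string) segments)))))
      (loop while (< i n)
            do (let ((c (char text i)))
                 (cond ((char/= c #\\)
                        (write-char c literal)
                        (incf i))
                       ((and (< (1+ i) n) (char= (char text (1+ i)) #\\))
                        ;; Escaped backslash: one literal backslash.
                        (write-char #\\ literal)
                        (incf i 2))
                       (t
                        (let* ((name-start (1+ i))
                               (letters-end (or (position-if-not
                                                 #'alpha-char-p text
                                                 :start name-start)
                                                n))
                               ;; A control symbol has a one-character name.
                               (end (if (> letters-end name-start)
                                        letters-end
                                        (min n (1+ name-start)))))
                          (flush)
                          (push (list :macro (subseq text name-start end))
                                segments)
                          (setf i end))))))
      (flush))
    (nreverse segments)))

(defun render-token (text expansions)
  "Return TEXT with each macro replaced by its entry in the alist EXPANSIONS."
  (with-output-to-string (out)
    (dolist (segment (split-macros text))
      (destructuring-bind (kind value) segment
        (ecase kind
          (:text  (write-string value out))
          (:macro (write-string
                   (or (cdr (assoc value expansions :test #'string=))
                       (error "Unknown macro ~S" value))
                   out)))))))
```

In a real highlighter the :text segments would be wrapped in markup for the token's syntactic class, while the macro expansion is spliced in unchanged.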

Our Method: HTML Generation

Decisions and Parameters
  • MathJax vs. MathML
  • Require JavaScript?
  • Styling (CSS)
  • Table of contents
    • None
    • Static
    • Dynamic Sidebar (requires JavaScript)
  • Permalinks
  • Annotations (via Hover?)
  • Search (via JavaScript?)
component-nil-html-screenshot.png
(define-render (:part)
  (let ((name (transform::node-name node)))
    (flet ((do-it ()
             (cxml:with-element "dl"
               (cxml:with-element "dt"
                 (class-attribute "label")
                 (recurse '(:name . 1)))
               (cxml:with-element "dd"
                 (recurse '(:element . *))))))
      (maybe-removable-text
       transform name #'do-it
       :removable '("Note" "Example"
                    "Pronunciation" "See Also")))))

Demo: Parsing and Generating

  • Run the parser
  • Perform transformations
    • Macro expansion
  • Generate HTML

Applications

Demo: HTML Output

chapter-characters-html-screenshot.png
  • Math
  • Syntax Highlighting
  • Sidebar
  • Generated Indices
  • Annotations

Demo: CLIM-based Specification Browser

clim-documentation-browser-screenshot.png
  • McCLIM-based graphical specification browser
  • Directly renders document object tree
  • Can show/hide aspects on the fly, such as
    • X3J13 issue annotations
    • Reviewer notes
    • Editor notes
    • The removable text indication
  • Not yet supported
    • Search
    • Automatic indices
    • Navigation history

Application: Concurrency Specification

chapter-characters-html-screenshot.png
  • Extension for well-specified concurrency:
    • Rules for concurrent evaluations (related: a memory model)
    • Threads, locks, condition variables
    • Atomic operations
    • Interrupts
  • Written by Bike, see GitHub repository
  • Adds new
    • Chapter
    • Concept Sections
    • Dictionary Section

Open Problems and Future Work

Problem: Simple Matters of Programming™

  • Clean up conversion code
  • Line breaks in call syntax and BNFs
  • Figure captions
  • Local macro definitions
  • hbox
  • Non-section link anchors
  • Is XHTML with MathML a sane choice? HTML5 with MathJax didn't work so well

Problem: How to Ensure Correctness?

  • We probably cannot ensure correctness
  • We can make sure to fix things that
    • are known to be wrong in the dpANS sources
    • the Common Lisp HyperSpec conversion got wrong
  • But how can we catch new conversion errors that we introduce?

Problem: Stemming Algorithm for Glossary

Problem
References to glossary entries are very free-form (in terms of grammatical number, conjugation, and grouping of words that individually have glossary entries)
Examples
  • \term{evaluates}, \term{evaluated} must refer to the "evaluate" entry
  • \term{satisfying the test}, \term{satisfies the test} must refer to the "satisfy the test" entry
Possible Solutions
  • Hack some good enough stemming
    • Maybe some tables for exceptions?
    • I have tried this with limited success
  • Implement a proper stemming algorithm (or use some database)
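
As an illustration of the first option, here is a deliberately naive suffix stripper with an exception table. It is a sketch of the general idea, not the code that was tried in the project; the suffix list, the exception entry and the function names are assumptions chosen so that the two examples above work.

```lisp
;; Hypothetical "good enough" stemming for glossary lookups.
(defparameter *stem-exceptions* '(("satisfies" . "satisfy"))
  "Words for which plain suffix stripping gives the wrong stem.")

(defparameter *stem-suffixes* '("ing" "ed" "es" "e" "s")
  "Suffixes to strip. The first matching suffix wins.")

(defun stem-word (word)
  "Return a crude stem of WORD, which is compared case-insensitively."
  (let ((word (string-downcase word)))
    (or (cdr (assoc word *stem-exceptions* :test #'string=))
        (dolist (suffix *stem-suffixes* word)
          (let ((cut (- (length word) (length suffix))))
            ;; Keep at least three characters so short words survive.
            (when (and (>= cut 3)
                       (string= suffix word :start2 cut))
              (return (subseq word 0 cut))))))))

(defun split-words (phrase)
  "Split PHRASE at single spaces."
  (loop with start = 0
        for end = (position #\Space phrase :start start)
        collect (subseq phrase start end)
        while end
        do (setf start (1+ end))))

(defun glossary-key (phrase)
  "Return a lookup key made of the stems of the words in PHRASE."
  (format nil "~{~A~^ ~}" (mapcar #'stem-word (split-words phrase))))
```

Both the glossary entries and the references would be passed through glossary-key, so the stems only have to be consistent, not linguistically correct. The exception table is where this approach tends to grow, which matches the limited success mentioned above.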

How did Kent Pitman do it for the Common Lisp HyperSpec back in the day?

Problem: Are Final Versions of X3J13 Issues Available Anywhere?

If you have access to more recent versions of the X3J13 issues, please contact us!

To be continued …

Part II will cover

  • dpANS
    • Modifications to dpANS we already made (don't worry, they are all cosmetic)
    • How we track modifications to dpANS
  • Well-Specified Common Lisp
    • The goal of Well-Specified Common Lisp
    • Writing issues for Well-Specified Common Lisp
    • A process for approving and applying Well-Specified Common Lisp issues
  • Other specifications and extensions

Thank You for Your Attention!

This Presentation
https://techfak.de/~jmoringe/presentation-dpans-conversion/slides.html
IRC Channels
  • #dpans on the Libera.Chat IRC network for dpANS parsing and conversion
  • #commonlisp on the Libera.Chat IRC network for discussions about new issues, specification changes and Well-Specified Common Lisp
Code