a fucking TeX tldr

this page is under construction / being updated / ..me, screaming


status: right now i have very minimal confidence in anything i have written on this page so far,
        beware.
        
hyper-tldr about LaTeX: if you are planning to use LaTeX, you should ignore the "Plain TeX" section of
                        this because LaTeX isn't compatible with it.

additional detailed stuff that would clutter up this page here

motivation:
- i found myself fully able to subvert postscript and turn it into something fun to write directly
- postscript is ultimately low level - i found it easy to typeset text very weirdly, but i wanted 
  to start out with something more normal for another project
- nobody knows how TeX works, trying to figure it out is a nightmare. i have no idea what audience
  "The TeXBook" is for.
- want to know the entire guts of the system, preferably quickly, so i can learn how to aggressively
  subvert it
- this is prob not the guide to look at if you are just trying to quickly throw something together,
  more a guide for understanding TeX as a system/ecology/ecosystem/object/whatever
  
hierarchy:
- TeX: the base "language" and system, everything uses this as a starting point in some sense
  (not sure if this is a distinct implementation of TeX that still has relevance)
  - IniTeX: refers to an implementation of TeX that can generate formats (that is, it can run \dump)
  - VirTeX: "virgin TeX" (im serious) that cannot generate formats
  - Plain TeX: a set of macros on top of TeX. "Plain TeX" is not the same thing as "TeX".
    if you are in a situation where you are distinguishing between TeX and Plain TeX, TeX refers to all
    the primitive control sequences, and Plain TeX refers to any additional macros (such as those in the
    TeXBook) that are not primitive. LaTeX/ConTeXt/etc are separate formats that sit where Plain TeX
    sits (on e-TeX+TeX...), not on top of it. LaTeX copies a lot of Plain TeX's macros, though.
    - if you have a binary on your computer that is just "tex" it will probably include Plain TeX by
      default, but i can promise you nothing
    - LaTeX is emphatically not compatible with Plain TeX. keep this in mind when you read literally 
      anything written about TeX.
  - if you are trying to make your own formats, you probably need to run `initex`
- e-TeX: an implementation (and extension) of TeX that LaTeX uses by default. pdftex is built 
  on top of it.
- pdfTeX: implementation of TeX that outputs directly to pdf, not dvi. includes a number of
  special features for PDFs.
- XeTeX: implementation of TeX that actually supports unicode (does pdftex?) also can use any
  font you have installed. outputs an xdv (extended-dvi). needs "xdv2pdf" (mac only) or "xdvipdfmx"
  (anything) to convert to pdf.
- LaTeX: an extension of TeX, but *not* a superset - something that generates output with
  TeX has no guarantee to work with LaTeX. "LaTeX2ε" is the current version. "LaTeX3" is vaporware
- pdfLaTeX: pdfTeX, but with the LaTeX format loaded. on my computer, the command "LaTeX" uses
  this by default.
- ConTeXt: see above. seems to be extremely slow. apparently uses pdftex by default?
- PDFLaTeX: command to feed LaTeX through PDFTeX (what does it use by default?? e-TeX?)
- HeVeA: LaTeX -> HTML5
- AMSTeX, AMSLaTeX: some sort of additional macro package for math extensions
- luaTeX: i think this is pure TeX, but you can mess around with the internals with lua
- metafont, metapost, metatype1: cousins rather than extensions (i think), generate stuff you
  can use with (extended only?) TeX systems. (sometimes happens prior to processing - metafont?)
  (sometimes happens during processing - metapost?)
- TeX Live: the TeX distribution i've always used. includes a ton of the above.
- MacTeX: TeX Live for macs.
- MiKTeX: a TeX distribution i've never heard of before, i guess it's for windows. i don't know 
  what it includes.
- GNU Texinfo: oh my god 
- BibTeX: not TeX



pet theory: americans use LaTeX more, europeans use ConTeXt more

packages:
- PSTricks
- PGF/TikZ

comedy:
- "As of 2019, LaTeX3, which started in the early 1990s, is under a long-term development project."
- the amount of beautifully-typeset documents i've looked at that use eyesplitting colors like lime
  green on a white background
- "The version number of TeX is converging to π and is now at 3.141592653"
- "an implementation of Hermann Zapf's ideas for improving the grayness of a typeset page"
  - (i like imagining the body of research of studious Men researching grey pages; improving them)
- "Please note that this does not cover the complete range of a 32 bit integer, I do not know why."
- "Note that deallocation is impossible."
- https://www.texfaq.org/FAQ-spawnprog (thanks leah2)
  
hyper-tldr:
- there are a million resources on TeX. TeX isn't LaTeX. valid TeX documents are by no means valid LaTeX
  documents
- this shit is absurd

  TOP     -->    a format: LaTeX (eg LaTeX2ε) or Plain TeX (side by side, not stacked)
                 e-TeX (potentially this is pdftex, which includes e-TeX)
  BOTTOM  -->    TeX (conceptually)
  CHTHONIC -->   whatever ultimately processes the DVI to put the graphics into place 
                 (such as dvips -> ps2pdf maybe)
  
==================== DVI ====================

dvi (device-independent file) is the sort of file that TeX is intended to generate. 

me [innocent]: i bet if i learn TeX at a really low level, i could do some fun typographic vector art like
               i did with postscript
TeX [full of wrath]: lol

DVI files are basically like a machine code that lays out the huge mountain of nested boxes that TeX 
generates. it has nothing to do with drawing graphics, only with placing characters and rules.

anything to do with actual graphics is handled by either:
- metafont specifications 
- whatever ends up processing the DVI file

if it's something DVI-specific, then 
really does sound totally device-independent and stuff. your computer needs to have whatever font files
TeX thought were installed and stuff.

https://web.archive.org/web/20070403030353/http://www.math.umd.edu/~asnowden/comp-cont/dvi.html
https://mirrors.concertpass.com/tex-archive/dviware/driv-standard/level-0/dvistd0.pdf

- dvips: dvi -> ps (i think this comes with TeXlive)
- ps2pdf: ps -> pdf (i think this is from ghostscript)
- XeTeX uses xdv, but by default it runs xdvipdfmx for you and hands back a pdf. if it doesnt good luck fucker

pdftex (and thus pdflatex) avoids the dvi step, rendering directly to pdf.

==================== TEX ====================

syntax:

whitespace:
 - "a row of spaces in the input is usually equivalent to just one space"
 - "Under nearly all circumstances TEX treats several spaces in a row as being equivalent to a single
   space."
   - get used to seeing the word "usually"
 - a blank line marks the end of the paragraph
 - spaces (and line breaks?) ignored in math formulae

comments:
 - starts with %, ends at end of line
 - you are advised to use these also to prevent output from containing certain line-breaks
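
a small sketch of why the % trick matters, assuming Plain TeX is loaded (for \bye). \gappy and \tight
are names i made up for this. a line end inside a macro body turns into a space token unless a % eats it:

```tex
% every line end in this body becomes a space token
\def\gappy#1#2{
  #1
  #2
}
% same body, but each line end is swallowed by a comment
\def\tight#1#2{%
  #1%
  #2%
}
[\gappy ab] [\tight ab]
\bye
```

this should typeset as "[ a b ] [ab]".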

escaping special characters:
   #  $  %  ^    &  _  {  }  ~
  \# \$ \% \^{} \& \_ \{ \} \~{}
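  - to get a literal backslash, type: {\tt\char`\\} (thanks @Random832)
    - the \tt matters: the default roman font has a different glyph (an opening double quote) in
      the backslash slot
  - \string\\ gets suggested too, but it gives you the two characters "\\", and again only looks
    right in \tt
  - i have seen documents claim that both "\\" and "\textbackslash" will do this, but this is not 
    fucking true for TeX or Plain TeX. the TeXBook hints, vaguely, that "\\" will 
    produce a backslash but it fucking doesn't. it looks like it does a lot of weird
    different things.
  - \backslash produces a backslash in math mode only.
  - in Plain TeX, \{ and \} are also math mode only.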

group:
 - { «stuff» }
   - establishes a context/scope, state changes made within it are reverted after exit
   - whatever is inside it gets typeset with the state changes though eg
     { not italic \it italic} not italic
 - some arguments to functions require braces and some dont. fuck me
   - if braces aren't required for an argument, the braces serve to delimit the argument
   - if it does require braces...
     - some treat the braces as defining a group
     - others interpret the argument in some special way that depends on the command
     - for _primitive commands_ (eg, commands that don't get expanded)
       - if it doesn't define a group, it encloses tokens that aren't processed in TeX's stomach
         (that is to say - isn't executed)
 - you and i are in this together
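
a minimal sketch of the scoping behaviour, assuming Plain TeX (\it, \bye, and \count255 as a scratch
register are Plain TeX conventions, not primitives):

```tex
{not italic \it italic} not italic again

\count255=1
% the assignment inside the braces is undone when the group closes
{\count255=2 \message{inside: \the\count255}}
\message{outside: \the\count255}
\bye
```

the two messages should read "inside: 2" and "outside: 1". prefixing the inner assignment with \global
would make it survive the group.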


control sequence/character:
  \«control sequence»«parameter text»«replacement text»
  - ?? find way to represent general syntax of calling a control sequence
    - tricky for reasons i will explain at some point
  - «control sequence» is case sensitive, only a-zA-Z
  - needs *something* after it to break. if its a space it will consume it. you can get around this like:
    "\foo\ " (with an escaped space after) or:
    "{\foo}" entirely contained within a "group"
    "\foo{}" but i am suspicious of this tbh if it is something that needs an argument
  \«exactly one non-letter»
  - "control character"
  - eg \$ to insert a literal $
  - does not consume a space, UNLESS it is an "accenting" function:
    - $13.56 -> must be \$13.56
    - déshabiller -> either d\'eshabiller or d\' eshabiller
    
^^(character):
 - eg: ^^+
 - lets you add in characters you may not be able to type
 - adjusts the character code of the character by /some/ amount, eg:
   - ^^+ -> k
 - if character following ^^ has ascii value between 64 and 127, subtract 64
 - if character following ^^ has ascii value between 0 and 63, add 64
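
the arithmetic above, as something you can run (assuming Plain TeX for \bye). '+' is character 43,
and 43 + 64 = 107, which is 'k':

```tex
% ^^+ is replaced by k before TeX even looks for control sequence names
\message{^^+}
% so this line is read as \vskip 1in
\vs^^+ip 1in
\bye
```

the \message should print a lone k on the terminal.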

dashes:
 - they're converted:
   -    -> hyphen
   --   -> en-dash (used for page ranges, apparently?)
   ---  -> em-dash

math modes:
 - $ delimit 'text math' mode
 - $$ delimit 'display math' mode

16 "categories" of character
| #  | desc                      | contents
+----+---------------------------+---------
|  0 | escape                    | \
|  1 | beginning of group        | {
|  2 | end of group              | } 
|  3 | math shift                | $
|  4 | alignment tab             | &
|  5 | end of line               | return (^^M, char 13)
|  6 | parameter                 | #
|  7 | superscript               | ^
|  8 | subscript                 | _
|  9 | ignored character         | null (^^@, char 0)
| 10 | space                     | space (char 32)
| 11 | letter                    | a-zA-Z
| 12 | other                     | anything not in other categories
| 13 | active character          | ~
| 14 | comment character         | %
| 15 | invalid character         | delete (^^?, char 127) by default
- 15: invalid
  - if you are typing something weird by using ^^ notation, you will need to change this so it doesnt 
    flip out:
    - \catcode`\^^?=12 (that is a backtick, not an apostrophe. an apostrophe means "octal number follows")
    - yeah you can change the category number of everything, sucker
- the larger point of changing character categories is that you can change anything, at any time:
  - eg you can change some other character to start control sequences with the same trick as above
  - so here we are just going to use the defaults but like be aware of that

operating structure:
(think of: input is .tex, output is .dvi)
(these are happening simultaneously, but ive been told its ok to think of them happening sequentially)
- input processor "eyes"
  - tokenizes .tex file
  - before state machine operation, the "^^s" sequences are expanded
    - thus \vs^^+ip -> \vskip and will be executed as such later on
  - can be thought of as a state machine with modes:
    - N: new line
    - M: middle of line
    - S: skipping spaces
- expansion processor "mouth"
  - expand macros, conditionals, some primitives
- execution processor "stomach"
  - execute control sequences that are not expandable
  - where changes to internal state happens
    - assignments,
    - construction of horiz/vert/math lists
- visual processor "bowels"
  - horiz lists -> paragraphs
  - vert lists  -> pages
  - math lists  -> formulae
  - output to dvi
- i have read at least one book so far that had the fucking audacity to break this into 5 steps,
  eyes, mouth, gullet, stomach, intestines, where "eyes" is doing character scanning, "mouth" is 
  tokenizing, and the gullet is doing expansion. stay strong, comrade
 
control flow:
- \if
 - \if«condition»«true text»\else«false text»\fi
- \ifcase
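
a minimal sketch of both forms, assuming Plain TeX (for \bye, and \count255 as scratch). \ifnum is one
concrete member of the \if family, picked here because it is the easiest to read:

```tex
\count255=3
% \if...«true text»\else«false text»\fi
\ifnum\count255>2 \message{big}\else \message{small}\fi
% \ifcase picks a branch by number: case 0, \or case 1, \or case 2, \else anything else
\ifcase\count255 \message{zero}\or \message{one}\or \message{two}\else \message{many}\fi
\bye
```

this should print "big" and then "many". the space after the number in each test is what ends the number.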

io:
- \input: import a .tex file, don't need to specify extension
- \endinput (???)
- \read (?)
- \openin (?)
- \closein (?)
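- \message
  - prints to the terminal and the log file
- \errmessage 
  - prints as an error message, so TeX stops and asks what to do
- \write
  - deferred until the page ships out, unless prefixed with \immediate
- \write16
  - 16 isn't an open file, so it goes to the terminal and the log
- \write18 (??????!!!!!)
  - shell escape: runs a command, if the TeX build allows it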
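
a small sketch of the harmless output commands, assuming Plain TeX (for \bye). \immediate is there
because a bare \write waits for a page to be shipped out, and this file never ships one:

```tex
\message{terminal and log, same line as other messages}
\immediate\write16{terminal and log, on a line of its own}
% a negative stream number means log file only
\immediate\write-1{log file only}
\bye
```

i left \write18 out on purpose, since whether it runs anything depends on how the TeX binary was configured.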

fonts:
- .tfm? (metrics)
  - dimensions of characters
- .pk, .pxl, .gf files
  - shapes of characters
- "TeX itself uses only the metrics file, since it doesn’t care what the characters look like but
  only how much space they occupy"
  - this feels a bit dubious in the era of pdflatex?

parameters:
- this is apparently not the same thing as registers
- eg \hbadness=200 (or - apparently - you can have spaces like "\parindent = 15pt")
- pass to a command eg "\vskip\parskip" (passing \parskip to \vskip)

registers:
- this is apparently not the same thing as parameters
- 256 of them, hold 32 bit signed integer
- can get/set with count:
  - \count0=42
  - "The count is now \count0"
- equals sign is apparently optional
- eg \pageno (Plain TeX's name for \count0, but it functions as a parameter, i'm told)

tables:
- function "like parameters" but require an additional argument
- "\catcode`\~=13" (backtick, not apostrophe)

units (dimensions?):
- \vskip 2in
  - the units are a fixed set of keywords, you can't make up your own
  - also em and ex, which depend on the current font
  - pt: point
  - pc: pica
  - in: inch
  - bp: big point
  - cm: centimeter
  - mm: millimeter
  - dd: didot point
  - cc: cicero 
  - sp: scaled point
- fil, fill, filll (?)
- "constructions like 77`pt" (????????????)
- http://brokestream.com/tex.pdf#page=164 (#447)
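
a quick way to poke at units, assuming Plain TeX (for \bye). \dimen0 is used as a scratch register here:

```tex
\dimen0=1pt
\message{\number\dimen0}  % a dimen read as a number comes out in sp: 65536
\dimen0=1pc
\message{\the\dimen0}     % \the always reports in pt: 12.0pt
\vskip 0pt plus 1fil      % fil/fill/filll only show up in the plus/minus part of glue
\bye
```

so everything is stored as a whole number of scaled points, and the other units are just conversions on input.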

structures:
- box
- glue

state:
- registers (?)
  - ints (?)
    - 256 of them (0-255)
    - \count
    - \newcount
  - lengths (?)
    - 256 of them (0-255)
    - fixed-point, not floats: stored as whole numbers of sp
    - \dimen
    - \newdimen
    - requires a unit
  - "tokens" (?)
    - 256 of them (0-255)
    - a "string" sort of
    - special behavior with \the
    - \toks
    - \newtoks
- tables (?)
  - also called vectors, maybe?
  - \catcode
  - \uccode
  - \lccode
  - \sfcode
  - \mathcode
  - \delcode
- parameters (?)
  - values passed to macros are called 'parameters', but i think there are other state values
    called parameters too.
- whatsits (????)
- \lastpenalty, \lastkern, \lastskip (?)

modes while assembling pages:
- ordinary horizontal mode (assembling paragraphs)
- ordinary vertical mode (assembling pages)
- restricted horizontal mode
  - append items horizontally to form a horizontal box
- restricted vertical mode
  - append items vertically to form a vertical box _other than a page_
- text math mode
- display math mode

defining/assignment:
- \def
  - \def«control sequence»«parameter text»{«replacement text»}
    - http://visualmatheditor.equatheque.net/doc/texbook.pdf#page=214
    - braces do not represent grouping here
    - replacement text
      - you can include braces in replacement text as long as they are properly matched
        - eg «\def\xbold{{\bf x}}» -> «{\bf x}»
          - (are the braces in the output considered a group, or literal??) 
        - (what happens if they don't match??)
    - parameters (#1...#9)
      - parameter text can contain no braces
      - parameters must be defined in order (#1...#9) with no skips
      - a # in the replacement text must either be followed by a digit or another #
      - it looks like can use anything in the parameters (eg, all categories of character)
        except for groups (eg {})
      - delimited parameter:
        - not immediately followed by a parameter token
        - corresponding argument determined to be:
          - "shortest (possibly empty) sequence of tokens with properly nested groups" that
            is also followed by whatever follows the delimited parameter
            - which means: both the character codes and the category codes must match.
      - if the parameter text ENDS with a #, the last argument is delimited by the next '{' in the
        input, and that '{' stays put: after «\def\a#1#{\hbox to #1}», «\a3pt{x}» -> «\hbox to 3pt{x}»
      - undelimited parameter:
        - immediately followed by a parameter token (or the end of the parameter text)
      - there is definitely some weirdness with spaces here...
        - «\def\row#1#2{(#1_1,\ldots,#1_#2)}» works with
          - «\row xn» (what the fuck?)
          - «\row x n»
          - "TeX doesn't use single spaces as undelimited arguments"
        - but «\def\row #1 #2{(#1_1,\ldots,#1_#2)}» only works with
          - «\row x n»
      - but in general you can delimit parameters in highly abstract ways
        - «\def\cs #1. #2\par{...}»
        - takes the shortest match (not greedy) for an input like "«anything». «anything»\par"
          - so eg «\cs abc.abc. def\par»:
            - #1: abc.abc
            - #2: def
        - however: this argument will not stop if the delimiter is enclosed in braces
  - general macro definition
  - "The token \par is not allowed to occur as part of an argument, unless you 
     explicitly tell TEX that \par is OK"
     - \long (ie \long\def)
  - \outer
  - \global
- \edef
  - like \def, but the replacement text is expanded at definition time
- \gdef
  - \global\def
  - lets the definition break out of its group
- \xdef
  - \global\edef
- \let (?)
- \futurelet (???)
  - http://visualmatheditor.equatheque.net/doc/texbook.pdf#page=217
  - very weird
- \endcsname (???)
- \chardef (?)
- \mathchardef (?)
- \count (?)
- \dimen (?)
- \renewcommand (this is LaTeX, not TeX)
- \the (?)
  - "extracts" a value, in some sense... ?
  - "expand an internal quantity"
  - https://tex.stackexchange.com/questions/38674/the-the-command
  - "When \the produces a string of characters, they will all have category code 12, 
     excepts spaces that receive category code 10."
     - store value at time of edef (LaTeX example): \edef\thischapternumber{\the\value{chapter}}.
- \value (? LaTeX again)
- things can be defined/redefined, macro expansion will use the most recent definition
- \font\cs=«external font name»: makes \cs a font identifier
- \chardef\cs=«number»: makes \cs a character code
- \countdef\cs=«number»: makes \cs a \count register
- \def\cs«parameter text»{«replacement text»}: makes \cs a macro
- \let\cs=«token»: gives \cs the token's current meaning
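
a sketch pulling a few of the \def notes together, assuming Plain TeX (for \ldots and \bye). \row and
\cs are the examples from above with a body filled in by me, and \a, \lazy, \eager are made-up names:

```tex
% undelimited: each argument is one token or one braced group
\def\row#1#2{(#1_1,\ldots,#1_#2)}
$\row xn$ $\row x n$ $\row{y}{m}$

% delimited: #1 stops at the first period-plus-space, #2 stops at \par
\def\cs #1. #2\par{\message{[#1][#2]}}
\cs abc.abc. def\par

% \def stores its body unexpanded, \edef expands it on the spot
\def\a{old}
\def\lazy{\a}
\edef\eager{\a}
\def\a{new}
\message{\lazy/\eager}
\bye
```

the messages should come out as "[abc.abc][def]" and "new/old".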

arithmetic:
- \advance
  - "\advance\count0 by 10" 
    - "by" is an optional keyword built into the syntax, so "\advance\count0 10" works too
- \multiply ?
- \divide ?
- \dimen (SORT OF???)
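
a minimal run through the three arithmetic commands, assuming Plain TeX (for \bye, and \count255 as scratch):

```tex
\count255=42
\advance\count255 by 10   % 52
\multiply\count255 2      % 104, written without the optional "by"
\divide\count255 by 4     % 26, integer division
\message{\the\count255}
\bye
```

there is no expression syntax in plain TeX itself. every step is an assignment that changes the register in place.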
 
rendering:
(white)space:
 - whitespace between words either becomes line-breaks or "glue" which is adjusted to justify text
 - tilde (~) prevents whitespace from turning into a line-break (tilde called "tie" here)
   - eg "Fig.~8"
 - extra space is added after punctuation marks UNLESS that mark is directly connected to a capital letter
   - use \null or {} or something between the capital and the punctuation to force the extra space
   - use control space ("\ ") after punctuation to force no extra space
   - jesus christ
   - command \frenchspacing prevents TeX from adding extra space after punctuation at all
 - \thinspace
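
the spacing rules above as one small file, assuming Plain TeX (\null, \frenchspacing and \bye are
Plain TeX macros). the sentences are made up:

```tex
See Fig.~8 for details.        % ~ is a tie: a space, but never a line break
Knuth et al.\ did this first.  % control space: no sentence-sized gap after "al."
I work for the BBC\null. Next. % \null lets the period count as a sentence end
\frenchspacing                  % from here on, no extra space after any punctuation
One. Two.
\bye
```

the differences are small on the page, so compare the gaps after each period closely.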
 
special:
- \special (??? !!!)

formats:
- &
- \dump
 
debugging:
- \tracingmacros=1
  - adds info to log file whenever a macro is expanded

mysteries:
- work through the defining things
- work through macro expansion
- work through control flow
- parameter vs. register vs. table vs. whatsit
- function arguments?
-- is "{}" ever not a "normal" argument
--- can "{}" be processed? is it a data structure we can do operations on?
-- is "[]" a data structure of a part of control sequence call syntax?
- \immediate
- debugging (without e-TeX?)
 
index of command codes: http://brokestream.com/tex.pdf#page=71

==================== PLAIN TEX ====================
https://www.ntg.nl/doc/wilkins/pllong.pdf

if for some reason you are using a degenerate TeX that does not include Plain TeX, you would include it by loading the
"format file" for Plain TeX

probably:
  - on command line: `tex '&plain'` (quoted, or the shell eats the &)
  - interactively:
    - tex
      ** &plain
    

control flow:
- \loop
  - \loop «a» \if «b» \repeat
  - do a, if condition is true do b, repeat whole process again starting with a
  - the "\if" here is a generalized thing
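
a minimal \loop, assuming Plain TeX (which is where \loop and \repeat come from). here «a» is the
\message plus the \advance, the test is an \ifnum, and «b» is empty:

```tex
\count255=0
\loop
  \message{\the\count255}
  \advance\count255 by 1
\ifnum\count255<3 \repeat
\bye
```

this should print 0, 1 and 2. the test has to be written without its own \fi, since \repeat supplies it.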

to investigate:
- \newwrite
- put all the features that plain TeX uses in here

plain TeX has all of the:
- category codes? (except some...?)
- non-primitive commands

==================== E-TEX ====================
https://ctan.org/pkg/etex?lang=en
https://tex.stackexchange.com/questions/2047/what-are-benefits-of-e-tex-for-latex-users
http://tex.loria.fr/moteurs/etex_ref.html

    ds@star ~/m/b/1 [0]> etex
    This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=etex)
     restricted \write18 enabled.
    **^D
    ! End of file on the terminal... why?
    ds@star ~/m/b/1 [0]> oh fuck
    fish: Unknown command: oh

"e-TeX provides lots of additional features for package writers such as an increased number
of registers. The thing I find most useful is its extended tracing ability which I usually
access through the trace package. In particular, tracing commands and tracing assignments 
are extremely helpful when trying to diagnose a problem." 

\unless
\readline
\scantokens
\detokenize

etoolbox package
etextools package (expands etoolbox)

\usepackage{etex}

65536 registers

mysteries:
- debugging
- additional primitives
- 



==================== DVIPS, \SPECIAL ====================
http://www.bakoma-tex.com/doc/dvips/base/dvips.pdf

- inject arbitrary postscript
- run arbitrary commands (!!!)
 - if a filename parameter starts with a backtick, it runs the rest as a shell command and includes
   whatever that writes to standard output
 - \special{psfile="`zcat foo.ps.Z"}
 - \epsffile[72 72 540 720]{"`zcat screendump.ps.Z"}
 
dvips alone blows this situation totally wide-open with the two above points. game over, TeX. 

==================== PDFTEX ====================
http://texdoc.net/texmf-dist/doc/pdftex/manual/pdftex-a.pdf

in general skips DVI, but you can make it produce DVI. (which seems like it would have some wild consequences - need to look into that)
has a lot of additional commands for doing pdf-specific manipulations.

==================== OTHER \SPECIAL ====================
https://ctan.org/pkg/dtl
- dvipdf
- html generation?

==================== LATEX? ====================
https://tobi.oetiker.ch/lshort/lshort.pdf

i'm not going to bother describing LaTeX in depth here because as i learn more i think it is not helpful for my project

"Macros → TeX → Driver → Output"

i think this is either:
PDFLaTeX: LaTeX -> PDFTeX -> PDF
LaTeX:    LaTeX -> e-TeX -> DVI
          ... DVI -> dvips -> PS
          ... PS  -> ps2pdf (ghostscript?) -> PDF
          
includes sequences of the form:
  \foo*[]{}
  (with a star)
  
producing a backslash: \textbackslash (not \\)
this is not in Plain TeX. stay safe 
- citation: https://www.latex-project.org/help/documentation/source2e.pdf#page=130

\newcommand
 
==================== CONTEXT? ====================
it's slow
it has better integration with metapost apparently
definitely has some weird magic going on with \input
i'm not going to bother describing ConTeXt in depth here because as i learn more i think it is not helpful for my project

==================== METAFONT? ====================
anything to do with actually putting ink on paper comes from metafont (or specials in DVI)
it is almost immediately weirder (in a good way) and more appealing than TeX

putting my metafont notes here

remaining notes on how to interact with metafonts through tex:
«\font\mine=myfont10»
«{\mine Mary had a little lamb,}»
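
the same two lines as a complete file, assuming Plain TeX (for \bye). i swapped in cmr10, which ships
with every TeX distribution i know of, since myfont10 is a placeholder. "scaled 2000" is my addition
and means 2x size:

```tex
% load the metrics for cmr10 at double size and name the result \mine
\font\mine=cmr10 scaled 2000
{\mine Mary had a little lamb,} and back to the normal font.
\bye
```

TeX only reads the .tfm metrics here. the actual glyph shapes are the DVI driver's problem.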

==================== METAFONT? ====================

 terminology:
 
references:
 Notes On Programming in TeX (ACTUALLY GOOD): http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.404.7342
 The TeXBook:
 The Not So Short Introduction to LaTeX2ε: https://tobi.oetiker.ch/lshort/lshort.pdf
 tex, the program: http://brokestream.com/tex.pdf 
 https://tex.stackexchange.com/questions/38674/the-the-command
 
go home