Highlight documentation
Highlight manual
Overview
Highlight converts source code to HTML, XHTML, RTF, LaTeX, TeX, SVG, BBCode and terminal escape sequences with coloured syntax highlighting. Language definitions and colour themes are customizable.
Intended purpose
Highlight was designed to offer a flexible but easy-to-use syntax highlighter for several output formats. Instead of hardcoding syntax or colouring information, all relevant data is stored in configuration scripts. These scripts may be altered or enhanced with plug-in scripts.
Feature list
- highlighting of keywords, types, strings, numbers, escape sequences, comments, operators and preprocessor directives
- coloured output in HTML, XHTML 1.1, RTF, TeX, LaTeX, SVG, BBCode and terminal escape sequences
- supports referenced stylesheet files for HTML, LaTeX, TeX or SVG output
- syntax elements are defined as Regular Expressions or plain string lists
- customizable keyword groups
- recognition of nested languages within a file
- all configuration files are Lua scripts
- supports plug-in scripts to tweak language definitions and themes
- reformatting and indentation of C, C++, C# and Java source code
- wrapping of long lines
- configurable output of line numbers
Usage and options
Quick introduction
The following examples show how to produce highlighted output, using an input file called main.cpp:
- Generate HTML:
      highlight -i main.cpp -o main.cpp.html
      highlight < main.cpp > main.cpp.html --syntax cpp
  You will find the HTML file and highlight.css in the working directory. If you use IO redirection, you must define the programming language with --syntax.
- Generate HTML with embedded CSS definitions and line numbers:
      highlight -i main.cpp -o main.cpp.html --include-style --line-numbers
- Generate HTML with inline CSS definitions:
      highlight -i main.cpp -o main.cpp.html --inline-css
- Generate LaTeX using the "horstmann" source formatting style and the "neon" colour theme:
      highlight -O latex -i main.cpp -o main.cpp.tex --reformat horstmann --style neon
The following output formats may be used with --out-format:
      html: HTML5 (default)
      xhtml: XHTML 1.1
      tex: Plain TeX
      latex: LaTeX
      rtf: RTF
      odt: OpenDocument Text (Flat XML)
      svg: SVG
      bbcode: BBCode
      pango: Pango markup
      ansi: Terminal 16 color escape codes
      xterm256: Terminal 256 color escape codes
      truecolor: Terminal 16m color escape codes
- Customize font settings:
      highlight --syntax ada --out-format=xhtml --font-size 12 --font Consolas,\'Courier\ New\'
      highlight --syntax ada --out-format=latex --font-size tiny --font sffamily
- Define an output directory:
      highlight -d some/target/dir/ *.cpp *.h
CLI options
The command line version of highlight offers the following options:
USAGE: highlight [OPTIONS]... [FILES]...

General options:

 -B, --batch-recursive=<wc>     convert all matching files, searches subdirs
                                  (Example: -B '*.cpp')
 -D, --data-dir=<directory>     set path to data directory
     --config-file=<file>       set path to a lang or theme file
 -d, --outdir=<directory>       name of output directory
 -h, --help[=topic]             print this help or a topic description
                                  <topic> = [syntax, theme, plugin, config, test, lsp]
 -i, --input=<file>             name of single input file
 -o, --output=<file>            name of single output file
 -P, --progress                 print progress bar in batch mode
 -q, --quiet                    suppress progress info in batch mode
 -S, --syntax=<type|path>       specify type of source code or syntax file path
     --syntax-by-name=<name>    specify type of source code by given name
                                  will not read a file of this name, useful for stdin
     --syntax-supported         test if the given syntax can be loaded
 -v, --verbose                  print debug info; repeat to show more information
     --force[=syntax]           generate output if input syntax is unknown
     --list-scripts=<type>      list installed scripts
                                  <type> = [langs, themes, plugins]
     --list-cat=<categories>    filter the scripts by the given categories
                                  (example: --list-cat='source;script')
     --max-size=<size>          set maximum input file size
                                  (examples: 512M, 1G; default: 256M)
     --plug-in=<script>         execute Lua plug-in script; repeat option to
                                  execute multiple plug-ins
     --plug-in-param=<value>    set plug-in input parameter
     --print-config             print path configuration
     --print-style              print stylesheet only (see --style-outfile)
     --skip=<list>              ignore listed unknown file types
                                  (Example: --skip='bak;c~;h~')
     --stdout                   output to stdout (batch mode, --print-style)
     --validate-input           test if input is text, remove Unicode BOM
     --version                  print version and copyright information

Output formatting options:

 -O, --out-format=<format>      output file in given format
                                  <format>=[html, xhtml, latex, tex, odt, rtf,
                                  ansi, xterm256, truecolor, bbcode, pango, svg]
 -c, --style-outfile=<file>     name of style file or print to stdout, if
                                  'stdout' is given as file argument
 -e, --style-infile=<file>      to be included in style-outfile (deprecated)
                                  use a plug-in file instead
 -f, --fragment                 omit document header and footer
 -F, --reformat=<style>         reformats and indents output in given style
                                  <style> = [allman, gnu, google, horstmann,
                                  java, kr, linux, lisp, mozilla, otbs, pico,
                                  vtk, ratliff, stroustrup, webkit, whitesmith]
 -I, --include-style            include style definition in output file
 -J, --line-length=<num>        line length before wrapping (see -V, -W)
 -j, --line-number-length=<num> line number width incl. left padding (default: 5)
     --line-range=<start-end>   output only lines from number <start> to <end>
 -k, --font=<font>              set font (specific to output format)
 -K, --font-size=<num?>         set font size (specific to output format)
 -l, --line-numbers             print line numbers in output file
 -m, --line-number-start=<cnt>  start line numbering with cnt (assumes -l)
 -s, --style=<style|path>       set colour style (theme) or theme file path
 -t, --replace-tabs=<num>       replace tabs by <num> spaces
 -T, --doc-title=<title>        document title
 -u, --encoding=<enc>           set output encoding which matches input file
                                  encoding; omit encoding info if set to NONE
 -V, --wrap-simple              wrap lines after 80 (default) characters w/o
                                  indenting function parameters and statements
 -W, --wrap                     wrap lines after 80 (default) characters
     --wrap-no-numbers          omit line numbers of wrapped lines (assumes -l)
 -z, --zeroes                   pad line numbers with 0's
     --isolate                  output each syntax token separately (verbose output)
     --keep-injections          output plug-in injections in spite of -f
     --kw-case=<case>           change case of case insensitive keywords
                                  <case> = [upper, lower, capitalize]
     --no-trailing-nl[=mode]    omit trailing newline. If mode is empty-file,
                                  omit only for empty input
     --no-version-info          omit version info comment

(X)HTML output options:

 -a, --anchors                  attach anchor to line numbers
 -y, --anchor-prefix=<str>      set anchor name prefix
 -N, --anchor-filename          use input file name as anchor prefix
 -C, --print-index              print index with hyperlinks to output files
 -n, --ordered-list             print lines as ordered list items
     --class-name=<name>        set CSS class name prefix;
                                  omit class name if set to NONE
     --inline-css               output CSS within each tag (verbose output)
     --enclose-pre              enclose fragmented output with pre tag (assumes -f)

LaTeX output options:

 -b, --babel                    disable Babel package shorthands
 -r, --replace-quotes           replace double quotes by \dq{}
     --beamer                   adapt output for the Beamer package
     --pretty-symbols           improve appearance of brackets and other symbols

RTF output options:

     --page-color               include page color attributes
 -x, --page-size=<ps>           set page size
                                  <ps> = [a3, a4, a5, b4, b5, b6, letter]
     --char-styles              include character stylesheets

SVG output options:

     --height                   set image height (units allowed)
     --width                    set image width (see --height)

Terminal escape output options (xterm256 or truecolor):

     --canvas[=width]           set background colour padding (default: 80)

Language Server options:

     --ls-profile=<server>      read LSP configuration from lsp.conf
     --ls-delay=<ms>            set server initialization delay
     --ls-exec=<bin>            set server executable name
     --ls-option=<option>       set server CLI option (can be repeated)
     --ls-hover                 execute hover requests (HTML output only)
     --ls-semantic              retrieve semantic token types (requires LSP 3.16)
     --ls-syntax=<lang>         set syntax which is understood by the server
     --ls-syntax-error          retrieve syntax error information
                                  (assumes --ls-hover or --ls-semantic)
     --ls-workspace=<dir>       set workspace directory to init. the server

If no in- or output files are specified, stdin and stdout will be used.
Reading from stdin can also be triggered using the '-' option.
Default output format: xterm256 or truecolor if appropriate, HTML otherwise.
Style definitions are stored in highlight.css (HTML, XHTML, SVG) or highlight.sty (LaTeX, TeX) if neither -c nor -I is given.
Reformatting code (-F) will only work with C, C++, C# and Java input files.
LSP features require absolute input paths and disable reformatting (-F).
Wrapping lines with -V or -W will cause faulty highlighting of long single line comments and directives.
Using line-range might interfere with multi line syntax elements. Use with caution.
GUI options
The Graphical User Interface offers a subset of the CLI features. It includes a dynamic preview of the output file's appearance. Please see screenshots and screencasts.
Input and output
If no input or output file name is defined by the --input and --output options, highlight will use stdin and stdout for file processing.
If no input file name is given with --input or as a command line argument, highlight cannot determine the language from the file extension (although some scripting languages are recognised by the shebang in the first input line). In this case, you have to pass the language to highlight with --syntax; in most cases, this is the file suffix of the source file.
    highlight test.py
    highlight < test.py --syntax py    # --syntax option necessary
    cat test.py | highlight --syntax py
If a language uses multiple suffixes (like C, cc, cpp and h for C++ files), they are mapped to one language definition in $CONF_DIR/filetypes.conf.
Highlight enters batch processing mode if multiple input files are defined or if --batch-recursive is set. In batch mode, highlight saves each generated file under the original file name, with the extension of the chosen output format appended. If files in the input directories happen to share the same name, the output files are prefixed with their source path name.
The --outdir option is recommended in batch mode. Use --quiet to improve performance (recommended for use in shell scripts).
HTML, TeX, LaTeX and SVG output
The HTML, TeX, LaTeX and SVG output formats can reference style definition files which contain the formatting information (stylesheets).
In HTML and SVG output, this file contains CSS definitions and is saved as 'highlight.css'. In LaTeX and TeX, it contains macro definitions, and is saved as 'highlight.sty'.
Name and path of the stylesheet may be modified with --style-outfile.
If the --outdir option is given, all generated output, including stylesheets, is stored in this directory.
Use --include-style to embed style information in the output documents without referencing a stylesheet.
Referenced style definitions have the advantage that all formatting information is kept in a single file, which applies to every referencing document.
With --style-infile you define a file to be included in the final formatting information of the document. This way, you can enhance or redefine the default highlight style definitions without editing generated code.
Note: Using a plug-in script is the preferred way to enhance styling.
GNU source-highlight compatibility
The command line interface is extensively harmonised with source-highlight (http://www.gnu.org/software/src-highlite/).
The following highlight options have the same meaning as in source-highlight:
--input, --output, --help, --version, --out-format, --title, --data-dir, --verbose, --quiet, --ctags-file
These options were added to enhance compatibility:
--css, --doc, --failsafe, --line-number, --line-number-ref, --no-doc, --tab, --output-dir, --src-lang
These switches provide a common highlighter interface for scripts, plugins etc.
Advanced options
Prevent parsing of binary input files
If highlight may be invoked with arbitrary input, you can prevent it from parsing binary files with --validate-input. This flag causes highlight to compare the input file header with a list of magic numbers. If a binary file type is detected, highlight quits with an error message.
Highlight nested code without starting delimiter
If a file starts with an embedded code section which lacks the starting delimiter, the --start-nested option switches to the nested language mode.
This can happen with LuaTeX files:
highlight luatex.tex --latex --start-nested=inc_luatex
Here, inc_luatex is a Lua definition with TeX line comments. Note that the nested code section has to end with the closing delimiter defined in the host language definition.
Tips and tricks
Test new configuration scripts
The option --config-file helps to test new config files before installing them. The given file must be a lang or theme file.
highlight --config-file xxx.lang --config-file yyy.theme -I
Debug language definitions
Use --verbose to display the Lua and syntax data.
→ See the wiki for more information
Remove a UTF-8 BOM
Use --validate-input to get rid of the UTF-8 byte order mark.
Configuration
File format
Configuration files are Lua scripts. These constructs are sufficient to edit the scripts:
Variable assignment:
    name = value    (variables have no type, only values have)
Strings:
    string1="string literal with escape: \n"
    string2=[[raw string without escape sequence]]
  If a raw string value starts with "[" or ends with "]", pad the brackets with spaces to avoid a syntax error. Highlight will strip the string.
Comments:
    -- line comment
    --[[ block comment ]]
Arrays:
    array = { first=1, second="2", 3, { 4,5 } }
Please refer to the Lua manual for more details about the Lua syntax.
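The following fragment is a small sketch of the string rules above, using field names from the language definition format described below. The values, in particular the Operators character class, are illustrative assumptions rather than the contents of a shipped lang file.

```lua
PreProcessor = {
  Prefix = [[#]],        -- raw string: no escape processing
  Continuation = "\\",   -- regular string: "\\" is a single backslash
}

-- This raw value ends with "]", so it is padded with spaces.
-- Without the padding, Lua would read the first "]]" as the end of the
-- string and report the remaining "]" as a syntax error.
-- Highlight strips the surrounding spaces again (see the note above).
Operators = [[ [\+\-\*\/\%\=] ]]
```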
Regular Expressions
Please see Regular Expressions for the supported regex constructs.
Language definitions
A language definition describes all elements of a programming language which will be highlighted by different colours and font types. Save the new file in langDefs/, using the following name convention:
<usual extension of sourcecode files>.lang
Examples: PHP -> php.lang, Java -> java.lang. If a language uses multiple suffixes, list them in filetypes.conf.
Keywords = { { Id, List|Regex, Group?, Priority?, Constraints? } }
  Id: Integer, keyword group id (can be reused for several groups). Default themes support 4 groups, base16 themes 6 groups.
  List: List, list of keywords
  Regex: String, regular expression
  Group: Integer, capturing group id of the regular expression; defines the part of the regex which should be returned as keyword (optional; if not set, the match with the highest group number is returned, counting from left to right)
  Priority: Integer, if not zero, no more regexes will be evaluated if this regex matches
  Constraints: table consisting of:
    Line: Integer, limit match to line number
    Filename: String, limit match to input file name
  Regular expressions are evaluated in their order within Keywords. If a regex does not appear to match, a conflicting expression may be listed before it.

Comments = { { Block, Nested?, Delimiter={Open, Close?} } }
  Block: Boolean, true if the comment is a block comment
  Nested: Boolean, true if block comments can be nested (optional)
  Delimiter: List, contains the open delimiter regex (line comment) or the open and close delimiter regexes (block comment)

Strings = { Delimiter|DelimiterPairs={Open, Close, Raw?}, Escape?, Interpolation?, RawPrefix?, AssertEqualLength? }
  Delimiter: String, regular expression which describes string delimiters
  DelimiterPairs: List, includes open and close delimiter expressions if they are not equal; the optional boolean Raw flag marks a delimiter pair as enclosing a raw string
  Escape: String, regex of escape sequences (optional)
  Interpolation: String, regex of interpolation sequences (optional)
  RawPrefix: String, defines the raw string indicator (optional)
  AssertEqualLength: Boolean, set true if delimiters must have the same length

PreProcessor = { Prefix, Continuation? }
  Prefix: String, regular expression which describes the open delimiter
  Continuation: String, contains the line continuation character (optional)
NestedSections = { Lang, Delimiter={} }
  Lang: String, name of the nested language
  Delimiter: List, contains the open and close delimiters of the code section

KeywordFormatHints = { { Id, Bold?, Italic?, Underline? } }
  Id: Integer, keyword group id whose attributes should be changed
  Bold: Boolean, font weight property
  Italic: Boolean, font style property
  Underline: Boolean, font decoration property
  These hints may have no effect if multiple syntax types are highlighted in batch mode without --include-style.

Description: String, defines the syntax description
Categories: Table, list of categories (config, source, script, etc.)
Digits: String, regular expression which defines digits (optional)
Identifiers: String, regular expression which defines identifiers (optional)
Operators: String, regular expression which defines operators
EnableIndentation: Boolean, set true if the syntax may be reformatted and indented
IgnoreCase: Boolean, set true if keyword case should be ignored

Script environment:
The following variables are defined when a script is executed:
  hl_lang_dir: current path of language definitions (use with dofile)
  Identifiers: default regex for identifiers
  Digits: default regex for numbers
The following variables are integers which represent the internal highlighting states:
  HL_STANDARD, HL_STRING, HL_NUMBER, HL_LINE_COMMENT, HL_BLOCK_COMMENT, HL_ESC_SEQ,
  HL_PREPROC, HL_PREPROC_STRING, HL_OPERATOR, HL_LINENUMBER, HL_KEYWORD,
  HL_STRING_END, HL_LINE_COMMENT_END, HL_BLOCK_COMMENT_END, HL_ESC_SEQ_END,
  HL_PREPROC_END, HL_OPERATOR_END, HL_KEYWORD_END, HL_EMBEDDED_CODE_BEGIN,
  HL_EMBEDDED_CODE_END, HL_IDENTIFIER_BEGIN, HL_IDENTIFIER_END, HL_UNKNOWN, HL_REJECT

The function OnStateChange:
This function is a hook which is called if an internal state changes (e.g. from HL_STANDARD to HL_KEYWORD if a keyword is found). It can be used to alter the new state or to manipulate syntax elements like keyword lists.

OnStateChange(oldState, newState, token, kwGroupID, lineno, column)
  Hook event: highlighting parser state change
  Parameters:
    oldState: old state
    newState: intended new state
    token: the current token which triggered the new state
    kwGroupID: if newState is HL_KEYWORD, the parameter contains the keyword group ID
    lineno: line number (since 3.50)
    column: line column (since 3.50)
  Returns: the correct state to continue with, OR HL_REJECT
  Return HL_REJECT if the recognized token and state should be discarded; the first character of the token will be output and highlighted as "oldState".
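A minimal OnStateChange hook might look like the sketch below. It assumes a definition in which "asm" belongs to keyword group 1, as in the C and C++ example that follows, and it uses only the parameters and state constants listed above; the rule itself is hypothetical.

```lua
-- Hypothetical rule: do not highlight "asm" as a keyword of group 1
function OnStateChange(oldState, newState, token, kwGroupID)
  if newState == HL_KEYWORD and kwGroupID == 1 and token == "asm" then
    -- Discard the recognized token and state; highlight outputs the
    -- first character with oldState and continues parsing after it
    return HL_REJECT
  end
  -- Otherwise accept the state highlight intended to switch to
  return newState
end
```

The optional lineno and column parameters are simply left out here, since Lua ignores surplus arguments.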
Example:
    Description="C and C++"

    Keywords={
      { Id=1,
        List={"goto", "break", "return", "continue", "asm", "case", "default",
          -- [..]
        }
      },
      -- [..]
    }

    Strings = {
      Delimiter=[["|']],
      RawPrefix="R",
    }

    Comments = {
      { Block=true,
        Nested=false,
        Delimiter = { [[\/\*]], [[\*\/]] } },
      { Block=false,
        Delimiter = { [[//]] } }
    }

    IgnoreCase=false

    PreProcessor = {
      Prefix=[[#]],
      Continuation="\\",
    }

    Operators=[[\(|\)|\[|\]|\{|\}|\,|\;|\.|\:|\&|<|>|\!|\=|\/|\*|\%|\+|\-|\~]]

    EnableIndentation=true
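The C and C++ example does not use NestedSections or KeywordFormatHints. The fragment below sketches both. The PHP delimiters and the choice of keyword group 2 are illustrative assumptions, and writing NestedSections as a list with one entry per embedded language is assumed here rather than stated in the field description above.

```lua
-- Embed PHP code between "<?php" and "?>"; the delimiters are regexes,
-- hence the escaped "?". "php" refers to langDefs/php.lang.
NestedSections = {
  { Lang = "php", Delimiter = { [[<\?php]], [[\?>]] } },
}

-- Ask for keyword group 2 to be rendered in bold italics
KeywordFormatHints = {
  { Id = 2, Bold = true, Italic = true },
}
```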
→ See the wiki for more information
Theme definitions
Colour themes contain the formatting information of the language elements which are described in language definitions.
The files have to be stored as *.theme in themes/.
Apply a style with the --style option.
Format attributes:
  Attributes = { Colour, Bold?, Italic?, Underline? }
    Colour: String, defines colour in HTML hex notation ("#rrggbb")
    Bold: Boolean, true if font should be bold (optional)
    Italic: Boolean, true if font should be italic (optional)
    Underline: Boolean, true if font should be underlined (optional)

Theme elements:
  Description = String, defines theme description
  Categories = Table, list of categories (dark, light, etc.)
  Default = Attributes (colour of unspecified text)
  Canvas = Attributes (background colour)
  Number = Attributes (numbers)
  Escape = Attributes (escape sequences)
  String = Attributes (strings)
  Interpolation = Attributes (interpolation sequences)
  PreProcessor = Attributes (preprocessor directives)
  StringPreProc = Attributes (strings within preprocessor directives)
  BlockComment = Attributes (block comments)
  LineComment = Attributes (line comments)
  LineNum = Attributes (line numbers)
  Operator = Attributes (operators)
  Hover = Attributes (LSP hover elements)
  Error = Attributes (LSP syntax errors)
  ErrorMessage = Attributes (LSP error descriptions)
  Keywords = { Attributes1, Attributes2, Attributes3, Attributes4, Attributes5, Attributes6, }
    AttributesN: formatting of keyword group N
  SemanticTokenTypes = { SemanticAttributes1, SemanticAttributes2, ... }
    SemanticAttributesN: a table consisting of:
      Type: token identifier of the LS protocol (V 3.16)
      Style: formatting of the token
Example:
    Description = "vim zmrok"

    Categories = {"dark", "vim"}

    Default = { Colour="#F8F8F8" }
    Canvas = { Colour="#141414" }
    Number = { Colour="#FACE43" }
    Escape = { Colour="#ffa500" }
    String = { Colour="#D9FF77" }
    BlockComment = { Colour="#8a8a8a" }
    PreProcessor = { Colour="#8b864e" }
    LineNum = { Colour="#777777" }
    StringPreProc = String
    LineComment = BlockComment
    Operator = { Colour="#888888" }
    Interpolation = { Colour="#D084CE" }

    Keywords = {
      { Colour="#A56A30" , Bold=true},
      { Colour="#C7CA87" },
      { Colour="#30a630" },
      { Colour="#3b84cc" },
      { Colour= "#d484aa" },
      { Colour= "#ae84d4" },
    }

    -- new LSP based elements:

    SemanticTokenTypes = {
      { Type = 'keyword', Style = Keywords[1] },
      { Type = 'type', Style = Keywords[2] },
      { Type = 'function', Style = Keywords[4] },
      { Type = 'method', Style = Keywords[4] },
      { Type = 'class', Style = Keywords[1] },
      { Type = 'struct', Style = Keywords[2] },
      { Type = 'parameter', Style = Keywords[6] },
      { Type = 'variable', Style = Keywords[5] },
      { Type = 'number', Style = Number },
      { Type = 'regexp', Style = String },
      { Type = 'operator', Style = Operator },
    }
→ See the wiki for more information
Keyword groups
You may define custom keyword groups and corresponding highlighting styles. This is useful if you want to highlight functions of a third party library, macros, constants etc.
You define a new group in two steps:
1. Define a new group in your language definition (lang file):
       Keywords = {
         -- add your keyword description:
         {Id=5, List = {"ERROR", "DEBUG", "WARN"} }
       }
2. Add a corresponding highlighting style in your colour theme (theme file):
       Keywords= {
         --add your keyword style as 5th item in the list:
         { Colour= "#ff0000", Bold=true },
       }
It is recommended to define keyword groups in user-defined plug-in scripts to avoid editing the original highlight files.
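A sketch of such a plug-in, reproducing the two steps above without touching installed files, could look like this. The layout (syntaxUpdate and themeUpdate functions registered in a Plugins table) is assumed to follow the format described on the Plug-Ins page; the restriction to the "C and C++" definition and the file name loglevels.lua used below are hypothetical.

```lua
Description = "Highlight log levels as an extra keyword group (example)"

-- Applied to each loaded language definition; desc holds its Description
function syntaxUpdate(desc)
  -- Hypothetical restriction: only extend the C and C++ definition
  if desc ~= "C and C++" then
    return
  end
  -- Step 1: add keyword group 5
  table.insert(Keywords, { Id = 5, List = { "ERROR", "DEBUG", "WARN" } })
end

-- Applied to the loaded colour theme
function themeUpdate(desc)
  -- Step 2: default themes define 4 keyword styles, so append a 5th one;
  -- base16 themes already define 6 styles and are left unchanged
  if #Keywords == 4 then
    table.insert(Keywords, { Colour = "#ff0000", Bold = true })
  end
end

-- Register both functions
Plugins = {
  { Type = "lang",  Chunk = syntaxUpdate },
  { Type = "theme", Chunk = themeUpdate },
}
```

It could then be applied with: highlight --plug-in=loglevels.lua -i main.cpp -o main.cpp.html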
Plug-ins
The --plug-in option receives the name of a Lua script which can override and enhance the settings of theme and language definition files. Plug-ins make it possible to apply custom settings without editing installed highlight configuration files.
See Plug-Ins for file format and examples.
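As a smaller sketch, a plug-in that only touches the theme could render all comments in italics. The element names BlockComment and LineComment and the Italic attribute come from the theme format described above; the Plugins registration again assumes the layout from the Plug-Ins page.

```lua
Description = "Render comments in italics (example)"

function themeUpdate(desc)
  -- Add the optional Italic attribute to both comment styles.
  -- A theme may define LineComment = BlockComment (the same table),
  -- in which case setting the flag twice is harmless.
  if BlockComment then BlockComment.Italic = true end
  if LineComment then LineComment.Italic = true end
end

Plugins = {
  { Type = "theme", Chunk = themeUpdate },
}
```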
File mapping
The script filetypes.conf assigns file extensions and shebang descriptions to language definitions.
Format:
  FileMapping={
    { Lang, Extensions|Shebang },
  }
  Lang: String, name of language definition
  Extensions: List of strings, contains file extensions referring to "Lang"
  Shebang: String, regular expression which matches the first line of the input file
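For example, entries for a hypothetical language definition mylang.lang might look like the sketch below. The extensions and the shebang regex are assumptions chosen for illustration; such entries are meant to be added to the existing FileMapping list rather than replace it.

```lua
FileMapping = {
  -- existing entries of filetypes.conf ...

  -- Hypothetical: map two custom extensions to langDefs/mylang.lang
  { Lang = "mylang", Extensions = { "myl", "mylang" } },

  -- Hypothetical: recognise scripts whose first line is, for example,
  -- "#!/usr/bin/env mylang" when no known extension is present
  { Lang = "mylang", Shebang = [[^#!.*mylang]] },
}
```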
Edit the file gui_files/ext/fileopenfilter.conf to add new syntax types to the file open filter of the GUI.
Config file search
Configuration scripts are searched in the following directories:
1. ~/.highlight/
2. value of the environment variable HIGHLIGHT_DATADIR
3. user defined directory set with --data-dir
4. /usr/share/highlight/
5. /etc/highlight/ (location of filetypes.conf)
6. current working directory (fallback)
These subdirectories are expected to contain the corresponding scripts:
- langDefs: *.lang
- themes: *.theme
- plugins: *.lua
A custom filetypes.conf may be placed directly in ~/.highlight/.
This search order enables you to enhance the installed scripts without the need to copy all preinstalled files somewhere else.
As the --plug-in option of older releases accepted only absolute paths, plug-in scripts are searched in the directories above only if accessing the given path as an absolute file path fails.
→ See the wiki for more information