
Highlight documentation

Highlight manual


  1. Overview
    1. Intended purpose
    2. Feature list
    3. Supported programming and markup languages
  2. Usage and options
    1. Quick introduction
    2. CLI options
    3. GUI options
    4. Input and output
    5. GNU source-highlight compatibility
    6. Advanced options
    7. Tips and tricks
  3. Configuration
    1. File format
    2. Regular expressions
    3. Language definitions
    4. Theme definitions
    5. Keyword groups
    6. Plug-ins
    7. File mapping
    8. Config file search
  4. Embedding highlight
    1. Sample scripts
    2. SWIG interface
    3. Third party scripts and plug-ins
  5. Building and installing
    1. Building dependencies
    2. Packaging example


Highlight converts source code to HTML, XHTML, RTF, LaTeX, TeX, SVG, BBCode and terminal escape sequences with coloured syntax highlighting. Language definitions and colour themes are customizable.

Intended purpose

Highlight was designed to offer a flexible but easy to use syntax highlighter for several output formats. Instead of hardcoding syntax or colouring information, all relevant data is stored in configuration scripts. These scripts may be altered or enhanced with plug-in scripts.

Feature list

Supported programming and markup languages

Please see the supported language list.

Usage and options

Quick introduction

The following examples show how to produce a highlighted C++ file, using an input file called main.cpp:

- Generate HTML:
  highlight -i main.cpp -o main.cpp.html
  highlight < main.cpp > main.cpp.html --syntax cpp

  You will find the HTML file and highlight.css in the working directory.
  If you use IO redirection, you must define the programming language with
  --syntax.

- Generate HTML with embedded CSS definitions and line numbers:
  highlight -i main.cpp -o main.cpp.html --include-style --line-numbers

- Generate HTML with inline CSS definitions:
  highlight -i main.cpp -o main.cpp.html --inline-css

- Generate HTML using "horstmann" source formatting style and "neon" colour theme:
  highlight -i main.cpp -o main.cpp.html --reformat horstmann --style neon

- Generate LaTeX:
  highlight --out-format=latex -i main.cpp -o main.cpp.tex

  The following output formats may be used with --out-format:
  html:     HTML5
  xhtml:    XHTML 1.1
  tex:      Plain TeX
  latex:    LaTeX
  rtf:      RTF
  odt:      OpenDocument Text (Flat XML)
  ansi:     Terminal 16 color escape codes
  xterm256: Terminal 256 color escape codes
  svg:      SVG
  bbcode:   BBCode
  pango:    Pango Markup

  Default output is HTML if no other format is specified.

- Customize font settings:
  highlight --syntax ada --out-format=xhtml --font-size 12 --font Consolas,\'Courier\ New\'
  highlight --syntax ada --out-format=latex --font-size tiny --font sffamily

- Define an output directory:
  highlight -d some/target/dir/ *.cpp *.h
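
The single-format commands above can be combined in a small wrapper script. The following is a minimal sketch and not part of highlight: the helper names out_name and render_all are made up for illustration, and the output file names are chosen by the script itself. It relies only on the --out-format, -i and -o options shown above.

```shell
# Sketch only: out_name and render_all are hypothetical helper names.

# Compose the output file name for an input file and an --out-format value.
out_name() {
    # $1 = input file, $2 = value passed to --out-format
    case "$2" in
        latex) printf '%s.tex\n' "$1" ;;      # same naming as the LaTeX example above
        *)     printf '%s.%s\n' "$1" "$2" ;;  # naming chosen by this script
    esac
}

# Render one input file once per requested output format.
render_all() {
    # $1 = input file, remaining arguments = output formats
    src=$1
    shift
    for fmt in "$@"; do
        highlight --out-format="$fmt" -i "$src" -o "$(out_name "$src" "$fmt")" || return 1
    done
}
```

With highlight on the PATH, render_all main.cpp html latex rtf would request main.cpp.html, main.cpp.tex and main.cpp.rtf in the working directory.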

CLI options

The command line version of highlight offers the following options:

USAGE: highlight [OPTIONS]... [FILES]...

General options:

 -B, --batch-recursive=<wc>     convert all matching files, searches subdirs
                                  (Example: -B '*.cpp')
 -D, --data-dir=<directory>     set path to data directory (deprecated)
     --config-file=<file>       set path to a lang or theme file
 -d, --outdir=<directory>       name of output directory
 -h, --help                     print this help
 -i, --input=<file>             name of single input file
 -o, --output=<file>            name of single output file
 -P, --progress                 print progress bar in batch mode
 -q, --quiet                    suppress progress info in batch mode
 -S, --syntax=<type>            specify type of source code
 -v, --verbose                  print debug info
     --force                    generate output if input syntax is unknown
     --list-scripts=<type>      list installed scripts
                                  <type> = [langs, themes, plugins]
     --plug-in=<script>         execute Lua plug-in script; repeat option to
                                  execute multiple plug-ins
     --plug-in-param=<value>    set plug-in input parameter
     --print-config             print path configuration
     --print-style              print stylesheet only (see --style-outfile)
     --skip=<list>              ignore listed unknown file types
                                  (Example: --skip='bak;c~;h~')
     --start-nested=<lang>      define nested language which starts input
                                  without opening delimiter
     --validate-input           test if input is text, remove Unicode BOM
     --version                  print version and copyright information

Output formatting options:

 -O, --out-format=<format>      output file in given format
                                  <format>=[html, xhtml, latex, tex, odt, rtf,
                                  ansi, xterm256, truecolor, bbcode, pango, svg]
 -c, --style-outfile=<file>     name of style file or print to stdout, if
                                  'stdout' is given as file argument
 -e, --style-infile=<file>      to be included in style-outfile (deprecated)
                                  use a plug-in file instead
 -f, --fragment                 omit document header and footer
 -F, --reformat=<style>         reformats and indents output in given style
                                  <style> = [allman, banner, gnu,
                                  horstmann, java, kr, linux, mozilla, otbs, vtk,
                                  stroustrup, whitesmith, google, pico, lisp]
 -I, --include-style            include style definition in output file
 -J, --line-length=<num>        line length before wrapping (see -V, -W)
 -j, --line-number-length=<num> line number width incl. left padding (default: 5)
 -k, --font=<font>              set font (specific to output format)
 -K, --font-size=<num?>         set font size (specific to output format)
 -l, --line-numbers             print line numbers in output file
 -m, --line-number-start=<cnt>  start line numbering with cnt (assumes -l)
 -s, --style=<style>            set colour style (theme)
 -t, --replace-tabs=<num>       replace tabs by <num> spaces
 -T, --doc-title=<title>        document title
 -u, --encoding=<enc>           set output encoding which matches input file
                                  encoding; omit encoding info if set to NONE
 -V, --wrap-simple              wrap lines after 80 (default) characters w/o
                                  indenting function parameters and statements
 -W, --wrap                     wrap lines after 80 (default) characters
     --wrap-no-numbers          omit line numbers of wrapped lines
                                  (assumes -l)
 -z, --zeroes                   pad line numbers with 0's
     --delim-cr                 set CR as end-of-line delimiter (MacOS 9)
     --keep-injections          output plug-in injections in spite of -f
     --kw-case=<case>           change case of case insensitive keywords
                                  <case> =  [upper, lower, capitalize]
     --no-trailing-nl           omit trailing newline

(X)HTML output options:

 -a, --anchors                  attach anchor to line numbers
 -y, --anchor-prefix=<str>      set anchor name prefix
 -N, --anchor-filename          use input file name as anchor prefix
 -C, --print-index              print index with hyperlinks to output files
 -n, --ordered-list             print lines as ordered list items
     --class-name=<name>        set CSS class name prefix;
                                  omit class name if set to NONE
     --inline-css               output CSS within each tag (verbose output)
     --enclose-pre              enclose fragmented output with pre tag 
                                  (assumes -f)

LaTeX output options:

 -b, --babel                    disable Babel package shorthands
 -r, --replace-quotes           replace double quotes by \dq{}
     --pretty-symbols           improve appearance of brackets and other symbols

RTF output options:

     --page-color               include page color attributes
 -x, --page-size=<ps>           set page size 
                                  <ps> = [a3, a4, a5, b4, b5, b6, letter]
     --char-styles              include character stylesheets

SVG output options:

     --height                   set image height (units allowed)
     --width                    set image width (see --height)

GNU source-highlight compatibility options:

     --doc                      create stand alone document
     --no-doc                   cancel the --doc option
     --css=filename             the external style sheet filename
     --src-lang=STRING          source language
 -t, --tab=INT                  specify tab length
 -n, --line-number[=0]          number all output lines, optional padding
     --line-number-ref[=p]      number all output lines and generate an anchor,
                                  made of the specified prefix p + the line
                                  number  (default='line')
     --output-dir=path          output directory
     --failsafe                 if no language definition is found for the
                                  input, it is simply copied to the output

If no in- or output files are specified, stdin and stdout will be used.
HTML will be generated unless another output format is given. Style definitions
are stored in highlight.css (HTML, XHTML, SVG) or highlight.sty (LaTeX, TeX)
if neither -c nor -I is given.
Reformatting code (-F) will only work with C, C++, C# and Java input files.
Wrapping lines with -V or -W will cause faulty highlighting of long single
line comments and directives. Use with caution.
See the README files for how to apply plug-ins to customize the output.

GUI options

The Graphical User Interface offers a subset of the CLI features. It includes a dynamic preview of the output file's appearance. Please see screenshots and screencasts.

Input and output

If no input or output file name is defined by --input and --output options, highlight will use stdin and stdout for file processing.

If no input file name is given with --input or as a command line argument, highlight cannot determine the language type from the file extension (although some scripting languages are detected by the shebang in the first input line). In this case you have to pass the language with --syntax; in most cases the value is the file suffix of the source file.

highlight test.py
highlight < test.py --syntax py       # --syntax option necessary
cat test.py | highlight --syntax py

If a language uses several suffixes (such as C, cc, cpp and h for C++ files), they are mapped to its language definition in $CONF_DIR/filetypes.conf.
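
When input arrives on stdin, the --syntax value has to come from somewhere else. The sketch below derives it from the file suffix, as suggested above. The function names syntax_from_name and hl_stdin are hypothetical, and the sketch assumes that the suffix is a valid --syntax value for the file in question.

```shell
# Sketch only: syntax_from_name and hl_stdin are hypothetical helper names.

# Print the suffix of a file name (text after the last dot), or fail if there is none.
syntax_from_name() {
    base=${1##*/}        # drop any directory part
    ext=${base##*.}      # keep the text after the last dot
    # No dot at all leaves ext equal to base; a trailing dot leaves it empty.
    if [ "$ext" = "$base" ] || [ -z "$ext" ]; then
        return 1
    fi
    printf '%s\n' "$ext"
}

# Feed a file to highlight via stdin, passing its suffix as --syntax.
hl_stdin() {
    syn=$(syntax_from_name "$1") || {
        echo "cannot derive a syntax from '$1'; pass --syntax yourself" >&2
        return 1
    }
    highlight --syntax "$syn" < "$1"
}
```

For example, hl_stdin test.py behaves like highlight --syntax py < test.py, while a file without a suffix, such as Makefile, is rejected instead of being passed on without --syntax.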

Highlight enters batch processing mode if multiple input files are given or if --batch-recursive is set. In batch mode, highlight saves the generated files under the original file name, appending the extension of the chosen output type.
If files in the input directories share the same name, the output files are prefixed with their source path name.
The --outdir option is recommended in batch mode. Use --quiet to improve performance (recommended for usage in shell scripts).
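
A batch run for use in a shell script might look like the following sketch. The function name hl_batch is hypothetical. Creating the output directory beforehand is a precaution taken by the sketch, not a documented requirement of highlight.

```shell
# Sketch only: hl_batch is a hypothetical helper name.

# Convert all files matching a wildcard, searching subdirectories,
# and collect the results in one output directory.
hl_batch() {
    # $1 = output directory, $2 = wildcard handed to --batch-recursive
    outdir=$1
    pattern=$2
    mkdir -p "$outdir" || return 1
    # The wildcard stays quoted so that highlight, not the shell, expands it.
    highlight --batch-recursive="$pattern" --outdir="$outdir" --quiet
}
```

Calling hl_batch html_out '*.cpp' corresponds to the -B '*.cpp' example in the option list, combined with --outdir and --quiet.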

HTML, TeX, LaTeX and SVG output

The HTML, TeX, LaTeX and SVG output formats can reference style definition files which contain the formatting information (stylesheets).

In HTML and SVG output, this file contains CSS definitions and is saved as 'highlight.css'. In LaTeX and TeX, it contains macro definitions, and is saved as 'highlight.sty'.

Name and path of the stylesheet may be modified with --style-outfile. If the --outdir option is given, all generated output, including stylesheets, is stored in this directory.

Use --include-style to embed style information in the output documents without referencing a stylesheet.

Referenced style definitions have the advantage that all formatting information is kept in a single file, so a change to that file affects all referencing documents.

With --style-infile you define a file to be included in the final formatting information of the document. This way you enhance or redefine the default highlight style definitions without editing generated code.
Note: Using a plug-in script is the preferred way to enhance styling.
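
The choice between a referenced and an embedded stylesheet can be wrapped as in the sketch below. The function name hl_html and the stylesheet name site.css are hypothetical. Only --style-outfile and --include-style, as described above, are used.

```shell
# Sketch only: hl_html and site.css are hypothetical names.

# Produce HTML that either references a shared stylesheet or embeds its styles.
hl_html() {
    # $1 = mode (linked|embedded), $2 = input file, $3 = output file
    case "$1" in
        linked)   highlight -i "$2" -o "$3" --style-outfile=site.css ;;
        embedded) highlight -i "$2" -o "$3" --include-style ;;
        *)        echo "unknown mode: $1" >&2; return 2 ;;
    esac
}
```

The linked mode suits a set of pages that should share one stylesheet. The embedded mode suits a single file that has to be passed around on its own.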

GNU source-highlight compatibility

The command line interface is extensively harmonised with source-highlight (http://www.gnu.org/software/src-highlite/).
The following highlight options have the same meaning as in source-highlight:

 --input, --output, --help, --version, --out-format, --title, --data-dir,
 --verbose, --quiet, --ctags-file

These options were added to enhance compatibility:

 --css, --doc, --failsafe, --line-number, --line-number-ref, --no-doc, --tab,
 --output-dir, --src-lang

These switches provide a common highlighter interface for scripts, plugins etc.

Advanced options

Prevent parsing of binary input files

If highlight may be invoked with arbitrary input, you can prevent parsing of binary files with --validate-input. This flag causes highlight to match the input file header against a list of magic numbers. If a binary file type is detected, highlight quits with an error message.
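
In a script that processes uploaded or otherwise unchecked files, the check can be used as in the sketch below. The function name hl_checked is hypothetical. The sketch assumes that highlight returns a non-zero exit status when it quits with an error; this is not stated above and should be verified against the installed version.

```shell
# Sketch only: hl_checked is a hypothetical helper name.
# Assumption: highlight exits with a non-zero status when it rejects a file.

hl_checked() {
    # $1 = output directory, remaining arguments = input files
    outdir=$1
    shift
    for f in "$@"; do
        if highlight --validate-input -i "$f" -o "$outdir/${f##*/}.html"; then
            echo "converted: $f"
        else
            # Report on stderr and continue with the next file.
            echo "skipped: $f" >&2
        fi
    done
}
```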

Highlight nested code without starting delimiter

If a file starts with an embedded code section which misses the starting delimiter, the --start-nested option will switch to the nested language mode. This can happen with LuaTeX files:

highlight luatex.tex --out-format=latex --start-nested=inc_luatex

Inc_luatex is a Lua definition with TeX line comments. Note that the nested code section has to end with the ending delimiter defined in the host language definition.

Tips and tricks

Test new configuration scripts

The option --config-file helps to test new config files before installing them. The given file must be a lang or theme file.

highlight --config-file xxx.lang --config-file yyy.theme -I

Debug language definitions

Use --verbose to display the Lua and syntax data.

Remove a UTF-8 BOM

Use --validate-input to get rid of the UTF-8 byte order mark.


Configuration

File format

Configuration files are Lua scripts. These constructs are sufficient to edit the scripts:

Variable assignment:
name = value
(variables have no type, only values have)

string1="string literal with escape: \n"
string2=[[raw string without escape sequence]]

If a raw string value starts with "[" or ends with "]", pad the bracket with a
space to avoid a syntax error. Highlight will strip the string.

-- line comment
--[[ block comment ]]

array = { first=1, second="2", 3, { 4,5 } }
Arrays may have identifiers and can be nested.

Please refer to the Lua manual for more details about the Lua syntax.

Regular expressions

Please see Regular expressions for the supported regex constructs.

Language definitions

A language definition describes all elements of a programming language which will be highlighted by different colours and font types. Save the new file in $HL_DIR/langDefs, using the following naming convention:

<usual extension of sourcecode files>.lang

Examples: PHP -> php.lang, Java -> java.lang

If a language uses several suffixes, list them in $HL_DIR/filetypes.conf.

Keywords = { Id, List|Regex, Group? }

  Id:    Integer, keyword group id (values 1-4, can be reused for several keyword
         groups)
  List:  List, list of keywords
  Regex: String, regular expression
  Group: Integer, capturing group id of regular expression, defines part of regex
	which should be returned as keyword (optional; if not set, the match
	with the highest group number is returned (counts from left to right))

Comments = { {Block, Nested?, Delimiter=} }

  Block:     Boolean, true if comment is a block comment
  Nested:    Boolean, true if block comments can be nested (optional)
  Delimiter: List, contains open delimiter regex (line comment) or open and close
	    delimiter regexes (block comment)

Strings = { Delimiter|DelimiterPairs={Open, Close, Raw?}, Escape?, RawPrefix? }

  Delimiter:      String, regular expression which describes string delimiters
  DelimiterPairs: List, includes open and close delimiters if not equal (regex),
			includes optional Raw flag as boolean which marks
			delimiter pair as raw string
  Escape:         String, regular expression of escape sequences (optional)
  RawPrefix:      String, defines raw string indicator (optional)

PreProcessor = { Prefix, Continuation? }

  Prefix:        String, regular expression which describes open delimiter
  Continuation:  String, contains continuation character (optional)

NestedSections = {Lang, Delimiter= {} }

  Lang:      String, name of nested language
  Delimiter: List, contains open and close delimiters of the code section

Description:       String, Defines syntax description

Digits:            String, Regular expression which defines digits (optional)

Identifiers:       String, Regular expression which defines identifiers

Operators:         String, Regular expression which defines operators

EnableIndentation: Boolean, set true if syntax may be reformatted and indented

IgnoreCase:        Boolean, set true if keyword case should be ignored

Script Environment:

The following variables are defined when a script is executed:

hl_lang_dir: current path of language definitions (use with dofile)

Identifiers: Default regex for identifiers
Digits:      Default regex for numbers

Further variables are integers which represent the internal highlighting states.


Hook functions:

OnStateChange(oldState, newState, token)

Hook Event: Highlighting parser state change
Input:      Old state, intended new state and the current token which led to
            the new state
Returns:    Correct state to continue

See the file README_REGEX for a detailed description of the regular expression
syntax.


Description="C and C++"

Keywords={
  {  Id=1,
   List={"goto", "break", "return", "continue", "asm", "case", "default",
         -- [..]
        }
  },
  -- [..]
}

Strings = {
  Delimiter=[["|']],
  RawPrefix="R",
}

Comments = {
   { Block=true,
     Nested=false,
     Delimiter = { [[\/\*]], [[\*\/]] }  },
   { Block=false,
     Delimiter = { [[//]] } }
}

IgnoreCase=false

PreProcessor = {
  Prefix=[[#]],
  Continuation="\\",
}

Operators=[[\(|\)|\[|\]|\{|\}|\,|\;|\.|\:|\&|\<|\>|\!|\=|\/|\*|\%|\+|\-|\~]]

EnableIndentation=true

Theme definitions

Colour themes contain the formatting information of the language elements which are described in language definitions.

The files have to be stored as *.theme in $HL_DIR/themes. Apply a style with the --style option.

Format attributes:

Attributes = {Colour, Bold?, Italic?, Underline? }

Colour:    String, defines colour in HTML hex notation ("#rrggbb")
Bold:      Boolean, true if font should be bold (optional)
Italic:    Boolean, true if font should be italic (optional)
Underline: Boolean, true if font should be underlined (optional)

Theme elements:

Description:   String, Defines theme description

Default        = Attributes (Colour of unspecified text)

Canvas         = Attributes (Background colour)

Number         = Attributes (Formatting of numbers)

Escape         = Attributes (Formatting of escape sequences)

String         = Attributes (Formatting of strings)

PreProcessor   = Attributes (Formatting of preprocessor directives)

StringPreProc  = Attributes (Formatting of strings within
                             preprocessor directives)

BlockComment   = Attributes (Formatting of block comments)

LineComment    = Attributes (Formatting of line comments)

LineNum        = Attributes (Formatting of line numbers)

Operator       = Attributes (Formatting of operators)

Keywords= { Attributes1, Attributes2, Attributes3, Attributes4, ... }

AttributesN: Formatting of keyword group N. There should be at least four items
             to match the number of keyword groups defined in the language
             definitions.


Default        = { Colour="#000000" }
Canvas         = { Colour="#ffffff" }
Number         = { Colour="#000000" }
Escape         = { Colour="#bd8d8b" }
String         = { Colour="#bd8d8b" }
StringPreProc  = { Colour="#bd8d8b" }
BlockComment   = { Colour="#ac2020", Italic=true }
PreProcessor   = { Colour="#000000" }
LineNum        = { Colour="#555555" }
Operator       = { Colour="#000000" }
LineComment = BlockComment

Keywords = {
  { Colour= "#9c20ee", Bold=true },
  { Colour= "#208920" },
  { Colour= "#0000ff" },
  { Colour= "#000000" },
}

Keyword groups

You may define custom keyword groups and corresponding highlighting styles. This is useful if you want to highlight functions of a third party library, macros, constants etc.

You define a new group in two steps:

 1. Define a new group in your language definition (lang file):

    Keywords = {
      -- add your keyword description:
      {Id=5, List = {"ERROR", "DEBUG", "WARN"} }
    }

 2. Add a corresponding highlighting style in your colour theme (theme file):

    Keywords= {
      --add your keyword style as 5th item in the list:
      { Colour= "#ff0000", Bold=true },
    }

It is recommended to define keyword groups in user-defined plugin scripts to avoid editing of original highlight files.
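
The two steps can be tried without touching installed files by writing throwaway definitions and passing them with --config-file. The sketch below is an illustration under assumptions: write_demo_config, demo.lang and demo.theme are hypothetical names, and the minimal file contents are modelled on the samples in this manual and have not been checked against a particular highlight release.

```shell
# Sketch only: write_demo_config, demo.lang and demo.theme are hypothetical names.

write_demo_config() {
    # $1 = directory that receives demo.lang and demo.theme
    cat > "$1/demo.lang" <<'EOF' || return 1
Description="Demo log syntax"
Keywords={
  { Id=1, List={"begin", "end"} },
  -- custom group, styled by the fifth entry of the theme's Keywords list
  { Id=5, List={"ERROR", "DEBUG", "WARN"} },
}
Strings={ Delimiter=[["]] }
Comments={ { Block=false, Delimiter={ [[#]] } } }
IgnoreCase=false
Operators=[[\=|\:]]
EOF
    cat > "$1/demo.theme" <<'EOF'
Description="Demo theme"
Default       = { Colour="#000000" }
Canvas        = { Colour="#ffffff" }
Number        = { Colour="#000000" }
Escape        = { Colour="#bd8d8b" }
String        = { Colour="#bd8d8b" }
StringPreProc = { Colour="#bd8d8b" }
BlockComment  = { Colour="#ac2020", Italic=true }
LineComment   = BlockComment
PreProcessor  = { Colour="#000000" }
LineNum       = { Colour="#555555" }
Operator      = { Colour="#000000" }
Keywords = {
  { Colour="#9c20ee", Bold=true },
  { Colour="#208920" },
  { Colour="#0000ff" },
  { Colour="#000000" },
  { Colour="#ff0000", Bold=true },
}
EOF
}

# Usage (requires highlight on the PATH):
#   write_demo_config . &&
#   printf 'begin\nERROR disk full\nend\n' |
#       highlight --config-file demo.lang --config-file demo.theme -I
```

If the installed version accepts these files, ERROR should be rendered with the fifth keyword style (bold red).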


Plug-ins

The --plug-in option receives the name of a Lua script which can override and enhance the settings of theme and language definition files. Plug-ins make it possible to apply custom settings without editing installed highlight configuration files.
See Plug-Ins for file format and examples.

File mapping

The script filetypes.conf assigns file extensions and shebang descriptions to language definitions.


  {  Lang, Extensions|Shebang },

Lang:       String, name of language definition
Extensions: list of strings, contains file extensions referring to "Lang"
Shebang:    String, Regular expression which matches the first line of the input

Edit the file gui_files/ext/fileopenfilter.conf to add new syntax types to the file open filter of the GUI.

Config file search

Since release 3.14 the configuration scripts are searched in the following directories:

1. ~/.highlight/
2. user defined directory set with --data-dir (deprecated option)
3. /usr/share/highlight/
4. /etc/highlight/ (location of filetypes.conf)
5. current working directory (fallback)

These subdirectories are expected to contain the corresponding scripts:

- langDefs: *.lang
- themes: *.theme
- plugins: *.lua

A custom filetypes.conf may be placed directly in ~/.highlight/. This search order enables you to enhance the installed scripts without the need to copy all preinstalled files somewhere else.
As the --plug-in option of older releases accepted absolute paths only, the given plugin scripts will be searched in the directories above only if the absolute file path access fails.
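
The search order can be mimicked in a few lines of shell, which helps when checking which copy of a script would take precedence. This is an illustration of the list above, not highlight's own lookup code, and the function names hl_find and hl_find_default are hypothetical. The --print-config option reports the paths highlight actually uses.

```shell
# Sketch only: hl_find and hl_find_default are hypothetical helper names.

# Print the first existing copy of a script, trying the directories in order.
hl_find() {
    # $1 = relative script path, e.g. themes/neon.theme; rest = directories
    rel=$1
    shift
    for dir in "$@"; do
        if [ -n "$dir" ] && [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
            return 0
        fi
    done
    return 1
}

# Apply the documented order; $2 stands for a --data-dir value and may be empty.
hl_find_default() {
    hl_find "$1" "$HOME/.highlight" "$2" /usr/share/highlight /etc/highlight .
}
```

For example, hl_find_default themes/neon.theme prints the copy in ~/.highlight/ if one exists there, and otherwise falls through to the later directories.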

Embedding highlight

Sample scripts

See the /examples subdirectory in the highlight source directory for some example scripts in PHP, Perl and Python which invoke highlight and retrieve its output as string. These scripts may be used to develop plug-ins for other applications.
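
The same idea, invoking highlight and capturing its output as a string, can be sketched in shell. The function name hl_fragment is hypothetical. It uses only the --syntax, --fragment, --enclose-pre and --inline-css options from the option list above.

```shell
# Sketch only: hl_fragment is a hypothetical helper name.

# Read source code from stdin and print a self-contained HTML fragment.
hl_fragment() {
    # $1 = --syntax value
    highlight --syntax "$1" --fragment --enclose-pre --inline-css
}

# Usage (requires highlight on the PATH):
#   snippet=$(printf 'int main() { return 0; }\n' | hl_fragment cpp)
#   printf '%s\n' "$snippet"
```

Using --inline-css keeps the fragment independent of highlight.css, at the cost of more verbose output.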

SWIG interface

A SWIG interface file is located in /examples/swig. See README_SWIG for installation instructions and the example scripts as programming reference.

Third party scripts and plug-ins

See the /examples/web_plugins subdirectory in the highlight installation for some plugins which integrate highlight in Wiki and blogging software.

Other uses of highlight can be found online.

Building and installing

Building dependencies

Highlight is known to compile with gcc and clang. It depends on Boost headers and Lua 5.x/LuaJIT developer packages. The optional GUI depends on Qt5 developer packages.
Please see the makefile for further options.

Packaging example

See Packaging resources for Debian and Fedora packaging examples.

More information can be found in the Wiki.
