HRC Language Reference

 20 February 2003

This version:
take5.alpha3: 20 February 2003
Previous versions:
take5.alpha2: 30 January 2003
Author:
Cail Lomecb (Igor Ruskih) <ruiv@uic.nnov.ru> <cail@nm.ru>

Abstract

This reference defines the syntax and semantics of the HRC language, which the Colorer library uses to describe the syntax and lexical structure of a target programming language. The library uses this description to parse and colorize text in editors and other systems.


Table of Contents

1. Introduction
2. Core Syntax
2.1. File Types
2.2. Schemas
2.3. Namespaces
3. Scheme syntax
3.1. Scheme boundaries
3.2. Keyword lists
3.3. Regular Expressions
3.4. Blocked context switch
4. Inter-scheme links
4.1. Inheritance
4.2. Schemes substitutions

Appendixes

A. Regular Expressions syntax
1. Introduction
2. Syntax
3. Metacharacters
4. Extended metacharacter
5. Operators
6. Extended operators
7. Examples
B. HRC Coding Recommendations
C. XML Schema for HRC Language
References

1. Introduction

2. Core Syntax

2.1. File Types

2.2. Schemas

2.3. Namespaces

3. Scheme syntax

3.1. Scheme boundaries

3.2. Keyword lists

3.3. Regular Expressions

3.4. Blocked context switch

4. Inter-scheme links

4.1. Inheritance

4.2. Schemes substitutions

A. Regular Expressions syntax

1. Introduction

All work of the Colorer library is based on regular expressions (RE). They allow you to create universal syntax highlighting rules in HRC. This appendix describes the regular expression syntax used in the HRC language. I assume that you already know what regular expressions are, and why and how they are used. I first describe the regexp syntax and then try to help you understand it. You can also consult other documents, such as the Perl regexp manual (man perlre).

A regular expression consists of a sequence of characters. Some of them are simple characters; others are special metacharacters. All metacharacters (escapes) fall into three categories: zero-length metacharacters (word boundaries and so on), class metacharacters (\w, \s, . and so on), and operators. An operator can be applied to a single character, to a block enclosed in brackets, or to another operator. Brackets can group any sequence of characters. The base syntax of regular expressions in the HRC language follows Perl regexps; the extended operators differ in places.

2. Syntax

All regexps must be enclosed in slashes: /.../. Parameters may follow the closing slash:

  • i - ignore character case
  • x - ignore literal spaces and line breaks (CR/LF) in the pattern, for readability
  • s - treat the regexp as single-line, which means that the '.' class also includes the \r and \n symbols

Each symbol in the RE is compared linearly with the target string. Everything that does not look like a metacharacter is treated as a simple character.
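
The parameter handling described above can be sketched in Python. This is only an illustration under stated assumptions: `parse_hrc_regexp` and the `FLAGS` table are hypothetical names, and Python's `re` module stands in for Colorer's own engine, whose behaviour may differ in detail (for example, `re.VERBOSE` also treats `#` as the start of a comment, and Python's `.` excludes only `\n` by default).

```python
import re

# Assumed mapping from the HRC parameters to Python's re flags (illustrative only).
FLAGS = {"i": re.IGNORECASE, "x": re.VERBOSE, "s": re.DOTALL}

def parse_hrc_regexp(literal):
    """Split a '/.../params' literal and compile it with Python's re (a stand-in engine)."""
    if not literal.startswith("/"):
        raise ValueError("regexp must start with '/'")
    end = literal.rfind("/")
    if end == 0:
        raise ValueError("regexp must end with '/' followed by optional parameters")
    body, params = literal[1:end], literal[end + 1:]
    flags = 0
    for p in params:
        if p not in FLAGS:
            raise ValueError("unknown parameter: " + p)
        flags |= FLAGS[p]
    return re.compile(body, flags)

# With i and x, the spaces in the pattern are ignored and case does not matter.
rx = parse_hrc_regexp("/ FOO bar /ix")
print(bool(rx.search("foobar and two other foos")))
```

Splitting at the last slash keeps any escaped slashes inside the pattern body intact.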

3. Metacharacters

Table A.1. Metacharacters

^        Match the beginning of the line
$        Match the end of the line
.        Match any character (except \r\n)
[...]    Match characters in set
[^...]   Match characters not in set. Inside a set all operators are disabled, but other metacharacters and the range operator can be used: a-z means all characters from the first to the second (a to z)
\#       The symbol '#' that follows the backslash, taken literally (any symbol except a-z and 1-9)
\b       Start of word
\B       End of word
\xNN     NN - ASCII char (hex)
\n       0x0A (lf)
\r       0x0D (cr)
\t       0x09 (tab)
\s       tab/space/cr/lf
\S       Non-space
\w       Word symbol (chars, digits, _)
\W       Non-word symbol
\d       Digit
\D       Non-digit
\u       Uppercase symbol
\l       Lowercase symbol
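
As a quick cross-check of the control escapes and class metacharacters, here is a small sketch using Python's `re` module as a stand-in engine (an assumption; Colorer's engine is separate). It shows that LF is 0x0A, CR is 0x0D and tab is 0x09, and how `\w` and `\d` pick out runs of characters. The sample line is made up for illustration. Python's `re` has no direct counterpart of `\u` and `\l`; a set such as `[A-Z]` is only an ASCII approximation.

```python
import re

# Code points of the control escapes.
print(hex(ord("\n")), hex(ord("\r")), hex(ord("\t")))

# \xNN names a character by its hexadecimal code (0x41 is "A").
print(re.fullmatch(r"\x41", "A") is not None)

# Class metacharacters applied to a sample line.
line = "count_1 = 42\t# ok"
words = re.findall(r"\w+", line)   # runs of letters, digits and "_"
digits = re.findall(r"\d+", line)  # runs of digits
print(words, digits)
```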

4. Extended metacharacter

These metacharacters are incompatible with Perl.

Table A.2. Extended Metacharacters

\c       Means 'non-word before'
\N       Link inside the regexp to one of its bracket pairs. N is the number of the required bracket pair. This operator works only with non-operator symbols in a bracket.
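
Reading `\N` as a back-reference to bracket pair N, the idea resembles `\1` in Python's `re` module. The sketch below rests on that analogy, which is an assumption rather than a statement about Colorer's engine; the quoted-word pattern is invented for illustration.

```python
import re

# \1 refers back to whatever the first bracket pair matched,
# so the closing quote must be the same character as the opening one.
quoted = re.compile(r"(['\"])\w+\1")

same = quoted.search("say 'hello' now")    # same quote on both sides
mixed = quoted.search("say 'hello\" now")  # mismatched quotes

print(same is not None, mixed is not None)
```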

The following extended metacharacters depend on the highlighting context and can be disabled at compile time.

Table A.3. Extended Metacharacters

~        Matches the start of the parent scheme (the end of the Start regexp match)
\m       Change start of regexp
\M       Change end of regexp
\yN      Link to an external regexp (from the End regexp to the Start regexp). N is the number of the required bracket pair.

5. Operators

Operators cannot be used without a preceding character sequence. Each operator must apply to an appropriate character, metacharacter, or bracketed block that combines them.

Table A.4. Operators

( )      Group and remember characters to form one pattern.
|        Match previous or next pattern.
*        Match previous pattern 0 or more times.
+        Match previous pattern 1 or more times.
?        Match previous pattern 0 or 1 times.
{n}      Repeat n times.
{n,}     Repeat n or more times.
{n,m}    Repeat from n to m times.

Adding ? after an operator makes it non-greedy; for example, the * operator becomes non-greedy when written as *?. Greedy operators take as much of the string as they can; non-greedy operators take as little as possible.
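
The difference between greedy and non-greedy matching is easiest to see on a concrete string. The sketch below uses Python's `re` module, whose `*` and `*?` follow the same Perl-style convention; the sample text is made up, and Colorer's engine is assumed to behave the same way for this simple case.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.search(r"<.*>", text).group(0)

# Non-greedy: .*? stops at the first ">" it can.
lazy = re.search(r"<.*?>", text).group(0)

print(greedy)
print(lazy)
```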

6. Extended operators

Table A.5. Extended Operators

?#N      Look-behind. N is the number of symbols.
?~N      Inverted look-behind.
?=       Look-ahead.
?!       Inverted look-ahead.

Note that the last two operators also exist in Perl, in the form (?=foobar), whereas Colorer uses the syntax (foobar)?=.
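
To illustrate what look-ahead and look-behind do, here is a sketch in Python, which uses the Perl-style prefix form `(?=...)` rather than Colorer's postfix form `(...)?=`. Only the semantics are meant to carry over; the Python look-behind syntax `(?<=...)` is not the HRC `?#N` syntax, and the sample strings are invented.

```python
import re

# Look-ahead: "foo" only when "bar" follows; "bar" is not part of the match.
ahead = re.search(r"foo(?=bar)", "foobar")
print(ahead.group(0), ahead.span())

# Inverted look-ahead: "foo" only when "bar" does not follow.
not_ahead = re.search(r"foo(?!bar)", "foobar")
print(not_ahead is None)

# Look-behind: digits only when preceded by "$".
behind = re.search(r"(?<=\$)\d+", "cost: $42")
print(behind.group(0))

# Inverted look-behind: digits not preceded by "$" or by another digit.
not_behind = re.findall(r"(?<![$\d])\d+", "$42 and 17")
print(not_behind)
```

In all four cases the text examined by the look-around is tested but never consumed, which is why the match spans cover only the main pattern.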

7. Examples

Example A.1. RE examples

/foobar/

will match "foobar", "foobar barfoo"

/ FOO bar /ix

will match "foobar", "FOOBAR", "foobar and two other foos"

/(foo)?bar/

will match "foobar", "bar"

/^foobar$/

will match only "foobar"

/([\d\.])+/

will match any number

/(foo|bar)+/

will match "foofoofoobarfoobar", "bar"

/f[obar]+r/

will match "foobar", "for", "far"
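
The examples above can be checked mechanically. The sketch below runs them through Python's `re` module, used here as a stand-in for Colorer's engine (an assumption that holds only because these patterns use plain Perl-style syntax). The strings in the last column are additional non-matching inputs chosen for illustration; they do not come from the examples themselves.

```python
import re

# (pattern, flags, strings that should match, strings that should not match)
CASES = [
    (r"foobar", 0, ["foobar", "foobar barfoo"], ["barfoo"]),
    (r" FOO bar ", re.I | re.X, ["foobar", "FOOBAR", "foobar and two other foos"], ["foo"]),
    (r"(foo)?bar", 0, ["foobar", "bar"], ["foo"]),
    (r"^foobar$", 0, ["foobar"], ["foobar barfoo"]),
    (r"([\d\.])+", 0, ["42", "3.14"], ["abc"]),
    (r"(foo|bar)+", 0, ["foofoofoobarfoobar", "bar"], ["baz"]),
    (r"f[obar]+r", 0, ["foobar", "for", "far"], ["fr"]),
]

def check(cases):
    """Return the (pattern, text) pairs that behave differently than expected."""
    failures = []
    for pattern, flags, should_match, should_not in cases:
        rx = re.compile(pattern, flags)
        failures += [(pattern, s) for s in should_match if not rx.search(s)]
        failures += [(pattern, s) for s in should_not if rx.search(s)]
    return failures

# An empty list means every example behaves as described.
print(check(CASES))
```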

B. HRC Coding Recommendations

For now, I want to say that the whole HRC base needs to be revised to take the newly introduced namespaces into account. Since none of this existed before, every scheme was named with its own prefix, and now all of that has to go. In other words, in the description of each file type the schemes should be renamed: drop the prefixes and give them normal names. The same applies to the use of entities and region definitions.

Where possible, avoid importing other types, because it reduces readability. The only exceptions are the def type and cases with very many references to objects of other types. scheme='c:StringCore' is much clearer than plain scheme='StringCore'.

Changes:

I may remove <import> altogether.

C. XML Schema for HRC Language

<schema targetNamespace="http://colorer.sf.net/2003/hrc" elementFormDefault="qualified"
  xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <simpleType name="REstring">
    <restriction base="xs:string">
      <whiteSpace value="collapse"/>
      <pattern value="/.*/[ix]*"/>
    </restriction>
  </simpleType>
  <simpleType name="REstring-or-null">
    <union memberTypes="REstring">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value=""/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
  <simpleType name="QName">
    <restriction base="xs:string"/>
  </simpleType>
  <attributeGroup name="regionX">
    <attribute name="region" type="QName"/>
    <attribute name="region0" type="QName"/>
    <attribute name="region1" type="QName"/>
    <attribute name="region2" type="QName"/>
    <attribute name="region3" type="QName"/>
    <attribute name="region4" type="QName"/>
    <attribute name="region5" type="QName"/>
    <attribute name="region6" type="QName"/>
    <attribute name="region7" type="QName"/>
    <attribute name="region8" type="QName"/>
    <attribute name="region9" type="QName"/>
    <attribute name="regiona" type="QName"/>
    <attribute name="regionb" type="QName"/>
    <attribute name="regionc" type="QName"/>
    <attribute name="regiond" type="QName"/>
    <attribute name="regione" type="QName"/>
    <attribute name="regionf" type="QName"/>
  </attributeGroup>
  <element name="hrc">
    <complexType>
      <sequence>
        <element name="annotation" type="annotation" minOccurs="0"/>
        <element name="prototype" type="prototype" minOccurs="0" maxOccurs="unbounded"/>
        <element name="type" type="filetype" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="version" type="xs:NMTOKEN" use="required"/>
    </complexType>
  </element>
  <complexType name="annotation">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="appinfo">
        <complexType mixed="true">
          <sequence minOccurs="0" maxOccurs="unbounded">
            <any processContents="lax"/>
          </sequence>
        </complexType>
      </element>
      <element name="documentation">
        <complexType mixed="true">
          <choice minOccurs="0" maxOccurs="unbounded">
            <element name="contributors" type="xs:string"/>
            <any namespace="##other" processContents="skip"/>
          </choice>
        </complexType>
      </element>
    </choice>
  </complexType>
  <complexType name="prototype">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <element name="location" minOccurs="0">
        <complexType>
          <attribute name="link" type="xs:anyURI" use="required"/>
        </complexType>
      </element>
      <element name="filename" type="filename" minOccurs="0" maxOccurs="unbounded"/>
      <element name="firstline" type="firstline" minOccurs="0" maxOccurs="unbounded"/>
      <element name="parameters" minOccurs="0">
        <complexType>
          <sequence minOccurs="0" maxOccurs="unbounded">
            <element name="param">
              <complexType>
                <attribute name="name" type="xs:string" use="required"/>
                <attribute name="value" type="xs:string" use="required"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="group" type="xs:Name"/>
    <attribute name="description" type="xs:string" use="required"/>
    <attribute name="targetNamespace" type="xs:anyURI"/>
  </complexType>
  <complexType name="firstline">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="1"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="filename">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="2"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="filetype">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="annotation" type="annotation"/>
      <element name="import" type="import"/>
      <element name="region" type="region"/>
      <element name="entity" type="entity"/>
      <element name="scheme" type="scheme"/>
    </choice>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="access" type="access" default="private"/>
  </complexType>
  <simpleType name="access">
    <annotation>
      <documentation>Deprecated???</documentation>
    </annotation>
    <restriction base="xs:string">
      <enumeration value="public"/>
      <enumeration value="private"/>
    </restriction>
  </simpleType>
  <complexType name="scheme">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="inherit" type="inherit"/>
        <element name="regexp" type="regexp"/>
        <element name="block" type="block"/>
        <element name="keywords" type="keywords"/>
      </choice>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="access" type="access" default="private"/>
  </complexType>
  <complexType name="import">
    <attribute name="type" type="xs:NCName" use="required"/>
  </complexType>
  <complexType name="entity">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="value" type="xs:string" use="required"/>
  </complexType>
  <complexType name="region">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="parent" type="QName"/>
    <attribute name="description" type="xs:string"/>
  </complexType>
  <complexType name="regexp">
    <simpleContent>
      <extension base="REstring-or-null">
        <attribute name="match" type="REstring"/>
        <attribute name="priority" type="priority" default="normal"/>
        <attributeGroup ref="regionX"/>
      </extension>
    </simpleContent>
  </complexType>
  <simpleType name="priority">
    <restriction base="xs:string">
      <enumeration value="low"/>
      <enumeration value="normal"/>
    </restriction>
  </simpleType>
  <complexType name="block">
    <sequence minOccurs="0">
      <element name="start" type="blockInner"/>
      <element name="end" type="blockInner"/>
    </sequence>
    <attribute name="start" type="REstring"/>
    <attribute name="end" type="REstring"/>
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="priority" type="priority" default="normal"/>
    <attribute name="content-priority" type="priority" default="normal"/>
    <attribute name="region" type="QName"/>
    <attribute name="region00" type="QName"/>
    <attribute name="region01" type="QName"/>
    <attribute name="region02" type="QName"/>
    <attribute name="region03" type="QName"/>
    <attribute name="region04" type="QName"/>
    <attribute name="region05" type="QName"/>
    <attribute name="region06" type="QName"/>
    <attribute name="region07" type="QName"/>
    <attribute name="region08" type="QName"/>
    <attribute name="region09" type="QName"/>
    <attribute name="region0a" type="QName"/>
    <attribute name="region0b" type="QName"/>
    <attribute name="region0c" type="QName"/>
    <attribute name="region0d" type="QName"/>
    <attribute name="region0e" type="QName"/>
    <attribute name="region0f" type="QName"/>
    <attribute name="region10" type="QName"/>
    <attribute name="region11" type="QName"/>
    <attribute name="region12" type="QName"/>
    <attribute name="region13" type="QName"/>
    <attribute name="region14" type="QName"/>
    <attribute name="region15" type="QName"/>
    <attribute name="region16" type="QName"/>
    <attribute name="region17" type="QName"/>
    <attribute name="region18" type="QName"/>
    <attribute name="region19" type="QName"/>
    <attribute name="region1a" type="QName"/>
    <attribute name="region1b" type="QName"/>
    <attribute name="region1c" type="QName"/>
    <attribute name="region1d" type="QName"/>
    <attribute name="region1e" type="QName"/>
    <attribute name="region1f" type="QName"/>
  </complexType>
  <complexType name="blockInner">
    <simpleContent>
      <extension base="REstring">
        <attributeGroup ref="regionX"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="inherit">
    <sequence>
      <element name="virtual" type="virtual" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="virtual">
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="subst-scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="keywords">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="word" type="word"/>
      <element name="symb" type="symb"/>
    </choice>
    <attribute name="ignorecase" default="yes">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value="yes"/>
          <enumeration value="no"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="region" type="QName"/>
    <attribute name="priority" type="priority" default="low"/>
    <attribute name="worddiv" type="xs:string"/>
  </complexType>
  <complexType name="symb">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
  <complexType name="word">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
</schema>
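
The REstring type in the schema can be approximated in a few lines of Python. This is a sketch under stated assumptions: `is_restring` is a hypothetical helper, `str.split` only approximates the XSD `whiteSpace="collapse"` facet, and `re.fullmatch` is used because XSD patterns are implicitly anchored at both ends.

```python
import re

# Pattern facet copied from the REstring type in the schema.
RESTRING = re.compile(r"/.*/[ix]*")

def is_restring(value):
    """Collapse whitespace, then test the REstring pattern against the whole value."""
    collapsed = " ".join(value.split())
    return RESTRING.fullmatch(collapsed) is not None

for candidate in ["/foobar/", "  / FOO bar /ix ", "foobar", "/foobar/s"]:
    print(repr(candidate), is_restring(candidate))
```

Under this reading, a value such as /foobar/s is rejected, because the pattern facet as printed lists only the i and x parameters.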

References

[XML 1.0] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 (Second Edition). W3C (World Wide Web Consortium), 2000.