Copyright © 2003 Cail Lomecb
This reference defines the syntax and semantics of the HRC language, which the Colorer library uses to describe the syntactic and lexical structure of a target programming language. The library uses this description to parse and colorize text in editors and other systems.
All work in the Colorer library is based on regular expressions (RE). They let you create universal syntax highlighting rules in HRC. This section describes the regular expression syntax used in the HRC language. I assume that you know what regular expressions are, and why and how they are used. First I describe the regexp syntax, and then I try to help you understand it. You can also consult other documents, such as the Perl regexp documentation (man perlre).
A regular expression consists of a sequence of characters. Some of them are ordinary characters, and some are special metacharacters. All metacharacters (escapes) fall into three categories: first, zero-length ones (word boundaries and so on); second, class metacharacters (\w, \s, and so on); third, operators. Operators can be applied to a single character, to a block enclosed in brackets, and to other operators. You can use brackets to group any sequence of characters. Regular expressions in the HRC language follow Perl regexps in their base syntax; the extended operators differ in some respects.
All regexps must be enclosed in slashes: /.../. Parameters may follow the closing slash; the REstring type in the schema included in this reference allows the letters i and x there.
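The slash-delimited form can be checked mechanically. The following is a minimal sketch in Python, not part of Colorer itself: it reuses the pattern /.*/[ix]* from the REstring type in the schema, and the helper name split_re_literal is hypothetical.

```python
import re

# Mirrors the REstring restriction from the schema: /.*/[ix]*
_RE_LITERAL = re.compile(r"/(.*)/([ix]*)", re.DOTALL)


def split_re_literal(text):
    """Return (body, flags) for a slash-delimited regexp, or None if malformed.

    Hypothetical helper for illustration; Colorer's own parser may differ.
    """
    m = _RE_LITERAL.fullmatch(text.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Because the body is matched greedily, the last slash in the string is taken as the closing delimiter, so slashes inside the body do not end the expression early.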
Table A.1. Metacharacters
^ | Match the beginning of the line |
$ | Match the end of the line |
. | Match any character (except \r\n) |
[...] | Match characters in set |
[^...] | Match characters not in set. Inside a set all operators are disabled, but you can use other metacharacters and the range operator: a-z means all characters from the first to the second (a to z) |
\# | The literal symbol '#' that follows the backslash (any symbol except a-z and 1-9) |
\b | Start of word |
\B | End of word |
\xNN | NN - ASCII char (hex) |
\n | 0x0A (lf) |
\r | 0x0D (cr) |
\t | 0x09 (tab) |
\s | tab/space/cr/lf |
\S | Non-space |
\w | Word symbol (letters, digits, _) |
\W | Non-word symbol |
\d | Digit |
\D | Non-Digit |
\u | Uppercase symbol |
\l | Lowercase symbol |
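As a quick cross-check of the control-character rows, here is a small Python sketch. It uses Python's re module only as a stand-in for a Perl-like engine; Python's \s also matches a few characters beyond tab, space, cr and lf, and Python has no \u or \l classes, so only the shared part is shown. The names ESCAPES, code_points and is_space_class are illustrative.

```python
import re

# Escape names from the table mapped to the characters they denote
ESCAPES = {"\\n": "\n", "\\r": "\r", "\\t": "\t"}


def code_points():
    """Return the hexadecimal code point for each escape in ESCAPES."""
    return {name: f"0x{ord(ch):02X}" for name, ch in ESCAPES.items()}


def is_space_class(ch):
    """True if a single character is matched by the \\s class in Python."""
    return re.fullmatch(r"\s", ch) is not None
```

Line feed is decimal 10 and carriage return is decimal 13, which is 0x0A and 0x0D in hexadecimal.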
The following metacharacters are incompatible with Perl:
Table A.2. Extended Metacharacters
\c | The preceding character is a non-word character |
\N | Back-reference inside the regexp to one of its bracket pairs; N is the number of the pair. This operator works only with non-operator symbols in a bracket. |
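To illustrate what a link to a bracket pair does, here is a sketch using Python's \1 back-reference. This is an analogy only: Python places no restriction on what the referenced bracket contains, so this exact pattern may not be accepted by HRC given the limitation stated in the table. The names QUOTED and find_quoted are illustrative.

```python
import re

# Group 1 remembers the opening quote; \1 requires the same quote to close
QUOTED = re.compile(r"""(["'])[^"']*\1""")


def find_quoted(text):
    """Return every substring enclosed in a matching pair of quotes."""
    return [m.group(0) for m in QUOTED.finditer(text)]
```

A string that opens with one kind of quote and closes with the other is not matched, because the back-reference must repeat exactly what the bracket captured.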
And these can be disabled during compilation as highlighting-dependent
Operators cannot be used without a preceding character sequence. Each operator has to apply to an appropriate character, metacharacter, or bracketed block combining them.
Table A.4. Operators
( ) | Group and remember characters to form one pattern. |
| | Match previous or next pattern. |
* | Match previous pattern 0 or more times. |
+ | Match previous pattern 1 or more times. |
? | Match previous pattern 0 or 1 times. |
{n} | Repeat n times. |
{n,} | Repeat n or more times. |
{n,m} | Repeat from n to m times. |
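The operators in this table behave the same way in most Perl-like engines. The sketch below tries them with Python's re module, assuming the base syntax carries over unchanged; the helper name full is illustrative.

```python
import re


def full(pattern, text):
    """True if the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# A few (pattern, text) pairs exercising each operator from the table
CASES = [
    (r"(foo)*bar", "foofoobar"),  # * : zero or more repetitions of the group
    (r"(foo)+bar", "foobar"),     # + : one or more
    (r"(foo)?bar", "bar"),        # ? : zero or one
    (r"foo|bar", "bar"),          # | : either alternative
    (r"fo{2}", "foo"),            # {n} : exactly n times
    (r"\d{2,}", "12345"),         # {n,} : n or more times
    (r"\d{2,4}", "123"),          # {n,m} : between n and m times
]

ALL_MATCH = all(full(p, t) for p, t in CASES)
```

Note that (foo)+bar does not match a bare "bar", since + requires at least one repetition.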
Adding ? after an operator makes it nongreedy. For example, the * operator becomes nongreedy when written as *?. Greedy operators take as much of the string as they can; nongreedy operators take as little as possible.
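The difference is easiest to see on a string with several possible end points. A minimal sketch with Python's re module, which uses the same *? notation; the function names are illustrative.

```python
import re


def greedy_tag(text):
    """Match from the first '<' to the LAST '>' in the text."""
    m = re.match(r"<.*>", text)
    return m.group(0) if m else None


def nongreedy_tag(text):
    """Match from the first '<' to the FIRST '>' after it."""
    m = re.match(r"<.*?>", text)
    return m.group(0) if m else None
```

On the input "<a><b>" the greedy form consumes the whole string, while the nongreedy form stops after "<a>".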
Table A.5. Extended Operators
?#N | Look-behind. N is the number of symbols. |
?~N | Inverted Look-behind. |
?= | Look-ahead. |
?! | Inverted Look-ahead. |
Note that the last two operators also exist in Perl, in the form (?=foobar), whereas Colorer uses the postfix syntax (foobar)?=
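For readers used to the Perl form, the sketch below shows look-ahead in Python, which follows Perl here. The Colorer spelling given in the comments is derived from the (foobar)?= form quoted above and is an assumption about how the pattern would be written in HRC; the function names are illustrative.

```python
import re


def foo_before_bar(text):
    """Perl/Python: foo(?=bar)   assumed Colorer form: foo(bar)?="""
    m = re.search(r"foo(?=bar)", text)
    return m.group(0) if m else None


def foo_not_before_bar(text):
    """Perl/Python: foo(?!bar)   assumed Colorer form: foo(bar)?!"""
    m = re.search(r"foo(?!bar)", text)
    return m.group(0) if m else None
```

In both cases the look-ahead part is checked but not consumed, so the match itself is only "foo".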
Example A.1. RE examples
will match "foobar", "foobar barfoo"
will match "foobar" "FOOBAR" "foobar and two other foos"
will match "foobar", "bar"
will match _only_ with "foobar"
will match any number
will match "foofoofoobarfoobar", "bar"
will match "foobar", "for", "far"
For now I want to note that the whole base has to be revised to account for the newly introduced namespaces. Since none of this existed before, every scheme was named with its own prefix, and now all of that has to go. In other words, in the description of each file type the schemes must be renamed: the prefixes removed and normal names used instead. The same applies to the use of entities and to region definitions.
Where possible, avoid importing other types, because it reduces readability. The only exceptions are the def type and cases with very many references to objects of other types. scheme='c:StringCore' reads much more clearly than plain scheme='StringCore'.
I may remove <import> altogether.
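A qualified name such as c:StringCore can be taken apart with a few lines of code. This is a sketch under the assumption that the part before the colon names the file type and the part after it names the scheme, entity or region; the helper split_qname is hypothetical and not a Colorer API.

```python
def split_qname(qname):
    """Split 'type:name' into (type, name); unqualified names give (None, name)."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: the name is local to the current type (or imported)
        return None, qname
    return prefix, local
```

With the qualified form a reader sees at once which type a scheme comes from, which is the readability argument made above.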
<schema targetNamespace="http://colorer.sf.net/2003/hrc" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <simpleType name="REstring">
    <restriction base="xs:string">
      <whiteSpace value="collapse"/>
      <pattern value="/.*/[ix]*"/>
    </restriction>
  </simpleType>
  <simpleType name="REstring-or-null">
    <union memberTypes="REstring">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value=""/>
        </restriction>
      </simpleType>
    </union>
  </simpleType>
  <simpleType name="QName">
    <restriction base="xs:string"/>
  </simpleType>
  <attributeGroup name="regionX">
    <attribute name="region" type="QName"/>
    <attribute name="region0" type="QName"/>
    <attribute name="region1" type="QName"/>
    <attribute name="region2" type="QName"/>
    <attribute name="region3" type="QName"/>
    <attribute name="region4" type="QName"/>
    <attribute name="region5" type="QName"/>
    <attribute name="region6" type="QName"/>
    <attribute name="region7" type="QName"/>
    <attribute name="region8" type="QName"/>
    <attribute name="region9" type="QName"/>
    <attribute name="regiona" type="QName"/>
    <attribute name="regionb" type="QName"/>
    <attribute name="regionc" type="QName"/>
    <attribute name="regiond" type="QName"/>
    <attribute name="regione" type="QName"/>
    <attribute name="regionf" type="QName"/>
  </attributeGroup>
  <element name="hrc">
    <complexType>
      <sequence>
        <element name="annotation" type="annotation" minOccurs="0"/>
        <element name="prototype" type="prototype" minOccurs="0" maxOccurs="unbounded"/>
        <element name="type" type="filetype" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="version" type="xs:NMTOKEN" use="required"/>
    </complexType>
  </element>
  <complexType name="annotation">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="appinfo">
        <complexType mixed="true">
          <sequence minOccurs="0" maxOccurs="unbounded">
            <any processContents="lax"/>
          </sequence>
        </complexType>
      </element>
      <element name="documentation">
        <complexType mixed="true">
          <choice minOccurs="0" maxOccurs="unbounded">
            <element name="contributors" type="xs:string"/>
            <any namespace="##other" processContents="skip"/>
          </choice>
        </complexType>
      </element>
    </choice>
  </complexType>
  <complexType name="prototype">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <element name="location" minOccurs="0">
        <complexType>
          <attribute name="link" type="xs:anyURI" use="required"/>
        </complexType>
      </element>
      <element name="filename" type="filename" minOccurs="0" maxOccurs="unbounded"/>
      <element name="firstline" type="firstline" minOccurs="0" maxOccurs="unbounded"/>
      <element name="parameters" minOccurs="0">
        <complexType>
          <sequence minOccurs="0" maxOccurs="unbounded">
            <element name="param">
              <complexType>
                <attribute name="name" type="xs:string" use="required"/>
                <attribute name="value" type="xs:string" use="required"/>
              </complexType>
            </element>
          </sequence>
        </complexType>
      </element>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="group" type="xs:Name"/>
    <attribute name="description" type="xs:string" use="required"/>
    <attribute name="targetNamespace" type="xs:anyURI"/>
  </complexType>
  <complexType name="firstline">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="1"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="filename">
    <simpleContent>
      <extension base="REstring">
        <attribute name="weight" type="xs:decimal" default="2"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="filetype">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="annotation" type="annotation"/>
      <element name="import" type="import"/>
      <element name="region" type="region"/>
      <element name="entity" type="entity"/>
      <element name="scheme" type="scheme"/>
    </choice>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="access" type="access" default="private"/>
  </complexType>
  <simpleType name="access">
    <annotation>
      <documentation>Deprecated???</documentation>
    </annotation>
    <restriction base="xs:string">
      <enumeration value="public"/>
      <enumeration value="private"/>
    </restriction>
  </simpleType>
  <complexType name="scheme">
    <sequence>
      <element name="annotation" type="annotation" minOccurs="0"/>
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="inherit" type="inherit"/>
        <element name="regexp" type="regexp"/>
        <element name="block" type="block"/>
        <element name="keywords" type="keywords"/>
      </choice>
    </sequence>
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="access" type="access" default="private"/>
  </complexType>
  <complexType name="import">
    <attribute name="type" type="xs:NCName" use="required"/>
  </complexType>
  <complexType name="entity">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="value" type="xs:string" use="required"/>
  </complexType>
  <complexType name="region">
    <attribute name="name" type="xs:NCName" use="required"/>
    <attribute name="parent" type="QName"/>
    <attribute name="description" type="xs:string"/>
  </complexType>
  <complexType name="regexp">
    <simpleContent>
      <extension base="REstring-or-null">
        <attribute name="match" type="REstring"/>
        <attribute name="priority" type="priority" default="normal"/>
        <attributeGroup ref="regionX"/>
      </extension>
    </simpleContent>
  </complexType>
  <simpleType name="priority">
    <restriction base="xs:string">
      <enumeration value="low"/>
      <enumeration value="normal"/>
    </restriction>
  </simpleType>
  <complexType name="block">
    <sequence minOccurs="0">
      <element name="start" type="blockInner"/>
      <element name="end" type="blockInner"/>
    </sequence>
    <attribute name="start" type="REstring"/>
    <attribute name="end" type="REstring"/>
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="priority" type="priority" default="normal"/>
    <attribute name="content-priority" type="priority" default="normal"/>
    <attribute name="region" type="QName"/>
    <attribute name="region00" type="QName"/>
    <attribute name="region01" type="QName"/>
    <attribute name="region02" type="QName"/>
    <attribute name="region03" type="QName"/>
    <attribute name="region04" type="QName"/>
    <attribute name="region05" type="QName"/>
    <attribute name="region06" type="QName"/>
    <attribute name="region07" type="QName"/>
    <attribute name="region08" type="QName"/>
    <attribute name="region09" type="QName"/>
    <attribute name="region0a" type="QName"/>
    <attribute name="region0b" type="QName"/>
    <attribute name="region0c" type="QName"/>
    <attribute name="region0d" type="QName"/>
    <attribute name="region0e" type="QName"/>
    <attribute name="region0f" type="QName"/>
    <attribute name="region10" type="QName"/>
    <attribute name="region11" type="QName"/>
    <attribute name="region12" type="QName"/>
    <attribute name="region13" type="QName"/>
    <attribute name="region14" type="QName"/>
    <attribute name="region15" type="QName"/>
    <attribute name="region16" type="QName"/>
    <attribute name="region17" type="QName"/>
    <attribute name="region18" type="QName"/>
    <attribute name="region19" type="QName"/>
    <attribute name="region1a" type="QName"/>
    <attribute name="region1b" type="QName"/>
    <attribute name="region1c" type="QName"/>
    <attribute name="region1d" type="QName"/>
    <attribute name="region1e" type="QName"/>
    <attribute name="region1f" type="QName"/>
  </complexType>
  <complexType name="blockInner">
    <simpleContent>
      <extension base="REstring">
        <attributeGroup ref="regionX"/>
      </extension>
    </simpleContent>
  </complexType>
  <complexType name="inherit">
    <sequence>
      <element name="virtual" type="virtual" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
    <attribute name="scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="virtual">
    <attribute name="scheme" type="QName" use="required"/>
    <attribute name="subst-scheme" type="QName" use="required"/>
  </complexType>
  <complexType name="keywords">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="word" type="word"/>
      <element name="symb" type="symb"/>
    </choice>
    <attribute name="ignorecase" default="yes">
      <simpleType>
        <restriction base="xs:string">
          <enumeration value="yes"/>
          <enumeration value="no"/>
        </restriction>
      </simpleType>
    </attribute>
    <attribute name="region" type="QName"/>
    <attribute name="priority" type="priority" default="low"/>
    <attribute name="worddiv" type="xs:string"/>
  </complexType>
  <complexType name="symb">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
  <complexType name="word">
    <attribute name="name" type="xs:string" use="required"/>
    <attribute name="region" type="QName"/>
  </complexType>
</schema>
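To show how the pieces of the schema fit together, here is a sketch that parses a small HRC document with Python's standard xml.etree.ElementTree module. The sample is invented for illustration: the type name demo, the region names, the def: references and the version value take5 are all assumptions. The standard library does not validate against an XML Schema, so the code only reads the structure.

```python
import xml.etree.ElementTree as ET

HRC_NS = "http://colorer.sf.net/2003/hrc"

# Hypothetical HRC document using only elements and attributes from the schema
SAMPLE = """\
<hrc version="take5" xmlns="http://colorer.sf.net/2003/hrc">
  <prototype name="demo" group="main" description="Demo language">
    <filename>/\\.demo$/i</filename>
  </prototype>
  <type name="demo">
    <region name="Comment" parent="def:Comment"/>
    <region name="Number" parent="def:Number"/>
    <scheme name="demo">
      <regexp match="/\\d+/" region0="Number"/>
      <block start="/#/" end="/$/" scheme="def:Comment" region="Comment"/>
      <keywords region="def:Keyword">
        <word name="begin"/>
        <word name="end"/>
      </keywords>
    </scheme>
  </type>
</hrc>
"""


def summarize(xml_text):
    """Collect the version, type names, scheme names and regexp patterns."""
    root = ET.fromstring(xml_text)
    ns = {"h": HRC_NS}
    return {
        "version": root.get("version"),
        "types": [t.get("name") for t in root.findall("h:type", ns)],
        "schemes": [s.get("name") for s in root.findall("h:type/h:scheme", ns)],
        "regexps": [r.get("match") for r in root.iter(f"{{{HRC_NS}}}regexp")],
    }
```

Because the schema sets elementFormDefault to qualified, every element lives in the HRC namespace, which is why the lookups above use the namespace prefix.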
[XML 1.0] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 (Second Edition). W3C (World Wide Web Consortium), 2000.