Syntax Highlighting

01-23-2004, 11:18 PM
I'm planning on writing a couple of functions to do customisable syntax highlighting for a number of languages (XHTML, CSS, PHP) that will output code that not only validates, but is neat.

I've got two angles I can approach this from:

A) A wordlist in a predefined format that is updatable and read by the script.

B) A syntax checker.
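For option A, a wordlist highlighter could be as simple as the following minimal sketch. It's in Python for brevity (PHP's preg_replace_callback() can do the same job). The wordlist contents and the class names codeProperty/codeValue are made up for illustration; in practice the list would be read from an updatable file. The code is escaped first, and then every listed word is wrapped in a span:

```python
import html
import re

# Hypothetical wordlist: class name -> words. A real one would be loaded from a file.
CSS_WORDLIST = {
    "codeProperty": ["color", "font-family", "margin", "padding"],
    "codeValue": ["bold", "none", "auto"],
}

def build_pattern(wordlist):
    # One named group per class; longest words first so "font-family" wins over a shorter prefix.
    groups = []
    for cls, words in wordlist.items():
        alts = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
        groups.append(f"(?P<{cls}>{alts})")
    # Treat "-" as part of a word, so "margin" is not matched inside "margin-top".
    return re.compile(r"(?<![\w-])(?:" + "|".join(groups) + r")(?![\w-])")

PATTERN = build_pattern(CSS_WORDLIST)

def highlight_wordlist(source):
    escaped = html.escape(source, quote=False)
    # m.lastgroup is the name of the group that matched, i.e. the class to use.
    return PATTERN.sub(lambda m: f'<span class="{m.lastgroup}">{m.group(0)}</span>', escaped)
```

Because it only looks at words, a listed word inside a comment such as /* color */ still gets wrapped. That is exactly the accuracy problem with a pure wordlist.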

E.g. for CSS I could either have:


Or I could say that all text before a { is a selector, text between { and : is a property, text from : to ; is a value, and so on.
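The "everything before { is a selector" rule is essentially a small state machine. Here is a rough Python sketch of it. The class names are hypothetical, and comments, strings and @-rules are ignored, so it will go wrong on those:

```python
import html

# Hypothetical class names for the three parts of a rule.
CLASSES = {"selector": "codeSelector", "property": "codeProperty", "value": "codeValue"}

def highlight_css(source):
    out, buf = [], []
    state = "selector"  # what the text collected in buf currently is

    def flush():
        # Wrap the collected text in the current state's class; whitespace-only runs get no span.
        text = html.escape("".join(buf), quote=False)
        buf.clear()
        out.append(f'<span class="{CLASSES[state]}">{text}</span>' if text.strip() else text)

    for ch in source:
        if state == "selector" and ch == "{":
            flush()
            state = "property"
        elif state == "property" and ch == ":":
            flush()
            state = "value"
        elif state == "value" and ch == ";":
            flush()
            state = "property"
        elif state != "selector" and ch == "}":
            flush()
            state = "selector"
        else:
            buf.append(ch)
            continue
        out.append(ch)  # the delimiter itself stays unwrapped
    flush()
    return "".join(out)
```

A colon in the selector state is not a delimiter, so a pseudo-class selector like a:hover stays whole.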

What do you guys recommend?

01-23-2004, 11:42 PM
The first one seems easier to code, simpler and more robust.

01-24-2004, 12:01 AM
For decent syntax highlighting, your application needs to be able to read and understand the syntax. In other words, you need to write a parser. It's not impossible to do, and there's a lot of literature out there, but it takes much more time than collecting a wordlist and running it through some regular expressions.

EDIT: The wordlist approach will never be accurate. Option B is the only way to go.
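The usual first stage of such a parser is a lexer that splits the source into tokens in a single pass. Here is a Python sketch for CSS built around one master regex; the token names are invented for the example. At each position the comment and string alternatives are tried first, so a word inside /* ... */ becomes part of the comment token and is never classified as a word:

```python
import re

# Alternatives are tried left to right at each position, so comments and strings
# are consumed whole before their contents could match as words.
TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/)
  | (?P<string>"[^"]*"|'[^']*')
  | (?P<punct>[{}:;])
  | (?P<word>[\w#.-]+)
  | (?P<space>\s+)
  | (?P<other>.)
""", re.VERBOSE | re.DOTALL)

def tokenize_css(source):
    """Return (token_type, text) pairs covering the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(source)]
```

A parser would then walk this token list and decide which words are selectors, properties or values.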

01-24-2004, 03:39 PM
Hmm, yeah, I was thinking that.

But then again, if it only recognises known tags, it'll work like a spelling checker in a way, since a misspelt tag name won't be highlighted. I might make this into a poll.
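The spelling-checker effect could work like this: only names found in a list of known properties get wrapped, so a typo stays plain. This is a Python sketch. KNOWN_PROPERTIES is a tiny hypothetical list, and the function assumes it receives the text between { and } of a single rule. It splits naively on ; and :, so a semicolon inside a string or url() would confuse it:

```python
# Hypothetical, deliberately tiny list of "known" properties.
KNOWN_PROPERTIES = {"color", "margin", "padding", "font-family"}

def mark_properties(declarations):
    """Highlight known property names in the body of one CSS rule."""
    parts = []
    for decl in declarations.split(";"):
        name, sep, value = decl.partition(":")
        stripped = name.strip()
        if sep and stripped.lower() in KNOWN_PROPERTIES:
            # Keep the surrounding whitespace and wrap only the name itself.
            name = name.replace(stripped, f'<span class="codeProperty">{stripped}</span>', 1)
        # Unknown (e.g. misspelt) names fall through unwrapped.
        parts.append(name + sep + value)
    return ";".join(parts)
```

For example, with "color:red;colr:blue" only color is highlighted.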

This is still in early development and I'm unlikely to even start coding for about a month, so I have plenty of time to plan it out.

01-24-2004, 04:42 PM
I looked at something similar once and recall thinking that the tokenizer functions (http://www.php.net/tokenizer) might be of use... like many of my `projects`, it did not get that far ;)
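PHP's tokenizer extension (token_get_all()) returns PHP's own tokens, so comments, strings and keywords come already classified. Python's standard library has an analogous tokenize module. This sketch uses it, on Python source rather than PHP, to show the idea. The class names are borrowed from the stylesheet later in the thread:

```python
import io
import keyword
import tokenize

def classify(source):
    """Return (css_class, text) pairs for comments, strings and keywords in Python source."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            result.append(("codeComment", tok.string))
        elif tok.type == tokenize.STRING:
            result.append(("codeString", tok.string))
        elif tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            result.append(("codeReserved", tok.string))
    return result
```

For "if x: # if" followed by "s = 'if'", the if inside the comment and the string is not reported as a keyword. A plain regex pass can't make that distinction.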

01-25-2004, 05:26 AM
I wrote a couple of PHP syntax-highlighting scripts for XML/HTML and JavaScript. Would you like those? They're not perfect but they mostly work :)

01-25-2004, 02:59 PM
That'd be great for reference if you could.

Can you send them to webmaster@readme.reosurce-locator.com?

01-25-2004, 11:03 PM
I tried to post it here but there's too much parsing going on - I can't get the regexes to come through unaltered.

I'll document it and post that later on.

01-25-2004, 11:20 PM
Use the "code" tag rather than the "php" tag. The latter tends to eat backslashes from regex code. Very annoying.

01-26-2004, 08:04 PM
Or upload the file.

02-17-2004, 02:08 AM
Sorry, I completely forgot about this :o It may be too late, but here's the info anyway.

The syntax files are attached to the next post (I tried the code tag, but it added mysterious spaces to some of the regexes).

The XML syntax file can handle any element or attribute name, including empty elements, but all attribute values must be double-quoted, and it can only handle HTML comments that fit on one line. The JavaScript syntax file handles reserved words, parentheses, braces, single-quoted strings and single-line comments (starting with //).
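The XML behaviour described above can be approximated with a few regexes: any element or attribute name, empty elements, double-quoted attribute values only, and one-line comments. This is not the attached syntax-xml.php, just a Python sketch along the same lines. It assumes the input is already HTML-escaped like the heredoc example further down, and it reuses the codeElement/codeAttr/codeComment class names from the stylesheet:

```python
import re

# Input is assumed to be HTML-escaped (&lt; / &gt;) with literal double quotes around attribute values.
TAG = re.compile(r'(&lt;/?)([\w:.-]+)((?:\s+[\w:.-]+="[^"]*")*)(\s*/?&gt;)')
ATTR = re.compile(r'([\w:.-]+)(="[^"]*")')
# No re.DOTALL, so "." stops at a newline: only one-line comments match.
COMMENT = re.compile(r"&lt;!--.*?--&gt;")

def highlight_xml(escaped):
    def wrap_tag(m):
        # Wrap each attribute name, leaving ="value" as it is.
        attrs = ATTR.sub(r'<span class="codeAttr">\1</span>\2', m.group(3))
        return f'{m.group(1)}<span class="codeElement">{m.group(2)}</span>{attrs}{m.group(4)}'
    highlighted = TAG.sub(wrap_tag, escaped)
    return COMMENT.sub(r'<em class="codeComment">\g<0></em>', highlighted)
```

A tag with a single-quoted attribute doesn't match TAG at all, so it is left unhighlighted rather than half-highlighted.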

The highlighting is done with CSS. Since the scripts aren't intelligent enough to distinguish things like a reserved word that appears inside a comment or string, the visual output is tidied up with extra descendant selectors to make sure the highlighting is correct:

pre, em.codeComment, .codeString, .codeReserved, .codeParen, .codeBrace, .codeElement, .codeAttr {
	font-family:"lucida console", "courier new", monospace;
}

pre { /* … */ }

.genericColumn .codeComment, .codeComment .codeReserved, .codeComment .codeString, .codeComment .codeString .codeReserved, .codeComment .codeParen { /* … */ }

.codeString, .codeString .codeReserved, .codeString .codeParen { /* … */ }

.codeReserved { /* … */ }

.codeParen { /* … */ }

.codeBrace { /* … */ }

.codeElement { /* … */ }

.codeAttr { /* … */ }
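To see why those descendant selectors are needed, here is a Python sketch of a naive multi-pass highlighter for the JavaScript subset described earlier. It isn't the attached script, and the reserved-word list is an illustrative subset. Each regex pass runs over the previous pass's output without knowing whether it is inside a comment or a string. For input like if (x) { return 'if'; } // else, the if in the string ends up as a codeReserved span inside a codeString span, and the else ends up inside an em.codeComment. Rules like .codeString .codeReserved and .codeComment .codeReserved then recolour those:

```python
import html
import re

# Illustrative subset of JavaScript reserved words, not the full list.
RESERVED = ["if", "else", "for", "while", "return", "var", "function"]

# Reserved words go first, so later passes never treat the inserted markup as words.
PASSES = [
    (re.compile(r"\b(?:" + "|".join(RESERVED) + r")\b"), "codeReserved"),
    (re.compile(r"'[^'\n]*'"), "codeString"),             # single-quoted strings only
    (re.compile(r"[()]"), "codeParen"),
    (re.compile(r"[{}]"), "codeBrace"),
    (re.compile(r"//.*$", re.MULTILINE), "codeComment"),  # // comments to end of line
]

def highlight_js(source):
    text = html.escape(source, quote=False)
    for pattern, cls in PASSES:
        tag = "em" if cls == "codeComment" else "span"  # the stylesheet targets em.codeComment
        text = pattern.sub(lambda m: f'<{tag} class="{cls}">{m.group(0)}</{tag}>', text)
    return text
```

Being context-free, it would also treat the // in a string like 'http://...' as the start of a comment.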

Finally, to actually use it, you put the code in a PHP heredoc inside a <pre> block, then @include the syntax file, with a fallback that outputs the unhighlighted code if the include fails, like this:

<ul class="skipCode">
<li><a href="#fig0-after" tabindex="6">Skip code example</a></li>
</ul>
<pre id="fig0"><?php
$code = <<<endh
&lt;ul id="udm" class="udm"&gt;
&lt;li&gt;&lt;a href="/"&gt;Home&lt;/a&gt;
&lt;li&gt;&lt;a href="/about/"&gt;About&lt;/a&gt;
&lt;li&gt;&lt;a href="/contact/"&gt;Contact&lt;/a&gt;
&lt;/ul&gt;
endh;
if(!@include ("scripts/syntax-xml.php")) { echo($code); }
?></pre>

Make sure there's no trailing whitespace at the end of any line within the <pre>, otherwise Mozilla loses the line break. It's also not perfect with HTML attributes - sometimes they don't get highlighted, and I don't know why.
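If you'd rather not rely on keeping the source tidy by hand, the trailing whitespace can be stripped before the code is output. Here is a one-function Python sketch (in PHP, rtrim() on each line would do the same). It assumes trailing whitespace really is the only cause of the lost line breaks:

```python
def strip_trailing_whitespace(code):
    """Remove spaces, tabs and stray carriage returns from the end of every line."""
    return "\n".join(line.rstrip() for line in code.split("\n"))
```

Because rstrip() also removes "\r", this normalises files saved with Windows line endings as a side effect.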

If you wanna see this in action, have a look at my list-menu's user manual, like this page (http://www.udm4.com/manual/quickstart/) or this one (http://www.udm4.com/manual/development/js/)

02-17-2004, 02:18 AM
Here are the syntax files :)