Sunday, June 29, 2008

JSyntaxPane with more Languages

Some more free time, and I used it to fix the Java and Groovy lexers to handle strings better. The older versions did not process string escape characters properly.

I also added Property file and SQL lexers, and did some refactoring by moving the lexers into their own package.

The build file was modified to also generate the Java lexers from the flex files (you need JFlex.jar in your ant/lib folder for this to work).

This may be the last version for a while. I may not have time in the next few weeks to manage this.

The source is now available in the Source Tab and only the binary distribution is available as a download. You can still load and execute the Jar file. It will start the tester.

Have fun!

Monday, June 23, 2008

JSyntaxPane new version - now with auto indentation and undo/redo

I had some more free time to work on JSyntaxPane. I did some refactoring, though it should not change the way you use it.
Refactoring was done by creating a new SyntaxDocument to store the Tokens, instead of having the SyntaxView store them. The former way was actually very stupid!
The Document should know about its own tokens, and the View gets the tokens from the Document. That is how it is done now. And this way you can also retrieve data about the SyntaxDocument just by having a reference to its JEditorPane control. And no need for listeners!
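For example, something along these lines should now work (just a sketch; jEdtTest is the JEditorPane, and it assumes the SyntaxKit has already been set on it, so its model is a SyntaxDocument):

// Once the SyntaxKit is installed, the pane's model is the SyntaxDocument,
// so it can be fetched and queried directly -- no listeners needed.
Document model = jEdtTest.getDocument();
if (model instanceof SyntaxDocument) {
    SyntaxDocument syntaxDoc = (SyntaxDocument) model;
    // ask the document itself about its tokens, undo/redo state, etc.
}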
The new SyntaxDocument also has built-in undo and redo.
A new SyntaxActions class was also created, with several useful TextActions thrown in. These include smart and Java indentation, and mapping TAB / Shift-TAB to indent / unindent the selected lines. The default undo / redo keys are also mapped to behave properly.
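That wiring is done for you by the kit, but for illustration, binding a key to a TextAction is plain Swing. The trivial action below is made up (it just inserts four spaces); the real SyntaxActions are smarter, but the key-binding mechanism is the same:

// Illustrative only: bind TAB to a trivial TextAction that inserts spaces.
// The real SyntaxActions handle smart / Java indentation, Shift-TAB unindent
// and multi-line selections.
TextAction indent = new TextAction("indent") {
    public void actionPerformed(ActionEvent e) {
        JTextComponent target = getTextComponent(e);
        if (target != null) {
            target.replaceSelection("    ");
        }
    }
};
jEdtTest.getInputMap().put(KeyStroke.getKeyStroke("TAB"), "indent");
jEdtTest.getActionMap().put("indent", indent);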
I also modified the Test application, and now it will display the token details under the cursor. This is very helpful for debugging the lexers.

Also changed is the way fonts are used. Now the font of the component is used, and you can only change the style (bold/italic) and color, which means the entire EditorPane will have one single font face and size. This actually suits my taste, unlike Notepad++, which by default uses different fonts and sizes for comments.

Again, all you need is one line to set the EditorKit. All the above is done for you automatically.
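That one line is still just:

jEdtTest.setEditorKit(new SyntaxKit("java"));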

Have a look at the Test Application using Java WebStart here. Java 1.5+ is needed.

Project home is on Google Code here.

Hmm.. what next? Scripting? Nah.. I'm probably done, for now. I needed a simple control to edit scripts in my Java Swing applications, and that's probably it. Unless...



Wednesday, June 18, 2008

JSyntaxPane is born

Based on demand (can't say popular, yet), I just created a Google Project for JSyntaxPane. The NetBeans project is available as the main download.

You'll find the source under the src folder of the archive.

Here is a breakdown of what the classes do and how you can use the library in your own projects:

The SyntaxTester is a NetBeans-created main application you can use to see and test how the syntax highlighting works. Or you can just start the Jar file in the dist folder.

SyntaxKit: you need to set your JEditorPane control's editorKit to a new instance of this. Just pass the required language as a String to the SyntaxKit constructor:

jEdtTest.setEditorKit(new SyntaxKit("java"));

SyntaxView does all the ugly work of maintaining the token list for the Document and drawing the highlighted text.

SyntaxStyle is used to store various data about the style to use for each TokenType.

And finally SyntaxStyles is just a map of Styles. It has one method to set a Graphics object with the needed Font and Color for a Token.
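Conceptually it does something like this (a rough sketch only; the real method names and signatures in SyntaxStyles differ):

// Conceptual sketch only -- not the library's actual signatures. Prepare the
// Graphics for drawing one token: pick the font style and color for its type.
void prepareGraphics(Graphics g, Font baseFont, int fontStyle, Color color) {
    g.setFont(baseFont.deriveFont(fontStyle));   // e.g. Font.BOLD for keywords
    g.setColor(color);
}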

All the Lexers were created using JFlex and the sources for them are in the JFlex folder of the archive.

Have fun! And please let me know if you find it useful, or if you have any feature requests.

Monday, June 16, 2008

Java Syntax Highlighting with JEditorPane

I've been playing around with implementing a syntax highlighting / coloring editor or text control in Java Swing, just for fun. It would be part of TranScope to edit scripts, mostly Groovy, and to view some TAL / DDL and XML. It was hard and time consuming, but also a hell of a lot of fun and a rewarding experience. I'll summarize what I did, and if there is demand, publish my final code as a project. Most of these topics were new to me before I started.

You may also want to check out JEdit Syntax Package. The project seems dead, but it works.

XML EditorKit:

I started out by reading about, and then getting the source for, the Batik XML editor; more here.
It was okay, but just for XML, and it seemed quite tied to XML, so modifying it for other languages was not easy. But it is a very good library to use by itself, and I used it until I wrote my XML lexer.

Take One - Dynamic Regex:

I came across this link, which was really helpful in understanding what all these Views, Documents, and EditorKits are about. Please read that link, as it is very helpful and to the point. There is also the Sun documentation about the Swing Text API here.

The code I created based on Kees's approach was very simple. I used some regular expressions to get tokens from each line, and whenever a match is found, the color for that regex is used to color the match. All that is needed is to put the regexes and associated colors in a Map, loaded from a Properties file, and voila! Dynamic highlighting, without any code change and for any language.
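A minimal sketch of the idea (the class name and the properties format here are made up for illustration; this is not the code from the post):

import java.awt.Color;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the dynamic-regex approach: patterns and colors come from a
// properties file, and each line is scanned on its own when it is drawn.
public class RegexHighlighter {

    /** One colored region of a line. */
    public static class Span {
        public final int start, end;
        public final Color color;
        Span(int start, int end, Color color) {
            this.start = start; this.end = end; this.color = color;
        }
    }

    private final Map<Pattern, Color> patterns = new LinkedHashMap<Pattern, Color>();

    // Example property (made-up format, regex and RGB hex separated by '|'):
    //   keyword = \\b(public|private|class|void)\\b | 7f0055
    public RegexHighlighter(Properties props) {
        for (Enumeration<?> names = props.propertyNames(); names.hasMoreElements();) {
            String name = (String) names.nextElement();
            String value = props.getProperty(name);
            int sep = value.lastIndexOf('|');        // the regex may itself contain '|'
            Pattern pattern = Pattern.compile(value.substring(0, sep).trim());
            Color color = new Color(Integer.parseInt(value.substring(sep + 1).trim(), 16));
            patterns.put(pattern, color);
        }
    }

    // Scan a single line; the View's drawUnselectedText draws each span in
    // its color and everything else in the default color.
    public List<Span> spansFor(String line) {
        List<Span> spans = new ArrayList<Span>();
        for (Map.Entry<Pattern, Color> e : patterns.entrySet()) {
            Matcher m = e.getKey().matcher(line);
            while (m.find()) {
                spans.add(new Span(m.start(), m.end(), e.getValue()));
            }
        }
        return spans;
    }
}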

It worked perfectly... Almost. There is no need to keep any extra data about the Document, and highlighting does not need to parse anything except the single line being drawn by the View's drawUnselectedText method. This means it is very fast and needs no extra memory. The only problem is that multi-line constructs will NOT work, so multi-line comments are not handled.

This is no big limitation at all in many cases.

Take Two - Lexing + StyledDocument:

Here is where the fun began. To properly handle multi-line constructs, simple regex matches are not really usable, and they get very slow. What is needed is a parser, or rather a lexer. Java has many of these, including ANTLR, JavaCC and JFlex.

I did some research and found JFlex to be the easiest to use for lexing. Remember, I only need to get tokens, not build a compiler. JFlex was also very easy to use on in-memory characters (from the Document), and very fast. I did some benchmarks on my work PC: 2 GHz, 1 GB RAM, with lots of programs running, including NetBeans. Parsing a 200K document still takes less than 15ms in most cases, and no slowdown is noticeable while typing.

I created my Lexer to return a Token object of this form:

import java.io.Serializable;

public class Token implements Serializable, Comparable<Token> {
    public TokenType type;
    public int start;
    public int length;

    // other boilerplate code....

    @Override
    public int compareTo(Token t) {
        if (this.start != t.start) {
            return this.start - t.start;
        } else if (this.length != t.length) {
            return this.length - t.length;
        } else {
            return this.type.compareTo(t.type);
        }
    }
}

TokenType is an enum with all possible token types (OPER, IDENT, KEYWORD, STRING, COMMENT etc.)
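For completeness, the enum itself is as simple as it sounds; something along these lines:

// The token categories the lexers can emit; the exact list is whatever the
// lexers need (the constants below are the ones mentioned above).
public enum TokenType {
    OPER, IDENT, KEYWORD, STRING, COMMENT   // ... plus whatever else is needed
}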

So, what I initially did was create a DocumentListener that updates a matching list of Tokens for the Document whenever the Document is updated.

Whenever the Document is updated, I just call setCharacterAttributes for all the tokens, depending on their type.
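Put together, this first attempt looked roughly like the sketch below. JavaLexer is a placeholder for any JFlex-generated lexer declared with %type Token (so yylex() returns null at end of input), and the style lookup is reduced to one AttributeSet per TokenType:

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.swing.SwingUtilities;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.AttributeSet;
import javax.swing.text.DefaultStyledDocument;

// Sketch of "Take Two": re-lex the whole document on every change and push
// the token styles into the StyledDocument with setCharacterAttributes.
public class StyledHighlighter implements DocumentListener {

    private final DefaultStyledDocument doc;
    private final Map<TokenType, AttributeSet> styles;   // one style per token type

    public StyledHighlighter(DefaultStyledDocument doc,
                             Map<TokenType, AttributeSet> styles) {
        this.doc = doc;
        this.styles = styles;
        doc.addDocumentListener(this);
    }

    public void insertUpdate(DocumentEvent e) { highlight(); }
    public void removeUpdate(DocumentEvent e) { highlight(); }
    public void changedUpdate(DocumentEvent e) { /* attribute changes: ignore */ }

    private void highlight() {
        // The document may not be mutated from inside a listener callback,
        // so defer the attribute changes to the event dispatch thread.
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                try {
                    String text = doc.getText(0, doc.getLength());
                    // JavaLexer stands in for a JFlex-generated lexer.
                    JavaLexer lexer = new JavaLexer(new StringReader(text));
                    List<Token> tokens = new ArrayList<Token>();
                    for (Token t = lexer.yylex(); t != null; t = lexer.yylex()) {
                        tokens.add(t);
                    }
                    for (Token t : tokens) {
                        AttributeSet style = styles.get(t.type);
                        if (style != null) {
                            doc.setCharacterAttributes(t.start, t.length, style, true);
                        }
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();   // BadLocationException / IOException
                }
            }
        });
    }
}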

That worked perfectly, if you have just a few lines. It quickly became very slow for any document with more than about 100 lines, and it also consumed a LOT of memory. The main problem is that a StyledDocument was not designed to have its styles updated this way.

When you write code, say you are writing the keyword "public":
  1. type "p", and parse the whole document. p is lexed as an identifier and those attributes are set to it, and everything else.
  2. type "u", same thing, "pu" is still lexed as identifier.
  3. type "b"...
  4. type "l"...
  5. type "i"...
  6. type "c" and now you have a keyword, so you change the char attributes for the whole "public".
Changing attributes is VERY slow in such cases. Lots of events are fired, and the StyledDocument keeps track of a lot of data about the style of each character. For a script or program, you will have a separate style for almost every single word, so you end up with a lot of data for even the shortest of scripts. The StyledDocument was not designed for this; it was designed for normal "English" text, where most of it is the same style, except for a header here or a bold word there.

I then changed the implementation to only call setCharacterAttributes for the modified parts of the Document. This was done by calculating a delta of the old and new token lists, and only the changes were used to update the styles. But the memory use was still too high, and when a big file was opened or pasted into the JEditorPane it took a while to set all the attributes.

It worked, but I could do better... And I am still having fun, so why stop there?

Take Three - Lexing + PlainDocument:

The final solution is to Lex the entire document whenever it changes (which is very fast) and use a PlainView and PlainDocument implementation to render the text using the drawUnselectedText method.

The code now is structured like this:
class SyntaxKit extends StyledEditorKit implements ViewFactory:
This class is used by the JEditorPane to set the type of text it will show. In NetBeans, I change the EditorKit property to use an instance of this class. The create method of this class returns an instance of the SyntaxView class below.
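Stripped to its essentials, the kit is little more than this sketch (the real class also takes the language name in its constructor, installs the matching lexer, and wires up the actions):

import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.PlainDocument;
import javax.swing.text.StyledEditorKit;
import javax.swing.text.View;
import javax.swing.text.ViewFactory;

// Sketch of the kit: it mainly exists to hand the JEditorPane a SyntaxView.
public class SyntaxKit extends StyledEditorKit implements ViewFactory {

    @Override
    public Document createDefaultDocument() {
        return new PlainDocument();       // plain document: no per-character styles
    }

    @Override
    public ViewFactory getViewFactory() {
        return this;                      // the kit is its own view factory
    }

    @Override
    public View create(Element element) {
        return new SyntaxView(element);   // one SyntaxView renders the document
    }
}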

class SyntaxView extends PlainView implements DocumentListener
This is the heart of the code. This class maintains a List of Tokens that match the contents of the Document it is to render. It keeps itself in sync with all document changes by registering itself as a DocumentListener. The insertUpdate and removeUpdate methods are overridden to re-parse, or lex, the entire Document and put the Tokens in the tokens list member of this class. I removed the logic for maintaining a delta; it is fast enough and less code to maintain. As I said, lexing was never the performance issue.

The drawUnselectedText method of this class is called to draw lines of text. This method looks at the tokens and draws them in the proper fonts and colors.
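Heavily simplified, the drawing loop looks something like the sketch below. The token list is assumed to be kept up to date by insertUpdate / removeUpdate (not shown), only colors are varied here, and colorFor is a made-up stand-in for the per-TokenType style lookup:

import java.awt.Color;
import java.awt.Graphics;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.PlainView;
import javax.swing.text.Segment;
import javax.swing.text.Utilities;

// Sketch of the View's drawing: walk the token list and draw each piece of
// the requested range [p0, p1) in the color of the token that covers it.
public class SyntaxView extends PlainView {

    // Kept in sync with the document by insertUpdate / removeUpdate (not shown).
    private final List<Token> tokens = new ArrayList<Token>();

    public SyntaxView(Element element) {
        super(element);
    }

    @Override
    protected int drawUnselectedText(Graphics g, int x, int y, int p0, int p1)
            throws BadLocationException {
        Document doc = getDocument();
        Segment text = new Segment();
        int pos = p0;
        for (Token t : tokens) {                        // tokens sorted by start
            int start = Math.max(t.start, p0);
            int end = Math.min(t.start + t.length, p1);
            if (end <= pos || start >= p1) {
                continue;                               // token not in this range
            }
            if (start > pos) {                          // plain text before the token
                g.setColor(Color.BLACK);
                doc.getText(pos, start - pos, text);
                x = Utilities.drawTabbedText(text, x, y, g, this, pos);
            }
            g.setColor(colorFor(t.type));
            doc.getText(start, end - start, text);
            x = Utilities.drawTabbedText(text, x, y, g, this, start);
            pos = end;
        }
        if (pos < p1) {                                 // trailing plain text
            g.setColor(Color.BLACK);
            doc.getText(pos, p1 - pos, text);
            x = Utilities.drawTabbedText(text, x, y, g, this, pos);
        }
        return x;
    }

    // Stand-in for a per-TokenType style/color lookup.
    private Color colorFor(TokenType type) {
        return type == TokenType.KEYWORD ? new Color(0x7f0055) : Color.BLACK;
    }
}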

One more thing done in this class is to override the updateDamage method. This is needed so that something like closing a multi-line comment updates not just the last line, but all lines in the view.
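One blunt but simple way to get that effect, added to the SyntaxView sketch above: after the normal per-line damage handling, repaint the whole host component so every line is redrawn against the new token list (fully qualified names used here to avoid repeating the imports):

@Override
protected void updateDamage(javax.swing.event.DocumentEvent changes,
                            java.awt.Shape a, javax.swing.text.ViewFactory f) {
    super.updateDamage(changes, a, f);
    java.awt.Component host = getContainer();
    if (host != null) {
        host.repaint();   // an edit may affect lines far away from it,
                          // e.g. typing the */ that closes a long comment
    }
}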

If anybody is interested, I'll either put the code on a Google Project or show parts of it here. The project is now tightly integrated with TranScope, but I can spin it off as a separate project and remove the dependencies. There are currently lexers for Java, Groovy, JavaScript, XML and Tandem / HP NSK TAL. To create your own, you only need to create the lexer file and run it through JFlex.

Just Google it!