Kanji XML
This sub project contains the current KanjiXML file and the XML Schema file
that describes it.
The XML file will be available in a few different formats, depending on how
much time I have for each separate release:
- A human-readable format, in which newlines and indentation are added to make
the XML easy to read.
- A computer-readable format, in which the entire XML file is on one line with
no whitespace between tags. This format is much smaller than the
human-readable format.
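As a rough sketch of the difference between the two formats, the Java code below serializes the same small document both ways with the standard JAXP Transformer. It is only an illustration and assumes the files are written through JAXP. The element names kanjidic, character and literal are hypothetical, and the indent-amount property is specific to the JDK's built-in transformer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class KanjiXmlFormats {

    // Builds a tiny hypothetical document:
    // <kanjidic><character><literal>U+4E9C</literal></character></kanjidic>
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("kanjidic");
        Element character = doc.createElement("character");
        Element literal = doc.createElement("literal");
        literal.setTextContent("\u4E9C");
        character.appendChild(literal);
        root.appendChild(character);
        doc.appendChild(root);
        return doc;
    }

    // indent = true produces the human-readable form,
    // indent = false the compact single-line form.
    static String serialize(Document doc, boolean indent) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        if (indent) {
            // Vendor-specific property understood by the JDK's built-in transformer.
            t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDocument();
        System.out.println(serialize(doc, true));
        System.out.println(serialize(doc, false));
    }
}
```

The compact output contains no line breaks at all, while the indented output puts each element on its own line. That extra whitespace is what makes the human-readable file larger.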
The file may be produced in the following Unicode encodings:
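- UTF-8: most of the file is in the ASCII range, so this file should be much
smaller than the UTF-16 version. In this encoding ASCII characters take 1
byte, while non-ASCII characters take 2 to 4 bytes (common kanji take 3).
- UTF-16: characters in the Basic Multilingual Plane, which includes ASCII and
the common kanji, take 2 bytes; characters outside it, such as some rare
kanji, take 4 bytes (a surrogate pair).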
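A quick way to check the byte counts described above is to encode a few sample strings in Java. The sketch below compares an ASCII tag, a common kanji (U+4E9C) and a kanji outside the Basic Multilingual Plane (U+2000B). The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    // Number of bytes needed to store s in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed in UTF-16. UTF_16BE is used because the plain
    // UTF_16 charset prepends a 2-byte byte order mark when encoding.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String tag = "<literal>";                                  // ASCII markup
        String kanji = "\u4E9C";                                    // common kanji, in the BMP
        String rareKanji = new String(Character.toChars(0x2000B)); // outside the BMP

        for (String s : new String[] {tag, kanji, rareKanji}) {
            System.out.println(s + ": UTF-8 = " + utf8Bytes(s)
                    + " bytes, UTF-16 = " + utf16Bytes(s) + " bytes");
        }
    }
}
```

Run as written, it reports 9 vs 18 bytes for the tag, 3 vs 2 bytes for the common kanji, and 4 bytes in both encodings for the rare kanji. Because the file is dominated by ASCII markup, UTF-8 comes out smaller overall even though each kanji costs one byte more.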
Java Kanji XML (JKanjiXML)
This sub project includes the Java data structures that hold the KanjiXML data,
classes to read the data from the original kanjidic file format, and classes to read and write the XML format. It also includes the tools used to add different indices to the XML file. These tools can serve as examples of how to use the libraries in a program, or as a starting point for new programs that add data to the XML file.
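The following sketch shows the general shape of the conversion described above: one line in the kanjidic format is parsed into a Java object, which is then written out as XML with the standard StAX API. It is not the project's actual code. The KanjiEntry class and the element names are hypothetical, only the stroke count (S field) and the {...} meanings are extracted, and the sample line is abbreviated.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class KanjidicLine {

    // Hypothetical data structure for one kanji entry (not the project's real class).
    static final class KanjiEntry {
        final String literal;
        final String jisCode;
        int strokeCount = -1; // -1 means "no S field found"
        final List<String> meanings = new ArrayList<>();

        KanjiEntry(String literal, String jisCode) {
            this.literal = literal;
            this.jisCode = jisCode;
        }
    }

    private static final Pattern MEANING = Pattern.compile("\\{([^}]*)\\}");

    // Parses one well-formed line: "<kanji> <JIS hex> <coded fields...> {meaning} {meaning}".
    static KanjiEntry parse(String line) {
        int brace = line.indexOf('{');
        String head = brace >= 0 ? line.substring(0, brace) : line;
        String[] tokens = head.trim().split("\\s+");
        KanjiEntry entry = new KanjiEntry(tokens[0], tokens[1]);
        for (int i = 2; i < tokens.length; i++) {
            // Keep only the first S field, the accepted stroke count.
            if (entry.strokeCount < 0 && tokens[i].matches("S\\d+")) {
                entry.strokeCount = Integer.parseInt(tokens[i].substring(1));
            }
        }
        Matcher m = MEANING.matcher(line);
        while (m.find()) {
            entry.meanings.add(m.group(1));
        }
        return entry;
    }

    // Writes the entry as an XML fragment with StAX; element names are hypothetical.
    static String toXml(KanjiEntry e) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("character");
        w.writeAttribute("jis", e.jisCode);
        w.writeStartElement("literal");
        w.writeCharacters(e.literal);
        w.writeEndElement();
        if (e.strokeCount >= 0) {
            w.writeStartElement("stroke_count");
            w.writeCharacters(Integer.toString(e.strokeCount));
            w.writeEndElement();
        }
        for (String meaning : e.meanings) {
            w.writeStartElement("meaning");
            w.writeCharacters(meaning);
            w.writeEndElement();
        }
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String line = "\u4E9C 3021 U4e9c B1 G8 S7 \u30A2 {Asia} {rank next}";
        System.out.println(toXml(parse(line)));
    }
}
```

Splitting the line before the first brace keeps multi-word meanings such as "rank next" intact. An index-adding tool would follow the same pattern, reading an extra field into the object before writing it back out.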
Kanji XSLT
This sub project contains XSLT files that translate the XML file into a number of different formats. It will eventually include an XSLT file that translates the XML file back to the kanjidic file format. Since each person or program will require different fields, these XSLT files will most likely serve as good templates for extracting the information you need.
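To give an idea of how such an XSLT file might be applied from Java, the sketch below runs a small stylesheet with the standard JAXP TransformerFactory and produces one tab-separated line per character. The stylesheet, the element names (kanjidic, character, literal, meaning) and the sample data are hypothetical; the select expressions would need to match the actual KanjiXML schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExtractFields {

    // Hypothetical stylesheet: one line per character, "literal<TAB>first meaning".
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"kanjidic/character\">"
        + "<xsl:value-of select=\"literal\"/>"
        + "<xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select=\"meaning[1]\"/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<kanjidic>"
            + "<character><literal>\u4E9C</literal><meaning>Asia</meaning>"
            + "<meaning>rank next</meaning></character>"
            + "<character><literal>\u65E5</literal><meaning>day</meaning></character>"
            + "</kanjidic>";
        System.out.print(transform(xml, XSLT));
    }
}
```

Changing the select expressions, or adding more xsl:value-of elements, is all that is needed to pull out different fields, which is the kind of template use described above.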