Kanji XML


The Kanji XML project began as an effort to represent Jim Breen's popular KanjiDic file in an XML format. Kanji are the Japanese characters, or ideographs, most of which were derived from Chinese writing.

I created an XML Schema to represent the original structure and have since extended it so that it can hold much more data. The original file is line based and encoded in EUC-JP, which requires a complicated parsing routine, and many people may be unable to use the file if their platform does not support that encoding. To help people start using this project, I have provided data structures (currently in Java) that hold all of the information contained in the XML file. These structures use XML data binding: they are generated from the XSD file and can read and write a valid KanjiXML file.
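
As a rough illustration of reading such a file, the sketch below uses the JDK's built-in DOM parser rather than the project's generated binding classes. The element and attribute names (kanjilist, kanji, literal, meaning) and the sample entry are assumptions made for this example only; the real names are defined by the project's XSD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class KanjiXmlSketch {

    // Hypothetical sample document; the real KanjiXML schema may use different names.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<kanjilist>"
      + "<kanji literal=\"\u4e9c\">"
      + "<meaning>Asia</meaning><meaning>rank next</meaning>"
      + "</kanji>"
      + "</kanjilist>";

    /** Returns the text of every meaning element of the first kanji entry with the given literal. */
    static List<String> meaningsOf(String xml, String literal) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList entries = doc.getElementsByTagName("kanji");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                if (!literal.equals(entry.getAttribute("literal"))) {
                    continue; // not the character we are looking for
                }
                NodeList meanings = entry.getElementsByTagName("meaning");
                for (int j = 0; j < meanings.getLength(); j++) {
                    result.add(meanings.item(j).getTextContent());
                }
                break; // stop at the first matching entry
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(meaningsOf(SAMPLE, "\u4e9c"));
    }
}
```

With data binding, the lookup loop would be replaced by calls on the generated classes, so the names above would come from the schema instead of string constants.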

All the tools used to generate the KanjiXML file from the original are included in the package. This code is written so that it does not require EUC-JP encoding support, since many Java JREs do not include it. The XML file that is produced is typically in Unicode, although other encodings could be supported.
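
One way a converter could avoid depending on the JRE's EUC-JP charset is to decode the bytes itself. The sketch below is not the project's code, and the class and method names are made up for illustration. It handles only ASCII and kana, relying on the fact that EUC-JP stores each JIS X 0208 character as two bytes (row + 0xA0, cell + 0xA0) and that the hiragana and katakana rows map to Unicode in order. Kanji would need a full JIS X 0208 lookup table, which is omitted here.

```java
public class EucKanaSketch {

    /**
     * Decodes ASCII, hiragana and katakana from EUC-JP bytes without using
     * the JRE's EUC-JP charset. Any other character is replaced with U+FFFD,
     * because mapping it would require a JIS X 0208 table (not included).
     */
    static String decodeKana(byte[] euc) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < euc.length) {
            int b1 = euc[i] & 0xFF;
            if (b1 < 0x80) {                 // single-byte ASCII
                out.append((char) b1);
                i++;
                continue;
            }
            if (b1 == 0x8F) {                // three-byte JIS X 0212 sequence
                out.append('\uFFFD');
                i += 3;
                continue;
            }
            if (i + 1 >= euc.length) {       // truncated two-byte sequence
                out.append('\uFFFD');
                break;
            }
            int cell = (euc[i + 1] & 0xFF) - 0xA0;   // JIS X 0208 cell number
            if (b1 == 0xA4 && cell >= 1 && cell <= 83) {
                out.append((char) (0x3040 + cell));  // row 4: hiragana
            } else if (b1 == 0xA5 && cell >= 1 && cell <= 86) {
                out.append((char) (0x30A0 + cell));  // row 5: katakana
            } else {
                out.append('\uFFFD');                // kanji and symbols: table needed
            }
            i += 2;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Bytes A4 C4, '.', A4 B0: a hiragana reading with a dot separator.
        byte[] reading = {(byte) 0xA4, (byte) 0xC4, 0x2E, (byte) 0xA4, (byte) 0xB0};
        System.out.println(decodeKana(reading));
    }
}
```

The output is an ordinary Java string, so it can be written out in whatever Unicode encoding the XML file uses.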

The file currently contains a wealth of index numbers from various books on learning kanji, dictionaries, and encodings for 6355 kanji characters. It also contains Chinese and Korean readings, Japanese readings in native and roman alphabets, and English meanings.

I am currently working on adding even more indices and meanings in other languages, and I would like to find other contributors to help. I have found Portuguese meanings and am working on them at this time, and I may have located German translations as well. I would also like to convert the Korean readings to the native alphabet.

A further goal is to add the rest of the kanji in the Unicode character set.

For additional information please contact the project admin, Duane J. May.

Here is the current Javadoc.

For more information, see the current Release Notes.

Licenses and Credits

Data

Most of the data in the KanjiXML file are protected by the KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE.
A file describing the KanjiDic file format is available here.

The Kask indices were entered by Duane May and Miho Ogawa May. The Tohsaku (Yookoso!) indices were entered by Duane May.

The SJIS data in the KanjiXML file are protected by the Unicode Consortium.

Code

The Java classes were written by Duane May and are released under the General Public License.

Resources

Thanks to SourceForge for hosting this project.