September 16th, 2008

Yesterday I finished the first version of a fancy XSLT 2.0 stylesheet that transforms JMDict into insert statements; today I tried to run it on a 70 MB file. The results are as follows:

  • AltovaXML – allocated 1.5 GB of RAM and died after 30 minutes.
  • MSXSL – crashed after 15 minutes.
  • Saxon SA – died immediately with a class cast exception.
  • Saxon-B – gobbled 300 MB of RAM and successfully finished processing in 218 minutes.

After 15 minutes of tuning, Saxon processed the template in about an hour. Still not good. The reason is the enormous number of cross-references, each of which requires iterating over the whole document to find an exact match.
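To make the cost concrete, here is a minimal sketch in Python (not the actual stylesheet) that contrasts scanning every entry for each cross-reference with building a lookup index once. The entry structure and the field names `id`, `headword` and `xrefs` are hypothetical stand-ins for the real dictionary data. In XSLT itself, the usual way to get the indexed behaviour is `xsl:key` together with the `key()` function; whether that fits this particular stylesheet is an assumption.

```python
# Toy stand-in for dictionary entries; the field names are hypothetical.
entries = [
    {"id": 1, "headword": "猫", "xrefs": ["犬"]},
    {"id": 2, "headword": "犬", "xrefs": ["猫", "鳥"]},
    {"id": 3, "headword": "鳥", "xrefs": []},
]


def resolve_by_scan(entries):
    """Resolve each cross-reference by scanning all entries (O(n) per lookup)."""
    resolved = []
    for entry in entries:
        for target in entry["xrefs"]:
            for candidate in entries:
                if candidate["headword"] == target:
                    resolved.append((entry["id"], candidate["id"]))
                    break  # first exact match wins
    return resolved


def resolve_by_index(entries):
    """Build a headword -> id index once, then resolve each reference in O(1)."""
    index = {}
    for entry in entries:
        # setdefault keeps the first entry per headword, matching the scan above
        index.setdefault(entry["headword"], entry["id"])
    resolved = []
    for entry in entries:
        for target in entry["xrefs"]:
            if target in index:
                resolved.append((entry["id"], index[target]))
    return resolved
```

Both functions return the same pairs of (referring entry, referenced entry), but the scan does work proportional to the number of references times the number of entries, while the indexed version passes over the data only twice.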

By the time I got there, I had buried the idea of loading the data using inserts. We’ll use bulk load instead. So another optimization round is expected tomorrow. However, out of curiosity I tried to import the ‘insert-like’ data anyway. It took approximately 6 hours :)


And the war began…

September 8th, 2008

Several years have passed since I first got involved with Japanese. Our relationship has ebbed and flowed like the tide because of permanent crunch time on my projects, business trips and the thousands of other things that normally fill your life the way sand fills the spaces between rocks and pebbles. However, I kept finding myself with the textbook at the most inappropriate times. I simply couldn’t resist…

Quite recently the idea of how I should proceed in order to master the language crystallized in my head. It led me to register this domain and to a couple of sleepless nights spent working on the database structure.

The site is empty at the moment. But you’ll see the results soon…