Got data?

September 16th, 2008

Yesterday I finished the first version of a fancy XSLT 2.0 stylesheet that transforms JMDict into insert statements; today I tried running it on a 70 MB file. The results were as follows:

  • AltovaXML – allocated 1.5 GB of RAM and died after 30 minutes.
  • MSXSL – crashed after 15 minutes.
  • Saxon SA – died immediately with a class cast exception.
  • Saxon-B – gobbled 300 MB of RAM and successfully finished processing in 218 minutes.

After another 15 minutes of work on the stylesheet, Saxon processed it in about an hour. Still not good. The reason is the enormous number of cross-references, each of which requires iterating over the whole document to find an exact match.
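To illustrate why those cross-references hurt so much, here is a minimal Python sketch (not the stylesheet itself) contrasting a full scan per reference with a lookup table built once. The sample XML is a hypothetical, heavily simplified JMdict-like fragment, and treating an `xref` as a bare headword is an assumption made for brevity. In XSLT 2.0 the usual equivalent of the index is `xsl:key` with the `key()` function.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JMdict-like sample; here an xref is
# assumed to be just a headword (kanji or reading) of another entry.
SAMPLE = """<JMdict>
  <entry><ent_seq>1</ent_seq><k_ele><keb>犬</keb></k_ele><r_ele><reb>いぬ</reb></r_ele>
    <sense><xref>猫</xref></sense></entry>
  <entry><ent_seq>2</ent_seq><k_ele><keb>猫</keb></k_ele><r_ele><reb>ねこ</reb></r_ele>
    <sense><xref>いぬ</xref></sense></entry>
</JMdict>"""


def headwords(entry):
    """All kanji (keb) and reading (reb) spellings of an entry."""
    return [e.text for e in entry.iter() if e.tag in ("keb", "reb")]


def resolve_linear(entries):
    # O(refs * entries): every xref rescans the whole document.
    result = {}
    for entry in entries:
        seq = entry.findtext("ent_seq")
        for xref in entry.iter("xref"):
            for candidate in entries:
                if xref.text in headwords(candidate):
                    result.setdefault(seq, []).append(candidate.findtext("ent_seq"))
                    break
    return result


def resolve_indexed(entries):
    # Build the index once (O(entries)); each lookup is then O(1) on average.
    index = {}
    for entry in entries:
        for word in headwords(entry):
            index.setdefault(word, entry.findtext("ent_seq"))
    result = {}
    for entry in entries:
        seq = entry.findtext("ent_seq")
        for xref in entry.iter("xref"):
            target = index.get(xref.text)
            if target is not None:
                result.setdefault(seq, []).append(target)
    return result


entries = ET.fromstring(SAMPLE).findall("entry")
print(resolve_linear(entries))   # {'1': ['2'], '2': ['1']}
print(resolve_indexed(entries))  # {'1': ['2'], '2': ['1']}
```

Both functions give the same answer; the difference is that the indexed version touches each entry a constant number of times, while the linear one rescans every entry for every reference.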

By then I had already buried the idea of loading the data with inserts; we'll use bulk load instead, so another optimization round is due tomorrow. Still, out of curiosity, I tried importing the 'insert-like' data. It took approximately 6 hours :)
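A rough sketch of the bulk-load direction, under assumptions: the target database isn't named in the post, so `sqlite3` stands in for it, and the table layout `entry(ent_seq, headword)` and the sample rows are hypothetical. The idea is for the transform to emit plain tab-separated data rather than one SQL statement per row, and then load it in a single pass. SQLite has no dedicated bulk loader, so `executemany` inside one transaction is the closest stand-in here; a server database's own bulk-load facility would replace `bulk_load`.

```python
import csv
import io
import sqlite3

# Hypothetical rows as a transform might produce them: (ent_seq, headword).
ROWS = [(1, "犬"), (2, "猫"), (3, "鳥")]


def as_tsv(rows):
    # Bulk-load friendly output: plain tab-separated data, no SQL per row.
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def bulk_load(conn, tsv_text):
    # Stand-in for a real bulk loader: one prepared statement, one transaction.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    with conn:  # commits once at the end (rolls back on error)
        conn.executemany("INSERT INTO entry (ent_seq, headword) VALUES (?, ?)", reader)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (ent_seq INTEGER, headword TEXT)")
bulk_load(conn, as_tsv(ROWS))
print(conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0])  # 3
```

Separating data from statements also means the file can be produced once and reloaded as often as needed without re-running the transform.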