Zend Form revisited

Posted in development on January 3rd, 2009 by admin

Some time ago I introduced declarative creation of Zend Form instances. But what should you do if you have two or more forms that share common parts? Yes, you can copy-paste form files, but adding an extension mechanism is much easier and much more elegant :)

You will need only a couple of changes to your code:

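// After array_merge_recursive(), an option defined in both parent and child
// becomes a list such as array(3, 6). Collapse every such list to its last
// entry so that the child's value wins.
function exclude_duplicates($var) {

    if( is_array($var) && count($var) > 0 ) {
        $cnt = count($var) - 1;
        // a plain 0..n list: keep only the last (most derived) value
        if( array_values($var) === $var ) {
            return exclude_duplicates( $var[$cnt] );
        }

        // an associative array: clean every value recursively
        $keys   = array_keys($var);
        $values = array_map( 'exclude_duplicates', array_values($var));
        return array_combine($keys, $values);
    }

    return $var;
}

class KB_Form extends Zend_Form {

    // element name => bo_method, filled in by addElement()
    protected $_bo_methods = array();

    public function setOptions(array $options) {
        $result = array();

        if( isset($options['extends']) ) {
            $form_name = $options['extends'];

            $parent_options = KB_Form_Factory::getInstance()->getFormConfig($form_name);
            if( $parent_options == null ) {
                throw new KB_Exception('Cannot obtain form '.$form_name.' config.', KB_Exception::KB_FORM);
            }
            // parent first, so the child's values end up last in every merged list
            $result = array_merge_recursive($parent_options->toArray(), $options);
            $result = array_combine(array_keys($result), array_map( 'exclude_duplicates', array_values($result) ) );
        } else {
            $result = $options;
        }

        if (isset($result['bo_class'])) {
            // ... bo_class handling not shown
        }

        // assumption: the merged options are handed to Zend_Form here
        parent::setOptions($result);

        return $this;
    }

    public function addElement($element, $name = null, $options = null) {
        parent::addElement( $element, $name, $options );

        if( is_array($options) && isset($options['bo_method']) ) {
            // assumption: the element name and bo_method come from the method arguments
            if( $element instanceof Zend_Form_Element ) {
                $name = $element->getName();
            }
            if( ! isset($this->_bo_methods[$name]) ) {
                $this->_bo_methods[$name] = $options['bo_method'];
            }
        }

        return $this;
    }
}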

And now, if you have a form named ‘Search_Index’, you can derive your new form from it by adding one simple line:

; general form meta information
search.kanji.extends = "Search_Index"

You can override properties from the parent form by simply adding them to your new file:

; override order
search.kanji.elements.submit.options.order = 6
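To see why the clean-up pass is needed, here is a minimal, self-contained sketch of the merge step. The `$parent` and `$child` arrays are made-up stand-ins for what the INI files would produce, and `exclude_duplicates()` is repeated only so that the snippet runs on its own:

```php
<?php
// Same collapsing logic as exclude_duplicates() in the post.
function exclude_duplicates($var) {
    if (is_array($var) && count($var) > 0) {
        if (array_values($var) === $var) {
            // plain list produced by the merge: the last entry is the child's value
            return exclude_duplicates($var[count($var) - 1]);
        }
        return array_combine(array_keys($var), array_map('exclude_duplicates', array_values($var)));
    }
    return $var;
}

// Hypothetical parent and child form options, as plain arrays.
$parent = array(
    'method'   => 'get',
    'elements' => array(
        'submit' => array('type' => 'submit', 'options' => array('order' => 3, 'label' => 'Search')),
    ),
);
$child = array(
    'extends'  => 'Search_Index',
    'elements' => array(
        'submit' => array('options' => array('order' => 6)),
    ),
);

// array_merge_recursive() does not overwrite: 'order' becomes array(3, 6).
$merged = array_merge_recursive($parent, $child);

// The clean-up pass turns array(3, 6) back into 6, the child's value.
$result = array_combine(array_keys($merged), array_map('exclude_duplicates', array_values($merged)));

echo $result['elements']['submit']['options']['order'], "\n"; // 6
echo $result['elements']['submit']['options']['label'], "\n"; // Search
```

One side effect worth knowing: any option that is legitimately a plain list is collapsed to its last entry as well, so this approach fits best when list-valued options are not needed.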

That’s it for today…

Porter2 and regexp kung-fu

Posted in development on December 23rd, 2008 by admin

To improve search quality I needed a stemming algorithm. Porter2 seemed to be the best choice. However, I realized that the only reference implementation that exists is written in Snowball.

Now I’ll be throwing stones at Snowball. I really cannot understand the people who handcrafted this language. Its unreadability can be compared to Perl, but the syntax and expressive possibilities are really limited.

Can you tell me for sure what this piece of code is doing?

[substring] among (
    'eed' 'eedly'
        (R1 <-'ee')
    'ed' 'edly' 'ing' 'ingly'
        (
            test gopast v delete
            test substring among(
                'at' 'bl' 'iz'
                    (<+ 'e')
                'bb' 'dd' 'ff' 'gg' 'mm' 'nn' 'pp' 'rr' 'tt'
                    ([next] delete)
                '' (atmark p1 test shortv <+ 'e')
            )
        )
)

So I finally sat down and implemented a PHP5 version of this algorithm.
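For the curious: the Snowball fragment above is step 1b of Porter2, the step that strips ‘eed’, ‘ed’, ‘ing’ and their ‘-ly’ variants. Below is a rough PHP sketch of that single step. It is not the implementation mentioned in this post; the `p2_*` function names are made up, and it skips the algorithm's prelude (the special ‘Y’ marking), the R1 exceptions for words such as ‘generous’, and the exception word lists.

```php
<?php
// Hypothetical helpers; a simplified sketch of Porter2 step 1b only.

function p2_is_vowel($ch) {
    return $ch !== '' && strpos('aeiouy', $ch) !== false;
}

// R1 starts after the first non-vowel that follows a vowel (or at the end of the word).
function p2_r1_start($word) {
    $len = strlen($word);
    for ($i = 1; $i < $len; $i++) {
        if (!p2_is_vowel($word[$i]) && p2_is_vowel($word[$i - 1])) {
            return $i + 1;
        }
    }
    return $len;
}

// Short syllable: non-vowel, vowel, non-vowel other than w/x at the end of the word,
// or a two-letter word made of a vowel followed by a non-vowel.
function p2_ends_in_short_syllable($word) {
    $len = strlen($word);
    if ($len === 2) {
        return p2_is_vowel($word[0]) && !p2_is_vowel($word[1]);
    }
    if ($len < 3) {
        return false;
    }
    $last = $word[$len - 1];
    return !p2_is_vowel($word[$len - 3]) && p2_is_vowel($word[$len - 2])
        && !p2_is_vowel($last) && strpos('wx', $last) === false;
}

function p2_step_1b($word) {
    $r1 = p2_r1_start($word);
    // Longest suffix first, which is what Snowball's "among" does.
    foreach (array('eedly', 'ingly', 'edly', 'eed', 'ing', 'ed') as $suffix) {
        $slen = strlen($suffix);
        if (strlen($word) < $slen || substr($word, -$slen) !== $suffix) {
            continue;
        }
        $stem = (string) substr($word, 0, -$slen);
        if ($suffix === 'eed' || $suffix === 'eedly') {
            // (R1 <- 'ee'): replace only if the suffix lies inside R1
            return (strlen($stem) >= $r1) ? $stem . 'ee' : $word;
        }
        // test gopast v delete: delete only if the stem contains a vowel
        if (!preg_match('/[aeiouy]/', $stem)) {
            return $word;
        }
        if (preg_match('/(at|bl|iz)$/', $stem)) {
            return $stem . 'e';                 // (<+ 'e')
        }
        if (preg_match('/(bb|dd|ff|gg|mm|nn|pp|rr|tt)$/', $stem)) {
            return substr($stem, 0, -1);        // ([next] delete)
        }
        if (p2_r1_start($stem) === strlen($stem) && p2_ends_in_short_syllable($stem)) {
            return $stem . 'e';                 // short word: add 'e'
        }
        return $stem;
    }
    return $word;
}

foreach (array('hopping', 'hoping', 'agreed', 'feed', 'luxuriated') as $w) {
    echo $w, ' -> ', p2_step_1b($w), "\n";
}
```

With these inputs the sketch gives hop, hope, agree, feed and luxuriate, which is also what the step's description calls for.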

Read more »

Zend Forms

Posted in development on December 4th, 2008 by admin

It’s been a while since my last Zend-related post. Now I’ll try to cover an approach to Zend_Form usage. This component has really extensive functionality. You can use Zend_Form in a straightforward manner by creating form elements in your controller’s code:

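protected function prepareForm() {
    $form = new Zend_Form();

    $query = new Zend_Form_Element_Text('query');

    // ...

    $submit = new Zend_Form_Element_Submit('submit');

    // assumption: the elements are attached to the form before it is returned
    $form->addElements(array($query, $submit));

    return $form;
}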

Read more »

VocabuLearn – is it really about vocabulary?

Posted in japanese on November 20th, 2008 by admin

Expanding your vocabulary is considered to be one of the most important and, I would say, most difficult tasks in language acquisition, especially if you’re living somewhere far from Japan. So you finally come across a product that offers you an easy and fun way to extend your vocabulary

offering approximately 7500 nouns, adjectives and adverbs, expressions and verbs over the three levels.

I’m talking about VocabuLearn Japanese Complete.

Well, if you decide to buy it, please be well aware of what you’re buying: essentially a list of chaotically arranged nouns, verbs and phrases. These words are pronounced in English and Japanese. That’s it.

No context, no usage examples, no topics, nothing… How am I supposed to remember 300 words in 10 minutes if all you do is read them one by one at 5-second intervals? Moreover, you hear annoying music in the background. I cannot really grasp the idea of the guys who created that piece of … media.

Summary: if you want to spend $50, go and rent 10 Japanese DVDs.


All Japanese All the Time

Posted in japanese on October 14th, 2008 by admin

Recently I came across a great blog that can make a breakthrough in your mind if you really, I mean REALLY, want to master Japanese. Please check it out, you won’t regret it.


BTW, this site explained to me why I couldn’t find Japanese movies or anime with Japanese subs, regardless of how hard I tried:

… the thinking in Japan’s movie industry has typically followed two distinct lines:

  1. Hearing-impaired people can go in the general direction of heck.
  2. Subtitles on foreign movies are not merely intended to repeat dialogue, but to convey, clarify and expound on dialogue — in other words, to pick up perceived slack in the audio translation

There are several heated discussions going on around his method: many people admire his way of learning the language, while others are quite skeptical. But IMHO you should read it yourself, analyze it and then…

Do whatever works for you.

Logo: “dragon book” allusion

Posted in kanjibox on October 14th, 2008 by admin

Some of you might know an amazing book by Alfred Aho, Ravi Sethi and Jeffrey Ullman, ‘Compilers: Principles, Techniques, and Tools’, which is often referred to as the “dragon book”. I would say that many people treat kanji the same way. Therefore we’ll get a pretty dragon as a logo for the site.

Toon Dragon - Modo Render (Draft).jpg

Zend & Smarty – ステップニ

Posted in development on October 13th, 2008 by admin

In this post I will reveal the secret of really smart integration between Zend and Smarty :) Since I’m too lazy to write 10 similar functions inside the Smarty plugin, I decided to modify the Smarty compiler. And it worked well.

Making Smarty zend-aware

This step is simple: you just add a function that allows you to call a Zend View helper from Smarty:

class KB_SmartyZendAware extends Smarty {
    private $_zendView                 = null;
    private $_cfg;

    public function __construct() {
        // compile templates with the Zend-aware compiler
        $this->compiler_class = 'KB_SmartyZendAwareCompiler';
    }

    public function setZendView(Zend_View_Abstract $view) {
        if( $view === null ) {
            throw new KB_Exception('Zend_View cannot be null.', KB_Exception::KB_SMARTY);
        }
        $this->_zendView = $view;
    }

    public function callZendViewHelper( $name, $method, $args ) {
        if( $this->_zendView === null || ! is_string($name) || strlen($name) == 0 ) {
            return '';
        }
        $helper = $this->_zendView->getHelper($name);

        if( ! is_string($method) || strlen($method) == 0 ) {
            // no method given: call the helper's main method, e.g. headScript()
            return call_user_func_array( array( $helper, $name ), $args);
        } else {
            // method given: call it on whatever the helper's main method returns
            return call_user_func_array( array( call_user_func( array($helper, $name) ), $method ), $args);
        }
    }
}
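The two branches of callZendViewHelper() are easier to follow with something runnable. The sketch below repeats the same dispatch logic as a standalone function and runs it against stub classes. `StubView`, `StubHeadTitleHelper` and `call_view_helper()` are invented for this illustration; in the real class the helper comes from the Zend view's getHelper().

```php
<?php
// Stub standing in for a view helper such as headTitle (hypothetical).
class StubHeadTitleHelper {
    private $parts = array();

    // Zend view helpers expose a method named after the helper itself.
    public function headTitle($part = null) {
        if ($part !== null) {
            $this->parts[] = $part;
        }
        return $this;
    }

    public function toString($separator = ' - ') {
        return implode($separator, $this->parts);
    }
}

// Stub standing in for the view: only getHelper() is modelled.
class StubView {
    private $helpers = array();

    public function getHelper($name) {
        if (!isset($this->helpers[$name])) {
            $this->helpers[$name] = new StubHeadTitleHelper();
        }
        return $this->helpers[$name];
    }
}

// Same dispatch as callZendViewHelper(), minus the Smarty class around it.
function call_view_helper($view, $name, $method, array $args) {
    $helper = $view->getHelper($name);
    if (!is_string($method) || strlen($method) == 0) {
        // helper called directly: headTitle('...')
        return call_user_func_array(array($helper, $name), $args);
    }
    // method called on the helper's return value: headTitle()->toString('...')
    return call_user_func_array(array(call_user_func(array($helper, $name)), $method), $args);
}

$view = new StubView();
call_view_helper($view, 'headTitle', '', array('KanjiBox'));
call_view_helper($view, 'headTitle', null, array('Search'));
$title = call_view_helper($view, 'headTitle', 'toString', array(' | '));

echo $title, "\n"; // KanjiBox | Search
```

Passing the method name separately is what makes chained calls of the form helper()->method(args) reachable from a template without writing one Smarty function per helper.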

Read more »

Zend & Smarty – ステップワン

Posted in development on October 9th, 2008 by admin

It’s been a while since I updated the blog, but things have been pretty busy lately…
So finally the application’s skeleton is in place. It uses Zend + Smarty. I’m also done with
parsing the Tanaka corpus. I can only say that APR is really easy to use.

Here I won’t explain in detail how to integrate Zend View and Smarty. This is actually
pretty easy. You may refer to a pretty old post by Ralf Eggert, ‘Integrating Smarty with the Zend Framework’, or read Quentin Zervaas’s ‘Practical Web 2.0 Applications with PHP’.

After you’ve nailed this, you’ll most likely start thinking about something more elaborate, capable of supporting Zend_View helpers such as headScript, doctype, etc. And most probably you will end up with a plugin class that maps Zend_View helpers to Smarty custom functions.

My code looked like this:

Read more »

Thinking on how-to learn (part 1)

Posted in japanese on September 25th, 2008 by admin

Kanji… Lists of kanji… For many people these are synonyms. And it’s quite natural for many Japanese learners to think of kanji as a long list of characters that should be indexed, graded and memorized. You will find lots of pre-cooked lists and most likely fall into the trap.

Flash card programs, paper flash cards, books like Heisig’s ‘Remembering the Kanji’, JLPT-based lists and 常用漢字 on top of it.

In my opinion, the worst problem with those pre-cooked lists is that beginners try to use them somehow in their studies. One sees the 常用漢字 list and thinks, ‘This list has grades and is arranged according to frequency of usage. So if I make flash cards and memorize 5 kanji a day, I will be able to learn them all in 13 months.’ Others go a little bit further: they take into account additional factors like the time required to revisit already learned symbols and summer vacation. Even then, this way of thinking leads only to frustration once you start trying to nail these characters down.

So, what many people don’t understand is that you have to be a real genius to memorize all 1945 characters absolutely without context. Even if you somehow managed to remember all the kanji from the list, you should be aware of the fact that they are not real words and you have no idea how to transform these pictograms into meaningful language primitives (I mean words, of course). You have no idea how to read them and how to use them.

Even if you put aside the fact that you cannot really use those ready-to-use kanji lists, what’s the usefulness of these kanji inventories? Let’s take 常用漢字 as an example. Why am I supposed to learn 「亜」 but not the kanji for the word “who” (誰)? Have you ever seen, even once, 「アジア」 and 「アメリカ」 written as 「亜細亜」 and 「亜米利加」? Moreover, this kanji comes FIRST in this list. Why can’t I find 「枕」 among these 1945 characters? Don’t you use pillows every single day of your life? But you definitely should know that 「斤」 means 1.32 lb.

So what’s the bottom line for this post? Throw away all your flash cards? No, I’m not advocating throwing your stuff out; I’m simply trying to say that we should always think of the list not as a goal but as an aid.

Saxon thrashes Altova…

Posted in development on September 22nd, 2008 by admin

This Sunday I was busy trying to optimize the data load process. In fact, I ended up completely rewriting the stylesheets. During this process I had a chance to compare the performance of the 2 XSLT processors I use: Altova XSLT and Saxon-B. The results were unexpected.

What you can find below is not a real benchmark. I simply took an average execution time calculated after 3 test runs on some of the stylesheets I use.

                                    Saxon-B [1] (compiled)   Saxon-B [2]   Saxon-SA   AltovaXSLT
Stylesheet #1 (input file: 45 MB)   11.2183                  11.156        11.484     108.031
Stylesheet #2 (input file: 1.5 MB)  3.296                    3.171         4.484      153.671
Stylesheet #3 (input file: 70 MB)   N/A                      77.453        N/A        ERR_OOM

[1] – Compiled stylesheet was used instead of raw XSLT. And these results lead us to an interesting conclusion: the popular assumption that a product that compiles to bytecode will necessarily be faster than an interpreter is WRONG. I will try to cover this topic in my next posts.

[2] – Settings for all Saxon runs are as follows: -l:off -dtd:off -tree:tiny

[3] – All results are in seconds

Well, in some cases Saxon, which is pure Java, is up to 48 times faster than Altova, which is pure C++. Moreover, Altova consumes enormous amounts of memory and fails to process relatively small files (approximately 45–70 MB) on a 32-bit machine, while Saxon uses around 300 MB regardless of the input file size.
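The factor of 48 can be checked against the table by dividing the AltovaXSLT time by the Saxon-B time for each stylesheet. A few lines of PHP do the arithmetic; the array layout and key names are just for this illustration, and the numbers are the averages from the table (the uncompiled Saxon-B column):

```php
<?php
// Average run times in seconds, copied from the table.
$timings = array(
    'Stylesheet #1' => array('saxon_b' => 11.156, 'altova' => 108.031),
    'Stylesheet #2' => array('saxon_b' => 3.171,  'altova' => 153.671),
);

$ratios = array();
foreach ($timings as $name => $t) {
    // how many times longer AltovaXSLT took than Saxon-B
    $ratios[$name] = $t['altova'] / $t['saxon_b'];
    printf("%s: %.1fx\n", $name, $ratios[$name]);
}
```

That works out to roughly 9.7x for stylesheet #1 and 48.5x for stylesheet #2. Stylesheet #3 has no ratio, since AltovaXSLT ran out of memory on it.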

So right now the dictionary is processed in 77 seconds and loaded into the DB in less than 5 seconds. Not bad I think…