Porter2 and regexp kung-fu

Posted in development on December 23rd, 2008 by admin

To improve search quality I needed a stemming algorithm, and Porter2 seemed to be the best choice. However, I realized that the only reference implementation is written in Snowball.

Now I’m going to throw some stones at Snowball. I honestly can’t understand the people who handcrafted this language. Its unreadability rivals Perl’s, yet its syntax and expressive power are very limited.

Can you tell me for sure what this piece of code is doing?


[substring] among (
    'eed' 'eedly'
        (R1 <-'ee')
    'ed' 'edly' 'ing' 'ingly'
        (
        test gopast v delete
        test substring among(
            'at' 'bl' 'iz'
                (<+ 'e')
            'bb' 'dd' 'ff' 'gg' 'mm' 'nn' 'pp' 'rr' 'tt'
                ([next] delete)
            '' (atmark p1 test shortv <+ 'e')
        )
    )
)

So I finally sat down and implemented a PHP5 version of this algorithm.
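For comparison, here is a minimal sketch of what that Snowball fragment (Step 1b of Porter2) does, written with PHP regular expressions. It is not the implementation from the full post. It assumes a lowercase word in which earlier Porter2 steps have already marked consonant y as `Y`, it ignores the special R1 prefixes (such as gener-) that the full algorithm defines, and the function names `porter2_r1`, `porter2_ends_short_syllable` and `porter2_step1b` are hypothetical. The typed signatures and array destructuring need PHP 7.1 or later.

```php
<?php
// Sketch of Porter2 Step 1b only; NOT a complete stemmer.
// Assumption: input is lowercase, consonant y already marked as 'Y',
// and the special R1 prefixes (gener-, ...) are not handled.

// R1 starts after the first non-vowel that follows a vowel.
function porter2_r1(string $word): int
{
    if (preg_match('/^.*?[aeiouy][^aeiouy]/', $word, $m)) {
        return strlen($m[0]);
    }
    return strlen($word); // no such position: R1 is empty
}

// Short syllable at the end: non-vowel + vowel + non-vowel other than w, x, Y,
// or a two-letter word made of a vowel followed by a non-vowel.
function porter2_ends_short_syllable(string $word): bool
{
    return preg_match('/(?:[^aeiouy][aeiouy][^aeiouywxY]|^[aeiouy][^aeiouy])$/', $word) === 1;
}

function porter2_step1b(string $word): string
{
    $r1 = porter2_r1($word);

    // The leftmost match of an end-anchored pattern is the longest suffix.
    if (!preg_match('/(eedly|ingly|edly|eed|ing|ed)$/', $word, $m, PREG_OFFSET_CAPTURE)) {
        return $word;
    }
    [$suffix, $pos] = $m[1];

    // 'eed' / 'eedly': replace by 'ee' only if the suffix lies in R1.
    if ($suffix === 'eed' || $suffix === 'eedly') {
        return $pos >= $r1 ? substr($word, 0, $pos) . 'ee' : $word;
    }

    // 'ed' / 'edly' / 'ing' / 'ingly': delete only if a vowel precedes the suffix.
    $stem = substr($word, 0, $pos);
    if (!preg_match('/[aeiouy]/', $stem)) {
        return $word;
    }

    if (preg_match('/(at|bl|iz)$/', $stem)) {
        return $stem . 'e'; // luxuriat -> luxuriate
    }
    if (preg_match('/(bb|dd|ff|gg|mm|nn|pp|rr|tt)$/', $stem)) {
        return substr($stem, 0, -1); // hopp -> hop
    }
    if ($r1 >= strlen($stem) && porter2_ends_short_syllable($stem)) {
        return $stem . 'e'; // short word: hop -> hope
    }
    return $stem;
}

foreach (['agreed', 'feed', 'hoping', 'hopping'] as $w) {
    echo $w, ' -> ', porter2_step1b($w), PHP_EOL; // agree, feed, hope, hop
}
```

The `eed` branch only rewrites when the suffix starts inside R1, which is why "feed" stays "feed" while "agreed" becomes "agree". The other branch mirrors the nested `among`: the first matching fix-up (`at`/`bl`/`iz`, then a double letter, then the short-word check) wins.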


Zend Forms

Posted in development on December 4th, 2008 by admin

It’s been a while since my last Zend-related post. This time I’ll try to cover one approach to using Zend_Form. The component is very feature-rich, covering form elements, validation, filtering and rendering. The most straightforward way to use it is to create the form elements directly in your controller’s code:


// Controller helper that builds the search form
protected function prepareForm() {
    $form = new Zend_Form();

    // Text input named "query"
    $query = new Zend_Form_Element_Text('query');
    $form->addElement($query);

    // ...

    // Submit button labelled "Search"
    $submit = new Zend_Form_Element_Submit('submit');
    $submit->setLabel('Search');
    $form->addElement($submit);

    return $form;
}
