Friday, October 18, 2013

JCR text search ignoring umlauts and accents

Problem.

The demo-project site in a default Magnolia installation comes with a search box in the top right corner.


If you search for something like "résumé", you would expect to get:

But instead you get:




Solution.

1. Configure Tomcat encoding.


To search for text with special characters, the first thing you need to do is configure the Tomcat encoding. The default Tomcat encoding is ISO-8859-1, which does not support some special characters. As a result, accents and umlauts won't be displayed correctly when the text is rendered.

This link explains how to enable UTF-8 encoding in your Tomcat configuration:
http://struts.apache.org/release/2.0.x/docs/how-to-support-utf-8-uriencoding-with-tomcat.html
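
To see why the encoding matters, here is a minimal sketch of what happens to a search term when it is decoded with the wrong charset. It assumes the browser sends the term percent-encoded as UTF-8, which is the usual case. The class and method names (EncodingDemo, decode) are made up for this illustration and are not part of Tomcat or Magnolia.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EncodingDemo {

    // Decode a percent-encoded query value using the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "résumé" as UTF-8 bytes, percent-encoded (assumed browser behaviour).
        String raw = "r%C3%A9sum%C3%A9";

        // Decoded as UTF-8, the two bytes C3 A9 become the single character é.
        System.out.println(decode(raw, "UTF-8"));

        // Decoded as ISO-8859-1, each byte becomes its own character (Ã and ©),
        // so the search term no longer matches the stored text.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

The second call shows the kind of garbled term the search ends up with when the container decodes the request with ISO-8859-1.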

2. Create a new analyzer (optional).


You will probably want people to get results even when they make a spelling mistake such as leaving out an accent. For example, a search for "camion" should return results containing either "camion" or "camión".

The following link explains step by step how to do it: http://docs.jboss.org/exojcr/1.12.13-GA/developer/en-US/html/ch-jcr-query-usecases.html#JCR.IgnoreAccentSymbols
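
The idea behind such an analyzer can be sketched with the standard library alone: fold accents away from both the indexed text and the query, then compare the folded forms. This is only an illustration of the concept, not the Lucene analyzer described in the link. AccentFolding, fold and matches are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFolding {

    // Split accented characters into base letter + combining mark (NFD),
    // drop the combining marks, then lower-case the result.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // A query matches when its folded form occurs in the folded text.
    static boolean matches(String indexedText, String query) {
        return fold(indexedText).contains(fold(query));
    }

    public static void main(String[] args) {
        // \u00f3 is the accented o in "camión".
        System.out.println(fold("cami\u00f3n"));                        // prints camion
        System.out.println(matches("El cami\u00f3n rojo", "camion"));   // prints true
    }
}
```

Because the same folding is applied on both sides, "camion" and "camión" find each other regardless of which one the user typed. Characters that have no accent decomposition (for example ß) are left as they are in this sketch.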