Wednesday, January 16, 2013

Importing tags with Groovy

In this post I use a Groovy script to import custom tags into Magnolia CMS. The script saves the tags in the Data module. You can use them to categorize pages, images and documents.

Clients often ask me how to import an existing collection of tags into Magnolia CMS. Local governments use tags such as taxes, transportation or schools to categorize their content. Travel websites use geographical terms like london, paris and bangkok to tag places to visit. Such vocabularies grow large over time, and creating them from scratch is a lot of work.

I wrote a Groovy script that imports tags from XML. The example below finds any Flickr tags that are related to the given tag london. You can customize the script to import your own tags. I am not sure why I had not used Groovy much before. Once I tried it I loved it! You will too when you see how simple it is to implement a tag importer.



Requirements
  • Magnolia CMS 4.5.7 with Groovy module. It's in the add-ons folder in the bundle.
  • Flickr API key. Register for an API key in order to call the Flickr API.

Creating a Groovy script
  1. In Magnolia CMS, go to Tools > Scripts and add a new Groovy script.
  2. Paste the following code into your script.
  3. Check the Is a script? box.
import info.magnolia.cms.core.ItemType
import info.magnolia.cms.util.ContentUtil
import info.magnolia.context.MgnlContext

tag = 'london'
hm = MgnlContext.getHierarchyManager('data')

// Create a folder in the Data module
flickrTags = ContentUtil.createPath(hm, "/categorization/${tag}", new ItemType("dataFolder"))
hm.save()
itemType = new ItemType('category')

// Connect to Flickr REST API
URL url = new URL("http://api.flickr.com/services/rest/?method=flickr.tags.getRelated&api_key=5f26c50b6e67110809b117da0f2bb94f&tag=${tag}")

HttpURLConnection conn = (HttpURLConnection) url.openConnection()
conn.setRequestMethod("GET")
conn.setRequestProperty("Accept", "application/xml")

if (conn.getResponseCode() != 200) {
    throw new RuntimeException("Failed : HTTP error code : " + conn.getResponseCode())
}

// Parse the response XML and create one category node per related tag
rsp = new XmlSlurper().parse(conn.getInputStream())
rsp.tags.tag.each {
    content = ContentUtil.createPath(hm, "/categorization/${tag}/${it.text()}", itemType)
    content.name = it.text()
}

// Persist the new categories and release the connection
flickrTags.save()
conn.disconnect()
return "done"

Here's what the script does:
  1. Create a new folder in the Data module with the parent tag name.
  2. Connect to the Flickr REST API and request all tags related to the parent tag.
  3. Parse the resulting XML.
  4. Create categories under the previously created data folder.
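To check the parsing logic outside Magnolia, here is a minimal Java sketch of steps 2 to 4 that uses only the JDK's DOM parser. The class and method names (`FlickrTags`, `relatedTagsUrl`, `parseTags`, `categoryPaths`) are hypothetical. The sketch assumes the `flickr.tags.getRelated` response has the same `<rsp><tags><tag>` shape the Groovy script reads, and it only computes the Data module paths rather than creating nodes.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch: mirrors the Groovy script's steps with the JDK's DOM parser.
final class FlickrTags {

    // Builds the flickr.tags.getRelated request URL. Both values are URL-encoded
    // so that multi-word tags such as "new york" do not break the query string.
    static String relatedTagsUrl(String apiKey, String tag) {
        return "http://api.flickr.com/services/rest/?method=flickr.tags.getRelated"
                + "&api_key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&tag=" + URLEncoder.encode(tag, StandardCharsets.UTF_8);
    }

    // Returns the text of every <tag> element in a response such as
    // <rsp stat="ok"><tags source="london"><tag>uk</tag></tags></rsp>.
    // Throws if Flickr reports stat="fail" (for example, an invalid API key).
    static List<String> parseTags(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
        if (!"ok".equals(doc.getDocumentElement().getAttribute("stat"))) {
            throw new IllegalStateException("Flickr returned an error response");
        }
        NodeList nodes = doc.getElementsByTagName("tag");
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            tags.add(nodes.item(i).getTextContent().trim());
        }
        return tags;
    }

    // Same Data module layout as the script: /categorization/<parent tag>/<related tag>
    static List<String> categoryPaths(String parentTag, List<String> tags) {
        List<String> paths = new ArrayList<>();
        for (String t : tags) {
            paths.add("/categorization/" + parentTag + "/" + t);
        }
        return paths;
    }
}
```

Unlike the Groovy script, this sketch URL-encodes the tag and rejects responses whose `stat` attribute is not `ok`, which covers two of the error cases the script leaves out for brevity.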

Now the tags are available for categorizing content.



Bear in mind that, to keep it short, this code has no error handling. This solution has been developed and tested with Magnolia 4.5.7.

Thursday, October 25, 2012

Shorten URLs in Magnolia CMS

Does your website have an ugly root node that you want to hide from the site URL? Or maybe you have a page deep down in the hierarchy that should have a short and easy-to-remember URL? Both of these tasks are easy to do with URI to repository mapping in Magnolia CMS.

Hiding a site root node

Suppose your site URL is mysite.com/demo-project. To remove the unnecessary /demo-project node, map the URI to a particular branch of the site:
  1. Go to Site Definitions > /<your site>/mappings.
  2. Set the handlePrefix property to the path where content should be served from, in this case /demo-project.
  3. Set the URIPrefix to / (forward slash).


Now the ugly root node /demo-project is removed from the URL. Visitors can access the home page at the clean URL mysite.com. Subpages such as the About page are at mysite.com/about.
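The mapping above can be pictured as a simple prefix substitution. The following Java sketch is a hypothetical illustration, not Magnolia's implementation: `toHandle` strips the URIPrefix from the request URI and prepends the handlePrefix. It ignores details such as the `.html` extension that Magnolia also handles.

```java
// Hypothetical sketch of URI to repository mapping as a prefix substitution.
final class UriMapping {

    // Returns the repository path for the URI, or null if the URI is not under uriPrefix.
    static String toHandle(String uri, String uriPrefix, String handlePrefix) {
        // "/contact" must match "/contact" and "/contact/form", but not "/contactus"
        boolean matches = uriPrefix.endsWith("/")
                ? uri.startsWith(uriPrefix)
                : uri.equals(uriPrefix) || uri.startsWith(uriPrefix + "/");
        if (!matches) {
            return null;
        }
        String rest = uri.substring(uriPrefix.length()); // "about" for "/about" under "/"
        String base = handlePrefix.endsWith("/")
                ? handlePrefix.substring(0, handlePrefix.length() - 1)
                : handlePrefix;
        if (rest.isEmpty()) {
            return base.isEmpty() ? "/" : base;
        }
        return rest.startsWith("/") ? base + rest : base + "/" + rest;
    }
}
```

With URIPrefix `/` and handlePrefix `/demo-project`, `toHandle("/about", "/", "/demo-project")` yields `/demo-project/about`, which is the About page from the example.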

Shortening a URL

To provide a short URL to a page that resides deep in the hierarchy:
  1. Go to Site Definitions > /<your site>/mappings.
  2. Set the handlePrefix property to the branch of the site where content should be served from, in this case /demo-project/service/contact.
  3. Set the URIPrefix to /contact.

Now visitors can get to the Contact page just by typing mysite.com/contact.
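When a site has both mappings, a request for /contact matches the / mapping as well as the /contact one. The sketch below assumes that the most specific (longest) URIPrefix wins. `MappingSelector` is hypothetical, so check how your Magnolia version orders mappings before you rely on this behavior.

```java
import java.util.Map;

// Hypothetical: picks which mapping (URIPrefix -> handlePrefix) applies to a URI.
final class MappingSelector {

    // Returns the longest URIPrefix that matches the URI, or null if none matches.
    static String select(String uri, Map<String, String> mappings) {
        String best = null;
        for (String prefix : mappings.keySet()) {
            // Same boundary rule as a path: "/contact" does not match "/contactus"
            boolean matches = prefix.endsWith("/")
                    ? uri.startsWith(prefix)
                    : uri.equals(prefix) || uri.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best;
    }
}
```

For example, with the mappings `/` to `/demo-project` and `/contact` to `/demo-project/service/contact`, a request for `/contact` selects `/contact`, while `/about` falls back to `/`.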

Find more examples in URI to repository mapping.

NOTE: Community Edition users can achieve the same result by rewriting URLs with Apache:
http://wiki.magnolia-cms.com/display/WIKI/Rewrite+URLs+with+Apache 

Friday, July 20, 2012

Localized 404 errors

Have you ever needed custom 404 error pages for your Magnolia CMS website? Such a requirement is quite straightforward to implement if the site uses only one language. It gets more complicated when multiple languages are used, each with its own domain such as mysite.de and mysite.fr. In such a case, host-based virtual URI mapping is useful, as explained in How to set up a custom 404 handler.

But what if the languages are not host based? What if you don't have .de or .fr domains? The language of the page can be built into the URL such as mysite.com/de/page.html or mysite.com/fr/page.html. In this post I show how to direct the request to a custom 404 error page using a custom filter.
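To make the idea of a language built into the URL concrete, here is a small hypothetical Java helper that reads the language from the first path segment. In the filter further down, Magnolia already provides this through the aggregation state's locale, so the sketch only illustrates the URL convention. The set of supported languages and the default language are assumptions.

```java
import java.util.Set;

// Hypothetical: derives the language from URLs such as /de/page.html or /fr/page.html.
final class UrlLanguage {

    // Returns the first path segment if it is a supported language, else defaultLanguage.
    static String fromPath(String path, Set<String> supported, String defaultLanguage) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        int slash = p.indexOf('/');
        String first = slash >= 0 ? p.substring(0, slash) : p;
        // "/de.html" has no trailing slash; drop the extension so it still yields "de"
        int dot = first.indexOf('.');
        if (dot >= 0) {
            first = first.substring(0, dot);
        }
        return supported.contains(first) ? first : defaultLanguage;
    }
}
```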

Tip! I started by investigating whether I could configure the exceptions in the web.xml file. I thought the exception-type tag might be the answer. It turns out it doesn't work within the Magnolia CMS filter chain. In case you had the same idea, I will save you the trouble of going that route.

The solution is to write a custom filter to handle HTTP errors. The custom filter will extend the default AggregatorFilter. Replace the default class with your custom class in the CMS filter chain in Configuration > /server/filters/cms/aggregator.


The class code should look like this:
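import info.magnolia.cms.core.AggregatorFilter;
import info.magnolia.cms.util.RequestDispatchUtil;
import info.magnolia.context.MgnlContext;

import java.io.IOException;
import java.util.Locale;

import javax.jcr.AccessDeniedException;
import javax.jcr.RepositoryException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomAggregatorFilter extends AggregatorFilter {
    private static final Logger log = LoggerFactory.getLogger(CustomAggregatorFilter.class);

    @Override
    public void doFilter(HttpServletRequest request, HttpServletResponse response, FilterChain chain) throws IOException, ServletException {

        boolean success;
        try {
            // Resolve the requested content; false means nothing was found (the 404 case)
            success = collect();
        }
        catch (AccessDeniedException e) {
            // don't throw further, simply return error and break filter chain
            log.debug(e.getMessage(), e);
            if (!response.isCommitted()) {
                response.setStatus(HttpServletResponse.SC_FORBIDDEN);
            }
            // stop the chain
            return;
        }
        catch (RepositoryException e) {
            log.error(e.getMessage(), e);
            throw new ServletException(e.getMessage(), e);
        }
        if (!success) {
            log.debug("Resource not found, redirecting request for [{}] to 404 URI", request.getRequestURI());

            if (!response.isCommitted()) {
                // Pick the error page that matches the language of the requested URL
                Locale locale = MgnlContext.getAggregationState().getLocale();
                if (locale != null && "en".equals(locale.getLanguage())) {
                    RequestDispatchUtil.dispatch("permanent:/en.html", request, response);
                } else {
                    RequestDispatchUtil.dispatch("permanent:/de.html", request, response);
                }
            }
            else {
                log.info("Unable to redirect to 404 page, response is already committed. URI was {}", request.getRequestURI());
            }
            // stop the chain
            return;
        }
        chain.doFilter(request, response);
    }
}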

Note that the filter only reacts when the requested content cannot be found, that is, to 404 errors. It reads the language from the aggregation state. This is the same language as used in the URL, such as mysite.com/de/page.html. Based on the language, the filter makes a permanent redirect to the English or the German error page, both of which are regular pages of the website. On the error page you can display whatever content you like, send parameters, log statements and so on.

The filter issues a permanent (301) redirect. This is the SEO-friendly way: search engines will index the pages, while you can still keep track of the errors by setting them in the response as you see fit.

This is just a demonstration. It could be improved by adding the errors and URLs in the filter configuration, for example.
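One way to follow that suggestion is to keep the language-to-page pairs in a map that the filter reads from its configuration, instead of hard-coding `/en.html` and `/de.html`. The `ErrorPages` class is a hypothetical sketch. It returns the `permanent:` target string that the filter passes to `RequestDispatchUtil.dispatch`, and it uses a fallback page for languages without an entry, as the filter above does with German.

```java
import java.util.Map;

// Hypothetical: error pages configured per language instead of hard-coded in the filter.
final class ErrorPages {

    // Returns the dispatch target for the given language, falling back to
    // fallbackUri when the language is unknown or missing.
    static String target(Map<String, String> pagesByLanguage, String language, String fallbackUri) {
        // Map.of(...) rejects null lookups, so guard against a missing language first
        String uri = language == null ? null : pagesByLanguage.get(language);
        return "permanent:" + (uri != null ? uri : fallbackUri);
    }
}
```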

Want to learn more? Register now for Magnolia Conference 2012!

Tuesday, June 19, 2012

How To Export Website Content To Excel

Here is the thing: I have a report on my site that has grown quite big, and now it is not easy to review without scrolling up and down.

Well, how about exporting just that content to an Excel file, where we can chart the data, sort it by a specific column or do complex calculations? Here is a very fast way to produce such a file.

First, in our template file, we give a distinct id to our table and to any other data we need in the output, for example the title of the report:

<h1 id="toExcelReportTitle">Sample Report Title</h1> <table id="toExcelTable"> .... </table>

Then we need a button that, when clicked, sends a request to a servlet that does the export.



jQuery is used to load the HTML of the report table and the text of the report title into the parameters of the form. This way, when we click the save button, it sends these parameters to the servlet we created previously.

The servlet then reads these parameters and writes them to the response. Because the servlet sets the content type to application/vnd.ms-excel, the response is automatically opened as an Excel file.
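The servlet side can be sketched as follows. This is a hypothetical helper, not the code from the original post: it wraps the posted title and table HTML in a minimal HTML document and builds the header values. In a servlet you would pass `CONTENT_TYPE` to `response.setContentType(...)`, the disposition to `response.setHeader("Content-Disposition", ...)`, and write `body(...)` through `response.getWriter()`. The `charset` parameter and the `.xls` file name are assumptions.

```java
// Hypothetical helper for the export servlet; not the original post's code.
final class ExcelExport {

    // Content type named in the post; the charset parameter is an assumption.
    static final String CONTENT_TYPE = "application/vnd.ms-excel; charset=UTF-8";

    // Wraps the posted title and table HTML in a minimal document Excel can open.
    static String body(String title, String tableHtml) {
        return "<html><head><meta charset=\"UTF-8\"></head><body>"
                + "<h1>" + escape(title) + "</h1>"
                + tableHtml // already HTML, taken as-is from the request parameter
                + "</body></html>";
    }

    // Value for the Content-Disposition header so the browser offers a download.
    static String contentDisposition(String baseName) {
        return "attachment; filename=\"" + baseName.replace("\"", "") + ".xls\"";
    }

    // Minimal HTML escaping for plain-text values such as the report title.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because the table HTML comes straight from the request, it is written unescaped; only the title is escaped. Recent Excel versions may also warn that the file content does not match the `.xls` extension before opening it.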



And this is a sample of the resulting Excel file:




By exporting the content this way, we don't need to request the table page again: we just take the already rendered data and send it to the servlet, which sends it back as an Excel file.

Tuesday, June 12, 2012

Queries Logging

Want to know which queries are run and how long each of them took?
What about enabling the Jackrabbit query log? Once it is enabled, it prints lines like this:

org.apache.jackrabbit.core.query.QueryImpl: executed in 0.00 s. (select * from mgnl:user where jcr:path = '/system/superuser' or jcr:path like '/system/%/superuser')

There are two ways to enable it. The first is the easiest and does not need a server restart:
Go to Tools > Logging and set the level of the query class to DEBUG.


The second way is to extend the log4j.xml file so that it logs the queries. The advantage is that you can write the queries to a separate file, and the change won't be lost when you restart the server. However, you need to restart the server to see the new logging output.



Note that my log4j.xml is in webapp/WEB-INF/config/default/log4j.xml.
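Once queries are logged, a few lines of code can answer the "how long did each one take" question. This Java sketch assumes the message format of the sample line above (`executed in <seconds> s. (<query>)`). The `QueryLogLine` class is hypothetical, and it simply ignores lines it cannot match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for Jackrabbit's "executed in ... s. (...)" debug lines.
final class QueryLogLine {

    // Duration in seconds, then the query text inside the outer parentheses
    private static final Pattern EXECUTED =
            Pattern.compile("executed in ([0-9.]+) s\\. \\((.*)\\)\\s*$");

    final double seconds;
    final String statement;

    QueryLogLine(double seconds, String statement) {
        this.seconds = seconds;
        this.statement = statement;
    }

    // Returns the parsed line, or empty if the line is not a query timing message.
    static Optional<QueryLogLine> parse(String line) {
        Matcher m = EXECUTED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new QueryLogLine(Double.parseDouble(m.group(1)), m.group(2)));
    }

    // Parses all matching lines and sorts them so the slowest queries come first.
    static List<QueryLogLine> slowestFirst(List<String> lines) {
        List<QueryLogLine> parsed = new ArrayList<>();
        for (String line : lines) {
            parse(line).ifPresent(parsed::add);
        }
        parsed.sort(Comparator.comparingDouble((QueryLogLine q) -> q.seconds).reversed());
        return parsed;
    }
}
```

Feeding the lines of the query log file to `slowestFirst` puts the most expensive queries at the top of the list.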

Wednesday, May 30, 2012

How to set session timeout of Tomcat within Eclipse

Well, this is not directly related to Magnolia, but I find it useful anyway.

Usually, to set the session timeout, you go to your Tomcat installation's conf folder, open the web.xml file and change the default value (in minutes):

<session-config> <session-timeout>30</session-timeout> </session-config> 

Within Eclipse, go to the workspace folder and open Servers; inside the tomcat-config folder you will find the web.xml that Eclipse uses.

 Hope this helps.

Tuesday, May 29, 2012

Creating a PDF from Website Content

I recently had a requirement to convert website content to a PDF file. At first I thought it was going to be a lot of coding: I would have to make every page component look right on a printable document and generate a nice PDF file from the whole page.

I did not want to spend too much time. There must be an easier way. How else could the Save as PDF button in Safari do such a good job printing a page without knowing anything about my Web application?

Searching for a shortcut I came across the Flying Saucer Project:
Flying Saucer is an XML/CSS renderer, which means it takes XML files as input, applies formatting and styling using CSS, and generates a rendered representation of that XML as output. The output may go to a PDF file. -- Flying Saucer User's Guide
Very impressive work! Especially considering how much it can accomplish with very little code and a print style sheet.

Using the Flying Saucer library in a servlet to access the website anonymously worked fine. But when I tried to access Magnolia CMS resources as an authenticated user, everything stopped working. Fortunately, this was easily resolved with the callback class that the Flying Saucer Project offers. I must confess that I overlooked it at first. The callback class lets you decide not only how to access the website resources but also how to render images and styles.
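The key point is that resources must be requested as the logged-in user rather than anonymously. Here is a minimal sketch of that idea, assuming the servlet container's session cookie is called `JSESSIONID` (Tomcat's default) and that the PDF servlet runs in the same web application. The hypothetical `open` helper attaches the session id, which a servlet can read with `request.getSession().getId()`. In Flying Saucer, resource loading goes through the user agent callback, for example a subclass of `ITextUserAgent` that overrides `resolveAndOpenStream(String uri)`; check that hook against your Flying Saucer version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: requests a resource as the logged-in user, not anonymously.
final class AuthenticatedResources {

    // Assumes the container's session cookie is JSESSIONID (Tomcat's default).
    static String cookieHeader(String sessionId) {
        return "JSESSIONID=" + sessionId;
    }

    // Prepares a request that carries the user's session cookie. Nothing is sent
    // yet; the request goes out on connect() or getInputStream().
    static HttpURLConnection open(String resourceUrl, String sessionId) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(resourceUrl).toURL().openConnection();
            conn.setRequestProperty("Cookie", cookieHeader(sessionId));
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```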

The rest of the work such as which elements to display, where to display them, the page size and page breaks are specified in the style sheet using the W3C Paged Media syntax.



See the linked document for how the result looks when the About section of the demo project is printed, with all its pages in the same PDF document.

For this example, I used the standard print.css that comes with the demo-project theme-pop, adding just one style for images so that a page break does not split an image:

img { page-break-inside: avoid; }

Many thanks to the Flying Saucer Project developers for providing this amazing utility!