Dynamically generate sitemap.xml

sitemap.xml is a top-level document on your website “for webmasters to inform search engines about pages on their sites that are available for crawling.”  Not surprisingly, Google has its own documentation on how to improve your site’s visibility using sitemap.xml.

Typically sitemap.xml is a static file that is generated by hand.  On large sites it makes more sense to generate it dynamically, for example on demand using a servlet.  Here is my simple solution.  I did not include the implementation of outputPages(), which would call outputPage() once per page, since that will be specific to each application server’s DB hierarchy or web server’s file structure.

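import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SiteMap extends HttpServlet {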

  protected static final String MIME_TYPE_XML = "application/xml";

  // XML tags
  protected static final String SITE_MAP_XML_INFO = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
  protected static final String SITE_MAP_BEGIN =
      "<urlset\n\txmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"\n\txmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n\txsi:schemaLocation=\"http://www.sitemaps.org/schemas/sitemap/0.9\n\t\thttp://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd\">";
  protected static final String SITE_MAP_END = "</urlset>";

  protected static final String LOC_BEGIN = " <loc>";
  protected static final String LOC_END = "</loc>";
  protected static final String PRIORITY_BEGIN = " <priority>";
  protected static final String PRIORITY_END = "</priority>";
  protected static final String URL_BEGIN = "<url>";
  protected static final String URL_END = "</url>";

  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {

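    // set content type to be XML, encoded as UTF-8 to match the XML declaration
    // (the encoding must be set before getWriter() is called)
    response.setContentType(MIME_TYPE_XML);
    response.setCharacterEncoding("UTF-8");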

    // get writer
    PrintWriter out = response.getWriter();

    // output header
    out.println(SITE_MAP_XML_INFO);
    out.println(SITE_MAP_BEGIN);

    // output pages
    outputPages(request, out);

    // output end
    out.println(SITE_MAP_END);
    out.close();
  }

  protected void outputPage(String uri, String priority, PrintWriter out, String urlStart) {
    out.println(URL_BEGIN);
    out.println(LOC_BEGIN + urlStart + uri + LOC_END);
    out.println(PRIORITY_BEGIN + priority + PRIORITY_END);
    out.println(URL_END);
  }
}
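
The servlet leaves outputPages() to the reader. As a rough sketch of what the page loop could look like, the standalone class below writes the <url> entries for a hypothetical in-memory map of URI to priority. The names SitemapEntries, writeEntries and escapeXml, the example.com host, and the sample pages are all assumptions for illustration and are not part of the original servlet; a real implementation would read its pages from the database or file system. The sketch also entity-escapes each URL, which the sitemap protocol requires for characters such as &.

```java
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original servlet.
final class SitemapEntries {

  // Escapes the five characters that must be entity-escaped in sitemap URLs.
  static String escapeXml(String s) {
    return s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;")
        .replace("'", "&apos;");
  }

  // Writes one <url> element per page; pages maps URI -> priority.
  static void writeEntries(PrintWriter out, String urlStart, Map<String, String> pages) {
    for (Map.Entry<String, String> page : pages.entrySet()) {
      out.println("<url>");
      out.println(" <loc>" + escapeXml(urlStart + page.getKey()) + "</loc>");
      out.println(" <priority>" + page.getValue() + "</priority>");
      out.println("</url>");
    }
  }

  public static void main(String[] args) {
    // Sample pages, assumed for this example only.
    Map<String, String> pages = new LinkedHashMap<>();
    pages.put("/", "1.0");
    pages.put("/search?q=java&page=2", "0.5");

    PrintWriter out = new PrintWriter(System.out, true);
    writeEntries(out, "http://www.example.com", pages);
  }
}
```

In the servlet, outputPages() could do the same thing by calling outputPage() for each page it finds, passing the scheme and host as urlStart.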

Then you configure web.xml to use the SiteMap servlet.

<servlet>
    <servlet-name>sitemap</servlet-name>
    <servlet-class>com.upromise.olm.app.servlet.SiteMap</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>sitemap</servlet-name>
    <url-pattern>/sitemap.xml</url-pattern>
</servlet-mapping>

Removing a Cookie

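To remove a cookie the Java API suggests getting the cookie, setting its maxAge to 0, and then adding that cookie to the response.  Digging deeper I realized you also need to set the domain and the path to match the domain and path the cookie was created with.  The browser sends back only each cookie’s name and value, so the Cookie objects returned by request.getCookies() do not carry the original domain or path.  Here is an example of how to do this.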

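    Cookie[] cookies = request.getCookies();
    // getCookies() returns null when the request carries no cookies
    if (cookies != null) {
      for (Cookie cookie : cookies) {
        if (cookie.getName().equals(COOKIE_WE_WANT)) {
          cookie.setMaxAge(0);
          // domain and path must match the values used at creation
          cookie.setDomain(".betweengo.com");
          cookie.setPath("/");
          response.addCookie(cookie);
          break;
        }
      }
    }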

Note that if the domain was not set when the cookie was created then you should not set it when you remove the cookie. The same applies to the path. For example, if the domain was not set at creation then the code would look like this:

    Cookie[] cookies = request.getCookies();
    // getCookies() returns null when the request carries no cookies
    if (cookies != null) {
      for (Cookie cookie : cookies) {
        if (cookie.getName().equals(COOKIE_WE_WANT)) {
          cookie.setMaxAge(0);
          cookie.setPath("/");
          response.addCookie(cookie);
          break;
        }
      }
    }
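
It can help to see what the browser actually receives. The sketch below builds by hand roughly the Set-Cookie header that such a removal produces, to show why the domain and path matter: the browser only discards a stored cookie whose name, domain and path all match. The class and method names (CookieRemovalHeader, removalHeader) are hypothetical, and the exact header text your servlet container emits may differ.

```java
// Hypothetical illustration; a servlet container generates this header for you.
final class CookieRemovalHeader {

  // Builds a Set-Cookie value that expires the named cookie.
  // domain and path may be null, in which case the attribute is omitted.
  static String removalHeader(String name, String domain, String path) {
    StringBuilder header = new StringBuilder(name).append("=; Max-Age=0");
    // An Expires date in the past covers browsers that ignore Max-Age.
    header.append("; Expires=Thu, 01 Jan 1970 00:00:00 GMT");
    if (domain != null) {
      header.append("; Domain=").append(domain);
    }
    if (path != null) {
      header.append("; Path=").append(path);
    }
    return header.toString();
  }

  public static void main(String[] args) {
    // Cookie created with both a domain and a path.
    System.out.println(removalHeader("foo", ".betweengo.com", "/"));
    // Cookie created with a path only.
    System.out.println(removalHeader("foo", null, "/"));
  }
}
```

If the removal seems to have no effect, comparing the Set-Cookie header from the response that created the cookie with the one that removes it is usually the quickest way to spot a domain or path mismatch.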

Also make sure that you add the cookie to the response before the response has been committed.  Previously the above code was in a tag, but by then it was too late to modify the response.  I moved the code to a filter and then it worked fine.

Finally, you can also remove a cookie in JavaScript. The downside is that this happens only after the page has loaded, but it’s definitely helpful for testing. Here’s an example of deleting the cookie named “foo”.

document.cookie = 'foo=;expires='+new Date(0).toUTCString()+';';

In the above example I did not set the path or the domain. You will need to add them if the path and/or domain were set when the cookie was created.

ServletException root cause

Java’s Throwable class defines the getCause() method for accessing the cause of the exception.  This method returns a Throwable object which itself could have a cause.  By traversing down this chain you can find the root cause of an exception.
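
As a minimal sketch of that traversal, using only the standard Throwable API, the helper below follows getCause() until it returns null. The class and method names (RootCause, rootCauseOf) and the sample exceptions are made up for this example.

```java
// Hypothetical helper showing plain Throwable.getCause() traversal.
final class RootCause {

  // Follows getCause() until it returns null and returns the last Throwable seen.
  // An exception without a cause is its own root cause.
  static Throwable rootCauseOf(Throwable ex) {
    Throwable root = ex;
    while (root.getCause() != null) {
      root = root.getCause();
    }
    return root;
  }

  public static void main(String[] args) {
    Throwable inner = new IllegalStateException("db down");
    Throwable outer =
        new RuntimeException("request failed", new RuntimeException("query failed", inner));
    System.out.println(rootCauseOf(outer).getMessage()); // prints "db down"
  }
}
```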

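However, ServletException also defines its own getRootCause() method, which predates the exception chaining that Throwable gained in Java 1.4.  With older versions of the Servlet API getCause() does not necessarily return the same exception as getRootCause(), so when trying to determine the root cause of an exception in a J2EE environment you have to check what type of exception you have.  I do this in the following code.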

  /**
   * Logs all the nested exceptions for the specified exception.
   *
   * @param ex the exception
   */
  protected void logNestedExceptions(Throwable ex) {
    int count = 1;
    Throwable cause = getCause(ex);
    while (cause != null) {
      logger.error("Nested Exception " + count, cause);
      cause = getCause(cause);
      count++;
    }
  }

  /**
   * Gets the cause of the exception.
   *
   * @param ex the exception
   * @return the cause
   */
  protected Throwable getCause(Throwable ex) {
    Throwable cause;
    if (ex instanceof ServletException) {
      ServletException sex = (ServletException) ex;
      cause = sex.getRootCause();
    }
    else {
      cause = ex.getCause();
    }
    return cause;
  }
