Enhancement #512

Enhancement #286: Make kune googleable/searchable using hash bangs #! instead of # in hashs

Page static generation via htmlunit perf issues

Added by Pablo Ojanguren over 11 years ago. Updated over 11 years ago.

Status: New
Start date: 03/04/2013
Priority: Normal
Due date:
Assignee: Pablo Ojanguren
% Done: 0%
Category: Server side
Target version: Unplanned
Resolution:
Tags:

Description

Generating static HTML pages for crawlers with htmlunit sometimes degrades server performance. We want to avoid this by controlling how and when htmlunit processes are executed.

Associated revisions

Revision 2be1421b
Added by Vicente J. Ruiz Jurado over 11 years ago

Added some MBean to Search Servlet (wip)

Revision 5b3965e2
Added by Vicente J. Ruiz Jurado over 11 years ago

Added some mbean methods to SearchEngineServletFilter (related to #512 #286 and #70)

History

#1 Updated by Pablo Ojanguren over 11 years ago

The htmlunit process is controlled from this servlet filter class:

cc.kune.core.server.searcheable.SearchEngineServletFilter

Its init method sets the thread configuration:

  public void init(final FilterConfig filterConfig) throws ServletException {
    this.filterConfig = filterConfig;
    cache = new Cache();
    executor = Executors.newFixedThreadPool(THREADS);
  }

We need to settle the following questions before we can design a solution:

  • WHEN should the server avoid launching htmlunit? Based on current server load %, current memory usage %...
  • WHAT should the server do in that case? Respond with HTTP 404? HTTP 500? I think crawlers are very sensitive to this!
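
One possible way to frame both questions, as a minimal sketch and not the actual kune code: give the executor a bounded queue so that a saturated server rejects new htmlunit jobs, and map a rejection to HTTP 503 (Service Unavailable), which signals a temporary condition. RenderGate, trySubmit and statusFor are hypothetical names, and the thread and queue sizes are placeholders. Gating on CPU load or memory usage would need extra checks that are not shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of kune): limits concurrent and queued render jobs. */
final class RenderGate {
  static final int SC_OK = 200;
  static final int SC_SERVICE_UNAVAILABLE = 503;

  private final ThreadPoolExecutor executor;

  RenderGate(final int threads, final int queueSize) {
    // Fixed number of threads plus a bounded queue. With AbortPolicy, submit()
    // throws RejectedExecutionException once all threads are busy and the
    // queue is full, instead of letting work pile up without limit.
    executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(queueSize),
        new ThreadPoolExecutor.AbortPolicy());
  }

  /** Returns a Future for the rendered page, or null when the gate is saturated. */
  Future<String> trySubmit(final Callable<String> renderJob) {
    try {
      return executor.submit(renderJob);
    } catch (final RejectedExecutionException e) {
      return null;
    }
  }

  /** Maps the scheduling outcome to the HTTP status the filter could send. */
  static int statusFor(final Future<String> scheduled) {
    return scheduled == null ? SC_SERVICE_UNAVAILABLE : SC_OK;
  }

  /** Stops the worker threads, e.g. from the filter's destroy() method. */
  void shutdown() {
    executor.shutdownNow();
  }
}
```

In a filter, a null result from trySubmit would translate into a 503 response, optionally with a Retry-After header, while an accepted job would be awaited and its HTML written to the response as before.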

#2 Updated by Vicente J. Ruiz Jurado over 11 years ago

Another thing I was thinking about (and tried without success) was keeping the htmlunit WebClient open and calling client.closeAllWindows() only when the servlet is destroyed, instead of on every request. I was trying to cache and make the page requests faster, but... maybe we have to try again.

By the way, since we are using the Cloudflare CDN, I've added this rule:
kune.cc/*escaped_fragment*
Cache level: Aggressive caching

#3 Updated by Vicente J. Ruiz Jurado over 11 years ago

  • Parent task set to #286

#4 Updated by Vicente J. Ruiz Jurado over 11 years ago

After our last conversation and your work on #70, I just added some MBean methods and also refactored this servlet a little. Soon (once this is installed on kune.cc) we will be able to debug this.
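
For readers unfamiliar with MBeans, here is a rough sketch of what exposing counters over JMX can look like. It is not the code from the revisions above: SearchFilterJmxSketch, RenderStats, its methods and the object name used in the usage note are all hypothetical, and only standard javax.management calls are used.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch (not the kune implementation) of JMX counters for a filter. */
final class SearchFilterJmxSketch {

  /** Standard MBean contract: the interface is named after the class plus "MBean". */
  public interface RenderStatsMBean {
    long getRenderCount();

    long getRejectedCount();

    void reset();
  }

  /** Thread-safe counters that the filter would update on each crawler request. */
  public static final class RenderStats implements RenderStatsMBean {
    private final AtomicLong renders = new AtomicLong();
    private final AtomicLong rejected = new AtomicLong();

    public void recordRender() {
      renders.incrementAndGet();
    }

    public void recordRejected() {
      rejected.incrementAndGet();
    }

    @Override
    public long getRenderCount() {
      return renders.get();
    }

    @Override
    public long getRejectedCount() {
      return rejected.get();
    }

    @Override
    public void reset() {
      renders.set(0);
      rejected.set(0);
    }
  }

  /** Registers the stats bean on the platform MBean server under the given name. */
  static ObjectName register(final RenderStats stats, final String name) {
    try {
      final ObjectName objectName = new ObjectName(name);
      final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      server.registerMBean(stats, objectName);
      return objectName;
    } catch (final Exception e) {
      throw new IllegalStateException("Could not register MBean " + name, e);
    }
  }
}
```

After a call such as SearchFilterJmxSketch.register(stats, "cc.kune.sketch:type=RenderStats"), a JMX client like jconsole could read the RenderCount and RejectedCount attributes and invoke reset() on the running server.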
