Forum:Source Code


This page lists the programs used by Search Wikia and links to their source code.

Some points

  • Beware that this is my understanding of the situation at present, so it might be incorrect.
  • Feel free to add anything to this page as appropriate.
  • This is not intended to completely explain the architecture, but rather to explain it sufficiently that people can understand what they are downloading.
  • This page could be promoted to a special page at some point if people find it useful.

(Taw 02:21, 13 January 2008 (UTC))


Architecture of Search Wikia

Search wikia consists of four separate functionalities:

  • Web Crawling
  • Indexing
  • Result presentation
  • Login and social network

Search wikia is designed so that these four functionalities are independent: it would be possible to replace any individual component whilst leaving the other three fixed. For example, commercial third parties could use Search Wikia's web crawling and indexing whilst presenting results using their own branded front-end.
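The independence of the components can be sketched as a small pipeline in which each stage is a plain function that can be swapped out. This is only an illustration: every name here (`fake_crawler`, `simple_indexer`, `plain_presenter`, `branded_presenter`, `search`) is hypothetical and none of it comes from the Search Wikia code base.

```python
# Illustrative only: each stage is a plain function, so any one of them
# can be replaced without touching the others.

def fake_crawler(urls):
    """Stand-in for a crawler: returns canned page text instead of fetching."""
    canned = {
        "http://example.org/a": "open source search",
        "http://example.org/b": "open index",
    }
    return {url: canned.get(url, "") for url in urls}


def simple_indexer(pages):
    """Map each term to the list of URLs whose text contains it."""
    index = {}
    for url in sorted(pages):
        for term in set(pages[url].lower().split()):
            index.setdefault(term, []).append(url)
    return index


def plain_presenter(query, urls):
    """Default front-end: a one-line text listing."""
    return f"{query}: " + ", ".join(urls)


def branded_presenter(query, urls):
    """A third party's own front-end over the same crawl and index."""
    return f"[ACME] {len(urls)} result(s) for '{query}'"


def search(query, seeds, crawl, index, present):
    """Wire the three stages together; each argument is replaceable."""
    pages = crawl(seeds)
    idx = index(pages)
    return present(query, idx.get(query.lower(), []))
```

Calling `search` with `branded_presenter` instead of `plain_presenter` changes only how the results look; the crawling and indexing stages are untouched.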

In addition to presenting the search results from the index, the Wikia Search frontend (result presentation) queries the Search wiki for mini articles and presents them next to the index search results. This effectively makes the Wiki a part of the search service, adding a fifth functionality.

Search queries are sent by the client (your browser) to http://alpha.search.wikia.com/. The HTTP response redirects the client to http://re.search.wikia.com/, which delivers the result.
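The redirect step can be observed from a script. The sketch below does not contact the Search Wikia hosts; it starts a throwaway local server whose `/entry` path plays the role of the entry host and whose `/results` path plays the role of the results host. Those paths, the handler, and the function names are assumptions made for the demonstration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in: /entry redirects, everything else returns a page."""

    def do_GET(self):
        if self.path.startswith("/entry"):
            # Role of the entry host: answer with a redirect
            self.send_response(302)
            self.send_header("Location", "/results")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Role of the results host: deliver the page
            body = b"results page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_following_redirect(url):
    """Fetch a URL; urllib follows the redirect and reports the final URL."""
    # An empty ProxyHandler makes sure the local request is not sent via a proxy
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.geturl(), resp.read()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_following_redirect(f"http://127.0.0.1:{port}/entry?q=wiki")
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo()` returns the final URL and the body, so the caller can see that the address it ended up at differs from the one it requested.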

Web crawling

This is the collection of web addresses and the caching of web pages on a server. This data can then be processed for other uses, in particular to create search results.

Indexing

This is the processing of the data provided by the web crawler to produce ordered sets of links for different search terms.
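As a toy illustration of "ordered sets of links per search term", the sketch below builds an inverted index and orders each term's links by how often the term occurs on the page. Ordering by raw term count is an assumption chosen for simplicity and is not claimed to be how the real index scores pages; `build_index` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_index(pages):
    """Map each term to a list of URLs, most frequent occurrence first.

    `pages` is a dict of {url: page text}, as a crawler might supply.
    """
    postings = defaultdict(list)
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        for term, n in counts.items():
            postings[term].append((n, url))
    # Sort by descending count, then by URL so ties are deterministic
    return {
        term: [url for _, url in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for term, entries in postings.items()
    }
```

Looking up a term in the returned dict gives the ordered list of links for that term; a term that never occurred is simply absent.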

Result presentation

This is the conversion of the data produced by indexing into an actual webpage that is presented to the user by their browser.

Login and Social Networking

This consists of allowing users to log in, create content such as mini-articles and forum posts, and otherwise interact with one another.

Search Wikia software

Currently the functionality is provided by the following programs. All of these are open source, or are soon to be released as open source.

Web Crawling

Web crawling is provided by the Grub server and client. Grub clients collect web content and forward it to the Grub server, which acts as a store. The Grub server is derived from a previously closed-source project that Wikia acquired. The source code for the server is available here. A lighter Perl Grub client has been developed and is available here.

Indexing

Indexing is provided by Nutch, an open source indexing program developed by the Apache Software Foundation. The current source code for Nutch can be found here.

Tech/Scoring and Configuration describes how Nutch has been configured. Tech/Open Index explains how to use the Search Wikia index(es) without using Wikia Search's search forms (by composing URLs yourself or by some code).

Result Presentation

Result presentation is provided by a set of dynamic HTML pages implemented using JavaScript. These run locally on the user's machine, collecting indexing results, formatting them, and presenting them. The source code for this is available here. It makes use of the Prototype JavaScript framework.

re.search also queries the Search wiki for a mini article. If it finds one, it presents its content next to the index search results.
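The combination of index results with an optional mini article can be sketched as follows. The real front-end is JavaScript running in the browser; this Python version only illustrates the logic, and `build_result_page` and `lookup_mini_article` are hypothetical names rather than anything from the actual source.

```python
def build_result_page(query, index_results, lookup_mini_article):
    """Combine index results with a mini article, if the wiki has one.

    `lookup_mini_article` is any callable that returns the article text
    for a query, or None when no mini article exists.
    """
    page = {"query": query, "results": list(index_results)}
    article = lookup_mini_article(query)
    if article is not None:
        # Only add the mini article when the wiki actually has one
        page["mini_article"] = article
    return page
```

For example, passing `{"wiki": "A wiki is an editable website."}.get` as the lookup yields a page with a `mini_article` entry for the query "wiki" and a page without one for any other query.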

Social Network and Login

The Social Network and Login is provided by foowi, a set of JSP pages and Java classes implemented within Wikia. These pages are to be run on Tomcat or a similar servlet container. The login system is entirely independent of the searching system, so in theory this component could be used by projects completely unrelated to search. This component relies on various open source Java components, including:

  • springboard - a Java library to automate SQL queries.
  • others to be included later.

The source for foowi is available here.

Retrieved from "http://search.wikia.com/wiki/Forum:Source_Code"

This page was last modified 21:14, 14 May 2008. GFDL