search:Mini article/Policy discussion
This page is to be used for the posting and discussion of ideas for the policy on mini articles and what they should contain. As the main namespace (the namespace with no prefix) is an alternative location for content, the question of what the main namespace is to be used for is also under consideration; see Search:Main namespace for one alternative use. Please post ideas below; these will be discussed until February 8, 2008. From then until February 15, 2008, various versions of the policy will be drawn up. The policy which best serves the users of Wikia Search will be selected after discussion, and the selection will take the form of a vote. Votes will be weighted by the degree to which voters support their opinion with reasons why a particular solution should be chosen, so be sure to explain how your suggestion helps the user of Wikia Search find the information they are looking for.
This timeline seems to be slightly out of date... I suggest people read and make concrete suggestions (bearing in mind the discussions made above) and these are then "voted upon" (i.e. agreed and disagreed with). If an idea is broadly supported with little opposition a week after it is suggested then it is approved. (Though this can be cancelled at a later date). (Taw 15:36, 15 February 2008 (UTC))
Capitalization - some suggestions
I assume most of us are aware of the problems: basically that unless the title of the mini article corresponds exactly to the search term as far as upper and lower case are concerned, the mini will not show up on the results page. To overcome this problem, I suggest the following:
- Encourage everyone to use lower case when creating mini articles. Instead of George Bush write george bush. The reason for this is that nearly everyone uses lower case when searching, even if they know proper names are usually capitalised.
- Adjust the search look-up to pick up mini articles in lower case even if the search words are capitalised. In other words, if you search for George Bush you should see the mini titled george bush.
- When linking to another mini article covering capitalised words in the running text, write the link with lower case first. An example would be Atlantic Ocean in the text. Write the link Atlantic Ocean.
- Similarly, if a keyword template is used to link the term to a mini article, the sequence Atlantic Ocean (edit) (which needs to be capitalised to preserve correct use of language in the article) should default to the mini atlantic ocean (and I mean atlantic space ocean not atlantic+ocean as is presently the case).
- Only when a real distinction needs to be made between an upper-case and a lower-case entry should capitals be used. For example, separate articles could be written on pin (as in needle) and PIN (as in ID) or on open office (as in office layout) as distinct from Open Office (the software package). A mini link here would specifically refer to the capitalised article if necessary Open Office.
I realise this is a quick and dirty approach but for the time being it is likely to give better results than the present experience of writing mini articles which do not turn up during real searches. Perhaps some method could be devised to ensure that all the currently capitalised headings are also copied into lower case together with their articles. In time, the system algorithms could be improved to ensure that the logic used in finding search results corresponds to that for displaying mini articles. I would be interested to see whether anyone else thinks this might work.--Ipigott 17:45, 12 March 2008 (UTC)
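The lookup rule suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Wikia Search actually resolves minis: the function name `find_mini` and the set of titles are hypothetical. An exact-case title wins (so pin and PIN can stay separate), and otherwise the lower-case title is used.

```python
def find_mini(query, minis):
    """Return the title of the mini article to show for a query, or None.

    `minis` is a hypothetical set of existing mini-article titles.
    """
    # An exact-case match wins, so "PIN" and "pin" can remain distinct articles.
    if query in minis:
        return query
    # Otherwise fall back to the lower-case title, as suggested above.
    lowered = query.lower()
    if lowered in minis:
        return lowered
    return None


minis = {"george bush", "pin", "PIN", "atlantic ocean"}
print(find_mini("George Bush", minis))
print(find_mini("PIN", minis))
```

With this rule a search for George Bush would show the mini titled george bush, while a search for PIN would still reach the capitalised article.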
Functions that user generated content can serve
- Information directly about topic. (e.g. you look up Harold Wilson and are told that he was a prime minister of the United Kingdom.)
- Linking to information about the topic (e.g. you look up IBM and are provided with a link to their website).
- Helping the search process itself (e.g. disambiguation, links to other pages of mini articles, links to other search terms.)
Questions
- Do we want to perform all these functions?
- Do we want all of these functions to be performed in the same wiki-page?
- Should there be standard sections for each mini-page for different functions?
Comments
RainerBlome: Q#1: Yes. We should allow it, but not require it. Q#2: Usually not all three of them at once, but I would not rule it out. Disambiguation usually implies "information directly about a topic", by adding some distinguishing piece of information for each meaning of the query. This is also a question of economy. Requiring separate articles for very short descriptions requires more resources: IT resources (storage, bandwidth etc.), writer's time, and, most importantly, reader's time. Q#3: It depends on the search term. Should not be required, but will be helpful in some contexts.
The minis should be fairly short (preferably less than 50 words) and should provide essential information about a search topic, including appropriate links to authentic sites and possibly Wikipedia. Disambiguation should only be addressed when there is a real problem in search. Comprehensive lists of all possible uses of a term should therefore be avoided. The minis should also be easily searchable themselves.-- Ipigott 08:15, 5 February 2008 (UTC)
Q#1 - I think #1 and #3 are useful. It will be very difficult to maintain linking to external sites in mini articles.
Q#2 - disambiguation should lead to information pages with only enough information to ensure they are searching in the right context. When there is a common use of a word, the default page could provide that context, while including (internal) links to the other contexts if the user so chooses.
Q#3 - I think strict formatting is necessary to ensure that we do not limit ourselves for any possible uses of the mini namespace by our engine. -- Bantab 03:06, 7 February 2008 (UTC)
Content of mini-articles
Length
- Should the length of an article be restricted?
- Ideas:
- Limit the size of mini articles to 50-200 words?
- taw: The size limits should be determined by looking at actual content.
- RainerBlome: No. Decide this case-by-case. Prefer to maintain the bulk of content elsewhere, for example in Wikipedia or dedicated wikis, and link to there. If there is no known good external place, just keep it here until a better place is found or created.
- Should this automatically be enforced (i.e. an entry word limit)?
- Arguments for:
- will automatically lead to conciseness.
- Arguments against:
- Will lead to content not being added.
- May lead to word-pruning games that result in bad English.
- Potentially causes conflict due to deletion of others' content.
Clear guidelines should be given on length. In many cases, 50 words could provide essential info. 200 words is probably too long, especially when links to other reliable sources are given. A restriction of 250 words may be worth experimenting with for a limited period. -- Ipigott 08:20, 5 February 2008 (UTC)
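As a rough illustration of how a length guideline could be checked without being hard-enforced, here is a small Python sketch. The thresholds of 50 and 250 words are simply the figures mentioned in this discussion, and the function names and return labels are hypothetical. It reports a status rather than rejecting the edit, which sidesteps the "content not being added" objection listed above.

```python
def word_count(text):
    """Count whitespace-separated words in a mini article."""
    return len(text.split())


def check_length(text, soft_limit=50, hard_limit=250):
    """Classify a mini article by length.

    The limits are assumptions taken from figures floated in the discussion,
    not agreed policy.
    """
    n = word_count(text)
    if n <= soft_limit:
        return "ok"
    if n <= hard_limit:
        return "long"
    return "too long"


print(check_length("Birds are winged animals of the class Aves."))
```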
The only information needed in the mini articles is enough information to let the user know that the search engine is searching in the right context. I can't imagine a case where this would take more than two sentences. I think it would be acceptable to go over if needed. However, I can't think of any case where a meaning would be so similar that it would require more than two sentences to describe, yet is different enough that the two contexts would result in entirely different search results. -- Bantab 03:14, 7 February 2008 (UTC)
- yes bantab, i think you are right. as short as possible sounds very good to me. i'd also like to regularly have internal links to something like "related searches", e.g. the mini-article on "moon" could go like this: "moon - natural satellite (create) revolving around a planet (create)" being very short and having a link to searches for planet and satellite perhaps? linking serps instead of other mini-articles would not mean losing but gain, because corresponding mini-articles show up on serps anyway hoopz 22:25, 24 April 2008 (UTC)
Media types
- Should media types be restricted?
- If so which should be allowed?
- Images
- Audio
- Video
- Should the display-size of images be restricted?
- If so should this be done programmatically?
At this stage, I think small images would normally be sufficient. Automatic downsizing of images might help. If audio or video become necessary, I suggest a linking mechanism to a Wikia Search media area. -- Ipigott 08:24, 5 February 2008 (UTC)
I agree, nothing more than a small (between 32x32 and 64x64) picture if absolutely necessary to distinguish context. Otherwise, as little media as possible, as it is generally not an integral part of searching. -- Bantab 01:21, 7 February 2008 (UTC)
Formatting
- How should mini-articles be formatted?
- What should the rules about text formatting be?
- What should the rules about sectioning be?
Ideas:
- Lift the Wikipedia rules as much as possible.
- What should change?
Avoid large headings. Allow bold type and italics. In general, cater to display of all important information in the mini displayed on the search results page. -- Ipigott 08:28, 5 February 2008 (UTC)
Agreement and further development
Mini articles will hopefully be used by both humans and the search engine to narrow down the context of a search. As such, not only will it need to be concise enough to be readily available to the searching user (as stated above) but it will also require well defined and strict formatting to aid in the use of this information by software. That is not to say that every mini article must have the same format, but the format for each type of article should be well defined.
Bold and italics could possibly be crucial to the establishment of context, and I agree that large headings would monopolize the limited space available on the search page.
Sections should be kept to a minimum. In fact, the only sections which I think would be useful would be ones which helped the search engine better classify the context of the particular search phrase meaning, allowing it to give better results once the context is chosen. For example, there may be a subsection in the mini article for dove which provides several contextual links for each of the two contexts. Once a context (e.g. the bird) is chosen, the search engine will then re-rank results incorporating the interconnectedness of target links with links under the subheading (e.g. http://www.audubon.org/).
Strike that thought. I will leave it there for reference, because the idea is ripe for manipulation. Linking directly would allow for small manipulations to go unnoticed. Instead, using categories and canonical links within levels of categories would better allow for monitoring and scalability.
It would seem that sections are nearly completely unnecessary. If a term must be split into various definitions, then a brief statement should link to the new page, somewhat like the way disambiguation pages work in wikipedia. However, unlike wikipedia, our larger disambiguation pages would need to be distributed hierarchically due to the small space. For an example, I will develop all of Lambert. Please look at these pages for an idea of what I am describing. The user's search page should show the appropriate context within zero to three clicks, and for that context will be a uniquely crawled result based on the categories of the term. Each category should have associated links which allow the engine to rank the interconnectedness of a crawled page to the context of the term. Categories would become a form of hierarchical tagging with an inherent pagerank-like system.
Please see the example created at Lambert and please give comments. -- Bantab 01:18, 7 February 2008 (UTC)
taw: I very much like the idea of using strict formatting and style rules to make the mini-articles machine-readable. I also think hierarchical disambiguation is a good idea - though care should be taken to avoid excessively deep hierarchies. (Taw 12:35, 7 February 2008 (UTC))
- I was contemplating the problem of creating excessively deep hierarchies, and while it would need to be a community decision, I could not think of a case where it would be necessary to have more than two levels (e.g. Lambert -> Lambert (science) -> Beer-Lambert law). Of course this is arbitrary, flexible, and only my opinion, but I think a community restriction on concision could extend to other aspects besides content writing. -- Bantab 14:26, 7 February 2008 (UTC)
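The two-level idea can be made concrete with a small Python sketch. The nested dictionary below is a hypothetical representation of the Lambert example, not the actual page structure, and `max_clicks` is an illustrative helper that reports how many clicks the deepest entry sits below the starting term.

```python
# Hypothetical hierarchy for the example above: each key is a mini-article
# title, each value holds the disambiguation pages it links down to.
hierarchy = {
    "Lambert": {
        "Lambert (science)": {
            "Beer-Lambert law": {},
        },
    },
}


def max_clicks(children):
    """Longest chain of clicks needed below a term with these children."""
    if not children:
        return 0
    return 1 + max(max_clicks(sub) for sub in children.values())


# Lambert -> Lambert (science) -> Beer-Lambert law is two clicks deep.
print(max_clicks(hierarchy["Lambert"]))
```

A community limit on depth could then be checked by comparing this number against whatever maximum is agreed.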
hoopz: i like what i see at Lambert quite a lot (a pity that the user would have to expand it), 'cause it's exactly what i'd need when looking for 'lambert' to decide in which direction to look further or/and, being as short as possible provides at the time enough information in case i didn't know what or who "lambert" was - which i didn't ;-) there's one thing in it all i don't understand yet, why couldn't we, to the benefit of the user and without losing the information about relations between mini-articles, directly link from the so called mini-article to search results instead of to other mini-articles, these showing up on the result page anyway? hoopz 23:05, 25 April 2008 (UTC)
Restriction of external links
- Should links be restricted in mini-articles?
Options:
- Only internal links.
- All links allowed.
- What should the criteria for the inclusion of a link be?
Comments
taw: I'm strongly in favour of allowing external links, because I think one of the main purposes of the mini-articles is to help people find content elsewhere on the web, and this is best done through providing links.
Bani: I've seen many people starting discussion pages on empty mini-articles saying Wikia Search results are not returning a site they think is a "must have", or that it is not high enough in the rank. So I think it should be explained somewhere that the discussion area is not the proper place to report that, with links to the Whitelist and to somewhere that explains the ranking system (when we have one).
Links are extremely useful in guiding users to key information resources and should not be restricted. There should however be clear guidelines on what is appropriate, i.e. basically one or two authoritative sources. The search:Canonical site guideline is a good example of the kind of link which is certainly not appropriate (particularly the second example which points to a blog). The article needs to be rewritten. --Ipigott 14:51, 5 February 2008 (UTC)
- Glad to see the guideline has now been rewritten.--Ipigott 08:56, 13 February 2008 (UTC)
I think we should also look to the integration of the software being designed for this project (take a look at my comments under formatting). Again, strict formatting will most likely be important, so I think only links internally would be appropriate. If we are determining what links are best for a particular term, this site will never be current. However, if we give canonical links for a topic in its category, then the crawler can use that information, along with other information, to determine the appropriateness of a crawled site in reference to the context of the search term. Regardless of whether you agree with this particular approach, external links are not scalable. -- Bantab 01:33, 7 February 2008 (UTC)
taw:
- I agree with your point about time-dependent links being a potential issue - and think it would be wise to worry about these links. I don't think all topics change that quickly. Specifically, I think canonical links tend to stay constant. But I may be wrong about this.
- I don't understand why internal links are more machine-readable than external links.
- I'm not clear about why external links should not be scalable. What parameter are they not scalable in? Could you clarify the precise argument for them not being scalable? One such argument might be:
- There are essentially a fixed number of editors.
- Editors continually create more content.
- The maintenance costs associated with external links are proportional to the number of links.
- Hence, as the amount of content increases the amount of work related with external links increases linearly whilst the potential amount of work done by editors remains constant.
But I'm doubtful about assertions 1 and 3. (Taw 12:55, 7 February 2008 (UTC))
- Bantab: I will respond to Taw's questions / comments point by point:
- I agree about canonical links being fairly constant (just a personal opinion), and there could be cases where there are very narrowly topical links which are also constant, such that it would only make sense to have those links on a page, and not in that page's category(ies). However, I think that this would be the exception, not the rule, and since using these links to help our search engine determine the interconnectedness of an article to a category would take a considerable amount of resources as it is (both in compute resources and in man-hour resources to make this efficient), I think it would make sense not to increase the complexity of the problem for a fairly small gain.
- Because internal links keep the search engine within our highly ordered pages.
- They are not scalable because this is not wikipedia. In wikipedia, there are a few motives for spamming and vandalism, like wanting attention and sociopathy, but there is no direct tangible gain from mass linking to external sites, and the community is thus able to deal with these few cases promptly.
- There are a growing number of editors
- There is an incentive to manipulate the engine (there is an entire industry for SEO)
- The manipulation of one editor who tries to use the engine for personal gain will be disproportionate to the altruistic editor's contribution (eg getting rid of the spam).
- Spam removal (of well planned spam) will require human oversight
- As spam removal will need to be performed manually, the maintenance costs associated with external links are proportional to the number of links.
- Hence, as the amount of content increases the amount of work related with external links increases geometrically whilst the potential amount of work done by editors grows linearly, thus the amount of work related to external links outpaces our ability to do that work.
- I think the one point that would be arguable would be the ratio of the number of new spam edits to the number of new moderating edits (as a function of the number of new editors), but if we have anything greater than one (which I think is the most likely case, given the profitability of this service), then our resources will be over-taxed.
- However, if we use categorical linking, while the number of categories will increase rapidly at first, there will be a point where new categories will slow to a trickle, allowing editors to maintain an essentially fixed number of categories, even though the number of spam articles may be growing exponentially. The best example would be in business. There are only a limited number of sub-sectors a business may be classified into, thus attempts to create new categories for new sub-sectors will be more easily monitored, and the inclusion of a business in a sub-sector would not necessarily mean a particular website would be favored in search results for that business, only that the business in question would now get results which favored the context of the new sub-sector.
- Also, I think this method will be more consistent in the quality of results. While it is not as easy as putting ibm.com directly into IBM's mini article, if we put *computercanonicalexample*.com into the category for computer manufacturers, the engine will be able to return results that are accurate for IBM as well as for ABS PC (I just picked an example I had not heard of), where ABS PC may go undeveloped past adding it to the computer manufacturer category. I hope that makes sense, and please understand that this is a quick and dirty example to show a point and may not be correct for this particular example. Let me know what you think, and what points I may have confused. -- Bantab 15:59, 7 February 2008 (UTC)
Related to your argument:
I agree with points 1, 2 and 4.
I'm not sure about point 3, since it is relatively easy to revert destructive edits, and one could imagine it would be possible to revert all the edits of a single user with one operation. Further, it isn't necessarily that difficult to ban a user. However this point isn't that relevant to the argument about scalability per se.
For point 5, I don't see why the maintenance cost is proportional to the total number of links. I'd have thought that it would be proportional to the number of newly created links, and link edits. You need only check a link once. (Unless people are changing the content on the other side of the link...).
For point 6, you never know - the number of editors might grow geometrically (grin). But I suppose the point is more that one has no idea what the rate of growth will be, and it would be inadvisable to assume geometric growth.
As I understand it, your point is that the maintenance cost for links is proportional to the total number of links, hence as more links are created the cost continually increases. Can you clarify this point slightly?
(Taw 19:28, 7 February 2008 (UTC))
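The disagreement over point 5 comes down to two different cost models, which a toy Python sketch can make explicit. Everything here is an assumption for illustration: a constant number of new links per week, one unit of work per link check, and the function name `weekly_workload`. It is not a measurement of anything on this wiki.

```python
def weekly_workload(new_links_per_week, weeks, recheck_all):
    """Return the checking workload for each week under one of two models.

    recheck_all=True  : every existing link must be re-verified each week,
                        so work is proportional to the total number of links.
    recheck_all=False : each link is checked once when created, so work is
                        proportional to the number of new links only.
    """
    workloads = []
    total_links = 0
    for _ in range(weeks):
        total_links += new_links_per_week
        workloads.append(total_links if recheck_all else new_links_per_week)
    return workloads


print(weekly_workload(10, 4, recheck_all=True))   # workload keeps growing
print(weekly_workload(10, 4, recheck_all=False))  # workload stays flat
```

Under the first model the workload grows without bound even at a steady rate of link creation; under the second it only grows if the rate of new links grows. Which model applies depends on whether linked pages change (or are spammed) after the first check.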
- I don't see why external links should be necessary in a mini-article where search results are good enough by themselves. I think we should avoid external links where possible and consider them as "desired first result" or "human set temporary authority over automatic results" where search results aren't good enough yet. They could also be anchors for crawling(?) again especially where search results are still poor hoopz 23:20, 25 April 2008 (UTC)
Namespaces
How many namespaces should exist and what should their purposes be?
What should the main space be used for?
Suggestions
- Use the mini namespace for brief explanations of what a search phrase means or about a subject and major alternative meanings.
- Use the main namespace for useful content about search terms or about a subject and alternative meanings. (This assumes the mini namespace is done away with)
- Do away with the mini namespace (possibly by hiding its existence). It is complicated and confusing to the user.
- Use the main namespace for detailed disambiguation and aid to the user of Wikia Search regarding how they might structure a search to find specific information. The mini namespace would continue to exist, but contain only a brief introduction to the search term and its meaning and disambiguation.
This is a wiki about searching
The main namespace should reflect that fact. The main namespace should be an encyclopedic reference on all aspects of searching. A goal of this wiki is to create an open search engine. To aid in this goal, we have the mini namespace. This namespace should be used as a tool to help users of the search engine correctly identify the context of their search. In that sense the mini namespace should not function as a reference on any topic, but only include enough information to allow the search engine or user to correctly and efficiently narrow down the results of a search. -- Bantab 21:54, 6 February 2008 (UTC)
taw: I'm not sure about the restriction of role of mini-articles to helping the search engine to find resources. I think that mini-articles could also be useful for storing answers to specific search terms - if the answers are sufficiently short. For example, if I search for "the population of spain" I think it is very useful if the mini-article tells me the population of spain.
More generally, I'd like to think of the role of a search engine to be "to answer questions" rather than "to search for resources". I think that the restriction of the role to "searching for resources" risks being arbitrary and unhelpful - though it certainly does make sense to concentrate one's efforts to ensure effectiveness. Hmmm, what I'm really trying to say is that arguments should be along the lines of "this is likely to make X worse and not make Y particularly better", rather than "this is for X, hence we shouldn't do Y." (Taw 13:13, 7 February 2008 (UTC))
- I agree that mini articles should contain information on the topic. I also think that including the population of spain in an article on the population of spain would be pertinent to the context. Likewise an article on conversion of Fahrenheit to celsius should include the conversion factor. However, my point is that a mini article on celsius should not include the conversion factor of Fahrenheit to celsius. More information than is needed to establish context will inevitably lead to more information than can be reasonably displayed on the results search page.
- For another example, I think a reasonable mini article for bird would be:
- Birds are winged animals of the class Aves.
- Compared to the introduction in wikipedia:
- Birds (class Aves) are bipedal, warm-blooded, vertebrate animals that lay eggs. There are around 10,000 living species, making them the most numerous tetrapod vertebrates. They inhabit ecosystems across the globe, from the Arctic to the Antarctic. Birds range in size from the 5 cm (2 in) Bee Hummingbird to the 2.7 m (9 ft) Ostrich.
- Modern birds are characterised by feathers, a beak with no teeth, the laying of hard-shelled eggs, a high metabolic rate, a four-chambered heart, and a lightweight but strong skeleton. All birds have forelimbs modified as wings and most can fly, with some exceptions including ratites, penguins, and a number of diverse endemic island species. Birds also have unique digestive and respiratory systems that are highly adapted for flight.
- Many species are of economic importance, mostly as sources of food acquired through hunting or farming. Some species, particularly songbirds and parrots, are popular as pets. Other uses include the harvesting of guano (droppings) for use as a fertiliser. Birds figure prominently in all aspects of human culture from religion to poetry to popular music. About 120–130 species have become extinct as a result of human activity since the 17th century, and hundreds more before then. Currently about 1,200 species of birds are threatened with extinction by human activities, though efforts are underway to protect them.
- I think that including the class above would be proper as it gives context (letting users know they are looking at the biological context of the word) and also gives commonly sought information, the taxonomic classification of the animal. On the other hand, I do not see any other information in the preceding three paragraphs which would aid in context or would be very important to a large number of searchers. Please let me know what you think -- Bantab 16:47, 7 February 2008 (UTC)
First, some formalism: I'll call the reason for carrying out a search the intent of a search. I think this might mean the same thing as context in your usage.
I think we're agreed that the inclusion of information about the search term directly within mini-articles is okay.
You raise an important question of what criteria should be used to decide whether one should include information. I see that all the information included in your second example could be relevant to certain intents associated with the search term birds.
I suppose the things that make it unreasonably long are:
- The probability that most of the query is relevant for a single intent is low.
- The amount of time taken for a user to parse the information is high. This means in practice the user will ignore the information.
- Because most of the information is not relevant to any single intent most users will ignore the mini-article - even if some of the information is relevant to the user's intent.
- Similar content is available elsewhere (i.e. wikipedia).
Conversely, if the information is very likely to be relevant to the intent of the search then the information should be included.
So perhaps the probability of relevance for search intent is the criterion we should be interested in.
One interesting question is whether it is reasonable to have a large mini-article if the search term is very specific. For example if someone wrote "show me the complete text for the raven by edgar allen poe" what should the mini-article contain? (Taw 19:12, 7 February 2008 (UTC))
The mapping between search terms and mini-articles
Should similar search keywords ever generate the same mini-article?
- Idea - Only when the search terms have a near identical intent. Otherwise we should create links to other mini-articles.
- Normalization - Only after some very simple syntax normalization rules, such as bon jovi is the same as "bon jovi" with the quotes for the mini article, even though the search results differ.
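A minimal Python sketch of the kind of "very simple syntax normalization" described above might look like the following. The exact rules are an assumption: here only surrounding double quotes and extra whitespace are removed, and the function name `normalize_query` is hypothetical. Case is deliberately left alone, since capitalization is discussed separately on this page.

```python
def normalize_query(query):
    """Map a raw search query to the key used to look up its mini article."""
    stripped = query.strip()
    # Drop one pair of surrounding double quotes: "bon jovi" -> bon jovi
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        stripped = stripped[1:-1]
    # Collapse runs of whitespace into single spaces.
    return " ".join(stripped.split())


# Both forms would show the same mini article, even though the search
# results themselves may differ.
print(normalize_query('"bon jovi"') == normalize_query("bon jovi"))
```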
How should this be done?
- Redirection commands
- Wiki content insertion {{article-name:}}
- Cut and paste insertion. (Motivation: Sometimes a particular search term might in practice want content that is slightly different from the similar term) (i.e. handled by people).
Should one search term ever have multiple mini-articles?
Example: Python can be interpreted as a snake, or as a programming language. Including all the information wanted in a single mini-article becomes unwieldy.
Options:
- Don't support this at all
- Disambiguation to different search terms. I.e. python contains a link "python (programming language)" which links to the search for "python programming language" which has a corresponding mini-article.
- Disambiguation to different mini-articles specified by search terms. I.e. python contains a link "python (programming language)" which when pressed changes the mini-article content to that for the search term "python programming language" but does not change the search.
- Breaking the one to one mini-article to search mapping. I.e. python contains a link to "python (programming language)" which when pressed changes the mini-article to that of a "python programming language" which doesn't correspond to any search term.
Comments
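For the second option, the step from a disambiguation label to the search it should trigger can be sketched as below. This is only an illustration under the assumption that the label uses the Wikipedia-style parenthetical form shown in the example; the function name `label_to_query` is hypothetical.

```python
import re


def label_to_query(label):
    """Turn a disambiguation label into the search terms it links to.

    'python (programming language)' -> 'python programming language'
    """
    # Replace parentheses with spaces, then collapse the whitespace.
    return " ".join(re.sub(r"[()]", " ", label).split())


print(label_to_query("python (programming language)"))
```

Because the result is itself a search term, the one-to-one mapping between searches and mini articles is preserved.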
Jer: #2 is my strong preference, and that the answer to this section is "no" that there is a one-to-one mapping, that a mini article only exists within a search result, period.
taw: I'd also favour #2 from the point of view of simplicity of use and engineering.
RainerBlome: #2 is the way to go. Regarding #4 "Breaking the one to one mini-article to search mapping": You mean "search to mini-article mapping", right? This mapping is in my opinion a big selling point of Wikia Search, so keep this mapping. It lets you cover specific searches in a unique way. See [1]. In the last sentence, I don't really understand "which doesn't correspond to any search term". Surely "python programming language" can correspond to a search term. Do you mean "which isn't directly linked to from a search results page for that term"?
taw: Probably - I mean that you don't have a surjection - or something that means the same but sounds less latin. What I meant was when you search for something, you have a page (a "search-article") consisting of links to mini-articles. Then either:
- the set of search-articles and the set of mini-articles are disjoint or;
- the set of search-articles is a proper subset of the set of mini-articles.
You could imagine this might be useful when two disambiguations are very similar - hence hard to distinguish by choosing a suitable search term without the search term becoming unwieldy. But, I think #2 is a better solution.
Linking to other mini-articles
What should clicking on a link to another mini-article do?
- Change the content of the mini-article section.
- Search for the mini-article name.
- Go to a separate mini-article wiki.
Comments
Go to the linked to mini article, plain and simple, to avoid needless complexity. --Rogerhc 07:19, 5 February 2008 (UTC)
The links should function as they normally do when at the wiki, but when they are displayed in the box above search results at the search page, I think they should definitely change the content, but I am unsure how to re-search in the new context. I would think either reload the page every time (expensive) or have a redo search button to click, so that users can update results once they find the appropriate context.
The page already uses dynamic content to provide the mini article, so I don't think that it would be technically that difficult. If we keep the edit button at the top like it is now, there is an easy way to get the article, and there should be no reason to look at the article outside of the search page, as all the mini article information should be included in the search page mini article box. -- Bantab 02:25, 7 February 2008 (UTC)
- Why should mini-articles link to other mini-articles at all? If they contain links, shouldn't these rather refer to other _searches_, thus avoiding the creation of a "mini-encyclopedia" consisting of mini-articles? hoopz 22:38, 24 April 2008 (UTC)
- you can use the Template {{k|name-of-your-linked-article|shown-name}} (displayed as shown-name (create)) for a link to other minis including a link to the search engine. Best Regards --Cy 09:46, 25 April 2008 (UTC)
- Okay, thanks, I began to do so, but I am still not sure how to handle this. I see most of the editors doing otherwise; maybe I'm wrong there? hoopz 23:34, 25 April 2008 (UTC)
Problem with REDIRECT and upper/lower case
In the general context of linking to mini articles, I would like to point out a problem with :#REDIRECT [[]]. Bantab has encouraged me to use REDIRECT rather than writing multiple minis on the same topic, e.g. newcastle upon tyne, newcastle on tyne, Newcastle on Tyne, etc., should all redirect to the existing mini Newcastle upon Tyne. If this worked properly as in Wikipedia, it would obviously be an advantage as (a) it would link straight into the master article and (b) any change in the master article would be automatically reflected in all the redirects. Unfortunately, at the moment if you try to redirect, the mini displayed on the search results page is
- REDIRECT mini:Newcastle upon Tyne
While Wikipedia experts and Wikia administrators will immediately recognize the meaning of this statement and would realize that if you click on "Full article" you do indeed obtain the master mini, I would guess that most of those users who are there to obtain useful search results will simply overlook the mini completely and go straight to the search results which may be far from satisfactory. It is for this reason that many of us have been creating several minis to address the same search term, especially if we are searching for an item and do not find a mini on it anyway. This often occurs as the system does not yet cater for upper and lower case equivalents for the same term. And as everyone knows, most people search in lower case only and would therefore seldom obtain any result from capitalised terms in minis whether initials like bbc or ibm (for BBC or IBM) or multiword place names or personal names such as washington dc for Washington DC or bill clinton for Bill Clinton. It would help everyone if these issues could be sorted out.--Ipigott 16:56, 7 February 2008 (UTC)
- One of the fundamental things we need to remember, however, is that this is an alpha version of the search, and right now, just as in the first days of Wikipedia, we should be creating a framework that will be efficient, effective, and scalable. While it would be easier to copy articles to multiple paths, giving mini articles that are readable by the re.search.wikia.com engine right now, that solution is neither efficient nor scalable in the long term, as each change for each article in the mini namespace would require changes to dozens of duplicate articles. Instead of messing with MediaWiki, which has been a very good wiki engine, we should change the results page code, as I think we can all agree this code is still in its infancy. For the time being, it won't be pretty, but I think it would be better to have an ugly search results page right now, when this site is admittedly not good, than to entrench ourselves in a scheme that would create thousands of unnecessary pages which would need to be changed once we have fixed the code. Remember, this is alpha... nowhere near ready for release. There is already a bug notice on the bug board, and hopefully I will have time to look at the code soon so that we can fix this bug. Let me know what you think. -- Bantab 18:26, 7 February 2008 (UTC)
- I think a public facing site by any other name is still earning a reputation. Keeping users in limbo kills applications by reputation. So this redirect bug that looks as if it could be fixed today might be better fixed today than tomorrow. Same for the dead footer links. This is first impression time for Wikia Search; it has already launched. It might do well to progress at Internet speed. Cheers, --Rogerhc 18:46, 8 February 2008 (UTC)
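As a rough illustration of the fix discussed in this thread, the results page could follow a redirect to the master mini before displaying anything, instead of showing the raw REDIRECT line. The sketch below is only an assumption about how that might look: the function name `resolve_mini`, the in-memory `pages` dictionary, and the hop limit are hypothetical and are not taken from the actual Wikia Search or MediaWiki code.

```python
import re

# Matches redirect text such as "#REDIRECT [[mini:Newcastle upon Tyne]]".
# The optional "mini:" prefix is an assumption based on the example above.
REDIRECT_PATTERN = re.compile(r"^\s*#REDIRECT\s*\[\[(?:mini:)?([^\]|]+)", re.IGNORECASE)


def resolve_mini(title, pages, max_hops=5):
    """Follow redirects starting at title and return (final_title, text).

    pages is a hypothetical dict mapping mini titles to their wiki text.
    text is None if the page is missing, a redirect loop is found, or the
    chain is longer than max_hops.
    """
    seen = set()
    for _ in range(max_hops + 1):
        text = pages.get(title)
        if text is None or title in seen:
            return title, None
        seen.add(title)
        match = REDIRECT_PATTERN.match(text)
        if match is None:
            # Ordinary mini: this is the content to show on the results page.
            return title, text
        title = match.group(1).strip()
    return title, None


pages = {
    "newcastle on tyne": "#REDIRECT [[mini:Newcastle upon Tyne]]",
    "Newcastle upon Tyne": "Newcastle upon Tyne is a city in north-east England.",
}
final_title, text = resolve_mini("newcastle on tyne", pages)
```

With something like this in place, only the master mini would need editing, and every redirecting title would show its current text.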
How should this default functionality be achieved?
- What should be the syntax used to create the links? [[ ]]? {{k| }}?
Comments
[[ ]], the regular wiki markup link style should be used to avoid needless complexity. --Rogerhc 07:19, 5 February 2008 (UTC)
If the problem of blank space vs underscore can be solved in the k| context, this would be a powerful tool. The reason links to other minis are being marked up with square brackets is that the keyword approach does not work correctly: it presupposes that a search term A B means A+B (A plus B) rather than A space B. This all needs to be sorted out and explained in guidelines. --Ipigott 08:36, 5 February 2008 (UTC)
[[ ]], this way the functionality would not break inside the wiki, and it will probably make it easier for any AJAX to load linked pages in the search (if we go with the idea of dynamically updating the context in the search). --Bantab 02:25, 7 February 2008 (UTC)
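One plausible reading of the space/plus/underscore problem raised above is that three representations of the same title are being mixed up: the title with plain spaces, the wiki page name with underscores, and the URL query form in which a space is encoded as "+". The sketch below only illustrates keeping them apart; the helper names are hypothetical, and whether the {{k| }} template actually fails for this reason is an assumption.

```python
from urllib.parse import quote_plus, unquote_plus


def title_to_query_string(title):
    """Encode a mini title for a URL query value; spaces become '+'."""
    return quote_plus(title.replace("_", " "))


def query_string_to_title(encoded):
    """Decode a URL query value back into a title with plain spaces."""
    return unquote_plus(encoded).replace("_", " ")


encoded = title_to_query_string("atlantic ocean")
decoded = query_string_to_title(encoded)
```

The point of the round trip is that a "+" seen in the URL is decoded back to a space before the mini is looked up, so the mini "atlantic ocean" is found rather than a literal "atlantic+ocean".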
Spam
From How to handle advertisement? at Forum:Mini_articles:
- Minis that are just a link, or don't have any real information, are not acceptable.
- Mini-articles about a company must go in a keyword related to the name of the company, and not something that describes their business (product or service).
I don't know what keyword means in this sense. I think that, to maintain the scalability of the search, we cannot include external links directly in the mini articles. As stated before, I think canonical links which appear in categories are the most scalable and useful way to incorporate external links. I agree with the statement that a mini article on a name should link to the relevant business, should it exist. I also agree that general products and services should not link to businesses (e.g. laptop should not link to IBM); however, I think proper nouns describing products unique to a business should link to that business (e.g. Thinkpad should link to IBM [or Lenovo I guess]). -- Bantab 02:40, 7 February 2008 (UTC)
Commercial subjects
[The following section has been moved here from Forum:Setting expectations.]
There has to be a policy regarding mini-articles about commercial subjects, right now.
On my first day here I requested a number of deletions of not-so-veiled advertising. Now I am not sure whether deleting those was a good idea. They were not pure link spam, they contained about four lines of explanatory text, including a server address (not wikified). But the problem goes deeper.
A search engine as I want it is not a commerce-free zone. Many users do not care whether the results they get are about something commercial or not. Some users are actually searching for something commercial. Notability is not an issue for a search engine. Therefore I'd say: allow them all.
In other words, I currently believe that it is beneficial to allow manual advertisement in mini-articles. Leave it to the users to cull excessive or destructive behavior.
Provide some tools to help with the culling (two-click revert, for example). To prevent automated link flooding, unregistered users should need to enter characters from an image (I tried it, and this safeguard does not seem to be in place).
An alternative policy would be to rule out advertisement just for now, for reasons of scale. The system may need this protection in its childhood, to stop it from being swamped while still small. Whether this is necessary, I have no idea. Later, when the system has matured, it may be scaled up to a level where it can withstand the bulk of unrestricted commercial entries.
--RainerBlome 12:57, 10 January 2008 (UTC)
- Rainer, I completely agree that because of the commercial nature of many users' search habits, leaving commercial-related entries online is valid practice. It does beg the question of where to draw the line;
- For example, two days ago, searching on "Ford" yielded "Ford is the best car company ever www.ford.com drive a ford!", and jargon to that effect. Now, there is an informative article with links and disambiguation... but
- What about people who want to buy a Ford?
- Did we (editors) go too far to delete all semblance of commercial information from the mini article?
- Should the search results be allowed to mature and speak for themselves?
- Where is the place for customer satisfaction? Opinions? Ratings?
- --Vtmemo 20:40, 10 January 2008 (UTC)
>What about people who want to buy a Ford?
Mini articles about companies, even the smallest, should be allowed. In such a mini article, I consider it good(!) to link to the company's site(s).
>Did we (editors) go too far to delete all semblance of commercial information from the mini article?
In this case, no. Your edit was constructive, the previous content was just silly - if you were a manufacturer with that much prestige, would you really want that kind of advertising?
>Should the search results be allowed to mature and speak for themselves?
Mostly yes. If some article is outrageously biased, prefer to edit it towards NPOV (which you did) instead of deleting it.
>Where is the place for customer satisfaction? Opinions? Ratings?
Refer to existing web pages. There are tons of sites for this purpose. Wikia Search's guidelines might explicitly suggest adding links to such external pages. They might even suggest suitable sites. This applies to other kinds of information, too.
In general, a search service's primary purpose is to provide references to other sites, not to provide content. If providing some auxiliary content helps with the searching, it should be provided. But as long as there already is a good place for the content to go, it should go there. If no good site exists or is known, that may change over time and content maintained at the search wiki should then be moved there.
--RainerBlome 13:07, 17 January 2008 (UTC)
- @Rainer: even if good sites exist, I think the miniarticle (as the name says) should offer a summary of all the important information. For instance, if there is a good article in Wikipedia about Wikipedia, the reader needs a very short summary of that topic, and miniarticles are exactly what they need.--jeanpol 08:08, 11 February 2008 (UTC)
One possible solution
See http://re.search.swlabs.org/inpage.html#wiki and click on the grey edit link for one possible solution. The idea is to wrap and keep all mini interaction in their native context, in the search result page.
Internationalization and Localization
How should we distinguish between articles in different languages? I.e., what is the difference between cat in French and cat in German?
Possible options:
- Different namespaces within the same wiki.
- How would one handle having several namespaces for each language?
- Each language lives in a separate wiki.
How should language be selected?
Suggestion:
- The browser's Accept-Language header will be used for all defaults
- If the mini article doesn't exist in the default language, the next preferred language will be selected; if none exists, the en article will be the default
- All language versions of a mini article will be returned as a list for the user to switch between
- A new mini article will be created in the language set by the Accept-Language header (and the user will be told this clearly), or the user can choose another language
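The selection rule suggested above could be sketched roughly as follows. This is a minimal illustration, not the Wikia Search implementation: the function names are hypothetical, regional variants such as de-AT are simply reduced to their primary language as a simplifying assumption, and `available` stands for the set of languages in which a given mini article exists.

```python
def parse_accept_language(header):
    """Return primary language codes from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Keep only the primary subtag: "de-AT" -> "de".
        primary = tag.strip().lower().split("-")[0]
        if primary and primary != "*" and quality > 0:
            weighted.append((-quality, position, primary))
    ordered = []
    for _, _, primary in sorted(weighted):
        if primary not in ordered:
            ordered.append(primary)
    return ordered


def select_mini(header, available, default="en"):
    """Return (language to show, all available languages for switching)."""
    for language in parse_accept_language(header):
        if language in available:
            return language, sorted(available)
    chosen = default if default in available else None
    return chosen, sorted(available)


choice = select_mini("fr-CH, fr;q=0.9, de;q=0.8, en;q=0.5", {"de", "en"})
```

In this example the user prefers French, no French mini exists, so the German one is shown and the full list of versions is returned for switching. A result of None would be the point at which the user is offered the chance to create the mini in their preferred language.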
Comment:
For the more popular languages (e.g. German, French, Spanish, Portuguese), users could be directed to their own language version such as search.de.wikia.com, etc. If they conduct a search, they would then receive results for German-language sites first, possibly followed by English. If they write minis, the minis would be stored under the German version of the system. They would, of course, also have the option of conducting searches in other languages by clicking on the language of their choice in the LH margin, as in Wikipedia. This approach would also avoid duplication of information in different languages on the same mini page and would separate categories by language version. Language selection is not just an issue for the minis but more importantly for the whole search process.--Ipigott 14:11, 6 February 2008 (UTC)
Temporary Solution
Presently, the most active pages are just adding foreign language pages as sub-pages of the page, so that this page in German would be:
http://search.wikia.com/search:Mini_article/Policy_discussion/de
I think that is a good temporary solution, as it allows the pages to be developed, but is not overambitious in trying to develop 10 sites at once. I think this issue will be very important, but I think right now it is a little more than we can chew. -- Bantab 02:00, 7 February 2008 (UTC)
Some concrete policies
This page seems to have vaguely died a death - so I thought it might be a good idea to convert some of the discussion into possible actions. If people suggest concrete policies here, then they can be "voted upon" (i.e. people can disagree with things that they think are a bad idea). It might be an idea to keep suggestions quite "earthy", so that some quick and dirty decisions are made, postponing some more tricky points to later discussions. Remember that anything suggested here can be postponed rather than disagreed with.
Feel free to disagree vehemently with all the things I suggest - I felt like I wanted to get something quite clear down. (Taw 15:29, 15 February 2008 (UTC))
Suggestions
External links are still allowed within mini-articles with the clear understanding that this is open to future discussion based on observed results. Such discussion will follow in mid-March. (taw)
A style guide will be developed on a separate page. This style guide should be suggestive rather than a set of strict rules. (taw) It will suggest that:
- Descriptive information should not be given about two separate concepts on the same mini-article. Rather, the two concepts should always be disambiguated to separate mini-articles, which contain the separate information. (taw)
- Disambiguation and description should never be presented on the same page. (taw)
- Disambiguation should always be to more descriptive search terms. For example, you should not disambiguate to "Python (programming)" but to "python programming language". (taw)
- Information should not be placed directly in a mini-article unless it is likely to be relevant to a high proportion of people viewing the mini-article (perhaps over 50%). (This will tend to force brevity.) (taw)
- Some limit should be suggested on the size of images within mini-articles. (Perhaps an image should be no wider than a third of the mini-article display.) (taw)
- Other types of media are allowed for the present. (There seems no reason to explicitly ban them.) (taw)
- Rules about text formatting are for the moment to be lifted from Wikipedia (though not rules about style). The relevant pages describing these will be copied and adapted here. (taw)
Companies are allowed to write their own mini-articles; however, an advertising tone is not allowed. (This is in practice quite easy to recognize.) All pages written in an advertising tone can be immediately deleted rather than adapted (making it the responsibility of editors to render advertisements neutral is quite a heavy strain). (taw)
A redirect feature will be added to mini-articles. (taw)
The "[[ ]]" link will be made to search for the suggested term. (taw)
The mini namespace will be deleted. (taw)
A different namespace will be used for mini-articles in different languages. For example, we will have an Fr: namespace for French mini-articles and a De: namespace for German mini-articles. (taw)
The front-end search engine will detect the language required by the user and present mini-articles for that language only, defaulting to English if there is no mini-article in the appropriate language. (taw)
Could someone with a clear opinion make a suggestion about how to deal with capitalization?
Capitalization - some suggestions
I assume most of us are aware of the problems: basically that unless the title of the mini article corresponds exactly to the search term as far as upper and lower case are concerned, the mini will not show up on the results page. To overcome this problem, I suggest the following:
- Encourage everyone to use lower case when creating mini articles. Instead of George Bush write george bush. The reason for this is that nearly everyone uses lower case when searching, even if they know proper names are usually capitalised.
- Adjust the search look-up to pick up mini articles in lower case even if the search words are capitalised. In other words, if you search for George Bush you should see the mini titled george bush.
- When linking to another mini article covering capitalised words in the running text, write the link with lower case first. An example would be Atlantic Ocean in the text. Write the link Atlantic Ocean.
- Similarly, if a keyword template is used to link the term to a mini article, the sequence Atlantic Ocean (edit) (which needs to be capitalised to preserve correct use of language in the article) should default to the mini atlantic ocean (and I mean atlantic space ocean not atlantic+ocean as is presently the case).
- Only when a real distinction needs to be made between an upper-case and a lower-case entry should capitals be used. For example, separate articles could be written on pin (as in needle) and PIN (as in ID) or on open office (as in office layout) as distinct from Open Office (the software package). A mini link here would specifically refer to the capitalised article if necessary Open Office.
I realise this is a quick and dirty approach but for the time being it is likely to give better results than the present experience of writing mini articles which do not turn up during real searches. Perhaps some method could be devised to ensure that all the currently capitalised headings are also copied into lower case together with their articles. In time, the system algorithms could be improved to ensure that the logic used in finding search results corresponds to that for displaying mini articles. I would be interested to see whether anyone else thinks this might work.--Ipigott 17:45, 12 March 2008 (UTC)
- I think the better solution is to have the mini show up no matter how it is capitalized. This is a trivial coding problem. Fred Talk 19:10, 12 March 2008 (UTC)
- Fred, I agree with you 100% on this but it seems so difficult to get anyone to do anything about it. I brought the whole matter to Angela's attention last week and she told me she would try to sort it out with the Search team this week. I thought that by taking a step-by-step approach to the problem, it might be possible to see some progress. But you are right - there should be a simple match, no matter how much or how little capitalisation there is in the mini. Nice to see you are still around. I thought you had abandoned ship. Best, Ian. --Ipigott 19:28, 12 March 2008 (UTC)
- I'm doing many other things, that is all. Fred Talk 00:48, 13 March 2008 (UTC)
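To make the "simple match, no matter how much or how little capitalisation" idea concrete, here is a minimal sketch of a case-insensitive look-up that still lets an exact-case title win, so that pin and PIN can remain separate minis as suggested above. The function names and the in-memory index are hypothetical; this is not the actual search code, only one way the matching rule could be expressed.

```python
def build_index(titles):
    """Map each lower-cased title to the stored titles that share it."""
    index = {}
    for title in titles:
        index.setdefault(title.lower(), []).append(title)
    return index


def find_mini(query, index):
    """Return the stored mini title to display for a query, or None."""
    candidates = index.get(query.lower(), [])
    if not candidates:
        return None
    # An exact-case match wins, so "PIN" and "pin" can stay separate minis.
    if query in candidates:
        return query
    # Otherwise prefer an all-lower-case title, then the first one stored.
    for title in candidates:
        if title == title.lower():
            return title
    return candidates[0]


index = build_index(["george bush", "pin", "PIN", "Newcastle upon Tyne"])
result = find_mini("George Bush", index)
```

Under this rule a search for George Bush finds the mini titled george bush, and a lower-case search for newcastle upon tyne still finds the capitalised mini, without any duplicate pages being created.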