User:Cameltrader/Advisor

This script identifies common formatting and stylistic issues by examining the wikitext as you type.

Installation

File:Advisor-js-screenshot.png
This is what it looks like.
(without the trademarked Wikipedia logo)

Add the following to your personal js:

importScript('User:Cameltrader/Advisor.js');

Then refresh. From now on, a list of suggestions will appear above your edit box while you are editing a page. Clicking on a suggestion name highlights the text it refers to. Clicking twice pops up a help message. The "fix" link next to it inserts the proposed replacement at the appropriate place.

If you are suspicious about stuff you add to your common.js, and suspicious you should be, you might want to review the code before using it.

Features

You might be given any of the following suggestions while editing:

Linking


  • A|A: This is to say that [[A|A]] is equivalent to [[A]] and can be simplified.
    [[relational model|relational model]]
    becomes
    [[relational model]]
  • A|AB: If "B" is a word ending, [[A|AB]] can be abbreviated to [[A]]B.
    [[relational model|relational modelling]]
    becomes
    [[relational model]]ling
  • year link: Year links that are not preceded by a date are of no use, so the script suggests removing them.
    In [[1984]] there will be...
    becomes
    In 1984 there will be...
  • decade link: Suggests delinking of decades. Per WP:DATE, decades shouldn't be linked unless they could bring in relevant information to the topic.
    ''Led Zeppelin II'' largely wrote the blueprint for [[1970s]] hard rock.
    becomes
    ''Led Zeppelin II'' largely wrote the blueprint for 1970s hard rock.
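
The two piped-link simplifications above can be sketched with a single regular expression. This is only an illustration and is not taken from Advisor.js: the function name simplifyPipedLinks is hypothetical, and the sketch assumes that a "word ending" is a run of lowercase ASCII letters and that link targets are compared case-sensitively.

```javascript
// Hypothetical helper, not part of Advisor.js: simplifies piped links of the
// forms [[A|A]] and [[A|AB]], where B is a run of lowercase ASCII letters.
function simplifyPipedLinks(wikitext) {
    return wikitext.replace(
        /\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]/g,
        function (whole, target, label) {
            if (label === target) {
                return '[[' + target + ']]';          // [[A|A]] -> [[A]]
            }
            var ending = label.slice(target.length);
            if (label.indexOf(target) === 0 && /^[a-z]+$/.test(ending)) {
                return '[[' + target + ']]' + ending; // [[A|AB]] -> [[A]]B
            }
            return whole;                             // leave other links alone
        }
    );
}
```

A real rule would report the match offsets as suggestions rather than rewrite the text directly; the direct rewrite just keeps the sketch short.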
Character formatting


  • whitespace: Trailing whitespace characters are of no use. Currently, only regular spaces (U+0020) are recognized. I may add support for the other whitespace characters if ever needed.
    Spaces were not used to separate words in   
    Latin until roughly 600 AD – 800 AD.
    becomes
    Spaces were not used to separate words in
    Latin until roughly 600 AD – 800 AD.
  • ndash for year ranges: Suggests an en dash instead of a hyphen in year ranges. My script prefers the Unicode character when suggesting a replacement, but it doesn't attempt to modify existing HTML escapes (&ndash;). This rule doesn't (yet) catch "circa" year ranges, like "(c. 1812-c.1934)".
    (1989-2003)
    becomes
    (1989–2003)
  • nbsp-dash: If an mdash or ndash is preceded by a space, it had better be a non-breaking one.
    If you resize your browser window and the mdash is never wrapped at the beginning of a line — it works.
    becomes
    If you resize your browser window and the mdash is never wrapped at the beginning of a line&nbsp;— it works.
  • IPA character: Some IPA characters have false friends. Suggestions are given to replace the colon and apostrophe with the IPA characters which look similar. Only fragments within IPA-related templates are recognised.
    Birmingham {{IPA|/'bɜ:mɪŋˌəm/}}
    becomes
    Birmingham {{IPA|/ˈbɜːmɪŋˌəm/}}
  • unicode-escape: Numeric character escapes (like &#97; or &#x61; for the Latin lowercase letter a) can be written as normal text. Suggestions will be made only for character codes greater than 127, in order to avoid breaking syntax when special symbols like less than, left square bracket, or ampersand (< [ &) are replaced with their literal values.
    In the following example, the accented characters in the word déjà are inlined and the ampersand is left untouched. The é is a single code point written as a hexadecimal escape, and the à is a combination of the letter a and a combining accent. Note that the letter à still consists of two code points in the result.
    d&#xe9;ja&#768; vu &amp; visité.
    becomes
    déjà vu &amp; visité.
  • HTML entity: Some characters can be escaped in wiki syntax HTML-style. It is more concise and usually more readable to write them directly in Unicode.
    Kurt G&ouml;del was born in Br&uuml;nn.
    &exist;R &forall;x x&isin;R &hArr; x&notin;x
    becomes
    Kurt Gödel was born in Brünn.
    ∃R ∀x x∈R ⇔ x∉x
  • ellipsis: replaces the ellipsis character ("…", U+2026) with three periods/full stops ("...", 3×U+002E), as recommended by WP:MOS#Ellipses.
    A … Z
    becomes
    A ... Z
  • all caps: According to the Manual of Style, all caps should be avoided. The most frequent abuse is of the word "NOT", and it is identified by the script:
    You should NOT use all caps, but italics for an emphasis.
    becomes
    You should ''not'' use all caps, but italics for an emphasis.
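
As an illustration of the unicode-escape rule above, here is a minimal sketch that inlines numeric character escapes only when the code is greater than 127. The function name inlineNumericEscapes is hypothetical and the code is not taken from Advisor.js; it also assumes an environment that provides String.fromCodePoint (ES2015 or later).

```javascript
// Hypothetical helper, not part of Advisor.js: replaces numeric character
// escapes such as &#xe9; or &#768; with the characters they stand for.
function inlineNumericEscapes(wikitext) {
    return wikitext.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, function (whole, code) {
        var codePoint = /^[xX]/.test(code)
            ? parseInt(code.slice(1), 16)
            : parseInt(code, 10);
        // Keep ASCII escapes (such as &#60; for "<"), surrogates, and
        // values outside the Unicode range exactly as they were written.
        if (codePoint <= 127 || codePoint > 0x10FFFF ||
                (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            return whole;
        }
        return String.fromCodePoint(codePoint);
    });
}
```

Named entities such as &amp; do not match the pattern at all, so they pass through unchanged, which is the behaviour described for the déjà vu example.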
Template usage


  • lang-xx where xx is an alpha-2 code: Those templates are used to describe a certain piece of text as written in a foreign language. For instance:
    [[Hungarian language|Hungarian]]: Magyar Cserkészlány Szövetség
    becomes
    {{lang-hu|Magyar Cserkészlány Szövetség}}
  • main-article, further, ...: Suggest using templates for standard pieces of text, such as this:
    :Main article: [[A|B]]
    becomes
    {{main|A|B}}
  • default-sort: If the article name looks like a person's name, and it is used as a sort key for some category, then suggest using the "DEFAULTSORT" magic word. Exceptions can be specified in the script. Most Bulgarian, Macedonian, Russian, and similar names are easy to recognize because of the -ov, -ev, or -ski endings. If you have a suggestion for recognizing names of persons in your language, let me know. For instance, in an article named "Georgi Markov":
    [[Category:BBC newsreaders and journalists|Markov, Georgi]]
    [[Category:Bulgarian writers|Markov, Georgi]]
    becomes
    {{DEFAULTSORT:Markov, Georgi}}
    [[Category:BBC newsreaders and journalists]]
    [[Category:Bulgarian writers]]
  • default-sort-magic-word: suggests replacing the {{DEFAULTSORT}} template with a magic word.
    {{DEFAULTSORT|x}}
    becomes
    {{DEFAULTSORT:x}}
  • defaultsort-a and defaultsort-the: add {{DEFAULTSORT:...}} in articles whose name starts with "A" or "The".
    '''The Rolling Stones''' are an English band whose music was ...
    [[Category:1960s music groups]]
    [[Category:1970s music groups]] ...
    [[Category:2000s music groups]]
    becomes
    '''The Rolling Stones''' are an English band whose music was ...
    {{DEFAULTSORT:Rolling Stones, The}}
    [[Category:1960s music groups]]
    [[Category:1970s music groups]] ...
    [[Category:2000s music groups]]
  • deprecated-template: Some templates have been deprecated. Advisor.js only points out that such templates are used, without providing a fix.
    {{Harvard reference | last=Jones | year=2006}}
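
The defaultsort-a and defaultsort-the behaviour described above could look roughly like the following. This is a sketch under stated assumptions, not the code in Advisor.js: defaultSortKey and addDefaultSort are hypothetical names, the title is assumed to be passed in as a plain string, and the DEFAULTSORT line is simply placed before the first category link.

```javascript
// Hypothetical helper: derive a sort key for titles starting with "A" or "The".
// Returns null when the title needs no special sort key.
function defaultSortKey(title) {
    var match = /^(The|A) (.+)$/.exec(title);
    return match ? match[2] + ', ' + match[1] : null;
}

// Hypothetical helper: insert {{DEFAULTSORT:...}} before the first category,
// unless the page already has a DEFAULTSORT or has no categories at all.
function addDefaultSort(title, wikitext) {
    var key = defaultSortKey(title);
    if (key === null || /\{\{DEFAULTSORT[:|]/.test(wikitext)) {
        return wikitext;
    }
    var firstCategory = wikitext.indexOf('[[Category:');
    if (firstCategory === -1) {
        return wikitext;
    }
    return wikitext.slice(0, firstCategory) +
        '{{DEFAULTSORT:' + key + '}}\n' +
        wikitext.slice(firstCategory);
}
```

For an article named "The Rolling Stones" the derived key is "Rolling Stones, The", matching the example above.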
Other


  • heading: Fixes displaced space characters, allows only == Heading == or ==Heading== (I, personally, prefer the first one), and may correct the spelling and capitalization of standard headings, such as "External links", "References", "See also". Improper nesting of headings and duplicate section names are reported, too.
    ==Early life ==
    ==== See Also====
    becomes
    ==Early life==
    === See also ===
  • ISBN: validates 10- and 13-digit ISBNs. Invalid ISBNs are reported without proposing a fix.
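
For reference, the standard ISBN check-digit arithmetic can be sketched as below. The function name isValidIsbn is hypothetical and the code is not copied from Advisor.js; it assumes hyphens and spaces are the only separators that need to be stripped.

```javascript
// Hypothetical helper: validate an ISBN-10 or ISBN-13 by its check digit.
function isValidIsbn(isbn) {
    var chars = isbn.replace(/[\s-]/g, '').toUpperCase();
    var sum = 0;
    var i;
    if (/^\d{9}[\dX]$/.test(chars)) {
        // ISBN-10: weights 10 down to 1, "X" stands for 10, sum divisible by 11.
        for (i = 0; i < 10; i++) {
            var value = chars.charAt(i) === 'X' ? 10 : Number(chars.charAt(i));
            sum += value * (10 - i);
        }
        return sum % 11 === 0;
    }
    if (/^\d{13}$/.test(chars)) {
        // ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        for (i = 0; i < 13; i++) {
            sum += Number(chars.charAt(i)) * (i % 2 === 0 ? 1 : 3);
        }
        return sum % 10 === 0;
    }
    return false; // wrong length or unexpected characters
}
```

A number that fails the check can only be reported, not fixed, because the checksum does not reveal which digit is wrong.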

Known issues

Please report new ones to the talk page.

  • Automatic textarea scrolling in Opera seems hard to implement. It ignores setting the scrollTop property. Opera 9.5 will likely fix this issue (Opera 9.5 Beta 1 changelog, search for "scrollTop"). A workaround could be implemented by wrapping the textarea in a div element and scrolling the div instead.
If you use Opera and you do not see highlighted text in the textarea, press on your keyboard and the cursor will be right after the fixable text; you can click the suggestion link again to highlight it.
  • After clicking "Preview", the proposed summary is lost. In other words: you edit a page, you "fix" some of the suggestions on top, the names of the changes get accumulated and appear with an "Add to summary" link just above the "Edit summary" line, then you click "Preview" (without clicking "Add to summary"), and your accumulated names of changes disappear, so you have to go back. It's a usability issue.
  • Advisor.js is incompatible with wikEd. The reason for this is that wikEd renders the wikitext not in the usual TEXTAREA but as a dynamically created DOM tree in an IFRAME; that's because of the syntax highlighting. What I need is to be able to select text between two certain offsets (including scrolling it into view) and to replace that text. User:Cacycle suggested a way to do it, and I started implementing the solution. However, I've been having a hard time with it this weekend, and so far only highlighting is functional. Scrolling to the correct offset and replacing the highlighted text do not work yet.
  • There is no easy way to switch the script on and off without modifying monobook.js; such a switch could be useful for mobile access. The script should either be proposed for inclusion as a WP:Gadget or a toggle button should be implemented. (I already proposed its inclusion as a gadget.)
  • The "mdash" rule is undocumented and doesn't work as expected. It should only replace the hyphen with an em dash, without converting the space before it to a nbsp. The latter should be handled by "nbsp-dash".
  • It is annoying to be warned about trailing whitespace all the time while you type.
  • After accepting a fix, the undo history is no longer functional in IE and Opera.

Fixed issues


  • ISO dates are not recognized when looking for year links.
  • May behave strangely in Internet Explorer. Automatic selection of suspicious text doesn't work, and there could be more issues. My policy is to be compatible with Internet Explorer, so I'll be working on that.
  • Clicking on a suggestion link sometimes fails to transfer focus to the editing textarea. I observed this behaviour in Firefox and Galeon. A simple workaround is to press "tab" to get the focus right.
  • The code which scrolls to the selected text in the textarea was adapted from Zocky's Searchbox script, and therefore has the scrolling issue: "The found text may sometimes still be outside the edit box. Scroll up and down a bit to find it.".
  • Advisor.js relies on the presence of the edit toolbar, so it doesn't work when the toolbar is switched off in the user preferences. The toolbar is present by default, and most Wikipedians seem to use it. However, this dependency is not really required, and Advisor.js can be changed accordingly.
  • An edit summary shouldn't be generated for new sections.

Further development

These are not yet in Advisor.js, but are worth considering:

  • ipa: Sequences of IPA characters should be enclosed like this: {{IPA|kʰæmɫ.tɹeɪdə}}, in order to let MediaWiki add browser quirks.
  • AmE-BrE-bias: if there are sufficiently good hints in the article that it uses or should use American English rather than British English, or vice versa, Advisor.js should suggest conforming to the majority.
  • date-bias: same for American-style dates ("[[January 1]], [[2000]]", some other parts of the world use it, too) vs conventional dates ("[[1 January]] [[2000]]"). I believe the topic of the article (e.g. if it is related to the US) should influence the balance, if there's a good heuristic to recognize Americanisms.
  • as-of, as-of-month: Suggest linking of "as of 2000" as "[[as of 2000]]", and converting "[[as of January 2000]]" to "[[as of 2000|as of January 2000]]" per WP:AO.
  • Suggest adding maintenance tags, such as:
    • {{Sections}} to articles that have no headings
    • {{subst:Trivia-now}} below any sections labeled "Trivia" or "Miscellany"
    • {{Nofootnotes}} to any article that lacks a <ref> tag
    • {{subst:dated|uncategorized}} to any article that does not have a category or template transcluded into it (as so many templates add articles to categories)
    • {{Deadend}} to articles that have fewer than three (for instance) links
    • {{ExcessiveLinks}} or {{Too many links}} to articles that have more than 20 (again, an arbitrary number) links outside of <ref> tags, or more than 20 links under an "External links" heading
    • {{Too many categories}} to articles that have more than 20 (arbitrary) categories

Implemented or rejected ideas


  • [[ns:A (B)|A]]: Mediawiki's syntax allows rewriting this as [[ns:A (B)|]]. The namespace and the text in brackets are optional. This is known as the pipe trick, and [[ns:A (B)|]] is restored to [[ns:A (B)|A]] when you save the page. So, it doesn't make sense to suggest replacing it.
  • heading-bias: If spaced headings (== Heading ==) in the article are decisively more than unspaced headings, or vice versa, suggest conforming to the majority. Implemented as part of the "heading" suggestion.
  • Validate ISBN numbers using a checksum. Implemented.
  • lang|bg: Suspicious-looking sequences of Cyrillic characters can be marked with {{lang|bg|Инджектопляктор}}, if they are in Bulgarian. This has no effect on the way it looks in a browser, but may help tools identify foreign texts. This one will not be implemented because many languages use the Cyrillic alphabet. Identifying foreign-language pieces of text in Wikipedia is uncommon anyway.
  • century link: delink centuries, for instance: [[17th century]] -> 17th century. Implemented.
  • decade: fix incorrectly formatted decades, for instance: 1990's -> 1990s (the genitive case may produce false positives) Implemented.
  • defaultsort-the, defaultsort-a: suggest using {{DEFAULTSORT}} when the article name starts with "a" or "the" and no other sortkey is used. Implemented. Actually the last words should be "... and no other DEFAULTSORT is used."

Discuss these or propose a new one

Contributors

I am open to new ideas and happy to implement them. But if you have a little knowledge of JavaScript, you might consider submitting some code of your own. Here is an example of how to do it:

ct.rules.push(function (s) {

    // A ``rule'' is a JavaScript function that accepts a string as a
    // parameter (the wikitext of the page being edited) and returns an array
    // of ``suggestion'' objects.

    // The \b word boundaries keep the rule from firing inside ``point here''.
    var matches = ct.getAllMatches(/\bint he\b/g, s);
    // getAllMatches() is a utility function of mine, you are not required to use it

    var suggestions = [];
    // A ``suggestion'' object must have the following properties:
    //     * start---the 0-based inclusive index of the first character to be replaced
    //     * end---analogous to start, but exclusive
    //     * (optional) replacement---the proposed wikitext, if any
    //     * name---this is what appears at the top of the page
    //     * description---used as a tooltip for the name of the suggestion
    //     * (optional) help---an HTML fragment as a string, it will appear in a yellow
    //                         box when a suggestion is double-clicked

    for (var i = 0; i < matches.length; i++) {
        var match = matches[i];
        suggestions.push({
                start: match.start,
                end: match.end,
                replacement: "in the",
                name: "spelling-example",
                description: "You probably meant ``in the'' instead of ``int he''."
        });
    }

    return suggestions;

});
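
If you would rather not depend on ct.getAllMatches(), the sketch below shows one way to collect matches with the start and end properties used above, and how a suggestion's half-open range maps onto a replacement. Both findAllMatches and applySuggestion are hypothetical names introduced for illustration; the real helper and the real "fix" link in Advisor.js may work differently.

```javascript
// Hypothetical stand-in for a match-collecting helper. It returns an array of
// {start, end} objects, with start inclusive and end exclusive.
function findAllMatches(regex, text) {
    // Work on a copy with the "g" flag so the caller's regex is not modified.
    var flags = regex.flags.indexOf('g') === -1 ? regex.flags + 'g' : regex.flags;
    var globalRegex = new RegExp(regex.source, flags);
    var matches = [];
    var m;
    while ((m = globalRegex.exec(text)) !== null) {
        matches.push({ start: m.index, end: m.index + m[0].length });
        if (m[0].length === 0) {
            globalRegex.lastIndex++; // avoid looping forever on empty matches
        }
    }
    return matches;
}

// Hypothetical helper: apply one suggestion by replacing [start, end).
function applySuggestion(text, suggestion) {
    return text.slice(0, suggestion.start) +
        suggestion.replacement +
        text.slice(suggestion.end);
}
```

Note that applying one suggestion shifts the offsets of everything after it, so the remaining suggestions have to be recomputed (or applied from the end of the text backwards).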

One way to test this is to copy Advisor's source to some page in your private namespace, add the custom code, and do an importScript() for the new page instead of the original Advisor.js. (I wasn't able to achieve my goal by putting the custom code in a separate page and importing both. Probably importScript() imports scripts in some unpredictable order, I don't know...)

However, in order to reduce unnecessary load on Wikipedia servers, I recommend running a simple web server at localhost (such as lighttpd, Apache, Tomcat, or whichever you are familiar with) and replacing your monobook.js with something like:

document.write('<script src="http://localhost:8000/AdvisorCustom.js" type="text/javascript"></script>');

Thus, you can experiment freely without cluttering your list of contributions.