slrn
- Scoringslrn
newsreader.
slrn
awards an article points by giving it a score. If the score for
the article is less than max_low_score (zero by default), the article is
marked as read. If the score is less than or equal to kill_score (-9999
by default), the article is killed. If the score is greater than or equal to
min_high_score (1 by default), the article is marked as important. The
purpose of the score file is to define the set of tests that an article must
go through to determine the score.
Although the score may be based on ANY header item, it is recommended that one
stick with the information found in the news overview data when scoring in
slrn
(slrnpull
gets full headers anyways, so it does not make a
difference there). The news overview file typically contains:
* Subject * From * Date * Message-ID * References * Bytes * Lines
plus any header that your news admin decided to include here (usually, `Xref'
is one of them). slrn
offers two special keywords that also allow
efficient scoring: `Newsgroup' (the newsgroup that the article is part of) and
`Age' (the age of the article in days).
The location of the scorefile is specified with the scorefile configuration variable. E.g.,
set scorefile "scores/default.score"
Note: The file must be created before it can be used.
Tip: Using the ``.score
'' filename suffix may enable appropriate
syntax highlighting in some editors, such as vim
.
The scorefile may be edited while slrn
isn't running, or from within
slrn
with the create_score interactive function in article mode. By
default, this function is bound to the ``K
'' key. Using ``K
'' alone
gives an interactive choice of popular scoring options. To edit the scorefile
with a text editor use a prefix argument (``ESC 1 K
''); you'll be
presented with a commented-out scorefile entry for editing.
Tip: When learning scoring, reading the commented-out lines that
slrn
adds to the scorefile when creating scores interactively is likely
to be as helpful as reading documentation.
The format of the file is very simple (See below for an explicit example). The file is divided into sections delimited by a newsgroup or newsgroups enclosed in square brackets, e.g.,
[rec.crafts.*, rec.hobbies.*]
The name may contain the `*' wild card character.
Comments begin with the `%' character. Leading whitespace is ignored.
Each section consists of comment lines, empty lines or keyword lines. Only the keyword lines are meaningful and all leading whitespace is ignored. A keyword line begins with the name of the keyword followed immediately by one or two colons and one space. The rest of the line usually consists of a regular expression. The keyword may be prefixed by the `~' character to signify that the regular expression should not match the object specified by the keyword.
A group of keywords defines a test that is given to the header of the article. The `Score' keyword is used to assign a score to the header. If it is followed by a single colon, the score is only given if all tests are passed (logical AND); two colons indicate that the score should be awarded if any of the tests are passed (logical OR). The score can be any positive or negative integer. If the numerical value of the score is prefixed by an equals sign, score processing for the header is stopped and the header will be given the score for that test.
Note: The `Score' keyword also serves to delimit tests. You can optionally add a comment behind the score, which will then be used as the name of the scorefile entry and displayed when using view_scores (`v') in article mode. Here is an example of this:
Score: 100 % optional name here
All keywords except for `Score' and `Expires' may be prefixed by the `~' character. If the `Expires' keyword appears, it must immediately follow the `Score' keyword. The `Expires' keyword may be used to indicate that the test is no longer to be applied on the date specified by the keyword. For example,
Expires: 4/1/2010 (or: 1-4-2010)
implies that the given test is no longer valid on or after April first 2010. As the example indicates, the date must be specified using either the format MM/DD/YYYY or DD-MM-YYYY. Note: DO NOT CONFUSE THIS WITH THE EXPIRES HEADER KEYWORD.
The `Lines', `Bytes', `Age' and `Has-Body' keywords are also special. Their value is not a regular expression, rather, a simple integer.
`Lines' and `Bytes' may be used to kill articles which contain too many or too few lines / bytes. For example,
Score: -100 Bytes: 20480
assigns a score of -100 to articles that are larger than 20 kB. Please keep
in mind that `Bytes:' is only available when getting overview data and will
otherwise (e.g. in slrn
pull) be set to 0.
Similarly, the test
Score: -100 ~Lines: 3
assigns a score to articles that have less than or equal to 3 lines.
`Age' can be used to score articles which are younger than N days. For example:
Score: 10 Age: 7
adds 10 points to the score of each article that is at most one week old. You can use negation (`~') to score articles that are older than N days.
`Has-Body' can be used when reading offline in combination with slrn
pull:
You can tell slrnpull
to download only article headers by default and
fetch article bodies on request. In this case, you can use a rule like
Score: 20 Has-Body: 1
to give each article that does have a body 20 points. You can invert this (i.e. score articles without bodies) either by using negation (`~') or by writing `Has-Body: 0'. Values other than 0 or 1 have no meaning.
Finally a score file may include other score files via the `include' statement. The syntax is simple:
include FILE
The name of the file is considered to be relative to the directory of the file
including it, unless an absolute path is specified. For instance, suppose
`/home/john/News/Score
' contains
include /usr/local/share/slrn/score include score_spam
and `/usr/local/share/slrn/score
' contains the line:
include score_spam
In the first instance, `score_spam
' will be read from the directory
`/home/john/News
' but in the second instance it will be read from
`/usr/local/share/slrn
'.
slrn
score file
[news.software.readers] Score: =1000 % All slrn articles are good Subject: slrn Score: 1000 % This is someone I want to hear from From: davis@space\.mit\.edu Score: -9999 Subject: \<agent\> [comp.os.linux.*] Score: -10 Expires: 1/1/2010 Subject: swap Score: 20 Subject: SunOS Score: 50 From: Linus % Kill all articles cross posted to an advocacy group Score: -9999 Xref: advocacy ~From: Linus % This person I want nothing to do with unless he posts about % `gizmos' but only in comp.os.linux.development.* Score: -9999 From: someone@who\.knows\.where ~Subject: gizmo ~Newsgroup: development [~misc.invest.*, misc.taxes] Score:: -9999 Subject: Earn Money Subject: Earn \$
This file consists of three sections. The first section defines a set of tests applied to the news.software.readers newsgroups. The second section applies to the comp.os.linux newsgroups. The final section applies to ALL newsgroups EXCEPT misc.invest.* and misc.taxes (see below).
The first section consists of three tests. The first test applies a score of
1000 to any subject that contains the string `slrn
'. The second test
applies to the `From'. It says that any article from davis@space.mit.edu
has its score increased by 1000. The third test reduces by -9999 the score of
any article whose subject contains the word `agent
'. Since tests are
applied in order, if an article contains both `slrn
' and `agent
', it
will be given a score of 1000 since the value is prefixed with an equal sign.
The second section is more complex. It applies to the comp.os.linux
newsgroups and consists of 5 tests. The first three are simple: -10 points
are given if the subject contains `swap
', 20 if it contains `SunOS
',
and 50 if the article is from someone named `Linus
'. This means that if
Bill@Somewhere writes an article whose subject is `Swap, Swap, Swap
', the
article is given -10 points. However, if Linus writes an article with the
same title, it is given -10 + 50 = 40 points. Note that the first test
expires at the beginning of 2010.
The fourth test kills all articles that were cross-posted to an advocacy newsgroup UNLESS they were posted by Linus. Note that if a keyword begins with the `~' character, the effect of the regular expression is reversed.
The fifth test serves to filter out posts from someone@who.knows.where unless he posts about `gizmos' in one of the comp.os.development newsgroups. Again note the `~' character.
The final section of the score file begins with the line
[~ misc.invest.*, misc.taxes]
If the first character following the opening square bracket is `~', then the newsgroup or newsgroups contained in the brackets are NOT to be matched. That is, the `~' character is used to denote the boolean NOT operation.
For writing even more complex entries, slrn
now allows the grouping of
scorefile rules. Here is a simple example:
Score:: -1000 ~Subject: \c[a-z] {: Subject: ^Re: ~Subject: ^Re:.*\c[a-z] }
Lines enclosed in curly braces are grouped; the initial brace is followed by one or two colons that indicate whether only one (`::') or all of the lines (`:') inside the group need to match for the group to pass.
As the result, the example kills subject header lines that do not contain lowercase characters, not counting an initial `Re:'.