README FILE FOR DIABLO TRANSIT FEEDER
-------------------------------------

DIABLO is a news transit and reader system. This readme file describes the transit side of things. You can operate the transit and reader sides separately or together, or run just one of them, depending on your needs.

The transit side of Diablo is designed to transit news between one or more incoming feeds and one or more outgoing feeds. It is not designed to operate as a reader. However, it may be used to back a reader's spool for readers that fetch articles by message-id (which the reader side of Diablo does).

Since the transit portion of Diablo is not designed to support readers, an active file is not usually used with it ('active off' in diablo.config). You may want the transit portion of Diablo to act as the master article-number assignment point, with downstream sites slaved to it. In that case, turn ON feeder-side active file support in diablo.config and run the reader side, dreaderd, with the -x option so that it slaves its article numbering to the master's. When turned on, the active file only affects how the transit side of Diablo generates the Xref: header. The transit side will still transit articles even if none of their newsgroups are listed in the active file.

The feeder side of Diablo does not process control messages. If you are mastering article numbers on the feeder, you must therefore do one of two things. You can run the reader on the same box to process control messages and keep the active file up to date. Alternatively, you can periodically synchronize the active file from a remote source using dsyncgroups. If you use dsyncgroups, choose its options carefully so that you do not overwrite the begin/end/other article numbering parameters that Diablo uses to master Xref: lines.

The transit side of Diablo maintains a history file (the reader side does not). This means that the transit side can take multiple feeds in parallel without transiting duplicate articles.
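
The duplicate-suppression rule can be made concrete with a minimal Python sketch. It assumes a hypothetical `History` class standing in for Diablo's history file. The sketch models only one rule: the first feed to offer a Message-ID wins. Diablo's real history file lives on disk and stores more than a seen/not-seen flag, and none of the names below come from Diablo's source.

```python
import threading


class History:
    """Hypothetical in-memory stand-in for Diablo's history file.

    Only the duplicate check is modeled: the first feed to offer a
    Message-ID wins, and later offers of the same ID are refused.
    """

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()  # incoming feeds may run concurrently

    def offer(self, message_id: str) -> bool:
        """Return True if the article is new and should be accepted."""
        with self._lock:  # make check-and-record a single atomic step
            if message_id in self._seen:
                return False  # duplicate: already taken from another feed
            self._seen.add(message_id)
            return True


if __name__ == "__main__":
    history = History()
    feed_a = ["<1@a.example>", "<2@a.example>", "<3@x.example>"]
    feed_b = ["<3@x.example>", "<4@b.example>", "<1@a.example>"]
    print([history.offer(m) for m in feed_a])  # [True, True, True]
    print([history.offer(m) for m in feed_b])  # [False, True, False]
```

The lock is there because several feeds are handled at once. Without it, two feeds offering the same article at the same moment could both pass the check and both be transited.
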
The reader side can handle duplicate articles only when dreaderd is run with the -x option. Without that option, dreaderd CANNOT handle duplicate articles being fed to it.

It is often beneficial to run the feeder and reader on the same machine, with the feeder feeding the reader internally, even if the feeder is not going to be used as a (major) spool cache. It is more common still to run them on the same machine when the feeder has a spool large enough to act as the backend cache for the reader. Or you may just want to run a feeder-only machine with no reader elements on it at all.

The transit side of Diablo (the 'feeder') is strictly for news transit and does not understand reader-related NNTP commands.

The Diablo spool, usually /news/spool/news, is maintained by the transit system. You must size the spool and run dexpire as appropriate for your needs. Diablo stores multiple articles per file in the spool, in a two-tiered directory structure. Never edit or remove article files and directories directly, or you risk corrupting the reference data Diablo stores in the history file. You must use dexpire to free space in the spool.

Since Diablo does not write out each article as a separate file, it tends to be much more efficient than INN without CNFS. It is roughly on par with INN + CNFS, though I personally believe it is better.

Typically, anyone taking a full feed these days must dedicate a separate machine to it, apart from the newsreader machine your users read news on. The DIABLO transit system is designed to replace that dedicated newsfeeds machine. It is also designed to be a mostly hands-off affair once you get past configuring and stabilizing it. See TUNING_NOTES for machine configuration suggestions.

OS REQUIREMENTS

See TUNING_NOTES.

WHO SHOULD RUN DIABLO

If you need to run a USENET news feeding/transit system and/or a USENET newsreading system, Diablo may be for you.
WHERE TO GET DIABLO

    http://www.openusenet.org/diablo/

REPORTING BUGS

    send the bug to:          diablo-bugs@openusenet.org
    send non-bug stuff to:    diablo-users@openusenet.org

NEWSGROUPS:

    news.software.nntp

MAILING LISTS:

    http://www.plig.net/mailman/listinfo/diablo-users

USE OF REALTIME FEEDS AND FEED DELAYS

If you have several outgoing feeds, you should consider using the realtime, queueskip and startdelay options in dnewsfeeds. All of your local and internal feeds should be realtime. Cheap external paths to the internet can also be realtime.

To reduce the cost of running outgoing feeds over your internet transit, you may wish to weight the feeds according to cost. For example, our MAE-WEST connection is a lot cheaper than our MCI T3. I therefore run outgoing feeds with MAE-WEST destinations in realtime, and run outgoing feeds that go via MCI in batch mode with a 10 second delay. This way the articles may actually propagate to the more expensive destinations via other means before I attempt to send them directly.

Likewise, if you have T1 and frame customers, it is usually cheaper to supply them with a newsfeed yourself than to make them get one from someone over the internet. This way their newsfeeds do not eat your transit bandwidth. A realtime feed to those customers is best.

CATCHING UP AFTER BEING DOWN

When catching up on incoming feeds after being down for a while, the key item to monitor is the incoming article rate. Diablo generates a log line like this one for every 1024 articles received:

    Jun 24 11:03:59 news1 diablo[18153]: DIABLO uptime=7:46 arts=241.000K tested=0 bytes=1.842G fed=12.613M

You can calculate the article rate from the change in activity (for example, the arts= count) between two log lines that are around an hour apart. If the article rate is above 9 articles/sec, Diablo is catching up reasonably well. At the time of writing, a full feed is around 5 articles/sec. With a moderate number of incoming feeds, Diablo can do around 30 articles/sec.
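
The rate calculation can be sketched in Python. This sketch makes three assumptions. It assumes the log format shown in the example line above. It treats the K/M/G suffixes on arts= as powers of 1000, although they could be powers of 1024. It also supplies a year, because syslog omits one. The second log line in the usage example is invented for illustration, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumption: suffixes are powers of 1000 (they might be powers of 1024).
SUFFIX = {"": 1, "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3}

LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s.*"
    r"\barts=(?P<num>[\d.]+)(?P<suf>[KMG]?)"
)


def parse_stats(line: str, year: int = 2000):
    """Return (timestamp, article_count) from a DIABLO stats log line.

    syslog omits the year, so one is supplied; intervals that cross
    New Year are not handled by this sketch.
    """
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("not a DIABLO stats line: " + line)
    stamp = " ".join(m.group("stamp").split())  # collapse a space-padded day
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    count = float(m.group("num")) * SUFFIX[m.group("suf")]
    return when, count


def article_rate(earlier: str, later: str) -> float:
    """Articles per second between two stats lines."""
    t0, n0 = parse_stats(earlier)
    t1, n1 = parse_stats(later)
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        raise ValueError("lines must be given in chronological order")
    return (n1 - n0) / seconds


if __name__ == "__main__":
    first = ("Jun 24 11:03:59 news1 diablo[18153]: DIABLO uptime=7:46 "
             "arts=241.000K tested=0 bytes=1.842G fed=12.613M")
    # Invented second line, one hour later, for illustration only.
    second = ("Jun 24 12:03:59 news1 diablo[18153]: DIABLO uptime=8:46 "
              "arts=277.000K tested=0 bytes=2.118G fed=14.101M")
    rate = article_rate(first, second)
    status = "catching up reasonably well" if rate > 9 else "below the 9 articles/sec guideline"
    print(f"{rate:.1f} articles/sec: {status}")  # 10.0 articles/sec: catching up reasonably well
```
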
If you have a huge number of incoming feeds that are all in catchup, in-kernel filesystem locking will begin to interfere with history file lookups and updates. Diablo will be able to maintain a reasonable history file write transaction rate, but the lookup rate will suffer. As a result, Diablo catches up on articles first, without appreciably reducing the backlog at remote sites, because its check responses are slow. Once it passes a certain threshold, however, the load on the history file shifts to mostly reads rather than reads and writes. The transaction rate then increases dramatically, and Diablo will generally be able to clean up the backlogs very quickly after that.

SPAMALIAS OPTION IN DNEWSFEEDS

Submitted by: uhclem @ nemesis.lonestar.org (Frank Durda IV)

Spamalias
---------

"spamalias" is related to the "alias" command. If an originating server (not a transit server) or username entry in the Path: header matches a "spamalias" wildcard declaration, the message is not propagated to any neighbors. Diablo behaves as though some "alias" command matched that Path: header on all outbound feeds.

The "spamalias" parameter is intended for use on transit-only servers, because in this implementation the "spamalias" checks happen too late to prevent the message from being viewable from a local reader. It acts globally and must appear in the GLOBAL section.
For example,

    groupdef GLOBAL
        spamalias badpornsite
        spamalias xxxnilla
        spamalias *.unresponsive.net
        spamalias abuseisokay*
        spamalias annoyizer
    end

These entries would block messages with these Path: headers:

    Path: someplace!elsewhere!okaysite!badpornsite!not-for-mail
    Path: someplace!elsewhere!okaysite!notspammershonest!annoyizer
    Path: someplace!okaysite!xxxnilla!seemelive
    Path: someplace!elsewhere!okaysite!west-coast.unresponsive.net!me
    Path: someplace!elsewhere!okaysite!east-coast.unresponsive.net!them
    Path: someplace!goodplace!elsewhere!abuseisokay
    Path: someplace!goodplace!abuseisokay.com!nobody

However, these messages would not be blocked:

    Path: someplace!xxxnilla!elsewhere!west.unresponsive.net!goodplace!xyz
    Path: someplace!elsewhere!badpornsite!goodplace!gooduser
    Path: someplace!elsewhere!abuseisokay.com!elsewhere!innocentuser

The "spamalias" command checks only the last and next-to-last elements of the Path: header.

All too frequently, sites that originate spam or abuse need to be blocked, mainly to punish a lack of response to abuse/spam problems. However, you don't want to lose articles that merely pass through such a site without originating there. If such an article is dropped, your history database is poisoned: its Message-ID is already recorded, so you can't get the article via some other route to make sure it didn't really originate there. (If you have limited redundant feeds, hoping an article will show up via a less-tainted route may not be an option.) Using "spamalias" provides reasonable blocking of specific sites or spam/abuse signatures without blacking out unintended chunks of the USENET network.

When a "spamalias" match is made, an entry is logged to news.debug, indicating whether it was a position A (originating site) or B (user) match. The entry in incoming.log for such an article will show no sites to be fed.
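
The matching rule can be sketched in Python, assuming shell-style wildcards. `fnmatch.fnmatchcase` stands in for Diablo's own wildcard matcher, whose exact rules (case handling, for instance) may differ. `spamalias_match` is a hypothetical helper, not part of Diablo. It tests only the next-to-last (site) and last (user) Path: elements and reports a position A or B match, as described above.

```python
from fnmatch import fnmatchcase

# Patterns taken from the GLOBAL example above.
SPAMALIAS = ["badpornsite", "xxxnilla", "*.unresponsive.net", "abuseisokay*", "annoyizer"]


def spamalias_match(path_header, patterns=SPAMALIAS):
    """Return "A" (originating site match), "B" (user match) or None.

    Hypothetical helper: only the next-to-last and last Path: elements
    are tested; fnmatchcase() stands in for Diablo's own matcher.
    """
    value = path_header.strip()
    if value.lower().startswith("path:"):
        value = value[len("path:"):]
    elements = [e.strip() for e in value.split("!") if e.strip()]
    if len(elements) < 2:
        return None  # too short to have both a site and a user element
    site, user = elements[-2], elements[-1]
    if any(fnmatchcase(site, p) for p in patterns):
        return "A"
    if any(fnmatchcase(user, p) for p in patterns):
        return "B"
    return None


if __name__ == "__main__":
    print(spamalias_match("Path: someplace!okaysite!xxxnilla!seemelive"))  # A
    print(spamalias_match("Path: someplace!goodplace!elsewhere!abuseisokay"))  # B
    print(spamalias_match("Path: someplace!elsewhere!badpornsite!goodplace!gooduser"))  # None
```

Because the check looks only at the tail of the Path:, a blocked name appearing earlier in the path (such as badpornsite above) does not trigger a match.
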
Odd Things and Cautions about Spamalias:

If "spamalias" entries are used outside of the GLOBAL section, they may slightly slow Diablo because of the redundant work it ends up doing. They may also still act globally, or act on some but not all feeds. The "spamalias" command is not designed to be used to filter subsets of outbound feeds.

A blocked site that allows Path: preloading could potentially defeat "spamalias". In three years of use, however, incidents of any abuser bothering to work around such filtering have been virtually non-existent. The number of big sites that still allow preloading and don't take their own action against spam/abuse is also getting quite small.

Be cautious when writing "spamalias" expressions that are intended to match usernames. Some "dummy" usernames appear in posts originating from many sites, so such an expression would likely match those posts in addition to whatever you are trying to block. For example, never block "not-for-mail" or "news", since these are commonly used in place of actual user names. The "spamalias" command tests both the originating site and the originating user fields. Make sure an entry written for one field isn't busily zapping non-abusive posts that happen to match on the other field.

WHEREIS COMMAND

Diablo allows an optional NNTP command, 'WHEREIS', which returns the location of an article on the local spool (filename, offset and size). This option is useful for a reader accessing the article from an NFS-mounted spool. The use of NFS with Diablo is strongly discouraged, but some people are forced to use Diablo in an NFS-only environment.

    200 news.example.com NNTP Service Ready
    WHEREIS <3ba5b2d9.440788@news.tel.hr>
    223 0 whereis <3ba5b2d9.440788@news.tel.hr> in \
        /news/spool/news/D.00fe7ee6/B.036a offset 83825 length 1018

    (line wrap added for this document)
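
A reader with the spool mounted could use the reply as in the following minimal Python sketch. The reply layout is inferred from the single example above, with the wrapped line rejoined. `parse_whereis` and `read_from_spool` are hypothetical helpers, not part of Diablo or dreaderd.

```python
import re

# Layout inferred from the one example reply in this README.
WHEREIS_RE = re.compile(
    r"^223\s+(?P<artno>\d+)\s+whereis\s+(?P<msgid><[^>]+>)\s+in\s+"
    r"(?P<path>\S+)\s+offset\s+(?P<offset>\d+)\s+length\s+(?P<length>\d+)\s*$"
)


def parse_whereis(reply: str):
    """Return (message_id, path, offset, length) from a 223 WHEREIS reply."""
    m = WHEREIS_RE.match(reply.strip())
    if m is None:
        raise ValueError("unexpected WHEREIS reply: " + reply)
    return (m.group("msgid"), m.group("path"),
            int(m.group("offset")), int(m.group("length")))


def read_from_spool(path: str, offset: int, length: int) -> bytes:
    """Read one article's bytes out of a multi-article spool file."""
    with open(path, "rb") as f:
        f.seek(offset)  # articles share the file, so jump to this one
        data = f.read(length)
    if len(data) != length:
        raise OSError("short read: spool file changed or was expired")
    return data
```

The short-read check is a cheap guard against the spool file having been expired or rewritten between the WHEREIS and the read.
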