6502.org Forum
All times are UTC




Posted: Sat Dec 19, 2020 5:25 pm

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Hello to the lurkers. Actually, hello to everyone, but particularly the lurkers. Some of the forum's regulars may not be aware of its current size. This and other impediments may cause people to lurk for one year or three before posting anything. I suspect many people never get "up to speed" or are intimidated by the volume and density of technical information (and the general aptitude of the people who write it). Unfortunately, volume alone is an impediment, and it is extremely unlikely to shrink by any appreciable amount.

Some very good advice given by GARTHWILSON in 1999 was to spend a few evenings reading through the forum archive. In principle, this remains very good advice. For example, it reduces repetition. However, over the subsequent 21 years, depending upon how you count, the forum has grown by more than three million words. To put this in perspective, it would be quicker and easier to read the Bible twice. This problem is not unique to our forum. It is typical of many interest groups which have been running for more than 20 years. For example, the Linux Kernel Mailing List is famously impenetrable due to the volume and density of discussion.

I've been off-line for a few months, partly due to minor illness. However, in my absence, I had the foresight to scrape a large proportion of the forum and sift it into a manageable volume of data. What I am about to suggest might not be popular with the forum administrators because it might encourage a moderate spike in bandwidth consumption. Therefore, it is recommended that you first try this elsewhere and allow our administrators the opportunity to collate a compressed public dataset, in the manner of Wikipedia's snapshots, which exist for similar reasons. Also, if you incur bandwidth here or elsewhere, please have the decency to read what you download in full. Furthermore, don't hammer any server with rapid requests or cause unnecessary bandwidth consumption.
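As a rough illustration of these ground rules, here is a minimal Python sketch of a fetcher that never requests the same page twice and pauses between requests. The function names, the 10 second default delay and the User-Agent string are all assumptions made for illustration; none of them come from the tools I actually used.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_url(url: str) -> bytes:
    """Download one page with a descriptive (hypothetical) User-Agent."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "polite-reader/0.1 (illustrative sketch)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def polite_fetch(
    urls: Iterable[str],
    delay_seconds: float = 10.0,  # assumed pause; choose what the site tolerates
    fetch: Callable[[str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, skipping repeats and pausing between requests."""
    seen = set()
    first = True
    for url in urls:
        if url in seen:
            continue  # never spend bandwidth on the same page twice
        seen.add(url)
        if not first:
            sleep(delay_seconds)  # spread the load over time
        first = False
        yield url, fetch(url)
```

The `fetch` and `sleep` parameters exist only so that the loop can be tried out without touching a real server.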

Now that I have stated the generally accepted ground-rules for spidering, I will explain a technique for working smarter. It can be achieved with moderate experience of Unix pipes and minimal intermediate storage. I hope that future contributors can migrate all of this over to a self-hosting 6502 derivative. Indeed, I explain my methods in the hope that the forum may gain more knowledgeable contributors with this purpose in mind.

I use a utility program to convert HTML to plain text and a separate N-gram pattern matching utility program to de-duplicate quoted text and other repetition. For further gains, I omit glue words. The choice of omissions is likely to be influenced by temperament but it may be desirable to reduce text by a further 10% or more. The resulting text is then processed with text-to-speech software running at up to 400 words per minute. It is possible to adjust the speed interactively or dump speech to one or more files which are suitable for music players and similar devices. Obviously, this arrangement is not suitable for studying source code in appreciable detail, nor is it convenient for browsing diagrams and references. It is otherwise suitable for reducing the period of lurking by a factor of 10 or more. Indeed, by using a combination of publicly available software, such as curl, wget, an N-gram script, espeak and/or ffmpeg, it is possible to make MP3 files (or similar) which may be played while doing other activities.
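The text-processing stages described above could be sketched in Python roughly as follows. This is not the utility I use; the N-gram length, the glue word list and every function name are assumptions chosen to keep the example short. The de-duplication step drops any word that falls inside an N-gram which has already appeared earlier in the text, which is one simple way to remove quoted passages.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace (crude: inline tags become spaces)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def drop_repeated_ngrams(words: List[str], n: int = 8) -> List[str]:
    """Remove every word covered by an n-gram that already occurred earlier."""
    seen = set()
    duplicate = [False] * len(words)
    for i in range(len(words) - n + 1):
        gram = tuple(w.lower() for w in words[i:i + n])
        if gram in seen:
            for j in range(i, i + n):
                duplicate[j] = True
        else:
            seen.add(gram)
    return [w for w, dup in zip(words, duplicate) if not dup]


# Assumed glue word list; the right choice depends on temperament.
GLUE_WORDS = frozenset({"a", "an", "the", "of", "to", "and", "that", "which"})


def drop_glue_words(words: Iterable[str], glue=GLUE_WORDS) -> List[str]:
    """Omit glue words, ignoring case and surrounding punctuation."""
    return [w for w in words if w.lower().strip(".,;:!?\"'()") not in glue]


def condense(html: str, n: int = 8) -> str:
    """HTML in, shortened plain text out, ready for text-to-speech."""
    words = html_to_text(html).split()
    return " ".join(drop_glue_words(drop_repeated_ngrams(words, n)))
```

The condensed text could then be written to a file and handed to a speech synthesizer, for example `espeak -s 400 -f condensed.txt -w condensed.wav` followed by `ffmpeg -i condensed.wav condensed.mp3`, where the file names are placeholders.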

In an impaired state and with a relative lack of concentration, I often read more than one million words per month. This may sound impressive, but if you divide the figures at 400 words per minute, you'll find that it requires only a little more than one hour per day. And I don't always have a productive day. Anyhow, I hope my message makes people generally more productive and knowledgeable, helps the infirm, reduces repetition on the forum, and brings more insight to technical problems.
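For anyone who wants to check the division, here it is as a few lines of Python. The 30-day month is an assumption; the other two figures are the ones quoted above.

```python
WORDS_PER_MONTH = 1_000_000   # "more than one million words per month"
WORDS_PER_MINUTE = 400        # top text-to-speech rate mentioned above
DAYS_PER_MONTH = 30           # assumption: a 30-day month


def listening_hours_per_day(words: float, wpm: float, days: float) -> float:
    """Average daily listening time needed to cover `words` in `days`."""
    minutes = words / wpm
    return minutes / 60 / days


hours = listening_hours_per_day(WORDS_PER_MONTH, WORDS_PER_MINUTE, DAYS_PER_MONTH)
print(f"{hours:.2f} hours per day")  # prints "1.39 hours per day"
```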


