Doc archive links pointing to web archive?
Posted: Sun Jun 02, 2024 8:59 pm
by fachat
I've recently noticed that the links in the datasheet archive section of 6502.org seem to point to the Internet Archive at web.archive.org.
I understand that this could be a measure to reduce bandwidth, but is this true, is this permanent?
(the performance - time for loading the doc - of the internet archive is way worse than what I remember from 6502.org...)
Thanks
André
Re: Doc archive links pointing to web archive?
Posted: Sun Jun 02, 2024 9:28 pm
by BigEd
(I think it's been that way for a while, and like you I imagine it's to save bandwidth. But at present (May 2024) the internet archive is under some denial of service attack, which might explain why it's not always as fast as we're accustomed to seeing.)
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 12:35 am
by GARTHWILSON
Andre, where did you see this? I was surprised by what you said, so I went to that section, and went down the list on every brand, mousing over all the links to see what shows at the bottom of the screen, and every single one of them started with "http://6502.org/documents/datasheets/".
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 5:34 am
by fachat
But if you click on them they end up in the internet archive.
Not sure if all of them but many.
The one I tried yesterday was the Rockwell 6545-1, but I've seen others before.
André
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 6:04 am
by GARTHWILSON
fachat wrote:
"But if you click on them they end up in the internet archive.
Not sure if all of them but many.
The one I tried yesterday was the Rockwell 6545-1, but I've seen others before."
I clicked on it, and the .pdf came up, without going to the internet archive. So then I did a <Ctrl>U to see the page source, and took this screenshot of the part about the Rockwell 6545-1:
As you can see, there's nothing there to refer to archive.org, or to anything outside 6502.org.
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 6:49 am
by fachat
Strange. Just tried again, and the browser displayed the PDF with a URL (shown in the address bar) from the Internet Archive. I use Firefox if that matters. Will try other browsers today.
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 6:51 am
by GARTHWILSON
I use firefox too. You're outside the US, right? I wonder if that has anything to do with it.
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 7:36 am
by fachat
Ok, I tried it on the PC again, tracing the web access. See screenshot.
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 7:39 am
by fachat
P.S.: interestingly, the actual PDF seems to be included in the original response, and even in two responses from the Internet Archive. So my browser loads the actual file three times...?
The number of bytes transferred indicates this, as does clicking on the "response" tab in the browser console, where all three requests with >5MB get a long encoded (base64?) response string.
Oh, the wonders of the modern web...
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 7:48 am
by BigEd
Two of those accesses are 302 - I think they are redirects. The size of the PDF is in the header, but I don't think the PDF is downloaded each time. Might be worth looking a little deeper.
I think Mike has put in the redirects in some straightforward way. If anyone were scraping the document archive, they wouldn't cost him bandwidth (or slow down our accesses to the forum!)
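One way to look a little deeper is to walk the redirect chain by hand and record each hop's status code, rather than letting the browser follow it silently. The following is only a minimal sketch: the helper names `head` and `trace_redirects` are made up for illustration, and it uses HEAD requests so that no PDF body is transferred while checking.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, location, content_length); the last two may be None.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return (resp.status,
                resp.getheader("Location"),
                resp.getheader("Content-Length"))
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Follow Location headers manually and record every hop.

    `fetch` is injectable so the logic can be exercised without a network.
    """
    hops = []
    for _ in range(max_hops):
        status, location, length = fetch(url)
        hops.append((url, status, length))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

Calling `trace_redirects(url)` with one of the datasheet links should return one tuple per hop. If the intermediate hops show a 3xx status, they are redirects, and only the final hop would be expected to deliver the PDF itself.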
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 8:05 am
by drogon
I tried this. I tried to get the link pointing to the Rockwell 6502, 2nd link on:
http://archive.6502.org/datasheets/rock ... essors.pdf
This sends me a redirect to
http://6502.org/documents/datasheets/ro ... essors.pdf
and that sends me a redirect to
https://web.archive.org/web/20221112220 ... essors.pdf
So someone/thing has intentionally created this archive of 6502.org over on archive.org and altered the server to redirect requests to archive.6502.org which forwards to archive.org....
This also may be a geo-fencing thing. I can't be bothered to check right now as it would mean finding a US based VPN outlet.
Probably OK; however, archive.org has been slow of late and, more frustratingly, it is BLOCKED by default by many ISPs, especially mobile ones, because as well as being full of technical stuff it's also full of porn and other potentially contentious material.
So I'm blocked from accessing it when out and about which for me right now is 2-3 days a week.
I could go through the shenanigans of enabling adult content for the mobile ISP but it's not worth the hassle.
-Gordon
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 8:14 am
by GARTHWILSON
He must have some sort of redirect set up then. I just looked in my history, and archive.org does show up.
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 10:22 am
by BigEd
It's not a geo thing - browserling has the same result. (Also a way to access if your ISP isn't happy.)
Another possible workaround/solution is archive.today, although again it's a bit subject to ISP blocking (which is why it has a presence on many top level domains.)
Re: Doc archive links pointing to web archive?
Posted: Mon Jun 03, 2024 3:39 pm
by barrym95838
I'm getting redirected too, but archive.org is currently quite responsive for me, so I wouldn't even have noticed under normal circumstances.
Re: Doc archive links pointing to web archive?
Posted: Tue Jun 25, 2024 6:06 pm
by Mike Naberezny
fachat wrote:
"I've recently noticed that the links in the datasheet archive section of 6502.org seem to point to the Internet Archive at web.archive.org.
I understand that this could be a measure to reduce bandwidth, but is this true, is this permanent?"
No, this is not intended to be permanent.
In the Git repository for the website, I have a SQLite database file which maps all of the PDF files in the documents archive to known-good copies on archive.org. I originally did this so that the website can be run locally or can be recreated if something happens to me.
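For readers wondering what such a mapping could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `archive_map`, its columns, and the helper functions are all assumptions made for illustration; the actual schema in the website's repository is not described in this thread.

```python
import sqlite3

# Hypothetical schema: one row per local PDF path and its archived copy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS archive_map (
    local_path  TEXT PRIMARY KEY,
    archive_url TEXT NOT NULL
)
"""


def open_map(db_path=":memory:"):
    """Open (or create) the mapping database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def add_mapping(conn, local_path, archive_url):
    """Record the known-good archived copy for one local file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO archive_map VALUES (?, ?)",
            (local_path, archive_url),
        )


def archive_url_for(conn, local_path):
    """Return the archived URL for a local path, or None if unmapped."""
    row = conn.execute(
        "SELECT archive_url FROM archive_map WHERE local_path = ?",
        (local_path,),
    ).fetchone()
    return row[0] if row else None
```

A lookup like `archive_url_for(conn, path)` is all a redirect handler would need: if it returns a URL, the server can answer with a 302 to that location, otherwise it serves the local file.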
A few months ago, we were having problems where the entire documents archive was being downloaded frequently. The downloaders would sometimes put so much load on the server that the forum became unusably slow. Blocking IPs didn't help because whenever I blocked some, the bulk downloading would start again from new ones.
I started redirecting the document files to archive.org to help deal with this. I was hoping that the people doing the bulk downloading had good intentions and might stop if they realized everything was already backed up on archive.org. However, whenever I look in the logs, they don't seem to have let up. I'd like to restore serving directly from 6502.org and I think the long-term solution will be some kind of rate limiting where an IP address will be redirected to archive.org if it exceeds some reasonable number of downloads within a time period.
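The rate-limiting idea described above could be sketched as a per-IP sliding window. This is only an illustration of the general technique, not the site's implementation: the class `DownloadLimiter`, the function `choose_response`, and the default limits (20 downloads per hour) are all hypothetical.

```python
import time
from collections import defaultdict, deque


class DownloadLimiter:
    """Sliding-window counter of recent downloads per IP address."""

    def __init__(self, max_downloads=20, window_seconds=3600.0,
                 clock=time.monotonic):
        self.max_downloads = max_downloads
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._hits = defaultdict(deque)

    def allow(self, ip):
        """Return True and record the download if `ip` is under the limit."""
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_downloads:
            return False
        hits.append(now)
        return True


def choose_response(limiter, ip, local_path, archive_url):
    """Serve locally while under the limit, otherwise redirect."""
    if limiter.allow(ip):
        return ("serve", local_path)
    return ("redirect", archive_url)
```

With this shape, ordinary visitors fetching a handful of datasheets are served directly, while an address that exceeds the limit is sent to the archived copy until its older downloads age out of the window. A real deployment would also need to evict idle IP entries so the table does not grow without bound.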