I’ve found that all the web archiving software I’ve encountered is either manual (you have to archive everything individually in a separate application) or crawler-based (which can put a lot of extra load on smaller web servers, and could even get your IP blocked).

Are there any solutions that simply archive web pages automatically as you load them in your browser? If not, why not?

I could also see something like that being useful as a self-hosted web indexer, where if you ever go “I think I’ve seen this name before”, you can click on it, and your computer will say something like “this name appeared in a news headline you scrolled past two weeks ago”.
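
To make the indexer idea a bit more concrete, here is a rough sketch of what the storage and lookup side could look like. It assumes something else (a browser extension or a local proxy, not shown) hands over the URL, title, and text of each page as it loads. The table layout and the names `open_index`, `record_visit`, and `find_mentions` are made up for illustration and are not taken from any existing tool.

```python
import sqlite3
from datetime import datetime, timezone


def open_index(path=":memory:"):
    """Open (or create) the page index; ':memory:' keeps it in RAM for demos."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT, title TEXT, body TEXT, visited_at TEXT)"
    )
    return conn


def record_visit(conn, url, title, body, visited_at=None):
    """Store one loaded page.

    Assumption: a browser extension or proxy calls this for every page load.
    Timestamps are expected in UTC so that the text values sort correctly.
    """
    if visited_at is None:
        visited_at = datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO pages VALUES (?, ?, ?, ?)",
        (url, title, body, visited_at.isoformat()),
    )
    conn.commit()


def find_mentions(conn, term):
    """Return (url, title, visited_at) rows whose title or body contains term.

    Uses a plain LIKE match, so '%' and '_' inside term act as wildcards.
    Results are ordered newest first.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, title, visited_at FROM pages "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY visited_at DESC",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Calling `find_mentions(conn, "some name")` would then give back where and when that name was seen. A real version would probably want full-text search and some deduplication, but a plain `LIKE` query keeps the sketch dependency-free.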

  • Arcane2077@sh.itjust.works
    English · 2 upvotes · 2 hours ago

    Check out Archive Warrior. It’s dead simple to set up on Docker, and will run in the background while you help literally save the internet. Ignore the steps about Watchtower, as that has been deprecated.

  • Æther@lemmy.world
    2 upvotes · 3 hours ago

    The Firefox extension for archive.org has an option to archive the page you visit if said page hasn’t been archived recently. It’s not exactly what you’re asking for, but it’s similar.

  • artifex@piefed.social
    English · 8 upvotes · 5 hours ago

    Huh. This seems like one of those “this must exist” situations, but I can’t think of anything that does this, and a brief search suggests there may not be. The closest I could find was the Internet Archive’s Archive-It, though it’s not an exact match. Otherwise, Archive Webpage, a pricey paid-for option (which seems like a terrible idea), appears to be the next closest. OSS/self-hosted tools like ArchiveBox and Linkwarden don’t really do this (though you can save/send a current tab to them), and apart from that… I don’t really see anything.

    • TropicalDingdong@lemmy.world
      4 upvotes · 5 hours ago

      Yeah, this is exactly what I was thinking: “surely this must already be a thing?”

      But yeah. I can’t think of anything. I mean, it’s like, you’re already downloading the data. Just write it down somewhere else.

  • NauticalNoodle@lemmy.ml
    1 upvote · edited 4 hours ago

    Web pages used to sort of operate that way with the ‘Temporary Internet Files’ folder. I’m not sure how it’s changed; I just know this was how I used to circumvent websites that disabled right-clicking to save their images.