• meowmeowbeanz@sh.itjust.works
    16 hours ago

    🚨 BIG NEWS Y’ALL! 🚨

    Someone just saved ALL the CDC’s public data before it could disappear! 🦅

    What’s the Deal?

    Some mystery hero downloaded everything from the CDC’s website (that’s 98 GIGABYTES of health info!) and uploaded it to the Internet Archive on Jan 28th. Think of it like making a backup copy of your phone before it breaks!

    Why Should You Care?

    • This is YOUR health data - stuff about vaccines, diseases, and public health that your tax dollars paid for! 🏥
    • Once this info is gone from CDC’s website, it could be really hard for your doctor to get important updates
    • Researchers need this to keep studying ways to keep Americans healthy 💪

    What’s Next?

    Smart folks at places like Harvard are making sure this data stays safe by keeping copies. It’s like having multiple backups of your family photos - can’t be too careful!

    Remember folks: Knowledge is power, and someone just made sure we didn’t lose a whole bunch of it! 🎯

    #SaveTheData #PublicHealth #AmericanRight2Know


    Source: Internet Archive upload by anonymous user on Jan 28, 2025; post by Ed Summers (@edsu@social.coop), Feb 3, 2025

    • spujb@lemmy.cafe (OP)
      3 hours ago

      As a reminder, AI generated content is against the rules in this community—see the sidebar. I appreciate your instinct to bring some quality content to this space, but let’s please keep in mind that genuine interaction with diverse voices is what makes this community beautiful. :)

      My reasoning:

      • You have personally admitted to writing AI comments in the past: https://sh.itjust.works/comment/16482371
      • Heavy use of markdown headings, bullets, and section dividers is a common pattern in LLM output
      • Use of “it’s like” or “it’s about” phrases as the conclusion to a paragraph is very common in LLMs like ChatGPT
      • Verbatim replication of content from my original post, which is common in LLM output and strongly suggests an LLM was instructed to create something based on the text of the original post
      • Use of 🎯 emoji does not match context
      • “100% AI generated” response on multiple AI detection websites (GPTZero, Quillbot)

      Any single one of these facts would not lead me to comment, but with all of it combined it makes a pretty strong case. Thank you for your contribution to this community but please let’s keep it genuine in the future! We love and appreciate the real you :)

  • some_guy@lemmy.sdf.org
    21 hours ago

    I will grab this torrent when I get home and make it a permanent seed, alongside the one outing nazis in Patriot Front.

    • paris@lemmy.blahaj.zone
      24 hours ago

      The Internet Archive is, and I really want to emphasize this, Fucking Huge. If you want to help archive it, every upload has an associated torrent you can download and help seed. Torrenting itself isn’t illegal, only torrenting illegal stuff like copyrighted movies. You can buy a relatively cheap refurbished HDD of whatever size you want, set up qBittorrent, and torrent the uploads that you want to make sure are available even if the Internet Archive has to take them down or has a critical data loss failure.
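
For anyone who wants to script the "grab the torrent" step, here is a minimal sketch. It assumes the usual Internet Archive convention that an item's torrent lives at `<identifier>_archive.torrent` under its download path. The function name and the identifier `example-item` are placeholders for illustration, not the identifier of the CDC upload.

```python
from urllib.parse import quote


def archive_torrent_url(identifier: str) -> str:
    """Build the torrent URL for an Internet Archive item.

    Assumes the common `<identifier>_archive.torrent` naming; check the
    item's download page if a particular upload differs.
    """
    if not identifier or "/" in identifier:
        raise ValueError("expected a bare item identifier, not a path or URL")
    safe = quote(identifier, safe="")
    return f"https://archive.org/download/{safe}/{safe}_archive.torrent"


if __name__ == "__main__":
    # "example-item" is a placeholder identifier.
    print(archive_torrent_url("example-item"))
```

The resulting URL can be added to qBittorrent like any other torrent link; leaving the client running afterwards is what keeps the copy seeded.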

      • LaunchesKayaks@lemmy.world
        22 hours ago

        Thank you so much for the advice! I want to preserve important documents like the Bill of Rights and the Constitution, as well as sexual education material, especially stuff pertaining to women and reproductive health. Also banned books. Things the fascists are trying to purge and things that are important to me.

    • spujb@lemmy.cafe (OP)
      1 day ago

      i’m not smart enough for this but maybe look to communities like r/DataHoarder to get started

  • brucethemoose@lemmy.world
    2 days ago

    We are screwed if the Internet Archive goes down, right?

    Seems like a huge point of failure for one entity.

    • kautau@lemmy.world
      2 days ago

      Agreed, though I think the biggest issue is just scale. It’s over 100 petabytes of data. That’s not outside the realm of what big cloud providers could mirror, but they don’t really give a shit. It would require some sort of significant distributed software solution for the community to work with. Not impossible, but as far as I know nobody’s taken up the mantle yet. I think it would need custom software just to begin solving how to distribute it as a sharded set of community mirrors, with different people each mirroring individual pieces.
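
To make the "sharded set of community mirrors" idea concrete, here is a rough sketch of one way shards could be assigned to volunteers. It uses rendezvous (highest-random-weight) hashing, which is my choice for illustration and not something any existing project is known to use here. The shard names, mirror names, and the replica count of 3 are all hypothetical.

```python
import hashlib


def assign_shards(shard_ids, mirrors, replicas=3):
    """Map each shard to `replicas` distinct mirrors via rendezvous hashing.

    Every (shard, mirror) pair gets a deterministic score; the mirrors with
    the lowest scores hold that shard. Adding or removing one mirror only
    moves the shards that mirror was (or would be) responsible for.
    """
    mirrors = list(mirrors)
    if replicas > len(mirrors):
        raise ValueError("need at least as many mirrors as replicas")
    plan = {}
    for shard in shard_ids:
        ranked = sorted(
            mirrors,
            key=lambda m: hashlib.sha256(f"{shard}:{m}".encode()).hexdigest(),
        )
        plan[shard] = ranked[:replicas]
    return plan


if __name__ == "__main__":
    # Hypothetical shard and volunteer names.
    shards = [f"shard-{i:04d}" for i in range(5)]
    volunteers = ["alice", "bob", "carol", "dave", "erin"]
    for shard, holders in assign_shards(shards, volunteers).items():
        print(shard, "->", holders)
```

A real system would also need to track which mirrors are actually online and verify what they hold; this only covers the "who stores which piece" part.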

      • Taalnazi@lemmy.world
        7 hours ago

        So about 104,857,600 GB? You’d need 105,000 people with 1 TB each to save that. Or…

        Assuming you bought 30 TB SSDs, you’d need about 3,500 of those, costing €80 each.

        That’d be €280k, but let’s round it to €300k.

        If every person spent €960 (or €80 per month for a year), then each person could get 12 of those SSDs. You’d need about 292 people to do that (3,500 / 12).

        Should be doable if crowdfunded by a community, or if you had some big donor. Then you’d need to connect it.
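
The back-of-the-envelope numbers can be rechecked with a few lines of Python. This is only a sketch under the comment's own assumptions: 100 PB total, binary units (1 PB = 1,048,576 GB), and the quoted €80 per drive, which I have not verified. Without rounding up to 3,500 drives, it comes out to 3,414 drives, €273,120, and 285 contributors at 12 drives each. Swapping in 1 TB drives gives roughly 8,500 contributors (or 8,750 starting from the rounded 105,000).

```python
import math

GB_PER_PB = 1024 * 1024  # binary units, matching the 104,857,600 GB figure
GB_PER_TB = 1024


def mirror_estimate(archive_pb=100, drive_tb=30, drive_price_eur=80,
                    drives_per_person=12):
    """Return (drives, total cost in EUR, contributors) for one full copy.

    Defaults come from the thread; the drive price is the quoted figure
    and is an unverified assumption.
    """
    total_tb = archive_pb * GB_PER_PB / GB_PER_TB
    drives = math.ceil(total_tb / drive_tb)
    cost = drives * drive_price_eur
    people = math.ceil(drives / drives_per_person)
    return drives, cost, people


if __name__ == "__main__":
    print(mirror_estimate())            # 30 TB drives
    print(mirror_estimate(drive_tb=1))  # 1 TB drives at the same unit price
```

Note that this covers a single copy with no redundancy; any replication multiplies every figure.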

      • Enceladus@lemmy.ca
        2 days ago

        HexOS has a plan for shared encrypted data. With the simplicity of installation and management, it could go mainstream as personal NAS devices gain popularity, but it’s still in early development.

        • Swedneck@discuss.tchncs.de
          1 day ago

          IPFS is the way to go IMO, it’s so perfect for archival that it pains me that it’s still pretty unknown

          the fact that you don’t need any sort of central organization for everyone to help seed data is amazing, no more duplicate torrents splitting seeders, so long as you have identical data the network just figures it out.
          If you have the hash for a piece of data, you can just set a computer to watch for someone to start seeding it. Even if the last time anyone saw the data was decades ago and a dude just found a CD in their recently passed dad’s basement, if that dude seeds it overnight and then their computer explodes, you’ve now downloaded it and it’ll remain available. It’s so fucking good.
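
The "identical data, the network just figures it out" property comes from content addressing: the identifier is derived from the bytes themselves, not from who uploaded them or what the file is called. Below is a deliberately simplified sketch of that idea using a plain SHA-256 digest. It is not the IPFS API, and real IPFS identifiers (CIDs) add chunking and extra encoding on top; `ContentStore` is a made-up class for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier purely from the bytes (plain SHA-256 hex)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy stand-in for one node: stores blocks keyed by their own hash."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blocks[cid]  # raises KeyError if this node lacks it
        if content_id(data) != cid:
            raise ValueError("stored block does not match its identifier")
        return data


if __name__ == "__main__":
    old_cd = ContentStore()
    my_node = ContentStore()
    payload = b"some dataset that was last seen decades ago"
    # Two independent nodes adding the same bytes get the same identifier,
    # so there is no "duplicate torrent" splitting the seeders.
    print(old_cd.put(payload) == my_node.put(payload))
```

Because the identifier doubles as a checksum, anyone fetching by hash can also verify they received exactly the bytes they asked for.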

    • grue@lemmy.world
      2 days ago

      It is long overdue for the Internet Archive to move to the EU or Switzerland or something.

      • ⛓️‍💥@sh.itjust.works
        1 day ago

        Would be best if there were several mirrors in several countries. It’s unfortunately too large to realistically host via crowdsourcing. The best you could do is something à la Storj, where fragments are redundantly distributed across various hosts.

  • jherazob@beehaw.org
    1 day ago

    Okay, given how things are going, do we know if the Internet Archive has a backup plan for when these fucks attack it in earnest?

    • spujb@lemmy.cafe (OP)
      2 days ago

      from the linked page

      Excludes corrupt datasets and data not publicly accessible.

    • shikitohno@lemm.ee
      1 day ago

      Same, especially before the inevitable attacks on the Internet Archive to come. Who knows what nonsense will be in the works to try to get this removed, or the whole project shut down, in the coming years?