I’m trying to look at this from a neutral point of view, which is why I believe requiring disclosure whenever (AI) models are used would benefit the community.

I believe models can harm privacy when used incorrectly, because they are prone to outputting misleading or outright incorrect information due to “hallucinations”. In my experience, this is the case more often than not with the projects I see.

I’m curious what others think about this. If you disagree, please let me know why.

  • FineCoatMummy@sh.itjust.works
    9 days ago

    I read something about that. I will try to find the link and post it.

    Ha! Found it!

    Large-scale online deanonymization with LLMs

    We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms.
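
    To make step (2) of that abstract a bit more concrete, here is a rough toy sketch of ranking candidate matches by text similarity. It is not the paper's pipeline: word counts stand in for semantic embeddings, no LLM is involved, and every name and profile string below is invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_candidates(query_text, profiles, k=2):
    """Return the k profiles most similar to query_text, best first."""
    query = vectorize(query_text)
    scored = [(cosine(query, vectorize(text)), name)
              for name, text in profiles.items()]
    # Highest score first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, round(score, 3)) for score, name in scored[:k]]


# Hypothetical pseudonymous profiles (made-up data).
profiles_b = {
    "user_1": "i restore vintage bicycles and brew coffee in portland",
    "user_2": "mostly posting about rust compilers and type systems",
    "user_3": "gardening tips, tomatoes, and composting advice",
}

query = "weekend project: restoring a vintage bicycle, then coffee"
ranked = top_candidates(query, profiles_b)
print(ranked)
```

    Even this crude version ranks user_1 first on just two shared words. Real embeddings would also catch near matches such as "restore" and "restoring", which plain word counts miss.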