THE CIRCLE BY DAVE EGGERS: https://en.wikipedia.org/wiki/The_Circle_(Eggers_novel)
In 1967, Alan Westin identified four “basic states of individual privacy”: (1) solitude; (2) intimacy; (3) anonymity; and (4) reserve (“the creation of a psychological barrier against unwanted intrusion”).
In A Taxonomy of Privacy.
WHEN PRIVACY PROBLEMS ARE RAISED, THE FOCUS IS ALL ON THE NEGATIVE ASPECTS. BUT PRIVACY IS A TRADE-OFF BETWEEN OUR EXPLICIT AND IMPLICIT SHARING OF DATA AND THE VALUE WE DERIVE FROM IT.
FOR EVERY FLOCK CAMERA, THERE’S A BAD PERSON WHO IS CAPTURED.
EVERY TIME WE HIT LIKE ON FACEBOOK OR UPVOTE SOMETHING, WE RECEIVE CONTENT DEEMED MORE RELEVANT.
EVERY TIME WE SCAN OUR FACES TO GET THROUGH TSA, WE PASS THROUGH THE LINE MORE QUICKLY.
IF THE RETURN ON PROVIDING THIS DATA IS JUST TO SELL US MORE ADS, THEN FUCK THEM. THE BENEFIT DOES NOT REDOUND TO THE USER. BUT IF IT’S CLEARLY TO PROVIDE BETTER EXPERIENCES, SAVE TIME, OR UNCOVER SOMETHING WE MIGHT NEED TO KNOW, THEN IT MAY BE WORTH IT.
THESE ARE ALL TRADE-OFFS. WHAT IS NOT COOL IS HOW THESE TRADE-OFFS ARE BEING MADE WITHOUT US REALIZING THE BARGAIN WE ARE SIGNING ON TO. WHEN WE ENTER OUR LOYALTY MEMBERSHIP NUMBER AT THE SUPERMARKET, WE ARE SHARING OUR SHOPPING HABITS IN RETURN FOR A MONETARY DISCOUNT.
IF WE’RE ALLOWING APPS TO TRACK OUR LOCATIONS, WE’RE BARGAINING THAT THEY’LL PROVIDE US WITH FASTER ROUTES TO GET SOMEWHERE.
WHAT WE’RE NOT BARGAINING FOR IS WHAT HAPPENS TO THAT DATA AFTERWARD. THERE’S A LACK OF TRANSPARENCY. WILL OUR SHOPPING DATA BE USED TO CREATE A DETAILED PROFILE ON US? WILL OUR VISITS TO THE PHARMACY BE USED TO DENY US HEALTH COVERAGE? WILL OUR USING A CELL PHONE OPEN US UP TO SURVEILLANCE CAPITALISM?
WHY DO WE CARE?
John Popham, Lord Chief Justice of England (1531–1607), is reported to have said something along the lines of:
“Give me but six lines written by the most honest man, and I will find therein something to hang him.”
A very similar version is often attributed to Cardinal Richelieu (1585–1642), the powerful chief minister to Louis XIII of France:
“If you give me six lines written by the hand of the most honest man, I will find something in them which will hang him.”
[ Shoshana Zuboff on Surveillance Capitalism]
[ DIAGRAMS:
HOW PERSONAL DATA CAN BE ABSORBED INTO TRAINING DATA
HOW PATTERN DATA CAN BE USED TO TRAIN MODELS (NOT GEN-AI, BUT PATTERNS OF USE)
HOW ACCESS TO PERSONAL DATA CAN BE AWARDED ON THE SERVER SIDE
HOW ACCESS TO PERSONAL DATA CAN BE AWARDED VIA TOOLS (MCP)
HOW ACCESS TO PERSONAL DATA CAN BE GIVEN ON THE CLIENT (PHONES)
Millimeter-wave sensors for presence and fall detection.
Mapping the interiors of our homes?
Smart Speakers that we can control with our voices and ask questions.
Wearables that uniquely identify not just our presence but also details of our heart rate, blood pressure, and voice.
When a woman logged her fertility data, it was revealed that she was pregnant. That data could then be transmitted to her employer and, nowadays, to state authorities who could prosecute her. The bargain was to provide fertility data and get cycle predictions and health insights in return.
NOTE NOTE NOTE: SINGLE USE DATA BARGAIN: I WILL PROVIDE YOU WITH A PIECE OF INFORMATION AND YOU WILL TELL ME WHAT IT’S FOR. I DO NOT GIVE YOU PERMISSION TO USE IT FOR ANY OTHER PURPOSE. IF YOU WANT TO USE IT FOR ANY OTHER USE, YOU HAVE TO GET A SEPARATE PERMISSION AND BARGAIN.
AND THAT MEANS NO EXCEPTIONS FOR LAW ENFORCEMENT, PROTECTING CHILDREN, OR SOLVING CRIMINAL CASES. A SINGLE USE. I GIVE YOU X, YOU GIVE ME Y. YOU DO NOT GET TO LAUNDER THE DATA. NO OTHER SUBSIDIARIES. NO DATA BROKERS. NO TRAINING MODELS. NO RESELLING. SINGLE-USE.
ALSO, THE COMPANY IS REQUIRED TO EXPLAIN THIS BARGAIN IN NO MORE THAN 100 CHARACTERS AND MAKE IT CLEAR TO AVERAGE USERS. THEY NEED TO MAINTAIN THIS CLARITY BY PROVIDING USER-TESTING DATA. NO WEASEL LANGUAGE.
IN REGULATION, IT’S CALLED “PURPOSE LIMITATION.” THE OTHER ASPECT, CALLED “DATA MINIMIZATION,” DOESN’T WORK. IT’S TRIVIAL TO OBTAIN A UNIQUE ID AND ASSOCIATE IT WITH SOMEONE, LIKE A PERMANENT TATTOO (cf. the Holocaust). FROM THEN ON, THEY’RE PERMANENTLY TAGGED.
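A minimal sketch of what such a single-use, purpose-limited grant could look like as a data structure. The names (DataGrant, redeem) and the enforcement of the 100-character rule are illustrative assumptions, not any existing regulation or API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataGrant:
    """A hypothetical single-use data bargain: one datum, one stated purpose."""
    datum: str                 # the piece of information being shared
    purpose: str               # the plain-language bargain, 100 characters or fewer
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    redeemed: bool = False     # single-use: once redeemed, it cannot be reused

    def __post_init__(self):
        if len(self.purpose) > 100:
            raise ValueError("Purpose must be stated in 100 characters or fewer")

    def redeem(self) -> str:
        """Hand over the datum exactly once, for the stated purpose only."""
        if self.redeemed:
            raise PermissionError("Grant already used; a new bargain is required")
        self.redeemed = True
        return self.datum


# Example: the loyalty-card bargain at the supermarket
grant = DataGrant(
    datum="loyalty-id:12345",
    purpose="Apply member discount to this purchase only",
)
print(grant.redeem())   # first use succeeds
# grant.redeem()        # any further use (resale, training, brokers) raises PermissionError
```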
THERE IS ALSO THE ISSUE OF WHO CARRIES THE BURDEN: COMPANIES OR INDIVIDUALS? META PUTS THE BURDEN ON INDIVIDUALS, WHO HAVE NO WAY TO ASSESS THE TRADE-OFFS.
WILL USING KNOWLEDGE-REPRESENTATION SYSTEMS (e.g., GRAPH DATA) HELP? TO CREATE THOSE GRAPHS, YOU STILL NEED A LOT OF PERSONAL DATA. THE QUESTION IS: WHERE DO WE KEEP THE DATA? WHO HAS ACCESS TO IT? AND HOW CAN WE SHARE IT BETWEEN DEVICES, CAREGIVERS, AND THOSE WHO NEED IT TO GIVE US BENEFITS (THE CONTRACT)?
ALSO, FUNDAMENTALLY, IS COLLECTING MORE DATA EVEN THE RIGHT APPROACH? CAN’T WE COME UP WITH SOMETHING BETTER? RAG/VECTOR DATA ARE TRANSITIONAL STEPS. ALSO, THERE IS THE DIVIDE BETWEEN TRAINING/INFERENCE vs. ALWAYS UPDATING – WHICH KNOWLEDGE REPRESENTATION GETS US BUT STATISTICAL ANALYSIS DOESN’T.
WITH A PERSONAL KRS, WE ONLY NEED AS MUCH DATA AS IS RELEVANT TO A USER (OR THE SOCIAL COHORT – IF THAT IS REALLY NEEDED?). THIS WAY, ONLY THE DATA NEEDED FOR A SPECIFIC USE WILL BE COLLECTED: MY SHOPPING DATA, TO SUGGEST PRODUCTS I MIGHT LIKE OR TO LEARN MY PATTERNS SO I CAN SPEED UP MY SHOPPING. NOT TO FEED SOME NEBULOUS ALGORITHM OR TRAIN AN INFINITELY EXPANDING MODEL.
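To make the contrast concrete, here is a toy sketch (all names and records invented) of a personal knowledge store that answers one scoped question locally instead of exporting the raw history:

```python
# Hypothetical local "personal knowledge store": raw purchase history stays on-device,
# and only the answer to one narrowly scoped question ever leaves it.
purchases = [
    {"item": "oat milk", "count": 14},
    {"item": "coffee beans", "count": 9},
    {"item": "prenatal vitamins", "count": 2},   # sensitive detail that never leaves the device
]

def scoped_answer(question: str) -> list[str]:
    """Answer one narrow question; do not export the underlying records."""
    if question == "frequent_items":
        return [p["item"] for p in purchases if p["count"] >= 5]
    raise ValueError("No bargain covers this question")

# The retailer gets only what the specific use requires:
print(scoped_answer("frequent_items"))   # ['oat milk', 'coffee beans']
```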
[ APPLE’S PRIVACY SYSTEM: DIFFERENTIAL PRIVACY - https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ - blurring data and sending it through private relays to prevent the data from being associated with an individual. ]
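The core idea behind differential privacy can be shown with the classic randomized-response trick: each report is noised before it leaves the device, so no single answer can be pinned to a person, yet the aggregate stays useful. This is a textbook illustration under made-up parameters, not Apple’s actual implementation:

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true value with probability p_truth, otherwise flip a fair coin."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

# Simulate 100,000 users, 30% of whom actually have the sensitive attribute.
n, true_rate = 100_000, 0.30
reports = [randomized_response(random.random() < true_rate) for _ in range(n)]

# Any single report is deniable, but the aggregate can be de-biased:
observed = sum(reports) / n
estimated = (observed - 0.125) / 0.75   # invert E[report] = 0.75 * p + 0.25 * 0.5
print(f"observed {observed:.3f}, estimated true rate {estimated:.3f}")
```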
[ FEAR-BASED DATA COLLECTION: IF WE DON’T, OUR COMPETITORS WILL, AND THEY WILL CREATE A DATA MOAT ]
There are companies (list) that scrub out PII from data as part of their process. But they do not prohibit tracking. They are point solutions.
Nebulous statements like “We use your personal information to improve our products and services” should not be allowed.
[ USING FLOCK DATA TO TRACK MOVEMENT ]
[ THIS SECTION SHOULD ADDRESS USER DATA PRIVACY – NOT NECESSARILY RABBIT. BUT WE CAN TALK ABOUT INSTANCES WHERE CHATBOTS WERE USED TO EXFILTRATE PRIVATE KEYS OR USERNAMES/PASSWORDS.
ALSO, DEPENDING ON KEEPING TRACK OF USER QUERIES FOR MINING. ]
[ HOW ABOUT AI ASSISTANTS? ]
[ HOW ABOUT ADDING TOOLS THAT COULD BE SUBVERTED TO EXTRACT PRIVATE DATA? ]
[ WHAT ARE THE IMPLICATIONS WHEN SOMEONE CONNECTS AI ASSISTANTS WITH EMAIL, CALENDAR, PHONE CALLS, ETC. ]
[ ANTHROPIC AND OPENAI ARE ASKING TO BE INTEGRATED WITH TOOLS. AND GOOGLE, OF COURSE, IS ALREADY INTO GMAIL AND CALENDAR, BUT NOT ANDROID AOSP. ]
[ APPLE INTEGRATES INTO EMAIL, MESSAGES, ETC. ON THE IPHONE SIDE BY RUNNING THINGS LOCALLY. ]
[ MICROSOFT IS TRYING TO GET INTO WINDOWS PRIVATE DATA ]
[ USING PRIVATE DATA TO TRAIN MODELS. ]
[ GMAIL’S FREE USE HAS ALWAYS BEEN RELATED TO ALLOWING AD EMBEDDING AND LATER, TRAINING MODELS ]
[ APPLE MAKES SUGGESTIONS ON WHAT IS IMPORTANT IN EMAIL – THEY HAVE PRIVACY-ENHANCED LLMS ]
[ WHAT ABOUT IOT SMARTHOME DATA? ]
[ WHAT ABOUT PRESENCE DATA? IF LOCAL SENSORS ARE PUT INTO PLACE? ]
First, we predict that continued AI development will continue to increase developers’ hunger for data— the foundation of AI systems. Second, we stress that the privacy harms caused by largely unrestrained data collection extend beyond the individual level to the group and societal levels and that these harms cannot be addressed through the exercise of individual data rights alone. Third, we argue that while existing and proposed privacy legislation based on the FIPs will implicitly regulate AI development, they are not sufficient to address societal level privacy harms. Fourth, even legislation that contains explicit provisions on algorithmic decision-making and other forms of AI is limited and does not provide the data governance measures needed to meaningfully regulate the data used in AI systems.
Privacy is a set of trade-offs. Some are explicit (where the user gives consent). Some are not (where the information has already left that place/time/space and is now in the possession of some other entity). We can control the first part, but we have no control over the second. Maybe we should, by putting the burden on the users of the information. We can either make an instant trade for that access (guaranteed to be used for a single-time, single-use purpose), or not. We can also grant that access to a PrAAS entity and let them negotiate the best rate. We just provide our intent, in simple form: don’t let anyone use my stuff.
We can also act by proxy for others (children, the elderly, the physically incapable, the incarcerated, those not present).
Also, absence of a signal is a signal. Not being somewhere indicates something. So we need to provide user control over that.
WHY ANONYMIZATION WON’T WORK
YOU CAN TAKE A VECTOR OF DATA AND, THROUGH THE COMBINATION OF ITS VALUES, ESTABLISH A UNIQUE IDENTITY. SHOW A DIAGRAM OF THIS.
IT COULD BE ANY X NUMBER OF VALUES. CALCULATE THE PROBABILITY OF THE HASH MATCHING A PERSON.
ADD PATTERNS OF USE (NOT JUST STATIC VALUES) AND YOU WON’T EVEN NEED THAT MANY VALUES. WE MOSTLY GO TO THE SAME FEW PLACES. THAT IS ENOUGH TO ESTABLISH US UNIQUELY. PATTERN VECTORS NEED ONLY 2 OR 3 ATTRIBUTES.
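A back-of-the-envelope sketch of how quickly combined attributes single someone out. The population figure and attribute cardinalities below are rough assumptions for illustration, not measured values:

```python
# How many people do we expect to share a given combination of attributes?
population = 330_000_000          # roughly the U.S.

attributes = {
    "ZIP code": 40_000,           # assumed number of distinct values
    "birth date": 365 * 80,
    "sex": 2,
    "car make/model": 500,
}

combinations = 1
print(f"{'attributes added':<28}{'expected people per combination':>34}")
for name, cardinality in attributes.items():
    combinations *= cardinality
    expected = population / combinations
    print(f"+ {name:<26}{expected:>34,.2f}")

# Once the expected count drops below ~1, a "hash" of these values is effectively a name.
# Behavioral patterns (the same few places, at the same times) collapse this even faster.
```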
THIS IS WHERE THE BARGAIN ISN’T EVEN MADE. WE USE OUR PHONE, AND EVEN WITH APPLE’S UDID REMOVAL, THESE COMBINED ATTRIBUTES AND PATTERNS CAN STILL FINGERPRINT US.
AGAIN: CHIEF JUSTICE QUOTE.
Jeremy Bentham’s Panopticon video: https://www.youtube.com/watch?v=uO4hJVYEJ6I
Italian mental health colony: https://www.justinpeyser.com/new-gallery-2#:~:text=The%20remaining%20structures%20of%20the,center%20and%20contemporary%20artisanal%20uses.
THIS CAN BE PARALYZING. [ FUNNY VIDEO ]
BUT IT DOESN’T HAVE TO BE. WHEN CREATING AN AI COMPANION, WE ARE GIVING DETAILS OF OUR LIFE, PRESENCE, AND BODIES TO A REMOTE ENTITY. SUCH AN ENTITY HAS TO PROVIDE CLARITY OF PURPOSE: A SINGLE BARGAIN. YOU GIVE ME THIS INFORMATION AND I WILL PROTECT YOUR LIFE. THIS IS WHERE “RAMIN’S TEST” (A COUNTER TO THE TURING TEST) STARTS.
[ datadim.ai, datafade.ai/.com, datamuddle.com ] – available on Namecheap for $90. It dims/fades down the private data.
In March 2023, the newspaper La Libre reported that a young Belgian man had taken his own life after spending weeks conversing with an AI chatbot named after Eliza, the original chatbot.
"Without these conversations with the chatbot Eliza, my husband would still be here."
La Libre
The original version of Eliza also revealed problems with boundaries. In his paper Contextual Understanding by Computers, Joseph Weizenbaum recounted an issue that troubled him:
My secretary watched me work on this program over a long period of time. One day she asked to be permitted to talk with the system. Of course, she knew she was talking to a machine. Yet, after I watched her type in a few sentences she turned to me and said “Would you mind leaving the room, please?” I believe this anecdote testifies to the success with which the program maintains the illusion of understanding. However, it does so, as I’ve already said, at the price of concealing its own misunderstandings.
Extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.
There’s also the tragic story of the 14-year-old who committed suicide after an AI chatbot character told him to "come home." Free-speech claims by the AI vendor during trial have, so far, been rejected.
The reality is that the human urge to connect can overpower commonsense guardrails of trust and privacy. Even more so if the interface is an opaque and magical box that can eerily imitate another person.
It is easy to imagine the entity on the other side of a conversation as a confidante and lower one’s guards and inhibitions without realizing the risks.
Data Protection
In January 2024, at the annual Consumer Electronics Show (CES), where most consumer electronics companies congregate every year, Jesse Lyu, the founder of AI startup Rabbit, presented the R1, a standalone device that could let you interact with an AI service in the cloud without having to use your phone.
His presentation video (above) was compelling and made some audacious claims, including that they were not using an actual LLM (Large Language Model) but had come up with what they called a Large Action Model. The first batch of 10,000 pre-orders at $199 sold out very quickly, as did a second batch with later delivery times, leading to a third round.
My first reaction to the R1 was questioning their business acumen. A fixed $199 price for a connected device?
We covered the math in the section on Money. Unless they devised a separate recurring revenue model, each device would be running at a loss within 2-3 years (depending on their back-end OPEX costs).
The only way it could work was if their business model counted on selling a lot of devices, and on people stopping use long before the 2-year mark. Think of it as a reverse plot from The Producers.
In April 2024, a security consultant raised a flag with the way the R1 had users log in and give R1 access to their favorite applications:
But here’s the thing that has me a bit concerned. Instead of using a nice, secure method like OAuth to link accounts, the r1 has you log into services through VNC in their portal.
Don’t get me wrong, I love the convenience of being able to connect applications to an AI device. But having it snapshot your credentials or session data is… not great from a security standpoint.
This meant that Rabbit’s servers could potentially obtain and store a user’s login credentials for a connected third-party service. Even if you assumed the folks at Rabbit were trustworthy, it wasn’t clear how those credentials were stored and how secure the Rabbit servers were.
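The difference matters. A rough conceptual sketch of what each side ends up holding; the field names and values are hypothetical, not Rabbit’s or any provider’s actual API:

```python
# OAuth-style delegation: the service receives a scoped, revocable token,
# never the password itself.
oauth_grant = {
    "access_token": "tok_abc123",          # bearer token with limited scope
    "scope": ["read:playlists"],           # only what the feature needs
    "expires_in": 3600,                    # short-lived; revocable at any time
}

# Credential capture via a remoted login session: the intermediary ends up holding
# the keys to the whole account, with no scope limits and no revocation short of a
# password change.
captured_session = {
    "username": "user@example.com",
    "password": "hunter2",                 # full account access, reusable indefinitely
    "session_cookies": {"sid": "..."},
}
```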
On June 25th, 2024, a group called Rabbitude disclosed that on May 16th, they were able to gain access to Rabbit’s code base and discovered a number of hardcoded API keys with access to Eleven Labs, Azure, Yelp, and Google Maps.
Why are these API Keys important?
These are issued by companies offering cloud services to govern access to those services. In this case, anyone holding these keys could access Rabbit’s suppliers and do whatever the keys allowed them to do.
While the exposure of hardcoded API keys is, in itself, a major concern for any organization, the breadth of permissions associated with the exposed keys further amplifies the severity of the exposure. According to Rabbitude, the exposed keys provide the ability to “read every response every r1 has ever given, brick all r1s, alter the responses of all r1s, replace every r1’s voice.”
The issue was made public when the group felt that the critical breach was not getting the priority it needed (the company took more than a month to even acknowledge the problem):
As of the disclosure, Rabbitude claims that while they’ve been working with the Rabbit team, no action has been taken by the company to rotate the keys and they remain valid. Rabbitude gets a little chippy at the end of the disclosure stating that they felt compelled to publicize the company’s poor security practices and that while they were not planning to publish any details of the data, this was “out of respect for the users, not the company.”
Even after that, Rabbit’s response was to blame an employee for leaking the data to a hacktivist group.
This time, the problem was that those API keys were embedded inside the device software, where they were relatively easy to read using freely available, standard development tools.
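A simplified illustration of why baked-in keys are so easy to recover; the key, the regex pattern, and the file being scanned are all invented stand-ins for an extracted firmware image:

```python
import re

# Why hardcoding a key in shipped software is fatal: anything compiled into the
# client can be pulled back out with ordinary tools. The key below is invented.
API_KEY = "sk_live_EXAMPLE_DO_NOT_SHIP"   # baked into every device's firmware

# An attacker doesn't need to "hack" anything; dumping printable strings from the
# binary is enough (the moral equivalent of running strings on the firmware image).
with open(__file__, "rb") as f:            # stand-in for the extracted firmware image
    blob = f.read()

for match in re.findall(rb"sk_live_[A-Za-z0-9_]+", blob):
    print("recovered key:", match.decode())

# The structural fix: keep secrets on the server and hand the device only
# short-lived, per-user tokens that can be rotated and revoked.
```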
These could be discounted as anomalies, but they point to a larger issue: the safety of what we consider private rests on a very fragile web of interconnected services. Once we’ve spoken or typed into that assistant, we have no idea what route that data takes.