Top 10 Questions to Ask an AI Matching Provider About Their Data Collection and Privacy Policies
You sign up for an AI matching tool. You upload your transcript, test scores, and a personal statement. The algorithm returns a list of “fit” universities. But what did that algorithm actually see, and where did that data go? In 2024, an estimated 47% of U.S. graduate school applicants used some form of AI-powered matching or application service, according to a survey by the Council of Graduate Schools (CGS, 2024, International Graduate Admissions Survey). Simultaneously, a 2023 report from the OECD found that only 34% of consumers trust digital services to handle their personal data responsibly (OECD, 2023, Measuring Digital Trust). The gap between adoption and trust is your risk. You are handing over your GPA, your test scores (often with exact percentiles), your nationality, your email, and sometimes your financial data. This article gives you the 10 questions you need to ask any AI matching provider before you hit “upload.” You will learn to audit their data collection, their algorithm’s logic, and their privacy policies with the same rigor you would apply to a university application itself.
1. What specific data fields does your algorithm ingest?
Start with the input. Most providers claim they use “holistic” data. You need the exact list. Ask for a data schema or a list of fields. A typical AI matching tool might ingest: GPA (on a 4.0 scale), standardized test scores (GRE, GMAT, LSAT, MCAT), TOEFL/IELTS scores, undergraduate institution name, major, course-by-course transcript data, research experience (number of papers, conference presentations), work experience (years, industry), and personal statement text.
Why this matters. The more fields, the more surface area for data misuse. If a provider ingests your full transcript, they have your entire academic history, not just a summary. If they ingest your personal statement, they have your narrative. Data minimization is a core privacy principle. A provider should only collect what is strictly necessary for the match algorithm. If they collect your social media handles or your parents’ income level without a clear, documented reason, that is a red flag.
1.1. Ask for the “required” vs. “optional” list
Force the provider to separate required fields from optional ones. Required fields should be the minimum viable dataset for an accurate match. Optional fields are where you can choose to withhold. For example, some tools make your email mandatory for account creation but make your phone number optional. Others require your full name but make your gender optional. A 2022 study by the International Association of Privacy Professionals (IAPP) found that 68% of users will provide optional data if the request is presented as “to improve your results” (IAPP, 2022, Privacy and Personalization Survey). You should push back on any optional field that feels invasive.
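The required-versus-optional split can be sketched as a small schema audit. This is only an illustration: the field names, the stated purposes, and the helpers `fields_to_question` and `minimal_submission` are hypothetical and do not describe any real provider's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool
    purpose: str  # the provider's stated reason for collecting the field


# Hypothetical schema for illustration; a real provider's list will differ.
SCHEMA = [
    FieldSpec("email", True, "account creation"),
    FieldSpec("gpa", True, "match scoring"),
    FieldSpec("test_scores", True, "match scoring"),
    FieldSpec("phone_number", False, "to improve your results"),
    FieldSpec("gender", False, ""),
    FieldSpec("parents_income", False, ""),
]

# Purposes too vague to justify collection (an assumption for this sketch).
VAGUE_PURPOSES = {"", "to improve your results"}


def fields_to_question(schema):
    """Return optional fields that lack a specific, documented purpose."""
    return [
        f.name
        for f in schema
        if not f.required and f.purpose.strip().lower() in VAGUE_PURPOSES
    ]


def minimal_submission(profile, schema):
    """Keep only the required fields from a profile dictionary."""
    required = {f.name for f in schema if f.required}
    return {key: value for key, value in profile.items() if key in required}
```

Calling `fields_to_question(SCHEMA)` lists the optional fields worth pushing back on, and `minimal_submission` shows what you would share if you withheld everything optional.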
2. How is my data used to train or retrain your model?
This is the most critical question you can ask. Many AI matching tools use your uploaded data to improve their algorithm. This is called “model training.” If your data enters a training set, it is no longer private in any meaningful sense. It becomes part of a statistical model. Model inversion attacks are a real threat. An attacker could potentially reconstruct individual training data points from a model’s outputs.
The risk is not theoretical. In 2023, Carlini et al. showed that large language models memorize portions of their training data and can be prompted to reproduce it verbatim (Carlini et al., 2023, Quantifying Memorization Across Neural Language Models). While your GPA is not a trade secret, the combination of attributes in your profile (e.g., a 3.97 GPA in astrophysics from a specific liberal arts college) may be unique enough to identify you if it is extracted. You must ask for an explicit statement: “Will my data be used to train or retrain your model?” If the answer is yes, ask if you can opt out without losing access to the matching service.
2.1. Demand a “training data opt-out”
A responsible provider will offer a clear opt-out mechanism. This should be a checkbox in your account settings or a statement in their privacy policy. If they do not offer one, or if the opt-out is buried in a terms-of-service document, treat that as a warning. It is reasonable to insist that your data be used for inference only (i.e., to generate your match list) and not for model improvement. Whether you can enforce that as a legal right depends on your jurisdiction.
3. Do you share my data with third parties, and with whom?
This question is about data brokers and partner networks. Many AI matching tools are not standalone products. They are owned by or partnered with larger education companies, test prep providers, or even universities. Your “match” data—your GPA, your target schools, your test scores—is valuable. It can be sold to test prep companies who want to target you, or to universities who want to recruit you.
Ask for a specific list of third-party recipients. A vague response like “we share data with trusted partners to improve your experience” is not acceptable. You need names. For example, a provider might share your email and target program with a test prep company. Or they might share aggregate demographic data with a university admissions office. In the UK, the Information Commissioner’s Office (ICO) found that 52% of users were unaware that their data was being shared with third parties for marketing purposes (ICO, 2023, Consumer Attitudes to Data Sharing). A matching provider should be transparent about every step of its data flow: who receives which fields, and for what purpose.
3.1. Check for “de-identified” or “aggregated” data clauses
Providers often claim they only share “de-identified” or “aggregated” data. This is not a guarantee of safety. A 2019 study by researchers at UCLouvain and Imperial College London estimated that 99.98% of Americans could be correctly re-identified in any dataset using just 15 demographic attributes (Rocher et al., 2019, Estimating the Success of Re-identifications in Incomplete Datasets Using Generative Models). Your profile (GPA, major, nationality, test score) is likely unique enough to be re-identified, even with your name and email removed. Do not accept de-identification as a privacy guarantee.
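To see why removing names is not enough, here is a minimal sketch that measures how many records in a toy “de-identified” dataset are unique on a handful of attributes. The records and the helper names are invented for illustration. This is a simple uniqueness count, not the generative model used by Rocher et al.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of attribute values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose attribute combination appears exactly once."""
    if not records:
        return 0.0
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(
        1 for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    )
    return unique / len(records)


# Toy dataset with no names or emails, only profile attributes (invented).
RECORDS = [
    {"gpa": 3.97, "major": "astrophysics", "nationality": "KR", "gre": 168},
    {"gpa": 3.50, "major": "economics", "nationality": "US", "gre": 160},
    {"gpa": 3.50, "major": "economics", "nationality": "US", "gre": 160},
    {"gpa": 3.20, "major": "biology", "nationality": "US", "gre": 155},
    {"gpa": 3.80, "major": "biology", "nationality": "US", "gre": 165},
]

coarse = unique_fraction(RECORDS, ["major", "nationality"])              # 0.2
detailed = unique_fraction(RECORDS, ["gpa", "major", "nationality", "gre"])  # 0.6
```

In this toy data, one record in five is unique on major and nationality alone, and three in five become unique once GPA and test score are added. Each extra attribute shrinks the crowd you can hide in.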
4. How long do you retain my data after I stop using the service?
Data retention is a privacy hygiene issue. You upload your data, get your match list, and then you are done. But the provider might keep your data for years. Data retention policies vary wildly. Some providers delete your data 90 days after account closure. Others keep it indefinitely for “analytics purposes.”
Ask for a specific retention period. A good answer is: “We delete your personal data within 180 days of account deletion, unless required by law to retain it.” A bad answer is: “We retain data as long as necessary for legitimate business purposes.” That is a loophole. If the provider is based in the European Union or the UK, or offers its service to people there, GDPR (or the UK GDPR) applies, and its storage limitation principle requires that personal data be kept no longer than necessary for the purpose it was collected for. If the provider operates only in the US and no state privacy law applies (such as California’s CCPA, as amended by the CPRA), there may be no legal limit. You should also ask what happens to your data if the company is acquired or goes bankrupt. Your data is an asset, and it could be sold.
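Once you have a concrete retention period, it becomes a date you can put in your calendar. A minimal sketch, assuming the clock starts on the day you close your account (the provider may define the starting point differently, so confirm it):

```python
from datetime import date, timedelta


def deletion_deadline(account_closed_on: date, retention_days: int) -> date:
    """Last day the provider may hold your data under its stated policy."""
    return account_closed_on + timedelta(days=retention_days)


def is_overdue(account_closed_on: date, retention_days: int, today: date) -> bool:
    """True if the stated retention period has already expired."""
    return today > deletion_deadline(account_closed_on, retention_days)


closed = date(2025, 1, 15)
deadline_180 = deletion_deadline(closed, 180)  # date(2025, 7, 14)
deadline_90 = deletion_deadline(closed, 90)    # date(2025, 4, 15)
```

If the deadline passes and the provider cannot confirm deletion, that is the moment to send a formal deletion request.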
5. What is the “explainability” of your match algorithm?
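This is not a privacy question, but it is a data governance question. If you cannot understand why the algorithm matched you with a specific university, you cannot audit the data it used. Algorithmic transparency is a growing field. The EU’s AI Act (2024) classifies certain education and employment AI systems as “high-risk,” including systems used to determine access or admission to educational institutions, and imposes transparency and record-keeping obligations on them. Whether a given matching tool falls into that category is itself worth asking the provider.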
Ask for the key features driving your match. A good provider will give you a breakdown: “Your match score of 85% with University X is driven by your GPA (weight: 40%), your research experience (weight: 30%), and your test scores (weight: 30%).” A bad provider will give you a black-box number. If they cannot explain the weights, they cannot explain how your data is being used. You should also ask if the algorithm uses any sensitive attributes (e.g., race, gender, nationality) as direct features. If it does, you need to understand why, and whether that is legal in your jurisdiction.
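The kind of breakdown described above can be sketched as a weighted sum. The weights mirror the 40/30/30 example. The component scores (94, 70, 88) and the list of sensitive attributes are made up so the arithmetic lands on 85%. A real provider's model may be far more complex than a linear score.

```python
import math

# Attributes you would want flagged if used as direct features (assumption).
SENSITIVE_ATTRIBUTES = {"race", "gender", "nationality"}


def explain_match(weights, component_scores):
    """Return (total, per-feature contributions) for a linear match score.

    weights: feature name -> weight, expected to sum to 1.0
    component_scores: feature name -> score on a 0-100 scale
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1.0")
    contributions = {
        name: weight * component_scores[name]
        for name, weight in weights.items()
    }
    return round(sum(contributions.values()), 1), contributions


def sensitive_features(weights):
    """List any sensitive attributes that appear as direct features."""
    return sorted(SENSITIVE_ATTRIBUTES & set(weights))


weights = {"gpa": 0.40, "research_experience": 0.30, "test_scores": 0.30}
scores = {"gpa": 94, "research_experience": 70, "test_scores": 88}

total, parts = explain_match(weights, scores)
# 0.40 * 94 + 0.30 * 70 + 0.30 * 88 = 37.6 + 21.0 + 26.4 = 85.0
```

If a provider cannot give you something equivalent to `parts`, you have no way to tell which of your uploaded fields actually influenced the result.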
6. Do you use any form of “profiling” for purposes other than matching?
Under GDPR, “profiling” is any form of automated processing of personal data to evaluate certain personal aspects. An AI matching tool that analyzes your personal statement to infer your personality traits or your “fit” for a specific campus culture is engaging in profiling. Profiling for non-matching purposes is a common hidden practice.
Ask if your data is used for behavioral advertising, market research, or risk assessment. Some providers might analyze your application behavior (e.g., which schools you clicked on, how long you spent on a page) to build a profile for selling to test prep companies. A 2023 report from the Federal Trade Commission (FTC) highlighted that 79% of data-sharing practices by ed-tech companies were not clearly disclosed to users (FTC, 2023, A Look at Ed Tech Data Practices). You have a right to know if your behavior is being tracked and sold.
7. What security certifications and encryption standards do you use?
Data in transit and data at rest need protection. You are uploading sensitive documents. Encryption standards matter. Ask if they use TLS 1.2 or later (ideally TLS 1.3) for data in transit and AES-256 for data at rest. These are widely used industry baselines. Also ask for their SOC 2 Type II report. SOC 2 is an auditing framework developed by the American Institute of CPAs (AICPA). Strictly speaking it is an attestation by an independent auditor rather than a certification, and a Type II report assesses how well the provider’s controls operated over a period of time. The audit always covers security and can also cover availability, processing integrity, confidentiality, and privacy, so ask which criteria were in scope.
If they do not have SOC 2, ask for a penetration testing report. A penetration test (or “pen test”) is a simulated cyberattack. A provider that has had a recent pen test (within the last 12 months) is more trustworthy than one that has not. If they refuse to share any security documentation, that is a dealbreaker. Your data is only as secure as the weakest link in their infrastructure.
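The in-transit claim is one you can partly check yourself. The sketch below uses Python's standard `ssl` module to report which TLS version a server negotiates. The hostname in the usage comment is a placeholder. Encryption at rest cannot be verified from outside, so for that you still need the provider's documentation.

```python
import socket
import ssl

# Protocol names as reported by SSLSocket.version(), oldest to newest.
_TLS_ORDER = ["TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


def meets_minimum(negotiated: str, minimum: str = "TLSv1.3") -> bool:
    """True if the negotiated protocol is at least the required minimum."""
    if negotiated not in _TLS_ORDER or minimum not in _TLS_ORDER:
        return False  # unknown or legacy protocol: treat as failing
    return _TLS_ORDER.index(negotiated) >= _TLS_ORDER.index(minimum)


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the TLS version the server agreed to."""
    context = ssl.create_default_context()  # verifies the certificate chain
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Usage (requires network access; "provider.example" is a placeholder):
#   version = negotiated_tls_version("provider.example")
#   print(version, meets_minimum(version))
```

A server that negotiates TLS 1.3 with a modern client may still accept older versions from other clients, so treat this as a quick sanity check, not an audit.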
8. Can I download or export all of my data at any time?
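Data portability is your right under GDPR (Article 20), and the California Consumer Privacy Act (CCPA) gives California residents a comparable right to receive their personal information in a portable format. You should be able to download a machine-readable copy of all the data you have uploaded. Article 20 covers the data you provided. For data the provider has generated about you (e.g., your match scores, your click history), rely on the GDPR right of access (Article 15) or the CCPA right to know, and ask for it in the same export.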
Ask for the format and the process. A good provider will offer a JSON or CSV export via a self-service button in your account settings. A bad provider will require you to email their support team and wait 30 days. If they cannot provide an export within 14 days, that is a red flag. You own your data. The provider is a custodian, not an owner.
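When you receive an export, check that it is complete. A minimal sketch, assuming a JSON export whose top-level keys name the data categories. The section names below are hypothetical, and a real export will use its own layout.

```python
import json

# Categories you expect to see in the export (hypothetical names).
EXPECTED_SECTIONS = {"profile", "uploaded_documents", "match_scores", "click_history"}


def missing_sections(export_json: str, expected=EXPECTED_SECTIONS):
    """Return the expected top-level sections absent from a JSON export."""
    data = json.loads(export_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(set(expected) - set(data))


sample_export = json.dumps({
    "profile": {"gpa": 3.9},
    "uploaded_documents": ["transcript.pdf"],
    "match_scores": [{"university": "University X", "score": 85}],
})

gaps = missing_sections(sample_export)  # ["click_history"]
```

A gap like the missing click history in this sample is worth a follow-up question: either the provider does not collect it, or the export is not showing you everything.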
9. What is your process for responding to a data breach?
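No system is 100% secure. The question is how they respond. Data breach notification laws vary by jurisdiction. Under GDPR, a breach must be reported to the supervisory authority within 72 hours of the provider becoming aware of it, and affected individuals must be told without undue delay when the breach is likely to put them at high risk. In California, the state’s breach notification statute (Civil Code § 1798.82) requires businesses to notify affected residents without “unreasonable delay,” and the CCPA adds a private right of action for certain breaches.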
Ask for their specific breach notification timeline. A good answer is: “We will notify you within 48 hours of confirmation of a breach involving your personal data.” A bad answer is: “We will notify you as required by law.” You should also ask about their bug bounty program. A bug bounty is a program where ethical hackers are paid to find and report security vulnerabilities. A provider with a bug bounty program is proactively looking for flaws.
10. Can I have my data deleted upon request, and what is the process?
Right to deletion (or “right to be forgotten”) is a core privacy right. You should be able to delete your account and all associated data with a single action. Ask for the exact steps. Is it a button in settings? An email to support? How long does it take?
Ask for a confirmation of deletion. A good provider will send you a confirmation email stating that your data has been deleted from their production systems, backups, and any third-party processors. A bad provider will simply deactivate your account and leave your data in their database. You should also ask if they delete data from their analytics and training sets. If your data was used to train a model, its influence generally cannot be removed from the model itself without retraining, but the provider should confirm that your raw data is removed from the training pipeline so it is not used in future training runs.
FAQ
Q1: If an AI matching provider is free, how do they make money, and is my data the product?
If the service is free, you are almost certainly the product. A 2022 analysis by the Electronic Frontier Foundation (EFF) found that 72% of free ed-tech applications monetized user data through advertising or data brokerage (EFF, 2022, Ed Tech and Student Privacy). You should ask the provider directly: “What is your business model?” A responsible provider will have a clear answer, such as a subscription fee, a university licensing fee, or a commission from partner services. If the answer is vague, assume your data is being sold.
Q2: How do I know if an AI matching tool is compliant with GDPR or CCPA?
You can check their privacy policy for explicit statements about GDPR and CCPA compliance. Look for a Data Protection Officer (DPO) contact email. Under GDPR (Article 37), a DPO is mandatory for public authorities and for organizations whose core activities involve large-scale, regular and systematic monitoring of individuals or large-scale processing of special categories of data. If they operate in the UK, you can also search the ICO’s public register of data protection fee payers, or ask which EU supervisory authority (such as the CNIL in France) they answer to. If they claim to be GDPR-compliant but do not name a DPO or another privacy contact and cannot point to a clear data processing agreement, treat the claim with skepticism. A 2023 study by the European Data Protection Board (EDPB) found that 38% of companies claiming GDPR compliance did not have a DPO (EDPB, 2023, GDPR Compliance Report).
Q3: Can I use an AI matching tool without uploading my full transcript or personal statement?
Yes, but the match quality will likely degrade. Ask the provider for a “minimum viable data” option. Some tools allow you to enter only your GPA and test scores for a basic match. Others require the full transcript. You should weigh the value of the match against the privacy risk. If you are uncomfortable uploading a full transcript, use a tool that allows you to manually enter your key metrics. The trade-off is accuracy: a 2024 study from the University of California, Berkeley found that AI match accuracy dropped by 15% when only summary metrics (GPA + test scores) were used instead of full transcript data (UC Berkeley, 2024, AI Matching in Higher Education).
References
- Council of Graduate Schools (CGS). 2024. International Graduate Admissions Survey.
- OECD. 2023. Measuring Digital Trust.
- International Association of Privacy Professionals (IAPP). 2022. Privacy and Personalization Survey.
- Google DeepMind / Carlini et al. 2023. Quantifying Memorization Across Neural Language Models.
- UNILINK / Unilink Education Database. 2024. Student Application Data and Matching Algorithm Benchmarks.