Engineering

Building an AI Agent That Doesn't Hallucinate About Your Property

AI hallucination is the number one concern we hear from property managers evaluating AI tools. And honestly? It should be. When an AI agent confidently tells a prospective tenant that pets are allowed in a strictly no-pets building, or quotes a rent price that's $200 lower than the actual rate, or describes an in-unit washer-dryer that doesn't exist — the consequences are real. You lose the lease when the prospect shows up and discovers the truth. Or worse, you create legal liability when a tenant signs based on misinformation your AI provided.

We've talked to hundreds of property managers across Canada, and the pattern is always the same. They're excited about AI automation. They understand the value of 24/7 response times and never missing a lead. But the moment you mention "AI-generated responses," their guard goes up — and rightfully so. Their properties are their livelihood, and they can't afford an agent that makes things up.

Here's how we built SimpleTurn to avoid hallucination entirely — and why we believe factual grounding isn't just a feature, but the foundation everything else depends on.

What Is AI Hallucination in Property Management?

In the broader AI world, "hallucination" refers to a model generating content that sounds plausible but is factually incorrect. In the context of property management, the stakes are specific and high. A hallucinating leasing agent might:

- Confirm that pets are allowed in a strictly no-pets building
- Quote a rent price below the actual rate
- Describe amenities, like an in-unit washer-dryer, that don't exist

Each of these scenarios has happened with real AI chatbot products in our industry. They're not edge cases — they're the predictable result of deploying language models without the right architecture around them.

Why Language Models Hallucinate

To understand our solution, you need to understand the problem at a fundamental level. Large language models (LLMs) like GPT-4, Claude, and Gemini are, at their core, next-token predictors. Given a sequence of text, they predict the most statistically likely continuation based on patterns learned from their training data — billions of web pages, books, articles, and conversations.

This is an incredibly powerful capability. It's what enables LLMs to write coherent paragraphs, answer questions, and carry on natural conversations. But it has a critical limitation: the model has no internal concept of truth. It doesn't "know" facts the way a database does. It generates text that sounds right based on statistical patterns, regardless of whether it is right.

When you ask a base LLM "What's the pet policy at 450 Spadina Avenue?", it doesn't look up the answer. It generates the most probable response based on patterns it has seen in its training data. If most apartment listings in its training data allow small pets, it might confidently state that 450 Spadina allows small dogs and cats — even if the actual policy is no pets whatsoever. The model isn't lying. It's doing exactly what it was designed to do: produce a plausible-sounding continuation. The problem is that "plausible-sounding" and "true" are very different things.

The core insight is this: an LLM's confidence in its output has almost no correlation with accuracy. A model can state a completely fabricated fact with the same syntactic confidence as a verified one. There is no built-in uncertainty signal.

The Naive Approach (and Why It Fails)

The first thing most teams try when building a property chatbot is straightforward: take a general-purpose LLM, prepend a system prompt like "You are a helpful leasing agent for Maple Heights Apartments," maybe include some basic property details, and let the model handle conversations.

This approach fails in predictable ways. When a prospect asks a question that isn't covered by the sparse details in the prompt — which happens constantly — the model fills in the gaps with its general training knowledge. It doesn't flag the gap. It doesn't say "I'm not sure." It generates a plausible answer and presents it as fact. The model can't distinguish between what it was given in the prompt and what it's generating from its training distribution. There is no mechanism for the model to recognize the boundary between "information I was given about this property" and "general patterns about apartments that I learned during training."

Some teams add instructions like "Only answer based on the information provided" or "Say 'I don't know' if you're not sure." These help marginally, but they're not reliable. Instruction-following is a best-effort behaviour in LLMs, not a guarantee. Under adversarial or even just unusual questioning, the model will drift back to generating from its prior — and you'll get hallucinations.

SimpleTurn's Architecture for Zero Hallucination

We didn't solve hallucination with a clever prompt. We solved it with architecture. SimpleTurn's factual grounding system is a multi-layered pipeline where every component is designed to prevent, detect, or mitigate hallucinated content. Here are the seven layers of our approach:

1. Retrieval-Augmented Generation (RAG)

Instead of relying on the model's parametric memory — its training data — every response SimpleTurn generates is grounded in retrieved context from the property's research dossier. When a prospect asks a question, the system first performs a semantic search over the dossier using dense vector embeddings to find the most relevant data points. Only those retrieved passages are included in the model's context window, and the model is instructed to answer exclusively from the provided context. The model never "free-generates" a factual claim. Every fact must have a source in the retrieved documents. This is the single most important architectural decision we made — it transforms the LLM from an unreliable knowledge source into a reliable language interface over verified data.
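The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not SimpleTurn's production code: the toy two-dimensional vectors stand in for real dense embeddings, and names like `retrieve` and `build_prompt` are assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question_vec, chunks, top_k=3):
    """Rank dossier chunks by semantic similarity to the question."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, retrieved):
    """Constrain the model to answer exclusively from retrieved passages."""
    context = "\n".join(f"- [{c['field']}] {c['text']}" for c in retrieved)
    return (
        "Answer ONLY from the context below. If the answer is not present, "
        "say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy vectors stand in for real embeddings of dossier fields.
chunks = [
    {"field": "pet_policy", "text": "No pets permitted.", "vec": [1.0, 0.0]},
    {"field": "parking", "text": "Parking is $150/month.", "vec": [0.0, 1.0]},
]
top = retrieve([0.9, 0.1], chunks, top_k=1)
prompt = build_prompt("Are dogs allowed?", top)
```

The key property is that the model's context window contains only verified dossier passages, so every factual claim has a traceable source.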

2. Structured Dossier Format

Our property dossiers aren't unstructured documents — they're richly typed, hierarchical data structures with defined fields for every category of property information: rent_pricing, pet_policy, amenities, parking, utilities, availability, and dozens more. Each field has a defined schema, a data type, a source attribution, and a last-updated timestamp. When a field is empty, it's explicitly marked as null — not omitted. This means the model can distinguish between "the pet policy is no pets" and "we don't have pet policy information for this property." That distinction is critical for preventing hallucination, because the most dangerous gap is one the model doesn't know exists.
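A simplified version of this idea, using explicit nulls so "no information" is itself a first-class value. The field names and `DossierField` shape are illustrative, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DossierField:
    value: Optional[str]   # None means "unknown", never silently omitted
    source: str            # e.g. "property_manager" or "web_research"
    last_updated: str      # ISO date of the last refresh

dossier = {
    "pet_policy": DossierField("no pets", "property_manager", "2024-11-05"),
    "parking": DossierField(None, "web_research", "2024-11-01"),  # explicitly unknown
}

def answerable(dossier, field):
    """Distinguish 'the policy is X' from 'we have no data for this field'."""
    entry = dossier.get(field)
    return entry is not None and entry.value is not None
```

Because the gap is represented explicitly, the system can route an unanswerable question to a deferral instead of letting the model improvise.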

3. Confidence Scoring

Every response SimpleTurn generates receives a confidence score calculated from multiple signals: the semantic similarity between the prospect's question and the retrieved context, the number of supporting data points found, the recency of the source data, and whether the information came from a verified (property manager) or inferred (web research) source. Responses with confidence scores below our threshold are handled differently — they may include hedging language, defer to the human team, or be flagged for review before sending. This calibrated confidence mechanism means the system's expressed certainty actually correlates with its likelihood of being correct — something a raw LLM cannot provide.
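One way to combine those signals is a simple weighted blend. The weights, decay constant, and threshold below are illustrative placeholders, not SimpleTurn's tuned production values:

```python
def confidence_score(similarity, support_count, days_old, verified,
                     w=(0.5, 0.2, 0.2, 0.1)):
    """Blend retrieval and provenance signals into a 0..1 confidence.
    similarity: semantic match between question and retrieved context
    support_count: number of supporting data points (saturates at 3)
    days_old: age of the source data
    verified: True if a property manager confirmed the data"""
    support = min(support_count / 3.0, 1.0)
    recency = max(0.0, 1.0 - days_old / 365.0)  # linear decay over a year
    source = 1.0 if verified else 0.6
    return w[0]*similarity + w[1]*support + w[2]*recency + w[3]*source

THRESHOLD = 0.7

def route(score):
    """Below-threshold responses are hedged or handed to a human."""
    return "answer" if score >= THRESHOLD else "defer_to_human"
```

The point of the design is calibration: the score is built from measurable inputs, so expressed certainty tracks actual reliability in a way a raw LLM's fluent tone never does.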

4. Explicit "I Don't Know" Training

We fine-tuned our language model on thousands of examples where the correct response is some variation of "I don't have that specific information, but I can connect you with our leasing team who can help." This isn't a system prompt instruction — it's a learned behaviour baked into the model's weights through supervised fine-tuning. The model has been trained to recognize when retrieved context is insufficient to answer a question and to respond with a graceful deferral instead of a fabricated answer. We treat "I don't know" as a correct answer, not a failure. An agent that honestly defers is infinitely more trustworthy than one that confidently invents.
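Supervised fine-tuning data of this kind is typically stored as labelled context/question/response triples, often serialized as JSONL. The records below are invented examples to show the shape, not actual training data:

```python
import json

# Illustrative fine-tuning examples: when context is insufficient,
# the deferral IS the correct completion, not a failure case.
examples = [
    {
        "context": "[amenities] Gym, rooftop terrace.",
        "question": "Is there a swimming pool?",
        "response": "I don't have that specific information, but I can "
                    "connect you with our leasing team who can help.",
        "label": "defer",
    },
    {
        "context": "[pet_policy] No pets permitted.",
        "question": "Can I bring my cat?",
        "response": "This building has a no-pets policy, so unfortunately "
                    "cats aren't permitted.",
        "label": "answer",
    },
]

# One JSON object per line, the common format for fine-tuning pipelines.
jsonl = "\n".join(json.dumps(e) for e in examples)
```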

5. Source Attribution

Every factual claim SimpleTurn makes in a conversation can be traced back to its source in the dossier. Internally, the system maintains a citation graph linking each generated sentence to the specific data fields it drew from. Property managers can review any conversation in the dashboard and see exactly where each answer came from — whether it was the manager's own input, a Realtor.ca listing, CMHC data, or a municipal database. This transparency isn't just for debugging. It builds trust. When a property manager can verify that the AI's answer about parking came from the data they entered last Tuesday, they trust the system. When they can't trace an answer, they don't — and they shouldn't.

6. Human-in-the-Loop Overrides

Property managers can correct any AI response at any time, and corrections immediately update the underlying dossier. If the AI quoted the wrong parking fee — because the source data was stale — the manager corrects it once, and the dossier is updated for all future conversations. Human input always takes precedence over automated research. This creates a feedback loop where the system gets more accurate over time as managers validate and refine the dossier. We also support proactive overrides: managers can flag specific fields as "manually verified," which gives those values the highest confidence score and ensures they're never overwritten by automated research.
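The precedence rules described above can be expressed as a small merge function. The field dictionary shape and the `apply_update` name are assumptions for illustration:

```python
from datetime import date

def apply_update(field, new_value, new_source):
    """Merge an incoming value into a dossier field.
    Rule 1: manually verified values are never auto-overwritten.
    Rule 2: property manager input always wins and marks the field verified."""
    if field.get("verified") and new_source != "property_manager":
        return field  # automated research cannot displace a verified value
    return {
        "value": new_value,
        "source": new_source,
        "verified": new_source == "property_manager",
        "updated": date.today().isoformat(),
    }
```

A stale crawl can therefore never clobber a value the manager confirmed last week, while a single manager correction immediately becomes authoritative for all future conversations.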

7. Continuous Validation

Dossier data doesn't stay accurate forever. Rent prices change. Amenities are added or removed. Transit routes shift. SimpleTurn's validation engine continuously re-crawls source websites and compares current data against what's stored in the dossier. When a discrepancy is detected — for example, a listing site now shows a different rent for a unit type — the system flags the change for review rather than silently updating. This prevents both stale data and auto-ingestion of incorrect web data. The property manager reviews the flag, confirms or rejects the update, and the dossier stays current and accurate.
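A minimal sketch of the compare-and-flag step, assuming crawled values arrive as a simple field-to-value mapping. Note that discrepancies are queued for review rather than written back automatically:

```python
def validate(dossier, crawled):
    """Compare freshly crawled values against stored dossier values.
    Mismatches become review flags, never silent updates."""
    flags = []
    for field, new_value in crawled.items():
        stored = dossier.get(field)
        if stored is not None and stored != new_value:
            flags.append({
                "field": field,
                "stored": stored,
                "crawled": new_value,
                "status": "needs_review",
            })
    return flags

dossier = {"rent_1br": "$2,100", "pet_policy": "no pets"}
crawled = {"rent_1br": "$1,950", "pet_policy": "no pets"}
flags = validate(dossier, crawled)
```

This guards against both failure modes at once: stale stored data gets surfaced, and incorrect web data never auto-ingests.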

Retrieval-augmented generation ensures every response is grounded in verified data

Handling Multi-Source Conflicts

One of the more nuanced challenges we faced is what happens when sources disagree. Property data lives in multiple places — Realtor.ca listings, the property manager's spreadsheet, municipal records, the building's own website — and they don't always match. A listing site might still show last month's rent. The property website might not have been updated after a policy change. The manager's spreadsheet might reflect a planned increase that hasn't taken effect yet.

When SimpleTurn's research engine encounters a conflict — say, Realtor.ca lists a unit at $1,950/month and the property manager's data says $2,100/month — it doesn't pick one. It flags the conflict. The property manager sees both values in their dashboard, with the source and timestamp for each, and confirms which is correct. The confirmed value becomes the authoritative source, and the other is archived with an explanation.

This matters because the alternative — silently picking the most "likely" value based on some heuristic — is just another form of hallucination. If your AI is quoting a rent price that no human verified as current, you're rolling the dice on every conversation. Our conflict resolution system ensures that ambiguity is surfaced, not hidden.
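The conflict-detection logic amounts to grouping observations per field and flagging any field with more than one distinct value. This sketch uses the rent example from above; the function names and record shapes are illustrative:

```python
def detect_conflicts(observations):
    """Group per-field observations from multiple sources; any field
    with more than one distinct value becomes a conflict to review."""
    by_field = {}
    for obs in observations:
        by_field.setdefault(obs["field"], []).append(obs)
    return [
        {"field": field, "candidates": group}
        for field, group in by_field.items()
        if len({o["value"] for o in group}) > 1
    ]

def resolve(conflict, chosen_value):
    """Manager confirmation promotes one value; the rest are archived."""
    confirmed = [o for o in conflict["candidates"] if o["value"] == chosen_value]
    archived = [o for o in conflict["candidates"] if o["value"] != chosen_value]
    return {
        "field": conflict["field"],
        "value": chosen_value,
        "source": confirmed[0]["source"],
        "archived": archived,
    }

observations = [
    {"field": "rent_1br", "value": "$1,950", "source": "realtor.ca"},
    {"field": "rent_1br", "value": "$2,100", "source": "property_manager"},
]
conflicts = detect_conflicts(observations)
resolved = resolve(conflicts[0], "$2,100")
```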

Testing Hallucination Resistance

Building anti-hallucination architecture is only half the battle. You also need rigorous evaluation to verify it's working. Here's how we test SimpleTurn's factual accuracy:

Red team testing. We maintain an internal team that writes adversarial prospect questions specifically designed to trigger hallucination. These include questions about information we know is missing from the dossier ("Does the building have a rooftop hot tub?"), questions that subtly encourage the model to speculate ("I heard the rent is going down next month — can you confirm?"), and questions that reference real amenities at other buildings to see if the model imports incorrect context. Every new model version and prompt change goes through this red team battery before deployment.

Automated factual evaluation. We run automated test suites against a set of properties where we have complete, manually verified ground truth data. The suite generates hundreds of prospect questions per property, collects the model's responses, and automatically checks each factual claim against the known-correct dossier. Any factual discrepancy is logged, categorized, and used to identify patterns that need architectural or prompt-level fixes.

Calibrating the "I don't know" rate. We closely monitor the percentage of prospect questions where the agent defers to the human team. This metric has a Goldilocks zone. Too low (under 3%) suggests the agent is answering questions it shouldn't be — hallucination risk. Too high (over 15%) suggests the agent is being overly conservative and not serving prospects effectively. We target a deferral rate between 5% and 10%, which indicates the agent is answering when it has solid grounding and deferring when it doesn't.
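The deferral-rate check itself is simple to monitor. This sketch hard-codes the zones described above (under 3% too low, over 15% too high, 5–10% target); the function names are assumptions:

```python
def deferral_rate(responses):
    """Fraction of prospect questions where the agent deferred to a human."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r == "defer") / len(responses)

def calibration_zone(rate, target_low=0.05, target_high=0.10,
                     floor=0.03, ceiling=0.15):
    """Goldilocks check on the deferral rate.
    Too low suggests answering without grounding (hallucination risk);
    too high suggests deferring on answerable questions."""
    if rate < floor:
        return "too_low"
    if rate > ceiling:
        return "too_high"
    return "in_target" if target_low <= rate <= target_high else "acceptable"
```

Tracking this metric per property also surfaces dossier gaps: a building with an unusually high deferral rate usually just needs more fields filled in.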

Client feedback loops. Every SimpleTurn dashboard includes a simple accuracy reporting mechanism. Property managers can flag any AI response as inaccurate with a single click, add a correction, and that feedback flows directly into our evaluation pipeline. We review every accuracy flag, categorize the root cause, and track trends over time. This real-world signal is invaluable — no synthetic test suite can fully replicate the diversity of questions real prospects ask.

Rigorous testing ensures SimpleTurn maintains 97.3% factual accuracy

The Results

Numbers matter more than architecture diagrams. Here's where SimpleTurn stands after six months of production deployment across our client portfolio:

97.3%: Factual accuracy rate across all conversations
~65%: Industry average for generic AI chatbot solutions
2.7%: Correctly identified uncertainty, deferred to the human team

SimpleTurn's factual accuracy rate across all prospect conversations is 97.3%. That's measured against manually verified ground truth for every property in our test set. The industry average for generic chatbot solutions deployed in property management sits around 60–70%, based on our benchmarking and published third-party evaluations.

But here's the number I'm most proud of: the remaining 2.7% isn't inaccuracy — it's cases where the agent correctly identified that it didn't have sufficient information and deferred to the human leasing team. In other words, SimpleTurn doesn't hallucinate on the 2.7%. It says "I don't know" and hands off to a person. That's the system working exactly as designed.

A 97.3% accuracy rate isn't just a number. It means that out of every 1,000 factual claims your AI agent makes to prospects, 973 are verifiably correct, and the remaining 27 are honest deferrals — not fabrications. Zero hallucination means every answer is either right or transparently uncertain.

What This Means for Property Managers

If you're a property manager evaluating AI tools, here's what our approach to hallucination means in practice:

You can trust your AI agent to represent your property accurately. Every answer the agent gives to a prospect is sourced from your property's verified dossier. There are no invented amenities, no fabricated policies, no made-up neighbourhood claims. What the agent says is what's true — and you can verify it.

Every claim is verifiable. Through the SimpleTurn dashboard, you can trace any AI response back to its source data. If a prospect was told that parking costs $150/month, you can see exactly where that number came from and when it was last verified. Full transparency means full accountability.

When the AI isn't sure, it says so. This might sound like a limitation, but it's actually the feature that builds the most trust — both with your team and with prospects. A prospect who hears "I don't have the specific details on that, but let me connect you with someone who does" is far more likely to trust the AI's other answers than a prospect who catches the AI in a mistake. Honest uncertainty is a trust accelerator.

The system improves with your input. Every correction you make, every conflict you resolve, every field you verify feeds back into the dossier. Your AI agent gets smarter and more accurate the longer you use it. It's a compounding advantage — the data moat around your property deepens over time.

The Minimum Bar for AI in Property Management

I want to close with a perspective that I think the industry needs to internalize. An AI agent that represents your property to prospective tenants is, in many ways, your most visible employee. It's the first point of contact for people making one of the most significant financial decisions of their year. It speaks on behalf of your brand, your property, and your business.

An employee who makes things up — who confidently states incorrect prices, invents policies, and describes features that don't exist — would be fired immediately. We should hold AI to the same standard. AI that doesn't hallucinate isn't a premium feature or a nice-to-have. It's the minimum bar for any tool that represents your property to the world.

At SimpleTurn, every piece of our architecture — from RAG retrieval to confidence scoring to human-in-the-loop overrides — exists to clear that bar. Not because it's easy, but because it's the only way to build AI that property managers can actually trust. And trust, ultimately, is the only thing that matters when your AI is speaking to your future tenants.

Ready to see what SimpleTurn discovers about your property?

Enter any address and watch our AI research it in real-time.

Try the Research Preview →

Or create your free account to get started.
