Engineering

Building an AI Agent That Doesn't Hallucinate About Your Property

AI hallucination is the number one concern we hear from property managers evaluating AI tools. And honestly? It should be. When an AI agent confidently tells a prospective tenant that pets are allowed in a strictly no-pets building, or quotes a rent price that's $200 lower than the actual rate, or describes an in-unit washer-dryer that doesn't exist — the consequences are real. You lose the lease when the prospect shows up and discovers the truth. Or worse, you create legal liability when a tenant signs based on misinformation your AI provided.

We've talked to hundreds of property managers across Canada, and the pattern is always the same. They're excited about AI automation. They understand the value of 24/7 response times and never missing a lead. But the moment you mention "AI-generated responses," their guard goes up — and rightfully so. Their properties are their livelihood, and they can't afford an agent that makes things up.

Here's how we built SimpleTurn to avoid hallucination entirely — and why we believe factual grounding isn't just a feature, but the foundation everything else depends on.

What Is AI Hallucination in Property Management?

In the broader AI world, "hallucination" refers to a model generating content that sounds plausible but is factually incorrect. In the context of property management, the stakes are specific and high. A hallucinating leasing agent might:

- Confirm that pets are allowed in a strictly no-pets building
- Quote a rent price below the actual rate
- Describe amenities, like an in-unit washer-dryer, that don't exist

Each of these scenarios has happened with real AI chatbot products in our industry. They're not edge cases — they're the predictable result of deploying language models without the right architecture around them.

Why Language Models Hallucinate

To understand our solution, you need to understand the problem at a fundamental level. Large language models (LLMs) like GPT-4, Claude, and Gemini are, at their core, next-token predictors. Given a sequence of text, they predict the most statistically likely continuation based on patterns learned from their training data — billions of web pages, books, articles, and conversations.

This is an incredibly powerful capability. It's what enables LLMs to write coherent paragraphs, answer questions, and carry on natural conversations. But it has a critical limitation: the model has no internal concept of truth. It doesn't "know" facts the way a database does. It generates text that sounds right based on statistical patterns, regardless of whether it is right.

When you ask a base LLM "What's the pet policy at 450 Spadina Avenue?", it doesn't look up the answer. It generates the most probable response based on patterns it has seen in its training data. If most apartment listings in its training data allow small pets, it might confidently state that 450 Spadina allows small dogs and cats — even if the actual policy is no pets whatsoever. The model isn't lying. It's doing exactly what it was designed to do: produce a plausible-sounding continuation. The problem is that "plausible-sounding" and "true" are very different things.

The core insight is this: an LLM's confidence in its output has almost no correlation with accuracy. A model can state a completely fabricated fact with the same syntactic confidence as a verified one. There is no built-in uncertainty signal.

The Naive Approach (and Why It Fails)

The first thing most teams try when building a property chatbot is straightforward: take a general-purpose LLM, prepend a system prompt like "You are a helpful leasing agent for Maple Heights Apartments," maybe include some basic property details, and let the model handle conversations.

This approach fails in predictable ways. When a prospect asks a question that isn't covered by the sparse details in the prompt — which happens constantly — the model fills in the gaps with its general training knowledge. It doesn't flag the gap. It doesn't say "I'm not sure." It generates a plausible answer and presents it as fact. The model can't distinguish between what it was given in the prompt and what it's generating from its training distribution. There is no mechanism for the model to recognize the boundary between "information I was given about this property" and "general patterns about apartments that I learned during training."

Some teams add instructions like "Only answer based on the information provided" or "Say 'I don't know' if you're not sure." These help marginally, but they're not reliable. Instruction-following is a best-effort behaviour in LLMs, not a guarantee. Under adversarial or even just unusual questioning, the model will drift back to generating from its prior — and you'll get hallucinations.

SimpleTurn's Architecture for Zero Hallucination

We didn't solve hallucination with a clever prompt. We solved it with architecture. SimpleTurn's factual grounding system is a multi-layered pipeline where every component is designed to prevent, detect, or mitigate hallucinated content. Here are the seven layers of our approach:

1. Retrieval-Augmented Generation (RAG)

Instead of relying on the model's parametric memory — its training data — every response SimpleTurn generates is grounded in retrieved context from the property's research dossier. When a prospect asks a question, the system first performs a semantic search over the dossier using dense vector embeddings to find the most relevant data points. Only those retrieved passages are included in the model's context window, and the model is instructed to answer exclusively from the provided context. The model never "free-generates" a factual claim. Every fact must have a source in the retrieved documents. This is the single most important architectural decision we made — it transforms the LLM from an unreliable knowledge source into a reliable language interface over verified data.
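The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not SimpleTurn's production code: the toy two-dimensional vectors stand in for real dense embeddings, and names like `retrieve` and `build_prompt` are assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question_vec, chunks, top_k=3):
    """Rank dossier chunks by semantic similarity to the question."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, retrieved):
    """Constrain the model to answer exclusively from retrieved passages."""
    context = "\n".join(f"- [{c['field']}] {c['text']}" for c in retrieved)
    return (
        "Answer ONLY from the context below. If the answer is not present, "
        "say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy vectors stand in for real embeddings of dossier fields.
chunks = [
    {"field": "pet_policy", "text": "No pets permitted.", "vec": [1.0, 0.0]},
    {"field": "parking", "text": "Parking is $150/month.", "vec": [0.0, 1.0]},
]
top = retrieve([0.9, 0.1], chunks, top_k=1)
prompt = build_prompt("Are dogs allowed?", top)
```

The key property is that the model's context window contains only verified dossier passages, so every factual claim has a traceable source.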

2. Structured Dossier Format

Our property dossiers aren't unstructured documents — they're richly typed, hierarchical data structures with defined fields for every category of property information: rent_pricing, pet_policy, amenities, parking, utilities, availability, and dozens more. Each field has a defined schema, a data type, a source attribution, and a last-updated timestamp. When a field is empty, it's explicitly marked as null — not omitted. This means the model can distinguish between "the pet policy is no pets" and "we don't have pet policy information for this property." That distinction is critical for preventing hallucination, because the most dangerous gap is one the model doesn't know exists.
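A simplified version of this idea, using explicit nulls so "no information" is itself a first-class value. The field names and `DossierField` shape are illustrative, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DossierField:
    value: Optional[str]   # None means "unknown", never silently omitted
    source: str            # e.g. "property_manager" or "web_research"
    last_updated: str      # ISO date of the last refresh

dossier = {
    "pet_policy": DossierField("no pets", "property_manager", "2024-11-05"),
    "parking": DossierField(None, "web_research", "2024-11-01"),  # explicitly unknown
}

def answerable(dossier, field):
    """Distinguish 'the policy is X' from 'we have no data for this field'."""
    entry = dossier.get(field)
    return entry is not None and entry.value is not None
```

Because the gap is represented explicitly, the system can route an unanswerable question to a deferral instead of letting the model improvise.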

3. Confidence Scoring

Every response SimpleTurn generates receives a confidence score calculated from multiple signals: the semantic similarity between the prospect's question and the retrieved context, the number of supporting data points found, the recency of the source data, and whether the information came from a verified (property manager) or inferred (web research) source. Responses with confidence scores below our threshold are handled differently — they may include hedging language, defer to the human team, or be flagged for review before sending. This calibrated confidence mechanism means the system's expressed certainty actually correlates with its likelihood of being correct — something a raw LLM cannot provide.
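One way to combine those signals is a simple weighted blend. The weights, decay constant, and threshold below are illustrative placeholders, not SimpleTurn's tuned production values:

```python
def confidence_score(similarity, support_count, days_old, verified,
                     w=(0.5, 0.2, 0.2, 0.1)):
    """Blend retrieval and provenance signals into a 0..1 confidence.
    similarity: semantic match between question and retrieved context
    support_count: number of supporting data points (saturates at 3)
    days_old: age of the source data
    verified: True if a property manager confirmed the data"""
    support = min(support_count / 3.0, 1.0)
    recency = max(0.0, 1.0 - days_old / 365.0)  # linear decay over a year
    source = 1.0 if verified else 0.6
    return w[0]*similarity + w[1]*support + w[2]*recency + w[3]*source

THRESHOLD = 0.7

def route(score):
    """Below-threshold responses are hedged or handed to a human."""
    return "answer" if score >= THRESHOLD else "defer_to_human"
```

The point of the design is calibration: the score is built from measurable inputs, so expressed certainty tracks actual reliability in a way a raw LLM's fluent tone never does.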

4. Explicit "I Don't Know" Training

We fine-tuned our language model on thousands of examples where the correct response is some variation of "I don't have that specific information, but I can connect you with our leasing team who can help." This isn't a system prompt instruction — it's a learned behaviour baked into the model's weights through supervised fine-tuning. The model has been trained to recognize when retrieved context is insufficient to answer a question and to respond with a graceful deferral instead of a fabricated answer. We treat "I don't know" as a correct answer, not a failure. An agent that honestly defers is infinitely more trustworthy than one that confidently invents.
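Supervised fine-tuning data of this kind is typically stored as labelled context/question/response triples, often serialized as JSONL. The records below are invented examples to show the shape, not actual training data:

```python
import json

# Illustrative fine-tuning examples: when context is insufficient,
# the deferral IS the correct completion, not a failure case.
examples = [
    {
        "context": "[amenities] Gym, rooftop terrace.",
        "question": "Is there a swimming pool?",
        "response": "I don't have that specific information, but I can "
                    "connect you with our leasing team who can help.",
        "label": "defer",
    },
    {
        "context": "[pet_policy] No pets permitted.",
        "question": "Can I bring my cat?",
        "response": "This building has a no-pets policy, so unfortunately "
                    "cats aren't permitted.",
        "label": "answer",
    },
]

# One JSON object per line, the common format for fine-tuning pipelines.
jsonl = "\n".join(json.dumps(e) for e in examples)
```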

5. Source Attribution

Every factual claim SimpleTurn makes in a conversation can be traced back to its source in the dossier. Internally, the system maintains a citation graph linking each generated sentence to the specific data fields it drew from. Property managers can review any conversation in the dashboard and see exactly where each answer came from — whether it was the manager's own input, a Realtor.ca listing, CMHC data, or a municipal database. This transparency isn't just for debugging. It builds trust. When a property manager can verify that the AI's answer about parking came from the data they entered last Tuesday, they trust the system. When they can't trace an answer, they don't — and they shouldn't.

6. Human-in-the-Loop Overrides

Property managers can correct any AI response at any time, and corrections immediately update the underlying dossier. If the AI quoted the wrong parking fee — because the source data was stale — the manager corrects it once, and the dossier is updated for all future conversations. Human input always takes precedence over automated research. This creates a feedback loop where the system gets more accurate over time as managers validate and refine the dossier. We also support proactive overrides: managers can flag specific fields as "manually verified," which gives those values the highest confidence score and ensures they're never overwritten by automated research.
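The precedence rules described above can be expressed as a small merge function. The field dictionary shape and the `apply_update` name are assumptions for illustration:

```python
from datetime import date

def apply_update(field, new_value, new_source):
    """Merge an incoming value into a dossier field.
    Rule 1: manually verified values are never auto-overwritten.
    Rule 2: property manager input always wins and marks the field verified."""
    if field.get("verified") and new_source != "property_manager":
        return field  # automated research cannot displace a verified value
    return {
        "value": new_value,
        "source": new_source,
        "verified": new_source == "property_manager",
        "updated": date.today().isoformat(),
    }
```

A stale crawl can therefore never clobber a value the manager confirmed last week, while a single manager correction immediately becomes authoritative for all future conversations.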

7. Continuous Validation

Dossier data doesn't stay accurate forever. Rent prices change. Amenities are added or removed. Transit routes shift. SimpleTurn's validation engine continuously re-crawls source websites and compares current data against what's stored in the dossier. When a discrepancy is detected — for example, a listing site now shows a different rent for a unit type — the system flags the change for review rather than silently updating. This prevents both stale data and auto-ingestion of incorrect web data. The property manager reviews the flag, confirms or rejects the update, and the dossier stays current and accurate.
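A minimal sketch of the compare-and-flag step, assuming crawled values arrive as a simple field-to-value mapping. Note that discrepancies are queued for review rather than written back automatically:

```python
def validate(dossier, crawled):
    """Compare freshly crawled values against stored dossier values.
    Mismatches become review flags, never silent updates."""
    flags = []
    for field, new_value in crawled.items():
        stored = dossier.get(field)
        if stored is not None and stored != new_value:
            flags.append({
                "field": field,
                "stored": stored,
                "crawled": new_value,
                "status": "needs_review",
            })
    return flags

dossier = {"rent_1br": "$2,100", "pet_policy": "no pets"}
crawled = {"rent_1br": "$1,950", "pet_policy": "no pets"}
flags = validate(dossier, crawled)
```

This guards against both failure modes at once: stale stored data gets surfaced, and incorrect web data never auto-ingests.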

Retrieval-augmented generation ensures every response is grounded in verified data

Handling Multi-Source Conflicts

One of the more nuanced challenges we faced is what happens when sources disagree. Property data lives in multiple places — Realtor.ca listings, the property manager's spreadsheet, municipal records, the building's own website — and they don't always match. A listing site might still show last month's rent. The property website might not have been updated after a policy change. The manager's spreadsheet might reflect a planned increase that hasn't taken effect yet.

When SimpleTurn's research engine encounters a conflict — say, Realtor.ca lists a unit at $1,950/month and the property manager's data says $2,100/month — it doesn't pick one. It flags the conflict. The property manager sees both values in their dashboard, with the source and timestamp for each, and confirms which is correct. The confirmed value becomes the authoritative source, and the other is archived with an explanation.

This matters because the alternative — silently picking the most "likely" value based on some heuristic — is just another form of hallucination. If your AI is quoting a rent price that no human verified as current, you're rolling the dice on every conversation. Our conflict resolution system ensures that ambiguity is surfaced, not hidden.
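The conflict-detection logic amounts to grouping observations per field and flagging any field with more than one distinct value. This sketch uses the rent example from above; the function names and record shapes are illustrative:

```python
def detect_conflicts(observations):
    """Group per-field observations from multiple sources; any field
    with more than one distinct value becomes a conflict to review."""
    by_field = {}
    for obs in observations:
        by_field.setdefault(obs["field"], []).append(obs)
    return [
        {"field": field, "candidates": group}
        for field, group in by_field.items()
        if len({o["value"] for o in group}) > 1
    ]

def resolve(conflict, chosen_value):
    """Manager confirmation promotes one value; the rest are archived."""
    confirmed = [o for o in conflict["candidates"] if o["value"] == chosen_value]
    archived = [o for o in conflict["candidates"] if o["value"] != chosen_value]
    return {
        "field": conflict["field"],
        "value": chosen_value,
        "source": confirmed[0]["source"],
        "archived": archived,
    }

observations = [
    {"field": "rent_1br", "value": "$1,950", "source": "realtor.ca"},
    {"field": "rent_1br", "value": "$2,100", "source": "property_manager"},
]
conflicts = detect_conflicts(observations)
resolved = resolve(conflicts[0], "$2,100")
```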

Testing Hallucination Resistance

Building anti-hallucination architecture is only half the battle. You also need rigorous evaluation to verify it's working. Here's how we test SimpleTurn's factual accuracy:

Red team testing. We maintain an internal team that writes adversarial prospect questions specifically designed to trigger hallucination. These include questions about information we know is missing from the dossier ("Does the building have a rooftop hot tub?"), questions that subtly encourage the model to speculate ("I heard the rent is going down next month — can you confirm?"), and questions that reference real amenities at other buildings to see if the model imports incorrect context. Every new model version and prompt change goes through this red team battery before deployment.

Automated factual evaluation. We run automated test suites against a set of properties where we have complete, manually verified ground truth data. The suite generates hundreds of prospect questions per property, collects the model's responses, and automatically checks each factual claim against the known-correct dossier. Any factual discrepancy is logged, categorized, and used to identify patterns that need architectural or prompt-level fixes.

Calibrating the "I don't know" rate. We closely monitor the percentage of prospect questions where the agent defers to the human team. This metric has a Goldilocks zone. Too low (under 3%) suggests the agent is answering questions it shouldn't be — hallucination risk. Too high (over 15%) suggests the agent is being overly conservative and not serving prospects effectively. We target a deferral rate between 5% and 10%, which indicates the agent is answering when it has solid grounding and deferring when it doesn't.
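The deferral-rate check itself is simple to monitor. This sketch hard-codes the zones described above (under 3% too low, over 15% too high, 5–10% target); the function names are assumptions:

```python
def deferral_rate(responses):
    """Fraction of prospect questions where the agent deferred to a human."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r == "defer") / len(responses)

def calibration_zone(rate, target_low=0.05, target_high=0.10,
                     floor=0.03, ceiling=0.15):
    """Goldilocks check on the deferral rate.
    Too low suggests answering without grounding (hallucination risk);
    too high suggests deferring on answerable questions."""
    if rate < floor:
        return "too_low"
    if rate > ceiling:
        return "too_high"
    return "in_target" if target_low <= rate <= target_high else "acceptable"
```

Tracking this metric per property also surfaces dossier gaps: a building with an unusually high deferral rate usually just needs more fields filled in.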

Client feedback loops. Every SimpleTurn dashboard includes a simple accuracy reporting mechanism. Property managers can flag any AI response as inaccurate with a single click, add a correction, and that feedback flows directly into our evaluation pipeline. We review every accuracy flag, categorize the root cause, and track trends over time. This real-world signal is invaluable — no synthetic test suite can fully replicate the diversity of questions real prospects ask.

Rigorous testing ensures SimpleTurn maintains 97.3% factual accuracy

The Results

Numbers matter more than architecture diagrams. Here's where SimpleTurn stands after six months of production deployment across our client portfolio:

97.3%: Factual accuracy rate across all conversations
~65%: Industry average for generic AI chatbot solutions
2.7%: Correctly identified uncertainty, deferred to the human team

SimpleTurn's factual accuracy rate across all prospect conversations is 97.3%. That's measured against manually verified ground truth for every property in our test set. The industry average for generic chatbot solutions deployed in property management sits around 60–70%, based on our benchmarking and published third-party evaluations.

But here's the number I'm most proud of: the remaining 2.7% isn't inaccuracy — it's cases where the agent correctly identified that it didn't have sufficient information and deferred to the human leasing team. In other words, SimpleTurn doesn't hallucinate on the 2.7%. It says "I don't know" and hands off to a person. That's the system working exactly as designed.

A 97.3% accuracy rate isn't just a number. It means that out of every 1,000 factual claims your AI agent makes to prospects, 973 are verifiably correct, and the remaining 27 are honest deferrals — not fabrications. Zero hallucination means every answer is either right or transparently uncertain.

What This Means for Property Managers

If you're a property manager evaluating AI tools, here's what our approach to hallucination means in practice:

You can trust your AI agent to represent your property accurately. Every answer the agent gives to a prospect is sourced from your property's verified dossier. There are no invented amenities, no fabricated policies, no made-up neighbourhood claims. What the agent says is what's true — and you can verify it.

Every claim is verifiable. Through the SimpleTurn dashboard, you can trace any AI response back to its source data. If a prospect was told that parking costs $150/month, you can see exactly where that number came from and when it was last verified. Full transparency means full accountability.

When the AI isn't sure, it says so. This might sound like a limitation, but it's actually the feature that builds the most trust — both with your team and with prospects. A prospect who hears "I don't have the specific details on that, but let me connect you with someone who does" is far more likely to trust the AI's other answers than a prospect who catches the AI in a mistake. Honest uncertainty is a trust accelerator.

The system improves with your input. Every correction you make, every conflict you resolve, every field you verify feeds back into the dossier. Your AI agent gets smarter and more accurate the longer you use it. It's a compounding advantage — the data moat around your property deepens over time.

The Minimum Bar for AI in Property Management

I want to close with a perspective that I think the industry needs to internalize. An AI agent that represents your property to prospective tenants is, in many ways, your most visible employee. It's the first point of contact for people making one of the most significant financial decisions of their year. It speaks on behalf of your brand, your property, and your business.

An employee who makes things up — who confidently states incorrect prices, invents policies, and describes features that don't exist — would be fired immediately. We should hold AI to the same standard. AI that doesn't hallucinate isn't a premium feature or a nice-to-have. It's the minimum bar for any tool that represents your property to the world.

At SimpleTurn, every piece of our architecture — from RAG retrieval to confidence scoring to human-in-the-loop overrides — exists to clear that bar. Not because it's easy, but because it's the only way to build AI that property managers can actually trust. And trust, ultimately, is the only thing that matters when your AI is speaking to your future tenants.

Ready to see what SimpleTurn discovers about your property?

Enter any address and watch our AI research it in real-time.

Try the Research Preview →

Or create your free account to get started.
