Product discussion on AI Suggest/Description Match features

I have some thoughts on the process. I spent some time looking under the hood yesterday at how the initial load categorized my transactions, and I noticed a fascinating pattern regarding how metadata is being parsed.

Only 15 of my new direct fills were handled by a mix of direct match and AI Suggest (which spot-checked at about 1-for-2 on accuracy). However, looking closely at the backend columns, the aggregator actually passed down highly accurate Category Hints that unfortunately got overridden or flattened by the standard pipeline.

Specific Examples:

AES Student Loan: The aggregator cleanly passed down a Category Hint of Loans, but the default matching engine flattened it to a generic Transfer. (I manually updated this to Education).

CVS: The aggregator passed down a highly accurate hint of Healthcare/Medical, but the system still required a manual correction to map it to my specific Medical/Dental bucket. (The system’s AI Suggest chose Shopping).

Verizon: The aggregator explicitly passed down Cable/Satellite/Telecom—which is incredibly granular—but the system default-mapped it to the much broader Utilities and Bills.

Rocket & Figure (Mortgages): Rocket was left entirely blank instead of using a loans category. Figure was passed as a Transfer by the aggregator, which was actually incorrect in this specific case.

The Product Opportunity

As a data consumer, it seems like the most accurate, context-aware piece of data in the entire stream is actually that baseline Category Hint from the aggregator. If the Tiller pipeline could leverage or prioritize that hint field to dynamically map or suggest categories, it would instantly solve a lot of the initial-load messiness and text-matching limitations before the user ever has to intervene.

(Note: AI Suggest was initially turned on by default when I loaded my Google Sheet for the first time, which contributed to two categorizations).

My Ideal Categorization Hierarchy

To fix this initial-load messiness and account for the occasional aggregator error (like Figure), the data pipeline should follow this specific priority logic:

1. AutoCat Rules (Should always take ultimate precedence)

2. Aggregator Category Hints (The priority default if no AutoCat rule exists)

3. Nulls / ‘Uncategorized’ (If the aggregator provides no hint)

4. Manual Spreadsheet Intervention (Final cleanup by the user after viewing in a database or manually on their spreadsheet by sorting the categorization column)

Utilizing the aggregator’s category hints as a core pillar of the logic before hitting manual cleanup would significantly streamline the onboarding experience.
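
To make the priority order above concrete, here is a minimal sketch in Python. It is only an illustration of the hierarchy I am proposing, not Tiller's actual logic; the function name `resolve_category` and its two inputs are hypothetical, and step 4 (manual cleanup) happens afterwards in the sheet, so it is not modeled.

```python
from typing import Optional

UNCATEGORIZED = "Uncategorized"


def resolve_category(autocat_category: Optional[str],
                     hint_category: Optional[str]) -> str:
    """Pick a category using the proposed priority order.

    Both arguments are hypothetical inputs: the result of an AutoCat rule
    (if any rule matched) and the aggregator's Category Hint (if provided).
    """
    # 1. An AutoCat rule always wins.
    if autocat_category:
        return autocat_category
    # 2. Otherwise fall back to the aggregator's hint.
    if hint_category:
        return hint_category
    # 3. No rule and no hint: leave it for manual cleanup.
    return UNCATEGORIZED
```

With this ordering, a transaction with an AutoCat result of Education and a hint of Loans resolves to Education, while one with no rule and a hint of Loans resolves to Loans.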

Since I am unfamiliar with the actual description-match logic, this suggested approach, although not perfect, would help with initial loads. -David

Edit: Maybe a toggle switch for aggregator hints?

Great analysis and suggestion @freshman.david.

I agree that there does need to be prioritization among these multiple, competing functions that can categorize a transaction. My need is simple and basic historical categorization. I find it easier to just manually perform the categorization. I find the other functions overly complex for my needs.

@freshman.david thanks for your feedback here. I appreciate your thoughtfulness on how our product should behave and will take it into consideration as I am shaping out the next phase of onboarding categorization.

The challenge with using the aggregator-provided categories is that they won’t map to users’ categories once they’ve been in Tiller, played around, and customized them. So they’re not an option for ongoing use for most.

Some users may want to/find it engaging to map all the aggregator defaults to their own categories, but many will not and will be frustrated because the aggregator’s categories no longer reflect their own customized set.

As an aside, you can already do this with AutoCat :grinning_cat_with_smiling_eyes: Add a column for Category Hint Contains = whatever the aggregator brought in and then Category = whatever category you want it to be. But this is not a viable onboarding happy path for people. It’s too complicated.
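
For anyone curious what a "Category Hint Contains" rule amounts to conceptually, here is a rough Python sketch. It is not how AutoCat is implemented; the rule list, the `apply_hint_rules` name, the case-insensitive matching, and the first-match-wins ordering are all assumptions made for illustration, and the example categories are borrowed from earlier in this thread.

```python
from typing import List, Optional, Tuple

# Hypothetical rules: (text the Category Hint must contain, category to assign).
RULES: List[Tuple[str, str]] = [
    ("Loans", "Education"),
    ("Healthcare", "Medical/Dental"),
    ("Telecom", "Utilities and Bills"),
]


def apply_hint_rules(category_hint: Optional[str],
                     rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the category of the first rule whose text appears in the hint."""
    if not category_hint:
        return None  # nothing to match against
    hint = category_hint.lower()
    for contains, category in rules:
        # Assumption: matching is a case-insensitive substring test.
        if contains.lower() in hint:
            return category
    return None  # no rule matched
```

Under these assumptions, a hint of Healthcare/Medical would land in Medical/Dental, and a hint with no matching rule would stay uncategorized.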

AutoCat Rules will always take precedence.

The approach we’ve taken here tries to balance user effort related to categorization (which has been an extreme bottleneck for people finding value) and accuracy. There are tradeoffs in either direction. Many people don’t want to put in much effort to get started.

Our mission is to provide the best tools possible to take control of your money, offering uniquely flexible tools that are empowering, easy to use, and adaptable to each person’s goals and approach.

We are trying to expand our “ease of use” beyond just those who are spreadsheet experts or who have deep technical engineering mindsets, while still preserving the flexibility and customizability.

Additionally, we DO use the aggregator’s categories and we map them to a set of Tiller defaults if you have the experimental pre-categorize feature turned on (not broadly available as it’s in A/B testing). We don’t want to have an overwhelming list of default categories. The aggregators’ category lists are too long AND they are different between Yodlee and Plaid so we have to map to some default. We shrank the list to a smaller set and flattened the aggregator lists to map to those.
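
As a rough illustration of that flattening idea (not our actual mapping table), the Python sketch below keys a lookup on the aggregator plus its raw category. Every entry is a placeholder: the raw category strings are examples only and are not taken from Yodlee's or Plaid's real category lists, and `to_default` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

# Placeholder mapping: (aggregator, raw category) -> smaller default set.
# The raw names here are illustrative, not real Yodlee or Plaid values.
FLATTEN: Dict[Tuple[str, str], str] = {
    ("yodlee", "Cable/Satellite/Telecom"): "Utilities and Bills",
    ("yodlee", "Electric/Gas Example"): "Utilities and Bills",
    ("plaid", "TELECOM_EXAMPLE"): "Utilities and Bills",
    ("plaid", "MEDICAL_EXAMPLE"): "Medical",
}


def to_default(aggregator: str, raw_category: Optional[str]) -> Optional[str]:
    """Flatten an aggregator-specific category into the shared default set."""
    if raw_category is None:
        return None  # the aggregator sent no category
    # Unknown pairs return None so they can be handled as uncategorized.
    return FLATTEN.get((aggregator.lower(), raw_category))
```

Several long-tail aggregator categories collapse into one default, which is why a granular hint can end up in a broader bucket.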

Some users prefer a long list of Categories and others do not. Categories are deeply personal so we’re trying to find balance with how many we pre-populate/default for our users. Our set of defaults was curated based on ~11 years of experience in this and the PFM landscape.

I hope that context is helpful.

Thanks @heather for that fantastic inside context, and thanks @Clint.C for your perspective too! It’s great to be in this technical breakout group. Thank you!

I completely empathize with the onboarding challenge you’re describing. When I first launched my own Tiller Foundation template in Excel, I faced that exact same bottleneck: a wall of 400+ historical rows that needed categorizing right out of the gate. It’s a massive friction point for anyone trying to find immediate value.

To solve it, I actually leaned heavily into those aggregator category hints. While they aren’t flawless, I found they provided an immediate 85–90% baseline accuracy that keeps a new user from getting overwhelmed during onboarding.

As I eventually migrated my architecture to a relational SQL backend, I ended up building a multi-layered rule hierarchy to handle the exact edge cases you mentioned:

1. The ‘Transfer’ Trap: Like you noted, balance shuffles can easily bedevil a pipeline. I used explicit T-SQL wildcard logic (e.g., WHERE Description LIKE '%DescriptionWords%') to isolate and force known transaction strings into strict Transfer buckets before they could corrupt regular expense reporting.

2. Handling the Nulls: From a data modeling perspective, blank categories can throw off calculations and downstream charts. To combat that, my pipeline converts any remaining nulls into strict defaults like ‘Uncategorized’, ‘Ungrouped’, or ‘No_Type’ so the reporting model always remains structurally sound.

3. The Customization Problem: To preserve flexibility when a user adds or deprecates custom categories, I decoupled them. The model maps user-defined categories alongside the Tiller defaults and the raw aggregator categories inside a dedicated dimension table. This creates a flexible translation layer (perfect for filters or slicers) without breaking the core data schema.
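
The three layers above can be sketched together in a few lines. This is a simplified stand-in rather than my actual schema: it uses Python's built-in sqlite3 module instead of T-SQL, and the table names (`transactions`, `dim_category`), columns, and sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    description TEXT,
    category_hint TEXT
);
-- Dimension table: translates raw aggregator hints to user categories.
CREATE TABLE dim_category (
    category_hint TEXT PRIMARY KEY,
    user_category TEXT
);
INSERT INTO transactions (description, category_hint) VALUES
    ('ONLINE TRANSFER TO SAVINGS', 'Shopping'),
    ('CVS PHARMACY', 'Healthcare/Medical'),
    ('UNKNOWN MERCHANT', NULL);
INSERT INTO dim_category VALUES ('Healthcare/Medical', 'Medical/Dental');
""")

rows = conn.execute("""
SELECT t.description,
       CASE
           -- Layer 1: wildcard match forces known strings into Transfer.
           WHEN t.description LIKE '%TRANSFER%' THEN 'Transfer'
           -- Layers 2 and 3: translate via the dimension table,
           -- then replace any remaining NULL with a strict default.
           ELSE COALESCE(d.user_category, 'Uncategorized')
       END AS category
FROM transactions AS t
LEFT JOIN dim_category AS d ON d.category_hint = t.category_hint
ORDER BY t.id
""").fetchall()

for description, category in rows:
    print(description, "->", category)
```

Because the translation lives in its own table, adding or retiring a custom category only means editing rows in `dim_category`; the transactions table is left alone.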

It’s incredibly exciting to hear that the A/B test for pre-categorization is actively happening now! I completely realize that the logic I’m describing is something the Tiller engineering team has already spent a lot of cycles thinking through. I’m just really glad you were open to my perspective on things.

Looking forward to seeing how the next phase of the onboarding logic shapes up! -David