Anyone else tracking receipts in your Tiller spreadsheet?

On Twitter a customer recently asked, “Anyone using Tiller have any great solutions for tracking receipts with OCR in Tiller?”

Great question, and we’d love to hear if anyone here in the Community is doing this. We’ve touched on a DIY workflow previously on our blog.

At Tiller (for our company Tiller spreadsheet) and personally I’ve added a Receipt column and manually added links in that column for receipts in Evernote, on Google Drive, or elsewhere.

That workflow does the trick, but it’s not efficient. For high-value receipts, however, it’s worth the extra effort (for example, things that are tax-related or other notable purchases.)

One of the reasons I was excited by the new smart chips in Google Sheets is the dream of a workflow where I could create a smart chip for a transaction, and in that process it might show recent captures in Google Keep, or otherwise use OCR, to quickly find relevant receipts and link them.

I think there are still a few pieces of plumbing needed to make that happen, so it’s not likely something we can pull off today… but never say never when @Randy is listening!

2 Likes

I have a solution, but it’s not great. It’s more for ‘expanding’ receipts than tracking them. I’m currently using PowerShell to OCR my grocery receipts into .txt files in Google Drive. I then use App Script to import those into a “Groceries” sheet in my Tiller template where it calculates the final cost of each item (after tax, coupons, etc) which I then assign Tiller categories to. A Pivot Table shows me the total in each category for each receipt. When the transaction shows up in Tiller, I then split it into those categories with those totals. Not a great process, lots of manual steps, not easily reproduceable for other people, but it allows me to break out my “Groceries” into food, supplies, pet stuff, personal items, etc to get a more accurate budget. I would love if Tiller had options that made any part of that process easier!

4 Likes

Incredible, @jpfieber, why am I surprised you’ve got such a rigorous workflow? … when I shouldn’t be. :blush:

I just poked around for a second and found this intriguing bit of Apps Script code. I guess I should have known that Google has (free?) APIs for OCR. With a little futzing this morning, I got the script to work and it did a pretty good job.

Out of the box (basically), the code scanned a Google Drive folder, OCR-ed the image files it found, created a new Google Doc for each scanned file that included the original receipt image and the OCR text, and then logged each file into a Google Sheet table with the image filename and a cell containing all of the text.

As you might expect (if you’ve tried OCR), the text output is accurate but really disorganized. It would be a real puzzle to identify the receipt total from the text output. Here is an example of the output files created by the script.

One thing I don’t love about this solution is the extremely-broad permissions required to scan one’s Google Drive and write to spreadsheets and documents:

  • See, edit, create, and delete all of your Google Drive files
  • See, edit, create, and delete all your Google Docs documents
  • See, edit, create, and delete all your Google Sheets spreadsheets

Ideally, Google could offer some help with APIs and workflows to localize the permissions.

Yeah, that’s about what I’m getting from the PowerShell OCR conversion, so I could conceivably move that part into Apps Script as well (I think I played with that a bit but found the PowerShell version easier to get up and running). The hard part is the parsing of that to get item details, like cost, weight, number of items, price, coupons, etc. It’s different for every store. I was able to get it to work reasonably well for the few stores that I frequent, but probably not something that would scale well into a shared Tiller-based tool.

The linking of receipts, which is what @peter first brought up, is probably more generally useful. Some way of automating that process would be something I’d probably use. Not sure how cross-platform that would be since the Microsoft cloud and the Google cloud seem very different.

Hadn’t heard about this feature. Just tried it and I can only see “People Chips” at this point, not “File Chips”. They say the “Gradual rollout” is up to 15 days for feature visibility starting on October 25, 2022. I look forward to trying that!

@jpfieber thanks for sharing your own workflow!

I think the smart chips are still early and there isn’t the plumbing to allow smart chips to be useful yet for a receipt workflow off to the Transactions sheet.

Hi all

I am the guy that asked the question on twitter.

I love the above solutions and workflows you described.

I hacked on this a little on the weekend, and came up with the following:

A separate workbook with 2 sheets (Review and Archive).

Review Sheet will have all the receipts that you need to review.

  • Using the script(GDriveFiles) below it will populate a row with a picture of the receipts in a specific folder
  • This will give you the ability to see the receipt and categories it or proved what ever image you feel is important.
  • When you done enriching the data you put “Done” in the Status column and run the second script (moveReviewedFiles). The script will then move the receipt and the row of data to the archive sheet.

Archive all the receipts that you already proceed.

PROS

  • Can link the receipt to transaction very easily using the ID reference Google gives all documents
  • Flexible
  • Free
  • All in Google Drive

CONS

  • The images need to be made publicly visible to view it in the google sheet. However, I think is will happen regardless of the solution. I will need to investigate to find a way around this one.

  • This script needs very wide drive permissions. (I noticed the API can scope the permissions, however the App script permission is very wide - Interesting challenge to navigate)

  • Use of scripts, not sure on the appetite of our community in using scripts.( We can annotate it comprehensively and show it is safe and simple)

  • The solution does not cater for PDFs current

TODO

  • Connect OCR using @randy script
  • Explore the use of Azure/Taggun OCR
  • Use REGEX to find the date and amount which I believe to be the most important for connecting the
    transaction to the receipt.

Notes

  • The solutions to the above seem easier to me as a software developer/ accountant outside of Appscript, but I am also not too familiar with the ecosystem - so I will need to do more research.

  • From my limited experience, the clients (Mainly a school and church) that I setup Tiller for. They like to analyze their data in unique ways, but I don’t think they will feel comfortable with scripts that prescribe them to do things a certain way - or it may break. Also not sure how to deal with changes.

  • In keeping with the “Tiller way”, I think we need a solution that gives them a seperate workbook or sheet with all their data, enriched as best as possible (OCR and searchable text), like our Transactions sheet but for Receipts and empower the user to chop it up, analyze and categories as they see fit.

  • I did test a small app for myself that used an inexpensive Receipt OCR service but I want a solution for the community that can not afford it or its not that important that it be enriched but rather just be tracked.

  • This is of interest to me because audit-proofing your financial situation whether it is personal or in business is important and it need not be expensive. Its like the missing piece to Tiller. If we can solve this, then I think we have a complete inexpensive accounting/bookkeeping solution.

Let me know what you think, or if you have any suggestions.

function GdriveFiles() {

  const folderId = '<id of the folder>'
  const folder = DriveApp.getFolderById(folderId)
  const files = folder.getFiles()
  const source = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = source.getSheetByName('Review');
  const data = [];   

  while (files.hasNext()) {
      const childFile = files.next();
      var info = [ 
        childFile.getId(),
        childFile.getName(),
        childFile.getUrl(),   
      ];

        data.push(info);
  }

  sheet.getRange(2,1,data.length,data[0].length).setValues(data);
  
}


function moveReviewedFiles(){
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Review');
  const MoveDatatoThisSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Archive');
  var data = sheet.getDataRange().getValues();

  data.forEach((row,index) => {
    var numIndex = index+1

    if(row[8] == "Done"){
      var sr = sheet.getRange(numIndex +":"+ numIndex)
      var dr = MoveDatatoThisSheet.getRange(MoveDatatoThisSheet.getLastRow()+1,1)
      sr.copyTo(dr,SpreadsheetApp.CopyPasteType.PASTE_FORMULA)

      sheet.deleteRow(numIndex)

      var file = DriveApp.getFileById(sr.getCell(1,1).getValues()[0].toString());
      var folder = DriveApp.getFolderById("<id of the folder to move to>");
    
      file.moveTo(folder);

    }
  })
}
1 Like

I found that if you add a Google Drive link to any text in Google Sheets, it shows a ‘Preview’ when you hover over the text (works with PDF which is what I primarily use). This is useful, but it only shows a portion of the document. I then noticed that if you have a browser extension like “Hover Zoom+” (works in Edge and Chrome, multiple similar extensions also available), which is designed to enlarge thumbnails when you hover over them, hovering over the preview shows a beautiful full screen image of the document. Simply adding receipt/document links to the descriptions of transactions would provide an easily browsable list of the associated receipts, which I think lots of Tiller users would find desirable.

The hard part, as you mentioned, is associating the receipt with the transaction, and I think the hardest part of that process will be getting the merchant, date and amount information out of the receipts. As I found setting up my grocery receipt workflow, every merchant’s receipt is formatted differently. Even something as simple as searching for a date will be difficult since some merchants have multiple dates representing not only the date of purchase, but “return by” date, “total savings since” date, etc. You’d probably need to have a separate section of code for each merchant that details how to pull each piece of data from the receipt (that’s what I have for my grocery receipt script), and it would likely need to be updated periodically as they make changes to the format. With potentially thousands of Tiller users all wanting their local stores receipts to be included, the scope of something like this quickly spirals out of control. Sharing scripts, and then managing constant updates for shared scripts is probably going to be a challenge as well.

With all of this doom and gloom being said (I don’t mean to discourage you, just sharing some of the realities that I’ve discovered in chasing the idea), if you can solve any portion if this, I’d be interested in hearing about/seeing it, as I’d love to improve my workflow. Maybe something short of a perfect system can be developed to at least make it easier for some users to do this type of thing.

1 Like

Love this thread. @VinnieK thanks for jumping into the Tiller Community! I wonder, with the various OCR packages under consideration, if there’s a solution that would read hand-written annotations on a receipt and use those for associating the receipt with the transactions. For example, what if the merchant, date, and total amount were circled by hand with a pen before scanning? Or perhaps even just the total amount?

I don’t need the OCR from receipts, but do copy and save the receipts themselves. The way to do that would be to use the tiller info not the receipt info. I concatenate the date, store, credit card and amount data to make a file name for the receipt. That way it has meaning to me and can be found later if it is needed to return an item from that receipt.

I finally spent a little time and converted my import workflow entirely to Google Apps Script, no more Powershell. So just one step to get the scans imported! Learned a lot about Regex. Here’s one of the Regex commands, this one converts a Meijer receipt item with a quantity of more than one:
/^(.*?)\s(\d+)\s\@\s(\S+\d[\d,\.]*?\b)\s(\S+\d[\d,\.]*?\b)\s(T|F|NT|FT)/gm
https://regex101.com/ definitely helped the process!

1 Like

Quick Books Pnline does a nice job of scanning receipts with my phone, and, also allows me to email receipts to a special email address. Linking to the line item is virtually seamless.

This should be easy in Tiller.

Please help me organize my finances. I do not write code.

Many thanks

B

I lost my Foundation Google sheet while trying to figure out all these suggestions. All that is left is the Google Foundation Budget. What happened to my basic document and is there any way to get it back?

Hi @maciassk Sally,

Yes, there is a way to restore the file.

Take a look at this help guide.

Flash forward a couple years. I have had good luck using ChatGPT or similar to parse receipts and produce CSV files. People will have different sensitivity levels about privacy, etc. But this workflow works well for my purposes.

Some prelim research suggests this can all be done in App Scripting for Sheets, but while I have enough programming to vibe out something that sort of works, integrating with Tiller is something I would not be comfortable tackling. Too worried about breaking something else. Maybe someone with more Tiller experience would be willing to take it on?

Update: I have a basic workflow that will take a PDF scanned receipt in a Google drive, OCR it with Google’s Document AI, send it to ChatGPT for categorization and parsing, then dump it all into a Google sheet. Sample output attached.

2 Likes

Love this @BryanS

-Alice
Tiller Evangelist

Bluesky, Instagram, Facebook, LinkedIn

Thanks! The holy grail, of course, would be to integrate this with Tiller’s transactions tab. I have a separate transactions sheet that shows images of the receipts, but matching them with Tiller transactions might be bearish. First thought is some combination of total amount and store name, but there are some edge cases that might not work flawlessly.

Also, this is not a free solution as it requires calls to Document AI and ChatGPT, so not ideal for everyone. Could probably work with an open source LLM, though for the parsing and categorization. Not sure how to manage the OCR.

Finally, my use case is groceries. Generalizing for other receipts (tax relevance, for example) might be straining the capabilities of the LLM. Would have to do some tests.

Update: I have the bare bones of a PDF/image OCR and parser up as a website in my development environment. Takes a receipt, outputs a CSV file with categorized line items. Basic tests are promising. It is currently using ChatGPT for parsing and categorization. I found a product called Tesseract that does a reasonable job with the OCR, so no costs there.

I am thinking of opening to a couple of alpha testers to help me fool around with it. Let me know if any interest. Warning, it is very proof of concept. Not pretty. I am not tracking data, but it is going out to ChatGPT, so if that’s an issue, this is not for you yet.

There are lots of receipt parsing SaaS solutions out there. They are largely targeted at corporate, and the couple I spot checked don’t do line items very well. And I think they are kind of expensive for the actual tech involved.

1 Like

I have been poking around the community this weekend, and it looks like the most interesting use of this receipt scanning mechanism would be to take the OCR and parse out the vendor and total price. Then auto-matching a cloud-based receipt URL to a Receipt column on the transactions tab.

The first bit is pretty easy to adapt my current setup to. As far as doing the auto-matching, I got Claude to generate some app script that relies on confidence levels around [sub]string matching in the transaction description and the OCR detected vendor. I am wary about inserting Claude’s GAS into Tiller, but it seems like whatever mechanism AutoCat uses could be adapted to run through the [sub]string matching.

Storage can be on AWS S3, which provides for private signed links, in my understanding of the privacy-first Tiller Way. I would bet other hosting providers do something similar. I don’t think this solution would be very expensive to implement and run, unless you really wanted to stay off commercial LLMs and were using local LLMs. The GPU costs can stack up quickly. But maybe it’s manageable at Tiller’s scale.

What think you Tiller folks? Is there enough plumbing at this point to take this on as a feature?

Update My progress on auto-matching is here: Best practice for Google Application Scripting in Tiller - #2 by BryanS. It relies on a custom function and an ARRAYFORMULA(). The error handling could be better, the substring matching can be extended, and it is still a bit slow. Might time out on seriously large sheets.