697 - Data Sourcing for Sharing Excess #720

Open
marcbachan wants to merge 13 commits into develop from 697-data-sourcing-for-sharing-excess-food-distribution

@marcbachan marcbachan commented Mar 10, 2026

Pull Request

Change Summary

FYI: Claude helped quite a bit here in building out a basic CLI component and adding Supabase helper functions. Extra scrutiny on those is welcome.

Addresses #697. Introduces a standalone Python script that pulls events from a public Google Calendar, such as the Sharing Excess calendar, and normalizes them so they can be stored in the resources table in Supabase.

Because the resources table has no start/end date fields, these values are pulled from the site and embedded in the description between clear delimiters, like:

[[ start: 2026-03-10T15:00:00-04:00 | end: 2026-03-10T17:00:00-04:00 ]]

This allows us to do some post-processing/filtering to determine whether the event is "live" or not.
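
As a rough sketch of that post-processing, the marker can be written and read back with a regular expression. The helper names here are hypothetical, and "live" is assumed to mean that the current time falls inside the window; the script in this PR may define it differently.

```python
import re
from datetime import datetime, timezone

# Matches the marker format shown above.
MARKER_RE = re.compile(r"\[\[ start: (?P<start>\S+) \| end: (?P<end>\S+) \]\]")


def embed_window(description: str, start: datetime, end: datetime) -> str:
    """Append a start/end marker to a description (hypothetical helper)."""
    marker = f"[[ start: {start.isoformat()} | end: {end.isoformat()} ]]"
    return f"{description}\n{marker}".strip()


def extract_window(description: str):
    """Return (start, end) parsed from the marker, or None if it is absent."""
    match = MARKER_RE.search(description)
    if match is None:
        return None
    return (
        datetime.fromisoformat(match["start"]),
        datetime.fromisoformat(match["end"]),
    )


def is_live(description: str, now: datetime) -> bool:
    """True when the timezone-aware `now` falls inside the embedded window."""
    window = extract_window(description)
    if window is None:
        return False
    start, end = window
    return start <= now <= end
```

Calling `is_live(row_description, datetime.now(timezone.utc))` would then be enough to filter rows at read time, as long as every timestamp carries a UTC offset like the example above.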

We can run this scrape periodically, using the LOOK_FORWARD_DAYS property to fetch all events within a specific window into the future, or just run it monthly in one of the PHLASK sessions or something. Not sure how we want to handle this.
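
For illustration, a forward-looking window like this could be computed as a pair of RFC 3339 timestamps, which is the format the Google Calendar API accepts for its `timeMin` and `timeMax` query parameters. The default of 30 days and the function name are assumptions; the PR does not show how the script builds its query.

```python
from datetime import datetime, timedelta, timezone

LOOK_FORWARD_DAYS = 30  # assumed value; the PR does not state the default


def sync_window(now: datetime, days: int = LOOK_FORWARD_DAYS):
    """Return (time_min, time_max) as RFC 3339 strings for a forward window."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    end = now + timedelta(days=days)
    return now.isoformat(), end.isoformat()
```

With a scheduled run, the interval between runs would need to be no longer than the window, otherwise events could start and end between two syncs without ever being stored.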

Change Reason

Billy summed this up quite nicely on #697. Essentially, we would like to be able to actively maintain "live" food sites posted by Sharing Excess and help them and us get the word out a little more easily.

Verification [Optional]

Here is an example of a CSV debug output that we can get by using the basic CLI component that Claude helped write:

 python calendar_to_supabase.py --csv 

events.csv

These records can then be written to the DB either by importing the CSV directly in Supabase, or by entering the credentials in the .env file here and running the script with the helper that writes them to the resources table.
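
The CSV debug path described above could look roughly like the following. This is a minimal sketch using only the standard library: the column names are hypothetical (the resources schema is not shown in this PR), and it assumes `--csv` is a plain on/off flag.

```python
import argparse
import csv

# Hypothetical column names; the actual resources schema is not shown here.
FIELDNAMES = ["name", "description", "source_url"]


def write_csv(resources, path="events.csv"):
    """Write normalized resource dicts to a CSV file for inspection."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys that are not in FIELDNAMES.
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(resources)
    return path


def parse_args(argv=None):
    """Parse the command line; --csv is assumed to be a boolean switch."""
    parser = argparse.ArgumentParser(description="Sync calendar events to Supabase")
    parser.add_argument("--csv", action="store_true", help="write a CSV debug file")
    return parser.parse_args(argv)
```

Keeping the CSV writer separate from the database helper means the output can be reviewed by hand before anything is written to the resources table.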

Related Issue: #697

@marcbachan marcbachan requested a review from vontell March 10, 2026 19:01
@marcbachan marcbachan self-assigned this Mar 10, 2026
@marcbachan marcbachan added the Data Circle and Civic Circle labels Mar 10, 2026
@marcbachan marcbachan linked an issue Mar 10, 2026 that may be closed by this pull request
@marcbachan marcbachan requested a review from RaulBSanchez March 10, 2026 19:08
@marcbachan

Author

@icycoldveins I got Claude's help writing the edge function for this (supabase/functions/sync-sharing-excess/index.ts), which lives in a new directory for Supabase sync operations. Let me know if this lines up with what you tried out and whether you have any suggestions.

@gcardonag @vontell I'm curious to get your thoughts on automating and maintaining the sync script with edge functions. Otherwise, we can just run the Python script with Lambda in AWS like the other one.

icycoldveins added a commit that referenced this pull request Mar 24, 2026
Match existing records by name + source URL, update those (preserving
date_created), insert only new ones. Stale entries that are no longer
in the scraped data get cleaned up. Aligns with the approach in PR #720.
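
The matching strategy in that commit message can be sketched as a pure planning step that runs before any database call. This is an illustration only, not the code from the referenced commit; the `id`, `name`, and `source_url` field names are assumptions.

```python
def plan_sync(existing, scraped):
    """Split scraped rows into updates, inserts, and stale IDs to clean up.

    `existing` rows carry an `id`; both lists carry `name` and `source_url`
    (hypothetical field names).
    """
    def key(row):
        return (row["name"], row["source_url"])

    existing_by_key = {key(row): row for row in existing}
    updates, inserts, seen = [], [], set()

    for row in scraped:
        k = key(row)
        if k in seen:
            continue  # skip duplicate keys within one scrape
        seen.add(k)
        match = existing_by_key.get(k)
        if match is None:
            inserts.append(row)
        else:
            # Keep the stored id and date_created; overwrite everything else.
            merged = {**match, **row, "id": match["id"]}
            if "date_created" in match:
                merged["date_created"] = match["date_created"]
            updates.append(merged)

    stale_ids = [row["id"] for k, row in existing_by_key.items() if k not in seen]
    return updates, inserts, stale_ids
```

Because matched rows keep their `id`, anything that references a resource by ID stays valid across syncs; only rows that vanished from the source are candidates for removal.
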
Comment on lines +382 to +384
supabase = get_supabase_client()
delete_by_creator(supabase)
insert_resources(supabase, resources)

@RRodriguez26 RRodriguez26 May 26, 2026

After a discussion with Ron and Añil: what is the use of the delete function? I also heard that we want to avoid deleting anything overall. Is there a reason why we might use the delete functionality in the database rather than just update?

@marcbachan marcbachan May 27, 2026

Author

@RRodriguez26 Updating is definitely better, and I can tweak this a bit further to do it properly. I settled on delete so that we could refresh our database with current data, since stale data was becoming a problem. Now that it's updated, I can revisit this and implement it more intelligently.

One of the key issues with the update route is that recurring events, despite having many distinct occurrences, all collide under one gp_id. On repeated syncs, the script needs to handle this by updating the timestamps to whichever occurrence of that event is up next, instead of processing every occurrence as a unique resource.

But overall, yes, you all are right. Deletion creates an issue with churned resource IDs, especially if there are crowdsourced edits using that ID as a foreign key. In the long term that's not sustainable, so I'll work on resolving that update issue for the recurring events.
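
One possible way to handle the recurring-event collision described above is to collapse all occurrences to a single row per gp_id before syncing, keeping the earliest occurrence that has not ended yet. This is only a sketch of the idea: the `start` and `end` keys are assumed to hold timezone-aware datetimes, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def next_occurrences(occurrences, now):
    """Keep one occurrence per gp_id: the earliest one that has not ended."""
    best = {}
    for occ in occurrences:
        if occ["end"] < now:
            continue  # this occurrence is already over
        current = best.get(occ["gp_id"])
        if current is None or occ["start"] < current["start"]:
            best[occ["gp_id"]] = occ
    return best
```

Each value in the returned dict could then update the existing resource for that gp_id in place, so the resource ID stays stable while its timestamps roll forward. A gp_id with no remaining occurrences simply drops out of the result.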

@RRodriguez26

I also heard that these data scripts should be in their own repo. We see that there is a repo dedicated to them, but we are not sure if it is the right one.

@marcbachan

Author

> I also heard that these data scripts should be in their own repo. We see that there is a repo dedicated to them, but we are not sure if it is the right one.

Yep, I'm going to put up a joint PR for this script and the other one on that repo. I don't think anyone can recall whether that repo was meant for anything else, so it's a good fit.

Labels

Civic Circle, Data Circle

Development

Successfully merging this pull request may close these issues.

Data sourcing for Sharing Excess food distribution

2 participants