Automation/blog pipeline #718
base: main
Changes from all commits
6a944c5
b62cc3c
2f5ca79
b0917ad
6aea9cb
4a22818
dba8182
e2c5370
6f33e71
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| name: Publish reviewed blogs | ||
|
|
||
| on: | ||
| workflow_dispatch: | ||
| schedule: | ||
| - cron: '0 7 * * *' # daily at 07:00 UTC: publish any newly reviewed blogs | ||
|
|
||
| jobs: | ||
| publish-blogs: | ||
| if: github.repository == 'Women-Coding-Community/WomenCodingCommunity.github.io' | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository | ||
| uses: actions/checkout@v5 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.12' | ||
|
|
||
| - name: Cache pip | ||
| uses: actions/cache@v4 | ||
| with: | ||
| path: ~/.cache/pip | ||
| key: ${{ runner.os }}-pip-blog-${{ hashFiles('tools/blog_automation/requirements.txt') }} | ||
| restore-keys: | | ||
| ${{ runner.os }}-pip-blog- | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| python -m pip install --upgrade pip | ||
| pip install -r tools/blog_automation/requirements.txt | ||
|
|
||
| - name: Write service account key | ||
| run: echo "$SERVICE_ACCOUNT_KEY" > tools/blog_automation/service_account_key.json | ||
| env: | ||
| SERVICE_ACCOUNT_KEY: ${{ secrets.BLOG_AUTOMATION_SERVICE_ACCOUNT }} | ||
|
|
||
| - name: Export reviewed blogs | ||
| run: | | ||
| cd tools/blog_automation | ||
| python publish_reviewed_blogs.py | ||
|
|
||
| - name: Remove service account key | ||
| if: always() | ||
| run: rm -f tools/blog_automation/service_account_key.json | ||
|
|
||
| - name: Create or Update Pull Request | ||
| id: create-pr | ||
| uses: peter-evans/create-pull-request@v7 | ||
| with: | ||
| token: ${{ secrets.GHA_ACTIONS_ALLOW_TOKEN }} | ||
| commit-message: "Automated import of reviewed blog posts" | ||
| branch: "automation/import-blog" | ||
| team-reviewers: "Women-Coding-Community/leaders" | ||
| title: "Automated import of reviewed blog posts" | ||
| body: | | ||
| This PR was created automatically by a GitHub Action. | ||
|
Contributor
The PR checklist includes "I have added a screenshot from the website after I tested it locally" but no screenshot appears in the PR body. Since this automation writes files and opens PRs, a sample of the generated post output (e.g. the front matter + first few lines of an exported |
||
|
|
||
| It contains every blog marked `isReviewedandApproved` (and not yet | ||
| `isPublished`) in the submissions spreadsheet: | ||
| - new posts under `_posts/` | ||
| - cover images under `assets/images/blog/` | ||
|
|
||
| The spreadsheet's `isPublished` column has already been set to TRUE for | ||
|
Contributor
The PR body already mentions that |
||
| these rows. Please review the rendered posts before merging. | ||
| labels: | | ||
| automation | ||
| add-paths: | | ||
| _posts/** | ||
| assets/images/blog/** | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,10 +17,10 @@ To allow our scripts to access Google Drive and export documents, you need to cr | |
| 👉 **Note:** You need the **Project Editor** or **Owner** role on this project to create service accounts and keys. | ||
| If you’re the one who created the project, you already have these permissions. | ||
|
|
||
| ### 1. Enable the Drive API | ||
| ### 1. Enable the Drive and Sheets APIs | ||
| 1. In the left menu, go to **APIs & Services → Library**. | ||
| 2. Search for **Google Drive API**. | ||
| 3. Click **Enable**. | ||
| 2. Search for **Google Drive API** and click **Enable**. | ||
| 3. Search for **Google Sheets API** and click **Enable** (needed to read the submissions spreadsheet). | ||
|
|
||
| ### 2. Create a Service Account | ||
| 1. In the left menu, go to **IAM & Admin → Service Accounts**. | ||
|
|
@@ -47,6 +47,7 @@ If you’re the one who created the project, you already have these permissions. | |
| 4. Give it at least **Viewer** access. | ||
| 5. Save changes. | ||
| - Now the service account can read/export files in that folder or doc. | ||
| 6. Repeat the **Share** step for the **blog submissions spreadsheet** (the Google Form responses sheet), giving the service account **Editor** access. Editor (not just Viewer) is required because the pipeline writes `isPublished = TRUE` back to a row after exporting it. | ||
|
|
||
| --- | ||
|
|
||
|
|
@@ -75,8 +76,53 @@ Then the **Document ID** is: | |
|
|
||
| Use this ID in your scripts when exporting the document. | ||
|
|
||
| ## Run Automation | ||
| ## Export a single blog manually (for testing) | ||
|
Contributor
The README now correctly references |
||
| 1. Activate virtual environment: `source venv/bin/activate` | ||
| 2. Run the script: `python doc_to_html_conversion.py <DOC_ID>` | ||
| 2. Export one Google Doc into a post: | ||
| `python blog_exporter.py --doc_id <DOC_ID> --author_name "Jane Doe" --image_link "<DRIVE_IMAGE_LINK>"` | ||
|
|
||
| This is handy to check a Doc renders correctly. The full pipeline below reads all | ||
| of this metadata from the spreadsheet automatically. | ||
|
|
||
| ## Tests | ||
|
|
||
| Run `pytest test_blog_exporter.py` | ||
|
|
||
| ## CI/CD pipeline: publish a blog when you mark it reviewed | ||
|
|
||
| The Google Sheet is the **single source of truth** — there is no local CSV. The | ||
| GitHub Action [`.github/workflows/run_blog_exporter.yml`](../../.github/workflows/run_blog_exporter.yml) | ||
| turns a reviewed blog into a draft pull request automatically. | ||
|
|
||
| ### How to publish a blog (the editor's workflow) | ||
| 1. In the submissions spreadsheet (the **Form Responses 1** sheet), set the row's | ||
| **`isReviewedandApproved`** cell to **`TRUE`** once the draft is reviewed. | ||
| Leave **`isPublished`** blank/`FALSE`. | ||
| 2. Within a day (or immediately via **Actions → Publish reviewed blogs → Run | ||
| workflow**) the action exports the blog, sets that row's **`isPublished`** to | ||
| `TRUE` in the sheet, and opens/updates a PR | ||
| (`Automated import of reviewed blog posts`) with the new post and cover image. | ||
| 3. **Review the rendered post and merge.** | ||
|
|
||
| ### What runs | ||
| `publish_reviewed_blogs.py` reads the sheet and exports every row where | ||
| `isReviewedandApproved` is `TRUE` and `isPublished` is not `TRUE`. Because the | ||
| `isPublished` flag is written straight back to the sheet, a blog is never exported | ||
| twice — and the existing backlog (already `isPublished = TRUE`) is left alone. | ||
|
|
||
| > The draft must be a **native Google Doc** (Drive can only export those to | ||
| > Markdown). If a submitter uploaded a `.docx`/`.pdf`, open it and do | ||
| > **File → Save as Google Docs** first, otherwise that row is skipped with an error. | ||
|
|
||
| ### One-time repo setup | ||
| - **Service account needs Editor access to the spreadsheet** (see setup step 4) so | ||
| the pipeline can write back `isPublished`. | ||
| - **Secret `BLOG_AUTOMATION_SERVICE_ACCOUNT`** — paste the full contents of | ||
| `service_account_key.json` into a repository secret with this name | ||
| (Settings → Secrets and variables → Actions). The workflow writes it to disk at | ||
| runtime and deletes it afterwards; the key is never committed. | ||
| - **Secret `GHA_ACTIONS_ALLOW_TOKEN`** — already used by the other automations; it | ||
| lets the action open the pull request. | ||
|
|
||
|
|
||
|
|
||
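The selection rule described in the README's "What runs" section can be sketched as follows. This is a minimal illustration, not the actual code in `publish_reviewed_blogs.py`: the column names come from the README, while the one-dict-per-row shape and the helper names `is_true` and `rows_to_export` are assumptions made for the example.

```python
from typing import Iterable

# Column names are taken from the README; everything else here is hypothetical.
REVIEWED_COL = "isReviewedandApproved"
PUBLISHED_COL = "isPublished"


def is_true(cell: object) -> bool:
    """Treat a boolean True or the text 'TRUE' (any case) as true; blank or FALSE as false."""
    if isinstance(cell, bool):
        return cell
    return str(cell or "").strip().upper() == "TRUE"


def rows_to_export(rows: Iterable[dict]) -> list:
    """Return the rows that are reviewed and approved but not yet published."""
    return [
        row
        for row in rows
        if is_true(row.get(REVIEWED_COL)) and not is_true(row.get(PUBLISHED_COL))
    ]


# Example rows (made-up data, one dict per spreadsheet row).
rows = [
    {"title": "A", "isReviewedandApproved": "TRUE", "isPublished": ""},
    {"title": "B", "isReviewedandApproved": "TRUE", "isPublished": "TRUE"},
    {"title": "C", "isReviewedandApproved": "", "isPublished": "FALSE"},
]
print([row["title"] for row in rows_to_export(rows)])  # ['A']
```

Row B shows why the existing backlog is left alone: once `isPublished` is `TRUE`, the row no longer matches on later runs.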
The branch name is fixed as `automation/import-blog`, which means all batch imports accumulate into the same PR. If a previous batch PR is still open when a new daily run triggers, reviewers might approve a larger bundle than expected.

Would it be worth using a date-stamped branch name (e.g. `automation/import-blog-2026-06-28`) so each run creates its own isolated PR, giving reviewers clearer control over what they're approving?
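One possible way to produce such a date-stamped name is a small helper run in a step before the PR action. This is only a sketch: the helper `dated_branch` and the output name `branch` are hypothetical, and the workflow would still need to pass the step output to the action's `branch` input.

```python
import os
from datetime import date, datetime, timezone
from typing import Optional


def dated_branch(prefix: str = "automation/import-blog", day: Optional[date] = None) -> str:
    """Build a per-run branch name such as automation/import-blog-2026-06-28."""
    # Default to today's date in UTC, matching the workflow's UTC cron schedule.
    day = day or datetime.now(timezone.utc).date()
    return f"{prefix}-{day.isoformat()}"


if __name__ == "__main__":
    branch = dated_branch()
    print(branch)
    # On a GitHub Actions runner, GITHUB_OUTPUT names a file that collects step outputs.
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as fh:
            fh.write(f"branch={branch}\n")
```

With this naming, a manual rerun on the same day would update that day's PR, while runs on different days would open separate PRs.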