feat(crawler): advanced session analytics with historical metrics (closes #15) #39
Open
mudassaralichouhan wants to merge 1 commit into
- Added a new analytics system to track and record metrics for crawl sessions, including success rates, latency, and error counts.
- Introduced `SessionMetricsRecord` to encapsulate session data and `ISessionAnalyticsStore` for storing metrics.
- Enhanced `CrawlerManager` to capture seed URL and domain, and record analytics upon session completion.
- Implemented new API endpoints in `SearchController` for retrieving session analytics, including session details, comparisons, and trends.
- Added unit tests for the analytics functionality to ensure accuracy and reliability.

These changes significantly improve the monitoring and reporting capabilities of the crawling process, enabling better insights into performance and issues.
Implements advanced session analytics for the crawler, as requested in #15.
Every crawl session — success or failure — now produces a persisted metrics
record. New API endpoints expose historical data, session comparisons, and
time-bucketed trend reports.
Closes #15
Acceptance criteria (from the issue)
Architecture
New module: `search_engine/crawler`
- `SessionMetricsRecord.h`
- `SessionAnalytics.h`: `buildFromResults`, `summarize`, `compare`, `trends` (time-bucketed). No I/O, no threads, so trivially unit-testable.
- `SessionAnalyticsStore.h`: `ISessionAnalyticsStore` interface + thread-safe `InMemorySessionAnalyticsStore` (capacity-bounded, FIFO eviction, idempotent `put`).

CrawlerManager hookup
- `CrawlerManager` owns an `InMemorySessionAnalyticsStore` (capacity 10k), exposed via `getAnalyticsStore()`.
- `CrawlSession` carries `seedUrl`, `seedDomain`, `startedAt`.
- `recordSessionAnalytics()` runs at session completion, builds a `SessionMetricsRecord` from the final `CrawlResult` vector, and stores it. Errors are swallowed: telemetry never breaks a crawl.
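
The store behaviour described above (capacity-bounded, FIFO eviction, idempotent `put`, thread-safe) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `BoundedSessionStore`, `recordSafely`, and the trimmed-down record fields are hypothetical names, and "idempotent" is assumed here to mean that re-storing an existing `sessionId` overwrites the record without changing its eviction position.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, trimmed-down record; the real SessionMetricsRecord has more fields.
struct SessionMetricsRecord {
    std::string sessionId;
    std::size_t pagesCrawled = 0;
    std::size_t pagesFailed = 0;
};

// Illustrative stand-in for InMemorySessionAnalyticsStore (name is an assumption).
class BoundedSessionStore {
public:
    explicit BoundedSessionStore(std::size_t capacity) : capacity_(capacity) {}

    // Assumed idempotency: a repeated sessionId overwrites the record in place
    // and keeps its original position in the eviction order.
    void put(const SessionMetricsRecord& record) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (capacity_ == 0) return;
        auto it = records_.find(record.sessionId);
        if (it != records_.end()) {
            it->second = record;
            return;
        }
        if (order_.size() >= capacity_) {
            records_.erase(order_.front());  // evict the oldest session (FIFO)
            order_.pop_front();
        }
        order_.push_back(record.sessionId);
        records_.emplace(record.sessionId, record);
    }

    std::optional<SessionMetricsRecord> get(const std::string& sessionId) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = records_.find(sessionId);
        if (it == records_.end()) return std::nullopt;
        return it->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return order_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<std::string> order_;  // insertion order, oldest first
    std::unordered_map<std::string, SessionMetricsRecord> records_;
};

// Telemetry should never break a crawl, so any exception is swallowed here.
inline void recordSafely(BoundedSessionStore& store,
                         const SessionMetricsRecord& record) noexcept {
    try {
        store.put(record);
    } catch (...) {
        // Intentionally ignored.
    }
}
```

Keeping the insertion order in a separate `std::deque` makes eviction O(1) while lookups stay O(1) on average through the map; a single mutex is the simplest way to get thread safety at this scale.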
API endpoints
- `/api/analytics/sessions[?limit=N]`
- `/api/analytics/sessions/detail?sessionId=...`
- `/api/analytics/sessions/compare?ids=a,b,c`
- `/api/analytics/sessions/trends?windowMs=86400000&bucketMs=3600000`

Example
Tests
`tests/crawler/session_analytics_tests.cpp` (13 cases):
- `buildFromResults` counts/rates correctness
- `summarize` roll-up across records + empty input
- `compare` B−A delta semantics
- `trends` bucketing into hourly slices with window filtering
- `trends` zero-width-bucket guard
- `InMemoryStore` put/get/getAll, idempotent put, FIFO eviction, `getInWindow` filtering, clear

Run: `./build/tests/crawler/crawler_tests "[SessionAnalytics]"`

Notes
- `ISessionAnalyticsStore` interface lets a MongoDB-backed implementation drop in later without touching callers.
- `SessionAnalytics` helpers are storage-agnostic, so they are unit-testable without MongoDB or any heavy dependency.
recorded as a passive side effect at completion.
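
As an illustration of why the helpers need no storage or I/O, here is a rough sketch of time-bucketed trend aggregation with window filtering and a zero-width-bucket guard, mirroring the `windowMs` and `bucketMs` query parameters of the trends endpoint. The types `SessionSample` and `TrendBucket`, their fields, and the exact signature are assumptions made for this example; the PR's actual `trends` helper may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical minimal input; field names are assumptions.
struct SessionSample {
    std::int64_t completedAtMs;  // completion time, ms since epoch
    std::size_t pagesOk;
    std::size_t pagesFailed;
};

// Hypothetical per-bucket roll-up.
struct TrendBucket {
    std::int64_t bucketStartMs = 0;
    std::size_t sessions = 0;
    std::size_t pagesOk = 0;
    std::size_t pagesFailed = 0;

    double successRate() const {
        const std::size_t total = pagesOk + pagesFailed;
        return total == 0 ? 0.0
                          : static_cast<double>(pagesOk) / static_cast<double>(total);
    }
};

// Pure function: groups samples inside [nowMs - windowMs, nowMs] into
// fixed-width buckets. Only non-empty buckets are returned, oldest first.
std::vector<TrendBucket> trends(const std::vector<SessionSample>& samples,
                                std::int64_t nowMs,
                                std::int64_t windowMs,
                                std::int64_t bucketMs) {
    std::vector<TrendBucket> out;
    if (bucketMs <= 0 || windowMs <= 0) return out;  // zero-width guard

    const std::int64_t windowStart = nowMs - windowMs;
    std::map<std::int64_t, TrendBucket> buckets;  // keyed and ordered by bucket start

    for (const auto& s : samples) {
        if (s.completedAtMs < windowStart || s.completedAtMs > nowMs) continue;
        const std::int64_t index = (s.completedAtMs - windowStart) / bucketMs;
        const std::int64_t start = windowStart + index * bucketMs;
        TrendBucket& b = buckets[start];
        b.bucketStartMs = start;
        ++b.sessions;
        b.pagesOk += s.pagesOk;
        b.pagesFailed += s.pagesFailed;
    }

    for (const auto& entry : buckets) out.push_back(entry.second);
    return out;
}
```

Because the function takes plain vectors and a caller-supplied `nowMs`, a test can feed it fixed timestamps and check bucket boundaries deterministically, with no clock or database involved.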