
Error when attempting to access private repo #30

Description

@jfredrickson5

Running scraper against a GitHub org that contains private repos fails with a TypeError as soon as it processes a private repo.

Output:

% scraper --config config.json                     
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com                                     
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test                                         
Traceback (most recent call last):
  File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>                            
    load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()                                 
  File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main                         
    code_json = code_gov.process_config(config_json)                                                 
  File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config               
    code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)                   
  File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3                  
    elif date_parse(repository.created_at) < POLICY_START_DATE:                                      
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)                                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))                                             
TypeError: Parser must be a string or character stream, not datetime
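
The last frame shows that `repository.created_at` reached `date_parse` as a `datetime` object rather than a string, which dateutil's parser rejects. A minimal sketch of one possible guard follows. `ensure_datetime` is a hypothetical helper (it is not part of scraper, github3, or dateutil), and it assumes string timestamps use the GitHub-style ISO 8601 form such as `2019-04-23T17:29:12Z`. It uses only the standard library so it can be run on its own.

```python
from datetime import datetime, timezone


def ensure_datetime(value):
    """Return a datetime for `value`, which may be a datetime or an ISO 8601 string.

    Hypothetical helper for illustration; not part of scraper or github3.
    """
    if isinstance(value, datetime):
        # Already parsed: handing this to a string parser raises TypeError.
        return value
    # Assumption: strings look like "2019-04-23T17:29:12Z".
    # datetime.fromisoformat (Python 3.7+) accepts "+00:00" in place of "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


already_parsed = datetime(2019, 4, 23, 17, 29, 12, tzinfo=timezone.utc)
from_string = ensure_datetime("2019-04-23T17:29:12Z")

print(ensure_datetime(already_parsed) is already_parsed)  # True
print(from_string == already_parsed)  # True
```

If a guard like this were used in `from_github3`, the comparison against `POLICY_START_DATE` would still require both values to be timezone-aware or both naive. How `POLICY_START_DATE` is defined is not shown here, so that part is an assumption.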

Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "repos": [
        "GSA/private-test"
      ]
    }
  ]
}

Here is a real config.json where we encountered the issue. The scan proceeds normally until it reaches a private repo, at which point it crashes.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "orgs": [
        "GSA",
        "18F",
        "presidential-innovation-fellows",
        "USWDS"
      ]
    }
  ]
}

I verified that my GitHub access token is valid and can view private repos by using the same token in a different script.
