fix: update parser to correctly parse desired tokens #55
Conversation
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Minor: extra space at the beginning of these comments?
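The "extra space" trick the quoted comment describes can be sketched in isolation. Note the token and both sample lines below are hypothetical illustrations, not taken from the parser itself:

```python
# Hedged sketch of the "extra space" idea from the quoted comment:
# appending a space to a non-colon-ending token distinguishes a real
# docstring field from look-alike syntax such as an xref.
token = ":type"

real_field = ":type arg: str"            # a genuine docstring token
lookalike = "see <xref:type_> for info"  # similar syntax, should not match

print((token + " ") in real_field)   # the real field matches
print((token + " ") in lookalike)    # the xref does not
```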
# Store the top summary separately.
if index == 0:
    top_summary = summary
Can we return at this point and avoid needing the else (and the indentation that comes with it)?
Just return summary directly?
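The suggested refactor might look like this minimal sketch; `pick_summary` and the surrounding logic are hypothetical stand-ins for the code under review:

```python
# Hypothetical sketch of the early-return suggestion: returning the
# summary directly when index == 0 removes the top_summary variable
# and the else branch (and the indentation that comes with it).
def pick_summary(index, summary):
    if index == 0:
        # The top summary can be returned as-is, no else needed.
        return summary
    # Later entries continue through the rest of the parsing logic.
    return summary.strip()
```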
parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Missed this before -- why do we need \n and the literal space in addition to \s?
I'm not sure I understand " ".join(stuff).split(" "). Isn't that the same as stuff?
The order seemed to have mattered; no need for \n and the literal space if I put \s in front. Fixed it.
For filter, it is slightly different:
list(filter(...)) simply turns the filter object into a list, while " ".join(filter(...)) transforms the filter object further.
For example:
>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']
If stuff items contain spaces, the resulting stuff is not the same as the original one.
>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']

(no idea what the line does, though, not familiar with the extension 😄)
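The join/split behaviour being discussed can be reproduced in a small standalone snippet; `f_line` here is a made-up input, not the actual parser data:

```python
import re

# Made-up sample fragment with a "| " separator and mixed whitespace.
f_line = "\thello.| world.\n"

# Splitting on "| " and dropping empties keeps internal whitespace:
kept = list(filter(None, re.split(r'\|\s', f_line)))
# kept == ['\thello.', 'world.\n']

# Joining and re-splitting with no argument collapses ALL whitespace,
# which is why the join/split pair is not a no-op here:
tokens = " ".join(filter(None, re.split(r'\|\s', f_line))).split()
# tokens == ['hello.', 'world.']
```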
self.assertEqual(summary_info1_got, summary_info1_want)

## Test for input coming in mixed format.
Consider creating a separate test case for each summary. It's a bit hard to follow all of these as-is.
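A hedged illustration of splitting the mixed test apart, one test method per summary format; `parse_summary` is a made-up stand-in for the parser under review, not its real API:

```python
import unittest

# Hypothetical stand-in for the parser under review: collapses
# whitespace in a summary string.
def parse_summary(text):
    return " ".join(text.split())

class TestSummaryFormats(unittest.TestCase):
    # One method per input format keeps each failure easy to attribute.
    def test_plain_format(self):
        self.assertEqual(parse_summary("hello world"), "hello world")

    def test_mixed_format(self):
        self.assertEqual(parse_summary("\thello\n world"), "hello world")
```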
dandhlee left a comment
Thanks for the review! Please take a look again :)
Just return summary directly?

Done! Updated to return summary directly.
Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.
Updated the parser in _extract_docstring_info to correctly parse for tokens by specifically looking for a strict match like :type and not fail on input like <xref:type_>.

As well, updated the extractor to handle different ordering of docstring tokens. While GoogleDocstring only returns them in a specific order, for example :param comes before :type and :returns: comes before :rtype:, handwritten libraries sometimes flip these, and I don't see in the Google docstrings page that they should always come in a specific order. The returns/rtype pair is the only set that the extractor trips on with different ordering; updated this bit.

Added unit tests for the above bits (and the function in general) as well.
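To illustrate the ordering issue described above, here is a hedged sketch; `extract_tokens` is a made-up helper showing the idea of order-insensitive field collection, not the extractor's actual implementation:

```python
# Canonical GoogleDocstring ordering: :returns: before :rtype:.
canonical = """
:returns: the parsed summary.
:rtype: str
"""

# Handwritten docstrings sometimes flip the pair, which a parser
# keyed on strict ordering trips on.
flipped = """
:rtype: str
:returns: the parsed summary.
"""

def extract_tokens(docstring):
    # Made-up helper: collect ":name: value" fields regardless of order.
    tokens = {}
    for line in docstring.splitlines():
        line = line.strip()
        if line.startswith(":"):
            name, _, value = line[1:].partition(":")
            tokens[name] = value.strip()
    return tokens
```

Parsing both forms into a dict makes the two orderings equivalent, which is the behaviour the PR aims for.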
Fixes #52