Skip to content
This repository was archived by the owner on Mar 26, 2026. It is now read-only.

fix: update parser to correctly parse desired tokens#55

Merged
dandhlee merged 6 commits intomasterfrom
fix_parser
Jun 24, 2021
Merged

fix: update parser to correctly parse desired tokens#55
dandhlee merged 6 commits intomasterfrom
fix_parser

Conversation

@dandhlee
Copy link
Copy Markdown
Collaborator

Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.


Updated the parser in _extract_docstring_info to correctly parse for tokens by specifically looking for strict match like :type and not fail on input like <xref:type_>.

As well, updated the extractor to handle different ordering of docstring tokens. While GoogleDocstring only returns in specific order, for example :param comes before :type and :returns: comes before :rtype: but handwritten libraries sometimes flip these, and I don't see in Google Docstrings page that it should always come in specific order. Returns type is the only set that the extractor trips on different ordering, updated this bit.

Adding unit tests for the above bits (and the function in general) as well.

Fixes #52

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to README are included in PR

@dandhlee dandhlee requested a review from a team June 23, 2021 07:13
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 23, 2021
Comment thread docfx_yaml/extension.py Outdated
Comment on lines +296 to +299
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: extra space at the beginning of these comments?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread docfx_yaml/extension.py Outdated

# Store the top summary separately.
if index == 0:
top_summary = summary
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return at this point and avoid needing the else (and the indentation that comes with it)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return summary directly?

Comment thread docfx_yaml/extension.py Outdated
parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this before -- why do we need \n and in addition to \s?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand " ".join(stuff).split(" "). Isn't that the same as stuff?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order seemed to have mattered, no need for \n and if I put \s in front. Fixed it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for filter, it is slightly different.
list(filter(...)) simply turns the filtered object into a list, while " ".join(filter(...)) transforms the filter object further.
For example:

>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If stuff items contain spaces, the resulting stuff is not the same as the original one.

>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']

(no idea what the line does, though, not familiar with the extension 😄 )

Comment thread tests/test_unit.py
self.assertEqual(summary_info1_got, summary_info1_want)


## Test for input coming in mixed format.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating a separate test case for each summary. It's a bit hard to follow all of these as-is.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Copy Markdown
Collaborator Author

@dandhlee dandhlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Please take a look again :)

Comment thread docfx_yaml/extension.py Outdated
Comment on lines +296 to +299
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread docfx_yaml/extension.py Outdated

# Store the top summary separately.
if index == 0:
top_summary = summary
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment thread docfx_yaml/extension.py Outdated
parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order seemed to have mattered, no need for \n and if I put \s in front. Fixed it.

Comment thread docfx_yaml/extension.py Outdated
parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for filter, it is slightly different.
list(filter(...)) simply turns the filtered object into a list, while " ".join(filter(...)) transforms the filter object further.
For example:

>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']

Comment thread tests/test_unit.py
self.assertEqual(summary_info1_got, summary_info1_want)


## Test for input coming in mixed format.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@dandhlee dandhlee requested a review from tbpg June 23, 2021 18:55
Comment thread docfx_yaml/extension.py Outdated

# Store the top summary separately.
if index == 0:
top_summary = summary
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return summary directly?

@dandhlee
Copy link
Copy Markdown
Collaborator Author

done! Updated to return summary directly.

@dandhlee dandhlee merged commit d1e18c7 into master Jun 24, 2021
@dandhlee dandhlee deleted the fix_parser branch June 24, 2021 18:10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

plugin parses docstring differently than expected

3 participants