fix: update parser to correctly parse desired tokens #55
Conversation
# Adding the extra space for non-colon ending types
# helps determine if we simply ran into desired occurrence
# or if we ran into a similar looking syntax but shouldn't
# parse upon it.
Minor: extra space at the beginning of these comments?
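The "extra space" trick the quoted comment describes can be sketched in isolation. Note the token and both sample lines below are hypothetical illustrations, not taken from the parser itself:

```python
# Hedged sketch of the "extra space" idea from the quoted comment:
# appending a space to a non-colon-ending token distinguishes a real
# docstring field from look-alike syntax such as an xref.
token = ":type"

real_field = ":type arg: str"            # a genuine docstring token
lookalike = "see <xref:type_> for info"  # similar syntax, should not match

print((token + " ") in real_field)   # the real field matches
print((token + " ") in lookalike)    # the xref does not
```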
# Store the top summary separately.
if index == 0:
    top_summary = summary
Can we return at this point and avoid needing the else (and the indentation that comes with it)?
Just return summary directly?
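The suggested refactor might look like this minimal sketch; `pick_summary` and the surrounding logic are hypothetical stand-ins for the code under review:

```python
# Hypothetical sketch of the early-return suggestion: returning the
# summary directly when index == 0 removes the top_summary variable
# and the else branch (and the indentation that comes with it).
def pick_summary(index, summary):
    if index == 0:
        # The top summary can be returned as-is, no else needed.
        return summary
    # Later entries continue through the rest of the parsing logic.
    return summary.strip()
```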
parsed_text = parsed_text[index:]

# Clean up whitespace and other characters
parsed_text = " ".join(filter(None, re.split(r'\n| |\|\s', parsed_text))).split(" ")
Missed this before -- why do we need \n and the literal space in addition to \s?
I'm not sure I understand " ".join(stuff).split(" "). Isn't that the same as stuff?
The order seemed to have mattered; no need for \n and the literal space if I put \s in front. Fixed it.
For filter, it is slightly different:
list(filter(...)) simply turns the filter object into a list, while " ".join(filter(...)) transforms the filter object further.
For example:
>>> list(filter(None, re.split(r'\|\s', f_line)))
['\thello.\n world.']
>>> " ".join(filter(None, re.split(r'\|\s', f_line))).split()
['hello.', 'world.']
If stuff items contain spaces, the resulting stuff is not the same as the original one.
>>> stuff = ["one two", "three four"]
>>> " ".join(stuff).split(" ")
['one', 'two', 'three', 'four']

(no idea what the line does, though, not familiar with the extension 😄)
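The join/split behaviour being discussed can be reproduced in a small standalone snippet; `f_line` here is a made-up input, not the actual parser data:

```python
import re

# Made-up sample fragment with a "| " separator and mixed whitespace.
f_line = "\thello.| world.\n"

# Splitting on "| " and dropping empties keeps internal whitespace:
kept = list(filter(None, re.split(r'\|\s', f_line)))
# kept == ['\thello.', 'world.\n']

# Joining and re-splitting with no argument collapses ALL whitespace,
# which is why the join/split pair is not a no-op here:
tokens = " ".join(filter(None, re.split(r'\|\s', f_line))).split()
# tokens == ['hello.', 'world.']
```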
self.assertEqual(summary_info1_got, summary_info1_want)

## Test for input coming in mixed format.
Consider creating a separate test case for each summary. It's a bit hard to follow all of these as-is.
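A hedged illustration of splitting the mixed test apart, one test method per summary format; `parse_summary` is a made-up stand-in for the parser under review, not its real API:

```python
import unittest

# Hypothetical stand-in for the parser under review: collapses
# whitespace in a summary string.
def parse_summary(text):
    return " ".join(text.split())

class TestSummaryFormats(unittest.TestCase):
    # One method per input format keeps each failure easy to attribute.
    def test_plain_format(self):
        self.assertEqual(parse_summary("hello world"), "hello world")

    def test_mixed_format(self):
        self.assertEqual(parse_summary("\thello\n world"), "hello world")
```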
dandhlee left a comment
Thanks for the review! Please take a look again :)
Just return summary directly?

Done! Updated to return summary directly.
Before you open a pull request, note that this repository is forked from here.
Unless the issue you're trying to solve is unique to this specific repository,
please file an issue and/or send changes upstream to the original as well.
Updated the parser in _extract_docstring_info to correctly parse for tokens by specifically looking for a strict match like :type and not fail on input like <xref:type_>.

As well, updated the extractor to handle different ordering of docstring tokens. While GoogleDocstring only returns them in a specific order, for example :param comes before :type and :returns: comes before :rtype:, handwritten libraries sometimes flip these, and I don't see in the Google docstrings page that they should always come in a specific order. The returns/rtype pair is the only set that the extractor trips on with different ordering; updated this bit.

Added unit tests for the above bits (and the function in general) as well.
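To illustrate the ordering issue described above, here is a hedged sketch; `extract_tokens` is a made-up helper showing the idea of order-insensitive field collection, not the extractor's actual implementation:

```python
# Canonical GoogleDocstring ordering: :returns: before :rtype:.
canonical = """
:returns: the parsed summary.
:rtype: str
"""

# Handwritten docstrings sometimes flip the pair, which a parser
# keyed on strict ordering trips on.
flipped = """
:rtype: str
:returns: the parsed summary.
"""

def extract_tokens(docstring):
    # Made-up helper: collect ":name: value" fields regardless of order.
    tokens = {}
    for line in docstring.splitlines():
        line = line.strip()
        if line.startswith(":"):
            name, _, value = line[1:].partition(":")
            tokens[name] = value.strip()
    return tokens
```

Parsing both forms into a dict makes the two orderings equivalent, which is the behaviour the PR aims for.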
Fixes #52