Conversation
Pull request overview
Adds pandas DataFrame/Series wrappers to the Dataverse Python SDK so callers can perform CRUD operations using DataFrame-native inputs/outputs, plus accompanying docs, examples, and tests.
Changes:
- Added `DataverseClient` DataFrame CRUD wrapper methods: `get_dataframe`, `create_dataframe`, `update_dataframe`, `delete_dataframe`.
- Added unit tests and an end-to-end example demonstrating DataFrame CRUD workflows.
- Updated docs/README and added `pandas` as a dependency.
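To make the shape of these wrappers concrete, here is a minimal offline sketch of what a `create_dataframe`-style method could do; the fake records API and the exact conversion logic are assumptions for illustration, not the SDK's actual implementation.

```python
import pandas as pd

class _FakeRecords:
    """Stand-in for the SDK's record API so the sketch runs offline."""
    def __init__(self):
        self.store = {}

    def create(self, table, payloads):
        ids = [f"id-{i}" for i in range(len(payloads))]
        self.store.update(dict(zip(ids, payloads)))
        return ids

def create_dataframe(records_api, table, df):
    # Convert each row to a plain dict, dropping missing values so
    # Dataverse fields are left untouched rather than nulled.
    payloads = [
        {k: v for k, v in row.items() if pd.notna(v)}
        for row in df.to_dict(orient="records")
    ]
    ids = records_api.create(table, payloads)
    # The returned Series is aligned with the input DataFrame's index.
    return pd.Series(ids, index=df.index, name="id")

api = _FakeRecords()
df = pd.DataFrame([{"name": "Contoso"}, {"name": "Fabrikam", "city": None}])
ids = create_dataframe(api, "account", df)
```

The key design point is that callers never build dict payloads by hand; missing cells are simply omitted.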
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/PowerPlatform/Dataverse/client.py | Implements DataFrame CRUD wrapper methods on DataverseClient. |
| tests/unit/test_client_dataframe.py | Adds unit coverage for DataFrame CRUD wrappers. |
| examples/advanced/dataframe_operations.py | Adds a walkthrough script showing DataFrame CRUD usage. |
| pyproject.toml | Adds pandas to project dependencies. |
| README.md | Documents DataFrame CRUD usage examples. |
| src/PowerPlatform/Dataverse/claude_skill/dataverse-sdk-use/SKILL.md | Documents DataFrame CRUD usage in the packaged skill doc. |
| .claude/skills/dataverse-sdk-use/SKILL.md | Documents DataFrame CRUD usage in the repo-local skill doc. |
| .gitignore | Ignores additional Claude local markdown files. |
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.
@zhaodongwang-msft I've opened a new pull request, #99, to work on those changes. Once the pull request is ready, I'll request review from you. |
@zhaodongwang-msft I've opened a new pull request, #100, to work on those changes. Once the pull request is ready, I'll request review from you. |
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.
We'll want to track two followups from this that depend on other refactor PRs, so we can't do them quite yet:
…lete input validation, export DataFrameOperations (#145)

Addresses four unresolved review comments from PR #98 against the `client.dataframe` namespace: a crash on array-valued cells, silent NumPy serialization failures, missing ID validation in `update()` and `delete()`, and missing exports/tests.

## `utils/_pandas.py`

- **Fix `pd.notna()` crash on array-like cells**: Guard with `pd.api.types.is_scalar(v)` before calling `pd.notna()`; non-scalar values (lists, dicts, numpy arrays) pass through directly. Previously raised `ValueError: The truth value of an array is ambiguous`.
- **Normalize NumPy scalar types**: New `_normalize_scalar(v)` helper converts `np.integer` → `int`, `np.floating` → `float`, `np.bool_` → `bool`, `pd.Timestamp` → ISO string. DataFrames with integer columns produce `np.int64` by default, which `json.dumps()` cannot serialize.

```python
# Before: would crash or produce non-serializable values
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time

# After: safe
[{"tags": ["a", "b"]}, {"count": 5}]
```

## `operations/dataframe.py`

- **`update()` — validate `id_column` values**: After extracting IDs, raises `ValueError` listing offending row indices if any value is not a non-empty string (catches `NaN`, `None`, numeric IDs).
- **`update()` — validate non-empty change columns**: Raises `ValueError` if the DataFrame contains only the `id_column` and no fields to update.
- **`delete()` — validate `ids` Series**: Returns `None` immediately for an empty Series; raises `ValueError` listing offending indices for any non-string or blank value.

## `operations/__init__.py`

- Exports `DataFrameOperations` so consumers can use it for type annotations.

## Tests

- `tests/unit/test_pandas_helpers.py` — 11 isolated tests for `dataframe_to_records()` covering NaN handling, NumPy type normalization, Timestamp conversion, list/dict passthrough, and empty input.
- `tests/unit/test_dataframe_operations.py` — 35 tests covering the full `DataFrameOperations` namespace, including all new validation paths.

<details>
<summary>Original prompt</summary>

## Context

This PR addresses unresolved review comments from PR #98 ("add dataframe methods") and adds comprehensive test coverage for the DataFrame operations namespace (`client.dataframe`). The base branch is `users/zhaodongwang/dataFrameExtensionClaude` which contains the current state of the DataFrame operations code from PR #98.

## Files to modify

### 1. `src/PowerPlatform/Dataverse/utils/_pandas.py`

Current code at the HEAD of the PR branch (`8838bb69533dd8830bac8724c44696771a6704e7`):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import pandas as pd


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.

    :param df: Input DataFrame.
    :param na_as_null: When False (default), missing values are omitted from each dict.
        When True, missing values are included as None (sends null to Dataverse,
        clearing the field).
    """
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
            elif na_as_null:
                clean[k] = None
        records.append(clean)
    return records
```

**Required changes:**

#### Fix A: `pd.notna()` crash on array-like values (unresolved comment #98 (comment))

`pd.notna(v)` raises `ValueError: The truth value of an array is ambiguous` when a cell contains a list, dict, numpy array, etc. Fix by guarding with `pd.api.types.is_scalar(v)`:

```python
for k, v in row.items():
    if pd.api.types.is_scalar(v):
        if pd.notna(v):
            clean[k] = _normalize_scalar(v)
        elif na_as_null:
            clean[k] = None
    else:
        clean[k] = v  # pass through lists, dicts, etc.
```

#### Fix B: NumPy scalar types not normalized (acknowledged but deferred by author in #98 (comment))

NumPy scalars (`np.int64`, `np.float64`, `np.bool_`) are NOT JSON-serializable by default `json.dumps()`. DataFrames with integer columns produce `np.int64` values.

Add a helper function `_normalize_scalar(v)` that:
- Converts `pd.Timestamp` to `.isoformat()`
- Converts `numpy.integer` types to Python `int`
- Converts `numpy.floating` types to Python `float`
- Converts `numpy.bool_` to Python `bool`
- Passes everything else through

Use `import numpy as np` and `isinstance` checks.

### 2. `src/PowerPlatform/Dataverse/operations/dataframe.py`

Current code at the HEAD of the PR branch:

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""DataFrame CRUD operations namespace for the Dataverse SDK."""

from __future__ import annotations

from typing import List, Optional, TYPE_CHECKING

import pandas as pd

from ..utils._pandas import dataframe_to_records

if TYPE_CHECKING:
    from ..client import DataverseClient

__all__ = ["DataFrameOperations"]


class DataFrameOperations:
    """Namespace for pandas DataFrame CRUD operations.
    ...
    """

    def __init__(self, client: DataverseClient) -> None:
        self._client = client

    def get(self, table, record_id=None, select=None, filter=None, orderby=None,
            top=None, expand=None, page_size=None) -> pd.DataFrame:
        # ... (current code)
        pass

    def create(self, table, records) -> pd.Series:
        # ... (current code with empty DataFrame check and ID count validation)
        pass

    def update(self, table, changes, id_column, clear_nulls=False) -> None:
        if not isinstance(changes, pd.DataFrame):
            raise TypeError("changes must be a pandas DataFrame")
        if id_column not in changes.columns:
            raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")
        ids = changes[id_column].tolist()
        change_columns = [column for column in changes.columns if column != id_column]
        change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)
        if len(ids) == 1:
            self._client.records.update(table, ids[0], change_list[0])
        else:
            self._client.records.update(table, ids, change_list)

    def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
        if not isinstance(ids, pd.Series):
            raise TypeError("ids must be a pandas Series")
        id_list = ids.tolist()
        if len(id_list) == 1:
            return self._client.records.delete(table, id_list[0])
        else:
            return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)
```

**Required changes:**

#### Fix C: Validate `id_column` values in...

</details>

*This pull request was created from Copilot chat.*

---

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
Force-pushed from 639f0dd to ca7fcad.
Adds a `client.dataframe` namespace with pandas DataFrame/Series wrappers for all CRUD operations (get, create, update, delete). Includes `dataframe_to_records` helper for NumPy/datetime type normalization, comprehensive test suite (103 tests across 3 files), basic CRUD example, and documentation updates.

Key features:
- get(): consolidates all pages into a single DataFrame
- create(): returns Series of GUIDs aligned with input index
- update(): clear_nulls parameter controls NaN/None handling
- delete(): supports bulk and sequential deletion
- Type normalization: np.int64/float64/bool_/ndarray, datetime types, Timestamps
- ID validation with whitespace stripping and index label error reporting
- pandas>=2.0.0 added as required dependency
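The type normalization described above can be consolidated into a runnable sketch. This mirrors the fixes discussed in PR #145 (scalar guard plus NumPy normalization); treat it as an illustration of the technique rather than the SDK's exact code.

```python
import json
import numpy as np
import pandas as pd

def _normalize_scalar(v):
    """Convert NumPy/pandas scalars to plain JSON-serializable Python types."""
    if isinstance(v, pd.Timestamp):
        return v.isoformat()
    if isinstance(v, np.integer):
        return int(v)
    if isinstance(v, np.floating):
        return float(v)
    if isinstance(v, np.bool_):
        return bool(v)
    return v

def dataframe_to_records(df, na_as_null=False):
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            # Guard: pd.notna() raises on array-like cells, so only call
            # it for scalars; lists/dicts/arrays pass through untouched.
            if pd.api.types.is_scalar(v):
                if pd.notna(v):
                    clean[k] = _normalize_scalar(v)
                elif na_as_null:
                    clean[k] = None
            else:
                clean[k] = v
        records.append(clean)
    return records

df = pd.DataFrame({
    "count": [np.int64(5)],
    "tags": [["a", "b"]],
    "when": [pd.Timestamp("2024-01-02")],
})
records = dataframe_to_records(df)
json.dumps(records)  # serializable after normalization
```

Without `_normalize_scalar`, the `np.int64` value would make `json.dumps()` raise `TypeError`.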
…ssment

Pro-dev quick start (prodev_quick_start.py):
- Create 4 custom tables with columns and relationships
- Populate via DataFrame CRUD, query, analyze, update, delete, cleanup
- Uses result.primary_name_attribute for correct column naming
- Includes pandas vs dict/list rationale

Data science risk assessment (datascience_risk_assessment.py):
- 5-step pipeline: Extract -> Analyze -> LLM Summarize -> Write-back -> Report
- Statistical risk scoring with pandas/numpy (groupby, merge, percentile)
- 3 LLM provider options: Azure AI Inference, OpenAI, GitHub Copilot SDK
- Template-based fallback when no LLM configured
- LLM interaction logging (prompts, responses, timing)
- Optional matplotlib visualization with chart export
- CSV data exports to script-relative output folder
Force-pushed from ca7fcad to a52b862.
SDK changes:
- _get_entity_by_table_schema_name selects PrimaryNameAttribute and PrimaryIdAttribute from EntityDefinitions
- _create_table and _get_table_info return primary_name_attribute
- TableInfo model includes primary_name_attribute and primary_id_attribute fields with legacy dict-key access
- Tests for from_dict, from_api_response, and legacy key access

This allows tables.create() and tables.get() callers to discover the actual primary column logical name instead of guessing from the prefix.
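The "legacy dict-key access" mentioned above can be sketched as a model that answers both attribute and subscript lookups; the field set and class layout here are assumptions for illustration, not the SDK's actual `TableInfo` definition.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableInfo:
    """Illustrative model: new fields plus backward-compatible dict access."""
    logical_name: str
    primary_name_attribute: Optional[str] = None
    primary_id_attribute: Optional[str] = None

    def __getitem__(self, key):
        # Legacy callers used info["primary_name_attribute"]-style access;
        # delegate subscript lookups to the dataclass attributes.
        return getattr(self, key)

info = TableInfo("new_project", "new_name", "new_projectid")
```

Both `info.primary_name_attribute` and `info["primary_name_attribute"]` resolve to the same value, so existing dict-based callers keep working.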
Force-pushed from a52b862 to 8a26fd7.
…rty names)

@odata.bind keys are case-sensitive and the SDK preserves their casing (it does not lowercase keys containing @OData.). The navigation property logical names must be lowercase to match Dataverse's schema.

Changed: build @odata.bind keys using .lower() on the lookup field name (e.g., 'new_demoprojectf523b5_customerid@odata.bind' instead of 'new_DemoProjectf523b5_CustomerId@odata.bind').
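The fix above amounts to lowercasing the lookup field name before building the bind key. A one-function sketch (the field name is the example from the commit, and the helper name is invented here):

```python
def odata_bind_key(lookup_field: str) -> str:
    # Navigation property logical names must be lowercase to match
    # Dataverse's schema; the SDK preserves key casing as given, so the
    # caller must lowercase before building the key.
    return f"{lookup_field.lower()}@odata.bind"

key = odata_bind_key("new_DemoProjectf523b5_CustomerId")
```

Note that only the navigation property portion is affected; the `@odata.bind` suffix itself is already lowercase by convention.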
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.
…x Copilot SDK package name

- get(): replaced per-page DataFrame concat with rows.extend() + from_records() for better performance on large paginated results
- create(): detect and raise ValueError for rows with all NaN/None values (empty dicts) before sending to the API
- Standardized Copilot SDK package name to 'github-copilot-sdk' everywhere
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.
…y select comment

- __init__.py: use typing.List[str] instead of list[str] for broader compat
- datascience: precompute weighted_value column before groupby (faster, cleaner)
- prodev: clarify that SDK auto-lowercases select values
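The "precompute before groupby" change can be illustrated with a small sketch; the column names are made up, since the example script's actual schema is not shown here.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west"],
    "value":  [10.0, 20.0, 30.0],
    "weight": [0.5, 1.0, 2.0],
})

# Precompute the weighted column once (vectorized), then aggregate,
# rather than applying a Python-level lambda inside each group.
df["weighted_value"] = df["value"] * df["weight"]
totals = df.groupby("region")["weighted_value"].sum()
```

The vectorized multiply runs once over the whole frame, so the groupby reduces to a plain column sum per group.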
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.
…LM logs by default, fix string concat

- prodev: use primary_id_attribute from tables.create() instead of guessing the ID column name (e.g., TABLE_TASKid)
- get(): return empty DataFrame with select columns (not columnless) for consistent downstream access on empty results
- LLM log: redact prompts/responses by default (include_prompts=False) to avoid logging sensitive data; metadata (timing, provider) always logged
- Fix implicit string concatenation in error messages
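The empty-result behavior described above is simple to sketch: when a query with `select` matches nothing, return a DataFrame that still carries those columns, so downstream indexing doesn't raise. The helper name is invented for illustration.

```python
import pandas as pd

def empty_result(select):
    # An empty but column-bearing frame keeps downstream code uniform:
    # df["name"] works whether or not any rows came back.
    return pd.DataFrame(columns=select or [])

df = empty_result(["name", "revenue"])
names = df["name"]  # no KeyError on empty results
```

Without the `columns=` argument, a bare `pd.DataFrame()` would have no columns at all and `df["name"]` would raise `KeyError`.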
… cleanup

- test_get_no_results_with_select_preserves_columns: verifies empty result with select returns DataFrame with expected column names
- test_create_all_nan_rows_raises: verifies all-NaN rows raise ValueError
- prodev: cleanup now prompts user (Y/n) before deleting demo tables, matching the pattern in alternate_keys_upsert.py
Summary
Adds a `client.dataframe` namespace with pandas DataFrame/Series wrappers for all CRUD operations, plus two advanced example scripts, and a minor SDK enhancement for table metadata. Users can now query, create, update, and delete Dataverse records using DataFrame-native inputs and outputs -- no manual dict conversion required.

Quick Example
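The original quick-example snippet did not survive extraction; the following is a reconstruction of the described workflow. The `FakeClient` below is a stand-in so the sketch runs offline -- real code would construct a `DataverseClient` -- and the table/column names are illustrative.

```python
import pandas as pd

class FakeDataFrameOps:
    """Offline stand-in for client.dataframe, mimicking the described API."""
    def __init__(self):
        self._rows = []

    def create(self, table, df):
        self._rows = df.to_dict(orient="records")
        # Series of GUID-like IDs aligned with the input index.
        return pd.Series([f"guid-{i}" for i in range(len(df))], index=df.index)

    def get(self, table, select=None):
        return pd.DataFrame(self._rows, columns=select)

class FakeClient:
    def __init__(self):
        self.dataframe = FakeDataFrameOps()

client = FakeClient()  # real code: DataverseClient(...)

df = pd.DataFrame([
    {"name": "Contoso", "revenue": 1.0},
    {"name": "Fabrikam", "revenue": 2.0},
])
ids = client.dataframe.create("account", df)               # Series of GUIDs
result = client.dataframe.get("account", select=["name"])  # DataFrame back
```

The round trip is DataFrame in, DataFrame (or Series of IDs) out, with no hand-built dict payloads.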
Changes
DataFrame CRUD (`client.dataframe` namespace)
- `src/.../operations/dataframe.py` -- `DataFrameOperations` class: `get()`, `create()`, `update()`, `delete()`
- `src/.../utils/_pandas.py` -- `dataframe_to_records()` helper -- normalizes NumPy, datetime, NaN/None
- `client.py` -- `self.dataframe = DataFrameOperations(self)`
- `pyproject.toml` -- `pandas>=2.0.0` required dependency
- `README.md`
- `operations/__init__.py` -- exports added to `__all__`

SDK Enhancement: TableInfo primary column metadata (fixes #148)
- `src/.../data/_odata.py` -- `_get_entity_by_table_schema_name()` and `_get_table_info()` now select `PrimaryNameAttribute` and `PrimaryIdAttribute` from EntityDefinitions
- `src/.../models/table_info.py` -- `TableInfo` includes `primary_name_attribute` and `primary_id_attribute` fields
- `tests/unit/models/test_table_info.py` -- covers `from_dict`, `from_api_response`, and legacy key access

Advanced Examples
- `examples/advanced/dataframe_operations.py`
- `examples/advanced/prodev_quick_start.py` -- uses `result.primary_name_attribute` from `tables.create()`
- `examples/advanced/datascience_risk_assessment.py`

Test Files
- `test_dataframe_operations.py`
- `test_client_dataframe.py`
- `test_pandas_helpers.py`
- `test_table_info.py`

API Design
| Method | Input | Returns | Underlying API |
|---|---|---|---|
| `get(table, ...)` | | `pd.DataFrame` | `records.get()` |
| `get(table, record_id=...)` | | `pd.DataFrame` | `records.get()` |
| `create(table, df)` | `pd.DataFrame` | `pd.Series` of GUIDs | `CreateMultiple` |
| `update(table, df, id_column)` | `pd.DataFrame` | `None` | `UpdateMultiple` |
| `delete(table, ids)` | `pd.Series` | `Optional[str]` | `BulkDelete` |
- `clear_nulls`: Default `False` skips NaN (field unchanged). `True` sends null to clear.

Test Results
Issues Addressed
- `tables.create()` now exposes `primary_name_attribute` via Dataverse metadata
- `QueryBuilder.to_dataframe()` tracked for future work