
Add client.dataframe namespace for pandas DataFrame CRUD operations#98

Merged
saurabhrb merged 8 commits into main from
users/zhaodongwang/dataFrameExtensionClaude
Mar 17, 2026

Conversation


@zhaodongwang-msft commented Feb 11, 2026

Summary

Adds a client.dataframe namespace with pandas DataFrame/Series wrappers for all CRUD operations, plus two advanced example scripts, and a minor SDK enhancement for table metadata. Users can now query, create, update, and delete Dataverse records using DataFrame-native inputs and outputs -- no manual dict conversion required.

Quick Example

import pandas as pd
from azure.identity import InteractiveBrowserCredential
from PowerPlatform.Dataverse.client import DataverseClient

credential = InteractiveBrowserCredential()

with DataverseClient("https://yourorg.crm.dynamics.com", credential) as client:

    # Query records as a DataFrame (all pages consolidated automatically)
    df = client.dataframe.get("account", select=["name", "telephone1"], top=5)

    # Create records from a DataFrame (returns Series of GUIDs)
    new_records = pd.DataFrame([
        {"name": "Acme Corp", "telephone1": "555-9000"},
        {"name": "Globex Inc", "telephone1": "555-9001"},
    ])
    new_records["accountid"] = client.dataframe.create("account", new_records)

    # Update records (NaN/None skipped by default; use clear_nulls=True to clear fields)
    new_records["telephone1"] = ["555-1111", "555-2222"]
    client.dataframe.update("account", new_records[["accountid", "telephone1"]], id_column="accountid")

    # Delete records
    client.dataframe.delete("account", new_records["accountid"])

Changes

DataFrame CRUD (client.dataframe namespace)

| File | Description |
| --- | --- |
| src/.../operations/dataframe.py | `DataFrameOperations` class: `get()`, `create()`, `update()`, `delete()` |
| src/.../utils/_pandas.py | `dataframe_to_records()` helper -- normalizes NumPy, datetime, NaN/None |
| client.py | Added `self.dataframe = DataFrameOperations(self)` |
| pyproject.toml | Added `pandas>=2.0.0` as a required dependency |
| README.md | DataFrame usage examples |
| operations/__init__.py | Cleanup (`__all__ = []`) |

SDK Enhancement: TableInfo primary column metadata (fixes #148)

| File | Description |
| --- | --- |
| src/.../data/_odata.py | `_get_entity_by_table_schema_name()` and `_get_table_info()` now select `PrimaryNameAttribute` and `PrimaryIdAttribute` from `EntityDefinitions` |
| src/.../models/table_info.py | `TableInfo` includes `primary_name_attribute` and `primary_id_attribute` fields |
| tests/unit/models/test_table_info.py | Tests for the new fields in `from_dict`, `from_api_response`, and legacy key access |

Advanced Examples

| File | Description |
| --- | --- |
| examples/advanced/dataframe_operations.py | DataFrame CRUD walkthrough |
| examples/advanced/prodev_quick_start.py | Pro-dev: 4-table system with relationships, DataFrame CRUD, query/analyze. Uses `result.primary_name_attribute` from `tables.create()` |
| examples/advanced/datascience_risk_assessment.py | Data science: 5-step risk pipeline with 3 LLM provider options (Azure AI Inference, OpenAI, GitHub Copilot SDK), matplotlib charts |

Test Files

| File | Tests |
| --- | --- |
| test_dataframe_operations.py | 44 |
| test_client_dataframe.py | 26 |
| test_pandas_helpers.py | 33 |
| test_table_info.py | +1 (primary fields) |

API Design

| Method | Input | Output | Underlying API |
| --- | --- | --- | --- |
| `get(table, ...)` | OData params | `pd.DataFrame` | `records.get()` |
| `get(table, record_id=...)` | GUID | 1-row `pd.DataFrame` | `records.get()` |
| `create(table, df)` | `pd.DataFrame` | `pd.Series` of GUIDs | `CreateMultiple` |
| `update(table, df, id_column)` | `pd.DataFrame` | `None` | `UpdateMultiple` |
| `delete(table, ids)` | `pd.Series` | `Optional[str]` | `BulkDelete` |

Design Decisions

  • clear_nulls: Default False skips NaN (field unchanged). True sends null to clear.
  • Type normalization: np.int64/float64/bool_/ndarray, datetime/date/np.datetime64, pd.Timestamp -- all auto-converted.
  • ID validation: Strip whitespace, report DataFrame index labels in errors.
  • pandas required: Core dependency by team decision.
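The `clear_nulls` behavior can be shown with a minimal sketch (`to_records` here is an illustrative stand-in, not the SDK's internal helper):

```python
import pandas as pd

# Illustrative stand-in (not the SDK's internal helper) showing the
# clear_nulls semantics: missing values are skipped by default, or sent
# as explicit nulls when clear_nulls=True.
def to_records(df, clear_nulls=False):
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v
            elif clear_nulls:
                clean[k] = None  # explicit null clears the field in Dataverse
        records.append(clean)
    return records

df = pd.DataFrame([{"name": "Acme", "telephone1": None}])
print(to_records(df))                    # [{'name': 'Acme'}]  (field left unchanged)
print(to_records(df, clear_nulls=True))  # [{'name': 'Acme', 'telephone1': None}]
```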

Test Results

396 passed, 8 warnings (pre-existing deprecation), 4 subtests passed

| Check | Result |
| --- | --- |
| Full test suite | 396 pass, 0 fail |
| mypy | 0 errors |
| black / isort | Clean |
| E2E prodev | PASS (4 tables, 3 relationships, 13 records, full CRUD cycle) |
| E2E datascience | PASS (7 accounts, 3 cases, 8 opportunities, risk scoring, charts) |
| PR review threads | 54/54 resolved |

Issues Addressed

@zhaodongwang-msft requested a review from a team as a code owner on February 11, 2026 18:32
Copilot AI review requested due to automatic review settings February 11, 2026 18:32
Copilot AI left a comment

Pull request overview

Adds pandas DataFrame/Series wrappers to the Dataverse Python SDK so callers can perform CRUD operations using DataFrame-native inputs/outputs, plus accompanying docs, examples, and tests.

Changes:

  • Added DataverseClient DataFrame CRUD wrapper methods: get_dataframe, create_dataframe, update_dataframe, delete_dataframe.
  • Added unit tests and end-to-end example demonstrating DataFrame CRUD workflows.
  • Updated docs/README and added pandas as a dependency.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

Summary per file:

| File | Description |
| --- | --- |
| src/PowerPlatform/Dataverse/client.py | Implements DataFrame CRUD wrapper methods on DataverseClient. |
| tests/unit/test_client_dataframe.py | Adds unit coverage for DataFrame CRUD wrappers. |
| examples/advanced/dataframe_operations.py | Adds a walkthrough script showing DataFrame CRUD usage. |
| pyproject.toml | Adds pandas to project dependencies. |
| README.md | Documents DataFrame CRUD usage examples. |
| src/PowerPlatform/Dataverse/claude_skill/dataverse-sdk-use/SKILL.md | Documents DataFrame CRUD usage in the packaged skill doc. |
| .claude/skills/dataverse-sdk-use/SKILL.md | Documents DataFrame CRUD usage in the repo-local skill doc. |
| .gitignore | Ignores additional Claude local markdown files. |


Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.




Copilot AI commented Feb 12, 2026

@zhaodongwang-msft I've opened a new pull request, #99, to work on those changes. Once the pull request is ready, I'll request review from you.


Copilot AI commented Feb 12, 2026

@zhaodongwang-msft I've opened a new pull request, #100, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.



@tpellissier-msft

We'll want to track two follow-ups from this that depend on other refactor PRs, so we can't do them quite yet:

  1. Moving the new client methods to a client.dataframe namespace
  2. Adding QueryBuilder.to_dataframe to connect the DataFrame scenario to the fluent syntax / QueryBuilder changes.

saurabhrb added a commit that referenced this pull request Mar 17, 2026
…lete input validation, export DataFrameOperations (#145)

Addresses four unresolved review comments from PR #98 against the
`client.dataframe` namespace: a crash on array-valued cells, silent
NumPy serialization failures, missing ID validation in `update()` and
`delete()`, and missing exports/tests.

## `utils/_pandas.py`

- **Fix `pd.notna()` crash on array-like cells**: Guard with
`pd.api.types.is_scalar(v)` before calling `pd.notna()`; non-scalar
values (lists, dicts, numpy arrays) pass through directly. Previously
raised `ValueError: The truth value of an array is ambiguous`.
- **Normalize NumPy scalar types**: New `_normalize_scalar(v)` helper
converts `np.integer` → `int`, `np.floating` → `float`, `np.bool_` →
`bool`, `pd.Timestamp` → ISO string. DataFrames with integer columns
produce `np.int64` by default, which `json.dumps()` cannot serialize.

```python
# Before: would crash or produce non-serializable values
df = pd.DataFrame([{"tags": ["a", "b"]}, {"count": np.int64(5)}])
dataframe_to_records(df)  # ValueError / TypeError at serialization time

# After: safe
dataframe_to_records(df)  # -> [{"tags": ["a", "b"]}, {"count": 5}]
```

## `operations/dataframe.py`

- **`update()` — validate `id_column` values**: After extracting IDs,
raises `ValueError` listing offending row indices if any value is not a
non-empty string (catches `NaN`, `None`, numeric IDs).
- **`update()` — validate non-empty change columns**: Raises
`ValueError` if the DataFrame contains only the `id_column` and no
fields to update.
- **`delete()` — validate `ids` Series**: Returns `None` immediately for
an empty Series; raises `ValueError` listing offending indices for any
non-string or blank value.
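The validation pattern these bullets describe can be sketched as follows (`validate_ids` is an illustrative name, not the SDK's actual helper; it assumes IDs must be non-empty strings):

```python
import pandas as pd

# Illustrative sketch: collect the DataFrame index labels of rows whose
# ID is not a non-empty string, then raise a ValueError naming them.
def validate_ids(ids: pd.Series) -> list:
    bad = [label for label, v in ids.items()
           if not (isinstance(v, str) and v.strip())]
    if bad:
        raise ValueError(f"Invalid IDs at row index labels: {bad}")
    return [v.strip() for v in ids]

ids = pd.Series(["guid-1", "  ", None], index=["a", "b", "c"])
try:
    validate_ids(ids)
except ValueError as exc:
    print(exc)  # Invalid IDs at row index labels: ['b', 'c']
```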

## `operations/__init__.py`

- Exports `DataFrameOperations` so consumers can use it for type
annotations.

## Tests

- `tests/unit/test_pandas_helpers.py` — 11 isolated tests for
`dataframe_to_records()` covering NaN handling, NumPy type
normalization, Timestamp conversion, list/dict passthrough, and empty
input.
- `tests/unit/test_dataframe_operations.py` — 35 tests covering the full
`DataFrameOperations` namespace, including all new validation paths.




<details>

<summary>Original prompt</summary>


## Context

This PR addresses unresolved review comments from PR #98 ("add dataframe
methods") and adds comprehensive test coverage for the DataFrame
operations namespace (`client.dataframe`).

The base branch is `users/zhaodongwang/dataFrameExtensionClaude` which
contains the current state of the DataFrame operations code from PR #98.

## Files to modify

### 1. `src/PowerPlatform/Dataverse/utils/_pandas.py`

Current code at the HEAD of the PR branch
(`8838bb69533dd8830bac8724c44696771a6704e7`):

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""Internal pandas helpers"""

from __future__ import annotations

from typing import Any, Dict, List

import pandas as pd


def dataframe_to_records(df: pd.DataFrame, na_as_null: bool = False) -> List[Dict[str, Any]]:
    """Convert a DataFrame to a list of dicts, converting Timestamps to ISO strings.

    :param df: Input DataFrame.
    :param na_as_null: When False (default), missing values are omitted from each dict.
        When True, missing values are included as None (sends null to Dataverse, clearing the field).
    """
    records = []
    for row in df.to_dict(orient="records"):
        clean = {}
        for k, v in row.items():
            if pd.notna(v):
                clean[k] = v.isoformat() if isinstance(v, pd.Timestamp) else v
            elif na_as_null:
                clean[k] = None
        records.append(clean)
    return records
```

**Required changes:**

#### Fix A: `pd.notna()` crash on array-like values (unresolved comment
#98 (comment))

`pd.notna(v)` raises `ValueError: The truth value of an array is
ambiguous` when a cell contains a list, dict, numpy array, etc. Fix by
guarding with `pd.api.types.is_scalar(v)`:

```python
for k, v in row.items():
    if pd.api.types.is_scalar(v):
        if pd.notna(v):
            clean[k] = _normalize_scalar(v)
        elif na_as_null:
            clean[k] = None
    else:
        clean[k] = v  # pass through lists, dicts, etc.
```

#### Fix B: NumPy scalar types not normalized (acknowledged but deferred
by author in
#98 (comment))

NumPy scalars (`np.int64`, `np.float64`, `np.bool_`) are NOT
JSON-serializable by default `json.dumps()`. DataFrames with integer
columns produce `np.int64` values. Add a helper function
`_normalize_scalar(v)` that:
- Converts `pd.Timestamp` to `.isoformat()`
- Converts `numpy.integer` types to Python `int`
- Converts `numpy.floating` types to Python `float`
- Converts `numpy.bool_` to Python `bool`
- Passes everything else through

Use `import numpy as np` and `isinstance` checks.

### 2. `src/PowerPlatform/Dataverse/operations/dataframe.py`

Current code at the HEAD of the PR branch:

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

"""DataFrame CRUD operations namespace for the Dataverse SDK."""

from __future__ import annotations

from typing import List, Optional, TYPE_CHECKING

import pandas as pd

from ..utils._pandas import dataframe_to_records

if TYPE_CHECKING:
    from ..client import DataverseClient


__all__ = ["DataFrameOperations"]


class DataFrameOperations:
    """Namespace for pandas DataFrame CRUD operations.
    ...
    """

    def __init__(self, client: DataverseClient) -> None:
        self._client = client

    def get(self, table, record_id=None, select=None, filter=None, orderby=None, top=None, expand=None, page_size=None) -> pd.DataFrame:
        # ... (current code)
        pass

    def create(self, table, records) -> pd.Series:
        # ... (current code with empty DataFrame check and ID count validation)
        pass

    def update(self, table, changes, id_column, clear_nulls=False) -> None:
        if not isinstance(changes, pd.DataFrame):
            raise TypeError("changes must be a pandas DataFrame")
        if id_column not in changes.columns:
            raise ValueError(f"id_column '{id_column}' not found in DataFrame columns")

        ids = changes[id_column].tolist()
        change_columns = [column for column in changes.columns if column != id_column]
        change_list = dataframe_to_records(changes[change_columns], na_as_null=clear_nulls)

        if len(ids) == 1:
            self._client.records.update(table, ids[0], change_list[0])
        else:
            self._client.records.update(table, ids, change_list)

    def delete(self, table, ids, use_bulk_delete=True) -> Optional[str]:
        if not isinstance(ids, pd.Series):
            raise TypeError("ids must be a pandas Series")

        id_list = ids.tolist()
        if len(id_list) == 1:
            return self._client.records.delete(table, id_list[0])
        else:
            return self._client.records.delete(table, id_list, use_bulk_delete=use_bulk_delete)
```

**Required changes:**

#### Fix C: Validate `id_column` values in...

</details>



*This pull request was created from Copilot chat.*

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
@saurabhrb force-pushed the users/zhaodongwang/dataFrameExtensionClaude branch from 639f0dd to ca7fcad on March 17, 2026 17:29
Saurabh Badenkal added 2 commits March 17, 2026 10:33
Adds a client.dataframe namespace with pandas DataFrame/Series wrappers for
all CRUD operations (get, create, update, delete). Includes dataframe_to_records
helper for NumPy/datetime type normalization, comprehensive test suite (103 tests
across 3 files), basic CRUD example, and documentation updates.

Key features:
- get(): consolidates all pages into single DataFrame
- create(): returns Series of GUIDs aligned with input index
- update(): clear_nulls parameter controls NaN/None handling
- delete(): supports bulk and sequential deletion
- Type normalization: np.int64/float64/bool_/ndarray, datetime types, Timestamps
- ID validation with whitespace stripping and index label error reporting
- pandas>=2.0.0 added as required dependency
…ssment

Pro-dev quick start (prodev_quick_start.py):
- Create 4 custom tables with columns and relationships
- Populate via DataFrame CRUD, query, analyze, update, delete, cleanup
- Uses result.primary_name_attribute for correct column naming
- Includes pandas vs dict/list rationale

Data science risk assessment (datascience_risk_assessment.py):
- 5-step pipeline: Extract -> Analyze -> LLM Summarize -> Write-back -> Report
- Statistical risk scoring with pandas/numpy (groupby, merge, percentile)
- 3 LLM provider options: Azure AI Inference, OpenAI, GitHub Copilot SDK
- Template-based fallback when no LLM configured
- LLM interaction logging (prompts, responses, timing)
- Optional matplotlib visualization with chart export
- CSV data exports to script-relative output folder
@saurabhrb force-pushed the users/zhaodongwang/dataFrameExtensionClaude branch from ca7fcad to a52b862 on March 17, 2026 17:34
@saurabhrb requested a review from Copilot on March 17, 2026 17:40
SDK changes:
- _get_entity_by_table_schema_name selects PrimaryNameAttribute and
  PrimaryIdAttribute from EntityDefinitions
- _create_table and _get_table_info return primary_name_attribute
- TableInfo model includes primary_name_attribute and primary_id_attribute
  fields with legacy dict-key access
- Tests for from_dict, from_api_response, and legacy key access

This allows tables.create() and tables.get() callers to discover the
actual primary column logical name instead of guessing from the prefix.
@saurabhrb force-pushed the users/zhaodongwang/dataFrameExtensionClaude branch from a52b862 to 8a26fd7 on March 17, 2026 18:05
@saurabhrb requested review from Copilot and removed the request for Copilot on March 17, 2026 18:16
…rty names)

@odata.bind keys are case-sensitive and the SDK preserves their casing
(does not lowercase keys containing @OData.). The navigation property
logical names must be lowercase to match Dataverse's schema.

Changed: build @odata.bind keys using .lower() on the lookup field name
(e.g., 'new_demoprojectf523b5_customerid@odata.bind' instead of
'new_DemoProjectf523b5_CustomerId@odata.bind').
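A small sketch of the described key construction (the table and field names are example values from the message above):

```python
# Sketch: lowercase the lookup field's logical name when building the
# @odata.bind key, since navigation-property names are case-sensitive
# in Dataverse and must be lowercase to match the schema.
def bind_key(lookup_field: str) -> str:
    return f"{lookup_field.lower()}@odata.bind"

key = bind_key("new_DemoProjectf523b5_CustomerId")
print(key)  # new_demoprojectf523b5_customerid@odata.bind
```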

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.



…x Copilot SDK package name

- get(): replaced per-page DataFrame concat with rows.extend() + from_records()
  for better performance on large paginated results
- create(): detect and raise ValueError for rows with all NaN/None values
  (empty dicts) before sending to the API
- Standardized Copilot SDK package name to 'github-copilot-sdk' everywhere
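The `get()` change in the first bullet can be sketched as follows (`pages()` is a stand-in for the SDK's page iterator, not its actual API):

```python
import pandas as pd

def pages():  # stand-in for the SDK's paginated record iterator
    yield [{"name": "Acme"}, {"name": "Globex"}]
    yield [{"name": "Initech"}]

# Accumulate plain row dicts across pages, then build one DataFrame at
# the end -- cheaper than concatenating a per-page DataFrame on every
# iteration, which copies all prior rows each time.
rows = []
for page in pages():
    rows.extend(page)
df = pd.DataFrame.from_records(rows)
print(len(df))  # 3
```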
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 3 comments.



…y select comment

- __init__.py: use typing.List[str] instead of list[str] for broader compat
- datascience: precompute weighted_value column before groupby (faster, cleaner)
- prodev: clarify that SDK auto-lowercases select values
@saurabhrb requested a review from Copilot on March 17, 2026 18:37
Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 4 comments.



Saurabh Badenkal added 2 commits March 17, 2026 11:48
…LM logs by default, fix string concat

- prodev: use primary_id_attribute from tables.create() instead of guessing
  the ID column name (e.g., TABLE_TASKid)
- get(): return empty DataFrame with select columns (not columnless) for
  consistent downstream access on empty results
- LLM log: redact prompts/responses by default (include_prompts=False) to
  avoid logging sensitive data; metadata (timing, provider) always logged
- Fix implicit string concatenation in error messages
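The empty-result behavior in the second bullet can be sketched as follows (illustrative helper name, not the SDK's exact code):

```python
import pandas as pd

# Sketch: when a query returns no rows but a select list was given,
# return an empty DataFrame that still carries the requested columns,
# so downstream column access works uniformly on empty results.
def records_to_dataframe(rows, select=None):
    if not rows and select:
        return pd.DataFrame(columns=select)
    return pd.DataFrame.from_records(rows)

df = records_to_dataframe([], select=["name", "telephone1"])
print(list(df.columns))  # ['name', 'telephone1']
print(df.empty)          # True
```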
… cleanup

- test_get_no_results_with_select_preserves_columns: verifies empty result
  with select returns DataFrame with expected column names
- test_create_all_nan_rows_raises: verifies all-NaN rows raise ValueError
- prodev: cleanup now prompts user (Y/n) before deleting demo tables,
  matching the pattern in alternate_keys_upsert.py
@saurabhrb merged commit 8a6ef8c into main on Mar 17, 2026
9 checks passed
@saurabhrb deleted the users/zhaodongwang/dataFrameExtensionClaude branch on March 17, 2026 19:09


Development

Successfully merging this pull request may close these issues.

tables.create() does not expose primary column logical name, causing CRUD payload confusion

6 participants