I "discovered" some issues when implementing the happybase functionality on top of the Bigtable API. (I put discovered in quotes, because some of the issues may just be that I don't grok how to do the same thing with the Bigtable API).
These were mostly discovered because I wrote a system test for happybase that could work both with HBase and with the Bigtable backend. It can be switched from one to another by changing the USING_HBASE boolean.
Many other differences have been enumerated in the documentation for our custom Bigtable happybase package.
Issues / Differences
- When committing a batch of mutations, the `happybase` method `Batch.send()` uses Thrift/HBase's `mutateRows` / `mutateRowsTs` method to send all mutations at once. With the Bigtable API this is not possible; we have to commit row-by-row. (This comes up in the system test as well.)
- Bigtable garbage collection is not as immediate as HBase's. In HBase, a column with one `max_version` immediately evicts the old value when a new one is added. Similarly, with a TTL of 3 seconds, after sleeping for 3.5 seconds the value has been evicted. Neither of these occurs (at least consistently) in Bigtable. (I don't really see this as a problem, but users coming from HBase may have different expectations.)
- A row scan with `sorted_columns` is not possible in Bigtable.
- Using an HBase filter string is not possible in Bigtable. (Also, some of the filter string concepts don't map to Bigtable filters, e.g. `KeyOnlyFilter`.)
- The Bigtable `Mutation.DeleteFromRow` mutation does not support timestamps. Even attempting to send one conditionally (via `CheckAndMutateRowRequest`) deletes the entire row.
- Bigtable can't use a timestamp with column family deletes, since `Mutation.DeleteFromFamily` does not include a timestamp range.
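The row-by-row commit constraint in the first item above can be sketched as follows. This is a minimal model, not the actual `happybase` implementation; `commit_row` is a stand-in (an assumption) for whatever per-row commit call the Bigtable client exposes:

```python
from collections import OrderedDict

def send_batch(mutations, commit_row):
    """Group (row_key, column, value) mutations by row and commit row-by-row.

    HBase's Thrift mutateRows sends everything in one RPC; with the
    Bigtable API each row must be committed separately, so a failure
    partway through can leave earlier rows committed and later ones not.
    Returns the number of per-row commits issued.
    """
    by_row = OrderedDict()  # preserve first-seen row order
    for row_key, column, value in mutations:
        by_row.setdefault(row_key, []).append((column, value))
    for row_key, row_mutations in by_row.items():
        commit_row(row_key, row_mutations)  # one RPC per row
    return len(by_row)
```

Note that the per-row commits mean the batch loses the (partial) atomicity of a single `mutateRows` call.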
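Because of the lazy garbage collection noted above, a system test shared between backends can't assert immediately after a TTL elapses the way it can against HBase. One workaround is a small polling helper (a hypothetical helper sketched here, not part of `happybase`):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Useful when Bigtable GC has not yet evicted expired cells: instead of
    sleeping a fixed 3.5 seconds for a 3-second TTL, keep re-reading until
    the expected state appears (or give up after the timeout).
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline
```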
Differences that are Upgrades
- Writes to HBase (via Thrift) with a timestamp just drop the timestamp, whereas the Bigtable API respects them.
- The Thrift API fails to retrieve the TTL information from a column family, while the Bigtable API succeeds in returning this information. (We have to work around this in a few system tests.)
- When the Thrift API does a row read with columns `cf1` and `cf1:qual1` (in that order), only the results from `cf1:qual1` are returned, even though they are a subset of all the columns in the column family `cf1`. If the columns are given in the opposite order (`cf1:qual1`, then `cf1`), the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter: one branch has only `family_name_regex_filter='cf1'` and the other has that combined with `column_qualifier_regex_filter='qual1'`.) (This happens for both single-row and multi-row reads.)
- HBase `counter_get` doesn't actually populate the data, even though the docstring says: "This method retrieves the current value of a counter column. If the counter column does not exist, this function initialises it to 0."
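The union filter described in the column-ordering item above can be modeled in plain Python. This is a toy model of the filter semantics, not the Bigtable client API: the union of "everything in `cf1`" with "`cf1` chained with qualifier `qual1`" matches every cell in `cf1`, which is why Cloud Bigtable returns the full family regardless of the order the columns were requested in.

```python
import re

# Each "filter" is a predicate over a (family, qualifier) cell coordinate.

def family_filter(pattern):
    """Model of family_name_regex_filter: match on the column family."""
    rx = re.compile(pattern)
    return lambda family, qualifier: rx.search(family) is not None

def qualifier_filter(pattern):
    """Model of column_qualifier_regex_filter: match on the qualifier."""
    rx = re.compile(pattern)
    return lambda family, qualifier: rx.search(qualifier) is not None

def chain(*filters):
    """A cell passes a chain filter only if it passes every sub-filter."""
    return lambda family, qualifier: all(f(family, qualifier) for f in filters)

def union(*filters):
    """A cell passes a union filter if it passes any sub-filter."""
    return lambda family, qualifier: any(f(family, qualifier) for f in filters)

# Requesting columns ['cf1', 'cf1:qual1'] builds a union of the two branches;
# since the qual1 branch is a subset of the cf1 branch, order doesn't matter.
requested = union(
    family_filter('cf1'),
    chain(family_filter('cf1'), qualifier_filter('qual1')),
)
```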
Neither Good/Bad
- HBase reads (via `Table.row`, `Table.rows`, `Table.cells`, `Table.scan`) all use exclusive end timestamps, which matches the behavior of a Bigtable `TimestampRange`. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes still use a `TimestampRange` (though only for deleting specific columns, since column family and row deletes can't send a timestamp range, as noted above). We address this by simply incrementing the passed-in timestamp by 1 millisecond (the lowest allowed granularity).
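The 1-millisecond adjustment above can be sketched as follows (the helper names here are hypothetical, chosen for illustration):

```python
def read_range_end_ms(timestamp_ms):
    """happybase read timestamps are already exclusive, matching a
    Bigtable TimestampRange end, so no adjustment is needed."""
    return timestamp_ms

def delete_range_end_ms(timestamp_ms):
    """happybase delete timestamps are inclusive, while a Bigtable
    TimestampRange end is exclusive; add 1 ms (the lowest allowed
    granularity) so the range still covers the final millisecond."""
    return timestamp_ms + 1
```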
I "discovered" some issues when implementing the
happybasefunctionality on top of the Bigtable API. (I put discovered in quotes, because some of the issues may just be that I don't grok how to do the same thing with the Bigtable API).These were mostly discovered because I wrote a system test for
happybasethat could work both with HBase and with the Bigtable backend. It can be switched from one to another by changing theUSING_HBASEboolean.Many other differences have been enumerated in the documentation for our custom Bigtable
happybasepackage.Issues / Differences
happybasemethodBatch.send()uses Thrift/HBase'smutateRows/mutateRowsTsmethod to send all mutations at once. With the Bigtable API, this is not possible, we have to commit row-by-row. (This comes up in the system test as well.)max_versionimmediately evicts the old value when a new one is added. Similarly, with a TTL of 3 seconds, after sleeping for 3.5 seconds, the value has been evicted. Neither of these occur (at least consistently in Bigtable). (I don't really see this as a problem, but users from HBase may have different expectations)sorted_columnsis not possible in Bigtable.KeyOnlyFilter)Mutation.DeleteFromRowmutation does not support timestamps (also). Even attempting to send one conditionally (viaCheckAndMutateRowRequest) deletes the entire row.Mutation.DeleteFromFamilydoes not include a timestamp range.Differences that are Upgrades
Writes to HBase (via Thrift) with a timestamp just drop the timestamp whereas the Bigtable API respects them
The Thrift API fails to retrieve the TTL information from a column family while the Bigtable API succeeds in returning this information. (We have to work-around this in a few system tests.)
When Thrift API does a row read with columns
cf1andcf1:qual1(in that order) only the results fromcf1:qual1are returned (even though they are a subset of all the columns in the column familycf1). If the columns are given in the opposite order (cf1:qual1thencf1) the correct results are returned. In Cloud Bigtable, it works as expected in either order. (We use a union filter, one which has onlyfamily_name_regex_filter='cf1'and another which has that combined withcolumn_qualifier_regex_filter='qual1'.) (This happen for a single row read and multiple rows.)HBase
counter_getdoesn't actually populate the data even though the docstring says:Neither Good/Bad
Table.row,Table.rows,Table.cells,Table.scan) all use exclusive end timestamps, which makes the behavior of a BigtableTimestampRange. On the other hand, HBase deletes use inclusive end timestamps, while Bigtable deletes are still using aTimestampRange(only for deleting specific columns those, as column family or row deletes can't send a timestamp range, as referenced above). We address this just by incrementing the passed in timestamp by 1 millisecond (which is the lowest allowed granularity).