Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeoPt not honoured #1065

Closed
thinrhino opened this issue Aug 17, 2015 · 40 comments
Closed

GeoPt not honoured #1065

thinrhino opened this issue Aug 17, 2015 · 40 comments
Assignees
Labels
api: datastore Issues related to the Datastore API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@thinrhino
Copy link

We have a record in the datastore which has the object type ndb.GeoPtProperty. This record was set via ndb from the app engine. Now we are running a script on GCE to modify one field in this record, without touching the original location point.

When the record is saved from gCloud API, the record looses it's GeoPtProperty.

Before:

> record.location
datastore_types.GeoPt(19.179167, 72.9640237)

After:

> record.location
>
@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

/cc @pcostell

@dhermes dhermes added the api: datastore Issues related to the Datastore API. label Aug 17, 2015
@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

@thinrhino Here is what I've just done:

from google.appengine.ext import ndb

class Foo(ndb.Model):
    bar = ndb.GeoPtProperty()

g = ndb.GeoPt(10.0, 10.0)
f = Foo(bar=g)
f.put()
print(f)

which created key=Key('Foo', 5634472569470976).

Then in gcloud-python:

from gcloud import datastore

client = datastore.Client()
key = client.key('Foo', 5634472569470976)
entity = client.get(key)
print(entity)

which is

<Entity[{'kind': u'Foo', 'id': 5634472569470976L}] {u'bar': <Entity {u'y': 10.0, u'x': 10.0}>}>

So the data is still there, but I'm not sure the geopt part can be set from the Cloud Datastore API (I am fairly certain it can't).

@jgeewax
Copy link
Contributor

jgeewax commented Aug 17, 2015

I believe this is doing what it should -- as GeoPt isn't something supported by the JSON API, but instead is an abstraction in DB / NDB

So under the hood, it looks like NDB is taking a GeoPt and turning into a pair of coordinates before saving via the API -- and then loading it into that property on the way out of the API as well.

If we get NDB running on top of the Cloud Datastore API this problem should go away as you'd just use NDB the whole time (see to #557).

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

Actually stepping into Connection.lookup and checking out the protobuf response, it seems the "meaning" is set on the GeoPt property:

found {
  entity {
    key {
      partition_id {
        dataset_id: "s~FOO"
      }
      path_element {
        kind: "Foo"
        id: 5634472569470976
      }
    }
    property {
      name: "bar"
      value {
        entity_value {
          property {
            name: "x"
            value {
              double_value: 10.0
              indexed: false
            }
          }
          property {
            name: "y"
            value {
              double_value: 10.0
              indexed: false
            }
          }
        }
        meaning: 9
      }
    }
  }
}

The only real mention of "meaning" in the .proto file is

An entity value with meaning 9, 20 or 21 may be indexed.

so maybe we need to add support for setting "meaning" in these cases.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

@jgeewax Thanks for hunting down the source, ndb actually sets the meaning:

p.set_meaning(entity_pb.Property.GEORSS_POINT)

and the list of meanings is defined in google.appengine.datastore.entity_pb. It looks like 20 and 21 are no longer meanings, but maybe they were at some point (I checked r525, r500, r450, r400 and r300 and didn't see any mention of them)?

@jgeewax
Copy link
Contributor

jgeewax commented Aug 17, 2015

Yea we definitely need @pcostell to chime in on this -- I have no idea what the meaning value might mean in a broader sense. Hopefully we can use that..

Hypothetically, if that did exist, do you think gcloud.datastore should have a GeoPt data type?

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

@jgeewax It wouldn't be that hard to do. We discussed / dismissed implementing meaning in #265 and that was the end of it (AFAICT).

@pcostell
Copy link
Contributor

v1beta2 doesn't have a specific geopt field, which is why it is represented using this "legacy-supported" type (an entity_value with a meaning). However, v1beta3 does have a specific geopt which will be populated instead.

In general, fields with the meaning should always be preserved, however we should never be creating new fields with meanings set (which is why in the API they are not actually explained). You should still be able to edit this field in the same way you can edit an entity_value.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

@pcostell When can we start using v1beta3? What should we do about this in the meantime?

@pcostell
Copy link
Contributor

"Soon". This should still work in v1beta2, as the property should still be editable as a normal entity_value. You should never be setting the meaning, but you should be persisting an existing meaning. So if users can edit an entity value, this should work.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

Is it worth making a change to preserve to meaning if we are just going to ditch meanings all together "soon"? (It really depends on your judgment of how soon.) We just did a release 8/12 so even if we made a fix it may not be in a release for another 14-21 days .

@pcostell
Copy link
Contributor

Yes, we should always be preserving the meaning. There are other legacy properties that will still require meaning even after the v1beta3 launch.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

OK great.

Secondary question, what are the mysterious enums at 20 and 21 for meaning?

An entity value with meaning 9, 20 or 21 may be indexed.

@pcostell
Copy link
Contributor

Secret :-). We don't want clients of the gcloud libraries actually depending on meaning value (they should only be persisting them).

At the moment (but we reserve the right to change them): 9 is a GeoPt which is exists in the geo-range (lat/lng), 21 is an "invalid" geopoint (any range, so really just a point), and 20 represents an App Engine UserProperty.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

Thanks! I was just saying secret because they aren't in google.appengine.datastore.entity_pb.Property.

@pcostell
Copy link
Contributor

I was just pointing out that they were intentionally left out because we don't want people to depend on them.

@dhermes
Copy link
Contributor

dhermes commented Aug 17, 2015

Gotcher. Just wanted to make sure we were on the same page. I'll ping you when I add the fix for meaning.

@dhermes
Copy link
Contributor

dhermes commented Aug 18, 2015

@pcostell Trying to implement this, I'm realizing we have no way (without some re-architecting) to preserve the meaning for non-entity_value values.

Comments like

Exception: If meaning is set to 2, string_value is limited to 2038 characters regardless of indexed.

make me think we should keep meaning irrespective of the other property on the Value.

We could fail (i.e. raise an exception) if value_pb.HasField('meaning') while simultaneously value_pb.HasField(prop) for any prop other than entity_value. WDYT?

@pcostell
Copy link
Contributor

I think we need to preserve the meaning regardless. Otherwise you'll break GAE apps that rely on the typing which is determined by the meaning.

@dhermes
Copy link
Contributor

dhermes commented Aug 18, 2015

OK thanks. Some mad science going in my brain right now (involving over-riding __getitem__ on Entity, which inherits from dict). Will loop you in when something surfaces.

@dhermes
Copy link
Contributor

dhermes commented Aug 26, 2015

@pcostell I took a few stabs at this and didn't like anything I created.

What does the horizon look like for support of meaning in the API? What is the migration plan from existing GAE apps that use it?

@pcostell
Copy link
Contributor

We'll likely have to support meaning for a while. There are some things in the GAE API that we don't want to support going forward, but will need to support existing data in that format. For migrating existing GAE apps we can support writing in the new API via the thick clients (i.e. to store a User you would just write an entity_value. Once we move GAE clients over to the new API we may be able to start ignoring the meaning.

@dhermes
Copy link
Contributor

dhermes commented Aug 31, 2015

I'm thinking the "easiest" way to deal with this is just to define a Value / ValueWithMeaning class that is public and require users to use / interact with that class. (In the case of properties with meaning, users mostly just don't want to set the value, but maybe I'm missing something.)

For things like lat/long we can have a custom Value subclass which looks nice to users, but all of the tricks I had in mind fall apart in some way.

@pcostell
Copy link
Contributor

pcostell commented Nov 8, 2015

ping! This right now will corrupt users data in ways that will likely break their GAE applications.

I think if at any point a user sets a value, it is acceptable to lose any notion of the existing meaning. However, if a user does: put(get(key)), we shouldn't change the data (or if they modify an unrelated property).

@pcostell pcostell added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Nov 8, 2015
@jgeewax
Copy link
Contributor

jgeewax commented Nov 8, 2015

I agree with @pcostell here (put(get(key)) shouldn't change the data)

@dhermes
Copy link
Contributor

dhermes commented Nov 8, 2015

I spent a fair bit of time trying to make this work in August and it was really tough given our current model. TL;DR we store properties just using the value and then have a strict mapping from acceptable Python types to the actual proto field types. Thus there is no easy way to store a value and some data about that value.

I was hoping v1beta3 would "solve" the issue for us, but also assumed it'd be launched before November. @pcostell any comments on this timeline? I don't want to put in a huge re-engineering effort that will be obsoleted in less than 1Q.

@pcostell
Copy link
Contributor

pcostell commented Nov 9, 2015

Like I mentioned before, v1beta3 solves this for geopoint, but not for
several other types. So the gcloud-python library will still corrupt user
data after we make the switch to v1beta3.

On Sun, Nov 8, 2015, 9:24 AM Danny Hermes notifications@github.com wrote:

I spent a fair bit of time trying to make this work in August and it was
really tough given our current model. TL;DR we store properties just using
the value and then have a strict mapping from acceptable Python types to
the actual proto field types. Thus there is no easy way to store a
value and some data about that value.

I was hoping v1beta3 would "solve" the issue for us, but also assumed
it'd be launched before November. @pcostell https://github.com/pcostell
any comments on this timeline? I don't want to put in a huge re-engineering
effort that will be obsoleted in less than 1Q.


Reply to this email directly or view it on GitHub
#1065 (comment)
.

@dhermes
Copy link
Contributor

dhermes commented Nov 9, 2015

I should just mock up the v1beta3 change and then see what needs to be addressed. I've got 3 hours on a bus today so will give it a try then.

Are we safe to switch over? If not, when can the switch occur?

@pcostell
Copy link
Contributor

pcostell commented Nov 9, 2015

v1beta3 has not been released yet for public usage and I don't have a set
date when it will be

On Mon, Nov 9, 2015 at 1:37 AM Danny Hermes notifications@github.com
wrote:

I should just mock up the v1beta3 change and then see what needs to be
addressed. I've got 3 hours on a bus today so will give it a try then.

Are we safe to switch over? If not, when can the switch occur?


Reply to this email directly or view it on GitHub
#1065 (comment)
.

@jgeewax
Copy link
Contributor

jgeewax commented Nov 10, 2015

Thus there is no easy way to store a value and some data about that value.

What do we have to change for that to be possible?

As of now we're technically corrupting people's data if they happen to use fancy structures in App Engine and write back that same data from gcloud-python. Even for a 0.x release that's a pretty bad thing.

@dhermes
Copy link
Contributor

dhermes commented Nov 10, 2015

What do we have to change for that to be possible?

There are a few approaches, none of which is very compelling. The one I settled on was to use a custom wrapper type to hold a value and meaning when necessary. I've still got a dead-ish branch for this approach.

I'm currently breaking down v1beta3 so I can get a sense of what changes will be needed. After that I'll feel better about the re-engineering needed to "add meaning" to our lives.

@dhermes
Copy link
Contributor

dhermes commented Nov 10, 2015

@jgeewax and @pcostell I am curious what you think of these strategies:

Choice 1

  • Store _meanings as a dictionary on Entity
  • Set _meanings on object creation (when reading from datastore)
  • Use the original never-updated _meanings from an Entity when serializing as a pb object (no matter what has happened in the meantime)

Choice 2

  • Store _meanings as a dictionary on Entity
  • Set _meanings on creation (when reading from datastore)
  • Over-write __getitem__, __setitem__, __delitem__, update and countless other dict methods to make sure that _meanings gets updated appropriately (this brings up a question, if we update a property, do we leave the meaning alone or check against the current type or raise an error if it gets changed or something else)

I am partial to Choice 1 because re-implementing dict (in Choice 2) defeats the purpose of having Entity subclass dict. Maybe the presence of such a choice is just a sign we should bite the bullet and just implement Entity in an ORM-like fashion? (This would be a big change for our users, but we've always said "just wait, ndb is coming, which is also why we never went the ORM route.)


As a "modification" to Choice 1, we could also store something like

_meanings = {
    'prop1': (DATETIME_MEANING, datetime.datetime),
    'prop2': (EMAIL_MEANING, str),
    ...
}

and then throw an error on serialization if

  • Any of the properties have disappeared
  • Any of the properties have changed type

Also as a modification to Choice 1 (in the name of leaving dict alone), we could also turn off use of meaning by default and require users to have a flag to enable it. In the "turned off" case we could just throw an exception if they encounter data with a meaning and ask them to explicitly turn it on. In the explicit case, it would "alert" them to be careful and we could provide an API surface like

  • Entity.has_meaning('prop1')
  • Entity.set_meaning('prop1', SOME_MEANING_ENUM.ENUM_VAL)
  • Entity.get_meaning('prop1')
  • Entity.clear_meaning('prop1')
  • Entity.list_meanings()

@dhermes
Copy link
Contributor

dhermes commented Nov 14, 2015

@jgeewax and @pcostell Can you guys weigh in?

@dhermes
Copy link
Contributor

dhermes commented Nov 18, 2015

@jgeewax and @pcostell Bump

@dhermes
Copy link
Contributor

dhermes commented Nov 18, 2015

Another suggestion from @tseaver:

  • Just have custom types that encapsulate meaning.
  • I'd imagine this involves a few types for well-known meanings (e.g. Datetime meaning and GeoPt) and then just a generic container for less-common meanings

@pcostell
Copy link
Contributor

The Cloud Datastore API doesn't make any commitment about meaning <==> type
conversions. So having well known types exposed to the user is not
preferred.

In general, I think it is ok if you store the meaning internally but then
if there is ever a set call you drop it. set(get()) should work, but if at
any point the user tries setting the property it should be ok to drop the
meaning.

On Tue, Nov 17, 2015 at 6:50 PM Danny Hermes notifications@github.com
wrote:

Another suggestion from @tseaver https://github.com/tseaver:


Reply to this email directly or view it on GitHub
#1065 (comment)
.

@dhermes
Copy link
Contributor

dhermes commented Nov 18, 2015

Thanks Patrick!

@dhermes
Copy link
Contributor

dhermes commented Nov 23, 2015

FYI I'm closing in on a fix for this (also fixes #1206 as a side effect)

dhermes added a commit to dhermes/google-cloud-python that referenced this issue Nov 24, 2015
To do this, added entity_to_protobuf method that could be used
recursively. Also solves googleapis#1206 since recursively serializing
nested entities to protobuf was being done incorrectly.

Fixes googleapis#1065. Fixes googleapis#1206.
dhermes added a commit to dhermes/google-cloud-python that referenced this issue Dec 17, 2015
To do this, added entity_to_protobuf method that could be used
recursively. Also solves googleapis#1206 since recursively serializing
nested entities to protobuf was being done incorrectly.

Fixes googleapis#1065. Fixes googleapis#1206.
dhermes added a commit to dhermes/google-cloud-python that referenced this issue Dec 19, 2015
To do this, added entity_to_protobuf method that could be used
recursively. Also solves googleapis#1206 since recursively serializing
nested entities to protobuf was being done incorrectly.

Fixes googleapis#1065. Fixes googleapis#1206.
@dhermes
Copy link
Contributor

dhermes commented Dec 21, 2015

@thinrhino Sorry for the delay but this has been fixed and will hopefully be in a release soon.

From my App Engine app (using the remote api) I made a property with lots of meanings (including geo point):

from google.appengine.ext import ndb


class Foo2(ndb.Model):
    _use_cache = False
    _use_memcache = False
    b1 = ndb.BlobProperty(compressed=True)


class Foo(ndb.Model):
    _use_cache = False
    _use_memcache = False
    b1 = ndb.BlobProperty(compressed=True)
    b2 = ndb.BlobProperty(compressed=True, repeated=True)
    t1 = ndb.TextProperty(compressed=False, indexed=False)
    geo = ndb.GeoPtProperty()
    struct_p = ndb.StructuredProperty(Foo2)
    loc_struct_p = ndb.LocalStructuredProperty(Foo2)


f1 = Foo2(b1='hey')
f2 = Foo(id=1234, b1='b1', b2=['hi', 'ok'],
         t1='t1', geo=ndb.GeoPt(52.37, 4.88),
         struct_p=f1, loc_struct_p=f1)
f2.put()

and then I retrieved it with gcloud-python

>>> import os
>>> import pprint
>>> from gcloud import datastore
>>> os.environ.pop('GOOGLE_APPLICATION_CREDENTIALS')
'path/redacted'
>>> client = datastore.Client(dataset_id='gae-app-id')
>>> key = client.key('Foo', 1234)
>>> entity = client.get(key)
>>> pprint.pprint(entity._meanings)
{u'b1': (22, 'x\x9cK2\x04\x00\x00\xf7\x00\x94'),
 u'b2': (22,
         ['x\x9c\xcb\xc8\x04\x00\x01;\x00\xd2',
          'x\x9c\xcb\xcf\x06\x00\x01K\x00\xdb']),
 u'geo': (9, <Entity {u'y': 4.88, u'x': 52.37}>),
 u'struct_p.b1': (22, 'x\x9c\xcbH\xad\x04\x00\x02~\x01G'),
 u't1': (15, u't1')}
>>> sorted(entity._meanings.keys())
[u'b1', u'b2', u'geo', u'struct_p.b1', u't1']
>>> sorted(entity.keys())
[u'b1', u'b2', u'geo', u'loc_struct_p', u'struct_p.b1', u't1']
>>> entity['geo']._meanings
{}
>>> entity['geo']
<Entity {u'y': 4.88, u'x': 52.37}>
>>> entity['loc_struct_p']._meanings
{u'b1': (22, 'x\x9c\xcbH\xad\x04\x00\x02~\x01G')}
>>> entity['loc_struct_p']
<Entity {u'b1': 'x\x9c\xcbH\xad\x04\x00\x02~\x01G'}>

even more, I made a change to the geo point (and sent to backend) and retrieved again to make sure they were identical

>>> entity['geo']['x'] = 3.14
>>> client.put(entity)
>>> entity2 = client.get(key)
>>> entity == entity2
True
>>> entity._meanings == entity2._meanings
True
>>> client.delete(entity.key)

(I also verified in the App Engine datastore viewer that the geo point change went through as expected)

@thinrhino
Copy link
Author

@dhermes thanks a lot. Will check it out soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the Datastore API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

4 participants