Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL Schema Generator #955

Closed
wing328 opened this issue Sep 2, 2018 · 25 comments
Closed

MySQL Schema Generator #955

wing328 opened this issue Sep 2, 2018 · 25 comments

Comments

@wing328
Copy link
Member

wing328 commented Sep 2, 2018

Description

As discussed with @ybelenko we want to create a MySQL schema generator so as to create the MySQL database/tables more easily given an OpenAPI spec.

openapi-generator version

Latest master

Suggest a fix/enhancement

If anyone wants to help on this, please reply to let us know.

@wing328
Copy link
Member Author

wing328 commented Sep 2, 2018

We will need to store MySQL-specified attributes/properties (e.g. AUTO_INCREMENT) in the model defined in the OpenAPI spec.

One option is to use vendor extension (e.g. x-mysql-auto-increment).

@macjohnny
Copy link
Member

if you do so, you could also consider to have a more general database generation option in different languages, e.g. JPA in java

@wing328
Copy link
Member Author

wing328 commented Sep 3, 2018

if you do so, you could also consider to have a more general database generation option in different languages, e.g. JPA in java

Is that similar to the databaseAdapter option in the ruby-on-rails generator?

CONFIG OPTIONS
	databaseAdapter
	    The adapter for database (e.g. mysql, sqlite). Default: sqlite (Default: sqlite)

@ybelenko
Copy link
Contributor

ybelenko commented Sep 8, 2018

I've significant progress working on it. Prototype name for new codegen is MysqlSchemaServerCodegen. If anyone have better name - drop it in the comment.
Also I would consider any suggestion of Openapi Data Type -> MySQL Data Type matching.

Openapi Data Type to MySQL Data Type

Openapi Data Type Openapi Data Format MySQL Data Type
integer int32 INT
integer int64 BIGINT
number float FLOAT
number double DOUBLE
string VARCHAR(255)
string byte
string binary
boolean BOOLEAN or TINYINT(1)
string date DATE
string date-time DATETIME
string password VARCHAR(255)
file

@wing328
Copy link
Member Author

wing328 commented Sep 9, 2018

Here are my suggestions:

Openapi Data Type Openapi Data Format MySQL Data Type
string byte VARCHAR(255)
string binary MEDIUMBLOB
file MEDIUMBLOB

Ref: https://stackoverflow.com/questions/5959043/how-to-insert-a-file-in-mysql-database

(file in OAS2 has been deprecated and replaced by type: string, format: binary in OAS3)

@ybelenko
Copy link
Contributor

ybelenko commented Sep 9, 2018

@wing328 What about float and double? Do you agree with FLOAT and DOUBLE accordingly?

@jimschubert
Copy link
Member

jimschubert commented Sep 9, 2018

I agree with William's addition.

My comments are below.

String (no format)

OpenAPI string types with no data format specified should not have a maximum specified for varchar, unless there is an OpenAPI maxLength constraint. See maxLength in JSON Schema validation for strings, defined in Schema Object (spec 2.0 and 3.0.x).

varchar is generally used in MySQL for smaller bits of text. There's another data type, TEXT, which is used for larger text fields.

Consider, for example, if the specification defines an API for a blogging system:

---
  swagger: "2.0"
  info: 
    version: "1.0.0"
    title: "Bloggy"
    description: "A blog API sample"
    contact: 
      name: "Jim Schubert"
    license: 
      name: "MIT"
  host: "blog.example.com"
  basePath: "/api"
  schemes: 
    - "https"
  consumes: 
    - "application/json"
  produces: 
    - "application/json"
  paths: 
    /posts: 
      get: 
        description: "Returns all posts (not realistic for large blogs)"
        produces: 
          - "application/json"
        responses: 
          "200":
            description: "A list of blog posts."
            schema: 
              type: "array"
              items: 
                $ref: "#/definitions/Post"
  definitions: 
    Post: 
      type: "object"
      required: 
        - "id"
        - "title"
        - "content"
      properties: 
        id: 
          type: "integer"
          format: "int64"
        title: 
          type: "string"
          maxLength: 255
        content: 
          type: "string"
        related:
          type: array
          items:
            type: "integer"
            format: "int64"

edit: I didn't finish this thought. Sorry.
In the above spec, title has a max length, and so varchar(255) would be appropriate. content on the other hand, is an unbounded string type, and something like a 1,000-word blog post wouldn't fit into a varchar(255) if this is used as the default for all strings.

Float/Double

I'd be hesitant to use FLOAT and DOUBLE in MySQL. These are inaccurate data types (approximations). Float uses 4 bytes (if precision <= 24) or 8 bytes (if precision is between 24 and 53) and Double uses 8 bytes of storage.

From Wikipedia:

IEEE 754 32-bit base-2 floating-point variable has a maximum value of (2 − 2−23) × 2127 ≈ 3.402823 × 1038

From MySQL docs:

Because floating-point values are approximate and not stored as exact values, attempts to treat them as exact in comparisons may lead to problems.

Examples of errors with Float/Double can be seen on MySQL docs B.5.4.8 Problems with Floating-Point Values.

A huge concern with this is that you can use mysqldump to replicate between a master and slave, and you can end up with different values between the two.

Conversely, DECIMAL (and NUMERIC) calculates the storage space based on the number stored. The left and right parts of the number use up to 4 bytes for every 9 digits. So, the following storage requirements would occur:

MySQL Type Left Right Total Storage
DECIMAL(0,9) 4 bytes 4 bytes 8 bytes
DECIMAL(9,9) 4 bytes 4 bytes 8 bytes
DECIMAL(12,2) 6 bytes 4 bytes 10 bytes

The above could be off by a little, but you verify against 11.8 Data Type Storage Requirements. Disclaimer: I have stopped using MySQL because of inconsistencies and confusing documentation.

This is probably controversial, but my recommendation would be to use DECIMAL or NUMERIC with precision and scale set to reasonable values, potentially with generator options to use float and to use double as opt-ins. Users should also still be able to use type mappings to update float/double defaults. I make this recommendation to protect users from one of MySQL's many "gotchas" with replication and querying (I've been bitten too many times by MySQL), with the assumption that users requiring imprecise larger values would be able to override our generation defaults or modify existing columns to meet their needs.

@ybelenko
Copy link
Contributor

@jimschubert
Thanks, for descriptive answer. It will save my time a lot.

Maybe you right that TEXT is better when maxLength isn't specified. What you think about array and object types? I ignore these properties right now, because to me array equals additional table one to many relationship and object is also means foreign key to record from another table.

Sure, I want to add cli options in future versions. Right now I want to provide prestable version without any config.

Offtopic, @jimschubert what relational database you use instead of MySQL now? I've never had troubles with floats and doubles, however I haven't used decimal columns too much(for geo coordinates and money storage only). I've encountered some issues with foreign keys, varchar storage limit per table, some time it's frustrating that InnoDB doesn't support full text search and you need to create additional MyISAM table to store big articles text.

@ybelenko
Copy link
Contributor

Any suggestion of how to calculate total number of digits and number of digits after the decimal point for decimal types? MySQL sets DECIMAL(10,0) by default and 0 digits after decimal point looks weird in our case.

@wing328
Copy link
Member Author

wing328 commented Sep 10, 2018

@ybelenko I don't have a clear answer. I did a keyword search ("mysql decimal") in Github and found a few occurrences of the following:

DECIMAL(20, 8)

I suggest we use it as a starting point and later provide instructions to the users on how to customize it to meet their requirement.

@jimschubert
Copy link
Member

@ybelenko I use postgresql now. No relational database system is perfect, but Postgres has way fewer "gotchas" than MySQL (at least in my use cases). MySQL, up until 5.8/8.0 or whatever the recent release was, didn't support common SQL functionality that I use such as CTEs and windowing functions.

It also has very unintuitive transactional behavior. As an example if you mix InnoDB and MyISAM in a single query wrapped in a transaction, you can end up with data inserted into the MyISAM table that doesn't get rolled back because MyISAM doesn't support transactions.

Also, if you're like most engineers/companies, you're probably stuck on and older version of MySQL. They had a bug open for 14 years which was pretty critical, but which was obviously not a high priority for them. You could insert data into a table with an auto-incrementing primary key, delete the record, then restart your database and the sequence would restart to the maximum existing record value. This wasn't only an issue with data corruption, replication, and the sorts-- it was a potential security issue for applications running on MySQL. Their "fix" was to document it and write it off as documented behavior.

Then, there's the issue that utf8 encoding doesn't mean utf8. People had a field day on Reddit for that bug.

There are numerous other issues I've encountered with MySQL over the years, but those are the main ones that would lead me to recommend anyone against using MySQL. People use to argue that it was dead-simple to setup in comparison to Postgres and had better performance. I've found postgres to be much simpler to setup over the last couple years (magnitudes simpler than it was 17 or 18 years ago). But I would question the "speed" advantage of MySQL pretty heavily, considering it's taking some shortcuts to eek out this speed. In my own experience, we recently migrated from MySQL 5.7 to Postgresql 9.6 and our database queries dropped from 200-300ms to 50-100ms (both in AWS with the same data and same queries).

@jimschubert
Copy link
Member

Also, I agree with @wing328 that DECIMAL(20, 8) would be a reasonable default for decimal, float, and double. This gives 12 integer digits to the left of the period and 8 to the right. So the maximum value here would be:

999999999999.99999999

Or:

999,999,999,999.99999999

Realistically, there are probably very few applications working with values representing 1 trillion and higher.

Some people might disagree with that as a default, so providing steps in the generated README for how to modify these and regenerate should address those concerns.

@ybelenko
Copy link
Contributor

Well, default value is important, but I mean is there any rational way to calculate that value based on example, minimum, maximum, default properties? Should we try to analyze that properties to give most matching value?

@wing328
Copy link
Member Author

wing328 commented Sep 11, 2018

We can use additional information provided in the spec to try to come up with a better default but I would suggest looking into those as a day-2 requirement if users complain about the default value that we chose.

@ybelenko
Copy link
Contributor

@wing328 Of course I'm not going to implement it right now. This thread has been created to think about, how we can generate perfect MySQL schema. I want to get max benefit from optional properties like minLength, maximum, default, example without any additional vendor extensions. So any suggestions to next versions of codegen are welcome. 💡

@jimschubert
Copy link
Member

I would only set a default if default is provided. Max length (on strings) helps define the data type. Min/max would be more like constraints. For pattern constraints, we may need to resort to triggers.

For older versions of MySQL I don't think triggers are enabled by default- at least they weren't in 5.6 with AWS default configuration (see this SO post). If using triggers in the generator, we might want to explain how to enable them in AWS.

If defining triggers, also be aware that prior to MySQL 5.7.2, you couldn't have multiple triggers with the same triggering event and action (e.g. BEFORE UPDATE). We might want to limit the generator to a minimum of 5.7.2. The alternative for older versions is to create a single trigger and wrap multi-statement body in BEGIN... END.

@wing328
Copy link
Member Author

wing328 commented Sep 11, 2018

@ybelenko understood. I know you're trying to start the discussion earlier as you always think ahead 👍

@ybelenko
Copy link
Contributor

General question to all. It's not clear what to do with complex model vars(array, object, \\Model\\User). Should we skip them in table creation for now? @wing328 is suggesting to set them as VARCHAR(255) as temporarely solution.

@jimschubert
Copy link
Member

@ybelenko I would think defined models (\Model\User) could be a property referring to the table by id (user_id references user(id)). For array/object, I would choose something that's capable of storing these (like JSON, see docs).

@ybelenko
Copy link
Contributor

I would think defined models (\Model\User) could be a property referring to the table by id (user_id references user(id)). For array/object, I would choose something that's capable of storing these (like JSON, see docs).

Good point about foreign key, I think the same. But it's obvious that I cannot implement it right now, because we know nothing about primary keys in both tables(current and referenced). Don't know what workaround to use for now... thinking about simple INT or VARCHAR(255) maybe.

And about JSON - it's great column type, but too modern. JSON requires MySQL 5.7.8. As fallback solution we should use LONGTEXT. Ref quote:

The space required to store a JSON document is roughly the same as for LONGBLOB or LONGTEXT;

It's obvious, that I should add cli option like useJsonDataType: true. What do you think guys, keep this option on or off by default? I think off, because it produces valid SQL syntax for outdated MySQL and if user need JSON data type, he can turn this option on.

@jimschubert
Copy link
Member

My recommendation would be to not start by trying to support very old versions of the database. 5.7.8 was released 2015-08-03. It's doubtful that someone starting a new schema will be using that old of a database. If so, they can use type overrides to change JSON to LONGTEXT.

@ybelenko ybelenko mentioned this issue Sep 18, 2018
9 tasks
wing328 pushed a commit that referenced this issue Oct 1, 2018
* [Mysql Schema] Add new generator

* [Mysql Schema] Fix default definition

* [Mysql Schema] Add defaultDatabaseName option

* [Mysql Schema] Add jsonDataTypeEnabled option

* [Mysql Schema] Add samples
@wing328 wing328 added this to the 3.3.0 milestone Oct 2, 2018
@wing328 wing328 closed this as completed Oct 2, 2018
A-Joshi pushed a commit to ihsmarkitoss/openapi-generator that referenced this issue Feb 27, 2019
* [Mysql Schema] Add new generator

* [Mysql Schema] Fix default definition

* [Mysql Schema] Add defaultDatabaseName option

* [Mysql Schema] Add jsonDataTypeEnabled option

* [Mysql Schema] Add samples
@DivsDibs
Copy link

DivsDibs commented Jul 1, 2019

@ybelenko Hi, Sorry I'm a little lost here. Do we have a MySQL Schema generator from Open API Specs now? If Yes, where to find it and how to use it please?

@DivsDibs
Copy link

DivsDibs commented Jul 1, 2019

Thanks I found it in https://openapi-generator.tech.

@DivsDibs
Copy link

DivsDibs commented Jul 1, 2019

@ybelenko is there a page that lists out the features which are yet to built? e.g. Identification of FOREIGN KEY in both 1-1 and 1-* scenarios?

@ybelenko
Copy link
Contributor

ybelenko commented Jul 2, 2019

@DivsDibs no, we don't have page you need, but you can read conversation about undescribed features in #2587.

FOREIGN KEYS is very difficult feature as far as I see right now(we don't even know how to specify primary keys in Openapi schema). I agree, that current MySQL generator is far from perfect and MySQL keys and indexes is major feature request, but I'm going to start we small enhances, like snake_case table names etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants