Split jsontext out as a separate package (#300)

This has long been discussed as an action item.

The "jsontext" package deals with JSON text at the syntactic layer.
Use of it avoids any dependencies on the "reflect" package and
in general carries a much shallower dependency tree.
However, it still must depend on "unicode",
which carries a non-trivial amount of bloat.
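This layer only inspects bytes, so its building blocks can be plain functions over `[]byte` that need no `reflect` and no encoder state. The two helpers below are a hypothetical illustration of the stateless consumeXXX style that step 1 moves into "jsonwire". They are not the actual jsonwire functions, and their names and signatures are assumptions.

```go
package main

import "fmt"

// consumeWhitespace returns the number of leading JSON whitespace bytes in b
// (space, tab, carriage return, and line feed, per RFC 8259).
// Hypothetical helper, not the jsonwire API.
func consumeWhitespace(b []byte) (n int) {
	for n < len(b) && (b[n] == ' ' || b[n] == '\t' || b[n] == '\r' || b[n] == '\n') {
		n++
	}
	return n
}

// consumeLiteral returns len(lit) if b begins with the JSON literal lit
// (such as null, true, or false), and 0 otherwise.
// It keeps no state between calls. Hypothetical helper, not the jsonwire API.
func consumeLiteral(b []byte, lit string) int {
	if len(b) >= len(lit) && string(b[:len(lit)]) == lit {
		return len(lit)
	}
	return 0
}

func main() {
	in := []byte("  true, null")
	n := consumeWhitespace(in)
	n += consumeLiteral(in[n:], "true")
	fmt.Println(n) // 6
}
```

Because such functions only take and return byte counts, any package in the module can call them once they are exported. They do not need access to an Encoder or Decoder.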

There is no change to behavior, just massive shuffling of code.

High-level migration approach:

1.	Move all standalone functions in encode.go and decode.go
	to an internal "jsonwire" package,
	which performs stateless operations dealing with JSON text.
	For example, this includes all consumeXXX, parseXXX,
	appendXXX, and trimXXX functions, which are then exported
	for use by other packages in this module.

2.	Move escape.go to "jsonwire" since string encoding
	depends on that data structure, and export EscapeRunes.

3.	Move the remainder of encode.go and decode.go, and the entirety of
	state.go, token.go, value.go, pools.go, quote.go, and
	coder_options.go over to "jsontext".

4.	Split errors.go apart such that SemanticError stays in "json"
	and SyntacticError goes over to "jsontext".

5.	Split doc.go apart; half in "json", other half in "jsontext".

6.	The "jsonwire" package should return SyntacticError types,
	but cannot since that type is declared in "jsontext",
	and we need "jsontext" to depend on "jsonwire".
	We could move SyntacticError to "jsonwire" and use a type alias,
	but that messes up the GoDoc representation of the error type.
	For the time being, rely on reverse dependency injection
	(see jsonwire.NewError) to have it produce SyntacticErrors.
	It will be a future change to clean this up.

7.	Move jsontext/text.go over to the "json" package,
	and reverse the alias directions so that the declarations
	in "jsontext" continue to exist in the "json" package.
	We will delete these in a follow-up commit.

8.	At this point, "jsontext" builds and the tests pass.
	However, "json" is completely broken since it no longer
	has access to internal functionality in the Encoder/Decoder types.
	In theory, "json" could be implemented purely in terms
	of the public API of "jsontext", but at a performance cost.
	Since we know we already generate valid JSON,
	we can circumvent a number of correctness checks in "jsontext".

	To overcome this problem, we move the struct representation of
	Encoder/Decoder to unexported encoderState/decoderState types,
	and nest an encoderState/decoderState within Encoder/Decoder.
	Most fields and methods used by "json" are exported.
	This transitively includes exporting methods on the
	unexported state machine types declared in state.go.
	Note that these are not directly callable by external users
	because the type itself is still unexported.

9.	Add export.go to "jsontext", where we expose a public Internal
	variable with an Export method that returns an unexported type
	with exported methods that convert a *jsontext.Encoder
	into a *jsontext.encoderState (and similarly for Decoder).
	This gives "json" the power to now interact with the
	internal implementation of "jsontext" as it can freely
	call the exported methods of *jsontext.encoderState.

	While the jsontext.Internal variable is publicly visible,
	it cannot be used by external users, since calling Export
	requires a pointer to internal.AllowInternalUse, a variable
	that can only be referenced from packages that have access
	to the internal package.

	Note that the "correct" way to do all this is to implement
	the entirety of "jsontext" in an internal package,
	and only expose the public parts in a public "jsontext" package
	through the use of type aliases. However, this approach
	results in a terrible GoDoc experience, since all of the methods
	of aliased types are hidden.

	Perhaps future improvements to GoDoc will make this better,
	but until that day, this is a working approach that balances
	both having clean documentation in GoDoc and also preventing
	external users from touching internal implementation details.
	We can always switch to the "correct" way in the future
	as we clearly document that the jsontext.Internal variable
	is exempt from the Go compatibility agreement.
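Steps 8 and 9 combine two ideas. First, Encoder nests an unexported state type whose fields and methods are exported. Second, a public Internal variable hands out access to that state only when called with the address of a sentinel from an internal package. The single-file sketch below imitates that arrangement under assumed names (`notForPublicUse`, `exportAPI`, `AppendRaw`, `Bytes` are hypothetical). Package boundaries cannot be enforced within one file, so comments mark where each part would live.

```go
package main

import "fmt"

// ---- would live in an internal package (importable only inside the module) ----

// notForPublicUse is a hypothetical sentinel type. It has non-zero size so its
// address is unique (pointers to distinct zero-size variables may compare equal).
type notForPublicUse struct{ _ byte }

// AllowInternalUse is the capability token: only code that can import the
// internal package can take its address.
var AllowInternalUse notForPublicUse

// ---- would live in "jsontext" ----

// encoderState holds the implementation. Its fields and methods are exported,
// but because the type is unexported, outside users cannot name it.
type encoderState struct {
	Buf []byte
}

func (s *encoderState) AppendRaw(b string) { s.Buf = append(s.Buf, b...) }

// Encoder is the documented public type; it simply nests the state.
type Encoder struct {
	s encoderState
}

// Bytes is a hypothetical public accessor used only by this example.
func (e *Encoder) Bytes() []byte { return e.s.Buf }

type internalExporter struct{}

// Internal is exported, but Export refuses any pointer other than &AllowInternalUse.
var Internal internalExporter

func (internalExporter) Export(p *notForPublicUse) exportAPI {
	if p != &AllowInternalUse {
		panic("Internal.Export: unauthorized use")
	}
	return exportAPI{}
}

// exportAPI is unexported, but its exported methods are callable by anyone
// who obtains a value of it.
type exportAPI struct{}

// Encoder converts a public *Encoder into its internal state.
func (exportAPI) Encoder(e *Encoder) *encoderState { return &e.s }

// ---- would live in "json" ----

var export = Internal.Export(&AllowInternalUse)

func main() {
	var enc Encoder
	export.Encoder(&enc).AppendRaw(`{}`)
	fmt.Println(string(enc.Bytes())) // {}
}
```

An outside caller can still write `Internal.Export(nil)`, because untyped nil converts to the unnamed pointer type. That is why the sketch also checks the pointer's identity at run time.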

There is a 5% performance slowdown due to this change.
It is future work to investigate and regain the lost performance.
In theory, the compiler should be able to produce the exact same code
(setting aside different reflect data structures).

dsnet committed Aug 28, 2023
1 parent 6e475c8 commit dabb7e2
Showing 46 changed files with 4,102 additions and 3,892 deletions.
85 changes: 63 additions & 22 deletions arshal.go
```diff
@@ -9,12 +9,19 @@ import (
 	"errors"
 	"io"
 	"reflect"
+	"slices"
 	"sync"
 
+	"github.com/go-json-experiment/json/internal"
 	"github.com/go-json-experiment/json/internal/jsonflags"
 	"github.com/go-json-experiment/json/internal/jsonopts"
+	"github.com/go-json-experiment/json/internal/jsonwire"
+	"github.com/go-json-experiment/json/jsontext"
 )
 
+// export exposes internal functionality of the "jsontext" package.
+var export = jsontext.Internal.Export(&internal.AllowInternalUse)
+
 var structOptionsPool = &sync.Pool{New: func() any { return new(jsonopts.Struct) }}
 
 func getStructOptions() *jsonopts.Struct {
@@ -146,22 +153,24 @@ func putStructOptions(o *jsonopts.Struct) {
 // JSON cannot represent cyclic data structures and Marshal does not handle them.
 // Passing cyclic structures will result in an error.
 func Marshal(in any, opts ...Options) (out []byte, err error) {
-	enc := getBufferedEncoder(opts...)
-	defer putBufferedEncoder(enc)
-	enc.options.Flags.Set(jsonflags.OmitTopLevelNewline | 1)
-	err = marshalEncode(enc, in, &enc.options)
-	return bytes.Clone(enc.buf), err
+	enc := export.GetBufferedEncoder(opts...)
+	defer export.PutBufferedEncoder(enc)
+	xe := export.Encoder(enc)
+	xe.Flags.Set(jsonflags.OmitTopLevelNewline | 1)
+	err = marshalEncode(enc, in, &xe.Struct)
+	return bytes.Clone(xe.Buf), err
 }
 
 // MarshalWrite serializes a Go value into an [io.Writer] according to the provided
 // marshal and encode options (while ignoring unmarshal or decode options).
 // It does not terminate the output with a newline.
 // See [Marshal] for details about the conversion of a Go value into JSON.
 func MarshalWrite(out io.Writer, in any, opts ...Options) (err error) {
-	enc := getStreamingEncoder(out, opts...)
-	defer putStreamingEncoder(enc)
-	enc.options.Flags.Set(jsonflags.OmitTopLevelNewline | 1)
-	return marshalEncode(enc, in, &enc.options)
+	enc := export.GetStreamingEncoder(out, opts...)
+	defer export.PutStreamingEncoder(enc)
+	xe := export.Encoder(enc)
+	xe.Flags.Set(jsonflags.OmitTopLevelNewline | 1)
+	return marshalEncode(enc, in, &xe.Struct)
 }
 
 // MarshalEncode serializes a Go value into an [Encoder] according to the provided
@@ -173,7 +182,8 @@ func MarshalEncode(out *Encoder, in any, opts ...Options) (err error) {
 	mo := getStructOptions()
 	defer putStructOptions(mo)
 	mo.Join(opts...)
-	mo.CopyCoderOptions(&out.options)
+	xe := export.Encoder(out)
+	mo.CopyCoderOptions(&xe.Struct)
 	return marshalEncode(out, in, mo)
 }
 
@@ -198,8 +208,9 @@ func marshalEncode(out *Encoder, in any, mo *jsonopts.Struct) (err error) {
 		marshal, _ = mo.Marshalers.(*Marshalers).lookup(marshal, t)
 	}
 	if err := marshal(out, va, mo); err != nil {
-		if !out.options.Flags.Get(jsonflags.AllowDuplicateNames) {
-			out.tokens.invalidateDisabledNamespaces()
+		xe := export.Encoder(out)
+		if !xe.Flags.Get(jsonflags.AllowDuplicateNames) {
+			xe.Tokens.InvalidateDisabledNamespaces()
 		}
 		return err
 	}
@@ -354,9 +365,10 @@ func marshalEncode(out *Encoder, in any, mo *jsonopts.Struct) (err error) {
 // For JSON objects, the input object is merged into the destination value
 // where matching object members recursively apply merge semantics.
 func Unmarshal(in []byte, out any, opts ...Options) (err error) {
-	dec := getBufferedDecoder(in, opts...)
-	defer putBufferedDecoder(dec)
-	return unmarshalFull(dec, out, &dec.options)
+	dec := export.GetBufferedDecoder(in, opts...)
+	defer export.PutBufferedDecoder(dec)
+	xd := export.Decoder(dec)
+	return unmarshalFull(dec, out, &xd.Struct)
 }
 
 // UnmarshalRead deserializes a Go value from an [io.Reader] according to the
@@ -366,15 +378,16 @@ func Unmarshal(in []byte, out any, opts ...Options) (err error) {
 // without reporting an error for EOF. The output must be a non-nil pointer.
 // See [Unmarshal] for details about the conversion of JSON into a Go value.
 func UnmarshalRead(in io.Reader, out any, opts ...Options) (err error) {
-	dec := getStreamingDecoder(in, opts...)
-	defer putStreamingDecoder(dec)
-	return unmarshalFull(dec, out, &dec.options)
+	dec := export.GetStreamingDecoder(in, opts...)
+	defer export.PutStreamingDecoder(dec)
+	xd := export.Decoder(dec)
+	return unmarshalFull(dec, out, &xd.Struct)
 }
 
 func unmarshalFull(in *Decoder, out any, uo *jsonopts.Struct) error {
 	switch err := unmarshalDecode(in, out, uo); err {
 	case nil:
-		return in.checkEOF()
+		return export.Decoder(in).CheckEOF()
 	case io.EOF:
 		return io.ErrUnexpectedEOF
 	default:
@@ -394,7 +407,8 @@ func UnmarshalDecode(in *Decoder, out any, opts ...Options) (err error) {
 	uo := getStructOptions()
 	defer putStructOptions(uo)
 	uo.Join(opts...)
-	uo.CopyCoderOptions(&in.options)
+	xd := export.Decoder(in)
+	uo.CopyCoderOptions(&xd.Struct)
 	return unmarshalDecode(in, out, uo)
 }
 
@@ -420,8 +434,9 @@ func unmarshalDecode(in *Decoder, out any, uo *jsonopts.Struct) (err error) {
 		unmarshal, _ = uo.Unmarshalers.(*Unmarshalers).lookup(unmarshal, t)
 	}
 	if err := unmarshal(in, va, uo); err != nil {
-		if !in.options.Flags.Get(jsonflags.AllowDuplicateNames) {
-			in.tokens.invalidateDisabledNamespaces()
+		xd := export.Decoder(in)
+		if !xd.Flags.Get(jsonflags.AllowDuplicateNames) {
+			xd.Tokens.InvalidateDisabledNamespaces()
 		}
 		return err
 	}
@@ -472,3 +487,29 @@ func lookupArshaler(t reflect.Type) *arshaler {
 	v, _ := lookupArshalerCache.LoadOrStore(t, fncs)
 	return v.(*arshaler)
 }
+
+var stringsPools = &sync.Pool{New: func() any { return new(stringSlice) }}
+
+type stringSlice []string
+
+// getStrings returns a non-nil pointer to a slice with length n.
+func getStrings(n int) *stringSlice {
+	s := stringsPools.Get().(*stringSlice)
+	if cap(*s) < n {
+		*s = make([]string, n)
+	}
+	*s = (*s)[:n]
+	return s
+}
+
+func putStrings(s *stringSlice) {
+	if cap(*s) > 1<<10 {
+		*s = nil // avoid pinning arbitrarily large amounts of memory
+	}
+	stringsPools.Put(s)
+}
+
+// Sort sorts the string slice according to RFC 8785, section 3.2.3.
+func (ss *stringSlice) Sort() {
+	slices.SortFunc(*ss, func(x, y string) int { return jsonwire.CompareUTF16(x, y) })
+}
```
69 changes: 37 additions & 32 deletions arshal_any.go
```diff
@@ -9,6 +9,7 @@ import (
 
 	"github.com/go-json-experiment/json/internal/jsonflags"
 	"github.com/go-json-experiment/json/internal/jsonopts"
+	"github.com/go-json-experiment/json/internal/jsonwire"
 )
 
 // This file contains an optimized marshal and unmarshal implementation
@@ -48,8 +49,9 @@ func unmarshalValueAny(dec *Decoder, uo *jsonopts.Struct) (any, error) {
 	case '[':
 		return unmarshalArrayAny(dec, uo)
 	default:
-		var flags valueFlags
-		val, err := dec.readValue(&flags)
+		xd := export.Decoder(dec)
+		var flags jsonwire.ValueFlags
+		val, err := xd.ReadValue(&flags)
 		if err != nil {
 			return nil, err
 		}
@@ -61,13 +63,13 @@ func unmarshalValueAny(dec *Decoder, uo *jsonopts.Struct) (any, error) {
 		case 't':
 			return true, nil
 		case '"':
-			val = unescapeStringMayCopy(val, flags.isVerbatim())
-			if dec.stringCache == nil {
-				dec.stringCache = new(stringCache)
+			val = jsonwire.UnquoteMayCopy(val, flags.IsVerbatim())
+			if xd.StringCache == nil {
+				xd.StringCache = new(stringCache)
 			}
-			return dec.stringCache.make(val), nil
+			return makeString(xd.StringCache, val), nil
 		case '0':
-			fv, _ := parseFloat(val, 64) // ignore error since readValue guarantees val is valid
+			fv, _ := jsonwire.ParseFloat(val, 64) // ignore error since readValue guarantees val is valid
 			return fv, nil
 		default:
 			panic("BUG: invalid kind: " + k.String())
@@ -77,12 +79,13 @@ func unmarshalValueAny(dec *Decoder, uo *jsonopts.Struct) (any, error) {
 
 func marshalObjectAny(enc *Encoder, obj map[string]any, mo *jsonopts.Struct) error {
 	// Check for cycles.
-	if enc.tokens.depth() > startDetectingCyclesAfter {
+	xe := export.Encoder(enc)
+	if xe.Tokens.Depth() > startDetectingCyclesAfter {
 		v := reflect.ValueOf(obj)
-		if err := enc.seenPointers.visit(v); err != nil {
+		if err := visitPointer(&xe.SeenPointers, v); err != nil {
 			return err
 		}
-		defer enc.seenPointers.leave(v)
+		defer leavePointer(&xe.SeenPointers, v)
 	}
 
 	// Handle empty maps.
@@ -91,12 +94,12 @@ func marshalObjectAny(enc *Encoder, obj map[string]any, mo *jsonopts.Struct) err
 			return enc.WriteToken(Null)
 		}
 		// Optimize for marshaling an empty map without any preceding whitespace.
-		if !enc.options.Flags.Get(jsonflags.Expand) && !enc.tokens.last.needObjectName() {
-			enc.buf = enc.tokens.mayAppendDelim(enc.buf, '{')
-			enc.buf = append(enc.buf, "{}"...)
-			enc.tokens.last.increment()
-			if enc.needFlush() {
-				return enc.flush()
+		if !xe.Flags.Get(jsonflags.Expand) && !xe.Tokens.Last.NeedObjectName() {
+			xe.Buf = xe.Tokens.MayAppendDelim(xe.Buf, '{')
+			xe.Buf = append(xe.Buf, "{}"...)
+			xe.Tokens.Last.Increment()
+			if xe.NeedFlush() {
+				return xe.Flush()
 			}
 			return nil
 		}
@@ -107,8 +110,8 @@ func marshalObjectAny(enc *Encoder, obj map[string]any, mo *jsonopts.Struct) err
 	}
 	// A Go map guarantees that each entry has a unique key
 	// The only possibility of duplicates is due to invalid UTF-8.
-	if !enc.options.Flags.Get(jsonflags.AllowInvalidUTF8) {
-		enc.tokens.last.disableNamespace()
+	if !xe.Flags.Get(jsonflags.AllowInvalidUTF8) {
+		xe.Tokens.Last.DisableNamespace()
 	}
 	if !mo.Flags.Get(jsonflags.Deterministic) || len(obj) <= 1 {
 		for name, val := range obj {
@@ -153,11 +156,12 @@ func unmarshalObjectAny(dec *Decoder, uo *jsonopts.Struct) (map[string]any, erro
 	case 'n':
 		return nil, nil
 	case '{':
+		xd := export.Decoder(dec)
 		obj := make(map[string]any)
 		// A Go map guarantees that each entry has a unique key
 		// The only possibility of duplicates is due to invalid UTF-8.
-		if !dec.options.Flags.Get(jsonflags.AllowInvalidUTF8) {
-			dec.tokens.last.disableNamespace()
+		if !xd.Flags.Get(jsonflags.AllowInvalidUTF8) {
+			xd.Tokens.Last.DisableNamespace()
 		}
 		for dec.PeekKind() != '}' {
 			tok, err := dec.ReadToken()
@@ -168,9 +172,9 @@ func unmarshalObjectAny(dec *Decoder, uo *jsonopts.Struct) (map[string]any, erro
 
 			// Manually check for duplicate names.
 			if _, ok := obj[name]; ok {
-				name := dec.previousBuffer()
-				err := newDuplicateNameError(name)
-				return obj, err.withOffset(dec.InputOffset() - len64(name))
+				name := xd.PreviousBuffer()
+				err := export.NewDuplicateNameError(name, dec.InputOffset()-len64(name))
+				return obj, err
 			}
 
 			val, err := unmarshalValueAny(dec, uo)
@@ -189,12 +193,13 @@ func unmarshalObjectAny(dec *Decoder, uo *jsonopts.Struct) (map[string]any, erro
 
 func marshalArrayAny(enc *Encoder, arr []any, mo *jsonopts.Struct) error {
 	// Check for cycles.
-	if enc.tokens.depth() > startDetectingCyclesAfter {
+	xe := export.Encoder(enc)
+	if xe.Tokens.Depth() > startDetectingCyclesAfter {
 		v := reflect.ValueOf(arr)
-		if err := enc.seenPointers.visit(v); err != nil {
+		if err := visitPointer(&xe.SeenPointers, v); err != nil {
 			return err
 		}
-		defer enc.seenPointers.leave(v)
+		defer leavePointer(&xe.SeenPointers, v)
 	}
 
 	// Handle empty slices.
@@ -203,12 +208,12 @@ func marshalArrayAny(enc *Encoder, arr []any, mo *jsonopts.Struct) error {
 			return enc.WriteToken(Null)
 		}
 		// Optimize for marshaling an empty slice without any preceding whitespace.
-		if !enc.options.Flags.Get(jsonflags.Expand) && !enc.tokens.last.needObjectName() {
-			enc.buf = enc.tokens.mayAppendDelim(enc.buf, '[')
-			enc.buf = append(enc.buf, "[]"...)
-			enc.tokens.last.increment()
-			if enc.needFlush() {
-				return enc.flush()
+		if !xe.Flags.Get(jsonflags.Expand) && !xe.Tokens.Last.NeedObjectName() {
+			xe.Buf = xe.Tokens.MayAppendDelim(xe.Buf, '[')
+			xe.Buf = append(xe.Buf, "[]"...)
+			xe.Tokens.Last.Increment()
+			if xe.NeedFlush() {
+				return xe.Flush()
 			}
 			return nil
 		}
```
