viant/ptrie

Trie (Prefix tree)

This library is compatible with Go 1.11+

Please refer to CHANGELOG.md if you encounter breaking changes.

Motivation

The goal of this project is to provide a serverless-friendly prefix tree implementation, where one function can easily build a trie and publish it to cloud storage, and a second function can then load the trie to perform various operations.

Introduction

A trie (prefix tree) is a space-optimized tree data structure in which each node that is the only child is merged with its parent. Unlike regular trees (where whole keys are compared from their beginning up to the point of inequality), the key at each node is compared chunk by chunk.
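
To make the definition above concrete, here is a minimal sketch of a tree whose single-child chains are merged into one node, so that a lookup compares one chunk per node. It is an illustration only and not how ptrie is implemented: the `node` type and its `put` and `get` methods are hypothetical names, and string keys are used for brevity.

```go
package main

import "fmt"

// node is one vertex of a hypothetical compressed prefix tree.
// prefix holds the chunk of key bytes merged into this node.
type node struct {
	prefix   string
	value    interface{}
	hasValue bool
	children []*node
}

// commonLen returns the length of the shared leading chunk of a and b.
func commonLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// put stores value under key, splitting a child when only part of its chunk matches.
func (n *node) put(key string, value interface{}) {
	if key == "" {
		n.value, n.hasValue = value, true
		return
	}
	for _, c := range n.children {
		l := commonLen(c.prefix, key)
		if l == 0 {
			continue
		}
		if l < len(c.prefix) {
			// Keep the shared chunk in c and push the remainder into a new child.
			rest := &node{prefix: c.prefix[l:], value: c.value, hasValue: c.hasValue, children: c.children}
			c.prefix, c.value, c.hasValue, c.children = c.prefix[:l], nil, false, []*node{rest}
		}
		c.put(key[l:], value)
		return
	}
	// No child shares a first byte with key: store the whole remainder as one chunk.
	n.children = append(n.children, &node{prefix: key, value: value, hasValue: true})
}

// get walks the tree, consuming one chunk of key per node.
func (n *node) get(key string) (interface{}, bool) {
	if key == "" {
		return n.value, n.hasValue
	}
	for _, c := range n.children {
		if len(key) >= len(c.prefix) && key[:len(c.prefix)] == c.prefix {
			return c.get(key[len(c.prefix):])
		}
	}
	return nil, false
}

func main() {
	root := &node{}
	root.put("team", 1)
	root.put("test", 2)
	root.put("toast", 3)

	v, ok := root.get("test")
	fmt.Println(v, ok) // 2 true

	_, ok = root.get("te")
	fmt.Println(ok) // false: "te" is only an inner chunk, not a stored key
}
```

After the three inserts the root has a single child holding the chunk `t`, which is the merging described above: shared leading bytes are stored once.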

A prefix tree has the following applications:

  • text document searching
  • rule based matching
  • constructing associative arrays for string keys

Character comparison complexity:

  • Brute Force: O(d n k)
  • Prefix Trie: O(d log(k))

Where

  • d: number of characters in document
  • n: number of keywords
  • k: average keyword length
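
The brute-force term can be read directly off a naive matcher: every document offset (d) is tested against every keyword (n), byte by byte (up to k). The sketch below counts the work done by such a matcher and by a plain map-based trie scan over the same input. Both functions and the `trieNode` type are hypothetical and unrelated to ptrie's internals; the trie variant counts node lookups rather than raw byte comparisons, so the numbers are indicative and are not meant to reproduce the exact bound quoted above.

```go
package main

import "fmt"

// bruteForceCount tests every keyword at every document offset and returns
// the number of matches and the number of byte comparisons performed.
// Keywords are assumed to be non-empty.
func bruteForceCount(doc string, keywords []string) (matches, comparisons int) {
	for i := 0; i < len(doc); i++ { // d offsets
		for _, kw := range keywords { // n keywords
			j := 0
			for j < len(kw) && i+j < len(doc) { // up to k bytes
				comparisons++
				if doc[i+j] != kw[j] {
					break
				}
				j++
			}
			if j == len(kw) {
				matches++
			}
		}
	}
	return matches, comparisons
}

// trieNode is a node of a simple, uncompressed trie keyed by single bytes.
type trieNode struct {
	children map[byte]*trieNode
	terminal bool // true when a keyword ends at this node
}

// buildTrie inserts every keyword into a fresh trie.
func buildTrie(keywords []string) *trieNode {
	root := &trieNode{children: map[byte]*trieNode{}}
	for _, kw := range keywords {
		n := root
		for i := 0; i < len(kw); i++ {
			next, ok := n.children[kw[i]]
			if !ok {
				next = &trieNode{children: map[byte]*trieNode{}}
				n.children[kw[i]] = next
			}
			n = next
		}
		n.terminal = true
	}
	return root
}

// trieCount walks the trie from every document offset and returns the number
// of matches and the number of child lookups performed.
func trieCount(doc string, root *trieNode) (matches, lookups int) {
	for i := 0; i < len(doc); i++ {
		n := root
		for j := i; j < len(doc); j++ {
			lookups++
			next, ok := n.children[doc[j]]
			if !ok {
				break
			}
			n = next
			if n.terminal {
				matches++
			}
		}
	}
	return matches, lookups
}

func main() {
	doc := "the cat sat"
	keywords := []string{"cat", "sat", "at", "dog"}

	bm, bc := bruteForceCount(doc, keywords)
	tm, tl := trieCount(doc, buildTrie(keywords))
	fmt.Printf("brute force: %d matches, %d comparisons\n", bm, bc)
	fmt.Printf("trie:        %d matches, %d lookups\n", tm, tl)
}
```

The key difference is that the trie scan does not loop over the keywords at all: all keywords sharing a prefix are tested together as the walk descends.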

Usage

    trie := ptrie.New()
    for key, value := range pairs {
        if err := trie.Put(key, value); err != nil {
            log.Fatal(err)
        }
    }
    //...
    has := trie.Has(key)
    value, has := trie.Get(key)
    //...
    matched := trie.MatchAll(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })
    
  1. Building
    trie := ptrie.New()

    for key, value := range pairs {
        if err := trie.Put(key, value); err != nil {
            log.Fatal(err)
        }
    }

    // Serialize the trie into an in-memory buffer.
    writer := new(bytes.Buffer)
    if err := trie.Encode(writer); err != nil {
        log.Fatal(err)
    }
    encoded := writer.Bytes()
    // write the encoded data to storage
  2. Loading
    // V can be any type
    var v *V

    trie := ptrie.New()
    trie.UseType(reflect.TypeOf(v))
    if err := trie.Decode(reader); err != nil {
        log.Fatal(err)
    }
  3. Traversing (range map)
    trie.Walk(func(key []byte, value interface{}) bool {
        fmt.Printf("key: %s, value %v\n", key, value)
        return true
    })
  4. Lookup
    has := trie.Has(key)
    value, has := trie.Get(key)
  5. MatchPrefix
    var input []byte
    ...

    matched := trie.MatchPrefix(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })
  6. MatchAll
    var input []byte
    ...

    matched := trie.MatchAll(input, func(key []byte, value interface{}) bool {
        fmt.Printf("matched: key: %s, value %v\n", key, value)
        return true
    })
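
The difference between the two matching calls is easiest to see side by side. The sketch below is a standalone illustration of the assumed contract and does not use ptrie: it assumes (the snippets above do not spell this out) that MatchPrefix reports stored keys that are a prefix of the input, that MatchAll reports stored keys occurring anywhere in the input, and that returning false from the handler stops the search. The names `onMatch`, `match`, `matchPrefix` and `matchAll` are hypothetical, and a linear scan over a map deliberately stands in for the trie.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// onMatch mirrors the callback shape used above; returning false is
// assumed to stop the search.
type onMatch func(key []byte, value interface{}) bool

// sortedKeys returns the keys of pairs in lexical order so that the
// handler is called in a deterministic order.
func sortedKeys(pairs map[string]interface{}) []string {
	keys := make([]string, 0, len(pairs))
	for k := range pairs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// match calls handler for every key accepted by test and reports whether
// at least one key matched.
func match(pairs map[string]interface{}, input []byte, test func(input, key []byte) bool, handler onMatch) bool {
	matched := false
	for _, k := range sortedKeys(pairs) {
		key := []byte(k)
		if !test(input, key) {
			continue
		}
		matched = true
		if !handler(key, pairs[k]) {
			break // the handler asked to stop
		}
	}
	return matched
}

// matchPrefix reports keys that input starts with.
func matchPrefix(pairs map[string]interface{}, input []byte, handler onMatch) bool {
	return match(pairs, input, bytes.HasPrefix, handler)
}

// matchAll reports keys that occur anywhere in input.
func matchAll(pairs map[string]interface{}, input []byte, handler onMatch) bool {
	return match(pairs, input, bytes.Contains, handler)
}

func main() {
	pairs := map[string]interface{}{"dev": 1, "dev.domain": 2, "domain": 3, "org": 4}
	input := []byte("dev.domain.com")

	print := func(key []byte, value interface{}) bool {
		fmt.Printf("  key: %s, value %v\n", key, value)
		return true
	}
	fmt.Println("prefix matches:")
	matchPrefix(pairs, input, print) // dev, dev.domain
	fmt.Println("all matches:")
	matchAll(pairs, input, print) // dev, dev.domain, domain
}
```

A real trie answers both questions without visiting every stored key, which is the point of the data structure; the scan here only pins down what the results should look like.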

Benchmark

The benchmarks count all words that are part of the following extracts:

Lorem Ipsum

  1. Short: avg line size: 20, words: 13
  2. Long: avg line size: 711, words: 551

Benchmark_LoremBruteForceShort-8    	  500000	      3646 ns/op
Benchmark_LoremTrieShort-8          	  500000	      2376 ns/op
Benchmark_LoremBruteForceLong-8     	    1000	   1612877 ns/op
Benchmark_LoremTrieLong-8           	   10000	    119990 ns/op

Hamlet

  1. Short: avg line size: 20, words: 49
  2. Long: avg line size: 41, words: 105

Benchmark_HamletBruteForceShort-8   	   30000	     44306 ns/op
Benchmark_HamletTrieShort-8         	  100000	     18530 ns/op
Benchmark_HamletBruteForceLong-8    	   10000	    226836 ns/op
Benchmark_HamletTrieLong-8          	   50000	     39329 ns/op

License

The source code is made available under the terms of the Apache License, Version 2, as stated in the file LICENSE.

Individual files may be made available under their own specific license, all compatible with Apache License, Version 2. Please see individual files for details.

Credits and Acknowledgements

Library Author: Adrian Witas