Performance improvements #40

artem-ogre · 2021-07-13T06:58:22Z

@artem-ogre These numbers are only meant to be a rough measurement. Note that not all these libraries are reliable or support general constrained Delaunay triangulation.

 std::vector<std::pair<double, double>> vertices;
for(size_t i = 0; i < 1000; ++i)
{
    for(size_t j = 0; j < 1000; ++j)
    {
        vertices.emplace_back(i*1e-5, j*1e-5);
    }
}

 10x10 
Elapsed time poly2Tri: 0.00012235 s
Elapsed time CDT: 0.000653747 s
Elapsed time CGAL: 0.000221772 s
Elapsed time Delaunator: 6.9119e-05 s
Elapsed time Triangle: 0.000959387 s

20x20
 Elapsed time poly2Tri: 0.000755585 s
Elapsed time CDT: 0.00217176 s
Elapsed time CGAL: 0.000819186 s
Elapsed time Delaunator: 0.000294325 s
Elapsed time Triangle: 0.00246849 s 

100x100
 Elapsed time poly2Tri: 0.0116399 s
Elapsed time CDT: 0.113256 s
Elapsed time CGAL: 0.0159775 s
Elapsed time Delaunator: 0.00511762 s
Elapsed time Triangle: 0.0100751 s

100x1000 
Elapsed time poly2Tri: 0.106778 s
Elapsed time CDT: 8.49504 s
Elapsed time CGAL: 0.158438 s
Elapsed time Delaunator: 0.0808353 s
Elapsed time Triangle: 0.127281 s

1000x1000 
Elapsed time poly2Tri: 1.1229 s
Elapsed time CDT: 84.6357 s
Elapsed time CGAL: 1.6105 s
Elapsed time Delaunator: 0.964066 s
Elapsed time Triangle: 1.69101 s   

srand (time(NULL));
std::vector<std::pair<double, double>> vertices;
for(size_t i = 0; i < 10; ++i)
{
    for(size_t j = 0; j < 10; ++j)
    {
        bool inserted = false;
        while(!inserted)
        {
            size_t x = (rand() % 10000);
            size_t y = (rand() % 10000);
            auto it = std::find_if(vertices.begin(), vertices.end(), [&](const auto& xy){return x-xy.first < 1e-5 && y-xy.second < 1e-5;});
            if(it == vertices.end())
            {
                vertices.emplace_back(x*1e-5, y*1e-5);
                inserted = true;
            }
        }
    }
}

10x10 
Elapsed time poly2Tri: 0.000154201 s
Elapsed time CDT: 0.000439808 s
Elapsed time CGAL: 9.577e-05 s
Elapsed time Delaunator: 6.046e-05 s
Elapsed time Triangle: 0.000652088 s

  100x100
 Elapsed time poly2Tri: 0.0123536 s
Elapsed time CDT: 0.0613802 s
Elapsed time CGAL: 0.00735314 s
Elapsed time Delaunator: 0.00525829 s
Elapsed time Triangle: 0.0123408 s 

100x1000 
Elapsed time poly2Tri: 0.168798 s
Elapsed time CDT: 1.06543 s
Elapsed time CGAL: 0.0847132 s
Elapsed time Delaunator: 0.0803343 s
Elapsed time Triangle: 0.191198 s 

500x500
 Elapsed time poly2Tri: 0.448719 s
Elapsed time CDT: 3.0024 s
Elapsed time CGAL: 0.213624 s
Elapsed time Delaunator: 0.238802 s
Elapsed time Triangle: 0.560273 s

Originally posted by @AndreFecteau in #37 (comment)

The text was updated successfully, but these errors were encountered:

artem-ogre · 2021-07-13T06:59:24Z

More information can be found in: #37

starseeker · 2021-08-02T23:08:45Z

It might be worth taking a look at https://github.com/nushoin/RTree to see if it could be of use for this purpose - I ended up using a slightly modded version of the https://github.com/DevHwan/RTree version for some work I did a couple years back, and it did OK (although I didn't hammer it with the sort of benchmarks you've got there, so I can't say definitely how it would compare.) I was using poly2tri for the CDT, but I did do quite a lot of data management and lookups using that RTree and it seemed to work fairly well.

artem-ogre · 2021-08-03T07:22:34Z

Thanks for the suggestions @starseeker. I wander how cgal and Triangle do it. I doubt that some other rtree implementation will greatly outperform boost's...

artem-ogre · 2021-08-06T09:46:16Z

I got a little bit of time to do the profiling. From the data it seems that boost::rtree performance is not a bottleneck.

Original interactive flame graph as ZIP.

Flame Graph

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 1000;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            vertices.push_back(V2d::make(i * 1e-5, j * 1e-5));
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

AndreFecteau · 2021-08-06T12:42:10Z

I should of been more descriptive, you are correct that is flip needed is a larger issue when using a "quad" grid. Try profiling using random numbers and you will see the one that I posted earlier. What I expect is happening (a total guess) in this case is that when you have 4 points on a circle the Delaunay triangulation always flip the triangulation when something happens to any of the neighbour triangles. Since this entire mesh is made of 4 point Delaunay edge cases then there is a lot of flipping on every input. This would not happen in a random mesh.

AndreFecteau · 2021-08-06T12:50:33Z

I have also been implementing a simple KD-Tree and linking to CDT to see if my expectations were correct. From first benchmarking it seems like my KD-Tree for now was more efficient by a large margin but I don't remember the exact numbers and not really tested so far. I might have time this weekend to continue working on it and will update you with the numbers when I have them.

artem-ogre · 2021-08-07T20:36:26Z

@AndreFecteau
Ah, right! I mindlessly grabbed the first benchmark as it showed some really bad timings. With random coordinates rtree indeed consumes most of the time.

I will post the flame graph and full benchmark later for documentation.

Vertices placed on a regular grid is an interesting but exotic corner case. Random vertices better reflect the common usage of triangulation. So thank you very much for your help with it. I am looking forward for the results.

AndreFecteau · 2021-08-08T21:57:12Z

Attached you can find the comparison of the 2 implementations. The one with the KDTree (CDT) and the Boost:RTree (CDT2). Theres still a lot of work left before I can push the KDTree to a git repo so it can be added to this project, but it's definitely a good proof of concept for now. Note: this flame snippet is missing the erase super-triangle section of the code that is considerable.

artem-ogre · 2021-08-26T07:12:39Z

Here's the random benchmark data.

Flame Graph (insertVertices call only)

Original interactive flame graph

flamegraph_rand.zip

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 500;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            bool inserted = false;
            while(!inserted)
            {
                size_t x = (rand() % 10000);
                size_t y = (rand() % 10000);
                auto it = std::find_if(
                    vertices.begin(), vertices.end(), [&](const auto& xy) {
                        return x == xy.x && y == xy.y;
                    });
                if(it == vertices.end())
                {
                    vertices.push_back(V2d::make(x * 1e-5, y * 1e-5));
                    inserted = true;
                }
            }
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

artem-ogre · 2021-08-26T07:15:00Z

Interesting info: how triangle location is implemented in CGAL

AndreFecteau · 2021-08-30T21:12:57Z

From memory, the timings that I show in the CGAL benchmarks is using the the range inserter for vertices. I would be interesting to see if inserting points one at a time would amount to the same efficiency. I believe that there are much faster algorithms for Delaunay triangulation when you are able to sort the data points initially.

artem-ogre · 2021-09-29T09:46:25Z

Note on shuffling vertices and performance

CDT does not perform vertex shuffling which other implementations might do by default. For example a benchmark with vertices on a grid gets at least 2x faster when vertices are shuffled:

std::random_device rd;
std::mt19937 g(rd());
std::shuffle(vertices.begin(), vertices.end(), g); // shuffle vertices

On some of my inputs (very long and narrow strip with very dense points - looks like a road) I observed up to 300x speed-ups 🚀 😱

Currently users need to shuffle vertices themselves if it improves performance for their inputs. I am considering including shuffling into the library and make it on by default with a possibility to opt-out. Still thinking about a good and flexible way to implement this.

AndreFecteau · 2021-09-30T17:59:44Z

I am almost done implementing the KDTree to incorporate in CDT (If you want to) and I have a few questions for you (@artem-ogre ). The most up to date branch is https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree. It is definitely not complete but I want to know:

Are you ok with this being integrated in CDT?
How do you want the external library to be stored in the files? I currently set up an external folders for the files.
Do you want the hole project included or just the minimalistic files?
Are you ok if I remove the RTree since the KDTree is faster?

artem-ogre · 2021-10-02T13:16:26Z

@AndreFecteau
First, sorry for somehow slow reply: my current personal circumstances are such that there is too little time I can dedicate to the project.

Neat, thanks for the nice effort! I had a quick look. I myself also came to the conclusion that point locator should be a customization point for Triangulation class (template parameter as you done or an interface).

I am working on randomized vertex insertion right now and have a PoC. The change I've done requires re-factoring 'random-nearest' point locator. So I think it is a good opportunity to implement custom point-locator support.
I intend to implement point locator as template parameter and re-write boost::rtree and nearest random locators as stand-alone classes (similar to your branch).

Then when we have an interface in place you could do minimal changes to make it work with your KDTree implementation. And we could take it forward from there!

Are you ok with this being integrated in CDT?

Yes, but we would need to do the code review and re-write. There are likely going to be changes to be done: conventions, naming, formatting, supporting pre-c++11 standard, etc. Given my limited availability, it will take time to re-factor and merge. It is good to have it in the fork for the time being so that it is immediately usable for those who need it (and not constrained by ancient c++ standard 😃 ). And let's try to move the fork upstream when we can.
By the way, have you investigated alternative structures such as quad-trees or grids? I would also like to explore the solution space more before replacing boost's rtree.

How do you want the external library to be stored in the files? I currently set up an external folders for the files.

I will re-implement current locators so we'll have a reference implementation. I like having it in separate files but they don't have to be 'external', could go right into 'include' folder. Something to think about still.

Do you want the hole project included or just the minimalistic files?

Not sure I understand this question.

Are you ok if I remove the RTree since the KDTree is faster?

Yes, totally it can be removed. It could also be left there just in case, but if other approach is clearly superior I don't see why to keep boost rtree.

AndreFecteau · 2021-10-06T13:58:07Z

my current personal circumstances are such that there is too little time I can dedicate to the project.

@artem-ogre There is no real rush on my end to get this in, obviously I am using this software and better efficiency is always nice but it is not crucial.

template parameter as you done or an interface

I agree that having the point locator as a template parameter will help in the future test out different libraries, I would not be surprised that there are faster ones out there for this specific purpose.

conventions, naming, formatting, supporting pre-c++11 standard, etc

Let me know what conventions need to be supported and I can change it. I hope I don't loose smart pointers but if I have to I can remove them.

have you investigated alternative structures such as quad-trees or grids

I have not and I think that it would be worthwhile, unfortunately at my current employment I have written multiple quad-trees and I'm trying to avoid having too similar work on personal projects.

- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code

- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard

- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit - 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard

artem-ogre · 2021-10-21T19:41:44Z

@AndreFecteau

First, thank you for your kd-tree implementation! It was easy to use after I extracted locator into a template parameter. And the code was easy to follow. 👍

I took it as a starting point and ended up re-writing most of the code, but the ideas and algorithms are the same. Performance is also about the same.

Given this situation how would you like to be attributed? I will add a mention in contributors list in README. We can also make it so that you author kd-tree code (maybe by doing a PR) so that your user-name shows up in contributors list auto-generated by github.

Could you maybe take a look at the PR #45?
Here are some notable differences to your implementation:

KDTree class is now tightly integrated with CDT: this enables simpler code and better efficiency (e.g., re-using points buffer, see points below)
All functionality is in a single header: CDT/include/KDTree.h
nodes are stored in a single buffer and addressed with indices
recursive methods (insertion and nearest query) are changed to loop-based where possible (this looks neater in profiling and can be slightly more efficient)
requires less memory: nodes only store children indices (not a leaf) and vector of points (leaf)
point coordinates are not stored in kd-tree. Instead point-buffer from CDT::Triangulation is re-used. This allows to save quite some memory too
unique_ptr are not used anymore, kd-tree should easily support value-semantics (e.g., copy-able)
c++98 compatibility

artem-ogre · 2021-10-21T19:46:44Z

The PR also contains a feature of randomizing points insertion. For some datasets it can provides huge performance gains.
Here's an example long-narrow-stripe.txt
With randomization speed up is around 300x. With randomization+kd-tree it is about 500x 😮

AndreFecteau · 2021-10-21T23:56:22Z

@artem-ogre I took a quick look at the new KDTree that you implemented and I can't claim credit for that. Not much of my implementation is left (Not to worry, I can definitely see the benefits of having this integrated). Just a mention in the readme would be great.

I'll probably take a deeper dive this weekend since I'm always curious on other ways of programming stuff. Few things that I'm curious about.

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

artem-ogre · 2021-10-22T08:44:18Z

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

This was my hope as well, but I could not see a noticeable speed improvement with the vector. I still like the simplicity and how easy it is to copy compared to pointers.

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

Yes, depending on the vector implementation it could consume 1.5x-2x more memory than needed for storing it's elements. This is a trade-off that sounds scary in theory but is almost always not a problem in practice (in my experience).
If it really becomes a problem: switching to some kind of list container and using iterators instead of indices should be the same as using individual allocation for each node, I guess.

artem-ogre · 2021-10-22T08:48:07Z

@AndreFecteau I'm trying to figure out how to add you as a reviewer to the PR. It looks like you should be the collaborator for this to work (I've sent an invite)

…wise)

Make initial stack size and stack increment when re-sizing template parameters Performance is the same

…ee, nearest-random)

Andre Fecteau contributed original KD-tree implementation: https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree It was heavily modified and included into CDT source.

Add Andre Fecteau to contributors list

- Implement randomized vertex insertion option and make it default - Allow custom nearest point locators - Implement boost::rtree and random-nearest as locators - Implement kd-tree for nearest point location - Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit - 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE - CDT code refactoring, some APIs changed - Make necessary adjustments to visualizer code - Compatibility with c++98 standard

…wise)

Make initial stack size and stack increment when re-sizing template parameters Performance is the same

…ee, nearest-random)

Andre Fecteau contributed original KD-tree implementation: https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree It was heavily modified and included into CDT source.

Add Andre Fecteau to contributors list

artem-ogre · 2021-10-26T17:47:22Z

@AndreFecteau I closed this issue. Thank you for all the help! :)

artem-ogre added the enhancement New feature or request label Jul 13, 2021

artem-ogre mentioned this issue Aug 6, 2021

Missing element in triangulation of point cloud due to small super-triangle #37

Closed

artem-ogre linked a pull request Oct 21, 2021 that will close this issue

#40 Performance improvements #45

Merged

artem-ogre mentioned this issue Oct 21, 2021

#40 Performance improvements #45

Merged

artem-ogre added a commit that referenced this issue Oct 22, 2021

#40 Assert that max stack depth is not exceeded (stack overflow other…

21ca8bb

…wise)

artem-ogre added a commit that referenced this issue Oct 23, 2021

#40 Fix a typo 'addNexVertex' -> 'addNewVertex'

b3143fb

artem-ogre added a commit that referenced this issue Oct 23, 2021

#40 Make nearest-task stack dynamic (array to vector)

a4c7808

Make initial stack size and stack increment when re-sizing template parameters Performance is the same

artem-ogre added a commit that referenced this issue Oct 23, 2021

#40 Use default KDTree template parameters in LocatorKDTree

34b9992

artem-ogre added a commit that referenced this issue Oct 24, 2021

#40 Make KDTree locator default and delete other locators (boost::rtr…

d9e1960

…ee, nearest-random)

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Update README to reflect the introduced changes

bbcf5c4

Add Andre Fecteau to contributors list

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Update README to reflect the introduced changes

43dd70c

Add Andre Fecteau to contributors list

artem-ogre closed this as completed in #45 Oct 26, 2021

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Assert that max stack depth is not exceeded (stack overflow other…

7161006

…wise)

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Fix a typo 'addNexVertex' -> 'addNewVertex'

c38695b

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Make nearest-task stack dynamic (array to vector)

6a0215e

Make initial stack size and stack increment when re-sizing template parameters Performance is the same

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Use default KDTree template parameters in LocatorKDTree

048ef0b

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Make KDTree locator default and delete other locators (boost::rtr…

e97845a

…ee, nearest-random)

artem-ogre added a commit that referenced this issue Oct 26, 2021

#40 Update README to reflect the introduced changes

ff12d27

Add Andre Fecteau to contributors list

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements #40

Performance improvements #40

artem-ogre commented Jul 13, 2021

artem-ogre commented Jul 13, 2021

starseeker commented Aug 2, 2021

artem-ogre commented Aug 3, 2021 •

edited

Loading

artem-ogre commented Aug 6, 2021 •

edited

Loading

AndreFecteau commented Aug 6, 2021

AndreFecteau commented Aug 6, 2021

artem-ogre commented Aug 7, 2021

AndreFecteau commented Aug 8, 2021 •

edited

Loading

artem-ogre commented Aug 26, 2021

artem-ogre commented Aug 26, 2021 •

edited

Loading

AndreFecteau commented Aug 30, 2021

artem-ogre commented Sep 29, 2021 •

edited

Loading

AndreFecteau commented Sep 30, 2021

artem-ogre commented Oct 2, 2021 •

edited

Loading

AndreFecteau commented Oct 6, 2021

artem-ogre commented Oct 21, 2021 •

edited

Loading

artem-ogre commented Oct 21, 2021

AndreFecteau commented Oct 21, 2021

artem-ogre commented Oct 22, 2021 •

edited

Loading

artem-ogre commented Oct 22, 2021

artem-ogre commented Oct 26, 2021

Performance improvements #40

Performance improvements #40

Comments

artem-ogre commented Jul 13, 2021

artem-ogre commented Jul 13, 2021

starseeker commented Aug 2, 2021

artem-ogre commented Aug 3, 2021 • edited Loading

artem-ogre commented Aug 6, 2021 • edited Loading

AndreFecteau commented Aug 6, 2021

AndreFecteau commented Aug 6, 2021

artem-ogre commented Aug 7, 2021

AndreFecteau commented Aug 8, 2021 • edited Loading

artem-ogre commented Aug 26, 2021

artem-ogre commented Aug 26, 2021 • edited Loading

AndreFecteau commented Aug 30, 2021

artem-ogre commented Sep 29, 2021 • edited Loading

Note on shuffling vertices and performance

AndreFecteau commented Sep 30, 2021

artem-ogre commented Oct 2, 2021 • edited Loading

AndreFecteau commented Oct 6, 2021

artem-ogre commented Oct 21, 2021 • edited Loading

artem-ogre commented Oct 21, 2021

AndreFecteau commented Oct 21, 2021

artem-ogre commented Oct 22, 2021 • edited Loading

artem-ogre commented Oct 22, 2021

artem-ogre commented Oct 26, 2021

artem-ogre commented Aug 3, 2021 •

edited

Loading

artem-ogre commented Aug 6, 2021 •

edited

Loading

AndreFecteau commented Aug 8, 2021 •

edited

Loading

artem-ogre commented Aug 26, 2021 •

edited

Loading

artem-ogre commented Sep 29, 2021 •

edited

Loading

artem-ogre commented Oct 2, 2021 •

edited

Loading

artem-ogre commented Oct 21, 2021 •

edited

Loading

artem-ogre commented Oct 22, 2021 •

edited

Loading