Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements #40

Closed
artem-ogre opened this issue Jul 13, 2021 · 21 comments · Fixed by #45
Closed

Performance improvements #40

artem-ogre opened this issue Jul 13, 2021 · 21 comments · Fixed by #45
Labels
enhancement New feature or request

Comments

@artem-ogre
Copy link
Owner

@artem-ogre These numbers are only meant to be a rough measurement. Note that not all these libraries are reliable or support general constrained Delaunay triangulation.


std::vector<std::pair<double, double>> vertices;
for(size_t i = 0; i < 1000; ++i)
{
    for(size_t j = 0; j < 1000; ++j)
    {
        vertices.emplace_back(i*1e-5, j*1e-5);
    }
}


10x10

Elapsed time poly2Tri: 0.00012235 s
Elapsed time CDT: 0.000653747 s
Elapsed time CGAL: 0.000221772 s
Elapsed time Delaunator: 6.9119e-05 s
Elapsed time Triangle: 0.000959387 s

20x20

Elapsed time poly2Tri: 0.000755585 s
Elapsed time CDT: 0.00217176 s
Elapsed time CGAL: 0.000819186 s
Elapsed time Delaunator: 0.000294325 s
Elapsed time Triangle: 0.00246849 s


100x100

Elapsed time poly2Tri: 0.0116399 s
Elapsed time CDT: 0.113256 s
Elapsed time CGAL: 0.0159775 s
Elapsed time Delaunator: 0.00511762 s
Elapsed time Triangle: 0.0100751 s

100x1000

Elapsed time poly2Tri: 0.106778 s
Elapsed time CDT: 8.49504 s
Elapsed time CGAL: 0.158438 s
Elapsed time Delaunator: 0.0808353 s
Elapsed time Triangle: 0.127281 s

1000x1000

Elapsed time poly2Tri: 1.1229 s
Elapsed time CDT: 84.6357 s
Elapsed time CGAL: 1.6105 s
Elapsed time Delaunator: 0.964066 s
Elapsed time Triangle: 1.69101 s




srand (time(NULL));
std::vector<std::pair<double, double>> vertices;
for(size_t i = 0; i < 10; ++i)
{
    for(size_t j = 0; j < 10; ++j)
    {
        bool inserted = false;
        while(!inserted)
        {
            size_t x = (rand() % 10000);
            size_t y = (rand() % 10000);
            auto it = std::find_if(vertices.begin(), vertices.end(), [&](const auto& xy){return x-xy.first < 1e-5 && y-xy.second < 1e-5;});
            if(it == vertices.end())
            {
                vertices.emplace_back(x*1e-5, y*1e-5);
                inserted = true;
            }
        }
    }
}

10x10

Elapsed time poly2Tri: 0.000154201 s
Elapsed time CDT: 0.000439808 s
Elapsed time CGAL: 9.577e-05 s
Elapsed time Delaunator: 6.046e-05 s
Elapsed time Triangle: 0.000652088 s



100x100

Elapsed time poly2Tri: 0.0123536 s
Elapsed time CDT: 0.0613802 s
Elapsed time CGAL: 0.00735314 s
Elapsed time Delaunator: 0.00525829 s
Elapsed time Triangle: 0.0123408 s


100x1000

Elapsed time poly2Tri: 0.168798 s
Elapsed time CDT: 1.06543 s
Elapsed time CGAL: 0.0847132 s
Elapsed time Delaunator: 0.0803343 s
Elapsed time Triangle: 0.191198 s


500x500

Elapsed time poly2Tri: 0.448719 s
Elapsed time CDT: 3.0024 s
Elapsed time CGAL: 0.213624 s
Elapsed time Delaunator: 0.238802 s
Elapsed time Triangle: 0.560273 s

Originally posted by @AndreFecteau in #37 (comment)

@artem-ogre
Copy link
Owner Author

More information can be found in: #37

@artem-ogre artem-ogre added the enhancement New feature or request label Jul 13, 2021
@starseeker
Copy link

It might be worth taking a look at https://github.com/nushoin/RTree to see if it could be of use for this purpose - I ended up using a slightly modded version of the https://github.com/DevHwan/RTree version for some work I did a couple years back, and it did OK (although I didn't hammer it with the sort of benchmarks you've got there, so I can't say definitely how it would compare.) I was using poly2tri for the CDT, but I did do quite a lot of data management and lookups using that RTree and it seemed to work fairly well.

@artem-ogre
Copy link
Owner Author

artem-ogre commented Aug 3, 2021

Thanks for the suggestions @starseeker. I wander how cgal and Triangle do it. I doubt that some other rtree implementation will greatly outperform boost's...

@artem-ogre
Copy link
Owner Author

artem-ogre commented Aug 6, 2021

I got a little bit of time to do the profiling. From the data it seems that boost::rtree performance is not a bottleneck.

Original interactive flame graph as ZIP.

Flame Graph

flamegraph

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 1000;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            vertices.push_back(V2d::make(i * 1e-5, j * 1e-5));
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

@AndreFecteau
Copy link

I should of been more descriptive, you are correct that is flip needed is a larger issue when using a "quad" grid. Try profiling using random numbers and you will see the one that I posted earlier. What I expect is happening (a total guess) in this case is that when you have 4 points on a circle the Delaunay triangulation always flip the triangulation when something happens to any of the neighbour triangles. Since this entire mesh is made of 4 point Delaunay edge cases then there is a lot of flipping on every input. This would not happen in a random mesh.

@AndreFecteau
Copy link

I have also been implementing a simple KD-Tree and linking to CDT to see if my expectations were correct. From first benchmarking it seems like my KD-Tree for now was more efficient by a large margin but I don't remember the exact numbers and not really tested so far. I might have time this weekend to continue working on it and will update you with the numbers when I have them.

@artem-ogre
Copy link
Owner Author

@AndreFecteau
Ah, right! I mindlessly grabbed the first benchmark as it showed some really bad timings. With random coordinates rtree indeed consumes most of the time.

I will post the flame graph and full benchmark later for documentation.

Vertices placed on a regular grid is an interesting but exotic corner case. Random vertices better reflect the common usage of triangulation. So thank you very much for your help with it. I am looking forward for the results.

@AndreFecteau
Copy link

AndreFecteau commented Aug 8, 2021

Attached you can find the comparison of the 2 implementations. The one with the KDTree (CDT) and the Boost:RTree (CDT2). Theres still a lot of work left before I can push the KDTree to a git repo so it can be added to this project, but it's definitely a good proof of concept for now. Note: this flame snippet is missing the erase super-triangle section of the code that is considerable.

Screen Shot 2021-08-08 at 5 53 05 PM

@artem-ogre
Copy link
Owner Author

Here's the random benchmark data.

Flame Graph (insertVertices call only)

flamegraph_rand

Original interactive flame graph

flamegraph_rand.zip

Benchmark Code

#include "CDT.h"

typedef double CoordType;
typedef CDT::Triangulation<CoordType> Triangulation;
typedef CDT::V2d<CoordType> V2d;
typedef CDT::Vertex<CoordType> Vertex;
typedef CDT::Triangle Triangle;
typedef CDT::Box2d<CoordType> Box2d;
typedef CDT::Edge Edge;

#include <chrono>
#include <iostream>

int main(int argc, char* argv[])
{
    const size_t N = 500;
    std::vector<V2d> vertices;
    vertices.reserve(N * N);
    for(size_t i = 0; i < N; ++i)
    {
        for(size_t j = 0; j < N; ++j)
        {
            bool inserted = false;
            while(!inserted)
            {
                size_t x = (rand() % 10000);
                size_t y = (rand() % 10000);
                auto it = std::find_if(
                    vertices.begin(), vertices.end(), [&](const auto& xy) {
                        return x == xy.x && y == xy.y;
                    });
                if(it == vertices.end())
                {
                    vertices.push_back(V2d::make(x * 1e-5, y * 1e-5));
                    inserted = true;
                }
            }
        }
    }

    using std::chrono::duration;
    using std::chrono::duration_cast;
    using std::chrono::high_resolution_clock;
    using std::chrono::milliseconds;
    auto t1 = high_resolution_clock::now();

    Triangulation triangulation(CDT::FindingClosestPoint::BoostRTree);
    triangulation.insertVertices(vertices);

    auto t2 = high_resolution_clock::now();
    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_int.count() << "ms\n";

    return 0;
}

@artem-ogre
Copy link
Owner Author

artem-ogre commented Aug 26, 2021

Interesting info: how triangle location is implemented in CGAL

@AndreFecteau
Copy link

From memory, the timings that I show in the CGAL benchmarks is using the the range inserter for vertices. I would be interesting to see if inserting points one at a time would amount to the same efficiency. I believe that there are much faster algorithms for Delaunay triangulation when you are able to sort the data points initially.

@artem-ogre
Copy link
Owner Author

artem-ogre commented Sep 29, 2021

Note on shuffling vertices and performance

CDT does not perform vertex shuffling which other implementations might do by default. For example a benchmark with vertices on a grid gets at least 2x faster when vertices are shuffled:

std::random_device rd;
std::mt19937 g(rd());
std::shuffle(vertices.begin(), vertices.end(), g); // shuffle vertices

On some of my inputs (very long and narrow strip with very dense points - looks like a road) I observed up to 300x speed-ups 🚀 😱

Currently users need to shuffle vertices themselves if it improves performance for their inputs. I am considering including shuffling into the library and make it on by default with a possibility to opt-out. Still thinking about a good and flexible way to implement this.

@AndreFecteau
Copy link

I am almost done implementing the KDTree to incorporate in CDT (If you want to) and I have a few questions for you (@artem-ogre ). The most up to date branch is https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree. It is definitely not complete but I want to know:

Are you ok with this being integrated in CDT?
How do you want the external library to be stored in the files? I currently set up an external folders for the files.
Do you want the hole project included or just the minimalistic files?
Are you ok if I remove the RTree since the KDTree is faster?

@artem-ogre
Copy link
Owner Author

artem-ogre commented Oct 2, 2021

@AndreFecteau
First, sorry for somehow slow reply: my current personal circumstances are such that there is too little time I can dedicate to the project.

Neat, thanks for the nice effort! I had a quick look. I myself also came to the conclusion that point locator should be a customization point for Triangulation class (template parameter as you done or an interface).

I am working on randomized vertex insertion right now and have a PoC. The change I've done requires re-factoring 'random-nearest' point locator. So I think it is a good opportunity to implement custom point-locator support.
I intend to implement point locator as template parameter and re-write boost::rtree and nearest random locators as stand-alone classes (similar to your branch).

Then when we have an interface in place you could do minimal changes to make it work with your KDTree implementation. And we could take it forward from there!

Are you ok with this being integrated in CDT?

Yes, but we would need to do the code review and re-write. There are likely going to be changes to be done: conventions, naming, formatting, supporting pre-c++11 standard, etc. Given my limited availability, it will take time to re-factor and merge. It is good to have it in the fork for the time being so that it is immediately usable for those who need it (and not constrained by ancient c++ standard 😃 ). And let's try to move the fork upstream when we can.
By the way, have you investigated alternative structures such as quad-trees or grids? I would also like to explore the solution space more before replacing boost's rtree.

How do you want the external library to be stored in the files? I currently set up an external folders for the files.

I will re-implement current locators so we'll have a reference implementation. I like having it in separate files but they don't have to be 'external', could go right into 'include' folder. Something to think about still.

Do you want the hole project included or just the minimalistic files?

Not sure I understand this question.

Are you ok if I remove the RTree since the KDTree is faster?

Yes, totally it can be removed. It could also be left there just in case, but if other approach is clearly superior I don't see why to keep boost rtree.

@AndreFecteau
Copy link

my current personal circumstances are such that there is too little time I can dedicate to the project.

@artem-ogre There is no real rush on my end to get this in, obviously I am using this software and better efficiency is always nice but it is not crucial.

template parameter as you done or an interface

I agree that having the point locator as a template parameter will help in the future test out different libraries, I would not be surprised that there are faster ones out there for this specific purpose.

conventions, naming, formatting, supporting pre-c++11 standard, etc

Let me know what conventions need to be supported and I can change it. I hope I don't loose smart pointers but if I have to I can remove them.

have you investigated alternative structures such as quad-trees or grids

I have not and I think that it would be worthwhile, unfortunately at my current employment I have written multiple quad-trees and I'm trying to avoid having too similar work on personal projects.

artem-ogre added a commit that referenced this issue Oct 20, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
artem-ogre added a commit that referenced this issue Oct 21, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
artem-ogre added a commit that referenced this issue Oct 21, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
- Compatibility with c++98 standard
artem-ogre added a commit that referenced this issue Oct 21, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
- Compatibility with c++98 standard
artem-ogre added a commit that referenced this issue Oct 21, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
- Compatibility with c++98 standard
@artem-ogre artem-ogre linked a pull request Oct 21, 2021 that will close this issue
artem-ogre added a commit that referenced this issue Oct 21, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit
- 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
- Compatibility with c++98 standard
@artem-ogre
Copy link
Owner Author

artem-ogre commented Oct 21, 2021

@AndreFecteau

First, thank you for your kd-tree implementation! It was easy to use after I extracted locator into a template parameter. And the code was easy to follow. 👍

I took it as a starting point and ended up re-writing most of the code, but the ideas and algorithms are the same. Performance is also about the same.

Given this situation how would you like to be attributed? I will add a mention in contributors list in README. We can also make it so that you author kd-tree code (maybe by doing a PR) so that your user-name shows up in contributors list auto-generated by github.

Could you maybe take a look at the PR #45?
Here are some notable differences to your implementation:

  • KDTree class is now tightly integrated with CDT: this enables simpler code and better efficiency (e.g., re-using points buffer, see points below)
  • All functionality is in a single header: CDT/include/KDTree.h
  • nodes are stored in a single buffer and addressed with indices
  • recursive methods (insertion and nearest query) are changed to loop-based where possible (this looks neater in profiling and can be slightly more efficient)
  • requires less memory: nodes only store children indices (not a leaf) and vector of points (leaf)
  • point coordinates are not stored in kd-tree. Instead point-buffer from CDT::Triangulation is re-used. This allows to save quite some memory too
  • unique_ptr are not used anymore, kd-tree should easily support value-semantics (e.g., copy-able)
  • c++98 compatibility

@artem-ogre
Copy link
Owner Author

The PR also contains a feature of randomizing points insertion. For some datasets it can provides huge performance gains.
Here's an example long-narrow-stripe.txt
With randomization speed up is around 300x. With randomization+kd-tree it is about 500x 😮

@AndreFecteau
Copy link

@artem-ogre I took a quick look at the new KDTree that you implemented and I can't claim credit for that. Not much of my implementation is left (Not to worry, I can definitely see the benefits of having this integrated). Just a mention in the readme would be great.

I'll probably take a deeper dive this weekend since I'm always curious on other ways of programming stuff. Few things that I'm curious about.

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

@artem-ogre
Copy link
Owner Author

artem-ogre commented Oct 22, 2021

By using an std::vector to stored the leafs there is definitely some speedup to have by having contiguous memory and other algorithms that don't have to do a leaf pointer call every time. Did you see any improvement by switching to this method or was it kind of hidden in the rest of the optimizations?

This was my hope as well, but I could not see a noticeable speed improvement with the vector. I still like the simplicity and how easy it is to copy compared to pointers.

Are you not afraid that by using a std::vector that for extremely large models, std::vector may have to reallocate more space a significant amount of times (which is a move at best, deep copy in c++03) and technically (as far as I can imagine and I am not sure of this) double the memory requirement while reallocating.

Yes, depending on the vector implementation it could consume 1.5x-2x more memory than needed for storing it's elements. This is a trade-off that sounds scary in theory but is almost always not a problem in practice (in my experience).
If it really becomes a problem: switching to some kind of list container and using iterators instead of indices should be the same as using individual allocation for each node, I guess.

@artem-ogre
Copy link
Owner Author

@AndreFecteau I'm trying to figure out how to add you as a reviewer to the PR. It looks like you should be the collaborator for this to work (I've sent an invite)

artem-ogre added a commit that referenced this issue Oct 23, 2021
Make initial stack size and stack increment when re-sizing template parameters

Performance is the same
artem-ogre added a commit that referenced this issue Oct 24, 2021
artem-ogre added a commit that referenced this issue Oct 24, 2021
Andre Fecteau contributed original KD-tree implementation:
https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree
It was heavily modified and included into CDT source.
artem-ogre added a commit that referenced this issue Oct 26, 2021
Add Andre Fecteau to contributors list
artem-ogre added a commit that referenced this issue Oct 26, 2021
Add Andre Fecteau to contributors list
artem-ogre added a commit that referenced this issue Oct 26, 2021
- Implement randomized vertex insertion option and make it default
- Allow custom nearest point locators
- Implement boost::rtree and random-nearest as locators
- Implement kd-tree for nearest point location
- Switch index types used in CDT (e.g., vertices, triangles) from 64-bit to 32-bit
- 64-bit index types can be enabled with CDT_USE_64_BIT_INDEX_TYPE
- CDT code refactoring, some APIs changed
- Make necessary adjustments to visualizer code
- Compatibility with c++98 standard
artem-ogre added a commit that referenced this issue Oct 26, 2021
Make initial stack size and stack increment when re-sizing template parameters

Performance is the same
artem-ogre added a commit that referenced this issue Oct 26, 2021
artem-ogre added a commit that referenced this issue Oct 26, 2021
Andre Fecteau contributed original KD-tree implementation:
https://github.com/AndreFecteau/CDT/tree/OptimizationByUsingKDTree
It was heavily modified and included into CDT source.
artem-ogre added a commit that referenced this issue Oct 26, 2021
Add Andre Fecteau to contributors list
@artem-ogre
Copy link
Owner Author

@AndreFecteau I closed this issue. Thank you for all the help! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants