lc thinks celt license is BSD-2-Clause-NetBSD because of copyright #38

Open
maxice8 opened this issue Mar 8, 2018 · 11 comments
Labels
bug Something isn't working

@maxice8

maxice8 commented Mar 8, 2018

Directory                        File     License              Confidence  Size
masterdir/builddir/celt-0.11.3/  COPYING  BSD-2-Clause-NetBSD  92.39%      1.3K

With the copyright lines removed:

Directory                        File     License       Confidence  Size
masterdir/builddir/celt-0.11.3/  COPYING  BSD-2-Clause  93.40%      1.2K
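To reproduce this comparison without editing the file by hand, here is a minimal Go sketch (lc itself is written in Go) that drops the copyright header before the text goes to a matcher. The helper `stripCopyrightLines` is hypothetical, not part of lc. It assumes copyright lines start at column 0 with `Copyright`, and that any indented line directly after one (like `CSIRO, and other contributors`) is a continuation of it.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCopyrightLines is a hypothetical helper (not part of lc). It drops
// lines that start with "Copyright" at column 0, plus any indented,
// non-empty lines directly after them, which it treats as continuations.
func stripCopyrightLines(text string) string {
	var kept []string
	inCopyright := false
	for _, line := range strings.Split(text, "\n") {
		indented := line != strings.TrimLeft(line, " \t")
		switch {
		case strings.HasPrefix(line, "Copyright"):
			inCopyright = true
			continue
		case inCopyright && indented && strings.TrimSpace(line) != "":
			continue // continuation of the copyright notice
		}
		inCopyright = false
		kept = append(kept, line)
	}
	// Remove the blank lines left behind where the header used to be.
	return strings.TrimLeft(strings.Join(kept, "\n"), "\n")
}

func main() {
	license := "Copyright 2001-2009 Jean-Marc Valin, Timothy B. Terriberry,\n" +
		"                    CSIRO, and other contributors\n" +
		"\n" +
		"Redistribution and use in source and binary forms, with or without\n"
	fmt.Print(stripCopyrightLines(license))
}
```

Matching only column-0 `Copyright` keeps indented body text such as `copyright notice, this list of conditions` in the ykpers file below from being stripped by mistake.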

LICENSE FILE

Copyright 2001-2009 Jean-Marc Valin, Timothy B. Terriberry,
                    CSIRO, and other contributors

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@boyter
Owner

boyter commented Mar 8, 2018

Of course I notice this just after I push out 1.3.1.

Could you please supply the example with the copyright removed? That will let me replicate it more closely; I'm having trouble doing so right now.

@boyter boyter self-assigned this Mar 8, 2018
@boyter boyter added the bug Something isn't working label Mar 8, 2018
@maxice8
Author

maxice8 commented Mar 8, 2018

> Of course I notice this just after I push out 1.3.1.

Eh, no worries. I am testing lc on a huge repository, so there will be a stream of bug reports.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@boyter
Owner

boyter commented Mar 8, 2018

Cool, I'll have a look at it this afternoon then. If you find more issues by then, I can probably group them all together.

@boyter
Owner

boyter commented Mar 9, 2018

OK, so this issue and #39, #40, and #41 are all the same issue. I'm going to close the others as duplicates.

The issue is that after the licenses are identified using keywords, a percentage match is taken using the vector space. Because that's fuzzy, it pops the wrong license to the top. I'm either going to remove this portion OR have it combine that percentage with the keyword percentage. I'll run a few tests to see which one works better. It might take a while.

I should say that I have a fix that resolves this one, at least. I've committed it to the branch https://github.com/boyter/lc/compare/Issue40. Note that tests are still failing there, but it might improve accuracy for you in the short term.

Would you be willing to trial it out when done?

@maxice8
Author

maxice8 commented Mar 9, 2018 via email

boyter added a commit that referenced this issue Mar 9, 2018
@boyter
Owner

boyter commented Mar 9, 2018

Ninja edit while you posted that, so here it is again:

I have a fix that resolves this issue, at least. I've committed it to the branch https://github.com/boyter/lc/compare/Issue40. Note that tests are still failing there, but it might improve accuracy for you in the short term. If you check it out and build it, you may get fewer false positives.

@maxice8
Author

maxice8 commented Apr 24, 2018

Found another one. ykpers is BSD-2-Clause, but lc reports it as BSD-3-Clause-Clear:

$ lc masterdir/builddir/ykpers-1.19.0
--------------------------------------------------------------------------------
Directory                          File     License             Confidence  Size
--------------------------------------------------------------------------------
masterdir/builddir/ykpers-1.19.0/  COPYING  BSD-3-Clause-Clear  89.26%      1.6K
--------------------------------------------------------------------------------
Copyright (c) 2008-2014 Yubico AB
Copyright (c) 2009-2010 Tollef Fog Heen <tfheen@err.no>
Copyright (c) 2009 Christer Kaivo-oja <christer.kaivooja@gmail.com>
Copyright (c) 2010 Simon Josefsson <simon@josefsson.org>
Copyright (c) 2003, 2004 Richard Levitte <richard@levitte.org>
Copyright (c) 2010 David Dindorp <ddi@snex.dk>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@boyter
Owner

boyter commented Apr 25, 2018

I'm pretty convinced the cause is the fallback to the vector space. I might disable it except for licenses that I know have no keyword matches, or for when nothing matches and fuzzy search is enabled.
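The fallback rule could look something like this minimal Go sketch. `Match`, `identify` and the `vectorSpace` callback are hypothetical names, not lc's actual API, and the sketch covers only the second condition (no keyword match and fuzzy search enabled). The scores are borrowed from this thread purely for illustration.

```go
package main

import "fmt"

// Match is a hypothetical result type: a license identifier and a score.
type Match struct {
	License string
	Percent float64
}

// identify trusts keyword matches whenever there are any, and only falls
// back to the fuzzy vector space when keyword matching found nothing and
// fuzzy search is enabled.
func identify(keywordMatches []Match, fuzzyEnabled bool, vectorSpace func() []Match) []Match {
	if len(keywordMatches) > 0 {
		return keywordMatches
	}
	if fuzzyEnabled {
		return vectorSpace()
	}
	return nil
}

func main() {
	keyword := []Match{{"BSD-2-Clause", 74}}
	fuzzy := func() []Match { return []Match{{"BSD-3-Clause-Clear", 89.26}} }
	fmt.Println(identify(keyword, true, fuzzy)) // prints [{BSD-2-Clause 74}]
}
```

Passing the vector space as a function means its cost is only paid when the fallback is actually taken.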

While doing that, I should look into moving it over to streams to get some multi-CPU action happening as well.
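In Go, moving to streams for multi-CPU work maps naturally onto channels. Here is a minimal worker-pool sketch, assuming a hypothetical per-file `classify` function that stands in for the detection work (this is not lc's actual code):

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// Result pairs a file path with the license the classifier picked.
type Result struct {
	Path    string
	License string
}

// classifyAll fans paths out over a channel to one worker per CPU and
// collects the results. classify is a hypothetical per-file detector.
func classifyAll(paths []string, classify func(string) string) []Result {
	jobs := make(chan string)
	results := make(chan Result)
	var wg sync.WaitGroup

	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- Result{Path: p, License: classify(p)}
			}
		}()
	}

	// Feed paths, then close jobs so the workers' range loops end.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []Result
	for r := range results {
		out = append(out, r)
	}
	// Workers finish in any order; sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	paths := []string{"b/COPYING", "a/LICENSE"}
	for _, r := range classifyAll(paths, func(string) string { return "unknown" }) {
		fmt.Println(r.Path, r.License)
	}
}
```

One worker per `runtime.NumCPU()` suits CPU-bound matching; the final sort is needed because results arrive in whatever order the workers finish.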

@boyter
Owner

boyter commented May 7, 2018

100% convinced it's the vector space. Playing around with the code:

$ go run main.go examples/identifier
examples/identifier/LICENSE [{MIT-feh 100}]
examples/identifier/LICENSE2 [{GPL-3.0-only 90} {GPL-3.0+ 90} {GPL-3.0-or-later 90} {GPL-3.0 90}]
examples/identifier/LICENSE3 [{BSD-2-Clause 74}]
examples/identifier/LICENSE4 [{BSD-2-Clause 72}]

Examples 3 and 4 are the license files from this issue, so detection works now. I'm going to finish this off with the additional performance tweaks, which change the runtime as follows:

$ time lc .
lc .  196.88s user 1.64s system 106% cpu 3:07.12 total

to...

$ time ./lc .
./lc .  29.59s user 0.98s system 614% cpu 4.975 total
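A quick back-of-the-envelope reading of the two `time` lines (wall clock 3:07.12 = 187.12 s before, 4.975 s after) splits the speedup into less total CPU work times more cores kept busy. The utilization ratios below reproduce the reported 106% and 614% CPU figures:

$$
\frac{187.12\,\mathrm{s}}{4.975\,\mathrm{s}} \approx 37.6
= \underbrace{\frac{196.88 + 1.64}{29.59 + 0.98}}_{\text{CPU seconds}\ \approx\ 6.49}
\times \underbrace{\frac{30.57 / 4.975}{198.52 / 187.12}}_{\approx\ 6.14 / 1.06\ \approx\ 5.79}
$$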

Still working on this, but it looks very promising so far.

@maxice8
Author

maxice8 commented May 7, 2018

looks amazing :o

@boyter
Owner

boyter commented May 9, 2018

@maxice8 if you are prepared to build from source, try again with the current master and you should see a marked improvement in detection and performance. There are likely to be some bugs in there, but it's getting closer to being ready for release.
