
Some issues I noticed with latest commits #52

Open
milezzz opened this issue Feb 7, 2019 · 33 comments

@milezzz

milezzz commented Feb 7, 2019

So glad to see @Prefinem working on this again =]

1. I pulled the latest commit hoping it would fix torrent updates in loader.js, but the behaviour seems the same to me. I don't see seed/leech counts being updated, and nothing shows up in loader-out.log or loader.error.log with debug = true in index.js. When I run scraper.js the log shows:

Total Torrents: 785004
Torrents without Tracker: 781921
Torrents not in Search: 361

2. Another issue I found was that I kept noticing 0-byte files being scraped, or at least I thought they were 0 bytes! When I checked MySQL I found they had a proper length field BUT a huge files field, which was always truncated abruptly at the maximum length of MySQL's TEXT type. You can maybe see what I mean here: https://pastebin.com/STBrvykL

I did this to test (ALTER TABLE torrents MODIFY `files` MEDIUMTEXT;) and there are no more broken file lists, but this opens us up to 16 MB max record sizes on this field. It might be worth truncating these in parser.js instead, but I'm not sure of the best approach :/ (see the truncation sketch after this list).

3. The last issue I noticed is that I still cannot seem to filter the invalid character � out of scrapes. I've tried 0xFFFD, the literal character, and a few others... but I'll keep trying. I haven't tested it, but I think this should work on all unexpected characters: dataStr.replace(/[\u{0080}-\u{FFFF}]/gu, ""); (see the sanitizer sketch below).
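
A minimal sketch of the truncate-in-parser.js idea from item 2, assuming the file list is serialized as a JSON array string (truncateFileList and the 64 KB budget are illustrative, not the project's actual code):

```js
// Hypothetical helper for parser.js: cap the serialized file list so it
// always fits in a TEXT column (65,535 bytes) instead of widening the
// column to MEDIUMTEXT. Dropping whole entries avoids the abrupt
// mid-entry cuts seen in the pastebin above.
const TEXT_MAX_BYTES = 65535;

const truncateFileList = (files) => {
	const kept = [];
	let used = 2; // budget for the surrounding "[" and "]"
	for (const file of files) {
		const entry = JSON.stringify(file);
		const cost = Buffer.byteLength(entry, 'utf8') + (kept.length ? 1 : 0); // +1 for ","
		if (used + cost > TEXT_MAX_BYTES) break;
		kept.push(entry);
		used += cost;
	}
	return `[${kept.join(',')}]`;
};
```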
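
And for item 3: note that the proposed range strips every non-ASCII character up to U+FFFF, including legitimate accented and CJK filenames. If the goal is only to drop the replacement character, targeting U+FFFD directly is narrower (again just an illustrative sketch, not project code):

```js
// U+FFFD is the Unicode replacement character that appears when
// mis-encoded bytes are decoded as UTF-8; remove only that.
const stripReplacementChars = (dataStr) => dataStr.replace(/\uFFFD/g, '');

// The broader variant from item 3 also removes it, but deletes every
// character from U+0080 through U+FFFF along with it.
const stripNonAscii = (dataStr) => dataStr.replace(/[\u{0080}-\u{FFFF}]/gu, '');
```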
@Raxvis
Contributor

Raxvis commented Feb 7, 2019

Did you do a pm2 restart?

I added a ticket to truncate file lists since I am working on updating the system as a whole.

Where does that invalid character come from?

@milezzz
Author

milezzz commented Feb 7, 2019

> Did you do a pm2 restart?

Yes. pm2 restart all

> Where does that invalid character come from?

Unsure! It appears exactly like that in the database. �

@ghost

ghost commented Feb 7, 2019

Could it be that his server blocks UDP or access to the tracker? Would you get some error if so? It seems 90% of his torrents have a NULL trackerUpdated status in MySQL.

@Raxvis
Contributor

Raxvis commented Feb 7, 2019

Maybe.

Can you set config.debug = true and see if there is any output?

Add console.log(records.length); at line 58 of src/tracker.js and see if you are getting records

@milezzz
Author

milezzz commented Feb 7, 2019

Just tested with portquiz to make sure port 6881 isn't blocked:

$ curl portquiz.net:6881
Port 6881 test successful!
Your IP: xx.xx.xx.xx

$ wget -qO- portquiz.net:6881
Port 6881 test successful!
Your IP: xx.xx.xx.xx

> Can you set config.debug = true and see if there is any output?

Yes it is.

> Add console.log(records.length); at line 58 of src/tracker.js and see if you are getting records

I now see this in the scraper log:
scraper > 75

@Raxvis
Contributor

Raxvis commented Feb 7, 2019

The logs show scraper > 75?

@milezzz
Author

milezzz commented Feb 7, 2019

Ahhaaaa. So I made some changes to see if maybe I was being rate limited or something:

frequency: 2,
host: 'udp://tracker.coppersurfer.tk:6969/announce',
limit: 55,

And now it works! I am seeing records like this now:

scraper > 004787ab3e6071b96b71fc18667156441ffd1b76 - 136:8

What a relief!

Q: Is there a way to just run the SE:LE updater without scraping more torrents? I tried only running loader.js but that doesn't seem to work.

@ghost

ghost commented Feb 7, 2019

> Q: Is there a way to just run the SE:LE updater without scraping more torrents? I tried only running loader.js but that doesn't seem to work.

I just changed the bind IP address in config/index.js from 0.0.0.0 to 1.1.1.1 to disable getting new torrents.
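
For reference, a sketch of that workaround, assuming config/index.js exports a plain object and the key is named address (the actual key name in the project may differ):

```js
// config/index.js (hypothetical shape). Binding the listener to an
// address the machine doesn't own stops new torrents from being
// discovered, leaving only the tracker/seed-leech updater doing work.
module.exports = {
	address: '1.1.1.1', // was '0.0.0.0'
	debug: true,
	// ...rest of config unchanged
};
```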

@milezzz
Author

milezzz commented Feb 7, 2019

> I just changed the bind IP address in config/index.js from 0.0.0.0 to 1.1.1.1 to disable getting new torrents.

tyvm =]

@Raxvis
Contributor

Raxvis commented Feb 7, 2019

I am splitting up the tracker and scraper in the current working branch, which will allow this.

@ghost

ghost commented Feb 7, 2019

I'm also now struggling with the tracker:

Total Torrents: 1983006
Torrents without Tracker: 208771

@ghost

ghost commented Feb 7, 2019

After checking, it looks like the scraper is scraping about 75 torrents every minute, give or take. Could it be that tracker.js queries MySQL for torrents whose seeder info is older than X, that this query takes time, and that the tracker has to wait for it before it can update? @Prefinem

@ghost

ghost commented Feb 8, 2019

I can confirm this by checking MySQL's process list, and also by running the query myself:

Showing rows 0 - 74 (75 total, Query took 96.7467 seconds.)
select * from torrents where trackerUpdated is null limit 75

@ghost

ghost commented Feb 8, 2019

Maybe I need to tweak MySQL lol

@Raxvis
Contributor

Raxvis commented Feb 8, 2019 via email

@ghost

ghost commented Feb 8, 2019

I found that tweaking the InnoDB buffer pool size took the query time down to 1 second.

@Raxvis
Contributor

Raxvis commented Feb 8, 2019 via email

@ghost

ghost commented Feb 8, 2019

I don't know what all the settings do, but I tweaked the ones under # InnoDB Settings:

[mysqld]


bind-address=0.0.0.0
# InnoDB Settings
default_storage_engine          = InnoDB
innodb_buffer_pool_instances    = 20     # Use 1 instance per 1GB of InnoDB pool size
innodb_buffer_pool_size         = 20G    # Use up to 70-80% of RAM & optionally check if /proc/sys/vm/swappiness is set to 0
innodb_file_per_table           = 1
innodb_flush_log_at_trx_commit  = 0
innodb_flush_method             = O_DIRECT
innodb_log_buffer_size          = 31M
innodb_log_file_size            = 512M
innodb_stats_on_metadata        = 0

#innodb_temp_data_file_path     = ibtmp1:64M:autoextend:max:20G # Control the maximum size for the ibtmp1 file
#innodb_thread_concurrency      = 10     # Optional: Set to the number of CPUs on your system (minus 1 or 2) to better
                                        # contain CPU usage. E.g. if your system has 8 CPUs, try 6 or 7 and check
                                        # the overall load produced by MySQL/MariaDB.
innodb_read_io_threads          = 64
innodb_write_io_threads         = 64

# MyISAM Settings
query_cache_limit               = 4M    # UPD
query_cache_size                = 48M   # UPD
query_cache_type                = 1

key_buffer_size                 = 48M   # UPD

low_priority_updates            = 1
concurrent_insert               = 2

# Connection Settings
max_connections                 = 100   # UPD

back_log                        = 512
thread_cache_size               = 100
thread_stack                    = 192K

interactive_timeout             = 180
wait_timeout                    = 180

# Buffer Settings
join_buffer_size                = 4M    # UPD
read_buffer_size                = 3M    # UPD
read_rnd_buffer_size            = 4M    # UPD
sort_buffer_size                = 4M    # UPD

# Table Settings
# In systemd managed systems like Ubuntu 16.04 or CentOS 7, you need to perform an extra action for table_open_cache & open_files_limit
# to be overriden (also see comment next to open_files_limit).
# E.g. for MySQL 5.7, please check: https://dev.mysql.com/doc/refman/5.7/en/using-systemd.html
# and for MariaDB check: https://mariadb.com/kb/en/library/systemd/
table_definition_cache          = 10000 # UPD
table_open_cache                = 10000 # UPD
open_files_limit                = 60000 # UPD - This can be 2x to 3x the table_open_cache value or match the system's
                                        # open files limit usually set in /etc/sysctl.conf or /etc/security/limits.conf
                                        # In systemd managed systems this limit must also be set in:
                                        # /etc/systemd/system/mysqld.service.d/override.conf (for MySQL 5.7+) and
                                        # /etc/systemd/system/mariadb.service.d/override.conf (for MariaDB)

max_heap_table_size             = 128M
tmp_table_size                  = 128M

# Search Settings
ft_min_word_len                 = 3     # Minimum length of words to be indexed for search results

# Logging
log_error                       = /var/lib/mysql/mysql_error.log
log_queries_not_using_indexes   = 1
long_query_time                 = 5
slow_query_log                  = 0     # Disabled for production
slow_query_log_file             = /var/lib/mysql/mysql_slow.log

[mysqldump]
# Variable reference
# For MySQL 5.7: https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html
# For MariaDB:   https://mariadb.com/kb/en/library/mysqldump/
quick
quote_names
max_allowed_packet              = 64M

@ghost

ghost commented Feb 8, 2019

Of course you will need to tweak these depending on your RAM.

@ghost

ghost commented Feb 8, 2019

@Prefinem using this query cuts the time down to 0.5 seconds: instead of selecting all columns, we select only the column we need. Can you change this in the tracker?

Showing rows 0 - 74 (75 total, Query took 0.5823 seconds.)
select trackerUpdated from `torrents` where `trackerUpdated` is null limit 75

@Raxvis
Contributor

Raxvis commented Feb 8, 2019

You need the infohash from the table, but I can make the changes.
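
For illustration, the narrowed query might look like this with mysql2 (hypothetical code; the project's actual connection handling and data access layer may differ):

```js
// Hypothetical version of the tracker's record fetch: select only the
// infohash column instead of `select *`, which avoids dragging the huge
// `files` TEXT blobs through MySQL for every batch of 75.
const mysql = require('mysql2/promise');

const getPendingInfohashes = async () => {
	const connection = await mysql.createConnection({
		host: 'localhost',
		user: 'torrents',
		database: 'torrents',
	});
	const [rows] = await connection.query(
		'SELECT infohash FROM torrents WHERE trackerUpdated IS NULL LIMIT 75'
	);
	await connection.end();
	return rows.map((row) => row.infohash);
};
```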

@ghost

ghost commented Feb 8, 2019

Oh I see, my bad.

@Raxvis
Contributor

Raxvis commented Feb 8, 2019 via email

@ghost

ghost commented Feb 8, 2019

Now it's super fast :)

@ghost

ghost commented Feb 8, 2019

Shall we also do this for getting records in the loader?

@ghost

ghost commented Feb 8, 2019

Showing rows 0 - 751 (752 total, Query took 3.2976 seconds.)
select * from `torrents` where `searchUpdate` = false limit 1000

vs

Showing rows 0 - 419 (420 total, Query took 0.8087 seconds.)
select infohash from `torrents` where `searchUpdate` = false limit 1000

@ghost

ghost commented Feb 8, 2019

Or do we need to query all columns in the loader?

@Raxvis
Contributor

Raxvis commented Feb 8, 2019 via email

@ghost

ghost commented Feb 8, 2019

If you choose to drop MySQL, do you have an alternative in mind?

@Raxvis
Contributor

Raxvis commented Feb 8, 2019 via email

@ghost

ghost commented Feb 8, 2019

:) nice!

BTW I was checking how many outdated records I have, and it looks like the updater cannot keep up. You can see the count increasing on each query. Maybe because the tracker wasn't working before?

... 1808460
... 1809476
... 1829075

@ghost

ghost commented Feb 8, 2019

So last night I stopped crawling new torrents and let it update torrents with trackerUpdated = null. This morning I'm back where I started: I have 2 trackers running and the count is still increasing.

scraper > Torrents without Tracker: 355629
scraper > Torrents not in Search: 301
scraper > Total Torrents: 2713681
scraper > Torrents without Tracker: 355704
scraper > Torrents not in Search: 129
scraper > Total Torrents: 2713821
scraper > Torrents without Tracker: 355833
scraper > Torrents not in Search: 139
scraper > Total Torrents: 2713966
scraper > Torrents without Tracker: 355994
scraper > Torrents not in Search: 298
scraper > Total Torrents: 2714100
scraper > Torrents without Tracker: 356121
scraper > Torrents not in Search: 216

@ghost

ghost commented Feb 8, 2019

It appears that overnight it's not even querying the database to update trackers... and I haven't changed anything. So strange, on both the Hetzner server and Digital Ocean.

Update... it seems coppersurfer is not working; same for milez. Would it be possible to add some type of fallback or timeout, so that if coppersurfer is not responding a fallback tracker is used? @Prefinem (see the sketch below)
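
A minimal sketch of that fallback idea, where scrapeTracker stands in for whatever src/tracker.js uses to query a tracker (both that function and the opentrackr fallback URL are assumptions; only the timeout/fallback wiring is the point):

```js
// Try each tracker in order, giving every attempt a hard timeout, so a
// dead tracker (e.g. coppersurfer) doesn't stall the whole updater.
const TRACKERS = [
	'udp://tracker.coppersurfer.tk:6969/announce',
	'udp://tracker.opentrackr.org:1337/announce', // fallback (assumed)
];

const withTimeout = (promise, ms) =>
	Promise.race([
		promise,
		new Promise((resolve, reject) =>
			setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
		),
	]);

const scrapeWithFallback = async (infohashes) => {
	for (const tracker of TRACKERS) {
		try {
			// scrapeTracker(tracker, infohashes) is a hypothetical stand-in
			return await withTimeout(scrapeTracker(tracker, infohashes), 10000);
		} catch (error) {
			console.error(`${tracker} failed: ${error.message}`);
		}
	}
	throw new Error('all trackers failed');
};
```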
