Add support to multi-row. #336
Thank you for your PR.
My idea of implementing support for groups was to introduce row types:

search:
  path: /torrents.php
  inputs:
    ...
  rows:
    selector: table#torrents_table_classic > tbody > tr # match all rows
    types:
      group_header: # name of the group
        selector: .torrent_group_header # only process this group if the selector matches
        result: false # internal row, won't produce an "item"
        fields:
          group_title: # selectorBlock
            selector: td:nth-child(1)
          group_year: # selectorBlock
            selector: td:nth-child(2)
      group_child_torrent:
        selector: .group_torrent
        result: true # this row will result in a torrent result/"item"
        fields:
          title:
            selector: td:nth-child(1)
            filters:
              - name: prepend
                args: "{{ .group_header.group_title }} [{{ .group_header.group_year }}] " # access variables from last row match
          details:
            selector: a
          ...
      standalone_torrent:
        selector: .torrent
        result: true
        fields:
          title:
            selector: td:nth-child(1)
          details:
            selector: a
          ...

That would make parsing much more flexible.
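To make the "access variables from last row match" idea concrete, here is a minimal Go sketch of how a filter argument could be expanded against the fields captured from the last matching row of each type. It assumes the arguments are rendered with Go's standard `text/template` package; the names `rowContext` and `renderArgs` and the sample values are hypothetical and not taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// rowContext maps a row type name to the fields captured from the most
// recent row of that type (hypothetical structure for this sketch).
type rowContext map[string]map[string]string

// renderArgs expands a filter argument such as
// "{{ .group_header.group_title }} " against the current context.
func renderArgs(args string, ctx rowContext) (string, error) {
	tmpl, err := template.New("args").Parse(args)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Fields remembered from the last row that matched "group_header".
	ctx := rowContext{
		"group_header": {"group_title": "Some Album", "group_year": "2016"},
	}
	prefix, err := renderArgs("{{ .group_header.group_title }} [{{ .group_header.group_year }}] ", ctx)
	if err != nil {
		panic(err)
	}
	// The "prepend" filter would put the rendered prefix before the row's own title.
	fmt.Println(prefix + "FLAC / Lossless")
}
```

With a context keyed by row type name, each `result: false` type only needs to overwrite its own entry, and any later row can refer to it by name.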
It would be great to be able to store "queries" in variables and use them later. Let me see if I understand the proposal correctly: for my use case the group is usually the first row that matches a specific selector before the result row. Example:
<table>
<tr class="group">
<td class="category">TV Shows</td>
<td class="name" colspan="3">My TV Show Name</td>
</tr>
<tr class="result">
<td> </td>
<td class="name">S01E01</td>
<td class="leecheers-and-seeders">10/25</td>
<td><a href="#">Download</a></td>
</tr>
<tr class="result">
<td> </td>
<td class="name">S01E02</td>
<td class="leecheers-and-seeders">20/40</td>
<td><a href="#">Download</a></td>
</tr>
<tr class="group">
<td class="category">Movies</td>
<td class="name" colspan="3">Awesome blockbuster</td>
</tr>
<tr class="result">
<td> </td>
<td class="name">720p</td>
<td class="leecheers-and-seeders">7/15</td>
<td><a href="#">Download</a></td>
</tr>
<tr class="result">
<td> </td>
<td class="name">1080p</td>
<td class="leecheers-and-seeders">50/70</td>
<td><a href="#">Download</a></td>
</tr>
</table>
I believe this should work, but how are you planning to handle subsequent results in the same group? Are you planning to keep every
I have used this. See: https://gist.github.com/juniorz/e3d2492f91603e4c392dd551d931aaa8#file-bjshare-multi-row-yml
rows:
  selector: table > tr # match all rows
  types:
    group:
      selector: tr.group
      result: false
      fields:
        name:
          selector: td.name
        category:
          selector: td.category
    result:
      selector: tr.result
      result: true
      fields:
        title:
          selector: td.name
          filters:
            - name: prepend
              args: "{{ .group.name }} "
        category:
          text: "{{ .group.category }}"
        seeders:
          selector: td.leecheers-and-seeders
          filters:
            - name: split
              args: ["/", 0]
        leechers:
          selector: td.leecheers-and-seeders
          filters:
            - name: split
              args: ["/", 1]
        download:
          selector: a

would be an example definition for your example HTML.
Only types with
Whenever a row matching a type is parsed it would update the variable
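As a rough illustration of how such a definition might behave on the example HTML, the Go sketch below walks the rows in document order, remembers the fields of the last `group` row, and lets each `result` row inherit them. The types `row` and `item` and the functions `split` and `collect` are hypothetical stand-ins: real rows would come from an HTML parser and selectors, and the actual `split` filter may handle out-of-range indexes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a simplified stand-in for a parsed <tr>: its CSS class plus the
// text of the cells the definition selects (hypothetical type).
type row struct {
	class string
	cells map[string]string
}

// item is one search result produced by a row type with "result: true".
type item struct {
	Title, Category, Seeders, Leechers string
}

// split mimics the "split" filter: it returns the idx-th part of s,
// or "" when idx is out of range (an assumption made for this sketch).
func split(s, sep string, idx int) string {
	parts := strings.Split(s, sep)
	if idx < 0 || idx >= len(parts) {
		return ""
	}
	return parts[idx]
}

// collect walks the rows in document order. A "group" row only updates the
// remembered group fields; every "result" row emits an item that inherits them.
func collect(rows []row) []item {
	var items []item
	group := map[string]string{}
	for _, r := range rows {
		switch r.class {
		case "group":
			group = r.cells
		case "result":
			peers := r.cells["leecheers-and-seeders"]
			items = append(items, item{
				Title:    strings.TrimSpace(group["name"] + " " + r.cells["name"]),
				Category: group["category"],
				Seeders:  split(peers, "/", 0),
				Leechers: split(peers, "/", 1),
			})
		}
	}
	return items
}

func main() {
	rows := []row{
		{"group", map[string]string{"category": "TV Shows", "name": "My TV Show Name"}},
		{"result", map[string]string{"name": "S01E01", "leecheers-and-seeders": "10/25"}},
		{"result", map[string]string{"name": "S01E02", "leecheers-and-seeders": "20/40"}},
		{"group", map[string]string{"category": "Movies", "name": "Awesome blockbuster"}},
		{"result", map[string]string{"name": "720p", "leecheers-and-seeders": "7/15"}},
	}
	for _, it := range collect(rows) {
		fmt.Printf("%+v\n", it)
	}
}
```

In this sketch, subsequent results in the same group keep using the remembered group fields until another `group` row replaces them, so only the most recent row of each type needs to be kept.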
I tried your implementation and as expected it doesn't work with standalone torrents:
A small addition to my previous suggestion: we could make the "result" field (is there a better name for it?) a string instead of a boolean.
With last we could get rid of the
Multi-row allows fields to be inherited from a previous HTML element (<tr>) that contains information about subsequent elements (<td>). This is common in trackers that display items in a "grouped" layout. This feature extends (and replaces) the "dateheaders" feature, allowing more fields to be inherited. Compare the tests TestIndexerDefinitionRunner_MultiRowSearch and TestIndexerDefinitionRunner_DateHeadersSearch to see how to replace "dateheaders".