Add support to multi-row. #336

juniorz · 2017-02-06T06:14:59Z

Multi-row allows fields to be inherited from a previous HTML element (<tr>) which contains information about subsequent elements (<td>). This is common in trackers that display items in a "grouped" layout.

This feature extends (and replaces) the "dateheaders" feature, allowing more fields to be inherited.

Compare the tests TestIndexerDefinitionRunner_MultiRowSearch and
TestIndexerDefinitionRunner_DateHeadersSearch to see how to replace the
"dateheaders".

kaso17 · 2017-02-06T09:28:51Z

Thank you for your PR.
Do you have some real-world definitions for trackers using this new feature?
I can see how this will work with traditional Gazelle-based trackers that have only grouped results.
But what would happen in case we have mixed results (grouped and ungrouped results)? Wouldn't it use the wrong (previous) group header for ungrouped results?
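The concern can be illustrated with a toy model. This is a sketch and not the PR's code: the `row` type, the "header"/"torrent" labels, and `inheritFromPreviousHeader` are hypothetical, and the sample titles are made up. It assumes inheritance simply means "use the nearest preceding header row", with no notion of where a group ends.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a simplified stand-in for a parsed <tr>.
type row struct {
	kind string // "header" or "torrent" (hypothetical labels)
	text string
}

// inheritFromPreviousHeader prefixes every torrent row with the text of the
// nearest preceding header row. It has no way to tell that a group has ended.
func inheritFromPreviousHeader(rows []row) []string {
	var titles []string
	header := ""
	for _, r := range rows {
		switch r.kind {
		case "header":
			header = r.text
		case "torrent":
			titles = append(titles, strings.TrimSpace(header+" "+r.text))
		}
	}
	return titles
}

func main() {
	rows := []row{
		{"header", "Album A"},
		{"torrent", "FLAC"},               // belongs to "Album A"
		{"torrent", "Standalone Release"}, // ungrouped, but still inherits "Album A"
	}
	for _, title := range inheritFromPreviousHeader(rows) {
		fmt.Println(title)
	}
}
```

In this model the ungrouped row comes out as "Album A Standalone Release", which is the mislabelling the question above is about.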

kaso17 · 2017-02-06T10:41:05Z

My idea of implementing support for groups was to introduce row types:
This is an example of how a definition would look:

  search:
    path: /torrents.php
    inputs:
      ...
    rows:
      selector: table#torrents_table_classic > tbody > tr # match all rows
      types:
        group_header: # name of the group
          selector: .torrent_group_header # only process this group if the selector matches
          result: false # internal row, won't produce an "item"
          fields:
            group_title: # selectorBlock
              selector: td:nth-child(1)
            group_year: # selectorBlock
              selector: td:nth-child(2)

        group_child_torrent:
          selector: .group_torrent
          result: true # this row will result in a torrent result/"item"
          fields:
            title:
              selector: td:nth-child(1)
              filters:
                - name: prepend
                  args: "{{ .group_header.group_title }} [{{ .group_header.group_year }}] " # access variables from last row match
            details:
              selector: a
            ...
              
        standalone_torrent:
          selector: .torrent
          result: true
          fields:
            title:
              selector: td:nth-child(1)
            details:
              selector: a
            ...

That would make parsing much more flexible.
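As a rough sketch of how the `prepend` argument above could be expanded, assuming the `{{ ... }}` placeholders are evaluated with Go's standard `text/template` package: the `rowVars` type and the `renderArg` helper are hypothetical names, and the sample title and year are made up.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// rowVars maps a row-type name to the fields captured from the most recent
// row of that type (hypothetical structure, for illustration only).
type rowVars map[string]map[string]string

// renderArg expands a filter argument against the variables captured so far.
func renderArg(arg string, vars rowVars) (string, error) {
	tmpl, err := template.New("arg").Parse(arg)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, vars); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Values a group_header row might have produced (made-up sample data).
	vars := rowVars{
		"group_header": {"group_title": "Some Album", "group_year": "2017"},
	}
	prefix, err := renderArg("{{ .group_header.group_title }} [{{ .group_header.group_year }}] ", vars)
	if err != nil {
		panic(err)
	}
	fmt.Println(prefix + "FLAC / Lossless") // prints "Some Album [2017] FLAC / Lossless"
}
```

Keying the variables by row-type name keeps the lookup path in the template (`.group_header.group_title`) identical to the names used in the definition.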

juniorz · 2017-02-06T13:43:16Z

It would be great to be able to store "queries" in variables and use them later. Let me see if I understand the proposal correctly: for my use case, the group is usually the first row before the result row that matches a specific selector. Example:

<table>
  <tr class="group">
    <td class="category">TV Shows</td>
    <td class="name" colspan="3">My TV Show Name</td>
  </tr>
  <tr class="result">
    <td>&nbsp;</td>
    <td class="name">S01E01</td>
    <td class="leecheers-and-seeders">10/25</td>
    <td><a href="#">Download</a></td>
  </tr>
  <tr class="result">
    <td>&nbsp;</td>
    <td class="name">S01E02</td>
    <td class="leecheers-and-seeders">20/40</td>
    <td><a href="#">Download</a></td>
  </tr>
  <tr class="group">
    <td class="category">Movies</td>
    <td class="name" colspan="3">Awesome blockbuster</td>
  </tr>
  <tr class="result">
    <td>&nbsp;</td>
    <td class="name">720p</td>
    <td class="leecheers-and-seeders">7/15</td>
    <td><a href="#">Download</a></td>
  </tr>
  <tr class="result">
    <td>&nbsp;</td>
    <td class="name">1080p</td>
    <td class="leecheers-and-seeders">50/70</td>
    <td><a href="#">Download</a></td>
  </tr>
</table>

If rows.selector matches all rows, how are you planning to filter out what should be considered the generator for the result list? Are you planning to use every rows.types.* without result=false as a generator for the result list?

I believe this should work, but how are you planning to handle subsequent results in the same group? Are you planning to keep every rows.types.* with result=false available for substitution (like you did in group_child_torrent) until it matches again?

juniorz · 2017-02-06T14:08:51Z

I have used this multi-row strategy to implement grouped results in the BJ Share tracker (https://bj-share.me). By skipping the group search when it can't be found (rather than treating it as an error), I managed to get grouped and ungrouped results in the same definition.

See: https://gist.github.com/juniorz/e3d2492f91603e4c392dd551d931aaa8#file-bjshare-multi-row-yml

kaso17 · 2017-02-06T14:09:50Z

    rows:
      selector: table > tr # match all rows
      types:
        group:
          selector: tr.group
          result: false
          fields:
            name:
              selector: td.name
            category:
              selector: td.category
        
        result:
          selector: tr.result
          result: true
          fields:
            title:
              selector: td.name
              filters:
                - name: prepend
                  args: "{{ .group.name }} "
            category:
              text: "{{ .group.category }}"
            seeders: 
              selector: td.leecheers-and-seeders
              filters:
                - name: split
                  args: ["/", 0]
            leechers: 
              selector: td.leecheers-and-seeders
              filters:
                - name: split
                  args: ["/", 1]
            download:
              selector: a

This would be an example definition for your example HTML.

Only types with result=true will generate new items for the result list.

Whenever a row matching a type is parsed it would update the variable .$TYPE_NAME.$FIELD. The variables can be accessed until they're overwritten by the same type again.
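The semantics described above can be modelled in a few lines of Go. This is only a sketch under stated assumptions: `Row`, `RowType`, `Item`, `process`, and the two `example*` helpers are hypothetical names, rows are matched by class name instead of a real CSS selector, and the title/category logic is hard-coded rather than driven by the definition's fields and filters.

```go
package main

import "fmt"

// Row stands in for one parsed <tr>: its class plus the extracted cell texts.
type Row struct {
	Class  string
	Fields map[string]string
}

// RowType mirrors one entry under rows.types in the proposed definition.
type RowType struct {
	Name   string
	Class  string // stands in for the type's selector
	Result bool   // true: a matching row produces an item
}

// Item is one entry of the result list.
type Item struct {
	Title    string
	Category string
}

// process walks the rows in document order. Each match overwrites the
// variables stored under its type name; only Result types emit items.
func process(rows []Row, types []RowType) []Item {
	vars := map[string]map[string]string{}
	var items []Item
	for _, row := range rows {
		for _, t := range types {
			if row.Class != t.Class {
				continue
			}
			vars[t.Name] = row.Fields // visible until the same type matches again
			if t.Result {
				items = append(items, Item{
					Title:    vars["group"]["name"] + " " + row.Fields["name"],
					Category: vars["group"]["category"],
				})
			}
			break
		}
	}
	return items
}

// exampleTypes corresponds to the "group" and "result" types above.
func exampleTypes() []RowType {
	return []RowType{
		{Name: "group", Class: "group", Result: false},
		{Name: "result", Class: "result", Result: true},
	}
}

// exampleRows corresponds to the rows of the example HTML table.
func exampleRows() []Row {
	return []Row{
		{"group", map[string]string{"category": "TV Shows", "name": "My TV Show Name"}},
		{"result", map[string]string{"name": "S01E01"}},
		{"result", map[string]string{"name": "S01E02"}},
		{"group", map[string]string{"category": "Movies", "name": "Awesome blockbuster"}},
		{"result", map[string]string{"name": "720p"}},
		{"result", map[string]string{"name": "1080p"}},
	}
}

func main() {
	for _, item := range process(exampleRows(), exampleTypes()) {
		fmt.Printf("%s (%s)\n", item.Title, item.Category)
	}
}
```

With the example table this yields four items, two per group, each carrying the name and category of the most recent "group" row.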

kaso17 · 2017-02-06T14:34:44Z

I tried your implementation and as expected it doesn't work with standalone torrents:
https://nimbus.everhelper.me/client/notes/share/755139/35iiicf1ouk7x547knc5

kaso17 · 2017-02-06T14:48:39Z

A small addition to my previous suggestion: we could make the "result" field (is there a better name for it?) a string instead of a boolean.

none: doesn't generate a result item
new: generate a new result item
last: add to the current/last result item

With "last" we could get rid of the "after" statement too.
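A minimal sketch of how the three values could be dispatched, assuming items are plain string maps. `ResultMode`, `Item`, and `apply` are hypothetical names, and returning an error for "last" when no item exists yet is an assumption of this sketch, not something decided in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// ResultMode is the proposed string value of a row type's "result" field.
type ResultMode string

const (
	ResultNone ResultMode = "none" // row produces no item
	ResultNew  ResultMode = "new"  // row starts a new item
	ResultLast ResultMode = "last" // row adds fields to the latest item
)

// Item is one result, keyed by field name.
type Item map[string]string

// apply folds the fields of one matched row into the result list.
func apply(items []Item, mode ResultMode, fields map[string]string) ([]Item, error) {
	switch mode {
	case ResultNone:
		return items, nil
	case ResultNew:
		item := Item{}
		for k, v := range fields {
			item[k] = v
		}
		return append(items, item), nil
	case ResultLast:
		if len(items) == 0 {
			// Assumption: a "last" row without a preceding item is an error.
			return items, errors.New(`result "last" used before any item exists`)
		}
		last := items[len(items)-1]
		for k, v := range fields {
			last[k] = v
		}
		return items, nil
	default:
		return items, fmt.Errorf("unknown result mode %q", mode)
	}
}

func main() {
	var items []Item
	items, _ = apply(items, ResultNew, map[string]string{"title": "S01E01"})
	// A follow-up row that only carries extra fields for the same item.
	items, _ = apply(items, ResultLast, map[string]string{"size": "1 GB"})
	items, _ = apply(items, ResultNone, map[string]string{"name": "header"})
	fmt.Println(len(items), items[0]["title"], items[0]["size"])
}
```

Under this model a row that only supplies extra fields for the preceding result becomes just another row type with "last".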

juniorz force-pushed the multi-row branch from 0df6600 to 2d57537 Compare February 6, 2017 14:05

juniorz force-pushed the multi-row branch from 2d57537 to 34392a8 Compare February 7, 2017 02:17
