I have a use case similar to #799: a node is being removed because its class name contains the header keyword, which is matched by REGEXPS.unlikelyCandidates:
Lines 122 to 125 in d64951b
REGEXPS: {
  // NOTE: These two regular expressions are duplicated in
  // Readability-readerable.js. Please keep both copies in sync.
  unlikelyCandidates: /-ad-|ai2html|banner|breadcrumbs|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager|popup|yom-remote/i,
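To illustrate the problem, here is a simplified sketch of the match. It assumes the pattern is tested against the class name and id joined by a space; `isUnlikelyCandidate` is a hypothetical helper, and the real algorithm applies further conditions before it removes a node.

```javascript
// Pattern copied from the snippet above.
const unlikelyCandidates = /-ad-|ai2html|banner|breadcrumbs|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager|popup|yom-remote/i;

// Hypothetical helper: true when the class name or id contains one of the keywords.
// Assumption: the string being tested is "<className> <id>".
function isUnlikelyCandidate(className, id) {
  const matchString = className + " " + id;
  return unlikelyCandidates.test(matchString);
}

// A wrapper that holds real article content is flagged purely because of its name.
console.log(isUnlikelyCandidate("article-header", ""));
console.log(isUnlikelyCandidate("post-content", ""));
```

Because the pattern is an unanchored substring match, any class name that merely contains "header" is affected, regardless of what the node holds.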
Of course, I could fork and adapt the regex. However, I think it would be better to have a generic, dynamic way to influence the algorithm. For example, a callback that is invoked every time the algorithm is about to remove a node, something like this:
var article = new Readability(document, {
  onRemoveNode: (node) => {
    // get all heading elements inside the node
    // (an arrow function has no access to the Readability instance via `this`,
    // so use the standard DOM API instead of _getAllNodesWithTag)
    const headings = node.querySelectorAll("h1, h2, h3, h4, h5, h6");
    // remove node only if it doesn't contain any heading elements
    return headings.length === 0;
  }
});
This callback could be invoked directly from _removeAndGetNext:
Lines 793 to 797 in d64951b
_removeAndGetNext: function(node) {
  var nextNode = this._getNextNode(node, true);
  node.parentNode.removeChild(node);
  return nextNode;
},
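A minimal sketch of how the removal step might consult such a callback, written as a standalone function so it can be tried without the library. Everything here is an assumption for illustration: `onRemoveNode` is the proposed option (not an existing Readability API), `getNextNode` stands in for the internal `_getNextNode(node, ignoreSelfAndKids)`, and returning `false` from the callback is taken to mean "keep the node".

```javascript
// Hypothetical sketch, not Readability's actual implementation.
function removeAndGetNext(node, getNextNode, onRemoveNode) {
  // Without a callback, behave exactly like the current code.
  // With a callback, only an explicit `false` vetoes the removal.
  var allowed = typeof onRemoveNode !== "function" || onRemoveNode(node) !== false;

  if (!allowed) {
    // The node is kept, so continue the traversal into its children.
    return getNextNode(node, false);
  }

  // Default path: skip the node's subtree, detach it, and move on.
  var nextNode = getNextNode(node, true);
  node.parentNode.removeChild(node);
  return nextNode;
}
```

Treating only an explicit `false` as a veto keeps callbacks that return nothing backward compatible. Whether a kept node should still be scored like any other candidate is a design question this sketch leaves open.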
If there is any interest in this, I'd be willing to submit a PR.