python-abp
==========

This repository contains a library for working with Adblock Plus filter lists,
a script for rendering diffs between filter lists, and the script that is used
for building Adblock Plus filter lists from the form in which they are authored
into the format suitable for consumption by the adblocking software (aka
rendering).

.. contents::

Installation
------------

Prerequisites:

* Linux, Mac OS X or Windows (any modern Unix should work too),
* Python (2.7 or 3.5+),
* pip.

To install::

    $ pip install --upgrade python-abp

Rendering of filter lists
-------------------------

Filter lists are originally authored in relatively small parts, each focused
on a particular type of filter, a specific topic, or a particular
geographical area.
We call these parts *filter list fragments* (or just *fragments*) to
distinguish them from full filter lists that are consumed by the adblocking
software such as Adblock Plus.

Rendering is a process that combines filter list fragments into a filter list.
It starts with one fragment that can include other ones and so forth.
The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_.

Python-abp contains a script that can do this called ``flrender``::

    $ flrender fragment.txt filterlist.txt

This will take the top level fragment in ``fragment.txt``, render it and save it
into ``filterlist.txt``.

The ``flrender`` script can also be used by only specifying ``fragment.txt``::

    $ flrender fragment.txt

in which case the rendering result will be sent to ``stdout``. Moreover, when
it's run with no positional arguments::

    $ flrender

it will read from ``stdin`` and send the results to ``stdout``.

Fragments might reference other fragments that should be included into them.
The references come in two forms: http(s) includes and local includes::

    %include http://www.server.org/dir/list.txt%
    %include easylist:easylist/easylist_general_block.txt%

The http include contains a URL that will be fetched and inserted at the point
of reference.
The local include contains a path inside the easylist repository.
``flrender`` needs to be able to find a copy of the repository on the local
filesystem. Use the ``-i`` option to point it to the right directory::

    $ flrender -i easylist=/home/abc/easylist input.txt output.txt

Now the local include referenced above will be resolved to:
``/home/abc/easylist/easylist/easylist_general_block.txt``
and the fragment will be loaded from this file.

Directories that contain filter list fragments that are used during rendering
are called sources.
They are normally working copies of the repositories that contain filter list
fragments.
Each source is identified by a name: that's the part that comes before ":" in
the include instruction and it should be the same as what comes before "=" in
the ``-i`` option.

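The mapping from include instructions to URLs and local paths described above
can be sketched in a few lines of Python. This is only an illustration of the
naming scheme, not the code that ``flrender`` runs: the function name
``resolve_include`` and the ``sources`` dictionary are hypothetical, and
``posixpath`` is used so that the result looks the same on every platform.

```python
import posixpath
import re

# Matches include instructions such as %include easylist:dir/file.txt%
INCLUDE_RE = re.compile(r'^%include\s+(.+)%$')


def resolve_include(line, sources):
    """Return the URL or local path an include instruction refers to.

    `sources` maps source names to directories, mirroring the
    NAME=PATH pairs given to the -i option (hypothetical helper).
    """
    match = INCLUDE_RE.match(line.strip())
    if match is None:
        raise ValueError('Not an include instruction: {!r}'.format(line))
    target = match.group(1).strip()
    if target.startswith(('http://', 'https://')):
        return target  # http(s) include: the URL is used as-is
    # Local include: the part before ":" names the source.
    name, _, path = target.partition(':')
    if name not in sources:
        raise KeyError('Unknown source: {!r}'.format(name))
    return posixpath.join(sources[name], path)


sources = {'easylist': '/home/abc/easylist'}
print(resolve_include(
    '%include easylist:easylist/easylist_general_block.txt%', sources))
```

With the ``sources`` mapping shown, the local include resolves to the same
path as in the ``-i easylist=/home/abc/easylist`` example.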
Commonly used sources have generally accepted names. For example the main
EasyList repository is referred to as ``easylist``.
If you don't know all the source names that are needed to render some list,
just run ``flrender`` and it will report what it's missing::

    $ flrender easylist.txt output/easylist.txt
    Unknown source: 'easylist' when including 'easylist:easylist/easylist_general_block.txt' from 'easylist.txt'

You can clone the necessary repositories to a local directory and add ``-i``
options accordingly.

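If you would rather check a fragment before rendering it, the source names it
refers to can be collected with a short script. The sketch below is an
assumption-laden helper, not part of python-abp: ``required_sources`` is a
hypothetical name, and it only inspects the text it is given, so sources
needed by nested includes are not discovered.

```python
import re

# Matches include instructions such as %include easylist:dir/file.txt%
INCLUDE_RE = re.compile(r'^%include\s+(.+)%$')


def required_sources(fragment_text):
    """Return the sorted source names used by local includes in the text."""
    names = set()
    for line in fragment_text.splitlines():
        match = INCLUDE_RE.match(line.strip())
        if match is None:
            continue  # not an include instruction
        target = match.group(1).strip()
        if target.startswith(('http://', 'https://')):
            continue  # http(s) includes do not need a local source
        name, sep, _ = target.partition(':')
        if sep:
            names.add(name)
    return sorted(names)


fragment = '\n'.join([
    '%include http://www.server.org/dir/list.txt%',
    '%include easylist:easylist/easylist_general_block.txt%',
    'abc.com/ad$image',
])
print(required_sources(fragment))
```

Each reported name then needs a matching ``-i NAME=PATH`` option.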
Generating diffs
----------------

A diff allows a client running ad blocking software such as Adblock Plus to
update the filter lists incrementally, instead of downloading a new copy of a
full list during each update. This reduces the resources used when updating
filter lists (e.g. network data, memory usage, battery consumption), so
clients can update their lists more frequently.

python-abp contains a script called ``fldiff`` that will find the diff between
the latest filter list and any number of previous filter lists::

    $ fldiff -o diffs/easylist/ easylist.txt archive/*

where ``-o diffs/easylist/`` is the (optional) output directory where the diffs
should be written, ``easylist.txt`` is the most recent version of the filter
list, and ``archive/*`` matches all the archived filter lists.
When called like this, the shell expands the ``archive/*`` pattern, passing
each of the filenames to the script separately.

In the above example, the output of each archived ``list[version].txt`` will be
written to ``diffs/diff[version].txt``. If the output argument is omitted, the
diffs will be written to the current directory.

The script produces three types of lines, as specified in the `technical
specification <https://gitlab.com/eyeo/devops/python-abp/wikis/iflu-0.1>`_:

* Special comments of the form ``! <name>:[ <value>]``
* Added filters of the form ``+ <filter-text>``
* Removed filters of the form ``- <filter-text>``

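To make the three line types concrete, here is a minimal sketch that compares
two small lists and emits lines in that shape. It is not the ``fldiff``
implementation: ``split_list`` and ``make_diff`` are hypothetical helpers, the
sketch treats every ``! name: value`` comment as a special comment, and it
ignores special comments that were removed from the new list.

```python
def split_list(lines):
    """Split filter list lines into special comments and filters."""
    metadata, filters = {}, set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith('['):
            continue  # skip empty lines and the [Adblock Plus ...] header
        if line.startswith('!'):
            name, sep, value = line[1:].partition(':')
            if sep:
                metadata[name.strip()] = value.strip()
        else:
            filters.add(line)
    return metadata, filters


def make_diff(old_lines, new_lines):
    """Return diff lines that describe how to get from old to new."""
    old_meta, old_filters = split_list(old_lines)
    new_meta, new_filters = split_list(new_lines)
    diff = []
    # Special comments that are new or whose value changed.
    for name in sorted(new_meta):
        if old_meta.get(name) != new_meta[name]:
            diff.append('! {}: {}'.format(name, new_meta[name]))
    # Filters present only in the new list, then only in the old list.
    diff.extend('+ ' + text for text in sorted(new_filters - old_filters))
    diff.extend('- ' + text for text in sorted(old_filters - new_filters))
    return diff


old = ['[Adblock Plus 2.0]', '! Version: 1',
       'abc.com/ad$image', '@@/abc\\.com/']
new = ['[Adblock Plus 2.0]', '! Version: 2',
       'abc.com/ad$image', 'abc.com,cdf.com##div#ad1']
for line in make_diff(old, new):
    print(line)
```

Filters that appear in both lists produce no output, which is what keeps an
incremental update smaller than a full download.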
Library API
-----------

python-abp can also be used as a library for parsing filter lists. For example,
to read a filter list (we use Python 3 syntax here, but the API is the same in
Python 2):

.. code-block:: python

    from abp.filters import parse_filterlist

    with open('filterlist.txt') as filterlist:
        for line in parse_filterlist(filterlist):
            print(line)

If ``filterlist.txt`` contains this filter list::

    [Adblock Plus 2.0]
    ! Title: Example list

    abc.com,cdf.com##div#ad1
    abc.com/ad$image
    @@/abc\.com/

the output will look something like:

.. code-block:: python

    Header(version='Adblock Plus 2.0')
    Metadata(key='Title', value='Example list')
    EmptyLine()
    Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
    Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
    Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])

The ``abp.filters`` module also exports a lower-level function for parsing
individual lines of a filter list: ``parse_line``. It returns a parsed line
object just like the items in the iterator returned by ``parse_filterlist``.

For further information on the library API use ``help()`` on ``abp.filters`` and
its contents in an interactive Python session, read the docstrings, or look at
the tests for some usage examples.

Blocks of filters
~~~~~~~~~~~~~~~~~

Further processing of blocks of filters separated by comments can be performed
using the ``to_blocks`` function from ``abp.filters.blocks``:

.. code-block:: python

    import json

    from abp.filters import parse_filterlist
    from abp.filters.blocks import to_blocks

    fl_path = 'filterlist.txt'  # path to the filter list to process

    with open(fl_path) as f:
        for block in to_blocks(parse_filterlist(f)):
            # Print each block as indented JSON.
            print(json.dumps(block.to_dict(), indent=2))

Use ``help()`` on ``abp.filters.blocks`` for more information.

Testing
-------

Unit tests for ``python-abp`` are located in the ``/tests`` directory.
`Pytest <http://pytest.org/>`_ is used for quickly running the tests during
development. `Tox <https://tox.readthedocs.org/>`_ is used for testing in
different environments (Python 2.7, Python 3.5+ and PyPy) and code quality
reporting.

Use tox for a comprehensive report of unit tests and test coverage::

    $ tox

Development
-----------

When adding new functionality, add tests for it (preferably first). If some
code will never be reached on a certain version of Python, it may be exempted
from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``.

All public functions, classes and methods should have docstrings compliant with
the `NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_.
One exception is the constructors of classes that the user is not expected to
instantiate (such as exceptions).

Using the library with R
------------------------

Installation
~~~~~~~~~~~~

``python-abp`` can be installed from PyPI or from the source code, either
directly onto a system or in a virtual environment.

To install from PyPI::

    $ pip install -U python-abp

To install from a local source, clone the repo and then::

    $ pip install -U /path/to/python-abp

To use a virtual environment, it must first be created. Python 2 and 3 use
different scripts to create a virtualenv.

In Python 2::

    $ virtualenv env

In Python 3::

    $ python3 -m venv env

Then, use the virtualenv's version of pip to install python-abp, either from
PyPI or from source (as shown above)::

    $ env/bin/pip install -U python-abp

For more information about virtualenv, please see the `User Guide`_ and the
docs_.

Usage
~~~~~

In R, ``python-abp`` can be imported with ``reticulate``:

.. code-block:: R

    > library(reticulate)
    > use_virtualenv("~/path/to/env", required=TRUE) # If using virtualenv
    > abp <- import("abp.filters.rpy")

Now you can use the functions with ``abp$functionname``, e.g.
``abp$line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``.

For more information about the reticulate package, see their guide_.

.. _User Guide: https://virtualenv.pypa.io/en/latest/userguide/#usage
.. _docs: https://docs.python.org/3/library/venv.html
.. _guide: https://rstudio.github.io/reticulate/