PoC
Daniel Maksymow edited this page 5 years ago

Task

https://contest.com/docs/data_clustering

  • Isolate articles in English and Russian. Your algorithm must sort articles by language, filtering English and Russian articles. Articles in other languages are not relevant for this stage of the contest and may be discarded.

  • Isolate news articles. Your algorithm must discard everything except for news articles.

  • Group news articles by category. Your algorithm must place news articles into the following 7 categories:
      • Society (includes Politics, Elections, Legislation, Incidents, Crime)
      • Economy (includes Markets, Finance, Business)
      • Technology (includes Gadgets, Auto, Apps, Internet services)
      • Sports (includes E-Sports)
      • Entertainment (includes Movies, Music, Games, Books, Arts)
      • Science (includes Health, Biology, Physics, Genetics)
      • Other (news articles that don't fall into any of the above categories)

  • Group similar news into threads. Your algorithm must identify news articles about the same event and group them together into threads, selecting a relevant title for each thread. News articles inside each thread must be sorted according to their relevance (most relevant at the top).

  • Sort threads by their relative importance. Your algorithm must sort news threads in each of the categories based on perceived importance (important at the top). In addition, the algorithm must build a global list of threads, independent of category, sorted by perceived importance (important at the top).
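
The task does not prescribe how to build or rank threads. As a rough illustration of the last two steps only, the sketch below groups articles greedily by Jaccard similarity of their title words and uses thread size as a stand-in for importance. Everything in it is an assumption made for the example: the names `Thread`, `tokenize`, `jaccard` and `group_into_threads`, the 0.5 threshold, the choice of the first article's title as the thread title, and the idea that a bigger thread is a more important one.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Thread:
    title: str                                     # title of the first article seen
    articles: list = field(default_factory=list)   # file names, in arrival order
    tokens: set = field(default_factory=set)       # title words of the first article


def tokenize(title):
    """Lowercased word tokens; \\w matches Latin and Cyrillic letters alike."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_threads(articles, threshold=0.5):
    """Greedy single pass over (file_name, title) pairs.

    Each article joins the most similar existing thread if the similarity
    reaches `threshold` (a hypothetical value), otherwise it starts a new one.
    """
    threads = []
    for file_name, title in articles:
        tokens = tokenize(title)
        best = max(threads, key=lambda t: jaccard(tokens, t.tokens), default=None)
        if best is not None and jaccard(tokens, best.tokens) >= threshold:
            best.articles.append(file_name)
        else:
            threads.append(Thread(title=title, articles=[file_name], tokens=tokens))
    # Importance proxy (assumption): threads with more articles come first.
    return sorted(threads, key=lambda t: len(t.articles), reverse=True)
```

Title overlap is a crude signal; a real entry would probably compare article bodies and publication times as well, and would need its own notion of relevance inside a thread.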

Requirements

Your app must work locally (no network usage).
Speed is of utmost importance (this may give an edge to apps written in C++).
External dependencies should be kept to a minimum. If you can't avoid external dependencies, please list them in a text file named deb-packages.txt. These dependencies will be installed using sudo apt-get install ... before your app is tested.
Applications will be tested under Debian GNU/Linux 10.1 (buster), x86-64 with 8 cores and 16 GB RAM. Before submitting, please make sure that your app works correctly on a clean system.
We will not evaluate apps that require more than 60 seconds for each batch of 1000 files passed in source_dir.
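
The 60-second limit works out to an average of 60 ms per file before any parallelism across the 8 cores. A small helper for checking a local run against that limit might look like the sketch below. It assumes the limit scales linearly with the number of files, which the text above does not state explicitly; `time_budget_seconds` and `within_budget` are hypothetical names.

```python
import time


def time_budget_seconds(n_files, seconds_per_batch=60.0, batch_size=1000):
    """Allowed wall-clock time, assuming the limit scales linearly with file count."""
    return seconds_per_batch * n_files / batch_size


def within_budget(run, n_files):
    """Call `run()` once and report (fits_in_budget, elapsed_seconds)."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    return elapsed <= time_budget_seconds(n_files), elapsed
```

Timings taken on a development machine are only indicative; the figure that counts is the one measured on the test system described above.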

You must submit a ZIP-file (200 MB max) with the following structure:

submission.zip
  -> tgnews - executable binary file with an interface as described below
  -> src - folder with the app's source code
  -> deb-packages.txt - a text file with line-break-separated Debian package names of all external dependencies
  -> * - any additional resources your app requires to work (please use relative paths to access them)
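
Before uploading, the archive layout can be checked with a short script. The sketch below is one possible check, not an official validator: `check_submission` is a hypothetical name, entries are assumed to sit at the archive root, 200 MB is read as 200 * 1024 * 1024 bytes, and deb-packages.txt is treated as required because it appears in the structure above.

```python
import os
import zipfile

MAX_ZIP_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" read as 200 MiB


def check_submission(zip_path):
    """Return a list of problems found in the archive (empty if none)."""
    problems = []
    if os.path.getsize(zip_path) > MAX_ZIP_BYTES:
        problems.append("archive is larger than 200 MB")
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    if "tgnews" not in names:
        problems.append("missing executable: tgnews")
    if not any(name.startswith("src/") for name in names):
        problems.append("missing source folder: src")
    if "deb-packages.txt" not in names:
        problems.append("missing file: deb-packages.txt")
    return problems
```

For example, `check_submission("submission.zip")` returns an empty list when all three entries are present and the size is within the limit. The script does not check that tgnews is actually executable or that it runs on a clean system.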