{"id":728706,"date":"2017-10-16T21:20:38","date_gmt":"2017-10-16T19:20:38","guid":{"rendered":"http:\/\/nhub.news\/?p=728706"},"modified":"2017-10-16T21:20:38","modified_gmt":"2017-10-16T19:20:38","slug":"under-the-hood-of-news-hub-main-functionality","status":"publish","type":"post","link":"http:\/\/nhub.news\/ru\/2017\/10\/under-the-hood-of-news-hub-main-functionality\/","title":{"rendered":"Under the Hood of News Hub (main functionality)"},"content":{"rendered":"<p>1. Crawling:<\/p>\n<ul>\n<li>scan websites<\/li>\n<li>analyze and parse web pages<\/li>\n<li>detect and collect URLs, links, and web resources.<\/li>\n<\/ul>\n<p>2. Download resources from web servers using automatically collected or provided URLs, including dynamic JS-rendered web pages, and store them in sharded local raw file storage.<\/p>\n<p>3. Process web pages with customizable algorithms such as unstructured textual content scraping, statistical data mining, NLP data mining, and so on.<\/p>\n<p>4. Store results in local SQL database storage under a distributed multi-host, multi-process architecture model.<\/p>\n<p>5. Manage crawling, processing, and data archiving.<\/p>\n<p>6. Run distributed data architecture tasks such as aging, purging, statistics, and more.<\/p>\n<p>7. Schedule and balance tasks using either the task management service of the multi-host architecture or a real-time multi-threaded load-balancing client-server architecture.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Crawling: scan websites, analyze and parse web pages, detect and collect URLs, links, and web resources. 2. 
Download resources from web servers using automatically collected or provided URLs, including dynamic JS-rendered web pages, and store them in sharded local raw file storage. 3. Processing of a web page with customizable applied algorithms like unstructured [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":728707,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[101],"tags":[169,167,166,168],"_links":{"self":[{"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/posts\/728706"}],"collection":[{"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/comments?post=728706"}],"version-history":[{"count":1,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/posts\/728706\/revisions"}],"predecessor-version":[{"id":728708,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/posts\/728706\/revisions\/728708"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/media\/728707"}],"wp:attachment":[{"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/media?parent=728706"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/categories?post=728706"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/nhub.news\/ru\/wp-json\/wp\/v2\/tags?post=728706"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}