{"id":1704,"date":"2024-01-05T15:56:05","date_gmt":"2024-01-05T15:56:05","guid":{"rendered":"https:\/\/staticalmo.com\/?p=1704"},"modified":"2024-10-24T08:27:10","modified_gmt":"2024-10-24T08:27:10","slug":"quando-la-statistica-diventa-big-data","status":"publish","type":"post","link":"https:\/\/staticalmo.com\/en\/quando-la-statistica-diventa-big-data\/","title":{"rendered":"When does statistics become big data?"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Big data is one of those overused buzzwords of the past decade.<br \/>\nSo when does statistics become big data? The answer depends on the tool you use for data analysis; here are some <\/span><b>informal<\/b><span style=\"font-weight: 400;\"> rules of thumb:\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When spreadsheets crash, that is, they suddenly close or become unusably slow. Although both Excel and Google Sheets advertise generous limits on the number of rows and columns per sheet, in practice these problems appear much sooner.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When <\/span><a href=\"https:\/\/www.youtube.com\/@staticalmo\/search?query=query\"><span style=\"font-weight: 400;\">queries<\/span><\/a><span style=\"font-weight: 400;\"> on databases (MySQL, <a href=\"https:\/\/www.youtube.com\/@staticalmo\/search?query=PostgreSQL\">PostgreSQL<\/a>, etc.) 
take more than 6 minutes.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">When fitting even<\/span><a href=\"https:\/\/staticalmo.com\/come-riconoscere-un-informatico-che-fa-statistica-e-perche-bisogna-fare-attenzione\/\"><span style=\"font-weight: 400;\"> the simplest statistical models<\/span><\/a><span style=\"font-weight: 400;\">, such as linear regression or the <\/span><a href=\"https:\/\/www.youtube.com\/@staticalmo\/search?query=modello%20logistico\"><span style=\"font-weight: 400;\">logistic model<\/span><\/a><span style=\"font-weight: 400;\">, takes more than 6 minutes.\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">The first two points concern tools used mainly for <\/span><a href=\"https:\/\/staticalmo.com\/come-dati-sporchi-non-fanno-funzionare-il-cerca-verticale\/\"><span style=\"font-weight: 400;\">descriptive statistics <\/span><\/a><span style=\"font-weight: 400;\">on structured data (tables): aggregations, counts, sums, averages, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For each of these points there are different software and hardware solutions, because performance degradation and slowdowns stem from software, hardware, or both. As an entrepreneur you are unlikely to deal with them directly; managers are more likely to.<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">For Google Sheets, I took the largest file among my clients' spreadsheets; it weighs about 2.6 MB. 
To measure its resource usage on a Windows PC, I pressed CTRL+ALT+Delete and opened Task Manager:<\/span><\/span><img data-dominant-color=\"eeece9\" data-has-transparency=\"false\" style=\"--dominant-color: #eeece9;\" decoding=\"async\" class=\"aligncenter wp-image-1705 size-full not-transparent\" src=\"https:\/\/staticalmo.com\/wp-content\/uploads\/2024\/01\/Immagine-2024-01-05-165140.png\" alt=\"\" width=\"599\" height=\"57\" srcset=\"https:\/\/staticalmo.com\/wp-content\/uploads\/2024\/01\/Immagine-2024-01-05-165140.png 599w, https:\/\/staticalmo.com\/wp-content\/uploads\/2024\/01\/Immagine-2024-01-05-165140-300x29.png 300w, https:\/\/staticalmo.com\/wp-content\/uploads\/2024\/01\/Immagine-2024-01-05-165140-18x2.png 18w\" sizes=\"(max-width: 599px) 100vw, 599px\" \/>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">The main bottleneck is the CPU (processor), followed by RAM (volatile memory), so the problem can be addressed by increasing those two resources.\u00a0<\/span><\/li>\n<\/ol>\n<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">For databases, again, buying a faster CPU or more RAM for the server hosting the database solves the problem <\/span><b>if<\/b><span style=\"font-weight: 400;\"> you are not using a remote (cloud) server. This is called vertical scaling. When it cannot be applied, data warehouses, which host databases, come into play. 
For example, Google BigQuery.\u00a0<\/span>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">You can also optimize the query, and therefore the code, for example through normalization, but whole books have been written on this strategy alone.\u00a0<\/span><\/li>\n<\/ol>\n<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">For statistical models, there are solutions on the code side, or you can simply change the programming language:<\/span>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">code: nowadays virtually all CPUs have multiple cores, so you can explicitly run the code on all of them, or even move the computation to the graphics card (GPU), which suits certain types of statistical models especially well. Alternatively, the code can run on multiple machines via distributed computing: this is how SETI@home analyzed radio signals in the search for extraterrestrial intelligence.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">tools: some languages and frameworks are built for specific tasks, in this case processing large amounts of data, such as Scala together with Apache Spark, a distributed data-processing engine written in Scala.\u00a0<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">In practice, this problem mostly affects medium-sized companies and SMEs that have been operating for at least five years.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you would like to discuss whether you are about to cross the big-data threshold, with all the difficulties that come with it, <\/span><a href=\"https:\/\/staticalmo.com\/contact\/\"><span style=\"font-weight: 400;\">we can have a free call<\/span><\/a><span style=\"font-weight: 400;\"> in which I will start helping you get back to manageable data. <\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Big data is one of those overused buzzwords of the past decade. 
So when does statistics become big data? The answer depends on the tool you use for data analysis; here are some informal rules of thumb: When spreadsheets crash, that is, they suddenly close or become unusably slow. Although both Excel and Google ...<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/staticalmo.com\/en\/quando-la-statistica-diventa-big-data\/\"> <span class=\"screen-reader-text\">When does statistics become big data?<\/span> Read More &raquo;<\/a><\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"default","ast-global-header-display":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","_themeisle_gutenberg_block_has_review":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1704","post","type-post","status-publish","format-standard","hentry","category-senza-categoria"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/posts\/1704","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staticalmo.com\/en\/wp-
json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/comments?post=1704"}],"version-history":[{"count":2,"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/posts\/1704\/revisions"}],"predecessor-version":[{"id":1945,"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/posts\/1704\/revisions\/1945"}],"wp:attachment":[{"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/media?parent=1704"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/categories?post=1704"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staticalmo.com\/en\/wp-json\/wp\/v2\/tags?post=1704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}