Original translation | The Complete Beginner's Guide to Big Data in 2017

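The concept of Big Data has been around for some time, yet it is still somewhat fuzzy. As a driving force behind waves of digital transformation such as artificial intelligence, data analytics, and the Internet of Things, the concept keeps evolving and deserves a fresh look.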

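With that in mind, I thought it was time to write a guide for beginners that explains what Big Data means today. Like my earlier article on blockchain, it avoids obscure jargon and aims to explain the core concepts and ideas to readers of any background.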

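Since the dawn of the digital age, the amount of data we generate has grown exponentially. This is largely due to the rise of computers, the Internet, and information-capture technology that can collect data from our real lives and convert it into digital form.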

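In 2017, we produce data all the time: when we go online, carry a GPS-equipped smartphone, chat with friends in messaging apps, or go shopping. You could say that everything we do leaves a digital footprint, and that nearly everything we do is a data transaction.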

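On top of this, data generated by devices is also growing rapidly. Smart home devices generate and share data when they communicate with one another or with their home servers. Factories around the world increasingly use sensor-equipped machinery to collect and transmit data. Soon, self-driving cars will take to the streets, capturing a real-time, four-dimensional map of wherever they go.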

What can Big Data do?

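This ever-growing stream of sensor information, photographs, text, voice, and video data is the foundation of Big Data, and we can now use it in ways that were not possible even a few years ago. Right now, Big Data is helping people to: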


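  • Cure disease and prevent cancer

    Analyzing large volumes of medical records and images can help spot disease early and develop new medicines.

  • Curb hunger

    Agricultural data can be used to maximize crop yields, reduce the pollutants released into the ecosystem, and optimize the use of farm machinery.

  • Explore outer space

    NASA analyzes millions of data points to model the possible scenarios on the surface of Mars and to plan future missions.

  • Predict and respond to natural and man-made disasters

    Sensor data can be analyzed to predict where earthquakes are likely to strike and to provide leads when searching for survivors. Big Data technology is also used to monitor and assist refugees leaving war zones around the world.

  • Prevent crime

    Police forces are increasingly adopting data-driven strategies, based on their own intelligence and on public data, to deploy resources more effectively and to act as a deterrent where one is needed.

  • Make our lives more convenient

    Shopping online, sharing a ride or booking a holiday stay, choosing the best time to book flights, and deciding which movie to watch next are all easier thanks to Big Data.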


How does Big Data work?

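Big Data works on the principle that the more data you collect, the more accurate and reliable the insights you gain, and the better you can predict future developments. Comparing more data points reveals relationships that were previously hidden, which helps us learn and make better-informed decisions.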
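As a rough illustration of this principle (it is not part of the original article), the Python sketch below draws samples of increasing size from a simulated data source and estimates its average each time. The function names, the true mean of 50, and the spread of 10 are assumptions invented for this example.

```python
import random
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def estimate_with_growing_data(sizes, true_mean=50.0, spread=10.0, seed=0):
    """Estimate the mean of a simulated source from samples of each size.

    true_mean, spread, and seed are hypothetical values chosen for the demo.
    Returns a list of (sample_size, estimated_mean, standard_error) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for n in sizes:
        sample = [rng.gauss(true_mean, spread) for _ in range(n)]
        results.append((n, statistics.mean(sample), standard_error(sample)))
    return results


for n, estimate, error in estimate_with_growing_data([10, 100, 1000, 10000]):
    print(f"n={n:>6}  estimate={estimate:7.2f}  standard error={error:.3f}")
```

The standard error shrinks roughly in proportion to one over the square root of the sample size, which is one simple way to see why collecting more data points tends to make an estimate more reliable.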

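Most commonly, this is done through an automated process: build a model from the collected data, run it repeatedly while adjusting the values, and monitor how the results change. Today's advanced analytics technology can run millions of these simulations, iterating until it finds a pattern that helps solve the problem at hand.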
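The loop described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not how any particular analytics product works: the "model" is a straight line, the observations are made up, and the helper names `mean_squared_error` and `sweep` are hypothetical. The sketch tries every combination of two values and keeps the one whose results match the data best.

```python
def mean_squared_error(slope, intercept, points):
    """Average squared gap between the line's prediction and each observation."""
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def sweep(points, slopes, intercepts):
    """Try every (slope, intercept) pair and return the best (error, slope, intercept)."""
    best = None
    for slope in slopes:
        for intercept in intercepts:
            error = mean_squared_error(slope, intercept, points)
            # Monitor the result of each run and remember the best one so far.
            if best is None or error < best[0]:
                best = (error, slope, intercept)
    return best


# Made-up observations that roughly follow y = 2x + 1 (an assumption for the demo).
observations = [(0, 1.2), (1, 2.9), (2, 5.1), (3, 7.2), (4, 8.9)]
slope_candidates = [i / 10 for i in range(0, 41)]        # 0.0, 0.1, ..., 4.0
intercept_candidates = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0

error, slope, intercept = sweep(observations, slope_candidates, intercept_candidates)
print(f"best fit: y = {slope} * x + {intercept} (mean squared error {error:.3f})")
```

On this made-up data the sweep settles on a slope of 2.0 and an intercept of 1.1. Real systems replace the exhaustive grid with smarter search and far richer models, but the cycle of adjusting values and monitoring the outcome is the same idea.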

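Much of the data we collect is unstructured, mostly pictures and videos (from satellite images to photos uploaded to Facebook or Twitter), along with email, chat messages, and call recordings. Such data is hard to handle in a structured relational database. To make sense of it, Big Data projects often use cutting-edge analytics involving artificial intelligence and machine learning. Through image recognition and natural language processing, computers can learn to spot patterns in this data more quickly and reliably than humans.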

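Over the past few years, Big Data tools and technology have increasingly been delivered through "as-a-service" platforms. Businesses rent server space, software, and processing power from third-party cloud providers. All of the work is carried out on the provider's systems, and the customer simply pays for what was used. This model gives any organization the chance to explore Big Data applications without spending heavily on hardware, software, premises, and technical staff.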

Big Data concerns

Today, Big Data gives us unprecedented insights and opportunities, but it also raises some thorny questions:

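  • Data privacy

    The Big Data we generate now contains a great deal of information about our private lives, much of it highly personal. This forces us to weigh how much private information we expose against the convenience of Big Data-powered applications and services. Whom do we allow to access this data?

  • Data security

    Even if we are happy to share our data for a particular purpose, can we be sure it will be kept safe? Is the existing legal framework able to regulate the use of data at this scale?

  • Data discrimination

    When personal behavior is laid bare, will it become acceptable to discriminate against people based on their private data? We already use credit scoring to decide who can borrow money, and data-driven strategies to decide who gets insurance. We can expect such analysis and assessment to become more detailed, so care must be taken that it does not make life harder for those who already have fewer resources and less access to information.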

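Facing these problems is part of the “Big Data” challenge. They are a major topic of debate in academic circles, but they must also be addressed by those who use Big Data in business. Failing to do so can leave businesses vulnerable and lead to financial disaster and huge fines.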

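When people first started talking about Big Data, it was sometimes dismissed as a fad: a trendy term that would be talked about for a while and then fade when the next new technology arrived. So far, there is no evidence that Big Data is a passing fad. In fact, even as newer buzzwords have appeared, Big Data remains the driving force behind them. The amount of data we collect will only keep growing, and analytics technology will become more capable. So if Big Data can do all of this today, just imagine what it will be able to do tomorrow.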



The original English text follows:


The Complete Beginner's Guide To Big Data In 2017

 

Big Data is a term that has been around for some time now but there is still confusion about what it actually is. The concept is continuing to evolve and to be reconsidered, as it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science and the Internet of Things (IoT).

With that in mind, I thought it was time to write a beginner’s guide to what Big Data means in 2017. In a similar way to my beginner’s guides to Blockchain and FinTech, this will be jargon-free and aims to explain the core concepts and ideas to anyone regardless of background knowledge.

It all starts with the exponential explosion in the amount of data we have generated since the dawn of the digital age. This is largely due to the rise of computers, the Internet and technology capable of capturing information from the real, physical world we live in, and converting it to digital data.

In 2017, we generate data whenever we go online, when we carry our GPS-equipped smartphones, when we communicate with our friends through social media or chat applications, and when we shop. You could say we leave digital footprints with everything we do that involves a digital transaction, which is almost everything.

On top of this, the amount of machine-generated data is rapidly growing too. Data is generated and shared when our “smart” home devices communicate with each other or with their home servers. Industrial machinery in plants and factories around the world is increasingly equipped with sensors that gather and transmit data. Soon, self-driving cars will take to the streets, beaming real-time, four-dimensional maps of their surroundings back home from wherever they go.

What can Big Data do?

This ever-growing stream of sensor information, photographs, text, voice and video data is the foundation of Big Data, which we can now use in ways that were not possible even a few years ago. Right now, Big Data projects are helping to:

· Cure disease and prevent cancer – Data-driven medicine involves analyzing vast numbers of medical records and images for patterns which can help spot disease early and develop new medicines.

· Feed the hungry – Agriculture is being revolutionized by data which can be used to maximize crop yields, minimize the amount of pollutants released into the ecosystem and optimize the use of machines and equipment.

· Explore distant planets – NASA analyzes millions of data points and uses them to model every eventuality to land its Rovers on the surface of Mars and plan future missions.

· Predict and respond to natural and man-made disasters – Sensor data can be analyzed to predict where earthquakes are likely to strike next, and patterns of human behavior give clues which help aid organizations give relief to survivors. Big Data technology is also used to monitor and safeguard the flow of refugees away from war zones around the world.

· Prevent crime – Police forces are increasingly adopting data-driven strategies based on their own intelligence and public data sets in order to deploy resources more efficiently and act as a deterrent where one is needed.

· Make our everyday lives easier and more convenient – Shopping online, crowdsourcing a ride or a place to stay on holiday, choosing the best time to book flights and deciding what movie to watch next are all easier thanks to Big Data.

How does Big Data work?

Big Data works on the principle that the more you know about anything or any situation, the more reliably you can gain new insights and make predictions about what will happen in the future. By comparing more data points, relationships will begin to emerge that were previously hidden, and these relationships will enable us to learn and inform our decisions.

Most commonly this is done through a process which involves building models, based on the data we can collect, and then running simulations, tweaking the value of data points each time and monitoring how it impacts our results. This process is automated – today’s advanced analytics technology will run millions of these simulations, tweaking all the possible variables until it finds a pattern – or an insight – that helps solve the problem it is working on.

Increasingly, data is coming to us in an unstructured form, meaning it cannot be easily put into structured tables with rows and columns. Much of this data is in the form of pictures and videos – from satellite images to photographs uploaded to Facebook or Twitter – as well as email and instant messenger communications and recorded telephone calls. To make sense of all of this, Big Data projects often use cutting-edge analytics involving artificial intelligence and machine learning. By teaching computers to identify what this data represents – through image recognition or natural language processing, for example – they can learn to spot patterns much more quickly and reliably than humans.

A strong trend over the last few years has been a move towards the delivery of Big Data tools and technology through an “as-a-service” platform. Businesses and organizations rent server space, software systems and processing power from third-party cloud service providers. All of the work is carried out on the service provider’s systems, and the customer simply pays for whatever was used. This model is making Big Data-driven discovery and transformation accessible to any organization and cuts out the need to spend vast sums on hardware, software, premises and technical staff.

Big Data concerns

Today, Big Data gives us unprecedented insights and opportunities, but it also raises concerns and questions that must be addressed:

· Data privacy – The Big Data we now generate contains a lot of information about our personal lives, much of which we have a right to keep private. Increasingly we are asked to strike a balance between the amount of personal data we divulge, and the convenience that Big Data powered apps and services offer. Who do we allow to have access to this data?

· Data security – Even if we decide we are happy for someone to have our data for a particular purpose, can we trust them to keep it safe? Is the existing legal framework up to the job of regulating data use at this scale?

· Data discrimination – When everything is known, will it become acceptable to discriminate against people based on data we have on their lives? We already use credit scoring to decide who can borrow money, and insurance is heavily data-driven. We can expect to be analyzed and assessed in greater detail, and care must be taken that this isn’t done in a way which contributes to making life more difficult for those who already have fewer resources and access to information.

Facing up to these challenges is part of “Big Data,” too. They are certainly a major part of the debate around the use of Big Data in academic circles. However, they must also be addressed by those who want to take advantage of Big Data in business. Failure to do so can leave businesses vulnerable and lead to financial disaster as well as huge fines.

When people first started talking about Big Data it was sometimes dismissed as a fad – the latest trendy technology term which would be talked about for a while then quietly forgotten about when the next big thing came along. This hasn’t proven to be the case yet – in fact, while newer buzzwords have popped up, Big Data is still the driving force behind just about all of them. The amount of data available to us is only going to increase, and analytics technology will become more capable. So if Big Data is capable of all of this today – just imagine what it will be capable of tomorrow.
