Week 3 - BALT 4361 - The Source of Data
The Source of Data
Data comes from many different sources, both internal and external, and can be categorized into two major types: structured and unstructured. Structured data is organized in tables of rows and columns. Most of us know it from Excel, but it is also found in databases and CSV files. Unstructured data, on the other hand, lacks a predefined format or organization. Examples include emails, text documents, images, audio files, and videos. This type of data is harder to analyze and often requires techniques such as natural language processing and machine learning to extract useful information from it.
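To make the difference concrete, here is a minimal sketch using made-up data. Structured CSV rows can be read straight into named columns, while free text has to be processed before it has any usable structure. The word count below is only a crude first step toward structure, not natural language processing itself; the sample data and function names are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical structured data: every row has the same named columns.
STRUCTURED_CSV = """respondent_id,age,satisfaction
1,34,4
2,52,5
3,28,3
"""

# Hypothetical unstructured data: free text with no fixed fields.
UNSTRUCTURED_TEXT = "Great service, but the checkout was slow. Service staff were great!"

def load_structured(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

rows = load_structured(STRUCTURED_CSV)
print(rows[0]["satisfaction"])  # a value can be read directly by column name

counts = word_counts(UNSTRUCTURED_TEXT)
print(counts.most_common(2))    # the most frequent words in the free text
```

Note that the structured values are usable immediately, while the text only yields rough word frequencies; real analysis of unstructured data would need more sophisticated techniques.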
Methods of Collection
Common methods for collecting data include surveys and questionnaires, web scraping, application programming interfaces (APIs), and sensors and IoT devices. Each technique has its benefits, and some are more useful than others depending on the use case. For example, in my line of work in the market research industry, we rely heavily on survey and questionnaire data. Whatever the method, the quality of the data is the most important aspect because it determines usability: unreliable or poor-quality data can lead to incorrect analysis.
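APIs usually return data as JSON, which then has to be parsed into records before analysis. The sketch below assumes a hypothetical JSON payload shaped loosely like survey results; it does not reflect any real vendor's API, and in practice the text would come from an HTTP request rather than a string literal.

```python
import json

# Hypothetical JSON body, shaped like what a survey platform's API *might* return.
# No real endpoint or vendor format is implied.
API_RESPONSE = """
{
  "survey_id": "S-001",
  "responses": [
    {"respondent_id": 1, "q1": 4, "completed": true},
    {"respondent_id": 2, "q1": 5, "completed": true},
    {"respondent_id": 3, "q1": null, "completed": false}
  ]
}
"""

def extract_completed(payload_text):
    """Parse the JSON text and keep only completed responses as flat records."""
    payload = json.loads(payload_text)
    return [
        {"survey_id": payload["survey_id"], **response}
        for response in payload["responses"]
        if response["completed"]
    ]

records = extract_completed(API_RESPONSE)
print(len(records))  # only the completed responses remain
```

Filtering out incomplete responses at collection time is one simple way to keep unreliable data from reaching the analysis stage.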
Data Quality
Data quality has several dimensions. Accuracy ensures that the data correctly represents the real world, while completeness ensures that all the data needed for analysis has been gathered. Consistency ensures that data is uniform across different sources and formats, which also supports accuracy. Finally, timeliness and relevance ensure that the data is up to date and suited to the purpose it is being used for.
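Several of these dimensions can be turned into simple automated checks. The following is a minimal sketch with hypothetical records and rules: the plausible age range, the list of valid state codes, and the one-year freshness window are all assumptions chosen for illustration. Relevance is left out because it depends on the business question and does not reduce easily to a rule.

```python
from datetime import date

# Hypothetical survey records; field names and rules below are illustrative only.
RECORDS = [
    {"id": 1, "age": 34, "state": "TX", "collected": date(2024, 1, 10)},
    {"id": 2, "age": None, "state": "TX", "collected": date(2024, 1, 11)},
    {"id": 3, "age": 230, "state": "texas", "collected": date(2019, 6, 1)},
]

VALID_STATES = {"TX", "CA", "NY"}  # assumed shared coding scheme

def quality_issues(rec, as_of, max_age_days=365):
    """Return a list of quality problems found in one record."""
    issues = []
    if any(value is None for value in rec.values()):
        issues.append("incomplete")          # completeness: a field is missing
    if rec["age"] is not None and not 0 <= rec["age"] <= 120:
        issues.append("inaccurate age")      # accuracy: outside a plausible range
    if rec["state"] not in VALID_STATES:
        issues.append("inconsistent state")  # consistency: not in the shared codes
    if (as_of - rec["collected"]).days > max_age_days:
        issues.append("stale")               # timeliness: older than the window
    return issues

report = {r["id"]: quality_issues(r, as_of=date(2024, 6, 1)) for r in RECORDS}
print(report)
```

Record 1 passes every check, record 2 is flagged as incomplete, and record 3 fails the accuracy, consistency, and timeliness checks.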
Where Does Data End Up?
Data pipelines are a common way of moving and transforming data: a series of steps the data passes through from its source to its destination, which is often a data warehouse. Warehouses collect, store, and manage data from various sources and provide an organized, accessible environment for analysis. Cloud computing is also a major driving force, since it lets organizations process, store, and analyze data in one place on remotely hosted servers.
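A pipeline is often described as extract, transform, and load (ETL). The sketch below walks through those three steps on a small hypothetical CSV export, using an in-memory SQLite database purely as a stand-in for a data warehouse. A real warehouse or cloud platform would be far larger and configured differently.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a survey tool or an API.
RAW_CSV = """region,revenue
south, 1200
north,800
south,300
,50
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, drop rows missing a region, cast revenue to int."""
    clean = []
    for row in rows:
        region = row["region"].strip()
        if not region:
            continue
        clean.append((region, int(row["revenue"].strip())))
    return clean

def load(conn, rows):
    """Load: write cleaned rows into a table in the stand-in warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Analysis step: query the loaded data.
totals = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping each step in its own function mirrors the idea of a pipeline as a series of stages: each one can be tested or replaced without touching the others.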