Data Engineering – ETL, Web Scraping, Big Data, SQL, Power BI
Hands-on data interaction using ETL, Web Scraping, Big Data, SQL, and Power BI
A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and move it to one or more data stores. The destination may not be the same type of data store as the source, and often the format is different, or the data needs to be shaped or cleaned before it is loaded into its final destination.
Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store.
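The three ETL stages can be sketched in a few lines of Python. This is only a minimal illustration, not how SSIS or any particular tool implements it: the sample CSV data, the `payments` table, and the "skip rows with a missing amount" rule are all assumptions made up for the example, and an in-memory SQLite database stands in for the destination data store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source might be a file, an API, or a database.
RAW_CSV = """name,amount,currency
 alice ,10.50,usd
BOB,,usd
carol,7.25,USD
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply example business rules to shape and clean the data."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed rule: skip records with a missing amount
        cleaned.append(
            (row["name"].strip().title(), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform, and load in order, then read back the result."""
    conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in destination
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT name, amount, currency FROM payments ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage in its own function makes it easy to swap the source or destination without touching the business rules.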
SQL Server Integration Services (SSIS) is a useful and powerful business intelligence tool. It is best suited to working with SQL Server databases. Its design tools are added when you install SQL Server Data Tools (SSDT), which adds the Business Intelligence templates to Visual Studio that are used to create Integration Services projects.
SSIS can be used for:
Providing solutions to complex business problems
Updating data warehouses
Managing SQL Server objects and data
Extracting data from a variety of sources
Loading data into one or several destinations
Web scraping is the process of automatically downloading a web page’s data and extracting specific information from it. The extracted information can be stored in a database or as various file types.
Web scraping software tools may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
Scraping a web page involves fetching it and extracting data from it. Fetching is the downloading of a page (which a browser does when you view the page), so that it can be processed later. Once a page has been fetched, extraction can take place. The content of the page may be parsed, searched, or reformatted, and its data copied into a spreadsheet, and so on. Web scrapers typically take something out of a page in order to use it for another purpose somewhere else. An example would be finding and copying names and phone numbers, or companies and their URLs, to a list (contact scraping).
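The fetch-then-extract split can be illustrated with Python's standard library. This is a rough sketch of contact scraping under stated assumptions: the sample page and the `name` and `phone` class names are invented for the example, and a real page would need its own selectors. The `fetch` function is shown for completeness but is not called here, so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Fetch: download a page, as a browser does when you view it."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class ContactParser(HTMLParser):
    """Extract: collect (name, phone) pairs from elements with assumed class names."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._field = None  # which field the next text chunk belongs to
        self._name = None   # most recently seen name

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("name", "phone"):
            self._field = css_class

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "phone":
            self.contacts.append((self._name, data.strip()))
        self._field = None  # only the text directly after the tag is used


def extract_contacts(html):
    """Parse an HTML string and return the contacts found in it."""
    parser = ContactParser()
    parser.feed(html)
    return parser.contacts


# Hypothetical page content standing in for the result of fetch(url).
SAMPLE_PAGE = """
<ul>
  <li><span class="name">Ada Lovelace</span> <span class="phone">555-0100</span></li>
  <li><span class="name">Alan Turing</span> <span class="phone">555-0101</span></li>
</ul>
"""

if __name__ == "__main__":
    for name, phone in extract_contacts(SAMPLE_PAGE):
        print(name, phone)
```

The extracted pairs could then be written to a spreadsheet or a database table for later analysis.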
Big data can be characterised as data that has high volume, high variety and high velocity. Data includes numbers, text, images, audio, video, or any other kind of information you might store on your computer. Volume, velocity, and variety are sometimes called “the 3 V’s of big data.”
What kinds of datasets are considered big data?
Examples include social media networks analysing their members' data to learn more about them and connect them with content and advertising relevant to their interests, or search engines looking at the relationship between queries and results to give better answers to users' questions.
SQL is a standard language for accessing and manipulating databases.
SQL stands for Structured Query Language
What can SQL do?
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can create stored procedures in a database
SQL can create views in a database
SQL can set permissions on tables, procedures, and views
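Several of the capabilities listed above can be tried out from Python using the built-in `sqlite3` module. This is a small sketch, assuming a made-up `customers` table and an `oslo_customers` view. SQLite is used only because it needs no server; it does not support stored procedures or table permissions, and it creates a database simply by connecting to it, so those items are not shown here.

```python
import sqlite3

# Connecting creates the database (here an in-memory one).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a new table.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Insert records.
cur.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ann", "Oslo"), ("Ben", "Paris"), ("Cy", "Oslo")],
)

# Update a record.
cur.execute("UPDATE customers SET city = 'Rome' WHERE name = 'Ben'")

# Delete a record.
cur.execute("DELETE FROM customers WHERE name = 'Cy'")

# Create a view over the table.
cur.execute("CREATE VIEW oslo_customers AS SELECT name FROM customers WHERE city = 'Oslo'")

# Retrieve data with queries against the table and the view.
all_rows = cur.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
oslo_rows = cur.execute("SELECT name FROM oslo_customers").fetchall()
conn.commit()

if __name__ == "__main__":
    print(all_rows)
    print(oslo_rows)
```

The same statements, with minor dialect differences, would apply to a server database such as SQL Server.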
Power BI is a business analytics solution that lets you visualize your data and share insights across your organization, or embed them in your app or website. Connect to hundreds of data sources and bring your data to life with live dashboards and reports.
Discover how to quickly glean insights from your data using Power BI. This formidable set of business analytics tools—which includes the Power BI service, Power BI Desktop, and Power BI Mobile—can help you more effectively create and share impactful visualizations with others in your organization.
In this beginner's course you will learn how to get started with this powerful toolset. We will cover topics such as connecting to and transforming web-based data sources. You will also learn how to publish and share your reports and visuals on the Power BI service.