This article presents a curated list of resources on HTML parsing. Thanks for reading.
What is an HTML Parser
According to Wikipedia, parsing, or syntactic analysis, is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. In this article, HTML parsing means loading the HTML, extracting and processing the relevant information (head title, page assets, main sections) and later saving the processed file.
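As a minimal sketch of the "load and extract" part of that definition, the snippet below reads a page and collects its head title and asset paths. It uses Python's standard-library `html.parser` module only to keep the example dependency-free; the `PageSummary` class and the sample page are hypothetical names made up for this illustration, not part of any tool mentioned in the article.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the <title> text and the asset paths referenced by a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.assets = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and attrs.get("href"):
            # Stylesheets, icons and similar linked assets.
            self.assets.append(attrs["href"])
        elif tag in ("script", "img") and attrs.get("src"):
            # Scripts and images.
            self.assets.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text found inside <title> is kept.
        if self._in_title:
            self.title += data


# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html>
  <head>
    <title>Demo page</title>
    <link rel="stylesheet" href="css/app.css">
  </head>
  <body>
    <img src="img/logo.png" alt="logo">
    <script src="js/app.js"></script>
  </body>
</html>
"""

summary = PageSummary()
summary.feed(SAMPLE_HTML)
print(summary.title)
print(summary.assets)
```

The same information can be collected with a higher-level library such as Beautiful Soup, which the resources below rely on.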
Why build or use an HTML Parser
A modern HTML parser can help developers skip some of the manual work involved in the web development process:
- Extracting components from flat HTML
- Detecting master pages (by comparing the DOM trees of the pages)
- Removing hard-coded text
- Sometimes, tuning assets (CSS compression, JS minification) by controlling the asset paths
- Exporting to various template engines: Jinja2, Django native, Blade, Mustache
- Crawling a LIVE website and using the extracted information in various statistics and reports (comparing prices, extracting product information)
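To make the master page idea from the list above more concrete, here is one possible sketch of comparing the DOM trees of two pages. It reduces each page to a list of `depth:tag` entries and scores the overlap with `difflib.SequenceMatcher`. This is an assumption about how such a comparison could be done, not the method used by any tool named in this article; `Skeleton`, `skeleton` and `structure_similarity` are hypothetical names, and the sample pages are invented.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Skeleton(HTMLParser):
    """Records each element as 'depth:tag', ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(f"{self.depth}:{tag}")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def skeleton(html):
    """Returns the tag skeleton of an HTML string."""
    parser = Skeleton()
    parser.feed(html)
    return parser.nodes


def structure_similarity(html_a, html_b):
    """Returns a ratio in [0, 1]; 1.0 means identical tag structure."""
    return SequenceMatcher(None, skeleton(html_a), skeleton(html_b)).ratio()


# Hypothetical pages: the first two share a layout, the third does not.
page_a = ("<html><body><nav><a href='/'>Home</a></nav>"
          "<main><h1>About</h1><p>Who we are</p></main>"
          "<footer>Footer</footer></body></html>")
page_b = ("<html><body><nav><a href='/'>Start</a></nav>"
          "<main><h1>Contact</h1><p>Write to us</p></main>"
          "<footer>Footer</footer></body></html>")
page_c = ("<html><body><form><input name='q'>"
          "<button>Go</button></form></body></html>")

same_layout = structure_similarity(page_a, page_b)
other_layout = structure_similarity(page_a, page_c)
print(same_layout, other_layout)
```

Pages that score close to 1.0 are candidates for sharing one master page; the threshold to use is a judgment call that depends on the theme being processed.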
This technology can be useful in many areas, such as statistics, HTML crawling and scanning competitors for updates.
For instance, the AppSeed platform uses HTML parsing to transform flat HTML files into production-ready components and layouts, which are later used in web apps coded in different frameworks and patterns: admin dashboards, static sites, JAMstack starters, Flask apps.
Let's move on with our index.
This article presents a few code snippets written with the Python Beautiful Soup library, along with open-source web apps that use the components extracted and processed by the parsing code.
The article presents a short list of code snippets useful for extracting information from a live website. The code is written in Python on top of the BeautifulSoup HTML parsing library.
The article presents a production use case that converts flat HTML code to PUG and PHP components and layouts (Blade template system, native PHP).
In this article, I will present a simple HTML parser that I use to integrate HTML themes much faster into legacy apps coded in different technologies. When a customer requests a new UI for their app, the manual processing can take some time, so I decided to automate part of the flow. Using the tool, I'm able to update the design in less than 2h for a simple website with 2-3 pages - The Intro Quote
A short article that explains the parsing process used by the AppSeed platform:
This article explains how we use HTML parsing in the automatic app generation process that transforms flat HTML designs into fully coded apps.
This article presents a few practical code snippets for extracting and processing HTML information using an HTML parser written in Python with the BS4 library. The following topics are covered:
- Load the HTML
- Change the path of an existing asset
- Update existing elements: change the src attribute of an image
- Locate an element based on the id
- Remove an element from the DOM tree
- Process an existing component: remove hardcoded text
- Save the processed HTML to a file
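The topics listed above can be sketched in a few lines of Beautiful Soup. This is only an illustration under stated assumptions, not the code from the linked article: the sample HTML, the `/static/` prefix, the element ids and the output file name are all hypothetical. It requires the `beautifulsoup4` package.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical input, used only for illustration.
SAMPLE_HTML = """<html>
<head>
  <title>Demo</title>
  <link rel="stylesheet" href="css/app.css">
</head>
<body>
  <div id="banner">Temporary banner</div>
  <img id="logo" src="img/logo.png" alt="logo">
  <div class="card"><h4>Hard-coded heading</h4></div>
</body>
</html>"""

# Load the HTML using Python's built-in parser as the backend.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Change the path of an existing asset (assumed "/static/" prefix).
for link in soup.find_all("link", href=True):
    link["href"] = "/static/" + link["href"]

# Locate an element by its id and change its src attribute.
logo = soup.find(id="logo")
logo["src"] = "/static/" + logo["src"]

# Remove an element from the DOM tree.
soup.find(id="banner").decompose()

# Process an existing component: remove the hard-coded text.
heading = soup.find("div", class_="card").h4
heading.string = ""

# Save the processed HTML to a file. The temp directory is used here
# so that the sketch does not leave files in the project folder.
processed = str(soup)
output_path = Path(tempfile.gettempdir()) / "processed.html"
output_path.write_text(processed, encoding="utf-8")
```

In a real theme, the emptied heading would typically be filled with a template variable for the target engine instead of an empty string.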
This article is about a (WIP) HTML parser tool that aims to automate a common set of tasks involved in the web development process.
Links & Resources