HTML Parser - Curated list with resources

Hello Coders,

This article presents a curated list of resources on HTML parsing. Thanks for reading.


What is an HTML Parser

According to Wikipedia, parsing (or syntactic analysis) is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. In this article, HTML parsing means loading the HTML, extracting and processing the relevant information (head title, page assets, main sections) and then saving the processed file.
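To make the definition concrete, here is a minimal sketch of the "load and extract" part. It uses only Python's standard-library html.parser so it runs without extra packages; the articles listed below use Beautiful Soup instead. The PageSummary class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the page title and asset paths while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.assets = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "img") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text found inside <title> is kept.
        if self._in_title:
            self.title += data


# Hypothetical sample page, not taken from any of the linked articles.
SAMPLE_HTML = """<html><head><title>Demo page</title>
<link rel="stylesheet" href="css/app.css"></head>
<body><img src="img/logo.png"><script src="js/app.js"></script></body></html>"""

summary = PageSummary()
summary.feed(SAMPLE_HTML)
summary.close()

print(summary.title)
print(summary.assets)
```

The parser walks the markup once and reports tags as events, which is enough for simple extraction. Editing and saving the document is easier with a tree-based library such as Beautiful Soup.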


Why build or use an HTML Parser

A modern HTML parser can spare developers some of the manual work involved in the web development process:

  • Component extraction from flat HTML
  • Master page detection (by comparing the pages' DOM trees)
  • Hard-coded text removal
  • Asset tuning (CSS compression, JS minification) by controlling the asset paths
  • Export for various template engines: Jinja2, Django native, Blade, Mustache
  • Crawling a LIVE website and using the extracted information in statistics and reports (price comparison, product information)
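As an illustration of the master page idea from the list above, the sketch below collects the tag path of every element in two pages and keeps the paths they share. This is one simple way to compare DOM trees, assumed here for demonstration; it is not necessarily how AppSeed or the linked articles do it. The names PathCollector, dom_paths and shared_layout are hypothetical, and only the standard library is used.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Records the tag path (e.g. 'html/body/nav') of every element."""

    def __init__(self):
        super().__init__()
        self._stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self._stack + [tag]))
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def dom_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def shared_layout(page_a, page_b):
    """Tag paths present in both pages: candidates for a master page."""
    return dom_paths(page_a) & dom_paths(page_b)


# Two hypothetical pages that share a nav and a footer.
INDEX = ("<html><body><nav><a href='/'>Home</a></nav>"
         "<main><h1>Index</h1></main><footer><p>(c)</p></footer></body></html>")
ABOUT = ("<html><body><nav><a href='/'>Home</a></nav>"
         "<main><table><tr><td>x</td></tr></table></main>"
         "<footer><p>(c)</p></footer></body></html>")

common = shared_layout(INDEX, ABOUT)
print(sorted(common))
```

The nav and footer paths appear in both pages, while the contents of main differ, which hints that nav and footer belong in a shared layout.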

This technology is useful in many areas, such as statistics, HTML crawling and scanning competitors for updates.

For instance, the AppSeed platform uses HTML parsing to transform flat HTML files into production-ready components and layouts, used later in web apps coded in different frameworks and patterns: admin dashboards, static sites, JAMstack starters, Flask apps.

Let's move on with our index.


HTML Parser - How to use Python BS4 to work less

This article presents a few code snippets written in Python with the Beautiful Soup library, along with open-source web apps that use the components extracted and processed by the parsing code.


HTML Parser - Extract information from a LIVE website

The article presents a short list of code snippets useful for extracting information from a live website. The code is written in Python on top of the BeautifulSoup HTML parsing library.
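A rough sketch of what such a snippet can look like, assuming Beautiful Soup 4 is installed (pip install beautifulsoup4). The function names, the User-Agent value and the product/name/price CSS classes are hypothetical and must be adapted to the site being scanned. fetch_html is defined but not called, so the example runs offline on a sample string.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (not called below)."""
    request = Request(url, headers={"User-Agent": "html-parser-demo"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_products(html):
    """Return (name, price) pairs; the CSS class names are assumptions."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.find_all("div", class_="product"):
        name = card.find(class_="name")
        price = card.find(class_="price")
        if name is not None and price is not None:
            products.append((name.get_text(strip=True),
                             price.get_text(strip=True)))
    return products


# Hypothetical markup standing in for a downloaded page.
SAMPLE = """
<div class="product"><span class="name">Mug</span><span class="price">$9.50</span></div>
<div class="product"><span class="name">Lamp</span><span class="price">$24.00</span></div>
"""

products = extract_products(SAMPLE)
print(products)
```

For a real site, extract_products(fetch_html(url)) would combine the two steps. Check the site's terms of use and robots.txt before crawling it.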


HTML Parser - Convert HTML to Jinja2 and Php components

The article presents a production use case that converts flat HTML code to PUG and PHP components and layouts (Blade template system, native PHP).
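The general idea of turning flat HTML into template components can be sketched as follows. The example targets Jinja2 syntax and assumes Beautiful Soup 4 is installed. The include path, the title variable and the sample markup are hypothetical, and the linked article may approach the conversion differently.

```python
from bs4 import BeautifulSoup

# Hypothetical flat HTML page.
FLAT_HTML = """<html><body>
<nav id="sidebar"><a href="/">Home</a></nav>
<main><h1>Hard-coded title</h1></main>
</body></html>"""

soup = BeautifulSoup(FLAT_HTML, "html.parser")

# 1. Extract the sidebar markup so it can live in its own component file.
sidebar = soup.find("nav", id="sidebar")
component_source = str(sidebar)

# 2. Swap the element for a Jinja2 include directive in the layout.
sidebar.replace_with("{% include 'includes/sidebar.html' %}")

# 3. Replace the hard-coded heading text with a template variable.
soup.find("h1").string = "{{ title }}"

layout_source = str(soup)

print(component_source)
print(layout_source)
```

In a real flow, component_source would be written to includes/sidebar.html and layout_source to the layout file. Other engines such as Blade or PUG would need their own directive syntax in steps 2 and 3.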


HTML Parser - Developer Tools

In this article, I will present a simple HTML parser that I use to integrate HTML themes much faster into legacy apps coded in different technologies. When a customer requests a new UI for their app, the manual processing can take some time, so I decided to automate part of the flow. Using the tool, I'm able to update the design in less than 2h for a simple website with 2-3 pages. - The Intro Quote

AppSeed App Generator - HTML Parsing

A short article that explains the parsing process used by the AppSeed platform:

This article explains how we use HTML parsing in the process of automatic app generation used to transform flat HTML designs into fully coded apps.

Pretty cool!


HTML Parser - Extract HTML information with ease

This article presents a few practical code snippets to extract and process HTML information using an HTML parser written in Python with the BS4 library. The following topics are covered:

  • Load the HTML
  • Scan the file for assets: images, JavaScript files, CSS files
  • Change the path of an existing asset
  • Update existing elements: change the src attribute of an image
  • Locate an element based on the id
  • Remove an element from the DOM tree
  • Process an existing component: remove hardcoded text
  • Save the processed HTML to a file
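The steps above can be sketched in a single script. This is a minimal example, assuming Beautiful Soup 4 is installed. The sample markup, the /static/ prefix and the element ids are made up for illustration and are not taken from the linked article.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical flat HTML file content.
HTML = """<html><head><link href="css/app.css" rel="stylesheet"/></head>
<body>
<div id="banner">Temporary banner</div>
<img src="img/logo.png"/>
<p id="intro">Hard-coded intro text</p>
<script src="js/app.js"></script>
</body></html>"""

# Load the HTML (the built-in "html.parser" backend needs no extra install).
soup = BeautifulSoup(HTML, "html.parser")

# Scan for assets: images, JavaScript files, CSS files.
images = [tag["src"] for tag in soup.find_all("img", src=True)]
scripts = [tag["src"] for tag in soup.find_all("script", src=True)]
styles = [tag["href"] for tag in soup.find_all("link", href=True)]

# Change the path of existing assets by prefixing a static directory.
for link in soup.find_all("link", href=True):
    link["href"] = "/static/" + link["href"]

# Update an existing element: change the src attribute of an image.
soup.find("img")["src"] = "/static/img/logo.png"

# Locate an element by its id and remove it from the DOM tree.
soup.find(id="banner").decompose()

# Process a component: remove its hard-coded text but keep the element.
soup.find(id="intro").clear()

# Save the processed HTML to a file (a temporary directory is used here).
out_path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(out_path, "w", encoding="utf-8") as handle:
    handle.write(str(soup))

print(images, scripts, styles)
print(out_path)
```

decompose() deletes the element and everything inside it, while clear() only empties it, which is handy when the tag itself should stay as a slot for dynamic content.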

HTML Parser - Flat HTML to PUG, Jinja2 and Blade templates

If you plan to automate the integration of new layouts into legacy web apps, this article might help you. The code snippets listed below are used by the AppSeed R&D team to process flat HTML files into production-ready templates and components for JavaScript, Python and PHP apps. - The Intro Quote

HTML Parser - Design once, Use anywhere

This article is about a work-in-progress (WIP) HTML parser tool that aims to automate a common set of tasks involved in the web development process.


Links & Resources

  • HTML Parser - a public repository with code snippets related to the HTML parsing topic
  • HTML Parser - the parsing service supported by the AppSeed platform
