XML data and ElementTree

Intermediate Data Science Praktikum
Created by Pavel · 21.03.2026 at 01:05 UTC · 2 completed

New APIs often speak JSON, but many regulated corners of the world still ship XML: a bank's rate feed, an enterprise invoice, a years-old integration nobody dares rewrite. The format is deliberately hierarchical (elements nested inside elements, attributes on tags), and its contracts can be pinned down with XML Schema (XSD) and validators, a rigor that JSON ecosystems sometimes add only later (for example with JSON Schema).

In Python you step into that world with xml.etree.ElementTree: parse once, walk the children, read element.attrib like a dict, and pull text from .text. Remember that mixed content splits text: an element's .text holds only the text before its first child, and the text after each child is stored in that child's .tail. Namespaces show up in tags as {uri}local, so robust pipelines compare the local name after the closing }.
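A minimal sketch of those moves, assuming a small made-up rate feed (FEED, read_rates, local_name and the example.com namespace URI are hypothetical, chosen only for illustration). It parses with ET.fromstring, walks direct children, reads attributes, shows how mixed content lands in .text and .tail, and strips a {uri} prefix to get the local tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical rate-feed snippet, made up for illustration only.
FEED = """<rates date="2026-03-20">
  <rate currency="USD">1.08</rate>
  <rate currency="CHF">0.95</rate>
</rates>"""


def read_rates(xml_text: str) -> dict:
    """Map each child's currency attribute to its numeric text (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # parse once; returns the root Element
    rates = {}
    for child in root:  # iterating an Element yields its direct child elements
        # Element.get() reads from .attrib and returns None for a missing attribute
        rates[child.get("currency")] = float(child.text)
    return rates


def local_name(tag: str) -> str:
    """Strip a '{uri}' namespace prefix: '{http://x}rate' -> 'rate'."""
    return tag.rsplit("}", 1)[-1]


# Mixed content: .text is only the text before the first child;
# text that follows a child is stored on that child's .tail.
para = ET.fromstring("<p>Rates rose <b>sharply</b> today.</p>")
print(repr(para.text))           # 'Rates rose '
print(repr(para[0].text))        # 'sharply'
print(repr(para[0].tail))        # ' today.'
print("".join(para.itertext()))  # Rates rose sharply today.

# A default namespace is expanded into '{uri}local' tags.
ns_root = ET.fromstring(
    '<rates xmlns="http://example.com/rates"><rate currency="USD">1.08</rate></rates>'
)
print(ns_root[0].tag)              # {http://example.com/rates}rate
print(local_name(ns_root[0].tag))  # rate
print(read_rates(FEED))            # {'USD': 1.08, 'CHF': 0.95}
```

Splitting on the last } means tags without a namespace pass through unchanged, so one comparison works for both kinds of document.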

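When the file is gigabytes instead of kilobytes, the story changes: loading the whole tree can exhaust memory. ET.iterparse lets you stream instead, reacting to each element as the parser reaches it rather than holding the entire document in memory. Because iterparse still attaches parsed elements to a tree, memory stays bounded only if you clear elements once you have processed them.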
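A minimal streaming sketch, assuming a flat feed of rate elements directly under the root (stream_rates, the tag and attribute names, and the sample bytes are hypothetical). It asks ET.iterparse for start and end events, grabs the root from the first event, and clears the root after handling each complete element.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union


def stream_rates(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, float]]:
    """Yield (currency, value) pairs from <rate> elements without keeping the tree.

    `source` is a filename or a binary file object, as accepted by ET.iterparse.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event == "end" and elem.tag == "rate":
            # At "end" the element is complete, so its attributes and text are available.
            yield elem.get("currency"), float(elem.text)
            # Parsed elements stay attached to the root; clearing it discards
            # everything already processed, so memory does not grow with file size.
            root.clear()


# Small in-memory stand-in for a multi-gigabyte file (hypothetical data).
sample = io.BytesIO(
    b'<rates><rate currency="USD">1.08</rate><rate currency="CHF">0.95</rate></rates>'
)
print(list(stream_rates(sample)))  # [('USD', 1.08), ('CHF', 0.95)]
```

Clearing the root suits this flat layout; with deeper nesting, clear only elements you have finished with, and for namespaced feeds compare the local part of elem.tag instead of the bare name.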

Many newer services expose JSON instead; that path is covered in "requests, JSON APIs, and robust fetching". Reference: [1].


Sources

Tasks
Question 1

Compared to typical REST JSON APIs, what is a common reason XML still appears in data engineering pipelines?

Hint

Think finance and enterprise interoperability.

Question 2

What does element.text usually contain in ElementTree?

Hint

Mixed content splits text between children.

Question 3

Implement first_child_tag_text(xml_text: str, child_name: str) -> str that parses xml_text, finds the first direct child of the root whose tag equals child_name (ignoring namespace prefixes: if the tag looks like {uri}local, compare the part after }), and returns that child's .text, or an empty string if it is missing.

Assume simple XML such as <root><city>Zurich</city></root>; with child_name "city", the function returns "Zurich".

Submit the function; tests use expression mode.

Hint

ET.fromstring parses a string; iterate over the root's children and compare local tag names.

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
Running the code checks runtime behavior only; final correctness is evaluated when you submit.