-
Hi @roll, continuing the discussion from Discord. Now considering the BulkCm XML format as described in the specification; a raw file example can be found there.

A usual approach would be to create a Resource per XML element (e.g. bulkCmConfigDataFile), each with at least a CSV file. These BulkCm files can become quite large, 1GB or more, and contain hundreds of Resources, each with thousands of rows.

My initial approach to integrate with the frictionless framework would be to create an extractor which reads the BulkCm file(s) and writes CSV files to local disk, then use the framework to create datapackage(s). That is, a regular ETL pipeline: first transform the raw file to CSV, and from there use frictionless to validate, transform and publish (a sketch of this follows below).

A possible alternative would be to create a BulkCmXmlPlugin, BulkCmXmlDialect and BulkCmXmlParser. The difficulty with this approach lies mostly in the write_row_stream parser method, which receives a single Resource object and must output a valid BulkCm file. I'm not sure I understand the Parser class and how to use it for BulkCm. Any help or guidance is appreciated.
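For context, a minimal sketch of the extractor idea, using only the standard library. The element name and columns are hypothetical placeholders, not taken from the BulkCm specification; iterparse streams the document, so files of 1GB or more never need to fit in memory:

```python
# Sketch of the extractor step: stream-parse a BulkCm XML file and
# append one CSV row per element of interest. Element and column
# names are hypothetical placeholders.
import csv
import xml.etree.ElementTree as ET

def extract_bulkcm(xml_path: str, csv_path: str, tag: str) -> None:
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["id", "text"])  # hypothetical header
        # iterparse streams the file, so a 1GB+ BulkCm file never
        # has to be loaded into memory at once
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag.endswith(tag):  # tag may carry an XML namespace prefix
                writer.writerow([elem.get("id"), (elem.text or "").strip()])
                elem.clear()  # free the subtree we just processed

extract_bulkcm("bulkcm.xml", "managedElement.csv", "ManagedElement")
```

From there the generated CSVs can be described, validated and packaged with frictionless as usual.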
-
When considering a BulkCmParser, @roll suggested creating a transformation step for outputting rows to the appropriate Resources. This transformation would have to execute just before the parser's call to write_row_stream, correct? So it wouldn't be part of the parser itself.
-
If I got it correctly, your case needs more flexibility in the first place, e.g.:

```python
from pprint import pprint
from frictionless import Package, Resource, transform, steps

def step(resource):
    # custom step: consume the row stream and store the data
    # according to business logic
    with resource:
        for row in resource.row_stream:
            pprint(row)  # placeholder: store the row here instead

source = Resource("data/source.xml", format="bulkcm")  # powered by a custom parser (read-only)
transform(source, steps=[step])
# Or you can work with the resource directly
```
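To make the "store the data according to business logic" comment concrete, one possible shape for the step is to fan rows out into one CSV file per target Resource. The resource_name routing column is a hypothetical assumption here; the custom parser would have to emit something like it:

```python
import csv

def step(resource):
    # Fan rows out into one CSV file per target Resource, keyed by a
    # hypothetical "resource_name" column emitted by the custom parser.
    writers, files = {}, {}
    with resource:
        for row in resource.row_stream:
            name = row["resource_name"]
            if name not in writers:
                f = open(f"{name}.csv", "w", newline="")
                files[name] = f
                writers[name] = csv.DictWriter(f, fieldnames=list(row.keys()))
                writers[name].writeheader()
            writers[name].writerow(dict(row))
    for f in files.values():
        f.close()
```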
-
But based on the size of the potential resources (they can be more than 1GB),
I would also go with temporary CSV files or a database (see the sketch below).
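A similarly hedged sketch of the database option, spooling rows into SQLite via the stdlib, with the same hypothetical resource_name routing column as above:

```python
import json
import sqlite3

def step(resource):
    # Spool rows into a staging table instead of temporary CSV files;
    # "resource_name" is the same hypothetical routing column as above.
    con = sqlite3.connect("bulkcm_staging.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS rows (resource_name TEXT, payload TEXT)"
    )
    with resource:
        con.executemany(
            "INSERT INTO rows VALUES (?, ?)",
            (
                (row["resource_name"], json.dumps(dict(row), default=str))
                for row in resource.row_stream
            ),
        )
    con.commit()
    con.close()
```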