Bulk loading files into Sedna XML DB

The task is to upload a large number of files into Sedna XML DB. How would you do this? If it is a recurring job, it makes sense to write an application for it, which is quite easy with the Sedna XML:DB Java API. We have actually done that already, but this article covers a different case. The problem with the Java API is performance: it always adds overhead compared to the terminal utility that ships with Sedna (I measured about 2 seconds per file against a remote Sedna installation). Now I have several thousand files and want to upload them quickly, so let's write a few scripts to automate it.

Generate bulk load file
First we need to generate an XQuery file with the LOAD statements that the Sedna terminal utility understands. Let's do this with another simple script. I had to do this on both Linux and Windows, so you'll find two scripts below.
First comes the Linux shell script:
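#!/bin/sh

OUTPUT_FILE=bulk_load.xquery
COLLECTION_NAME=products
FILES_DIRECTORY=/home/ilagunov/files

# Start with an empty output file
: > "$OUTPUT_FILE"

for file in "$FILES_DIRECTORY"/*
do
  # Skip subdirectories and an unmatched glob
  [ -f "$file" ] || continue
  # Use the file name without its directory as the document name
  shortname=$(basename "$file")
  printf 'LOAD "%s" "%s" "%s"&\n' "$file" "$shortname" "$COLLECTION_NAME" >> "$OUTPUT_FILE"
done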
Here is the Windows Batch script:
@echo off

set OUTPUT_FILE=bulk_load.xquery
set COLLECTION_NAME=products
set FILES_DIRECTORY=c:\files

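rem Remove the output of a previous run, if any
if exist "%OUTPUT_FILE%" del "%OUTPUT_FILE%"

rem "delims=" keeps file names with spaces intact; /a-d lists files only
for /f "delims=" %%i in ('dir /b /a-d "%FILES_DIRECTORY%"') do (
  echo LOAD "%FILES_DIRECTORY%\%%i" "%%i" "%COLLECTION_NAME%"^& >>"%OUTPUT_FILE%"
)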
So just set the variables at the top of the script to the correct values and you'll get a nice bulk_load.xquery:
LOAD "c:\files\1075.xml" "1075.xml" "products"& 
LOAD "c:\files\1076.xml" "1076.xml" "products"& 
LOAD "c:\files\1078.xml" "1078.xml" "products"& 
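
With several thousand files it is worth checking that the generated file really contains one LOAD statement per source file before running it. The following is only a sketch: the helper names count_loads, count_files and check_bulk_file are made up for this illustration, and it assumes every statement sits on its own line starting with LOAD, as in the sample above.

```shell
#!/bin/sh
# Sketch: compare the number of LOAD statements with the number of source files.
# All function names here are hypothetical helpers, not part of Sedna.

count_loads() {
  # Count the lines of the generated file that start with "LOAD ".
  grep -c '^LOAD ' "$1"
}

count_files() {
  # Count regular files directly inside the directory (no recursion,
  # hidden files are ignored just like in the generator loop).
  n=0
  for f in "$1"/*; do
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

check_bulk_file() {
  # $1 is the generated XQuery file, $2 is the directory with the source files.
  loads=$(count_loads "$1")
  files=$(count_files "$2")
  if [ "$loads" -eq "$files" ]; then
    echo "OK: $loads LOAD statements for $files files"
  else
    echo "MISMATCH: $loads LOAD statements for $files files" >&2
    return 1
  fi
}
```

For example, check_bulk_file bulk_load.xquery /home/ilagunov/files prints an OK line when the counts agree and returns a non-zero status otherwise.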

Execute generated file
Now locate the Sedna terminal utility se_term and run the following command (use absolute paths where needed, and replace db-name with the name of your database):
se_term -file bulk_load.xquery -output bulk_load.log db-name
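
Since the whole point is speed, it can be handy to wrap the command above so that it also reports how long the load took. This is a rough sketch only: run_bulk_load, SE_TERM and DB_NAME are names invented for this example, and I am assuming (not confirming) that se_term signals failure through a non-zero exit status.

```shell
#!/bin/sh
# Sketch: run the bulk load and report the elapsed time.
# SE_TERM, DB_NAME and run_bulk_load are placeholders for illustration.

SE_TERM=${SE_TERM:-se_term}   # path to the Sedna terminal utility
DB_NAME=${DB_NAME:-db-name}   # name of the target database

run_bulk_load() {
  query_file=$1
  log_file=$2

  # Refuse to start if the generated file is missing or empty.
  if [ ! -s "$query_file" ]; then
    echo "nothing to load: $query_file is missing or empty" >&2
    return 1
  fi

  start=$(date +%s)
  # Same command line as in the article, with the paths passed in.
  "$SE_TERM" -file "$query_file" -output "$log_file" "$DB_NAME"
  status=$?
  end=$(date +%s)

  echo "se_term exited with status $status after $((end - start)) seconds"
  # Assumption: a non-zero status means the load did not complete cleanly.
  return "$status"
}
```

Call it as run_bulk_load bulk_load.xquery bulk_load.log, and still look through bulk_load.log afterwards, because the exit status alone may not tell the whole story.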
