Thursday, March 31, 2011

The Advantages/Disadvantages of XML compared to RDBMS

Are there disadvantages to using XML instead of an RDBMS? The reason I ask is that my data is more naturally represented by an XML structure than by an RDBMS. I initially thought of storing the data in a relational database, but the lack of flexibility of relational databases in handling tree-like data structures was putting me off. So I am thinking about just storing the data in XML.

One thing I fear is a performance penalty. While an RDBMS can handle large datasets, I am not sure whether the same can be said about XML. Also, database queries are well-established and fairly easy to construct and use; what about XML queries? I don't know.

I am building a .NET application.

From stackoverflow
  • Two big inherent advantages of RDBMS are:

    1. Indexing. Greatly enhances performance.
    2. Constraining. You can define relationships between elements which helps maintain the integrity of your data.

    Keep in mind you can put XML in SQL Server and query it using XPath, so depending on the shape of your data, you may be able to get the best of both worlds.

  • In my opinion, these are the factors to consider

    1. Which fits your application's needs more closely?
    2. How large a data set you need to handle?
    3. Are you transferring data between applications or are you going to query it?


    Once these factors are considered, I would suggest that you use an RDBMS if you have large data processing and querying needs, and XML if you need to export data or transfer it between applications. I would also suggest that you consider constraints on your data and your integrity needs, as Nick has suggested.

    I have little experience in this area; however, this is what I have heard from others at my school.

    All the best.

  • You should not compare XML with an RDBMS, since they are two complementary technologies; XML should not be considered or regarded as a replacement for an RDBMS.

    An RDBMS is for storing large amounts of data in a consistent way. The RDBMS should take care of the consistency of the data, etc.

    XML can be used for data exchange between different computer systems, for instance, but it should not be used to store large amounts of data over a long period of time.
    XML doesn't let you take care of data consistency the way an RDBMS does; it doesn't handle transactions, etc. XML is actually nothing more than a text file that contains data in some kind of structured way.

    annakata : +1 - it's more like comparing DBs with Files
  • You can have the best of both worlds: your data can be stored in the database, and that has to be a better solution, since a DB is faster, more secure, and has backup and restore, rollback, admin tools and so on.

    It sounds as though your data is hierarchical in nature; databases can be coerced to store hierarchies without too many issues.

    When it comes to using your data, you can extract it as XML. I know that works out of the box if you're using SQL Server; I'm not so sure about Oracle.

  • Things an RDBMS provides that XML doesn't, more or less in order of importance:

    • enforcement of a defined schema (though this is certainly available to XML)
    • support for multiple writers
    • atomic transactions
    • referential integrity
    • well-defined query language
    • ability to optimize access through indexes, compiled queries, etc.
    • role-based security
    • triggers, stored procedures, calculated columns, etc.

    Plus you don't need to load the entire database into memory before you can access any of it.

    XML's an okay serialization format for an object model. It's good for hacking together relatively free-form data models that you can access with XPath, too - especially if you're going to transform that data into XML or HTML via XSLT. And it has the merit of being standard and platform-independent.

    But if you get too ambitious with it, you swiftly get into the kind of territory that results in you writing rants about how terrible XML is. (I'm talking to you, Jeff Atwood.)
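Several answers suggest keeping the data in the database and producing XML only when you need it. A minimal sketch of that idea in C#, assuming a hypothetical Nodes table with Id/ParentId/Name columns (the common adjacency-list way to store a tree in relational tables). The rows are hard-coded instead of read from a database, and the XPath query runs in .NET through System.Xml.XPath, not inside SQL Server. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical rows from a "Nodes" table laid out as an adjacency list:
// each row stores its own Id and the Id of its parent (null for the root).
var rows = new List<(int Id, int? ParentId, string Name)>
{
    (1, null, "catalog"),
    (2, 1, "tools"),
    (3, 2, "hammer"),
    (4, 2, "saw"),
    (5, 1, "paint"),
};

// Recursively turn the flat rows into nested <node name="..."> elements.
XElement Build(int id)
{
    var row = rows.Single(r => r.Id == id);
    var element = new XElement("node", new XAttribute("name", row.Name));
    foreach (var child in rows.Where(r => r.ParentId == id))
    {
        element.Add(Build(child.Id));
    }
    return element;
}

int rootId = rows.Single(r => r.ParentId == null).Id;
var doc = new XDocument(Build(rootId));

// XPath query over the generated tree: the names of all children of "tools".
List<string> toolNames = doc
    .XPathSelectElements("//node[@name='tools']/node")
    .Select(e => e.Attribute("name")?.Value ?? "")
    .ToList();

Console.WriteLine(doc);
Console.WriteLine(string.Join(", ", toolNames));
```

The tree lives in ordinary rows, so indexes, constraints and transactions still apply, while the XML view is generated on demand. SQL Server can also build the XML on the server side with FOR XML.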

Is monkey patching/class-reopening really an example of reflection?

Apologies for the recursive nature of this question, but the chosen answer to a question on SO got me questioning my understanding of reflection.

I thought reflection was mainly about querying the internal happenings of a program while it's running. The example given in this response patches Ruby's built-in Integer class.

  • Isn't this more like function overloading/inheritance rather than runtime modification?
  • Is class reopening really an example of reflection?
From stackoverflow
  • Reflection can be used to implement late binding.

    Late binding can be used to implement monkey patching.

    Monkey patching can be used to achieve the sort of coding style shown in that answer.

    But there are other ways to implement such features that don't require monkey patching, or reflection. Heck, a good macro pre-compiler could get you close.

    So, technically correct, but not (IMHO) the greatest example.

    notnot : IYO, what would be a more effective example of reflection?
  • At the risk of increasing the level of recursion, I would like to respond, since you are referencing my answer at that link.

    The misunderstanding is an easy one to make because of our intuitive understanding of reflection as referring to looking inwards. And that's certainly an important aspect of reflection in programming too: in Ruby, for example, we have methods like instance_of? to allow objects to ask questions about themselves at runtime.

    But take a look at the Wikipedia definition of reflection:

    reflection is the process by which a computer program can observe and modify its own structure and behaviour.

    As you can see, reflection is more than just runtime self-inspection. It's also the ability to change runtime behavior. Reopening a class is also referred to as "monkey patching". You can read more about it here.

    A monkey patch is a way to extend or modify the runtime code of dynamic languages without altering the original source code.

    This process is also referred to as:

    • Guerrilla patching
    • Extending previously declared classes
    • Reopening classes
    • Dynamic Funk
    • Hijacking
    • Duck Punching
    • Method Swizzling
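C# cannot reopen a class the way Ruby can, so this sketch only illustrates the "observe" half of the definition quoted above, plus the late binding mentioned in the first answer: a method is looked up by a name held in a string and invoked at runtime. The type and method (string.ToUpper) are arbitrary choices for illustration. It uses C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Observation: ask a type about its own members at runtime.
Type type = typeof(string);
bool hasToUpper = type.GetMethods().Any(m => m.Name == "ToUpper");

// Late binding: the method is chosen by a name that is just data at compile time.
string methodName = "ToUpper";
MethodInfo method = type.GetMethod(methodName, Type.EmptyTypes)
    ?? throw new MissingMethodException(type.Name, methodName);
var result = method.Invoke("monkey", null);

Console.WriteLine($"{hasToUpper} {result}");
```

C# extension methods look superficially like reopening a class, but they are resolved at compile time and are never chosen over an existing instance method with a matching signature, which is why they are not an example of monkey patching.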

Firebird to SQL Server Migration.

  1. Is there a tool native to SQL Server to do the migration (i.e., Import/Export)?

  2. Are there any other options (e.g., SSIS)?

Please point me to helpful info.

From stackoverflow

What are good LAMP choices for freelance web hosting?

I think it's best if I ask this question with an example scenario.

Let's say your mom-and-pop local hardware store has never had a website, and they want you, a freelance developer, to build them a website. You have all the skills to run a LAMP setup and admin a system, so the difficult question you ask yourself is – where will I host it? You aren't going to host it out of the machine in your apartment.

Let's say you want to be able to customize your own system, install the version of PHP you want, and manage your own database. Perhaps the best kind of hosting is to get a virtual machine so you can customize the system as you see fit. But this is essentially a "set it and forget it" site that you make, bill by the hour for, and then are done with. In other words, the hosting should not be an issue.

Given these hosting requirements:

  • Unlimited growth potential needing good amounts of bandwidth to handle visitors
  • Wide range of system and programming options allowing it to be portable
  • Relatively cheap (not necessarily the cheapest) or reasonable scaling cost
  • Reliable hosting with good support
  • Hosted entirely on the host company's hardware

Who would you pick to host this website? Yes, I am asking for a business/company recommendation. Is there a clear answer for this scenario, or a good source that can reliably give the current answer?

I know there are all kinds of schemes out there. I'm just wondering if any one company fills the bill for freelancers and stands out in such a crowded market.

From stackoverflow
  • Well, some good VPS solutions that allow for pain-free upgrades and are really cheap are Linode and Slicehost. The problem, though, is that they aren't set-and-forget: if you need an upgrade, you have to do it manually. However, with those two hosts, you order the upgrade and it is performed painlessly in less than 5 minutes. All your files will be intact.

    Based on your description, though, it sounds like you want a cloud host where you can just set up the server and have it automatically scale to what you need. In that case, you'll want to check out Amazon EC2 and Amazon S3.

    David Zaslavsky : +1 more for Slicehost coming from a very satisfied customer ;-) Slicehost is really meant for people who want to get involved in the "dirty work" of maintaining a server, i.e. upgrades and such. If you want to set it and forget it, VPS isn't really the way to go.
    Brendan Long : The benchmarks I've seen show Linode being faster, and it comes with more memory, disk space and bandwidth. EC2 is more scalable (more scalable than 99% of people will ever need), but it's also more expensive and the latency is higher.
  • I've used RimuHosting; they have great service (they respond within minutes a lot of the time). They'll set you up with a virtual server however you want, and you get root access so you can configure it how you'd like. If you need help with something, they've always helped me very quickly. You can pick whichever distro or software you'd like.

  • I've been extremely pleased with WebFaction (http://webfaction.com). They have stock installations of several popular applications and frameworks (PHP, Django, Drupal, etc.). However, you're not locked into these. While they don't give you root access, they do give you access to a complete toolchain, allowing you to compile and install whatever version of whatever components you need.

    I've compiled and installed Erlang, ejabberd, CouchDB, RabbitMQ, ActiveMQ, and Openfire on my server with only minor hitches, mostly due to ignorance on my part, not their system.

  • I've been using Site5 (http://www.site5.com/) for a number of years now and would definitely recommend them. They support PHP, Ruby on Rails and Python, and allow SSH access, so you can get quite a bit done. Their support is awesome, and they often let you install arbitrary software (they let me have Mercurial before it was standard on their setup).

Open files in Word via ribbon code-behind

Using VSTO, I've created a custom tab in the Ribbon designer and added some groups and button controls there. When the user clicks one of the buttons, I'd like to connect to a SharePoint site and open a Word document from it in Word (an instance is already open). I'm able to connect to the SharePoint site already and have the URLs of the documents I want to open.

But how can I actually load these documents into Word? I'm already in the code-behind in Word, so how can I target the Word instance I'm in and open a file there?

Thanks in advance.

From stackoverflow
  • You would have to use the Word API to open a document. See this link for a reference. You may have to update it based on the API version you use.

    // WordApp is assumed to be a Word.Application field (Word = Microsoft.Office.Interop.Word)
    private void button1_Click(object sender, System.EventArgs e)
    {
        // Use the open file dialog to choose a word document
        if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            // set the file name from the open file dialog
            object fileName = openFileDialog1.FileName;
            object readOnly = false;
            object isVisible = true;
            // Here is the way to handle parameters you don't care about in .NET
            object missing = System.Reflection.Missing.Value;
            // Make word visible, so you can see what's happening
            WordApp.Visible = true;
            // Open the document that was chosen by the dialog
            Word.Document aDoc = WordApp.Documents.Open(ref fileName, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref isVisible);
            // Activate the document so it shows up in front
            aDoc.Activate();
            // Add the copyright text and a line break
            WordApp.Selection.TypeText("Copyright C# Corner");
            WordApp.Selection.TypeParagraph();
        }
    }
    
    Kon : Yeah, that's what I've got working now. So it kind of works, but I have an issue with this... it opens in a new Word window, not the instance I was initially using. Is there a way to fix that?
    Kon : I found my answer here: http://social.msdn.microsoft.com/Forums/en-US/vsto/thread/b6fa2787-bf87-4ef2-9c99-9df9f2c0a202/. I had to use Globals.ThisAddIn.Application.Documents.Open(...)

Remote HTTP Post with CSharp

How do you do a Remote HTTP Post (request) in CSharp?

I really need this, please. :(

From stackoverflow
  • HttpWebRequest

  • You can use WCF or create a WebRequest

    var httpRequest = (HttpWebRequest)WebRequest.Create("http://localhost/service.svc");
    httpRequest.Method = "POST";
    
    using (var outputStream = httpRequest.GetRequestStream())
    {
        // some complicated logic to create the message
    }
    
    // dispose the response as well as its stream so the connection is released
    using (var response = httpRequest.GetResponse())
    using (var stream = response.GetResponseStream())
    {
        // some complicated logic to handle the response message.
    }
    
  • I use this very simple class:

    public class RemotePost
    {
        private readonly System.Collections.Specialized.NameValueCollection Inputs =
            new System.Collections.Specialized.NameValueCollection();
    
        public string Url = "";
        public string Method = "post";
        public string FormName = "form1";
    
        public void Add(string name, string value)
        {
            Inputs.Add(name, value);
        }
    
        public void Post()
        {
            var response = System.Web.HttpContext.Current.Response;
            response.Clear();
            response.Write("<html><head>");
            response.Write(string.Format("</head><body onload=\"document.{0}.submit()\">", FormName));
            response.Write(string.Format("<form name=\"{0}\" method=\"{1}\" action=\"{2}\" >", FormName, Method, Url));
            for (int i = 0; i < Inputs.Keys.Count; i++)
            {
                // attribute-encode so quotes or angle brackets in values cannot break the markup
                response.Write(string.Format("<input name=\"{0}\" type=\"hidden\" value=\"{1}\">",
                    System.Web.HttpUtility.HtmlAttributeEncode(Inputs.Keys[i]),
                    System.Web.HttpUtility.HtmlAttributeEncode(Inputs[Inputs.Keys[i]])));
            }
            response.Write("</form>");
            response.Write("</body></html>");
            response.End();
        }
    }
    

    And you use it thusly:

    RemotePost myremotepost = new RemotePost();
    myremotepost.Url = "http://www.jigar.net/demo/HttpRequestDemoServer.aspx";
    myremotepost.Add("field1", "Huckleberry");
    myremotepost.Add("field2", "Finn");
    myremotepost.Post();
    

    Very clean, easy to use and encapsulates all the muck. I prefer this to using the HttpWebRequest and so forth directly.

    BobbyShaftoe : Why is this getting downvoted?
    David : If I'm reading this correctly, it doesn't actually post a form, but responds with a form that can be posted.
    CodeMonkey1 : I downvoted because it only works in the context of a web page response and even in that case it kills whatever else you may have wanted to do in that page. Also it only allows for a fire & forget post, and is a convoluted way to do it.
  • Use the WebRequest.Create() and set the Method property.

  • HttpWebRequest HttpWReq = 
    (HttpWebRequest)WebRequest.Create("http://www.google.com");
    
    HttpWebResponse HttpWResp = (HttpWebResponse)HttpWReq.GetResponse();
    Console.WriteLine(HttpWResp.StatusCode);
    HttpWResp.Close();
    

    Should print "OK" (200) if the request was successful

    bendewey : Since the OP is doing a POST you should mention the request stream side as well.
  • Also System.Net.WebClient

  • This is code from a small app I wrote once to post a form with values to a URL. It should be pretty robust.

    _formValues is a Dictionary<string,string> containing the variables to post and their values.

    
    // Needs System, System.Collections.Generic, System.IO, System.Net, System.Text and System.Web
    // encode form data
    StringBuilder postString = new StringBuilder();
    bool first = true;
    foreach (KeyValuePair<string, string> pair in _formValues)
    {
        if (first)
            first = false;
        else
            postString.Append("&");
        // URL-encode both the name and the value
        postString.AppendFormat("{0}={1}",
            System.Web.HttpUtility.UrlEncode(pair.Key),
            System.Web.HttpUtility.UrlEncode(pair.Value));
    }
    ASCIIEncoding ascii = new ASCIIEncoding();
    byte[] postBytes = ascii.GetBytes(postString.ToString());
    
    // set up request object
    HttpWebRequest request;
    try
    {
        request = WebRequest.Create(url) as HttpWebRequest;
    }
    catch (UriFormatException)
    {
        request = null;
    }
    if (request == null)
        throw new ApplicationException("Invalid URL: " + url);
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = postBytes.Length;
    
    // add post data to request (the using block closes the stream)
    using (Stream postStream = request.GetRequestStream())
    {
        postStream.Write(postBytes, 0, postBytes.Length);
    }
    
    HttpWebResponse response = request.GetResponse() as HttpWebResponse;
    
    
    Liam : Thanks, the details on how to build the POST data really helped!
  • I'm using the following piece of code for calling web services with the HttpWebRequest class:

    internal static string CallWebServiceDetail(string url, string soapbody, int timeout) {
        return CallWebServiceDetail(url, soapbody, null, null, null, null, null, timeout);
    }
    internal static string CallWebServiceDetail(string url, string soapbody,
        string proxy, string contenttype, string method, string action,
        string accept, int timeoutMilisecs) {
        var req = (HttpWebRequest) WebRequest.Create(url);
        if (action != null) {
         req.Headers.Add("SOAPAction", action);
        }
        req.ContentType = contenttype ?? "text/xml;charset=\"utf-8\"";
        req.Accept = accept ?? "text/xml";
        req.Method = method ?? "POST";
        req.Timeout = timeoutMilisecs;
        if (proxy != null) {
         req.Proxy = new WebProxy(proxy, true);
        }
    
        using(var stm = req.GetRequestStream()) {
         XmlDocument doc = new XmlDocument();
         doc.LoadXml(soapbody);
         doc.Save(stm);
        }
        using(var resp = req.GetResponse()) {
         using(var responseStream = resp.GetResponseStream()) {
          using(var reader = new StreamReader(responseStream)) {
           return reader.ReadToEnd();
          }
         }
        }
    }
    

    This can easily be used to call a web service:

    public void TestWebCall() {
        const string url = "http://www.ecubicle.net/whois_service.asmx/HelloWorld";
        const string soap = @"<soap:Envelope xmlns:soap='about:envelope'>
        <soap:Body><HelloWorld /></soap:Body>
    </soap:Envelope>";
        string responseDoc = CallWebServiceDetail(url, soap, 1000);
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(responseDoc);
        string response = doc.DocumentElement.InnerText;
    }
    
  • The problem with starting out in a high-level language like C#, Java or PHP is that people may never learn how simple the underlying protocol really is. So here's a short introduction:

    http://reboltutorial.com/blog/raw-http-request/
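Several answers point at lower-level classes (HttpWebRequest, WebRequest, WebClient). On .NET Framework 4.5 and later, HttpClient together with FormUrlEncodedContent handles the form encoding and the Content-Type header for you. A minimal sketch, assuming a hypothetical PostFormAsync helper and reusing the illustrative field names from the RemotePost example; only the encoding step runs here, because the actual POST needs a live server. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Illustrative form fields; a list keeps the order of the encoded pairs predictable.
var fields = new List<KeyValuePair<string, string>>
{
    new("field1", "Huckleberry"),
    new("field2", "Finn & Tom"),
};

// FormUrlEncodedContent percent-encodes names and values, turns spaces into '+',
// and sets Content-Type: application/x-www-form-urlencoded.
using var preview = new FormUrlEncodedContent(fields);
string body = await preview.ReadAsStringAsync();
Console.WriteLine(body);

// Hypothetical helper: POST the form to url and return the response body.
static async Task<string> PostFormAsync(HttpClient client, string url,
    IEnumerable<KeyValuePair<string, string>> form)
{
    using var content = new FormUrlEncodedContent(form);
    using HttpResponseMessage response = await client.PostAsync(url, content);
    response.EnsureSuccessStatusCode(); // throws HttpRequestException on 4xx/5xx
    return await response.Content.ReadAsStringAsync();
}
```

Against a real endpoint you would call something like `string reply = await PostFormAsync(sharedClient, url, fields);`, where sharedClient is a long-lived HttpClient; reusing one instance is generally preferred over creating a new client per request.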

Examples of large scale Open Source CMS deployments?

I am trying to evaluate Open Source options to replace my current CMS-based publication application. My current CMS has about 12,000 HTML pages and about 100,000 uploaded files. The size of the data is about 20 GB. Drupal, Joomla and Plone seem interesting. However, I am concerned whether these are ready to take on all this data. Do you know of any large-scale (comparably sized) CMS deployments? Any supporting numbers would greatly help.

Please note that my CMS application is a publishing system, not a collaborative/social network type site.

From stackoverflow
  • Drupal, in particular, focuses on performance. It has several types of internal caches, and, combined with a PHP cache (such as APC, which I use on my sites), it is quite performant. As of Drupal 6.0 the menu system (which drives the whole page-request structure) was totally rewritten for optimization purposes.

    My largest Drupal community has about 800 users, about 1300 content pages, and a couple thousand uploaded files totaling around 3 GB, and experiences sub-200ms page loads. It's about 1/10 the size of your site, but since you don't need community features (which generally require a lot of custom database queries), you should experience comparable performance.

    Drupal's home site, drupal.org, has about 430000 users, and about 400000 pages, and gets similar page load times (although they're running a cluster of servers).

    So I'm pretty confident Drupal should be able to handle your site.

  • fastcompany.com launched with ~750,000 pieces of content on day 1. They had performance and scaling problems initially, but it was related specifically to the fact that large-scale faceted search of the entire content base turned out to be the most popular feature, and they weren't using a dedicated search indexing system.

    The New York Observer converted to Drupal a while ago, and their scaling problem had nothing to do with the amount of content; it was a straightforward "how to handle Drudge and the Huffington Post both linking to you at the same time during the election season" problem.

    The Onion, Lifetime Television, and a number of other pretty large sites use Drupal. Mother Jones magazine just converted to it. NowPublic.com, the crowdsourced news site, also runs on Drupal and has been since the (much slower) days of Drupal 4.7.

    The key scaling issue is not really how many discrete pieces of content you have, but rather the kind of slicing and dicing you'll be doing with your queries. Those are optimized ad-hoc, like any other SQL query. Drupal tends to focus on optimising for small to medium sites out of the box, and the larger stuff requires prodding around at the indexes and paying attention to how you build your Views-based pages (since they're basically just presentation logic wrapped around SQL).

    As an earlier poster noted, if you don't need lots of user-customized content ('stuff my friends have posted,' 'what my buddies are doing,' etc.) the amount of expensive querying drops dramatically.

  • I've got to put in a plug for Plone. I use it as a document repository that contains lots and lots of quite large scanned images. No problems so far, but it's not yet the size you are talking about.

    • Plone has an FTP based interface so that might ease your migration pains.
    • Plone is written on top of an application server technology known as Zope. Because of that, Plone's default back end is the Zope Object Database, or ZODB. You can substitute an RDBMS for ZODB.
    • You can reconfigure ZODB to be a database that is distributed across multiple servers. This is called ZEO.
    • There is also work in progress on a file-based repository system for Plone.

    There are lots of consulting companies who can give you the stats you are looking for. Here's the only case study that I could easily google.