I believe this is distinct from the collation issue described in #251. Calling std::sort on an Rcpp::CharacterVector produces very unexpected results on my machine (Ubuntu 14.04):
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterVector RcppSort(Rcpp::CharacterVector x) {
    Rcpp::CharacterVector y = Rcpp::clone(x);
    y.sort();
    return y;
}

// [[Rcpp::export]]
Rcpp::CharacterVector StdSort(Rcpp::CharacterVector x) {
    Rcpp::CharacterVector y = Rcpp::clone(x);
    std::sort(y.begin(), y.end());
    return y;
}

// [[Rcpp::export]]
std::vector<std::string> StdSort2(Rcpp::CharacterVector x) {
    std::vector<std::string> y = Rcpp::as<std::vector<std::string> >(x);
    std::sort(y.begin(), y.end());
    return y;
}

/*** R
set.seed(123)
(xx <- sample(c(LETTERS[1:5], letters[1:6]), 11))
#[1] "D" "c" "f" "e" "b" "A" "C" "d" "B" "a" "E"
RcppSort(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"
StdSort(xx)
#[1] "f" "f" "f" "f" "f" "f" "D" "c" "f" "f" "f"
## ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StdSort2(xx)
#[1] "A" "B" "C" "D" "E" "a" "b" "c" "d" "e" "f"
*/
I consistently get the same strange output from StdSort(xx) whether the code is compiled with clang (5.3) or gcc (4.9.3). Presumably this is the comparator being used in StdSort:
bool operator<(const Rcpp::String& other) const {
    return strcmp(get_cstring(), other.get_cstring()) < 0;
}
which does not seem to be doing anything unusual. Unfortunately I'm not terribly familiar with the internals of Rcpp::String / Rcpp::string_proxy<>, so I really can't imagine what could be causing this behavior, but it looked like something worth pointing out.
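As a sanity check on the comparator alone, here is a minimal sketch outside of Rcpp that applies the same strcmp comparison to plain std::string values. The names strcmp_less and sort_with_strcmp are hypothetical stand-ins, not part of Rcpp, and this only exercises the comparison logic, not the Rcpp::CharacterVector iterators or Rcpp::string_proxy<>.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the quoted operator<: compares the
// underlying C strings byte-wise with strcmp (assumption: this mirrors
// what get_cstring() would hand to strcmp for plain ASCII input).
bool strcmp_less(const std::string& a, const std::string& b) {
    return std::strcmp(a.c_str(), b.c_str()) < 0;
}

// Sorts a copy of the input with the strcmp-based comparator and
// returns it, leaving the caller's vector untouched.
std::vector<std::string> sort_with_strcmp(std::vector<std::string> x) {
    std::sort(x.begin(), x.end(), strcmp_less);
    return x;
}
```

Calling sort_with_strcmp on the same eleven letters as xx should give the uppercase letters first and then the lowercase ones, matching the RcppSort and StdSort2 output above. If so, that suggests the comparison itself is fine and the problem lies elsewhere in how std::sort interacts with the CharacterVector.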
My session info:
#R version 3.2.3 (2015-12-10)
#Platform: x86_64-pc-linux-gnu (64-bit)
#Running under: Ubuntu 14.04.3 LTS
#
#locale:
#[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
#[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
#[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#loaded via a namespace (and not attached):
#[1] tools_3.2.3 Rcpp_0.12.3